Algoliterary Game: Naive Bayes

October 17th, 2018

The 1st edition of this algoliterary game was developed for the workshop ‘Machine Learning Tools for Literary Creation’, which took place at the Mundaneum from 8 till 12 October 2018. It was a collaboration with Gijs de Heij and Arts² Arts Numériques, with the support of the Mundaneum and the Communauté Française/Arts Numériques.

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption that every pair of features is independent given the class.
Naive Bayes classifiers are famous for document classification tasks such as spam filtering. They require only a small amount of training data to estimate the necessary parameters. As Machine Learning methods they can be extremely fast compared to more sophisticated methods, but they are difficult to generalise. Naive Bayes is often used as a baseline for other prediction models. It is known as a decent classifier but a bad estimator, so the probability weights of its predictions are not to be taken too seriously.
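
As a rough sketch of how such a classifier is used in code (this assumes scikit-learn and an invented toy dataset of sentences, neither of which is part of the workshop itself):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Invented toy dataset: 3 positive and 3 negative sentences
    sentences = ["the sun is warm", "we love this game", "what a bright day",
                 "the rain is cold", "we hate waiting", "what a dark day"]
    labels = ["positive", "positive", "positive",
              "negative", "negative", "negative"]

    vectorizer = CountVectorizer()            # turn words into count features
    X = vectorizer.fit_transform(sentences)
    classifier = MultinomialNB()              # multinomial Naive Bayes with Laplace smoothing
    classifier.fit(X, labels)

    # Predict the class of a new sentence
    print(classifier.predict(vectorizer.transform(["the day is warm"])))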
A little bit of history
Naive Bayes methods are named after Thomas Bayes, a reverend who lived in England in the 18th century. He was obsessed by the question of what chances one has to win a raffle. Thomas Bayes had a great impact on the history of reasoning under uncertainty, but his own history remains ironically uncertain. His exact birth date is unknown. The paper that would influence history was only brought to attention after his death in 1761, by his friend Richard Price, who stumbled upon it. The paper highlights the notion of ‘likelihood’: reasoning forward from hypothetical pasts lays the foundation for then working backward to the most probable one.
In 1774 Pierre-Simon Laplace, a French mathematician born in 1749 in Normandy and unaware of the work by Bayes, published the paper ‘Treatise on the Probability of the Causes of Events’. In it he solves the problem of how to make inferences backward from observed effects to their probable causes, using calculus. The result is now called Laplace’s Law: if we really know nothing about our raffle ahead of time, the chance that we win is (the number of wins + 1) / (the number of attempts + 2). If we buy 3 tickets and all of them are winners, the expected proportion of winning tickets is therefore (3 + 1) / (3 + 2) = 4/5. It is the first simple rule of thumb for confronting small data in the real world.
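
The same rule as a one-line calculation (the raffle numbers are hypothetical, purely to show the arithmetic):

    # Laplace's Law: (number of wins + 1) / (number of attempts + 2)
    def laplace_law(wins, attempts):
        return (wins + 1) / (attempts + 2)

    print(laplace_law(3, 3))   # 3 winning tickets out of 3 attempts -> 0.8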
Laplace also published a book for a general audience, ‘Philosophical Essay on Probabilities’, which is ‘still one of the best, laying out his theory and considering its applications to law, science, everyday life’, according to Brian Christian & Tom Griffiths in their book ‘Algorithms to Live By’. While Laplace did most of the work, it is interesting to see that Thomas Bayes won the naming of the method, Bayes’s Rule.


The game

(To optimise this game, a test phase and a confusion matrix need to be added. A code sketch of the full procedure follows step 9 below.)

1. GATHER YOUR SAMPLE DATA: write 6 short sentences in the same style, of which 3 are positive and 3 are negative.

2. PROCESS THE TEXT:
2.1. Decide on the unit of analysis (word/character/bigram…).
2.2. Split your sentences into units.
2.3. Mark each unit as positive or negative.
2.4. Create your vocabulary: the collection of all unique units across all 6 sentences.
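
For example (an invented sentence, with words as units): ‘the sun is warm’ splits into the units ‘the’, ‘sun’, ‘is’ and ‘warm’; if it was written as one of your positive sentences, each of these units is marked as positive.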

3. PREPARE THE TRANSFORMATION from WORDS to NUMBERS:
3.1. Display the vocabulary units in 1 row of a grid. These are the columns of your matrix.
3.2. For each sentence, display its units as 1 row under the columns of the grid.
3.3. Calculate the probability that a sentence in your model is positive:
number of positive sentences / total number of sentences
3.4. Calculate the probability that a sentence in your model is negative:
number of negative sentences / total number of sentences
3.5. Count all positive units.
3.6. Count all negative units.
3.7. Count all unique units (your vocabulary size).
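
Worked out for the sample data of step 1 (3 positive and 3 negative sentences out of 6 in total), the probabilities of 3.3 and 3.4 are both 3 / 6 = 0.5.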

4. The TRAINING starts! For each unit, make the following calculation:
4.1. if the unit is positive:
the probability that a sentence in your model is positive * the probability that the word is positive
This means:
(number of positive sentences / total number of sentences)
*
(number of times that the word is used as a positive example + 1)
/
(total number of positive words + vocabulary size)
4.2. else:
the probability that a sentence in your model is negative * the probability that the word is negative
This means:
(number of negative sentences / total number of sentences)
*
(number of times that the word is used as a negative example + 1)
/
(total number of negative words + vocabulary size)
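
A worked example with invented numbers: say your model has 3 positive sentences out of 6, the word appears 2 times in positive sentences, there are 10 positive words in total and your vocabulary counts 15 unique units. The positive value for that word is then 3/6 * (2 + 1) / (10 + 15) = 0.5 * 0.12 = 0.06.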

5. SMOOTHING UNITS: each cell in your grid should have a number now. Give cells that are still 0 the value 0.000001. This prevents the calculations later on from ending up at zero.

6. SMOOTHING UNKNOWN UNITS: add one last column to your grid with the label ‘Unknown’. Fill this column with the smoothing number (0.000001) as well.

7. THE PREDICTION CAN START!
7.1. Invent a new sentence in the same style as your training data.
7.2. Split your sentence into the type of units you chose in the beginning.
7.3. Calculate the probability that the new sentence is positive:
7.3.1. Find the corresponding positive probability in your grid for each unit of the new sentence.
7.3.2. If the unit does not exist, pick the smoothing number of the ‘Unknown’ unit.
7.3.3. Multiply the probability that a sentence in your model is positive with all the individual probabilities of the units of your new sentence.
7.4. Calculate the probability that the new sentence is negative:
7.4.1. Find the corresponding negative probability in your grid for each unit of the new sentence.
7.4.2. If the unit does not exist, pick the smoothing number of the ‘Unknown’ unit.
7.4.3. Multiply the probability that a sentence in your model is negative with all the individual probabilities of the units of your new sentence.

8. Compare the outcomes of 7.3.3. and 7.4.3.

9. ORACLE: the class with the highest value in step 8 is the prediction made by this model.
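
The sketch announced above follows these steps as literally as possible, for players who want to check their pen-and-paper results. It assumes Python, words as units and invented sample sentences; all variable names and example data are hypothetical.

    # A sketch of the whole game, following the numbered steps above.
    from collections import Counter

    # 1. GATHER YOUR SAMPLE DATA: 3 positive and 3 negative sentences (invented)
    positive_sentences = ["the sun is warm", "we love this game", "what a bright day"]
    negative_sentences = ["the rain is cold", "we hate waiting", "what a dark day"]

    # 2. PROCESS THE TEXT: split into word units, mark them, build the vocabulary
    positive_units = [unit for s in positive_sentences for unit in s.split()]
    negative_units = [unit for s in negative_sentences for unit in s.split()]
    vocabulary = set(positive_units) | set(negative_units)

    # 3. PREPARE THE TRANSFORMATION from WORDS to NUMBERS
    total_sentences = len(positive_sentences) + len(negative_sentences)
    p_positive_sentence = len(positive_sentences) / total_sentences   # 3.3
    p_negative_sentence = len(negative_sentences) / total_sentences   # 3.4
    positive_counts = Counter(positive_units)                         # 3.5
    negative_counts = Counter(negative_units)                         # 3.6
    vocabulary_size = len(vocabulary)                                  # 3.7

    # 4. TRAINING and 5. SMOOTHING: one positive and one negative value per unit;
    # cells that would stay empty get the small smoothing number 0.000001
    SMOOTHING = 0.000001
    grid = {}
    for unit in vocabulary:
        if positive_counts[unit] > 0:   # 4.1
            positive_value = p_positive_sentence * \
                (positive_counts[unit] + 1) / (len(positive_units) + vocabulary_size)
        else:                           # 5.
            positive_value = SMOOTHING
        if negative_counts[unit] > 0:   # 4.2
            negative_value = p_negative_sentence * \
                (negative_counts[unit] + 1) / (len(negative_units) + vocabulary_size)
        else:                           # 5.
            negative_value = SMOOTHING
        grid[unit] = (positive_value, negative_value)

    # 6. SMOOTHING UNKNOWN UNITS: the 'Unknown' column
    grid["Unknown"] = (SMOOTHING, SMOOTHING)

    # 7. THE PREDICTION CAN START!
    new_sentence = "the day is bright"                # 7.1, an invented new sentence
    new_units = new_sentence.split()                  # 7.2

    positive_score = p_positive_sentence              # 7.3.3, start from the sentence probability
    negative_score = p_negative_sentence              # 7.4.3
    for unit in new_units:
        positive_value, negative_value = grid.get(unit, grid["Unknown"])   # 7.3.2 / 7.4.2
        positive_score *= positive_value
        negative_score *= negative_value

    # 8. and 9. ORACLE: compare the two outcomes, the highest value is the prediction
    print("positive score:", positive_score)
    print("negative score:", negative_score)
    print("prediction:", "positive" if positive_score > negative_score else "negative")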
