Probability is one of the most fundamental concepts in AI because it lets us make calculated predictions and decisions rather than blind guesses. For example, if you were choosing a business strategy, you would want to know the probability that it succeeds or fails, not just guess. You may also want to know the probability of obtaining the resources the strategy needs. Reasoning with probabilities in this way helps AI systems be as accurate and reliable as possible, which matters because AI is deployed in mission-critical and sometimes even safety-critical fields.

What is probability?

Probability is simply a number between 0 and 1 that estimates the chance that an uncertain event will occur: 0 means it never happens and 1 means it always happens. In more mathematical terms, you could say it estimates how likely an uncertain event is to evaluate to true rather than false.

Probability functions and distributions

There is a brute force method to calculate probability. It goes like this:

(1) Count all trials. Let’s say there are 10.
(2) Count how many of those trials led to the event you are interested in. Let’s say 5.
(3) Then the probability of the event is 5/10 = 0.5.
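The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_probability` and the list of ten made-up trial results are assumptions, not something from a particular library.

```python
def empirical_probability(outcomes):
    """Return the fraction of trials in which the event occurred."""
    total = len(outcomes)                 # step (1): count all trials
    if total == 0:
        raise ValueError("need at least one trial")
    hits = sum(1 for o in outcomes if o)  # step (2): count trials with the event
    return hits / total                   # step (3): divide


# Ten made-up trials; True means the event happened in that trial.
trials = [True, False, True, True, False, False, True, False, True, False]
print(empirical_probability(trials))  # 0.5
```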

This seems easy enough. But with real-world data, the number of trials can run into the thousands or millions, and often you cannot observe every trial at all. Counting over all of them each time you need a probability quickly becomes resource intensive or simply impractical. This is where functions and distributions come in.

The binomial distribution, for instance, is used when each trial has only two possible outcomes: true and false. A coin toss is the classic example, since it can only come up heads (heads = true) or tails (heads = false). Now, suppose you wanted to know the probability of winning a given number of tosses out of 400, say at least 201 of them. One way would be to repeat the 400 tosses many times and count how often that happens. But this, as mentioned before, would be highly resource intensive. Therefore, mathematicians came up with the binomial distribution function, which does the work for you. All you have to input is the number of tosses, the probability of heads on a single toss, and the number of wins you are asking about, and it gives you the probability of getting exactly that many wins in 400 tosses. Adding up these values gives the probability of any range of wins, such as a majority.
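To make the coin-toss example concrete, here is a minimal sketch of the binomial function using only Python's standard library. The helper names `binomial_pmf` and `prob_at_least` are chosen for this post, and the coin is assumed to be fair (probability of heads 0.5).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    where each trial succeeds with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """Probability of k_min or more successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# Assumed fair coin: chance of exactly 200 heads in 400 tosses (roughly 0.04).
print(binomial_pmf(200, 400, 0.5))

# Chance of winning the majority (201 or more) of the 400 tosses.
print(prob_at_least(201, 400, 0.5))
```

No tosses are simulated or counted here. The formula gives the answer directly from the three inputs.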

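Similarly, there is the normal distribution, which is used when values cluster symmetrically around an average, with large deviations becoming increasingly rare (the familiar bell curve). Many real-world quantities, such as height, can be modelled reasonably well with a normal distribution. Not everything fits, though: income, for example, is strongly skewed, so it is worth checking the shape of your data before assuming normality.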
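As a rough sketch of what the normal distribution function looks like in code, the snippet below implements its density and cumulative probability with the standard library. The height figures (mean 170 cm, standard deviation 10 cm) are invented for illustration, not real statistics.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mean, std):
    """Density of the normal distribution at x."""
    z = (x - mean) / std
    return exp(-0.5 * z * z) / (std * sqrt(2 * pi))


def normal_cdf(x, mean, std):
    """Probability that a normally distributed value is at most x."""
    return 0.5 * (1 + erf((x - mean) / (std * sqrt(2))))


# Hypothetical heights: mean 170 cm, standard deviation 10 cm.
# Share of people between 160 cm and 180 cm (within one standard deviation).
share = normal_cdf(180, 170, 10) - normal_cdf(160, 170, 10)
print(round(share, 3))  # 0.683
```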

In the end, probability functions and distributions simplify the calculation for you. Therefore, it is good to understand the common distributions, what inputs each one takes, and when each one applies.

Likelihood

Likelihood is a way of checking a probability you already have against any new data you acquire: it asks how probable the new data would be if your probability were correct. For example, suppose you have calculated, using a probability function, that your favorite sports team wins 66% of its matches. After that, you get new data saying the team lost 3 of its last 4 matches. Now the question is: is a 66% win probability plausible given the new data? In this case, not very. 66% is roughly 2 wins in every 3 matches, and winning only 1 match in 4 is far from that. Likelihood estimates allow us to check already computed probabilities against new data.
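One way to put a number on this check is to ask how probable "1 win in 4 matches" is under different win rates. The sketch below assumes the matches are independent and reuses the binomial formula. The function name `likelihood` is just a label for this post.

```python
from math import comb


def likelihood(p, wins, games):
    """How probable the observed record is if the true win rate were p
    (assumes independent games, so the binomial formula applies)."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


# New data from the example: 1 win in 4 matches (3 losses).
claimed = likelihood(0.66, wins=1, games=4)        # about 0.10
observed_rate = likelihood(0.25, wins=1, games=4)  # about 0.42

print(claimed, observed_rate)
```

Under the 66% estimate the observed record is fairly improbable, while a win rate of 25% explains it about four times better. With only four matches this is weak evidence, but it shows the direction in which the estimate should move.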

Prior

A prior is like a history: it is what you believe about a probability before looking at the newest data. With likelihood, you qualify your probability estimate with new data. With a prior, you qualify it with past data. Let’s take your favorite sports team again. If you think they win 66% of the time, and historically they have won 60% of the time, then your estimate is pretty close to the prior. If the historical figure were much lower or higher, your estimate would need a second look.

Posterior (funny terminology I know, but it is what it is)

What you do here is put your probability estimate into action and update it. So, say that you corrected your probability estimate to 50% based on likelihood and prior data. Now, you actually watch your team play two games. If they win one game and lose the other, then you increase the probability that the win chance is 50%. If the team wins both games, then you decrease the probability that the win chance is 50% and increase the probability that it is 100%. Finally, if the team loses both games, then you again decrease the probability that the win chance is 50% and increase the probability that it is 0%. The updated set of beliefs is called the posterior.
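The update described above can be sketched with Bayes' rule: posterior is proportional to prior times likelihood. To keep it small, this example assumes the win chance can only be 0%, 50% or 100%, and it assumes a starting belief of 25% / 50% / 25% across those three values. Both choices are made up for illustration.

```python
def update(prior, wins, losses):
    """Return the posterior over candidate win rates after seeing
    the given number of wins and losses (games assumed independent)."""
    unnormalized = {
        p: weight * p**wins * (1 - p) ** losses  # prior times likelihood
        for p, weight in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the data is impossible under every candidate")
    return {p: value / total for p, value in unnormalized.items()}


# Assumed starting belief: the win chance is most probably 50%.
prior = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}

one_each = update(prior, wins=1, losses=1)    # belief in 50% goes up
two_wins = update(prior, wins=2, losses=0)    # belief shifts toward 100%
two_losses = update(prior, wins=0, losses=2)  # belief shifts toward 0%

print(one_each)
print(two_wins)
print(two_losses)
```

With this prior, one win and one loss rules out both 0% and 100%, so all the belief lands on 50%. Two wins drop 50% from one half to one third and raise 100% from one quarter to two thirds, and two losses do the mirror image.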

Likelihoods, priors and posteriors are about determining the probability of the probability

It’s funny, isn’t it? But that’s how scientists make sure that the machine is being thorough: by first calculating probabilities for all possible events, and then calculating the probabilities of those probabilities using the likelihood, the prior and the posterior.

Let’s conclude for now

Probability is a central concept in Artificial Intelligence. In this post, we’ve seen an overview of some probability-based methods. Hopefully this gives you a good place to start. As for the rest, pick up a book on the concepts mentioned in this post and work through all the examples. Also, code! Coding is essential. Understand the functions mathematically, and then code them into a computer. Make the computer execute the functions. Thanks for reading!