Please read correlation – I before you begin this post.

In this one, we discuss how to calculate the strength of a correlation.


We learned in the previous post that correlation is about direction of change. Vectors happen to be a mathematical construct that captures direction. As it turns out, when two vectors are colinear, they represent a very strong correlation. Look at the following diagram:


In (1) when the two vectors are joined end to end, they lie on the same line. Similarly, the vectors in (2) also lie on the same line when joined end to end. This indicates a high degree of togetherness i.e. a strong correlation. In fact, both (1) and (2) represent perfect correlations, the only difference being that (1) represents a perfect positive correlation (same direction) while (2) shows a perfect negative correlation (opposite directions). 

Now, let’s look at (3). The first vector changes somewhat along the same line as the second, but not exactly. Part of it changes along a line of its own, and that part is what creates the angle when you join the vectors end to end. So, this is a somewhat positive correlation, but it is not the strongest. The same goes for (4), except that the vectors change in mostly opposite directions, which shows a somewhat negative correlation, but not the strongest negative correlation.

Finally, look at (5). There is no projection of either vector on the other. So, when vectors join at a 90 degree angle, no part of either one changes along the line of the other. Thus, they are not correlated at all. Such vectors are called orthogonal vectors.

Orthogonal vectors represent 0 correlation (the weakest there is), and colinear vectors represent a perfect 1 (for positive) or a perfect -1 (for negative) correlation (the strongest there is).

How do we calculate the degree of colinearity between two vectors?

We do that using the vector dot product. The dot product tells you how much of one vector is projected onto the other (strictly speaking, it is the length of that projection multiplied by the length of the other vector). If the vectors are at a 90 degree angle from each other, then there is no projection of either vector on the other. See the figure below.

If the angle between the vectors differs from 90 degrees even slightly, then each vector always has some projection on the other. The next figure shows this.


As you can see in the figure, as soon as the angle is anything other than 90 degrees, some part of one vector falls along the other, which means there is at least some correlation between the two vectors.
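To make the idea of a projection concrete, here is a minimal Python sketch. The helper names `dot` and `scalar_projection` and the example vectors are made up for illustration; they are not from any particular library. It shows that a vector at 90 degrees has no projection on the other, while a tilted one does.

```python
import math

def dot(a, b):
    """Dot product: multiply matching components and add the results."""
    return sum(x * y for x, y in zip(a, b))

def scalar_projection(a, b):
    """Signed length of the part of a that lies along b (b must not be the zero vector)."""
    return dot(a, b) / math.sqrt(dot(b, b))

v1 = (3.0, 0.0)              # points along the x-axis
perpendicular = (0.0, 2.0)   # 90 degrees away from v1
tilted = (1.0, 2.0)          # less than 90 degrees away from v1

print(scalar_projection(perpendicular, v1))  # 0.0
print(scalar_projection(tilted, v1))         # 1.0
```

A negative result from `scalar_projection` would mean the projection points the opposite way, which corresponds to the negative correlations in (2) and (4).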

In the case of the last diagram in the figure, v1 and v2 are perfectly colinear. There is a zero angle between them. So the dot product v1.v2, taken with both vectors scaled to length 1, would be exactly 1. In other words, the correlation between the two is the strongest that it can be. It is literally perfect.

Note: the angle could also have been 180 degrees, in which case it would be a perfect negative correlation. A 0 degree angle means a perfect positive correlation. The dot product in the case of a perfect negative correlation (180 degree angle), with both vectors scaled to length 1, would be exactly -1.
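The values of exactly 1 and -1 hold when both vectors have length 1; for longer or shorter vectors the raw dot product also grows or shrinks with their lengths. The sketch below, with hypothetical helpers `unit` and `unit_dot`, illustrates this with two vectors on the same line.

```python
import math

def dot(a, b):
    """Dot product: multiply matching components and add the results."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale v to length 1 (assumes v is not the zero vector)."""
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def unit_dot(a, b):
    """Dot product of a and b after scaling both to length 1."""
    return dot(unit(a), unit(b))

v1 = (2.0, 0.0)
same_direction = (5.0, 0.0)       # 0 degree angle with v1
opposite_direction = (-5.0, 0.0)  # 180 degree angle with v1

print(dot(v1, same_direction))           # 10.0, depends on the lengths
print(unit_dot(v1, same_direction))      # 1.0
print(unit_dot(v1, opposite_direction))  # -1.0
```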

The projections that the dot product measures tell us how independently each vector is changing compared to the other. That is why scientists use the vector dot product to compute the strength of a correlation. There is a very simple formula for the dot product, which you can look up in any online resource. But that’s basically how you determine the strength of a correlation.
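For reference, the standard dot product formula is short enough to state here. The symbols are assumptions chosen for illustration: v1 has components a_1 ... a_n, v2 has components b_1 ... b_n, and theta is the angle between them.

```latex
\mathbf{v}_1 \cdot \mathbf{v}_2
  = \sum_{i=1}^{n} a_i b_i
  = \lVert \mathbf{v}_1 \rVert \, \lVert \mathbf{v}_2 \rVert \cos\theta
```

For example, with v1 = (1, 2) and v2 = (3, 4), the dot product is 1*3 + 2*4 = 11. The right-hand side shows why the angle matters: cos(theta) is 1 at 0 degrees, 0 at 90 degrees, and -1 at 180 degrees.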

Coming up in correlation – III: Why statisticians/data scientists (unlike mathematicians) are not satisfied with dot product alone, and why they came up with certain tweaks to the formula.

Thank you for reading.