One of the typical mistakes in statistics or data analysis is interpreting correlation as causality. In other words, we find a correlation between A and B (as A increases, B also increases) and conclude that A causes B. Let's look at the cases below:
- Students with higher SAT scores have more stamina.
- As the number of doctors per person increases in a region, medical costs in the region increase.
- Those who take annual checkups have a lower chance of diabetes.
For the first example, this correlation cannot be interpreted as "because the students have higher SAT scores, they have more stamina." Based on this wrong interpretation, one could conclude that raising students' SAT scores is the way to increase their stamina.
We need to be careful in cases where correlation looks like causality, such as the following.
- Common factor*
- Pure luck**
If both phenomena can be caused by a common third factor, the two phenomena look correlated and seem to have a causal relationship. In the above example, if students from all grades were mixed together, the common factor might be the students' age. As they grow, they get smarter and gain more stamina.
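To make the common-factor idea concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the age range, the made-up "score" and "stamina" scales, and the noise levels are invented, not real data. Both quantities depend only on age, yet they come out strongly correlated. Once the effect of age is removed from each (by correlating the residuals of a simple linear fit), the correlation nearly disappears.

```python
import numpy as np

def simulate_confounding(n=5000, seed=0):
    """Generate hypothetical students whose score and stamina both depend only on age."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(10, 18, size=n)               # assumed age range, all grades mixed
    score = 50 * age + rng.normal(0, 40, size=n)    # invented test-score scale
    stamina = 5 * age + rng.normal(0, 4, size=n)    # invented stamina scale
    return age, score, stamina

def residuals(y, x):
    """Return what is left of y after subtracting a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

age, score, stamina = simulate_confounding()

# Correlation when age is ignored
raw_r = np.corrcoef(score, stamina)[0, 1]

# Correlation after controlling for age (a simple partial correlation)
partial_r = np.corrcoef(residuals(score, age), residuals(stamina, age))[0, 1]

print(f"score vs stamina, ignoring age:        r = {raw_r:.2f}")
print(f"score vs stamina, controlling for age: r = {partial_r:.2f}")
```

In this toy setup the score never influences stamina, so any plan to raise stamina by raising scores would fail, even though the raw correlation is high.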
In the second case, the two phenomena seem correlated purely by coincidence. You can find many ridiculous yet highly correlated examples here. On this website, you can find highly correlated pairs such as:
- US spending on science, space and tech vs suicides by hanging
- # of people who drowned in a pool vs films Nicolas Cage appeared in
- Cheese consumption vs # of people who died due to bedsheets
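Pairs like these are easy to find once you compare enough series. The sketch below is an illustration under assumptions, not a reconstruction of how that website works: it generates 200 independent random walks of 10 "yearly" values each (both numbers are arbitrary choices) and searches all pairs for the strongest correlation. Since the series are unrelated by construction, any strong correlation it finds is pure luck.

```python
import numpy as np

def most_correlated_pair(n_series=200, n_years=10, seed=0):
    """Search unrelated random series for the pair with the strongest correlation."""
    rng = np.random.default_rng(seed)
    # Each row is an independent random walk, a stand-in for some yearly statistic
    series = rng.normal(size=(n_series, n_years)).cumsum(axis=1)
    corr = np.corrcoef(series)       # pairwise correlations between rows
    np.fill_diagonal(corr, 0.0)      # ignore each series' correlation with itself
    i, j = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    return i, j, corr[i, j]

i, j, r = most_correlated_pair()
n_pairs = 200 * 199 // 2
print(f"Best of {n_pairs} pairs: series {i} vs series {j}, r = {r:.3f}")
```

With only 10 data points per series and nearly 20,000 pairs to choose from, some pair is almost guaranteed to line up well by chance.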
*This is called a "confounding factor."
**This is called a "spurious correlation."
Word of the day: litmus test, a test in which a single dominant factor decides the outcome.