When we observe a strong correlation (positive or negative), it is tempting to read it as causation, that is, to assume movement in one variable causes movement in the other. But correlation does not imply causation, so it’s important to consider the underlying logical relationship between the two variables. Correlations with no logical basis are considered spurious, as in the example below:
While it’s easy to see that nothing logically ties the two variables in a spurious correlation like this together, a subtler trap is more common: perceiving causation when one variable is in fact masking the underlying cause.
Consider a hypothetical situation where we are trying to understand what drives sales of a particular product across a chain of retail stores in the U.S. Midwest. We find this unexpected relationship:
We could mistakenly conclude that swimsuit sales are driven by per capita energy consumption and rationalize the relationship with various (well-intended) hypotheses.
In fact, energy consumption is masking, or proxying for, another driver: temperature. As temperature rises, demand for air conditioning spikes, driving energy consumption higher. Warmer weather also lifts demand for swimsuits. Energy consumption is not causing swimsuit demand; temperature is:
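To make the masking effect concrete, here is a minimal simulation sketch. Every number in it is a hypothetical assumption: temperatures are drawn around 75°F, and the linear coefficients and noise levels are made up for illustration. Temperature drives both energy use and swimsuit sales, and energy never enters the swimsuit equation. The check uses partial correlation, a standard technique that is not mentioned above. It regresses each variable on temperature and correlates the residuals. The helper names `simulate`, `residualize`, and `partial_corr` are illustrative, not part of any library.

```python
import numpy as np

def simulate(n=5000, seed=0):
    """Hypothetical market where temperature drives both variables (all parameters assumed)."""
    rng = np.random.default_rng(seed)
    temperature = rng.normal(75, 10, n)                   # daily high in deg F (assumed)
    energy = 0.5 * temperature + rng.normal(0, 2, n)      # energy use rises with heat
    swimsuits = 3.0 * temperature + rng.normal(0, 10, n)  # sales rise with heat
    # Note: energy does not appear in the swimsuit equation at all.
    return temperature, energy, swimsuits

def residualize(y, x):
    """Remove the linear effect of x from y using an ordinary least-squares fit."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def partial_corr(a, b, control):
    """Correlation between a and b after linearly controlling for `control`."""
    return np.corrcoef(residualize(a, control), residualize(b, control))[0, 1]

temperature, energy, swimsuits = simulate()
raw = np.corrcoef(energy, swimsuits)[0, 1]
controlled = partial_corr(energy, swimsuits, temperature)
print(f"raw correlation (energy vs. swimsuits): {raw:.2f}")
print(f"after controlling for temperature:      {controlled:.2f}")
```

With these assumed parameters, the population correlation between energy and swimsuit sales works out to 150 / sqrt(29 × 1000) ≈ 0.88, which is strong even though neither variable affects the other. Once temperature is held fixed, the partial correlation falls to roughly zero. This is the signature of a confounder, not a causal link. The residual approach only removes *linear* effects of the control variable, which is sufficient here because the simulation is linear by construction.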