The mathematical purpose of this lesson is for students to use technology to compute the correlation coefficient for a bivariate, numerical data set and to interpret its value. This lesson builds on previous work in which students learned how to read and interpret the correlation coefficient. It prepares for upcoming work in which students will learn to distinguish between correlation and causation.
When students use the value of the correlation coefficient to describe the relationship between two variables, they are looking for and making use of structure (MP7). To make sense of the relationship between variables, students reason abstractly and quantitatively (MP2). When students examine relationships to think about correlations, they also consider additional variables that might have an influence on any trends they see. Deciding which variables need to be included is a part of the process of modeling with mathematics (MP4).
- Describe (orally and in writing) the strength and sign of the relationship between variables based on the correlation coefficient.
- Use technology to calculate the correlation coefficient and describe the strength of a relationship based on that value.
- Let’s look closer at correlation coefficients.
Students should have access to graphing technology that can compute the least-squares regression line and correlation coefficient from a set of bivariate data. Acquire devices that can run Desmos (recommended) or other graphing technology. It is ideal if each student has their own device. (Desmos is available under Math Tools.)
- I can describe the strength of a relationship between two variables.
- I can use technology to find the correlation coefficient and explain what the value tells me about a linear model in everyday language.
The correlation coefficient is a number between -1 and 1 that describes the strength and direction of a linear association between two numerical variables. The sign of the correlation coefficient is the same as the sign of the slope of the best fit line. The closer the correlation coefficient is to 0, the weaker the linear relationship. The closer the correlation coefficient is to 1 or -1, the better the linear model fits the data.
The first figure shows a correlation coefficient which is close to 1, the second a correlation coefficient which is positive but closer to 0, and the third a correlation coefficient which is close to -1.
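For teachers who want to check a value outside of Desmos, the following is a minimal sketch of how the correlation coefficient and the slope of the best fit line can be computed from paired data. The function names (`correlation_coefficient`, `best_fit_slope`) and the data set (`hours`, `scores`) are made up for illustration and are not part of the lesson materials.

```python
import math

def correlation_coefficient(xs, ys):
    """Return the (Pearson) correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_fit_slope(xs, ys):
    """Return the slope of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: hours studied and quiz scores (invented for this sketch).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]

r = correlation_coefficient(hours, scores)
slope = best_fit_slope(hours, scores)

print("r =", round(r, 3))
print("slope =", round(slope, 2))
```

Both formulas share the same numerator, and their denominators are always positive, which is why the correlation coefficient and the slope of the best fit line always have the same sign.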
A relationship between two numerical variables is negative if an increase in the data for one variable tends to be paired with a decrease in the data for the other variable.
A relationship between two numerical variables is positive if an increase in the data for one variable tends to be paired with an increase in the data for the other variable.
A relationship between two numerical variables is strong if the data is tightly clustered around the best fit line.
A relationship between two numerical variables is weak if the data is loosely spread around the best fit line.
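The difference between a strong and a weak relationship can also be seen numerically. The sketch below compares two invented data sets that share the same x-values: one in which the points lie close to a line, and one in which they are loosely spread. The helper `correlation_coefficient` and both data sets are assumptions made for this example only.

```python
import math

def correlation_coefficient(xs, ys):
    """Return the (Pearson) correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x_values = [1, 2, 3, 4, 5, 6]

# Invented data: points tightly clustered around a line (strong relationship).
tight_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Invented data: points loosely spread around a line (weak relationship).
loose_y = [5, 1, 9, 4, 12, 8]

r_tight = correlation_coefficient(x_values, tight_y)
r_loose = correlation_coefficient(x_values, loose_y)

print("tightly clustered: r =", round(r_tight, 2))
print("loosely spread:    r =", round(r_loose, 2))
```

Both relationships are positive, but the tightly clustered data gives a value of r much closer to 1 than the loosely spread data does.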