Lesson 7
The Correlation Coefficient
7.1: Which One Doesn’t Belong: Linear Models (5 minutes)
Warm-up
This warm-up prompts students to compare four scatter plots displaying data with linear and nonlinear trends. It gives students a reason to use language precisely (MP6) and gives you the opportunity to hear how they use terminology and talk about characteristics of the items in comparison to one another. To allow all students to access the activity, each item has one obvious reason it does not belong. Encourage students to move past the obvious reasons and find reasons based on mathematical properties.
Launch
Arrange students in groups of 2–4. Display the scatter plots for all to see. Ask students to indicate when they have noticed one that does not belong and can explain why. Give students 1 minute of quiet think time and then time to share their thinking with their small group. Tell students to share, within their small groups, their reasoning as to why a particular item does not belong, and to work together to find at least one reason each item doesn’t belong.
Student Facing
Which one doesn’t belong?
Activity Synthesis
Ask each group to share one reason why a particular item does not belong. Record and display the responses for all to see. After each response, ask the class if they agree or disagree. Since there is no single correct answer to the question asking which one does not belong, attend to students’ explanations and ensure the reasons given are correct.
During the discussion, ask students to explain the meaning of any terminology they use, such as linear, nonlinear, and random. Also, press students on unsubstantiated claims.
7.2: Card Sort: Scatter Plot Fit (20 minutes)
Activity
In this activity, students are given cards displaying scatter plots of data that can be fit by linear models with varying accuracy. Cards show data that is random, poorly fit by a linear model, well fit by a linear model, and data that is better fit by another type of function, such as quadratic or exponential. Students should begin to recognize these differences and the connection to the correlation coefficient.
A sorting task gives students opportunities to analyze representations, statements, and structures closely and make connections (MP2, MP7).
Monitor for different ways groups choose to categorize the scatter plots, but especially for categories that distinguish between plots that would be modeled well with a linear function and those that would not.
Launch
Arrange students in groups of 2. Tell them that in this activity, they will sort some cards into categories of their choosing. When they sort the scatter plots, they should work with their partner to come up with categories. Distribute one copy of the blackline master to each group.
Student Facing
Your teacher will give you a set of cards that show scatter plots of data. Sort the cards into 2 categories of your choosing. Be prepared to explain the meaning of your categories. Then, sort the cards into 2 categories in a different way. Be prepared to explain the meaning of your new categories.
Activity Synthesis
Select groups of students to share their categories and how they sorted their scatter plots. You can choose as many different types of categories as time allows, but ensure that one set of categories distinguishes between plots that would be modeled well with a linear function and those that would not. Attend to the language that students use to describe their categories and scatter plots, giving them opportunities to describe the plots more precisely. Highlight the use of terms like linear model, fit, and non-linear.
Display the scatter plots with the best fit lines and \(r\) values.
Give students 1 minute of quiet think time, and then 1 minute to discuss the things they notice with their partner, followed by a whole-class discussion.
Among things students should notice are:
- the sign of \(r\) is the same as the sign of the slope of the best fit line
- the values for \(r\) seem to go from -1 to 1
- the closer \(r\) is to 1 or -1, the stronger the linear relationship between the variables
- the closer \(r\) is to 0, the weaker the linear relationship between the variables
Note that the sign of the correlation coefficient matches the sign of the slope of the best fit line, but the value for \(r\) is not otherwise related to the slope. If \(r = 0.8\), the best fit line will have a positive slope, but whether the slope is 0.2 or 2,000 is not clear without examining the data.
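To make the point about slope concrete, here is a minimal Python sketch. The data values and the helper name `slope_and_r` are made up for illustration and are not part of the lesson materials. It rescales the \(y\) values by a factor of 1,000, which multiplies the slope of the best fit line by 1,000 but leaves \(r\) unchanged.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (slope of the least-squares line, correlation coefficient r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Made-up data with a clear upward trend (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

# Same data with y measured in units 1,000 times smaller.
ys_scaled = [1000 * y for y in ys]

slope_a, r_a = slope_and_r(xs, ys)
slope_b, r_b = slope_and_r(xs, ys_scaled)

print(f"original: slope = {slope_a:.3f}, r = {r_a:.4f}")
print(f"rescaled: slope = {slope_b:.3f}, r = {r_b:.4f}")
```

Both lines of output should report the same \(r\), while the two slopes differ by a factor of 1,000.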
Supports accessibility for: Memory; Language
7.3: Matching Correlation Coefficients (10 minutes)
Activity
In this activity, students gain a better understanding of correlation coefficients by taking turns with a partner to match scatter plots and correlation coefficients. Students trade roles, explaining their thinking and listening, providing opportunities to explain their reasoning and critique the reasoning of others (MP3).
Launch
Tell students that the \(r\) value is called a correlation coefficient. A correlation coefficient is one way to measure the strength of a linear relationship. Tell students that:
- the sign of \(r\) is the same as the sign of the slope of the best fit line
- the values for \(r\) go from -1 to 1, inclusive
- the closer \(r\) is to 1 or -1, the stronger the linear relationship between the variables
- the closer \(r\) is to 0, the weaker the linear relationship between the variables
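If a quick demonstration of these properties is useful, the following Python sketch is one option. The data, the alternating offset pattern, and the helper name `correlation` are assumptions chosen for illustration. It starts with points exactly on the line \(y = 2x\) and pushes them alternately above and below the line by larger and larger amounts, so \(r\) starts at 1 and moves toward 0 as the points scatter.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

xs = list(range(10))
# Made-up pattern: points alternate above and below the line y = 2x.
pattern = [1, -1] * 5

r_by_amplitude = {}
for amplitude in (0, 2, 10):
    ys = [2 * x + amplitude * p for x, p in zip(xs, pattern)]
    r_by_amplitude[amplitude] = correlation(xs, ys)
    print(f"offset size {amplitude}: r = {r_by_amplitude[amplitude]:.3f}")
```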
Arrange students in groups of 2. Tell students that for each scatter plot, one partner finds the associated correlation coefficient and explains why they think it goes with that scatter plot. The other partner’s job is to listen and make sure they agree. If they don’t agree, the partners discuss until they come to an agreement. For the next scatter plot, the students swap roles. If necessary, demonstrate this protocol before students start working.
Student Facing
- Take turns with your partner to match a scatter plot with a correlation coefficient.
- For each match you find, explain to your partner how you know it’s a match.
- For each match your partner finds, listen carefully to their explanation. If you disagree, discuss your thinking and work to reach an agreement.
- \(r = -1\)
- \(r = -0.95\)
- \(r = -0.74\)
- \(r = -0.06\)
- \(r = 0.48\)
- \(r = 0.65\)
- \(r = 0.9\)
- \(r = 1\)
Student Facing
Are you ready for more?
Jada wants to know if the speed that people walk is correlated with their texting speed. To investigate this, she measured the distance, in feet, that 5 of her friends walked in 30 seconds and the number of characters they texted during that time. Each of the 5 friends took 4 walks for a total of 20 walks. Here are the results of the first 20 walks.
| distance (feet) | number of characters texted | distance (feet) | number of characters texted |
| 105 | 142 | 95 | 138 |
| 125 | 110 | 125 | 110 |
| 115 | 120 | 160 | 80 |
| 140 | 98 | 175 | 64 |
| 145 | 102 | 130 | 106 |
| 160 | 89 | 140 | 95 |
| 170 | 72 | 150 | 95 |
| 140 | 100 | 155 | 90 |
| 130 | 107 | 160 | 74 |
| 105 | 113 | 135 | 108 |
Over the next few days, the same 5 friends practiced walking and texting to see if they could walk faster and text more characters. They did not record any more data while practicing. After practicing, each of the 5 friends took another 4 walks. Here are the results of the final 20 walks.
| distance (feet) | number of characters texted | distance (feet) | number of characters texted |
| 140 | 140 | 165 | 151 |
| 150 | 155 | 170 | 136 |
| 160 | 151 | 190 | 143 |
| 155 | 170 | 205 | 132 |
| 180 | 125 | 205 | 128 |
| 205 | 130 | 210 | 140 |
| 225 | 95 | 215 | 109 |
| 175 | 161 | 220 | 105 |
| 195 | 108 | 230 | 126 |
| 155 | 142 | 225 | 138 |
- What do you notice about the 2 scatter plots?
- Jada noticed that her friends walked farther and texted faster during the last 20 walks than they did during the first 20 walks. Since both were faster, she predicts that the correlation coefficient of the line of best fit for the last 20 walks will be closer to -1 than the correlation coefficient of the line of best fit for the first 20 walks. Do you agree with Jada? Explain your reasoning.
- Use technology to find an equation of the line of best fit and the correlation coefficient for each data set. Was your answer to the previous question correct?
- Why do you think the correlation coefficients for the 2 data sets are so different? Explain your reasoning.
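For the technology step, a graphing calculator or spreadsheet works. As one alternative, here is a minimal Python sketch that uses only the standard library. The helper name `best_fit_and_r` is hypothetical, and the data lists are copied from the two tables above as (distance, characters) pairs.

```python
from math import sqrt

def best_fit_and_r(points):
    """Return (slope, intercept, r) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)

# (distance in feet, characters texted) for the first 20 walks.
first_walks = [
    (105, 142), (125, 110), (115, 120), (140, 98), (145, 102),
    (160, 89), (170, 72), (140, 100), (130, 107), (105, 113),
    (95, 138), (125, 110), (160, 80), (175, 64), (130, 106),
    (140, 95), (150, 95), (155, 90), (160, 74), (135, 108),
]

# (distance in feet, characters texted) for the final 20 walks.
final_walks = [
    (140, 140), (150, 155), (160, 151), (155, 170), (180, 125),
    (205, 130), (225, 95), (175, 161), (195, 108), (155, 142),
    (165, 151), (170, 136), (190, 143), (205, 132), (205, 128),
    (210, 140), (215, 109), (220, 105), (230, 126), (225, 138),
]

results = {}
for label, walks in (("first 20 walks", first_walks),
                     ("final 20 walks", final_walks)):
    slope, intercept, r = best_fit_and_r(walks)
    results[label] = (slope, intercept, r)
    print(f"{label}: y = {slope:.3f}x + {intercept:.1f}, r = {r:.2f}")
```

Students can compare the two printed \(r\) values with Jada's prediction.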
Anticipated Misconceptions
Students may struggle with starting to match the scatter plots with a correlation coefficient. Guide students by asking them about the sign of the correlation coefficients. Ask them to sort the cards into groups that make sense and use that to connect to the correlation coefficient values. Ask them: “How does the sign of the correlation coefficient relate to the linear model?”
Activity Synthesis
The purpose of this discussion is for students to understand that the correlation coefficient is a formal way to quantify the strength of a linear relationship between variables, and that the sign of the correlation coefficient tells you whether the variables show a positive or a negative association.
Here are some questions for discussion.
- “What does the sign of the correlation coefficient tell you about the data?” (If it is negative, then \(y\) tends to decrease as \(x\) increases. If it is positive, then \(y\) tends to increase as \(x\) increases.)
- “What does it mean to have a correlation coefficient of 1 or -1?” (It means that the data is perfectly linear and is fit exactly by a linear function.)
Lesson Synthesis
Here are some questions for discussion.
- “What might a scatter plot look like when its line of best fit has a correlation coefficient of 0.9? Sketch it.” (It looks like points that follow a linear model very closely. The linear model has a positive slope.)
- “What does a scatter plot look like when its line of best fit has a correlation coefficient of -0.5? Sketch it.” (It looks like a loosely scattered cloud of data that trends downward from left to right.)
- “One line of best fit has a correlation coefficient of 0.88, and the other line of best fit has a correlation coefficient of -0.88. Han claims that the one with a positive correlation coefficient fits its data better. Is Han correct? Explain your reasoning.” (Han is probably not correct. The sign of the correlation coefficient tells you about the relationship between the variables—not the fit of the data. The positive correlation coefficient just means that as \(x\) increases, \(y\) also tends to increase. The residuals should also be examined in both cases to determine which data is fit by a linear model better.)
- “Why is it important to know the correlation coefficient for a linear model?” (The correlation coefficient is a way to quantify how well a linear model fits a data set, and it allows you to compare the strength of the linear relationship in different data sets.)
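The question about Han's claim can be illustrated with a short Python sketch. The data and the helper name `fit_summary` are made up for illustration. It builds a second data set that is a mirror image of the first, so the two correlation coefficients have opposite signs and equal size, and the sums of squared residuals from the two best fit lines agree.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Return (r, sum of squared residuals from the least-squares line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy), sse

# Made-up data with an upward trend (illustration only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys_up = [3, 5, 4, 8, 7, 11, 10, 13]

# Mirror image of the same data: every y value is reflected across y = 10.
ys_down = [20 - y for y in ys_up]

r_up, sse_up = fit_summary(xs, ys_up)
r_down, sse_down = fit_summary(xs, ys_down)

print(f"upward data:   r = {r_up:.3f}, squared residuals = {sse_up:.3f}")
print(f"downward data: r = {r_down:.3f}, squared residuals = {sse_down:.3f}")
```

In this sketch the sign of \(r\) only records the direction of the trend. The points sit equally close to the best fit line in both data sets.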
7.4: Cool-down - What Is a Correlation Coefficient? (5 minutes)
Student Lesson Summary
Student Facing
While residuals can help pick the best line to fit the data among all lines, we still need a way to determine the strength of a linear relationship. Data whose points lie close to the best fit line are modeled better by that line than data whose points lie farther from it.
The correlation coefficient is a convenient number that can be used to describe the strength and direction of a linear relationship. Usually represented by the letter \(r\), the correlation coefficient can take values from -1 to 1. The sign of the correlation coefficient is the same as the sign of the slope for the best fit line. The closer the correlation coefficient is to 0, the weaker the linear relationship. When the correlation coefficient is closer to 1 or -1, the linear model fits the data better.
While it is possible to try to fit a linear model to any data, you should always look at the scatter plot to see if there is a possible linear trend. The correlation coefficient and residuals can also help determine whether a linear model is a sensible way to describe the situation. In some cases, another type of function might be a better fit for the data, or the two variables you are examining may be uncorrelated, and you should look for other connections using other variables.