# Lesson 7

The Correlation Coefficient

## 7.1: Which One Doesn’t Belong: Linear Models (5 minutes)

### Warm-up

This warm-up prompts students to compare four scatter plots displaying data with linear and nonlinear trends. It gives students a reason to use language precisely (MP6) and gives you the opportunity to hear how they use terminology and talk about characteristics of the items in comparison to one another. To allow all students to access the activity, each item has one obvious reason it does not belong. Encourage students to move past the obvious reasons and find reasons based on mathematical properties.

### Launch

Arrange students in groups of 2–4. Display the scatter plots for all to see. Ask students to indicate when they have noticed one that does not belong and can explain why. Give students 1 minute of quiet think time and then time to share their thinking with their small group. In their small groups, have each student share their reasoning for why a particular item does not belong, and then work together to find at least one reason each item doesn’t belong.

### Student Facing

Which one doesn’t belong?

### Student Response

For access, consult one of our IM Certified Partners.

### Activity Synthesis

Ask each group to share one reason why a particular item does not belong. Record and display the responses for all to see. After each response, ask the class if they agree or disagree. Since there is no single correct answer to the question asking which one does not belong, attend to students’ explanations and ensure the reasons given are correct.

During the discussion, ask students to explain the meaning of any terminology they use, such as linear, nonlinear, and random. Also, press students on unsubstantiated claims.

## 7.2: Card Sort: Scatter Plot Fit (20 minutes)

### Activity

In this activity, students are given cards displaying scatter plots of data that can be fit by linear models with varying accuracy. Cards show data that is random, poorly fit by a linear model, well fit by a linear model, and data that is better fit by another type of function, such as quadratic or exponential. Students should begin to recognize these differences and the connection to the correlation coefficient.

A sorting task gives students opportunities to analyze representations, statements, and structures closely and make connections (MP2, MP7).

Monitor for different ways groups choose to categorize the scatter plots, but especially for categories that distinguish between plots that would be modeled well with a linear function and those that would not.

### Launch

Arrange students in groups of 2. Tell them that in this activity, they will sort some cards into categories of their choosing. When they sort the scatter plots, they should work with their partner to come up with categories. Distribute one copy of the blackline master to each group.

Conversing: MLR8 Discussion Supports. As students work in pairs, ask them to take turns finding a match or category and explaining their reasoning to their partner. Display the following sentence frames for all to see: “_____ and _____ are alike because . . . .”, “I disagree because . . . .”, and “These are similar because . . . .” Encourage students to challenge each other when they disagree. This will help students clarify their understanding of linear models.
Design Principle(s): Support sense-making; Maximize meta-awareness

### Student Facing

Your teacher will give you a set of cards that show scatter plots of data. Sort the cards into 2 categories of your choosing. Be prepared to explain the meaning of your categories. Then, sort the cards into 2 categories in a different way. Be prepared to explain the meaning of your new categories.

### Student Response

For access, consult one of our IM Certified Partners.

### Activity Synthesis

Select groups of students to share their categories and how they sorted their scatter plots. You can choose as many different types of categories as time allows, but ensure that one set of categories distinguishes between plots that would be modeled well with a linear function and those that would not. Attend to the language that students use to describe their categories and scatter plots, giving them opportunities to describe the plots more precisely. Highlight the use of terms like linear model, fit, and non-linear.

Display the scatter plots with the best fit lines and $$r$$ values.

Give students 1 minute of quiet think time, and then 1 minute to discuss the things they notice with their partner, followed by a whole-class discussion.

Among things students should notice are:

1. the sign of $$r$$ is the same as the sign of the slope of the best fit line
2. the values for $$r$$ seem to go from -1 to 1
3. the closer $$r$$ is to 1 or -1, the stronger the linear relationship between the variables
4. the closer $$r$$ is to 0, the weaker the linear relationship between the variables

Note that the sign of the correlation coefficient matches the sign of the slope of the best fit line, but the value for $$r$$ is not otherwise related to the slope. If $$r = 0.8$$, the best fit line will have a positive slope, but whether the slope is 0.2 or 2,000 is not clear without examining the data.

Action and Expression: Develop Expression and Communication. Maintain a display of important terms and vocabulary. Review the following terms from previous lessons that students may have used or wanted words for during this activity: fit, linear model, non-linear, increasing, decreasing, pattern, random. Correlation coefficient can be added to this display as a new term.
Supports accessibility for: Memory; Language

## 7.3: Matching Correlation Coefficients (10 minutes)

### Activity

In this activity, students gain a better understanding of correlation coefficients by taking turns with a partner to match scatter plots and correlation coefficients. Students trade roles, explaining their thinking and listening, providing opportunities to explain their reasoning and critique the reasoning of others (MP3).

### Launch

Tell students that the $$r$$ value is called a correlation coefficient. A correlation coefficient is one way to measure the strength of a linear relationship. Tell students that:

1. the sign of $$r$$ is the same as the sign of the slope of the best fit line
2. the values for $$r$$ go from -1 to 1, inclusive
3. the closer $$r$$ is to 1 or -1, the stronger the linear relationship between the variables
4. the closer $$r$$ is to 0, the weaker the linear relationship between the variables
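For teachers who want to see where these facts come from, here is a minimal Python sketch of the standard (Pearson) formula for $$r$$. The helper names `pearson_r` and `best_fit_slope` and the three small data sets are hypothetical. Because $$r$$ and the slope of the least-squares line share the same numerator, and both denominators are positive, the two always have the same sign.

```python
from math import sqrt


def _sums(xs, ys):
    """Return Sxy, Sxx, Syy: sums of products of deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def best_fit_slope(xs, ys):
    """Slope of the least-squares line: m = Sxy / Sxx (same numerator as r)."""
    sxy, sxx, _ = _sums(xs, ys)
    return sxy / sxx


# Hypothetical data sets that share the same x-values.
shared_x = [1, 2, 3, 4, 5]
data = {
    "trending up": [2, 3, 5, 4, 6],
    "trending down": [9, 7, 8, 4, 3],
    "scattered": [4, 1, 5, 2, 4],
}

for label, ys in data.items():
    r = pearson_r(shared_x, ys)
    m = best_fit_slope(shared_x, ys)
    print(f"{label}: r = {r:.3f}, slope = {m:.3f}")
```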

Arrange students in groups of 2. Tell students that for each scatter plot, one partner finds the associated correlation coefficient and explains why they think it goes with that scatter plot. The other partner’s job is to listen and make sure they agree. If they don’t agree, the partners discuss until they come to an agreement. For the next scatter plot, the students swap roles. If necessary, demonstrate this protocol before students start working.

Representation: Develop Language and Symbols. Display or provide charts with symbols and meanings. Create a chart of $$r$$ values ranging from -1 to 1. Label it with the features that correspond to different intervals of $$r$$ values. Select students to label the chart with the corresponding descriptors for each interval. Small sketches or printouts of example scatter plots can be added to the appropriate areas of the chart.
Supports accessibility for: Conceptual processing; Memory

### Student Facing

1. Take turns with your partner to match a scatter plot with a correlation coefficient.
2. For each match you find, explain to your partner how you know it’s a match.
3. For each match your partner finds, listen carefully to their explanation. If you disagree, discuss your thinking and work to reach an agreement.

Correlation coefficients:

1. $$r = -1$$
2. $$r = -0.95$$
3. $$r = -0.74$$
4. $$r = -0.06$$
5. $$r = 0.48$$
6. $$r = 0.65$$
7. $$r = 0.9$$
8. $$r = 1$$

### Student Response

For access, consult one of our IM Certified Partners.

### Student Facing

#### Are you ready for more?

Jada wants to know if the speed that people walk is correlated with their texting speed. To investigate this, she measured the distance, in feet, that 5 of her friends walked in 30 seconds and the number of characters they texted during that time. Each of the 5 friends took 4 walks for a total of 20 walks. Here are the results of the first 20 walks.

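| distance (feet) | number of characters texted | distance (feet) | number of characters texted |
|---|---|---|---|
| 105 | 142 | 95 | 138 |
| 125 | 110 | 125 | 110 |
| 115 | 120 | 160 | 80 |
| 140 | 98 | 175 | 64 |
| 145 | 102 | 130 | 106 |
| 160 | 89 | 140 | 95 |
| 170 | 72 | 150 | 95 |
| 140 | 100 | 155 | 90 |
| 130 | 107 | 160 | 74 |
| 105 | 113 | 135 | 108 |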

Over the next few days, the same 5 friends practiced walking and texting to see if they could walk faster and text more characters. They did not record any more data while practicing. After practicing, each of the 5 friends took another 4 walks. Here are the results of the final 20 walks.

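| distance (feet) | number of characters texted | distance (feet) | number of characters texted |
|---|---|---|---|
| 140 | 140 | 165 | 151 |
| 150 | 155 | 170 | 136 |
| 160 | 151 | 190 | 143 |
| 155 | 170 | 205 | 132 |
| 180 | 125 | 205 | 128 |
| 205 | 130 | 210 | 140 |
| 225 | 95 | 215 | 109 |
| 175 | 161 | 220 | 105 |
| 195 | 108 | 230 | 126 |
| 155 | 142 | 225 | 138 |
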
1. What do you notice about the 2 scatter plots?
2. Jada noticed that her friends walked farther and texted faster during the last 20 walks than they did during the first 20 walks. Since both were faster, she predicts that the correlation coefficient of the line of best fit for the last 20 walks will be closer to -1 than the correlation coefficient of the line of best fit for the first 20 walks. Do you agree with Jada? Explain your reasoning.
3. Use technology to find an equation of the line of best fit and the correlation coefficient for each data set. Was your answer to the previous question correct?
4. Why do you think the correlation coefficients for the 2 data sets are so different? Explain your reasoning.

### Student Response

For access, consult one of our IM Certified Partners.

### Anticipated Misconceptions

Students may have trouble getting started matching the scatter plots with correlation coefficients. Guide students by asking them about the sign of the correlation coefficients. Ask them to sort the cards into groups that make sense and use those groups to connect to the correlation coefficient values. Ask them: “How does the sign of the correlation coefficient relate to the linear model?”

### Activity Synthesis

The purpose of this discussion is for students to understand that the correlation coefficient is a formal way to quantify the strength of a linear relationship between variables, and that the sign of the correlation coefficient tells you whether the variables show a positive or a negative association.

Here are some questions for discussion.

• “What does the sign of the correlation coefficient tell you about the data?” (If it is negative, then $$y$$ tends to decrease as $$x$$ increases. If it is positive, then $$y$$ tends to increase as $$x$$ increases.)
• “What does it mean to have a correlation coefficient of 1 or -1?” (It means that the data is perfectly linear and is fit exactly by a linear function.)
Speaking: MLR8 Discussion Supports. Use this routine to support whole-class discussion. Each time a student shares their ideas about the meaning of the correlation coefficient, press for details by asking students to challenge an idea, elaborate on an idea, or give an example. Revoice student responses to demonstrate the use of appropriate mathematical language. Call students' attention to any words or phrases that helped clarify the original statement. If needed, practice phrases or words through choral response. This provides more students with an opportunity to produce language as they share their ideas about the relationship between data represented in scatter plots and correlation coefficients.
Design Principle(s): Support sense-making

## Lesson Synthesis

### Lesson Synthesis

Here are some questions for discussion.

• “What might a scatter plot look like when its line of best fit has a correlation coefficient of 0.9? Sketch it.” (It looks like points that follow a linear model very closely. The linear model has a positive slope.)
• “What does a scatter plot look like when its line of best fit has a correlation coefficient of -0.5? Sketch it.” (It looks like a loosely scattered cloud of data that trends downward from left to right.)
• “One line of best fit has a correlation coefficient of 0.88, and the other line of best fit has a correlation coefficient of -0.88. Han claims that the one with a positive correlation coefficient fits its data better. Is Han correct? Explain your reasoning.” (Han's reasoning is not correct. Both correlation coefficients are the same distance from 0, so they indicate linear relationships of equal strength. The sign of the correlation coefficient tells you the direction of the relationship between the variables, not how well the line fits the data. The positive correlation coefficient just means that as $$x$$ increases, $$y$$ also tends to increase. To compare the fits further, the residuals should be examined in both cases.)
• “Why is it important to know the correlation coefficient for a linear model?” (The correlation coefficient is a way to quantify the fit of a given linear model, and it allows you to compare the fits of different linear models for the same data.)

## 7.4: Cool-down - What Is a Correlation Coefficient? (5 minutes)

### Cool-Down

For access, consult one of our IM Certified Partners.

## Student Lesson Summary

### Student Facing

While residuals can help pick the best line to fit the data among all lines, we still need a way to determine the strength of a linear relationship. Scatter plots of data that are close to the best fit line are better modeled by the line than scatter plots of data that are farther from the line.

The correlation coefficient is a convenient number that can be used to describe the strength and direction of a linear relationship. Usually represented by the letter $$r$$, the correlation coefficient can take values from -1 to 1. The sign of the correlation coefficient is the same as the sign of the slope for the best fit line. The closer the correlation coefficient is to 0, the weaker the linear relationship. When the correlation coefficient is closer to 1 or -1, the linear model fits the data better.

While it is possible to try to fit a linear model to any data, you should always look at the scatter plot to see if there is a possible linear trend. The correlation coefficient and residuals can also help determine whether a linear model makes sense to use to model the situation. In some cases, another type of function might be a better fit for the data, or the two variables you are examining may be uncorrelated, and you should look for other connections using other variables.