Lesson 6
Residuals
6.1: Math Talk: Differences in Expectations (5 minutes)
Warm-up
The purpose of this Math Talk is to elicit strategies and understandings students have for subtracting an estimated value from an actual value. These understandings help students develop fluency and will be helpful later in this lesson when students will need to be able to compute residuals from a linear model.
Launch
Display the instructions for all to see. Ask students, “How would the answers to these two questions be different?”
- Actual value: 21 cars. Estimated value: 20 cars
- Actual value: 20 cars. Estimated value: 21 cars
(The first question has an answer of 1 car. The second question has an answer of -1 car.)
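The sign convention in these two questions can be checked with a minimal Python sketch. The helper name `difference` is illustrative and is not part of the lesson materials; `Decimal` is used only so that decimal inputs subtract exactly.

```python
from decimal import Decimal


def difference(actual: str, estimated: str) -> Decimal:
    """Return actual - estimated, keeping the sign of the result."""
    return Decimal(actual) - Decimal(estimated)


first = difference("21", "20")   # actual is above the estimate
second = difference("20", "21")  # actual is below the estimate
print(first, second)  # prints: 1 -1
```

Swapping the two values changes only the sign of the difference, which is the point students should notice before computing residuals.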
Display one problem at a time. Give students quiet think time for each problem and ask them to give a signal when they have an answer and a strategy. Keep all problems displayed throughout the talk. Follow with a whole-class discussion.
Supports accessibility for: Memory; Organization
Student Facing
Mentally calculate how close the estimate is to the actual value using the difference: \(\text{actual value} - \text{estimated value}\).
Actual value: 24.8 grams. Estimated value: 19.6 grams
Actual value: $112.11. Estimated value: $109.30
Actual value: 41.5 centimeters. Estimated value: 45.90 centimeters
Actual value: -1.34 degrees Celsius. Estimated value: -2.45 degrees Celsius
Student Response
For access, consult one of our IM Certified Partners.
Activity Synthesis
Ask students to share their strategies for each problem. Record and display their responses for all to see. To involve more students in the conversation, consider asking:
- “Who can restate \(\underline{\hspace{.5in}}\)’s reasoning in a different way?”
- “Did anyone have the same strategy but would explain it differently?”
- “Did anyone solve the problem in a different way?”
- “Does anyone want to add on to \(\underline{\hspace{.5in}}\)’s strategy?”
- “Do you agree or disagree? Why?”
Design Principle(s): Optimize output (for explanation)
6.2: Oranges Return (15 minutes)
Activity
This lesson uses the data from the video of orange weights from the activity Orange You Glad We’re Boxing Fruit. The mathematical purpose of this activity is to introduce the concept of residuals, and to have students plot and analyze the residuals to informally assess the fit of a function. In this activity, students also fit a function to data, and use the function to solve problems. Students learn that a residual is the difference between the actual \(y\)-value for a point and the expected \(y\)-value for the point on the linear model with the same associated \(x\)-value.
Launch
Students will use what they learned in an earlier lesson to create a spreadsheet and best fit line of the data from the video about weighing oranges, stated here as a quick review.
Copy and paste the table into a blank line using the Desmos graphing tool available in Math Tools, or by navigating to desmos.com/calculator. Do not include the table header when copying. A scatter plot will appear in the graphing window.
To find the equation of the best fit line, type \(y_1 \sim ax_1 + b\), and Desmos will compute the values for the parameters \(a\) and \(b\).
In Desmos, to show the \(y\)-value predicted by the linear model at a given \(x\)-value, click and hold on the best fit line at the \(x\)-value you're considering. The \((x,y)\) coordinates will be displayed. When the parameters of a best fit line are calculated, the residuals are also calculated. They are stored in a list, which you can see in the table and on the graph by clicking on the box labeled “plot.” Because Desmos graphs the residuals automatically, some questions are slightly different than the print version.
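For teachers who want to check the technology output without Desmos, here is a rough Python sketch. It assumes the regression is an ordinary least-squares fit of \(y = ax + b\), and the names `fit_line`, `estimates`, and `residuals` are illustrative rather than anything defined by the lesson or by Desmos.

```python
# Orange data from the lesson: number of oranges and weight in kilograms.
oranges = [3, 4, 5, 6, 7, 8, 9, 10]
weights = [1.027, 1.162, 1.502, 1.617, 1.761, 2.115, 2.233, 2.569]


def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the model y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(oranges, weights)

# Value the model estimates at each x, and residual = actual - estimate.
estimates = [a * x + b for x in oranges]
residuals = [y - est for y, est in zip(weights, estimates)]

for x, y, est, r in zip(oranges, weights, estimates, residuals):
    print(f"{x:>2} oranges: actual {y:.3f}, estimate {est:.3f}, residual {r:+.3f}")
```

The printed residuals should agree with the list a graphing tool reports, up to rounding.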
Supports accessibility for: Visual-spatial processing
Student Facing
Use this data from the video about weighing oranges to answer the questions.
number of oranges | weight in kilograms |
---|---|
3 | 1.027 |
4 | 1.162 |
5 | 1.502 |
6 | 1.617 |
7 | 1.761 |
8 | 2.115 |
9 | 2.233 |
10 | 2.569 |
- Use technology to make a scatter plot of orange weights and find the line of best fit.
- What does the linear model estimate for the weight of the box of oranges for each of the numbers of oranges?
number of oranges | actual weight in kilograms | linear estimate weight in kilograms |
---|---|---|
3 | 1.027 | |
4 | 1.162 | |
5 | 1.502 | |
6 | 1.617 | |
7 | 1.761 | |
8 | 2.115 | |
9 | 2.233 | |
10 | 2.569 | |
- Compare the weights of the box with 3 oranges in it to the estimated weight of the box with 3 oranges in it. Explain or show your reasoning.
- How many oranges are in the box when the linear model estimates the weight best? Explain or show your reasoning.
- How many oranges are in the box when the linear model estimates the weight least well? Explain or show your reasoning.
- The difference between the actual value and the value estimated by a linear model is called the residual. If the actual value is greater than the estimated value, the residual is positive. If the actual value is less than the estimated value, the residual is negative. For the orange weight data set, what is the residual for the best fit line when there are 3 oranges?
- With digital technology, you can graph the residuals all at once. Check out the graph of the residuals. When graphed on the same axes as the scatter plot, what are the coordinates of the point where \(x = 8\) and \(y\) has the value of the residual?
- Which point on the scatter plot has the residual closest to zero? What does this mean about the weight of the box with that many oranges in it?
- How can you use the residuals to decide how well a line fits the data?
Launch
The digital version of this activity includes instructions for plotting the residuals of the data. If you will be using graphing technology other than Desmos for this activity, you may need to prepare alternate instructions.
Display the data from the video about weighing oranges:
number of oranges | weight in kilograms |
---|---|
3 | 1.027 |
4 | 1.162 |
5 | 1.502 |
6 | 1.617 |
7 | 1.761 |
8 | 2.115 |
9 | 2.233 |
10 | 2.569 |
Supports accessibility for: Visual-spatial processing
Student Facing
- For the scatter plot of orange weights from a previous lesson, use technology to find the line of best fit.
- What level of accuracy makes sense for the slope and intercept values? Explain your reasoning.
- What does the linear model estimate for the weight of the box of oranges for each of the numbers of oranges?
number of oranges | actual weight in kilograms | linear estimate weight in kilograms |
---|---|---|
3 | 1.027 | |
4 | 1.162 | |
5 | 1.502 | |
6 | 1.617 | |
7 | 1.761 | |
8 | 2.115 | |
9 | 2.233 | |
10 | 2.569 | |
- Compare the weights of the box with 3 oranges in it to the estimated weight of the box with 3 oranges in it. Explain or show your reasoning.
- How many oranges are in the box when the linear model estimates the weight best? Explain or show your reasoning.
- How many oranges are in the box when the linear model estimates the weight least well? Explain or show your reasoning.
- The difference between the actual value and the value estimated by a linear model is called the residual. If the actual value is greater than the estimated value, the residual is positive. If the actual value is less than the estimated value, the residual is negative. For the orange weight data set, what is the residual for the best fit line when there are 3 oranges? On the same axes as the scatter plot, plot this residual at the point where \(x = 3\) and \(y\) has the value of the residual.
- Find the residuals for each of the other points in the scatter plot and graph them.
- Which point on the scatter plot has the residual closest to zero? What does this mean about the weight of the box with that many oranges in it?
- How can you use the residuals to decide how well a line fits the data?
Anticipated Misconceptions
Students may not understand how to determine if the linear model estimates the weight of oranges well or poorly. Ask them to determine the weight that the model estimates, then ask how close that estimate is to the actual weight.
Activity Synthesis
Compare student answers to the question about the point that the line estimates best to the answer for the question about the residual closest to zero.
Show a graph of the residuals.
Ask students:
- “What does it mean for the residual to be positive? Negative?” (The residual is positive when the actual data value is greater than what the model estimates for that \(x\)-value and negative when the actual data value is less than the estimate.)
- “What does it mean when a residual is on or close to the horizontal axis?” (It means that the line of best fit passes through or comes close to passing through that point in the graph.)
- “Find the residual that has the furthest vertical distance from the horizontal axis. What does this mean in the context of the scatter plot and the line of best fit?” (The residual that is furthest from the horizontal axis has the same \(x\)-coordinate as the point that is the greatest vertical distance away from the line of best fit in the scatter plot.)
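The link between the size of a residual and the vertical distance from the line can be sketched in Python. The data points and the line \(y = 2x + 1\) below are hypothetical, chosen only for illustration, and the helper `residuals_for_line` is not part of the lesson materials.

```python
def residuals_for_line(points, slope, intercept):
    """Map each x-value to its residual: actual y minus the line's estimate."""
    return {x: y - (slope * x + intercept) for x, y in points}


# Hypothetical data and line, chosen only for illustration.
points = [(1, 3.2), (2, 4.9), (3, 7.5), (4, 9.0)]
res = residuals_for_line(points, slope=2, intercept=1)

# Compare absolute values: the sign says above or below, not how far.
closest = min(res, key=lambda x: abs(res[x]))   # point nearest the line
farthest = max(res, key=lambda x: abs(res[x]))  # point farthest from the line
print(f"closest to the line at x = {closest}, farthest at x = {farthest}")
```

Here the point at \(x = 4\) lies on the line, so its residual is 0, and the point at \(x = 3\) has the largest vertical distance from the line.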
Design Principle(s): Support sense-making
6.3: Best Residuals (15 minutes)
Activity
In this activity, students take turns with a partner, matching graphs of residuals to scatter plots that display linear models. Students trade roles, explaining their thinking and listening, providing opportunities to explain their reasoning and critique the reasoning of others (MP3). They should begin to recognize that a plot of the residuals for data that is fit well by a linear model shows residuals that are close to the \(x\)-axis and do not show a noticeable trend.
Launch
Arrange students in groups of 2. Demonstrate how to set up and find matches. Choose a student to be your partner. Mix up the cards and place them face up. Point out that the cards contain either a scatter plot with a linear model or a graph of the residuals. Select one of each style of card and then explain to your partner why you think the cards do or do not match. Demonstrate productive ways to agree or disagree—for example, by explaining your mathematical thinking or asking clarifying questions. Give each group a set of cut-up cards for matching.
Design Principle(s): Support sense-making; Optimize output (for explanation)
Supports accessibility for: Conceptual processing; Visual-spatial processing
Student Facing
- Match the scatter plots and given linear models to the graph of the residuals.
- Turn the scatter plots over so that only the residuals are visible. Based on the residuals, which line would produce the most accurate estimates? Which line fits its data worst?
Student Facing
Are you ready for more?
- Tyler estimates a line of best fit for some data about the mass, in grams, of different numbers of apples. Here is the graph of the residuals.
- What does Tyler’s line of best fit look like according to the graph of the residuals?
- How well does Tyler’s line of best fit model the data? Explain your reasoning.
- Lin estimates a line of best fit for the same data. The graph shows the residuals.
- What does Lin’s line of best fit look like in comparison to the data?
- How well does Lin’s line of best fit model the data? Explain your reasoning.
- Kiran also estimates a line of best fit for the same data. The graph shows the residuals.
- What does Kiran’s line of best fit look like in comparison to the data?
- How well does Kiran’s line of best fit model the data? Explain your reasoning.
- Who has the best estimate of the line of best fit—Tyler, Lin, or Kiran? Explain your reasoning.
Activity Synthesis
The goal is to make sure students understand the connections between a scatter plot displaying a linear model and a graph of the residuals. A good linear model for the data will have residuals that are scattered on either side of the \(x\)-axis without a clear pattern and close to the axis.
Much discussion takes place between partners. Invite students to share how they made the matches.
- “What were some ways you handled finding the matches for B and C? Recall that B and C used the same data but different linear models.” (The line in C was not as good of a fit as line B, so I knew the graph of the residuals would be farther away from the horizontal axis.)
- “Look at the matches for E and J. How can you tell from the graph of the residuals that the linear model is not a line of best fit?” (The first half of the residuals are positive and the second half are negative. This lets me know that the line does not pass through the middle of the data.)
- “What do you notice about the residuals in graph K? Explain what you notice in the context of scatter plot A.” (The residuals go in a u-shaped pattern. The data in the scatter plot does not appear linear and is curved, so when you plot a line through it, you would expect the residuals to show the curvature.)
- “Describe any difficulties you experienced and how you resolved them.” (It was tough figuring out how to decide where to begin when finding matches. I used my partner’s strategy of looking at the values on the \(x\)-axis to help narrow down the choices.)
Lesson Synthesis
Here are some questions for discussion.
- “Tyler looks at a graph of residuals and notices one of the points is 0.5 units above the horizontal axis and another point is 0.5 units below the horizontal axis. He says the point that is 0.5 units above the horizontal axis is closer to the line of best fit in the scatter plot because it is positive. Is Tyler correct? Explain your reasoning.” (Tyler is not correct, because both points are equidistant from the line of best fit in the scatter plot. The sign of the residual just tells you whether the point in the scatter plot is above or below the line of best fit. The absolute value of the residual tells you the vertical distance of the point from the line of best fit.)
- “When looking at the residuals for a linear model for data following a linear trend, Priya found that roughly half of the residuals were positive and the other half were negative. Do her findings about the residuals provide evidence to support the claim that the linear model used is a line of best fit? What else should Priya look for?” (Yes, there is evidence to support the claim. A line of best fit should go through the middle of the data, so roughly half the points should be above the line of best fit and the other half below the line of best fit. Priya should also look for any patterns in the residuals. If there is a pattern, then it is likely not the line of best fit. For example, in Graph L from the Best Residuals activity, the residuals were half above and half below the horizontal axis, but they followed a pattern where the first half of the residuals were above the axis and the second half of the residuals were below the axis.)
- “How can you use a graph of the residuals to informally assess the fit of a function?” (You look to see how far away the residuals are from the horizontal axis. The closer they are, the better the fit. Also, look to see if the positive and negative residuals are distributed randomly. If most of them are positive or negative, or if they form some pattern, then the function is probably not a great fit.)
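The informal checks discussed above can be sketched in Python. Both residual lists are hypothetical, and the helper `summarize` is an illustration only. Counting sign changes is a rough hint of a pattern, not a formal test of fit.

```python
def summarize(residuals):
    """Return (largest distance from the axis, positives, negatives, sign changes)."""
    signs = [r > 0 for r in residuals if r != 0]
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    largest = max(abs(r) for r in residuals)
    return largest, signs.count(True), signs.count(False), changes


# Hypothetical residual lists, listed in order of increasing x.
scattered = [0.2, -0.3, 0.1, -0.1, 0.3, -0.2]
trending = [0.4, 0.3, 0.1, -0.1, -0.3, -0.4]

print(summarize(scattered))   # (0.3, 3, 3, 5)
print(summarize(trending))    # (0.4, 3, 3, 1)
print(abs(0.5) == abs(-0.5))  # True: both points are equally far from the line
```

Both lists are half positive and half negative, but the second changes sign only once: the residuals start positive and end negative, which is the kind of pattern Priya should look for.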
6.4: Cool-down - Deciding from Residuals (5 minutes)
Cool-Down
For access, consult one of our IM Certified Partners.
Student Lesson Summary
Student Facing
When fitting a linear model to data, it can be useful to look at the residuals. A residual is the difference between the \(y\)-value for a point in a scatter plot and the value predicted by the linear model for that \(x\)-value.
For example, in the scatter plot showing the length of the fish and the age of the fish, the residual for the fish that is 2 years old and 100 mm long is 8.06 mm, because the point is at \((2,100)\) and the linear function has the value 91.94 mm (\(34.08 \cdot 2 + 23.78\)) when \(x\) is 2. The residual of 8.06 mm means that the actual fish is about 8 millimeters longer than the linear model estimates for a fish of that same age.
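The arithmetic in the fish example can be reproduced with a short Python sketch. The variable names are illustrative; the slope, intercept, and data point are the values given in the example.

```python
slope = 34.08      # from the example's linear model
intercept = 23.78  # from the example's linear model
age, actual_length = 2, 100  # the point (2, 100): years, millimeters

estimated_length = slope * age + intercept     # value of the model at x = 2
residual = actual_length - estimated_length    # actual minus estimated

print(f"estimate: {estimated_length:.2f} mm, residual: {residual:.2f} mm")
# prints: estimate: 91.94 mm, residual: 8.06 mm
```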
When the point on the scatter plot is above the line, it has a positive residual. When the point on the scatter plot is below the line, the residual is a negative value. A line that has smaller residuals would be more likely to produce estimates that are close to the actual value.