Lesson 14

Outliers & Means

These materials, when encountered before Algebra 1, Unit 1, Lesson 14, support success in that lesson.

14.1: Math Talk: Outliers (5 minutes)

Warm-up

The purpose of this Math Talk is to elicit strategies and understandings students have for multiplying a value by 1.5 and adding the result to, or subtracting it from, another value. These understandings help students develop fluency and will be helpful later when they need to compute the cut-off values for outliers.

Launch

Display one problem at a time. Give students quiet think time for each problem and ask them to give a signal when they have an answer and a strategy. Keep all problems displayed throughout the talk. Follow with a whole-class discussion.

Student Facing

Solve each expression mentally.

\(20 + 1.5(10)\)

\(20 - 1.5(10)\)

\(20 + 1.5(14)\)

\(20 + 1.5(13)\)

Student Response

For access, consult one of our IM Certified Partners.

Activity Synthesis

Ask students to share their strategies for each problem. Record and display their responses for all to see. To involve more students in the conversation, consider asking:

  • “Who can restate \(\underline{\hspace{.5in}}\)’s reasoning in a different way?”
  • “Did anyone have the same strategy but would explain it differently?”
  • “Did anyone solve the problem in a different way?”
  • “Does anyone want to add on to \(\underline{\hspace{.5in}}\)’s strategy?”
  • “Do you agree or disagree? Why?”
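
As a quick check on the four Math Talk expressions, here is a minimal Python sketch. It assumes one possible strategy students might share: multiplying by 1.5 by taking a number plus half of itself. The helper name `one_and_a_half` is hypothetical and is not part of the lesson materials.

```python
def one_and_a_half(n):
    """Multiply n by 1.5 by adding half of n to n (one possible mental strategy)."""
    return n + n / 2

# The four warm-up expressions, in the order they are displayed to students.
results = [
    20 + one_and_a_half(10),
    20 - one_and_a_half(10),
    20 + one_and_a_half(14),
    20 + one_and_a_half(13),
]
print(results)  # [35.0, 5.0, 41.0, 39.5]
```

Other strategies, such as adjusting the third answer down by 1.5 to get the fourth, lead to the same values.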

14.2: Mountain Hike (15 minutes)

Activity

Students compare the mean of a set of data with the mean after an outlier is introduced, and notice that the outlier has a significant effect on the mean.

Making statistical technology available gives students an opportunity to choose appropriate tools strategically (MP5) to quickly compute the new means with technology instead of computing by hand each time.

Student Facing

(Image: two people hiking up a mountain trail.)

Andre records how long it takes him (in minutes) to hike a mountain each day for 6 days.

  • 50
  • 52
  • 58
  • 55
  • 59
  • 50
  1. Calculate the mean number of minutes it takes Andre to hike a mountain.
  2. Andre plans to hike the same mountain trail one more day. Estimate the time it will take him to complete the trail for the seventh day. Explain your reasoning.
  3. What do you think will happen to the mean time for the week if Andre's grandfather comes on the hike with him for the seventh day?
  4. Calculate the mean number of minutes using the values when Andre's grandfather comes on the hike for the seventh day: 50, 52, 58, 55, 59, 50, 130.
  5. If Andre’s grandfather did not come with him on the hike, Andre thinks he could have finished the trail in 60 minutes. Calculate the mean hiking time using Andre’s estimate for the seventh day: 50, 52, 58, 55, 59, 50, 60.

Student Response

For access, consult one of our IM Certified Partners.

Activity Synthesis

Ask students what they notice and wonder about the mean values related to whether Andre’s grandfather goes on the hike with him or not. (Students may notice that the mean significantly increases when Andre’s grandfather goes on the hike, and that their estimate was likely inaccurate. They may wonder why the mean changed so drastically with only 1 new data point, and (if their estimate was wrong) why the mean changes in a way that’s different from what they thought would happen.)

Point out that the mean changes so much because 130 minutes is significantly greater than all the other values. When the seventh value is closer to the rest of the data (60 minutes), the mean is not very different from the original.
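
If statistical technology is available, the three means can be compared with a short script. This is a minimal sketch using the hiking times from the activity; the variable names and the `mean` helper are illustrative choices, not part of the lesson materials.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Andre's times, in minutes, for the first six days.
first_six_days = [50, 52, 58, 55, 59, 50]

original_mean = mean(first_six_days)
mean_with_grandfather = mean(first_six_days + [130])  # seventh day with grandfather
mean_with_estimate = mean(first_six_days + [60])      # seventh day using Andre's estimate

print(original_mean)                    # 54.0
print(round(mean_with_grandfather, 1))  # 64.9
print(round(mean_with_estimate, 1))     # 54.9
```

The single 130-minute value raises the mean by almost 11 minutes, while the 60-minute value raises it by less than 1 minute.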

  • “As Andre trains for a mountain hiking race, he wants to track how long it takes him to climb the mountain. Should he include the 130-minute data point in the mean time it takes him to hike the mountain? Explain your reasoning.” (He should not include the time in his calculation of mean since he will probably not be running the race with his grandfather.)
  • “Even if Andre’s grandfather did not come with him on the hike, there are reasons it might take Andre 130 minutes to complete the hike. What are some circumstances that might increase Andre’s hiking time to 130 minutes and would make it reasonable to include in his mean time for the race?” (If the weather was bad or if he was injured, Andre’s time might be much longer and could also affect his race time, so he might consider including the value in his calculation of mean.)

Highlight that a value should be left in the analysis if it was collected accurately and in the right conditions for the situation at hand, or if we are unsure of why it is different. If we are sure it is a typo, or the value was collected under significantly different conditions that don’t fit the situation at hand, we might leave it out. The most important thing, though, is that students are not under the impression that data should be thrown out just because it’s different and doesn’t match what we want. The default should always be to include the data and remove a value only if we are certain that it was an error or is measuring something very different than we intended.

14.3: The Meaning of an Outlier (20 minutes)

Activity

Students are presented with several data sets and use a formula to determine whether each has outliers. Students then make guesses about possible scenarios in which the outlier is included in the data set. Since students are only guessing, it is important they understand that none of the outlier values should be removed. Students should explain their reasoning for deciding whether to include the outlier.

Launch

Provide students with the formula for identifying an outlier. To determine efficiently whether a data point is an outlier, students should follow these steps:

1. Arrange the data in order.

2. Find the median, first quartile (Q1), and third quartile (Q3).

3. Calculate the interquartile range (IQR) using \(\text{Q3} - \text{Q1}\).

4. Compute the values that will determine outliers with the expressions \(\text{Q1} - 1.5 \boldcdot \text{IQR}\) and \(\text{Q3} + 1.5 \boldcdot \text{IQR}\).

Explain to students that the values of the expressions they compute give a range to use to identify outliers. \(\text{Q1} - 1.5 \boldcdot \text{IQR}\) gives the lower boundary of the range, while \(\text{Q3} + 1.5 \boldcdot \text{IQR}\) gives the upper boundary. Any number that is outside of this range (whether too high or too low) is identified as an outlier.
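
The steps above can be sketched in Python as follows. This is an illustration under a stated assumption, not a definitive implementation: quartiles are taken as the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of values is odd. Other quartile conventions can give different boundaries. The function names are hypothetical, and the example data are Andre's seven hiking times from the previous activity.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of values, the middle value belongs
    to neither half. Requires at least two values.
    """
    values = sorted(data)                     # step 1: arrange the data in order
    half = len(values) // 2
    lower_half = values[:half]
    upper_half = values[len(values) - half:]
    return median(lower_half), median(upper_half)  # step 2

def outlier_boundaries(data):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1                             # step 3: interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr     # step 4: lower and upper boundaries

def find_outliers(data):
    """Return the values that fall outside the boundaries."""
    low, high = outlier_boundaries(data)
    return [x for x in data if x < low or x > high]

hike_times = [50, 52, 58, 55, 59, 50, 130]
print(outlier_boundaries(hike_times))  # (36.5, 72.5)
print(find_outliers(hike_times))       # [130]
```

Here Q1 is 50, Q3 is 59, and the IQR is 9, so any time below 36.5 minutes or above 72.5 minutes is identified as an outlier.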

Student Facing

For each set of data, answer these questions:

  • Use the data to compute the quartiles (Q1 and Q3) and the interquartile range.
  • Use the expressions \(\text{Q1} - 1.5\boldcdot \text{IQR}\) and \(\text{Q3} + 1.5\boldcdot \text{IQR}\) to help find where an outlier may be.
  • Are any of the values outliers? Explain your reasoning.
  1. A group of students recorded the distance, in miles, to the park nearest their home:
    2.3, 4, 1.6, 15, 3.8, 0.75, 1.7
  2. Han visits a website to price the next phone he wants to get. He sees the following prices, in dollars:
    200, 485, 492, 512, 453, 503
  3. The numbers of points Clare scored in her last 8 basketball games are:
    17, 14, 16, 2, 13, 14, 15, 17
  4. Kiran’s math test scores, as a percentage, were:
    57, 82, 80, 85, 89, 84
  5. The heights, in feet, of the roller coasters at the amusement park are:
    415, 456, 423, 442, 30

Student Response

For access, consult one of our IM Certified Partners.

Activity Synthesis

Here are sample questions to promote a class discussion:

  • “If a value is accidentally recorded twice, should both data points be included in the data set?” (No, if it was an accident, that means that the 2nd appearance of the data point did not actually happen.)
  • “Could you identify an outlier without using the formula?” (No, the formula ensures that identification of an outlier is not left to chance. It needs to be supported mathematically, or else people would identify outliers based on their opinions, which is not appropriate when making statistical decisions.)

Tell students that in some cases a value is clearly an outlier because it is so atypical compared to the rest of the data, which is often revealed by visual representations. At other times, whether a point should be considered an outlier is harder to determine just by looking at the data. In those cases, the rule students are encouraged to use in this class is the one given in this activity, which was developed by John Tukey.