Lesson 10

The Effect of Extremes

10.1: Battle Royale (5 minutes)

Warm-up

This warm-up prompts students to think about what variables they may use to analyze a situation. Then, students describe data displays they may use to compare two sets of data. Choosing variables and planning a process for comparing data sets engage students in aspects of mathematical modeling (MP4).

Listen for groups that choose a variable other than the number of wins to determine the top players in the game and groups that select different data displays or ways of comparing data sets to share with the whole group.

Launch

Arrange students in groups of 2. Tell students to think quietly about their answers to the questions for about 1 minute before discussing with their partner and then sharing with the whole group.

Student Facing

Several video games are based on a genre called "Battle Royale" in which 100 players are on an island and they fight until only 1 player remains and is crowned the winner. This type of game can often be played in solo mode as individuals or in team mode in groups of 2.

1. What information would you use to determine the top players in each mode (solo and team)? Explain your reasoning.
2. One person claims that the best solo players play game A. Another person claims that game B has better solo players. How could you display data to help inform their discussion? Explain your reasoning.

Student Response

Student responses to this activity are available at one of our IM Certified Partners

Activity Synthesis

Select previously identified students to share their solutions. If it does not come up in discussion, ask students how they might interpret a situation in which a small group of players has a significantly greater number of wins or players defeated than the rest of the group. It might mean that most of the players are not very good and only a few are, or it might mean that a few dominant players are much better than average players.

10.2: Separated by Skew (15 minutes)

Activity

The mathematical purpose of this activity is to help students understand how measures of center for distributions with different shapes are impacted by changes in the data. Students will create a dot plot, then describe the shape of the distribution and find measures of center. They will investigate how the measures of center change when the data set changes. Students then create and investigate a data set from a given set of parameters, including the shape of the distribution. Monitor for students using the correct terminology to describe the shape of the distribution.

This activity works best when each student has access to technology that computes measures of center and displays dot plots or histograms easily because students will benefit from seeing the relationship in a dynamic way. If students don't have individual access, projecting the distributions would be helpful during the launch.

Launch

Arrange students into groups of 2–4. After they have completed the problem asking them to add values to the original data, give each group one of these distribution descriptions:

1. uniform distribution with data between 4 and 12
2. skewed right with most of the values at 10
3. skewed left with most of the values at 10
4. symmetric with most of the values at 4 and 16

Remind students of the words used to describe shapes of distributions: symmetric, skewed, bell-shaped, uniform, bimodal.

Students will be using the applet at ggbm.at/TFKvWYSS, embedded in the activity.

Writing: MLR 5 Co-Craft Questions. Before asking students to answer the given questions, display only the data and ask students to write down possible mathematical questions about this data distribution. Keep in mind students do not need to answer their created questions. Invite students to share and revise their questions with a partner, and then with the whole class. Record questions shared with the class in a public space. This helps students produce the language of mathematical questions as they begin to reason about the relationship between extreme values and the measure of center. Design Principle(s): Maximize meta-awareness

Student Facing

1. Use the applet to create a dot plot that represents the distribution of the data, then describe the shape of the distribution.

2. Find the mean and median of the data.

3. Find the mean and median of the data with 2 additional values included as described.

   a. Add 2 values to the original data set that are greater than 14.

   b. Add 2 values to the original data set that are less than 6.

   c. Add 1 value that is greater than 14 and 1 value that is less than 6 to the original data set.

   d. Add the two values, 50 and 100, to the original data set.

4. Change the values so the distribution fits the description given to you by your teacher, then find the mean and median.

5. Find another group that created a distribution with a different description. Explain your work and listen to their explanation, then compare your measures of center.

Student Response

Student responses to this activity are available at one of our IM Certified Partners

Launch

Provide access to devices that can run GeoGebra or other statistical technology.

Arrange students into groups of 2–4. After they have completed the problem asking them to add values to the original data, give each group one of these distribution descriptions:

1. uniform distribution with data between 4 and 12.
2. skewed right with most of the values at 10.
3. skewed left with most of the values at 10.
4. symmetric with most of the values at 4 and 16.

Remind students of the words used to describe shapes of distributions: symmetric, skewed, bell-shaped, uniform, bimodal.

Writing: MLR 5 Co-Craft Questions. Before asking students to answer the given questions, display only the data and ask students to write down possible mathematical questions about this data distribution. Keep in mind students do not need to answer their created questions. Invite students to share and revise their questions with a partner, and then with the whole class. Record questions shared with the class in a public space. This helps students produce the language of mathematical questions as they begin to reason about the relationship between extreme values and the measure of center. Design Principle(s): Maximize meta-awareness

Student Facing

1. Use technology to create a dot plot that represents the distribution of the data, then describe the shape of the distribution.

• 6
• 7
• 8
• 8
• 9
• 9
• 9
• 10
• 10
• 10
• 10
• 11
• 11
• 11
• 12
• 12
• 13
• 14

2. Find the mean and median of the data.

3. Find the mean and median of the data with 2 additional values included as described.

   a. Add 2 values to the original data set that are greater than 14.

   b. Add 2 values to the original data set that are less than 6.

   c. Add 1 value that is greater than 14 and 1 value that is less than 6 to the original data set.

   d. Add the two values, 50 and 100, to the original data set.

4. Share your work with your group. What do you notice is happening with the mean and median based on the additional values?

5. Change the values so that the distribution fits the description given to you by your teacher, then find the mean and median.

6. Find another group that created a distribution with a different description. Explain your work and listen to their explanation, then compare your measures of center.

Student Response

Student responses to this activity are available at one of our IM Certified Partners

Anticipated Misconceptions

Students may have difficulty using technology to create dot plots so you may need to demonstrate how to use the technology. Students may confuse mean and median. Ask them to refer to previous work in which they calculated each measure of center. After students input the additional values as directed, they may use the wrong $$n$$ when calculating the new mean. Remind students to complete detailed calculations.
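The point about using the correct $$n$$ can be checked with a short calculation. What follows is a minimal Python sketch, not part of the lesson materials. It uses the data set from this activity and the standard library `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import mean, median

# Original data set from the activity (18 values, sum 180).
original = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]

# Data set after adding the two values 50 and 100 (20 values, sum 330).
extended = original + [50, 100]

original_mean = mean(original)   # 180 divided by n = 18
extended_mean = mean(extended)   # 330 divided by n = 20, not 18

# A common slip: dividing the new sum by the old n.
wrong_mean = sum(extended) / len(original)

print("original:", len(original), original_mean, median(original))
print("extended:", len(extended), extended_mean, median(extended))
print("mean with the wrong n:", wrong_mean)
```

Under these assumptions the mean moves from 10 to 16.5 while the median stays at 10. Dividing by 18 instead of 20 would give a different, incorrect mean.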

Activity Synthesis

The goal is to make sure that students understand that the median is the preferred measure of center when a distribution is skewed or if there are extreme values, and the mean is the preferred measure of center when a distribution is symmetric and there are no extreme values. Here are some questions for discussion.

• “What do you notice and wonder about the mean and median for each of these distributions?” (I noticed that sometimes the median did not change and the mean did. I wondered what would happen if I added a value of 1,000 to the data set.)
• “For which distributions does it look like the mean best represents what is typical in the data?” (The symmetric distributions.)
• “When is the median a better statistic to describe typical values?” (The skewed distributions.)
• “Why is the median a better statistic for skewed distributions?” (When you add extreme values to a data set they tend to have a greater effect on the mean than the median.)
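The "wonder" in the first sample response (adding a value of 1,000) can be explored directly. This is a rough Python sketch, assuming the data set from this activity. The helper name `centers_with_extra` is hypothetical and used only for illustration.

```python
from statistics import mean, median

# Data set from the activity.
original = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]

def centers_with_extra(value):
    """Return (mean, median) of the original data with one extra value added."""
    data = original + [value]
    return mean(data), median(data)

# Add increasingly extreme single values and compare the two measures.
for extra in (20, 100, 1000):
    new_mean, new_median = centers_with_extra(extra)
    print(f"added {extra}: mean = {new_mean:.2f}, median = {new_median}")
```

In this sketch the median stays at 10 for every added value, while the mean grows with the size of the extreme value.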
Engagement: Develop Effort and Persistence. Break the class into small discussion groups and then invite a representative from each group to report back to the whole class.
Supports accessibility for: Language; Social-emotional skills; Attention

10.3: Plots Matching Measures (10 minutes)

Activity

The mathematical purpose of this activity is to recognize the relationship between measures of center and the shape of the distribution by creating and describing distributions with given measures of center. Listen for students using the terms symmetric, uniform, and skewed. When students create a dot plot with a given mean and median using technology, they are engaging in MP7 because they are using their understanding of the structure of the distribution to adjust individual data values to change the measures of center. Making a spreadsheet available gives students an opportunity to choose appropriate tools strategically (MP5).

Launch

Keep students in their groups.

Action and Expression: Internalize Executive Functions. Provide students with grid or graph paper to organize their work with the 3 dot plots.
Supports accessibility for: Language; Organization

Student Facing

Create a possible dot plot with at least 10 values for each of the conditions listed. Each dot plot must have at least 3 values that are different.

1. a distribution that has both mean and median of 10
2. a distribution that has both mean and median of -15
3. a distribution that has a median of 2.5 and a mean greater than the median
4. a distribution that has a median of 5 and a median greater than the mean

Student Response

Student responses to this activity are available at one of our IM Certified Partners

Student Facing

The mean and the median are by far the most common measures of center for numerical data. There are other measures of center, though, that are sometimes used. For each measure of center, list some possible advantages and disadvantages. Be sure to consider how it is affected by extremes.

1. Interquartile mean: The mean of only those points between the first quartile and the third quartile.

2. Midhinge: The mean of the first quartile and the third quartile.

3. Midrange: The mean of the minimum and maximum value.

4. Trimean: The mean of the first quartile, the median, the median again, and the third quartile. So we are averaging four numbers as the median is counted twice.
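To compare how these four measures react to extremes, it can help to compute them on a small data set. The following is a minimal Python sketch under stated assumptions. Quartiles are taken as the medians of the lower and upper halves of the sorted data (other quartile conventions exist and can give different values). The interquartile mean includes values equal to either quartile. All function names are illustrative. The data reuses the set from the previous activity, with and without the values 50 and 100.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    Assumes at least 2 values; this is one of several quartile conventions.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(data), median(upper)

def interquartile_mean(values):
    q1, _, q3 = quartiles(values)
    return mean([v for v in values if q1 <= v <= q3])

def midhinge(values):
    q1, _, q3 = quartiles(values)
    return (q1 + q3) / 2

def midrange(values):
    return (min(values) + max(values)) / 2

def trimean(values):
    q1, med, q3 = quartiles(values)
    return (q1 + 2 * med + q3) / 4

original = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]
extended = original + [50, 100]

for label, data in (("original", original), ("with 50 and 100", extended)):
    print(label, interquartile_mean(data), midhinge(data), midrange(data), trimean(data))
```

With these conventions all four measures equal 10 for the original data. After the two extreme values are added, the midrange changes far more than the other three measures.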

Student Response

Student responses to this activity are available at one of our IM Certified Partners

Activity Synthesis

The purpose of this discussion is for students to understand why the median is the preferred measure of center when a distribution is skewed or if there are extreme values, and the mean is the preferred measure of center when a distribution is symmetric and there are no extreme values.

For each description, select 2–3 groups to share their dot plots.

Here are some questions for discussion.

• “For the first and second dot plots, what do the distribution shapes have in common? Why do we choose the mean as the more appropriate measure of center?” (Symmetric. The mean of a set of data gives equal importance to each value to find the center, so it is a preferred measure of center when it accurately represents typical values for data.)
• “What do the shapes of the dot plots have in common when the mean is greater than the median?” (Skewed right.)
• “What information does the shape of the skewed distributions tell you about the median and mean?” (When distributions are skewed right they will likely have a mean that is greater than the median because the values to the right disproportionately impact the mean. When distributions are skewed left they will likely have a mean that is less than the median because the values to the left disproportionately impact the mean.)
Speaking: MLR 8 Discussion Supports. As groups share their dot plots with the whole class, revoice student ideas for determining the appropriate measure of center based on the shape of the distribution. Be sure to amplify mathematical uses of language by restating a statement as a question in order to clarify, apply appropriate language, and involve more students. For example, “Can someone else explain why the median is used for skewed data?” Press for details in students’ explanations by requesting students to elaborate on an idea or give an example from their data representation. This will help students explain why a particular measure of center suits a given distribution shape. Design Principle(s): Support sense-making

Lesson Synthesis

Here are some questions to draw out the relationship between measures of center and the shape of the distribution.

• “Why is the median preferred to the mean for skewed data?” (The values far to the right (or left) in skewed data have a greater effect on the mean, so the median is preferred to better reflect the typical values.)
• “When an extreme value is present, why is the median preferred to the mean?” (Extreme values have a greater effect on the mean than the median so the median is preferred.)
• “When data is symmetric or approximately symmetric, why is the mean preferred to the median?” (The mean takes into account every data value so it is the preferred measure when it is representative of what is typical for the data.)

10.4: Cool-down - Shape and Statistics (5 minutes)

Cool-Down

Cool-downs for this lesson are available at one of our IM Certified Partners

Student Lesson Summary

Student Facing

Is it better to use the mean or median to describe the center of a data set?

The mean gives equal importance to each value when finding the center. The mean usually represents the typical values well when the data has a symmetric distribution. On the other hand, the mean can be greatly affected by changes to even a single value.

The median tells you the middle value in the data set, so changes to a single value usually do not affect the median much. So, the median is more appropriate for data that is not very symmetric.

We can look at the distribution of a data set and draw conclusions about the mean and the median.

Here is a dot plot showing the amount of time a dart takes to hit a target in seconds. The data produces a symmetric distribution.

When a distribution is symmetric, the median and mean are both found in the middle of the distribution. Since the median is the middle value (or mean of the two middle values) of a data set, you can use the symmetry around the center of a symmetric distribution to find it easily. For the mean, you need to know that the sum of the distances away from the mean of the values greater than the mean is equal to the sum of the distances away from the mean of the values less than the mean. Using the symmetry of the symmetric distribution you can see that there are four values 0.1 second above the mean, two values 0.2 seconds above the mean, one value 0.3 seconds above the mean, and one value 0.4 seconds above the mean. Likewise, you can see that there are the same number of values the same distances below the mean.
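The balance property described above can be written compactly. As a sketch, write $$\bar{x}$$ for the mean and $$x_i$$ for the individual data values (this notation is an addition and is not used elsewhere in the lesson):

$$\sum_{x_i > \bar{x}} (x_i - \bar{x}) = \sum_{x_i < \bar{x}} (\bar{x} - x_i)$$

Using the counts given for the dart data, the total distance above the mean is

$$4(0.1) + 2(0.2) + 1(0.3) + 1(0.4) = 0.4 + 0.4 + 0.3 + 0.4 = 1.5,$$

and by symmetry the total distance below the mean is also 1.5 seconds.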

Here is a dot plot using the same data, but with two of the values changed, resulting in a skewed distribution.

When you have a skewed distribution, the distribution is not symmetric, so you are not able to use the symmetry to find the median and the mean. The median is still 1.4 seconds since it is still the middle value. The mean, on the other hand, is now about 1.273 seconds. The mean is less than the median because the lower values (0.3 and 0.4) result in a smaller value for the mean.

The median is usually more resistant to extreme values than the mean. For this reason, the median is the preferred measure of center when a distribution is skewed or if there are extreme values. When using the median, you would also use the IQR as the preferred measure of variability. In a more symmetric distribution, the mean is the preferred measure of center and the MAD is the preferred measure of variability.
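The same resistance shows up in the paired measures of variability. The Python sketch below is illustrative only and rests on a few assumptions. It reuses the data set from the activity "Separated by Skew" rather than the dart data. It computes the IQR from the medians of the lower and upper halves of the sorted data (one of several quartile conventions), and it treats MAD as the mean absolute deviation from the mean. The function names are hypothetical.

```python
from statistics import mean, median

def iqr(values):
    """Interquartile range using medians of the lower and upper halves (assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[len(data) - half:]) - median(data[:half])

def mad(values):
    """Mean absolute deviation: average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

original = [6, 7, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 13, 14]
extended = original + [50, 100]   # two extreme values added

for label, data in (("original", original), ("with 50 and 100", extended)):
    print(f"{label}: IQR = {iqr(data)}, MAD = {mad(data):.2f}")
```

Under these assumptions the IQR changes only from 2 to 3 when the extreme values are added, while the MAD grows from about 1.56 to 11.7.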