Lesson 11

Comparing and Contrasting Data Distributions

  • Let’s investigate variability using data displays and summary statistics.

11.1: Math Talk: Mean

Evaluate the mean of each data set mentally.

27, 30, 33

61, 71, 81, 91, 101

0, 100, 100, 100, 100

0, 5, 6, 7, 12

11.2: Describing Data Distributions

  1. Your teacher will give you a set of cards. Take turns with your partner to match a data display with a written statement.
    1. For each match that you find, explain to your partner how you know it’s a match.
    2. For each match that your partner finds, listen carefully to their explanation. If you disagree, discuss your thinking and work to reach an agreement.
  2. After matching, determine if the mean or median is more appropriate for describing the center of the data set based on the distribution shape. Discuss your reasoning with your partner. If it is not given, calculate (if possible) or estimate the appropriate measure of center. Be prepared to explain your reasoning.

11.3: Visual Variability and Statistics

Each box plot summarizes the number of miles driven each day for 30 days in each month. The box plots represent, in order, the months of August, September, October, November, and December.

  1. The five box plots have the same median. Explain why the median is more appropriate for describing the center of the data set than the mean for these distributions.
  2. Arrange the box plots in order of least variability to greatest variability. Check with another group to see if they agree.
    1. Box plot from 0 to 90 by 10’s. Miles driven each day in August. Whisker from 5 to 20. Box from 20 to 60 with vertical line at 40. Whisker from 60 to 62.
    2. Box plot from 0 to 90 by 10’s. Miles driven each day in September. Whisker from 5 to 10. Box from 10 to 70 with vertical line at 40. Whisker from 70 to 90.
    3. Box plot from 0 to 90 by 10’s. Miles driven each day in October. Whisker from 15 to 20. Box from 20 to 70 with vertical line at 40. Whisker from 70 to 85.
    4. Box plot from 0 to 90 by 10’s. Miles driven each day in November. Whisker from 10 to 30. Box from 30 to 70 with vertical line at 40. Whisker from 70 to 80.
    5. Box plot from 0 to 90 by 10’s. Miles driven each day in December. Whisker from 10 to 30. Box from 30 to 50 with vertical line at 40. Whisker from 50 to 62.
  3. The five dot plots have the same mean. Explain why the mean is more appropriate for describing the center of the data set than the median.
  4. Arrange the dot plots in order of least variability to greatest variability. Check with another group to see if they agree.
    1. Dot plot from 5 to 15 by 1’s. Beginning at 5, number of dots above each increment is 0, 1, 1, 2, 3, 4, 3, 2, 1, 1, 0.
    2. Dot plot from 5 to 15 by 1’s. Beginning at 5, number of dots above each increment is 0, 1, 2, 3, 7, 15, 7, 3, 2, 1, 0.
    3. Dot plot from 5 to 15 by 1’s. Beginning at 5, number of dots above each increment is 0, 11, 10, 8, 5, 3, 5, 8, 10, 11, 0.
    4. Dot plot from 5 to 15 by 1’s. Beginning at 5, number of dots above each increment is 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0.
    5. Dot plot from 5 to 15 by 1’s. Beginning at 5, number of dots above each increment is 0, 0, 0, 0, 0, 35, 0, 0, 0, 0, 0.


  1. These two box plots have the same median and the same IQR. How could we compare the variability of the two distributions?

    Two box plots
  2. These two dot plots have the same mean and the same MAD. How could we compare the variability of the two distributions?

    Dot plot from 0 to 10 by 1’s. Beginning at 0, number of dots above each increment is 0, 1, 2, 0, 4, 7, 2, 1, 2, 1, 0.
    Dot plot from 0 to 10 by 1’s. Beginning at 0, number of dots above each increment is 0, 0, 1, 3, 5, 3, 3, 4, 1, 0, 0.
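
The claim that these two dot plots share a mean and a MAD can be checked directly from the dot counts. The sketch below is only an illustration: it assumes the eleven counts in each description correspond to the axis values 0 through 10, and the function name `mean_and_mad` is made up for this example.

```python
def mean_and_mad(counts, start=0):
    """Return (mean, MAD) for a dot plot described by dot counts.

    counts[i] is the number of dots above the value start + i.
    Assumption: the first count belongs to the axis value 0.
    """
    n = sum(counts)
    # Weighted mean: each value contributes once per dot.
    mean = sum((start + i) * c for i, c in enumerate(counts)) / n
    # MAD: mean distance of the data points from the mean.
    mad = sum(abs(start + i - mean) * c for i, c in enumerate(counts)) / n
    return mean, mad


first = mean_and_mad([0, 1, 2, 0, 4, 7, 2, 1, 2, 1, 0])
second = mean_and_mad([0, 0, 1, 3, 5, 3, 3, 4, 1, 0, 0])

print(first)   # (5.0, 1.4)
print(second)  # (5.0, 1.4)
```

Under that assumption both data sets have 20 values, a mean of 5, and a MAD of 1.4, so a different feature of the distributions is needed to tell their variability apart.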
 

Summary

The mean absolute deviation, or MAD, is a measure of variability that is calculated by finding the mean distance of the data points from the mean. Here are two dot plots, each with a mean of 15 centimeters, displaying the lengths of sea scallop shells in centimeters.

Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 1, 2, 3, 5, 3, 2, 1, 0.
Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 0, 2, 4, 5, 4, 2, 0, 0.

Notice that both dot plots show a symmetric distribution, so the mean and the MAD are appropriate choices for describing center and variability. The data in the first dot plot appear to be more spread apart than the data in the second dot plot, so you can say that the first data set appears to have greater variability than the second data set. This is confirmed by the MAD. The MAD of the first data set is approximately 1.18 cm and the MAD of the second data set is approximately 0.94 cm. This means that the values in the first data set are, on average, about 1.18 cm away from the mean and the values in the second data set are, on average, about 0.94 cm away from the mean. The greater the MAD of the data, the greater the variability of the data.
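
The two MAD values can be reproduced from the dot counts in the plots. Here is a minimal sketch in Python; the function name `mean_absolute_deviation` and the way the dot plots are entered (a starting value plus a list of counts) are choices made for this illustration only.

```python
def mean_absolute_deviation(start, counts):
    """Return (mean, MAD) for a dot plot.

    start  -- the first value on the axis
    counts -- number of dots above start, start + 1, start + 2, ...
    """
    # Expand the counts into the individual data values.
    values = [start + i for i, c in enumerate(counts) for _ in range(c)]
    mean = sum(values) / len(values)
    # Average the distances between each value and the mean.
    mad = sum(abs(v - mean) for v in values) / len(values)
    return mean, mad


first_mean, first_mad = mean_absolute_deviation(11, [0, 1, 2, 3, 5, 3, 2, 1, 0])
second_mean, second_mad = mean_absolute_deviation(11, [0, 0, 2, 4, 5, 4, 2, 0, 0])

print(first_mean, round(first_mad, 2))    # 15.0 1.18
print(second_mean, round(second_mad, 2))  # 15.0 0.94
```

Each data set has 17 values. The distances from the mean add up to 20 cm in the first set and 16 cm in the second, which gives 20 ÷ 17 ≈ 1.18 and 16 ÷ 17 ≈ 0.94.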

The interquartile range, or IQR, is a measure of variability that is calculated by subtracting the value for the first quartile, Q1, from the value for the third quartile, Q3. These two box plots represent the distributions of the lengths in centimeters of a different group of sea scallop shells, each with a median of 15 centimeters.

Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 5. Box from 5 to 19 with vertical line at 15. Whisker from 19 to 20.
Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 9. Box from 9 to 19 with vertical line at 15. Whisker from 19 to 20.

Notice that neither of the box plots has a symmetric distribution, so the median and the IQR are appropriate choices for describing center and variability for these data sets. The middle half of the data displayed in the first box plot appear to be more spread apart, or show greater variability, than the middle half of the data displayed in the second box plot. The IQR of the first distribution is 14 cm, and the IQR of the second distribution is 10 cm. The IQR measures the difference between the median of the upper half of the data, Q3, and the median of the lower half of the data, Q1, so it is not impacted by the minimum or the maximum value in the data set. It is a measure of the spread of the middle 50% of the data.
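
The IQR values can be read straight off the five numbers each box plot displays. The sketch below stores those numbers in a small record; the `BoxPlot` type and the helper names are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class BoxPlot(NamedTuple):
    """Five-number summary shown by a box plot (lengths in cm)."""
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def iqr(plot: BoxPlot) -> float:
    # Interquartile range: third quartile minus first quartile.
    return plot.q3 - plot.q1


def value_range(plot: BoxPlot) -> float:
    # Range: maximum minus minimum.
    return plot.maximum - plot.minimum


first = BoxPlot(minimum=3, q1=5, median=15, q3=19, maximum=20)
second = BoxPlot(minimum=3, q1=9, median=15, q3=19, maximum=20)

print(iqr(first), iqr(second))                  # 14 10
print(value_range(first), value_range(second))  # 17 17
```

Both box plots have the same minimum and maximum, so their ranges match at 17 cm. Only the IQR picks up the difference in how spread out the middle half of each data set is.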

The MAD is calculated using every value in the data while the IQR is calculated using only the values for Q1 and Q3.
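
One consequence of this difference shows up when a single extreme value changes. The sketch below uses a small made-up data set (not taken from the lesson) and finds the quartiles as the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of values is odd.

```python
from statistics import median


def mad(values):
    """Mean absolute deviation: mean distance from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def iqr(values):
    """Interquartile range using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return median(upper_half) - median(lower_half)


# Hypothetical data, chosen only to illustrate the idea.
original = [3, 5, 7, 9, 11, 13, 15, 17]
changed = [3, 5, 7, 9, 11, 13, 15, 47]  # only the maximum is different

print(iqr(original), iqr(changed))  # 8.0 8.0
print(mad(original), mad(changed))  # 4.0 8.625
```

Replacing the maximum leaves Q1 and Q3 where they were, so the IQR stays at 8. The MAD more than doubles because the new value shifts the mean and adds a large distance to the average.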

Glossary Entries

  • statistic

    A quantity that is calculated from sample data, such as mean, median, or MAD (mean absolute deviation).