Lesson 10
The Effect of Extremes
- Let’s see how statistics change with the data.
10.1: Battle Royale
Several video games belong to a genre called "Battle Royale," in which 100 players on an island fight until only 1 player remains and is crowned the winner. These games can often be played in solo mode, as individuals, or in team mode, in groups of 2.
- What information would you use to determine the top players in each mode (solo and team)? Explain your reasoning.
- One person claims that the best solo players play game A. Another person claims that game B has better solo players. How could you display data to help inform their discussion? Explain your reasoning.
10.2: Separated by Skew
- Use the applet to create a dot plot that represents the distribution of the data, then describe the shape of the distribution.
- Find the mean and median of the data.
- Find the mean and median of the data with 2 additional values included as described.
  - Add 2 values to the original data set that are greater than 14.
  - Add 2 values to the original data set that are less than 6.
  - Add 1 value that is greater than 14 and 1 value that is less than 6 to the original data set.
  - Add the two values, 50 and 100, to the original data set.
- Change the values so the distribution fits the description given to you by your teacher, then find the mean and median.
- Find another group that created a distribution with a different description. Explain your work and listen to their explanation, then compare your measures of center.
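The applet's data set is not reproduced in this text, so the sketch below uses a made-up data set of 10 values between 6 and 14 to show how each pair of added values moves the mean and the median. The original values and the specific numbers added in each case (15, 16, 4, 5) are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical original data set (the applet's actual values are not shown here).
original = [6, 7, 8, 9, 10, 10, 11, 12, 13, 14]

# Each case adds two values to the original data set (assumed example values).
scenarios = {
    "original": [],
    "two values greater than 14": [15, 16],
    "two values less than 6": [4, 5],
    "one above 14, one below 6": [15, 5],
    "add 50 and 100": [50, 100],
}

for label, extra in scenarios.items():
    data = original + extra
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

With these made-up values, adding 50 and 100 pulls the mean from 10 up to about 20.83, while the median moves only from 10 to 10.5.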
10.3: Plots Matching Measures
Create a possible dot plot with at least 10 values for each of the conditions listed. Each dot plot must have at least 3 values that are different.
- a distribution that has both mean and median of 10
- a distribution that has both mean and median of -15
- a distribution that has a median of 2.5 and a mean greater than the median
- a distribution that has a median of 5 and a mean less than the median
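One way to test a candidate dot plot against these conditions is to compute its count, its number of different values, its mean, and its median. The helper name `check_dot_plot` and the sample values are hypothetical; the candidate shown is one possible answer for the first condition, not the only one.

```python
from statistics import mean, median

def check_dot_plot(data):
    """Report the facts the task asks about for a candidate data set."""
    return {
        "count": len(data),          # must be at least 10
        "distinct": len(set(data)),  # must be at least 3 different values
        "mean": mean(data),
        "median": median(data),
    }

# Hypothetical candidate for "both mean and median of 10".
candidate = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]
print(check_dot_plot(candidate))
```

This candidate has 10 values, 5 different values, and a mean and median of 10, so it meets the first condition.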
The mean and the median are by far the most common measures of center for numerical data, but other measures of center are sometimes used. For each measure of center below, list some possible advantages and disadvantages. Be sure to consider how each one is affected by extreme values.
- Interquartile mean: The mean of only those points between the first quartile and the third quartile.
- Midhinge: The mean of the first quartile and the third quartile.
- Midrange: The mean of the minimum and maximum values.
- Trimean: The mean of the first quartile, the median, the median again, and the third quartile. That is, four numbers are averaged, with the median counted twice.
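The four definitions can be sketched in Python. Quartile rules differ between textbooks and software, so this sketch assumes one common convention (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd) and assumes that values equal to Q1 or Q3 count as "between" them for the interquartile mean. The function names and the sample data are hypothetical.

```python
from statistics import mean, median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: with an odd count, the middle value is left out of
    both halves. Requires at least 2 values.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[len(values) - half:]
    return median(lower), median(upper)

def interquartile_mean(data):
    q1, q3 = quartiles(data)
    # Assumption: values equal to Q1 or Q3 count as "between" them.
    return mean([x for x in data if q1 <= x <= q3])

def midhinge(data):
    q1, q3 = quartiles(data)
    return (q1 + q3) / 2

def midrange(data):
    return (min(data) + max(data)) / 2

def trimean(data):
    q1, q3 = quartiles(data)
    # The median is counted twice, so four numbers are averaged.
    return (q1 + 2 * median(data) + q3) / 4

# Hypothetical data set with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

for name, measure in [("mean", mean), ("median", median),
                      ("interquartile mean", interquartile_mean),
                      ("midhinge", midhinge), ("midrange", midrange),
                      ("trimean", trimean)]:
    print(f"{name}: {measure(sample)}")
```

For this made-up data set, the mean is 14.5 and the midrange is 50.5, while the median, interquartile mean, midhinge, and trimean all equal 5.5. Because the midrange uses only the minimum and maximum, it is pulled strongly toward the extreme value.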
Summary
Is it better to use the mean or median to describe the center of a data set?
The mean gives equal importance to each value when finding the center. The mean usually represents the typical values well when the data has a symmetric distribution. On the other hand, the mean can be greatly affected by changes to even a single value.
The median tells you the middle value in the data set, so changing a single value usually does not affect the median much. For this reason, the median is often more appropriate for data whose distribution is not very symmetric.
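To see the difference in a small case, the sketch below changes only one value in a made-up data set and recomputes both measures.

```python
from statistics import mean, median

before = [2, 3, 3, 4, 5]        # hypothetical data
after = before[:-1] + [50]      # change only the largest value, 5 -> 50

print(mean(before), median(before))  # 3.4 3
print(mean(after), median(after))    # 12.4 3
```

Changing one value moves the mean from 3.4 to 12.4, but the median stays at 3 because the middle value did not change.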
We can look at the distribution of a data set and draw conclusions about the mean and the median.
Here is a dot plot showing the time, in seconds, that a dart takes to hit a target. The data produces a symmetric distribution.
When a distribution is symmetric, the median and the mean are both found in the middle of the distribution; for these dart times, both are 1.4 seconds. Since the median is the middle value (or the mean of the two middle values) of a data set, you can use the symmetry around the center of the distribution to find it easily. For the mean, you need to know that the distances from the mean of the values greater than the mean add up to the same total as the distances from the mean of the values less than the mean. Using the symmetry of the distribution, you can see that there are four values 0.1 second above the mean, two values 0.2 seconds above the mean, one value 0.3 seconds above the mean, and one value 0.4 seconds above the mean. Likewise, there are the same number of values at the same distances below the mean, so the two totals balance.
Here is a dot plot of the same data, but with two of the values changed, resulting in a distribution that is skewed to the left.
When a distribution is skewed, it is not symmetric, so you cannot use symmetry to find the median and the mean. The median is still 1.4 seconds, since it is still the middle value. The mean, on the other hand, is now about 1.273 seconds. The mean is less than the median because the two low values (0.3 and 0.4 seconds) pull the mean down.
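The dot plots are not reproduced here, so the following sketch rebuilds a data set that matches the description: eight values above the center and eight below it at the stated distances. The six values at 1.4 seconds, and the choice of 1.7 and 1.8 seconds as the two values changed to 0.3 and 0.4 seconds, are assumptions, chosen because they give the stated median of 1.4 seconds and mean of about 1.273 seconds; the lesson's actual plot may differ.

```python
from statistics import mean, median

# Hypothetical reconstruction of the dart times in seconds.
# Assumption: six values sit exactly at the center, 1.4 seconds.
symmetric = ([1.0, 1.1] + [1.2] * 2 + [1.3] * 4 + [1.4] * 6
             + [1.5] * 4 + [1.6] * 2 + [1.7, 1.8])

def balance(data):
    """Total distance above the mean and total distance below it."""
    m = mean(data)
    above = sum(x - m for x in data if x > m)
    below = sum(m - x for x in data if x < m)
    return above, below

# Assumption: the changed values are 1.7 and 1.8, replaced by 0.3 and 0.4.
skewed = [x for x in symmetric if x < 1.7] + [0.3, 0.4]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    above, below = balance(data)
    print(f"{name}: mean = {mean(data):.3f}, median = {median(data):.1f}, "
          f"distance above = {above:.2f}, below = {below:.2f}")
```

In both versions, the total distance above the mean equals the total distance below it, which is the balancing property described for the symmetric plot; in the skewed version, the two low values force the mean down to about 1.273 seconds to keep that balance.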
The median is usually more resistant to extreme values than the mean. For this reason, the median is the preferred measure of center when a distribution is skewed or has extreme values, and the IQR is the preferred measure of variability to use with it. When a distribution is more symmetric, the mean is the preferred measure of center and the MAD is the preferred measure of variability.
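The sketch below compares how one extreme value affects each pair of measures. The data are made up, MAD here means mean absolute deviation (as in the glossary), and the quartile rule used for the IQR is one common convention among several.

```python
from statistics import mean, median

def mad(data):
    """Mean absolute deviation: the mean distance of the values from their mean."""
    m = mean(data)
    return mean([abs(x - m) for x in data])

def iqr(data):
    """Q3 - Q1, with quartiles as medians of the lower and upper halves.

    Assumption: for an odd count the middle value is left out of both halves;
    other quartile conventions can give slightly different results.
    """
    values = sorted(data)
    half = len(values) // 2
    return median(values[len(values) - half:]) - median(values[:half])

before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical values
after = before[:-1] + [100]                # one extreme value replaces 10

for name, data in [("before", before), ("after", after)]:
    print(f"{name}: mean = {mean(data)}, MAD = {mad(data)}, "
          f"median = {median(data)}, IQR = {iqr(data)}")
```

Replacing 10 with 100 leaves the median (5.5) and the IQR (5) unchanged, while the mean rises from 5.5 to 14.5 and the MAD rises from 2.5 to 17.1.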
Glossary Entries
- statistic
A quantity that is calculated from sample data, such as mean, median, or MAD (mean absolute deviation).