Lesson 7
Box Plots and Interquartile Range
7.1: Notice and Wonder: Two Parties (5 minutes)
Warm-up
In earlier lessons, students learned that the mean absolute deviation (MAD) is a measure of variability. In this warm-up, they study two distributions that appear very different but turn out to have the same MAD. Students notice that the MAD may not fully tell us about the variability of a data set. The work here motivates the need to have a different way to quantify variability, which is the focus of this lesson. While students may notice and wonder many things about these images, highlight ideas related to the variability of the data sets.
Launch
Arrange students in groups of 2. Display the dot plots for all to see. Ask students to identify at least one thing they notice and at least one thing they wonder about the dot plots, and to give a signal when they have both. Give students 1 minute of quiet think time, and then 1 minute to discuss their observation and question with their partner. Follow with a whole-class discussion.
Student Facing
Here are dot plots that show the ages of people at two different parties. The mean of each distribution is marked with a triangle.
What do you notice and what do you wonder about the distributions in the two dot plots?
Student Response
For access, consult one of our IM Certified Partners.
Activity Synthesis
Invite students to share what they noticed and wondered. Record and display their responses for all to see. If possible, record the relevant reasoning on or near the image. After each response, ask the class if they agree or disagree and to explain alternative ways of thinking, referring back to the dot plots each time. Discuss:
- “Do you think the ages of the people at the first party are alike or different? What about the ages of the people at the second party?”
- “The MAD for both data sets is approximately 10.5 years. What does a MAD of 10.5 years tell us in this context?”
- “Is the MAD a useful description of variability in the first data set? What about in the second data set?”
Two key ideas to uncover here are:
- The MAD is a way to summarize variation from the mean, but the single number does not always tell us how the data are distributed.
- The same MAD could result from very different distributions.
If the key ideas above are not uncovered during discussion, be sure to highlight them.
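For teachers who want to verify the second key idea numerically, here is a minimal sketch. The two lists are hypothetical ages invented for illustration (they are not the data behind the dot plots), and the function name `mean_absolute_deviation` is likewise an assumption.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# Hypothetical ages, invented for illustration only.
two_clusters = [15, 15, 15, 15, 35, 35, 35, 35]   # two separate age groups
mostly_alike = [5, 5, 25, 25, 25, 25, 45, 45]     # many alike, a few far away

mad_clusters = mean_absolute_deviation(two_clusters)
mad_alike = mean_absolute_deviation(mostly_alike)

# Both data sets have mean 25 and the same MAD, yet very different shapes.
print(mad_clusters, mad_alike)
```

Both lists have the same mean and the same MAD, even though one has two separate clusters and the other is concentrated at the center with a few extreme values.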
7.2: The Five-Number Summary (15 minutes)
Activity
This activity introduces students to the five-number summary and the process of identifying the five numbers. Students learn how to partition the data into four sets: using the median to decompose the data into upper and lower halves, and then finding the middle of each half to further decompose it into quarters. They learn that each value that decomposes the data into four parts is called a quartile, and the three quartiles are the first quartile (Q1), second quartile (Q2, or the median), and third quartile (Q3). Together with the minimum and maximum values of the data set, the quartiles provide a five-number summary that can be used to describe a data set without listing or showing each data value.
Students reason abstractly and quantitatively (MP2) as they identify and interpret the quartiles in the context of the situation given.
To allow more time for the synthesis, find the median together with the whole group. Then, assign half of the groups to find the 25th percentile and the other half to find the 75th percentile. After 2 minutes of group work time, ask a group from each half to share their results with the class.
During the synthesis, define and compute the range (\(42 - 7 = 35\)) and interquartile range (\(29 - 10.5 = 18.5\)) of the data.
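The values quoted above can be checked with a short sketch of the procedure used in this activity: find the median, then find the middle of each half without including the median. The function names are illustrative assumptions. Statistical software may compute quartiles with a different rule and return slightly different values.

```python
def median(sorted_values):
    """Middle value, or the average of the two middle values when the count is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, the median itself is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Ages from the activity, listed from least to greatest.
party_ages = [7, 8, 9, 10, 10, 11, 12, 15, 16, 20,
              20, 22, 23, 24, 28, 30, 33, 35, 38, 42]

minimum, q1, q2, q3, maximum = five_number_summary(party_ages)
data_range = maximum - minimum   # range
iqr = q3 - q1                    # interquartile range
print(minimum, q1, q2, q3, maximum, data_range, iqr)
```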
Launch
Explain to students that they previously summarized variability by finding the MAD, which involves calculating the distance of each data point from the mean and then finding the average of those distances. Explain that we will now explore another way to describe variability and summarize the distribution of data. Instead of measuring how far away data points are from the mean, we will decompose a data set into four equal parts and use the markers that partition the data into quarters to summarize the spread of data.
Remind students that when there is an even number of values, the median is the average of the middle two values.
Arrange students in groups of 2. Give groups 8–10 minutes to complete the activity. Follow with a whole-class discussion.
Student Facing
Here are the ages of the people at one party, listed from least to greatest.
- 7
- 8
- 9
- 10
- 10
- 11
- 12
- 15
- 16
- 20
- 20
- 22
- 23
- 24
- 28
- 30
- 33
- 35
- 38
- 42
Find the median of the data set and label it “50th percentile.” This splits the data into an upper half and a lower half.
Find the middle value of the lower half of the data, without including the median. Label this value “25th percentile.”
Find the middle value of the upper half of the data, without including the median. Label this value “75th percentile.”
You have split the data set into four pieces. Each of the three values that split the data is called a quartile.
- We call the 25th percentile the first quartile. Write “Q1” next to that number.
- The median can be called the second quartile. Write “Q2” next to that number.
- We call the 75th percentile the third quartile. Write “Q3” next to that number.
Label the lowest value in the set “minimum” and the greatest value “maximum.”
The values you have identified make up the five-number summary for the data set. Record them here.
minimum: _____ Q1: _____ Q2: _____ Q3: _____ maximum: _____
The median of this data set is 20. This tells us that half of the people at the party were 20 years old or younger, and the other half were 20 or older. What does each of these other values tell us about the ages of the people at the party?
- the third quartile
- the minimum
- the maximum
Student Facing
Are you ready for more?
There was another party where 21 people attended. Here is the five-number summary of their ages.
minimum: 5 Q1: 6 Q2: 27 Q3: 32 maximum: 60
- Do you think this party had more children or fewer children than the earlier one? Explain your reasoning.
- Were there more children or adults at this party? Explain your reasoning.
Activity Synthesis
Ask a student to display the data set they have decomposed and labeled, or display the following image for all to see.
Focus the conversation on students' interpretation of the five numbers. Discuss:
- “In this context, what do the minimum and maximum values tell us?” (The ages of the youngest and oldest partygoers.)
- “Why is Q1 called the 25th percentile, Q2 the 50th percentile, and Q3 the 75th percentile?” (Each quartile tells us how many quarters of the ordered data values are accounted for up to that point. The first quartile tells us that one quarter, or 25 percent, of data values are less than or equal to that value. The second quartile tells us that two quarters, or 50 percent, of data values are less than or equal to that value, and so on.)
- “In this context, what does Q1 (10.5) tell us?” (That a quarter of the partygoers are 10.5 years old or younger.)
- “What does Q3 (29) tell us?” (That three quarters of the partygoers are 29 years old or younger.)
- “How do the five numbers help us to see the distribution of the data?” (They divide the data into four sections, each containing one fourth of the values. Looking at how spread out each section is gives us an idea about the distribution of the data.)
7.3: Human Box Plot (15 minutes)
Activity
Previously, students learned to identify the median, quartiles, and five-number summary of data sets. They also calculated the range and interquartile range of distributions. In this activity, students rely on those experiences to make sense of box plots. They explore this new representation of data kinesthetically: by creating a human box plot to represent class data on the lengths of student names, which they collected in the Finding the Middle activity in an earlier lesson.
Launch
Before the lesson, use thin painter’s tape to make a number line on the ground. If the floor is tiled with equal-sized tiles, consider using the tiles for the intervals of the number line. Otherwise, mark off equal intervals on the tape. The number line should cover at least the distance from the least data value (the fewest letters in a student's name) to the greatest (the most letters).
Provide each student with a copy of the data on the lengths of students’ names from the Finding the Middle activity. If any students were absent then, add their names and numbers of letters to the data set.
Give students 4–5 minutes to find the quartiles and write the five-number summary of the data. Then, invite several students to share their findings and come to an agreement on the five numbers. Record and display the summary for all to see.
Explain to students that the five-number summary can be used to make another visual representation of a data set called a box plot. Tell students that they will create a human box plot, much as they lined up when they were finding the median.
- Return to students the index cards from the lesson on finding the median. If any students were absent when the cards were made, give them each an index card and ask them to record on the card their full name and the number of letters in their name. If any student who made a card is absent, have another student with the same number of letters in their name hold the card of the absent student.
- Ask students to stand up, holding their index card in front of them, and place themselves on the point on the number line that corresponds to their number. (Consider asking students to do so without speaking at all.) Students who have the same number of letters should stand one in front of the other.
- Hold up the index card that has been labeled “minimum.” Ask students who should claim the card, then hand the card to the appropriate student. Do the same for the other labels of the five-number summary. If any of the quartiles falls between two students' numbers, write that number on the index card and have both students hold that card together.
Now that the five numbers are identified and each associated with one or more students, use wide painter's tape to construct a box plot.
- Form a rectangle on the ground by affixing the tape around the group of students between Q1 and Q3. If a quartile is between two people, put the tape down between them. If a quartile has the value of a student's number, put the tape down at that value and have the student stand on it.
- Put a tape segment at Q2, from the top side of the rectangle to the bottom side, to subdivide the rectangle into two smaller rectangles. If Q2 is a student's number, have the student stand on the tape.
- For the left whisker, affix one end of the tape to the Q1 side of the rectangle and extend it to where the student holding the “minimum” card is standing. Do the same for the right whisker, from Q3 to the maximum.
- Tape the five-number summary cards and students’ cards that correspond to them in the right locations.
This image shows an example of a completed human box plot.
Explain to students that they have made a human box plot. Consider taking a picture of the box plot for reference and discussion later.
Student Facing
Your teacher will give you the data on the lengths of names of students in your class. Write the five-number summary by finding the data set's minimum, Q1, Q2, Q3, and the maximum.
Pause for additional instructions from your teacher.
Activity Synthesis
Tell students that a box plot is a representation of a data set that shows the five-number summary. Discuss:
- “Where can the median be seen in the box plot?” (It is the line inside the box.) “What about the first and third quartiles?” (The left and right sides of the box.)
- “Where can the IQR be seen in the box plot?” (It is the length of the box.)
- “The two segments of tape on the two ends are called ‘whiskers.’ What do they represent?” (The lower one-fourth of the data and the upper one-fourth of data.)
- “How many people are part of the box, between Q1 and Q3? Approximately what fraction of the data set is that number?” (About half. Note that the number of people that are part of the box may not be exactly one half of the total number of people, depending on whether the number of data points is odd or even, and depending on how the values are distributed.)
- “Why might it be helpful to summarize a data set with a box plot?” (It could help us see how close together or spread out the values are, and where they are concentrated.)
Explain to students that we will draw and analyze box plots in upcoming activities and further explore why they might be useful.
7.4: Studying Blinks (15 minutes)
Activity
In the last activity, students constructed a box plot based on the five-number summary of their name length data. In this activity, they learn to draw a box plot and they explore the connections between a dot plot and a box plot of the same data set.
Launch
Tell students that they will now draw a box plot to represent another set of data. As background, explain that scientists believe people blink their eyes to keep the surface of the eye moist and also to give the brain a brief rest. On average, people blink between 15 and 20 times a minute; some blink less and others blink much more.
Arrange students in groups of 2. Give 4–5 minutes of quiet work time for the first set of questions and a minute to discuss their work with their partner. Ask students to pause afterwards.
Display the box plot for all to see. Reiterate that a box plot is a way to represent the five-number summary and the overall distribution. Explain:
- “The left and right sides of the box are drawn at the first and third quartiles (Q1 and Q3).”
- “A vertical line inside the box is drawn at the median (Q2).”
- “The two horizontal lines (or 'whiskers') extend from the first quartile to the minimum and from the third quartile to the maximum.”
- “The height of the box does not give additional information about the data, but should be tall enough to distinguish the box from the whiskers.”
Ask students to now draw a box plot on the same grid, above their dot plot. Give students 4–5 minutes to complete the questions. Follow with a whole-class discussion.
Student Facing
Twenty people participated in a study about blinking. The number of times each person blinked while watching a video for one minute was recorded. The data values are shown here, in order from smallest to largest.
- 3
- 6
- 8
- 11
- 11
- 13
- 14
- 14
- 14
- 14
- 16
- 18
- 20
- 20
- 20
- 22
- 24
- 32
- 36
- 51
Use the grid and axis to make a dot plot of this data set.
Find the median (Q2) and mark its location on the dot plot.
Find the first quartile (Q1) and the third quartile (Q3). Mark their locations on the dot plot.
What are the minimum and maximum values?
A box plot can be used to represent the five-number summary graphically. Let’s draw a box plot for the number-of-blinks data. On the grid, above the dot plot:
- Draw a box that extends from the first quartile (Q1) to the third quartile (Q3). Label the quartiles.
- At the median (Q2), draw a vertical line from the top of the box to the bottom of the box. Label the median.
- From the left side of the box (Q1), draw a horizontal line (a whisker) that extends to the minimum of the data set. On the right side of the box (Q3), draw a similar line that extends to the maximum of the data set.
You have now created a box plot to represent the number of blinks data. What fraction of the data values are represented by each of these elements of the box plot?
- The left whisker
- The box
- The right whisker
Student Facing
Are you ready for more?
Suppose there were some errors in the data set: the smallest value should have been 6 instead of 3, and the largest value should have been 41 instead of 51. Determine if any part of the five-number summary would change. If you think so, describe how it would change. If not, explain how you know.
Activity Synthesis
Display the dot plot and the box plot for all to see.
Discuss:
- “How many data values are included in each part of the box plot?” (5 data values in each part.)
- “If you just look at the box plot, can you tell what any of the data values are?” (Only the minimum and the maximum values.)
- “If you just look at the dot plot, can you tell where the median is? Can you tell which values of the data make up the middle half of the data? Can you tell where each quarter of the data values begin and end?” (It is possible to tell, but it is not straightforward; it requires some counting.)
The focus of this activity is on constructing a box plot and understanding its parts, rather than on interpreting it in context. If students seem to have a good grasp of the drawing process and what the parts entail and mean, consider asking them to interpret the plots in the context of the research study. Ask: “Suppose you are the scientist who conducted the research and are writing an article about it. Write 2–3 sentences that summarize your findings, based on your analyses of the dot plot and the box plot.”
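As a reference when checking student work on this activity and its extension, the sketch below computes the five-number summary of the blink data and then repeats the calculation with the two corrected extreme values. It uses the same median-of-halves method as the lesson. The function name is an assumption, and `statistics.median` comes from the Python standard library.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum); assumes at least two values."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half, without the median when the count is odd
    upper = data[-half:]   # upper half, without the median when the count is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

# Number of blinks in one minute for the twenty participants.
blinks = [3, 6, 8, 11, 11, 13, 14, 14, 14, 14,
          16, 18, 20, 20, 20, 22, 24, 32, 36, 51]

# Extension: the smallest value becomes 6 and the largest becomes 41.
corrected_blinks = [6] + blinks[1:-1] + [41]

original = five_number_summary(blinks)
corrected = five_number_summary(corrected_blinks)
print(original)
print(corrected)
```

Only the minimum and maximum differ between the two summaries; the three quartiles are unchanged, because the corrected values stay in the same positions when the data are ordered.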
Lesson Synthesis
In this lesson, we learned about the five-number summary; the interquartile range (IQR), a way of measuring variability for distributions in which the median is the more appropriate measure of center; and the box plot, another way to graphically represent numerical data using the median and quartiles. Review with students:
- “What are the quartiles for a numerical data set?” (Numbers that show where we split the data up so it is in quarters.)
- “What is the interquartile range (IQR)? What does it mean?” (The IQR is the difference between the third and first quartiles. It is a measure of the variability or spread of the data. It tells us how much “space” the middle half of the data occupies.)
- “How is a box plot made?” (The box is a rectangle with the left side at Q1 and the right side at Q3. The line inside the box is the median. The “whiskers” on the sides extend to the minimum and maximum values of the data set.)
- “What does a box plot tell you about the shape, center, and spread of a distribution?” (The median is the line in the middle, which tells you about the center. The IQR is the width of the box in the middle, which tells you about the spread. You can also tell if the distribution is roughly symmetrical.)
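To make the construction steps concrete, here is a rough sketch that draws a one-line text version of a box plot from a five-number summary. The function `text_box_plot` and its symbols are illustrative assumptions, not part of the lesson materials. The summary (3, 12, 15, 21, 51) is the one computed from the data in the Studying Blinks activity.

```python
def text_box_plot(summary, axis_start, axis_end):
    """Draw a box plot on one line, using one character per unit of the axis.

    Assumes the five numbers are whole numbers inside the axis limits.
    """
    minimum, q1, q2, q3, maximum = (int(v) - axis_start for v in summary)
    row = [" "] * (axis_end - axis_start + 1)
    for i in range(minimum, maximum + 1):
        row[i] = "-"                      # whiskers run from minimum to maximum
    for i in range(q1, q3 + 1):
        row[i] = "="                      # the box covers Q1 through Q3
    row[minimum] = row[maximum] = "|"     # ends of the whiskers
    row[q1], row[q3] = "[", "]"           # left and right sides of the box
    row[q2] = "M"                         # median line inside the box
    return "".join(row)

blink_summary = (3, 12, 15, 21, 51)
plot = text_box_plot(blink_summary, 0, 55)
print(plot)
```

The distance between `[` and `]` in the output is the IQR, and the long right whisker shows how spread out the top quarter of the data is.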
7.5: Cool-down - Boxes and Dots (5 minutes)
Cool-Down
For access, consult one of our IM Certified Partners.
Student Lesson Summary
Student Facing
Earlier we learned that the mean is a measure of the center of a distribution and the MAD is a measure of the variability (or spread) that goes with the mean. There is also a measure of spread that goes with the median. It is called the interquartile range (IQR).
Finding the IQR involves splitting a data set into fourths. Each of the three values that splits the data into fourths is called a quartile.
- The median, or second quartile (Q2), splits the data into two halves.
- The first quartile (Q1) is the middle value of the lower half of the data.
- The third quartile (Q3) is the middle value of the upper half of the data.
For example, here is a data set with 11 values.
| 12 | 19 | 20 | 21 | 22 | 33 | 34 | 35 | 40 | 40 | 49 |
|  |  | Q1 |  |  | Q2 |  |  | Q3 |  |  |
- The median is 33.
- The first quartile is 20. It is the median of the numbers less than 33.
- The third quartile is 40. It is the median of the numbers greater than 33.
The difference between the maximum and minimum values of a data set is the range. The difference between Q3 and Q1 is the interquartile range (IQR). Because the distance between Q1 and Q3 includes the middle two-fourths of the distribution, the values between those two quartiles are sometimes called the middle half of the data.
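As a worked check using the 11-value example above, where the minimum is 12, the maximum is 49, Q1 is 20, and Q3 is 40:

\[
\text{range} = 49 - 12 = 37, \qquad \text{IQR} = Q3 - Q1 = 40 - 20 = 20
\]

So the whole data set spans 37 units, while the middle half of the data spans 20 units.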
The bigger the IQR, the more spread out the middle half of the data values are. The smaller the IQR, the closer together the middle half of the data values are. This is why we can use the IQR as a measure of spread.
A five-number summary can be used to summarize a distribution. It includes the minimum, first quartile, median, third quartile, and maximum of the data set. For the previous example, the five-number summary is 12, 20, 33, 40, and 49. These numbers are marked with diamonds on the dot plot.
Different data sets can have the same five-number summary. For instance, here is another data set with the same minimum, maximum, and quartiles as the previous example.
A box plot represents the five-number summary of a data set.
It shows the first quartile (Q1) and the third quartile (Q3) as the left and right sides of a rectangle or a box. The median (Q2) is shown as a vertical segment inside the box. On the left side, a horizontal line segment—a “whisker”—extends from Q1 to the minimum value. On the right, a whisker extends from Q3 to the maximum value.
The rectangle in the middle represents the middle half of the data. Its width is the IQR. The whiskers represent the bottom quarter and top quarter of the data set.
The box plots for these data sets are shown above the corresponding dot plots.
We can tell from the box plots that, in general, the pugs in the group are lighter than the beagles: the median weight of pugs is 7 kilograms and the median weight of beagles is 10 kilograms. Because the two box plots are on the same scale and the rectangles have similar widths, we can also tell that the IQRs for the two breeds are very similar. This suggests that the variability in the beagle weights is very similar to the variability in the pug weights.