3.1: Study Selection (5 minutes)
The mathematical purpose of this activity is to understand the importance of random selection when selecting a sample. Students evaluate 4 methods for selecting a group of people to participate in a survey to determine the benefits and drawbacks of each.
Display the four methods for all to see.
A reporter wants to know how people feel about the governor of her state. She decides to ask 100 people their opinions and thinks of several ways to ask the 100 people. For each method, explain the benefits and drawbacks, then choose the method for selecting 100 people that would best represent the people of the state.
- Go to the capital city and find 100 people interested in politics to respond to the survey.
- Ask the 100 most politically influential people in the state to respond to the survey.
- Obtain census data for the state and select 100 people from the list to survey using a random process.
- Ask 50 registered voters who voted for the governor and 50 registered voters who did not vote for the governor to respond to the survey.
The goal of this discussion is for students to understand the importance of random selection when selecting a sample. Ask students, “What are the benefits and drawbacks of each method?” and record their thinking for all to see. During the discussion, ask students to explain the meaning of any terminology they use, such as random selection or sample. Also, press students on unsubstantiated claims.
- “Which method is most likely to represent the people of the state?” (The third method. With random selection, who participates in the survey is due to chance rather than other factors.)
- “What group is being sampled in each method?” (The first method samples only people interested in politics in the capital. The second method samples only the 100 most politically influential people. The third method samples from the state’s population who took part in the census. The fourth method samples registered voters.)
- “Why is random selection important when selecting a sample?” (Random selection is important so that the sample better represents the whole population.)
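The third method, selecting 100 people from the census list using a random process, can be carried out with any chance-based tool. Here is a minimal sketch, assuming the census list is available as a Python list. The names `select_sample` and `census_list` are hypothetical, and the placeholder entries stand in for real records.

```python
import random

def select_sample(population, sample_size=100, seed=None):
    """Return sample_size distinct members of population, chosen by chance."""
    # The seed is optional; pass one only to reproduce a particular draw.
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical stand-in for the census list: placeholder labels, not real data.
census_list = [f"person_{i}" for i in range(1, 5001)]
sample = select_sample(census_list)
print(len(sample))  # 100
```

Because `sample` draws without replacement, no person appears twice, and every person on the list is equally likely to be chosen.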
3.2: Hip Hop Memory (10 minutes)
The mathematical purpose of this activity is to understand the importance of randomness in dividing a sample into groups for an experimental study. Students classify the study, determine which method is best for splitting subjects into two groups to compare, and discuss the importance of randomness when assigning groups.
Conduct a short demonstration similar to the activity students will complete in this task.
Divide the class into 2 halves based on where students are in the room. Tell students that half the class will try to memorize as many words as they can from a list in silence, and the other half of the class will try to memorize as many words as they can from a list while other students are clapping.
For the first half of the class, tell students to be silent and to put down their pencils. Display the word list for 3 seconds:
After 3 seconds, ask students to write all the words they remember. Display the word list again and ask students to count the number of words they recalled correctly. Record the number of words each student recalled correctly for the first half of the class. Display the data for all to see.
For the second half of the class, ask the first half of the class to start clapping. Display the words for 3 seconds:
After 3 seconds, ask students to write all the words they remember. Display the word list again and ask students to count the number of words they recalled correctly. Record the data for the second half of the class.
Compare the data for each group. Ask, “Is there evidence to suggest one of the conditions made the word recall easier?” (The mean for the silent group is slightly greater, but only by a little bit, so I’m not sure if it’s enough to conclude that it matters.) Students will explore how to analyze data from experiments in a future lesson.
Ask students how the experiment could be improved. (Using a random process to divide the class rather than where people are sitting. A better experiment would have used the same list of words and separated the two groups to make sure that all the conditions were the same except for the clapping.)
Arrange students in groups of 2. After quiet work time, ask students to compare their responses to their partner’s and decide if they are both correct, even if they are different. Follow with a whole-class discussion.
Supports accessibility for: Visual-spatial processing; Conceptual processing; Organization
A research group interested in comparing the effect of different types of music on short-term memory gathers 200 volunteers for a study. One group will listen to a hip hop music playlist while trying to memorize a list of 20 words. A second group will listen to a playlist of orchestral music while trying to memorize the list of 20 words. After a break, the number of words recalled correctly by each individual is measured and the results for the two groups are compared.
- Is this an experimental study or an observational study? Explain your reasoning.
- Which group do you hypothesize will recall more words? Explain your reasoning.
- Here are some options for splitting the volunteers into groups. Which method will best address the intention of the study? Explain your reasoning.
- Divide groups based on their preferred music style.
- Divide groups based on their age. The youngest 100 listen to hip hop music, and the older 100 listen to orchestral music.
- Divide groups based on the order in which they come in to do the study. The first 100 listen to hip hop music, and the next 100 listen to orchestral music.
- Write all the volunteer names on slips of paper, put them in a jar and shake it, then draw out 100 slips. These will listen to the hip hop playlist while the others listen to orchestral music.
Some students may not understand why the method for dividing the subjects into groups matters. Ask students to think of variables other than music that could influence the results. Then, ask students how those variables might accidentally show up in the different groups.
The goal of this discussion is for students to understand the importance of randomness in dividing a sample into groups for an experimental study. Here are some questions for discussion:
- “What are the 2 variables being studied in this experimental study?” (The type of music and the ability to memorize words.)
- “Why are the first three methods not the best way to divide people for the study?” (The first method may have additional variables that come into play. For example, the study may then be testing whether listening to your preferred music is helpful or not rather than the type of music itself. The second method adds an issue with age. For example, younger people may be better able to memorize words than older people, so the results would not be due entirely to the type of music. The third method may seem like a way to divide people that does not introduce other variables, but the variables could be hidden. For example, a group of people who are very good at memorizing words may all come in at once and be put into the same group.)
- “Why is the last method the best choice for this study?” (It is the best choice because all the other options introduce the possibility that the results will not actually measure the variables intended to be measured in the study. By assigning the two groups using a random process, any difference between the groups happens by chance rather than by design. This is often the best we can do in an experimental study.)
Design Principle(s): Optimize output (for explanation); Cultivate conversation
3.3: Random Rectangles (15 minutes)
The mathematical purpose of this activity is to investigate how finding the mean using different sampling methods relates to the mean for the whole population. Students should recognize the importance of using a true random process for selecting a sample rather than having a person attempt to select items randomly. Students are asked to select items for a sample using several methods that may seem random and one that is truly random, then compare the mean from each sample to the true mean.
Supports accessibility for: Organization; Attention; Social-emotional skills
A company offers solar power systems made up of 1 square meter cells arranged into rectangles. They use the designs for their first 100 customers to list the ways people arrange the cells. They are interested in investigating this question: “What is the mean area of the rectangles created by our customers?”
- Collect a sample of 5 rectangles using the methods here.
- Look quickly at the chart and select 5 rectangles by their numbers. Record the numbers of the rectangles you choose.
- Select a number between 1 and 95. Use that number and the next 4 numbers for another sample of 5 rectangles. For example, if you select 8, then you would use rectangles 8, 9, 10, 11, and 12.
- Look closer at the rectangles and choose your 5 favorite. Record the numbers of the rectangles you choose.
- Use a random number generator to select 5 numbers between 1 and 100.
- For each method, find the mean area of the rectangles in the sample.
- Which method do you think is best for estimating the mean area for the entire population? Explain your reasoning.
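The random number generator method can be sketched in code. This is a minimal illustration under stated assumptions: the function name `random_sample_mean` is hypothetical, and `placeholder_areas` is made-up data, not the areas from the chart used in this activity.

```python
import random
from statistics import mean

def random_sample_mean(areas, sample_size=5, rng=None):
    """Pick sample_size distinct rectangle numbers at random.

    areas[0] is the area of rectangle 1, areas[1] of rectangle 2, and so on.
    Returns the chosen rectangle numbers and the mean area of those rectangles.
    """
    if rng is None:
        rng = random.Random()
    numbers = rng.sample(range(1, len(areas) + 1), sample_size)
    return numbers, mean(areas[n - 1] for n in numbers)

# Made-up areas (in square meters) for rectangles 1 to 100.
# These are NOT the areas from the activity's chart.
placeholder_areas = [(n % 10) + 1 for n in range(1, 101)]
numbers, sample_mean = random_sample_mean(placeholder_areas)
print(numbers, sample_mean)
```

Running the sketch several times gives a different sample each time, which is one way to see how much sample means vary from one random sample to the next.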
Are you ready for more?
How does a computer that runs predetermined instructions actually generate a “random number”? One way would be to try to connect the computer to something in nature that we consider random (like a number cube roll). This is doable, but generally not efficient, and the results cannot be replicated, so many computer programs use what is called pseudo-random number generation. Essentially, they create lists of numbers that “seem” random and for many purposes, that is sufficient.
Here is a version of one such method. Start with some number \(s\) (called the seed). To get the next number on the list, multiply the previous number by 6. If our new number is greater than 13, then divide by 13 and take the remainder. For example, if \(s=1\) our list of numbers is \(1, 6, 10, 8, 9, 2, 12, 7, 3, 5, 4, 11, 1, \dots\). Once we get back to our seed, our list will repeat.
- What would our list be if we start with the seed \(s=2\)? How does this relate to the list we had with \(s=1\)?
- What would our list be if we started with the seed \(s=1\), but instead of multiplying by 6 each time, we multiplied by 7 each time? Why does this list not seem as “random”?
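The rule described above can be sketched in a few lines of code. This is a minimal illustration, not how production random number generators are built. The function name `pseudo_random_list` and its parameters are hypothetical, and the sketch assumes the seed is a whole number from 1 to 12. It always takes the remainder after dividing by 13, which matches the stated rule because taking a remainder leaves numbers smaller than 13 unchanged.

```python
def pseudo_random_list(seed, multiplier=6, modulus=13):
    """List the numbers produced by repeatedly multiplying by `multiplier`
    and taking the remainder after dividing by `modulus`, stopping just
    before the seed would appear again."""
    values = [seed]
    current = (seed * multiplier) % modulus
    # The length check guards against seeds that never come back around.
    while current != seed and len(values) < modulus:
        values.append(current)
        current = (current * multiplier) % modulus
    return values

print(pseudo_random_list(1))  # [1, 6, 10, 8, 9, 2, 12, 7, 3, 5, 4, 11]
```

Changing `seed` or `multiplier` is one way to check answers to the two questions above.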
The goal of this discussion is for students to understand the importance of random selection when obtaining samples. Collect all the means from the whole group for each of the methods and show a dot plot for each method. Display the four dot plots for all to see. Select several students to share which method they think is best for estimating the mean area for all 100 rectangles as well as their reasoning. Tell the whole group that the actual population mean is 7.4 square meters.
Here are some questions for discussion:
- “Which method do you think was the least useful for estimating the mean area for the entire population?” (I think that the least helpful was the method in which I chose my favorite rectangles because I liked the big square ones the best.)
- “When we displayed the data for the entire class, what results were most interesting or surprising to you?” (I thought that the distribution of the dot plots was really interesting. It was surprising that the dot plot representing the consecutive numbers method looked so similar to the dot plot representing the random number generator method.)
- “Which of these methods do you think would be the most likely to have similar results if it was repeated?” (I think that the method using the random number generator would most likely be similar because it used random selection. I think that the other ones could be similar, but they all depend on what each of us chooses. If we choose differently, then the results might be different, especially since we have already done this activity.)
- “What population are the samples in this activity being drawn from?” (They are drawn from the designs used by the first 100 customers.)
Design Principle(s): Support sense-making
Here are some questions for discussion.
- “Why is it important to use random selection when selecting a sample to study?” (Using random selection to get samples means that every item in the population has an equal chance of being selected, so the sample is more likely to represent the whole population.)
- “What is an example of selecting a sample without using random selection? What are the drawbacks of using a sample selected this way?” (If I wanted to know whom people are going to vote for in the 11th-grade student council election, I might just sample my friends. The drawback is that my friends might not represent the whole 11th grade and they might be more likely to vote for the same person as me since they are my friends.)
Display this statement for all to see: “Kiran has a group of 50 people that he is going to study in an experimental study and he needs them to sign a form to participate. He creates two groups of 25 people. The first group consists of the first 25 people who turned in the signed form. The second group consists of the other 25 people who signed the form.”
- “Are there any potential problems with Kiran’s study? Explain your reasoning.” (Yes, Kiran went through the effort of getting 50 participants, but when he made his groups, he did not assign people to them randomly. This could be a problem because maybe the first 25 people to sign the form have a different personality or are different in some other way from the other 25 people. By using a random process to form the groups, Kiran could have avoided this issue.)
- “Why is it important to randomly assign people to groups in an experimental study?” (Using a random process to create the groups reduces the likelihood that the groups differ on some characteristic that is related to the response of interest.)
3.4: Cool-down - Why Random? (5 minutes)
Student Lesson Summary
A statistical study begins with a research question, which describes what you want to know clearly and simply. Most research questions are about a population, like a particular group of people, animals, or things. It is often not feasible to collect data from every individual in the population.
For example, a quality control engineer at a factory that makes snack-sized bags of trail mix wants to know if the bags of trail mix produced on a certain day contain the right amount of pretzels. Imagine a conveyor belt moving thousands of bags of trail mix through the process of mixing the ingredients, seasoning them, and packaging them into bags. How would they know if the bags today contained too many or too few pretzels?
Do they have to count the pretzels in every bag of trail mix that is produced? Of course not—that wouldn’t be practical. Also, they wouldn’t want to open every bag, because then they wouldn’t be able to sell them! What do they do instead? They select a sample of bags of trail mix from that day’s production and count the pretzels.
To get information about a characteristic of a population, people often measure that characteristic on a sample of individuals chosen from a population of interest. The idea is to draw conclusions about the population based on data collected from only the sample. To correctly generalize from the sample to the population, the researcher needs to know that the sample is representative of the population as a whole.
Suppose the engineer counted the pretzels only in the last 25 bags of trail mix that were produced that day, and found that they contained too many pretzels. Should they conclude that all the bags of trail mix produced that day contained too many pretzels? Not necessarily. Something might have happened late in the day that affected the number of pretzels in the bags. The last 25 bags of trail mix may not be a representative sample from the population.
So how do we get a representative sample? The best way is to let chance select the sample. For example, you might randomly select 25 different times throughout the day to remove the next bag of trail mix from the conveyor belt and count its pretzels. Using a process based on chance, in which each individual in the population is equally likely to be selected, is called random selection of the sample.
In experimental studies, it is often necessary to assign the individual participants in the sample to one of two or more groups. It is also best to assign individuals to groups using a random process.
For example, say that you were studying the effect of students turning off electronic devices while doing homework. After a representative sample is selected, you need to assign the individuals in the sample to two groups: one group that makes no changes to the conditions under which they normally do homework, and another group that turns off electronic devices while doing homework for the duration of the study. Examples of assignment processes that are not random include:
- Assigning students whose names start with A–L to one group and M–Z to the other group
- Assigning students who play a musical instrument to one group and the rest to the other group
- Asking for volunteers to be part of the group that turns off electronic devices
In order to assign individuals randomly to groups, every individual must have an equal chance of being assigned to either group. Examples of assignment processes that are random include:
- Writing each participant’s name on a slip of paper and mixing the slips well in a bag. Drawing half of the names from the bag and assigning these participants to one group, and the rest to the other group.
- Flipping a coin for each participant, and placing them in one group if the result is heads and the other group if the result is tails.
- Getting a list of participants and numbering the list. Using a random number generator to select participants for one group.
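The random assignment processes listed above can be sketched in code. This is a minimal illustration under stated assumptions: the function names and the placeholder participant labels are hypothetical, and a shuffle stands in for mixing slips of paper in a bag.

```python
import random

def assign_by_drawing(participants, rng):
    """Slips-in-a-bag method: mix the names, then split them in half."""
    shuffled = list(participants)  # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def assign_by_coin_flip(participants, rng):
    """Coin-flip method: each participant lands in either group by chance."""
    heads, tails = [], []
    for person in participants:
        if rng.random() < 0.5:
            heads.append(person)
        else:
            tails.append(person)
    return heads, tails

# Placeholder labels, not real participants.
participants = [f"participant_{i}" for i in range(1, 21)]
rng = random.Random()
group_a, group_b = assign_by_drawing(participants, rng)
heads, tails = assign_by_coin_flip(participants, rng)
print(len(group_a), len(group_b))
print(len(heads), len(tails))
```

Drawing names produces two groups of equal size, while flipping a coin for each participant can produce groups of different sizes.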
When subjects are not assigned to experimental groups using a random process, other factors may influence the results from the experimental study so that the data does not answer the initial question. In this example, if the groups are split by volunteering, the measured impact of turning off the devices may be affected by traits shared by the subjects who volunteer, such as not using electronic devices much already or having a personality that is open to trying something new. These traits may influence the results so the data from the experimental study does not accurately address the question about the impact of electronic devices on student homework.