Lesson 2
Relative Frequency Tables
 Let’s find relative frequencies of categorical data.
2.1: Notice and Wonder: Teacher Degrees
Several adults in a school building were asked about their highest degree completed and whether they were a teacher.
What do you notice? What do you wonder?
teacher  not a teacher  

associate degree  4%  16% 
bachelor’s degree  52%  64% 
master’s degree or higher  44%  20% 
2.2: City Cat, Country Cat
200 people were asked if they prefer dogs or cats, and whether they live in a rural or urban setting.
The actual values collected from the survey are in the first table.
urban  rural  total  

cat  54  42  96 
dog  80  24  104 
total  134  66  200 
The next table shows what percentage of the 200 total people included are represented by each combination of categories. The segmented bar graph represents the same information graphically.
urban  rural  

cat  27%  21% 
dog  40%  12% 
The next table is a column relative frequency table. It shows the percentage of each column that had a certain pet preference. The segmented bar graph represents the same information graphically.
urban  rural  

cat  40%  64% 
dog  60%  36% 
The last table is a row relative frequency table. It shows the percentage of each row that lives in a certain area. The segmented bar graph represents the same information graphically.
urban  rural  

cat  56%  44% 
dog  77%  23% 
 For each relative frequency table, select a percentage and explain how numbers from the original table were used to get the percentage.
 What percentage of those surveyed live in an urban area and prefer dogs?
 Among the people surveyed who prefer dogs, what percentage of them live in an urban setting?
 What percentage of people surveyed who live in an urban setting prefer dogs?
 How many of the people surveyed responded that they prefer dogs and live in an urban setting?
 Among the people surveyed, are there more people who prefer dogs or cats?
 Your pet food company has access to a billboard in a rural setting. Would you recommend advertising dog food or cat food on this billboard? Which table did you use to make this decision? Explain your reasoning.
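The three relative frequency tables in this activity differ only in what each count is divided by: the grand total, the column total, or the row total. The sketch below is one way to reproduce them from the survey counts. The names `counts`, `percent`, `total_rel`, `column_rel`, and `row_rel` are illustrative choices, not part of the lesson, and percentages are rounded to the nearest whole percent to match the tables.

```python
# Counts from the survey table (rows: pet preference, columns: setting).
counts = {
    "cat": {"urban": 54, "rural": 42},
    "dog": {"urban": 80, "rural": 24},
}

def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole percent."""
    return round(100 * part / whole)

# Totals used as divisors.
grand_total = sum(sum(row.values()) for row in counts.values())  # 200
column_totals = {
    setting: sum(counts[pet][setting] for pet in counts)
    for setting in ("urban", "rural")
}  # urban: 134, rural: 66
row_totals = {pet: sum(row.values()) for pet, row in counts.items()}  # cat: 96, dog: 104

# Divide each count by the grand total, its column total, or its row total.
total_rel = {
    pet: {s: percent(n, grand_total) for s, n in row.items()}
    for pet, row in counts.items()
}
column_rel = {
    pet: {s: percent(n, column_totals[s]) for s, n in row.items()}
    for pet, row in counts.items()
}
row_rel = {
    pet: {s: percent(n, row_totals[pet]) for s, n in row.items()}
    for pet, row in counts.items()
}

for name, table in [("total", total_rel), ("column", column_rel), ("row", row_rel)]:
    print(name, table)
```

Comparing the same cell across the three outputs (for example dog and urban: 40, 60, and 77) shows how the choice of divisor changes the question being answered.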
2.3: Analyzing a Study With Two Treatments
In an experiment to test the effectiveness of vitamin C on the length of colds, two groups of people with colds are given a pill to take once a day. The pill for one of the groups contains 1,000 mg of vitamin C, while the other group takes a placebo pill. The researchers record the results in a table.
group A  group B  

cold lasts less than a week  16  27 
cold lasts a week or more  17  53 

First, the researchers want to know what percentage (to the nearest whole percent) of people are in each combination of categories. Fourteen percent of all the participants had a cold that lasted less than a week and were in group A. What percentage of all the participants had a cold that lasted less than a week and were in group B? Complete the rest of the relative frequency table with the corresponding percentages.
group A  group B  

cold lasts less than a week  14% (\(\frac{16}{16+27+17+53} \approx 0.14\))  
cold lasts a week or more  
Next, the researchers notice that, among participants who had colds that lasted less than a week, 37% were in group A. Among participants who had colds that lasted a week or more, what percentage were in group B? Complete the table with the corresponding percentages.
group A  group B  

cold lasts less than a week  37% (\(\frac{16}{16+27} \approx 0.37\))  
cold lasts a week or more  
Finally, the researchers notice that, among the participants in group A, 48% had colds that lasted less than one week. Among the participants in group B, what percentage had colds that lasted a week or more? Complete the table with the corresponding percentages.
group A  group B  

cold lasts less than a week  48% (\(\frac{16}{16+17} \approx 0.48\))  
cold lasts a week or more  
 To understand the results, the researchers want to know: Among people whose colds lasted less than a week, what percentage are in group B? Explain your reasoning.
 If the researchers believe that vitamin C has a small effect on the length of a cold, which group most likely got the pills containing vitamin C? Explain your reasoning.
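The three percentages already given for the group A, less-than-a-week cell (14%, 37%, and 48%) all start from the same count, 16, and differ only in the divisor. The sketch below checks just those three given values. The variable names are illustrative, and the remaining cells are left for the reader to complete.

```python
# Counts from the study table.
group_a = {"less than a week": 16, "a week or more": 17}
group_b = {"less than a week": 27, "a week or more": 53}

# Three possible divisors for the same cell.
total = sum(group_a.values()) + sum(group_b.values())  # all 113 participants
short_colds = group_a["less than a week"] + group_b["less than a week"]  # row total: 43
group_a_size = sum(group_a.values())  # column total: 33

n = group_a["less than a week"]  # 16
share_of_all = round(100 * n / total)                # share of all participants
share_of_short_colds = round(100 * n / short_colds)  # share of the row
share_of_group_a = round(100 * n / group_a_size)     # share of the column

print(share_of_all, share_of_short_colds, share_of_group_a)  # 14 37 48
```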
A teacher surveyed a group of 25 8th graders and a group of 20 12th graders who indicated they knew a computer programming language. Python and Scratch are programming languages. The results from the 8th-grade survey are displayed in the two-way table.
I know Python the best  I know Scratch the best  I know a different programming language the best  

I have been taught a programming language at school  8  6  1 
I have not been taught a programming language at school  1  7  2 
The results from the 12th-grade survey are displayed in the two-way table.
I know Python the best  I know Scratch the best  I know a different programming language the best  

I have been taught a programming language at school  25%  35%  0% 
I have not been taught a programming language at school  30%  5%  5% 

Which programming language did a majority of 8th graders surveyed know best?

Which programming language did a majority of 12th graders surveyed know best?

How many of the 12th graders surveyed reported that they were taught a programming language at school?

What percentage of 8th graders surveyed reported that they knew Python the best and were not taught a programming language at school?

Why is it difficult to decide if 12th graders or 8th graders use Python more with the way the information is given in the tables?
Summary
Converting two-way tables to relative frequency tables can help reveal patterns in paired categorical variables. Relative frequency tables are created by dividing the value in each cell in a two-way table by the total number of responses in the entire table, or the total responses in a row or a column. Depending on what patterns are important, different types of relative frequency tables are used. To examine how individual combinations of the categorical variables relate to the whole group, divide each value in a two-way table by the total number of responses in the entire table to find the relative frequency.
For example, this two-way table displays the condition of a certain textbook and its price for 120 of the books at a college bookstore.
$10 or less  more than $10 but less than $30  $30 or more  

new  3  9  27 
used  33  36  12 
A two-way relative frequency table is created by dividing each number in the two-way table by 120, because there are 120 values (\(3 + 9 + 27 + 33 + 36 + 12\)) in this data set. The resulting two-way relative frequency table can be represented using fractions or decimals.
$10 or less  more than $10 but less than $30  $30 or more  

new  0.025  0.075  0.225 
used  0.275  0.300  0.100 
This two-way relative frequency table allows you to see what proportion of the total is represented by each number in the two-way table. The number 33 in the original two-way table represents the number of used books that also sell for $10 or less, which is 27.5% of all the books in the data set. Using this two-way relative frequency table, you can see that there are very few (2.5%) new books that are also inexpensive and that 10% of the books in the bookstore are both expensive and in used condition.
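As a minimal sketch of the division described above, the following divides every count in the bookstore table by the grand total of 120. The names `prices`, `counts`, and `relative` are illustrative, and values are formatted to three decimal places to match the table.

```python
# Bookstore counts: each list follows the order of the price categories.
prices = ["$10 or less", "more than $10 but less than $30", "$30 or more"]
counts = {"new": [3, 9, 27], "used": [33, 36, 12]}

# Grand total of all responses in the table.
total = sum(sum(row) for row in counts.values())  # 120

# Divide every cell by the grand total.
relative = {cond: [n / total for n in row] for cond, row in counts.items()}

for cond, row in relative.items():
    print(cond, [f"{p:.3f}" for p in row])
# new ['0.025', '0.075', '0.225']
# used ['0.275', '0.300', '0.100']
```

Because every cell shares the same divisor, all six relative frequencies add up to 1.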
In other situations, it makes sense to examine row or column proportions in a relative frequency table. For example, to convert the original two-way table to a column relative frequency table using column proportions, divide each value by the sum of the column.
$10 or less  more than $10 but less than $30  $30 or more  

new  0.083  0.2  0.692 
used  0.917  0.8  0.308 
This shows that about 91.7% (\(\frac{33}{3+33} \approx 0.917\)) of the books that are sold for $10 or less are in used condition. Notice that each column of this column relative frequency table reveals the proportions of the books in each price category that are in each condition and the relative frequencies in each column sum to 1. In particular, this shows that most of the inexpensive and moderately priced books are used, and most of the expensive books are new.
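The column proportions can be sketched the same way, with each count divided by its own column total instead of the grand total. The names `column_totals`, `column_rel`, and `column_sums` are illustrative, and values are rounded to three decimal places.

```python
# Bookstore counts, columns ordered from least to most expensive.
counts = {"new": [3, 9, 27], "used": [33, 36, 12]}

# Sum each column across the two conditions.
column_totals = [sum(col) for col in zip(*counts.values())]  # [36, 45, 39]

# Divide each count by the total of its own column.
column_rel = {
    cond: [n / t for n, t in zip(row, column_totals)]
    for cond, row in counts.items()
}

for cond, row in column_rel.items():
    print(cond, [round(p, 3) for p in row])
# new [0.083, 0.2, 0.692]
# used [0.917, 0.8, 0.308]

# Each column of relative frequencies should sum to 1 (up to floating-point error).
column_sums = [sum(col) for col in zip(*column_rel.values())]
```

Dividing by row totals instead would answer a different question: what share of the new (or used) books fall in each price category.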
Glossary Entries
 categorical variable
A variable that takes on values which can be divided into groups or categories. For example, color is a categorical variable which can take on the values red, blue, green, etc.
 relative frequency table
A version of a two-way table in which the value in each cell is divided by the total number of responses in the entire table or by the total number of responses in a row or a column.
The table illustrates the first type for the relationship between the condition of a textbook and its price for 120 of the books at a college bookstore.
$10 or less  more than $10 but less than $30  $30 or more  

new  0.025  0.075  0.225 
used  0.275  0.300  0.100 
 two-way table
A way of organizing data from two categorical variables in order to investigate the association between them.
has a cell phone  does not have a cell phone  

10–12 years old  25  35 
13–15 years old  38  12 
16–18 years old  52  8 
 variable (statistics)
A characteristic of individuals in a population that can take on different values.