Chi-Square Goodness of Fit Day 1 (Topic 8.2)
Chapter 12 - Day 1
Learning Targets
- State appropriate hypotheses and compute the expected counts and chi-square test statistic for a chi-square test for goodness of fit.
- State and check the Random, 10%, and Large Counts conditions for performing a chi-square test for goodness of fit.
- Calculate the degrees of freedom and P-value for a chi-square test for goodness of fit.
Activity: Which Color M&M is the Most Common?
Stats Medic / Skew the Script Collaboration Lesson: Does Harvard Discriminate Against Asian Applicants?
We start this lesson by telling students that we emailed the company that makes M&Ms to ask about the color distribution. The company replied, claiming the following distribution:

Brown 13%, Yellow 14%, Orange 20%, Green 16%, Blue 24%, and Red 13%.

We are going to take a sample to try to find evidence against this claim. We buy one large bag of M&Ms and tell students to think of this bag as a random sample from the entire population of M&Ms. We give each student a small handful of candies until the bag is empty, then we collect the class totals on the front whiteboard. Students will use the class totals for all of their calculations.

Note 1: There are two M&M factories with different distributions.
Note 2: The color distribution depends on the type of M&M (milk chocolate, almond, etc.).
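For teachers who want to check the class calculations, here is a minimal sketch in Python. The claimed proportions come from the company's reply above, but the observed counts are hypothetical numbers invented for illustration, so swap in your own class totals. The helper `chi_square_upper_tail` is our own name, not a library function; it finds the area to the right of the test statistic using only the standard `math` module. In class, a calculator's chi-square cdf command would do the same job.

```python
import math

# Claimed color distribution from the company's reply (given in the lesson).
claimed = {"Brown": 0.13, "Yellow": 0.14, "Orange": 0.20,
           "Green": 0.16, "Blue": 0.24, "Red": 0.13}

# HYPOTHETICAL class totals, invented for illustration only.
# Replace these with the counts collected on your whiteboard.
observed = {"Brown": 60, "Yellow": 70, "Orange": 110,
            "Green": 85, "Blue": 115, "Red": 60}

n = sum(observed.values())  # total number of candies in the bag

# Expected count for each color = sample size * claimed proportion.
expected = {color: n * p for color, p in claimed.items()}

# Large Counts condition: every expected count is at least 5.
large_counts_met = all(e >= 5 for e in expected.values())

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)

# Degrees of freedom = number of categories - 1.
df = len(claimed) - 1


def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom.

    Starts from the closed-form tail area for df = 1 (odd df) or df = 2
    (even df) and steps up two degrees of freedom at a time.
    """
    if df % 2 == 0:
        area, k = math.exp(-x / 2), 2
    else:
        area, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        area += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return area


p_value = chi_square_upper_tail(chi_square, df)

print(f"Large Counts met: {large_counts_met}")
print(f"chi-square = {chi_square:.2f}, df = {df}, P-value = {p_value:.3f}")
```

The Random and 10% conditions cannot be checked by code; they depend on how the bag was obtained and on the size of the population of M&Ms.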
Why Do We Square (Observed – Expected)?
Sometimes the observed count is greater than the expected count and sometimes it is less. We square each difference so that none of our values are negative. We used a similar approach back in Chapter 1, when we calculated standard deviation. Because every term in the formula is 0 or greater, the chi-square statistic can never be negative, which is why the chi-square distribution starts at 0.
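A quick sketch of why squaring is needed: the observed and expected counts both add up to the sample size, so the raw differences always cancel out to 0. The counts below are hypothetical, chosen only to keep the arithmetic easy.

```python
# HYPOTHETICAL counts for three categories, invented for illustration.
observed = [16, 24, 20]
expected = [20, 20, 20]

# Raw differences: positives and negatives cancel each other out.
differences = [o - e for o, e in zip(observed, expected)]

# Squared differences: every value is 0 or greater, so nothing cancels.
squared = [d ** 2 for d in differences]

print(differences, sum(differences))  # [-4, 4, 0] 0
print(squared, sum(squared))          # [16, 16, 0] 32
```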
Why Do We Divide by Expected?
We use an example to help explain.

Scenario 1: The expected number of red M&Ms is 6 and we get 16 red M&Ms.

Scenario 2: The expected number of red M&Ms is 500 and we get 510 red M&Ms.

Which scenario provides more convincing evidence against the company’s claim? In both scenarios, the observed count is 10 away from the expected count. But Scenario 1 provides much more convincing evidence: being off by 10 is a big miss when we expect only 6, and a small one when we expect 500. The important idea is how far the observed count is from the expected count as a fraction of the expected count.
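The two scenarios can be compared directly by computing each one's contribution to the chi-square statistic. This is a small sketch using the counts from the scenarios above; the function name `contribution` is our own choice for illustration.

```python
def contribution(observed, expected):
    """One category's term in the chi-square statistic: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


scenario_1 = contribution(16, 6)      # 100 / 6, about 16.67
scenario_2 = contribution(510, 500)   # 100 / 500 = 0.2

print(f"Scenario 1: {scenario_1:.2f}")
print(f"Scenario 2: {scenario_2:.2f}")
```

Both scenarios have the same squared difference of 100, so dividing by the expected count is what makes Scenario 1 stand out.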
Luke's Lesson Notes
Here is a brief video highlighting some key information to help you prepare to teach this lesson.