BIO 360 Essentials of χ² tests
NOTE: This information is meant to supplement your notes from lecture. You should be comfortable with all the terms in boldface type.
1. These tests are suitable for frequency data only, i.e., data consisting of integer counts of the number of times that an observed outcome falls into a particular category.
2. In all cases, the test judges how similar your observed values (i.e., the count data you collected) are to a corresponding set of expected values that embody a particular null hypothesis.
3. We have shown you two types of test so far.
a. Goodness-of-fit tests. Here the expected frequencies are derived from some extrinsic hypothesis that you choose. Many different null hypotheses are possible, so yours has to be defensible. Every data point falls into one category of a single classification.
b. Contingency tables. Here the data are cross-classified according to two (or more) sets of categories. Expected values are always derived from the same intrinsic hypothesis, i.e., the null hypothesis is always one of independence, and expected values are computed from the table's marginal values without reference to any other information.
4. The χ² statistic is a standardized measure of how closely the observed values (O) match the expected values (E). If O and E are close, χ² is small, and you retain the null hypothesis because it closely agrees with the data. If O and E are very different, χ² is large and you reject the null hypothesis because the data disagree with it.
5. We measure the "largeness" of χ² by calculating the probability (P) of obtaining a χ² at least as large as ours IF the null hypothesis were true. By chance, we could obtain any set of observed values from any null hypothesis, but some sets of observations are so unlikely that we decide to reject the null. If P is less than 0.05 (5%), we say that our data deviate significantly from the null and therefore reject the null. If P > 0.05, we retain the null hypothesis and say the data are not significant.
6. In each case, we've started with the simplest example with the smallest number of data categories (two categories for goodness-of-fit tests or a 2 x 2 contingency table). In these cases, the critical value for χ² is 3.841. If the calculated χ² statistic is greater than this cutoff, P is less than 0.05 and the data are significant. Both types of test can be expanded to handle larger numbers of categories, but the determination of the critical value then requires reference to a table.
7. χ² tests become unreliable if expected values (not observed values) are too small. Some authors insist that expected values exceed 5; in this course, we will use 3 as our limit. If you go further in research, you should know that χ² has some additional shortcomings and that better alternatives are available (some of these are so-called "exact" tests that require high-speed computing). In the literature, you will often see frequency data analyzed by "G-tests". G is an alternative statistic that's virtually identical to χ², and you can interpret these tests accordingly.
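
The steps above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in lecture. The function names and the example counts are hypothetical. The statistic is the usual sum of (O - E)² / E over all categories. The small-expected-value check reads the course limit as "at least 3", which is an assumption. The P calculation uses a shortcut that is valid only for the simplest cases described above (two categories or a 2 x 2 table), where the critical value is 3.841.

```python
import math

CRITICAL_VALUE = 3.841  # cutoff from the notes: two categories or a 2 x 2 table
MIN_EXPECTED = 3        # course limit for expected values (assumed to mean "at least 3")


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_margins(table):
    """Expected counts under independence:
    (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def p_value_simplest_case(chi2):
    """P of a chi-square at least this large under the null.
    Valid ONLY for two categories or a 2 x 2 table (the 3.841 cases)."""
    return math.erfc(math.sqrt(chi2 / 2))


def flatten(table):
    """Turn a table (list of rows) into one flat list of cells."""
    return [cell for row in table for cell in row]


def report(label, observed, expected):
    """Print the statistic, P, and the decision for one of the simplest cases."""
    if min(expected) < MIN_EXPECTED:
        print(f"{label}: expected values too small, test is unreliable")
        return
    chi2 = chi_square(observed, expected)
    p = p_value_simplest_case(chi2)
    decision = "reject" if chi2 > CRITICAL_VALUE else "retain"
    print(f"{label}: chi-square = {chi2:.3f}, P = {p:.4f}, {decision} the null")


# Goodness of fit: hypothetical counts of 60 and 40, tested against an
# extrinsic 1:1 null hypothesis (expected 50 and 50).
gof_observed = [60, 40]
gof_expected = [50, 50]
report("Goodness of fit", gof_observed, gof_expected)

# 2 x 2 contingency table: hypothetical counts; expected values come
# from the table's own marginal totals (null hypothesis of independence).
table_observed = [[30, 10],
                  [20, 40]]
table_expected = expected_from_margins(table_observed)
report("2 x 2 table", flatten(table_observed), flatten(table_expected))
```

With these made-up counts, the goodness-of-fit example gives a χ² of 4.0, which is just above 3.841, so the null would be rejected. For tables with more categories, the 3.841 cutoff and the P shortcut no longer apply and the critical value must come from a table.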