Here’s a paper I did for my Quantitative and Formal Methods class sometime late in the last century. I hope it sheds a little light on the use of contingency tables and significance testing. Once again, my apologies for the pompous graduate student writing style.

John

**Introduction**

Field records indicate the following cases for inhumations and cremations from the cemetery at the protohistoric Zuni village of Hawikku:

**Table 1. Inhumations and cremations by age class, Hawikku, n=983**

Age Class | Inhumation | Cremation |
---|---|---|
Old Adult | 271 | 76 |
Young Adult | 2 | 8 |
Adolescent | 42 | 27 |
Child | 150 | 62 |
Infant | 130 | 17 |
Fetus | 5 | 0 |
Unknown | 66 | 127 |

The problem at hand is to determine whether or not differential burial treatment (inhumations and cremations) of the different age classes is present. Stated differently, is burial treatment dependent on age class?

The above question must be formulated into the null hypothesis:

H0: the distribution of burial treatments (inhumation and cremation) across age class is the same, i.e., they are independent.

The alternative hypothesis then becomes:

H1: the distribution of the burial treatments (inhumation and cremation) across age classes is different.

To analyze this question, the data will be subjected to significance testing. Two tests will be used explicitly: Chi-squared and Kolmogorov-Smirnov. Other information regarding the data will be available courtesy of the computer program TWOWAY, and is presented without discussion except where it bears on the Chi-squared or Kolmogorov-Smirnov tests.

**The Data**

Before proceeding, some adjustments must be made to the data. The first problem is the disposition of the unknown age classes. It is clear that the unknown class bucks the trend by being the only class other than Young Adult in which cremations outnumber inhumations. That preservation of cremated remains might not be as good as with inhumations makes sense on an intuitive level. A full 40.1% of the cremations are unknown, against only 9.9% of the inhumations. (See Table 2.) The unknown data, based on size alone, seems too substantial to disregard.

**Table 2. Column Proportions**

Age Class | Inhumation | Cremation |
---|---|---|
Old Adult | 0.407 | 0.240 |
Young Adult | 0.003 | 0.025 |
Adolescent | 0.063 | 0.084 |
Child | 0.225 | 0.196 |
Infant | 0.195 | 0.054 |
Fetus | 0.008 | 0.000 |
Unknown | 0.099 | 0.401 |
Total | 1.00 | 1.00 |
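The column proportions can be recomputed from the Table 1 counts. The following is a minimal sketch; the names `table1` and `column_proportions` are illustrative and do not come from the original analysis, which used SYSTAT and TWOWAY.

```python
# Counts from Table 1: (inhumations, cremations) for each age class.
table1 = {
    "Old Adult": (271, 76),
    "Young Adult": (2, 8),
    "Adolescent": (42, 27),
    "Child": (150, 62),
    "Infant": (130, 17),
    "Fetus": (5, 0),
    "Unknown": (66, 127),
}


def column_proportions(table):
    """Divide every cell by the total of its own column."""
    inh_total = sum(inh for inh, _ in table.values())
    cre_total = sum(cre for _, cre in table.values())
    return {
        age: (inh / inh_total, cre / cre_total)
        for age, (inh, cre) in table.items()
    }


proportions = column_proportions(table1)
for age, (p_inh, p_cre) in proportions.items():
    print(f"{age:<12} {p_inh:.3f} {p_cre:.3f}")
```

The Unknown row is the one that matters for the argument: roughly a tenth of the inhumations but about four-tenths of the cremations.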

But this is not simply a numerical problem, though we cannot divorce the problem from manipulation of numbers. Archaeological judgment must also play a role. Separating the unknowns and examining the data might help. Let us begin with a premise.

Experience has shown archaeologists that, as a rule, 30% to 50% of a death population in a context such as Hawikku died at about 12 years or younger. Grouping our known age classes into two groups, Adult/Adolescent (Old Adult, Young Adult, and Adolescent) and Child/Infant/Fetus, gives Table 3.

**Table 3. Known age classes grouped for death population examination.**

Age Class | Inhumation | Cremation |
---|---|---|
Adult/Adolescent | 315 (52.5%) | 111 (58.4%) |
Child/Infant/Fetus | 285 (47.5%) | 79 (41.6%) |
Totals | 600 (100%) | 190 (100%) |

Incidentally, grouped in this manner, the data give a Chi-squared value of 2.036, which, with one degree of freedom, corresponds to a probability of about 0.154, so the null hypothesis would be accepted for any α up to about 0.154. Although this is not a bad grouping, I use it here only to evaluate the nature of the known data and to help decide what to do with the unknowns; I will not use it in the significance tests. Grouping the age classes as part of the significance testing is discussed later. The SYSTAT Chi-squared analysis for Table 3 is shown on page 7.
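The Table 3 figure can be checked by hand with the shortcut formula for a 2×2 table. This is a sketch using only the Python standard library, not the SYSTAT routine; it assumes no continuity correction, and the function names are my own. For one degree of freedom the upper-tail probability can be written with the complementary error function.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson Chi-squared for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a Chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Table 3: Adult/Adolescent row, then Child/Infant/Fetus row.
chi2_table3 = chi_squared_2x2(315, 111, 285, 79)
p_table3 = p_value_df1(chi2_table3)
print(f"Chi-squared = {chi2_table3:.3f}, p = {p_table3:.3f}")
```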

Table 3 shows that the known data passes a real-world archaeological check for reliability or representativeness. The question then becomes: what can the inclusion of the unknowns add to the suitability of the known data in the significance testing?

One might assume that the lack of preservation of the unknowns in the cemetery was a random process with regard to age class. In fact, a key assumption in all significance tests is random sampling, and we extend that assumption to the unknowns. Under it, each age class within each burial method would have contributed to the unknowns roughly in proportion to its share of that method’s total. Reversing this process, the unknowns in each column could be divided among the known age classes according to each class’s percentage of the column (calculated without the unknowns). This is not useful for the Chi-squared test: the column proportions would stay the same while the counts grew, so it would only increase significance, which is unwarranted and not something we want to do.
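To see why proportional allocation only inflates the statistic, one can spread each column's unknowns over the known age classes and recompute Chi-squared. This is a sketch of that idea, not a step the original analysis carried out; the allocation rule and all names here are assumptions made for illustration, and the allocated counts are left fractional.

```python
def chi_squared(rows):
    """Pearson Chi-squared for a two-way table given as a list of rows."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Known age classes (inhumations, cremations) and the unknowns from Table 1.
known = [(271, 76), (2, 8), (42, 27), (150, 62), (130, 17), (5, 0)]
unknown = (66, 127)

col_totals = [sum(col) for col in zip(*known)]

# Hypothetical allocation: each cell receives a share of its column's
# unknowns equal to its share of that column's known total.
allocated = [
    tuple(count + extra * count / total
          for count, extra, total in zip(row, unknown, col_totals))
    for row in known
]

chi2_known = chi_squared(known)
chi2_allocated = chi_squared(allocated)
print(f"known only: {chi2_known:.2f}, after allocation: {chi2_allocated:.2f}")
```

The within-column proportions are identical in both tables; only the column totals grow, and the statistic grows with them.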

Another option might be to keep the unknowns as a separate group. This is an unsatisfactory logical construct. Though it might be useful in some other form of analysis, it would appear to do nothing more in this case than to make unusually high contributions to the calculated Chi-squared value of our observed data from the unknown cells.

My decision is to throw out the unknowns. I conclude:

**1)** Reliable significance testing can be made with the known data.

**2)** The unknown data has little to offer to the significance testing and will not be missed.

This leads to Table 4. The SYSTAT Chi-squared analysis for Table 4 is shown below.

**Table 4. Inhumations and cremations by age class, without unknowns, Hawikku, n=790**

Age Class | Inhumation | Cremation |
---|---|---|
Old Adult | 271 | 76 |
Young Adult | 2 | 8 |
Adolescent | 42 | 27 |
Child | 150 | 62 |
Infant | 130 | 17 |
Fetus | 5 | 0 |
Total | 600 | 190 |

**Consolidation of Age Groups**

SYSTAT warns that significance tests are suspect when more than one-fifth of fitted cells are sparse, that is, less than 5. This is the case in Table 4 so I decided to group rows, or age classes, with low cell frequencies with other age classes in sensible ways. Young Adult can be grouped with Old Adult to form the new group Adult. Infant can be combined with Fetus. This eliminates sparsely filled fitted cells. I justify these consolidations as follows:

1) The chronological proximity of the age classes allows the new groupings without offending logic or sensibilities. (Implicit in this is the assumption that the difference in the treatment of pairs of remains at each end of the age line would not vary substantially between the two age classes at each extreme. Therefore it is only the small sample size or count for Young Adults that causes inhumation/cremation proportions so different from those of Old Adults.)

2) There is no impact on the acceptance or rejection of the null hypothesis. If the new grouping had caused a change in the significance test outcome, then it would have to be determined if the change was due to the new grouping, due to the significance test being no longer “suspect,” or some combination of the two.
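The sparse-cell warning can be checked directly. The sketch below assumes that "fitted cells" means the expected counts under independence (row total × column total / n) and that "sparse" means an expected count below 5; the function names are illustrative.

```python
def expected_counts(rows):
    """Expected cell counts under independence for a two-way table."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    return [[sum(row) * col_total / n for col_total in col_totals] for row in rows]


def sparse_fraction(rows, threshold=5):
    """Share of cells whose expected count falls below the threshold."""
    cells = [e for row in expected_counts(rows) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Table 4: Old Adult, Young Adult, Adolescent, Child, Infant, Fetus.
table4 = [(271, 76), (2, 8), (42, 27), (150, 62), (130, 17), (5, 0)]

# After consolidation: Adult (Old + Young), Adolescent, Child, Infant/Fetus.
consolidated = [(271 + 2, 76 + 8), (42, 27), (150, 62), (130 + 5, 17 + 0)]

print(f"before: {sparse_fraction(table4):.2f}, after: {sparse_fraction(consolidated):.2f}")
```

Before consolidation, the Young Adult cremation cell and both Fetus cells fall below 5, which is more than one-fifth of the twelve cells; after consolidation none do.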

Table 5 shows the data with the new consolidated age classes.

**Table 5. Inhumations and cremations by grouped age classes, without unknowns, Hawikku, n=790.**

Age Class | Inhumation | Cremation | Row Totals |
---|---|---|---|
Adult | 273 | 84 | 357 |
Adolescent | 42 | 27 | 69 |
Child | 150 | 62 | 212 |
Infant/Fetus | 135 | 17 | 152 |
Column Totals | 600 | 190 | 790 |

Significance testing for the data in Table 5 was done using TWOWAY. Printouts are shown starting on page 9. Significance testing in this case looks at observed differences in the numbers of inhumations and cremations by age class and compares the differences with results that would be expected if the distributions were random. It does so at a chosen level of significance, that is, at a frequency at which a random process would deliver the observed or more extreme counts.

χ²(observed) = 25.55 > χ²(ν = 3, α = 0.001) = 16.266

Based on this result, I reject the null hypothesis, H0, and accept the alternative hypothesis that the distribution of the burial treatments of inhumation and cremation across age classes is different.

Rejecting H0 when it is, in fact, true is a Type I error. Based on Chi-squared tables and on TWOWAY, it can be said that the likelihood of committing a Type I error in this case is less than 0.1%. Stated another way, we would expect to get the observed cases from a random process less than 0.1% of the time, but it would still happen at a non-zero probability. We have rejected H0 with that probability of having been wrong to do so.
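The comparison with the critical value can be reproduced from the consolidated counts. This is a standard-library sketch, not TWOWAY's own routine. The Adult inhumation count is taken as 273 (271 Old Adult plus 2 Young Adult), which agrees with the row total of 357. The closed-form tail probability for three degrees of freedom is a standard identity that I am supplying; it does not appear in the original printouts. With these counts the statistic comes out at roughly 25.5 by this calculation, well above 16.266.

```python
import math


def chi_squared_contributions(rows):
    """Per-cell (observed - expected)^2 / expected for a two-way table."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    return [
        [(obs - sum(row) * col / n) ** 2 / (sum(row) * col / n)
         for obs, col in zip(row, col_totals)]
        for row in rows
    ]


def p_value_df3(chi2):
    """Upper-tail probability of a Chi-squared variate with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))


# Consolidated counts: Adult, Adolescent, Child, Infant/Fetus.
table5 = [(273, 84), (42, 27), (150, 62), (135, 17)]
CRITICAL_DF3_ALPHA_001 = 16.266  # critical value quoted in the text

contributions = chi_squared_contributions(table5)
chi2_table5 = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(table5) - 1) * (len(table5[0]) - 1)
p_table5 = p_value_df3(chi2_table5)

print(f"Chi-squared = {chi2_table5:.2f}, df = {degrees_of_freedom}, p = {p_table5:.2g}")
print("reject H0 at alpha = 0.001:", chi2_table5 > CRITICAL_DF3_ALPHA_001)
```

Inspecting `contributions` also shows which cells drive the result; in this calculation the Infant/Fetus and Adolescent cremation cells contribute the most.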

TWOWAY also runs a Monte Carlo simulation as part of its significance testing. One hundred simulations of the table, with cells filled at random, were performed. The Monte Carlo simulation yielded a probability of less than 0.001 of getting a Chi-squared value greater than or equal to the observed value. [My professor, Keith Kintigh, suggested that I’d want to run 10,000 trials or so during the Monte Carlo simulation to get results accurate to 0.001. One hundred trials would probably only give results to about 0.1.]
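A simulation of this kind can be sketched as a permutation test. I do not know how TWOWAY fills its random tables; the version below assumes both sets of marginal totals are held fixed and shuffles the burial-treatment labels among the 790 individuals. All names are illustrative, and the Adult inhumation count is taken as 273 (271 Old Adult plus 2 Young Adult).

```python
import random


def chi_squared(rows):
    """Pearson Chi-squared for a two-way table given as a list of rows."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(rows, trials, seed=0):
    """Share of shuffled tables whose Chi-squared is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_squared(rows)
    n_cols = len(rows[0])
    # One pair of labels per individual: age class (row) and treatment (column).
    age_labels = [i for i, row in enumerate(rows) for _ in range(sum(row))]
    treatment_labels = [j for row in rows
                        for j, count in enumerate(row) for _ in range(count)]
    hits = 0
    for _ in range(trials):
        rng.shuffle(treatment_labels)  # break any age/treatment association
        simulated = [[0] * n_cols for _ in rows]
        for age, treatment in zip(age_labels, treatment_labels):
            simulated[age][treatment] += 1
        if chi_squared(simulated) >= observed:
            hits += 1
    return hits / trials


table5 = [(273, 84), (42, 27), (150, 62), (135, 17)]
estimate = monte_carlo_p(table5, trials=1000)
print(f"estimated p = {estimate}")
```

The smallest non-zero estimate such a run can return is 1/trials, which is one reason a larger number of trials is needed before a claim as small as 0.001 can be supported; raising `trials` to 10,000 costs only a few seconds.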

Since H0 was rejected, there is no chance of committing a Type II error, which is accepting H0 when it is false.

**Kolmogorov-Smirnov**

The Kolmogorov-Smirnov test compares two cumulative distributions. Specifically, it compares the largest difference between them with a theoretical value calculated for the chosen level of significance. It requires that observations be divided into at least two mutually exclusive categories and that they be measured at the ordinal level or above. The cemetery data fit these requirements.

The original counts are converted into percentages of their total category, or column, in this case inhumations and cremations. The percentages are put into a cumulative form as shown below in Table 6, along with the differences between the two burial treatments.

**Table 6. Cumulative percentages of burial treatments by age class, Hawikku.**

Age Class | Inhumation | Cremation | Difference |
---|---|---|---|
Adult | 0.455 | 0.442 | 0.013 |
Adolescent | 0.525 | 0.584 | 0.059 |
Child | 0.775 | 0.910 | 0.135 |
Infant/Fetus | 1.00 | 1.00 | 0.000 |

The largest difference is at the Child age class, and is 0.135. In the Kolmogorov-Smirnov test, the minimum difference between two cumulative distributions that will be significant is calculated, for α = 0.001, as

1.95 x SQRT[(n1 + n2)/(n1 x n2)]

where n1 is the number of individuals in Sample 1 (inhumations) and n2 is the number of individuals in Sample 2 (cremations). (The value of α was chosen in view of the Chi-squared results, but see below.) For the above cumulative distributions:

1.95 x SQRT[(600+190)/(600 x 190)] = 0.162

This value is greater than our largest observed difference of 0.135, so at this significance level (α = 0.001) we do not have the minimum required to reject H0. At α = 0.01, the calculated difference is 0.136, so again the null hypothesis would be accepted. For α = 0.05, the calculated difference is 0.113. This is smaller than the observed difference, so at this level, the null hypothesis would be rejected and the alternative hypothesis would be accepted. The risk of a Type I error would be closer to 1.0% than to 5.0%, but since the theoretically derived multiplication factor that brought the calculated value below the observed value was for α = 0.05, it is customary to consider the Type I error probability from there. Rejecting the null hypothesis again means that there is no chance of a Type II error, that is, of accepting the null hypothesis when it is false.
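The whole two-sample calculation fits in a few lines. In this sketch the coefficient 1.95 for α = 0.001 is the one given above; 1.63 and 1.36 are the usual large-sample coefficients for α = 0.01 and α = 0.05, which I am supplying as an assumption since the text quotes only the resulting thresholds. The counts are the consolidated ones, with Adult inhumations taken as 273 (271 Old Adult plus 2 Young Adult).

```python
import math


def cumulative_proportions(counts):
    """Running share of the column total, in the order the classes are listed."""
    total = sum(counts)
    running, result = 0, []
    for count in counts:
        running += count
        result.append(running / total)
    return result


def ks_critical(coefficient, n1, n2):
    """Minimum significant difference: coefficient * sqrt((n1 + n2) / (n1 * n2))."""
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))


# Adult, Adolescent, Child, Infant/Fetus.
inhumations = [273, 42, 150, 135]
cremations = [84, 27, 62, 17]

cum_inh = cumulative_proportions(inhumations)
cum_cre = cumulative_proportions(cremations)
d_max = max(abs(a - b) for a, b in zip(cum_inh, cum_cre))

# alpha -> coefficient; only the 0.001 entry is stated in the text.
COEFFICIENTS = {0.05: 1.36, 0.01: 1.63, 0.001: 1.95}
n1, n2 = sum(inhumations), sum(cremations)
critical = {alpha: ks_critical(c, n1, n2) for alpha, c in COEFFICIENTS.items()}
rejected = {alpha: d_max > value for alpha, value in critical.items()}

print(f"largest difference = {d_max:.4f}")
for alpha in sorted(critical):
    print(f"alpha = {alpha}: critical = {critical[alpha]:.3f}, reject H0 = {rejected[alpha]}")
```

Carried to four decimal places, the observed difference sits just under the α = 0.01 threshold, which is why the result reads as close to, but not quite, significant at that level.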

**Summary**

At this point, it becomes necessary to relate the test results to the archaeological problem at hand. Despite the difference in the significance levels, both the Chi-squared and the Kolmogorov-Smirnov tests tell me the same thing: the null hypothesis is to be rejected. The distribution of burial treatments (inhumation and cremation) across age classes is not the same. The tests differ in this: the Kolmogorov-Smirnov test tells us that we have a much higher chance of making a Type I error than the Chi-squared test does. For the Chi-squared test, we reject the null hypothesis for some unspecified α < 0.001; for the Kolmogorov-Smirnov test, we reject it at α = 0.05 (though it was close to calling for rejection at α = 0.01).

The tests do not eliminate all chance of error. Nor do they tell us about the strength of the relationship, or the way in which the two variables are related. The tests do give an indication of whether the observed cases were likely to have occurred by chance, or whether the variables are somehow related.

The test results are not an end in themselves. They are a departure point for inferences about what caused the differences, or why the differences in treatment over age classes exist. From here, the archaeologist must infer those things which the tests did not reveal. The age class-burial treatment relationship might be impinged upon by other factors, such as individual family practice or tradition. Another possibility might have to do with the abundance or scarcity of fuel for cremation fires. Circumstances of death might play a role in the disposition of remains. Perhaps the burial practice of the village as a whole was consistent, but changed over time from one method to another. The significance tests tested a specific hypothesis. Armed with the results, it is time to modify (and/or expand) the hypothesis and move further on.