Thursday, June 9, 2011

Unconscious Bias and Stephen J. Gould

In The Mismeasure of Man, Stephen J. Gould took on the work of 19th-century anatomist Samuel Morton. Morton's most famous contribution to science was his attempt to measure the volumes of human brain cases using seeds or lead shot, with the idea that this would have some bearing on the intelligence of different individuals, communities, and races. Gould argued that Morton fudged his numbers to prove that Caucasians had the biggest brains.

Gould accused Morton of failing to be objective because he wanted too badly to find a certain result. Now a new and quite convincing paper, by Jason Lewis and others, argues that Gould was doing exactly the same thing. His analysis, it seems, was no better than Morton's:

Gould also performed his own analysis of Morton's cranial capacity data and came to the conclusion that “there are no differences to speak of among Morton's races” ([1], italics in original). For Morton's 1839 seed-based measurements, Gould claims that Morton's Native American average capacity is artificially depressed by his inappropriate use of a straight mean (taking the average of each individual specimen in the entire sample) rather than a grouped mean (first taking the average of each Native American population subsample, then calculating the mean of those means), since the former is sensitive to differences in sample sizes between “large headed” populations and “small headed” populations. In fact, the grouped mean for Morton's Native American dataset is 79.9 in³, almost identical to the straight mean of 80.2 in³ (Dataset S3). So Morton's use of a straight mean actually slightly increased his Native American average. Gould's calculation of a higher Native American average (83.8 in³) is entirely a function of Gould omitting 34 crania (of 144) as coming from populations with samples of n<4 and, even by that criterion, erroneously excluding 6 crania, all with small cranial capacities (Dataset S3).

Gould's reanalysis of Morton's 1849 shot-based data resulted in a Native American mean capacity of 86 in³ rather than Morton's original 79 in³ [1]. Gould obtained his new average by again taking the group mean of Native American populations with four or more crania. But Gould also applied an additional restriction: he only included Native American crania that Morton had also previously measured with seed. This restriction is entirely arbitrary on Gould's part, as Morton's publications and analyses for his seed- and shot-based measurements are completely separate (1839 versus 1849), and Gould did not apply this restriction to the other groups he reanalyzed in Morton's shot-based data. If this restriction is lifted, Gould's Native American average would be reduced to about 83 in³, considerably below his reported 86 in³ (Dataset S3).

Overall, Gould concludes that his reanalysis of Morton's shot-based data produces the “remarkable” result that there are no notable differences in mean cranial capacity between Morton's groups, with Caucasians firmly mid-pack at 85 in³ and the overall range being 83 to 86 in³. However, Gould's Caucasian figure was in error and should really be 87 in³ rather than 85 in³. And even accepting Gould's inflated mean for Native Americans of 86 in³, the overall rank order of Gould's results (whites/Native Americans/“Mongolians” and “Malays”/blacks) is then actually closer to Morton's presumed a priori bias than were Morton's own results (whites/“Malays”/blacks/“Mongolians”/Native Americans).

Gould's attempt to play "gotcha" here seems to have failed. Morton was not perfect, but neither was Gould. The statistical analysis of small samples that can be broken up in different ways is very difficult and often involves judgment calls about what should or should not be included. Morton wanted to establish racism on a scientific basis. Gould wanted to use science to fight racism. Both seem to have shaded their statistics toward their political goals, probably without ever consciously altering the data. It is very hard to be objective, which is why medical science now puts so much emphasis on double-blind studies, and why forensic science should move in the same direction.
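To make those judgment calls concrete, here is a rough sketch of the two averages the quoted passage describes, a straight mean and a grouped mean, plus a minimum-sample-size cutoff of the kind Gould applied. The numbers and the names (samples, straight_mean, grouped_mean, min_n) are invented for illustration; they are not Morton's data or anyone's actual method.

```python
from statistics import mean

# Hypothetical cranial capacities in cubic inches, keyed by population.
# These values are made up for illustration; they are NOT Morton's data.
samples = {
    "population_a": [74, 75, 76, 75, 74, 76, 75, 75],  # large sample, smaller values
    "population_b": [84, 86],                          # tiny sample, larger values
    "population_c": [80, 81, 79, 80],
}


def straight_mean(groups):
    """Average every individual specimen, ignoring which group it came from."""
    return mean(v for values in groups.values() for v in values)


def grouped_mean(groups, min_n=1):
    """Average the per-group means, dropping groups with fewer than min_n specimens."""
    return mean(mean(values) for values in groups.values() if len(values) >= min_n)


print(round(straight_mean(samples), 2))   # 77.86: dominated by the largest sample
print(grouped_mean(samples))              # 80: every population counts equally
print(grouped_mean(samples, min_n=4))     # 77.5: the two-skull population is dropped
```

Same skulls, three defensible-sounding answers. Which populations get counted, and how, moves the result by a couple of cubic inches, which is about the size of the differences being argued over.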

But, you know, one reason that the statistics for comparing human populations are so complex is that the variation within populations is much greater than the variation between them. In every population there are people with big heads and people with small heads. Comparing the average size of American heads to the average size of Malaysian heads is a strange thing to do, when both populations include people of normal intelligence with heads much bigger or smaller than the human average. Gould would have been on firmer ground if he had just asked why Morton was so focused on the small difference in population averages rather than the great sweep of human variation.
