Saturday, July 7, 2018

The Individual vs. the Group

Whenever you use statistical averages to speak about individuals, you run the risk of obscuring more than you reveal. When the data is all over the place, the average can be a meaningless number, and even when there is a normal distribution the outliers may violate all your conclusions.

When it comes to human psychology, things are even worse. This new study found that not only did the averaging statistics hide a lot of variation between individuals, it also masked a lot of variation within individuals, that is, in how they tested from one day to the next.

This study was about depression and anxiety, and it was supposed to measure things like how strongly the two are correlated. What it found was that the standard deviation for individuals, from one day to the next, was eight times greater than the standard deviation for the data set as a whole.

The mean values may still be useful for something, but doing those statistics actually obscures the most fascinating finding of the study, which is about variation: not only do our moods vary a lot, but the correlations between different parts of our moods vary a lot, too. For example, the study tried to determine if brooding is correlated with depression, and the answer was that if you average the data from all 1043 participants, yes, but for you it will depend on what day it is.
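To make the point about pooled correlations concrete, here is a minimal sketch. The scores are made up for illustration and are not the study's data; `diaries`, `person_a`, and `person_b` are hypothetical names. It shows how a correlation computed over everyone's data at once can come out clearly positive even when the relationship runs in opposite directions for the individual people in it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily (brooding, depression) scores for two hypothetical people.
# These numbers are illustrative assumptions, not data from the study.
diaries = {
    "person_a": ([1, 2, 3, 4], [1, 2, 3, 4]),  # more brooding, more depression
    "person_b": ([5, 6, 7, 8], [8, 7, 6, 5]),  # more brooding, less depression
}

# Correlation computed separately for each person.
per_person = {name: pearson(b, d) for name, (b, d) in diaries.items()}

# Correlation computed after pooling every observation together.
pooled_brooding = [x for b, _ in diaries.values() for x in b]
pooled_depression = [y for _, d in diaries.values() for y in d]
pooled = pearson(pooled_brooding, pooled_depression)

print(per_person)
print(round(pooled, 3))
```

In this toy data the pooled figure is strongly positive, while one person's own correlation is perfectly positive and the other's is perfectly negative. The same arithmetic applies to one person's data split by week instead of a group split by person.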

For describing human societies, averages are essential, but for getting to know any particular person they are worse than useless. Even the average of one person's behavior may not tell you anything about how he or she will act on any given day.


G. Verloren said...

The word "average" comes from the Arabic عوار, "awār", meaning a defect, or something damaged.

Our usage and conception of "average" comes from medieval merchants trading on the Mediterranean, and running the risks of damage or loss to cargo due to storms, piracy, et cetera. If a ship lost a portion of its cargo due to mishap, the effect it had on individual fortunes could be wildly uneven based on chance.

For example, a wave might sweep a bunch of crates overboard, and if almost all of the lost crates just happened to belong to a single merchant, that could ruin their entire career while leaving all the other merchants with cargo on that ship relatively untouched, their crates having been secured elsewhere by chance.

More controversially, if a crew had to actively jettison cargo to lighten the ship for various reasons, you ran into the problem of no one wanting THEIR cargo to be the product that got chucked overboard, and thus you risked the entire ship because no one wanted to be the one ruined while their fellow merchants lost nothing.

Thus arose the practice of insuring against "awār", or damages, such that if the cargo of a single merchant was lost, the cost was spread evenly among all the other merchants with cargo on that ship. They all collectively faced "average" losses.

As you say, John - averages are essential for describing human societies. But equally essential is remembering that averages are a human invention, and not a natural law. They are merely a tool - and like all tools, they have uses to which they are suited, and uses to which they are not.

For reasons of insurance, where the concept was invented, averages are great. But when it comes to understanding things like psychology? Well... that's another story entirely.

G. Verloren said...

There are also the old, simple examples of the inherent flaw of averages.

If you have 100 apples to distribute among 10 people, and you give 1 apple to the first nine people, and then give the final tenth person 91 apples, then the average number of apples per person will be 10 - despite one person having 91x the resources of any other person.
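The apple arithmetic can be checked in a few lines. This is only a sketch of the example above; the variable name `apples` is an assumption, and the median is added purely for contrast with the mean.

```python
from statistics import mean, median

# Nine people get 1 apple each, and the tenth person gets 91.
apples = [1] * 9 + [91]

total = sum(apples)                  # all 100 apples are accounted for
average = mean(apples)               # the "average" share per person
typical = median(apples)             # the share of the person in the middle
ratio = max(apples) / min(apples)    # largest share relative to smallest

print(total, average, typical, ratio)
```

The mean lands on 10 even though nobody actually holds 10 apples, while the median stays at 1, which is what nine of the ten people really have.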

Shadow said...

Yes, but we have always known the pitfalls of attempting to predict individual (as in a specific individual and not some generic individual) behavior from a statistical analysis of a population. Is this new or news?

For example, statistics can reliably predict that a certain type of individual will buy your product after seeing your advertisement on television. But that is not the same thing as predicting a specific individual, say someone named John Smith who has these traits, will purchase your product. That it can't do, and we can get ourselves in trouble when we try to do this. And, as you point out, the greater the variability or standard deviation within the population, the more trouble we can get into.

Examples: Telling a particular student that, given his or her background and IQ score, he or she should lower his or her career aspirations. Or the FBI arresting a specific individual based almost solely on profiling (a fancy word for statistical analysis). The FBI has had to wipe egg off its face more than once because of this.

Unknown said...

@G.: fascinating story about awār. Where did you learn this? I might be able to use it in class. Also, how are you able to get Arabic transliteration symbols ("ā") into blogger? Did you just cut and paste (from, say, Word), or is there a more sophisticated system to use?

@Shadow: it seems to me the story is "new or news" in at least the sense that many folks use statistical averages in non- or semi-rigorous discussion to make arguments about human behavior, especially when those statistical averages seem to prove points they already want to make for other reasons. No?

@John: I wonder how useful, let alone "essential," statistical averages are for a discussion of society. My own impression is that human social events are as much, perhaps more, the story of variation among and within individuals as they are the story of aggregates and averages. Yes, those variations can produce events with a rough direction or pattern. But ignoring the variations to talk about the pattern can produce an entirely false impression of historical development and experience. Part of what one must ask is how all that individual variation produces the pattern.

I suppose one can do history a la Jared Diamond, in which all events are environment/biochemical, and humans are simply (because "on average") advantage-maximizing ciphers (even if that, in the style of the "tragedy of the commons," can lead to disaster). I admit his method is powerful and that he is able to discover important truths. But it's not for me.

John said...

@David: a good example of how important mental health statistics can be for society is the effect of lead, which is extremely variable but in the aggregate probably a disaster. Obviously not everyone exposed to high levels as a child becomes a criminal, but it does seem that more do.

Unknown said...

@John: Yes, though the example you cite is essentially biochemical, no? Human biochemistry is complex and subtle in ways we are just beginning to understand, and there is plenty of variation from the average, while the average of humans as a whole will still be important. And I suppose someday we may be able to have a cogent biochemical model of say, the First Crusade. But you seem to have moved the discussion away from more obvious forms of historical-social analysis.

G. Verloren said...


Wikipedia has a number of -very- well sourced pages unified under the entry "List of English words of Arabic origin". I don't know that it's an absolutely comprehensive list, but it's quite in-depth, and utterly fascinating to me.

As for special characters in Blogger, I typically just copy and paste as needed, often dropping it into an address bar or search bar first (or into Notepad) to remove any weird encoding, then recopying and repasting.

That said, it appears that Blogger does allow for special characters through simple standard Alt Key Codes. For example, entering Alt + 0195 (with NumLock on, and using the keyboard number pad) produces the letter "Ã". If you memorize the codes, you can use them as needed. I've never bothered to memorize them myself, so I always have to pull up a code list, which makes it take more time than it's worth. Copying and pasting is just quicker and easier - especially when including Arabic!
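For anyone curious which characters those codes can reach, here is a rough sketch. It assumes the Alt + 0nnn codes follow the Windows-1252 code page, which is the usual case on Western-locale Windows but is an assumption here; the helper names `alt_code_char` and `has_alt_code` are made up for this example.

```python
def alt_code_char(code, codepage="cp1252"):
    """Character produced by Alt + 0<code>, assuming the given code page."""
    return bytes([code]).decode(codepage)


def has_alt_code(char, codepage="cp1252"):
    """True if the character exists in the code page, so some Alt + 0nnn code yields it."""
    try:
        char.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(alt_code_char(195))   # the character behind Alt + 0195
print(has_alt_code("Ã"))    # covered by the code page
print(has_alt_code("ā"))    # a with macron, as in "awār"
print(has_alt_code("ع"))    # Arabic letter ain
```

Under that assumption, the "ā" in "awār" and the Arabic letters fall outside the code page, which would be one more reason copying and pasting is the practical route for them.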

Shadow said...

"@Shadow: it seems to me the story is "new or news" in at least the sense that many folks use statistical averages in non- or semi-rigorous discussion to make arguments about human behavior, especially when those statistical averages seem to prove points they already want to make for other reasons. No?"


Sure. I meant this shouldn't be news to science or statistics.