Kruger & Dunning: Unskilled and Unaware of It – How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments

Well, that’s the news from Lake Wobegon—where all the women are strong, all the men are good-looking, and all the children are above average.

Garrison Keillor, Lake Wobegon monologues, passim

My paper this time comes from the December 1999 edition of the Journal of Personality and Social Psychology. Here is a link to the original article (1.4MB pdf).

It’s a classic—the foundation document for what is now called the Dunning-Kruger Effect. I’ve come to reread it for reasons I’ll get to at the end of this post. It’s also well worth reading because it is beautifully written and contains jokes, which isn’t something you see every day.

The authors have the following argument:

… that the skills that engender competence in a particular domain are often the very same skills necessary to evaluate competence in that domain—one’s own and anyone else’s.

Only if you’re good at grammar can you tell that you’re good at grammar, in other words. Those who are rubbish at grammar lack the necessary metacognitive skill to identify their own shortcomings.

It’s a commonplace observation that most people think they’re above average at most things—that’s Garrison Keillor’s joke at the head of this piece. In fact, this trait of illusory superiority has also been called the Lake Wobegon Effect in Keillor’s honour. If lack of skill and lack of metacognitive skill go together in many areas of human endeavour, it might provide a partial explanation for the Lake Wobegon Effect.

So the authors carried out four studies aimed at testing various aspects of metacognition in the competent and incompetent. One study involved humour, two involved logic, and one involved grammar. In each study, a group of volunteers would complete a task which would be objectively scored. Unaware of the test results, the participants were then asked to rate their performance in the task, and also to say how they thought they had performed relative to the other participants. (The “objective” score in the humour test, in which participants rated the funniness of jokes, was calibrated against a prior rating given by a panel of professional comedians.) The participants’ rating of their own performance was then compared to their objective score and their actual ranking in the test results.
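As a rough illustration of that comparison, the sketch below turns raw test scores into percentile ranks, splits the participants into quartiles by actual rank, and averages the self-estimated percentile within each quartile. The function names and the eight-person data set are invented for this post. They are not taken from the paper, and the authors' own analysis may well have differed in detail.

```python
from statistics import mean


def percentile_ranks(scores):
    """Mid-rank percentile (0-100) of each score within the group.

    A score's percentile is the share of scores below it, with ties
    (including the score itself) counted as half.
    """
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        equal = sum(1 for t in scores if t == s)  # includes s itself
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def quartile_summary(scores, self_estimates):
    """Return four (mean actual percentile, mean self-estimated percentile)
    pairs, ordered from the bottom quartile to the top quartile."""
    actual = percentile_ranks(scores)
    order = sorted(range(len(scores)), key=lambda i: actual[i])
    size = len(order) // 4
    summary = []
    for q in range(4):
        start = q * size
        end = (q + 1) * size if q < 3 else len(order)  # last quartile takes any remainder
        idx = order[start:end]
        summary.append((mean(actual[i] for i in idx),
                        mean(self_estimates[i] for i in idx)))
    return summary


# Hypothetical data: test scores out of ten, and each participant's own
# guess at their percentile rank. These numbers are made up.
scores = [2, 3, 5, 6, 7, 8, 9, 10]
self_estimates = [60, 55, 62, 65, 68, 70, 72, 75]

for actual_pct, perceived_pct in quartile_summary(scores, self_estimates):
    print(f"actual {actual_pct:5.1f}   perceived {perceived_pct:5.1f}")
```

Plotting the two columns against quartile gives the general shape of chart the paper uses to present its results.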

In each study, there was a rough correlation between how well a participant performed, and how well they thought they had performed. But participants who performed in the lowest quartile typically believed themselves to be just above average. Participants with scores in the highest quartile tended to rate themselves slightly lower than their actual performance. One might reasonably expect some effect like this from regression to the mean, since those with very low scores have little scope to underrate themselves, and those with very high scores are unable to hugely overestimate their abilities. However, the overestimate by the low-scorers was strikingly large, whereas the underestimate by the high-scorers was much more modest. That asymmetry alone suggests there’s more than just regression to the mean going on.
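The regression-to-the-mean argument can be checked with a small simulation. The sketch below is not a reanalysis of Kruger and Dunning's data. It assumes a made-up model in which each simulated participant's self-estimate is simply their true percentile plus random noise, clipped to the 0-100 scale. The function name, the noise level and the sample size are all my own choices.

```python
import random
from statistics import mean


def simulate(n=10_000, noise_sd=30.0, seed=0):
    """Assumed model: perceived percentile = actual percentile + Gaussian noise,
    clipped to the 0-100 range. Returns the mean estimation error
    (perceived minus actual) for the bottom and top quartiles."""
    rng = random.Random(seed)
    # Actual percentiles, evenly spread and already sorted from low to high.
    actual = [100.0 * (i + 0.5) / n for i in range(n)]
    perceived = [min(100.0, max(0.0, a + rng.gauss(0.0, noise_sd)))
                 for a in actual]
    size = n // 4
    bottom_error = mean(perceived[:size]) - mean(actual[:size])
    top_error = mean(perceived[-size:]) - mean(actual[-size:])
    return bottom_error, top_error


bottom_error, top_error = simulate()
print(f"bottom quartile mean error: {bottom_error:+.1f} percentile points")
print(f"top quartile mean error:    {top_error:+.1f} percentile points")
```

Under this noise-only assumption the bottom quartile overestimates and the top quartile underestimates by roughly the same amount, because the model is symmetric about the 50th percentile. A lopsided pattern needs something extra. Adding a constant upward shift to every self-estimate, as a crude stand-in for the Lake Wobegon Effect, would tilt the errors in the direction the paper reports, though that is only one of several models that could do so.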

[Figure: Kruger & Dunning’s Figure 1 (1999), which gives a representative impression of the results of all their studies]

As a follow-up to the grammar test, Kruger and Dunning invited participants from the bottom and top quartiles back to their laboratory, and asked them to grade representative responses from the other participants. Then, in the light of that information, they were asked to regrade their own performance. The poor performers did relatively badly at grading the responses of other participants, and then tended to stick with their previous assessment of their own performance: they failed to notice that others had performed better than they had. Those who had scored in the top quartile were better at grading others and, having seen a sample of the performance of others, they appropriately upgraded their assessment of their own performance in terms of percentile ranking: they realized they had performed better than many of the participants whose responses they had just graded.

So the high performers seem to be falling foul of the false-consensus effect, the belief that others will perform roughly as well as you do yourself. Once they had seen examples of poor performance by other participants, they were immediately able to recalibrate their estimate of their own performance. The poor performers, by contrast, were unable to properly detect the good performance of others, and stuck with their original inflated idea of their own performance.

In a second version of the logic test, after the participants had completed the test and their initial self-assessment, Kruger and Dunning offered half the participants a brief training package in formal logic, while the other half completed an “unrelated filler task”. All participants were then asked to reassess their own performance in the test. Those who had received training made more accurate self-assessments, with poor performers revising their estimates appropriately downwards, and good performers revising appropriately upwards. Education gave everyone a better insight into their own performance.

So there’s evidence to support the following hypotheses, which together constitute what’s now known as the Dunning-Kruger Effect:

  • Incompetent individuals dramatically overestimate their own ability and performance
  • Incompetent individuals are less able to recognize competence in others
  • Incompetent individuals are less able to gain insight into their own performance by means of comparison with the performance of others
  • Incompetent individuals can gain insight into their own incompetence by becoming more competent

We also have evidence that:

  • Competent individuals may underestimate their own competence as a result of the false-consensus effect
  • Some Cornell University undergraduates are terrifyingly bad at logic: the lowest quartile in one study answered an average of 0.3 logic questions out of ten correctly

Kruger and Dunning finish with:

Although we feel we have done a competent job in making a strong case for this analysis, studying it empirically, and drawing out relevant implications, our thesis leaves us with one haunting worry that we cannot vanquish. That worry is that this article may contain faulty logic, methodological errors, or poor communication.

It certainly would be ironic if Kruger and Dunning had fallen victim to their own eponymous effect.

Which brings me to my reason for coming back to reread this paper. The Dunning-Kruger Effect seems now to be achieving the status of an internet meme. As knowledge of its existence grows, it comes up more and more in online discussions. Those with expertise often accuse their less-informed and more opinionated interlocutors of suffering from the Dunning-Kruger Effect. But the less-informed, unable to detect expertise in others, seem just as likely to accuse the experts of being victims of Dunning-Kruger. At which point, neither side’s argument is advanced by a jot.

Mike Godwin has identified an effect in internet discussions that is called Godwin’s Law:

As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1.

I’d like to propose The Oikofuge’s Law:

As an online discussion grows longer, the probability that the Dunning-Kruger Effect will be invoked approaches 1.

