This post summarizes some results obtained in the two tests of overconfidence explained in the previous post: a set of trivia tests devised to estimate individual measures of overestimation (E) and overplacement (P), and a series of questions on interval estimates to obtain an indicator of overprecision (M).

*Trivia tests on overestimation and overplacement*

Participants completed the four trivia tests in about 15 minutes, instructions included, and there were no incidents in any of the five sessions. The average respondent overestimated her performance in the trivia tests by 2.9 right answers (out of 40 questions in total), and the bias was persistent in both easy and hard tests. By contrast, the average respondent placed herself below average by 2.7 correct answers (P = -2.7), with the bias being mostly attributable to underplacement in hard tasks. Table 1 summarizes the average data from the experiment.
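As a rough illustration of how the two indicators can be computed per respondent, the sketch below assumes the usual definitions: E is the self-estimated score minus the actual score, and P is the believed advantage over the average participant minus the actual advantage. The function names and the sample numbers are hypothetical; they are not data from the experiment.

```python
from statistics import mean

def overestimation(own_estimate, actual):
    """E: believed own score minus actual own score (positive = overestimation)."""
    return own_estimate - actual

def overplacement(own_estimate, others_estimate, actual, group_mean):
    """P: believed advantage over others minus actual advantage (positive = overplacement)."""
    return (own_estimate - others_estimate) - (actual - group_mean)

# Hypothetical respondents in one 10-question test:
# (own estimated score, estimated average score of others, actual score)
responses = [(6, 7, 4), (5, 5, 6), (3, 6, 2)]
group_mean = mean(actual for _, _, actual in responses)

E = [overestimation(own, actual) for own, _, actual in responses]
P = [overplacement(own, others, actual, group_mean) for own, others, actual in responses]

print("E per respondent:", E)
print("P per respondent:", P)
```

Under these definitions a negative P, as in the averages reported above, means underplacement.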

**TABLE 1 – Overestimation and overplacement**

We also observe a strong correlation between the two variables, E and P. That is, participants who exhibited the highest overestimation tended to consider themselves above average (or, at least, showed lower underplacement), and vice versa.
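The relationship between E and P can be checked with a plain Pearson correlation. The sketch below uses made-up scores for five respondents, chosen only to show a positive association; the `pearson` helper is illustrative and not the procedure used in the experiment.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical individual indicators (not the experimental data)
E = [5, 3, 1, 4, 2]       # overestimation
P = [0, -3, -6, -2, -4]   # overplacement (negative = underplacement)

r = pearson(E, P)
print(f"correlation between E and P: {r:.2f}")
```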

Finally, the trivia tests were devised to control for the hard – easy effect. Results were as expected for overplacement (which goes from -2.4 in hard tests to about zero in easy ones) and adequate for overestimation, which does not increase (suggesting a general bias towards overestimation in our sample). Figure 1 shows the effect more clearly.

**FIGURE 1 – The hard – easy effect**

However, the effect would have been more evident had we managed to design a pair of easy tests that were as easy as we expected. As Table 1 above shows, trivia tests T2 and T3 had average (median) correct answers of 2.29 (2.0) and 2.75 (3.0) out of 10 questions, against a score of 2.0 attributable to good luck alone. In trivia tests T1 and T4, expected to yield 7.0 to 8.0 correct answers on average, respondents only hit the right answer 5.4 (5.0) and 5.58 (6.0) times out of 10 on average (median). These two tests were therefore of medium, rather than easy, difficulty for respondents.

*Test on confidence intervals*

Participants completed the six questions on confidence intervals in about 6 to 8 minutes, instructions included. Results show a strong tendency towards overprecision, but we are concerned about the reliability of the estimations obtained at the individual level.

First, judges were significantly overconfident. Aggregate results show a strong tendency to overprecision: the 80% confidence intervals contained the correct answer only 38.3% of the time. As expected, the lowest degree of overprecision corresponds to the domain where participants could draw on personal experience (time to walk). However, they were still overconfident there: 80% intervals hit the right answer 62.0% of the time. Using *M* ratios, overprecision becomes even more prevalent: more than 90% of respondents exhibit this bias. Table 2 summarizes the results.
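The hit rate quoted above is simply the share of stated intervals that contain the true value. A minimal sketch, with hypothetical intervals and true values rather than the experimental data:

```python
def hit_rate(intervals, truths):
    """Share of (low, high) intervals that contain the corresponding true value."""
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)

# Hypothetical 80% intervals given by one respondent, and the true answers
intervals = [(10, 20), (100, 150), (5, 8), (30, 60), (1900, 1950)]
truths = [25, 120, 12, 45, 1969]

rate = hit_rate(intervals, truths)
target = 0.80  # a well-calibrated respondent would hit about 80% of the time
print(f"hit rate: {rate:.0%} (target {target:.0%})")
```

A hit rate well below the stated confidence level, as here, is what indicates overprecision.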

**TABLE 2 – Overprecision**

Although the aggregate results are consistent with the empirical literature, we are concerned about the reliability of the estimations obtained at the individual level. First, there is evidence that many participants did not fully understand the instructions. We had several incidents: one respondent who did not complete all three answers per question, responses where the minimum and maximum bounds were swapped, answers provided in a different order than required, and median estimates identical to one of the bounds.

Second, individual estimations of the ratio M vary widely depending on the refinement method used to estimate M and on whether indicators are computed as the median or the average of the ratios across domains. In particular, following Soll and Klayman (2004), we compared three alternative refinement methods (the default *M*, an estimator M2 where MAD assumes a symmetric distribution, and a third one that assumes a normal distribution), and for each of them we computed mean and median ratios. The results show that the indicators are sensitive to the estimation method.

Why did this happen? Basically, the tests were too simple. With only two questions per domain, a single answer that is close to the true value strongly affects the resulting estimation of M. In future research, having more questions per domain will be essential, subject to the restriction of devising a test that is not too time-consuming for a single indicator.
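The sensitivity described above can be illustrated with a simplified version of the ratio. The sketch assumes that M compares the expected absolute deviation implied by the 80% interval (here derived under a normal distribution) with the observed absolute deviation of the median estimate from the truth, and that M below 1 signals overprecision. This is not necessarily the exact estimator used in the experiment, and all numbers are hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist, mean

Z80 = NormalDist().inv_cdf(0.90)  # half-width of an 80% interval in standard deviations

def implied_abs_dev(low, high):
    """Expected absolute deviation implied by an 80% interval, assuming normality."""
    sigma = (high - low) / (2 * Z80)
    return sigma * sqrt(2 / pi)

def m_ratio(answers):
    """answers: list of (low, median, high, truth) tuples for one domain."""
    expected = mean(implied_abs_dev(low, high) for low, _, high, _ in answers)
    observed = mean(abs(median - truth) for _, median, _, truth in answers)
    return expected / observed

# Two questions in one domain; both median estimates miss the truth by a wide margin
domain_a = [(90, 100, 110, 130), (180, 190, 200, 230)]
# Same interval widths, but the second median now lands almost on the true value
domain_b = [(90, 100, 110, 130), (219, 229, 239, 230)]

m_a, m_b = m_ratio(domain_a), m_ratio(domain_b)
print(f"M with two misses: {m_a:.2f}")
print(f"M with one near-exact answer: {m_b:.2f}")
```

With only two questions, a single lucky median more than doubles the ratio even though the stated intervals are equally narrow, which is why more questions per domain are needed for a stable individual indicator.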