Reproducibility of peer review in clinical neuroscience. Is agreement between reviewers any greater than would be expected by chance alone?
Rothwell PM, Martyn CN.
We aimed to determine the reproducibility of assessments made by independent reviewers of papers submitted for publication to clinical neuroscience journals and abstracts submitted for presentation at clinical neuroscience conferences. We studied two journals in which manuscripts were routinely assessed by two reviewers, and two conferences in which abstracts were routinely scored by multiple reviewers. Agreement between the reviewers as to whether manuscripts should be accepted, revised or rejected was not significantly greater than that expected by chance [kappa = 0.08, 95% confidence interval (CI) -0.04 to 0.20] for 179 consecutive papers submitted to Journal A, and was poor (kappa = 0.28, 0.12 to 0.40) for 116 papers submitted to Journal B. However, editors were very much more likely to publish papers when both reviewers recommended acceptance than when they disagreed or recommended rejection (Journal A, odds ratio = 73, 95% CI = 27 to 200; Journal B, 51, 17 to 155). There was little or no agreement between the reviewers as to the priority (low, medium, or high) for publication (Journal A, kappa = -0.12, 95% CI -0.30 to -0.11; Journal B, kappa = 0.27, 0.01 to 0.53). Abstracts submitted for presentation at the conferences were given a score of 1 (poor) to 6 (excellent) by multiple independent reviewers. For each conference, analysis of variance of the scores given to abstracts revealed that differences between individual abstracts accounted for only 10-20% of the total variance of the scores. Thus, although the recommendations made by reviewers have considerable influence on the fate of both papers submitted to journals and abstracts submitted to conferences, agreement between reviewers in clinical neuroscience was little greater than would be expected by chance alone.
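For readers unfamiliar with the statistics reported above, the sketch below illustrates how the two quantities could be computed: Cohen's kappa for chance-corrected agreement between two reviewers, and the share of total score variance attributable to differences between abstracts, estimated from a one-way random-effects analysis of variance. This is not the authors' analysis code; the function names, the 3 x 3 table of paired recommendations, and the simulated conference scores are all invented purely for illustration (the counts are chosen only so that chance-corrected agreement comes out low).

```python
import numpy as np


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table
    (rows = reviewer 1's category, columns = reviewer 2's category)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_observed = np.trace(table) / n                                   # exact agreement observed
    p_expected = np.dot(table.sum(axis=1), table.sum(axis=0)) / n**2   # agreement expected by chance
    return (p_observed - p_expected) / (1.0 - p_expected)


def between_abstract_variance_share(scores):
    """Proportion of total score variance due to differences between abstracts,
    from a one-way random-effects ANOVA: rows = abstracts (groups),
    columns = the scores given by each reviewer (replicates)."""
    scores = np.asarray(scores, dtype=float)
    n_abstracts, n_reviewers = scores.shape
    abstract_means = scores.mean(axis=1)
    ms_between = n_reviewers * ((abstract_means - scores.mean()) ** 2).sum() / (n_abstracts - 1)
    ms_within = ((scores - abstract_means[:, None]) ** 2).sum() / (n_abstracts * (n_reviewers - 1))
    var_between = max(0.0, (ms_between - ms_within) / n_reviewers)     # between-abstract variance component
    return var_between / (var_between + ms_within)


# Invented 3 x 3 table of paired recommendations (accept / revise / reject);
# these counts are NOT the Journal A or Journal B data.
agreement_table = np.array([
    [10, 15,  5],
    [12, 30, 18],
    [ 6, 20, 14],
])
print(f"kappa = {cohens_kappa(agreement_table):.2f}")

# Simulated scores for 100 abstracts rated 1-6 by 4 reviewers, with a small
# true between-abstract effect relative to reviewer-to-reviewer noise (again, invented data).
rng = np.random.default_rng(0)
abstract_effect = rng.normal(0.0, 0.5, size=(100, 1))
reviewer_noise = rng.normal(0.0, 1.0, size=(100, 4))
scores = np.clip(np.round(3.5 + abstract_effect + reviewer_noise), 1, 6)
print(f"between-abstract variance share = {between_abstract_variance_share(scores):.0%}")
```

On this reading of the statistics, a kappa near zero (as for Journal A) means the two reviewers agree about as often as two raters assigning categories at random with the same marginal frequencies, and a between-abstract variance share of 10-20% means most of the spread in conference scores reflects differences between reviewers rather than differences between the abstracts they rated.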