The observed consensus estimates among referees were λ ≈ 0.30 for the Journal of Finance (JF) and Review of Financial Studies (RFS); λ ≈ 0.35 for Econometrica (ECMTA), the Quarterly Journal of Economics (QJE), and the SFS Cavalcade; and λ ≈ 0.40 for the International Economic Review (IER), the Journal of Economic Theory (JET), the Journal of the European Economic Association (JEEA), and the Rand Journal of Economics (Rand).
Roughly, referee reports were one part signal, two parts noise. ...
For economics journals, when two referees are consulted, the top-10p [percentile] paper receives two rejects with probability 14%, one reject and one non-reject with probability 47%, and two non-rejects with probability 40%. With three referees, the top-10p paper receives a majority of reject recommendations with 30% probability and a majority of non-reject recommendations with 70% probability.
For finance journals, with their lower lambdas and higher rejection probabilities, the higher than 50% reject probability for the top-10p paper results in a strange situation: The more referees are consulted, the more likely it is that the referees will agree that the top-10p paper is bad. For this top-10p paper, with one referee, the probability that the majority of referees recommends rejection is 38%; with three referees, it is almost 70%. (This also obviates the idea of using a tie-breaker referee when two referees disagree.) In fact, only the top-2p papers have a conditional probability of rejection that is less than 50%, resulting in a majority rejection probability that does not increase with the number of referees.
--Ivo Welch, "Referee Recommendations," on the randomness my colleagues and I live with and impose on each other
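
The arithmetic behind the quoted "strange situation" can be reproduced with a few lines of binomial bookkeeping. What follows is a rough sketch, not Welch's actual model: it assumes that, for a paper of fixed quality, each referee rejects independently with the same probability `p`. The function name `majority_reject_prob` and the way `p` is backed out of the quoted 14% two-reject figure are my own illustrative choices.

```python
from math import comb, sqrt


def majority_reject_prob(p, n):
    """Probability that more than half of n referees recommend rejection.

    Assumes (for illustration only) that each referee rejects independently
    with probability p, given the paper's quality.
    """
    k_min = n // 2 + 1  # smallest number of rejects that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Back out a per-referee reject probability from the quoted economics figure:
# two rejects out of two referees with probability 14% implies p = sqrt(0.14).
p_econ = sqrt(0.14)

for n in (1, 3, 5):
    print(f"p = {p_econ:.3f}, referees = {n}: "
          f"majority rejects with prob. {majority_reject_prob(p_econ, n):.3f}")

# A hypothetical paper whose single-referee reject probability exceeds 50%:
# adding referees makes a majority rejection more likely, not less.
p_high = 0.6  # illustrative value, not taken from the paper
for n in (1, 3, 5):
    print(f"p = {p_high:.3f}, referees = {n}: "
          f"majority rejects with prob. {majority_reject_prob(p_high, n):.3f}")
```

Under this independence assumption the direction of the effect depends only on which side of 50% the single-referee reject probability falls. Below 50%, more referees make a majority rejection less likely; above 50%, more referees make it more likely, which is the pattern the quote describes for top-10p papers at finance journals.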