Why Overall Impact Scores Are Not the Average of Criterion Scores


One of the most common questions that applicants ask after a review is why the overall impact score is not the average of the individual review criterion scores. I’ll try to explain the reasons in this post.

What is the purpose of criterion scores?

Criterion scores assess the relative strengths and weaknesses of an application in each of five core areas. For most applications, the core areas are significance, investigator(s), innovation, approach and environment. The purpose of the scores is to give useful feedback to PIs, especially those whose applications were not discussed by the review group. Because only the assigned reviewers give criterion scores, they cannot be used to calculate a priority score, which requires the vote of all eligible reviewers on the committee.

How do the assigned reviewers determine their overall scores?

The impact score is intended to reflect an assessment of the “likelihood for the project to exert a sustained, powerful influence on the research field(s) involved.” In determining their preliminary impact scores, assigned reviewers are expected to consider the relative importance of each scored review criterion, along with any additional review criteria (e.g., progress for a renewal), to the likely impact of the proposed research.

The reviewers are specifically instructed not to use the average of the criterion scores as the overall impact score because individual criterion scores may not be of equal importance to the overall impact of the research. For example, an application having more than one strong criterion score but a weak score for a criterion critical to the success of the research may be judged unlikely to have a major scientific impact. Conversely, an application with more than one weak criterion score but an exceptionally strong critical criterion score might be judged to have a significant scientific impact. Moreover, additional review criteria, although not individually scored, may have a substantial effect as they are factored into the overall impact score.
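A small worked example shows why a plain average can mislead. The two score profiles below are hypothetical. They use the 1-to-9 NIH scoring scale (1 = exceptional, 9 = poor) and assume that approach is the criterion critical to success. This is an illustration of what a mean hides, not a formula that reviewers apply.

```python
from statistics import mean

# Hypothetical criterion scores on the 1-9 scale (1 = exceptional, 9 = poor).
# These numbers are illustrative only and are not drawn from any real review.
app_a = {"significance": 2, "investigator(s)": 2, "innovation": 2,
         "approach": 7, "environment": 2}   # several strong scores, weak critical one
app_b = {"significance": 4, "investigator(s)": 3, "innovation": 3,
         "approach": 1, "environment": 4}   # several weaker scores, exceptional critical one

# Assumption for this example: "approach" is the criterion critical to success.
CRITICAL = "approach"

for name, scores in [("A", app_a), ("B", app_b)]:
    avg = mean(scores.values())
    print(f"Application {name}: mean criterion score = {avg:.1f}, "
          f"{CRITICAL} = {scores[CRITICAL]}")
```

Both applications average 3.0. By the reasoning above, though, a reviewer could judge A unlikely to have a major scientific impact because of its weak approach, and B likely to have a significant one. The mean alone cannot tell them apart.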

How is the final overall score calculated?

The final impact score is the average of the impact scores from all eligible reviewers, multiplied by 10 and then rounded to the nearest whole number. Reviewers base their impact scores on the presentations of the assigned reviewers and the discussion involving all reviewers. The basis for the final score should be apparent from the resume and summary of discussion, which the scientific review officer prepares after the review.
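The arithmetic of the final score can be sketched in a few lines. This is an illustration, not NIH software. The function name is hypothetical and the reviewer scores are made up. The post also does not say how an exact half (for example, 22.5) is rounded, so the sketch assumes it rounds up.

```python
from fractions import Fraction
import math

def final_impact_score(scores: list[int]) -> int:
    """Hypothetical helper: mean of all eligible reviewers' impact scores
    (1-9), multiplied by 10 and rounded to the nearest whole number.

    Assumption: exact halves round up. Python's built-in round() would
    instead send 22.5 to the nearest even integer (22), so the tie rule
    is made explicit here.
    """
    if not scores:
        raise ValueError("at least one reviewer score is required")
    if any(not 1 <= s <= 9 for s in scores):
        raise ValueError("impact scores must be integers from 1 to 9")
    # Exact rational arithmetic avoids floating-point error in the mean.
    scaled_mean = Fraction(sum(scores) * 10, len(scores))
    return math.floor(scaled_mean + Fraction(1, 2))

# Eight hypothetical reviewer scores: the mean is 2.375, and 10 times that is 23.75.
print(final_impact_score([2, 2, 3, 2, 3, 2, 2, 3]))  # 24
```

Because each reviewer scores from 1 to 9, the final impact score can range from 10 to 90.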

Why might an impact score be inconsistent with the critiques?

Sometimes, issues brought up during the discussion will result in a reviewer giving a final score that is different from his/her preliminary score. If this occurs, reviewers are expected to revise their critiques and criterion scores to reflect such changes. Nevertheless, an applicant should refer to the resume and summary of discussion for any indication that the committee’s discussion might have changed the evaluation even though the criterion scores and reviewer’s narrative may not have been updated. Recognizing the importance of this section to the interpretation of the overall summary statement, NIH has developed a set of guidelines to assist review staff in writing the resume and summary of discussion, and implementation is under way.

If you have related questions, see the Enhancing Peer Review Frequently Asked Questions.

Editor’s Note: In the third section, we deleted “up” for clarity.

10 Replies to “Why Overall Impact Scores Are Not the Average of Criterion Scores”

  1. If the Overall Impact Score is the important number in determining the merit of a proposal, why isn’t it made available to the PI? One can make a case for dispensing with all the other scores because they apparently don’t “count.” If this is the important one, PIs are in the dark without it.

    In addition, the last comment about inconsistencies between the impact scores and the critiques points to the major problem with the NIH grant review process. There is little that is rational about the process, and reviewers can do anything they want.

    1. Anonymous above: the overall score IS given to the applicant, just in the form of the average from the panel members voting. The summary will tell you whether people agreed or disagreed; this should give the PI an idea of the range around their average impact score. Since each designated reviewer’s score does not account for 33% of the vote, but only one vote out of 20-30, there is no purpose to giving those three scores. Perhaps NIH might just consider giving the range of all scores from the panel. For example, a range of 2-4, 3, or 1-5 could give some idea of the consensus of the panel. But the summary is supposed to do that.

      1. Actually, each designated reviewer’s score probably does represent close to a third of the overall impact score, since the study section is working from their comments and review of the application. In fact, the primary reviewer and their critique are probably the biggest deciding factor in the overall score. Thus, it would be beneficial to know the individual overall scores from each of the three assigned reviewers. It would help applicants know whether the reviews were highly disparate, which would suggest a revision may be warranted to address the one bad apple, or overwhelmingly negative, which would allow the applicant to consider not revising a proposal. With the loss of the opportunity to send in a second revision, the horrible state of funding, and the lack of continuity in study section members from one review period to another, the odds are stacked against getting a project funded. Giving applicants more insight into and transparency about the review process, by providing the most detailed information possible, seems only fair.

        1. I think the “one bad apple/score” idea is not really accurate in the big picture. The best way to address your concern is for the summary of discussion to accurately reflect the level of consensus of the panel. There are three preliminary scores, and then, after discussion, the entire panel votes, usually within the range of the three assigned reviewers’ final scores. So you can have up to approximately 30 votes that are averaged. If you get a 2.2 from the panel, how does it help to know that the primary gave it a 5 and the other reviewers gave it a 2? From the average, you know that the panel mostly voted with the 2s. Similarly, if the assigned reviewers give a 3, 4, and 5 and the average is 4, the panel is spread among those scores. Keep in mind that a reviewer who gives a negative review has to convince the rest of the panel. If the primary reviewer is negative and is countered by the secondary reviewer, the panel will decide for themselves. Additionally, you cannot simply satisfy a negative reviewer and then get funded on resubmission; you cannot assume you will have the same reviewers at all. The grant must stand on its own. I do think that the NIH should consider allowing A2s if an A1 grant scores in the 10-25 percentile range and is not funded. I just see no reason why those grants need to change by ~50%; they are clearly top grants.

  2. In today’s world, a negative review kills a grant and trumps any positive review, not the other way around. The negative comments may be accurate, possibly addressable, or even baseless. For the latter, there is no recourse, and the loss of an A2 option ensures that. If an A1 grant receives overall scores of 2 or 3, it is by NIH definition a high-impact grant, but certainly not fundable. Providing applicants the individual overall impact scores only gives them more information about how the individual reviewers viewed and weighed the strengths and weaknesses of an application. Since the overall impact score is not derived from the criterion scores, there is no way of knowing the general perception of a grant. What is wrong with providing that information?

  3. I agree with anonymous. I know of a grant where, on an A1 proposal, one reviewer’s strongly opinionated review carried the day. Some of the comments were even factually incorrect and tainted with personal bias. The chair was new, and the summary statement did not help explain the discrepancy between two of the reviewers, who gave 1s and 2s, and the “one bad apple,” who gave 5s and 6s.

    Given the funding climate and the constant changing of reviewers, applicants appear to have no remedy if the resubmission gets strong negative comments even after addressing all concerns raised at the A0 level.

  4. I share the puzzlement over why the NIH cannot reveal the impact scores to the applicant. This could be especially important for proposals that are not discussed. These days, the arithmetic mean is used and few, if any, grants are called back by reviewers, so it is entirely possible that one extremely bad score from a reviewer could sink the ship. Therefore, it would be extremely useful to the applicant if the impact scores were revealed.

  5. What is NIGMS policy with regard to the responsible conduct of research?
    Can an “unacceptable” score in the responsible conduct of research disqualify an application from being discussed by the review committee and given an overall impact score?

    Many thanks for your time.

  6. A section describing the proposed training in the Responsible Conduct of Research is required in all applications for training grants and fellowships. Reviewers are instructed to give their opinion on the acceptability of the proposed training, but not to include it when determining their overall impact score. The review panel uses the average preliminary overall impact score of the assigned reviewers when deciding to discuss or not discuss an application. Therefore, an unacceptable rating for the Responsible Conduct of Research alone would not prevent an application from being discussed by a review panel.
