“Qui tacet consentit.” (Translation: Silence means agreement.)
-Cicero, Ovid, and Seneca
Dispersion, agreement, or variance within rater groups is important to measure and report back to participants in 360-degree feedback reports because current research suggests only moderate correlation among raters' scores (Nowack, 2009). At Envisia Learning, Inc., we provide up to three different metrics of rater agreement in each of our reports:
- The range of scores. This shows the band of responses, from the highest to the lowest score given by all raters on a specific competency.
- The distribution of scores. This provides a visual way to discern the spread of raters' scores on specific questions.
- The statistical measure of rater agreement. This is based on the standard deviation of ratings and is expressed on a scale from 100 percent agreement down to no agreement; an agreement level below 50 percent is statistically meaningful and indicates enough variability among raters that the average score could be misleading if used to highlight strengths or potential development areas (see the sketch following this list).
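To make the third metric concrete, here is a minimal Python sketch of all three summaries for a single competency. The exact agreement formula Envisia Learning uses is not given here, so the sketch assumes the index is simply the standard deviation of the ratings rescaled against its maximum possible value on a bounded 1-5 scale; the sample ratings are invented for illustration.

```python
from collections import Counter
from statistics import pstdev

def rater_agreement(ratings, scale_min=1, scale_max=5):
    """Summarize dispersion for one competency's ratings on a bounded scale.

    Returns the range of scores, the distribution (how many raters chose each
    scale point), and an agreement index from 0 (no agreement) to 100 (perfect
    agreement). The index here rescales the population standard deviation
    against its maximum possible value on the scale, (scale_max - scale_min) / 2;
    this is an assumption, not Envisia's published formula.
    """
    score_range = (min(ratings), max(ratings))
    distribution = Counter(ratings)
    max_sd = (scale_max - scale_min) / 2   # SD when raters split between the two extremes
    agreement = 100 * (1 - pstdev(ratings) / max_sd)
    return score_range, distribution, agreement

# Example: five raters scoring one competency on a 1-5 scale
ratings = [2, 4, 4, 5, 5]
rng, dist, agree = rater_agreement(ratings)
print(f"Range: {rng}, Distribution: {dict(dist)}, Agreement: {agree:.0f}%")
```

In this invented example the agreement index comes out around 45 percent, below the 50 percent threshold noted above, so the average score alone would be a shaky basis for calling this competency a strength or a development area.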
Coach’s Critique:
A very typical question that comes up in my coaching practice has to do with who said what in the 360-degree feedback report. Individuals I coach try very hard to identify the extent to which there is consensus among the ratings and to spot those individuals who are so-called “outliers.” This question often comes from a place of defensiveness, or a need to justify a fault in the system. However, when I walk them through the report and show them the various rater dispersion summaries, I help them grasp the details behind the ratings while keeping raters anonymous. I find that reviewing the range and distribution of scores with them immediately decreases their level of defensiveness and frustration.
For this reason, I prefer to use tools that provide as many metrics as possible for interpreting rater agreement. Because participants often try to blame low average scores on the extreme ratings of a few outliers, I find that breaking ratings down in terms of a consensus level or rater agreement index can be very helpful. For instance, if a score has a very low consensus level, or a high level of variability, it’s important to explore the reasoning behind the variation, since this places less weight on the average score. At the same time, those scores should not be prioritized when creating a development plan; it’s better to build the plan around scores where there is a sufficient level of agreement.
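One way to make that prioritization rule concrete is to flag only the competencies whose average score is low and whose raters agree enough for that average to be trusted. The competency names, ratings, and the 50 percent and 3.0 cutoffs below are hypothetical illustrations, and the agreement index reuses the same standard-deviation rescaling assumed in the earlier sketch.

```python
from statistics import mean, pstdev

def agreement_index(ratings, scale_min=1, scale_max=5):
    """Agreement from 0-100: standard deviation rescaled against its maximum
    possible value on the scale (same assumption as the earlier sketch)."""
    max_sd = (scale_max - scale_min) / 2
    return 100 * (1 - pstdev(ratings) / max_sd)

# Hypothetical ratings per competency from five raters on a 1-5 scale.
competencies = {
    "Listening":        [2, 2, 3, 2, 3],
    "Delegation":       [1, 5, 2, 5, 1],
    "Strategic Vision": [4, 4, 5, 4, 5],
}

def development_candidates(data, agreement_cutoff=50, score_cutoff=3.0):
    """Flag competencies whose average is low AND whose raters agree enough
    for that average to be trusted as a development priority."""
    flagged = []
    for name, ratings in data.items():
        avg = mean(ratings)
        agree = agreement_index(ratings)
        if agree >= agreement_cutoff and avg <= score_cutoff:
            flagged.append((name, round(avg, 1), round(agree)))
    return flagged

print(development_candidates(competencies))
# "Delegation" is low on average but is not flagged: its raters disagree so
# strongly that the average itself needs exploring before any planning.
```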
What has been your experience with interpreting rater agreement for participants?