“The whole is greater than the sum of its parts.” -Aristotle
Interpreting 360-degree feedback results is difficult enough; interpreting the scores of many different raters is even harder. As we discussed in a previous blog on interpreting different rater scores, 360-degree feedback participants are challenged to use their results effectively when raters hold differing perceptions of them. Not only do they have to reconcile discrepant opinions between rater groups (e.g., peers vs. direct reports), but they also have to interpret discrepancies within rater groups.
In an earlier meta-analytic study, Conway & Huffcutt (1997) ((Conway, J. M., Lombardo, K., & Sanders, K. C. (2001). A meta-analysis of incremental validity and nomological networks for subordinate and peer ratings. Human Performance, 14, 267–303.)) found that the average correlation between two supervisors was only .50, between two peers .37, and between two subordinates only .30. As such, agreement within rater groups appears to be an important issue for clients to consider when interpreting their 360-degree feedback reports.
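To make the idea of inter-rater correlation concrete, here is a minimal sketch (using hypothetical data, not figures from the study itself) of the kind of calculation behind those averages: the Pearson correlation between two raters' scores across the same set of ratees. The rater names and values below are purely illustrative.

```python
# Hypothetical illustration of inter-rater correlation: the agreement between
# two raters' scores across the same set of ratees. Values near 1.0 indicate
# strong agreement; the meta-analytic averages cited above (.50, .37, .30)
# suggest only moderate overlap between any two raters.
import numpy as np

# Hypothetical ratings of eight ratees on a 1-5 scale by two peers
peer_a = np.array([4, 3, 5, 2, 4, 3, 5, 2])
peer_b = np.array([3, 4, 4, 2, 5, 2, 4, 3])

r = np.corrcoef(peer_a, peer_b)[0, 1]  # Pearson correlation coefficient
print(f"Correlation between the two peers: {r:.2f}")
```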
Vendors who do not provide a way for participants to evaluate within-rater agreement may increase the probability that the average scores used in reports are misinterpreted, particularly when coaches use them to help clients focus on specific competencies and behaviors for developmental planning.
For example, Envisia Learning, Inc. provides up to three separate measures of rater agreement within each feedback report: a range-of-scores measure, a distribution of ratings on the most and least frequently observed behaviors, and a statistical metric of rater agreement based on the standard deviation. These within-rater agreement metrics help clarify "outliers" and how to interpret polarized scores on specific questions and competencies.
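As a rough sketch of how such within-rater agreement metrics might be computed (this is not Envisia Learning's actual implementation, just the generic statistics named above applied to hypothetical data), the snippet below takes one question's ratings from a single rater group and reports the average, the range of scores, the distribution of ratings, and the standard deviation.

```python
# Hypothetical sketch of within-rater agreement metrics for a single question;
# the ratings and thresholds are illustrative, not a vendor's actual algorithm.
from collections import Counter
import statistics

# Hypothetical ratings from six direct reports on a 1-5 scale for one behavior
ratings = [5, 5, 4, 2, 1, 5]

mean = statistics.mean(ratings)            # the average reported in most 360 reports
score_range = max(ratings) - min(ratings)  # range-of-scores measure
distribution = Counter(ratings)            # how often each rating point was chosen
spread = statistics.stdev(ratings)         # standard deviation as an agreement metric

print(f"Average: {mean:.2f}")
print(f"Range: {score_range} (min {min(ratings)}, max {max(ratings)})")
print(f"Distribution: {dict(sorted(distribution.items()))}")
print(f"Standard deviation: {spread:.2f} (larger = less rater agreement)")
```

A wide range or a high standard deviation here flags exactly the kind of polarized ratings that an average score alone would hide.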
Coach’s Critique:
Imagine that six of your direct reports have each rated you very differently on your 360-degree feedback. One rater might hold a view completely contrary to another's, and the result is a score that is a mere average of the two. How useful is that?
In my experience, every rating score has something to offer a participant. For example, one of my clients' feedback reports revealed that his subordinates had given him completely different scores from one another. His first reaction was to explain away the low scores and to defend the higher ratings as the "more accurate" ones. He also made a concerted effort to identify which subordinates gave him which scores, and tried to disregard those that diverged most from the rest. The problem with this reaction is that it dismisses the opinions of some raters. In my opinion, every rating has a purpose, with the exception of extreme outlier scores (which an effective 360-degree tool should hopefully control for).
As my client and I analyzed the different elements of his scores (looking beyond the average, considering all individual scores, and digging into the open-ended comments), he realized that the inconsistency among his subordinates' ratings was due to the way he behaved around different people. He was actually treating people differently!
So, as coaches and interpreters of feedback reports, we need to be cognizant of the reasons for discrepancies. Perhaps we can do this by avoiding placing too much emphasis on average scores and by using tools that display the full distribution and range of ratings, so that participants find results that are genuinely meaningful and useful.