Research

Iris Bohnet, Oliver P. Hauser, Ariella S. Kristal. Can gender and race dynamics in performance appraisals be disrupted? The case of social influence. Journal of Economic Behavior & Organization, Volume 235, 2025. 

The issue with bias in performance ratings 

Performance reviews are supposed to reward merit. In practice, they often rely on subjective judgments that can be swayed—consciously or not—by an employee’s demographic characteristics, as well as by design choices in the review process itself. One common feature is that employees submit a self-evaluation that their manager can see before assigning a final rating. That sounds fair—people get a “voice”—but it can also create an anchoring effect: managers may lean toward whatever score employees gave themselves, even when those self-ratings reflect self-stereotyping and social pressures rather than true performance. 

These ratings matter a lot. They influence pay, promotions, project assignments, and even who stays or goes. When bias creeps in, the consequences compound over time, widening pay and opportunity gaps. Understanding where bias originates—employees’ own self-ratings, managers’ perceptions, or both—helps organizations choose the right fixes. Sometimes the problem is confidence and self-presentation; sometimes it’s discrimination by evaluators; often, it’s an interaction of the two. 

What the research says 

A new field study by Iris Bohnet, the Albert Pratt Professor of Business and Government at HKS, and coauthors examined four annual review cycles (2015–2018) at a multinational financial services firm, including a rare “natural experiment.” In 2016, a technical glitch in the firm’s software meant managers could not see self-evaluations before rating employees. That one-year disruption let the researchers test how much self-ratings shape final scores. It also let them test whether the effects differ by the two demographic characteristics the firm collected: gender and race.

The researchers report three core findings: 

  • Self-ratings differ by identity. Across all years, women rated themselves lower than men, and women of color gave themselves the lowest self-ratings of all. This aligns with prior evidence that social norms and anticipated backlash can dampen self-promotion among women—especially women of color. 

  • Managers’ ratings show consistent race gaps and a modest gender effect. Managers tended to rate people of color lower than white employees, with the sharpest penalties in the United States, particularly for Black employees. At the same time, managers marked women down slightly less than men relative to their self-evaluations, which softened the gender differences created by self-ratings. Crucially, this “adjustment” did not offset the race penalty: women of color still ended up with the lowest final ratings.

  • Hiding self-ratings reduces anchoring but doesn’t erase disparities unless rating history is also absent. In 2016, when managers couldn’t see self-evaluations, average ratings fell and were less correlated with self-ratings. This is evidence that self-ratings normally exert social influence on managers. Yet the gender and race gaps barely budged. The reason? The data suggest that managers anchored on the previous year’s ratings instead. When the researchers looked at newcomers with no prior ratings to anchor on, the pattern shifted. Women of color fared notably better when their self-ratings were hidden and no historical scores were available, landing on par with white women and men. Men of color, who tended to give themselves higher ratings, lost that advantage when managers couldn’t see those self-ratings.

Organizations may be able to use two levers to address these problems. If self-ratings amplify identity-driven differences in self-presentation, businesses could consider withholding self-evaluations from managers until after managers submit their initial ratings. Organizations can pair this strategy with clear criteria and calibrated rubrics to limit reliance on vague impressions or historical anchors. But race gaps in manager ratings persisted even without self-ratings. Organizations therefore also need manager-focused interventions: structured evidence requirements, bias interrupters during calibration, and audits of outcomes by demographic group, with accountability for unexplained gaps.
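As a rough illustration of the auditing step, the Python sketch below groups review records by demographic group. For each group it reports the average self-rating, the average final rating, the average manager adjustment (final minus self), and the gap in final rating relative to a chosen reference group. It is a minimal sketch, not the researchers' method. The `Review` record, its field names, the `audit_by_group` function, and the group labels and ratings in the usage example are all hypothetical, invented for illustration rather than taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    # Hypothetical record layout; field names are illustrative, not from the study.
    group: str            # demographic group label chosen by the organization
    self_rating: float    # the employee's self-evaluation
    final_rating: float   # the manager's final rating


def audit_by_group(reviews, reference_group):
    """Summarize ratings per group and compare mean final ratings to a reference group."""
    by_group = defaultdict(list)
    for review in reviews:
        by_group[review.group].append(review)

    summary = {}
    for group, rows in by_group.items():
        summary[group] = {
            "n": len(rows),
            "mean_self": mean(r.self_rating for r in rows),
            "mean_final": mean(r.final_rating for r in rows),
            # Average manager adjustment relative to self-ratings (negative = marked down).
            "mean_adjustment": mean(r.final_rating - r.self_rating for r in rows),
        }

    # Gap in mean final rating versus the reference group (negative = rated lower).
    reference_mean = summary[reference_group]["mean_final"]
    for stats in summary.values():
        stats["gap_vs_reference"] = stats["mean_final"] - reference_mean
    return summary


if __name__ == "__main__":
    # Invented toy data for illustration only; these are not results from the study.
    toy_reviews = [
        Review("group A", 4.0, 3.5),
        Review("group A", 3.0, 3.0),
        Review("group B", 3.0, 2.5),
        Review("group B", 2.0, 2.5),
    ]
    for group, stats in audit_by_group(toy_reviews, reference_group="group A").items():
        print(group, stats)
```

Raw gaps are only a starting point. Judging whether a gap is “unexplained” would also require accounting for legitimate differences such as role, level, or tenure, and checking whether the gap exceeds chance variation. This sketch attempts neither.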

The research suggests that hiding self-ratings can curb one source of bias, particularly for newcomers who might otherwise undersell themselves. But to address persistent race penalties, companies must also redesign how managers evaluate and calibrate performance. 
