How much information can you actually glean from a five-star rating? | iStock/Mack15
Most of us do it reflexively.
If we see a slew of five-star reviews for a restaurant or vendor on eBay, we assume it really is excellent. Unfortunately, online customer ratings increasingly resemble the old slogan for Lake Wobegon, where “all the women are strong, all the men are good looking, and all the children are above average.”
A study several years ago found that 90% of the sellers on eBay had almost entirely positive ratings. At Airbnb, 95% of hosts had average ratings of at least 4.5 out of five stars. From Yelp and Amazon to the Upwork platform for freelance workers, average ratings are way above average.
“Ratings inflation is bad for everyone involved,” says Nikhil Garg, a PhD candidate in electrical engineering. “If everyone gets perfect ratings, neither the platform nor the consumers can identify the best sellers.”
Garg recently teamed up with Ramesh Johari, associate professor of management science and engineering, in a study that tested a surprisingly simple alternative to number- or star-based ratings: Get people to make evaluations in words.
The Stanford researchers studied this approach at a large website for hiring freelance workers. Using a standard numerical scale, employers had given 80% of freelancers a five-star rating, signifying that they were essentially perfect.
But when the researchers enabled employers to evaluate workers with six different word-based scales, the results changed markedly.
For instance, employers were asked to choose adjectives ranging from “terrible” or “mediocre” to “phenomenal” and “best possible!” Another option asked employers to rate how the worker compared with their expectations, with choices ranging from “much worse” to “beyond what I could have expected.”
Sure enough, ratings were substantially less buoyant on all six of the word-based scales. Less than 40% of the workers were given a top rating, and at least half received the verbal equivalent of a two-, three- or four-star review.
Why would people respond more skeptically, or honestly, when they rate with words rather than stars? Garg and Johari suspect that many if not most people find it painful to deliver harsh or even lukewarm reviews. “Giving a bad rating is often unpleasant,” Garg says. “If you’ve just had an extensive work relationship with someone from a freelancing platform, or they just welcomed you into their home or their car and you talked to them, you want to give them a positive rating.”
On top of that, he adds, customers don’t know what a four-star or five-star rating means. They interpret a high numerical rating to mean average, acceptable, or simply that nothing went wrong. As a result, most numerical ratings are clumped near the very top, with a smaller cluster of negative ratings near the bottom.
Garg says rating people with words tends to be more meaningful to consumers and reduces the clustering effect. As with one-star ratings, very few of the verbal ratings fell at the bottom of the scale, but customers were comparatively reluctant to use superlatives like “phenomenal” or “best freelancer I ever hired.”
“They don’t want to say that an average software developer is really the best they’ve ever worked with,” Garg says, “because they understand that that probably hurts the superstars.”
Garg and Johari also found that word-based ratings appear less vulnerable to “ratings inflation.” A number of earlier studies had found that average ratings on online platforms tend to climb over time: if a very high percentage of ratings are five stars, more customers come to see a five-star rating as the norm, and the overall averages creep steadily higher.
With the word-based reviews, however, the average ratings stayed basically in place. If that holds up over time, the reduction of rating inflation could lead to substantially more accurate reviews, which is great news for sellers and consumers alike.