I've been reading reviews on Destructoid for a long time now; probably since they started doing 'em. About a year ago, amidst much debate about the validity of review scores from various sites (continuing today still), Aaron Linde posted the first Destructoid Review Guide, and there was much rejoicing.
At least there was for a short while. It wasn't long until the complaints started back up again. People claimed that games were being rated too low solely to get hits, rather than to reflect the quality of the games. So a few months ago, Jim Sterling released the Destructoid Review Guide Version 2.0, reminding us that yes, Destructoid editors use the full ten point scale, and that they don't rate games low just for controversy.
I have collected data on the past year of reviews, since Aaron posted the first review guide, up through the most recent review on Mortal Kombat vs. DC Universe, and as I will show, not only are they not rating games lower than they should, it is quite possible that even the Destructoid editors overrate games, utilizing the maligned "IGN Scale" of 5 to 10. Without further ado, I present histograms of the review scores awarded by the Destructoid staff. The first histogram (at the top of the page) shows the data between Aaron Linde's guide posting and Jim Sterling's guide posting, the second shows the data between Jim's review guide and today, and the third shows all of the data since the first review guide was posted.
The first chart shows an interesting, almost linear relation of score to occurrence between the scores of two and nine, with each subsequent score being more common than the last. It shows only two ones awarded, both of which were for Eternity's Child. It should be noted that the half-point scores generally occur less often than the whole-number scores.
The second chart doesn't contain quite as many data points, so it isn't as smooth as the first. It still shows a surprising trend: eight and nine dominate the chart, each with twice as many occurrences as the next most common score. Only two games scored between zero and two (Facebreaker and Warriors Orochi 2), whereas thirty-four games scored between eight and ten.
The third chart is just the sum of the first two. With the most data points, it shows an obvious bias toward the right side of the chart, or the higher review scores.
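For anyone who wants to build the same kind of chart from their own list of scores, here is a minimal sketch in Python. It is not how the charts in this post were produced, and the scores in it are invented purely for illustration; the function names `tally_scores` and `share_at_or_above` are hypothetical helpers, not part of any library.

```python
from collections import Counter

def tally_scores(scores, step=0.5, low=0.0, high=10.0):
    """Count how many reviews landed on each score from `low` to `high`."""
    counts = Counter(scores)
    n_bins = int(round((high - low) / step)) + 1
    bins = [low + i * step for i in range(n_bins)]
    # Scores that never occur still get a bin, with a count of zero.
    return [(b, counts.get(b, 0)) for b in bins]

def share_at_or_above(scores, threshold):
    """Fraction of reviews scoring at or above `threshold`."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

# Hypothetical scores, invented for illustration; NOT the Destructoid data.
example_scores = [9.0, 8.0, 8.5, 7.0, 9.0, 4.0, 8.0, 6.5, 9.5, 2.0]

# Print a crude text histogram, one row per half-point score.
for score, count in tally_scores(example_scores):
    print(f"{score:4.1f} | {'#' * count}")

print(f"Share at 7 or above: {share_at_or_above(example_scores, 7.0):.0%}")
```

Keeping the empty bins in the output is deliberate: a blank left half of the chart is exactly the kind of thing these histograms are meant to expose.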
So what causes this? Frankly, I could come up with a number of reasons. While Destructoid is large, it still isn't IGN or Gamespot, whose employees review and score every single game that is released, and so the Destructoid review team is probably more likely to play and review only the games that they are likely to enjoy. It could be that in general, games that come out these days are mediocre at worst, and really awful games only come out every once in a while. Or, the most dire possibility: perhaps some of the reviewers don't quite understand the review guides themselves, and are still grading on a five to ten scale.
Jim Sterling wrote in his review guide, "We try and review games based on what we like or know about most so if, for example, Colette rates Chocobo Cloud Smiles 2: Chocobonkers a 2 out of 10, you know it's got to be bad." While that's true, this review philosophy is potentially faulty, because if, for example, Colette rates Chocobo Cloud Smiles 2: Chocobonkers an eight out of ten, then it doesn't really tell us anything. Sure, a chocobo fanatic loved it, but what does that mean for the rest of us?
Let's step back for a moment and consider the second point though. I've given it a bit of thought at this point (in case the graphs weren't indication enough), but how should the histograms be shaped? Do we consider a score of five to be "average" or "mediocre"? If a score of five means that the game is average, then the histogram should show a nice bell curve, with four, five, and six being most common, while zero and ten are least common. If a score of five means the game is mediocre, or in other words that it is as good as it is bad, as enjoyable as it is unenjoyable, as exciting as it is boring, then it is possible for the histogram to be shaped in any way imaginable. I bring this up because depending on how one defines the full scale, there may not be anything necessarily wrong with the review scores, no matter how interesting.
Still, I wanted to more fully explore the possibility that some editors are more "at fault" for the score buffing phenomenon apparent in the total histograms. So for every editor in my data set who has written ten or more reviews, I have tabulated separate histograms. These follow, with commentary for each.
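The per-editor breakdown could be sketched along the following lines in Python. This is an illustration under stated assumptions rather than the method actually used for this post: the reviewer names and scores are made up, and `summarize_by_reviewer` is a hypothetical helper. It groups scores by reviewer, keeps only reviewers with at least ten reviews, and reports a few of the figures discussed in the commentary (average, lowest score, and number of scores below five).

```python
from collections import defaultdict
from statistics import mean

def summarize_by_reviewer(reviews, min_reviews=10):
    """Group (reviewer, score) pairs and summarize reviewers with enough reviews."""
    by_reviewer = defaultdict(list)
    for reviewer, score in reviews:
        by_reviewer[reviewer].append(score)

    summary = {}
    for reviewer, scores in by_reviewer.items():
        # Skip reviewers with too few reviews to say anything about.
        if len(scores) >= min_reviews:
            summary[reviewer] = {
                "count": len(scores),
                "mean": round(mean(scores), 2),
                "lowest": min(scores),
                "below_five": sum(1 for s in scores if s < 5),
            }
    return summary

# Hypothetical reviewers and scores, invented for illustration only.
example_reviews = (
    [("Reviewer A", s) for s in [9, 8, 8.5, 7, 9, 4, 8, 6.5, 9.5, 7.5]]
    + [("Reviewer B", s) for s in [5, 6, 3]]
)

for reviewer, stats in summarize_by_reviewer(example_reviews).items():
    print(reviewer, stats)
```

The ten-review cutoff is the `min_reviews` parameter, so it is easy to see how the picture changes with a stricter or looser threshold.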
Arguably one of the most called out of the review staff for giving undeserved low scores is Reverend Anthony. His scores, however, are the most spread out of all of the editors highlighted here. Despite his negative infamy, his average score awarded is 6.71, nearly two points above an average score of five. Some points to note are that he has awarded as many threes as he has seven-point-fives and nine-point-fives. Even though his is the most spread out, it is still biased toward the high end, with only one game getting a review score between zero and two-point-five.
Brad's histogram is undoubtedly the most like a bell curve centered around five out of any of the reviewers. I've become particularly interested in his reviews after seeing these data, because it shows a lot of promise for the proper use of the one to ten scale, but it really needs more data points in it before we can draw any conclusions. One thing you may note is that he almost never awards half-point scores, which I really prefer.
And now we begin to see the more telling data of skewing toward high scores. Aside from two games (Baroque and Infinite Undiscovery), Colette has not given any games a score less than seven. Her average score awarded is 7.38, and her most common scores awarded are seven-point-five, eight, and nine.
Dale's histogram is almost identical to Colette's, with one game getting a four (Zoids Assault), and no others receiving a score less than six. His average score awarded is 7.80, with his most common scores being eight and nine.
Among all of the charts I have prepared, Dick McVengeance's (Brad Rice's) has probably the most interesting shape. It's clearly bimodal, but I'm not sure what we can conclude from that. It's as if he is actually reviewing on a binary scale; either he likes a game or he doesn't. Even so, his average score is a full point and a half above five, and his most commonly awarded score is eight-point-five.
Jim and Anthony are tied for total number of reviews within this data set at twenty-six reviews each in the past year (that's one every other week on average, as it turns out), and Jim is another common target for accusations of doling out lower scores than were deserved. However, his histogram is not that different from Colette's or Dale's, except that he has given a total of four games scores less than five. It is still pretty heavily biased toward the right, with an average over a point and a half above five, and most common scores of seven and nine.
I'll be honest; part of the reason I was motivated to look at individual editors' scores in this analysis was because I feel that Jonathan is not critical enough of the games he reviews. The histogram really speaks for itself (but that won't stop me from pointing out various parts). The lowest score he has awarded in the past year of reviewing is a six (for Samba de Amigo), and it's the only score he has given out less than seven-point-five. His average is way up at 8.44, and his most commonly awarded score (twice as common as the next most common scores) is a nine.
Lastly, we've got the histogram for Destructoid's Editor-in-Chief. It's like Anthony's in that his scores are fairly evenly distributed (aside from an overabundance of eights), but it's unlike Anthony's in that it appears Nick is using the IGN five-to-ten scale. The entire left half of the graph is blank, and his average score is 7.61, just about halfway between five and ten.
So there you have it. What do you think about this? Is the high score biasing because the reviewers review games that they are more likely to enjoy? Is it just that some people are extremely enthusiastic about games and they give everything a high score? Is it a case of the reviewers not adhering to the two posted review guides? And perhaps the most important question: is this an issue that requires addressing, or should we just go on as we have been, calibrating our judgment of scores based on the author or ignoring them altogether?
FOOTNOTE: This analysis is not meant as an attack on any of the reviewers. I love them all as people, and most of them as writers too!
Damn impressive work dex! Front page this!!
I agree with those saying they mostly review good games. If they're buying the games they review, I'd expect them to buy good games or ones they really want and expect to be good.
Hooray for second page. This was a great analysis. I try to put more stock in the review itself rather than the number at the end but it is interesting to see how it plays out. Also, I've always known Brad Nicholson rocks. I didn't need math to tell me.
Oh snap, MaxVest enters the fray. Honestly, it might be better for a mathematician such as yourself to take care of this. I'm just a chemist who had an idea and went with it. I can send you my Excel file if you want.
really good job and raised some good points. wish you could do a community review chart just for shits and giggles.
Here are some useful links related to this.
An analysis of GameSpot review scores
An analysis of IGN review scores
Dude, hell of an effort and very well thought out.
God, I wish I had your willpower.
Wait, Twilight Princess got a 4?!
@ Dexter345
Kudos on that 10 shitloads reference.
Damn man!
I think, considering the number of reviews that Destructoid handles, those charts show a pretty decent spread.
Jim and Anthony have come off more critical scoring wise, in my unscientific estimation.
Maybe most games really are just decent to amazing? Perhaps the tendency toward 7 is a testament to real quality in the industry? As much as I personally don't want to believe that, maybe the numbers are telling us that. I mean, maybe all things are even, once a large array of personal opinion is factored into tallied scoring: one man's 9 is another man's 6. Somewhere in there, it comes up to a 7.5ish.
I'd love to see the metacritic histogram.
Very intriguing. *strokes stubble*
Great work. :)
Holy shit, great read, you obviously put a lot of work into it!
Probably one of the most thought-provoking blogs I have ever read. Also, I would make a distinction between an average game and a mediocre game. Most people would say mediocre is a 5, and that the average game is not mediocre, hence the 7-10 scale. I think to fully use the whole 1-10 scale we have to put average games at a 5 and put mediocre games at more of a 3-ish.
Great work on this Dex, it's indeed VERY interesting to see these histograms.
More like ten shitloads
This is an excellent effort, but I do think that the analysis is perhaps more detailed than the data itself. I think it's too early to tell but I think that your early explanations of any possible bias seem to be pretty good - that a) Destructoid has more choice of what to review than some sites and b) Most major games managed to at least be mediocre before getting released.
But apart from that, review policy aside, it may just be that some reviewers are more likely to be forgiving of faults than others - and that's okay, too. After all, we mustn't lose sight of the fact that reviews are not and SHOULD not be objective.
There's a ZERO on the dtoid scale?
I've always preferred a 5-point scale to 10. Lets you know if a game is average, more, or less than, and encourages better distribution of scores.
I lol'ed when I saw the histograms. Then I left
Fantastic work, Dexter. I'm proud to have been part of the inspiration for this piece. Maybe now Jim will listen to my pleas for review copies of games I'm less likely to enjoy. Trust me, if they start giving me sports games to review, you’ll see how low I can score.
The one place I disagree with you though is on the differentiating between "experts" and "fans". Just because I know the Mega Man series inside and out doesn't mean I'm going to give every Mega Man game a good review. I like to think that because I've played so many Mega Man games, I know what makes the games work and what doesn't, and will not be biased in my report.
Videogames are so diverse in style these days. The only other form of media that can compare in terms of sheer diversity is spectator sports, and just as it wouldn’t do you much good to have an expert on curling do commentary during a football game, it wouldn’t help you to have an FPS fan review Mega Man 9. You want a football expert for that job, just as you want a Mega Man expert to review Mega Man 9.
Sure, the football expert is obviously going to like football more than other sports, but that doesn't mean he can't have a critical eye towards the sport he loves. If he’s impartial and unbiased (as all experts should aim to be), then he’s going to use his interest in his specialty to make a more informed analysis on his chosen subject, while being careful to stay objective.
Having an editor who doesn't like (and therefore doesn't know) Mega Man games review Mega Man 9 wouldn't make any sense. I was critical towards the parts of the game that didn't work for me, but on the whole it is one of the best Mega Man games ever, so it got a 9. Is it really necessary to have a reviewer tell you "If you don't like Mega Man games, then you won't like Mega Man 9, because it's a Mega Man game, so you won't like it”? I agree that it never hurts to have a second opinion, but the first opinion should always come from the reviewer with the most expertise in the game in question.
Also, just wait for my Strong Bad episode 4 review, which should be posted any second now.
You sir, are a god among spreadsheets.
I think it comes down to game choice.
For example, if Destructoid made it their personal goal to rate every game released for the Wii, I don't think we'd be seeing a bell curve centered on 5.
@Joseph Leray: When I think of a 5 as average, it doesn't mean mediocre, but what it does mean to me is undesirable. I don't want to play a game that is just as bad as it is good. I want to play a good game.
The technical term is "mathmagician". And I'd be interested to see what the landscape would look like if game scores were recentered with a median of 5 and replotted on a typical bell curve.
There's just all sorts of ways to play with these numbers. I'm so excited! I'm so excited! I'm so... scared.
You're as biased as Jim Sterling. Biased biasers are biased and should stop being so biased.
Also, bias!
Nice Dexter. This must have taken forever.
I think Destructoid needs to start reviewing the "Imagine ...iez" series of games to bolster the lower scores. You should even out those histograms in no time.
Or possibly review all Wii Ware and XBLA games. That would skew it the opposite way in a week.
Dexter is a scientist!
BOOOOOOOO
really fun nerdy work there!!! but since we're talking stats I think a big issue would be that your N is too small for pretty much everyone except for Anthony and Jim. Also, there are probably some huge self-selection bias issues as well; as Niero pointed out, not all reviewers are reviewing games that came as review copies. Many are reviewing games they wanted to purchase in the first place. That doesn't mean the review is an automatic high score but it does mean there is some kind of pre-existing affinity. Finally there are also cases of consciously assigned bias where reviews are given to particular connoisseurs of a series where their previous knowledge would be considered to give them a unique insight at the cost of, perhaps, a truly "objective" (hint: impossible) perspective; Jim reviewing a Dynasty Warriors game, or Colette getting a Final Fantasy title.
In the end, I think almost everyone would agree that if #s could be done without, things would probably be better. How many times have you seen people get into a ridiculous shouting match over a point or two in the score saying "it doesn't match the text!" or screaming about some bias or another? Some people might read a review seeing it as predominantly positive and gloss over the bits that are critical because they don't really give a shit about those bits in the first place. For example, a reviewer might dock many points for things such as short game length or subpar graphics. But personally, game length and pure visual power don't really mean that much to me so those things won't register as important for me as say story, gameplay or control. Everyone has different criteria not only in writing reviews but in reading them as well and the text is a much more versatile expression than the number.
DMV just likes playing bad games, so he seems to jump at the chance to review them more. I still haven't gotten him to admit that he's lying to himself about wanting to play more Red Steel.
All in all, great analysis, and it'll be interesting to see where we are another year from now.
Hmm, I looked into this myself. Very interesting, but I feel you have under-played the fact that most reviewers will pick games they know they are likely to enjoy, hence the resulting above average means.
The only other point I would make is that the answer to your question "but what does that mean for the rest of us?" is simply to direct you to the paragraphs of text above the score.
The score tells you the general feeling the writer had for the review, but the text allows them to elaborate.
Interesting results though :)
Holy fucking shit. You're a fucking God, man.
Great read. One of the best c-blogs I've seen.
I have always loved the idea of no number scores (no scores, i guess) for game reviews.. that would be totally tits. For me... well I always seem to give too much thought to the number attached to the review... I can't explain why.. I mean, I don't want to... but once I see that number it makes a huge impact on me.. I think Linde was talking about this once. It would be nice to see the games explicated without giving them a score. Could it still be a review without the number? Maybe with a pro's and con's section.
Just a thought.