Destructoid - TSuereth's Community Blog



About Me
Once I was stuck in traffic, in the middle of an intersection, blocking another lane. When my route was finally clear and I was able to drive through, a man yelled out his window at me: "It's all about you, eh asshole?"

He was right.
A Modest Review Proposal
TSuereth | 3:03 AM on 12.17.2009 4 comments


Re: Why do we use review scores?

I completely agree that review scores are easy, painless, and extremely convenient, even if they are an imperfect metric of a game's worth. I myself put non-trivial stock in Metacritic averages, and will sometimes skip a review's text in light of an exceptionally high or low score (hey, I gots things to do!).

But where would the human race be if we settled for convenient, imperfect solutions? (Your mother's house, that's where!) So in the interest of bettering not only video game journalism, but humanity itself, I hereby put forth my proposed replacement for game review scores.



The remainder of this post is essentially a poorly-written whitepaper, so if you just want to post a derogatory comment about my face, feel free to skip ahead.


---


The fundamental problem that review scores attempt to solve is reducing a game's worth to as succinct and understandable a measure as possible. Anyone possessing remedial math skills can look at a 7/10 rating and see that it's "better" than a 5/10 rating. But trying to derive any further meaning from a numeric score can be tricky:

1) If Game A has a 7/10 from IGN, does that make it better than Game B, with 6/10 from Destructoid?

This is a problem that Metacritic and other review aggregators go a long way toward solving, as they smooth out the effect of each outlet's numeric range. But it is still difficult to separate the effect of a critic's opinion from the effect of an outlet's skew on review scores in general.

2) Is Game C, released in 1999 with an 81% average, better than Game D, released in 2009 with an 80% average?

Our expectations change as genres, technology, and development practices move forward. Last year's award winner often becomes this year's par-for-the-course, and next year's trash. But not always: some games still seem almost timeless. There's no standard rubric to adjust a game's review score based on its age, never mind newly-published reviews of older games.

3) If Game E, a racing game, gets 90/100, is it better than Game F, an action-adventure, which got only 85/100?

This is really an unfair question, since it assumes a 1:1 mapping between "good" and "bad" across genres. I would make the argument that all racing and sports games suck, unless they have banana peels; whereas a racing game fan might make the argument that I am a dick. While there are some aspects of disparate games that can be directly compared (graphics, voice acting, etc.), in general, their numeric differences are not meaningful. The score for an "average" football game is completely different from the score for an "average" platformer.

4) Is Game G exactly 'average' if it receives a 5.0/10? That is, can it be assumed that half of all games are better, and half are worse?

Of course, we all know this is bullshit. Individual review outlets can assign textual descriptions to their numbers, but these become meaningless in view of other outlets' different numbers and descriptions. Furthermore, the idea that a game can be explained as purely a number, with no context, is utter nonsense; the number is useless except when compared with other numbers.

It is this last question that inspired my proposed system, which I'll call Relative-Comparative Ranking (RCR for short). I assert that it is pointless to attempt a holistic measure of game worth, independent of comparison; it is impossible to say a game is "good" or "bad" without some contextual basis, e.g. "better than this game" or "not as good as that game." At any rate, a game consumer doesn't know what to expect from a game rated 80 until he's played a 70, or a 90, et cetera.

In so many words, my assumption is that a succinct game-worth descriptor need not be meaningful purely on its own. So with that in mind, why bother with the artifice of numbers? My RCR proposal is to describe games as better than (or not as good as) other games.

For instance, if I were to review the this-gen slash-em-up Conan, I might describe it as better than Vexx, but not as good as New Super Mario Bros. A user could view Conan's ranking page, and plainly see that it's more fun than a terrible 3D platformer from 2003. Or, he could view the Vexx ranking page, and note that it's worse than a mediocre action game from 2007.

If the user has played any of these three games, he can roughly gauge the quality of the other two, based on my and other reviewers' ranking votes. Given a large enough pool of users, all with their own distinct genre tastes, the effects of those tastes will filter out - leaving (presumably) the real, measurable differences in quality between games of unlike genres. This can be construed as an answer to (3), above: as much as unlike games can be compared, the ranking system makes the comparison direct, rather than using unreliable numbers.

Rankings become infinitely more meaningful, though, when games are compared to like games, which is just the kind of comparison you'd expect a critic to make anyway. Regarding the Conan example, rather than comparing the game to Vexx and Mario, I'm in fact much more likely to say that Conan is better than Golden Axe: Beast Rider, but not as good as God of War. Now, chances are pretty good that any fan of beat-em-up games can use this ranking information to make a reliable purchasing decision. Since these rankings are non-temporal - a 6.0 may not always be better than a 5.0, but Conan will always be better than Beast Rider - this handily solves issue (2).

It's also desirable to rank games using votes from some degrees of separation away - e.g., if there are no votes comparing Modern Warfare to The Conduit, but there are known MW vs. MW: Reflex rankings, and known MW: Reflex vs. Conduit rankings, some second-degree inference may be made. Hence, an answer to (1).
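As a rough sketch of what that second-degree inference could look like (not a finished algorithm), the snippet below treats majority votes as "better than" arrows and counts how many arrows it takes to get from one game to another. The function names are hypothetical, and the vote counts and their directions are invented purely for illustration.

```python
from collections import defaultdict, deque


def build_graph(votes):
    """votes maps (better, worse) -> vote count. Keep only majority opinions."""
    graph = defaultdict(set)
    for (a, b), count in votes.items():
        if count > votes.get((b, a), 0):
            graph[a].add(b)  # most voters rank a above b
    return graph


def degrees_of_separation(graph, better, worse):
    """Breadth-first search: length of the shortest 'better than' chain, or None."""
    queue = deque([(better, 0)])
    seen = {better}
    while queue:
        game, depth = queue.popleft()
        if game == worse:
            return depth
        for nxt in graph.get(game, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


# Invented numbers: nobody has compared Modern Warfare to The Conduit directly.
votes = {
    ("Modern Warfare", "MW: Reflex"): 5,
    ("MW: Reflex", "Modern Warfare"): 1,
    ("MW: Reflex", "The Conduit"): 4,
    ("The Conduit", "MW: Reflex"): 2,
}
graph = build_graph(votes)
print(degrees_of_separation(graph, "Modern Warfare", "The Conduit"))  # 2
```

A result of 2 means the comparison is only available second-hand, through one intermediate game; None means no chain of votes supports it at all.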

Embellish these rankings with short, tag-like descriptors of a game, covering its genre, platform, key features, etc., and it becomes trivial to computationally construct meaningful rankings based on category. It would be simple to determine the "best" racing game, or the "best" Zelda game, or the "best" game starring Nolan North. This answers issue (4), in that it is easy to see how a game has placed in the grand scheme of things. It could even serve as a mechanism for a novice gamer to pick a good starting point; if a game consumer's never played a Zelda before, he could check the series ranking and instantly go for one near the top of the list.
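Here is a minimal sketch of a category ranking, assuming each game carries a set of tags and that games are ordered by the share of head-to-head votes they win against other games with the same tag. That win-share measure is my own stand-in, not a settled part of RCR, and the games, tags, and vote counts are made up.

```python
def rank_by_tag(tags, votes, tag):
    """Rank games carrying `tag` by share of votes won against same-tag games."""
    games = {game for game, game_tags in tags.items() if tag in game_tags}
    wins = {game: 0 for game in games}
    total = {game: 0 for game in games}
    for (better, worse), count in votes.items():
        if better in games and worse in games:
            wins[better] += count
            total[better] += count
            total[worse] += count

    def share(game):
        return wins[game] / total[game] if total[game] else 0.0

    # Highest win share first; ties broken alphabetically for stable output.
    return sorted(games, key=lambda game: (-share(game), game))


# Hypothetical data: votes[(better, worse)] = number of votes.
tags = {
    "Game A": {"racing", "wii"},
    "Game B": {"racing"},
    "Game C": {"racing"},
    "Game D": {"platformer"},
}
votes = {
    ("Game A", "Game B"): 3,
    ("Game B", "Game A"): 1,
    ("Game B", "Game C"): 2,
    ("Game D", "Game A"): 5,  # cross-genre vote, ignored for the racing list
}
print(rank_by_tag(tags, votes, "racing"))  # ['Game A', 'Game B', 'Game C']
```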

The trick of RCR is, naturally, the implementation, which I haven't completely figured out yet. A straightforward approach to first-degree comparative rankings could be pretty easy: a massive table with every recorded game on each axis, and ranking votes where two titles intersect. But how should second-degree comparisons, and farther out, be decided? What algorithm, and what weight, should be applied to these rankings? And what determines statistical significance - if I go to a game's ranking page, how many times will the entire table have to be searched, to determine what comparisons are most meaningful?
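For the first-degree part, the "massive table" doesn't need to be stored in full, since most pairs of games will never get a vote. A minimal sketch, assuming a sparse dictionary keyed by (better, worse) pairs; the class and method names are hypothetical and the votes are invented:

```python
from collections import defaultdict


class VoteTable:
    """Sparse stand-in for the game-by-game table: only voted pairs are stored."""

    def __init__(self):
        # votes[(a, b)] = number of votes saying a is better than b
        self.votes = defaultdict(int)

    def record(self, better, worse):
        self.votes[(better, worse)] += 1

    def compare(self, a, b):
        """Return (votes for a over b, votes for b over a)."""
        return self.votes.get((a, b), 0), self.votes.get((b, a), 0)

    def preference(self, a, b):
        """Share of first-degree votes favoring a over b, or None with no votes."""
        wins, losses = self.compare(a, b)
        total = wins + losses
        return wins / total if total else None


table = VoteTable()
table.record("Conan", "Golden Axe: Beast Rider")
table.record("Conan", "Golden Axe: Beast Rider")
table.record("Golden Axe: Beast Rider", "Conan")
table.record("God of War", "Conan")

print(table.compare("Conan", "Golden Axe: Beast Rider"))  # (2, 1)
print(table.preference("Conan", "Vexx"))                  # None
```

This only covers direct comparisons; the open questions about farther-out comparisons, weights, and significance are untouched by it.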

Ultimately, getting the implementation right requires much further thought, as well as practical experimentation, and a significant set of data to play around with.


---


When I first thought of this system, I'd intended to further design and test it for my personal game site (citation needed); but I gave up before reaching any sort of functional prototype, having no significant userbase to use for algorithm testing. Also, laziness. I'd still love to try this someday, though.

TLDR - review scores are good because the alternatives are kind-of fuckin' complicated.




3 comments | showing # 1 to 3

mistic - Comment posted on 12/17/2009 08:21
The main problem I see with that is that nobody likes the same games...

For instance Jim HATED AC2 while he loved The Saboteur, so if he says it's better than AC2 and it gets read by somebody that totally LOVES AC2, the person reading it will get the complete opposite idea... It would also be virtually impossible to work together with users to create a 'rating'-system since there'll always be people that absolutely hate or love a certain game, SoTC being a nice example of this... It's really highly rated generally but there's certainly a lot of people hating on it...

I like the idea, but it's just not possible I think. Take racing games for example: depending on what kind of player you are, you'll like totally different games. A sim fanatic will love Forza3 and GT5(P) while an arcade-racer will tell you they suck cause they go too slow, you can't corner while using NOS etc. This holds true for most types of games; our market is so diversified and we gamers have such different tastes that it would be impossible to deliver a meaningful result for anyone...

Even if you get a HUGE userbase ( like dtoid ) the results would be screwed up in the end, there's too many fanboys/haters for some series... In general a lot of dtoidpeople hate HALO ( I kinda like it though ) so a random visitor would come in, read an FPS-review, hear it's better than HALO ( which could mean "still pretty fucking bad" ) and then think it is absolutely awesome... Plus we've all seen what happens when users get allowed to 'rate' games ( see metacritic disaster with userreviews )
confusionbomb - Comment posted on 12/17/2009 09:49
Huh, wha? No Professor, I wasn't sleeping, just resting my eyes.

That was a very well articulated, if dense article.
TSuereth - Comment posted on 12/17/2009 22:34
mistic:
À la Metacritic/Gamerankings, Game A vs. Game B rankings become more meaningful as averages from a large pool, hence the more popular opinion prevails in a published ranking. Second-degree and further rankings would be incorporated using weights not only proportional to their degrees of separation, but also proportional to the number of votes total - the net effect being that majority opinion (for a statistically significant sample size) propagates throughout ranking information.

I probably did a poor job of explaining this, as I haven't really figured out an algorithm for it yet, and also I was approximately 80% asleep when I proofread this blog. So, my bad.

As for comparisons of different types of games, just as with current numeric comparisons, they do become less meaningful. I contend that a straightforward "better/worse than" comparison would do a better job of illustrating qualitative differences than arbitrary numbers do - but yes, at some point, comparisons of unlike games must be recognized as not terribly informative.

Re: user votes, yes, rankings are still very susceptible to "gaming" by Joe User; I can picture, for instance, a concerted effort to vote Elmo's Letter Adventure above Modern Warfare 2 PC. Maybe, in a perfect world, Metacritic would buy my idea and implement it such that professional and semi-professional votes are ranked separately from the unwashed masses. Then I can retire and build myself a hot robotic wife.

So ultimately, sure, it's still not a perfect solution. I think it is worth trying, though, with the caveat that significant test data could be pretty difficult to come up with.



confusionbomb:
There will be a quiz.
