Yeah, I'm starting a blog with Fiddler on the Roof. Don't like it? There's the door.
A lot of my previous write-ups have focused on mechanics found inside of games, but today I decided to go into something else: a system that lives behind the scenes, but isn't directly used while playing. Matchmaking. There are actually a lot of different ways to match players of similar skill together, and man, do they ever get complicated. I’m not going to say that the subject matter is over my head, but the way it is presented anywhere else is what I call “pumpernickel”: it is very dry, and very dense. There’s a lot of Greek, and just general mathematical grossness. And when it’s me saying that? You know it’s bad. But with that, I wanted to talk about three methods.
This isn't relevant to anything here. Or anywhere. This is garbled math of use to no one, from one of the sources I was looking at. Scary stuff.
ELO
I just came to say "Elo"
The first method is the Elo method, often mispronounced “E. L. O.”, and not to be confused with the guys who sang Mr. Blue Sky. Can we talk about music instead? No? Math? Balls. So Elo ranking comes from a guy who really, really liked chess, with the last name of Elo (it’s his surname, not an acronym). The system has you starting with an arbitrary amount of points. When you play against someone else, the difference between your scores is the system’s prediction of who will win.
Getting a bit more complicated: each player’s skill level is on a bell curve, which, if you have been reading these for a while, means you know where this is going. The reason you are on a bell curve is because the ranking isn’t completely precise: you are expected to perform at your skill level, but maybe that day you are just on fire, or possibly hung over. And no, I don’t care to elaborate on whether that condition is supposed to aid or impede your performance. Basically, there can be exceptions to your skill; otherwise, there would never be upsets.
So sick of googling "bell Curve" for these
When talking specifically about chess and the system that Mr. Blue Skies set up, the bell curve that was used was very specific. The intention was that if two players were separated by 200 points of skill, the more advanced player should win 76% of the time. In my research I read 76% and found that to be quite odd... it doesn’t say if they figured out the math first and backed into 76, or started with roughly ¾ and backed into the curve, but either way isn’t clean, as it returns a standard deviation of about 283 units.

How does it do this? Any normal curve shares some characteristics, which can be solved forwards to get the cumulative distribution for a given “X”, or backwards to get an X value for a given percentage. This is what a z table actually provides, but fortunately I had Excel to back into my number! This 283 tells us that at a difference of 0 in skill, either player is equally likely to win; at 283 the better player is favored by an extra 34% (84% total); at 566 the odds increase (or decrease, depending on which side of a one sided ass beating you are on) to 47.7 over even; and at 849 your fate is sealed at 49.7, a 99.7% chance of losing.

Now, if you were curious how to figure this out yourself, you just need to draw an imaginary bell curve with that 283 standard deviation and place yourself at the mean, or center. Then, find your opponent’s rating on the horizontal axis and draw a line straight up from that point. The area under the curve to the right of that line is your chance of winning, and the area to the left is your opponent’s. In this example, a 50 point difference represents about a 7% edge (a 57% chance to win), so it is probably a close game. A 25 point difference would obviously be closer, but at that point you are matching people from a similar population anyway.
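If you would rather skip the z table and the Excel gymnastics, here is a short sketch of that calculation. The 283 standard deviation and the win percentages are the ones from the paragraph above; everything else is just the normal CDF, which Python’s standard library can do via `math.erf`.

```python
import math

# Standard deviation of the rating-difference curve, backed into from
# "a 200 point gap means the better player wins 76% of the time".
SIGMA = 283.0

def win_probability(rating_diff: float) -> float:
    """Chance the higher-rated side wins, given the rating difference.

    This is the normal CDF evaluated at rating_diff / SIGMA.
    """
    return 0.5 * (1.0 + math.erf(rating_diff / (SIGMA * math.sqrt(2))))

# Reproducing the numbers from the text:
print(win_probability(0))    # even match, 50/50
print(win_probability(200))  # the 76% benchmark
print(win_probability(283))  # one standard deviation, ~84%
print(win_probability(566))  # two standard deviations, ~97.7%
```

Plug in your own rating difference and you get the same answer the imaginary bell curve drawing gives you, without the drawing.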
The last piece of this puzzle is how your rank changes as you win or lose. The system that Elo uses is called a “K Factor”, and it is important to note that this number is debated to absolute death by people who are bad at the sport. Hah! Burn! But seriously, there are many ways to come up with a K Factor, but they all revolve around some key ideas. The first is that there is a floor and a ceiling to how many points you can gain or lose; for instance, chess sets this value to 24 per game. The next important step is how to distribute these points, and this is generally accepted (in my research) as the outcome minus your chance of winning, multiplied by the K Factor. The outcome is binary, so 0 is a loss and 1 is a win. If you are a 90% favorite and win, you get 10 percent of the K. If you are a supreme underdog (or not properly rated) and pull off the win, you can get nearly the full value. Some systems also have the K Factor change as you play on. For instance, chess has a larger K Factor for newer players, and as you play more games, the K Factor decreases.
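Putting the prediction and the K Factor together, a full rating update is only a couple of lines. This is a generic sketch of the standard Elo update (new rating = old rating + K × (outcome − expectation)), reusing the 283-unit curve from earlier; the K of 24 is the per-game cap mentioned above, and the 1700-vs-1500 matchup is a made-up example.

```python
import math

SIGMA = 283.0  # standard deviation of the rating-difference curve
K = 24.0       # maximum points exchanged in a single game

def expected_score(rating: float, opponent: float) -> float:
    """Predicted chance of winning, from the normal CDF."""
    return 0.5 * (1.0 + math.erf((rating - opponent) / (SIGMA * math.sqrt(2))))

def update(rating: float, opponent: float, won: bool) -> float:
    """Apply one game's result: gain K * (outcome - expectation)."""
    outcome = 1.0 if won else 0.0
    return rating + K * (outcome - expected_score(rating, opponent))

# A 76% favorite (1700 vs 1500) gains only a few points for a win...
print(update(1700, 1500, won=True))
# ...but bleeds most of the K for losing the upset.
print(update(1700, 1500, won=False))
```

One nice property falls out for free: whatever the favorite loses in an upset is exactly what the underdog gains, so the system stays zero-sum.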
This system is also used by League of Legends, incidentally, which does also use games played as an adjustment for the K Factor, although I couldn’t find the explicit formula. What are the advantages or disadvantages? Well, in theory new players will find their skill bracket quicker, for better or worse. Many people feel this is unfair at the early stage because the games are more random, so pushing you into a lower bracket doesn’t really represent your skill. These people are also sore losers. Another concern is “ELO Hell,” or being in “The Trench.” This is where the system is confident in your rank, so each game doesn’t contribute enough points to get you out of it. That being said, the system is what it is, and it has been good for chess for quite some time, with minimal infighting.
MTG
I wanted to take a little bit of time to talk about Magic: The Gathering, as it also used an Elo system in the past, with a K Factor of 8, if memory serves. However, the designers felt the system was putting too many constraints on the players and wanted to change it. The main problems they wanted to solve were: 1. players would sit idly at high Elo rankings to avoid losing precious points; 2. players would become nervous while playing, which would cause them to spend less money on cardboard crack; and finally 3. geographical concerns. If I play Magic every week against the same people, I don’t have an opportunity to change my bracket significantly. As a quick aside, League does solve for #1 by having Elo decay, forcing you to play a certain number of matches or else you lose your ranking. Surely no one would sit idly just to preserve a 10 game winning streak though, right? Right?
Back to Magic, they actually switched over to Planeswalker Points, their proprietary system. How do those work? If you lose you get 0 points. Tie, you get 1. Win, you get 3. And that’s it.
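Here is a sketch of how that scoring accumulates. The 3/1/0 values are from the text; the win rates and game counts are made-up figures for illustration, chosen to show that points only ever go up, so volume beats skill.

```python
# Planeswalker-style scoring: win = 3, draw = 1, loss = 0.
# There is no subtraction anywhere, so totals never decrease.

def season_points(wins: int, draws: int, losses: int) -> int:
    return 3 * wins + 1 * draws + 0 * losses

# A strong player with a 75% win rate over 40 games...
strong_casual = season_points(wins=30, draws=0, losses=10)
# ...is buried by a weaker player with a 40% win rate over 300 games.
weak_grinder = season_points(wins=120, draws=0, losses=180)

print(strong_casual, weak_grinder)  # 90 vs 360
```

The grinder ends up with four times the points despite a losing record, which is exactly the complaint coming up next.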
This, in my opinion, is a terrible system for anything more than ego stroking. Let me explain: first of all, it assumes that games played equals an increase in skill, which is completely incorrect. One of my DOTA-loving co-workers has a friend who refuses to learn from his defeats. He refuses to update his skill builds. He doesn’t counter-pick. He doesn’t learn. His playing skill has capped out, and his learning skill refuses to grow. If he nurtured both while playing, his skill might increase, but as it is, he has been complaining about being stuck in “The Trenches” for a while. Anecdotally, I have played 80 games and found myself matched with people who have played anywhere from 150 to 400 to 1200 games. The biggest problem with this system is that a newcomer doesn’t get accurately ranked. You could be a strategic mastermind, but unless you play harder than people who have been playing their whole lives, your ranking will never ever reflect it, which goes against the idea of matchmaking entirely. It does breed a friendlier format, and it encourages people to play more (thus spending more and more money on 100 dollar Planeswalkers).
TrueSkill
By far, the most interesting thing I found in my research was TrueSkill. Have you heard of TrueSkill? It is a matchmaking system designed in 2006 by Microsoft, specifically for Xbox Live. Usually when people compare the PS3 to the 360 it is price versus stability versus community, but I think it really says something about Microsoft that they researched a whole new system for their matchmaking. I thought it was pretty cool. The XB1 has also seen some improvements in the online area, which they really played up in marketing, so it is clear that they care about this kind of stuff. And just so I don’t lose the people who hate Microsoft, this system is also used (in some form) by DOTA.
So how does this system work? It still has the same bell curve for your skill, but it adds a second factor to the bell curve’s standard deviation: certainty. If you are a new player, you are assigned a rating of, say, 25, and an uncertainty of 8.3. Why 8.3? The system assumes your true skill is anywhere within your rating +/- three times the uncertainty, so your skill is between 0 and 50. (Note: remember how the significant difference in Elo was 200? A significant difference here is actually around 6, judging by one of my sources. Like I said, it varies a lot.) There is a super fancy term for tracking this uncertainty: Bayesian analysis. As you play more games, the certainty factor changes. So if you are an 80/20 favorite to win, and you win, your ranking goes up and the system becomes more confident in your skill, and vice versa (losing decreases both ranking and certainty). There are some interesting interactions with all of this. The first is that if you play a game with 7 other people at your same rank and certainty and you come out in first, the system actually increases how much it distrusts your skill level. Why? You haven’t met your match yet! You never hit your skill ceiling that game. Unfortunately, that also applies to whoever was unlucky enough to take the biggest beatdown that game. The people in the middle are the people whose certainty moves the least.
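To make the rating-plus-uncertainty idea concrete, here is a simplified sketch of the published TrueSkill two-player update: no draws, no dynamics (τ) term, with β (the per-game performance noise) at its usual default of half the starting uncertainty. The real system runs a full factor graph over teams, so treat this strictly as an approximation of the 1v1 case.

```python
import math

MU0, SIGMA0 = 25.0, 25.0 / 3.0   # starting rating and uncertainty
BETA = SIGMA0 / 2.0              # per-game performance variability

def phi(x: float) -> float:      # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x: float) -> float:      # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def update_1v1(winner, loser):
    """One win/loss update of (mu, sigma) pairs, ignoring draws and tau."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * BETA**2 + sig_w**2 + sig_l**2)
    t = (mu_w - mu_l) / c        # how expected the result was
    v = phi(t) / Phi(t)          # mean-shift factor
    w = v * (v + t)              # variance-shrink factor
    new_w = (mu_w + sig_w**2 / c * v,
             sig_w * math.sqrt(1 - sig_w**2 / c**2 * w))
    new_l = (mu_l - sig_l**2 / c * v,
             sig_l * math.sqrt(1 - sig_l**2 / c**2 * w))
    return new_w, new_l

# Two brand-new players: the winner climbs, the loser drops, and both
# uncertainties shrink, since the system learned something about each.
a, b = update_1v1((MU0, SIGMA0), (MU0, SIGMA0))
print(a, b)
```

Note how big that first jump is (several points) compared to a settled Elo player gaining a fraction of K: a wide sigma is what lets new players find their bracket fast.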
Aside: semantic satiation on certainty has been achieved.
The other thing that separates this system from Elo ranking is that it can compare more than two teams. From what I have gathered, in Elo two teams of five are grouped together into a single rating, which is used to determine the points allotted after the match. TrueSkill compares each player and each team against each other to determine how many points they should receive. Kind of neat that way. This also means that when you place, you are ranked against every other player: in an 8 player game, your rating actually changes 7 times on the way to its final result, depending on who you outperformed and underperformed against. In this way, the rating is much more a result of your actions in the game as opposed to your team’s, and your individual contribution to the result can be taken into consideration for your overall ranking. Sadly, I don’t have any hard math on this, because I honestly don’t follow a lot of the numbers, because I am doing this at home with the comforts of Reptilia and Laphroaig. So sue me.
The drink of all statisticians
There were a few things specific to Xbox in there that were rather interesting as well. If a game only has 2 players, TrueSkill adjusts to a system similar to Elo. If there are outside factors, such as lag, it can update your skill based on partial results. That one had a URL labeled “math paper” behind it, so I was not touching that bad boy. I clicked on it, saw “Bernoulli” and “Gaussian,” and just about crapped my pants. OH GOD. I FOUND THE CALCULUS. No. Thanks. Finally, just because everyone loves a circlejerk, this same method is applied to Bing to figure out how best to deliver ads to people.
Last Note on Matchmaking
Now here is something that is really rather curious: in both chess and Magic the rating isn’t as important as it is in TF2, DOTA, LoL, and Halo. Why? Geography. The best players in the world rarely get a chance to test their skills against the other best players, with the rare exception of large tournaments. Because of this, ratings aren’t as important in the actual matchmaking process. But when you connect people over the internet? Both systems can search for matches that they feel are close. Matches where either side has a chance to win. On the DOTA side of things, they often try to even out teams within about 500 points (the K Factor is something like 32, if that helps you visualize how much 500 is). In addition, the matchmaking tries to match stacks of players (preformed teams) against each other in an even way, as communication is often critical in games like these.
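That “search for close matches” step can be sketched as a simple filter. The 500 point window comes from the paragraph above; the queue, the player names, and the greedy pairing strategy are all made up for illustration. Real matchmakers do much more, like widening the window the longer you wait and balancing party stacks.

```python
# Greedy pairing sketch: sort the queue by rating, then pair adjacent
# players whose ratings fall within the window. Hypothetical data.
WINDOW = 500

def pair_queue(queue):
    """Return (matches, leftovers) given a list of (name, rating)."""
    waiting = sorted(queue, key=lambda p: p[1])
    matches, leftovers = [], []
    i = 0
    while i < len(waiting):
        if i + 1 < len(waiting) and waiting[i + 1][1] - waiting[i][1] <= WINDOW:
            matches.append((waiting[i][0], waiting[i + 1][0]))
            i += 2
        else:
            leftovers.append(waiting[i][0])  # keep waiting for a closer match
            i += 1
    return matches, leftovers

queue = [("Ana", 3100), ("Bo", 2700), ("Cy", 1200), ("Dee", 1550), ("Eli", 4800)]
print(pair_queue(queue))  # Eli is too far from everyone and stays in queue
```

The leftover player is the internet version of the geography problem: if nobody near your rating is online, the system has to either make you wait or hand someone a one sided ass beating.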
Below is further research if you are interested. And as a warning: you aren’t. This stuff is not written in a friendly format, except maybe the last one, but even that links out to the real math. There are integrals, and sigmas, and I think I even saw some calculus hanging out in there. I did my best to sum it up.
http://research.microsoft.com/en-us/projects/trueskill/details.aspx
http://blog.dota2.com/2013/12/matchmaking/
http://en.wikipedia.org/wiki/Elo_rating_system
http://www.moserware.com/2010/03/computing-your-skill.html