Overwatch's Ranking Point System

Overwatch team: great job and all. If you want to listen to an hour of me saying great job, here's a podcast about that.

You should probably rethink the current system of gaining and losing rank points, though. Specifically, adjusting the ranking based on individual performance rather than just win/loss is pretty dangerous.

Elo for Team Games

Elo is a standard ranking system. You gain points for winning and lose points for losing. Furthermore, you gain more points for beating someone ranked higher and you lose more points for losing to someone ranked lower. Elo is designed for 1v1 games though, not team games.
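For reference, here's a minimal Python sketch of the standard 1v1 Elo update. It uses the common logistic formula with a 400-point scale; the K-factor of 32 is just a typical example value, not anything Overwatch is known to use.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that player A beats player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> float:
    """A's new rating after one game against B. K=32 is an example value."""
    actual = 1.0 if a_won else 0.0
    # The gain (or loss) is proportional to how surprising the result was.
    return rating_a + k * (actual - expected_score(rating_a, rating_b))


# Beating someone 200 points above you is worth far more than beating someone 200 below.
print(round(elo_update(1400, 1600, True) - 1400, 1))  # 24.3 (upset win)
print(round(elo_update(1600, 1400, True) - 1600, 1))  # 7.7 (expected win)
```

The same asymmetry works in reverse: losing to someone 200 points below you costs about 24 points, while losing to someone 200 above you costs under 8.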

To generalize Elo to team games, there are two factors you'd use. First, if your team's AVERAGE ranking was lower than the opposing team's average ranking, then you should get more points for winning. Second, if your PERSONAL ranking is lower than your team's average ranking, you should get more points for winning than your higher-ranked teammates who also won. As far as I know, both of these are true in Overwatch, and they make sense.
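Here's one simple way to fold both of those factors into Elo, sketched in Python. This is a hypothetical scheme, not Overwatch's actual formula: each player is rated as a blend of their own rating and their team's average, and that blend is compared against the opposing team's average. The `personal_weight` parameter and the K-factor are made-up example values.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def team_elo_update(team: list[float], opponents: list[float], team_won: bool,
                    k: float = 32.0, personal_weight: float = 0.5) -> list[float]:
    """New ratings for every player on `team` after one match (hypothetical scheme)."""
    team_avg = mean(team)
    opp_avg = mean(opponents)
    actual = 1.0 if team_won else 0.0
    new_ratings = []
    for rating in team:
        # Factor 2: a player below their team's average gets a lower "effective"
        # rating, so a win against the same opponents is worth more to them.
        effective = (1 - personal_weight) * team_avg + personal_weight * rating
        # Factor 1: the comparison is always against the opposing team's average.
        new_ratings.append(rating + k * (actual - expected_score(effective, opp_avg)))
    return new_ratings


print(team_elo_update([1400.0, 1600.0], [1500.0, 1500.0], team_won=True))
```

In that example both players won the same game, but the 1400 player gains about 18 points while the 1600 player gains about 14.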

But what about your individual PERFORMANCE during a game? For example, you lost but you played really well and your stupid teammates caused you to lose. Should you lose FEWER points for this loss because you personally played well? This is dangerous territory. If your instinct is to say yes, then at least consider that this requires you to gain fewer points for a win if you happened to not play well. That's really the least of our worries though.
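To make that trade-off concrete, here's a hypothetical Python sketch where a made-up `performance` score in [-1, 1] scales the win/loss-only change. Nothing here is Overwatch's real formula; it just shows that a bonus for playing well on a loss has to come paired with a penalty for playing badly on a win, or else the adjustments only ever push ratings up.

```python
def performance_adjusted_delta(base_delta: float, performance: float,
                               alpha: float = 0.25) -> float:
    """
    base_delta:  the win/loss-only rating change (positive for a win, negative for a loss).
    performance: hypothetical normalized score in [-1, 1]; 0 means an average game.
    alpha:       how much performance is allowed to matter (made-up example value).
    """
    if not -1.0 <= performance <= 1.0:
        raise ValueError("performance must be in [-1, 1]")
    # Good play (performance > 0) grows a gain and shrinks a loss;
    # poor play (performance < 0) does the opposite. It has to cut both ways.
    return base_delta + alpha * performance * abs(base_delta)


for result, base in (("win", 16.0), ("loss", -16.0)):
    for perf in (1.0, 0.0, -1.0):
        print(result, perf, performance_adjusted_delta(base, perf))
```

With these numbers, a "played well" loss costs 12 points instead of 16, and a "played badly" win only earns 12 instead of 16.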

Before we get into adjusting ranking based on individual performance, does Overwatch currently do that at all? The answer appears to be yes. Here's an excerpt from this article:

Cloud9 carry Lane “SureFour” Roberts was the first player to hit 80 Skill Rating when Competitive Play launched, queuing almost exclusively with his professional player teammates. When Roberts finished his 10 placement matches, he received a 77 Skill Rating, commensurate to his talent as one of the best players in Overwatch. Cloud9’s support Adam Eckel, who played the exact same 10 placement matches as part of a Cloud9-stacked queue, only received a 67 Skill Rating. His tank teammates hit 71. Derrick "reaver" Nowicki, the team’s other carry player, hit 74. All of those numbers rank in the top one percent of Overwatch players, according to MasterOverwatch, but that’s a pretty big discrepancy for players who contributed to winning the exact same games against the exact same opponents.


So what happened here is a controlled test: a team of 6 players only ever played with each other and therefore necessarily had the same win/loss record against the same opponents. Because they ended up with different ranks, it looks like individual performance really does matter in this equation. There could be some other explanation, but it's highly likely that their individual performance metrics are what explain the difference in ranks.
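Here's a small Python sketch of why that inference works. Under a hypothetical win/loss-only team update, six players who start at the same rating and play the exact same games can't end up with different ratings. Add any per-player performance term and they diverge. The `performance` values below are invented (roughly "carries look good, supports look bad"), and so are the starting rating, K-factor, and opponent ratings; we don't actually know what Overwatch measures.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_placements(results, opponent_avgs, performance=None,
                   start=1500.0, k=32.0, alpha=0.25):
    """Ratings of a 6-stack after playing the same placement games together.

    results:       list of bools, True for a team win.
    opponent_avgs: opposing team's average rating for each game.
    performance:   optional hypothetical per-player score in [-1, 1], applied every game.
    """
    ratings = [start] * 6
    perf = performance if performance is not None else [0.0] * 6
    for won, opp_avg in zip(results, opponent_avgs):
        # Win/loss part: identical for everyone, since it only sees team averages.
        base = k * ((1.0 if won else 0.0) - expected_score(mean(ratings), opp_avg))
        # Optional performance part: the only thing that can separate teammates.
        ratings = [r + base + alpha * p * abs(base) for r, p in zip(ratings, perf)]
    return ratings


results = [True, True, False, True, True, True, False, True, True, True]
opponents = [1500, 1520, 1560, 1530, 1550, 1570, 1600, 1580, 1600, 1620]
print(run_placements(results, opponents))  # six identical ratings
print(run_placements(results, opponents, performance=[0.6, 0.6, 0.0, 0.0, -0.6, -0.6]))
```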

Microsoft TrueSkill

Generalizing Elo into a system that handles team games isn't new. That was exactly the purpose of Microsoft's TrueSkill ranking system over a decade ago. TrueSkill intentionally and explicitly does NOT use any individual performance metrics. Their argument is that no matter what game you're talking about, and no matter what metrics you measure to determine how well a given player did, those metrics are necessarily imperfect compared to using only win/loss. The point of trying to guess whether a player did well is to estimate how much they contributed to the win or loss, but the win/loss stat itself is the most accurate measure of that, they say. You'd be introducing error by adding ANY other metric.

In addition to introducing error, you're warping incentives. For example, if you measure "damage done" as one metric, then players will attempt to maximize "winning AND damage done" rather than just "winning," which is not great. You can also easily do a lot worse: you might accidentally give players an incentive not to play support heroes in a game where you really need support heroes on your team. (It seems this is already true in Overwatch.)

In many cases, it's almost hopeless to even devise a metric. If a character's role is healing, you can't actually use how much they healed as a measure of much of anything. If you did, it would penalize a healer on a team that played so well it didn't need much healing. Even worse is a character like Mei. Her ice walls can do a lot, and her slow and freeze effects can do a lot too. But to actually quantify that into a metric correlated with win rate? That's a huge source of error waiting to happen. My friend suggested that the best measure of how effective you were with her is to monitor the opponents' chat and detect how much they are cursing about Mei.

Yet another issue is that it's easy to accidentally create competition within a team for no real reason. For example, if number of kills is a metric that affects your rating, then your teammate killing an enemy that you could have killed essentially "stole" ranking points from you. That's clearly a bad dynamic.

I think Microsoft TrueSkill's reasoning makes sense here. It's a good case against ever using any individual performance metric when adjusting ranking points after a win or a loss.

Tangent: Another Thing about TrueSkill

You can skip this section if you're just here for Overwatch stuff. I just wanted to note that I'm not fully behind the REST of what TrueSkill does. The main idea behind TrueSkill is that rather than assign a specific ranking number to a player, behind the scenes it's assigning a bell curve probability distribution of what it thinks your ranking is. So two players might both be ranked in the 54th percentile (about tied), but the system strongly believes that's correct for player1, while for player2 it has a wide bell curve showing very low confidence in that ranking.

In theory, I see how this would allow it to converge more quickly to a good value. And in empirical tests done by Microsoft, it did converge faster than a more Elo-like system that didn't use the probability stuff. But...it just seems wrong anyway.

Specifically, if I beat a player way better than me (according to our ranks), I expect to go up a LOT of points. If I go up very few points "because the system is very sure of my current rank," that feels like total bullshit. And I have had this exact experience before. It's confusing and frustrating. As a player, I actually resent the system claiming it's so sure about me and dampening my rank gains when I go against its expectations. I think that feels debilitating and doesn't work well in cases where players really do get a lot better.
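To see the dampening effect I'm complaining about, here's a Python sketch of the two-player, no-draw mean and variance update from the TrueSkill paper. It's simplified: it leaves out the dynamics term (tau) and the draw handling that the full system has, and the example numbers are made up. The beta value assumes TrueSkill's usual defaults (mu starting at 25). The only thing that changes between the two runs is how confident the system is (sigma) about the underdog who wins.

```python
import math

BETA = 25.0 / 6.0  # performance noise; assumed TrueSkill default when mu starts at 25


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trueskill_win(winner, loser, beta=BETA):
    """Update (mu, sigma) pairs after `winner` beats `loser`.

    Two players, no draws, no dynamics/tau term: a simplified sketch.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # large when the result was a surprise (t very negative)
    w = v * (v + t)        # how much uncertainty shrinks; between 0 and 1
    new_winner = (mu_w + sig_w ** 2 / c * v,
                  sig_w * math.sqrt(1.0 - sig_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sig_l ** 2 / c * v,
                 sig_l * math.sqrt(1.0 - sig_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Same upset (mu 20 beats mu 30), but the system is unsure vs. very sure about the winner.
favorite = (30.0, 1.0)
unsure_after, _ = trueskill_win((20.0, 25.0 / 3.0), favorite)
sure_after, _ = trueskill_win((20.0, 1.0), favorite)
print(round(unsure_after[0] - 20.0, 2), round(sure_after[0] - 20.0, 2))
```

With these numbers, the "unsure" winner's mean jumps by roughly 10 points while the "known" winner's mean moves by about a third of a point, even though the game result was identical. That second case is exactly the experience I'm describing.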

Anyway, I don't think Overwatch is doing this.

In Favor of Individual Performance Metrics Affecting Rank

Even though it sounds like a bad idea to count individual performance metrics when adjusting ranking points, is there some reason to do it anyway? Yes, I think there's something in the plus column here. The two main pluses I know of are "good outweighs the bad" and "assistance to escape Elo hell".

Good Outweighs the Bad (??)

Yeah, it's imperfect to add any metric at all that gives you a bonus for kill:death ratio or whatever rather than just win/loss, but maybe it helps more than it hurts. For a character like Reaper, kill:death ratio is a relevant metric. It's not a perfect one for sure, but if you did amazingly on this stat, chances are fairly good that you played well. There are times when this indicator is wrong, but counting this stat for Reaper might beat the baseline of "never count this stat" more often than it steers us wrong. I don't even know that to be true, but that's an argument someone could make.

I think the trouble here is that it's playing with fire. It's very easy to mess this up, so the downside is clear. If you mess it up, you get situations like the Cloud9 example above, where support players appear to be punished accidentally by the workings of these behind-the-scenes algorithms. Is that risk really worth it? The upside is helping your rating converge to a good value more quickly, but maybe that's less important than avoiding these potentially very bad downsides.

Elo Hell

A related point here is about the urban myth of "Elo hell." This is the phenomenon where players with bad ranks in a team game can't rank up, even though they are actually much better than their current rank says. Their bad teammates make them lose so much that they can't rank up.

Is this a real phenomenon or just a myth? If it's real, then don't we actually place a lot of value on having individual performance metrics boost these decent players out of their unfairly low ranks? After all, they are playing great so they deserve some ranking boost.

I think Elo hell actually is real...sort of. Let's start by looking at the part of it that just can't be real though. If you are actually much better than your rank, then in a 6v6 team game you'd expect on average to have 5 "bozos" on your team, which is one less than the 6 bozos on the opposing team. So...just play enough games and your ranking will climb. Surely you are providing an advantage to your team in getting wins, because that's the very definition of what you being good means. If you find you can't get over a 50% win rate, it sounds like you are actually as bad as the other bozos?
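That "5 bozos vs. 6 bozos" argument is easy to check with a quick Monte Carlo sketch in Python. The model is an assumption, not how Overwatch works: each team's strength is the average of its six players' true skills on an Elo-style scale, the win chance comes from the usual logistic Elo formula, and your teammates and opponents are random players drawn from your current bracket.

```python
import random
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that side A beats side B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_win_rate(your_skill: float, bracket_skill: float, games: int = 5000,
                      spread: float = 100.0, seed: int = 1) -> float:
    """Win rate for one player whose true skill may differ from their bracket.

    Five teammates and six opponents are drawn from the same bracket every game,
    so the only systematic difference between the two teams is you.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        teammates = [rng.gauss(bracket_skill, spread) for _ in range(5)]
        opponents = [rng.gauss(bracket_skill, spread) for _ in range(6)]
        p_win = expected_score(mean([your_skill] + teammates), mean(opponents))
        if rng.random() < p_win:
            wins += 1
    return wins / games


print(simulate_win_rate(1800, 1500))
print(simulate_win_rate(1500, 1500))
```

Under this model, being 300 points better than your bracket only moves your team's average by 50 points (300 / 6), so you win well over half your games but nowhere near all of them. That's a modest edge per game, but it's enough to climb if you play enough games.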

Mathematically, that makes sense. But let's look at this in the form of a story to truly understand it. You are playing as Reaper on a payload map. You decide to teleport behind enemy lines, then sneak up on various players. You're trying to flank them, catch them unaware, and get in kills. The more successful you are at this, the easier you're making it for the rest of your team. You aren't at the objective here because it's not your job. Your job is to make it so your teammates at the objective have a really easy job.

You get 3 kills in quick succession and you don't die. How are you doing? I think you're doing incredibly well. Your plan makes sense and your contribution is very large. If you had killed just one player, you might have pulled your weight at least, but by killing 3 there are now only 3 players left on the enemy team. Surely because of this, your team now has control of the payload.

You look at the payload indicator UI, expecting to see three arrows from your team pushing it forward. Instead, you see one arrow of the opposing team pushing it back. You wonder how this is even possible, so you go to the payload. What you see is a single enemy Reinhardt standing on the payload, totally unopposed, with no other players on either team in sight. Welcome to Elo hell.

You end up losing this game. The situation described is pretty unreasonable though. Your stupid teammates should have capitalized on the advantage you gave them and taken the payload. Instead they chased down butterflies or whatever and failed to get any real value out of your contribution. It's actually quite easy to imagine situations like this happening over and over, such that even though you do amazing stuff, you still only win around 50% of the time. So in this sense, yeah, Elo hell is real.

I think there's more to the story though. If we try to address this by rewarding you for your good individual performance and getting you to your "rightful" rank, we run into a couple of problems. As stated earlier, if we reward you for number of kills, or K:D ratio, or damage done, we also introduce warped incentives. Now your incentive is something OTHER than just winning. Now you're fighting with teammates for kills, etc. So even if we wanted to help you out here, it's dangerous to do so.

But even beyond that, SHOULD we help you out? If we do, the result is that you are going to gain rank for doing things that...didn't help your team win? Yeah it SHOULD have helped your team win, but it didn't. It's a bit weird that you'll then keep playing the same, keep not actually making your team win (even though it's their fault), and we reward you.

Here's the real truth about this Elo hell stuff, I think. The example Reaper situation above really is good play, and it really is something that should help the team win...if you were a higher rank. The higher the rank of all the players involved, the more easily your teammates can convert the advantage you provide into a win. If your teammates are so bad that they can't convert the advantage you gave them into a win, then you should do some completely different things. Yeah, it sucks that the thing you did SHOULD help, but in truth, it didn't. Work with what you have. Work with your generally uncoordinated or lower-skilled teammates and provide them whatever they actually DO need to win.

In Overwatch, I think what players generally need in these situations is "babysitting." What I mean is, it's probably more important to have few deaths and to generally be on the payload than it is to achieve impressive stats that "in theory" allow your teammates to be on the payload.  You have to carry them, so you'll have to refrain from strategies that, at higher rank, are very good, so that you can provide for the most basic needs of your team. You don't have to do that in the exact way I said, but the point is if you play in the (sometimes pathetic) way that your team needs, you can contribute more to your team's win rate than if you play in an incredibly impressive way that they are unable to capitalize on, because they suck. Yeah that's frustrating, but THAT is the way out of Elo hell. Having the system give you a ranking boost for strategies that aren't resulting in a positive win rate isn't a good solution.

I don't know why Blizzard chose to have individual stats count towards rank (or even know for certain that they do, but they sure seem to). I'd advise against it though.