To celebrate the release of Yomi on iPad, I'll tell you some stories about balancing Yomi. First I'll give you two myths about game balance, then tell you about tier lists and matchup charts, and then a bunch of specific balance problems we had to solve.
Game Balance Myth 1: If it's too well balanced, it's boring.
I understand where this one comes from. Game balance is really hard, so if you had a cast of characters (or RTS races, or card decks, or whatever) and some of them were too good vs other ones, what should you do? The easiest thing is to smooth out anything that's too different. Make things more and more homogeneous until it's fair. Yeah that's one approach, but it makes things boring. The harder way is to try to preserve as much asymmetry as possible AND to make it fair. When we do things the hard way, the good way, it doesn't make things boring. Furthermore, balanced just means the matchup is fair. It doesn't say anything about how interesting the dynamics are. A balanced game could be boring or interesting.
Game Balance Myth 2: Sirlin only cares about balance.
From the outside, I can see why someone would think that because I work on games that require a lot of balance work. But the testers who work with me would laugh at this. I'm the one always pushing back on balance changes because other things are more important: good flavor (mechanics expressing the right personality), good dynamics, and elegance. I want fewer words, fewer elements, things to be as simple as we can get away with, and for characters to feel right. If you allow balance to rank higher than those things, you get a terrible-feeling game. If you make only balance changes that respect all those constraints, it's hard work, but you can still have a balanced game.
At first, I think it's best to get tier lists from testers. That's where they put all the characters in a few tiers (groups) to say which characters are all pretty much tied for strongest, which are tied for next strongest, etc. The goal isn't to eliminate tiers, because even if you had a 100.00% perfectly magically balanced game, testers would still say there are tiers because of their imperfect perceptions, and that's fine. Tiers help you get a sense of what's going on with balance though.
A helpful format is:
God Tier (S rank). Any character here is brokenly good, above the maximum level that should be allowed, and obsoletes the other characters.
Top Tier (A rank). The group of strongest characters. Being here doesn't mean there's any problem.
Mid Tier (B rank). These characters are noticeably weaker than the top tier, but still very useable.
Bottom Tier (C rank). These characters are noticeably weaker than the mid tier. They are still useable.
Garbage Tier (F rank). Any character here is too weak to bother with. Something really went wrong and they need a boost to become a real part of the game again.
Players are going to disagree and argue, but there will also be some low-hanging fruit here. Even if everyone is arguing about whether CharacterX is high or mid, they might pretty much all agree that CharacterY is garbage or CharacterZ is God tier. The first thing to fix here is to nerf anything in God tier (since even a single thing there ruins the game). The next thing is to buff anything in garbage tier. After that, try to compress the tiers so that being a tier below only means you're barely worse, not like hugely worse.
The next level of zooming in on balance is a matchup chart. That's where you create a grid of every character vs every character and then give a rating to how difficult each matchup is. The notation is stuff like 6-4 or 7-3. For example, 6-4 means that if two experts played 10 games, we expect the expert using CharacterA to win 6 and the opponent using CharacterX to win 4.
It's actually best not to use numerical data to determine these numbers. Yes, really. It's faster and more accurate to get to the bottom of things by relying on expert opinions, and then having those experts argue, and then play each other to sort out disagreements. Think of matchup chart numbers as a kind of shorthand for this:
10-0. Not possible to lose when you play how you should, which you can always do.
9-1. Horrifically bad matchup. Impossible to lose unless something very unlucky happens.
8-2. Really hard for the other player. Multiple "miracles" required each game for the disadvantaged player to win.
7-3. Very hard for the other player. Clear disadvantage for them, but they can still win.
6-4. Somewhat of an advantage for you. Pretty close overall.
5.5-4.5. Very close match, but you can slightly detect an advantage.
5-5. No advantage to either character.
I want to emphasize just how important it is to get expert opinions on this, rather than adding up numbers from matches. Experts can get a good sense of what's going on in a match much, much sooner than data will reflect. I mean like months or years sooner, even. Imagine two experts played a certain matchup 20 times and the more they played, the more unfair it got. In our example, there is a certain way of playing that the other character just can't deal with and both players are coming to realize that truth more and more. It's entirely possible that they (correctly!) declare it an 8-2 matchup even though their results are nowhere near that bad. Lots of their games were played before they fully understood what's going on. And if we lump in the data from anyone other than experts, it's likely to be worse than ignoring it because they probably aren't playing the match well enough.
With 20 characters, that's 210 matchups (190 non-mirror matchups) so if every non-mirror matchup was played 20 times, that's 3,800 games. Wow is that a lot to even do a first pass with the numerical method. And you get extremely bad data if you do. Let's say a matchup is really 5-5 and you're lucky enough to have found two expert players of equal skill. The chance that the result will be 10 games to 10 is just 18%. The chance of a catastrophically wrong result (either player winning 14 games or more, indicating a 7-3 matchup or worse) is 12%. You're really better off just asking the experts, letting them argue, and letting them sort it out by playtesting, and that's what we do.
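If you want to check those numbers yourself, here's a minimal sketch in Python. It assumes each game is an independent coin flip at the matchup's true win rate, which is a simplification, and the function names are just ones I made up for illustration.

```python
from math import comb


def matchup_counts(characters: int) -> tuple[int, int]:
    """Return (total matchups including mirrors, non-mirror matchups)."""
    non_mirror = comb(characters, 2)  # unordered pairs of different characters
    return non_mirror + characters, non_mirror


def prob_exact(games: int, wins: int, p: float = 0.5) -> float:
    """Chance that one player wins exactly `wins` of `games` (binomial model)."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


def prob_lopsided(games: int, threshold: int, p: float = 0.5) -> float:
    """Chance that EITHER player wins `threshold` games or more.

    Assumes threshold > games / 2 so the two cases cannot overlap.
    """
    first_player = sum(prob_exact(games, k, p) for k in range(threshold, games + 1))
    second_player = sum(prob_exact(games, k, p) for k in range(0, games - threshold + 1))
    return first_player + second_player


if __name__ == "__main__":
    total, non_mirror = matchup_counts(20)
    print(total, non_mirror, non_mirror * 20)   # matchups and games at 20 games each
    print(f"{prob_exact(20, 10):.3f}")          # a true 5-5 matchup ending exactly 10-10
    print(f"{prob_lopsided(20, 14):.3f}")       # a true 5-5 matchup looking like 7-3 or worse
```

Under that coin-flip assumption, the 10-10 result comes out to about 18% and the 14-or-more result to about 12%, counting a blowout in either direction.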
Here's Yomi's matchup chart as of today. Of course it slightly changes as players gain more and more understanding, but it's fairly stable:
To put it into perspective,