Shadow Era Rating System Explanation

by shannong

Player complaints about the rating system used in Shadow Era periodically arise in the forums, so it seems useful to summarize in one place a “plain English” explanation of how the rating system works.

Part 1 – Plain English summary of how the system works (without too much math)

The system is very similar to the well-proven TrueSkill system used in environments such as Xbox Live games. TrueSkill is conceptually close to the Glicko rating system, which was itself an improvement on the Elo rating system used by most chess federations and also by many tournament CCG environments such as Magic: The Gathering. Use Wikipedia and Google if you want to know the gory details for these systems.

The system is designed to create a “normal distribution” (aka, a symmetrical “bell curve” with a very specific shape) of ratings throughout the entire Shadow Era player population. This means that not everyone can be an equally high-rated player. Essentially, only a small percentage of the total population can be “at the top”. As the population grows (or shrinks), the “at the top” set might contain more players, but the total RATIO of “top players” to total population should remain essentially the same.

This means two very important things:

  • Your rating is NOT a “score”.
  • Your rating cannot keep growing forever: the higher your rating gets, the less it moves forward with each “win”, even against a higher-rated opponent, and the more it will “fall” when you lose. The closer you get to the right side of the “bell curve”, the more pressure the system exerts to slow your rightward movement and push you back towards the mean rating of the entire population (a rating of roughly 250).

The system is currently designed to have a rating range of 0-500, and it’s possible for the players at the very top to “push past” the 500 mark to a small degree, but not by much, because the more you approach and pass the 500 mark, the greater the pressure to push you back towards the mean.
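
To make the bell-curve idea concrete, here is a minimal Python sketch. The mean of 250 comes from the text above; the standard deviation of 250/3 (so that 0 to 500 spans roughly three standard deviations on either side) is purely an assumption for illustration, since the real spread of Shadow Era ratings is not published. The function names are hypothetical.

```python
from statistics import NormalDist

# Mean of 250 is from the article. The standard deviation is an ASSUMPTION,
# chosen so that the 0-500 range covers about +/- 3 standard deviations.
ASSUMED_MEAN = 250.0
ASSUMED_SD = 250.0 / 3.0

ratings = NormalDist(mu=ASSUMED_MEAN, sigma=ASSUMED_SD)


def share_above(threshold: float) -> float:
    """Fraction of the population expected to sit above a given rating."""
    return 1.0 - ratings.cdf(threshold)


def expected_players_above(threshold: float, population: int) -> float:
    """Expected head-count above a rating for a given population size."""
    return share_above(threshold) * population


if __name__ == "__main__":
    for threshold in (300, 400, 450, 500):
        print(f"above {threshold}: {share_above(threshold):.2%} of players")
    # The head-count grows with the population, but the ratio stays the same.
    for population in (10_000, 100_000):
        print(population, round(expected_players_above(450, population)))
```

Under these assumed numbers only a tiny sliver of players ends up past 450, and growing the population tenfold grows the head-count tenfold without changing the ratio.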

What’s counterintuitive to many players who have a passing familiarity with the Elo system used in chess, MtG, and so on is that in the Elo system, your rating always changes by a noticeable amount after every single match. The math behind the Elo system is deliberately simplified and overlooks some real statistical constraints so that it can be easily calculated by hand with pen and paper. The Elo system was designed around 1960, and its creator, Arpad Elo, knew full well where its weaknesses were.
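
For comparison, the classic Elo update fits in a few lines of Python. This is a sketch on the usual chess scale (400-point divisor, K-factor of 32), which are common conventions and not Shadow Era's own constants; the function names are mine.

```python
def elo_expected(rating: float, opponent: float) -> float:
    """Expected score (0..1) for `rating` against `opponent` on the chess scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - elo_expected(rating, opponent))


if __name__ == "__main__":
    # A 1800 player beating a 1500 player still gains a few points...
    print(elo_update(1800, 1500, score=1.0) - 1800)
    # ...and loses far more for the upset.
    print(elo_update(1800, 1500, score=0.0) - 1800)
```

With these constants a win over a player 300 points lower is worth roughly 4.8 points, so the rating moves after every game. There is no second "confidence" value to damp or redirect the change.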

By contrast, in the TrueSkill system (which, like Glicko, tracks how uncertain each rating is), your rating actually comprises two values, one of which is hidden. The visible value is your rating, known as the “mu” value in TrueSkill. The hidden value is, in plain English, a “confidence” value: how confident the system is about your current rating. The TrueSkill system calls this value “sigma”, and the Glicko system calls it the “ratings deviation” or “RD”. In both cases a smaller (narrower) value means the system is more confident in your rating.

Now, the perception that your Shadow Era rating is doing “weird” or “wrong” or “counterintuitive” things, especially if you have a comparatively high rating above 400, is because every match compares both the mu and sigma values for yourself and your opponent, and the total result can seem “odd” compared to the straightforward simplicity of the Elo system. Some examples:

  • You have a rating of 450. You win 5 games in a row and your rating (mu) doesn’t move at all. You then lose one game and your rating (mu) drops by a noticeable amount. In the Elo system, you would still have earned some points for every win and lost some points for the loss.
  • You have a rating of 450. You win a game against a lower-ranked opponent and yet your rating actually FALLS by 1 point to 449! omgwtfbbq??!!?! In the Elo system there’s no way you would ever have lost points for a win.

All of these “odd” results make perfect sense within the TrueSkill/Glicko systems, and they are actually MORE accurate than the Elo system. I really want to drive that home: Elo might SEEM more intuitively “fair” and “accurate”, but in fact it is NEITHER, and the creator of the system knew it. Back when it was designed, it was just too messy to do the math by hand to provide more accurate ratings. Nowadays we have computers to do all this messy math in milliseconds.

To understand why the two “odd” effects in the bullet list above can happen, let’s look at a Wikipedia quote about how the Glicko system works. Emphasis mine.

The Glicko rating system and the Glicko-2 rating system are chess rating systems similar to the Elo rating system: a method for assessing a player’s strength in games of skill such as chess. It was invented by Mark Glickman as an improvement of the Elo rating system. The main idea is the introduction of a measurement for the ratings reliability called RD for ratings deviation.

Both Glicko and Glicko-2 rating systems are under public domain and found implemented on game servers online (like Free Internet Chess Server, Chess.com and SchemingMind). The formulas used for the systems can be found on the Glicko website.

The RD measures the accuracy of a player’s rating. For example, a player with a rating of 1500 and an RD of 50 has a real strength between 1400 and 1600 with 95% confidence. Twice the RD is added and subtracted from their rating to calculate this range. After a game, the amount the rating changes depends on the RD: the change is smaller when the player’s RD is low (since their rating is already considered accurate), and also when their opponent’s RD is high (since the opponent’s true rating is not well known, so little information is being gained). The RD itself decreases after playing a game, but it will increase slowly over time of inactivity.
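
The behaviour described in that quote can be checked with a small sketch of the published Glicko-1 single-game update. It uses Glicko's native chess-style scale, not Shadow Era's 0-500 scale, and it leaves out the slow growth of RD during inactivity. The function names are mine.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Discount factor: the less certain the opponent's rating, the smaller g."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q ** 2) * (rd ** 2) / math.pi ** 2)


def expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against an opponent, discounted by the opponent's RD."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko_update(r: float, rd: float, r_opp: float, rd_opp: float, score: float):
    """One-game Glicko-1 update. Returns (new_rating, new_rd)."""
    e = expected(r, r_opp, rd_opp)
    d2 = 1.0 / ((Q ** 2) * (g(rd_opp) ** 2) * e * (1.0 - e))
    denom = 1.0 / rd ** 2 + 1.0 / d2
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd


def interval_95(r: float, rd: float):
    """Rating plus and minus twice the RD, as in the quoted example."""
    return r - 2.0 * rd, r + 2.0 * rd


if __name__ == "__main__":
    print(interval_95(1500, 50))
    # Same win, same opponent: a low own RD moves the rating far less.
    print(glicko_update(1500, 30, 1500, 50, 1.0))
    print(glicko_update(1500, 200, 1500, 50, 1.0))
    # Same win against an opponent whose own rating is very uncertain.
    print(glicko_update(1500, 50, 1500, 300, 1.0))
```

Running it shows both effects from the quote: the gain shrinks when your own RD is low and when the opponent's RD is high, and your RD shrinks after every game.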

The bit that I highlighted (that the size of the change depends on both your own RD and your opponent’s RD) is equally true in the TrueSkill system. If your rating is very high, you might win lots of games in a row but your rating might not increase even one point because of this effect, if, for example, all of your opponents were lower rated than you and/or had a very wide sigma value (RD value aka “confidence”). To grossly oversimplify, the system only gives you a higher rating when it DOES NOT EXPECT YOU TO WIN, and the higher-ranked you are, the MORE the system expects you to win. So you go for 7 straight wins, your rating does not budge because of this effect, and then you lose one match and your rating suddenly falls drastically. Why? Because the system ALWAYS lowers your rating if you were expected to win but you did not! These two behaviors are what I refer to as “increasing pressure to always push high-rated players back towards the mean”. The closer you get to the right tail of the bell curve (the closer you get to 500 rating or higher), the stronger this pressure becomes.

As for the second bullet point where you WIN a game but your rating actually falls by 1 point? That can happen because every single game you play narrows the hidden sigma (“confidence”) value for you, which pushes your mu (rating) value a fractional amount in either direction. Because your real underlying mu (rating) value isn’t an integer, the small change might cause the real value to move in a direction where it gets rounded down to the next-lower integer. In plain English, it’s like the system wasn’t really sure whether your rating was 449 or 450. But after playing one more game, the system became slightly more sure that you’re probably 449. It didn’t matter whether you won or lost. If the system strongly EXPECTED you to win, because, say, the opponent was 150 points below your own rating, it wasn’t going to increase your mu (rating) value anyway. But because its “confidence” grew stronger, it still adjusted your mu rating slightly downward EVEN THOUGH YOU WON.
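
The rounding part of that explanation is just arithmetic, shown here as a tiny sketch. The hidden value and the size of the nudge are made-up numbers chosen to match the 450-to-449 example; they are not computed by any real rating formula. Rounding down for display is assumed from the description above.

```python
import math


def displayed(hidden_rating: float) -> int:
    """Assumed display rule: drop the fractional part (round down)."""
    return math.floor(hidden_rating)


before = 450.04  # hypothetical hidden value, shown as 450
nudge = -0.09    # hypothetical tiny adjustment after an expected win
after = before + nudge

if __name__ == "__main__":
    print(displayed(before), "->", displayed(after))
```

A hidden shift of less than a tenth of a point is enough to flip the number you see on screen.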

Part 2 – Rules of thumb about the rating system in this game

With the preceding section in mind, let’s summarize how your rating should behave in Shadow Era:

  1. Most players will clump around a rating of 250. The system will always exert pressure to push most players towards the mean (center peak) of the bell curve. Therefore, you can say that a player between 200 and 300 is “an average player”. 300 to 400 would be “above average”. 400 to 450 would be “very strong”, and if you manage to stay in the 450 to 500 (or above) range for any length of time, you would be a “top player”.
  2. Other than those broad ranges, you cannot say that a 430 player is “better” than a 410 player. Don’t even go there. It just doesn’t work that way.
  3. A new player starting at a 0 rating will very quickly be pushed towards the mean near 250, unless they’re a consistently weak or consistently strong player.
  4. An experienced player who grinds past 350 will see their rating growth slow down more and more even if they consistently win more than they lose. As you pass 400 and 450, this growth slows down to a trickle, and every loss will push you back very far. It takes an *incredible* ratio of wins to losses to grind your way into the top ratings above 450. The better you perform, the more you have to keep performing even better to stay in that top bracket. Too many losses and the system will quickly push you back out of the top and towards the mean again. Then you have to grind upwards the hard way all over again.
  5. If the matchmaking system is balanced and cannot be exploited to manually control your win-loss ratio, there should be a lot of “churn” in the top ratings. You might be lucky enough to get on the top 20 board, but don’t expect to stay there long unless you are both *incredibly* skilled AND ALSO pretty damn lucky. Which brings us to…
  6. Sheer, dumb luck (or exploits) plays a huge role in where you sit in the ratings. Even if you are truly the most amazingly skilled player in the entire population, no rating system can accurately cope with competitive systems in which luck plays a very large role as it does in any CCG like Shadow Era. If you have bad luck and your opponent has good luck, you will be pushed far down towards the mean if you’re near the top of the ratings. The only way you can get to the top and stay there for any considerable length of time in a luck-influenced game like Shadow Era is if you have figured out how to cheat the system to manually influence your win-loss ratio.

The net effect of all these points is that TrueSkill/Glicko-type systems are not meant to precisely rank players so that you can stand around and compare your epeen with each other. The primary purpose is to enable automated matchmaking systems to pair you with someone of more or less equal skill so that you will have a “fun” match rather than a lopsided match. And remember that what the system considers as “skill” is based not only on the rating value that you can see for both players, but also on that hidden “confidence” value (sigma). For example, if you’re 400 with a very low/narrow sigma, the system might consider a 250 player with a very high/wide sigma to potentially also be a 400 rank, so it will consider both of you to be “more or less equal skill” and might pair you up.
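
The 400-versus-250 pairing above can be pictured with a simple overlap check. To be clear, this is not TrueSkill's actual match-quality formula, and nothing here is confirmed about Shadow Era's matchmaker. It is a simplified stand-in that treats each player as a range of mu plus or minus two sigma and asks whether the ranges overlap. All names and sigma values are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rated(NamedTuple):
    mu: float     # visible rating
    sigma: float  # hidden uncertainty (wider = less certain)


def plausible_range(player: Rated, width: float = 2.0) -> Tuple[float, float]:
    """Range of ratings the player might 'really' have (assumed mu +/- width*sigma)."""
    return player.mu - width * player.sigma, player.mu + width * player.sigma


def could_be_equal(a: Rated, b: Rated, width: float = 2.0) -> bool:
    """True when the two plausible ranges overlap."""
    lo_a, hi_a = plausible_range(a, width)
    lo_b, hi_b = plausible_range(b, width)
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    veteran = Rated(mu=400, sigma=15)         # narrow sigma: well-known rating
    newcomer = Rated(mu=250, sigma=80)        # wide sigma: could be much stronger
    settled_average = Rated(mu=250, sigma=15)
    print(could_be_equal(veteran, newcomer))         # ranges overlap
    print(could_be_equal(veteran, settled_average))  # ranges do not overlap
```

The same visible 250 is treated very differently depending on how sure the system is about it.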

Part 3 – How luck works both against you AND for you in a rating system like this

At the time this guide was written, in the 1.24 environment, the “tribal knowledge” is that the matchmaking system pairs you with the first person within ±150 rating points of yourself who shows up after you flag yourself ready to play. I say “tribal knowledge” only because I cannot find a specific quote from Kyle to this effect. Regardless of whether this matchmaking range is true, quite a few players in the 400+ range have strongly argued for narrowing that range to ±50 or ±75. They blame the ±150 range for making it impossible to maintain their high rating, and they lament that it’s “unfair!” and that “it causes most players above 400 to just not play at all unless they see somebody 400 or higher in Waiting status”. (Because presumably that gives you a good chance of being matched up with that 400+ player.)
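
If the “tribal knowledge” is right, the pairing rule would look something like the sketch below. The ±150 window is the rumoured value from the paragraph above, not a confirmed setting, and the function and its arguments are hypothetical.

```python
from typing import Optional, Sequence

MATCH_RANGE = 150  # the rumoured +/-150 window; not confirmed by the developers


def first_match(my_rating: int, waiting: Sequence[int],
                match_range: int = MATCH_RANGE) -> Optional[int]:
    """Return the index of the first waiting player within range, or None.

    `waiting` lists opponents' ratings in the order they showed up.
    """
    for index, rating in enumerate(waiting):
        if abs(rating - my_rating) <= match_range:
            return index
    return None


if __name__ == "__main__":
    queue = [200, 290, 425]
    print(first_match(430, queue))                  # picks the 290 player
    print(first_match(430, queue, match_range=50))  # picks the 425 player
```

With the wide window a 430 player is paired with the 290 player who arrived first, even though a 425 player is also waiting. With a ±50 window the 425 player would be chosen instead, which is exactly what the high-rated players are asking for.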

There are so many things wrong with this argument, and they all hinge on how the luck-influenced nature of the game affects your rating. To recap from the previous two parts, all of these rating systems (Elo, Glicko, TrueSkill, etc.) are based on the assumption that a “higher skilled” player will win more often than lose against a “lower skilled” player. If Shadow Era were 100% skill-based, like chess or go or shogi, then if you managed to grind your way to the top ranks out there at the far right of the bell curve, that act alone would be strong proof that you are very highly skilled compared to the rest of the population. And because you are highly skilled, you would be expected to win nearly every single time against anyone else who was much lower ranked than you. You wouldn’t even feel any competition unless you played people very near to your own rank up in the upper echelons.

Most importantly, the system would never expect you to lose against somebody rated 150 points below you in a scale that goes from 0-500. Never. That’s why you’re dinged so badly when you *do* lose against somebody 150 points below you. This is the logic behind the currently high-rated players who argue that “if you just narrow the matchmaking range to ±50 points, you wouldn’t be scaring off all the high-level players, and we wouldn’t UNFAIRLY BE LOSING OUR RATINGS TO BLIND LUCK BY A WORSE PLAYER.”

Okay, here’s what’s wrong with that logic. It’s so simple that it’s real easy to miss. Watch close!

The very “bad luck” that “unfairly” makes you lose your high ratings IS EXACTLY the same “good luck” that helped you “fairly” earn those high ratings in the first place!

Read that again as many times as needed until the import sinks in.

Yes, that’s right. Every person who’s in the 400+ range begging Kyle to reduce the matchmaking range is effectively asking Kyle to change the rules that helped them get up there in the first place, in a manner that will now PROTECT them from the same symmetrical luck that helped them get there.

Because when you get right down to it, getting into the 400+ range is only partly due to your actual skill. The other part–a BIG part–is due to sheer, dumb LUCK. (Well, and in the current situation, also due to player exploits such as the ability to quit a game you were losing in a way that prevented the system from registering a loss for you.)

So you guys clamoring for a narrower matchmaking range? You didn’t mind the ADVANTAGE of luck helping you get up there in the first place, but now you cry that it’s “unfair” now that that same ADVANTAGE for lower-leveled players threatens to knock you out of your high ranking? Cry moar. You’re not being honest.

The bottom line is that the luck-influenced nature of this game (and all CCGs) doesn’t affect the success of the rating system or the shape of the normal distribution (assuming no exploits are possible). All that luck does is to make the movement of your rating more volatile than in a pure skill-based system, and it makes the “churn” in and out of the top ranks much faster. It makes any particular rating more “fuzzy” than in a pure skill-based system. It makes the relative skill levels of players only accurate within wide bands, making it impossible to say that a 430 is a “better” player than a 390.

But this overall effect of luck in the system IS THE SAME FOR EVERYBODY. It benefits everyone equally when they’re on their way up the ratings, and it hurts everyone equally when they are near the top and the pressure to push them back towards the mean keeps increasing as you near the top. In fact, using my “pressure” analogy, you could say that luck simply increases that pressure. But it increases it the same for everybody. There’s nothing “unfair” about it because it applies equally to everyone.
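
A toy simulation shows the “more volatile, but the same for everybody” point. Everything in it is an assumption made for illustration: an Elo-style update stands in for the real system, the K-factor and scale are invented, and “luck” is modelled as the share of games decided by a coin flip instead of by skill.

```python
import random
from statistics import mean, pstdev
from typing import Tuple


def win_chance(luck: float) -> float:
    """Chance the stronger player wins. luck=0 means skill always decides;
    luck=1 means every game is a coin flip. (Hypothetical blend model.)"""
    return luck * 0.5 + (1.0 - luck) * 1.0


def final_rating(luck: float, games: int, rng: random.Random) -> float:
    """Rating of a genuinely stronger player after `games` games against
    opponents rated 250, using an Elo-style update."""
    k, scale = 8.0, 100.0  # assumptions, not Shadow Era's real constants
    rating, opponent = 250.0, 250.0
    for _ in range(games):
        expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))
        won = rng.random() < win_chance(luck)
        rating += k * ((1.0 if won else 0.0) - expected)
    return rating


def spread(luck: float, runs: int = 30, games: int = 100,
           seed: int = 1) -> Tuple[float, float]:
    """Mean and standard deviation of the final rating over many repeat runs."""
    rng = random.Random(seed)
    results = [final_rating(luck, games, rng) for _ in range(runs)]
    return mean(results), pstdev(results)


if __name__ == "__main__":
    print("pure skill:", spread(luck=0.0))
    print("40% luck:  ", spread(luck=0.4))
```

In the pure-skill runs every repeat ends on exactly the same rating. With luck mixed in, the same player ends up scattered over a band of ratings and settles lower on average. Every player in the population is subject to the same scatter.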

That said, there are perhaps good reasons for Kyle to implement his sliding range for matchmaking, making it narrower at first and then widening every few seconds, but those reasons have nothing to do with “fairness” or “protecting your rating from loss due to ‘bad luck’”. Instead, those reasons would center around making the matchmaking system attempt to find a more “balanced” match where possible. Regardless, I would strongly urge Kyle to reset the ratings for the entire population BEFORE implementing such a change, to make the playing field level again under the new rules. If Kyle implemented such a change without first resetting the ratings, all he would be doing is unfairly (yes, UNFAIRLY) protecting the ratings of the people who managed to get to the top under a different set of rules that enabled more “good luck” to help them get into the top ratings in the first place.
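
A sliding range of that kind might be sketched as follows. The ±50 starting point and ±150 ceiling echo numbers discussed earlier, but the step size and the five-second interval are invented for illustration; none of this reflects an actual implementation.

```python
def match_range(seconds_waiting: float, start: int = 50, step: int = 25,
                every_seconds: float = 5.0, maximum: int = 150) -> int:
    """Allowed rating gap after waiting a while: starts narrow, widens in
    fixed steps, and never exceeds `maximum`. All defaults are hypothetical."""
    widenings = int(seconds_waiting // every_seconds)
    return min(maximum, start + step * widenings)


if __name__ == "__main__":
    for waited in (0, 5, 12, 60):
        print(f"after {waited:>2}s: within +/-{match_range(waited)} points")
```

The matchmaker looks for a close match first and only accepts a lopsided pairing when nobody closer turns up in time.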
