# How to Catch a Chess Cheater

## The Power and Limitations of Statistics

*This article is a continuation of **“Once a Cheat, Always a Cheat?”,** my previous article.*

As the chess drama surrounding world chess champion grandmaster Magnus Carlsen’s cheating allegations against grandmaster Hans Niemann continues to provoke controversy among chess fans, the International Chess Federation, otherwise known as FIDE, has announced that it will form an Investigative Panel (IP) to look into the matter. It is expected that the IP will take a considerable amount of time to conduct an in-depth analysis and ultimately carry out well-justified follow-up actions.

However, this does not stop everyday wannabe detectives from conducting their own investigations and drawing conclusions about the matter. Most of them utilise mathematics, in particular statistics, alongside some programming, to justify their claims. Some of these analyses are certainly very interesting and worth discussing. Let’s look at the various strategies used and their strengths and weaknesses, and maybe use bits of information from these sources to form our own conclusion.

# Professor Kenneth W. Regan’s Analysis

Regan is a professor at the Department of Computer Science and Engineering at the University at Buffalo. He is also an avid chess player, having achieved the international master (IM) title back in 1981. This unique combination of skills allowed him to design an algorithm to catch chess cheaters. Now, he works with FIDE, the Association of Chess Professionals, and various chess tournaments to detect cheating during live chess games.

During a recent interview focusing on the current chess cheating scandal, Regan roughly explained how his current algorithm works, without going into much detail (interestingly, the people who design these anti-cheat algorithms usually do not thoroughly explain how they work, perhaps to prevent cheaters from working around them).

The program calculates an index known as the Raw Outlier Index (ROI), which measures the accuracy of each move, or the amount of error made on each move, scaled so that positions where the move is obvious carry less weight in the index. The resulting score is then benchmarked against players of a similar Elo rating. As a result, the ROI should follow a normal distribution, with the mean set at 50 and the standard deviation, represented by σ, at 5. Hence, if a player plays at a level where the ROI>60 (z-score>2), there are grounds for suspicion, and if a player produces an ROI>63.75 (z-score>2.75), this piece of evidence can contribute towards the judicial decision. However, do note that a one-off high ROI is entirely possible due to chance, so it is better to look at the distribution of a player’s ROIs, which gives a more holistic view of a player’s performance.
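The two thresholds follow directly from the stated mean and standard deviation. Here is a minimal sketch of that arithmetic, assuming the ROI really is normally distributed with mean 50 and σ = 5 as described above. The function names are my own, and this is not Regan's actual code.

```python
import math

# Assumed benchmark from the description above: ROI ~ Normal(50, 5).
ROI_MEAN = 50.0
ROI_SD = 5.0


def roi_z_score(roi, mean=ROI_MEAN, sd=ROI_SD):
    """Number of standard deviations an ROI lies above the benchmark mean."""
    return (roi - mean) / sd


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# The two thresholds mentioned in the text.
for roi in (60.0, 63.75):
    z = roi_z_score(roi)
    p = upper_tail_probability(z)
    print(f"ROI {roi}: z = {z:.2f}, chance of an honest result this high = {p:.4f}")
```

Under these assumptions, roughly 2.3% of honest performances would exceed z = 2 and about 0.3% would exceed z = 2.75, which is why a single high ROI on its own proves very little.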

Analysing Niemann’s over-the-board (OTB) games from the last two years, Regan found that the average ROIs for the OTB tournaments Niemann played have a median of 51.4, which is considered normal. He found that there is insufficient evidence to accuse Niemann of cheating in his OTB games.

## Discussion

In his interview with chess journalist Albert Silver, Regan described Niemann’s ROI distribution over the last two years as normal, although it seems as if he simply eyeballed the values and drew a conclusion based on his own intuition. However, if you plot the ROI values on a histogram, you can see that the plot does not resemble a normal distribution, and is instead more skewed towards the extremes.

Since the ROI is benchmarked against the player’s rating, the skewness towards the extremes implies that there are many tournaments where Hans played a lot better than his rating, and many tournaments where he played worse. If we assume that he cheated in some tournaments, this is the pattern we would expect: his typical level of play is lower than his cheating-inflated rating, while his level of play is higher in the games or tournaments where he cheats.

# FIDE Master (FM) Yosha Iglesias’ Analysis

Soon after Regan’s interview was released, FM Yosha Iglesias released a video discussing some data from her friend, known only by the alias ‘gambit-man’. The data is derived from the ChessBase ‘Let’s Check’ analysis, which calculates the percentage of a player’s moves that correspond with one of the top three moves suggested by the chess engine. A score of 100% would mean that every move a player makes is one of the top three moves a chess engine recommends.
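To make the metric concrete, here is a minimal sketch of a top-three match percentage as it is described above. ChessBase's actual Let's Check computation is not something I have seen, so the function name and the example moves are purely illustrative.

```python
def top_n_match_rate(played_moves, engine_top_moves, n=3):
    """Percentage of played moves found among the engine's top n suggestions.

    played_moves: one move per position, e.g. ["e4", "Nf3"].
    engine_top_moves: for each position, the engine's moves ranked best first.
    """
    if len(played_moves) != len(engine_top_moves):
        raise ValueError("need exactly one engine ranking per played move")
    if not played_moves:
        raise ValueError("need at least one move")
    hits = sum(
        1
        for move, candidates in zip(played_moves, engine_top_moves)
        if move in candidates[:n]
    )
    return 100.0 * hits / len(played_moves)


# Hypothetical four-move fragment: the last move is outside the engine's top three.
played = ["e4", "Nf3", "Bb5", "a4"]
engine = [
    ["e4", "d4", "c4"],
    ["Nf3", "Nc3", "Bc4"],
    ["Bb5", "Bc4", "d4"],
    ["O-O", "Re1", "d3"],
]
print(top_n_match_rate(played, engine))  # 75.0
```

Note how forgiving the measure is: a move only has to be among the top three, not the single best, so forced or obvious positions push the score up quickly.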

By looking at gambit-man’s collection of data, Iglesias found that Niemann has played ten ‘100% games’ recently, while other chess grandmasters typically only have one or two such perfect games in their careers.

She also observed that Niemann has very high ROIs (from Regan’s data) for five tournaments in a row, and used the values to calculate the overall probability of getting such a good result. The cumulative odds amount to 1 in 76609.

## Discussion

The main concern with Iglesias’ argument is the lack of statistical methods to back up her claims. It is implied that playing a perfect game according to Let’s Check is difficult and very rare, but Iglesias fails to quantify how often it occurs in OTB chess, and hence lacks a point of reference to compare Niemann’s data with. An alternative explanation for Niemann’s large number of perfect games could be that he is a highly tricky and tactical player, commonly opting for lines with obvious moves. It’s something like a chess puzzle: either you get it right, or you get it really wrong.

Another minor criticism concerns ChessBase’s Let’s Check analysis and its choice of chess engine. It is not immediately clear which engine Let’s Check uses, or at what depth. This is important as different engines might prefer different moves. Even variations in the time given for the engine to compute a particular chess position could result in different recommended moves.

Lastly, the way Iglesias interpreted a portion of Regan’s data is extremely misleading, because of the way the sampling was done. She selected data points which already agree with her position on the matter, and hence the results are, as expected, very biased.
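The effect of choosing a streak after the fact can be illustrated with a small simulation. The sketch below does not reproduce Iglesias’ 1-in-76609 figure. It assumes an entirely honest player whose tournament ROIs are independent draws from a normal distribution with mean 50 and σ = 5, and the career length of 40 tournaments and the streak threshold of 55 are hypothetical numbers I picked for illustration.

```python
import math
import random


def fixed_window_probability(threshold, window, mean=50.0, sd=5.0):
    """Chance that one pre-specified run of `window` tournaments averages above threshold."""
    z = (threshold - mean) / (sd / math.sqrt(window))
    return 0.5 * math.erfc(z / math.sqrt(2))


def best_window_rate(threshold, window, n_tournaments, trials, mean=50.0, sd=5.0, seed=0):
    """Share of simulated honest careers containing at least one such run anywhere."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rois = [rng.gauss(mean, sd) for _ in range(n_tournaments)]
        # Average ROI of every run of `window` consecutive tournaments; keep the best.
        best = max(
            sum(rois[i:i + window]) / window
            for i in range(n_tournaments - window + 1)
        )
        if best > threshold:
            hits += 1
    return hits / trials


# Hypothetical setup: 40 tournaments, streak of 5, average ROI above 55.
fixed = fixed_window_probability(threshold=55.0, window=5)
anywhere = best_window_rate(threshold=55.0, window=5, n_tournaments=40, trials=2000)
print(f"pre-specified streak: {fixed:.4f}")
print(f"best streak picked afterwards: {anywhere:.4f}")
```

The second number comes out far larger than the first, because there are many possible five-tournament runs to choose from. Odds computed as if the streak had been specified in advance therefore overstate how surprising it is.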

# Rafael Leite’s Analysis

Rafael Leite is an electrical and computer engineer from the University of São Paulo, Brazil. He wrote a program to collect data about a player’s centipawn loss, which quantifies the imperfection of a particular move when compared to the top engine move. These quantities are then summed over the entire chess game.
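As a rough sketch of the idea (not Leite's actual program), centipawn loss can be computed from two engine evaluations per move: the evaluation after the engine's best move and the evaluation after the move actually played, both in centipawns from the mover's point of view. The evaluations below are made up, and the use of the population standard deviation is my assumption.

```python
from statistics import mean, pstdev


def centipawn_losses(best_evals, played_evals):
    """Per-move loss in centipawns: how much worse the played move is than the best move.

    Both lists hold engine evaluations from the mover's perspective.
    Losses are floored at zero so a move never counts as better than the engine's choice.
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("need one played evaluation per best evaluation")
    return [max(0, best - played) for best, played in zip(best_evals, played_evals)]


# Hypothetical evaluations for four moves by one player.
best = [30, 45, 20, 60]
played = [30, 10, 20, 35]

losses = centipawn_losses(best, played)
print(losses)          # [0, 35, 0, 25]
print(sum(losses))     # 60, the total for the game
print(mean(losses))    # 15, the average per move
print(pstdev(losses))  # spread of the per-move losses
```

A real implementation would take the evaluations from an engine at a fixed depth or time limit, and the numbers will shift with that choice.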

Leite hypothesised that as player ratings increase, the average centipawn loss (ACPL) decreases because the player gets better at finding the right moves. He also hypothesised that the standard deviation of centipawn loss (STDCPL) decreases as rating increases, as the player should become more consistent in the quality of their moves. He first shows that the plots of ACPL and STDCPL against rating for numerous notable grandmasters have a negative slope, and a Pearson correlation coefficient of almost -1 (describing a linear negative correlation). He then compares these with Niemann’s plots, which have correlation coefficients of -0.95 and -0.94 respectively (normal) before 2018, and correlation coefficients of -0.53 and 0.06 respectively for the years after 2018. This means that, compared to other grandmasters, Niemann seems to be improving at a dissimilar pace and has not been at all consistent in his performance, insinuating that he might have received engine assistance during some of his games.
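For readers unfamiliar with the Pearson correlation coefficient, here is a minimal sketch of how it is computed from paired rating and ACPL values. The data points are invented for illustration and are not taken from Leite's plots.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical player whose ACPL falls steadily as the rating climbs.
ratings = [2300, 2400, 2500, 2600, 2700]
acpl = [30, 27, 25, 22, 20]
print(round(pearson_r(ratings, acpl), 3))
```

A value close to -1 means the points lie almost on a falling straight line, while a value near 0, like the 0.06 reported for Niemann’s STDCPL after 2018, means there is no clear linear relationship.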

## Discussion

There are a few issues with Leite’s data interpretation and presentation. First of all, I understand the rationale behind splitting the data into two sets, before 2018 and after 2018, as the current suspicion is that Niemann cheated in recent years. However, Leite failed to consider interpreting his controls in the same way.

Secondly, notice that the scales he used differ from person to person, in order to fit the line nicely within the plot. This could be misleading, as people generally assume scales to be equivalent when comparing different plots. The correlation coefficient tells us nothing about the magnitude of the slope, as different slopes can have the same correlation coefficient. Hence, the scale of the plots must be consistent to give readers an idea of how a player’s ACPL and STDCPL change with rating.
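The point about slope and correlation is easy to verify numerically. In the sketch below, two made-up players have ACPL values that fall perfectly linearly with rating, one gently and one steeply. The helper function and the data are my own illustration.

```python
import math


def slope_and_r(xs, ys):
    """Least-squares slope and Pearson correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


ratings = [2300, 2400, 2500, 2600, 2700]
gentle = [40, 39, 38, 37, 36]  # hypothetical: ACPL drops 1 per 100 rating points
steep = [40, 35, 30, 25, 20]   # hypothetical: ACPL drops 5 per 100 rating points

for label, acpl in (("gentle", gentle), ("steep", steep)):
    slope, r = slope_and_r(ratings, acpl)
    print(f"{label}: slope = {slope:.3f} per rating point, r = {r:.3f}")
```

Both players have a correlation of -1, yet one improves five times faster than the other. Plotted on axes scaled to fit each line, the two would look identical, which is exactly why consistent scales matter.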

# The Ideal Algorithm?

Cheating in chess has always been a significant problem in the chess world, regardless of rating, presence of prize funds, or the platform where it is played. This means that people have been trying, for a long time, to find solutions to this problem.

The first step to mitigating this issue is prevention. The security of chess tournaments has improved drastically over the years, with the implementation of cutting-edge technology to detect communication devices, as well as broadcast delays to hamper communication with external parties. However, this does not mean that it is impossible to cheat. In fact, technological advancements have also made it easier to cheat. Electronic devices are becoming smaller over the years, which makes detection difficult.

This leads to the following questions: Can we harness the power of data and statistics to catch cheaters? Can we program an algorithm that informs us with absolute certainty whether a player has cheated in a game, or in a series of games? How can we design the ideal algorithm?

For me, the answer is no. Every algorithm has loopholes which can be exploited, especially if the cheater is being smart about the cheating. According to many top chess grandmasters, every game has only one or two critical moments which, if played correctly, will largely decide the rest of the game. Recognising when these positions arise is also not particularly difficult for the top players. This makes catching cheaters with mathematics very tricky, as the best players would only need to cheat once per game, or even once every several games.

# How to Catch a Chess Cheater

The point I want to make in this article is that statistics can be very effective when used correctly, but for complex situations like this one, they cannot be the sole determinant of the verdict.

Instead, the discussion and finalisation of the verdict is a very human process, just like in most judicial systems around the world. We try to derive what we can from data and evidence, but they are often insufficient. This is especially true in this particular context, where statistical techniques produce results in terms of z-scores and p-values. Ultimately, however statistically improbable an outcome is, it is technically still a possibility.

Some might argue that we humans use statistics because of our naturally poor judgement of data, but it is at times like this that our judgement must be relied on, and must shine through. The cheating scandal has ignited great debate and discussion among chess fans all around the world, which is, in my view, a good thing. Moving forward, it is this debate and discussion which will allow different sides of the story to be heard and a well-justified verdict to be made, hopefully bringing the chess world to a brighter tomorrow.