A multi-dimensional extension of the Elo rating system

Roughan, MatthewNguyen, GiangKalenkova, AnnaHamilton, Adam Hugh2024-08-162024-08-162024https://hdl.handle.net/2440/141925The mElo rating system is a multi-dimensional generalisation of the Elo rating system. Both are statistical models, equipped with asynchronous, online algorithms for estimating the probability that one player can beat another in a game. The Elo rating system is a very popular method of ranking individual players or teams in games such as chess, tennis, or soccer. The mElo rating system was introduced in (Balduzzi et al. 2018) to extend the Elo rating system to more complex situations. The Elo rating assumes that a player’s skill can be quantified by a single number which implies that skills in a game is strongly transitive. This means that if Player A is likely to beat player B, and Player B is likely to beat Player C, then Player A must be even more likely to beat Player C. This is not always a realistic assumption. The mElo rating system allows us to model intransitivities that arise in player’s skills by assigning every player a vector of numbers rather than a single scalar. Very little is known about how the Elo rating system behaves when modelling intransitive interactions let alone how the mElo rating system operates at all. This has applications in machine learning where pairwise comparisons are becoming known as a key ingredient in understanding large language models (Dumoulin et al. 2024). In this thesis, we show that the symmetry groups associated with the Elo and mElo rating systems are different. This difference in symmetry groups can be used to show that algorithmically, the two rating systems differ considerably. There are certain conserved quantities in the Elo rating system that are not conserved with mElo. The effect of this is that the Elo ratings will always approach a single solution whereas mElo ratings will approach one of infinitely many solutions, and we don’t know which solution the mElo algorithm will pick. Some of these solutions can be arbitrarily large leading to issues with convergence. We prove the main result of our thesis, that the mElo update algorithm will return a solution of near minimal size. We also use the symmetry group of mElo ratings to work out whether the problem of optimising the log-likelihood function is under-constrained. Due to the differences in the symmetry group of the Elo rating system and the mElo rating system, determining if the log-likelihood has a unique optimum is trivial in the Elo rating system but NP-hard in the mElo rating system. Fortunately, there exist probabilistic methods that can solve this problem in polynomial time with asymptotic probability one. Finally, we are able to show that, whilst very much usable, the mElo rating system is not as convenient as the Elo rating system. More generally, due to the nature of the symmetry groups of the two rating systems the mElo rating system will never be as convenient as Elo. We define a convenient rating system in terms of a list of eight conditions that make the Elo rating system practical. We are able to prove that even though the mElo rating system cannot satisfy these eight conditions it is still usable under a set of realistic assumptions and in practice, can estimate the odds of victory even in situations with intransitivity. This is useful in item response theory where one may want to schedule matches between opponents of equal ability, determine whether a game primarily relies on luck or skill, evaluate the performance of machine learning algorithms, or estimate the payoff matrix of a two player zero sum game.enpairwise comparisonsmatrix completionrandomized algorithmsElo ratingmultidimensionalelogradient descentA multi-dimensional extension of the Elo rating systemThesis