apologies.reward

Reward calculations.

Version 1 of the reward algorithm was developed by hand following my own mental model for the strength of a position. I used the spreadsheet found in notes/reward.xlsx to refine my thoughts as I developed the algorithm.

Basically, a pawn is worth more the closer it is to home, and incrementally more once it reaches the safe zone, where it can’t be hurt. There’s an additional bonus for winning, which encourages the engine to pick the move that ends the game as quickly as possible. The score is calculated by comparing the player’s position against the positions of all its opponents: we want the engine to pick the move that both maximizes the player’s position and minimizes the positions of its opponents.
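To make the shape of this scoring concrete, here is a minimal sketch under loudly stated assumptions. Each pawn is reduced to a hypothetical integer "progress" value (0 in start, `HOME` once home), and the constants `HOME`, `SAFE_START`, `SAFE_BONUS` and `WIN_BONUS`, as well as the helpers `player_score` and `relative_reward`, are invented for illustration. They are not the weights or functions used by `RewardCalculatorV1`; the real values were tuned in notes/reward.xlsx.

```python
from typing import List

# Hypothetical board encoding: each pawn is an int "progress" value,
# 0 = still in start, HOME = reached home. These constants are
# illustrative only; they are not the weights used by RewardCalculatorV1.
HOME = 65
SAFE_START = 60   # first progress value inside the safe zone
SAFE_BONUS = 10   # extra credit for a pawn that can no longer be hurt
WIN_BONUS = 1000  # extra credit when every pawn is home


def player_score(progress: List[int]) -> float:
    """Score one player's pawns: more progress is worth more."""
    score = 0.0
    for p in progress:
        score += p  # a pawn is worth more the closer it is to home
        if SAFE_START <= p < HOME:
            score += SAFE_BONUS  # incremental bonus inside the safe zone
    if all(p == HOME for p in progress):
        score += WIN_BONUS  # extra reward for ending the game
    return score


def relative_reward(player: List[int], opponents: List[List[int]]) -> float:
    """Reward = how far the player is ahead of each opponent, summed."""
    mine = player_score(player)
    return sum(mine - player_score(other) for other in opponents)
```

For example, `relative_reward([65] * 4, [[0] * 4] * 3)` returns `3780.0`: the winning player scores 1260 and each of the three opponents scores 0. Summing the per-opponent differences means a move raises the reward either by advancing the player's own pawns or by setting an opponent back.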

In simulation runs generated by simulation.py, a reward-based character source vastly outperforms a source that picks its moves randomly. The worst case is a 4-player STANDARD mode game between a single reward-based source and 3 random sources, where the reward-based source wins about 70% of the time. This is probably because, in a STANDARD mode game, the possible moves on each turn are fairly limited, since each player draws and plays the top card off the deck. This levels the playing field, because it’s quite likely that any given player will have no good move on a given turn. In a 4-player ADULT mode game, where the engine can choose among more possible moves on each turn, a reward-based source wins more than 98% of the time against 3 random sources.
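The gap between STANDARD and ADULT mode follows from how the two kinds of source choose a move. A reward-based source scores every legal move and keeps the best one, while a random source ignores the scores, so the reward-based source can only gain an edge when there is more than one move to choose between. The sketch below illustrates that idea; `choose_best`, `choose_random` and the `evaluate` callback are hypothetical names, not the API of apologies or simulation.py.

```python
import random
from typing import Callable, Sequence, TypeVar

Move = TypeVar("Move")


def choose_random(moves: Sequence[Move], rng: random.Random) -> Move:
    """Random source: any legal move is as good as any other."""
    return rng.choice(moves)


def choose_best(moves: Sequence[Move], evaluate: Callable[[Move], float]) -> Move:
    """Reward-based source: score the position each move would produce
    (evaluate is a hypothetical callback) and keep the highest-scoring move."""
    return max(moves, key=evaluate)
```

With `moves = ["a", "b", "c"]` and `evaluate = {"a": 1.0, "b": 5.0, "c": -2.0}.__getitem__`, `choose_best` returns `"b"`. When a single card leaves only one legal move, both functions must return that move, which is why a STANDARD mode game narrows the advantage.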

Module Contents

class apologies.reward.RewardCalculator

Bases: abc.ABC

Abstract reward calculator interface, to support multiple reward implementations.

abstract calculate(view: apologies.game.PlayerView) -> float

Calculate the reward associated with a player view.

abstract range(players: int) -> Tuple[float, float]

Return the range of possible rewards for a game.
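An interface like this is typically declared with the standard `abc` module, so that a subclass cannot be instantiated until it implements both methods. The sketch below mirrors the documented signatures. `RewardCalculatorSketch` and `ConstantReward` are hypothetical names, and `Any` stands in for `apologies.game.PlayerView` so that the example runs without the package installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class RewardCalculatorSketch(ABC):
    """Mirror of the documented interface; `view` stands in for
    apologies.game.PlayerView."""

    @abstractmethod
    def calculate(self, view: Any) -> float:
        """Calculate the reward associated with a player view."""

    @abstractmethod
    def range(self, players: int) -> Tuple[float, float]:
        """Return the (min, max) rewards possible for a game."""


class ConstantReward(RewardCalculatorSketch):
    """Hypothetical trivial implementation that only demonstrates the contract."""

    def calculate(self, view: Any) -> float:
        return 0.0

    def range(self, players: int) -> Tuple[float, float]:
        return (0.0, 0.0)
```

Calling `RewardCalculatorSketch()` directly raises `TypeError` because its abstract methods are unimplemented, whereas `ConstantReward()` can be created and used. This is what lets callers depend on the abstract type while swapping in different reward versions.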

class apologies.reward.RewardCalculatorV1

Bases: RewardCalculator

Version 1 of the reward calculator.

calculate(view: apologies.game.PlayerView) -> float

Calculate the reward associated with an observation.

range(players: int) -> Tuple[float, float]

Return the range of possible rewards for a game.
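For a reward computed relative to every opponent, the bounds grow with the number of players. The sketch below derives a symmetric range under stated assumptions. It assumes a reward of the form "sum over opponents of (my score minus their score)", 4 pawns per player, and the hypothetical constants `HOME_VALUE` and `WIN_BONUS`; `reward_range` is an invented name, and none of these values come from `RewardCalculatorV1`.

```python
from typing import Tuple

# Hypothetical constants chosen for illustration; they are not taken
# from RewardCalculatorV1.
PAWNS = 4          # pawns per player
HOME_VALUE = 65    # value of a single pawn that has reached home
WIN_BONUS = 1000   # bonus for having every pawn home


def reward_range(players: int) -> Tuple[float, float]:
    """Loose (min, max) bounds for a reward of the form
    sum over opponents of (my_score - opponent_score)."""
    if not 2 <= players <= 4:
        # Assumption: the game is played by 2 to 4 players.
        raise ValueError("expected 2 to 4 players")
    best = PAWNS * HOME_VALUE + WIN_BONUS  # every pawn home, game won
    worst = 0                              # every pawn still in start
    spread = float((players - 1) * (best - worst))
    return (-spread, spread)
```

Here `reward_range(4)` gives `(-3780.0, 3780.0)`. The lower bound is loose, because it pretends every opponent has won at once, which cannot happen in a real game; a tighter bound would account for only one player finishing.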