Regret Matching Demo
A demo of the regret-matching algorithm (Hart and Mas-Colell 2000) on various N-player normal-form games.
For 2-player zero-sum games, the average strategy produced by regret matching provably converges to a Nash equilibrium. Empirically, it also seems to work for games with more than two players.
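To make the convergence claim concrete, here is a minimal sketch of regret matching in self-play on rock-paper-scissors. It is an illustration written for this page, not the code in `regret_matching_demo.py`: the names `PAYOFF`, `regret_matching_strategy`, and `self_play` are hypothetical, and only NumPy is assumed.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (rows/cols: rock, paper, scissors).
# The game is zero-sum, so the column player receives the negative.
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def regret_matching_strategy(cum_regrets):
    """Mix in proportion to positive cumulative regret; uniform if none is positive."""
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regrets), 1.0 / len(cum_regrets))


def self_play(n_iters=20000, seed=0):
    """Train both players simultaneously and return their average strategies."""
    rng = np.random.default_rng(seed)
    n = PAYOFF.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]
    strategy_sums = [np.zeros(n), np.zeros(n)]
    for _ in range(n_iters):
        strategies = [regret_matching_strategy(r) for r in regrets]
        actions = [rng.choice(n, p=s) for s in strategies]
        # What each of a player's actions would have earned against the
        # opponent's sampled action.
        utils = [PAYOFF[:, actions[1]], -PAYOFF[actions[0], :]]
        for p in range(2):
            # Regret = payoff of the alternative minus payoff actually received.
            regrets[p] += utils[p] - utils[p][actions[p]]
            strategy_sums[p] += strategies[p]
    return [s / n_iters for s in strategy_sums]


avg = self_play()
print(avg)
```

The per-iteration strategies keep cycling; it is the average strategy that approaches the uniform mix, which is the Nash equilibrium of rock-paper-scissors.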
- Usage:
Run:
$ python3 imperfecto/demos/regret_matching_demo.py --help
to print the available options.
- imperfecto.demos.regret_matching_demo.generate_random_prob_dist(n_actions)
Generate a random probability distribution for a game.
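One common way to draw such a distribution is to sample from a flat Dirichlet, which is uniform over the probability simplex. The sketch below shows that approach. It is an assumption about what a helper like this might do, not the library's actual implementation, and `sample_prob_dist` is a hypothetical name.

```python
import numpy as np


def sample_prob_dist(n_actions, rng=None):
    """Return a random probability vector of length n_actions (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Dirichlet(1, ..., 1) is uniform over all valid probability vectors.
    return rng.dirichlet(np.ones(n_actions))


dist = sample_prob_dist(3, np.random.default_rng(0))
print(dist, dist.sum())
```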
- imperfecto.demos.regret_matching_demo.verify_nash_strategy(Game, nash_strategy, n_iters=10000, n_random_strategies=5)
Verifies (roughly) that the given strategy is a Nash equilibrium. This check is only applicable to 2-player zero-sum (normal-form) games. We verify the strategy by pitting it against random opponent strategies. A Nash strategy should be unexploitable (i.e., have a payoff >= 0).
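The check described above can be sketched as follows for rock-paper-scissors, whose game value is zero. This is a simplified illustration, not the library's code: it computes the expected payoff exactly instead of simulating `n_iters` rounds, and `PAYOFF` and `looks_unexploitable` are hypothetical names.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (rows/cols: rock, paper, scissors).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def looks_unexploitable(strategy, n_random_strategies=5, tol=1e-9, seed=0):
    """Return False if any sampled opponent mix beats the strategy in expectation."""
    rng = np.random.default_rng(seed)
    strategy = np.asarray(strategy, dtype=float)
    for _ in range(n_random_strategies):
        opponent = rng.dirichlet(np.ones(PAYOFF.shape[1]))
        # Expected payoff of `strategy` (row player) against `opponent`.
        expected = strategy @ PAYOFF @ opponent
        if expected < -tol:
            return False
    return True


print(looks_unexploitable([1 / 3, 1 / 3, 1 / 3]))
print(looks_unexploitable([1.0, 0.0, 0.0], n_random_strategies=200))
```

Passing this test is only evidence, not proof, since a handful of random opponents may miss the exploiting one. Because the expected payoff is linear in the opponent's mix, checking each pure opponent action alone would give the exact worst case.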
- Parameters
Game (Type[ExtensiveFormGame]) – The game to verify the strategy for.
nash_strategy (ndarray) – The strategy to verify.
n_iters (int) – The number of iterations to run the game for.
- Return type
- imperfecto.demos.regret_matching_demo.to_train_regret_matching(Game, n_iters=10000)
Train all players simultaneously by the regret-matching algorithm and print the average strategies and payoffs.
- Parameters
Game (Type[ExtensiveFormGame]) – The game to train the players for.
n_iters (int) – The number of iterations to run the game for.
- Return type
- imperfecto.demos.regret_matching_demo.to_train_delay_regret_matching(Game, n_iters=10000, freeze_duration=10)
Train all players by the regret-matching algorithm and print the average strategies and payoffs. We alternately freeze one player’s strategy and train the other player(s). This is a process of co-evolution.
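The alternating schedule can be sketched for two players as below. This is one plausible reading of the description, not the library's implementation: it assumes the roles swap every `freeze_duration` iterations and that the frozen player keeps sampling from its last strategy. `PAYOFF`, `strategy_from_regrets`, and `delayed_self_play` are hypothetical names.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (rows/cols: rock, paper, scissors).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def strategy_from_regrets(cum_regrets):
    """Mix in proportion to positive cumulative regret; uniform if none is positive."""
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regrets), 1.0 / len(cum_regrets))


def delayed_self_play(n_iters=20000, freeze_duration=10, seed=0):
    """Alternate which player learns; return both players' average strategies."""
    rng = np.random.default_rng(seed)
    n = PAYOFF.shape[0]
    regrets = [np.zeros(n), np.zeros(n)]
    strategies = [np.full(n, 1.0 / n), np.full(n, 1.0 / n)]
    strategy_sums = [np.zeros(n), np.zeros(n)]
    for t in range(n_iters):
        # Assumption: the learner changes every `freeze_duration` iterations.
        learner = (t // freeze_duration) % 2
        actions = [rng.choice(n, p=s) for s in strategies]
        utils = [PAYOFF[:, actions[1]], -PAYOFF[actions[0], :]]
        for p in range(2):
            strategy_sums[p] += strategies[p]
        # Only the learner updates its regrets and strategy; the other stays frozen.
        regrets[learner] += utils[learner] - utils[learner][actions[learner]]
        strategies[learner] = strategy_from_regrets(regrets[learner])
    return [s / n_iters for s in strategy_sums]


delayed_avg = delayed_self_play()
print(delayed_avg)
```

Freezing one side gives the learner a stationary opponent for a short stretch, at the cost of each player updating on only half of the iterations.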
- Parameters
Game (Type[ExtensiveFormGame]) – The game to train the players for.
n_iters (int) – The number of iterations to run the game for.
freeze_duration (int) – The number of iterations to freeze the strategy of the player that is not being trained.