demos package

Submodules

demos.cfr_demo module

A demo of the counterfactual regret minimization (CFR) algorithm (Zinkevich et al., “Regret Minimization in Games with Incomplete Information”, 2008) for various extensive-form games.

Usage:

Run:

$ python3 imperfecto/demos/cfr_demo.py --help

to print the available options.

demos.regret_matching_demo module

A demo for the regret-matching algorithm (Hart and Mas-Colell 2000) for various N-player normal form games.

For 2-player zero-sum games, the regret-matching algorithm’s average strategy provably converges to a Nash equilibrium. Empirically, it appears to work for games with more than 2 players as well.
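The core loop of regret matching can be sketched for rock-paper-scissors, a 2-player zero-sum normal-form game. This is an illustrative, self-contained sketch, not the demo’s actual code: the payoff matrix, function names, and sampling scheme here are assumptions.

```python
import numpy as np

# Row player's payoff for rock-paper-scissors (zero-sum, so the column
# player's payoff is the negation).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

def regret_matching_strategy(regret_sum):
    """Current strategy: positive regrets, normalized; uniform if none."""
    positive = np.maximum(regret_sum, 0)
    total = positive.sum()
    n = len(regret_sum)
    return positive / total if total > 0 else np.full(n, 1 / n)

def train(n_iters=50000, seed=0):
    rng = np.random.default_rng(seed)
    regret_sum = np.zeros((2, 3))
    strategy_sum = np.zeros((2, 3))
    for _ in range(n_iters):
        strategies = [regret_matching_strategy(regret_sum[p]) for p in range(2)]
        actions = [rng.choice(3, p=s) for s in strategies]
        payoffs = [PAYOFF[actions[0], actions[1]], -PAYOFF[actions[0], actions[1]]]
        # Accumulate regret: counterfactual payoff of each action minus
        # the realized payoff, per player.
        counterfactuals = [PAYOFF[:, actions[1]], -PAYOFF[actions[0], :]]
        for p in range(2):
            regret_sum[p] += counterfactuals[p] - payoffs[p]
            strategy_sum[p] += strategies[p]
    # The *average* strategy is what converges to Nash, not the last one.
    return strategy_sum / strategy_sum.sum(axis=1, keepdims=True)

avg = train()
# avg approaches the uniform Nash equilibrium (1/3, 1/3, 1/3) for each player
```

Note that it is the time-averaged strategy, not the final iterate, that carries the convergence guarantee.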

Usage:

Run:

$ python3 imperfecto/demos/regret_matching_demo.py --help

to print the available options.

demos.regret_matching_demo.generate_random_prob_dist(n_actions)[source]

Generate a random probability distribution for a game.

Parameters

n_actions (int) – The number of actions in the game.

Return type

ndarray

Returns

A numpy array of shape (n_actions,).
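A helper like this is commonly implemented by drawing uniform weights and normalizing; the following is an assumed sketch of that approach, not necessarily how the demo implements it.

```python
import numpy as np

def generate_random_prob_dist(n_actions: int) -> np.ndarray:
    """Return a random probability distribution over n_actions actions."""
    weights = np.random.rand(n_actions)  # uniform positive weights
    return weights / weights.sum()       # normalize so entries sum to 1

dist = generate_random_prob_dist(4)
# dist has shape (4,), all entries in [0, 1], and sums to 1
```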

demos.regret_matching_demo.verify_nash_strategy(Game, nash_strategy, n_iters=10000, n_random_strategies=5)[source]

Verifies (roughly) that the given strategy is a Nash equilibrium. This notion of a Nash strategy is only applicable to 2-player zero-sum normal-form games. We verify the strategy by pitting it against random opponent strategies: a Nash strategy should be unexploitable, i.e., achieve an expected payoff >= 0 against any opponent.

Parameters
  • Game (Type[ExtensiveFormGame]) – The game to verify the strategy for.

  • nash_strategy (ndarray) – The strategy to verify.

  • n_iters (int) – The number of iterations to run the game for.

  • n_random_strategies (int) – The number of random opponent strategies to pit the strategy against.

Return type

None
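The unexploitability check can be illustrated for rock-paper-scissors, where the uniform strategy is the Nash equilibrium. The payoff matrix and helper below are assumptions for illustration, not the demo’s code.

```python
import numpy as np

# Row player's payoff for rock-paper-scissors.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

def expected_payoff(strategy, opponent):
    """Expected payoff of `strategy` (row player) vs. `opponent`."""
    return strategy @ PAYOFF @ opponent

nash = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform Nash for rock-paper-scissors
rng = np.random.default_rng(0)
for _ in range(5):
    weights = rng.random(3)
    opponent = weights / weights.sum()  # a random opponent strategy
    # A Nash strategy in a zero-sum game is unexploitable: its expected
    # payoff against any opponent is >= 0 (here it is exactly 0).
    assert expected_payoff(nash, opponent) >= -1e-9
```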

demos.regret_matching_demo.to_train_regret_matching(Game, n_iters=10000)[source]

Train all players simultaneously by the regret-matching algorithm and print the average strategies and payoffs.

Parameters
  • Game (Type[ExtensiveFormGame]) – The game to train the players for.

  • n_iters (int) – The number of iterations to run the game for.

Return type

None

demos.regret_matching_demo.to_train_delay_regret_matching(Game, n_iters=10000, freeze_duration=10)[source]

Train all players by the regret-matching algorithm and print the average strategies and payoffs. We alternately freeze one player’s strategy and train the other player(s). This is a process of co-evolution.

Parameters
  • Game (Type[ExtensiveFormGame]) – The game to train the players for.

  • n_iters (int) – The number of iterations to run the game for.

  • freeze_duration (int) – The number of iterations to freeze the strategy of the player that is not being trained.
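The alternating-freeze schedule described above can be sketched as follows; this is a hypothetical illustration of the scheduling logic, not the demo’s implementation.

```python
def frozen_player(t: int, n_players: int = 2, freeze_duration: int = 10) -> int:
    """Index of the player whose strategy is frozen at iteration t.

    Players take turns being frozen for `freeze_duration` iterations
    each, while the remaining player(s) continue to train.
    """
    return (t // freeze_duration) % n_players

schedule = [frozen_player(t) for t in range(25)]
# iterations 0-9 freeze player 0, 10-19 freeze player 1, 20-24 freeze player 0
```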

Module contents