Counterfactual Regret Minimization¶
Implementation of the Counterfactual Regret Minimization (CFR) algorithm (Zinkevich et al., "Regret Minimization in Games with Incomplete Information", 2008).
CFR is a self-play algorithm that assumes that the joint strategy of all players in the game is public knowledge. The algorithm iteratively updates the regret of each player’s action at each information set. The average strategy of each player in CFR provably converges to a Nash equilibrium.
Counterfactual regrets are computed by searching down the game tree while keeping track of each player's reach probabilities.
Reach probabilities are factored in because the nodes in one player's infostate may belong to different infostates of their opponent. As a result, not every node (history) within an infostate is equally likely to be reached, so the reach probabilities must be taken into account to weight the action distribution at that infostate appropriately.
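In the notation of Zinkevich et al. (restated here for convenience, not taken from this library's code), the counterfactual value of an infostate I for player i weights each history h in I by the probability that everyone except i (including chance) reaches it, and the immediate counterfactual regret of an action compares always deviating to that action against the current strategy:

$$v_i(\sigma, I) = \sum_{h \in I} \pi^{\sigma}_{-i}(h) \sum_{z \in Z} \pi^{\sigma}(h, z)\, u_i(z), \qquad r(I, a) = v_i(\sigma_{I \to a}, I) - v_i(\sigma, I)$$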
The CFR formulation looks similar to the advantage function in Q-learning from reinforcement learning.
- class imperfecto.algos.cfr.CounterFactualRegretMinimizerPlayer(name, player_id)[source]¶
Bases:
imperfecto.algos.player.Player
A player whose strategy is updated via counterfactual regret minimization.
- strategy_sum¶
map from infostate to the cumulative strategy played at that infostate so far; useful for computing the average strategy over many iterations (see the sketch below).
- Type
dict
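As an illustration of how strategy_sum is typically used, the sketch below normalizes the accumulated strategy at each infostate into the average strategy. It assumes strategy_sum maps infostate keys to 1-D numpy arrays of per-action weights; the helper name average_strategy is hypothetical and not part of this class.

```python
import numpy as np

def average_strategy(strategy_sum: dict) -> dict:
    """Normalize the cumulative strategy at each infostate into a distribution.

    Assumes strategy_sum maps an infostate key to a 1-D array of per-action
    cumulative probabilities (an assumption made for illustration).
    """
    avg = {}
    for infostate, sums in strategy_sum.items():
        sums = np.asarray(sums, dtype=float)
        total = sums.sum()
        # Fall back to a uniform strategy if the infostate was never reached.
        avg[infostate] = sums / total if total > 0 else np.full(len(sums), 1 / len(sums))
    return avg
```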
- class imperfecto.algos.cfr.counterfactualRegretMinimizerTrainer(Game, players, n_iters=100)[source]¶
Bases:
object
CFR with chance-sampling.
CFR (Zinkevich et al. “Regret minimization in games with incomplete information” 2008) is a self-play algorithm that assumes that the joint strategy of all players in the game is public knowledge. The algorithm iteratively updates the regret of each player’s action at each information set. The average strategy of each player in CFR provably converges to a Nash equilibrium.
- Parameters
  - Game (Type[ExtensiveFormGame]) – the game class to be trained.
  - players (Sequence[CounterFactualRegretMinimizerPlayer]) – the players in the game, each of type CounterFactualRegretMinimizerPlayer.
  - n_iters (int) – number of iterations to run the algorithm.
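A minimal usage sketch based only on the constructor signature above; KuhnPokerGame, its import path, and the train() call are assumptions for illustration and may not match the library's actual module layout or method names.

```python
from imperfecto.algos.cfr import (
    CounterFactualRegretMinimizerPlayer,
    counterfactualRegretMinimizerTrainer,
)
# Hypothetical ExtensiveFormGame subclass; substitute a real game class here.
from imperfecto.games.kuhn import KuhnPokerGame  # assumed import path

players = [
    CounterFactualRegretMinimizerPlayer(name="player0", player_id=0),
    CounterFactualRegretMinimizerPlayer(name="player1", player_id=1),
]
trainer = counterfactualRegretMinimizerTrainer(KuhnPokerGame, players, n_iters=10000)
trainer.train()  # assumed entry point; the actual run method name may differ
```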
- game¶
the game to train on.
- Type
ExtensiveFormGame
- cfr(history, reach_probs)[source]¶
Counterfactual regret minimization update step at the current node.
- CFR is quite similar to the advantage function in classic Q-learning (see the sketch below). For example:
  - node_utils ~ V(s)
  - counterfactual_values ~ Q(s, a)
  - regret ~ advantage function: Q(s, a) - V(s)
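A rough sketch of that analogy with concrete numbers (the variable names follow the list above; the shapes and values are made up for illustration):

```python
import numpy as np

strategy = np.array([0.5, 0.3, 0.2])                  # pi(a|s) at the infostate
counterfactual_values = np.array([1.0, -0.5, 0.25])   # ~ Q(s, a)

node_util = strategy @ counterfactual_values          # ~ V(s) = sum_a pi(a|s) Q(s, a)
regrets = counterfactual_values - node_util           # ~ advantage Q(s, a) - V(s)
```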
- Parameters
  - history – the sequence of actions taken so far, identifying the current node.
  - reach_probs – each player's probability of reaching the current node under the current joint strategy.
- Returns
the utility (expected payoff) of the current node for each player.
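To show the shape of the recursion end to end, below is a self-contained, chance-sampled CFR sketch for two-player Kuhn poker with regret matching. It illustrates the algorithm described on this page but is not this library's implementation; every name in it (get_strategy, payoff, train, and so on) is hypothetical.

```python
import numpy as np
from itertools import permutations

ACTIONS = ["p", "b"]  # pass / bet
N_ACTIONS = len(ACTIONS)

regret_sum: dict = {}    # infostate -> cumulative counterfactual regrets
strategy_sum: dict = {}  # infostate -> cumulative strategy (cf. strategy_sum above)

def get_strategy(infostate):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = np.maximum(regret_sum.get(infostate, np.zeros(N_ACTIONS)), 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(N_ACTIONS, 1.0 / N_ACTIONS)

def is_terminal(history):
    return history[-2:] in ("pp", "bb", "bp")

def payoff(cards, history):
    """Terminal payoff from the viewpoint of the player whose turn it would be."""
    player = len(history) % 2
    if history[-2:] == "bp":
        return 1  # the opponent folded after a bet
    stake = 1 if history[-2:] == "pp" else 2
    return stake if cards[player] > cards[1 - player] else -stake

def cfr(cards, history, reach_probs):
    """CFR update at node `history`; returns the node utility for the player to act."""
    player = len(history) % 2
    if is_terminal(history):
        return payoff(cards, history)

    infostate = str(cards[player]) + history
    strategy = get_strategy(infostate)

    counterfactual_values = np.zeros(N_ACTIONS)  # ~ Q(s, a)
    for i, action in enumerate(ACTIONS):
        next_reach = reach_probs.copy()
        next_reach[player] *= strategy[i]
        # Two-player zero-sum: the child's utility is negated for the current player.
        counterfactual_values[i] = -cfr(cards, history + action, next_reach)

    node_util = strategy @ counterfactual_values  # ~ V(s)

    # Regrets are weighted by the opponent's reach probability; the strategy
    # accumulator is weighted by the acting player's own reach probability.
    regret_sum[infostate] = regret_sum.get(infostate, np.zeros(N_ACTIONS)) \
        + reach_probs[1 - player] * (counterfactual_values - node_util)
    strategy_sum[infostate] = strategy_sum.get(infostate, np.zeros(N_ACTIONS)) \
        + reach_probs[player] * strategy
    return node_util

def train(n_iters=10000, seed=0):
    """Chance-sampled CFR: sample the card deal once per iteration."""
    deals = list(permutations([1, 2, 3], 2))
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        cards = deals[rng.integers(len(deals))]
        cfr(cards, "", np.ones(2))
```

After training, the per-infostate average strategy can be recovered from strategy_sum by normalizing, as in the earlier sketch.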