Regret Matching

A regret-matching player class for N-player normal-form games. See the paper “A Simple Adaptive Procedure Leading to Correlated Equilibrium” (2000) by Sergiu Hart and Andreu Mas-Colell.

The algorithm maintains a cumulative regret vector with one entry per action. The instantaneous regret of an action is the difference between the counterfactual reward of that action and the reward of the action actually taken.

The strategy is the action distribution proportional to the positive entries of the cumulative regret vector, as worked through in the example below.
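
For concreteness, here is a single round of rock-paper-scissors worked in plain numpy. This is a minimal sketch independent of the imperfecto API; the payoff numbers and variable names are illustrative assumptions:

    import numpy as np

    # One round: we played rock (0), the opponent played paper (1).
    # counterfactual[a] = reward we would have earned by playing action a instead.
    counterfactual = np.array([-1, 0, 1])          # rock loses, paper ties, scissors wins
    actual_reward = counterfactual[0]              # we actually played rock and lost
    cum_regrets = counterfactual - actual_reward   # instantaneous regrets: [0, 1, 2]

    # Strategy: normalize the positive part of the cumulative regret vector.
    positive = np.maximum(cum_regrets, 0)
    strategy = positive / positive.sum()           # [0, 1/3, 2/3]
    print(strategy)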

class imperfecto.algos.regret_matching.RegretMatchingPlayer(name, n_actions)[source]

Bases: imperfecto.algos.player.NormalFormPlayer

Regret Matching (Hart and Mas-Colell, 2000) player. Works by maintaining a cumulative regret vector with one entry per action.

The instantaneous regret of an action is the difference between its counterfactual reward and the reward of the action actually taken.

The policy is the action distribution proportional to the positive entries of the cumulative regret vector.

Parameters
  • name (str) – name of the player.

  • n_actions (int) – number of actions.

name

name of the player.

Type

str

n_actions

number of actions.

Type

int

cum_regrets

cumulative regret vector of shape (n_actions,).

Type

np.ndarray

static regret_matching_strategy(regrets)[source]

Return the regret matching policy.

Parameters

regrets (ndarray) – cumulative regret vector.

Return type

ndarray

Returns

action distribution according to the regret-matching policy.
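
A plausible implementation of this mapping, written as a standalone sketch; the library's actual code may differ in details such as the zero-regret fallback:

    import numpy as np

    def regret_matching_strategy(regrets: np.ndarray) -> np.ndarray:
        # Keep only positive regrets; negative regrets contribute nothing.
        positive_regrets = np.maximum(regrets, 0)
        normalizing_sum = positive_regrets.sum()
        if normalizing_sum > 0:
            return positive_regrets / normalizing_sum
        # Fall back to the uniform distribution when no action has positive regret.
        return np.full(len(regrets), 1.0 / len(regrets))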

update_strategy(history, player_id)[source]

Update the cumulative regret vector. This also updates the player's strategy, since the strategy is computed from the cumulative regret vector.

Parameters
  • history (List[IntEnum]) – history of the game.

  • player_id (int) – player id.

Return type

None
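
To see the update in action, the following end-to-end sketch in plain numpy mimics what repeated regret updates accomplish: it accumulates counterfactual regrets over many rounds against a fixed opponent and the strategy concentrates on the best response. The payoff matrix, opponent strategy, and helper names are illustrative assumptions, not part of the imperfecto API:

    import numpy as np

    rng = np.random.default_rng(0)

    # payoffs[a, b] = reward to the row player for playing a against b (rock-paper-scissors)
    payoffs = np.array([[ 0, -1,  1],
                        [ 1,  0, -1],
                        [-1,  1,  0]])
    n_actions = 3
    cum_regrets = np.zeros(n_actions)
    opp_strategy = np.array([0.6, 0.2, 0.2])   # opponent heavily favors rock

    def strategy_from_regrets(regrets):
        """Distribution proportional to positive regrets; uniform when none are positive."""
        positive = np.maximum(regrets, 0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(len(regrets), 1 / len(regrets))

    for _ in range(5000):
        strategy = strategy_from_regrets(cum_regrets)
        my_action = rng.choice(n_actions, p=strategy)
        opp_action = rng.choice(n_actions, p=opp_strategy)
        counterfactual = payoffs[:, opp_action]             # reward of every action vs. the opponent's move
        cum_regrets += counterfactual - counterfactual[my_action]

    # The strategy concentrates on paper, the best response to a rock-heavy opponent.
    print(strategy_from_regrets(cum_regrets))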