Regret Matching

A regret-matching player class for N-player normal-form games. See the paper “A Simple Adaptive Procedure Leading to Correlated Equilibrium” (2000) by Sergiu Hart and Andreu Mas-Colell.

The algorithm maintains a cumulative regret vector with one entry per action. The instantaneous regret of an action is the difference between the counterfactual reward of that action and the reward of the action actually taken.

The strategy is the action distribution proportional to the positive entries of the cumulative regret vector, as worked through in the example below.
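
For concreteness, here is a single round of rock-paper-scissors worked in plain numpy. This is a minimal sketch independent of the imperfecto API; the payoff numbers and variable names are illustrative assumptions:

    import numpy as np

    # One round: we played rock (0), the opponent played paper (1).
    # counterfactual[a] = reward we would have earned by playing action a instead.
    counterfactual = np.array([-1, 0, 1])          # rock loses, paper ties, scissors wins
    actual_reward = counterfactual[0]              # we actually played rock and lost
    cum_regrets = counterfactual - actual_reward   # instantaneous regrets: [0, 1, 2]

    # Strategy: normalize the positive part of the cumulative regret vector.
    positive = np.maximum(cum_regrets, 0)
    strategy = positive / positive.sum()           # [0, 1/3, 2/3]
    print(strategy)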

class imperfecto.algos.regret_matching.RegretMatchingPlayer(name, n_actions)[source]

Bases: imperfecto.algos.player.NormalFormPlayer

Regret Matching (Hart and Mas-Colell, 2000) player. Works by maintaining a cumulative regret vector with one entry per action.

The instantaneous regret of an action is the difference between its counterfactual reward and the reward of the action actually taken.

The policy is the action distribution proportional to the positive entries of the cumulative regret vector.

Parameters
  • name (str) – name of the player.

  • n_actions (int) – number of actions.

name

name of the player.

Type

str

n_actions

number of actions.

Type

int

cum_regrets

cumulative regret vector of shape (n_actions,).

Type

np.ndarray

static regret_matching_strategy(regrets)[source]

Return the regret matching policy.

Parameters

regrets (ndarray) – cumulative regret vector.

Return type

ndarray

Returns

action distribution according to the regret-matching policy.
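
A plausible implementation of this mapping, written as a standalone sketch; the library's actual code may differ in details such as the zero-regret fallback:

    import numpy as np

    def regret_matching_strategy(regrets: np.ndarray) -> np.ndarray:
        # Keep only positive regrets; negative regrets contribute nothing.
        positive_regrets = np.maximum(regrets, 0)
        normalizing_sum = positive_regrets.sum()
        if normalizing_sum > 0:
            return positive_regrets / normalizing_sum
        # Fall back to the uniform distribution when no action has positive regret.
        return np.full(len(regrets), 1.0 / len(regrets))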

update_strategy(history, player_id)[source]

Update the cumulative regret vector. This also updates the player's strategy, since the strategy is computed from the cumulative regret vector.

Parameters
  • history (List[IntEnum]) – history of the game.

  • player_id (int) – player id.

Return type

None
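
To see the update in action, the following end-to-end sketch in plain numpy mimics what repeated regret updates accomplish: it accumulates counterfactual regrets over many rounds against a fixed opponent and the strategy concentrates on the best response. The payoff matrix, opponent strategy, and helper names are illustrative assumptions, not part of the imperfecto API:

    import numpy as np

    rng = np.random.default_rng(0)

    # payoffs[a, b] = reward to the row player for playing a against b (rock-paper-scissors)
    payoffs = np.array([[ 0, -1,  1],
                        [ 1,  0, -1],
                        [-1,  1,  0]])
    n_actions = 3
    cum_regrets = np.zeros(n_actions)
    opp_strategy = np.array([0.6, 0.2, 0.2])   # opponent heavily favors rock

    def strategy_from_regrets(regrets):
        """Distribution proportional to positive regrets; uniform when none are positive."""
        positive = np.maximum(regrets, 0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(len(regrets), 1 / len(regrets))

    for _ in range(5000):
        strategy = strategy_from_regrets(cum_regrets)
        my_action = rng.choice(n_actions, p=strategy)
        opp_action = rng.choice(n_actions, p=opp_strategy)
        counterfactual = payoffs[:, opp_action]             # reward of every action vs. the opponent's move
        cum_regrets += counterfactual - counterfactual[my_action]

    # The strategy concentrates on paper, the best response to a rock-heavy opponent.
    print(strategy_from_regrets(cum_regrets))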