Regret Matching¶
A regret matching player class for N-player normal-form games. See the paper "A Simple Adaptive Procedure Leading to Correlated Equilibrium" (2000) by Sergiu Hart and Andreu Mas-Colell.
The algorithm works by maintaining a cumulative regret vector with one entry per action. The instantaneous regret of an action is the difference between its counterfactual reward and the reward of the action actually taken.
The strategy is the action distribution proportional to the positive entries of the cumulative regret vector (uniform when no entry is positive).
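The two steps above can be sketched directly in NumPy. This is a minimal illustration of the technique, not the library's implementation; the function names `regret_matching_strategy` and `update_cum_regrets` are chosen here for exposition:

```python
import numpy as np

def regret_matching_strategy(cum_regrets: np.ndarray) -> np.ndarray:
    """Mixed strategy proportional to the positive cumulative regrets."""
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret: fall back to the uniform distribution.
    return np.full(len(cum_regrets), 1.0 / len(cum_regrets))

def update_cum_regrets(cum_regrets: np.ndarray, action: int,
                       action_rewards: np.ndarray) -> np.ndarray:
    """Accumulate, for every action, its counterfactual reward minus the
    reward of the action actually taken (the instantaneous regret)."""
    cum_regrets += action_rewards - action_rewards[action]
    return cum_regrets
```

For example, with cumulative regrets `[2, -1, 2]` the strategy puts probability 0.5 on each of the two positively regretted actions and 0 on the negatively regretted one.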
- class imperfecto.algos.regret_matching.RegretMatchingPlayer(name, n_actions)[source]¶
Bases:
imperfecto.algos.player.NormalFormPlayer
Regret Matching (Hart and Mas-Colell, 2000) player. It works by maintaining a cumulative regret vector with one entry per action.
The instantaneous regret of an action is the difference between its counterfactual reward and the reward of the action actually taken.
The policy is the action distribution proportional to the positive entries of the cumulative regret vector.
- cum_regrets¶
Cumulative regret vector of shape (n_actions,).
- Type
np.ndarray
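As a usage illustration, regret matching in self-play drives the empirical average strategy toward equilibrium. The following standalone sketch re-implements the update loop for two players on rock-paper-scissors rather than calling `RegretMatchingPlayer` (whose methods beyond the constructor are not documented here); the average strategies approach the uniform equilibrium:

```python
import numpy as np

# Payoff matrix for rock-paper-scissors from player 0's point of view (zero-sum).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def strategy(cum_regrets: np.ndarray) -> np.ndarray:
    """Play proportionally to positive cumulative regrets, else uniformly."""
    pos = np.maximum(cum_regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(pos), 1.0 / len(pos))

rng = np.random.default_rng(0)
cum_regrets = [np.zeros(3), np.zeros(3)]
avg_strategy = [np.zeros(3), np.zeros(3)]
T = 20000
for _ in range(T):
    strats = [strategy(r) for r in cum_regrets]
    actions = [rng.choice(3, p=s) for s in strats]
    # Counterfactual reward vectors, fixing the opponent's realized action.
    rewards = [PAYOFF[:, actions[1]], -PAYOFF[actions[0], :]]
    for i in range(2):
        cum_regrets[i] += rewards[i] - rewards[i][actions[i]]
        avg_strategy[i] += strats[i]
avg_strategy = [a / T for a in avg_strategy]  # ≈ [1/3, 1/3, 1/3] for each player
```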