AlphaZero defeats the best engines in chess, shogi and Go
Not long ago AlphaGo Zero celebrated its triumphs in Go. Now DeepMind has presented a new version called AlphaZero, which is more general: it took on the strongest engines in three games:
- chess: the Stockfish 8 engine
- shogi (a Japanese chess variant): the Elmo engine
- and again Go: AlphaGo Zero (i.e. the earlier version, trained for 3 days)
AlphaZero was given only the rules of each game and learned purely from games played against itself (self-play). It surpassed the existing engines:
- after 4 hours for chess (300,000 iterations)
- after less than 2 hours for shogi (110,000 iterations)
- after 8 hours for Go (165,000 iterations)
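The idea of learning from nothing but the rules, by playing against oneself, can be illustrated on a much smaller game. The sketch below is not AlphaZero's method (which combines a deep neural network with tree search); it is a minimal stand-in that uses a lookup table and a simple value update. The game (one pile of stones, take 1 to 3, whoever takes the last stone wins), the function names and all parameter values are assumptions chosen for illustration.

```python
import random


def legal_moves(stones):
    """Rules of the toy game: take 1, 2 or 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def train_by_self_play(start=10, games=5000, alpha=0.5, epsilon=0.3, seed=0):
    """Learn move values for the player to move, using self-play only."""
    rng = random.Random(seed)
    # q[stones][move] = estimated outcome (+1 win, -1 loss) for the player to move
    q = {s: {m: 0.0 for m in legal_moves(s)} for s in range(1, start + 1)}
    for _ in range(games):
        stones = start
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore a random move
            else:
                move = max(moves, key=lambda m: q[stones][m])  # play the best known move
            remaining = stones - move
            if remaining == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # The opponent moves next, so their best value is our loss.
                target = -max(q[remaining].values())
            q[stones][move] += alpha * (target - q[stones][move])
            stones = remaining  # the same table now plays the other side
    return q


def best_move(q, stones):
    """Greedy move according to the learned table."""
    return max(q[stones], key=q[stones].get)


q = train_by_self_play()
for n in range(1, 11):
    print(n, best_move(q, n))
```

For pile sizes that are not a multiple of 4, the learned move should match the known winning strategy for this game (leave the opponent a multiple of 4), even though that strategy was never given to the program.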
After training, a tournament was played. The results, counted from AlphaZero's perspective, are (W - wins, D - draws, L - losses):
Game | White | Black | W | D | L |
---|---|---|---|---|---|
Chess | AlphaZero | Stockfish | 25 | 25 | 0 |
Chess | Stockfish | AlphaZero | 3 | 47 | 0 |
Shogi | AlphaZero | Elmo | 43 | 2 | 5 |
Shogi | Elmo | AlphaZero | 47 | 0 | 3 |
Go | AlphaZero | AG0 | 31 | - | 19 |
Go | AG0 | AlphaZero | 29 | - | 21 |
As the table shows, AlphaZero did not lose a single game of chess (28 wins and 72 draws). In shogi it scored 90 wins, 2 draws and 8 losses. The closest match was Go, which ended with 60 wins and 40 losses.
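The totals above can be checked by adding up the two rows of each game in the table. The short sketch below does that; the variable and function names are illustrative, and the score percentage (a draw counted as half a point) is a common chess convention assumed here, not something stated in the results themselves. Go has no draw column in the table, so draws are entered as 0.

```python
from collections import defaultdict

# (game, AlphaZero's colour, wins, draws, losses), counted from AlphaZero's side
RESULTS = [
    ("Chess", "White", 25, 25, 0),
    ("Chess", "Black", 3, 47, 0),
    ("Shogi", "White", 43, 2, 5),
    ("Shogi", "Black", 47, 0, 3),
    ("Go", "White", 31, 0, 19),
    ("Go", "Black", 29, 0, 21),
]


def totals_per_game(rows):
    """Sum wins, draws and losses over both colours for each game."""
    totals = defaultdict(lambda: [0, 0, 0])
    for game, _colour, wins, draws, losses in rows:
        totals[game][0] += wins
        totals[game][1] += draws
        totals[game][2] += losses
    return {game: tuple(t) for game, t in totals.items()}


def score_percent(wins, draws, losses):
    """Score as a percentage, counting a draw as half a point (assumed convention)."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


for game, (w, d, l) in totals_per_game(RESULTS).items():
    print(f"{game}: {w} wins, {d} draws, {l} losses, score {score_percent(w, d, l):.1f}%")
```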
The AlphaZero algorithm is based on a deep neural network. During the matches it ran on a single machine with 4 TPUs. The algorithm settings, network architecture and hyperparameters were the same for every game.
Some chess players commented that AlphaZero plays like a madman, making moves that would not occur to anyone in a given position. As it later turned out, however, these moves had their justification.
To sum up:
- the program used a neural network with the same structure and the same parameters for each game, which is another step towards a universal algorithm
- only a few hours of learning from self-play were enough to surpass the strongest engines, which had been developed over many years
- hardware requirements during play were limited to 4 TPUs
- extension to further application areas is only a matter of time