AlphaStar is now grandmaster in real-time strategy game

3 November 2019, by Horst Buchwald

New York, 3.11.2019

AlphaStar, an AI agent from the Google subsidiary DeepMind, has now reached Grandmaster level in a real-time strategy game. It can beat 99.8% of all human players in competitive play of Blizzard Entertainment’s StarCraft II. The results will be published in a forthcoming research paper in the journal Nature.

DeepMind sees this progress as further evidence that general-purpose reinforcement learning, the machine learning technique underlying AlphaStar’s training, could one day be used to train self-learning robots, self-driving cars, and more advanced image and object recognition systems.

“The history of progress in artificial intelligence is marked by milestone achievements in games. Ever since computers cracked Go, chess and poker, StarCraft has emerged by consensus as the next grand challenge,” said David Silver, a DeepMind principal researcher on the AlphaStar team, in a statement. “The complexity of the game is much greater than chess, because players control hundreds of units; more complex than Go, because there are 10^26 possible choices for every move; and players have less information about their opponents than in poker.”

Back in January, DeepMind announced that its AlphaStar system had won 10 games in a row against top professional players during a pre-recorded session, but lost to pro player Grzegorz “MaNa” Komincz in a final game streamed live online. The company continued to improve the system between January and June, when it said it would begin accepting invitations to play the best human players from around the world. Those matches took place in July and August, according to DeepMind.

The results were impressive: AlphaStar had become one of the most formidable StarCraft II players in the world, though still not quite superhuman. About 0.2 percent of players are still able to defeat it, but it is likely only a matter of time before the system improves to the point where it overwhelms every human opponent.

This research milestone parallels a similar one from San Francisco-based AI research company OpenAI, which trains AI agents using reinforcement learning to play the challenging five-on-five multiplayer game Dota 2. In April, the most mature version of its software, called OpenAI Five, defeated the Dota 2 world champion team, having lost to two less accomplished esports teams the previous summer. The leap in OpenAI Five’s capabilities mirrors that of AlphaStar, and both are strong examples of how this approach to AI can produce an unprecedented level of play.

As with OpenAI’s Dota 2 bots and other game-playing agents, the goal of this type of AI research is not merely to defeat humans in various games just to prove that it is possible. Rather, it is to demonstrate that, with enough time, effort and resources, sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it a board game or a modern video game.

It is also about demonstrating the benefits of reinforcement learning, a subfield of machine learning that has seen great success in recent years when combined with massive computing power and training methods such as virtual simulation.

Like OpenAI, DeepMind trains its AI agents against versions of themselves at an accelerated pace, so that agents can accumulate hundreds of years of gameplay experience in the span of a few months. This has allowed such software to stand on par with some of the most talented human players of Go, and now of far more challenging games like StarCraft and Dota.
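The self-play idea described above can be illustrated with a deliberately tiny sketch (this is an illustrative toy, not DeepMind’s actual training setup or league system): two copies of the same policy repeatedly play rock-paper-scissors against each other, and the weight of whichever move won each round is nudged upward.

```python
import random

# Toy self-play sketch: both "agents" sample moves from one shared policy,
# and the winning move's weight is reinforced after every episode.
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def choose(weights):
    """Sample an action in proportion to its current weight."""
    return random.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]

def self_play(episodes=10_000, lr=0.01, seed=0):
    random.seed(seed)
    weights = {a: 1.0 for a in ACTIONS}  # shared policy for both sides
    for _ in range(episodes):
        a, b = choose(weights), choose(weights)
        if BEATS[a] == b:        # side A won: reinforce its move
            weights[a] += lr
        elif BEATS[b] == a:      # side B won: reinforce its move
            weights[b] += lr
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

policy = self_play()
print(policy)
```

Because both sides learn from the same policy, any move that becomes popular immediately creates an incentive to play its counter, so no single strategy can stay dominant for long. Real systems like AlphaStar replace the weight table with deep neural networks and play against a whole league of past versions, but the feedback loop is the same in spirit.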

However, the software remains limited to the narrow discipline for which it was developed. The Go-playing agent cannot play Dota, and vice versa. (DeepMind did have a more general version of its Go agent try its hand at chess, which it mastered in eight hours.) This is because the software is not programmed with easily swappable rule sets or instructions. Instead, DeepMind and other research labs use reinforcement learning to let agents figure out how to play on their own, which is why the software often develops novel and wildly unpredictable play styles that have since been adopted by top human players.

“AlphaStar is a fascinating and unorthodox player – one with the reflexes and speed of the best pros, but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that is unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities professional players have truly explored,” said Diego “Kelazhur” Schwimer, a professional player on the Panda Global team, in a statement. “Although some of AlphaStar’s strategies may seem strange at first glance, I can’t help wondering if combining all the different play styles it has demonstrated might actually be the best way to play the game.”

DeepMind hopes that the advances in reinforcement learning made by the lab and other AI researchers can be applied more broadly in the future. The most likely real-world application for such software is robotics, where the same techniques could train AI agents in virtual simulation to perform physical tasks, such as operating robotic hands. After years of simulated motor-control training, an AI could take over the reins of a physical robot arm, and perhaps one day even control whole-body robots. DeepMind also sees increasingly sophisticated – and thus safer – self-driving cars as another application for its particular approach to machine learning.
