New AI approach: finding even better solutions by remembering past successes

February 25, 2021, by Horst Buchwald

New York, Feb. 25, 2021

Many AI systems use reinforcement learning, in which an algorithm receives positive or negative feedback on its progress toward a specific goal after each step it takes, encouraging it to find a particular solution. This technique was used by AI company DeepMind to train AlphaGo, which beat a world champion at Go in 2016.

Adrien Ecoffet of Uber AI Labs and OpenAI in California and his colleagues hypothesized that such algorithms often abandon promising paths: they jump to another area in pursuit of something that looks more promising, never return, and so overlook the better solutions.

"What do you do if you don't know anything about your task?" asks Ecoffet. "If you're just waving your arms around, it's unlikely you'll ever make a cup of coffee."

The researchers addressed the problem with an algorithm that remembers all the paths taken in earlier attempts to solve a problem. When it reaches a point that does not seem right, it goes back to its memory and tries another path.

As it plays, the software saves screenshots from a game, so it knows what it has tried. It also groups similar-looking images to identify points in the game to which it should return as a starting point. The researchers tested this new approach by adding game rules and a goal: score as many points as possible and try to get a higher score each time.
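The grouping of similar-looking images can be illustrated with a minimal sketch. It assumes that frames are small grayscale grids and that "similar" means "identical after shrinking the image and reducing it to a few gray levels". The names `cell_key` and `update_archive`, the block size and the number of gray levels are hypothetical choices for illustration, not details taken from the article.

```python
from typing import Any, Dict, List, Tuple

Frame = List[List[int]]                 # grayscale image, pixel values 0-255
CellKey = Tuple[Tuple[int, ...], ...]   # coarse, hashable version of a frame


def cell_key(frame: Frame, block: int = 2, levels: int = 4) -> CellKey:
    """Average each block x block tile, then quantize the mean to a few gray levels.

    Frames that differ only in small details end up with the same key.
    """
    rows = len(frame) // block
    cols = len(frame[0]) // block
    key = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = sum(
                frame[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
            )
            mean = total / (block * block)
            row.append(min(levels - 1, int(mean * levels / 256)))
        key.append(tuple(row))
    return tuple(key)


def update_archive(archive: Dict[CellKey, Dict[str, Any]],
                   frame: Frame, score: int, state: Any) -> bool:
    """Remember the best-scoring way of reaching each group of similar frames.

    Returns True if the archive was changed.
    """
    key = cell_key(frame)
    best = archive.get(key)
    if best is None or score > best["score"]:
        archive[key] = {"score": score, "state": state}
        return True
    return False
```

In this sketch, each archive entry keeps only the highest-scoring visit to a group of frames, which mirrors the stated goal of trying to get a higher score each time.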

In Atari games, it's usually not possible to return to an arbitrary point, but the researchers used an emulator – software that mimics the Atari system – with the added ability to save states and reload them at any time. This meant the algorithm could start at any point without having to play the game from the beginning.
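The save-and-reload idea can be sketched with a toy stand-in for the emulator. Everything here is an assumption made for illustration: `ToyEmulator` is a hypothetical one-dimensional corridor, not an Atari emulator, the "cell" is simply the current position, and the starting point is picked uniformly at random, which may differ from how the researchers select it.

```python
import random


class ToyEmulator:
    """Hypothetical stand-in for an emulator: a corridor with positions 0..length."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0
        self.score = 0  # furthest position reached on the current trajectory

    def step(self, action):
        """Move one step left (-1) or right (+1), staying inside the corridor."""
        self.pos = max(0, min(self.length, self.pos + action))
        self.score = max(self.score, self.pos)

    def save_state(self):
        return (self.pos, self.score)

    def load_state(self, state):
        self.pos, self.score = state


def go_explore(iterations=200, steps=5, seed=0):
    """Repeatedly reload a remembered state, explore from it, and archive improvements."""
    rng = random.Random(seed)
    emu = ToyEmulator()
    archive = {emu.pos: {"score": emu.score, "state": emu.save_state()}}
    for _ in range(iterations):
        # Return: jump straight back to a remembered point instead of replaying.
        cell = rng.choice(list(archive))
        emu.load_state(archive[cell]["state"])
        # Explore: take a few random actions from that point.
        for _ in range(steps):
            emu.step(rng.choice((-1, 1)))
            entry = archive.get(emu.pos)
            if entry is None or emu.score > entry["score"]:
                archive[emu.pos] = {"score": emu.score, "state": emu.save_state()}
    return archive
```

Calling `go_explore()` returns the archive; the entry with the highest `"score"` holds a saved state from which the best trajectory found so far can be resumed.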

The team had the algorithm play a collection of 55 Atari games, which has become a standard benchmark for reinforcement learning algorithms. The algorithm beat the state-of-the-art algorithms in these games 85.5 percent of the time. In one particularly complex game, Montezuma’s Revenge, the algorithm surpassed the previous record for reinforcement learning software and also beat the human world record.

Once the algorithm achieved a high enough score, the researchers used the solution it found to train a neural network to replicate the strategy and play the game the same way, without having to reload saved states using an emulator. This alternative approach proved more computationally intensive, because the neural network version of the algorithm produced billions of screen captures while solving each game.

Peter Bentley of University College London expressed confidence that the team’s approach of combining reinforcement learning with an archive of memories could be used to solve more complex problems. In their paper, published in the journal Nature, the researchers said they can envision applications in robotics, language processing and even in the design of new drugs.

Journal Reference: Nature, DOI: 10.1038/s41586-020-03157-9
