Toward deep-learning models that can understand code like humans do
New York, 4/27/2021
Productivity tools like Eclipse and Visual Studio suggest code snippets that developers can insert into their work as they write. These automated features are powered by sophisticated language models that have learned to read and write computer code after ingesting thousands of examples. But like other deep-learning models trained on large data sets without explicit instructions, language models designed for code processing have built-in vulnerabilities.
“If you’re not really careful, a hacker can subtly manipulate the inputs to these models to make them predict anything,” says Shashank Srikant, a doctoral student in MIT’s Department of Electrical Engineering and Computer Science. “We’re trying to prevent that.”
In a new paper, Srikant and the MIT-IBM Watson AI Lab present an automated method for finding vulnerabilities in code processing models and retraining them to be more resistant to attack. This is part of a larger project by MIT researcher Una-May O’Reilly and IBM-affiliated researcher Sijia Liu to use AI to make automated programming tools smarter and more secure. The team will present its findings next month at the International Conference on Learning Representations.
Trained on GitHub and other program-sharing websites, code-processing models learn to generate programs the way other language models learn to write messages or poems. As a result, they can act as intelligent assistants, predicting what software developers will do next and offering guidance.
They can suggest programs that are appropriate for the task at hand or create program summaries to document how the software works. Code-processing models can also be trained to find and fix bugs. But despite their potential to increase productivity and improve software quality, they carry security risks that researchers are just beginning to discover.
Srikant and his colleagues have found that code-processing models can be fooled by renaming a variable, inserting a fake print statement, or introducing other cosmetic edits into programs the model is trying to process. These subtly altered programs function normally, but trick the model into processing them incorrectly and making the wrong decision.
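Such cosmetic edits are easy to picture concretely. The illustrative snippet below (not taken from the paper) shows two functions that compute identical results, even though a code-processing model sees very different token sequences:

```python
# Illustrative sketch: two semantically identical functions.

def total(prices):
    s = 0
    for p in prices:
        s += p
    return s

# Equivalent variant: variables renamed and a dead print statement
# inserted behind an always-false condition.
def total_perturbed(qqx):
    zz_acc = 0
    if False:
        print("unreachable decoy")  # cosmetic edit; never executes
    for zz_item in qqx:
        zz_acc += zz_item
    return zz_acc

print(total([1, 2, 3]) == total_perturbed([1, 2, 3]))  # True
```

Both versions behave identically at runtime, which is exactly what makes the attack hard to spot.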
These errors can have serious consequences for code-processing models of all kinds. Here are some examples:
● A malware detection model could be misled into mistaking a malicious program for a benign one.
● A code completion model might be misled into making incorrect or malicious suggestions.
In either case, malicious code could slip past an unsuspecting programmer. Computer vision models suffer from a similar weakness: as other MIT research has shown, editing a few key pixels in an input image can make a model mistake pigs for airplanes and turtles for guns.
Like the best language models, code-processing models have a crucial flaw: They are experts at the statistical relationships between words and phrases, but only superficially capture their true meaning. OpenAI’s GPT-3 language model, for example, can write prose that ranges from eloquent to nonsensical, but only a human reader can tell the difference.
In the paper, the researchers propose a framework for automatically modifying programs to expose weaknesses in the models that process them. It solves a two-part optimization problem: an algorithm identifies the sites in a program where adding or replacing text causes the model to make its biggest mistakes, and it identifies which types of edits pose the greatest risk.
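As a rough illustration of that two-part search (which sites to edit, and which edits to make), the sketch below greedily enumerates candidate single-token replacements against a hypothetical `model_loss` scoring function. The function names and the scoring rule are stand-ins for illustration, not the paper's actual method:

```python
import itertools

def model_loss(tokens):
    # Hypothetical stand-in: pretend the model is most confused by
    # tokens starting with "zz_". A real attack would query the
    # target model's loss here.
    return sum(len(t) for t in tokens if t.startswith("zz_"))

def best_single_edit(tokens, sites, rename_pool=("zz_a", "zz_b")):
    """Try every (site, replacement) pair and keep the edit that makes
    the stand-in model perform worst."""
    best_loss, best_edit = model_loss(tokens), None
    for i, new in itertools.product(sites, rename_pool):
        candidate = tokens[:i] + [new] + tokens[i + 1:]
        loss = model_loss(candidate)
        if loss > best_loss:
            best_loss, best_edit = loss, (i, new)
    return best_loss, best_edit

tokens = ["def", "total", "(", "prices", ")", ":"]
# Only identifier positions are eligible edit sites.
print(best_single_edit(tokens, sites=[1, 3]))
```

The outer choice of `sites` and the inner choice from `rename_pool` correspond to the two parts of the optimization.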
The framework revealed how fragile some models are. For example, a code summarization model failed one-third of the time when a single change was made to a program, and more than half the time when five changes were made. The researchers also showed that the model is capable of learning from its mistakes, potentially gaining a deeper understanding of programming.
“Our framework for attacking the model and re-training on these specific exploits could potentially help code-processing models gain a better understanding of the program’s intent,” says Liu, a co-author of the study. “This is an exciting direction waiting to be explored.”
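One way to picture that re-training loop, under the assumption that cosmetic edits never change a program's true label: attack the model, then add the exploits back into the training set with their original labels. The `Toy` model and `rename` perturbation below are hypothetical stand-ins, not the paper's system:

```python
# Toy stand-in model that just memorizes exact token sequences.
class Toy:
    def __init__(self):
        self.table = {}
    def fit(self, data):
        for tokens, label in data:
            self.table[tuple(tokens)] = label
    def predict(self, tokens):
        return self.table.get(tuple(tokens), "unknown")

def rename(tokens):
    # Cosmetic perturbation: rename every identifier except keywords.
    return [("zz_" + t) if t.isidentifier() and t != "def" else t
            for t in tokens]

clean = [(["def", "f", "(", "x", ")"], "benign")]
model = Toy()
model.fit(clean)
print(model.predict(rename(clean[0][0])))  # "unknown": the edit fooled it

# Re-train on the exploit, restoring the original label.
model.fit([(rename(p), y) for p, y in clean])
print(model.predict(rename(clean[0][0])))  # "benign"
```

After re-training, the model treats the perturbed program the same as the original, which is the behavior the researchers are trying to instill at scale.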
In the background, a larger question remains: what exactly are these black-box deep-learning models learning? “Do they think about code the way humans do, and if not, how can we get them to do that?” says O’Reilly. “That’s the big challenge ahead.”