Baidu beats Microsoft and Google in the competition of language models
2 January 2020
New York, 2.1.2020
General Language Understanding Evaluation, also known as GLUE, is a widely accepted measure of how well an AI system understands human language. It consists of nine different tests, such as picking out the names of people and organizations in a sentence and working out what a pronoun like “it” refers to when there are several potential antecedents. A language model that scores highly on GLUE can therefore handle a variety of reading-comprehension tasks. Out of a full score of 100, the average person scores about 87. Baidu is now the first team to score above 90 with its ERNIE model.
What is remarkable about Baidu’s achievement is that it shows how AI research benefits from a diversity of contributors. Baidu’s researchers had to develop a technique specifically for the Chinese language to build ERNIE (which stands for “Enhanced Representation through kNowledge IntEgration”). As it turned out, the same technique also improved the model’s understanding of English.
To appreciate ERNIE, keep in mind the model that inspired it: Google’s BERT. (Yes, they are both named after characters from Sesame Street.)
Before BERT (“Bidirectional Encoder Representations from Transformers”) was created in late 2018, natural-language models performed poorly. They were good at predicting the next word in a sentence, and thus well suited to applications such as autocomplete, but they could not sustain a single train of thought over even a short passage. This was because they did not grasp meaning, such as what the word “it” might refer to.
BERT changed that. Earlier models learned to predict and interpret the meaning of a word by considering only the context that appeared before or after the word – never both at the same time. They were, in other words, unidirectional.
BERT, on the other hand, considers the context before and after a word at once, making it bidirectional. It does this with a technique known as “masking”. In a given passage of text, BERT randomly hides 15% of the words and then tries to predict them from the remaining ones. This allows it to make more accurate predictions because it has twice as many clues to work from. For example, in the sentence “The man went to ___ to buy milk”, both the beginning and the end of the sentence give clues about the missing word. The missing word is a place you can go to and a place where you can buy milk.
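The masking step described above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the function name `mask_tokens`, the `[MASK]` placeholder string, and splitting on whitespace are assumptions made for the example, and the published BERT recipe is more involved (it sometimes substitutes a random word or leaves the chosen word unchanged instead of hiding it).

```python
import random

MASK = "[MASK]"  # placeholder string, chosen for this sketch


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Hide roughly `mask_rate` of the tokens at random positions.

    Returns the masked token list and a dict mapping each hidden
    position to the original word the model would have to predict.
    """
    rng = random.Random(seed)
    # Hide at least one token, but never more than the sentence contains.
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = rng.sample(range(len(tokens)), n_to_mask)

    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


# Simplification: words are split on whitespace.
sentence = "The man went to the store to buy milk".split()
masked, targets = mask_tokens(sentence, seed=0)
print(" ".join(masked))  # the sentence with one word hidden
print(targets)           # the hidden position and its original word
```

A model trained this way sees the words on both sides of every hidden position, which is the sense in which it is bidirectional.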
Masking is one of the core innovations behind dramatic improvements in natural language tasks and is one of the reasons why models like OpenAI’s infamous GPT-2 can write extremely compelling prose without deviating from a central thesis.
From English to Chinese and back again
When the Baidu researchers began to develop their own language model, they wanted to build on the masking technique. But they realized that they had to improve it to do justice to the Chinese language.
In English, the word serves as the semantic unit: a word taken completely out of context still carries meaning. The same does not apply to characters in Chinese. While certain characters have an inherent meaning, such as fire (火, huǒ), water (水, shuǐ) or wood (木, mù), most of them only take on meaning when they are strung together with others. The character 灵 (líng), for example, can mean either clever (机灵, jīlíng) or soul (灵魂, línghún), depending on the character it is paired with. And the characters in a proper name like Boston (波士顿, bōshìdùn) or the USA (美国, měiguó) do not mean the same thing once they are split apart.
That is why the researchers trained ERNIE on a new version of masking that hides strings of characters rather than individual ones. They also trained it to distinguish between meaningful and random strings so that it can mask the right character combinations. This gives ERNIE a better understanding of how words encode information in Chinese and allows it to predict the missing parts much more accurately. This is useful for applications such as translation and retrieving information from a text document.
The researchers quickly discovered that this approach also works better for English. Though less often than Chinese, English also has strings of words whose meaning differs from the sum of their parts. Proper nouns like “Harry Potter” and expressions like “chip off the old block” cannot be analysed meaningfully by breaking them down into individual words.
So for the sentence:
Harry Potter is a series of fantasy novels written by J. K. Rowling.
BERT could mask it like this:
[Mask] Potter is a series of fantasy novels written by J. K. Rowling.
But ERNIE would mask it like this instead:
Harry Potter is a series of fantasy novels written by [Mask] [Mask] [Mask].
This way ERNIE learns more robust predictions based on meaning rather than statistical word usage patterns.
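The difference between hiding single words and hiding whole units can be sketched as follows. This is a rough illustration rather than Baidu's code: the function `mask_spans` and the hand-labelled `spans` list are hypothetical, and the sketch simply assumes the unit boundaries are already known, whereas ERNIE has to work out which strings are meaningful.

```python
import random

MASK = "[MASK]"  # placeholder string, chosen for this sketch


def mask_spans(tokens, spans, n_spans=1, seed=None):
    """Hide whole multi-word units instead of individual words.

    `spans` is a list of (start, end) index pairs, end exclusive,
    each marking one phrase or named entity in `tokens`.
    """
    rng = random.Random(seed)
    chosen = rng.sample(spans, min(n_spans, len(spans)))
    masked = list(tokens)
    for start, end in chosen:
        for i in range(start, end):
            masked[i] = MASK  # every word of the unit is hidden together
    return masked


tokens = "Harry Potter is a series of fantasy novels written by J. K. Rowling".split()

# Hand-labelled units (an assumption for this example):
# "Harry Potter", "a series of", "J. K. Rowling"
spans = [(0, 2), (3, 6), (10, 13)]

# Hide only the author's name, as one unit.
print(" ".join(mask_spans(tokens, [(10, 13)])))
# prints: Harry Potter is a series of fantasy novels written by [MASK] [MASK] [MASK]
```

Because the whole name disappears at once, the model cannot recover “Rowling” just by seeing “J. K.” next to it; it has to rely on the rest of the sentence.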
A variety of ideas
The latest version of ERNIE also uses several other training techniques. It takes into account the order of sentences and the distances between them, for example, to understand the logical flow of a paragraph. Most importantly, however, it uses a method known as continual training, which allows it to train on new data and new tasks without forgetting those it has learned before. As a result, it can become better and better over time at performing a wide range of tasks with minimal human intervention.
Baidu is actively using ERNIE to give users more relevant search results, remove duplicate stories from its news feed, and improve the ability of its AI assistant Xiao Du to respond accurately to queries. ERNIE’s latest architecture was described in a paper presented at the Association for the Advancement of Artificial Intelligence conference, June 11-14 (https://icwsm.org/2019/index.php). Just as their team built on Google’s work with BERT, the researchers hope that others will also benefit from their work with ERNIE.
“When we started this work, we specifically thought about certain characteristics of the Chinese language,” said Hao Tian, chief architect of Baidu Research. “But we quickly discovered that they were applicable