Microsoft and Nvidia unveil most powerful Transformer language model
13 October 2021
San Francisco, Oct. 13, 2021
Microsoft and Nvidia have unveiled a natural language generation model with roughly three times as many parameters as GPT-3. In a blog post, they called the MT-NLG model “the largest and most powerful monolithic Transformer language model trained to date.”
Language models analyze and generate new text based on prompts. For example, they can predict what a person might write next in an email, or compose a story based on a single headline.
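The next-word prediction described above can be illustrated with a deliberately simple sketch. The code below is a hypothetical toy: it counts which word follows which in a tiny corpus and predicts the most frequent successor. Real models like MT-NLG learn these patterns with a Transformer neural network over billions of documents, not with bigram counts.

```python
from collections import Counter, defaultdict

# Toy corpus; a hypothetical stand-in for the web-scale text
# that models like MT-NLG are actually trained on.
corpus = "the cat sat on the mat the cat slept on the sofa".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often here
```

The same idea, scaled up with learned representations instead of raw counts, is what lets a large model continue an email or expand a headline into a story.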
According to the companies, MT-NLG shows “unmatched accuracy” on these and other tasks, such as reading comprehension, commonsense reasoning, completion prediction, and word sense disambiguation.
The model has 530 billion parameters, compared to the 175 billion of OpenAI’s GPT-3. That is still fewer than Google’s Switch Transformer NLP model, which has 1.6 trillion parameters, though Switch Transformer uses a sparse mixture-of-experts design in which only a fraction of those parameters is active for any given input. More parameters usually mean a more sophisticated and complex model that is better able to understand the nuances of language.
MT-NLG was trained on Nvidia’s $85 million Selene supercomputer, which consists of 560 DGX A100 servers, each with eight A100 80GB GPUs. Like other language models, however, it sometimes produces toxic and biased output, because it inherits those characteristics from the data it is trained on. Nvidia and Microsoft say they are “committed to working to solve this problem.”
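The scale of the Selene setup is easier to grasp with a quick back-of-the-envelope calculation. The figures below come straight from the article; the totals are simple arithmetic, not numbers reported by Nvidia or Microsoft.

```python
# Hardware figures for Selene as stated in the article.
servers = 560          # DGX A100 servers
gpus_per_server = 8    # A100 GPUs per server
gpu_memory_gb = 80     # memory per A100 GPU, in GB

# Derived totals (simple arithmetic, not official figures).
total_gpus = servers * gpus_per_server
total_memory_tb = total_gpus * gpu_memory_gb / 1000

print(total_gpus)       # 4480 GPUs in total
print(total_memory_tb)  # 358.4 TB of aggregate GPU memory
```

In other words, the training run spanned 4,480 GPUs with roughly 358 TB of combined GPU memory.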