Dubbed RETRO (for “Retrieval-Enhanced Transformer”), the AI matches the performance of neural networks 25 times its size, cutting the time and cost required to train very large models. The researchers also claim that the database makes it easier to analyze what the AI has learned, which could help filter out bias and toxic language.
“Being able to look things up on the fly instead of having to memorize everything can often be beneficial, as it is for humans,” says DeepMind’s Jack Rae, who leads the firm’s research on large language models.
Language models generate text by predicting which words come next in a sentence or a piece of speech. The larger a model is, the more information about the world it can learn during training, which makes its predictions better. GPT-3 has 175 billion parameters—the values in a neural network that store data and are adjusted as the model learns. Microsoft’s language model Megatron has 530 billion parameters. But large models also take vast amounts of computing power to train, putting them out of reach for all but the wealthiest organizations.
With RETRO, DeepMind sought to cut the cost of training without reducing how much the AI learns. The researchers trained the model on a vast dataset of news articles, Wikipedia pages, books, and text from GitHub, an online code repository. The dataset contains text in 10 languages, including English, Spanish, German, French, Russian, Chinese, Swahili, and Urdu.
RETRO’s neural network has only 7 billion parameters. But the system makes up for this with a database containing around 2 trillion passages of text. The database and the neural network are trained at the same time.
When RETRO generates text, it uses the database to look up and compare passages similar to the one it is writing, which makes its predictions more accurate. Outsourcing some of the neural network’s memory to the database lets RETRO do more with less.
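The retrieve-then-generate idea can be illustrated with a toy sketch. This is not DeepMind’s implementation—RETRO retrieves nearest neighbors using frozen BERT embeddings over a trillion-token database—so a crude bag-of-words similarity stands in for the learned retriever, and the retrieved passages are simply prepended to the prompt that a language model would condition on. All function names here are illustrative.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding': token counts, punctuation stripped.
    (RETRO uses frozen BERT embeddings instead.)"""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, database: list[str], k: int = 2) -> list[str]:
    """Return the k database passages most similar to the query."""
    q = embed(query)
    return sorted(database, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

def augmented_prompt(query: str, database: list[str]) -> str:
    """Prepend retrieved passages so a generator can condition on them."""
    context = "\n".join(f"[retrieved] {s}" for s in retrieve(query, database))
    return f"{context}\n[query] {query}"

# A miniature stand-in for RETRO's 2-trillion-passage database.
database = [
    "The Eiffel Tower is in Paris.",
    "GPT-3 has 175 billion parameters.",
    "RETRO couples a small transformer with a large text database.",
]

print(augmented_prompt("How many parameters does GPT-3 have?", database))
```

The design point this sketch captures is the trade RETRO makes: facts live in a cheap, inspectable external store rather than in expensive model weights, so the generator itself can stay small.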
The idea is not new, but this is the first time a retrieval system has been built for a large language model, and the first time the approach has been shown to rival the performance of the best language AIs around.
Bigger is not always better
RETRO draws on two other studies published this week by DeepMind, one looking at how the size of the model affects its performance and the other looking at the potential harm these AIs cause.
To study size, DeepMind built a large language model called Gopher, with 280 billion parameters. It beat state-of-the-art models on 82% of the more than 150 common language challenges they used for testing. The researchers then pitted it against RETRO and found that the 7-billion-parameter model matched Gopher’s performance on most tasks.
The ethics study is a comprehensive survey of well-known problems inherent in large language models. These models pick up biases, misinformation, and toxic language such as hate speech from the articles and books they are trained on. As a result, they sometimes spit out harmful statements, mindlessly mirroring what they encountered in the training text without knowing what it means. “Even a model that perfectly mimics the data will be biased,” says Rae.