2021 was the year of monster AI models


What does it mean for a model to be large? The size of a model, that is, of its trained neural network, is measured by the number of parameters it has. These are the values in the network that are adjusted repeatedly during training and then used to make the model's predictions. Roughly speaking, the more parameters a model has, the more information it can absorb from its training data, and the more accurate its predictions on new data will be.
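
To make "counting parameters" concrete, here is a minimal sketch, not from the article, that tallies the weights and biases of a toy two-layer network in plain NumPy; the layer sizes are arbitrary illustrations, while GPT-3-scale models hold 175 billion such values.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: the weights and biases are the trainable parameters.
layers = [
    {"weights": rng.standard_normal((512, 2048)), "bias": np.zeros(2048)},
    {"weights": rng.standard_normal((2048, 512)), "bias": np.zeros(512)},
]

# Every entry in every weight matrix and bias vector counts as one parameter.
total_params = sum(layer["weights"].size + layer["bias"].size for layer in layers)
print(f"Toy model parameter count: {total_params:,}")  # roughly 2.1 million
```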

GPT-3 has 175 billion parameters, 10 times more than its predecessor, GPT-2. But GPT-3 is dwarfed by the class of 2021. Jurassic-1, a commercially available large language model released in September by US startup AI21 Labs, edged out GPT-3 with 178 billion parameters. Gopher, a new model released by DeepMind in December, has 280 billion parameters. Megatron-Turing NLG has 530 billion. Google’s Switch-Transformer and GLaM models have one and 1.2 trillion parameters, respectively.

The trend isn’t just in the US. This year, Chinese technology giant Huawei created a language model with 200 billion parameters called PanGu. Another Chinese firm, Inspur, built the Yuan 1.0, a model with 245 billion parameters. Baidu and Peng Cheng Lab, a research institute in Shenzhen, announced PCL-BAIDU Wenxin, a 280-billion-parameter model that Baidu currently uses in a variety of applications, including internet search, news feeds, and smart speakers. Beijing AI Academy announced Wu Dao 2.0 with 1.75 trillion parameters.

Meanwhile, South Korean internet search company Naver has announced a model with 204 billion parameters called HyperCLOVA.

Each of these is a remarkable feat of engineering. To begin with, training a model with over 100 billion parameters is a complex plumbing problem: hundreds of GPUs (the hardware of choice for training deep neural networks) must be connected and synchronized, and the training data must be partitioned and distributed among them in the right order at the right time.
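
As a hedged sketch of the simplest piece of that plumbing, the snippet below shards a stream of training batches round-robin across devices. The `shard_batches` helper and the device count are illustrative assumptions; real systems rely on frameworks such as PyTorch DistributedDataParallel, Megatron-LM, or DeepSpeed, and also partition the model itself, not just the data.

```python
def shard_batches(samples, num_devices, batch_size):
    """Yield (device_id, batch) pairs, assigning full batches round-robin to devices."""
    batch, device = [], 0
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield device, batch
            batch = []
            device = (device + 1) % num_devices
    if batch:  # hand any final partial batch to the current device
        yield device, batch


# Illustrative use: 8 "devices", tiny batches of integer token IDs.
for device_id, batch in shard_batches(range(32), num_devices=8, batch_size=4):
    print(f"device {device_id} gets {batch}")
```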

Major language models have become prestige projects that showcase a company’s technical prowess. Yet few of these new models push the research beyond repeating the demonstration that scaling up works well.

There are some innovations. Once trained, Google’s Switch-Transformer and GLaM use only a fraction of their parameters to make each prediction, which saves computing power. PCL-BAIDU Wenxin combines a GPT-3-style model with a knowledge graph, a technique used in old-school symbolic artificial intelligence to store facts. And alongside Gopher, DeepMind released RETRO, a language model with only 7 billion parameters that competes with others 25 times its size by cross-referencing a database of documents when it generates text. This makes RETRO less costly to train than its giant competitors.
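
The sparse idea behind Switch-Transformer and GLaM can be sketched in a few lines. The following is my own illustration, not Google's code: a learned router sends each token to a single expert sub-network, so only that expert's weights, a fraction of the layer's total parameters, are used for the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts = 64, 8  # illustrative sizes, not the real models'

# Each expert is a simple weight matrix; together the experts hold most of the parameters.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))


def switch_layer(token_vec):
    """Route the token to one expert and apply only that expert's weights."""
    scores = token_vec @ router
    chosen = int(np.argmax(scores))  # top-1 routing, as in Switch-Transformer
    return experts[chosen] @ token_vec, chosen


token = rng.standard_normal(d_model)
output, expert_id = switch_layer(token)
print(f"token routed to expert {expert_id}; only 1/{num_experts} of the expert parameters were used")
```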
