
Microsoft Trains Two Billion Parameter Vision-Language AI Model BEiT-3

September 27, 2022

Via: InfoQ

Researchers from Microsoft’s Natural Language Computing (NLC) group announced the latest version of Bidirectional Encoder representation from Image Transformers: BEiT-3, a 1.9-billion-parameter vision-language AI model. BEiT-3 models images as another language and achieves state-of-the-art performance on a wide range of downstream vision and vision-language tasks.

The model and experiments were described in a paper published on arXiv. The key idea in BEiT-3 is to model images as another language (which the authors call “Imglish”); this allows the model to be pretrained using only the masked language modeling (MLM) objective, which in turn makes the training process easier to scale up.
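To illustrate the unified masked-modeling idea at a high level, the toy sketch below masks entries in a single mixed sequence of text tokens and discrete image-patch tokens, treating both identically, as the “Imglish” framing implies. This is a hypothetical simplification for intuition only, not Microsoft’s implementation; the token names, `mask_tokens` helper, and mask ratio are all illustrative assumptions.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio=0.15, rng=None):
    """Randomly replace a fraction of tokens with [MASK].

    Returns the masked sequence and a dict mapping masked positions
    to their original tokens (the prediction targets for MLM).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_ratio:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

# A unified sequence: text tokens followed by image-patch tokens.
# The masking objective makes no distinction between the two modalities.
text_tokens = ["a", "dog", "on", "grass"]
image_tokens = [f"img_patch_{i}" for i in range(8)]  # hypothetical discrete patch ids
sequence = text_tokens + image_tokens

masked_seq, targets = mask_tokens(sequence, mask_ratio=0.3)
```

In the real model, the masked positions would be filled by a transformer predicting the original text token or visual token, so a single objective covers both modalities.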

Read More on InfoQ