Researchers at Google Brain have open-sourced the Switch Transformer, a natural-language processing (NLP) AI model. The model scales up to 1.6T parameters and improves training time up to 7x compared to the T5 NLP model, with comparable accuracy.
The team described the model in a paper published on arXiv. The Switch Transformer uses a mixture-of-experts (MoE) paradigm to combine several Transformer attention blocks. Because only a subset of the model is used to process a given input, the number of model parameters can be increased while holding computational cost steady.