

Florian Schottmann
October 3, 2024
Turning English-centric LLMs into polyglots: how much multilingualism is needed?
Disclaimer: This article was written in 2024 and describes the situation before Textshuttle’s merger with Supertext and the subsequent relaunch at supertext.com.
The vast majority of today’s large language models (LLMs) are English-centric, having been pretrained predominantly on English text. Yet to meet user expectations, models need to respond appropriately in multiple languages once deployed in downstream applications, which requires strong cross-lingual transfer abilities. In this work, we investigate the minimum amount of multilingualism required during finetuning to elicit cross-lingual generalisation in English-centric LLMs. In experiments across four LLMs, we find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation, with the limiting factor being the degree to which a target language was seen during pretraining. Evaluation on five different tasks further reveals that multilingual instruction tuning is most beneficial for generative tasks that assume input/output language agreement, such as chat settings, while it matters less for highly structured classification-style tasks. Our code and data are available on GitHub.
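To make the setup more concrete, the sketch below shows one way a multilingual instruction-tuning mix along these lines could be assembled: a predominantly English instruction dataset augmented with a small sample of examples in two additional languages. This is an illustrative sketch, not the pipeline used in the paper; the function name, language choices, and 5% mixing fraction are assumptions for the example only.

```python
# Minimal sketch (not the paper's actual training pipeline): mixing a small
# number of non-English examples into an English instruction-tuning set.
# Datasets, language codes, and the mixing ratio below are illustrative
# assumptions, not values taken from the article.

import random
from typing import Dict, List

Example = Dict[str, str]  # {"instruction": ..., "response": ..., "lang": ...}


def build_multilingual_mix(
    english_data: List[Example],
    multilingual_data: Dict[str, List[Example]],
    languages: List[str],
    per_language_fraction: float = 0.05,
    seed: int = 0,
) -> List[Example]:
    """Return an instruction-tuning set that is mostly English, plus a small
    sample from each of the chosen additional languages."""
    rng = random.Random(seed)
    mixed = list(english_data)
    n_per_lang = int(len(english_data) * per_language_fraction)
    for lang in languages:
        pool = multilingual_data.get(lang, [])
        k = min(n_per_lang, len(pool))
        mixed.extend(rng.sample(pool, k))
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy data that only shows the shape of the inputs; real data would come
    # from translated or natively collected instruction-response pairs.
    english = [
        {"instruction": f"Q{i}", "response": f"A{i}", "lang": "en"} for i in range(100)
    ]
    other = {
        "de": [{"instruction": f"Frage {i}", "response": f"Antwort {i}", "lang": "de"} for i in range(20)],
        "fr": [{"instruction": f"Question {i}", "response": f"Réponse {i}", "lang": "fr"} for i in range(20)],
    }
    mix = build_multilingual_mix(english, other, languages=["de", "fr"])
    print(len(mix), "examples;", sum(e["lang"] != "en" for e in mix), "non-English")
```

The resulting mixed dataset would then be used for standard supervised finetuning of the English-centric base model; the point of the sketch is simply that only a small non-English share, covering two to three languages, is involved.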