The Effect of Microsoft Copilot in a Multi-lingual Context with Donald Ngwe

We tested Microsoft Copilot in multilingual contexts, examining how Copilot can facilitate collaboration between colleagues with different native languages.

First, we asked 77 native Japanese speakers to review a meeting recorded in English. Half the participants had to watch and listen to the video. The other half could use Copilot Meeting Recap, which gave them an AI meeting summary as well as a chatbot to answer questions about the meeting.

Then, we asked 83 other native Japanese speakers to review a similar meeting that followed the same script but was held in Japanese by native Japanese speakers. Again, half of the participants had access to Copilot.

For the meeting in English, participants with Copilot answered 16.4% more multiple-choice questions about the meeting correctly, and they were more than twice as likely to get a perfect score. Moreover, comparing accuracy across the two scenarios, people reviewing the English meeting with Copilot achieved 97.5% accuracy, slightly higher than people reviewing the meeting in their native Japanese with standard tools (94.8%). This difference is statistically significant (p<0.05). The changes are small in percentage-point terms because baseline accuracy is so high, but Copilot closed 38.5% of the gap to perfect accuracy for those working in their native language (p<0.10) and 84.6% of the gap for those working in (non-native) English (p<0.05).
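
The "gap closed" figures can be easier to follow with the arithmetic written out. The Python sketch below is illustrative only and is not taken from the study: the function names are hypothetical, the ceiling is assumed to be 100% accuracy, and the two accuracies it derives (English without Copilot, Japanese with Copilot) are back-calculated from the rounded percentages quoted above rather than reported in the summary.

```python
def gap_closed(baseline: float, treated: float, ceiling: float = 100.0) -> float:
    """Share of the baseline-to-ceiling gap that the treated group closed."""
    if baseline >= ceiling:
        raise ValueError("baseline must be below the ceiling")
    return (treated - baseline) / (ceiling - baseline)


def implied_treated(baseline: float, share: float, ceiling: float = 100.0) -> float:
    """Treated accuracy implied by a baseline and a gap-closed share."""
    return baseline + share * (ceiling - baseline)


def implied_baseline(treated: float, share: float, ceiling: float = 100.0) -> float:
    """Baseline accuracy implied by a treated accuracy and a gap-closed share."""
    if share >= 1.0:
        raise ValueError("share must be below 1")
    return ceiling - (ceiling - treated) / (1.0 - share)


# Figures quoted in the summary (accuracy in percent, shares as fractions).
JAPANESE_NO_COPILOT = 94.8
ENGLISH_WITH_COPILOT = 97.5
JAPANESE_GAP_SHARE = 0.385
ENGLISH_GAP_SHARE = 0.846

# Derived values: assumptions implied by the rounded figures, not reported data.
japanese_with_copilot = implied_treated(JAPANESE_NO_COPILOT, JAPANESE_GAP_SHARE)
english_no_copilot = implied_baseline(ENGLISH_WITH_COPILOT, ENGLISH_GAP_SHARE)

# Relative improvement in the English condition, for comparison with the 16.4% figure.
english_relative_gain = ENGLISH_WITH_COPILOT / english_no_copilot - 1.0

print(f"Implied Japanese accuracy with Copilot: {japanese_with_copilot:.1f}%")
print(f"Implied English accuracy without Copilot: {english_no_copilot:.1f}%")
print(f"Implied relative gain in English: {english_relative_gain:.1%}")
```

Under these assumptions, the implied English baseline is roughly 84%, which is consistent with both the 16.4% relative improvement and the 84.6% gap closure quoted above. That agreement suggests the 16.4% figure is a relative gain rather than a percentage-point gain.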

 

Summary from Jaffe et al., Generative AI in Real-World Workplaces, July 2024.