In this ELOQUENCE Web Café session, titled “Teaching AI Europe’s Languages,” the discussion focused on one of the core goals of the ELOQUENCE project: developing voice and chatbot technologies that can work across Europe’s diverse languages, dialects, accents, and real-world communication contexts.
The session featured Seraphina Fong and Lorenzo Concina from Fondazione Bruno Kessler, who brought complementary perspectives from computational linguistics, cognitive science, speech technologies, and machine learning. Together, they explored what it means to teach AI not only a standard version of a language, but the richness and variation that shape how people actually speak.
Seraphina opened the conversation by reflecting on the importance of exposure, context, and representation. Teaching AI a language, she explained, is not simply a matter of giving it a dictionary. Like humans, AI systems need to learn from varied examples of real speech, including accents, dialects, slang, disfluencies, and regional expressions. If certain communities or speaking styles are missing from the data, the system is more likely to struggle with them. This makes balanced and representative data essential for building inclusive language technologies.
Lorenzo added a more technical perspective, highlighting the importance of acoustic variability, domain specificity, and continuous learning. Speech technologies must be able to handle differences in age, accent, speaker condition, environment, and terminology. Real-world communication does not happen in isolation; it is shaped by context, whether in a medical consultation, a technical meeting, or a multilingual public institution. Because languages evolve over time, AI models also need to be updated continuously with new data.
A key part of the discussion focused on speech large language models, or speech LLMs. Unlike traditional systems that first convert speech into text and then process that text separately, speech LLMs aim to process spoken language more directly while still benefiting from the knowledge stored in large language models. This approach is especially valuable in the European context, where many languages have limited labelled speech data. By using pre-trained models and cross-lingual transfer, researchers can build more scalable solutions for underrepresented languages.
The speakers also discussed the challenges of working with real-world speech data. Existing datasets often fail to capture the full richness of communication, including non-native accents, dialects, code-switching, disfluent speech, and domain-specific terminology. Lorenzo shared lessons from his previous work with United Nations conference transcription, where open-source models performed well on public benchmarks but struggled with specialised terminology, acronyms, names, and multilingual accents.
The session emphasised that inclusive speech technology is not only about collecting more data. It is also about choosing data carefully, evaluating models beyond average performance, understanding where errors occur, and involving the communities that are most affected by these systems.
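As a rough illustration of what evaluating beyond average performance can look like in practice, the sketch below computes word error rate (WER) separately for each speaker group as well as overall. It is not the method used in the ELOQUENCE project: the group labels, the sample transcripts, and the function names `edit_distance` and `wer_by_group` are all hypothetical, chosen only to show how a single average can hide a gap between groups.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return WER per group plus an 'overall' entry.

    samples: iterable of (group, reference, hypothesis) tuples.
    WER = total word edits / total reference words, pooled within each group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        distance = edit_distance(ref_tokens, hyp_tokens)
        for key in (group, "overall"):
            errors[key] += distance
            words[key] += len(ref_tokens)
    return {key: errors[key] / words[key] for key in words if words[key] > 0}


# Hypothetical transcripts, invented purely for illustration.
samples = [
    ("standard", "the meeting starts at nine", "the meeting starts at nine"),
    ("standard", "please open the report", "please open the report"),
    ("regional", "the delegate cited the resolution", "the delicate sighted the resolution"),
    ("regional", "we will vote tomorrow", "we will vote tomorrow"),
]

result = wer_by_group(samples)
for group, wer in sorted(result.items()):
    print(f"{group}: {wer:.3f}")
```

In this toy data the overall WER sits between the two groups, so reporting only the average would understate how much worse the system does on the "regional" speakers. Inspecting the individual errors (here, two substituted words) is the kind of step that shows where a model fails, not just how often.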
Looking ahead, both speakers pointed toward a future where multilingual AI can support more languages, more tasks, and more diverse speakers. The goal is not to flatten Europe’s linguistic diversity, but to help sustain it through technologies that are robust, adaptable, and inclusive by design.
Listen to the full session here.
