One of ELOQUENCE’s core ambitions is to rethink how speech and language technologies interact with large language models, not as loosely connected components, but as tightly integrated systems.
In this interview, Olda Plchot from Brno University of Technology reflects on the project’s first two years, highlighting scientific advances in directly connecting speech with LLM-based systems. His perspective emphasizes why moving beyond classical cascade architectures matters, how peer-reviewed research feeds into applied pilots, and how fundamental science underpins ELOQUENCE as a research and innovation action.
Q: Looking back at the first two years of ELOQUENCE, which achievement from your team are you most proud of?
I view our systems that directly interconnect speech with an LLM-based system as a significant achievement. We have successfully tested this on tasks such as dialogue state tracking, which aligns perfectly with our goals in ELOQUENCE. Seeing that our architecture outperforms classical cascade systems, and that we can extract information from speech beyond what a text transcription carries, was very pleasing. Of course, it is intuitive that we lose a lot of paralinguistic information during transcription, but proving it on a real benchmark moved us in the right direction: understanding speech directly.
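To make the contrast with cascade systems concrete, the sketch below shows the general encoder-connector-LLM pattern that such architectures typically follow: speech-encoder frames are projected into the LLM's embedding space and fed to the model directly, bypassing the transcription step where paralinguistic information is lost. All modules, names, and dimensions here are illustrative stand-ins, not the project's actual architecture.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in practice these would be a pretrained speech encoder
# (e.g., a wav2vec2-style model) and a pretrained decoder-only LLM.
SPEECH_DIM, LLM_DIM, VOCAB = 512, 1024, 32000

class Connector(nn.Module):
    """Maps speech-encoder frames into the LLM's embedding space."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.GELU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# Illustrative random modules standing in for pretrained components.
speech_encoder = nn.GRU(input_size=80, hidden_size=SPEECH_DIM, batch_first=True)
llm_embed = nn.Embedding(VOCAB, LLM_DIM)    # the LLM's input embedding table
connector = Connector(SPEECH_DIM, LLM_DIM)  # the small trainable bridge

# 1) Encode ~2 s of 80-dim log-mel frames into speech representations.
mels = torch.randn(1, 200, 80)
speech_feats, _ = speech_encoder(mels)      # (1, 200, SPEECH_DIM)

# 2) Project speech frames into the LLM's embedding space.
speech_embeds = connector(speech_feats)     # (1, 200, LLM_DIM)

# 3) Prepend them to the embedded text prompt; the fused sequence is then
#    fed to the LLM (e.g., for dialogue state tracking), with no ASR text
#    transcription in between.
prompt_ids = torch.randint(0, VOCAB, (1, 16))
text_embeds = llm_embed(prompt_ids)         # (1, 16, LLM_DIM)
fused = torch.cat([speech_embeds, text_embeds], dim=1)
print(fused.shape)                          # torch.Size([1, 216, 1024])
```

In systems of this family, the encoder and LLM are usually pretrained and kept frozen or only lightly fine-tuned, with the small connector carrying most of the adaptation.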
Q: Which result, insight, or technological advancement do you feel has had the strongest impact on the project so far?
Our project is a research and innovation action with a fairly low technology readiness level (TRL), so I believe that all the basic research we published within the project at respected conferences and in journals has a strong impact. It is not easy to push the state of the art in this crowded AI domain. On the other hand, we have a piloting work package within the project to really showcase some of the results. Here, I see a strong contribution in Omilia's initial version of the Interactive Playground and in Idiap's SDialog toolkit, which allows fast development, data simulation, and testing.
Q: What strengths or expertise did your team bring that you believe contributed most to the project’s success?
I believe we brought a strong scientific team that was able to do basic research aligned with the project's overall goals, publish it at respected conferences, and also deliver the recipes to the consortium and, ultimately, the public. This is something that is nicely presentable and has already been validated through the scientific community's peer review process.
Q: From your perspective, what do you see as the most important opportunities and responsible AI challenges emerging right now? How do you think ELOQUENCE can help address them?
ELOQUENCE is trying to utilize large foundation models for building applications focused on a particular domain, and we demonstrate this within the pilots. As the number of large models grows and they become larger, more general, and more complex, I think that more and more small applications will try to utilize them – by interconnecting them, fine-tuning them, or using them for data generation or augmentation. All these recipes and findings will be important for the AI industry. As for the bigger picture, I believe that we will see the emergence of multi-modal models. Once these are well established, we can utilize the architectures developed in ELOQUENCE, which combine multiple single-modal models (such as speech encoders and LLMs), as better baselines to advance the research.
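As a toy illustration of the data-generation idea mentioned above, the sketch below uses an LLM to paraphrase labeled dialogue turns while keeping slot annotations intact. The generate wrapper, prompt, and slot names are hypothetical, invented for this example, and are not part of any ELOQUENCE tool.

```python
# Illustrative sketch of LLM-driven data augmentation for a domain-specific
# dialogue system. `generate` is a hypothetical wrapper around any
# instruction-tuned LLM; the prompt and slots are invented for illustration.
from typing import Callable

PROMPT = (
    "Paraphrase the user turn below for a flight-booking assistant, "
    "preserving the slot values {slots} exactly.\n"
    "User turn: {turn}\nParaphrase:"
)

def augment(turn: str, slots: dict, generate: Callable[[str], str],
            n: int = 3) -> list[str]:
    """Produce up to n paraphrases of a user turn with slot values intact."""
    prompt = PROMPT.format(slots=slots, turn=turn)
    candidates = [generate(prompt) for _ in range(n)]
    # Keep only paraphrases that still contain every slot value verbatim,
    # so the augmented data stays consistent with its labels.
    return [c for c in candidates if all(v in c for v in slots.values())]

# Example with a trivial stand-in "LLM":
fake_llm = lambda p: "I'd like to fly to Prague on Friday, please."
print(augment("Book me a flight to Prague on Friday",
              {"destination": "Prague", "day": "Friday"}, fake_llm))
```

The verbatim slot check is a deliberately simple consistency filter; a real pipeline would validate the generated data more carefully before using it for fine-tuning.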
Together with piloting efforts and shared tools, ELOQUENCE illustrates how validated research results can be translated into reusable methods and frameworks, strengthening the bridge between fundamental science and real-world conversational AI.
