Student Competition in Audio-Visual Speech Synthesis in the Serbian Language
Creating digital avatars that speak naturally and convincingly in real time remains one of the most complex challenges in artificial intelligence. While Text-to-Speech systems have reached a high level of quality, realistic facial animation, especially accurate synchronization of the lips, jaw and facial expressions, is still an open research problem.
The challenge is even greater for low-resource languages such as Serbian, where limited datasets, a lack of standardized tools and real-time constraints add further complexity.
To address this challenge, the University of Novi Sad is organizing a student competition open to Master’s students and to third- and fourth-year undergraduate students. Participants are invited to develop an audio-visual speech synthesis system for the Serbian language.
The task is to generate time-dependent blendshape coefficients controlling facial animation, based on input text (and synthesized speech). Solutions will be evaluated based on animation naturalness, audio-visual synchronization, real-time capability, robustness, and quality of documentation and presentation.
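To make the expected output more concrete, the following is a minimal Python sketch of one way time-dependent blendshape coefficients could be represented and sampled. Everything in it is an assumption made for illustration and not part of the competition specification: the blendshape names, the fixed frame rate, the [0, 1] weight range, and the `BlendshapeTrack` class itself are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical blendshape names, chosen only for illustration;
# the actual set is defined by the competition, not here.
BLENDSHAPES = ("jawOpen", "mouthFunnel", "mouthSmile")


@dataclass
class BlendshapeTrack:
    """Blendshape weights sampled at a fixed frame rate (an assumed layout)."""
    fps: float
    # frames[i][j] is the weight of BLENDSHAPES[j] at time i / fps seconds.
    frames: list[list[float]]

    def sample(self, t: float) -> list[float]:
        """Return weights at time t (seconds), linearly interpolated
        between the two nearest frames and clamped to the track's range."""
        if not self.frames:
            raise ValueError("empty track")
        last = len(self.frames) - 1
        pos = min(max(t * self.fps, 0.0), float(last))
        i = int(pos)
        j = min(i + 1, last)
        a = pos - i
        return [(1 - a) * x + a * y
                for x, y in zip(self.frames[i], self.frames[j])]


# Two frames at 2 frames per second: mouth closed, then jaw fully open.
track = BlendshapeTrack(fps=2.0, frames=[[0.0, 0.0, 0.0], [1.0, 0.5, 0.0]])
print(track.sample(0.25))  # halfway between the frames: [0.5, 0.25, 0.0]
```

In a complete system, a track like this would be produced from the input text together with the synthesized audio, so that both share one timeline.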
More information here: https://eloquenceai.eu/student-competition-in-audio-visual-speech-synthesis-in-the-serbian-language/