Meta aims to achieve better speech recognition with AI framework
San Francisco, Jan. 12, 2022
Meta AI announced a conversational AI framework designed to bring AI assistants “closer to human speech perception.”
The system, called Audio-Visual Hidden Unit BERT (AV-HuBERT), could help smartphone assistants, for example, better understand what users are saying to them in crowds.
The framework is also intended to help develop speech recognition systems that are less affected by noise. The system learns speech not only by hearing people speak but also by watching their lip movements.
AV-HuBERT adds the visual component to the process, capturing “nuanced associations” between visual and auditory input even with very small amounts of untranscribed video data for pretraining, according to Meta.
According to Meta, AV-HuBERT requires far less labeled audiovisual data than other systems to achieve the same results, which makes it feasible to build speech recognition systems in multiple languages.