Mozilla's large repository of voice data will shape the future of machine learning
Subhashish Panigrahi (Alumni)
Mozilla’s open source project, Common Voice, is well on its way to becoming the world’s largest repository of human voice data to be used for machine learning. The project aims to collect more than 10,000 hours of CC0-licensed free and open voice data in numerous world languages, which can be used to train machine-learning models for content-based industries, particularly IoT and other speech-dependent applications and organizations. Common Voice is open to contributions: anyone can go to the Speak page and contribute by reading aloud the sentences that appear on the screen. Each saved recording then goes to a voicebank. All contributions are published on the Data page, where anyone can download the dataset at any time for their own use.