WGBH Educational Foundation

Log Number: LG-252299-OLS-22

The GBH Archives will improve two tools used to create and enhance speech-to-text transcripts for online audio and audiovisual digital collections, with the goal of increasing accessibility and discoverability. The project team will partner with the Brandeis University Computational Linguistics department to improve the transcript output of Kaldi, an open source speech-to-text toolkit, by upgrading to a modern neural network-based language model. At the same time, GBH staff will update FIX IT+, a crowdsourcing tool for editing transcripts, by automating formerly manual workflows so that a non-developer can implement it more easily. The project team will encourage installation and use of the tools by distributing the code and conducting outreach to the library and archives communities, so that they can create affordable and accurate transcripts that improve the discoverability of digital audiovisual materials in their collections. As subrecipient, Brandeis will be responsible for improving the Kaldi speech-to-text toolkit to increase the accuracy of machine-created transcripts and will help disseminate project results at conferences.