Sunday, May 27, 2012

Speech to Text with timeline data

I'm working on a book project where I have an audio recording of a reader and the associated text, and I want to sync the two. My tool has a simple method: press the mouse button for each word as the audio plays, and record the time. But sometimes (often), more fine-tuning is needed.

I figured that someone must have solved this already. I did a search and played with some tools. None of them were magic. The best thing so far is Audacity, a free audio tool. It has an automated bit that gets close, but I find editing its output harder than just going at it with the label track.

My process:

- zoom in.
- use left-mouse and shift-left-mouse to change the selection.
- then ctrl-left-mouse to play.
- adjust the selection if necessary.
- ctrl-b to create a new label (I don't enter the word, though that could be useful).

Walk through the text on a page this way, then export the label data. It creates a text file with two columns, start and end time. I'm adjusting my code to take start & end, and writing a perl/python script to convert to the Flash array. The only real question remaining is the format desired. The data files can be imported back for editing.

I think the logical thing to do would be to outsource this to someone. $5/page? It takes me about 10 minutes or so per page. The current book has 26 pages. I suspect I'll get faster.
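For the conversion script, here is a rough Python sketch of what I have in mind. It assumes each line of the exported label file holds a start time and an end time in seconds, separated by whitespace, possibly followed by label text that is ignored. Since the final format is still undecided, the output shape (a single `var` holding `[start, end]` pairs) and the names `parse_labels`, `to_array_literal`, and `wordTimes` are placeholders of mine, not a settled format.

```python
def parse_labels(text):
    """Parse exported label text into a list of (start, end) times in seconds.

    Assumption: each useful line starts with two numeric fields, start and end.
    Anything after those two fields (such as label text) is ignored.
    """
    spans = []
    for line in text.splitlines():
        fields = line.strip().split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        try:
            start, end = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # first two fields are not numbers; skip the line
        spans.append((start, end))
    return spans


def to_array_literal(spans, name="wordTimes"):
    """Format the spans as one array-of-pairs assignment (placeholder format)."""
    pairs = ", ".join("[%.3f, %.3f]" % (start, end) for start, end in spans)
    return "var %s = [%s];" % (name, pairs)
```

To use it, read one page's exported label file into a string, pass it to `parse_labels`, and print the result of `to_array_literal`. Times are rounded to milliseconds, which should be plenty for word-level sync.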