Sunday, May 27, 2012
Speech to Text with timeline data
I'm working on a book project where I have audio output of a reader and the associated text. I want to sync the two.
My tool uses a simple method: press the mouse button for each word as the audio plays, and record the time of each press.
But sometimes (often), more fine tuning is needed. I figured that someone must have solved this already.
I did a search and played with some tools. None of them were magic.
The best thing so far is Audacity, a free audio editor.
It has an automated feature that gets close, but I find that editing its output is harder than just marking the words by hand on a label track.
My process:
- zoom in.
- use left-mouse and shift-left-mouse to change the selection.
- then ctrl-left-mouse to play.
- adjust the selection if necessary
- ctrl-b to create a new label (I don't enter the word, though that could be useful)
Walk through the text on a page.
Export the label data.
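It creates a tab-separated text file with a start time and an end time on each line (plus the label text, which is empty in my case).
I'm adjusting my code to take start and end times.
And I'm writing a Perl/Python script to convert the export to the Flash array.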
Only real question remaining is the format desired.
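Here is a minimal Python sketch of that conversion step, not the final script. It assumes each exported line is start, end and optional label text, separated by tabs, with times in seconds. Since the output format is still undecided, the array name `wordTimes` and the nested [start, end] layout are placeholders of mine, as are the function names.

```python
def parse_labels(text):
    """Parse label-track export text into a list of (start, end) pairs in seconds.

    Assumes tab-separated lines: start, end, optional label text.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a timing row
        try:
            start, end = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # skip rows whose first two fields are not numbers
        pairs.append((start, end))
    return pairs


def to_array_literal(pairs, name="wordTimes"):
    """Render the pairs as an ActionScript-style array literal.

    The variable name and the [start, end] layout are assumptions.
    """
    rows = ",\n".join(f"  [{start:.3f}, {end:.3f}]" for start, end in pairs)
    return f"var {name} = [\n{rows}\n];"


# Two sample rows in the assumed export layout (the label text may be empty).
sample = "0.500000\t0.912000\t\n1.100000\t1.450000\tword\n"
print(to_array_literal(parse_labels(sample)))
```

To run it on a real export, read the file's contents and pass them to `parse_labels` in place of `sample`. Changing the output format later only means touching `to_array_literal`.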
The data files can be imported for editing.
I think the logical thing to do would be to outsource this to someone. $5/page?
It takes me about 10 minutes or so per page. The current book has 26 pages, so that is a bit over four hours in total.
I suspect I'll get faster.