Simon: Open-Source Speech Recognition: February 2010

Wednesday, 24 February 2010

Benefit project

On the 1st of February 2010 the friendly society simon listens started to work on a new project. For the next one and a half years, the simon listens team will investigate ways and means to make the simon speech recognition solution even more usable – especially for the elderly.

Abstract:
With the help of verbal control provided by simon, using terms of everyday language, useful scenarios and areas of application shall be created that make new communication technologies such as the internet, telephone and multimedia applications easy to use for elderly people. Moreover, additional security can be provided, for example by reminding the user to take a medication.

In the course of this project we will join forces with the Signal Processing and Speech Communication Laboratory of the Graz University of Technology, the HTBLA Kaindorf/Sulm, the Rehabilitation Clinic Maria-Theresia, the KFU Research Center for Austrian German and the Huminatis Graz to ensure that we have the necessary expertise to tackle such an ambitious project.

The solution created in this project will be released under the GPL license. All code will be freely available to the community.

Many thanks to the bmvit (Federal Ministry of Transport, Innovation and Technology) of Austria and the FFG (Austrian Research Promotion Agency), whose generous support makes this possible!

Tuesday, 23 February 2010

Model Compilation Adapter

In simon 0.2 we introduced some mechanisms to catch common errors during the compilation of the model and to display nicer error messages that explain how the user can solve the issue manually. In simon 0.3, however, simon will automatically repair some common mistakes without the user even noticing.

To explain what I am talking about, I first have to talk about simon's architecture a bit, so bear with me...

During normal operation, the simon client gathers the instructions (words, grammar, etc.) that will then be sent to simond. simond in turn compiles the model out of the given input files. To do that, simond first converts them to a format usable by the underlying tools (HTK, Julius). This conversion step was not needed in 0.2 because simon 0.2 only used the raw file formats of HTK / Julius. However, in simon 0.3 we need more control over the model and also want to give the user some advanced features that were not possible with just the information contained in those raw formats.

In simon 0.3 we introduced a new step between gathering the data and compiling it to a usable model: Adapting the input files.

This sounds like a boring but necessary conversion, and indeed it is.

But what makes it interesting is that, at the point of adaption, all the input data that will be turned into a model is available in a format that is easily parsable. This makes it an ideal place to do some last-minute optimizations on the temporary files that are then used to generate the model.

The model adaption manager will, for example, automatically remove words from the lexicon that have no training data associated with them. It will also clean the grammar of sentences that have no associated words. It will even remove samples containing words that are not in your dictionary. Basically, simon should now automatically handle a lot of cases that would have caused an error in simon 0.2.
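
To make the three clean-up rules a bit more concrete, here is a minimal sketch in Python. It is not simon's actual code: the function name `adapt_model` and the data structures (a lexicon mapping each word to a grammar category, a grammar as lists of categories, samples as lists of words) are assumptions made purely for illustration.

```python
def adapt_model(lexicon, grammar, samples):
    """Illustrative clean-up pass (hypothetical, not simon's real API).

    lexicon: dict mapping word -> grammar category, e.g. {"open": "VERB"}
    grammar: list of sentences, each a list of categories
    samples: dict mapping sample id -> list of transcribed words
    """
    # 1. Drop samples that contain words missing from the dictionary.
    kept_samples = {
        sample_id: words
        for sample_id, words in samples.items()
        if all(word in lexicon for word in words)
    }

    # 2. Drop words that have no remaining training data.
    trained_words = {w for words in kept_samples.values() for w in words}
    kept_lexicon = {
        word: category
        for word, category in lexicon.items()
        if word in trained_words
    }

    # 3. Drop grammar sentences that use a category with no words left.
    live_categories = set(kept_lexicon.values())
    kept_grammar = [
        sentence
        for sentence in grammar
        if all(category in live_categories for category in sentence)
    ]

    return kept_lexicon, kept_grammar, kept_samples


# Made-up example input
lexicon = {"open": "VERB", "close": "VERB", "browser": "NOUN", "quickly": "ADVERB"}
grammar = [["VERB", "NOUN"], ["VERB", "NOUN", "ADVERB"]]
samples = {"s1": ["open", "browser"], "s2": ["close", "window"]}

new_lexicon, new_grammar, new_samples = adapt_model(lexicon, grammar, samples)
print(new_lexicon)
print(new_grammar)
print(new_samples)
```

In this made-up input, sample "s2" mentions "window", which is not in the dictionary, so it is dropped; that leaves "close" and "quickly" without training data, and the sentence using ADVERB without any words. The order of the steps is a design choice of this sketch: filtering samples first means that a single pass leaves lexicon, grammar and samples consistent with each other.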