Creating Custom UI or UI-less speaker training
Last reviewed: 8/7/2009
HOW Article ID: H080902
The information in this article applies to:
- ProfileKit 3
Some recognizers provide built-in speaker training dialogs, or utilities for invoking them, to build a speaker profile. These training dialogs are usually dictation-based and require the user to read text from the dialog when prompted.
ProfileKit 3 provides a way to create custom training with dictation or grammar recognition, use a built-in ProfileKit dialog or a custom dialog you create, or use no dialog at all (UI-less) with live or recorded audio.
Each recognizer has its own speaker profile format, and some supply their own training dialogs. In addition to launching recognizer-supplied training dialogs, ProfileKit provides the following custom options for training speaker profiles:
- Training dialog to manage the speaker training process;
- UI-less (no dialog) to manage the speaker training process;
- Application custom UI (application custom dialog) to manage the speaker training process; and
- Training with live audio from a microphone or with pre-recorded audio.
ProfileKit supports the following recognizers and profile formats:
|Recognizer||Speech API||Training Dialogs||Custom Training|
|Dragon NaturallySpeaking (all languages)||Dragon COM API||Yes||No|
|IBM ViaVoice (all languages)||SMAPI||Yes||Yes|
|Microsoft SAPI 4 (all languages)||SAPI 4||Yes||No|
|Microsoft SAPI 5 (all languages)||SAPI 5||Yes||Yes|
|Nuance VoCon 3200 V2 (all languages)||VoCon 3200 V2||No||Yes|
|Nuance VoCon 3200 V3 (all languages)||VoCon 3200 V3||No||Yes|
The ChantPM class provides a new method for managing speaker profiles: StartTraining.
StartTraining provides a way to pass recorded audio data from which to train as an alternative to live microphone audio. In this case, a dictation vocabulary is automatically loaded and activated if no other vocabulary is active. The recording object type may be one of the following values:
|Constant||Value||Description|
|CROBuffer||1||The recording audio source is copied from a buffer.|
|CROFile||2||The recording audio source is read from a file.|
|CROMultiMedia||3||The recording audio source is from the system real-time multimedia device (e.g., a microphone).|
|CROStream||4||The recording audio source is read from a stream.|
The following examples illustrate using the StartTraining method to train from live microphone audio and from recorded audio:
// Instantiate ChantPM object
NChantPM1 = new NChantPM();

// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");

// Training the speaker with ProfileKit grammar training
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");

// Start training with dialog hidden
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);

// Training the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");

// Start training with dialog hidden using pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
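The two StartTraining calls above differ only in the audio source argument and the recording object type: an empty source with CROMultiMedia for live audio, and a file name with CROFile for recorded audio. The following is a minimal sketch of how an application might choose between them. It does not call ProfileKit. The RecordingSource enum is a hypothetical local mirror of the values in the table above, and the TrainingSource.Choose helper is an illustrative name, not part of the class library.

```csharp
using System;

// Hypothetical local mirror of the recording object values listed in the
// table above. The real constants come from the ProfileKit class library.
enum RecordingSource
{
    Buffer = 1,
    File = 2,
    MultiMedia = 3,
    Stream = 4
}

static class TrainingSource
{
    // Assumption for this sketch: an empty or null path means "train from
    // the live microphone"; any other path means "train from that file".
    public static RecordingSource Choose(string audioPath)
    {
        return string.IsNullOrEmpty(audioPath)
            ? RecordingSource.MultiMedia
            : RecordingSource.File;
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine((int)TrainingSource.Choose(""));               // prints 3
        Console.WriteLine((int)TrainingSource.Choose("mytraining.wav")); // prints 2
    }
}
```

In an application, the selected value would be translated to the corresponding ChantRecordingObject constant and passed to StartTraining together with the same path string.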
For programming language-specific syntax, refer to the Class Library Reference in the help file.