How do I create custom UI or UI-less speaker training?
Last reviewed: 8/7/2009
HOW Article ID: H080902
The information in this article applies to:
- ProfileKit 3
Summary
Some recognizers provide built-in dialogs, or utilities for invoking them, that train a speaker profile. These training dialogs are usually dictation-based and require the user to read text from the dialog when prompted.
ProfileKit 3 lets you create custom training that uses dictation or grammar recognition. Training can run with a built-in ProfileKit dialog, a custom dialog you create, or no dialog at all (UI-less), and it can use live or recorded audio.
More Information
Recognizers have their own formats for speaker profiles. Some provide training dialogs with which to perform speaker training. In addition to launching recognizer-supplied training dialogs, ProfileKit provides custom training options for training speaker profiles:
- Training dialog to manage the speaker training process;
- UI-less (no dialog) to manage the speaker training process;
- Application custom UI (application custom dialog) to manage the speaker training process; and
- Training with live audio from a microphone or with pre-recorded audio.
ProfileKit supports the following recognizers and profile formats:
Recognizer | Speech API | Training Dialogs | Custom Training |
---|---|---|---|
Dragon NaturallySpeaking (all languages) | Dragon COM API | Yes | No |
IBM ViaVoice (all languages) | SMAPI | Yes | Yes |
Microsoft SAPI 4 (all languages) | SAPI 4 | Yes | No |
Microsoft SAPI 5 (all languages) | SAPI 5 | Yes | Yes |
Nuance VoCon 3200 V2 (all languages) | VoCon 3200 V2 | No | Yes |
Nuance VoCon 3200 V3 (all languages) | VoCon 3200 V3 | No | Yes |
The ChantPM class provides a new method for managing speaker profiles: StartTraining.
StartTraining provides a way to pass recorded audio data from which to train as an alternative to live microphone audio. In this case, a dictation vocabulary is automatically loaded and activated if no other vocabulary is active. The recording object type may be one of the following values:
Constant | Value | Description |
---|---|---|
CROBuffer | 1 | The recording audio source is copied from a buffer. |
CROFile | 2 | The recording audio source is read from a file. |
CROMultiMedia | 3 | The recording audio source is from the system real-time multimedia device (e.g., a microphone). |
CROStream | 4 | The recording audio source is read from a stream. |
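An application that supports both live and recorded training has to choose one of these values before calling StartTraining. The following is a minimal sketch of that choice, not ProfileKit code: the RecordingSource enum and the ChooseSource helper are hypothetical names that only mirror the values documented in the table, and the rule that an empty file name means live microphone audio is an assumption.

```csharp
using System;

// Hypothetical local mirror of the documented recording object values.
// The real constants are exposed by ProfileKit as ChantRecordingObject.
enum RecordingSource
{
    Buffer = 1,      // CROBuffer
    File = 2,        // CROFile
    MultiMedia = 3,  // CROMultiMedia
    Stream = 4       // CROStream
}

static class RecordingSourceDemo
{
    // Assumption: an empty or null file name means "train from the microphone";
    // any other value is treated as the name of a pre-recorded audio file.
    public static RecordingSource ChooseSource(string audioFileName)
    {
        return string.IsNullOrEmpty(audioFileName)
            ? RecordingSource.MultiMedia
            : RecordingSource.File;
    }

    static void Main()
    {
        Console.WriteLine((int)ChooseSource(""));               // prints 3
        Console.WriteLine((int)ChooseSource("mytraining.wav")); // prints 2
    }
}
```

In a real application the returned value would be mapped to the corresponding ChantRecordingObject constant before the call to StartTraining.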
The following examples illustrate using the StartTraining method to train from live microphone audio and from recorded audio:
// Instantiate the ChantPM object
NChantPM NChantPM1 = new NChantPM();
// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");
// Train the speaker with ProfileKit grammar training, using a newline-separated list of phrases
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with the dialog hidden, using live audio from the multimedia device
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);
// Train the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");
// Start training with the dialog hidden, using a pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
For programming-language-specific syntax, refer to the Class Library Reference in the help file.
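The phrase list in the first example is a single string with one phrase per line. When phrases come from a collection, that string can be assembled programmatically. The sketch below is illustrative only: TrainingPhrases and Build are hypothetical names, not part of ProfileKit, and the newline-terminated format is inferred from the example string rather than from a documented specification.

```csharp
using System;
using System.Collections.Generic;

static class TrainingPhrases
{
    // Joins phrases into the newline-terminated form used in the example
    // ("red\nblue\n..."). Skipping null or blank entries and trimming
    // whitespace are choices made by this sketch.
    public static string Build(IEnumerable<string> phrases)
    {
        var cleaned = new List<string>();
        foreach (string phrase in phrases)
        {
            if (phrase == null) continue;
            string trimmed = phrase.Trim();
            if (trimmed.Length > 0) cleaned.Add(trimmed);
        }
        return cleaned.Count == 0 ? "" : string.Join("\n", cleaned) + "\n";
    }

    static void Main()
    {
        string text = Build(new[] { "red", "blue", " orange ", "" });
        // Show the newlines visibly: prints red\nblue\norange\n
        Console.WriteLine(text.Replace("\n", "\\n"));
    }
}
```

The resulting string would then be passed as the value of ChantTrainingProperty.CTPTrainingPhraseText, as in the first example.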