Determining which speaker training method to use
Last reviewed: 12/1/2011
HOW Article ID: H121101
The information in this article applies to:
- ProfileKit 4
Summary
Some recognizers provide built-in dialogs, or utilities for invoking speaker training dialogs, to create a speaker profile. These training dialogs usually use dictation-based recognition and require the user to read displayed text when prompted.
ProfileKit 4 lets you use the recognizer's built-in dialogs, perform custom training with a built-in ProfileKit dialog or a custom dialog you design, or train with no dialog at all (UI-less) using live or recorded audio. Custom training can use dictation- or grammar-based recognition.
More Information
Recognizers have their own formats for speaker profiles. Some provide training dialogs with which to perform speaker training. In addition to launching recognizer-supplied training dialogs, ProfileKit provides custom training options for training speaker profiles:
- Training dialog to manage the speaker training process;
- UI-less (no dialog) to manage the speaker training process;
- Application custom UI (application custom dialog) to manage the speaker training process; and
- Train with live audio using a microphone or with pre-recorded audio.
ProfileKit supports the following recognizers and profile formats:
Recognizer | Speech API | Training Dialogs | Custom Training |
---|---|---|---|
Dragon NaturallySpeaking (all languages) | Dragon COM API | Yes | No |
IBM ViaVoice (all languages) | SMAPI | Yes | Yes |
Microsoft SAPI 4 (all languages) | SAPI 4 | Yes | No |
Microsoft SAPI 5 (all languages) | SAPI 5 | Yes | Yes |
Nuance VoCon 3200 V2 (all languages) | VoCon 3200 V2 | No | Yes |
Nuance VoCon 3200 V3 (all languages) | VoCon 3200 V3 | No | Yes |
Nuance VoCon 3200 V4 (all languages) | VoCon 3200 V4 | No | Yes |
Recognizer selection and application deployment strategy affect which training method you should use.
Recognizer | Training Method Considerations |
---|---|
Dragon NaturallySpeaking (all languages) | Dragon does not provide a training API, so you must use the built-in dialogs for speaker training. |
IBM ViaVoice (all languages) | Since all training options are available, your application deployment approach is likely to dictate your training strategy. If you are relying heavily on dictation, then consider using built-in training or long custom training scripts. If you are using command or grammar recognition, then consider using UI or UI-less custom training. |
Microsoft SAPI 4 (all languages) | SAPI 4 does not provide a training API, so you must use the built-in dialogs for speaker training. |
Microsoft SAPI 5 (all languages) | Since all training options are available, your application deployment approach is likely to dictate your training strategy. If you are relying heavily on dictation, then consider using built-in training or long custom training scripts. If you are using command or grammar recognition, then consider using UI or UI-less custom training. |
Nuance VoCon 3200 (all versions, all languages) | VoCon does not provide built-in training dialogs. Consider using UI or UI-less custom training. |
The ChantPM class provides the StartTraining method for training speaker profiles with live (microphone) or recorded audio.
The audio source type may be one of the following values:
Constant | Value | Description |
---|---|---|
CROBuffer | 1 | The recording audio source is copied from a buffer. |
CROFile | 2 | The recording audio source is read from a file. |
CROMultiMedia | 3 | The recording audio source is the system real-time multimedia device (e.g., a microphone). |
CROStream | 4 | The recording audio source is read from a stream. |
The following example illustrates using the StartTraining method to train with live audio and with pre-recorded audio. It assumes you have recorded the end user saying the following words, in order and with pauses between each word, to the wave file mytraining.wav: red, blue, orange, green, purple, yellow, and brown; and that you have a grammar named colors with these words as a list of choices. The grammar file type and syntax vary with the recognizer used.
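As a sketch of what the colors grammar file might contain: the fragment below uses the W3C SRGS XML format, one of the grammar formats SAPI 5 accepts. Other recognizers (ViaVoice, VoCon) use their own grammar formats, so consult your recognizer's documentation for the exact syntax; the file name colors.xml matches the grammar file referenced in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- colors.xml: a public rule listing the seven training words as choices -->
<grammar version="1.0" xml:lang="en-US" root="colors"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="colors" scope="public">
    <one-of>
      <item>red</item>
      <item>blue</item>
      <item>orange</item>
      <item>green</item>
      <item>purple</item>
      <item>yellow</item>
      <item>brown</item>
    </one-of>
  </rule>
</grammar>
```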
```csharp
// Instantiate ChantPM object
NChantPM1 = new NChantPM();
// Set the engine API or enumerate and select a specific engine
NChantPM1.SetNumberProperty(ChantNumberProperty.CNPEngineAPI, (int)ChantEngineAPI.CESAPI5SR);
// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");
// Train the speaker with ProfileKit grammar training using live audio
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with the dialog hidden
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);
// Train the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");
// Start training with the dialog hidden using a pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
```
See the Chant ProfileKit 4 help file for more training examples.