Determining which speaker training method to use

Last reviewed: 12/1/2011

HOW Article ID: H121101

The information in this article applies to:

  • ProfileKit 4

Summary

Some recognizers provide built-in speaker training dialogs, or utilities for invoking them, to create a speaker profile. These training dialogs usually use dictation-based recognition and require the user to read text from the dialog when prompted.

ProfileKit 4 lets you use built-in recognizer dialogs, run custom training with a built-in ProfileKit dialog or a custom dialog you design, or use no dialog at all (UI-less). Custom training works with live or recorded audio and can use dictation- or grammar-based recognition.

More Information

Recognizers have their own formats for speaker profiles. Some provide training dialogs with which to perform speaker training. In addition to launching recognizer-supplied training dialogs, ProfileKit provides custom training options for training speaker profiles:

  • A ProfileKit training dialog to manage the speaker training process;
  • UI-less training (no dialog);
  • An application custom UI (application custom dialog) to manage the speaker training process; and
  • Training with live audio from a microphone or with pre-recorded audio.

ProfileKit supports the following recognizers and profile formats:

Recognizer                                Speech API      Training Dialogs  Custom Training
Dragon NaturallySpeaking (all languages)  Dragon COM API  Yes               No
IBM ViaVoice (all languages)              SMAPI           Yes               Yes
Microsoft SAPI 4 (all languages)          SAPI 4          Yes               No
Microsoft SAPI 5 (all languages)          SAPI 5          Yes               Yes
Nuance VoCon 3200 V2 (all languages)      VoCon 3200 V2   No                Yes
Nuance VoCon 3200 V3 (all languages)      VoCon 3200 V3   No                Yes
Nuance VoCon 3200 V4 (all languages)      VoCon 3200 V4   No                Yes

Your choice of recognizer and your application deployment strategy determine which training method to use.

Training method considerations by recognizer:

  • Dragon NaturallySpeaking (all languages): Dragon does not provide a training API, so you must use the built-in dialogs for speaker training.
  • IBM ViaVoice (all languages): Because all training options are available, your application deployment approach is likely to dictate your training strategy. If you rely heavily on dictation, consider using built-in training or long custom training scripts. If you use command or grammar recognition, consider using UI or UI-less custom training.
  • Microsoft SAPI 4 (all languages): SAPI 4 does not provide a training API, so you must use the built-in dialogs for speaker training.
  • Microsoft SAPI 5 (all languages): Because all training options are available, your application deployment approach is likely to dictate your training strategy. If you rely heavily on dictation, consider using built-in training or long custom training scripts. If you use command or grammar recognition, consider using UI or UI-less custom training.
  • Nuance VoCon 3200 (all versions, all languages): VoCon does not provide built-in training dialogs. Consider using UI or UI-less custom training.
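
The considerations above can be sketched as a small lookup. This is only an illustration of the decision logic: the Recognizer and TrainingOptions enums and the TrainingAdvisor class are hypothetical names invented here and are not part of ProfileKit. The capability values are transcribed from the recognizer support table in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; these are not ProfileKit classes.
public enum Recognizer
{
    DragonNaturallySpeaking,
    IbmViaVoice,
    MicrosoftSapi4,
    MicrosoftSapi5,
    NuanceVoCon3200
}

[Flags]
public enum TrainingOptions
{
    None = 0,
    BuiltInDialogs = 1,
    CustomTraining = 2
}

public static class TrainingAdvisor
{
    // Capabilities copied from the recognizer support table in this article.
    private static readonly Dictionary<Recognizer, TrainingOptions> Supported =
        new Dictionary<Recognizer, TrainingOptions>
        {
            { Recognizer.DragonNaturallySpeaking, TrainingOptions.BuiltInDialogs },
            { Recognizer.IbmViaVoice, TrainingOptions.BuiltInDialogs | TrainingOptions.CustomTraining },
            { Recognizer.MicrosoftSapi4, TrainingOptions.BuiltInDialogs },
            { Recognizer.MicrosoftSapi5, TrainingOptions.BuiltInDialogs | TrainingOptions.CustomTraining },
            { Recognizer.NuanceVoCon3200, TrainingOptions.CustomTraining }
        };

    // Returns a short recommendation string for the given recognizer.
    // dictationHeavy only matters when both training options are available.
    public static string Recommend(Recognizer recognizer, bool dictationHeavy)
    {
        TrainingOptions options = Supported[recognizer];
        bool builtIn = (options & TrainingOptions.BuiltInDialogs) != 0;
        bool custom = (options & TrainingOptions.CustomTraining) != 0;

        if (builtIn && !custom) return "built-in recognizer dialogs";
        if (custom && !builtIn) return "UI or UI-less custom training";

        // Both options available: the deployment approach decides.
        return dictationHeavy
            ? "built-in training or long custom training scripts"
            : "UI or UI-less custom training";
    }
}
```

For example, TrainingAdvisor.Recommend(Recognizer.MicrosoftSapi5, false) returns "UI or UI-less custom training", which matches the guidance for command or grammar recognition with SAPI 5.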

The ChantPM class provides the StartTraining method for training speaker profiles with live (microphone) or recorded audio.

The audio source type may be one of the following values:

Constant       Value  Description
CROBuffer      1      The recording audio source is copied from a buffer.
CROFile        2      The recording audio source is read from a file.
CROMultiMedia  3      The recording audio source is from the system real-time multimedia device (e.g., a microphone).
CROStream      4      The recording audio source is read from a stream.

Each constant applies to the following APIs: IBM SMAPI, Microsoft SAPI 4 Speech Recognition, Microsoft SAPI 5 Speech Recognition, Nuance Dragon NaturallySpeaking, and Nuance VoCon 3200.

The following examples illustrate using the StartTraining method to train from live and recorded audio. The recorded-audio example assumes that you have recorded the end user saying the following words in order, with a pause between words, to the wave file mytraining.wav: red, blue, orange, green, purple, yellow, brown; and that you have a grammar named colors with these words as a list of choices. The grammar file type and syntax vary based on the recognizer used.

// Instantiate ChantPM object
NChantPM NChantPM1 = new NChantPM();

// Set the engine API or enumerate and select specific engine
NChantPM1.SetNumberProperty(ChantNumberProperty.CNPEngineAPI, (int)ChantEngineAPI.CESAPI5SR);

// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");

// Train the speaker with ProfileKit grammar training from a newline-separated phrase list
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");

// Start training with dialog hidden using live (microphone) audio
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);

// Train the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");

// Start training with dialog hidden using pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
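
The phrase text passed to CTPTrainingPhraseText in the first example is a newline-separated list with a trailing newline. If the words come from a collection, a helper along these lines can build that string. This is a minimal sketch: TrainingPhrases and ToPhraseText are hypothetical names introduced here, not ProfileKit members, and the trimming and blank-skipping behavior is an assumption rather than a documented requirement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of ProfileKit.
public static class TrainingPhrases
{
    // Builds a newline-separated phrase list with a trailing newline,
    // in the same shape as the string literal used in the example above.
    // Blank entries are skipped and surrounding whitespace is trimmed.
    public static string ToPhraseText(IEnumerable<string> phrases)
    {
        if (phrases == null) throw new ArgumentNullException("phrases");

        var cleaned = new List<string>();
        foreach (string phrase in phrases)
        {
            string trimmed = (phrase ?? "").Trim();
            if (trimmed.Length > 0) cleaned.Add(trimmed);
        }

        // An empty list yields an empty string rather than a lone newline.
        return cleaned.Count == 0 ? "" : string.Join("\n", cleaned) + "\n";
    }
}
```

Calling TrainingPhrases.ToPhraseText(new[] { "red", "blue", "orange", "green", "purple", "yellow", "brown" }) produces the same string as the literal in the example, which could then be passed as the value for CTPTrainingPhraseText.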

See the Chant ProfileKit 4 help file for more training examples.