How do I create custom UI or UI-less speaker training?

Last reviewed: 8/7/2009

HOW Article ID: H080902

The information in this article applies to:

  • ProfileKit 3

Summary

Some recognizers provide built-in speaker training dialogs, or utilities for invoking them, to build a speaker profile. These training dialogs are usually dictation-based and require the user to read text from the dialog when prompted.

ProfileKit 3 lets you create custom training that uses dictation or grammar recognition. Training can run with a built-in ProfileKit dialog, with a custom dialog you create, or with no dialog at all (UI-less), and it can use live or recorded audio.

More Information

Recognizers have their own formats for speaker profiles, and some provide their own training dialogs. In addition to launching recognizer-supplied training dialogs, ProfileKit provides the following custom training options:

  • A built-in ProfileKit training dialog to manage the speaker training process;
  • UI-less training (no dialog);
  • An application custom UI (a dialog your application supplies) to manage the speaker training process; and
  • Training with live audio from a microphone or with pre-recorded audio.

ProfileKit supports the following recognizers and profile formats:

Recognizer                                 Speech API        Training Dialogs   Custom Training
-----------------------------------------  ----------------  -----------------  ---------------
Dragon NaturallySpeaking (all languages)   Dragon COM API    Yes                No
IBM ViaVoice (all languages)               SMAPI             Yes                Yes
Microsoft SAPI 4 (all languages)           SAPI 4            Yes                No
Microsoft SAPI 5 (all languages)           SAPI 5            Yes                Yes
Nuance VoCon 3200 V2 (all languages)       VoCon 3200 V2     No                 Yes
Nuance VoCon 3200 V3 (all languages)       VoCon 3200 V3     No                 Yes
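
An application that works with several recognizers may want to check whether custom training is available before offering it. The following is a minimal sketch that encodes the Custom Training column of the table above as a lookup. RecognizerSupport and SupportsCustomTraining are hypothetical names chosen for illustration; they are not part of ProfileKit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a ProfileKit type): encodes the
// "Custom Training" column of the support table, keyed by speech API.
public static class RecognizerSupport
{
    private static readonly Dictionary<string, bool> CustomTraining =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
        {
            { "Dragon COM API", false },
            { "SMAPI",          true  },
            { "SAPI 4",         false },
            { "SAPI 5",         true  },
            { "VoCon 3200 V2",  true  },
            { "VoCon 3200 V3",  true  }
        };

    // Returns true only for speech APIs listed with custom training support.
    // Unknown or null names return false.
    public static bool SupportsCustomTraining(string speechApi)
    {
        bool supported;
        return speechApi != null
            && CustomTraining.TryGetValue(speechApi, out supported)
            && supported;
    }
}

public static class RecognizerSupportDemo
{
    public static void Main()
    {
        Console.WriteLine(RecognizerSupport.SupportsCustomTraining("SAPI 5")); // True
        Console.WriteLine(RecognizerSupport.SupportsCustomTraining("SAPI 4")); // False
    }
}
```

A check like this could gate whether the application offers its own training UI or falls back to the recognizer-supplied dialog.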

The ChantPM class provides a new method for managing speaker profiles: StartTraining.

StartTraining can train from recorded audio data as an alternative to live microphone audio. When recorded audio is used, a dictation vocabulary is automatically loaded and activated if no other vocabulary is active. The recording object type may be one of the following values:

These values apply to all of the following APIs: IBM SMAPI, Microsoft SAPI 4 Speech Recognition, Microsoft SAPI 5 Speech Recognition, Nuance Dragon NaturallySpeaking, and Nuance VoCon 3200.

Constant        Value   Description
--------------  -----   ------------------------------------------------------------
CROBuffer       1       The recording audio source is copied from a buffer.
CROFile         2       The recording audio source is read from a file.
CROMultiMedia   3       The recording audio source is from the system real-time
                        multimedia device (e.g., a microphone).
CROStream       4       The recording audio source is read from a stream.
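
When an application supports both live and recorded training, it has to decide which recording object type to pass. The sketch below shows one way to make that choice. RecordingSource and TrainingSourcePicker are hypothetical stand-ins written for this illustration; only the numeric values come from the table above. A real application would pass the library's own ChantRecordingObject constants.

```csharp
using System;

// Hypothetical stand-in that mirrors the constant values in the table.
// It is NOT the ProfileKit ChantRecordingObject type.
public enum RecordingSource
{
    Buffer = 1,      // corresponds to CROBuffer
    File = 2,        // corresponds to CROFile
    MultiMedia = 3,  // corresponds to CROMultiMedia
    Stream = 4       // corresponds to CROStream
}

public static class TrainingSourcePicker
{
    // Use File when a recording path is supplied; otherwise fall back
    // to the live multimedia device (e.g., a microphone).
    public static RecordingSource Pick(string audioPath)
    {
        return string.IsNullOrWhiteSpace(audioPath)
            ? RecordingSource.MultiMedia
            : RecordingSource.File;
    }
}

public static class TrainingSourcePickerDemo
{
    public static void Main()
    {
        Console.WriteLine((int)TrainingSourcePicker.Pick(""));               // 3
        Console.WriteLine((int)TrainingSourcePicker.Pick("mytraining.wav")); // 2
    }
}
```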

The following examples illustrate using the StartTraining method to train from live microphone audio and from a recorded audio file:


// Instantiate ChantPM object
NChantPM1 = new NChantPM();

// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");

// Train the speaker with ProfileKit grammar training, using a list of phrases
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");

// Start training from live microphone audio with the dialog hidden
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);

// Alternatively, train the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");

// Start training from a pre-recorded audio file with the dialog hidden
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
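
The training phrase text in the example above is a single string with one phrase per line, each terminated by \n. When the phrases come from a list or a configuration file, that string can be assembled in code. The following is a minimal sketch; TrainingPhrases.ToPhraseText is a hypothetical helper, not part of ProfileKit, and the newline-terminated format is assumed from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not a ProfileKit type): builds newline-terminated
// phrase text in the same shape as the string used in the example above.
public static class TrainingPhrases
{
    public static string ToPhraseText(IEnumerable<string> phrases)
    {
        var sb = new StringBuilder();
        foreach (var phrase in phrases)
        {
            // Skip null, empty, or whitespace-only entries.
            if (string.IsNullOrWhiteSpace(phrase)) continue;
            sb.Append(phrase.Trim()).Append('\n');
        }
        return sb.ToString();
    }
}

public static class TrainingPhrasesDemo
{
    public static void Main()
    {
        string text = TrainingPhrases.ToPhraseText(new[] { "red", "blue", "orange" });
        Console.WriteLine(text == "red\nblue\norange\n"); // True
    }
}
```

The resulting string could then be passed as the value of CTPTrainingPhraseText in the SetTrainingProperty call shown earlier.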

Refer to programming language specific syntax in the help file Class Library Reference.