How do I determine which speaker training method to use?
Last reviewed: 12/1/2011
HOW Article ID: H121101
The information in this article applies to:
- ProfileKit 4
Summary
Some recognizers provide built-in speaker training dialogs, or utilities for invoking them, to create a speaker profile. These training dialogs usually use dictation-based recognition and require the user to read text from the dialog when prompted.
ProfileKit 4 lets you use built-in recognizer dialogs, run custom training with a built-in ProfileKit dialog or a custom dialog you design, or train with no dialog at all (UI-less), using live or recorded audio. Custom training can use dictation- or grammar-based recognition.
More Information
Recognizers have their own formats for speaker profiles. Some provide training dialogs with which to perform speaker training. In addition to launching recognizer-supplied training dialogs, ProfileKit provides custom training options for training speaker profiles:
- Training dialog to manage the speaker training process;
- UI-less (no dialog) to manage the speaker training process;
- Application custom UI (application custom dialog) to manage the speaker training process; and
- Train with live audio using microphone or with pre-recorded audio.
ProfileKit supports the following recognizers and profile formats:
Recognizer | Speech API | Training Dialogs | Custom Training |
---|---|---|---|
Dragon NaturallySpeaking (all languages) | Dragon COM API | Yes | No |
IBM ViaVoice (all languages) | SMAPI | Yes | Yes |
Microsoft SAPI 4 (all languages) | SAPI 4 | Yes | No |
Microsoft SAPI 5 (all languages) | SAPI 5 | Yes | Yes |
Nuance VoCon 3200 V2 (all languages) | VoCon 3200 V2 | No | Yes |
Nuance VoCon 3200 V3 (all languages) | VoCon 3200 V3 | No | Yes |
Nuance VoCon 3200 V4 (all languages) | VoCon 3200 V4 | No | Yes |
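
The support table above can be encoded as a small lookup, so that an application can check at run time which training routes are open for a given recognizer. The following is a minimal C++ sketch under stated assumptions: the `RecognizerSupport` struct, the `kSupport` array, and the `findSupport` function are hypothetical names used for illustration only and are not part of ProfileKit. The values are copied from the table rather than queried from the library.

```cpp
#include <string>

// Hypothetical record mirroring one row of the support table above.
struct RecognizerSupport {
    const char* recognizer;   // recognizer name as listed in the table
    const char* speechApi;    // speech API column
    bool trainingDialogs;     // recognizer-supplied training dialogs available
    bool customTraining;      // ProfileKit custom training available
};

// Values copied from the table; this is not data returned by ProfileKit.
static const RecognizerSupport kSupport[] = {
    {"Dragon NaturallySpeaking", "Dragon COM API", true,  false},
    {"IBM ViaVoice",             "SMAPI",          true,  true},
    {"Microsoft SAPI 4",         "SAPI 4",         true,  false},
    {"Microsoft SAPI 5",         "SAPI 5",         true,  true},
    {"Nuance VoCon 3200 V2",     "VoCon 3200 V2",  false, true},
    {"Nuance VoCon 3200 V3",     "VoCon 3200 V3",  false, true},
    {"Nuance VoCon 3200 V4",     "VoCon 3200 V4",  false, true},
};

// Returns the table row for a recognizer name, or nullptr if it is not listed.
const RecognizerSupport* findSupport(const std::string& name) {
    for (const RecognizerSupport& row : kSupport) {
        if (name == row.recognizer) {
            return &row;
        }
    }
    return nullptr;
}
```

For example, `findSupport("Microsoft SAPI 5")->customTraining` is true, while the same field for Dragon NaturallySpeaking is false.
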
Recognizer selection and application deployment strategy affect which training method you should use.
Recognizer | Training Method Considerations |
---|---|
Dragon NaturallySpeaking (all languages) | Dragon does not provide a training API so you must use the built-in dialogs for speaker training. |
IBM ViaVoice (all languages) | Since all training options are available, your application deployment approach is likely to dictate your training strategy. If you are relying heavily on dictation, then consider using built-in training or long custom training scripts. If you are using command or grammar recognition, then consider using UI or UI-less custom training. |
Microsoft SAPI 4 (all languages) | SAPI 4 does not provide a training API so you must use the built-in dialogs for speaker training. |
Microsoft SAPI 5 (all languages) | Since all training options are available, your application deployment approach is likely to dictate your training strategy. If you are relying heavily on dictation, then consider using built-in training or long custom training scripts. If you are using command or grammar recognition, then consider using UI or UI-less custom training. |
Nuance VoCon 3200 (all versions, all languages) | VoCon does not provide built-in training dialogs. Consider using UI or UI-less custom training. |
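
The considerations above reduce to a short decision rule. The sketch below is one possible reading of that rule in C++, not a ProfileKit API: `TrainingMethod`, `RecognitionStyle`, and `suggestTrainingMethod` are hypothetical names introduced here. When both routes are available and the application relies on dictation, the function suggests the built-in dialogs; long custom training scripts are the alternative the table also mentions.

```cpp
#include <stdexcept>

// Hypothetical labels for the two training routes discussed in this article.
enum class TrainingMethod {
    BuiltInDialogs,   // recognizer-supplied training dialogs
    CustomTraining    // ProfileKit custom training, with or without a UI
};

// Hypothetical label for how the application mainly uses recognition.
enum class RecognitionStyle { Dictation, CommandOrGrammar };

// Suggests a training route from the recognizer's capabilities and the
// application's recognition style, following the considerations table.
TrainingMethod suggestTrainingMethod(bool hasTrainingDialogs,
                                     bool hasCustomTraining,
                                     RecognitionStyle style) {
    if (!hasTrainingDialogs && !hasCustomTraining) {
        throw std::invalid_argument("recognizer offers no training route");
    }
    if (!hasCustomTraining) {
        return TrainingMethod::BuiltInDialogs;   // e.g., Dragon, SAPI 4
    }
    if (!hasTrainingDialogs) {
        return TrainingMethod::CustomTraining;   // e.g., VoCon 3200
    }
    // Both routes are available (e.g., ViaVoice, SAPI 5): let usage decide.
    return style == RecognitionStyle::Dictation
               ? TrainingMethod::BuiltInDialogs
               : TrainingMethod::CustomTraining;
}
```
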
The ChantPM class provides the StartTraining method for training speaker profiles with live (microphone) or recorded audio.
The audio source type may be one of the following values:
Constant | Value | Description |
---|---|---|
CROBuffer | 1 | The recording audio source is copied from a buffer. |
CROFile | 2 | The recording audio source is read from a file. |
CROMultiMedia | 3 | The recording audio source is from the system real-time multimedia device (e.g., a microphone). |
CROStream | 4 | The recording audio source is read from a stream. |
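
To make the four values concrete, the following C++ sketch mirrors them in a stand-in enum and adds two small helpers. The enum and function names (`AudioSource`, `isKnownSource`, `needsFilePath`, `describeSource`) are hypothetical; real applications would use the constants supplied by ProfileKit. The idea that a file name is needed only for the file source is inferred from the examples in this article, not from documented behavior.

```cpp
#include <string>

// Stand-in for the ProfileKit audio source constants; values copied from
// the table above. These names are illustrative, not the library's own.
enum AudioSource {
    SourceBuffer = 1,
    SourceFile = 2,
    SourceMultiMedia = 3,
    SourceStream = 4
};

// True when the value is one of the four documented sources.
bool isKnownSource(int value) {
    return value >= SourceBuffer && value <= SourceStream;
}

// True when training reads pre-recorded audio from a file, in which case
// the caller is assumed to supply a file name (as in the examples here).
bool needsFilePath(int value) {
    return value == SourceFile;
}

// Short human-readable label for logging or diagnostics.
std::string describeSource(int value) {
    switch (value) {
        case SourceBuffer:     return "buffer";
        case SourceFile:       return "file";
        case SourceMultiMedia: return "multimedia device";
        case SourceStream:     return "stream";
        default:               return "unknown";
    }
}
```
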
The following examples illustrate using the StartTraining method to train from live (microphone) audio and from recorded audio. The recorded-audio example assumes that you have recorded the end user saying the following words in order, with pauses between each word, to the wave file mytraining.wav: red, blue, orange, green, purple, yellow, brown; and that you have a grammar named colors with these words as a list of choices. The grammar file type and syntax vary based on the recognizer used.
// Instantiate ChantPM object
NChantPM1 = new NChantPM();
// Set the engine API or enumerate and select specific engine
NChantPM1.SetNumberProperty(ChantNumberProperty.CNPEngineAPI, (int)ChantEngineAPI.CESAPI5SR);
// Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");
// Training the speaker with ProfileKit grammar training
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with dialog hidden
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);
// Training the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");
// Start training with dialog hidden using pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
// Instantiate ChantPM object
pChantPM = new CChantPM();
// Set the engine API or enumerate and select specific engine
pChantPM->SetNumberProperty(CNPEngineAPI, CESAPI5SR);
// Set the current speaker
pChantPM->SetStringProperty(CSPSpeaker, L"Default Speaker");
// Training the speaker with ProfileKit grammar training
pChantPM->SetTrainingProperty(CTPTrainingPhraseText, L"red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with dialog hidden
pChantPM->StartTraining(NULL, 0, CROMultiMedia, CAFDefault, false);
// Training the speaker with ProfileKit using a grammar file
pChantPM->SetTrainingProperty(CTPTrainingGrammarVocab, L"colors.xml");
// Start training with dialog hidden using pre-recorded audio file
pChantPM->StartTraining(L"mytraining.wav", 0, CROFile, CAFDefault, false);
// Instantiate ChantPM object
pChantPM = new CChantPM();
// Set the engine API or enumerate and select specific engine
pChantPM->SetNumberProperty(CNPEngineAPI, CESAPI5SR);
// Set the current speaker
pChantPM->SetStringProperty(CSPSpeaker, "Default Speaker");
// Training the speaker with ProfileKit grammar training
pChantPM->SetTrainingProperty(CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with dialog hidden
pChantPM->StartTraining(NULL, 0, CROMultiMedia, CAFDefault, false);
// Training the speaker with ProfileKit using a grammar file
pChantPM->SetTrainingProperty(CTPTrainingGrammarVocab, "colors.xml");
// Start training with dialog hidden using pre-recorded audio file
pChantPM->StartTraining("mytraining.wav", 0, CROFile, CAFDefault, false);
// Instantiate ChantPM object
ChantPM1 := TChantPM.Create();
// Set the engine API or enumerate and select specific engine
ChantPM1.SetNumberProperty(CNPEngineAPI, CESAPI5SR);
// Set the current speaker
ChantPM1.SetStringProperty(CSPSpeaker, 'Default Speaker');
// Training the speaker with ProfileKit grammar training
ChantPM1.SetTrainingProperty(CTPTrainingPhraseText, 'red'#10'blue'#10'orange'#10'green'#10'purple'#10'yellow'#10'brown'#10);
// Start training with dialog hidden
ChantPM1.StartTraining(nil, 0, CROMultiMedia, CAFDefault, False);
// Training the speaker with ProfileKit using a grammar file
ChantPM1.SetTrainingProperty(CTPTrainingGrammarVocab, 'colors.xml');
// Start training with dialog hidden using pre-recorded audio file
ChantPM1.StartTraining('mytraining.wav', 0, CROFile, CAFDefault, False);
// Instantiate ChantPM object
JChantPM1 = new JChantPM();
// Set the engine API or enumerate and select specific engine
JChantPM1.setNumberProperty(ChantNumberProperty.CNPEngineAPI, ChantEngineAPI.CESAPI5SR);
// Set the current speaker
JChantPM1.setStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker");
// Training the speaker with ProfileKit grammar training
JChantPM1.setTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with dialog hidden
JChantPM1.startTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false);
// Training the speaker with ProfileKit using a grammar file
JChantPM1.setTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml");
// Start training with dialog hidden using pre-recorded audio file
JChantPM1.startTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false);
// Set the current speaker
WChantPM1.SetStringProperty(CSPSpeaker, "Default Speaker");
// Set the engine API or enumerate and select specific engine
WChantPM1.SetNumberProperty(CNPEngineAPI, CESAPI5SR);
// Training the speaker with ProfileKit grammar training
WChantPM1.SetTrainingProperty(CTPTrainingPhraseText, "red\nblue\norange\ngreen\npurple\nyellow\nbrown\n");
// Start training with dialog hidden
WChantPM1.StartTraining("", 0, CROMultiMedia, CAFDefault, false);
// Training the speaker with ProfileKit using a grammar file
WChantPM1.SetTrainingProperty(CTPTrainingGrammarVocab, "colors.xml");
// Start training with dialog hidden using pre-recorded audio file
WChantPM1.StartTraining("mytraining.wav", 0, CROFile, CAFDefault, false);
' Set the current speaker
XChantPM1.SetStringProperty CSPSpeaker, "Default Speaker"
' Set the engine API or enumerate and select specific engine
XChantPM1.SetNumberProperty CNPEngineAPI, CESAPI5SR
' Training the speaker with ProfileKit grammar training
XChantPM1.SetTrainingProperty CTPTrainingPhraseText, "red" + vbCrLf + "blue" + vbCrLf + "orange" + vbCrLf + "green" + vbCrLf + "purple" + vbCrLf + "yellow" + vbCrLf + "brown" + vbCrLf
' Start training with dialog hidden
XChantPM1.StartTraining vbNullString, 0, CROMultiMedia, CAFDefault, False
' Training the speaker with ProfileKit using a grammar file
XChantPM1.SetTrainingProperty CTPTrainingGrammarVocab, "colors.xml"
' Start training with dialog hidden using pre-recorded audio file
XChantPM1.StartTraining "mytraining.wav", 0, CROFile, CAFDefault, False
' Instantiate ChantPM object
NChantPM1 = new NChantPM()
' Set the engine API or enumerate and select specific engine
NChantPM1.SetNumberProperty(ChantNumberProperty.CNPEngineAPI, ChantEngineAPI.CESAPI5SR)
' Set the current speaker
NChantPM1.SetStringProperty(ChantStringProperty.CSPSpeaker, "Default Speaker")
' Training the speaker with ProfileKit grammar training
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingPhraseText, "red" + vbCrLf + "blue" + vbCrLf + "orange" + vbCrLf + "green" + vbCrLf + "purple" + vbCrLf + "yellow" + vbCrLf + "brown" + vbCrLf)
' Start training with dialog hidden
NChantPM1.StartTraining("", 0, ChantRecordingObject.CROMultiMedia, ChantAudioFormat.CAFDefault, false)
' Training the speaker with ProfileKit using a grammar file
NChantPM1.SetTrainingProperty(ChantTrainingProperty.CTPTrainingGrammarVocab, "colors.xml")
' Start training with dialog hidden using pre-recorded audio file
NChantPM1.StartTraining("mytraining.wav", 0, ChantRecordingObject.CROFile, ChantAudioFormat.CAFDefault, false)
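
The phrase text passed to CTPTrainingPhraseText in the examples above is a list of words, each followed by a line break. When the word list comes from application data rather than a string literal, a small helper can build that text. This is a minimal C++ sketch; `buildPhraseText` is a hypothetical helper name and is not part of ProfileKit.

```cpp
#include <string>
#include <vector>

// Joins training words into one string with a newline after every word,
// matching the shape of the phrase text literal used in the examples.
std::string buildPhraseText(const std::vector<std::string>& words) {
    std::string text;
    for (const std::string& word : words) {
        text += word;
        text += '\n';
    }
    return text;
}
```

Calling `buildPhraseText` with the seven color words in order yields the same string as the literal in the C++ examples.
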
See the Chant ProfileKit 4 help file for more training examples.