Last reviewed: 3/23/2024 10:07:26 AM
Recognizing Speech
A speech recognizer converts speech to text for transcription, data entry, or command and control. In addition, events are generated to return recognized speech and indicate processing states.
The Microsoft Speech API (SAPI 5) and WindowsMedia runtimes are Windows components that provide applications with control of the listening context and with events for recognized speech and recognizer processing states. Microsoft includes a speech recognizer in many Windows SKUs.
Recognizers from other speech technology vendors do not support the Microsoft APIs and event processing; instead, they provide their own proprietary speech APIs with SDKs and runtimes. See the section Recognizer and Synthesizer Installation for more information about speech technologies.
SpeechKit provides common speech recognition management for multiple application scenarios across the various speech technology APIs by managing speech recognition directly with the recognizer.
SpeechKit includes libraries for the following Speech APIs for speech recognition:
Speech API | Platforms |
---|---|
Apple Speech | ARM, x64, x86 |
Google android.speech | ARM |
Microsoft Azure Speech | ARM, x64, x86 |
Microsoft SAPI 5 | x64, x86 |
Microsoft Speech Platform | x64, x86 |
Microsoft .NET System.Speech | x64, x86 |
Microsoft .NET Microsoft.Speech | x64, x86 |
Microsoft WindowsMedia (UWP) | ARM, x64, x86 |
Microsoft WindowsMedia (WinRT) | x64, x86 |
Nuance Dragon NaturallySpeaking | x64, x86 |
Libraries for the most popular recognizer speech APIs are included in Chant Developer Workbench. For additional libraries that support different APIs, runtimes, versions, and vendors, contact Chant Support.
SpeechKit supports speech recognition with a single request.
Java (Android):
// Start speech recognition from microphone audio source
_Recognizer.startRecognition();
C#:
// Start speech recognition from microphone audio source
_Recognizer.StartRecognition();
// Transcribe from audio file
_Recognizer.TranscribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio("The first Saturday in January");
C++:
// Start speech recognition from microphone audio source
_Recognizer->StartRecognition();
// Transcribe from audio file
_Recognizer->TranscribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer->TranscribeAudio("The first Saturday in January");
Delphi:
// Start speech recognition from microphone audio source
_Recognizer.StartRecognition();
// Transcribe from audio file
_Recognizer.TranscribeAudio('myaudio.wav');
// Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio('The first Saturday in January');
Java:
// Start speech recognition from microphone audio source
_Recognizer.startRecognition();
// Transcribe from audio file
_Recognizer.transcribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer.transcribeAudio("The first Saturday in January");
Objective-C:
// Start speech recognition from microphone audio source
[_recognizer startRecognition];
Swift:
// Start speech recognition from microphone audio source
_Recognizer!.startRecognition()
VB .NET:
' Start speech recognition from microphone audio source
_Recognizer.StartRecognition()
' Transcribe from audio file
_Recognizer.TranscribeAudio("myaudio.wav")
' Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio("The first Saturday in January")
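The TranscribeAudio calls above accept either an audio file name or plain text that emulates recognition. One plausible way such an API might distinguish the two inputs is by file extension; this is an illustrative Java sketch (class and method names are invented, not the SpeechKit dispatch rule):

```java
// Illustrative only: classifies a TranscribeAudio-style argument as an audio
// file reference or as text for emulated recognition, by extension.
public class TranscribeDispatch {
    enum Mode { AUDIO_FILE, EMULATED_TEXT }

    static Mode classify(String input) {
        String s = input.trim().toLowerCase();
        // Assumption: common audio extensions indicate a file reference.
        if (s.endsWith(".wav") || s.endsWith(".mp3")) {
            return Mode.AUDIO_FILE;
        }
        return Mode.EMULATED_TEXT;
    }
}
```

With this rule, "myaudio.wav" would be transcribed as audio, while "The first Saturday in January" would be treated as text for emulated recognition.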
To track the progress and state of speech recognition and to process the recognized speech, the application handles event callbacks.
Java (Android):
public class MainActivity extends AppCompatActivity implements com.speechkit.JChantSpeechKitEvents
{
...
// Set the callback
_Recognizer.setChantSpeechKitEvents(this);
// Register Callbacks for engine init
_Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
...
@Override
public void recognitionDictation(Object o, RecognitionDictationEventArgs recognitionDictationEventArgs)
{
// Display recognized speech
final EditText textBox1 = (EditText) findViewById(R.id.textbox1);
if ((textBox1 != null) && (recognitionDictationEventArgs.getText() != null)) {
textBox1.append( recognitionDictationEventArgs.getText() + "\n" );
}
...
}
}
C#:
// Register Event Handler
_Recognizer.RecognitionDictation += Recognizer_RecognitionDictation;
...
private void Recognizer_RecognitionDictation(object sender, RecognitionDictationEventArgs e)
{
if ((e != null) && (e.Text != string.Empty))
{
textBox1.Text += e.Text;
textBox1.Text += " ";
// Make Visible
textBox1.SelectionStart = textBox1.Text.Length;
}
}
C++:
// Register Event Handler
_Recognizer->SetRecognitionDictation(RecognitionDictation);
...
void CALLBACK RecognitionDictation(void* Sender, CRecognitionDictationEventArgs* Args)
{
CDictationDlg* dlg = (CDictationDlg*)AfxGetApp()->GetMainWnd();
if (dlg != NULL)
{
// Add text in the text box
if ((Args != NULL) && (wcslen(Args->GetText()) > 0))
{
CString sText;
CEdit* pEdit = (CEdit*)dlg->GetDlgItem(IDC_EDIT1);
pEdit->GetWindowText(sText);
sText += Args->GetText();
pEdit->SetWindowText(sText);
// Make Visible
pEdit->SetSel(sText.GetLength(), sText.GetLength());
}
}
}
C++Builder:
// Register Event Handler
_Recognizer->SetRecognitionDictation(RecognitionDictation);
...
void CALLBACK RecognitionDictation(void* Sender, CRecognitionDictationEventArgs* Args)
{
// Add text in the text box
if ((Args != NULL) && (Args->GetText().Length() > 0))
{
Form1->Memo1->Text = Form1->Memo1->Text + Args->GetText();
}
}
Delphi:
// Register event handler
_Recognizer.RecognitionDictation := RecognitionDictation;
...
procedure TForm1.RecognitionDictation(Sender: TObject; Args: TRecognitionDictationEventArgs);
begin
// Add text in the text box
If ((Args <> nil) and (Length(Args.Text) > 0)) then
begin
Form1.Memo1.Text := Form1.Memo1.Text + Args.Text;
end;
end;
Java:
public class Frame1 extends JFrame implements com.speechkit.JChantSpeechKitEvents
...
// Set the callback
_Recognizer.setChantSpeechKitEvents(this);
// Register Callbacks for receiving recognized speech.
_Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
...
public void recognitionDictation(Object sender, RecognitionDictationEventArgs args)
{
if ((args != null) && (args.getText() != null))
{
jTextArea1.append(args.getText());
// Make Visible
jTextArea1.setCaretPosition(jTextArea1.getText().length());
}
}
Objective-C:
// Set the callback
[_recognizer setDelegate:(id<SPChantRecognizerDelegate>)self];
...
-(void)recognitionDictation:(NSObject *)sender args:(SPRecognitionDictationEventArgs *)args
{
NSString* newText = [NSString stringWithFormat:@"%@%@ ", [_textView1 text], [args text]];
[_textView1 setText:newText];
}
VB .NET:
Dim WithEvents _Recognizer As NSAPI5Recognizer = Nothing
...
Private Sub Recognizer_RecognitionDictation(ByVal sender As System.Object, ByVal e As RecognitionDictationEventArgs) Handles _Recognizer.RecognitionDictation
If (e.Text <> String.Empty) Then
textBox1.Text += e.Text
textBox1.Text += " "
' Make Visible
textBox1.SelectionStart = textBox1.Text.Length
End If
End Sub
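The per-language handlers above share one pattern: register a callback, then append the event args' recognized text to the UI. The following minimal Java sketch shows that pattern in a self-contained form; FakeRecognizer, DictationEventArgs, and their methods are hypothetical stand-ins invented for illustration, not the SpeechKit API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the event-args objects shown above.
class DictationEventArgs {
    private final String text;
    DictationEventArgs(String text) { this.text = text; }
    String getText() { return text; }
}

// Hypothetical stand-in for a recognizer that raises dictation events.
class FakeRecognizer {
    private final List<BiConsumer<Object, DictationEventArgs>> handlers = new ArrayList<>();

    // Analogous to registering a RecognitionDictation callback.
    void onDictation(BiConsumer<Object, DictationEventArgs> handler) {
        handlers.add(handler);
    }

    // Simulates the engine raising the event with recognized text.
    void raiseDictation(String recognizedText) {
        DictationEventArgs args = new DictationEventArgs(recognizedText);
        for (BiConsumer<Object, DictationEventArgs> h : handlers) {
            h.accept(this, args);
        }
    }
}

public class DictationDemo {
    // Appends each recognized phrase to a text buffer, mirroring the
    // null/empty guard used in the samples above.
    static String run() {
        StringBuilder textBox = new StringBuilder();
        FakeRecognizer recognizer = new FakeRecognizer();
        recognizer.onDictation((sender, e) -> {
            if (e != null && e.getText() != null && !e.getText().isEmpty()) {
                textBox.append(e.getText()).append(" ");
            }
        });
        recognizer.raiseDictation("The first Saturday");
        recognizer.raiseDictation("in January");
        return textBox.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Whatever the language binding, the handler should guard against null or empty text before updating the UI, as the samples above do.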
Some recognizers support property settings that control basic aspects of how speech recognition occurs. Review the vendor's version-specific Speech API documentation for supported properties.
Java (Android):
// No properties
C#:
// Set the silence timeout
_Recognizer.SetProperty("endsilencetimeout","200");
C++:
// Set the silence timeout
_Recognizer->SetProperty(L"endsilencetimeout", L"200");
C++Builder:
// Set the silence timeout
_Recognizer->SetProperty("endsilencetimeout","200");
Delphi:
// Set the silence timeout
_Recognizer.SetProperty('endsilencetimeout','200');
Java:
// Set the silence timeout
_Recognizer.setProperty("endsilencetimeout","200");
Objective-C:
// No properties
Swift:
// No properties
VB .NET:
' Set the silence timeout
_Recognizer.SetProperty("endsilencetimeout","200")
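Properties are passed as string name/value pairs. A minimal Java sketch of a string-keyed property bag with defensive parsing is shown below; the class and its helper are hypothetical illustrations, not the SpeechKit implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical property bag mirroring the SetProperty(name, value)
// string pairs above; invented for illustration.
public class RecognizerProperties {
    private final Map<String, String> props = new HashMap<>();

    public void setProperty(String name, String value) {
        // Assumption: property names are case-insensitive keys.
        props.put(name.toLowerCase(), value);
    }

    public String getProperty(String name) {
        return props.get(name.toLowerCase());
    }

    // Convenience: read a timeout property as milliseconds, falling back to
    // a default when the property is unset or not a non-negative integer.
    public int getTimeoutMs(String name, int defaultMs) {
        String v = getProperty(name);
        if (v == null) return defaultMs;
        try {
            int ms = Integer.parseInt(v.trim());
            return ms >= 0 ? ms : defaultMs;
        } catch (NumberFormatException e) {
            return defaultMs;
        }
    }
}
```

Because values are plain strings, validating them before use (as getTimeoutMs does) avoids surprises when a property is unset or malformed.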
Microsoft Azure Speech Properties
(Source: learn.microsoft.com)
Property | Value |
---|---|
audiologgingenabled | Whether audio and content logs are retained. |
devicename | The multimedia device ID that is used by the audio object. |
dictationenabled | Interpret spoken punctuation. |
profanityoption | How profanity is handled: 0 - Masked (letters replaced with stars), 1 - Removed, 2 - Raw (left unchanged). |
languages | One or more languages to recognize. |
speechkey | The Azure Speech Services key. |
speechregion | The Azure Speech Services region. |
Microsoft SAPI 5 Properties
(Source: Microsoft SAPI5 Help File)
Property | Value |
---|---|
deviceid | The multimedia device ID that is used by the audio object. |
lineid | The current line identifier associated with the multimedia device. |
Microsoft WindowsMedia Properties
(Source: learn.microsoft.com)
Property | Value |
---|---|
babbletimeout | The length of time that the speech recognizer continues to listen while detecting only non-speech input such as background noise. The default is 0 seconds (not activated). |
endsilencetimeout | The length of time that the speech recognizer continues to listen while detecting only silence after speech input has been detected. The default is 150 milliseconds. |
initialsilencetimeout | The length of time that the speech recognizer continues to listen while detecting only silence before any speech input is detected. The default is 5 seconds. |
autostopsilencetimeout | The time threshold at which the continuous recognition session ends due to lack of audio input. |
isreadbackenabled | Whether the recognized text is spoken back to the user on the "Heard you say" screen. The default is true. |
showconfirmation | Whether a "Heard you say" screen is shown to the user after speech recognition is completed. The default is true. |
audibleprompt | The heading text that is displayed on the Listening screen. The default is "Listening...". |
exampletext | The example text shown on the Listening screen. |
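To make the relationship between the silence timeouts in the table above concrete, this hypothetical Java sketch picks which timeout governs the recognizer's wait, depending on whether speech has been detected yet (a simplification of the actual WindowsMedia behavior):

```java
// Illustrative only: which silence timeout applies at a given point in a
// recognition session, using the defaults from the table above.
public class ListeningTimeouts {
    static final int INITIAL_SILENCE_MS = 5000; // before any speech is heard
    static final int END_SILENCE_MS = 150;      // trailing silence after speech

    // Before speech is detected, the recognizer waits up to the initial
    // silence timeout; once speech has been heard, the much shorter end
    // silence timeout decides when the utterance is considered finished.
    static int applicableSilenceTimeoutMs(boolean speechDetected) {
        return speechDetected ? END_SILENCE_MS : INITIAL_SILENCE_MS;
    }
}
```

In other words, a session that never hears speech can wait several seconds before giving up, while a pause of only a fraction of a second ends an utterance once dictation has begun.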
Nuance Dragon NaturallySpeaking Properties
(Source: Dragon NaturallySpeaking Help File)
Property | Value |
---|---|
adaptationon | Loads the Dragon NaturallySpeaking global compatibility module (property dgnregGlobalCM), enabling global commands, tracking, and global dictation support (Dragon NaturalText). Use 0 for off and 1 for on. The Dragon NaturallySpeaking default is 0. |
enabled | Whether the Dragon NaturallySpeaking DragonBar is visible. Valid values: 0 (disabled) and 1 (enabled). |
engineui | Indicates whether this application is displaying the Dragon NaturallySpeaking tray microphone icon and result box. Values may be combined. Valid values: 0 (all hidden), 1 (tray icon visible), 2 (result box visible). |
languageid | Indicates the Dragon NaturallySpeaking language. |
registerconstants | Global compatibility module constants. |
maxalternates | The maximum number of alternate recognized phrases to return as part of the recognition results. |
resultsboxpos | The Dragon NaturallySpeaking results box position: left, top, right, bottom. |
speaker | Name of the current speaker (i.e., end user). |
topic | Speech recognition topic. |