Last reviewed: 3/23/2024 10:07:26 AM

Recognizing Speech

A speech recognizer converts speech to text for transcription, data entry, or command and control. It also generates events that return recognized speech and indicate processing states.

The Microsoft Speech API (SAPI 5) and WindowsMedia runtimes are parts of Windows that give applications control of the listening context and raise events for recognized speech and recognizer processing states. Microsoft includes a speech recognizer in many Windows SKUs.

Recognizers from other speech technology vendors do not support the Microsoft APIs and event processing; instead, they provide their own proprietary speech APIs with SDKs and runtimes. See the section Recognizer and Synthesizer Installation for more information about speech technologies.

SpeechKit provides common speech recognition management for multiple application scenarios across the various speech technology APIs by managing speech recognition directly with the recognizer.

SpeechKit includes libraries for the following Speech APIs for speech recognition:

Speech API                        Platforms
Apple Speech                      ARM, x64, x86
Google android.speech             ARM
Microsoft Azure Speech            ARM, x64, x86
Microsoft SAPI 5                  x64, x86
Microsoft Speech Platform         x64, x86
Microsoft .NET System.Speech      x64, x86
Microsoft .NET Microsoft.Speech   x64, x86
Microsoft WindowsMedia (UWP)      ARM, x64, x86
Microsoft WindowsMedia (WinRT)    x64, x86
Nuance Dragon NaturallySpeaking   x64, x86

Libraries for the most popular recognizer speech APIs are included in Chant Developer Workbench. For additional libraries that support different APIs, runtimes, versions, and vendors, contact Chant Support.

SpeechKit supports speech recognition with a single request.


Android (Java):

// Start speech recognition from microphone audio source
_Recognizer.startRecognition();

C#:

// Start speech recognition from microphone audio source
_Recognizer.StartRecognition();
// Transcribe from audio file
_Recognizer.TranscribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio("The first Saturday in January");


C++:

// Start speech recognition from microphone audio source
_Recognizer->StartRecognition();
// Transcribe from audio file
_Recognizer->TranscribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer->TranscribeAudio("The first Saturday in January");
    

Delphi:

// Start speech recognition from microphone audio source
_Recognizer.StartRecognition();
// Transcribe from audio file
_Recognizer.TranscribeAudio('myaudio.wav');
// Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio('The first Saturday in January');
    

Java:

// Start speech recognition from microphone audio source
_Recognizer.startRecognition();
// Transcribe from audio file
_Recognizer.transcribeAudio("myaudio.wav");
// Transcribe from text - Emulate Speech Recognition
_Recognizer.transcribeAudio("The first Saturday in January");

Objective-C:

// Start speech recognition from microphone audio source
[_recognizer startRecognition];

Swift:

// Start speech recognition from microphone audio source
_Recognizer!.startRecognition()

VB.NET:

' Start speech recognition from microphone audio source
_Recognizer.StartRecognition()
' Transcribe from audio file
_Recognizer.TranscribeAudio("myaudio.wav")
' Transcribe from text - Emulate Speech Recognition
_Recognizer.TranscribeAudio("The first Saturday in January")
    

To track the progress and state of speech recognition and process the recognized speech, the application handles event callbacks.


Android (Java):

public class MainActivity extends AppCompatActivity implements com.speechkit.JChantSpeechKitEvents
{
    ...
    // Set the callback
    _Recognizer.setChantSpeechKitEvents(this);
    // Register callback for receiving recognized speech
    _Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
    ...
    @Override
    public void recognitionDictation(Object sender, RecognitionDictationEventArgs args)
    {
        // Display recognized speech
        final EditText textBox1 = (EditText) findViewById(R.id.textbox1);
        if ((textBox1 != null) && (args.getText() != null)) {
            textBox1.append(args.getText() + "\n");
        }
        ...
    }
}
    

C#:

// Register Event Handler
_Recognizer.RecognitionDictation += Recognizer_RecognitionDictation;
...
private void Recognizer_RecognitionDictation(object sender, RecognitionDictationEventArgs e)
{
    if ((e != null) && (e.Text != string.Empty))
    {
        textBox1.Text += e.Text;
        textBox1.Text += " ";
        // Make Visible
        textBox1.SelectionStart = textBox1.Text.Length;
    }
}
    

C++:

// Register Event Handler
_Recognizer->SetRecognitionDictation(RecognitionDictation);
...
void CALLBACK RecognitionDictation(void* Sender, CRecognitionDictationEventArgs* Args)
{
    CDictationDlg* dlg = (CDictationDlg*)AfxGetApp()->GetMainWnd();
    if (dlg != NULL)
    {
        // Add text in the text box
        if ((Args != NULL) && (wcslen(Args->GetText()) > 0))
        {
            CString sText;
            CEdit* pEdit = (CEdit*)dlg->GetDlgItem(IDC_EDIT1);
            pEdit->GetWindowText(sText);
            sText += Args->GetText();
            pEdit->SetWindowText(sText);
            // Make Visible
            pEdit->SetSel(sText.GetLength(), sText.GetLength());
        }
    }
}
    

C++Builder:

// Register Event Handler
_Recognizer->SetRecognitionDictation(RecognitionDictation);
...
void CALLBACK RecognitionDictation(void* Sender, CRecognitionDictationEventArgs* Args)
{
    // Add text in the text box
    if ((Args != NULL) && (Args->GetText().Length() > 0))
    {
        Form1->Memo1->Text = Form1->Memo1->Text + Args->GetText();
    }
}
    

Delphi:

// Register event handler
_Recognizer.RecognitionDictation := RecognitionDictation;
...
procedure TForm1.RecognitionDictation(Sender: TObject; Args: TRecognitionDictationEventArgs);
begin
    // Add text in the text box
    If ((Args <> nil) and (Length(Args.Text) > 0)) then
    begin
      Form1.Memo1.Text := Form1.Memo1.Text + Args.Text;
    end;
end;
    

Java:

public class Frame1 extends JFrame implements com.speechkit.JChantSpeechKitEvents
...
// Set the callback
_Recognizer.setChantSpeechKitEvents(this);
// Register Callbacks for receiving recognized speech.
_Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
...
public void recognitionDictation(Object sender, RecognitionDictationEventArgs args)
{
    if ((args != null) && (args.getText() != null))
    {
        jTextArea1.append(args.getText());
        // Make Visible
        jTextArea1.setCaretPosition(jTextArea1.getText().length());
    }
}
    

Objective-C:

// Set the callback
[_recognizer setDelegate:(id<SPChantRecognizerDelegate>)self];
...
-(void)recognitionDictation:(NSObject *)sender args:(SPRecognitionDictationEventArgs *)args
{
    NSString* newText = [NSString stringWithFormat:@"%@%@ ", [_textView1 text], [args text]];
    [_textView1 setText:newText];
}


VB.NET:

Dim WithEvents _Recognizer As NSAPI5Recognizer = Nothing
...
Private Sub Recognizer_RecognitionDictation(ByVal sender As System.Object, ByVal e As RecognitionDictationEventArgs) Handles _Recognizer.RecognitionDictation
    If (e.Text <> String.Empty) Then
        textBox1.Text += e.Text
        textBox1.Text += " "
        ' Make Visible
        textBox1.SelectionStart = textBox1.Text.Length
    End If
End Sub
    

Some recognizers support property settings that control basic aspects of how speech recognition occurs. Review the vendor's version-specific Speech API documentation for supported properties.


Android (Java):

// No properties

C#:

// Set the silence timeout
_Recognizer.SetProperty("endsilencetimeout","200");

C++:

// Set the silence timeout
_Recognizer->SetProperty(L"endsilencetimeout",L"200");

C++Builder:

// Set the silence timeout
_Recognizer->SetProperty("endsilencetimeout","200");

Delphi:

// Set the silence timeout
_Recognizer.SetProperty('endsilencetimeout','200');

Java:

// Set the silence timeout
_Recognizer.setProperty("endsilencetimeout","200");

Objective-C:

// No properties

Swift:

// No properties

VB.NET:

' Set the silence timeout
_Recognizer.SetProperty("endsilencetimeout","200")
    

Microsoft Azure Speech Properties

(Source: learn.microsoft.com)

Property              Description
audiologgingenabled   Audio and content logs are retained.
devicename            The multimedia device ID that is used by the audio object.
dictationenabled      Interpret punctuation.
profanityoption       How profanity is handled: 0 (masked with asterisks), 1 (removed), 2 (left as-is).
languages             One or more languages to recognize.
speechkey             The Azure Speech Services key.
speechregion          The Azure Speech Services region.
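
For example, an application using the Microsoft Azure Speech API can supply its service credentials and recognition language before starting recognition. The following C# sketch assumes an already-created recognizer object; the key, region, and language values are placeholders.

// Set the Azure Speech Services key and region (placeholder values)
_Recognizer.SetProperty("speechkey", "YOUR-AZURE-SPEECH-KEY");
_Recognizer.SetProperty("speechregion", "eastus");
// Set the language to recognize
_Recognizer.SetProperty("languages", "en-US");
// Start speech recognition from microphone audio source
_Recognizer.StartRecognition();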

Microsoft SAPI 5 Properties

(Source: Microsoft SAPI 5 Help File)

Property   Description
deviceid   The multimedia device ID that is used by the audio object.
lineid     The current line identifier associated with the multimedia device.
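
For example, an application can direct SAPI 5 audio input to a specific multimedia device. A minimal C# sketch; the device ID value is a placeholder and is system-specific.

// Select the multimedia device ID for the audio object (placeholder value)
_Recognizer.SetProperty("deviceid", "0");
// Start speech recognition from the selected device
_Recognizer.StartRecognition();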

Microsoft WindowsMedia Properties

(Source: learn.microsoft.com)

Property                 Description
babbletimeout            The length of time the recognizer continues to listen while detecting only non-speech input such as background noise. The default is 0 seconds (deactivated).
endsilencetimeout        The length of time the recognizer continues to listen while detecting only silence after speech input has been detected. The default is 150 milliseconds.
initialsilencetimeout    The length of time the recognizer continues to listen while detecting only silence before speech input is detected. The default is 5 seconds.
autostopsilencetimeout   The time threshold at which a continuous recognition session ends due to lack of audio input.
isreadbackenabled        Whether the recognized text is spoken back to the user on the "Heard you say" screen. The default is true.
showconfirmation         Whether a "Heard you say" screen is shown to the user after speech recognition completes. The default is true.
audibleprompt            The heading text displayed on the Listening screen. The default is "Listening...".
exampletext              The example text shown on the Listening screen.
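
For example, a dictation application might lengthen the silence timeouts so that brief pauses do not end recognition prematurely. A minimal C# sketch; the values are illustrative and assume millisecond units, as in the endsilencetimeout example above.

// Continue listening through longer pauses after speech is detected
_Recognizer.SetProperty("endsilencetimeout", "500");
// Wait longer for the user to begin speaking
_Recognizer.SetProperty("initialsilencetimeout", "10000");
// Customize the Listening screen heading and example text
_Recognizer.SetProperty("audibleprompt", "Say an appointment date");
_Recognizer.SetProperty("exampletext", "The first Saturday in January");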

Nuance Dragon NaturallySpeaking Properties

(Source: Dragon NaturallySpeaking Help File)

Property            Description
adaptationon        Sets the Dragon NaturallySpeaking property dgnregGlobalCM to load the global compatibility module, enabling global commands, tracking, and global dictation support (Dragon NaturalText). Use 0 for off and 1 for on. The Dragon NaturallySpeaking default is 0.
enabled             Whether the Dragon NaturallySpeaking DragonBar is visible. Valid values: 0 (disabled) and 1 (enabled).
engineui            Indicates whether the application displays the Dragon NaturallySpeaking tray microphone icon and results box. Values may be combined. Valid values: 0 (all hidden), 1 (tray icon visible), 2 (results box visible).
languageid          Indicates the Dragon NaturallySpeaking language.
registerconstants   Global compatibility module constants.
maxalternates       Indicates that alternate recognized phrases should be returned as part of the recognition results.
resultsboxpos       The Dragon NaturallySpeaking results box position: left, top, right, bottom.
speaker             Name of the current speaker (i.e., the end user).
topic               Speech recognition topic.
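
For example, an application might select a speaker profile and request alternate recognition results. A minimal C# sketch; the speaker name and alternates count are placeholders.

// Load a specific Dragon NaturallySpeaking speaker profile (placeholder name)
_Recognizer.SetProperty("speaker", "John Doe");
// Request up to five alternate recognized phrases in the results
_Recognizer.SetProperty("maxalternates", "5");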