Last reviewed: 3/23/2024 10:02:45 AM

Event Handling

To track recognition operations, applications receive event notifications. Event availability varies among Speech APIs.

Event | Event Arguments | Description
ApiError | ChantAPIErrorEventArgs | Notifies the application of an API error
AudioSourceLevel | AudioLevelEventArgs | Notifies the application of the audio level
AudioSourceStart | AudioEventArgs | Notifies the application the speech recognizer has started processing audio
AudioSourceStop | AudioEventArgs | Notifies the application the speech recognizer has stopped processing audio
DialogClosed | DialogClosedEventArgs | Notifies the application the speech recognizer dialog has closed
FalseRecognition | SREventArgs | Notifies the application the speech recognizer was unable to recognize speech from the utterance
InitComplete | SREventArgs | Notifies the application that speech engine enumeration is complete
Interference | InterferenceEventArgs | Notifies the application the speech recognizer detected interference
Paused | SREventArgs | Notifies the application the speech recognizer has paused processing audio
PhraseStart | SREventArgs | Notifies the application the speech recognizer has detected the start of a phrase
PropertyChange | PropertyChangeEventArgs | Notifies the application a speech recognizer property has changed
RecognitionCommand | RecognitionCommandEventArgs | Notifies the application the speech recognizer recognized speech from a command vocabulary
RecognitionCommandHypothesis | RecognitionCommandEventArgs | Notifies the application the speech recognizer may have recognized speech from a command vocabulary
RecognitionDictation | RecognitionDictationEventArgs | Notifies the application the speech recognizer recognized speech from a dictation vocabulary
RecognitionDictationHypothesis | RecognitionDictationEventArgs | Notifies the application the speech recognizer may have recognized speech from a dictation vocabulary
RecognitionGrammar | RecognitionGrammarEventArgs | Notifies the application the speech recognizer recognized speech from a grammar vocabulary
RecognitionGrammarHypothesis | RecognitionGrammarEventArgs | Notifies the application the speech recognizer may have recognized speech from a grammar vocabulary
RecognitionHypothesis | WindowsMediaRecognitionHypothesisEventArgs | Notifies the application the WindowsMedia (via WinRT C++) speech recognizer may have recognized speech
RecognitionOther | RecognitionOtherEventArgs | Notifies the application the speech recognizer recognized speech from another application context for Windows desktop recognizers and no command vocabulary match for Android, iOS, and macOS recognizers
RecognitionTimeOut | SREventArgs | Notifies the application the speech recognizer has stopped processing audio
RequestUI | RequestUIEventArgs | Notifies the application the speech recognizer recommends invoking one of its dialogs
Sound | SREventArgs | Notifies the application the speech recognizer detected sound
SRBookMark | SRBookMarkEventArgs | Notifies the application the speech recognizer detected a bookmark in the audio source
SRCancel | CancelEventArgs | Notifies the application the speech recognizer canceled the request
UtteranceBegin | SREventArgs | Notifies the application the speech recognizer detected the beginning of an utterance
UtteranceEnd | SREventArgs | Notifies the application the speech recognizer detected the end of an utterance

Some events provide data values that are returned in argument objects. Argument data availability varies among Speech APIs.

  • AudioEventArgs
    • File - File name
    • MCSAudioSourceEventArgs
      • SessionId - Session identifier
    • WindowsAudioEventArgs
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
  • AudioLevelEventArgs
    • Level - Audio level
    • AudioStreamOffset - Audio stream offset
    • AudioTimeOffset - Audio time offset
  • CancelEventArgs
    • MCSCancelEventArgs
      • ErrorCode - Cancel error code
      • ErrorDetails - Cancel error details
      • Reason - Cancel reason
      • MCSSRCancelEventArgs
        • Offset - Session offset
        • SessionId - Session identifier
  • ChantAPIErrorEventArgs
    • Function - API function name
    • Message - API error message
    • RC - API error return code
  • DialogClosedEventArgs
    • Dialog - Dialog identifier
    • ExitCode - Dialog exit code
  • InterferenceEventArgs
    • Interference - The type of interference detected by the recognizer
    • AudioStreamOffset - Audio stream offset
    • AudioTimeOffset - Audio time offset
  • PropertyChangeEventArgs
    • Property - Property that changed
    • Value - New property value
    • AudioStreamOffset - Audio stream offset
    • AudioTimeOffset - Audio time offset
  • RecognitionCommandEventArgs
    • Phrase - Recognized phrase
    • AndroidRecognitionCommandEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Semantics - Collection of semantics for recognized phrase
      • Words - Collection of recognized words
    • SFRecognitionCommandEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Semantics - Collection of semantics for recognized phrase
      • Words - Collection of recognized words
    • WindowsRecognitionCommandEventArgs
      • AnnotatedPhrase - Annotated recognized phrase
      • StreamTime - Recognition result absolute time for start of phrase audio
      • Length - Recognition result length of the phrase specified in 100 nanosecond units
      • TickCount - Number of milliseconds elapsed from the start of the system to the start of the current result
      • Start - The total 100 nanosecond units from the start of the stream to the start of the phrase
      • SAPIElements - Collection of SAPIElements for recognized phrase
      • SAPIPhrases - Collection of SAPIPhrases for recognized phrase
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
    • WindowsMediaRecognitionCommandEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Duration - The amount of time required for the utterance
      • Phrase - Recognized phrase
      • RawConfidence - Indicates the relative confidence of the result when compared with a collection of alternatives
      • Rules - Collection of rules for recognized phrase
      • Semantics - Collection of semantics for recognized phrase
      • StartTime - The start time of the utterance
      • Status - The result state
        Value | Description
        0 | The recognition session or compilation succeeded
        1 | A topic constraint was set for an unsupported language
        2 | The language of the speech recognizer does not match the language of a grammar
        3 | A grammar failed to compile
        4 | Audio problems caused recognition to fail
        5 | User canceled recognition session
        6 | An unknown problem caused recognition or compilation to fail
        7 | A timeout due to extended silence or poor audio caused recognition to fail
        8 | An extended pause, or excessive processing time, caused recognition to fail
        9 | Network problems caused recognition to fail
        10 | Lack of a microphone caused recognition to fail
      • Words - Collection of recognized words
  • RecognitionDictationEventArgs
    • Text - Recognized phrase
    • AndroidRecognitionDictationEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
    • MCSRecognitionDictationEventArgs
      • Alternates - Alternative recognition results
      • Confidence - Confidence of recognition from 0.0 (no confidence) to 1.0 (full confidence)
      • Lexical - The actual words recognized
      • ITN - Inverse-text-normalized form of the recognized text and other transformations applied
      • MaskedITN - Normalized form with profanity masked
      • SessionId - Session identifier
      • Offset - Offset of the recognized speech in ticks. A single tick represents one hundred nanoseconds
      • Duration - Duration of the recognized speech that does not include trailing or leading silence
      • Words - Word level timing result list
    • SFRecognitionDictationEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
    • WindowsRecognitionDictationEventArgs
      • StreamTime - Recognition result absolute time for start of phrase audio
      • Length - Recognition result length of the phrase specified in 100 nanosecond units
      • TickCount - Number of milliseconds elapsed from the start of the system to the start of the current result
      • Start - The total 100 nanosecond units from the start of the stream to the start of the phrase
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
    • WindowsMediaRecognitionDictationEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Text - Recognized phrase
  • RecognitionGrammarEventArgs
    • Phrase - Recognized phrase
    • AndroidRecognitionGrammarEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Rules - Collection of rules for recognized phrase
      • Semantics - Collection of semantics for recognized phrase
      • Words - Collection of recognized words
    • SFRecognitionGrammarEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Rules - Collection of rules for recognized phrase
      • Semantics - Collection of semantics for recognized phrase
      • Words - Collection of recognized words
    • WindowsRecognitionGrammarEventArgs
      • AnnotatedPhrase - Annotated recognized phrase
      • StreamTime - Recognition result absolute time for start of phrase audio
      • Length - Recognition result length of the phrase specified in 100 nanosecond units
      • TickCount - Number of milliseconds elapsed from the start of the system to the start of the current result
      • Start - The total 100 nanosecond units from the start of the stream to the start of the phrase
      • SAPIElements - Collection of SAPIElements for recognized phrase
      • SAPIPhrases - Collection of SAPIPhrases for recognized phrase
      • SAPIRules - Collection of SAPIRules for recognized phrase
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
    • WindowsMediaRecognitionGrammarEventArgs
      • Alternates - Collection of alternate recognized phrases
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Duration - The amount of time required for the utterance
      • Phrase - Recognized phrase
      • RawConfidence - Indicates the relative confidence of the result when compared with a collection of alternatives
      • Rules - Collection of rules for recognized phrase
      • Semantics - Collection of semantics for recognized phrase
      • StartTime - The start time of the utterance
      • Status - The result state
        Value | Description
        0 | The recognition session or compilation succeeded
        1 | A topic constraint was set for an unsupported language
        2 | The language of the speech recognizer does not match the language of a grammar
        3 | A grammar failed to compile
        4 | Audio problems caused recognition to fail
        5 | User canceled recognition session
        6 | An unknown problem caused recognition or compilation to fail
        7 | A timeout due to extended silence or poor audio caused recognition to fail
        8 | An extended pause, or excessive processing time, caused recognition to fail
        9 | Network problems caused recognition to fail
        10 | Lack of a microphone caused recognition to fail
      • Words - Collection of recognized words
  • RecognitionHypothesisEventArgs
    • Text - Recognized phrase
    • MCSRecognitionHypothesisEventArgs
      • Duration - Audio stream offset
      • Offset - Audio time offset
      • SessionId - Session identifier
    • WindowsMediaRecognitionHypothesisEventArgs
  • RecognitionOtherEventArgs
    • Command - Recognized phrase
    • AndroidRecognitionOtherEventArgs
    • SFRecognitionOtherEventArgs
    • WindowsRecognitionOtherEventArgs
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
  • RequestUIEventArgs
    • RequestUI - The requested recognizer dialog
    • AudioStreamOffset - Audio stream offset
    • AudioTimeOffset - Audio time offset
  • SRBookMarkEventArgs
    • MarkValue - Bookmark value
    • RecoEventFlag - Speech recognition event flag
    • AudioStreamOffset - Audio stream offset
    • AudioTimeOffset - Audio time offset
  • SREventArgs
    • AndroidSREventArgs
    • MCSSREventArgs
      • Duration - Audio stream offset
      • Offset - Audio time offset
      • SessionId - Session identifier
    • SFSREventArgs
    • WindowsSREventArgs
    • WindowsMediaSREventArgs
      • AudioStreamOffset - Audio stream offset
      • AudioTimeOffset - Audio time offset
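
The Status values listed for WindowsMediaRecognitionCommandEventArgs and WindowsMediaRecognitionGrammarEventArgs lend themselves to a small lookup when logging or reporting failures. The following Java sketch is illustrative only: RecognitionStatusText and its methods are hypothetical helper names, not part of the SpeechKit API, and it assumes the Status value is available to the handler as an int.

```java
import java.util.Map;

// Hypothetical helper (not part of the SpeechKit API): maps the numeric
// Status values documented above to their descriptions.
final class RecognitionStatusText {
    private static final Map<Integer, String> DESCRIPTIONS = Map.ofEntries(
        Map.entry(0, "The recognition session or compilation succeeded"),
        Map.entry(1, "A topic constraint was set for an unsupported language"),
        Map.entry(2, "The language of the speech recognizer does not match the language of a grammar"),
        Map.entry(3, "A grammar failed to compile"),
        Map.entry(4, "Audio problems caused recognition to fail"),
        Map.entry(5, "User canceled recognition session"),
        Map.entry(6, "An unknown problem caused recognition or compilation to fail"),
        Map.entry(7, "A timeout due to extended silence or poor audio caused recognition to fail"),
        Map.entry(8, "An extended pause, or excessive processing time, caused recognition to fail"),
        Map.entry(9, "Network problems caused recognition to fail"),
        Map.entry(10, "Lack of a microphone caused recognition to fail"));

    private RecognitionStatusText() {}

    // Returns the documented description, or a fallback for values not in the table.
    static String describe(int status) {
        return DESCRIPTIONS.getOrDefault(status, "Unrecognized status value " + status);
    }

    // Only value 0 is documented as success.
    static boolean succeeded(int status) {
        return status == 0;
    }

    public static void main(String[] args) {
        int status = 9; // assumed to come from the event arguments
        if (!succeeded(status)) {
            System.out.println("Recognition failed: " + describe(status));
        }
    }
}
```

Keeping the fallback branch means a value outside the documented range is reported rather than silently dropped.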

Event arguments may contain the following class objects:

  • ChantAlternate
    • Phrase - Recognized phrase
    • AndroidAlternate
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
    • MCSAlternate
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
      • Lexical - The actual words recognized
      • ITN - Inverse-text-normalized form of the recognized text and other transformations applied
      • MaskedITN - Normalized form with profanity masked
      • SessionId - Session identifier
      • Offset - Offset of the recognized speech in ticks. A single tick represents one hundred nanoseconds
      • Duration - Duration of the recognized speech that does not include trailing or leading silence
      • Words - Word level timing result list
    • SFAlternate
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
    • WindowsMediaAlternate
      • Confidence - Indicates the confidence of the speech recognizer in the recognition result
  • ChantSAPIElement
    • DisplayText - The display text for this element
    • LexicalForm - The lexical form of this element
    • Pronunciation - The phonemes for this element
    • ActualConfidence - The actual confidence for this element
    • SREngineConfidence - The confidence score computed by the SR engine
    • AudioTimeOffset - The starting offset of the element in 100-nanosecond units of time relative to the start of the phrase
    • AudioSizeTime - The length of the element in 100-nanosecond units of time
    • AudioStreamOffset - The starting offset of the element in bytes relative to the start of the phrase in the original input stream
    • AudioSizeBytes - The size of the element in bytes in the original input stream
    • RetainedStreamOffset - The starting offset of the element in bytes relative to the start of the phrase in the retained audio stream
    • RetainedSizeBytes - The size of the element in bytes in the retained audio stream
  • ChantSAPIPhrase
    • PropName - Property name
    • PropID - Property ID
    • ValStr - ValStr value
    • Val - Val value
    • SREngineConfidence - Confidence computed by the speech recognition engine
    • Confidence - Confidence computed by SAPI
  • ChantSAPIRule
    • Name - Rule name
    • ID - Rule ID
    • SREngineConfidence - Confidence computed by the speech recognition engine
    • Confidence - Confidence computed by SAPI
  • ChantSemantic
    • PropName - Property name
    • PropValue - Property value
  • ChantWord
    • Text - Word text
    • MCSWord
      • Duration - Word duration
      • Offset - Word offset
    • WindowsMediaWord

Event notifications are received in callback routines as follows:


_Recognizer = _SpeechKit.createChantRecognizer();
if (_Recognizer != null)
{
    // Set the callback
    _Recognizer.setChantSpeechKitEvents(this);
    // Register for dictation recognition callbacks
    _Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
}

_Recognizer = _SpeechKit.CreateChantRecognizer();
if (_Recognizer != null)
{
    _Recognizer.RecognitionCommand += this.Recognizer_RecognitionCommand;
}

_Recognizer = _SpeechKit->CreateChantRecognizer();
if (_Recognizer != NULL)
{
    // Register Event Handlers
    _Recognizer->SetRecognitionCommand(RecognitionCommand);
}
    

_Recognizer = _SpeechKit->CreateChantRecognizer();
if (_Recognizer != NULL)
{
    // Register Event Handlers
    _Recognizer->SetRecognitionCommand(RecognitionCommand);
}

_Recognizer := _SpeechKit.CreateChantRecognizer();
if (_Recognizer <> nil) then
begin
    // Register Event Handlers
    _Recognizer.RecognitionCommand := RecognitionCommand;
end;
    

_Recognizer = _SpeechKit.createChantRecognizer();
if (_Recognizer != null)
{
    // Set the callback object
    _Recognizer.setChantSpeechKitEvents(this);
    // Register for callbacks
    _Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionCommand);
}

_recognizer = [_speechKit createChantRecognizer];
if (_recognizer != nil)
{
    [_recognizer setDelegate:(id<SPChantRecognizerDelegate>)self];
}

_Recognizer = _SpeechKit!.createChantRecognizer()
if (_Recognizer != nil)
{
    _Recognizer!.delegate = self
}

_Recognizer = _SpeechKit.CreateChantRecognizer()
' Declaring an event handler routine with a Handles clause in VB automatically registers it for event notifications
Private Sub Recognizer_RecognitionCommand(ByVal sender As System.Object, ByVal e As RecognitionCommandEventArgs) Handles _Recognizer.RecognitionCommand
    ...
End Sub
    

The recognizer object sends all notifications to the event handlers. All event data is contained in an arguments object.


@Override
public void recognitionDictation(Object o, RecognitionDictationEventArgs recognitionDictationEventArgs)
{
    // Display recognized speech
    final EditText textBox1 = (EditText) findViewById(R.id.textbox1);
    if ((textBox1 != null) && (recognitionDictationEventArgs.getText() != null)) {
        textBox1.append( recognitionDictationEventArgs.getText() + "\n" );
    }
    ...
}
    

private void Recognizer_RecognitionCommand(object sender, RecognitionCommandEventArgs e)
{
    if ((e != null) && (e.Phrase != null))
    {
        ...
    }
}
    

void CALLBACK RecognitionCommand(void* Sender, CRecognitionCommandEventArgs* Args)
{
    ...
    // Get the command properties
    if ((Args != NULL) && (wcslen(Args->GetPhrase()) > 0))
    {
        ...
    }
}
    

void RecognitionCommand(void* Sender, CRecognitionCommandEventArgs* Args)
{
    // Get the command
    if ((Args != NULL) && (Args->GetPhrase().Length() > 0))
    {
        ...
    }
}
    

procedure TForm1.RecognitionCommand(Sender: TObject; Args: TRecognitionCommandEventArgs);
begin
    // Get the command properties
    If ((Args <> nil) and (Length(Args.Phrase) > 0)) then
    begin
      ...
    end;
end;
    

public void recognitionCommand(Object sender, RecognitionCommandEventArgs args)
{
    if ((args != null) && (args.getPhrase() != null))
    {
        ...
    }
}
    

-(void)recognitionDictation:(NSObject *)sender args:(SPRecognitionDictationEventArgs *)args
{
    NSString* newText = [NSString stringWithFormat:@"%@%@ ", [_textView1 text], [args text]];
    [_textView1 setText:newText];
}

func recognitionDictation(sender: SPChantRecognizer, args: SPRecognitionDictationEventArgs)
{
    let newText = String(format: "%@%@ ", self.textView1.text, args.text)
    self.textView1.text = newText
}
    

Private Sub Recognizer_RecognitionCommand(ByVal sender As System.Object, ByVal e As RecognitionCommandEventArgs) Handles _Recognizer.RecognitionCommand
    If ((e IsNot Nothing) AndAlso (e.Phrase IsNot Nothing)) Then
        ...
    End If
End Sub
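
The handlers above check only that a recognized phrase is present. Where an API also supplies Alternates with Confidence values, a handler might select among them before acting. The Java sketch below shows one way to do that with stand-in types: AlternatePicker and its nested Alternate class are hypothetical and do not use the SpeechKit classes, and the 0.0 to 1.0 confidence scale is an assumption borrowed from the MCSRecognitionDictationEventArgs description.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: picks the most confident alternate above a threshold.
final class AlternatePicker {

    // Stand-in for an alternate result; not the SpeechKit ChantAlternate class.
    static final class Alternate {
        final String phrase;
        final double confidence; // assumed scale: 0.0 (none) to 1.0 (full)

        Alternate(String phrase, double confidence) {
            this.phrase = phrase;
            this.confidence = confidence;
        }
    }

    private AlternatePicker() {}

    // Returns the highest-confidence alternate at or above minConfidence,
    // or an empty Optional when the list is null or nothing qualifies.
    static Optional<Alternate> best(List<Alternate> alternates, double minConfidence) {
        if (alternates == null) {
            return Optional.empty();
        }
        return alternates.stream()
            .filter(a -> a != null && a.phrase != null && a.confidence >= minConfidence)
            .max(Comparator.comparingDouble(a -> a.confidence));
    }

    public static void main(String[] args) {
        List<Alternate> alternates = List.of(
            new Alternate("open file", 0.62),
            new Alternate("open pile", 0.31));
        best(alternates, 0.5).ifPresent(a -> System.out.println(a.phrase)); // prints "open file"
    }
}
```

Returning an Optional keeps the null checks in one place, mirroring the defensive checks in the handlers above.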