Managing queued speech processing requests

Last reviewed: 7/8/2022

HOW Article ID: H072222

The information in this article applies to:

  • Speech Manager 2

Summary

Applications may generate speech requests dynamically as a result of many different types of interactions, which can make managing those requests complex.

More Information

Chant Speech Manager supports two types of speech requests: transcription and speech synthesis.

With transcription requests, audio streams (i.e., buffers and files) serve as the audio source for speech recognition.

With synthesis requests, audio is generated by synthesizing speech from text and is returned as buffers, written to a file, or streamed for live playback.

Requests are created, scheduled, and destroyed. A request is created with optional parameters that specify the details for transcription or synthesis. Once a request is created, it can be managed with the following operations:

  • Cancel - cancel the request;
  • Interrupt - cancel the current request and process this request immediately (speech synthesis only);
  • Priority - place the request at the head of the queue to be the next request processed; or
  • Schedule - append the request at the end of the queue to be processed.

The application can destroy the request when it is no longer needed.
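
The four operations above can be pictured with a small queue model. The sketch below is illustrative only: RequestQueueModel, RequestQueueDemo, and their members are hypothetical names, requests are represented as plain strings, and nothing here calls the Speech Manager library. It assumes that Interrupt replaces the request being processed and leaves the waiting requests in place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a request queue; none of these types are Speech Manager APIs.
public class RequestQueueModel
{
    private readonly LinkedList<string> _pending = new LinkedList<string>();

    // The request being processed right now, or null when idle.
    public string Current { get; private set; }

    // Schedule: append the request at the end of the queue.
    public void Schedule(string request) { _pending.AddLast(request); }

    // Priority: place the request at the head of the queue.
    public void Priority(string request) { _pending.AddFirst(request); }

    // Interrupt: cancel the current request and process this one immediately.
    public void Interrupt(string request) { Current = request; }

    // Cancel: drop the request whether it is current or still waiting.
    public bool Cancel(string request)
    {
        if (Current == request) { Current = null; return true; }
        return _pending.Remove(request);
    }

    // Move on to the next waiting request; returns null when the queue is empty.
    public string ProcessNext()
    {
        Current = null;
        if (_pending.Count > 0)
        {
            Current = _pending.First.Value;
            _pending.RemoveFirst();
        }
        return Current;
    }

    // Copy of the waiting requests, in processing order.
    public string[] Pending()
    {
        var copy = new string[_pending.Count];
        _pending.CopyTo(copy, 0);
        return copy;
    }
}

public static class RequestQueueDemo
{
    // Walks through each operation and reports "current | waiting".
    public static string Run()
    {
        var queue = new RequestQueueModel();
        queue.Schedule("A");      // waiting: A
        queue.Schedule("B");      // waiting: A, B
        queue.Priority("C");      // waiting: C, A, B
        queue.Cancel("A");        // waiting: C, B
        queue.ProcessNext();      // current: C, waiting: B
        queue.Interrupt("D");     // current: D (C is canceled), waiting: B
        return queue.Current + " | " + string.Join(",", queue.Pending());
    }
}
```

In this model, RequestQueueDemo.Run() returns "D | B": the priority request jumped ahead of the scheduled ones, the canceled request never ran, and the interrupting request displaced the one in progress.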

Speech Manager uses the SpeechKit speech API libraries to process the requests. All SpeechKit speech API property settings and event handling are supported.

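The following C# example creates, schedules, and destroys one transcription request and one synthesis request:
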
// Instantiate SpeechManager
NSpeechManager _SpeechManager = new NSpeechManager();
if (_SpeechManager != null)
{
    // Set credentials
    _SpeechManager.SetCredentials("Credentials");
    // Create transcription request
    NChantTranscribeAudioRequest transcribeRequest = _SpeechManager.CreateTranscribeAudioRequest("", "", "myaudio.wav");
    if (transcribeRequest != null)
    {
        // Register for recognition events
        transcribeRequest.RecognitionDictation += Recognizer_RecognitionDictation;
        // Optionally register for begin/end events
        transcribeRequest.AudioSourceStart += Recognizer_AudioSourceStart;
        transcribeRequest.AudioSourceStop += Recognizer_AudioSourceStop;

        // Schedule request
        transcribeRequest.ScheduleRequest();

        // Since we no longer need it, destroy it
        transcribeRequest.Dispose();
    }
    // Create synthesis request
    NChantSpeakRequest speakRequest = _SpeechManager.CreateSpeakRequest("See how easy it is to talk with Speech Manager.");
    if (speakRequest != null)
    {
        // Optionally register for begin/end events
        speakRequest.AudioDestStart += Synthesizer_AudioDestStart;
        speakRequest.AudioDestStop += Synthesizer_AudioDestStop;
        speakRequest.Done += Synthesizer_TTSDone;
        speakRequest.Started += Synthesizer_TTSStarted;

        // Schedule request
        speakRequest.ScheduleRequest();

        // Since we no longer need it, destroy it
        speakRequest.Dispose();
    }
}

Syntax Options

A SpeechManager supports the following methods:

  • CreateTranscribeAudioRequest - Instantiate a transcription request.
  • CreateSpeakRequest - Instantiate a speak request.
  • FlushSRRequests - Remove all transcription requests.
  • FlushTTSRequests - Remove all speak requests.
  • QuiesceSRRequests - Stop processing all transcription requests, allowing the current request to finish.
  • QuiesceTTSRequests - Stop processing all speak requests, allowing the current request to finish.
  • StartSRRequests - Start processing transcription requests.
  • StartTTSRequests - Start processing speak requests.
  • StopSRRequests - Stop processing all transcription requests, canceling the current request.
  • StopTTSRequests - Stop processing all speak requests, canceling the current request.
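
The difference between the Quiesce and Stop methods is what happens to the request in progress. The sketch below models that difference with a simulated processing loop. ProcessingModel and ProcessingDemo are hypothetical names, not Speech Manager types, and the model assumes that Flush removes only waiting requests, which the method descriptions above do not state.

```csharp
using System.Collections.Generic;

// Hypothetical model contrasting Quiesce and Stop; not Speech Manager code.
public class ProcessingModel
{
    private readonly Queue<string> _pending = new Queue<string>();

    public bool Running { get; private set; }
    public string Current { get; private set; }
    public List<string> Completed { get; } = new List<string>();
    public List<string> Canceled { get; } = new List<string>();
    public int PendingCount { get { return _pending.Count; } }

    public void Schedule(string request) { _pending.Enqueue(request); }

    // Start: begin taking requests from the queue.
    public void Start() { Running = true; }

    // Quiesce: take no further requests, but let the current one finish.
    public void Quiesce() { Running = false; }

    // Stop: take no further requests and cancel the current one.
    public void Stop()
    {
        Running = false;
        if (Current != null) { Canceled.Add(Current); Current = null; }
    }

    // Flush: remove waiting requests (assumption: the in-flight request is untouched).
    public void Flush() { _pending.Clear(); }

    // One simulated tick: finish the current request, then pick up the next if running.
    public void Step()
    {
        if (Current != null) { Completed.Add(Current); Current = null; }
        if (Running && _pending.Count > 0) { Current = _pending.Dequeue(); }
    }
}

public static class ProcessingDemo
{
    // Returns "completed/canceled/pending" counts after pausing mid-request.
    public static string Run(bool quiesce)
    {
        var model = new ProcessingModel();
        model.Schedule("A");
        model.Schedule("B");
        model.Start();
        model.Step();                       // A becomes the current request
        if (quiesce) { model.Quiesce(); } else { model.Stop(); }
        model.Step();                       // A finishes only if it was not canceled
        return model.Completed.Count + "/" + model.Canceled.Count + "/" + model.PendingCount;
    }
}
```

In this model, ProcessingDemo.Run(true) returns "1/0/1" (the request in progress completes) and ProcessingDemo.Run(false) returns "0/1/1" (the request in progress is canceled). In both cases the second request stays queued until processing is started again.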

The CreateTranscribeAudioRequest method accepts the following optional parameters that may be used to control speech recognition:

  • commands - (optional) Comma-separated list of commands for command recognition.
  • grammar - (optional) File path of speech recognition grammar for grammar recognition.
  • audiofile - (optional) File path of audio file for transcription. Use PutAudioBytes for audio buffers.
  • api - (optional) Speech API. Supported APIs include: sapi5, dragon, and msp.
  • engine - (optional) Speech engine name, id, or language.
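
Because the commands parameter is a single comma-separated string, an application that keeps its commands in a collection needs to join them first. The helper below is a minimal sketch of that step. CommandList.Build is a hypothetical name, and trimming and skipping blank entries is an assumption about what is convenient, not a documented Speech Manager requirement. Phrases that themselves contain commas are not handled.

```csharp
using System.Collections.Generic;

// Hypothetical helper; not part of the Speech Manager API.
public static class CommandList
{
    // Joins phrases into a comma-separated string, trimming whitespace and skipping blanks.
    public static string Build(IEnumerable<string> phrases)
    {
        var cleaned = new List<string>();
        foreach (var phrase in phrases)
        {
            if (string.IsNullOrWhiteSpace(phrase)) { continue; }
            cleaned.Add(phrase.Trim());
        }
        return string.Join(",", cleaned);
    }
}
```

For example, CommandList.Build(new[] { " open file ", "", "close file" }) yields "open file,close file". Assuming the parameter order matches the list above, the result would be passed as the first argument to CreateTranscribeAudioRequest.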

A TranscribeAudioRequest supports the following methods:

  • CancelRequest - Cancels the request and removes it.
  • PriorityRequest - Schedules the request as the next request to process.
  • PutAudioBytes - Sets the source audio for the transcription.
  • SetProperty - Sets the engine properties.
  • ScheduleRequest - Schedules the request at the end of the queue to be processed.

The CreateSpeakRequest method accepts one required parameter and several optional parameters that may be used to control speech synthesis:

  • text - (required) The text from which to synthesize speech.
  • options - (optional) Speech synthesis options.
  • outfile - (optional) The file path of where to write the synthesized audio.
  • outformat - (optional) The audio output format.
  • api - (optional) Speech API. Supported APIs include: sapi5, windowsmedia, msp, acatt, swift, cerevoice.
  • engine - (optional) Speech engine name, id, or language.

A SpeakRequest supports the following methods:

  • CancelRequest - Cancels the request and removes it.
  • GetAudioBytes - Returns the audio bytes from speech synthesis.
  • InterruptRequest - Cancels the current request and schedules the request as the next request to process.
  • PriorityRequest - Schedules the request as the next request to process.
  • SetProperty - Sets the engine properties.
  • ScheduleRequest - Schedules the request at the end of the queue to be processed.

Development and Deployment

Speech Manager applications require the Speech Manager library and the applicable SpeechKit Speech API libraries:

  • C++Builder, C++, and Delphi applications require the Speech Manager library (CSpeechManager.dll or CSpeechManagerX64.dll) and the applicable SpeechKit Speech API libraries in the same directory as the application .exe.
  • Java applications require chant.speechmanager.jar, speechkit.jar, and chant.shared.jar in the target system's Java JRE lib directory, or in another location on the target system that is included in the classpath. The Speech Manager library (JSpeechManager.dll or JSpeechManagerX64.dll) and the applicable SpeechKit Speech API libraries must be in the target system's Java JRE bin directory.
  • C# and VB .NET applications require the assembly libraries Chant.SpeechManager.Windows.dll, Chant.SpeechKit.Windows.dll, and Chant.Shared.Windows.dll embedded in the application or located in the same directory as the application .exe. The Speech Manager library (NSpeechManager.dll or NSpeechManagerX64.dll) must be registered as a COM library on the target system, and the applicable SpeechKit Speech API libraries must be located in the same directory as the application .exe.
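
For C# deployments that place the assemblies next to the executable rather than embedding them, a startup check can report missing files before the first request is created. The sketch below is one possible check, not a Chant-provided utility: DeploymentCheck and FindMissing are hypothetical names, and the file list covers only the three managed assemblies named above. It does not verify COM registration or the SpeechKit Speech API libraries.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical deployment check; not part of the Speech Manager API.
public static class DeploymentCheck
{
    // Assembly names taken from the C# and VB .NET requirement above.
    public static readonly string[] ManagedAssemblies =
    {
        "Chant.SpeechManager.Windows.dll",
        "Chant.SpeechKit.Windows.dll",
        "Chant.Shared.Windows.dll"
    };

    // Returns the names from fileNames that are not present in directory.
    public static List<string> FindMissing(string directory, IEnumerable<string> fileNames)
    {
        var missing = new List<string>();
        foreach (var name in fileNames)
        {
            if (!File.Exists(Path.Combine(directory, name))) { missing.Add(name); }
        }
        return missing;
    }
}
```

A typical call would be DeploymentCheck.FindMissing(AppContext.BaseDirectory, DeploymentCheck.ManagedAssemblies), with an empty result meaning all three assemblies were found in the application directory.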