How do I use Azure Speech in my app?

Last reviewed: 9/12/2023

HowTo Article ID: H032309

The information in this article applies to:

  • Chant Developer Workbench 2023
  • SpeechKit 12

Developing and Testing Apps that Speak and Listen with Azure Speech

Applications can leverage speech recognition and speech synthesis services from Microsoft's Speech SDK (version 1.29) through the SpeechKit MCSRecognizer and MCSSynthesizer classes.

Developing Apps for Azure Speech with SpeechKit

SpeechKit supports many speech SDKs by providing API-specific recognizer and synthesizer classes. Apps instantiate the API-specific class to access the resources for that API.

The MCSRecognizer and MCSSynthesizer classes access the Microsoft Cognitive Services speech API, which provides access to Microsoft Azure Speech cloud resources.

Synthesizing Speech with Azure Speech Voices

Synthesizing with MCSSynthesizer works the same way as with the other SpeechKit synthesizer classes: instantiate the API class, optionally enumerate and select a voice, and speak. Audio can play in real time to an output device, be streamed, or be written to a file, and UI events are synchronized with real-time playback. The Synthesis samples for C++Builder, C++, Delphi, Java, and .NET Windows have been updated with MCSSynthesizer to illustrate this.


private JSpeechKit _SpeechKit = null;
private JMCSSynthesizer _Synthesizer = null; 
// Create SpeechKit object
_SpeechKit = new JSpeechKit();
// Set credentials
_SpeechKit.setCredentials("Credentials");
_Synthesizer = _SpeechKit.createMCSSynthesizer("speechKey", "speechRegion"); 
if (_Synthesizer != null)
{
    // Set the callback object
    _Synthesizer.setChantSpeechKitEvents(this);
    // For Azure Speech, set Activity for callbacks on UI thread
    _Synthesizer.setActivity(this);
}
...
if (_Synthesizer != null)
{
    _Synthesizer.speak("Hello world");
}
                    

NSpeechKit _SpeechKit = null;
NMCSSynthesizer _Synthesizer = null;
// Instantiate SpeechKit
_SpeechKit = new NSpeechKit();
if (_SpeechKit != null)
{
    // Set credentials
    _SpeechKit.SetCredentials("Credentials");
    _Synthesizer = _SpeechKit.CreateMCSSynthesizer("speechKey", "speechRegion");
}
...
if (_Synthesizer != null)
{
    _Synthesizer.Speak("Hello world");
}
                    

CSpeechKit* _SpeechKit;
CMCSSynthesizer* _Synthesizer;
_SpeechKit = new CSpeechKit();
if (_SpeechKit != NULL)
{
    // Set credentials
    _SpeechKit->SetCredentials(L"Credentials");
    // Create synthesizer
    _Synthesizer = _SpeechKit->CreateMCSSynthesizer(L"speechKey", L"speechRegion");
}
...
if (_Synthesizer != NULL)
{
    _Synthesizer->Speak(L"Hello world");
}
                

CSpeechKit* _SpeechKit;
CMCSSynthesizer* _Synthesizer;
_SpeechKit = new CSpeechKit();
if (_SpeechKit != NULL)
{
    // Set credentials
    _SpeechKit->SetCredentials("Credentials");
    // Create synthesizer
    _Synthesizer = _SpeechKit->CreateMCSSynthesizer("speechKey", "speechRegion");
}
...
if (_Synthesizer != NULL)
{
    _Synthesizer->Speak("Hello world");
}
            

var
_SpeechKit: TSpeechKit;
_Synthesizer: TMCSSynthesizer;
begin
    // Instantiate SpeechKit object
    _SpeechKit := TSpeechKit.Create();
    if (_SpeechKit <> nil) then
    begin
        // Set credentials
        _SpeechKit.SetCredentials('Credentials');
        // Create synthesizer
        _Synthesizer := _SpeechKit.CreateMCSSynthesizer('speechKey', 'speechRegion');
    end;
end;

begin
    if (_Synthesizer <> nil) then
    begin
        _Synthesizer.Speak('Hello world');
    end;
end;
                

JSpeechKit _SpeechKit = null;
JMCSSynthesizer _Synthesizer = null;

// Create SpeechKit object
_SpeechKit = new JSpeechKit();
// Set credentials
_SpeechKit.setCredentials("Credentials");
// Create Synthesizer object
_Synthesizer = _SpeechKit.createMCSSynthesizer("speechKey", "speechRegion");

if (_Synthesizer != null)
{
    _Synthesizer.speak("Hello world");
}
                

Dim _SpeechKit As NSpeechKit
Dim WithEvents _Synthesizer As NMCSSynthesizer
Private Sub Window_Loaded(ByVal sender As System.Object, ByVal e As System.Windows.RoutedEventArgs) Handles MyBase.Loaded
    ' Instantiate SpeechKit
    _SpeechKit = New NSpeechKit()
    If (_SpeechKit IsNot Nothing) Then
        ' Set credentials
        _SpeechKit.SetCredentials("Credentials")
        _Synthesizer = _SpeechKit.CreateMCSSynthesizer("speechKey", "speechRegion")
    End If
End Sub
...
    If (_Synthesizer IsNot Nothing) Then
        _Synthesizer.Speak("Hello World")
    End If
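
The samples above speak the default voice through the default audio device. The C# sketch below illustrates the optional voice-selection and file-output steps mentioned earlier. It is a minimal sketch: the GetChantVoices, Voice, and SpeakToFile member names are assumptions for illustration, so verify the actual MCSSynthesizer members in the SpeechKit class reference.

// Hypothetical sketch: enumerate voices, select one, and write the
// synthesized audio to a WAV file. GetChantVoices, Voice, and SpeakToFile
// are assumed names; check the SpeechKit class reference for your release.
if (_Synthesizer != null)
{
    var voices = _Synthesizer.GetChantVoices();   // assumed enumeration call
    if ((voices != null) && (voices.Count > 0))
    {
        _Synthesizer.Voice = voices[0];           // assumed voice property
    }
    _Synthesizer.SpeakToFile("Hello world", "hello.wav");  // assumed file-output variant
}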
            

Recognizing Speech with Azure Speech

Recognizing with MCSRecognizer works the same way as with the other SpeechKit recognizer classes: instantiate the API class, optionally select a language, and recognize. Recognition runs in real time from a microphone, an audio stream, or a file. The Dictation samples for C++Builder, C++, Delphi, Java, and .NET Windows have been updated with MCSRecognizer to illustrate this.


private JSpeechKit _SpeechKit = null;
private JMCSRecognizer _Recognizer = null;
// Create SpeechKit object
_SpeechKit = new JSpeechKit();
// Set credentials
_SpeechKit.setCredentials("Credentials");
// Create Recognizer object
_Recognizer = _SpeechKit.createMCSRecognizer("speechKey", "speechRegion");
if (_Recognizer != null)
{
    // Set the callback object
    _Recognizer.setChantSpeechKitEvents(this);
    // For Azure Speech, set Activity for callbacks on UI thread
    _Recognizer.setActivity(this);
}
...
if (_Recognizer != null)
{
    _Recognizer.startRecognition();
}
...
if (_Recognizer != null)
{
    _Recognizer.stopRecognition();
}
                    

NSpeechKit _SpeechKit = null;
NMCSRecognizer _Recognizer = null;
// Instantiate SpeechKit
_SpeechKit = new NSpeechKit();
if (_SpeechKit != null)
{
    // Set credentials
    _SpeechKit.SetCredentials("Credentials");
    _Recognizer = _SpeechKit.CreateMCSRecognizer("speechKey", "speechRegion");
    if (_Recognizer != null)
    {
        // Setup handler for recognition results
        _Recognizer.RecognitionDictation += this.Recognizer_RecognitionDictation;
    }
}
...
if (_Recognizer != null)
{
    _Recognizer.StartRecognition();
}
...
if (_Recognizer != null)
{
    _Recognizer.StopRecognition();
}
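
The .NET sample above registers a Recognizer_RecognitionDictation handler but does not show its body. The sketch below is one way such a handler might consume results; the event-args type name, its Text property, and the _TextBox control are assumptions for illustration, so match the signature to the actual RecognitionDictation delegate in the SpeechKit class reference.

// Hypothetical handler sketch (WPF). RecognitionDictationEventArgs, its Text
// property, and _TextBox are assumed names for illustration only.
private void Recognizer_RecognitionDictation(object sender, RecognitionDictationEventArgs args)
{
    if ((args != null) && (args.Text != null))
    {
        // Recognition events may arrive on a background thread; marshal to
        // the UI thread before updating controls.
        Dispatcher.Invoke(() => _TextBox.AppendText(args.Text + " "));
    }
}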
                    

CSpeechKit* _SpeechKit;
CMCSRecognizer* _Recognizer;
_SpeechKit = new CSpeechKit();
if (_SpeechKit != NULL)
{
    // Set credentials
    _SpeechKit->SetCredentials(L"Credentials");
    // Create recognizer
    _Recognizer = _SpeechKit->CreateMCSRecognizer(L"speechKey", L"speechRegion");
    if (_Recognizer != NULL)
    {
        // Register Event Handlers
        _Recognizer->SetRecognitionDictation(RecognitionDictation);
    }
}
...
if (_Recognizer != NULL)
{
    _Recognizer->StartRecognition();
}
...
if (_Recognizer != NULL)
{
    _Recognizer->StopRecognition();
}
                

CSpeechKit* _SpeechKit;
CMCSRecognizer* _Recognizer;
_SpeechKit = new CSpeechKit();
if (_SpeechKit != NULL)
{
    // Set credentials
    _SpeechKit->SetCredentials("Credentials");
    // Create recognizer
    _Recognizer = _SpeechKit->CreateMCSRecognizer("speechKey", "speechRegion");
    if (_Recognizer != NULL)
    {
        // Register Event Handlers
        _Recognizer->SetRecognitionDictation(RecognitionDictation);
    }
}
...
if (_Recognizer != NULL)
{
    _Recognizer->StartRecognition();
}
...
if (_Recognizer != NULL)
{
    _Recognizer->StopRecognition();
}
            

var
_SpeechKit: TSpeechKit;
_Recognizer: TMCSRecognizer;
begin
    // Instantiate SpeechKit object
    _SpeechKit := TSpeechKit.Create();
    if (_SpeechKit <> nil) then
    begin
        // Set credentials
        _SpeechKit.SetCredentials('Credentials');
        // Create recognizer
        _Recognizer := _SpeechKit.CreateMCSRecognizer('speechKey', 'speechRegion');
        if (_Recognizer <> nil) then
        begin
            // Register Event Handlers
            _Recognizer.RecognitionDictation := RecognitionDictation;
        end;
    end;
end;

begin
    if (_Recognizer <> nil) then
    begin
        _Recognizer.StartRecognition();
    end;
end;

begin
    if (_Recognizer <> nil) then
    begin
        _Recognizer.StopRecognition();
    end;
end;
                

JSpeechKit _SpeechKit = null;
JMCSRecognizer _Recognizer = null;

// Create SpeechKit object
_SpeechKit = new JSpeechKit();
// Set credentials
_SpeechKit.setCredentials("Credentials");

// Create Recognizer object
_Recognizer = _SpeechKit.createMCSRecognizer("speechKey", "speechRegion");
if (_Recognizer != null)
{
    // Set the callback object
    _Recognizer.setChantSpeechKitEvents(this);
    // Register for callbacks
    _Recognizer.registerCallback(ChantSpeechKitCallback.CCSRRecognitionDictation);
}

if (_Recognizer != null)
{
    _Recognizer.startRecognition();
}

if (_Recognizer != null)
{
    _Recognizer.stopRecognition();
}
                

Dim _SpeechKit As NSpeechKit
Dim WithEvents _Recognizer As NMCSRecognizer
Private Sub Window_Loaded(ByVal sender As System.Object, ByVal e As System.Windows.RoutedEventArgs) Handles MyBase.Loaded
    ' Instantiate SpeechKit
    _SpeechKit = New NSpeechKit()
    If (_SpeechKit IsNot Nothing) Then
        ' Set credentials
        _SpeechKit.SetCredentials("Credentials")
        _Recognizer = _SpeechKit.CreateMCSRecognizer("speechKey", "speechRegion")
    End If
End Sub
...
    If (_Recognizer IsNot Nothing) Then
        _Recognizer.StartRecognition()
    End If
...
    If (_Recognizer IsNot Nothing) Then
        _Recognizer.StopRecognition()
    End If
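
The samples above recognize with the default language. The C# sketch below illustrates the optional language selection mentioned earlier. It is a minimal sketch: the SetProperty call and the "language" property name are assumptions for illustration, so verify the actual MCSRecognizer members in the SpeechKit class reference.

// Hypothetical sketch: select a recognition language before starting.
// SetProperty and the "language" property name are assumed; check the
// MCSRecognizer documentation for the exact member.
if (_Recognizer != null)
{
    _Recognizer.SetProperty("language", "de-DE");  // assumed locale selector
    _Recognizer.StartRecognition();
}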
            

SpeechKit App Integration

Explore the integration details specific to your application development environment in the SpeechKit documentation.

Testing Azure Speech in Chant Developer Workbench

Another way to see Microsoft cloud speech services in action is to access and test them via the Developer Workbench.

Cloud Access Setup

To access the Azure cloud, paste your SpeechKey and SpeechRegion values in the View->Options->Environment->Credentials->Microsoft Cognitive Services fields and press the Save toolbar button.

Testing Azure Speech Synthesis

Open a Speech Synthesizers tab (View->Speech Synthesizers). The Speech API list includes two new entries: Microsoft Cognitive Services (C++) Speech Synthesis and Microsoft Cognitive Services (.NET) Speech Synthesis. The C++ API is the SpeechKit library used by C++Builder, C++, Delphi, Java, and .NET Windows apps. The .NET API is the SpeechKit library used by .NET apps. This approach is also used for the SAPI, Microsoft Speech Platform, and Windows Media APIs.

Select Microsoft Cognitive Services (C++) Speech Synthesis, select a voice, enter text, and press the Start button. Observe the events in the Events window and notice that they are synchronized with the live playback.

Microsoft Cognitive Services (C++) Speech Synthesis: Review synthesis events in the Events window.

Select Microsoft Cognitive Services (.NET) Speech Synthesis, select a voice, enter text, and press the Start button. Observe the events in the Events window and notice that they are synchronized with the live playback.

Microsoft Cognitive Services (.NET) Speech Synthesis: Review synthesis events in the Events window.

Testing Azure Speech Recognition

Open a Speech Recognizers tab (View->Speech Recognizers). The Speech API list includes two new entries: Microsoft Cognitive Services (C++) Speech Recognition and Microsoft Cognitive Services (.NET) Speech Recognition. The C++ API is the SpeechKit library used by C++Builder, C++, Delphi, Java, and .NET Windows apps. The .NET API is the SpeechKit library used by .NET apps. This approach is also used for the SAPI, Microsoft Speech Platform, and Windows Media APIs.

Select Microsoft Cognitive Services (C++) Speech Recognition, optionally select a language, optionally select the text dictation vocabulary, and press the Start button. Speak into the microphone. Press the Stop button when finished. When a dictation vocabulary is enabled, spoken punctuation is interpreted. Observe the events in the Events window and output in the Output window.

Microsoft Cognitive Services (C++) Speech Recognition: Review recognition results in the Events and Output windows.

Select Microsoft Cognitive Services (.NET) Speech Recognition, optionally select a language, optionally select the text dictation vocabulary, and press the Start button. Speak into the microphone. Press the Stop button when finished. When a dictation vocabulary is enabled, spoken punctuation is interpreted. Observe the events in the Events window and output in the Output window.

Microsoft Cognitive Services (.NET) Speech Recognition: Review recognition results in the Events and Output windows.