How do I use properties and markup to control speech?
Last reviewed: 1/19/2022
HOW Article ID: H072123
The information in this article applies to:
- SpeechKit 10
- Speech Manager
Summary
Applications can control speech prosody by setting synthesizer properties (where supported) and by using markup.
More Information
Applications can control both how speech is synthesized and the resulting audio in several ways. Some speech APIs support property settings, and most support W3C Speech Synthesis Markup Language (SSML), a proprietary markup language, or both. See the Chant Developer Workbench help file VoiceMarkupKit\Markup Syntax Quick Reference sections: Acapela TTS Tags, CereProc Tagset, Microsoft SAPI 5 XML Markup, and W3C SSML.
Think of properties as global variables and markup as local variables: a property setting can affect all utterances, whereas markup affects only the utterance that contains it. Note that not all speech APIs support properties. For example, Acapela TTS supports many properties, whereas others, such as Microsoft WindowsMedia, support none. See the Chant Developer Workbench help file SpeechKit\Recognizing and Synthesizing Speech with SpeechKit\Speech Synthesis Applications\Synthesizing Speech section for properties supported by each speech API.
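The global/local analogy can be sketched with a toy model. The ToySynthesizer class below is hypothetical and is not part of SpeechKit or Speech Manager; it only illustrates that a property persists across requests while a per-utterance setting applies once. It assumes that a per-utterance value simply replaces the property value; real engines may combine the two differently, for example by treating a markup rate as relative to the property setting.

```java
import java.util.HashMap;
import java.util.Map;

// Demo driver: shows a persistent property versus a one-off, per-utterance override.
class PropertyVsMarkupDemo {
    public static void main(String[] args) {
        ToySynthesizer synth = new ToySynthesizer();
        synth.setProperty("rate", "5");                   // "global": persists across requests
        System.out.println(synth.effectiveRate(null));    // prints "5"
        System.out.println(synth.effectiveRate("fast"));  // prints "fast" ("local" override)
        System.out.println(synth.effectiveRate(null));    // prints "5" again
    }
}

// Hypothetical stand-in for a synthesizer; NOT the SpeechKit API.
class ToySynthesizer {
    private final Map<String, String> properties = new HashMap<>();

    // Stores a property that applies to every later utterance.
    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    // Returns the rate for one utterance: the per-utterance override if given,
    // otherwise the stored property, otherwise the label "default".
    // Assumption: the override replaces the property rather than scaling it.
    String effectiveRate(String utteranceRateOverride) {
        if (utteranceRateOverride != null) {
            return utteranceRateOverride;
        }
        return properties.getOrDefault("rate", "default");
    }
}
```

The third call returns the property value again because the override was never stored; this is the sense in which markup is local to a single utterance.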
Some synthesizers support property settings that control how synthesis occurs. Properties can be set at any time and persist across subsequent synthesis requests.
// SpeechKit: Set the speaking rate
_Synthesizer.SetProperty("rate","5");
// SpeechManager: Set the speaking rate
NChantSpeakRequest speakRequest = _SpeechManager.CreateSpeakRequest("See how easy it is to talk with Speech Manager.");
speakRequest.SetProperty("rate","5");
// SpeechKit: Set the speaking rate
_Synthesizer->SetProperty(L"rate", L"5");
// SpeechManager: Set the speaking rate
CChantSpeakRequest* pSpeakRequest = _SpeechManager->CreateSpeakRequest(L"See how easy it is to talk with Speech Manager.");
pSpeakRequest->SetProperty(L"rate", L"5");
// SpeechKit: Set the speaking rate
_Synthesizer->SetProperty("rate","5");
// SpeechManager: Set the speaking rate
CChantSpeakRequest* pSpeakRequest = _SpeechManager->CreateSpeakRequest("See how easy it is to talk with Speech Manager.");
pSpeakRequest->SetProperty("rate","5");
// SpeechKit: Set the speaking rate
_Synthesizer.SetProperty('rate','5');
// SpeechManager: Set the speaking rate
var speakRequest: TChantSpeakRequest;
speakRequest := _SpeechManager.CreateSpeakRequest('See how easy it is to talk with Speech Manager.');
speakRequest.SetProperty('rate','5');
// SpeechKit: Set the speaking rate
_Synthesizer.setProperty("rate","5");
// SpeechManager: Set the speaking rate
JChantSpeakRequest speakRequest = _SpeechManager.createSpeakRequest("See how easy it is to talk with Speech Manager.", "", "");
speakRequest.setProperty("rate","5");
Dim _Synthesizer As NSAPI5Synthesizer
Dim _SpeechManager As NSpeechManager
' SpeechKit: Set the speaking rate
_Synthesizer.SetProperty("rate","5")
' SpeechManager: Set the speaking rate
' Create synthesis request
_TTSRequest = _SpeechManager.CreateSpeakRequest("See how easy it is to talk with Speech Manager.")
_TTSRequest.SetProperty("rate","5")
Markup offers greater control and flexibility. Text-to-speech (TTS) markup is text with embedded indicators that control speech synthesis. Speaking qualities such as speed, pitch, emphasis, and word pronunciation can be tailored when producing speech from text.
Most synthesizers support W3C SSML markup or a proprietary markup language that offers rich settings for controlling how speech is produced from text. See VoiceMarkupKit for how you can easily create, test, and hear the impact TTS markup can have on your synthesis.
// SpeechKit: Synthesize speech for playback with SSML
_Synthesizer.Speak("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with SpeechKit.</prosody></speak>", (int)(SPEAKFLAGS.SPF_ASYNC | SPEAKFLAGS.SPF_PARSE_SSML));
// SpeechManager: Synthesize speech for playback with SSML
NChantSpeakRequest speakRequest = _SpeechManager.CreateSpeakRequest("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with Speech Manager.</prosody></speak>", (int)(SPEAKFLAGS.SPF_ASYNC | SPEAKFLAGS.SPF_PARSE_SSML));
// SpeechKit: Synthesize speech for playback with SSML
_Synthesizer->Speak(L"<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with SpeechKit.</prosody></speak>", (int)(SPEAKFLAGS::SPF_ASYNC | SPEAKFLAGS::SPF_PARSE_SSML));
// SpeechManager: Synthesize speech for playback with SSML
CChantSpeakRequest* pSpeakRequest = _SpeechManager->CreateSpeakRequest(L"<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with Speech Manager.</prosody></speak>", (int)(SPEAKFLAGS::SPF_ASYNC | SPEAKFLAGS::SPF_PARSE_SSML));
// SpeechKit: Synthesize speech for playback with SSML
_Synthesizer->Speak("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with SpeechKit.</prosody></speak>", (int)(SPEAKFLAGS::SPF_ASYNC | SPEAKFLAGS::SPF_PARSE_SSML));
// SpeechManager: Synthesize speech for playback with SSML
CChantSpeakRequest* pSpeakRequest = _SpeechManager->CreateSpeakRequest("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with Speech Manager.</prosody></speak>", (int)(SPEAKFLAGS::SPF_ASYNC | SPEAKFLAGS::SPF_PARSE_SSML));
// SpeechKit: Synthesize speech for playback with SSML
_Synthesizer.Speak('<speak xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US" version="1.0"><prosody rate="fast">See how easy it is to talk with SpeechKit.</prosody></speak>', (SPEAKFLAGS.SPF_ASYNC + SPEAKFLAGS.SPF_PARSE_SSML));
// SpeechManager: Synthesize speech for playback with SSML
var speakRequest: TChantSpeakRequest;
speakRequest := _SpeechManager.CreateSpeakRequest('<speak xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US" version="1.0"><prosody rate="fast">See how easy it is to talk with Speech Manager.</prosody></speak>', (SPEAKFLAGS.SPF_ASYNC + SPEAKFLAGS.SPF_PARSE_SSML));
// SpeechKit: Synthesize speech for playback with SSML
_Synthesizer.speak("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with SpeechKit.</prosody></speak>", (SPEAKFLAGS.SPF_ASYNC + SPEAKFLAGS.SPF_PARSE_SSML));
// SpeechManager: Synthesize speech for playback with SSML
JChantSpeakRequest speakRequest = _SpeechManager.createSpeakRequest("<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\" version=\"1.0\"><prosody rate=\"fast\">See how easy it is to talk with Speech Manager.</prosody></speak>", (int)(SPEAKFLAGS.SPF_ASYNC | SPEAKFLAGS.SPF_PARSE_SSML), "", "");
Dim _Synthesizer As NSAPI5Synthesizer
Dim _SpeechManager As NSpeechManager
' SpeechKit: Synthesize speech for playback with SSML
_Synthesizer.Speak("<speak xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""en-US"" version=""1.0""><prosody rate=""fast"">See how easy it is to talk with SpeechKit.</prosody></speak>", (SPEAKFLAGS.SPF_ASYNC Or SPEAKFLAGS.SPF_PARSE_SSML))
' SpeechManager: Synthesize speech for playback with SSML
' Create synthesis request
_TTSRequest = _SpeechManager.CreateSpeakRequest("<speak xmlns=""http://www.w3.org/2001/10/synthesis"" xml:lang=""en-US"" version=""1.0""><prosody rate=""fast"">See how easy it is to talk with Speech Manager.</prosody></speak>", (SPEAKFLAGS.SPF_ASYNC Or SPEAKFLAGS.SPF_PARSE_SSML))
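The SSML strings in the samples above are written out by hand, which is error-prone once the spoken text contains characters such as & or <. The following is a minimal sketch of a helper that builds the same kind of string and escapes the text for XML. The SsmlBuilder class and its method names are hypothetical and are not part of SpeechKit, Speech Manager, or VoiceMarkupKit. The accepted rate labels are those defined for the prosody element in W3C SSML 1.0; numeric rate values are deliberately not handled here.

```java
import java.util.Set;

// Hypothetical helper for building SSML strings; NOT part of any Chant API.
final class SsmlBuilder {
    // Rate labels defined for <prosody rate="..."> in W3C SSML 1.0.
    private static final Set<String> RATE_LABELS =
            Set.of("x-slow", "slow", "medium", "fast", "x-fast", "default");

    private SsmlBuilder() {
    }

    // Escapes the five XML special characters so plain text is safe inside markup.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps text in <speak> and <prosody rate="..."> elements.
    // Throws IllegalArgumentException if rate is not one of the SSML 1.0 labels.
    static String speakWithRate(String text, String rate, String lang) {
        if (!RATE_LABELS.contains(rate)) {
            throw new IllegalArgumentException("Unsupported rate label: " + rate);
        }
        return "<speak xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\""
                + escapeXml(lang) + "\" version=\"1.0\">"
                + "<prosody rate=\"" + rate + "\">" + escapeXml(text)
                + "</prosody></speak>";
    }

    public static void main(String[] args) {
        // Produces the same SSML string used in the SpeechKit samples above.
        System.out.println(
                speakWithRate("See how easy it is to talk with SpeechKit.", "fast", "en-US"));
    }
}
```

The resulting string could then be passed wherever the samples pass a literal SSML string, together with the SSML parse flag the speech API requires.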