Last reviewed: 3/23/2024 10:58:54 AM
<voice>
At least one voice element must be specified within each SSML speak element. This element determines the voice that is used for text to speech.
Multiple voice elements may be included in a single SSML document. Each voice element may specify a different voice, and the same voice may be used more than once with different settings.
<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    Good morning!
  </voice>
  <voice name="en-US-ChristopherNeural">
    Good morning to you too, Jenny!
  </voice>
</speak>
Attributes
name
Specifies the processor-specific name of the voice that speaks the contained text. The value may be a space-separated list of names in descending order of preference.
effect
Optional. The audio effect processor to apply to the synthesized speech. The value may be one of the following:
- eq_car – Optimizes the auditory experience when providing high-fidelity speech in cars, buses, and other enclosed vehicles.
- eq_telecomhp8k – Optimizes the auditory experience for narrowband speech in telecom or telephone scenarios. Use a sampling rate of 8 kHz; at any other sampling rate, the auditory quality of the output speech is not optimized.
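As a minimal sketch of the effect attribute, the following document uses the same voice twice, once with eq_car and once with no effect. The voice name is reused from the example above, and the spoken text is invented for illustration; whether a particular voice supports a given effect is an assumption to verify against the speech service you use.

```xml
<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Requests the eq_car effect for this passage (assumes the voice supports it). -->
  <voice name="en-US-JennyNeural" effect="eq_car">
    Your destination is on the right.
  </voice>
  <!-- Same voice with the optional effect attribute omitted: no effect is requested. -->
  <voice name="en-US-JennyNeural">
    Your destination is on the right.
  </voice>
</speak>
```

For eq_telecomhp8k, the 8 kHz sampling rate would likely be chosen in the synthesis request rather than in the SSML itself; this section does not say where it is set, so treat that as an assumption.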
Children
<audio>, <audioduration>, <backgroundaudio>, <bookmark>, <break>, <lang>, <emphasis>, <express-as>, <mark>, <p>, <phoneme>, <prosody>, <say-as>, <silence>, <sub>, <s>, <voice>, and <viseme>.
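To illustrate nesting, here is a hedged sketch that places a few of the child elements listed above (p, s, break, and prosody) inside a voice element. The time and rate attributes come from the W3C SSML specification rather than from this section, so treat them as assumptions about what your processor accepts.

```xml
<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <p>
      <s>Good morning!</s>
      <!-- Pause for half a second before the next sentence (time is an assumed attribute). -->
      <break time="500ms"/>
      <s>
        <!-- Slow down only the wrapped words (rate is an assumed attribute). -->
        <prosody rate="slow">Please listen carefully</prosody>
        to the following announcement.
      </s>
    </p>
  </voice>
</speak>
```

Each child element has its own rules about what it may contain, so a combination that is well-formed XML may still be rejected by a given processor.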
Parents
Source: Microsoft Azure Speech Synthesis Markup Language (SSML)