Last reviewed: 3/23/2024 11:28:35 AM
<prosody>
The <prosody> element controls the pitch, speaking rate, and volume of the speech output.
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="en-US">
  <!-- Speak the price 10% slower than the surrounding text -->
  The price of XYZ is <prosody rate="-10%">$45</prosody>
  <!-- Set pitch targets at 0%, 10%, and 40% of the phrase duration -->
  <prosody contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)">Good morning</prosody>
</speak>
Attributes
contour
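Sets the actual pitch contour for the contained text. The contour is defined as a set of whitespace-separated targets, each written as a (time position,target) pair such as "(10%,+30%)". The time position is a percentage of the duration of the contained text, and the target uses the same value format as the pitch attribute. The algorithm for interpolating between the targets is synthesizer-specific.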
duration
Specifies the desired time, in seconds or milliseconds, that reading the element contents should take. The value follows the time format from the Cascading Style Sheets Level 2 Recommendation [CSS2] (e.g. "250ms", "3s").
pitch
Specifies the baseline pitch for the contained text. Valid values are a number followed by "Hz", a relative change, or one of the labels "x-low", "low", "medium", "high", "x-high", or "default".
range
Specifies the pitch range (variability) for the contained text. Valid values are a number followed by "Hz", a relative change, or one of the labels "x-low", "low", "medium", "high", "x-high", or "default".
rate
Specifies a change in the speaking rate for the contained text. Valid values are a relative change or one of the labels "x-slow", "slow", "medium", "fast", "x-fast", or "default".
volume
Specifies the volume for the contained text in the range 0.0 to 100.0. Higher values are louder, and a value of zero is equivalent to "silent". Valid values are a number, a relative change, or one of the labels "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default".
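As an illustration of the attributes above, the following sketch applies each one to its own short phrase. The sentences and the specific values are invented for this example, and how closely a given synthesizer honors each setting may vary.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- pitch: raise the baseline pitch using a label -->
  <prosody pitch="high">This is spoken at a higher pitch.</prosody>
  <!-- range: reduce pitch variability for a flatter delivery -->
  <prosody range="x-low">This is spoken in a flatter tone.</prosody>
  <!-- rate: slow down using a label -->
  <prosody rate="slow">This is spoken slowly.</prosody>
  <!-- volume: absolute value on the 0.0 to 100.0 scale -->
  <prosody volume="80.0">This is spoken fairly loudly.</prosody>
  <!-- duration: request that the content take about three seconds -->
  <prosody duration="3s">This should take about three seconds.</prosody>
  <!-- contour: pitch targets at the start, middle, and end of the text -->
  <prosody contour="(0%,+10Hz) (50%,+30Hz) (100%,+5Hz)">This rises and then falls.</prosody>
</speak>
```

Several attributes can also be set on a single element, for example rate="slow" together with pitch="low", when one passage needs more than one adjustment.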
Children
<audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <s>, and <voice>.
Parents
<audio>, <emphasis>, <p>, <prosody>, <s>, <speak>, and <voice>.
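The nesting rules listed under Children and Parents can be sketched as follows. This is only an illustrative fragment with made-up text: a <prosody> element sits inside an <s> parent and contains <break>, <emphasis>, and a nested <prosody> as children.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <s>
    <!-- s is an allowed parent of prosody -->
    <prosody rate="slow">
      Please hold
      <!-- break and emphasis are allowed children -->
      <break time="500ms"/>
      <emphasis>for a moment</emphasis>,
      <!-- a nested prosody lowers the volume within the slower passage -->
      <prosody volume="soft">and thank you for waiting.</prosody>
    </prosody>
  </s>
</speak>
```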
Source: Speech Synthesis Markup Language (SSML) Version 1.0