Last reviewed: 3/23/2024 11:28:35 AM

<prosody>

The prosody element controls the pitch, speaking rate, and volume of the speech output.

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="en-US">

  The price of XYZ is <prosody rate="-10%">$45</prosody>

  <prosody contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)">Good morning</prosody>

</speak>

Attributes

contour

Sets the pitch contour for the contained text. The contour is a set of whitespace-separated targets, each written as a (time position, target) pair: the time position is a percentage of the duration of the contained text, and the target is a pitch value in the same format as the pitch attribute. The algorithm for interpolating between the targets is synthesizer-specific.
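
As an illustration of how targets are written, the sketch below places three targets at the start, midpoint, and end of a phrase. The phrase and the specific values are made up for the example, and the audible result depends on how a given synthesizer interpolates between the targets.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Illustrative values only: pitch starts 20 Hz above the surrounding pitch,
       reaches 40 Hz above it at the midpoint, and ends 10 Hz below it. -->
  <prosody contour="(0%,+20Hz) (50%,+40Hz) (100%,-10Hz)">Are you sure about that?</prosody>
</speak>
```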

duration

Specifies the desired time, in seconds or milliseconds, to take to read the element contents. Values follow the time format of the Cascading Style Sheets Level 2 Recommendation [CSS2] (e.g. "250ms", "3s").
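
A minimal sketch of the two time units follows. The sentences and durations are invented for illustration, and a synthesizer may only approximate the requested time.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Ask for the sentence to be read in about three seconds. -->
  <prosody duration="3s">Your call is important to us.</prosody>
  <!-- The same attribute with a value in milliseconds. -->
  <prosody duration="750ms">Goodbye.</prosody>
</speak>
```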

pitch

Specifies the baseline pitch for the contained text. Valid values include a number followed by "Hz", a relative change, or one of the labels "x-low", "low", "medium", "high", "x-high", or "default".
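
The three kinds of value can be sketched as below. The text and numbers are hypothetical, and how each label maps to an actual frequency is up to the synthesizer and the current voice.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Absolute value: a number followed by "Hz". -->
  <prosody pitch="150Hz">This sentence uses an absolute baseline pitch.</prosody>
  <!-- Relative change, here written as a signed percentage. -->
  <prosody pitch="+10%">This sentence is raised relative to the current pitch.</prosody>
  <!-- Named label. -->
  <prosody pitch="x-low">This sentence uses the lowest label.</prosody>
</speak>
```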

range

Specifies the pitch range (variability) for the contained text. Valid values include a number followed by "Hz", a relative change, or one of the labels "x-low", "low", "medium", "high", "x-high", or "default".
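
To show how range differs from pitch, the sketch below narrows and then widens the variability without touching the baseline. The sentences and values are illustrative assumptions, not recommended settings.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- A narrow range tends to sound flat or monotone. -->
  <prosody range="x-low">Please hold while we transfer your call.</prosody>
  <!-- A wider range, given as a relative change, tends to sound more animated. -->
  <prosody range="+20%">Congratulations, you have won!</prosody>
</speak>
```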

rate

Specifies a change in the speaking rate for the contained text. Valid values include a relative change or one of the labels "x-slow", "slow", "medium", "fast", "x-fast", or "default".
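
A short sketch of a label and a relative change side by side; the text is made up, and the "-10%" value mirrors the one used in the example at the top of this page.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Named label: read the instructions slowly. -->
  <prosody rate="slow">Press one for sales. Press two for support.</prosody>
  <!-- Relative change: read the number ten percent slower than the current rate. -->
  Your confirmation number is <prosody rate="-10%">4 7 2 9</prosody>.
</speak>
```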

volume

Specifies the volume for the contained text in the range 0.0 to 100.0. Higher values are louder, and a value of zero is equivalent to "silent". Valid values include a number, a relative change, or one of the labels "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default".
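
The following sketch shows a numeric value, a label, and a relative change. The sentences and numbers are illustrative only, and the perceived loudness for a given value depends on the synthesizer and the output device.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Numeric value within the 0.0 to 100.0 range. -->
  <prosody volume="80.0">This is an important announcement.</prosody>
  <!-- Named label. -->
  <prosody volume="soft">This part is spoken quietly.</prosody>
  <!-- Relative change from the current volume, written as a signed percentage. -->
  <prosody volume="+10%">This part is slightly louder than the surrounding speech.</prosody>
</speak>
```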

Children

<audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <s>, and <voice>.

Parents

<audio>, <emphasis>, <p>, <prosody>, <s>, <speak>, and <voice>.
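
The nesting rules in the Children and Parents lists can be sketched as below. The content is hypothetical; it only illustrates prosody appearing under speak and s, and containing s, emphasis, break, and a nested prosody.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- prosody as a direct child of speak, containing s, emphasis, and break. -->
  <prosody rate="slow">
    <s>Please listen <emphasis>carefully</emphasis>.</s>
    <break time="500ms"/>
    <!-- A nested prosody changes the pitch for one phrase only. -->
    <s>The code is <prosody pitch="high">four two seven</prosody>.</s>
  </prosody>
  <!-- prosody inside a sentence within a paragraph. -->
  <p><s>Thank you for <prosody volume="soft">waiting</prosody>.</s></p>
</speak>
```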

Source: Speech Synthesis Markup Language (SSML) Version 1.0