Last reviewed: 3/23/2024 10:54:26 AM

<say-as>

The say-as element tells the speech synthesis engine what type of text construct the element contains (for example, a date, a time, or a telephone number) so that the engine can render the contained text correctly.

<?xml version="1.0"?>
<speak version="1.0"
    xmlns="http://www.w3.org/2001/10/synthesis"
    xml:lang="en-US">
    <voice name="en-US-JennyNeural">
        <p>
            Your <say-as interpret-as="ordinal"> 1st </say-as> request was for <say-as interpret-as="cardinal"> 1 </say-as> room
            on <say-as interpret-as="date" format="mdy"> 10/19/2010 </say-as>, with early arrival at <say-as interpret-as="time" format="hms12"> 12:35pm </say-as>.
        </p>
    </voice>
</speak>
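
The shorter say-as samples on this page are fragments; they only form a complete document once they sit inside the speak and voice elements shown in the sample above. As a minimal sketch, assuming Python and its standard xml.etree.ElementTree module, the following wraps a fragment in that envelope and confirms that the result is well formed. The helper name wrap_in_speak is hypothetical and is not part of any SDK; the voice name and namespace are copied from the sample.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample document above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def wrap_in_speak(fragment, voice="en-US-JennyNeural", lang="en-US"):
    """Hypothetical helper: wrap an SSML fragment in the speak/voice/p envelope."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}"><p>{fragment}</p></voice>'
        f"</speak>"
    )


document = wrap_in_speak('Your <say-as interpret-as="ordinal">1st</say-as> request')

# ET.fromstring raises ParseError if the document is not well-formed XML.
root = ET.fromstring(document)

# Elements inherit the default namespace, so the lookup must include it.
say_as = root.find(f".//{{{SSML_NS}}}say-as")
print(say_as.get("interpret-as"), say_as.text)
```

This checks XML well-formedness only; it does not tell you how a particular engine will pronounce the text.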

Attributes

interpret-as

Required. Specifies the content type of the contained text construct. The value must be one of those listed in the table below.

format

Optional. Provides additional information about the precise formatting of the element's text, for content types whose format might otherwise be ambiguous. The value must be one of those listed in the table below for the chosen content type.

detail

Optional. Indicates the level of detail to be spoken. For example, this attribute might request that the speech synthesis engine pronounce punctuation marks. There are no standard values defined for detail.
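
To illustrate how the three attributes fit together, here is a minimal sketch in Python that builds a say-as element with xml.etree.ElementTree. The helper make_say_as and its parameter names are hypothetical and chosen for illustration; only the attribute names interpret-as, format, and detail come from this page.

```python
import xml.etree.ElementTree as ET


def make_say_as(text, interpret_as, fmt=None, detail=None):
    """Hypothetical helper: build a <say-as> element from its attributes."""
    attrs = {"interpret-as": interpret_as}
    # format and detail are optional, so they are emitted only when supplied.
    if fmt is not None:
        attrs["format"] = fmt
    if detail is not None:
        attrs["detail"] = detail
    elem = ET.Element("say-as", attrs)
    elem.text = text
    return elem


date_elem = make_say_as("10/19/2010", "date", fmt="mdy")
date_xml = ET.tostring(date_elem, encoding="unicode")
print(date_xml)
```

Building the element through an XML library, rather than by string concatenation, also escapes characters such as & and < in the text for you.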

interpret-as | format | Interpretation

characters, spell-out | none | The text is spoken as individual letters (spelled out). The speech synthesis engine pronounces:
<say-as interpret-as="characters">test</say-as>
As "T E S T."

cardinal, number | none | The text is spoken as a cardinal number. The speech synthesis engine pronounces:
There are <say-as interpret-as="cardinal">10</say-as> options
As "There are ten options."

ordinal | none | The text is spoken as an ordinal number. The speech synthesis engine pronounces:
Select the <say-as interpret-as="ordinal">3rd</say-as> option
As "Select the third option."

number_digit | none | The text is spoken as a sequence of individual digits. The speech synthesis engine pronounces:
<say-as interpret-as="number_digit">123456789</say-as>
As "1 2 3 4 5 6 7 8 9."

fraction | none | The text is spoken as a fractional number. The speech synthesis engine pronounces:
<say-as interpret-as="fraction">3/8</say-as> of an inch
As "three eighths of an inch."

date | dmy, mdy, ymd, ydm, ym, my, md, dm, d, m, y | The text is spoken as a date. The format attribute specifies the date's format (d=day, m=month, and y=year). The speech synthesis engine pronounces:
Today is <say-as interpret-as="date" format="mdy">10-19-2016</say-as>
As "Today is October nineteenth two thousand sixteen."

time | hms12, hms24 | The text is spoken as a time. The format attribute specifies whether the time uses a 12-hour clock (hms12) or a 24-hour clock (hms24). Use a colon to separate the numbers representing hours, minutes, and seconds. Valid time examples include 12:35, 1:14:32, 08:15, and 02:50:45. The speech synthesis engine pronounces:
The train departs at <say-as interpret-as="time" format="hms12">4:00am</say-as>
As "The train departs at four A M."

duration | hms, hm, ms | The text is spoken as a duration. The format attribute specifies the duration's format (h=hour, m=minute, and s=second). The speech synthesis engine pronounces:
<say-as interpret-as="duration">01:18:30</say-as>
As "one hour eighteen minutes and thirty seconds."
<say-as interpret-as="duration" format="ms">01:18</say-as>
As "one minute and eighteen seconds."
This tag is supported only for English and Spanish.

telephone | none | The text is spoken as a telephone number. The speech synthesis engine pronounces:
The number is <say-as interpret-as="telephone">(888) 555-1212</say-as>
As "The number is area code eight eight eight five five five one two one two."

currency | none | The text is spoken as a currency. The speech synthesis engine pronounces:
<say-as interpret-as="currency">99.9 USD</say-as>
As "ninety-nine US dollars and ninety cents."

address | none | The text is spoken as an address. The speech synthesis engine pronounces:
I'm at <say-as interpret-as="address">150th CT NE, Redmond, WA</say-as>
As "I'm at 150th Court Northeast Redmond Washington."

name | none | The text is spoken as a person's name. The speech synthesis engine pronounces:
<say-as interpret-as="name">ED</say-as>
As [æd].
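
The pairing of interpret-as and format values in the table above can be checked before a request is sent. The following is a minimal sketch in Python, assuming the table is the complete list of accepted values; the names FORMATS and check_say_as are hypothetical, and a real service may accept values that this page does not list.

```python
# Format values transcribed from the table above.
# None means the table lists no format ("none") for that interpret-as value.
FORMATS = {
    "characters": None, "spell-out": None,
    "cardinal": None, "number": None,
    "ordinal": None, "number_digit": None, "fraction": None,
    "date": {"dmy", "mdy", "ymd", "ydm", "ym", "my", "md", "dm", "d", "m", "y"},
    "time": {"hms12", "hms24"},
    "duration": {"hms", "hm", "ms"},
    "telephone": None, "currency": None, "address": None, "name": None,
}


def check_say_as(interpret_as, fmt=None):
    """Hypothetical check: return a list of problems (empty if none found)."""
    if interpret_as not in FORMATS:
        return [f"unknown interpret-as value: {interpret_as!r}"]
    if fmt is None:
        # format is optional, so omitting it is always accepted here.
        return []
    allowed = FORMATS[interpret_as]
    if allowed is None:
        return [f"{interpret_as!r} takes no format, got {fmt!r}"]
    if fmt not in allowed:
        return [f"{fmt!r} is not a listed format for {interpret_as!r}"]
    return []


print(check_say_as("date", "mdy"))
print(check_say_as("time", "hms"))
```

The first call returns an empty list; the second reports one problem, because hms is listed for duration but not for time.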

Children

none

Parents

<audio>, <emphasis>, <p>, <prosody>, <speak>, <s>, and <voice>.
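
The Children and Parents rules above can be checked mechanically. This is a minimal sketch in Python using the standard xml.etree.ElementTree module; the function name say_as_placement_problems and the sample strings are hypothetical, and the check reflects only the two lists on this page, not a full schema validation.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/10/synthesis}"
# Parent elements listed in the Parents section above.
ALLOWED_PARENTS = {"audio", "emphasis", "p", "prosody", "speak", "s", "voice"}


def say_as_placement_problems(ssml):
    """Hypothetical check: report say-as elements that break the rules above."""
    root = ET.fromstring(ssml)
    problems = []
    for parent in root.iter():
        for child in parent:
            if child.tag != NS + "say-as":
                continue
            parent_name = parent.tag.replace(NS, "")
            if parent_name not in ALLOWED_PARENTS:
                problems.append(f"say-as inside <{parent_name}>")
            if len(child) > 0:  # say-as may contain text only
                problems.append("say-as has child elements")
    return problems


GOOD = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural"><p>'
    'Select the <say-as interpret-as="ordinal">3rd</say-as> option'
    "</p></voice></speak>"
)
# Two deliberately invalid variants, for illustration only.
BAD_PARENT = GOOD.replace("<p>", '<p><sub alias="third">').replace("</p>", "</sub></p>")
BAD_CHILD = GOOD.replace("3rd", "<emphasis>3rd</emphasis>")

print(say_as_placement_problems(GOOD))
print(say_as_placement_problems(BAD_PARENT))
print(say_as_placement_problems(BAD_CHILD))
```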

Source: Microsoft Azure Speech Synthesis Markup Language (SSML)