Last reviewed: 3/23/2024 11:34:12 AM
Acapela TTS Tags
Acapela TTS supports native markup tags.
\col=wgt=val1,val2[,val3,[val4...]]\
A Colibri-specific tag that morphs between the base voice and its first variant (val2). If more than one variant is embedded in the voice data, each variant has a valN where N is the number of the corresponding variant. If no variant is available (i.e., only the base voice), the tag has no effect.
\equ=preset\
Change the equalizer preset at runtime. You can manage the list of presets with the Voice Manager.
\equ=val1;val2;val3;val4\
Where val1, val2, val3, val4 are in a range from -100 to 100 and affect the frequency bands of 275Hz, 2.2kHz, 5kHz and 8.3kHz, respectively.
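As an illustration of the four-band form, the following Python sketch builds the tag string and checks each value against the documented range. The function name equ_tag and the restriction to integer values are assumptions made for this example; neither is part of the Acapela API.

```python
def equ_tag(b275: int, b2200: int, b5000: int, b8300: int) -> str:
    """Return an equalizer tag for the 275 Hz, 2.2 kHz, 5 kHz and 8.3 kHz bands.

    Hypothetical helper: it only formats text, it does not talk to the engine.
    """
    gains = (b275, b2200, b5000, b8300)
    for gain in gains:
        # The documented range is -100 to 100 for every band.
        if not -100 <= gain <= 100:
            raise ValueError(f"band value {gain} is outside -100..100")
    # Values are separated by semicolons and the tag is wrapped in backslashes.
    return "\\equ=" + ";".join(str(g) for g in gains) + "\\"


print(equ_tag(10, 0, -20, 100))
```

Validating before sending the text avoids relying on however the engine treats out-of-range values, which the guide does not describe.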
\ls=number\
Where number is the minimum number (inclusive) of consecutive symbols or letters required for a block to be pronounced as a group rather than individually. The special values 0 and 1 are not supported.
I want to aggregate these *** and yyy, \ls=2\like this *** and yyy.
\mrk=number\
This tag indicates a user bookmark in the text. \mrk=0\ is reserved.
\emph\
This tag sets emphasis on the next word. Emphasis may be rendered differently by different voices and languages. A word can be explicitly de-emphasized by using the \emph=0\ tag.
\skipaudio=number\
This tag mutes selected parts of the text. When number is 1 or 2, the text following the tag is muted; when number is 0, the sound is sent normally to the audio output. Unlike option 1, option 2 also ignores the callback events. Option 1 is therefore useful for analysing the full text or for forcing the pronunciation of part of a text in a specific context.
This part will be read, \skipaudio=2\while this one will not, \skipaudio=0\simple, right?
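To preview which parts of a tagged text would actually be heard, the muting rule can be mimicked in a few lines of Python. This is only a sketch of the rule as described here (1 and 2 mute, 0 unmutes); audible_text is a hypothetical helper, it ignores callback events, and it is not how the engine itself is implemented.

```python
import re

# Matches \skipaudio=0\, \skipaudio=1\ or \skipaudio=2\ and captures the digit.
_SKIP = re.compile(r"\\skipaudio=([012])\\")


def audible_text(text: str) -> str:
    """Return the parts of text that are not muted by skipaudio tags."""
    parts = _SKIP.split(text)
    audible = [parts[0]]  # text before the first tag is spoken normally
    # After split(), parts alternates: text, digit, text, digit, text, ...
    for mode, chunk in zip(parts[1::2], parts[2::2]):
        if mode == "0":  # 0 sends audio normally; 1 and 2 mute the chunk
            audible.append(chunk)
    return "".join(audible)


sample = ("This part will be read, \\skipaudio=2\\while this one will not, "
          "\\skipaudio=0\\simple, right?")
print(audible_text(sample))
```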
\pau=number\
This tag inserts a pause of the specified number of milliseconds in the speech. 5000 is the max value.
I am \pau=2000\ ready
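A small Python sketch of a pause-tag builder that enforces the documented 5000 ms maximum. The name pau_tag and the lower bound of 0 are assumptions for illustration; the guide only states the maximum.

```python
MAX_PAUSE_MS = 5000  # documented maximum pause length


def pau_tag(ms: int) -> str:
    """Return a pause tag for ms milliseconds (hypothetical helper)."""
    # Lower bound of 0 is an assumption; only the maximum is documented.
    if not 0 <= ms <= MAX_PAUSE_MS:
        raise ValueError(f"pause of {ms} ms is outside 0..{MAX_PAUSE_MS}")
    return f"\\pau={ms}\\"


print("I am " + pau_tag(2000) + " ready")
```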
\paumode=number\
Pauses are automatically inserted when synthesizing the text for sequences of space-separated numbers like 46 27, 108 95. This feature can be turned off with \paumode=1\ and turned on with \paumode=0\.
\paumode=0\ 1 2 3. \paumode=1\ 1 2 3.
\pit=number\
This tag sets the baseline pitch of the voice to the specified value in Hertz. The actual pitch fluctuates above and below this baseline following the prosodic rules. number must be in the range from 50 (%) up to 200 (% of the base pitch). For a female voice the average pitch is about 180-190 Hz. For a male voice, the average pitch is about 100-110 Hz.
\pit=70\
\prn="phonetic string"\
The phonetic string is composed of phonemes followed by space characters. The phonetic alphabet is language-dependent. This tag is only suitable for inserting single words into the text. Unpredictable errors (mainly prosody) can occur when inserting greater units.
I will say: \prn=h e l @U1\.
\prx="phonetic string"\
This is the same as the \prn\ tag but is the preferred form.
\prx=%1nature%word\
This tag fixes the nature (part of speech) of a word in a sentence. This can be relevant for removing a potential ambiguity between identically spelled words that are pronounced differently. nature may be one of the following: NOUN, ADJ, VERB, ADV, PARTPASSE, PARTPRES, CHIF, or INFINIT.
The queen and Alice \prx=%1VERB%read\ a book.
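The nature form can be generated programmatically. The Python sketch below checks the nature against the list given above before formatting the tag; prx_nature is a hypothetical helper name, not an Acapela function.

```python
# Nature values listed in this guide.
NATURES = {"NOUN", "ADJ", "VERB", "ADV", "PARTPASSE", "PARTPRES", "CHIF", "INFINIT"}


def prx_nature(word: str, nature: str) -> str:
    """Return a prx tag that fixes the nature of word (hypothetical helper)."""
    if nature not in NATURES:
        raise ValueError(f"unknown nature: {nature}")
    # Layout taken from the syntax line: \prx=%1nature%word\
    return f"\\prx=%1{nature}%{word}\\"


print("The queen and Alice " + prx_nature("read", "VERB") + " a book.")
```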
\rms=number\
Sets the reading mode to spelling out each letter of each word when number is 1 or turns it off when number is 0.
\rmw=number\
Sets the reading mode to leaving audible pauses between each word when number is 1 or turns it off when number is 0.
\rmu=number\
Allows the synthesizer to read intelligibly a series of attached words such as "CFMutableString". Each word must start with a capital letter; the string is then read as a sequence of separate words: "CF Mutable String". number is 1 to enable the feature and 0 to disable it.
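To get a feel for the kind of splitting involved, the Python sketch below reproduces the "CF Mutable String" result with a regular expression. This is purely an approximation written for this example: the engine's actual splitting rules are not documented here, and split_attached is a hypothetical name.

```python
import re

# Either a run of capitals that is not followed by a lowercase letter (e.g. "CF"),
# or a single capital followed by any number of lowercase letters (e.g. "Mutable").
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]*")


def split_attached(token: str) -> str:
    """Split attached capitalized words and join them with spaces."""
    return " ".join(_WORD.findall(token))


print(split_attached("CFMutableString"))
```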
\rpit=number\
Sets the relative pitch. number must be in the range from 50 up to 200, 100 being the default pitch of the voice.
\rspd=number\
Sets the relative speed. 100 is the default speed, which is about 180 words per minute depending on the voice. Use \rst\ to reset to the default speed.
\rst\
Resets the engine to the default settings for the current mode.
\sel=altN\
Gives an alternative synthesis for the following word. To further explore alternatives, \sel=altN\ gives the N-th acoustic alternative for the following word.
I don't like the sound of this \sel=alt3\word.
\spd=number\
Sets the baseline average talking speed of the voice to the specified number of words per minute. Each voice has a default speed (about 180 words per minute, depending on the voice). Use \rst\ to reset to the default speed.
\vce=key=value\
Changes the speaking voice according to the specified characteristics. The pitch, speed, volume, etc. revert to the defaults for the new voice.
-
language
\vce=language=languagename\
Requests the engine speak in the specified language: Arabic, BelgianDutch, Brazilian, British, CanadianFrench, Catalan, Czech, Danish, Dutch, FinlandSwedish, Finnish, French, German, Greek, IndianEnglish, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish, Turkish, USEnglish, or USSpanish.
\vce=language=Spanish\
-
speaker
\vce=speaker=speakername\
Specifies the speaker value of the voice. Beware that the speaker name is different from the voice name. Heather22k_HQ, Heather8k_HQ and Heather22k_HM are voice names. Heather is the speaker name.
\vce=speaker=Ryan\
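Since the tag expects a speaker name rather than a voice name, a helper that derives one from the other can be convenient. The Python sketch below assumes that voice names follow the pattern seen in the three examples above (speaker name, then digits, then "k_" and a letter suffix); that pattern and the name vce_speaker_tag are assumptions, and other voices may be named differently.

```python
import re

# Assumed voice-name layout, inferred only from Heather22k_HQ, Heather8k_HQ
# and Heather22k_HM: <speaker><digits>k_<letters>.
_VOICE = re.compile(r"(?P<speaker>.+?)\d+k_[A-Za-z]+")


def vce_speaker_tag(name: str) -> str:
    """Return a vce speaker tag, stripping a voice-name suffix if one is present."""
    match = _VOICE.fullmatch(name)
    # If the name does not look like a voice name, treat it as a speaker name.
    speaker = match.group("speaker") if match else name
    return f"\\vce=speaker={speaker}\\"


print(vce_speaker_tag("Heather22k_HQ"))
```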
-
gender
\vce=gender=gendername\
Used to specify the gender, male or female, of the voice for the current language (if such a voice is available).
\vce=gender=male\
\vct=number\
Controls the Voice Shaping of the voice (range: 50% to 150%).
\vol=number\
Sets the output volume. Volume is a value in the range 0 to 65535, inclusive. The default value is 65535.
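Applications often expose volume as a percentage, so a conversion to the 0 to 65535 range may be useful. The Python sketch below assumes a simple linear mapping, which may not match perceived loudness; vol_tag is a hypothetical helper name.

```python
VOL_MAX = 65535  # documented maximum and default volume


def vol_tag(percent: int) -> str:
    """Return a vol tag for a 0..100 percentage (linear mapping is an assumption)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage {percent} is outside 0..100")
    # Integer arithmetic keeps the result inside 0..65535 and rounds down.
    value = percent * VOL_MAX // 100
    return f"\\vol={value}\\"


print(vol_tag(50))
```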
\audioboost=val\
Audio Boost affects two aspects of the speech:
- it improves the speech clarity by emphasizing medium and high frequencies, which are important for intelligibility, and
- it increases the perceived level of the speech with no saturation effect.
val controls the emphasis of medium and high frequencies from no emphasis (0) to maximum emphasis (90). The default value is 0.
\wrp=number\
A Colibri-specific tag that controls the warping of the voice (range: 50% to 150%) and can be used to adjust the tone of a voice.
\audio=command1[=argument1[;argument2...]]\
Controls the playback of sound files together with the synthesized speech, either in the foreground (synchronous mode) or in the background (asynchronous mode). The available commands and arguments are listed below.
-
mix
\audio=mix="filepath"\
Plays the file in the background; speech synthesis continues during playback.
\audio=mix="c:\mozart.pcm"\I speak with Mozart playing in the background!
-
offset
\audio=offset=n\
Skips n milliseconds at the beginning of the sound.
We can start again where we left off! \audio=play="c:\mozart.pcm";offset=5000\
-
pause, resume, and stop
\audio=pause\, \audio=resume\, and \audio=stop\
Pauses, resumes or stops background playing (mix mode only).
\audio=mix="c:\mozart.pcm"\ I put the background music on pause! \audio=pause\ Then I resume it! \audio=resume\ Finally, I stop it! \audio=stop\ It's finished.
-
play
\audio=play\
If the file name is omitted and there is any \audio=mix\ command enqueued, it turns the background playing into foreground playing, thus going from asynchronous mode to synchronous mode.
\audio=mix="c:\mozart.pcm"\I speak with Mozart playing in the background. Now I'll be quiet and let the music track play until it's finished... \audio=play\
-
play
\audio=play="filepath"[;argument1[;argument2...]]\
Plays a sound in the foreground (synchronous mode).
Please applaud! \audio=play="c:\bravo.pcm"\ Thank you!
-
continue
\audio=play="filepath";duration=timeduration;continue\
Makes a sound continue in the background (asynchronous mode). A duration=timeduration or until=timeposition argument is required. When the limit specified by that argument is reached, foreground playing turns into background playing, going from synchronous to asynchronous mode.
\audio=play="c:\bravo.pcm";duration=100;continue\ Thank you very much!
-
duration
\audio=play="filepath";duration=timeduration\
Plays the sound for timeduration milliseconds (play or mix mode) and then stops reading it.
Now you will listen to Mozart for five seconds \audio=play="c:\mozart.pcm";duration=5000\ \audio=mix="c:\mozart.pcm";duration=1250\The music will stop while I speak... \audio=mix="c:\mozart.pcm"\ I'll let the music play for five seconds after speaking... \audio=play;duration=5000\
-
until
\audio=play="filepath";until=timeposition\
Plays the sound until the position timeposition milliseconds within the sound is reached (play or mix commands) and then stops reading it.
Now you will listen to Mozart for five seconds \audio=play="c:\mozart.pcm";until=5000\ \audio=play="mozart.pcm";duration=2000\ The music will stop when I start speaking, it lasts only two seconds, then, when I am finished speaking, it will continue playing the music where it stopped for three seconds. \audio=play="mozart.pcm";offset=2000;until=5000\
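The difference between duration (a length) and until (a position within the sound) is easy to mix up when combined with offset. The Python sketch below computes how long a sound plays under each argument, following the examples above; played_ms is a hypothetical helper, and treating duration as independent of offset is an assumption.

```python
from typing import Optional


def played_ms(offset: int = 0,
              duration: Optional[int] = None,
              until: Optional[int] = None) -> int:
    """Return the playing time in milliseconds for one audio command."""
    # This sketch handles exactly one limit; combining both is not covered.
    if (duration is None) == (until is None):
        raise ValueError("give exactly one of duration or until")
    if duration is not None:
        return duration  # duration is a length, assumed independent of offset
    if until < offset:
        raise ValueError("until must not be before offset")
    return until - offset  # until is a position, so subtract the start offset


print(played_ms(offset=2000, until=5000))
```

With offset=2000 and until=5000 the result is 3000 ms, matching the three seconds mentioned in the example text.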
-
repeat
\audio=repeat=status\
When status is on, continuously repeats the foreground or background sound (play or mix command). When it is off, does not repeat it (anymore).
If status is a non-negative integer, it is the number of times the sound is replayed from the beginning once its end is reached, in addition to the regular play of the sound. In total, the sound is played status + 1 times.
\audio=mix="c:\bravo.pcm";repeat=on\I speak while they applaud! bla, bla, bla, ... bla, bla, bla... \audio=play;repeat=off\ \audio=play="c:\gong.pcm";repeat=1\I hear two gongs then this text... bla, bla, bla, ... bla, bla, bla.... Beware that despite repeat=1, gong.pcm will be played two times!
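Because a numeric status counts the extra plays rather than the total, an off-by-one mistake is easy to make. The Python sketch below spells out the total number of plays for each kind of status; total_plays is a hypothetical helper, and returning None for endless repetition is a choice made for this example.

```python
from typing import Optional, Union


def total_plays(status: Union[int, str]) -> Optional[int]:
    """Return how many times the sound plays, or None if it repeats endlessly."""
    if status == "on":
        return None  # repeats continuously until turned off
    if status == "off":
        return 1  # played once, no repetition
    if isinstance(status, int) and status >= 0:
        return status + 1  # the regular play plus status repetitions
    raise ValueError(f"unsupported repeat status: {status!r}")


print(total_plays(1))
```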
-
volume
\audio=volume=percentage\
Sets the volume of the sound to percentage % of its base level where 100 is the base level.
\audio=mix="c:\mozart.pcm";volume=25\I speak with Mozart playing smoothly! \audio=mix="c:\mozart.pcm";volume=200\I speak with Mozart playing loudly! \audio=mix="c:\mozart.pcm"\I will turn down the music slowly...\audio=volume=80\ \pau=1000\ \audio=volume=60\ \pau=1000\ \audio=volume=40\ \pau=1000\ \audio=volume=20\ \pau=1000\ \audio=volume=10\ \pau=500\ \audio=volume=5\ \pau=500\ \audio=volume=0\ \pau=500\
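A gradual fade like the one above is tedious to write by hand. The Python sketch below generates the alternating volume and pause tags from a list of steps; fade_tags is a hypothetical helper that only produces the text and does not check the values.

```python
from typing import Iterable, Tuple


def fade_tags(steps: Iterable[Tuple[int, int]]) -> str:
    """Return volume/pause tag pairs for (volume percentage, pause in ms) steps."""
    # Each step sets the sound volume, then waits before the next change.
    return " ".join(
        f"\\audio=volume={level}\\ \\pau={pause_ms}\\"
        for level, pause_ms in steps
    )


print(fade_tags([(80, 1000), (60, 1000), (40, 1000), (20, 1000)]))
```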
Source: Acapela TTS for Windows, Mac and Linux* User's Guide