Last reviewed: 3/23/2024 10:36:13 AM

Fine-tuning TTS With VoiceMarkupKit

Text-to-speech (TTS) markup is text with embedded indicators that control how a synthesizer speaks that text. Speaking qualities such as speed, pitch, emphasis, and word pronunciation can be tailored when speech is synthesized from the text.
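To make the idea concrete, the following minimal sketch builds a small W3C SSML document using only the Python standard library. It does not use VoiceMarkupKit; the function name `build_ssml` and the sample sentence are assumptions made for illustration. It shows how speed and pitch (`prosody`), emphasis (`emphasis`), and word pronunciation (`phoneme`) appear as embedded markup around ordinary text.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Standard XML namespace, needed for the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml() -> str:
    """Return a small SSML document as a string (illustrative only)."""
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)

    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"},
    )

    # prosody controls speaking rate and pitch for the enclosed text.
    prosody = ET.SubElement(
        speak, f"{{{SSML_NS}}}prosody", {"rate": "slow", "pitch": "high"}
    )
    prosody.text = "Your order is "

    # emphasis stresses a single word.
    emphasis = ET.SubElement(
        prosody, f"{{{SSML_NS}}}emphasis", {"level": "strong"}
    )
    emphasis.text = "ready"
    emphasis.tail = ". Please pick up your "

    # phoneme supplies an explicit pronunciation (IPA) for a word.
    phoneme = ET.SubElement(
        prosody,
        f"{{{SSML_NS}}}phoneme",
        {"alphabet": "ipa", "ph": "təˈmeɪtoʊ"},
    )
    phoneme.text = "tomato"
    phoneme.tail = " soup."

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```

Whether a given synthesizer honors each element, and how strongly, varies by engine and voice, so treat the attribute values here as starting points to tune by ear.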

Chant VoiceMarkupKit comprises software classes that handle the complexities of generating text-to-speech markup in various markup syntaxes. This enables you to tailor speech synthesis to the familiar dialects, speaking patterns, and accents of your end users. You can adjust the TTS markup as needed for a given synthesizer to improve playback quality.

Synthesizers (i.e., speech APIs) interpret different markup syntaxes. VoiceMarkupKit supports the following markup syntaxes:

Speech API | Markup Syntax
Acapela TTS | AcaTTS Tags
Cepstral Swift | W3C SSML
CereProc CereVoice | W3C SSML, CereVoice Tagset
Microsoft Azure Speech | Azure Speech SSML
Microsoft SAPI 5 | SAPI 5 XML Markup, W3C SSML (SAPI 5.3+)
Microsoft Speech Platform | W3C SSML
Microsoft .NET System.Speech | W3C SSML
Microsoft .NET Microsoft.Speech | W3C SSML
Microsoft WindowsMedia (UWP and WinRT) | W3C SSML
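Because each synthesizer expects its own syntax, the same spoken effect has to be written differently depending on the target. The sketch below illustrates this for two of the syntaxes listed above, W3C SSML and SAPI 5 XML Markup. It is not the VoiceMarkupKit API: the `TEMPLATES` table and the `mark_up` function are hypothetical names, and the SAPI 5 speed value of -5 is only an assumed rough counterpart to the SSML `slow` rate, not an exact equivalent.

```python
from xml.sax.saxutils import escape

# Hypothetical lookup table: markup fragment per syntax and effect.
# Element names come from the W3C SSML and SAPI 5 XML markup definitions.
TEMPLATES = {
    "W3C SSML": {
        "emphasis": "<emphasis>{text}</emphasis>",
        "slow": '<prosody rate="slow">{text}</prosody>',
    },
    "SAPI 5 XML Markup": {
        "emphasis": "<emph>{text}</emph>",
        # absspeed accepts values from -10 (slowest) to 10 (fastest).
        "slow": '<rate absspeed="-5">{text}</rate>',
    },
}


def mark_up(text: str, effect: str, syntax: str) -> str:
    """Wrap text in the fragment for the given effect and syntax."""
    try:
        template = TEMPLATES[syntax][effect]
    except KeyError:
        raise ValueError(f"unsupported combination: {syntax!r}, {effect!r}") from None
    # Escape &, <, and > so the text cannot break the surrounding markup.
    return template.format(text=escape(text))


if __name__ == "__main__":
    for syntax in TEMPLATES:
        print(syntax, "->", mark_up("R&D update", "emphasis", syntax))
```

These are fragments only; a complete W3C SSML document still needs an enclosing `speak` element. A table-driven design like this keeps the application text independent of the synthesizer, which is the kind of complexity a markup-generation library is meant to absorb.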

For more information about fine-tuning speech synthesis with VoiceMarkupKit, review the following topics: