Last reviewed: 3/23/2024 9:17:54 AM
Generating, Editing, Speaking, and Persisting Pronunciations
The LexiconKit management class is designed to be flexible and to minimize the programming needed to manage lexicon word pronunciations.
Pronunciations and Phonemes
Lexicon word pronunciations are composed of phonemes, the basic units of sound. The set of symbols used to write phonemes is called an alphabet. Speech engine vendors define their own phoneme alphabets and pronunciation formats, and may also support the International Phonetic Alphabet (IPA).
For example, the following table illustrates the differences in default pronunciations for the word tomato across speech engines.
IPA | Cepstral Swift | Microsoft SAPI 5 | Microsoft Universal Phone Set (UPS) |
təme͡ito | t ah0 m ey1 t ow0 | t ax m ey t ow | T AX M EI T O |
LexiconKit handles the complexities of dealing with these differences for applications.
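To make the format differences in the table concrete, here is a minimal Java sketch that splits a space-delimited pronunciation into phoneme symbols and strips a trailing digit from each symbol. The class and method names are hypothetical and are not part of LexiconKit, and treating the trailing digit in the Cepstral Swift notation as a stress marker is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of LexiconKit.
final class PhonemeTokens {
    // Splits a space-delimited pronunciation into its phoneme symbols.
    static List<String> split(String pronunciation) {
        List<String> tokens = new ArrayList<>();
        for (String token : pronunciation.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Removes trailing digits, assumed here to mark stress (e.g. "ey1" becomes "ey").
    static String stripStress(String phoneme) {
        return phoneme.replaceAll("\\d+$", "");
    }
}
```

Calling PhonemeTokens.split on the Cepstral Swift entry from the table yields six symbols. Even with the digits removed, the symbols still differ from the SAPI 5 entry (ah versus ax), which is why a simple text transformation is not a substitute for per-engine alphabet handling.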
Generating and Speaking Pronunciations
To generate a lexicon word pronunciation, pass the word, the word type (i.e., part of speech), language, and alphabet to LexiconKit. To speak a lexicon word pronunciation, pass the phonemes, language, and alphabet to LexiconKit.
C#
// Instantiate LexiconKit
NLexiconKit _LexiconKit = new NLexiconKit();
if (_LexiconKit != null)
{
    // Set credentials
    NSAPI5Synthesizer _Synthesizer = _LexiconKit.CreateSAPI5Synthesizer();
    if (_Synthesizer != null)
    {
        string phonemes = _Synthesizer.GeneratePhonemes("tomato", "Noun", "en-US", "sapi");
        _Synthesizer.SpeakPhonemes(phonemes, "en-US", "sapi");
    }
}
C++
// Instantiate LexiconKit object
CLexiconKit* _LexiconKit = new CLexiconKit();
if (_LexiconKit != NULL)
{
    // Set credentials
    // Create synthesizer
    CSAPI5Synthesizer* _Synthesizer = _LexiconKit->CreateSAPI5Synthesizer();
    if (_Synthesizer != NULL)
    {
        wchar_t* phonemes = _Synthesizer->GeneratePhonemes(L"tomato", L"Noun", L"en-US", L"sapi");
        _Synthesizer->SpeakPhonemes(phonemes, L"en-US", L"sapi");
        delete _Synthesizer;
    }
    delete _LexiconKit;
}
C++Builder
// Instantiate LexiconKit object
CLexiconKit* _LexiconKit = new CLexiconKit();
if (_LexiconKit != NULL)
{
    // Set credentials
    // Create synthesizer
    CSAPI5Synthesizer* _Synthesizer = _LexiconKit->CreateSAPI5Synthesizer();
    if (_Synthesizer != NULL)
    {
        String phonemes = _Synthesizer->GeneratePhonemes("tomato", "Noun", "en-US", "sapi");
        _Synthesizer->SpeakPhonemes(phonemes, "en-US", "sapi");
        delete _Synthesizer;
    }
    delete _LexiconKit;
}
Delphi
var
  _LexiconKit: TLexiconKit;
  _Synthesizer: TSAPI5Synthesizer;
  phonemes: string;
begin
  // Instantiate LexiconKit object
  _LexiconKit := TLexiconKit.Create();
  if (_LexiconKit <> nil) then
  begin
    // Set credentials
    // Create synthesizer
    _Synthesizer := _LexiconKit.CreateSAPI5Synthesizer();
    if (_Synthesizer <> nil) then
    begin
      phonemes := _Synthesizer.GeneratePhonemes('tomato', 'Noun', 'en-US', 'sapi');
      _Synthesizer.SpeakPhonemes(phonemes, 'en-US', 'sapi');
    end;
  end;
end;
Java
JLexiconKit _LexiconKit = new JLexiconKit();
if (_LexiconKit != null)
{
    // Set credentials
    JSAPI5Synthesizer _Synthesizer = _LexiconKit.createSAPI5Synthesizer();
    if (_Synthesizer != null)
    {
        String phonemes = _Synthesizer.generatePhonemes("tomato", "Noun", "en-US", "sapi");
        _Synthesizer.speakPhonemes(phonemes, "en-US", "sapi");
    }
}
VB
Dim _LexiconKit As NLexiconKit
Dim WithEvents _Synthesizer As NSAPI5Synthesizer
Dim phonemes As String
' Instantiate LexiconKit
_LexiconKit = New NLexiconKit()
If (_LexiconKit IsNot Nothing) Then
    ' Set credentials
    _Synthesizer = _LexiconKit.CreateSAPI5Synthesizer()
    If (_Synthesizer IsNot Nothing) Then
        phonemes = _Synthesizer.GeneratePhonemes("tomato", "Noun", "en-US", "sapi")
        _Synthesizer.SpeakPhonemes(phonemes, "en-US", "sapi")
    End If
End If
All synthesizers support speaking phonemes. Acapela TTS synthesizers and Microsoft WindowsMedia recognizers and synthesizers currently do not support generating phonemes.
Editing Pronunciations
To provide alternate pronunciations, edit lexicon word pronunciations (i.e., change the phonemes) and add them to the lexicon.
Speech engines (i.e., recognizers and synthesizers) support unique lexicon alphabets, formats, and approaches for runtime inclusion.
Speech API | Alphabets | File Format |
Acapela TTS | ipa, acatts | .dic |
Cepstral Swift API | swift | .txt |
Microsoft (SAPI5, Speech Platform, WindowsMedia) | ipa, sapi, ups | W3C .pls |
Microsoft Azure Speech | ipa, sapi, ups, x-sampa | W3C .pls |
A speech engine may support one or more alphabets, and the set varies by speech language. To determine which phoneme alphabets an engine supports, enumerate the PhonemeAlphabets property. Use the applicable alphabet when generating, editing, and speaking phonemes.
C#
foreach (string alphabet in _Synthesizer.PhonemeAlphabets)
    Console.WriteLine(alphabet);
C++
for (int i = 0; i < _Synthesizer->GetPhonemeAlphabets()->GetCount(); i++)
    wchar_t* alphabet = _Synthesizer->GetPhonemeAlphabets()->at(i).c_str();
C++Builder
for (int i = 0; i < _Synthesizer->GetPhonemeAlphabets()->Count; i++)
    String alphabet = _Synthesizer->GetPhonemeAlphabets()[i];
Delphi
alphabet: string;
for i := 0 to _Synthesizer.PhonemeAlphabets.Count-1 do
  // Access alphabet
  alphabet := _Synthesizer.PhonemeAlphabets[i];
Java
for (String alphabet : _Synthesizer.getPhonemeAlphabets())
    System.out.println(alphabet);
VB
For Each alphabet In _Synthesizer.PhonemeAlphabets
    Console.WriteLine(alphabet)
Next
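Once the supported alphabets have been enumerated, an application still has to pick one. The following Java sketch shows one possible selection policy: take the first match from an ordered preference list, and otherwise fall back to the first alphabet the engine reported. The AlphabetChooser class is hypothetical and not part of LexiconKit. It operates on plain string lists, so it makes no assumption about the collection type returned by the PhonemeAlphabets property.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of LexiconKit.
final class AlphabetChooser {
    // Returns the first preferred alphabet that the engine supports,
    // or the engine's first alphabet when no preference matches.
    static Optional<String> choose(List<String> supported, List<String> preferred) {
        for (String candidate : preferred) {
            for (String alphabet : supported) {
                if (alphabet.equalsIgnoreCase(candidate)) {
                    return Optional.of(alphabet);
                }
            }
        }
        return supported.isEmpty() ? Optional.empty() : Optional.of(supported.get(0));
    }
}
```

The chosen value would then be passed as the alphabet argument when generating and speaking phonemes.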
When a speech engine supports multiple spoken languages (for example, Microsoft Azure Speech), set the alphabetlanguage property to the desired language before enumerating the PhonemeAlphabets property and before generating, editing, or speaking phonemes.
// Set the synthesizer language
Acapela TTS Properties
(Source: Acapela Group Acapela TTS Developer Guide)
Property | Value |
enginepath | Runtime library path. |
license | License file path. |
preset | Name of the equalizer preset. |
lexicon | A list of user lexicons. Each file name must be separated with a semicolon. |
pitch | The baseline pitch expressed in Hz. Man (110) Woman (180). Value ranges between 30 and 500. |
speed | The reading speed. Values are in percent of the default speed rate 100. Value ranges between 30 and 300. |
maxpitch | The maximum pitch allowed expressed in Hz. Value ranges between pitch and pitch * 2.5. |
minpitch | The minimum pitch allowed expressed in Hz. Value ranges between pitch / 5 and pitch. |
volume | A ratio percentage of the TTS output volume. Default value is 100. Value ranges 0 to 150. |
leadingsilence | The pause duration at the beginning of speaking in milliseconds. Default value is 50. Value ranges 20 to 5000. |
trailingsilence | The pause duration at the end of speaking in milliseconds. Default value is 500. Value ranges 20 to 5000. |
deviceid | The audio device identifier. |
readingmode | The way text is spoken: normal, word at a time, or letter at a time (spelling). |
pausepunct | The pause duration for period, exclamation point, and question mark. Value ranges from 0 to 5. |
pausesemicolon | The pause duration for semicolon. Value ranges from 0 to 5. |
pausecomma | The pause duration for comma and colon. Value ranges from 0 to 5. |
pausebracket | The pause duration for quote, braces, and brackets. Value ranges from 0 to 5. |
pausespell | The pause duration between letters in spell reading mode. Value ranges from 0 to 5. |
usefilter | Enable or disable equalizer use: 0 (no) or 1 (yes). |
filtervalue1 | A value corresponding to the attenuation for band 1 filter. Value ranges 0 to 200. |
filtervalue2 | A value corresponding to the attenuation for band 2 filter. Value ranges 0 to 200. |
filtervalue3 | A value corresponding to the attenuation for band 3 filter. Value ranges 0 to 200. |
filtervalue4 | A value corresponding to the attenuation for band 4 filter. Value ranges 0 to 200. |
vocaltract | The voice shaping (tone) expressed in percentage of the default value 100. Value ranges 50 to 150. |
audioboostpreemph | Controls the emphasis of medium and high frequencies. The default value is 0. Value ranges 0 to 90. |
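The pitch, minpitch, and maxpitch ranges in the table depend on one another, so it can help to check them together before setting the properties. The following Java sketch encodes only the ranges stated in the table. The AcapelaPitchCheck class is hypothetical, and how the engine itself treats out-of-range values is not covered here.

```java
// Hypothetical helper for illustration only; not part of LexiconKit or Acapela TTS.
final class AcapelaPitchCheck {
    // Checks the documented ranges (all values in Hz):
    //   pitch:    30 to 500
    //   maxpitch: pitch to pitch * 2.5
    //   minpitch: pitch / 5 to pitch
    static boolean isValid(double pitch, double minPitch, double maxPitch) {
        boolean pitchOk = pitch >= 30 && pitch <= 500;
        boolean maxOk = maxPitch >= pitch && maxPitch <= pitch * 2.5;
        boolean minOk = minPitch >= pitch / 5 && minPitch <= pitch;
        return pitchOk && maxOk && minOk;
    }
}
```

For a baseline pitch of 110 Hz, the table's rules allow minpitch from 22 to 110 and maxpitch from 110 to 275.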
Cepstral Swift Properties
(Source: Cepstral Swift SDK documentation)
Property | Value |
configfile | Configuration file to use instead of the default. |
audiochannels | Number of audio channels [ 1 (mono) or 2 (stereo) ]. |
audiodeadair | Milliseconds of dead air (silence) to pad at the end of speech. |
audiopan | Left-to-right panning [ -1 = left, 0 = center, 1 = right ]. Requires audiochannels set to 2. |
audiovolume | Volume multiplication factor as a percentage. Default value is 100. |
lexicon | A lexicon.txt file in the voice directory. See Cepstral Lexicon Editing with LexiconKit. |
speechrate | Speaking rate (average WPM). |
voicedir | Directory for the voice. |
sfx | Special effects output chain file. |
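The table notes that panning only applies to stereo output. A small pre-check along these lines could catch the mismatch before the properties are set. This is a sketch under stated assumptions: the SwiftAudioCheck class is hypothetical, and treating audiopan as a continuous value between -1 and 1 is an assumption, since the table lists only the three reference points.

```java
// Hypothetical helper for illustration only; not part of LexiconKit or the Cepstral SDK.
final class SwiftAudioCheck {
    // Panning is accepted only for stereo output (2 channels) and,
    // by assumption, only for values between -1 (left) and 1 (right).
    static boolean panAllowed(int audioChannels, double audioPan) {
        boolean inRange = audioPan >= -1.0 && audioPan <= 1.0;
        return inRange && audioChannels == 2;
    }
}
```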
CereProc CereVoice Properties
(Source: CereProc CereVoice SDK User Guide)
Property | Value |
enginepath | Runtime library path. |
voicepath | Voice files path. |
configfile | Configuration file path. |
license | License file path. |
rootcertificate | Root certificate file path. |
clientcrt | Client CRT file path. |
clientkey | Client key file path. |
Microsoft Azure Speech Properties
Property | Value |
alphabetlanguage | The voice language to use for synthesizers that support multiple languages from which to obtain supported phoneme alphabets. |
devicename | The multimedia device ID that is used by the audio object. |
language | The voice language to use for synthesizers that support multiple languages. |
languages | One or more languages for which to enumerate available voices. |
speechkey | The Azure Speech Services key. |
speechregion | The Azure Speech Services region. |
Microsoft SAPI 5 Properties
(Source: Microsoft SAPI5 Help File)
Property | Value |
deviceid | The multimedia device ID that is used by the audio object. |
rate | The text rendering rate adjustment, which sets the speaking rate of the voice. Supported values range from -10 to 10; values outside this range may be truncated. |
volume | The synthesizer output volume level of the voice in real time. Volume levels are specified in percentage values ranging from zero to 100. The default base volume for all voices is 100. |
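Because out-of-range rate values may be truncated by the engine, an application that accepts user input may prefer to clamp values itself so that what it displays matches what is applied. The following Java sketch clamps to the ranges given in the table. The Sapi5Ranges class is hypothetical and not part of LexiconKit or SAPI 5.

```java
// Hypothetical helper for illustration only; not part of LexiconKit or SAPI 5.
final class Sapi5Ranges {
    // Clamps a speaking rate to the documented range of -10 to 10.
    static int clampRate(int rate) {
        return Math.max(-10, Math.min(10, rate));
    }

    // Clamps a volume percentage to the documented range of 0 to 100.
    static int clampVolume(int volume) {
        return Math.max(0, Math.min(100, volume));
    }
}
```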
Persisting Pronunciations
Once the lexicon containing the word pronunciations is saved, it can be deployed with applications and installed on a target system.
Acapela and Cepstral lexicons are loaded at run time by the SpeechKit class. See the speech synthesis Lexicons topic for details. W3C .pls lexicons are included as part of text-to-speech (TTS) markup. See VoiceMarkupKit for details.
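For orientation, the following Java sketch builds a minimal lexicon document in the W3C Pronunciation Lexicon Specification (PLS) 1.0 format with a single entry. It illustrates the file format only: the PlsSketch class is hypothetical, and the sketch does not show how LexiconKit itself writes or saves lexicons, which may include additional elements or attributes.

```java
// Hypothetical helper for illustration only; not part of LexiconKit.
final class PlsSketch {
    // Escapes characters that are reserved in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds a PLS 1.0 document containing one word and one pronunciation.
    static String lexicon(String language, String alphabet, String word, String phonemes) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<lexicon version=\"1.0\"\n"
            + "    xmlns=\"http://www.w3.org/2005/01/pronunciation-lexicon\"\n"
            + "    alphabet=\"" + escape(alphabet) + "\" xml:lang=\"" + escape(language) + "\">\n"
            + "  <lexeme>\n"
            + "    <grapheme>" + escape(word) + "</grapheme>\n"
            + "    <phoneme>" + escape(phonemes) + "</phoneme>\n"
            + "  </lexeme>\n"
            + "</lexicon>\n";
    }
}
```

For example, PlsSketch.lexicon("en-US", "sapi", "tomato", "t ax m ey t ow") produces a document whose lexeme pairs the grapheme tomato with the SAPI 5 pronunciation from the table earlier in this topic.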