Last reviewed: 3/23/2024 10:18:25 AM

Recognizing and Synthesizing Speech with SpeechKit

Applications use a text-to-speech engine (i.e., a synthesizer or voice) to speak. A text-to-speech engine is a software program that converts words to phonetic and prosodic symbols and generates synthetic speech audio data. Your application can play back the audio data through the system speakers, over a telephony session, or stream it over an Internet session.

Applications use a speech recognition engine (i.e., a recognizer) to listen. A speech recognition engine converts audio data from a microphone, telephony session, or Internet session into a set of words.

Chant SpeechKit handles the complexities of speech recognition and speech synthesis to minimize the programming necessary to develop applications that speak and listen.

Recognizers are accessed via proprietary application programming interfaces (APIs). SpeechKit supports the following speech APIs for speech recognition:

Speech API                          Platforms
Apple Speech                        ARM, x64, x86
Google android.speech               ARM
Microsoft SAPI 5                    x64, x86
Microsoft Speech Platform           x64, x86
Microsoft .NET System.Speech        x64, x86
Microsoft .NET Microsoft.Speech     x64, x86
Microsoft WindowsMedia (UWP)        ARM, x64, x86
Microsoft WindowsMedia (WinRT)      x64, x86
Nuance Dragon NaturallySpeaking     x64, x86

Synthesizers are accessed via proprietary application programming interfaces (APIs). SpeechKit supports the following speech APIs for speech synthesis:

Speech API                          Platforms
Acapela TTS                         x64, x86
Apple AVFoundation TTS              ARM, x64, x86
Cepstral Swift                      x64, x86
CereProc CereVoice                  x64, x86
Google android.speech.tts           ARM
Microsoft SAPI 5                    x64, x86
Microsoft Speech Platform           x64, x86
Microsoft .NET System.Speech        x64, x86
Microsoft .NET Microsoft.Speech     x64, x86
Microsoft WindowsMedia (UWP)        ARM, x64, x86
Microsoft WindowsMedia (WinRT)      x64, x86
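
When planning which speech API to target, it can help to treat the two support tables above as lookup data. The following is a minimal sketch in Python, not part of SpeechKit: the names RECOGNITION_APIS, SYNTHESIS_APIS, apis_for, and apis_for_both are hypothetical helpers introduced only for illustration, and the data is transcribed from the tables in this topic. It considers processor architecture only; operating system availability of each API is not modeled.

```python
# Hypothetical lookup tables transcribed from the support tables in this topic.
# Keys are speech API names; values are the processor platforms listed for each.
RECOGNITION_APIS = {
    "Apple Speech": {"ARM", "x64", "x86"},
    "Google android.speech": {"ARM"},
    "Microsoft SAPI 5": {"x64", "x86"},
    "Microsoft Speech Platform": {"x64", "x86"},
    "Microsoft .NET System.Speech": {"x64", "x86"},
    "Microsoft .NET Microsoft.Speech": {"x64", "x86"},
    "Microsoft WindowsMedia (UWP)": {"ARM", "x64", "x86"},
    "Microsoft WindowsMedia (WinRT)": {"x64", "x86"},
    "Nuance Dragon NaturallySpeaking": {"x64", "x86"},
}

SYNTHESIS_APIS = {
    "Acapela TTS": {"x64", "x86"},
    "Apple AVFoundation TTS": {"ARM", "x64", "x86"},
    "Cepstral Swift": {"x64", "x86"},
    "CereProc CereVoice": {"x64", "x86"},
    "Google android.speech.tts": {"ARM"},
    "Microsoft SAPI 5": {"x64", "x86"},
    "Microsoft Speech Platform": {"x64", "x86"},
    "Microsoft .NET System.Speech": {"x64", "x86"},
    "Microsoft .NET Microsoft.Speech": {"x64", "x86"},
    "Microsoft WindowsMedia (UWP)": {"ARM", "x64", "x86"},
    "Microsoft WindowsMedia (WinRT)": {"x64", "x86"},
}


def apis_for(platform, table):
    """Return the speech APIs in `table` that list `platform`, sorted by name."""
    return sorted(api for api, platforms in table.items() if platform in platforms)


def apis_for_both(platform):
    """Return APIs listed for both recognition and synthesis on `platform`."""
    recognition = set(apis_for(platform, RECOGNITION_APIS))
    synthesis = set(apis_for(platform, SYNTHESIS_APIS))
    return sorted(recognition & synthesis)


if __name__ == "__main__":
    print("ARM recognition:", apis_for("ARM", RECOGNITION_APIS))
    print("ARM synthesis:  ", apis_for("ARM", SYNTHESIS_APIS))
    print("ARM both:       ", apis_for_both("ARM"))
```

An application that needs to both speak and listen through a single API on a given platform could start from the result of apis_for_both and then narrow the choice by operating system and licensing.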

For more information about recognizing and synthesizing speech with SpeechKit, review the following topics: