Speech‑to‑Text (STT)
Real‑time or batch transcription with punctuation, diarization and multilingual support.
Speech‑to‑Text, Text‑to‑Speech, real‑time translation and speaker recognition with Azure Cognitive Services.
Text‑to‑Speech (TTS)
Natural neural voices with tone, rate and prosody controls.
Translation
Live speech translation for meetings, support and multilingual content.
Speaker Recognition
Speaker identification and verification for security and personalization.
Custom Voice
Create a branded voice (where allowed) with consent processes, review and monitoring.
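The tone, rate and prosody controls mentioned for neural voices are usually expressed in SSML. The following is a minimal sketch of building such a document in Python; the helper name `build_ssml`, its parameters and the default voice name are illustrative assumptions, not taken from this page.

```python
from xml.sax.saxutils import escape, quoteattr

# W3C namespace for SSML documents.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-US-JennyNeural", rate="+0%", pitch="+0%", lang="en-US"):
    """Wrap `text` in <voice> and <prosody> elements.

    Hypothetical helper: the default voice name is only an example;
    pick one that your Speech resource actually offers.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > so the text stays valid XML
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Welcome back. Your order has shipped.", rate="-10%")
```

Escaping the text before embedding it keeps user-supplied content from breaking the markup, which matters when prompts are generated dynamically.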
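STT request example (REST API for short audio; for continuous streaming, use the Speech SDK, which connects over WebSocket):
POST /speech/recognition/conversation/cognitiveservices/v1?language=en-US
Host: <region>.stt.speech.microsoft.com
Ocp-Apim-Subscription-Key: <key>
Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000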
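The request above can be assembled with the Python standard library, as in the sketch below. The regional host pattern `<region>.stt.speech.microsoft.com`, the function name `build_stt_request` and its parameters are assumptions for illustration; check the endpoint against your own Speech resource.

```python
import urllib.parse
import urllib.request

# Path copied from the request shown above.
STT_PATH = "/speech/recognition/conversation/cognitiveservices/v1"


def build_stt_request(region, key, wav_bytes, language="en-US"):
    """Build (but do not send) a POST request that carries WAV audio."""
    query = urllib.parse.urlencode({"language": language})
    # Assumption: the resource is reached through a regional host of this form.
    url = f"https://{region}.stt.speech.microsoft.com{STT_PATH}?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
```

Building the request separately from sending it makes the URL and headers easy to inspect or log; passing the result to `urllib.request.urlopen` would perform the actual call.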
Optimize the audio format (16 kHz mono PCM), chunk size and retry behavior.
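One way to apply the three points above (format, chunking, retries) is sketched below. The 16-bit sample width, the 100 ms chunk size, the backoff values and all function names are assumptions chosen for illustration; `send` stands in for whatever upload call your client uses.

```python
import time
import wave


def read_pcm_16k_mono(source):
    """Return raw PCM frames from a WAV file, rejecting other formats."""
    with wave.open(source, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        # Assumption: 16-bit samples (2 bytes) alongside 16 kHz mono.
        if fmt != (1, 2, 16000):
            raise ValueError(f"expected 16 kHz mono 16-bit PCM, got {fmt}")
        return wav.readframes(wav.getnframes())


def iter_chunks(pcm, sample_rate=16000, sample_width=2, channels=1, chunk_ms=100):
    """Yield fixed-duration slices of PCM bytes (the last one may be shorter)."""
    size = sample_rate * sample_width * channels * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def with_retries(send, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send`, retrying transient network errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return send()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Validating the format before upload fails fast on unsuitable recordings, and injecting `sleep` keeps the retry logic easy to test without real delays.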
For real‑time scenarios, use WebSocket or SignalR; for batch, use Azure Functions with Blob Storage. For advanced customization, use Azure ML.
Service | When to use | Output |
---|---|---|
STT | Real‑time or batch transcription | Text with timestamps, diarization |
TTS | Voice assistants, audio content | Synthesized audio, SSML |
Translation | Multilingual meetings/support | Live transcriptions & translations |
Speaker | Authentication & personalization | Speaker ID/verification with scores |
Custom Voice | Controlled branded voice | Voice model + usage policies |
Reduce noise and reverberation; use consistent microphones, proper gain and 16 kHz sampling.
Provide clear notices, keep retention minimal, anonymize data and restrict access by role.
Streaming for real‑time, batch for long files; caching, compression and quota controls.
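As one possible reading of the caching advice, the sketch below memoizes synthesized audio so that repeated prompts do not consume quota twice. The class name `SynthesisCache`, the in-memory dictionary and the `synthesize(text, voice)` callable are all hypothetical; a production setup would more likely use a shared store with expiry.

```python
import hashlib


class SynthesisCache:
    """Memoize audio returned by a caller-supplied `synthesize(text, voice)`."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._store = {}
        self.misses = 0  # number of calls that reached the service

    def get(self, text, voice):
        # Hash voice and text together so each voice gets its own entry.
        key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(text, voice)
        return self._store[key]
```

Tracking `misses` gives a simple signal for how much service usage the cache is actually saving.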
No. Processing runs in Azure; optimize codec/bitrate and network conditions.
Pick the right locale, apply lexical adaptation and evaluate custom dictionaries.
Yes. Apply content filters and SSML rules; add human review for public content.