ElevenLabs Create Speech API Documentation: HTTP Text-to-Speech
This guide covers the ElevenLabs (11 Labs) Create Speech API, the core HTTP endpoint for converting text into audio. It describes the endpoint, headers, request body, and response so that developers can integrate the ElevenLabs TTS API into applications, services, and platforms.
Overview of ElevenLabs Create Speech API
The Create Speech API accepts the full text input in a standard HTTP POST request and generates audio output in the desired voice and format. It suits scenarios where the entire text is ready upfront, returning the audio in one response without the complexity of managing streaming connections.
API Endpoint and Method
Send a POST request to:
https://api.elevenlabs.io/v1/text-to-speech/:voice_id
- Method: POST
- Path Parameter: voice_id, the unique identifier of the chosen voice model.
Required Headers
- xi-api-key: Your ElevenLabs API key for authorization.
- Content-Type: application/json
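As a minimal sketch of the endpoint and headers above, the helpers below assemble the URL and the header set in Python. The names speech_url and auth_headers are illustrative only and are not part of any ElevenLabs SDK.

```python
from urllib.parse import quote

BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def speech_url(voice_id: str) -> str:
    """Return the Create Speech URL for one voice (hypothetical helper)."""
    if not voice_id:
        raise ValueError("voice_id must be a non-empty string")
    # Percent-encode the path segment so unusual characters cannot alter the path.
    return f"{BASE_URL}/{quote(voice_id, safe='')}"


def auth_headers(api_key: str) -> dict:
    """Return the two required headers listed above (hypothetical helper)."""
    return {"xi-api-key": api_key, "Content-Type": "application/json"}
```

Keeping the key in a header-building function makes it easier to load it from an environment variable rather than hard-coding it in request code.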
Request Body Schema
The request body is structured in JSON format, containing several customizable parameters:
{
  "text": "Your text here.",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0,
    "use_speaker_boost": true,
    "speed": 1
  },
  "model_id": "eleven_monolingual_v1",
  "pronunciation_dictionary_locators": [
    {
      "pronunciation_dictionary_id": "dict_id",
      "version_id": "version_id"
    }
  ],
  "output_format": "mp3",
  "enable_ssml": false,
  "seed": 123456
}
Required Field
- text: The full text to synthesize.
Optional Fields
- voice_settings:
  - stability: Controls speech consistency.
  - similarity_boost: Enhances voice similarity.
  - style: Adjusts expressiveness (V2+ models).
  - use_speaker_boost: Default true; strengthens speaker effect (V2+).
  - speed: Speech rate (0.7 to 1.2).
- model_id: Specific TTS model to use.
- pronunciation_dictionary_locators: Override default pronunciation.
- output_format: Audio format (mp3, wav, pcm, etc.).
- enable_ssml: Enable parsing of SSML tags (default false).
- seed: Deterministic output sampling.
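The field list above can be turned into a small payload builder. This is a sketch under the assumption that the body has exactly the shape described in this section; build_payload and ALLOWED_VOICE_SETTINGS are hypothetical names, and the only validation applied is the speed range stated above.

```python
import json

# Voice setting names taken from the field list above.
ALLOWED_VOICE_SETTINGS = {
    "stability", "similarity_boost", "style", "use_speaker_boost", "speed",
}


def build_payload(text, *, model_id=None, seed=None, **voice_settings):
    """Assemble a request body, including only the keys the caller sets."""
    if not text:
        raise ValueError("text is required")
    unknown = set(voice_settings) - ALLOWED_VOICE_SETTINGS
    if unknown:
        raise ValueError(f"unknown voice settings: {sorted(unknown)}")
    speed = voice_settings.get("speed")
    # The 0.7 to 1.2 range comes from the field list above.
    if speed is not None and not 0.7 <= speed <= 1.2:
        raise ValueError("speed must be between 0.7 and 1.2")
    payload = {"text": text}
    if voice_settings:
        payload["voice_settings"] = voice_settings
    if model_id is not None:
        payload["model_id"] = model_id
    if seed is not None:
        payload["seed"] = seed
    return payload


body = json.dumps(build_payload("Hello there.", stability=0.5, speed=1.1))
```

Omitting unset keys, rather than sending explicit nulls, leaves the service free to apply its own defaults.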
Available Output Formats
The output_format field supports:
- mp3 (default)
- pcm_16000
- pcm_22050
- pcm_24000
- pcm_44100
- ulaw_8000
- ulaw_16000
- alaw_8000
- alaw_16000
- ogg_vorbis
- aac
- flac
- wav
Choose based on your playback environment and bandwidth needs.
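When choosing a format in code, it can help to validate the name and read the sample rate from it before sending a request. The sketch below uses only the format names listed above and assumes that a numeric suffix such as 16000 is the sample rate in Hz; sample_rate_hz is a hypothetical helper.

```python
# Format names copied from the list above.
SUPPORTED_FORMATS = {
    "mp3", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
    "ulaw_8000", "ulaw_16000", "alaw_8000", "alaw_16000",
    "ogg_vorbis", "aac", "flac", "wav",
}


def sample_rate_hz(output_format):
    """Return the sample rate encoded in a format name, or None if absent."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    # Split on the last underscore: "pcm_16000" -> ("pcm", "_", "16000").
    codec, _, suffix = output_format.rpartition("_")
    # Assumption: a purely numeric suffix is the sample rate in Hz.
    return int(suffix) if codec and suffix.isdigit() else None
```

For example, a telephony integration might check that sample_rate_hz("ulaw_8000") matches the 8000 Hz rate its audio pipeline expects.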
Example Request
POST /v1/text-to-speech/21m00Tcm4TlvDq8ikWAM HTTP/1.1
Host: api.elevenlabs.io
xi-api-key: YOUR_API_KEY
Content-Type: application/json

{
  "text": "Welcome to ElevenLabs TTS API.",
  "voice_settings": {
    "stability": 0.6,
    "similarity_boost": 0.8,
    "speed": 1
  },
  "output_format": "mp3"
}
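The same request can be expressed with Python's standard library. This is a sketch rather than an official client: build_request and send are hypothetical helpers, and the body mirrors the example above as written. Nothing is sent until send() is called with a valid key.

```python
import json
import urllib.request


def build_request(voice_id, api_key, payload):
    """Create (but do not send) the POST request shown in the example."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and return the raw response bytes (network I/O)."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


request = build_request(
    "21m00Tcm4TlvDq8ikWAM",
    "YOUR_API_KEY",
    {
        "text": "Welcome to ElevenLabs TTS API.",
        "voice_settings": {"stability": 0.6, "similarity_boost": 0.8, "speed": 1},
        "output_format": "mp3",
    },
)
# Replace YOUR_API_KEY with a real key, then call send(request).
```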
Example Response
{
  "audio": "<base64_encoded_audio>",
  "alignment": {
    "charStartTimesMs": [0, 5, 10, 15],
    "charsDurationsMs": [5, 5, 5, 5],
    "chars": ["W", "e", "l", "c"]
  }
}
- audio: Base64-encoded audio file.
- alignment: Optional; per-character timing data.
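A response of the shape shown above could be unpacked as follows. The sketch assumes the JSON has already been parsed into a dict with exactly the field names in the example; decode_speech_response is a hypothetical helper.

```python
import base64


def decode_speech_response(response):
    """Split a parsed response into audio bytes and per-character timings.

    Returns (audio_bytes, timings), where timings is a list of
    (char, start_ms, end_ms) tuples. The list is empty when the
    optional alignment block is missing.
    """
    audio = base64.b64decode(response["audio"])
    alignment = response.get("alignment") or {}
    timings = [
        (char, start, start + duration)
        for char, start, duration in zip(
            alignment.get("chars", []),
            alignment.get("charStartTimesMs", []),
            alignment.get("charsDurationsMs", []),
        )
    ]
    return audio, timings
```

The returned bytes can be written to a file opened in binary mode, for example with open("speech.mp3", "wb").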
SSML Support
Enable SSML (Speech Synthesis Markup Language) by setting enable_ssml: true. This allows:
- <break>: Insert pauses.
- <emphasis>: Emphasize words.
- <prosody>: Control pitch, rate, volume.
Example:
{
  "text": "<speak>Hello <break time='500ms'/>world!</speak>",
  "enable_ssml": true
}
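When SSML is assembled from user-supplied text, characters such as & and < need escaping so they are not read as markup. The sketch below builds the same kind of payload as the example; speak_with_pause is a hypothetical helper, and the enable_ssml flag is used as described in this section.

```python
import json
from xml.sax.saxutils import escape


def speak_with_pause(before, after, pause_ms):
    """Wrap two text fragments in <speak>, separated by a <break> pause."""
    if pause_ms < 0:
        raise ValueError("pause_ms must not be negative")
    # escape() replaces &, < and > so user text cannot be read as markup.
    return (
        f"<speak>{escape(before)}"
        f'<break time="{int(pause_ms)}ms"/>'
        f"{escape(after)}</speak>"
    )


payload = {"text": speak_with_pause("Hello ", "world!", 500), "enable_ssml": True}
# json.dumps escapes the double quotes inside the SSML string.
body = json.dumps(payload)
```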
Pronunciation Dictionaries
Integrate dictionaries for custom word pronunciation:
"pronunciation_dictionary_locators": [
  {
    "pronunciation_dictionary_id": "custom_dict_id",
    "version_id": "v1"
  }
]
Deterministic Output with Seed
Use seed to request repeatable sampling, so that identical requests (same text, voice, model, and settings) produce consistent output:
"seed": 123456
Valid values: 0 to 4294967295.
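A seed can be range-checked before use, and a stable seed can be derived from an identifier so the same content always gets the same value. This is a sketch; validate_seed and seed_from_key are hypothetical helpers, and deriving seeds from CRC-32 is an illustrative choice rather than anything the API requires.

```python
import zlib

MAX_SEED = 4294967295  # 2**32 - 1, the upper bound given above


def validate_seed(seed):
    """Return seed unchanged if it is an integer in the valid range."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    return seed


def seed_from_key(key):
    """Derive a stable seed from a string such as a script or scene ID."""
    # zlib.crc32 returns an unsigned 32-bit integer, matching the seed range.
    return zlib.crc32(key.encode("utf-8"))
```

Using seed_from_key("chapter-01") for every regeneration of the same chapter keeps the seed constant without storing it separately.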
Best Practices for ElevenLabs Create Speech API
- Prepare your full text upfront for efficient processing.
- Leverage SSML for nuanced speech control.
- Tune voice_settings to achieve the desired vocal tone.
- Select the correct output_format based on your platform requirements.
- Use pronunciation_dictionary_locators to perfect speech clarity for unique terms.
- Enable alignment data for syncing subtitles or text highlights with audio.
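For the subtitle use case in the last point, per-character alignment data can be grouped into word-level timings. The sketch below assumes three parallel lists of characters, start times, and durations in milliseconds, as in the example response; word_timings is a hypothetical helper that splits on whitespace only.

```python
def word_timings(chars, start_ms, duration_ms):
    """Group per-character timings into (word, start_ms, end_ms) tuples."""
    words = []
    current, word_start, word_end = "", None, None
    for char, start, duration in zip(chars, start_ms, duration_ms):
        if char.isspace():
            # Whitespace closes the word collected so far, if any.
            if current:
                words.append((current, word_start, word_end))
            current, word_start, word_end = "", None, None
            continue
        if not current:
            word_start = start  # first character of a new word
        current += char
        word_end = start + duration
    if current:
        words.append((current, word_start, word_end))
    return words
```

Each tuple can then drive a highlight or subtitle cue that starts at start_ms and ends at end_ms.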
The ElevenLabs Create Speech API delivers powerful, high-fidelity text-to-speech synthesis for a wide range of use cases. Its straightforward POST-based architecture ensures that integrating dynamic voice capabilities into applications is both easy and flexible.