ElevenLabs Create Speech API Documentation: HTTP Text-to-Speech

This guide covers the ElevenLabs (11 Labs) Create Speech API, the core HTTP endpoint for converting text into audio. It describes the endpoint, headers, request body, and response so that developers can integrate the 11 Labs TTS API into applications, services, and platforms.

Overview of ElevenLabs Create Speech API

The Create Speech API accepts the full text input in a standard HTTP POST request and returns audio in the requested voice and format. It suits scenarios where the entire text is available up front, since it delivers the audio without the complexity of managing streaming connections.

API Endpoint and Method

Send a POST request to:

https://api.elevenlabs.io/v1/text-to-speech/:voice_id

  • Method: POST
  • Path Parameter:
    • voice_id — The unique identifier of the chosen voice model.
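
The path parameter above can be filled in with a small helper. This is a minimal sketch: the function name build_speech_url is hypothetical, and percent-encoding the ID is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Base path taken from the endpoint shown above.
BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech"


def build_speech_url(voice_id: str) -> str:
    """Return the Create Speech URL for the given voice (hypothetical helper)."""
    if not voice_id:
        raise ValueError("voice_id must be a non-empty string")
    # safe="" also encodes "/", so the ID cannot add extra path segments.
    return f"{BASE_URL}/{quote(voice_id, safe='')}"
```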

Required Headers

  • xi-api-key: Your ElevenLabs API key for authorization.
  • Content-Type: application/json
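
As an illustration, the two headers above could be assembled as follows. The helper name build_headers is hypothetical, and how the key is stored (environment variable, secret manager) is left to the caller.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the two headers listed above (hypothetical helper)."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Header names are exactly those given in this section.
        "xi-api-key": api_key.strip(),
        "Content-Type": "application/json",
    }
```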

Request Body Schema

The request body is structured in JSON format, containing several customizable parameters:

{
  "text": "Your text here.",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0,
    "use_speaker_boost": true,
    "speed": 1
  },
  "model_id": "eleven_monolingual_v1",
  "pronunciation_dictionary_locators": [
    {
      "pronunciation_dictionary_id": "dict_id",
      "version_id": "version_id"
    }
  ],
  "output_format": "mp3",
  "enable_ssml": false,
  "seed": 123456
}

Required Field

  • text: The full text to synthesize.

Optional Fields

  • voice_settings:
    • stability: Controls speech consistency.
    • similarity_boost: Enhances voice similarity.
    • style: Adjusts expressiveness (V2+ models).
    • use_speaker_boost: Default true; strengthens speaker effect (V2+).
    • speed: Speech rate (0.7 to 1.2).
  • model_id: Specific TTS model to use.
  • pronunciation_dictionary_locators: Pronunciation dictionaries to apply, overriding default pronunciations.
  • output_format: Audio format (mp3, wav, pcm, etc.).
  • enable_ssml: Enable parsing of SSML tags (default false).
  • seed: Integer seed for deterministic output sampling.
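
A request body with only the fields a caller actually sets might be built like this. It is a sketch under the schema described in this section: build_payload is a hypothetical name, only a subset of the optional fields is covered, and the speed check uses the 0.7 to 1.2 range stated above.

```python
from typing import Any, Optional


def build_payload(
    text: str,
    *,
    model_id: Optional[str] = None,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    speed: Optional[float] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Build a request body, omitting optional fields that were not set."""
    if not text:
        raise ValueError("text is required")
    if speed is not None and not 0.7 <= speed <= 1.2:
        raise ValueError("speed must be between 0.7 and 1.2")

    # Keep only the voice settings the caller supplied.
    candidates = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "speed": speed,
    }
    voice_settings = {k: v for k, v in candidates.items() if v is not None}

    payload: dict[str, Any] = {"text": text}
    if voice_settings:
        payload["voice_settings"] = voice_settings
    if model_id is not None:
        payload["model_id"] = model_id
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Leaving unset fields out of the body, rather than sending null values, lets the service apply its own defaults.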

Available Output Formats

The output_format field supports:

  • mp3 (default)
  • pcm_16000
  • pcm_22050
  • pcm_24000
  • pcm_44100
  • ulaw_8000
  • ulaw_16000
  • alaw_8000
  • alaw_16000
  • ogg_vorbis
  • aac
  • flac
  • wav

Choose based on your playback environment and bandwidth needs.
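
When the format is chosen programmatically, it can help to validate it against the list above and read the sample rate from the name. The sketch below assumes that a trailing number in a format name is the sample rate in Hz, which matches names such as pcm_16000 but is not stated explicitly here; sample_rate_hz is a hypothetical helper.

```python
from typing import Optional

# Format names copied from the list in this section.
SUPPORTED_FORMATS = frozenset({
    "mp3", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
    "ulaw_8000", "ulaw_16000", "alaw_8000", "alaw_16000",
    "ogg_vorbis", "aac", "flac", "wav",
})


def sample_rate_hz(output_format: str) -> Optional[int]:
    """Return the sample rate encoded in the format name, or None if absent."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    codec, _, suffix = output_format.rpartition("_")
    # Names without a numeric suffix (mp3, ogg_vorbis, wav) carry no rate.
    return int(suffix) if codec and suffix.isdigit() else None
```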

Example Request

POST /v1/text-to-speech/21m00Tcm4TlvDq8ikWAM HTTP/1.1
Host: api.elevenlabs.io
xi-api-key: YOUR_API_KEY
Content-Type: application/json

{
  "text": "Welcome to ElevenLabs TTS API.",
  "voice_settings": {
    "stability": 0.6,
    "similarity_boost": 0.8,
    "speed": 1
  },
  "output_format": "mp3"
}
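
The same request could be issued from Python using only the standard library. This is a sketch, not an official client: prepare_request and send are hypothetical names, the 30-second timeout is an arbitrary choice, and send returns the raw response body without interpreting it.

```python
import json
import urllib.request


def prepare_request(voice_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request shown above without sending it."""
    return urllib.request.Request(
        url=f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Separating construction from sending makes the request easy to inspect or log before any network call is made.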

Example Response

{
  "audio": "<base64_encoded_audio>",
  "alignment": {
    "charStartTimesMs": [0, 5, 10, 15],
    "charsDurationsMs": [5, 5, 5, 5],
    "chars": ["W", "e", "l", "c"]
  }
}

  • audio: Base64-encoded audio file.
  • alignment: Optional; per-character timing data.
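
A response of this shape could be unpacked as follows. The sketch assumes the field names exactly as shown in the example above (audio, alignment, chars, charStartTimesMs, charsDurationsMs); decode_speech_response is a hypothetical helper.

```python
import base64
import json


def decode_speech_response(body: str) -> tuple[bytes, list[tuple[str, int, int]]]:
    """Return (audio_bytes, [(char, start_ms, duration_ms), ...])."""
    data = json.loads(body)
    audio = base64.b64decode(data["audio"])

    # alignment is optional, so fall back to empty lists when it is missing.
    alignment = data.get("alignment") or {}
    timings = list(zip(
        alignment.get("chars", []),
        alignment.get("charStartTimesMs", []),
        alignment.get("charsDurationsMs", []),
    ))
    return audio, timings
```

The decoded bytes can then be written to a file opened in binary mode, with an extension that matches the requested output_format.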

SSML Support

Enable SSML (Speech Synthesis Markup Language) by setting enable_ssml: true. This allows:

  • <break> — Insert pauses.
  • <emphasis> — Emphasize words.
  • <prosody> — Control pitch, rate, volume.

Example:

{
  "text": "<speak>Hello <break time='500ms'/>world!</speak>",
  "enable_ssml": true
}
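
When SSML is assembled from user-supplied text, characters such as & and < need escaping so they are not read as markup. The helpers below are a hypothetical sketch of that step, using the enable_ssml flag as described in this section.

```python
from xml.sax.saxutils import escape


def speak_with_pause(before: str, after: str, pause_ms: int) -> str:
    """Wrap two text fragments in <speak> with a <break> between them."""
    if pause_ms < 0:
        raise ValueError("pause_ms must not be negative")
    # escape() replaces &, < and > so plain text cannot inject tags.
    return f'<speak>{escape(before)}<break time="{pause_ms}ms"/>{escape(after)}</speak>'


def ssml_payload(ssml: str) -> dict:
    """Build a request body that asks for SSML parsing."""
    return {"text": ssml, "enable_ssml": True}
```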

Pronunciation Dictionaries

Integrate dictionaries for custom word pronunciation:

"pronunciation_dictionary_locators": [
  {
    "pronunciation_dictionary_id": "custom_dict_id",
    "version_id": "v1"
  }
]
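
If dictionary references are kept as (dictionary ID, version ID) pairs in application code, the locator list can be generated from them. build_locators is a hypothetical helper, and the IDs in the usage line are placeholders, not real dictionaries.

```python
from typing import Iterable


def build_locators(dictionaries: Iterable[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (dictionary_id, version_id) pairs into locator objects."""
    locators = []
    for dictionary_id, version_id in dictionaries:
        if not dictionary_id or not version_id:
            raise ValueError("both dictionary_id and version_id are required")
        locators.append({
            "pronunciation_dictionary_id": dictionary_id,
            "version_id": version_id,
        })
    return locators


# Placeholder IDs, matching the example above.
payload_fragment = {
    "pronunciation_dictionary_locators": build_locators([("custom_dict_id", "v1")])
}
```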

Deterministic Output with Seed

Set seed to the same value across requests to get consistent output for identical inputs:

"seed": 123456

Valid values: 0 to 4294967295.
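
The valid range is that of an unsigned 32-bit integer, so a seed can be checked before sending, or derived from a stable key such as a cache identifier. Both helpers below are hypothetical; deriving the seed with CRC-32 is one possible design choice, not something the API prescribes.

```python
import zlib

MAX_SEED = 4294967295  # upper bound stated above (2**32 - 1)


def validate_seed(seed: int) -> int:
    """Return the seed unchanged if it lies within the documented range."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    return seed


def seed_from_key(key: str) -> int:
    """Derive a repeatable seed from a string key."""
    # zlib.crc32 returns an unsigned 32-bit value, which fits the range above.
    return validate_seed(zlib.crc32(key.encode("utf-8")))
```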

Best Practices for ElevenLabs Create Speech API

  • Prepare your full text upfront for efficient processing.
  • Leverage SSML for nuanced speech control.
  • Tune voice_settings to achieve the desired vocal tone.
  • Select the correct output_format based on your platform requirements.
  • Use pronunciation_dictionary_locators to perfect speech clarity for unique terms.
  • Enable alignment data for syncing subtitles or text highlights with audio.
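
For the subtitle use case in the last point, per-character timings can be grouped into word-level timings. This sketch assumes alignment data shaped like the Example Response (parallel lists of characters, start times, and durations in milliseconds); word_timings is a hypothetical helper that splits on whitespace only.

```python
def word_timings(chars, start_ms, duration_ms):
    """Group per-character timings into (word, start_ms, end_ms) tuples."""
    words = []
    current, word_start, word_end = "", 0, 0
    for ch, start, duration in zip(chars, start_ms, duration_ms):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current:
                words.append((current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start
        current += ch
        word_end = start + duration
    if current:
        words.append((current, word_start, word_end))
    return words
```

Each tuple gives the interval during which a word should be highlighted or shown as a caption.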

The ElevenLabs Create Speech API provides high-fidelity text-to-speech synthesis through a single POST request, which keeps integration straightforward whenever the full text is known in advance.