ElevenLabs Voices

✴️ ElevenLabs TTS



ElevenLabs, a Text to Speech provider known for highly realistic voices, is now integrated into D-ID's API. ElevenLabs offers rich, lifelike voices for creators and publishers who need high-quality storytelling tools. Simply choose a voice and reference it in your API request.

✴️ Example Usage

To use an ElevenLabs voice for text to speech, set the provider "type" to "elevenlabs" and pass the "voice_id" of your chosen voice in the "script" object. The three examples below show a basic request, a request with a custom voice_config, and a request that uses SSML.

Basic request:

{
   "script":{
      "type":"text",
      "provider":{
         "type":"elevenlabs",
         "voice_id":"21m00Tcm4TlvDq8ikWAM"
      }
   }
}

With voice_config:

{
   "script":{
      "type":"text",
      "provider":{
         "type":"elevenlabs",
         "voice_id":"21m00Tcm4TlvDq8ikWAM",
         "voice_config":{
            "stability":0.5,
            "similarity_boost":0.75
         }
      }
   }
}

With SSML:

{
   "script":{
      "type":"text",
      "ssml": true,
      "input": "Enjoy the 3 seconds of <break time=\"3s\"/> silence!",
      "provider":{
         "type":"elevenlabs",
         "voice_id":"21m00Tcm4TlvDq8ikWAM"
      }
   }
}
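The request shapes above differ only in which optional fields are present. As a rough sketch, a Python helper could assemble the "script" object for any of them. The function name build_elevenlabs_script is hypothetical and not part of D-ID's API, and it only builds the script portion; any other fields your request needs are not covered here.

```python
import json
from typing import Optional


def build_elevenlabs_script(
    text: str,
    voice_id: str,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    ssml: bool = False,
) -> dict:
    """Build the "script" part of a request body for an ElevenLabs voice.

    Hypothetical helper mirroring the JSON examples above; "input" carries
    the text (or SSML) to speak.
    """
    provider = {"type": "elevenlabs", "voice_id": voice_id}

    # Only add voice_config when the caller overrides at least one value.
    voice_config = {}
    if stability is not None:
        voice_config["stability"] = stability
    if similarity_boost is not None:
        voice_config["similarity_boost"] = similarity_boost
    if voice_config:
        provider["voice_config"] = voice_config

    script = {"type": "text", "input": text, "provider": provider}
    if ssml:
        script["ssml"] = True
    return {"script": script}


if __name__ == "__main__":
    body = build_elevenlabs_script(
        'Enjoy the 3 seconds of <break time="3s"/> silence!',
        "21m00Tcm4TlvDq8ikWAM",
        ssml=True,
    )
    print(json.dumps(body, indent=3))
```

Leaving voice_config out when no value is overridden keeps the body identical to the basic example, so the provider's defaults apply.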

✴️ Available Voices

Go to the ElevenLabs website and:

  1. Select any of the premade voices and listen to how it sounds
  2. Copy the name of the voice you selected
  3. Fetch the voice_id that matches that voice name from here

📘

Premium Voices ⭐️

The ElevenLabs provider is available on paid plans only.

📘

Get all Text-to-Speech supported voices

See the /voices endpoint to get all supported voices from every integrated TTS provider.
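If you prefer to look up a voice_id programmatically instead of by hand, a minimal Python sketch could fetch the voice list and match on the voice name. The URL https://api.d-id.com/tts/voices, the response shape (a JSON list of objects with "id", "name" and "provider" keys) and the helper names fetch_voices and find_voice_id are assumptions for illustration; check the /voices endpoint reference for the actual URL, authentication and fields.

```python
import json
import urllib.request
from typing import Optional

# ASSUMPTION: endpoint URL and response shape are not confirmed by this page.
VOICES_URL = "https://api.d-id.com/tts/voices"


def fetch_voices(authorization: str, url: str = VOICES_URL) -> list:
    """GET the voice list. `authorization` is the full Authorization header value."""
    req = urllib.request.Request(
        url, headers={"Authorization": authorization, "accept": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_voice_id(voices: list, name: str, provider: str = "elevenlabs") -> Optional[str]:
    """Return the id of the first voice from `provider` whose name matches (case-insensitive)."""
    wanted = name.strip().lower()
    for voice in voices:
        if voice.get("provider") != provider:
            continue
        if str(voice.get("name", "")).strip().lower() == wanted:
            return voice.get("id")
    return None  # no voice with that name for this provider
```

Keeping the matching logic in a pure function means it can be tested without network access; only fetch_voices talks to the API.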

👍

Using other text to speech providers

You can also generate speech with any other external TTS provider you like and pass the result as an audio URL instead, or upload it as an audio file.
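When the audio comes from another provider, the script points at the audio instead of carrying text. The sketch below assumes a script of the form {"type": "audio", "audio_url": ...}; those field names and the helper name build_audio_script are assumptions, so confirm them in the D-ID API reference before relying on this.

```python
from urllib.parse import urlparse


def build_audio_script(audio_url: str) -> dict:
    """Build a "script" that references pre-generated audio instead of text.

    ASSUMPTION: D-ID accepts {"type": "audio", "audio_url": ...}.
    """
    parsed = urlparse(audio_url)
    # Reject values that cannot be fetched remotely (relative paths, file:// URLs, etc.).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"audio_url must be an absolute http(s) URL, got {audio_url!r}")
    return {"script": {"type": "audio", "audio_url": audio_url}}
```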

✴️ Voice Config

Stability: The stability parameter determines how stable the voice is and how much randomness each new generation has. Lowering it gives the character a broader emotional range, although this is also heavily influenced by the original voice. Setting it too low may produce odd, overly random performances and can make the character speak too quickly. Setting it too high can lead to a monotonous voice with limited emotion. Default value: 0.5

Similarity: The similarity_boost parameter controls how closely the AI adheres to the original voice when replicating it. If the original audio is of poor quality and similarity_boost is set too high, the AI may reproduce artifacts or background noise that were present in the original recording. Default value: 0.75
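The two parameters and their defaults map directly onto the "voice_config" object. A small Python dataclass can hold them with the defaults stated above and reject out-of-range values early. The class name VoiceConfig is hypothetical, and the 0 to 1 range check is an assumption not stated on this page.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceConfig:
    # Defaults taken from the descriptions above.
    stability: float = 0.5
    similarity_boost: float = 0.75

    def __post_init__(self):
        # ASSUMPTION: both values are expected between 0.0 and 1.0.
        for field_name in ("stability", "similarity_boost"):
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field_name} must be between 0 and 1, got {value}")

    def to_dict(self) -> dict:
        """Return the value to place under provider["voice_config"]."""
        return asdict(self)
```

For example, VoiceConfig(stability=0.3).to_dict() yields a config with a wider emotional range while keeping the default similarity_boost.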

✴️ Adding Pauses

There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker.
The most consistent way is to use SSML (Speech Synthesis Markup Language): set "ssml": true in the script and insert a break tag in the input, as in the example below. Adding <break time="3s"/> creates an exact, natural pause of 3 seconds in the speech. The AI does not simply add silence between words; it understands this syntax and produces a natural-sounding pause.

{
   "script":{
      "type":"text",
      "ssml": true,
      "input": "Enjoy the 3 seconds of <break time=\"3s\"/> silence!",
      "provider":{
         "type":"elevenlabs",
         "voice_id":"21m00Tcm4TlvDq8ikWAM"
      }
   }
}
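If your text comes in pieces, a small Python helper can join them with break tags to produce an "input" like the one above. The helper name with_pauses is hypothetical; it XML-escapes each text segment, a general SSML precaution (not something this page requires) so characters such as & or < do not break the markup.

```python
from xml.sax.saxutils import escape


def with_pauses(segments: list, pause_seconds: float = 1.0) -> str:
    """Join text segments with an SSML <break> tag between each pair.

    Each segment is XML-escaped; the pause length is written in seconds.
    """
    tag = f'<break time="{pause_seconds:g}s"/>'
    return f" {tag} ".join(escape(s) for s in segments)
```

For example, with_pauses(["Enjoy the 3 seconds of", "silence!"], 3) builds the same input string as the request above; remember to also set "ssml": true.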

Please note the following limitations:

  • Break times must be given in seconds, and each pause can be at most 3 seconds long.
  • An excessive number of break tags has been shown to cause instability in the ElevenLabs AI: the speech may speed up and become very fast, or the audio may contain more noise and other strange artifacts. The ElevenLabs team is working on resolving these issues.
  • Due to an ElevenLabs limitation, the video will not be generated if the input contains only pauses and no actual text.
    If you need to create a fully silent video, please refer to the Microsoft TTS silent video example.
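The limitations above can be checked before a request is sent. This Python sketch flags break times not given in seconds, pauses longer than 3 seconds, inputs with no spoken text, and a large number of break tags. The function name check_breaks is hypothetical, and max_breaks=5 is an arbitrary threshold: the docs only say an "excessive" number of breaks can cause instability, without giving a number.

```python
import re

# Matches tags such as <break time="3s"/> or <break time="1.5s" />.
BREAK_RE = re.compile(r'<break\s+time="([^"]*)"\s*/>')
MAX_PAUSE_SECONDS = 3.0  # from the limitations listed above


def check_breaks(ssml_input: str, max_breaks: int = 5) -> list:
    """Return a list of problems found in an SSML input string (empty if none)."""
    problems = []
    times = BREAK_RE.findall(ssml_input)
    for t in times:
        m = re.fullmatch(r"(\d+(?:\.\d+)?)s", t)
        if m is None:
            problems.append(f'break time "{t}" is not given in seconds')
        elif float(m.group(1)) > MAX_PAUSE_SECONDS:
            problems.append(f'break time "{t}" exceeds {MAX_PAUSE_SECONDS:g} seconds')
    # Hypothetical threshold for "excessive" break tags.
    if len(times) > max_breaks:
        problems.append(f"{len(times)} break tags may make the speech unstable")
    # Strip every tag and make sure some spoken text remains.
    if not re.sub(r"<[^>]+>", "", ssml_input).strip():
        problems.append("input contains only pauses; no video would be generated")
    return problems
```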

Please visit the ElevenLabs Prompting page for more information on pauses.

✴️ Use your own ElevenLabs account's voices

If you already have an ElevenLabs account and want to use your own ElevenLabs voices, including cloned voices, follow these steps:

  1. Log in to your ElevenLabs account and copy your API key from your profile
  2. Copy the Voice ID of your desired voice from ElevenLabs' VoiceLab
  3. Send a /talks, /clips, or /talks/streams request with the following parameters:

Request Header:

Key                   Value (must be sent as a "String" type only)
x-api-key-external    {"elevenlabs": "YOUR_ELEVENLABS_API_KEY"}

Request Body:

{
    "script": {
        "type": "text",
        "input": "This video was created using my own ElevenLabs voice",
        "provider": {
            "type": "elevenlabs",
            "voice_id": "YOUR_ELEVENLABS_VOICE_ID"
        }
    }
}
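The one subtle point in these steps is that x-api-key-external carries a JSON object serialized as a string, not a nested object. A minimal Python sketch using only the standard library might build the headers and send the request as follows. The helper names, the URL https://api.d-id.com/talks and the format of your D-ID Authorization value are assumptions; check the API reference for the exact base URL and authentication scheme.

```python
import json
import urllib.request

# ASSUMPTION: base URL for the /talks endpoint; verify in the API reference.
D_ID_TALKS_URL = "https://api.d-id.com/talks"


def build_external_key_headers(d_id_authorization: str, elevenlabs_api_key: str) -> dict:
    """Headers for a request that uses your own ElevenLabs account.

    x-api-key-external must be a string, so the object is serialized with json.dumps.
    """
    return {
        "Authorization": d_id_authorization,  # your D-ID credentials (format assumed)
        "Content-Type": "application/json",
        "x-api-key-external": json.dumps({"elevenlabs": elevenlabs_api_key}),
    }


def create_talk(body: dict, headers: dict, url: str = D_ID_TALKS_URL) -> dict:
    """POST the request body and return the parsed JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be create_talk(body, build_external_key_headers(auth, "YOUR_ELEVENLABS_API_KEY")), with body set to the request body shown above plus any other fields your request needs.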

✴️ Support


Have any questions? We are here to help! Please leave your question in the Discussions section and we will be happy to answer shortly.
