Text to Speech Endpoint
Overview
The Text-to-Speech endpoint generates audio from a text input using a pre-trained voice.
Sample Generation
Example 1
Prompt
In the ancient land of Eldoria, where the skies were painted with shades of mystic hues and the forests whispered secrets of old, there existed a dragon named Zephyros. Unlike the fearsome tales of dragons that plagued human hearts with terror, Zephyros was a creature of wonder and wisdom, revered by all who knew of his existence.
Generated Output
Request
curl --request POST 'https://modelslab.com/api/v1/enterprise/text_to_speech/make' \
--header 'Content-Type: application/json'
Make a POST request to the https://modelslab.com/api/v1/enterprise/text_to_speech/make endpoint and pass the required parameters in the request body.
Body Attributes
| Parameter | Description | Values |
|---|---|---|
| key | The API key used for authenticating your request. | String |
| prompt | The text prompt that describes the audio to be generated. | Text |
| voice_id | The ID of a trained voice. You can get the list of trained voices here. | String (trained voice ID) |
| language | The language for the voice. Defaults to english if not specified. | american english, british english, spanish, japanese, mandarin chinese, french, brazilian portuguese, hindi, italian |
| output_format | The format of the generated audio. Defaults to wav if not specified. | wav or mp3 |
| speed | Playback speed of the generated audio. Defaults to 1.0. | Integral value |
| emotion | Whether to enable emotion support. Currently only available in English. Defaults to false. | TRUE or FALSE |
| temp | Whether to use temporary links, which are valid for 24 hours. This can help if access to certain storage sites is blocked. Defaults to false. | TRUE or FALSE |
| webhook | A URL where the API will send a POST request once the audio generation is complete. | URL |
| track_id | An ID returned in the API response, used to identify webhook requests. | Integral value |
Note: If you use language: en, then emotion: true is required to use emotion tags in your prompt.
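The parameter table can be mirrored in a small client-side helper that assembles the request body before it is sent. The following is a minimal sketch, not part of the API: `build_tts_payload` is a hypothetical name, and the checks only reflect the values listed in the table above (the server may accept more, or validate differently).

```python
# Hypothetical helper: builds a request body from the documented parameters.
# The allowed output formats are copied from the parameter table; the API
# itself may behave differently, so treat these checks as a convenience only.

SUPPORTED_FORMATS = {"wav", "mp3"}


def build_tts_payload(key, prompt, voice_id, language=None,
                      output_format=None, speed=None, emotion=None,
                      temp=None, webhook=None, track_id=None):
    """Return a dict suitable for JSON-encoding as the request body."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if output_format is not None and output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    payload = {"key": key, "prompt": prompt, "voice_id": voice_id}
    optional = {
        "language": language,
        "output_format": output_format,
        "speed": speed,
        "emotion": emotion,
        "temp": temp,
        "webhook": webhook,
        "track_id": track_id,
    }
    # Leave out unset optional fields so the documented defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be passed to `json.dumps` and sent as the body of the POST request.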
Emotion Support
Emotion support is currently available only for English (en). When emotion is enabled, you can use special tags in your text prompt to add expressive elements to the generated speech.
Available Emotion-Supported Voices
Female Voices
- Tara
- Leah
- Jess
- Mia
- Zoe
Male Voices
- Leo
- Dan
- Zac
Supported Emotion Tags
The following emotion tags can be added to speech prompts to enhance expressiveness:
| Tag | Description |
|---|---|
| <laugh> | Adds a laughing effect |
| <chuckle> | A soft chuckle for a subtle humorous tone |
| <sigh> | Expresses disappointment, relief, or tiredness |
| <cough> | Simulates a short cough |
| <sniffle> | Mimics a sniffle, indicating sadness or a cold |
| <groan> | Adds a groaning effect for frustration or discomfort |
| <yawn> | Simulates yawning to express boredom or tiredness |
| <gasp> | Expresses shock or surprise |
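Before sending a prompt, it can be useful to check which tags it contains. The sketch below is illustrative only: the function names are hypothetical, the tag set is copied from the table above, and the regular expression is an assumption about how tags appear in a prompt rather than a documented grammar.

```python
import re

# Tag names copied from the table of supported emotion tags.
SUPPORTED_EMOTION_TAGS = {
    "laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp",
}

# Assumption: a tag is a single word wrapped in angle brackets, e.g. <laugh>.
_TAG_PATTERN = re.compile(r"<([a-zA-Z]+)>")


def unsupported_tags(prompt):
    """Return the sorted tag names in the prompt that are not in the table."""
    found = _TAG_PATTERN.findall(prompt)
    return sorted({t for t in found if t.lower() not in SUPPORTED_EMOTION_TAGS})


def uses_emotion_tags(prompt):
    """Return True if the prompt contains at least one supported tag."""
    return any(t.lower() in SUPPORTED_EMOTION_TAGS
               for t in _TAG_PATTERN.findall(prompt))
```

A caller could, for example, set emotion to true only when `uses_emotion_tags` returns True, and warn the user when `unsupported_tags` is not empty.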
Example
Body
{
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
}
Request
- JS
- PHP
- NODE
- PYTHON
- JAVA
// JavaScript (browser): send the request with fetch
var myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

// Request body; set "key" to your API key
var raw = JSON.stringify({
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
});

var requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: raw,
  redirect: 'follow'
};

fetch("https://modelslab.com/api/v1/enterprise/text_to_speech/make", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));
<?php
// Request body; set "key" to your API key
$payload = [
  "key" => "",
  "prompt" => "Build next-generation AI products without worrying about GPUs",
  "language" => "american english",
  "voice_id" => "madison",
  "speed" => 1,
  "emotion" => false
];

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://modelslab.com/api/v1/enterprise/text_to_speech/make',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_POSTFIELDS => json_encode($payload),
  CURLOPT_HTTPHEADER => array(
    'Content-Type: application/json'
  ),
));

$response = curl_exec($curl);
curl_close($curl);
echo $response;
// Node.js: send the request with the request package
var request = require('request');

var options = {
  'method': 'POST',
  'url': 'https://modelslab.com/api/v1/enterprise/text_to_speech/make',
  'headers': {
    'Content-Type': 'application/json'
  },
  // Request body; set "key" to your API key
  body: JSON.stringify({
    "key": "",
    "prompt": "Build next-generation AI products without worrying about GPUs",
    "language": "american english",
    "voice_id": "madison",
    "speed": 1,
    "emotion": false
  })
};

request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
import requests
import json

url = "https://modelslab.com/api/v1/enterprise/text_to_speech/make"

# Request body; set "key" to your API key
payload = json.dumps({
    "key": "",
    "prompt": "Build next-generation AI products without worrying about GPUs",
    "language": "american english",
    "voice_id": "madison",
    "speed": 1,
    "emotion": False
})

headers = {
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
// Java: send the request with OkHttp
OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\n \"key\":\"\",\n \"prompt\":\"Build next-generation AI products without worrying about GPUs\",\n \"language\":\"american english\",\n \"voice_id\":\"madison\",\n \"speed\":1,\n \"emotion\":false\n}");
Request request = new Request.Builder()
  .url("https://modelslab.com/api/v1/enterprise/text_to_speech/make")
  .method("POST", body)
  .addHeader("Content-Type", "application/json")
  .addHeader("X-API-Key", "{{token}}")
  .build();
Response response = client.newCall(request).execute();
Response
- Success
- Processing
- Error
{
  "status": "success",
  "generationTime": 1.904285192489624,
  "id": 334166,
  "output": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
{
  "status": "processing",
  "tip": "Your audio is processing in background, you can get this audio using fetch API",
  "eta": 100,
  "message": "Try to fetch request after seconds estimated",
  "fetch_result": "https://modelslab.com/api/v6/voice/fetch/334166",
  "id": 334166,
  "output": [],
  "future_links": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
{
  "status": "error",
  "message": "Error message"
}
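The three response shapes suggest a simple client-side flow: return the output on success, wait and re-fetch while processing, and raise on error. The sketch below is one possible way to do this, not the official client. `resolve_audio_urls` is a hypothetical name, and `fetch` is a caller-supplied function, because this section does not describe how the `fetch_result` URL must be requested (method, body, or authentication are assumptions left to the caller).

```python
import time


def resolve_audio_urls(response, fetch, max_fetches=5, sleep=time.sleep):
    """Return the list of audio URLs for a parsed response dict.

    `fetch` is a hypothetical callable that takes the `fetch_result` URL
    and returns the next response as a dict. `sleep` is injectable so the
    waiting behaviour can be replaced in tests.
    """
    fetches = 0
    while True:
        status = response.get("status")
        if status == "success":
            return response["output"]
        if status != "processing":
            # Covers "error" and any status not shown in the samples.
            raise RuntimeError(response.get("message", "unknown error"))
        if fetches >= max_fetches:
            raise TimeoutError("audio still processing after max_fetches")
        # Wait for the estimated number of seconds before asking again.
        sleep(response.get("eta", 1))
        response = fetch(response["fetch_result"])
        fetches += 1
```

Capping the number of fetches keeps a stuck job from blocking the caller indefinitely; the limit of 5 is an arbitrary choice for illustration.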