Audio Models
Text-to-speech endpoints for Doubao TTS and Gemini TTS
The audio documentation currently covers text-to-speech only, served by POST /v1/audio/speech. Rather than a separate tutorial for each TTS model, this page groups the content by provider.
Endpoint Path
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/audio/speech | Converts input text to speech. A successful response usually returns the audio binary directly. |
```shell
curl -X POST "https://api.routescope.ai/v1/audio/speech" \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_female_cancan_mars_bigtts",
    "format": "mp3",
    "speed": 1
  }'
```
Authorization
BearerAuth
Authenticates requests to the model relay API. Send the request header Authorization: Bearer <token>.
In: header
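Every request needs this header, so a script can fail early when the key is missing instead of sending an unauthenticated request. A minimal sketch, assuming the key lives in ROUTESCOPE_API_KEY as in the curl examples on this page; the helper name routescope_auth_header is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the Authorization header for the Routescope relay API.
# Assumes the API key is stored in ROUTESCOPE_API_KEY.
routescope_auth_header() {
  # ${var:-} avoids an "unbound variable" error if the caller uses set -u.
  local key="${ROUTESCOPE_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "ROUTESCOPE_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$key"
}
```

Capture the header first (hdr=$(routescope_auth_header) || exit 1) and then pass it with curl -H "$hdr", so a missing key stops the script rather than producing an empty header.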
Request Body
application/json
model (string, required): TTS model ID.
input (string, required): Text to read aloud.
voice (string, required): Voice key. Must be a non-empty string; valid values depend on the provider's configuration.
format (string, optional): Output audio format.
speed (number, optional): Speech rate. The valid range depends on the model or backend configuration.
Response Body
audio/mpeg
Model Selection
| Provider | Model ID | Example Voice / Notes | Typical Use |
|---|---|---|---|
| Doubao | seed-tts-1.0 | Example voice: zh_female_cancan_mars_bigtts | Chinese speech synthesis; the example voice is female. |
| Doubao | seed-tts-2.0 | Example voice: zh_male_m191_uranus_bigtts | Chinese speech synthesis; the example voice is male. |
| Gemini | gemini-2.5-flash-preview-tts | No default voice provided | Gemini TTS preview model, Flash tier (as indicated by the model name and page listing). |
| Gemini | gemini-2.5-pro-preview-tts | No default voice provided | Gemini TTS preview model, Pro tier (as indicated by the model name and page listing). |
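Because voice is required but only the Doubao models list example voices, a client may want to resolve a fallback voice per model. A minimal sketch with a hypothetical default_voice helper: it returns the example voices from the table above (which may not be authorized on every account) and refuses to guess a voice for the Gemini models.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a model ID to the example voice from the Model Selection table.
default_voice() {
  case "$1" in
    seed-tts-1.0) echo "zh_female_cancan_mars_bigtts" ;;
    seed-tts-2.0) echo "zh_male_m191_uranus_bigtts" ;;
    gemini-2.5-flash-preview-tts|gemini-2.5-pro-preview-tts)
      # Gemini TTS has no default voice, so the caller must supply one.
      echo "No default voice for $1; pass a voice ID explicitly" >&2
      return 1 ;;
    *)
      echo "Unknown model: $1" >&2
      return 2 ;;
  esac
}
```

For example, voice=$(default_voice seed-tts-2.0) yields zh_male_m191_uranus_bigtts, while a Gemini model ID makes the helper fail so the missing voice is noticed before the request is sent.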
Common Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID. |
| input | string | Yes | Text to synthesize. |
| voice | string | Yes | Voice key. The Doubao models have example default voices. The Gemini models have no default voice; use a voice returned by the API or listed on the model page. |
| format | string | No | Output audio format, as defined in the gateway's OpenAPI schema. Doubao's official docs call this response_format / audio.encoding; when calling Routescope, send format. |
| speed | number | No | Speech rate. |
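The request body is a flat JSON object with three required string fields and two optional ones. A minimal sketch of building it in bash, assuming jq is not available: the hypothetical helpers json_escape and build_tts_payload escape backslashes, double quotes, newlines, carriage returns and tabs (other control characters are left as-is), and omit format and speed when they are empty. speed is emitted unquoted because the table types it as a number.

```shell
#!/usr/bin/env bash

# Hypothetical helper: escape a string for use inside a JSON string literal.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}      # backslash -> \\ (must run first)
  s=${s//\"/\\\"}      # double quote -> \"
  s=${s//$'\n'/\\n}    # newline -> \n
  s=${s//$'\r'/\\r}    # carriage return -> \r
  s=${s//$'\t'/\\t}    # tab -> \t
  printf '%s' "$s"
}

# Hypothetical helper: build_tts_payload MODEL INPUT VOICE [FORMAT] [SPEED]
build_tts_payload() {
  local model="$1" input="$2" voice="$3" format="${4:-}" speed="${5:-}"
  local body
  body=$(printf '{"model":"%s","input":"%s","voice":"%s"' \
    "$(json_escape "$model")" "$(json_escape "$input")" "$(json_escape "$voice")")
  # Optional fields are only added when a value was given.
  if [ -n "$format" ]; then
    body+=$(printf ',"format":"%s"' "$(json_escape "$format")")
  fi
  if [ -n "$speed" ]; then
    body+=$(printf ',"speed":%s' "$speed")   # JSON number, not quoted
  fi
  printf '%s}\n' "$body"
}
```

The output can be piped straight into curl, for example build_tts_payload seed-tts-2.0 "Hello" zh_male_m191_uranus_bigtts mp3 1 | curl ... --data-binary @-, which avoids quoting the JSON by hand.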
Model-Specific Parameters
| Field | Applicable Models | Default / Range | Description |
|---|---|---|---|
| voice | seed-tts-1.0 | zh_female_cancan_mars_bigtts | ByteDance voice key. Available voices depend on account authorization. |
| voice | seed-tts-2.0 | zh_male_m191_uranus_bigtts | ByteDance voice key. Available voices depend on account authorization. |
| voice | Gemini TTS | No default | Use a voice returned by the API or listed on the model page. |
| input | Doubao TTS | For standard voices, keep each request under 1024 bytes | Limits for long text or cloned voices depend on channel configuration. |
| format | Doubao TTS | Default pcm; supports mp3, wav, pcm, ogg_opus | wav is generally not used for streaming. When calling Routescope, send format. |
| speed | Doubao TTS | Default 1, range [0.2, 3] | Maps to the official audio.speed_ratio. |
| volume_ratio, pitch_ratio, emotion, language | Doubao TTS | Depends on channel configuration and voice capability | Optional advanced fields. |
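The input limit is counted in bytes, not characters: a common Chinese character takes 3 bytes in UTF-8, so 1024 bytes is about 341 Chinese characters. A minimal client-side sketch that checks the Doubao limits from the table before sending; the helper name validate_doubao_request is hypothetical, it treats 1024 bytes as the upper bound, and the gateway may apply its own checks regardless.

```shell
#!/usr/bin/env bash

# Hypothetical helper: validate_doubao_request INPUT [FORMAT] [SPEED]
# Defaults mirror the table: format pcm, speed 1.
validate_doubao_request() {
  local input="$1" format="${2:-pcm}" speed="${3:-1}"
  local bytes
  # wc -c counts bytes; $(( )) strips the padding some wc versions add.
  bytes=$(( $(printf '%s' "$input" | wc -c) ))
  if [ "$bytes" -gt 1024 ]; then
    echo "input is $bytes bytes; keep it within 1024 bytes for standard voices" >&2
    return 1
  fi
  case "$format" in
    mp3|wav|pcm|ogg_opus) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  # Shell arithmetic is integer-only, so awk performs the decimal range check.
  if ! awk -v s="$speed" 'BEGIN { exit !(s + 0 >= 0.2 && s + 0 <= 3) }'; then
    echo "speed $speed is outside [0.2, 3]" >&2
    return 1
  fi
}
```

For example, 341 Chinese characters (1023 bytes) pass, while 342 characters (1026 bytes) are rejected even though the character count looks similar.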
Doubao Example
```shell
curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-2.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 2.0.",
    "voice": "zh_male_m191_uranus_bigtts",
    "format": "mp3",
    "speed": 1
  }'
```
Gemini Example
```shell
curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Please generate a short voice clip.",
    "voice": "your-voice-id"
  }'
```
Gemini TTS does not yet have a confirmed default voice or parameter set. Replace your-voice-id with a voice returned by the API or listed on the model page.
Response Structure
Successful speech generation usually returns audio binary directly. The examples use --output speech.mp3 to save the response to a file.
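Since the page only says a successful call usually returns audio, a script should not assume the saved file is playable. A minimal sketch, assuming failures come back with a non-200 status and a non-audio Content-Type (this page does not document the error format). It uses curl's --write-out variables http_code and content_type; the helpers tts_extension, is_audio_response and synthesize are hypothetical, and mapping ogg_opus to .ogg is an assumption.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pick a file extension from the format field.
tts_extension() {
  case "$1" in
    mp3|wav|pcm) echo "$1" ;;
    ogg_opus) echo ogg ;;   # assumption: Opus in an Ogg container
    *) echo bin ;;          # unknown format: keep the raw bytes
  esac
}

# Succeed only for HTTP 200 with an audio/* Content-Type.
is_audio_response() {
  local status="$1" content_type="$2"
  [ "$status" = "200" ] && case "$content_type" in audio/*) true ;; *) false ;; esac
}

# Hypothetical helper: read a JSON body from stdin, POST it, and keep OUT only if it looks like audio.
synthesize() {
  local out="$1" meta status ctype
  meta=$(curl -sS -o "$out" -w '%{http_code} %{content_type}' \
    -H "Authorization: Bearer ${ROUTESCOPE_API_KEY:?}" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    https://api.routescope.ai/v1/audio/speech) || return 1
  status=${meta%% *}
  ctype=${meta#* }
  if ! is_audio_response "$status" "$ctype"; then
    echo "request failed: HTTP $status, Content-Type $ctype" >&2
    cat "$out" >&2   # an error body is likely text, not audio
    rm -f "$out"
    return 1
  fi
}
```

Usage: echo '{"model":"seed-tts-2.0","input":"Hello","voice":"zh_male_m191_uranus_bigtts","format":"mp3"}' | synthesize "speech.$(tts_extension mp3)". Choosing the extension from format also avoids saving Doubao's default pcm output under a .mp3 name.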