Audio Models

The audio documentation currently covers text-to-speech (TTS) only, served by POST /v1/audio/speech. This page groups models by provider rather than giving every TTS model its own full tutorial.

Endpoint Path

Method	Path	Purpose
POST	`/v1/audio/speech`	Convert input text into speech. Successful responses usually return the binary audio directly.

curl -X POST "https://api.routescope.ai/v1/audio/speech" \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-1.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_female_cancan_mars_bigtts",
    "format": "mp3",
    "speed": 1
  }'

Authorization

BearerAuth

Authorization: Bearer <token>

Authenticates the request with the model relay API. Send the request header Authorization: Bearer <token>.

In: header

Request Body

application/json

model (string, required)

A speech synthesis model ID.

input (string, required)

The text to read aloud.

voice (string, required)

Voice key, e.g. zh_female_cancan_mars_bigtts or zh_male_m191_uranus_bigtts. Must be a non-empty string; accepted values are validated against the channel configuration.

format (string, optional)

Output audio format, e.g. mp3 or wav.

speed (number, optional)

Speech speed. The allowed range depends on the API description or backend configuration.

Response Body

audio/mpeg

Model Selection

Provider	Model ID	Example Voice / Notes	Typical Use
Doubao	`seed-tts-1.0`	Example `zh_female_cancan_mars_bigtts`	Chinese speech synthesis, female voice example.
Doubao	`seed-tts-2.0`	Example `zh_male_m191_uranus_bigtts`	Chinese speech synthesis, male voice example.
Gemini	`gemini-2.5-flash-preview-tts`	No unified default voice is currently provided	Gemini TTS preview model. Flash tier, as indicated by the model name and the page display.
Gemini	`gemini-2.5-pro-preview-tts`	No unified default voice is currently provided	Gemini TTS preview model. Pro tier, as indicated by the model name and the page display.

Common Parameters

Field	Type	Required	Description
`model`	string	Yes	TTS model ID.
`input`	string	Yes	Text to synthesize.
`voice`	string	Yes	Voice key. The Doubao examples provide default voices. Gemini currently has no unified default; use a voice listed in API responses or shown on the actual page.
`format`	string	No	Output audio format (gateway OpenAPI field). Corresponds to `response_format` / `audio.encoding` in the official Doubao docs; when calling Routescope, use `format` as shown here.
`speed`	number	No	Speech speed.
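
The table above can be turned into a small request-body builder. The following is a minimal Python sketch rather than an official client: the helper name `build_speech_payload` is hypothetical, and it encodes only what the table states (three required string fields and two optional ones that are omitted when unset).

```python
def build_speech_payload(model, input_text, voice, fmt=None, speed=None):
    """Build the JSON body for POST /v1/audio/speech (hypothetical helper)."""
    required = {"model": model, "input": input_text, "voice": voice}
    for name, value in required.items():
        # The table marks these three fields as required strings.
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} must be a non-empty string")
    payload = dict(required)
    # Optional fields are left out of the body entirely when not set.
    if fmt is not None:
        payload["format"] = fmt
    if speed is not None:
        payload["speed"] = speed
    return payload
```

For example, `build_speech_payload("seed-tts-2.0", "Hello.", "zh_male_m191_uranus_bigtts", fmt="mp3")` yields a dict that can be serialized with `json.dumps` and sent as the request body.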

Model-Specific Parameters

Field	Applicable Models	Default / Range	Description
`voice`	`seed-tts-1.0`	zh_female_cancan_mars_bigtts	ByteDance voice key. Actual available voices depend on account authorization.
`voice`	`seed-tts-2.0`	zh_male_m191_uranus_bigtts	ByteDance voice key. Actual available voices depend on account authorization.
`voice`	`Gemini TTS`	No unified default	Use a voice listed in API responses or shown on the actual page.
`input`	`Doubao TTS`	For standard voices, keeping each request under 1024 bytes is recommended	Limits for long text or cloned voices depend on channel configuration.
`format`	`Doubao TTS`	Default pcm; supports mp3, wav, pcm, ogg_opus	wav is usually not used in streaming scenarios. When calling Routescope, pass this as format.
`speed`	`Doubao TTS`	Default 1, range [0.2, 3]	Maps to audio.speed_ratio in the official Doubao docs.
`volume_ratio, pitch_ratio, emotion, language`	`Doubao TTS`	Depends on channel configuration and voice capability	Optional advanced fields.
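
The Doubao limits in the table can be checked on the client before a request is sent. This is a rough Python sketch under stated assumptions: the helper names are hypothetical, the byte count is assumed to be measured on the UTF-8 encoding of `input`, and the split is per character, so it ignores sentence boundaries.

```python
DOUBAO_FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}  # formats listed in the table
BYTE_LIMIT = 1024  # recommended: stay under this many bytes per request


def check_doubao_options(fmt="pcm", speed=1.0):
    """Raise ValueError if format or speed falls outside the documented values."""
    if fmt not in DOUBAO_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 0.2 <= speed <= 3:
        raise ValueError(f"speed must be within [0.2, 3], got {speed}")


def split_by_bytes(text, limit=BYTE_LIMIT):
    """Split text into pieces whose UTF-8 size is strictly below `limit` bytes."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))  # assumption: bytes are counted as UTF-8
        if current and size + n >= limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated by the caller; whether that is acceptable for a given voice depends on the channel configuration.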

Doubao Example

curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-2.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 2.0.",
    "voice": "zh_male_m191_uranus_bigtts",
    "format": "mp3",
    "speed": 1
  }'
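
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above; the function names and the 60-second timeout are illustrative assumptions, not part of the Routescope API.

```python
import json
import urllib.request

API_URL = "https://api.routescope.ai/v1/audio/speech"


def make_speech_request(api_key, payload):
    """Build (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key, payload, path):
    """Send the request and write the response body to `path`."""
    req = make_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

A call would look like `synthesize_to_file(os.environ["ROUTESCOPE_API_KEY"], {"model": "seed-tts-2.0", "input": "Hello.", "voice": "zh_male_m191_uranus_bigtts", "format": "mp3", "speed": 1}, "speech.mp3")`, with `os` imported by the caller.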

Gemini Example

curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Please generate a short voice clip.",
    "voice": "your-voice-id"
  }'

Gemini TTS currently has no confirmed default voice or parameter set. Replace your-voice-id with an available voice listed in API responses or shown on the actual page.

Response Structure

A successful request usually returns the binary audio directly. The examples use --output speech.mp3 to save the response body to a file.
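
Because the body is raw audio rather than JSON, a client may want to confirm the content type before writing the file. The following Python sketch assumes that audio responses carry an `audio/*` Content-Type (this page lists audio/mpeg) and that anything else is an error payload of unspecified shape; the helper name `save_if_audio` is hypothetical.

```python
def save_if_audio(content_type, body, path):
    """Write `body` to `path` only if the response looks like audio."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if not media_type.startswith("audio/"):
        # Assumption: non-audio bodies are error text; show a short preview.
        preview = body[:200].decode("utf-8", errors="replace")
        raise RuntimeError(
            f"expected audio, got {media_type or 'unknown'}: {preview}"
        )
    with open(path, "wb") as f:
        f.write(body)
    return len(body)  # number of bytes written
```

With `urllib.request`, the arguments would be `resp.headers.get("Content-Type")` and `resp.read()`.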