Routescope API

Audio Models

Doubao TTS and Gemini TTS text-to-speech endpoints

The current audio documentation covers text-to-speech only, served by POST /v1/audio/speech. This page groups content by provider rather than giving every TTS model its own full tutorial.

Endpoint Path

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/audio/speech | Convert input text into speech. Successful responses usually return audio binary directly. |

curl -X POST "https://api.routescope.ai/v1/audio/speech" \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-2.0",
    "input": "Enter text that needs to be processed.",
    "voice": "zh_male_m191_uranus_bigtts",
    "format": "mp3",
    "speed": 1
  }'

Authorization

BearerAuth

Authorization: Bearer <token>

Authenticates the request with the model relay API. Request header: Authorization: Bearer <token>.

In: header

Request Body

application/json

model (string, required)

ID of a speech synthesis model.

input (string, required)

Text content to read aloud.

voice (string, required)

Voice key. Must be a non-empty string; the accepted values are validated against the channel configuration. See Model Selection for example voice keys.

format (string, optional)

Output audio format. Doubao TTS supports mp3, wav, pcm, and ogg_opus.

speed (number, optional)

Speech speed. The allowed range depends on the model and the backend configuration.

Response Body

audio/mpeg

Model Selection

| Provider | Model ID | Example Voice / Notes | Typical Use |
| --- | --- | --- | --- |
| Doubao | seed-tts-1.0 | Example: zh_female_cancan_mars_bigtts | Chinese speech synthesis, female voice example. |
| Doubao | seed-tts-2.0 | Example: zh_male_m191_uranus_bigtts | Chinese speech synthesis, male voice example. |
| Gemini | gemini-2.5-flash-preview-tts | No unified default voice is currently provided | Gemini TTS preview model (Flash variant, as indicated by the model name and page display). |
| Gemini | gemini-2.5-pro-preview-tts | No unified default voice is currently provided | Gemini TTS preview model (Pro variant, as indicated by the model name and page display). |
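
The table above can be kept as a small lookup in client code. The following Python sketch is illustrative only: the names `EXAMPLE_VOICES` and `resolve_voice` are hypothetical, and the voice keys are the examples from the table, not a guarantee that a given account is authorized to use them.

```python
# Example voices copied from the Model Selection table. None means the
# page lists no unified default voice for that model.
EXAMPLE_VOICES = {
    "seed-tts-1.0": "zh_female_cancan_mars_bigtts",
    "seed-tts-2.0": "zh_male_m191_uranus_bigtts",
    "gemini-2.5-flash-preview-tts": None,
    "gemini-2.5-pro-preview-tts": None,
}


def resolve_voice(model, voice=None):
    """Return the caller's voice, or fall back to the table's example voice."""
    if voice:
        return voice
    if model not in EXAMPLE_VOICES:
        raise KeyError(f"unknown model: {model}")
    example = EXAMPLE_VOICES[model]
    if example is None:
        # Gemini TTS models: the caller has to supply a voice explicitly.
        raise ValueError(f"{model} has no default voice; pass one explicitly")
    return example
```

With this helper, a Doubao call can omit the voice and get the example key, while a Gemini call without a voice fails early on the client side instead of at the gateway.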

Common Parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | TTS model ID. |
| input | string | Yes | Text to synthesize. |
| voice | string | Yes | Voice key. Doubao examples provide default voices. Gemini does not currently provide a unified default; follow API responses or the actual page display. |
| format | string | No | Gateway OpenAPI field for output audio format. In Doubao official docs this maps to response_format / audio.encoding; when calling Routescope, use format as shown here. |
| speed | number | No | Speech speed. |
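
As a minimal sketch of how these fields fit together, the Python helper below builds the JSON body and rejects missing required fields before anything is sent. The function name `build_speech_payload` and its argument names are hypothetical; only the JSON keys (`model`, `input`, `voice`, `format`, `speed`) come from the table above.

```python
def build_speech_payload(model, text, voice, audio_format=None, speed=None):
    """Build the JSON body for POST /v1/audio/speech as a plain dict."""
    # model, input, and voice are required and must be non-empty strings.
    for name, value in (("model", model), ("input", text), ("voice", voice)):
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} is required and must be a non-empty string")
    payload = {"model": model, "input": text, "voice": voice}
    # Optional fields are left out entirely when the caller does not set them.
    if audio_format is not None:
        payload["format"] = audio_format
    if speed is not None:
        payload["speed"] = speed
    return payload
```

Omitting unset optional fields, rather than sending null, leaves the choice of default to the gateway.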

Model-Specific Parameters

| Field | Applicable Models | Default / Range | Description |
| --- | --- | --- | --- |
| voice | seed-tts-1.0 | zh_female_cancan_mars_bigtts | ByteDance voice key. Actual available voices depend on account authorization. |
| voice | seed-tts-2.0 | zh_male_m191_uranus_bigtts | ByteDance voice key. Actual available voices depend on account authorization. |
| voice | Gemini TTS | No unified default | Follow API responses or the actual page display. |
| input | Doubao TTS | Recommended: under 1024 bytes per request for normal voices | Long text or cloned voices depend on channel configuration. |
| format | Doubao TTS | Default pcm; supports mp3, wav, pcm, ogg_opus | wav is usually not used for streaming scenarios. When calling Routescope, use format. |
| speed | Doubao TTS | Default 1, range [0.2, 3] | Maps to official audio.speed_ratio. |
| volume_ratio, pitch_ratio, emotion, language | Doubao TTS | Support depends on channel configuration and voice capability | Optional advanced fields. |
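
The Doubao limits in the table can be checked on the client before a request is sent. The Python sketch below is one possible approach and not part of the Routescope API: the helper names are hypothetical, and it assumes the 1024-byte recommendation is measured on UTF-8 encoded text, which the page does not state.

```python
DOUBAO_MAX_INPUT_BYTES = 1024  # recommended ceiling per request for normal voices
DOUBAO_SPEED_RANGE = (0.2, 3.0)
DOUBAO_FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}


def check_doubao_params(text, audio_format=None, speed=None):
    """Return a list of problems; the list is empty when every check passes."""
    problems = []
    size = len(text.encode("utf-8"))  # assumption: bytes are counted in UTF-8
    if size >= DOUBAO_MAX_INPUT_BYTES:
        problems.append(f"input is {size} bytes; keep it under {DOUBAO_MAX_INPUT_BYTES}")
    if audio_format is not None and audio_format not in DOUBAO_FORMATS:
        problems.append(f"unsupported format: {audio_format}")
    low, high = DOUBAO_SPEED_RANGE
    if speed is not None and not low <= speed <= high:
        problems.append(f"speed {speed} is outside [{low}, {high}]")
    return problems


def split_by_bytes(text, limit=DOUBAO_MAX_INPUT_BYTES - 1):
    """Split text into chunks of at most `limit` UTF-8 bytes each."""
    if limit < 4:
        raise ValueError("limit must fit at least one UTF-8 character (4 bytes)")
    chunks, current, current_size = [], [], 0
    for char in text:
        char_size = len(char.encode("utf-8"))
        # Start a new chunk before this character would exceed the limit.
        if current and current_size + char_size > limit:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(char)
        current_size += char_size
    if current:
        chunks.append("".join(current))
    return chunks
```

`split_by_bytes` cuts only at character boundaries, so multi-byte characters are never broken, but it ignores sentence boundaries. Splitting on punctuation first would likely give more natural-sounding audio per chunk.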

Doubao Example

curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "seed-tts-2.0",
    "input": "Hello, this is a voice clip generated by Seed TTS 2.0.",
    "voice": "zh_male_m191_uranus_bigtts",
    "format": "mp3",
    "speed": 1
  }'
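
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The function names `make_speech_request` and `synthesize_to_file` are hypothetical; the URL, headers, and JSON fields mirror the curl example above, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.routescope.ai/v1/audio/speech"


def make_speech_request(api_key, payload):
    """Build (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, payload, path, timeout=60):
    """Send the request and write the returned audio bytes to `path`."""
    request = make_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)


# Usage (performs a real network call, so it is left commented out):
# import os
# synthesize_to_file(
#     os.environ["ROUTESCOPE_API_KEY"],
#     {
#         "model": "seed-tts-2.0",
#         "input": "Hello, this is a voice clip generated by Seed TTS 2.0.",
#         "voice": "zh_male_m191_uranus_bigtts",
#         "format": "mp3",
#         "speed": 1,
#     },
#     "speech.mp3",
# )
```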

Gemini Example

curl https://api.routescope.ai/v1/audio/speech \
  -H "Authorization: Bearer $ROUTESCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Please generate a short voice clip.",
    "voice": "your-voice-id"
  }'

Gemini TTS does not currently have one confirmed default voice or parameter set. Replace your-voice-id with an available voice from API responses or the actual page display.

Response Structure

Successful speech generation usually returns audio binary directly. The examples use --output speech.mp3 to save the response to a file.
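
Because the response is "usually" audio, a client may want to confirm that before writing a file. The Python sketch below assumes that successful responses carry an audio/* Content-Type (the page lists audio/mpeg) and that failures carry some other type, such as a JSON error body; neither behavior is confirmed here, and formats such as pcm may be served with a different type. The helper name `save_speech_response` is hypothetical.

```python
def save_speech_response(body, content_type, path):
    """Write `body` to `path` if it looks like audio; otherwise raise."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if not media_type.startswith("audio/"):
        # Surface the start of the body, which is likely an error message.
        preview = body[:200].decode("utf-8", errors="replace")
        raise RuntimeError(f"expected audio, got {media_type or 'unknown'}: {preview}")
    with open(path, "wb") as handle:
        handle.write(body)
    return path
```

Without a check like this, `--output speech.mp3` in curl would save an error body under an .mp3 name, which is easy to miss until playback fails.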
