All chunks are fired simultaneously and your device does the merging: no limit on total text length, no queue.
Each chunk comes back as raw audio/mpeg bytes.
For long text: split client-side → fire all chunks in parallel with Promise.all() → concatenate the Blobs.
MP3 is a stream of self-contained frames, so concatenating the Blobs in order produces a playable file with no server-side ffmpeg needed.
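The split → parallel → concatenate flow described above could be sketched roughly as follows. This is a minimal sketch, not the site's actual client code: `splitText`, `synthesizeLong`, and the injectable `fetchFn` are hypothetical names. The `/api/tts` path, the JSON body fields, and the 1950-character cap come from the endpoint description, while breaking chunks at the last space is just one reasonable assumption.

```typescript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// breaking at the last space before the limit so words stay intact.
function splitText(text: string, maxLen = 1950): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    let cut = rest.lastIndexOf(" ", maxLen);
    if (cut <= 0) cut = maxLen; // no space found: fall back to a hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Hypothetical helper: synthesize every chunk in parallel and merge the results.
async function synthesizeLong(
  origin: string,
  text: string,
  voiceIndex: number,
  fetchFn: FetchLike = fetch, // injectable so the function can be tested offline
): Promise<Blob> {
  const chunks = splitText(text);
  // Promise.all resolves in input order, so the parts line up with the chunks.
  const parts = await Promise.all(
    chunks.map(async (chunk) => {
      const res = await fetchFn(`${origin}/api/tts`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: chunk, voiceIndex }),
      });
      if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
      return res.blob();
    }),
  );
  // Concatenate the MP3 parts client-side into one Blob.
  return new Blob(parts, { type: "audio/mpeg" });
}
```

Because `Promise.all` rejects as soon as any single request fails, a caller that wants partial results or retries would need to handle errors per chunk instead.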
Returns all available voices with 1-based index numbers, grouped by language. Use the index in TTS calls.
Synthesizes one chunk of text and returns raw audio/mpeg binary. Maximum 1950 characters per call. No timeout is set, so Vercel's 30 s function limit applies.
/api/voices (json): List all TTS voices with index, name, language, and gender.
/api/tts (json): Convert text to natural speech. Returns an audio stream.
| Param | Type | Required | Description |
|---|---|---|---|
| text | string | required | Text to speak. |
| voiceIndex | number | required | Voice index from /api/voices. |
| pitch | number | optional | Pitch (default 0). |
| rate | number | optional | Speed (default 0). |
curl -X POST {ORIGIN}/api/tts \
-H "Content-Type: application/json" \
-d '{"text":"Hello from Ahm7xMakki","voiceIndex":0}' \
--output speech.mp3
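The same request can be made from code with fetch(). The following is a minimal sketch, assuming the endpoint accepts the JSON body shown in the curl example: `TtsParams`, `buildTtsRequest`, and `speak` are hypothetical names, `origin` stands in for {ORIGIN}, and the optional `pitch` and `rate` fields are only sent when provided.

```typescript
interface TtsParams {
  text: string;       // required: text to speak
  voiceIndex: number; // required: voice index from /api/voices
  pitch?: number;     // optional (default 0 per the parameter table)
  rate?: number;      // optional (default 0 per the parameter table)
}

// Hypothetical helper: build the fetch() options for a POST to /api/tts.
function buildTtsRequest(params: TtsParams): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // JSON.stringify drops undefined fields, so omitted options are not sent.
    body: JSON.stringify(params),
  };
}

// Hypothetical helper: request speech for one chunk and return the MP3 as a Blob.
async function speak(origin: string, params: TtsParams): Promise<Blob> {
  const res = await fetch(`${origin}/api/tts`, buildTtsRequest(params));
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  return res.blob();
}
```

In a browser, the returned Blob can be played by passing `URL.createObjectURL(blob)` to `new Audio(...)`.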