Voice data for how people actually talk.
Curated conversational audio in code-switched languages — Hinglish, Taglish, Nigerian Pidgin English — for the teams training the next generation of voice AI.
The gap
Two billion people switch languages mid-sentence every day, yet the audio data needed to train models on how they actually speak barely exists. Less than 100 hours of natural Hinglish is publicly available, and similar gaps exist for every other code-switched pair.
STT and TTS models trained on monolingual corpora break the moment users speak naturally. We close that gap with consented, native-speaker, mobile-recorded audio.
What we deliver
- 16 kHz / 16-bit / mono WAV, the standard format buyers expect for ASR / TTS training
- Native conversational speech from urban speakers across India, the Philippines, and Nigeria
- Transcribed, quality-scored, and consent-logged
- Custom scenarios on request — food ordering, customer support, healthcare intake, casual chat
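For teams that want to confirm a delivered file matches the 16 kHz / 16-bit / mono spec above before it enters a training pipeline, here is a minimal sketch using only Python's standard-library `wave` module. The function name `check_wav_format` and the constant names are illustrative assumptions, not part of any delivery tooling.

```python
import wave

# Target format from the spec above: 16 kHz, 16-bit, mono.
# These constant names are illustrative, not part of any official tooling.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatch descriptions; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a conforming file should return an empty list. Anything else names the property that differs, which makes it easy to catch a resampled or stereo file early.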
Talk to us
If you’re building voice AI for emerging-market users — or you’ve hit the multilingual ceiling on your current dataset — we have samples ready.