On May 7, 2026, OpenAI rolled out three new real-time audio models in its API, aimed at a new class of voice applications.
The headline model, GPT-Realtime-2, is OpenAI's first voice model with GPT-5-class reasoning, meaning it can handle more complex requests and carry a conversation forward naturally rather than simply reading text aloud. Alongside it, GPT-Realtime-Translate performs live translation across dozens of languages while keeping pace with the speaker.
The common thread is latency: these models respond fast enough to feel like a genuine conversation instead of delayed playback. That sharply lowers the barrier for building voice agents, narrated content and accessibility features that sound convincingly human.
For Aikiros, the release underlines that text-to-voice is shifting from a nice-to-have feature to a core interface that creators and businesses increasingly expect as standard.