Aikiros

OpenAI launches a new generation of real-time voice models

On May 7, 2026, OpenAI rolled out three new real-time audio models in its API, aimed at a new class of voice applications.


The headline model, GPT-Realtime-2, is its first voice model with GPT-5-class reasoning, meaning it can handle more complex requests and carry a conversation forward naturally rather than simply reading text aloud. Alongside it, GPT-Realtime-Translate performs live translation across dozens of languages while keeping pace with the speaker.


The common thread is latency: these models respond fast enough to feel like a genuine conversation instead of delayed playback. That sharply lowers the barrier for building voice agents, narrated content and accessibility features that sound convincingly human.


For Aikiros, the launch underlines that text-to-voice is shifting from a nice-to-have feature to a core interface that creators and businesses increasingly expect as standard.