OpenAI is finally bringing Advanced Voice mode to the desktop. It will be available in both the Windows and Mac versions of the ChatGPT app and works the same as the mobile release.
This means you can finally have a conversation with your computer. Not in the way that you can talk to Siri or Alexa (and yes, they were both triggered as I dictated this copy), but a full conversation as if you were talking to another human being.
Advanced Voice is native speech-to-speech. This means that OpenAI's voice bot can understand everything you say, how you say it, and even the pauses between your words. It responds just as naturally, including vocal tics such as "ums" and breathing sounds between sentences.
We still don't have the full promise of OpenAI's spring update, namely screen sharing and live video with ChatGPT, but those features are coming eventually, and this is still a major upgrade over other voice models.
How does Advanced Voice work on a desktop?
"Big day for desktops. Advanced Voice is now available in the macOS and Windows desktop apps." — OpenAI on X, October 30, 2024
You access Advanced Voice in the desktop app the same way you would on iOS or Android — click the icon in the chat bar. Once you click the button, it will open a new view with that now-infamous pulsating blue circle.
You can continue talking to the AI while you get on with other tasks. And while it can't see what you're doing, it can respond to descriptions of the task or your performance. So for example, if you're using it while playing Minecraft, you could describe the scene, and it could propose a building or block type to use.
Bringing Advanced Voice to the desktop is the next logical step for OpenAI, and it further cements ChatGPT as a full productivity platform rather than a gimmick. Being able to hold a conversation with an AI lets you brainstorm ideas or work through tasks you might not be able to tackle alone.
In the future, you'll also be able to share your screen with Advanced Voice so it can watch what you're doing. And one day, as AI agents take off, you may even be able to have it take control of your screen and talk you through a process.
What comes next?
While Advanced Voice is an incredibly useful tool, what's more powerful is the underlying Realtime API. This is the same back end that powers Advanced Voice, and developers can use it to build their own versions or integrate voice into their own tools.
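To give a sense of how developers work with it: the Realtime API exchanges JSON events over a WebSocket connection. The sketch below builds the kinds of events a client sends to configure a session, add a user turn, and request a spoken response. Event names follow OpenAI's Realtime API beta documentation at the time of writing, but treat the exact fields as illustrative rather than authoritative.

```python
import json

# The Realtime API endpoint a client would connect to over WebSocket
# (model name shown is the preview model from the beta docs).
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


def session_update(instructions: str, voice: str = "alloy") -> str:
    """Configure the session: system instructions, voice, and modalities."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            "modalities": ["audio", "text"],
        },
    })


def user_text_message(text: str) -> str:
    """Add a user turn to the conversation as a text item."""
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    })


def request_response() -> str:
    """Ask the server to start generating an audio/text response."""
    return json.dumps({"type": "response.create"})


# In a real client, you would open a WebSocket to REALTIME_URL with an
# "Authorization: Bearer <API key>" header, send these events, and then
# stream the server's audio delta events back out to the speakers.
```

The appeal for developers is that speech stays speech end to end: there is no separate transcription step before the model sees the input, which is what preserves the tone and timing that make Advanced Voice feel natural.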
During a recent briefing I had with the OpenAI team, the company's developer liaison lead, Romain Huet, showed an impressive demo built around the solar system. You could instruct the voice to move between planets, and it offered real-time insights into the nature of each world we visited and answered questions in a conversational style.
In another demo, he showed it working as a virtual travel agent that helps you not just book a flight but find the best deal. You could state your requirements explicitly, and it could ask follow-up questions or offer feedback based on what was available, rather than the rigid logic-tree approach we see from automated phone systems today.
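What makes that travel-agent behavior possible is tool calling: a developer registers functions the model may invoke mid-conversation, and the model decides for itself when it has gathered enough constraints to call one. The sketch below shows what that might look like; the `search_flights` tool, its parameters, and the handler are all invented here for illustration, not taken from OpenAI's demo.

```python
import json

# A hypothetical tool definition a developer might register with the
# model so the voice assistant can search flights mid-conversation.
# The name, parameters, and schema below are invented for illustration.
FLIGHT_SEARCH_TOOL = {
    "type": "function",
    "name": "search_flights",
    "description": "Find flights matching the traveller's constraints.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
            "max_price_usd": {"type": "number"},
        },
        "required": ["origin", "destination"],
    },
}


def handle_tool_call(name: str, arguments: str) -> str:
    """Dispatch a model-issued tool call and return a JSON result string.

    Unlike a fixed logic tree, the model chooses when to call the tool,
    and if the results come back empty it can ask the user a follow-up
    question (e.g. "would a higher budget work?") instead of dead-ending.
    """
    args = json.loads(arguments)
    if name == "search_flights":
        # Stand-in for a real flight-search backend.
        budget = args.get("max_price_usd", "any")
        return json.dumps({"results": [], "note": f"no flights under ${budget}"})
    return json.dumps({"error": f"unknown tool: {name}"})
```

The design difference from today's phone trees is that the conversation's shape is not hard-coded; the model improvises the dialogue and only touches structured code at the moments it needs real data.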
All of these features are going to start to roll out, not just in OpenAI's apps but in apps from other developers over the coming months and years. I think voice is going to become the new way that we all interact with our computers.
Now I just need to find a better dictation software that doesn't require me to spend hours going back over everything that I typed with my voice to fix the glaring errors.