OpenAI Introduces Advanced Voice Mode in ChatGPT

OpenAI has begun rolling out an exciting new feature for ChatGPT: Advanced Voice Mode. Starting this week, a select group of ChatGPT Plus users will get a first look at GPT-4o’s hyperrealistic audio responses, with a broader release planned for all Plus users by fall 2024.

In a notable May demonstration, GPT-4o's voice capabilities surprised audiences with their lifelike quality, closely mimicking a human voice. The demonstration featured a voice named Sky, which bore an uncanny resemblance to Scarlett Johansson, who voiced an AI assistant in the film "Her." Following the demo, Johansson publicly denied any involvement with the project and retained legal counsel over OpenAI's use of a voice similar to hers without permission. In response, OpenAI pulled the Sky voice from its lineup and postponed the feature's release to bolster safety measures.
As the alpha version rolls out, it’s worth noting that the much-anticipated video and screensharing functionalities displayed during the Spring Update will not be included. These features are set to launch at a later date. For now, premium users will gain access to the refined voice capabilities shown earlier.
Advanced Voice Mode represents a significant upgrade from ChatGPT's existing Voice Mode. Previously, the system relied on three separate models to handle audio: one to convert speech to text, GPT-4 to process the prompt, and a third to convert text back to speech. GPT-4o folds these tasks into a single multimodal model, drastically reducing latency. It can also interpret emotional nuance in users' voices, detecting sadness, excitement, or even singing.
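The architectural difference can be sketched schematically. The functions below are toy stand-ins invented for illustration, not OpenAI's actual API; each stand-in just tags its input with a stage name so the number of model hops in each design is visible:

```python
# Toy stand-ins for each model stage (illustrative only, not a real API).
# Each returns its input wrapped with the stage's output type, so the
# nesting depth of the result mirrors the number of model hops.
def speech_to_text(audio):   return ("text", audio)     # model 1: transcription
def gpt4_complete(prompt):   return ("reply", prompt)   # model 2: text reasoning
def text_to_speech(text):    return ("audio", text)     # model 3: synthesis
def gpt4o_audio(audio):      return ("audio", audio)    # single multimodal model

def legacy_voice_mode(audio_in):
    # Old pipeline: three separate models chained together. Each hop adds
    # latency, and vocal tone is discarded at the transcription step.
    text = speech_to_text(audio_in)
    reply = gpt4_complete(text)
    return text_to_speech(reply)

def advanced_voice_mode(audio_in):
    # New design: one multimodal model consumes and produces audio
    # directly, so prosody (sadness, excitement, singing) is preserved.
    return gpt4o_audio(audio_in)

print(legacy_voice_mode("hello"))    # three nested stages
print(advanced_voice_mode("hello"))  # one stage
```

Collapsing the chain into one model is what removes the inter-stage latency and keeps emotional signal that a text-only intermediate representation would throw away.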

While TechCrunch and other media outlets have yet to test the new feature, OpenAI is releasing it incrementally to monitor usage closely. Users selected for the alpha test will receive notifications via the ChatGPT app and follow-up emails with usage instructions.
Since the initial demo, OpenAI has conducted extensive testing of GPT-4o’s voice capabilities, involving over 100 external red teamers who speak 45 different languages. A comprehensive report on these safety measures is expected in early August.

The new voice feature will include four preset voices: Juniper, Breeze, Cove, and Ember, all created in collaboration with professional voice actors. Notably, the Sky voice from the May demo has been removed, and OpenAI has emphasized that ChatGPT cannot mimic real people’s voices, whether public figures or private individuals. The system will also block outputs that deviate from the preset voices to prevent misuse.
To head off copyright disputes, OpenAI has also implemented filters that block requests to generate music or other copyrighted audio. The precaution follows legal challenges other AI companies have faced over copyright infringement, particularly from the music industry.