Highlights:

  • OpenAI says the “o” in GPT-4o stands for “omni,” and the model is meant to make conversing with AI feel closer to talking with a person.
  • ChatGPT users will soon have free access to the new model as it rolls out across the service.

OpenAI has advanced its artificial intelligence lineup with a new flagship model. GPT-4o responds to text, audio, and image inputs in real time, enabling more natural interaction between humans and computers.

The company states that GPT-4o, where the “o” stands for “omni,” is a step toward making interaction with AI models feel more human. It responds to voice input in an average of 320 milliseconds, comparable to human conversational response times, and it matches GPT-4 Turbo on English text while notably improving performance in non-English languages.

“This is the first time that we’re making a huge step forward when it comes to the ease of use. Until now, with voice mode, we had three models that come together to deliver this experience. We had transcription, intelligence and then text-to-speech all together in orchestration to deliver voice mode. This also brings a lot of latency to the experience, which breaks the immersion in collaboration with ChatGPT. Now, with GPT-4o, this all happens natively,” said OpenAI Chief Technology Officer Mira Murati.
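As a rough illustration of the orchestration Murati describes, the sketch below chains the three separate stages (transcription, a text model, text-to-speech) with the OpenAI Python SDK; the model names, voice, and file paths are illustrative assumptions, not details from the event. Each sequential round trip adds the latency that GPT-4o is designed to eliminate by handling audio natively.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stage 1: transcribe the user's speech (model choice is an assumption).
with open("user_audio.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# Stage 2: generate a text reply with a separate chat model.
chat = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
reply = chat.choices[0].message.content

# Stage 3: synthesize the reply back into speech. Three round trips in
# sequence is the orchestration overhead Murati refers to.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("reply.mp3")
```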

The model will soon be available to ChatGPT users at no cost as it is integrated into the service. In April, OpenAI introduced a version of ChatGPT that can be used without an account, and the company recently unveiled a desktop app for macOS for both free and paid users.

During a demonstration, OpenAI researchers showcased how the new model integrated into ChatGPT enables real-time voice interactions, simulating the experience of conversing with a live person by delivering nearly instantaneous emotive responses. This advanced model can generate a diverse array of emotional reactions, seamlessly incorporating nuances such as chuckling, a hint of a “smile” in speech, gentle sighs, and other verbal cues typically associated with human speakers.

In the demonstration, OpenAI asked the model to narrate a bedtime story, then requested more drama, prompting it to adopt a more grandiose tone. As it recounted a story about a robot, the presenters repeatedly directed it to adjust its delivery, eventually asking for a “robotic voice” and a “singsong voice.” The model complied with each request, transitioning smoothly and even quipping, “Initiating dramatic robotic voice.”

The demonstration highlighted the model’s capability to be interrupted while speaking, allowing users to interject without waiting for it to complete a sentence. This feature significantly enhances the conversational experience with the model, mirroring real-life conversations where interruptions are sometimes necessary to convey a point effectively.

Because the model is “multimodal,” it can “see” images and video, letting it discuss visual content shown on screen or captured through a camera. To showcase this, OpenAI asked the model to watch as a mathematical equation was written on a piece of paper.
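The same multimodal input is exposed through OpenAI’s chat API, so a minimal sketch of sending an image to GPT-4o, with an illustrative file name and prompt, could look like this:

```python
import base64

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a local photo (e.g. the handwritten equation) as a data URL.
with open("equation.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What equation is written on this page?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```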

The researchers presented the equation “3x + 1 = 4” to the model and asked for help solving for x without revealing the answer. The model guided them through the steps: subtract 1 from both sides to get 3x = 3, then divide by 3 to find x = 1. Throughout the demonstration, ChatGPT acted as a patient, considerate tutor.
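That tutoring behavior can be approximated over the API with a system prompt; the instructions below are an assumption for illustration, not OpenAI’s actual prompt.

```python
from openai import OpenAI

client = OpenAI()

# An illustrative system prompt steering the model to guide, not answer.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a patient math tutor. Walk the student through "
                    "each step and never state the final answer outright."},
        {"role": "user", "content": "Help me solve 3x + 1 = 4 for x."},
    ],
)
print(response.choices[0].message.content)
```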

The ChatGPT app offers coding assistance as well. Even without visual access to the screen, users can copy and send code to the app. Developers can then engage in a spoken conversation with the model about the code. Additionally, users have the option to share their entire screen with the model, enabling discussions about the screen’s context.
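Outside the app, the equivalent workflow over the API is simply to paste the code into a message; the snippet and question below are hypothetical.

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical snippet a developer might paste in for discussion.
snippet = '''
def average(xs):
    return sum(xs) / len(xs)
'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"What edge case does this function miss?\n{snippet}"}],
)
print(response.choices[0].message.content)
```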

Another application of GPT-4o within ChatGPT, leveraging its voice and multilingual capabilities, is real-time translation. With improved quality and speed across 50 languages, covering 97% of the world’s speakers, a user can ask, “Could you translate Italian into English and vice versa for me and my friend?” and the model obliges. During the OpenAI demonstration, it even added personal touches, such as stating, “Your friend asked.”
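A text-only sketch of that two-way interpreting setup, using an assumed system prompt rather than anything OpenAI disclosed, might look like:

```python
from openai import OpenAI

client = OpenAI()

# An assumed system prompt that sets up two-way interpretation.
messages = [{
    "role": "system",
    "content": "You are an interpreter. Translate Italian messages into "
               "English and English messages into Italian.",
}]

# Alternate speakers, keeping the running conversation as context.
for utterance in ["Ciao, come stai?", "I'm doing well, thanks!"]:
    messages.append({"role": "user", "content": utterance})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```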

While GPT-4o will be free as OpenAI integrates it into ChatGPT, paid users will get five times the message limits of free users. GPT-4o is also available to developers through the application programming interface (API), where it is twice as fast, 50% cheaper, and offers five times higher rate limits than GPT-4 Turbo.
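For developers already using GPT-4 Turbo, trying GPT-4o through the API is a one-line model change; a minimal, illustrative call:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # previously "gpt-4-turbo"
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```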