OpenAI Announces Sora: The Game-Changing Text-to-Video AI Model

Not to be outdone by competitors like Google, which recently launched a text-to-video tool, AI firm OpenAI on Thursday announced its own text-to-video model, Sora.

Like Google's Lumiere, Sora has limited availability. Unlike Lumiere, Sora can produce videos up to one minute long.

Text-to-video has become the latest arms race in artificial intelligence (AI). OpenAI, Google, Microsoft, and others are looking beyond text and image generation to secure their position in a sector expected to generate $1.3 trillion in revenue by 2032, and to win over users who have been captivated by generative AI since ChatGPT appeared a little more than a year ago.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf

— OpenAI (@OpenAI) February 15, 2024

In a post on 15 Feb 2024, the company, the creator of both ChatGPT and Dall-E, announced that Sora will be available to “red teamers,” or experts in areas such as misinformation, violent speech and bias, who will “adversarially test the model.” It will also work with visual artists, designers and filmmakers to gather feedback from creative professionals. Adversarial testing will be especially important in addressing the risk of deepfakes, a key concern when AI is used to generate images and videos.

In addition to collecting feedback from outside the organization, the company said it is sharing its progress early to “give the public a sense of what AI capabilities are on the horizon.”

Strengths of Sora

One thing that may set Sora apart is its ability to interpret long prompts, including one that ran to 135 words. Sample videos that OpenAI released on Thursday show Sora producing a variety of characters and scenes, featuring people, animals and fluffy monsters, as well as urban environments, landscapes, zen gardens, and even a submerged New York City.

Prompt: “Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance… pic.twitter.com/Um5CWI18nS

— OpenAI (@OpenAI) February 15, 2024

This can be attributed in part to OpenAI’s prior work on the Dall-E and GPT models. Dall-E 3, a text-to-image generator, was launched in September 2023. In particular, Sora uses Dall-E 3’s recaptioning technique, which, according to OpenAI, produces “highly descriptive captions for the visual training data.”

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq

— OpenAI (@OpenAI) February 15, 2024

“Sora can generate complicated scenes that include numerous characters, certain kinds of motion and accurate details of the subject and background,” according to the statement. “The program understands not only what the user was looking for in the prompt, but also how those things function in the physical world,” it added.

Prompt: “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.” pic.twitter.com/gzEE8SwP81

— OpenAI (@OpenAI) February 15, 2024

OpenAI’s sample videos on X (formerly Twitter) look very realistic, with the exception of close-ups of human faces and swimming sea creatures. Elsewhere, it can be difficult to tell what is real and what is not.

Like Lumiere, the model can create video from still images, extend existing videos, and fill in missing frames.

“Sora provides a foundation for models that can understand and recreate the real world, which we believe will be an important milestone in achieving AGI,” according to the announcement.

Sora’s weaknesses

OpenAI acknowledges that Sora has weaknesses: it can struggle to accurately convey the physics of a complex scene and to understand cause and effect.
“For example, a person could take a bite out of a cookie, but afterwards, the cookie might not have a bite mark,” stated the post.

Anyone who still has to make an L with their fingers to work out which side is left can take heart: Sora also confuses left and right.

Is OpenAI’s Sora accessible to the public?

OpenAI did not say when Sora will be publicly accessible, but made clear that it wants to take “multiple essential safety measures” first. These include complying with OpenAI’s existing safety guidelines, which prohibit extreme violence, sexual content, hateful imagery, celebrity likeness and the use of others’ intellectual property.

“Despite years of testing and research, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will misuse it,” the post said. It went on to say, “That is why we believe that gaining knowledge from real-world use is an essential part of creating and releasing increasingly safe AI systems over time.”