With generative video AI tools on the rise, OpenAI has unveiled a new text-to-video model named Sora. Find out more details on the latest AI here.
New Delhi: OpenAI gained wide popularity for its revolutionary text-based generative AI, ChatGPT, which made its debut on November 30, 2022. The San Francisco-based company had already ventured into text-to-image generation with the launch of DALL-E in 2021, and it has now unveiled a new video generation model, known as Sora, whose visual results look high-quality and impressive. Here’s a look at its features, capabilities and how it works.
OpenAI’s Sora AI: A Text-To-Video Generator
Sora is a text-to-video generative AI model developed by OpenAI that creates videos from text descriptions. Based on a user’s text prompt, the model can generate videos up to a minute long. It can produce complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. It understands not only what the user has asked for in the prompt, but also how those things exist in the physical world. OpenAI says the model is being developed to help people solve problems that require real-world interaction. At the time of writing, Sora is still in development and has not been made publicly available.
Sora AI: Capabilities and Features
While the new text-to-video tool is expected to launch publicly in due course, here is what the AI can do, based on the information OpenAI has shared so far:
- Sora AI generates videos from text descriptions, with the ability to create complex scenes, multiple characters, and specific types of motion.
- The new AI can produce videos up to a minute long and understands not only the content of the text prompt but also how those things exist in the physical world.
- The model generates compelling characters that express vibrant emotions, and can create multiple shots within a single generated video while keeping characters and visual style consistent.
- It can create videos based on a single image, as well as extend an existing video or fill in missing frames.
- Sora AI has a deep understanding of language, enabling it to accurately interpret prompts.
- However, the new AI has known weaknesses: it may struggle to accurately simulate the physics of a complex scene, misunderstand specific instances of cause and effect, confuse spatial details of a prompt, and have difficulty with precise descriptions of events that unfold over time.
- It is being tested by red teamers and a select group of visual artists, designers, and filmmakers to assess critical areas for harm or risks.
- OpenAI is building safety measures alongside the model, including work with red teamers and tools to help detect misleading AI-generated content.
- OpenAI also plans to include C2PA metadata in Sora’s output if the model is deployed in an OpenAI product.
How Does Sora Work?
Sora’s working mechanism rests on a deep understanding of language, enabling it to grasp the narrative and thematic essence of a text prompt. A key technical challenge the model addresses is simulating motion within the generated videos: it must predict and render movement so that it feels authentic and seamless from frame to frame.
Sora combines a diffusion model with a type of neural network called a transformer. The transformer inside Sora operates on chunks of video data, which OpenAI calls spacetime patches, and because patches work at any scale, the model can be trained on video of varying resolutions, durations, aspect ratios, and orientations. This approach allows Sora to generate high-definition, detail-rich videos up to a minute long, handling challenges such as occlusion well.
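OpenAI has not released Sora’s code or architecture details, but the recipe it describes, a diffusion model whose denoiser is a transformer operating on spacetime patches, can be sketched at a toy scale. The snippet below is an illustrative assumption only: every class name, dimension, and conditioning choice here is invented for demonstration and is not Sora’s actual implementation.

```python
# Toy sketch of a "diffusion transformer" over video spacetime patches.
# Purely illustrative; all names, shapes, and hyperparameters are assumptions.
import torch
import torch.nn as nn

class ToyVideoDiffusionTransformer(nn.Module):
    def __init__(self, patch_dim=512, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=patch_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # In diffusion training, the network learns to predict the noise
        # that was added to each patch.
        self.noise_head = nn.Linear(patch_dim, patch_dim)

    def forward(self, noisy_patches, text_embedding):
        # Condition on the text prompt by prepending its embedding as an
        # extra token (a simple, common conditioning scheme).
        tokens = torch.cat([text_embedding.unsqueeze(1), noisy_patches], dim=1)
        hidden = self.encoder(tokens)
        return self.noise_head(hidden[:, 1:])  # drop the text token

# A video is flattened into a sequence of spacetime patches, so the same
# model handles any resolution, duration, or aspect ratio: only the
# sequence length changes.
batch, num_patches, patch_dim = 2, 128, 512
noisy_patches = torch.randn(batch, num_patches, patch_dim)  # noised video patches
text_embedding = torch.randn(batch, patch_dim)              # encoded prompt

model = ToyVideoDiffusionTransformer()
predicted_noise = model(noisy_patches, text_embedding)
print(predicted_noise.shape)  # torch.Size([2, 128, 512])
```

In a full diffusion pipeline, a model like this would be applied repeatedly at generation time, starting from pure noise and denoising step by step into clean video patches, which a decoder would then turn back into frames.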
Additionally, Sora can simulate artificial processes, such as video games. It can control the player in a game like Minecraft while simultaneously rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions that mention the specific game.