Today's Curiosity
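The Gospel of John in the Bible contains this line: "the Word was made flesh." The word itself becomes the body.
In the Tractatus Logico-Philosophicus, Wittgenstein says that a proposition is a picture of reality. In other words, sentences and language can mirror the actual state of the world; language is our world model.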
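Trained on large amounts of data, Sora appears to have grasped how things move in the physical world and to have learned about 3D geometry and consistency. Through motion, reflections, and similar cues, it creates a virtual world that follows physical rules and comes close to the real one. Sora not only understands camera work, it can also recognize human emotions.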
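1. What Sora is: a model that turns text and still images into video. It visualizes what you imagine in videos of up to 60 seconds while maintaining visual quality and adherence to the prompt.
2. Keywords: "multiple camera angles in a single video", "highly realistic world", "world model", "passionate characters"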
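3. How it works: like the GPT models, Sora is built on a transformer; more precisely, it uses a diffusion transformer architecture, that is, a diffusion model with a transformer backbone. OpenAI represents video and image data as patches, which play the role that tokens play in GPT. Technically, subjects in Sora's generated videos can move continuously through 3D space.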
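4. What makes Sora astonishing: it rivals an all-round visual artist
5. How Sora works
Stages: human prompt -> Sora model -> output
Directly affected industries: animation production, games
Accelerated industries: metaverse, digital humans, autonomous driving; lower cost of digital assets
Competitors: Pika, Runway, etc.
Risks: in fraud scenarios, it becomes harder and harder to verify what is real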
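6. When, and how to use it to show off: my guess is that Sora will finally reach the public when it ships together with GPT-5 inside ChatGPT. Leaving aside how dramatic the improvements may be by then, even video generation on par with today's Pika or Gen-2, available in ChatGPT at no extra cost, would be a big change for users.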
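The patch idea from item 3 can be made concrete with a small sketch. It is only an illustration: the function name `patchify`, the patch sizes, and the choice to cut raw pixel arrays are all assumptions made here, not details of how OpenAI implements Sora. It shows a video array being cut into spacetime blocks, each flattened into one vector that plays the role of a token.

```python
import numpy as np

def patchify(video, pt, ph, pw):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, pt * ph * pw * C); each row is one "token".
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each of the time/height/width axes into (block index, offset in block).
    blocks = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three block indices to the front, then flatten every block.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    return blocks.reshape(-1, pt * ph * pw * C)

# A random 8-frame, 32x32 RGB clip stands in for real video data.
video = np.random.default_rng(0).random((8, 32, 32, 3))
tokens = patchify(video, pt=2, ph=8, pw=8)
print(tokens.shape)  # (64, 384): 4*4*4 patches, each 2*8*8*3 values
```

A transformer would then process this sequence of 64 vectors much as a language model processes a sequence of token embeddings.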
[Time to Think]
On this double-edged sword
“From a technical perspective it seems like a very significant leap forward,” says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. “But there are two sides to the coin,” he says. “The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse.”
Will the world still be real? A warning on safety boundaries
The OpenAI team plans to draw on the safety testing it did last year for DALL-E 3. Sora already includes a filter that runs on all prompts sent to the model that will block requests for violent, sexual, or hateful images, as well as images of known people. Another filter will look at frames of generated videos and block material that violates OpenAI’s safety policies.
OpenAI says it is also adapting a fake-image detector developed for DALL-E 3 to use with Sora. And the company will embed industry-standard C2PA tags, metadata that states how an image was generated, into all of Sora’s output. But these steps are far from foolproof. Fake-image detectors are hit-or-miss. Metadata is easy to remove, and most social media sites strip it from uploaded images by default.
“We’ll definitely need to get more feedback and learn more about the types of risks that need to be addressed with video before it would make sense for us to release this,” says Ramesh.
Brooks agrees. “Part of the reason that we’re talking about this research now is so that we can start getting the input that we need to do the work necessary to figure out how it could be safely deployed,” he says.
Update 2/15: Comments from Sam Gregory were added.
Closely tied to the prompt
Similar to DALL·E 3, we also leverage GPT to turn short user prompts into longer detailed captions that are sent to the video model. This enables Sora to generate high quality videos that accurately follow user prompts.
Prompting with images and videos
All of the results above and in our landing page show text-to-video samples. But Sora can also be prompted with other inputs, such as pre-existing images or video. This capability enables Sora to perform a wide range of image and video editing tasks—creating perfectly looping video, animating static images, extending videos forwards or backwards in time, etc.
Animating DALL·E images
Sora is capable of generating videos provided an image and prompt as input. Below we show example videos generated based on DALL·E 2 [31] and DALL·E 3 [30] images.
(Videos omitted)
A Shiba Inu dog wearing a beret and black turtleneck.
Monster Illustration in flat design style of a diverse family of monsters. The group includes a furry brown monster, a sleek black monster with antennas, a spotted green monster, and a tiny polka-dotted monster, all interacting in a playful environment.
An image of a realistic cloud that spells “SORA”.
In an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.
Extending generated videos
Sora is also capable of extending videos, either forward or backward in time. Below are four videos that were all extended backward in time starting from a segment of a generated video. As a result, each of the four videos starts different from the others, yet all four videos lead to the same ending.
We can use this method to extend a video both forward and backward to produce a seamless infinite loop.
Some of the underlying principles
Video-to-video editing
Diffusion models have enabled a plethora of methods for editing images and videos from text prompts. Below we apply one of these methods, SDEdit [32], to Sora. This technique enables Sora to transform the styles and environments of input videos zero-shot.
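To give a feel for the SDEdit idea, here is a minimal sketch under loud assumptions: a variance-exploding noise schedule, a toy analytic score function (`toy_score`) standing in for a trained network, and no text conditioning. The names `sdedit` and `toy_score` and every parameter value are hypothetical, and nothing here reflects how Sora applies the method. The core recipe is: add noise to the input up to an intermediate level `t0`, then run the reverse diffusion from that level back to a clean sample.

```python
import numpy as np

def toy_score(x, sigma, mean=0.5, std=0.1):
    """Exact score of toy Gaussian 'data' N(mean, std^2) blurred by noise sigma.

    Hypothetical stand-in for a trained score/denoising network.
    """
    return -(x - mean) / (std**2 + sigma**2)

def sdedit(guide, score_fn, t0=0.5, sigma_max=1.0, sigma_min=0.01,
           steps=200, seed=0):
    """Perturb `guide` to noise level sigma(t0), then denoise back to t=0."""
    rng = np.random.default_rng(seed)
    # Geometric (variance-exploding) schedule: sigma(t) = min * (max/min)**t.
    sigma_t0 = sigma_min * (sigma_max / sigma_min) ** t0
    sigmas = np.geomspace(sigma_t0, sigma_min, steps + 1)
    # Step 1: noise the guide to the intermediate level.
    x = guide + sigma_t0 * rng.standard_normal(guide.shape)
    # Step 2: Euler-Maruyama steps of the reverse-time SDE.
    for hi, lo in zip(sigmas[:-1], sigmas[1:]):
        step = hi**2 - lo**2
        noise = rng.standard_normal(x.shape)
        x = x + step * score_fn(x, hi) + np.sqrt(step) * noise
    return x

# An all-zero "frame" as the guide; the toy data distribution is centred on 0.5.
guide = np.zeros((16, 16))
faithful = sdedit(guide, toy_score, t0=0.1)   # little noise: stays near guide
realistic = sdedit(guide, toy_score, t0=1.0)  # full noise: follows the model
print(faithful.mean(), realistic.mean())
```

The parameter `t0` sets the trade-off: a small value keeps the result close to the input, while a large value lets the model's learned distribution dominate.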