Video is AI’s new frontier – and it is so persuasive, we should all be worried | Victoria Turk
I recently had the opportunity to see a demo of Sora, OpenAI’s video generation tool, which was released in the US on Monday, and it was so impressive it made me worried for the future. The new technology works like an AI text or image generator: write a prompt, and it produces a short video clip. In the pre-launch demo I was shown, an OpenAI representative asked the tool to create footage of a tree frog in the Amazon, in the style of a nature documentary. The result was uncannily realistic, with aerial camera shots swooping down on to the rainforest, before settling on a closeup of the frog. The animal looked as vivid and real as any nature documentary subject.
Yet despite the technological feat, as I watched the tree frog I felt less amazed than sad. It certainly looked the part, but we all knew that what we were seeing wasn’t real. The tree frog, the branch it clung to, the rainforest it lived in: none of these things existed, and they never had. The scene, although visually impressive, was hollow.
Video is AI’s new frontier, with OpenAI finally rolling out Sora in the US after first teasing it in February, and Meta announcing its own text-to-video tool, Movie Gen, in October. Google made its Veo video generator available to some customers this month. Are we ready for a world in which it is impossible to discern which of the moving images we see are real?
In the past couple of years, we’ve witnessed the proliferation of AI text and image generators, but video feels even more high-stakes. Historically, moving pictures have been more difficult to falsify than still ones, but generative AI is about to change all that. There are many potential abuses of such technology. Scammers are already using AI to imitate the voices of people’s friends or family members, in order to trick them out of money. Disinformation pedlars use deepfakes to support their political agendas. Extortionists and abusers make fake sexual images or videos of their victims. We are living in a world where some security researchers now suggest that families adopt a secret codeword, so they can prove they really are who they say they are if they have to call for help.
The creators of these tools appear to be aware of the risks. Before its public release, OpenAI opened up access only to select creative partners and testers. Meta is doing the same. The tools incorporate various safeguards, such as restrictions on the prompts people can use: preventing videos from featuring public figures, violence or sexual content, for instance. They also contain watermarks by default, to flag that a video has been created using AI.
While the more extreme possibilities for abuse are alarming, I find the prospect of low-stakes video fakery almost as disconcerting. If you see a video of a politician doing something so scandalous that it is hard to believe, you may respond with scepticism anyway. But an Instagram creator’s skit? A cute animal video on Facebook? A TV ad for Coca-Cola? There’s something boringly dystopian about the thought of having to second-guess even the most mundane content, as the imagery we’re surrounded with becomes ever more detached from reality.
As I watched the AI-generated tree frog, I mainly wondered what the point of it was. I can certainly see AI’s utility in CGI for creative film-making, but a fake nature documentary seemed a strange choice. We have all marvelled at the amazing visuals in such programmes, but our awe is not just because the pictures are pretty: it is because they are real. They allow us to see a part of our world we otherwise could not, and the difficulty of obtaining the footage is part of the appeal. Some of my favourite nature documentary moments have been behind-the-scenes clips in programmes such as Our Planet, which reveal how long a cameraperson waited silently in a purpose-made hide to capture a rare species, or how they jerry-rigged their equipment to get the perfect shot. Of course, AI video can never reach this bar of genuine novelty. Trained on existing content, it can only produce footage of something that has been seen before.
Perhaps how a video has been produced shouldn’t matter so much. A tree frog is a tree frog, and one survey suggests that as long as we don’t know an image is made by AI, we like it just the same. It’s the deception inherent in so much AI media that I find upsetting. Even the blurriest real photograph of 2024 meme hero Moo Deng contains more life than a Movie Gen video of a baby hippo swimming, which, however sleekly rendered, is dead behind the eyes.
As AI content gets more convincing, it risks ruining real photos and videos along with it. We can’t trust our eyes any more, and are compelled to become amateur sleuths just to make sure the crochet pattern we’re buying is actually constructable, or the questionable furniture we’re eyeing really exists in physical form. I was recently scrolling through Instagram and shared with my husband a cute video of a bunny eating lettuce. It was a completely benign clip – but perhaps a little too adorable. Was it AI, he asked? I couldn’t tell. Even having to ask the question diminished the moment, and the cuteness of the video. In a world where anything can be fake, everything might be.