
My Experiment with Sora: The Promise and Pitfalls of AI-Generated Video
Introduction
Artificial intelligence is transforming creative industries, and nowhere is this more apparent than in the world of video generation. OpenAI’s Sora is at the cutting edge of this revolution, a tool capable of generating entire video sequences from simple text prompts. The idea is as thrilling as it is disruptive—describe a scene in words, and Sora brings it to life in moving images. For industries relying on animation, filmmaking, or corporate content creation, this technology has the potential to be game-changing.
Excited by this, I set out to explore what Sora could do. My initial goal was to create a simple, professional-looking video for a corporate website—one depicting a small, three-person call center. But what should have been a straightforward scenario quickly highlighted some of the challenges that come with AI-generated video. Sora struggled with coherence, missing key details and generating warped results that didn’t align with my description.
Rather than give up, I shifted my focus. If Sora’s failures were interesting, perhaps there was value in exploring them. My next attempt was more ambitious: a whimsical production line assembling unique toy cars, each rolling off the belt with slight variations in color or design. The results were equally unpredictable—machines emerged deformed, the logic of the scene was lost, and my instructions were largely ignored. It became clear that while Sora is groundbreaking, it still struggles with consistency and precision, particularly when dealing with complex mechanics and logical object interactions.
Determined to see if I could achieve better results, I turned to a different approach inspired by Marques Brownlee’s review of Sora: using remix prompts on pre-existing, high-quality AI-generated videos from OpenAI’s video base. This seemed like a promising way to guide Sora toward more structured outputs. Given my ongoing work on an article about an IoT project for sheep farmers, I selected a particularly amusing base video—a sheep standing on its hind legs, wearing a hat, and dancing in a field. My prompt was simple: add small radio collars with antennas to all the sheep in the video to represent the IoT technology.
What happened next was both fascinating and horrifying. Instead of a logical augmentation, Sora produced something that was unintentionally surreal: the main sheep developed too many legs, its form became increasingly distorted, and as it curtsied, it mysteriously transitioned into what looked like either a freshly sheared state or a white overall. The results, while entertaining in a strange way, underscored the unpredictability of AI video generation.
In this blog post, I’ll take you through my full experience—what worked, what didn’t, and what Sora’s limitations reveal about the future of AI-generated video. While Sora is an incredible leap forward, my experiments show that we’re still far from a tool that can reliably interpret and execute detailed creative vision.
The Challenges of AI-Generated Video
Sora represents an incredible leap forward in AI-generated video, but it also highlights just how complex the task of video generation is. From a software development perspective, creating something like Sora is an immense challenge because video synthesis isn’t just about generating a series of coherent frames—it requires deep understanding of physics, spatial logic, object permanence, and human expectations of movement and form.
Unlike still image generation, where inconsistencies might go unnoticed, video forces AI models to maintain continuity. Objects can’t shift in shape unpredictably, characters shouldn’t gain or lose limbs, and interactions between objects must follow real-world physics. In other words, Sora needs to create not just visually plausible frames, but sequences where each frame logically follows the last—something that current AI still struggles with.
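To make "continuity" a bit more concrete: a crude way to see it is to measure how much each frame of a clip differs from the one before it. The sketch below is a small Python script using OpenCV and NumPy (my choice of tools, with a placeholder file name like output.mp4); it is purely an inspection aid for looking at generated clips, and has nothing to do with how Sora itself works. Footage of a real scene tends to produce a smooth, low signal, while glitchy clips often show sharp spikes where limbs pop in and out or objects morph.

```python
# Rough per-frame "continuity" check for a generated clip.
# Requires: pip install opencv-python numpy
import cv2
import numpy as np

def frame_change_profile(path: str) -> list[float]:
    """Return the mean absolute pixel difference between consecutive frames."""
    cap = cv2.VideoCapture(path)
    diffs = []
    ok, prev = cap.read()
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        # Compare grayscale frames so colour noise matters less.
        g_prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        g_curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diffs.append(float(np.mean(cv2.absdiff(g_prev, g_curr))))
        prev = frame
    cap.release()
    return diffs

if __name__ == "__main__":
    profile = frame_change_profile("output.mp4")  # hypothetical file name
    if profile:
        # Large spikes suggest abrupt changes (popping limbs, morphing objects);
        # consistently low values suggest the continuity a real scene would have.
        print(f"{len(profile)} transitions, max change: {max(profile):.1f}")
```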
In my experiments, these difficulties became painfully obvious. While Sora can generate incredibly realistic short clips, it struggles with specificity, precision, and logical consistency over time. The more detailed or mechanical the request, the more things tend to fall apart.
Let’s take a deep dive into three of my attempts:
Deep Dive #1: The Toy Machine That Couldn’t Build Toys
One of my early ideas was a short product visualization video featuring a whimsical factory. The concept: a machine that assembles toy cars, specifically Mini Coopers, producing each one with slight variations—different wheels, unique paint colors, small aesthetic differences.
I imagined a clean, futuristic factory where a robotic production line would seamlessly assemble and personalize each toy car before rolling it off onto a conveyor belt. My prompt to Sora was clear:
“A futuristic production line assembles small toy cars. Each car is a Mini Cooper but comes out unique, with different wheels, paint colors, and subtle design variations. The factory is sleek and modern, with robotic arms carefully assembling each piece.”
The result, however, was far from the controlled precision I envisioned.
What Went Wrong?
- Warped Machinery – The robotic arms and conveyor belts, instead of being sleek and functional, emerged twisted and distorted. In some attempts, the production line seemed half-melted, with mechanical parts appearing fused together in an unnatural way.
- Illogical Object Interactions – The cars often didn’t follow any logical assembly process. Some appeared already built, only to disassemble as they moved down the line. Others changed shape inexplicably mid-video.
- Ignored Prompt Details – The concept of “each car being unique” was either ignored or misinterpreted. Instead of subtle variations, some cars merged together into surreal hybrid forms, or the differences were too extreme—one attempt resulted in a Mini Cooper morphing into an entirely different vehicle halfway through.
- Physics Glitches – In some attempts, cars floated off the conveyor belt instead of rolling, while in others, robotic arms grabbed components but didn’t actually attach them.
Why This Is So Hard for AI
Sora (and other AI video models) struggles with causality and mechanical interactions. Unlike a traditional animation pipeline—where human animators deliberately define every movement—AI must infer how objects interact from its training data. If it hasn’t been exposed to enough examples of how a car is assembled in reality, it fills in the gaps with unpredictable results.
In short, my toy factory looked impressive at first glance, but completely fell apart on closer inspection. The surreal, dreamlike qualities of AI-generated video became clear: it’s great at suggesting realism, but poor at sustaining logical sequences over time.
Attempt 1:
Attempt 2:
Deep Dive #2: The Dancing Sheep That Became a Horror Show
After the factory debacle, I decided to take a different approach. Instead of generating a video from scratch, I tried modifying an existing, well-structured AI-generated clip.
I found an entertaining and well-made video of a sheep standing on its hind legs, wearing a hat, and doing a small dance in a field of grazing sheep. This video was already visually coherent, so I thought it would be a perfect candidate for Sora’s remix feature—a way to subtly modify a high-quality video rather than generating a new one from scratch.
Since I was working on an article about IoT technology for sheep farming, I had a simple request:
“Add small radio collars with antennas to all the sheep in the video.”
It sounded like a modest edit—just adding a minor visual detail to an existing clip. But the results were… disturbing.
What Went Wrong?
- Sheep Mutation – The main sheep gained extra legs over the course of the video. It started out normal, but as it danced, additional limbs appeared in unnatural places.
- Glitches in Clothing – At one point, after curtsying, the sheep seemed either to have been suddenly sheared or to be wearing a white overall—Sora couldn’t decide what was happening.
- Ignored Prompt Details – The small IoT radio collars with antennas I requested? Nowhere to be seen. Instead, random distortions appeared on the sheep’s fur, almost as if the AI had tried to interpret “collars” but got confused.
- Disturbing Visual Artifacts – The video had occasional blurry, half-formed appendages flickering in and out of existence, making the whole thing unintentionally nightmarish.
Why This Happened
This experiment revealed another weakness of AI video models: modifying existing footage while preserving structural integrity is incredibly difficult.
Unlike Photoshop-style editing, where objects can be cleanly layered on top of an image, AI-generated video must reimagine the entire scene with the requested modification in mind. This means:
- If the AI doesn’t fully understand an object (like a “radio collar with an antenna”), it will either ignore the request or introduce unpredictable distortions.
- Any changes to a subject’s form (like modifying a sheep’s fur) can cascade into unintended effects—hence the extra legs and mysterious costume change.
- AI struggles with temporal consistency, meaning an object may start out normal but progressively morph into something unrecognizable; the sketch below shows one rough way to see that drift in a clip's raw frames.
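For the curious, here is a minimal Python sketch of what I mean by drift: it compares every frame of a clip against the opening frame using structural similarity (SSIM), via OpenCV and scikit-image. These are my own tool choices, the file name sheep_remix.mp4 is just a placeholder, and this is an illustrative inspection script rather than anything built into Sora. Some drop-off is normal because the subject moves, but a steep, sustained fall usually marks the point where the subject has stopped being the thing it started as.

```python
# Tracking how far each frame drifts from the opening frame of a clip.
# Requires: pip install opencv-python scikit-image
import cv2
from skimage.metrics import structural_similarity as ssim

def drift_from_first_frame(path: str) -> list[float]:
    """Return SSIM of every frame against the first frame (1.0 = identical)."""
    cap = cv2.VideoCapture(path)
    ok, first = cap.read()
    if not ok:
        raise ValueError(f"Could not read {path}")
    reference = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
    scores = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(ssim(reference, gray, data_range=255))
    cap.release()
    return scores

if __name__ == "__main__":
    scores = drift_from_first_frame("sheep_remix.mp4")  # hypothetical file name
    if scores:
        # A gentle downward trend is expected (the subject moves), but a sharp,
        # sustained drop often coincides with the subject morphing into
        # something structurally different from how it started.
        print(f"start: {scores[0]:.2f}, end: {scores[-1]:.2f}, min: {min(scores):.2f}")
```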
Attempt 1:
Attempt 2:
Deep Dive #3: The Call Center That Didn’t Understand Workspaces
Before my experiments with the toy machine and the dancing sheep, my initial goal with Sora had been relatively simple: to create a professional-looking video for a corporate website. I envisioned a small call center with just three people, each at their desks, working on computers in a clean, modern office environment.
My prompt was straightforward:
“A small, professional call center with three employees at desks, working on computer screens. The environment is modern and sleek, with a quiet, focused atmosphere.”
Compared to the other requests, this seemed like something AI should be able to handle easily—after all, corporate office stock footage is everywhere. But once again, Sora produced unexpected results.
What Went Wrong?
- Screens Were the Wrong Size and Orientation – Instead of uniform computer monitors, some screens were too large, too small, or floating at odd angles. In some cases, they were facing the wrong way, as if the AI didn’t quite understand that monitors should be visible to the user, not positioned randomly.
- Text Was Gibberish – Any attempt to depict writing—whether on screens or notepads—resulted in the usual AI-generated nonsense symbols and garbled text.
- People’s Hands and Phones Were Warped – Some employees held misshapen phones, while others appeared to be typing without keyboards.
- General Lack of Conceptual Understanding – Sora seemed to struggle with what a call center actually is. In some variations, there were too many chairs but no people, and in others, the workers were staring at blank walls instead of screens.
Why This Happened
The call center experiment underscored one of the fundamental weaknesses of AI video generation: understanding abstract concepts and structured human environments.
- AI doesn’t inherently understand what a “call center” is—it only knows patterns from its training data. If it hasn’t been trained on enough clear examples of correctly arranged office spaces, it fills in the gaps in unpredictable ways.
- Screens and writing are notoriously difficult for AI because they require not just accurate rendering, but contextual understanding. A screen should display something readable and relevant, but AI struggles to generate coherent text or meaningful user interfaces.
- Physical realism is inconsistent—while Sora is excellent at creating natural movement, it sometimes loses track of object permanence and logical placement—hence the floating screens and reversed monitors.
What should have been a simple, professional scene ended up looking off—just realistic enough to feel plausible at first glance, but filled with subtle distortions that made it clear AI doesn’t actually understand the content it’s generating.
Attempt 1:
This experience, like the others, highlighted a key takeaway: Sora is brilliant at generating visually impressive scenes, but it lacks true comprehension of structured environments and functional objects.
Conclusion: The Future of AI Video Is Both Exciting and Unpredictable
My experiments with Sora have been fascinating, frustrating, and deeply insightful. The technology is mind-blowing in terms of what it can achieve, but it also highlights fundamental challenges in AI-generated video.
- AI struggles with logical sequences and physical interactions, leading to surreal and dreamlike results.
- Editing existing AI-generated videos isn’t straightforward—small modifications can lead to unpredictable distortions.
- While Sora produces visually stunning clips, it often ignores or misinterprets details in complex prompts.
As AI video tools continue to evolve, we can expect better consistency, stronger adherence to user input, and more control over results. But for now, AI-generated video remains an experimental, sometimes bizarre, and often hilarious frontier of digital content creation.
Would I use Sora again? Absolutely. But I’d go in knowing that the results will probably be unexpected, weird, and occasionally nightmarish.