
When I was building Fairytale Genie, an AI application I vibe-coded on the DevOpser platform, I discovered something troubling: there's a limit to how much you can improve AI outputs through prompt refinement alone. No matter how detailed or precise I made my prompts, the AI would still make errors. Worse, every time I tweaked the prompt to fix one issue, something else would break. My outputs were working 95% of the time, but those last few percent? They were a mess. It felt like walking a tightrope—fragile, brittle, and a far cry from the robust consistency needed for a great user experience at scale.
But somewhere in the course of building the site, I discovered a robust, repeatable solution. The breakthrough? Multiple rounds of LLM review for both prompts and outputs. (LLM stands for Large Language Model—these are the AI systems like ChatGPT, Claude, or Gemini that generate text based on patterns learned from vast amounts of data.) The old rules of process improvement—like Six Sigma—feel upside down when it comes to AI applications.
There are two main paths to achieving this kind of redundancy:
If your application is leveraging pure automation, you can set up chains of LLM calls where each LLM has a different role and reviews the output of the previous round. For example, in Fairytale Genie, I have a storyController that generates the original story text based on user inputs; a second LLM call then acts as an editor with robust instructions to ensure all the non-negotiable requirements are met (nothing inappropriate for children, coherent stories, user input integrated intelligently into the story arc, etc.).
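Here's a minimal sketch of that two-call chain. It is not the actual Fairytale Genie code: the call_llm helper, the model ID, and the role prompts are illustrative placeholders, and I'm assuming AWS Bedrock's Converse API as the backend.

```python
import boto3

client = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder model

def call_llm(system_prompt: str, user_message: str) -> str:
    """Make one LLM call via the Bedrock Converse API."""
    response = client.converse(
        modelId=MODEL_ID,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": user_message}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

def generate_story(user_inputs: str) -> str:
    # Round 1: the story-writer role drafts the story from user inputs.
    draft = call_llm(
        "You are a children's story writer. Write a short story from these inputs.",
        user_inputs,
    )
    # Round 2: a second call acts as an editor enforcing the non-negotiables.
    return call_llm(
        "You are an editor. Rewrite the story so it is appropriate for "
        "children, coherent, and weaves the user's inputs into the story arc. "
        "Return only the corrected story.",
        draft,
    )
```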
For image generation, I use four rounds. The initial prompt generation is handled by a storyboarder service that takes the story text, breaks it into scenes, identifies key elements for each image, and ensures continuity and clear location identification, outputting everything in JSON format. This is passed to the image generator, which creates the first prompt (text and negativeText, both required for Nova Canvas).
Next, a review service checks that all constraints are met (for Fairytale Genie, this means no human faces on main characters, no chimeric elements, no splitting characters into human and animal, etc.). After that, there's a final LLM review before the data is sent to Nova Canvas to trigger image generation.
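To make the four rounds concrete, here's an illustrative sketch of that pipeline, reusing the call_llm helper from the sketch above. The function names, prompts, and JSON handling are placeholders rather than the production code, and a real pipeline would validate the JSON each round returns.

```python
import json

def build_image_prompts(story_text: str) -> list[dict]:
    # Round 1: the storyboarder splits the story into scenes as JSON.
    scenes = json.loads(call_llm(
        "Split this story into scenes. For each scene, list the key visual "
        "elements, the location, and continuity notes. Reply with a JSON "
        "array only.",
        story_text,
    ))

    prompts = []
    for scene in scenes:
        # Round 2: draft the image prompt (text + negativeText for Nova Canvas).
        draft = call_llm(
            "From this scene, write a JSON object with 'text' and "
            "'negativeText' fields for an image generator. JSON only.",
            json.dumps(scene),
        )
        # Round 3: a review pass enforces the hard constraints.
        reviewed = call_llm(
            "Check this image prompt: no human faces on main characters, no "
            "chimeric elements, no characters split into human and animal. "
            "Return the corrected JSON only.",
            draft,
        )
        # Round 4: final review before the prompt is sent to Nova Canvas.
        final = call_llm(
            "Final check: confirm the JSON is well-formed and the prompt is "
            "appropriate for children. Return the JSON only.",
            reviewed,
        )
        prompts.append(json.loads(final))
    return prompts
```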
The other path—which I don't use in Fairytale Genie—is the use of AI agents. Agents can keep trying over and over again to achieve the goal, naturally reviewing their own output and updating their next prompt to improve results. This is why AI agents often produce such high-quality outputs: they get redundancy out of the box, unlike automation-based apps where you have to explicitly build redundancy into each step.
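For contrast, the agent pattern looks roughly like this: a generate-review-retry loop where the critique feeds the next attempt. Again, this is a hypothetical illustration rather than anything from Fairytale Genie, reusing the same call_llm helper.

```python
def agent_attempt(task: str, max_attempts: int = 3) -> str:
    feedback = "none yet"
    output = ""
    for _ in range(max_attempts):
        # Generate, incorporating any feedback from the previous attempt.
        output = call_llm(
            "Complete the task, applying any reviewer feedback you are given.",
            f"Task: {task}\nReviewer feedback: {feedback}",
        )
        # The agent reviews its own output and decides whether to retry.
        verdict = call_llm(
            "Review this output against the task. Reply PASS if it fully "
            "meets the goal; otherwise list exactly what must be fixed.",
            f"Task: {task}\nOutput: {output}",
        )
        if verdict.strip().upper().startswith("PASS"):
            break
        feedback = verdict  # the critique becomes input to the next round
    return output
```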
Conventional process engineering teaches us that errors multiply in complex systems. The more handovers or steps you have, the less reliable your process becomes. You're taught to minimize steps, avoid repetition, and keep things as simple as possible (think DRY: Don't Repeat Yourself, and KISS: Keep It Simple, Stupid).
Six Sigma reinforces this through rolled throughput yield: per-step reliabilities multiply against each other. If each step in a process is 95% reliable, a 10-step process achieves only about 60% overall throughput (0.95¹⁰ ≈ 0.60). By traditional wisdom, every additional imperfect step drags overall throughput further down.
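You can check the arithmetic yourself:

```python
# Rolled throughput yield: per-step reliabilities multiply.
step_reliability = 0.95
steps = 10
print(step_reliability ** steps)  # ~0.599, i.e. about 60% overall throughput
```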
Lean teaches the same lesson: minimize handovers, and treat redundant rework as a "hidden factory," a classic source of muda (waste).
But AI turns this on its head.
In traditional software engineering, redundancy is often seen as waste. But in AI systems, especially those built on LLMs, redundancy is resilience. A system with a single LLM call is actually more prone to errors than one that uses multiple LLM calls—where each call reviews, corrects, and improves upon the last output.
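A little arithmetic shows why. The numbers below are illustrative assumptions, not measurements from Fairytale Genie: suppose a single LLM call gets it wrong 5% of the time, and each independent review round catches and fixes 90% of the errors that remain. The residual error rate then shrinks multiplicatively with every round, instead of compounding the way sequential handoffs do.

```python
base_error = 0.05   # assumed failure rate of a single unreviewed LLM call
catch_rate = 0.90   # assumed share of remaining errors each review fixes

for reviews in range(4):
    residual = base_error * (1 - catch_rate) ** reviews
    print(f"{reviews} review round(s) -> {residual:.4%} residual error rate")
# 0 -> 5.0000%, 1 -> 0.5000%, 2 -> 0.0500%, 3 -> 0.0050%
```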
For the first time, building in multiple, redundant rounds of LLM review isn't just acceptable—it's essential. This is how you achieve the level of reliability and quality that users expect, especially when scaling to a broader audience.
Want to see this in action? Check out Fairytale Genie. The stories and images generated by the AI are not just creative—they're consistently reliable, thanks to this new approach of iterative LLM review and correction.
We're entering a new era of software engineering where the old rules no longer apply. The path forward isn't found in traditional playbooks—it's discovered through bold experimentation, continuous innovation, and the courage to test, learn and adapt.
Vibe code it yourself or hire DevOpser to build it for you—in just 6 weeks.