A real-world look at what happens when an AI-powered product goes from idea to working MVP at speed, and what a security audit of the result taught us about building responsibly with generative AI.


The backstory

From idea to working product, fast.

The term we hear all too often in digital delivery: vibe coding. It's the practice of building software almost entirely through conversation with generative AI: by iterating on prompts (directives), the model writes the code needed to fulfil them. The barrier to entry is not a computer science degree but domain knowledge paired with the ability to articulate a problem well.

It’s exactly the approach we took to build a virtual interview platform at Deazy: a tool designed to handle the full interview pipeline with AI, complete with voice synthesis, real-time candidate analysis and automated scoring.

It went from concept to MVP in a fraction of the time a traditional build would take, and when we tested it… it worked. Interviews could run. Candidates could be assessed. Something genuinely useful had been created. But then we asked "how secure is this?" and ran a security audit.

"The vulnerability discovery process required no specialist tools, no insider access, and no prior knowledge of the codebase. Just a browser, developer tools, and a well-crafted prompt."

 

What did we find?

Seven vulnerabilities. Four rated high severity.

The audit surfaced seven distinct security issues. The method used to find them was disarmingly simple, utilising just a browser and a generative AI chat. Within minutes, a working local client existed that could call any endpoint the app used, with no authentication required.

Of the seven vulnerabilities detected, four were rated high severity, including PII leaks and authentication bypasses. We also discovered that the virtual interview system's backend could be used to make arbitrary calls to third-party integrations such as our text-to-speech engine and generative AI provider.
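To make that concrete, the "local client" amounted to little more than unauthenticated fetch calls. The sketch below is hypothetical (the host, routes and payloads are illustrative, not the real API), but it captures the shape of what the audit produced:

```ts
// Hypothetical reconstruction of the audit's local client. Endpoint paths
// were read straight out of the JavaScript bundle via browser dev tools;
// the backend accepted every call with no authentication whatsoever.
const BASE_URL = "https://interviews.example.com"; // illustrative host

// Fetch another candidate's data: no token, no cookie, no session.
async function fetchCandidate(candidateId: string): Promise<unknown> {
  const res = await fetch(`${BASE_URL}/api/candidates/${candidateId}`);
  return res.json();
}

// Drive a paid third-party integration (text-to-speech) through the
// same unprotected backend, running up someone else's API bill.
async function synthesiseSpeech(text: string): Promise<ArrayBuffer> {
  const res = await fetch(`${BASE_URL}/api/tts`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  return res.arrayBuffer();
}
```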

The three PII findings alone represent a potential disclosure-level security event, the kind that damages client relationships, attracts regulatory attention and is very difficult to walk back once it has occurred.

It underlines the need for balance when it comes to vibe-coded work and is a stark reminder that (as outlined in insight 4 of our last ‘state of play’ report) review and testing are an ever more important part of the AI engineering development lifecycle.

 

How it happened

This is not a criticism of AI.

It would be easy to frame this as a cautionary tale about AI-generated code being inherently unsafe, but that reading is too simple; the real issue is more instructive.

Generative AI is extraordinarily good at making things work. Ask it to build an endpoint that returns candidate data, and it will. Ask it to wire up a login flow, and it will. What it will not do automatically, unless explicitly instructed, is reason about threat models, enforce authentication boundaries, apply row-level security or consider what happens when someone reverse-engineers the bundle from a browser tab.

Security is not a feature you add at the end; it’s a posture you maintain throughout. Maintaining that posture requires either deep engineering experience baked into the process, or very deliberate prompting that encodes security thinking into every step of the build.

(It’s why, at Deazy, we’re committed to ensuring that fast doesn’t mean exposed, with experienced teams who help define the security architecture of your product before the first line of AI code is even generated.)

 

False sense of security

One of the more instructive parts of this process was not what the audit found, but what the tooling had already claimed to check.

Throughout the build, the AI coding environment performed its own security reviews. Prompts were analysed, code was scanned and feedback was returned. Green ticks. No issues flagged. From the outside, it looked like due diligence was happening automatically as part of the workflow.

It was not.

What those checks were doing was evaluating code in isolation: looking at individual functions and scanning packages for known vulnerabilities. What they were not doing was reasoning about the broader system as a whole, such as checking how components were configured and applied (or, in this case, not applied!).

This is an easy trap to fall into, and it is worth naming directly. The presence of automated checks creates a reasonable assumption that security is being handled, but that assumption is dangerous when the checks are not designed to model an attacker's perspective. A function can pass every lint rule and AI review and still be trivially exploitable if the authentication layer around it was never built.
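To illustrate, here is a minimal, hypothetical Express sketch (not the platform's actual code). The route below type-checks, passes linting and "works" in every functional test, yet hands PII to any caller, because no authentication layer was ever built around it:

```ts
import express from "express";

// Stand-in data store so the example runs; a real app would hit a database.
const reports = new Map([
  ["c-101", { candidateId: "c-101", email: "jane@example.com", score: 82 }],
]);

const app = express();

// Clean, typed, lint-friendly... and trivially exploitable: nothing here
// (and no middleware) ever checks who is making the request.
app.get("/api/candidates/:id/report", (req, res) => {
  const report = reports.get(req.params.id);
  if (!report) return res.status(404).json({ error: "not found" });
  res.json(report); // PII returned to any unauthenticated caller
});

app.listen(3000);
```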

The green ticks were not wrong, exactly; they just were not answering the right question. Security review at the function level is no substitute for thinking about what an unauthorised user could do with access to the whole system. That kind of thinking requires deliberate instruction in a vibe-coded build and, at this point in time, cannot simply be assumed to exist.

 

What needed fixing…

For this particular project, and for vibe-coded development in general, the remediation steps are straightforward:

  • Security prompting as a first-class concern from day one. Every AI generation session for backend logic is to include an explicit instruction to apply authentication, validate inputs and reason about who should and should not have access to each resource.
  • A security review checkpoint before any new tool moves from internal prototype to client-facing product. A lightweight self-audit, using the same browser-and-AI method described here, is what caught these issues before they reached a live environment.
  • Structured guidance for security review built into the development workflow. In our case, we are documenting a reusable guidance pattern that instructs the AI to audit generated code for auth gaps, exposed keys and missing input validation before it ships.

 

In practical terms?

This is where prompt frameworks and project-level AI guidance become useful. The goal is not to trust the model more, but to narrow its freedom so that secure defaults are part of the instruction set every time it writes or reviews code.

For example, a backend generation prompt should not just ask for functionality. It should also define the security posture:

Build a backend function that returns candidate interview data for the logged-in user.

Requirements:

- Require valid authentication and authorisation, scoped to the correct group
- Validate all inputs and reject malformed requests
- Assume all client-side code can be inspected by an attacker
- Do not expose service role keys or trust claims from the browser without verification
- Access only rows the authenticated user is permitted to read
- If Row Level Security is required, show the SQL policy needed
- At the end, explain the main abuse cases and how this implementation prevents them
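For illustration, here is the rough shape of output that prompt should steer the model towards. This is a hedged sketch assuming an Express backend and the jsonwebtoken library; the route, claim names and data-access helper are all hypothetical:

```ts
import express from "express";
import jwt from "jsonwebtoken";

const app = express();
const JWT_SECRET = process.env.JWT_SECRET!; // server-side only, never bundled

interface Claims {
  sub: string;   // authenticated user id
  group: string; // e.g. "recruiter" rather than "candidate"
}

// Hypothetical in-memory stand-in for the data layer. The helper takes the
// *verified* owner id, so every read is scoped to the caller. (With Postgres
// Row Level Security, this constraint could live in the database as a policy
// instead of in application code.)
const interviews = [{ id: "i-1", ownerId: "u-1", transcript: "..." }];
function findInterview(id: string, ownerId: string) {
  return interviews.find((i) => i.id === id && i.ownerId === ownerId);
}

app.get("/api/interviews/:id", (req, res) => {
  // 1. Require a valid token and reject anything else outright.
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  if (!token) return res.status(401).json({ error: "unauthenticated" });

  let claims: Claims;
  try {
    claims = jwt.verify(token, JWT_SECRET) as Claims;
  } catch {
    return res.status(401).json({ error: "invalid token" });
  }

  // 2. Validate input before it touches any query.
  if (!/^[a-z0-9-]{1,36}$/.test(req.params.id)) {
    return res.status(400).json({ error: "malformed id" });
  }

  // 3. Scope the read to the authenticated user, not to the raw request.
  const row = findInterview(req.params.id, claims.sub);
  if (!row) return res.status(404).json({ error: "not found" });
  res.json(row);
});

app.listen(3000);
```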

And the same applies during review. A useful review prompt is explicit about what the AI should look for, rather than asking the vague question "is this secure?":

Review this feature as a security auditor.

Check for:

- Missing authentication or authorisation checks
- Database logic that relies on code rather than database constraints
- Backend functions callable without authentication and authorisation
- Session flows that can be triggered for unauthorised users
- Exposure of PII, secrets, or internal identifiers to the client
- User input that could create XSS, injection, or denial-of-service risk

List findings by severity and propose the smallest safe fix for each one.
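Run against the leaky route sketched earlier, a review like this should flag the missing auth check first, and the smallest safe fix is an authentication gate in front of the handler rather than a rewrite. Again, a hypothetical sketch assuming Express and jsonwebtoken:

```ts
import type { NextFunction, Request, Response } from "express";
import jwt from "jsonwebtoken";

// Downstream handlers read the verified claims from here.
interface AuthedRequest extends Request {
  claims?: string | object;
}

// Smallest safe fix for the earlier leaky route: a reusable gate that
// rejects unauthenticated callers before the handler ever runs.
function requireAuth(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  try {
    // jwt.verify throws on a missing, expired or tampered token.
    (req as AuthedRequest).claims = jwt.verify(token ?? "", process.env.JWT_SECRET!);
    next();
  } catch {
    res.status(401).json({ error: "unauthenticated" });
  }
}

// Before: app.get("/api/candidates/:id/report", handler);
// After:  app.get("/api/candidates/:id/report", requireAuth, handler);
```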

The other layer is durable project context. Files such as AGENTS.md or CLAUDE.md can encode team rules so they do not need to be reinvented in every prompt. In practice, that guidance can be very direct:

## Security rules for AI-generated code

- Treat all browser code as public and reversible
- Never return PII from a route or function without verifying the caller
- Any backend function that touches user data or paid APIs must verify a JWT
- Never trust client-supplied email domains, roles, or identifiers without server-side checks
- If a feature creates HTML or rich text from user input, assess XSS risk explicitly
- When generating code, explain what could be abused and what protects against it
- When reviewing code, prioritise auth gaps, data exposure, secrets, and cost-amplification risks

This context won’t replace experienced engineering judgement, but it does give the model a better operating envelope. Instead of repeatedly asking an AI to write code from scratch, you are giving it a framework that reflects how your team wants software to be built.

 

The bigger picture

Vibe coding is not going away. How do we harness it?

The ability to take a product idea from concept to working software through sheer clarity of thought and well-directed AI prompting is a genuine shift in how things get built. That is worth recognising. The goal of this write-up is not to pour cold water on that capability, but to show what responsible use of it looks like in practice.

The leverage of AI-assisted development is real and so are its blind spots. The answer is not to slow down or add layers of gatekeeping. It is to build better scaffolding around the process: checklists, prompt patterns, review gates and a habit of asking "what could go wrong?" with the same energy as "does it work?".

If a feature or an app is functional, great, but functionality isn’t the end goal. The goal needs to be defined with other factors in mind, such as security, scaling beyond one user and sane UI choices, then conveyed to the generative 'engineer' as a complete instruction.

 

The conclusion?

An essential balancing of risk

With this example we’ve shown how easy it is to believe in and ship a product, only to find the problematic gaps afterwards. But it is possible to move fast and stay secure at the same time, and that is the approach that needs to be prioritised.

Vibe coding represents a huge leap in productivity, a thinning of the barrier between idea and execution that is incredibly exciting but equally littered with risk.

At Deazy, we’re committed to minimising that risk with an AI framework that focuses on security by design, integrating threat modelling and data privacy into the product roadmap from day zero, so the speed of innovation doesn't compromise the safety of business or users.

 

Five vibe coding takeaways
  1. Working doesn't mean secure: A product that functions correctly can still have critical vulnerabilities. "Does it work?" and "is it safe?" are separate questions that both need answering.
  2. Green ticks aren't a security audit: Automated code checks evaluate functions in isolation. They don't reason about your whole system or model what an attacker could do.
  3. Security prompting is non-negotiable: Most of the time, AI won't apply authentication, validate inputs or think about threat models unless you explicitly tell it to. Build security requirements into every backend generation prompt from day one.
  4. Audit before you ship, not after: A simple browser-and-AI review checkpoint between prototype and production can catch what automated tools miss. Make it a hard gate, not an afterthought.
  5. Durable context beats repeated prompting: Encoding your team's security rules into a project-level file (like AGENTS.md) means secure defaults are baked in every time, not remembered by chance.

 

About Deazy

Deazy enables ambitious organisations to explore and harness AI to drive digital product innovation and operational efficiency, applying our award-winning AI and software delivery expertise to solve complex challenges, accelerate innovation and build resilient digital platforms that scale.

With a uniquely flexible delivery model, we provide rapid access to a diverse pool of 6,000+ experienced nearshore AI, software, and data professionals, managed by highly experienced, multidisciplinary in-house product and delivery experts who provide the support and resources to guarantee success.

If you’d like to explore how Deazy can support your team’s AI adoption journey or optimise your product delivery capability, please get in touch with us at hello@deazy.com or look us up on LinkedIn.