What Is Vibe Coding? I Shipped a Production App With It to Find Out

Where the term came from

On February 2, 2025, Andrej Karpathy fired off what he later called a "shower of thoughts throwaway tweet." It read: "There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." He said it had become possible because the models had gotten so good, naming Cursor's Composer paired with Anthropic's Sonnet, and mentioned he was mostly talking to the editor by voice through SuperWhisper rather than typing. The tweet was viewed over four million times. Collins Dictionary later named vibe coding its Word of the Year for 2025. You can read the original post for yourself.

Here is the part most coverage skips. The phrase escaped its own definition within weeks. People started calling any use of an LLM to write code "vibe coding," which flattened a precise idea into a vibe of its own. Simon Willison drew the line back in March 2025: vibe coding is "building software with an LLM without reviewing the code it writes." If the model wrote every line but you read, tested, and understood all of it, that is not vibe coding, that is using an LLM as a typing assistant. The distinction is the entire story, so hold onto it.

I wanted to know what was actually true once you strip out the hype on one side and the reflexive disgust on the other. So I did what this column always does. I gave it a weekend and real money and built something.

The actual mechanism

Strip away the mystique and vibe coding is a tight feedback loop with one deliberate omission. You describe an intent in natural language. The model generates code. You run it. You look at the result, not the code, and you describe the next change based on what you saw. Then you repeat. The loop runs on outputs and behaviours, never on the source. That omission, refusing to read the code, is the thing that makes it vibe coding rather than ordinary AI-assisted development.
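The loop is easier to see written down. What follows is a minimal TypeScript sketch, not any tool's real API: `Model`, `Human`, `vibeLoop`, and the `run` callback are all hypothetical names invented here. The thing to notice is the one deliberate omission, which is that the generated source is never handed to the human.

```typescript
// Hypothetical types for illustration only; no real editor or model API is implied.
type Source = string;       // generated code, treated as opaque by the human
type Observation = string;  // what the human saw when the app ran

interface Model {
  // Produces a new version of the code from a request and the previous version.
  generate(request: string, previous: Source | null): Source;
}

interface Human {
  // Chooses the next request from observed behaviour only; null means "done".
  nextRequest(observation: Observation): string | null;
}

function vibeLoop(
  model: Model,
  human: Human,
  run: (source: Source) => Observation,
  firstRequest: string,
  maxRounds: number = 20,
): { source: Source | null; rounds: number } {
  let source: Source | null = null;
  let request: string | null = firstRequest;
  let rounds = 0;

  while (request !== null && rounds < maxRounds) {
    source = model.generate(request, source);   // describe, generate
    const observation = run(source);            // run it
    request = human.nextRequest(observation);   // react to behaviour; `source` is never passed in
    rounds += 1;
  }
  return { source, rounds };
}
```

Swap `human.nextRequest(observation)` for something that also receives `source` and, by the definition used in this article, you are back to ordinary AI-assisted development.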

It works because the cost of generating a plausible next version of the code has collapsed. When regenerating a chunk is nearly free and nearly instant, reading it stops feeling worth the time. You start treating the codebase the way you treat a slot machine that mostly pays out: pull the lever, watch the result, pull again. The model holds the structure in its context window. You hold the goal in your head. Nobody holds the actual code, which is the point and also the trap.

That is the mechanism. Now the test.

What I built, and what it cost

The brief I set myself: a single-purpose web app I would genuinely use, buildable by someone who refuses to open a single generated file. I picked a "meeting cost calculator" for my own consulting work, a small tool where you add attendees with hourly rates, start a timer, and watch a running dollar figure tick up during a call. Trivial enough to finish in a weekend, real enough that broken state or a wrong calculation would actually annoy me.
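The arithmetic behind the running figure is small enough to sketch. This is not the generated code; it is a hypothetical TypeScript version in which `Attendee` and `meetingCost` are invented names and rounding to whole cents is an assumption.

```typescript
// Hypothetical shape for one person in the meeting.
interface Attendee {
  name: string;
  hourlyRate: number; // dollars per hour
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Cost so far = (sum of everyone's hourly rate) * (elapsed hours), rounded to cents.
function meetingCost(attendees: Attendee[], elapsedMs: number): number {
  const combinedHourlyRate = attendees.reduce((sum, a) => sum + a.hourlyRate, 0);
  const elapsedHours = elapsedMs / MS_PER_HOUR;
  return Math.round(combinedHourlyRate * elapsedHours * 100) / 100;
}

// Two people at $150 and $90 an hour, thirty minutes in.
const costSoFar = meetingCost(
  [
    { name: "A", hourlyRate: 150 },
    { name: "B", hourlyRate: 90 },
  ],
  30 * 60 * 1000,
);
console.log(costSoFar); // 120
```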

The stack, chosen for the lowest-friction vibe loop I could assemble:

Cursor as the editor, Composer mode, Claude Sonnet as the model. $20 Pro subscription, prorated to roughly $8 for the weekend window.
Claude API overflow once I blew through Cursor's included fast requests on Saturday afternoon. $51.40 in metered tokens. This was the line item that surprised me.
Railway for deploy, a Node + SQLite backend with a static frontend. $5 in usage plus the hobby plan, call it $9.91 for the weekend.
A domain I already owned, so $0 there. Total out of pocket: $69.31.

Two timeboxes. Saturday 09:40 to 16:20, with a long lunch. Sunday 11:00 to 14:30. Just over ten hours of budgeted wall-clock time. The rule I held myself to: I do not open a generated file to read it. I can run the app, I can describe what I see, I can paste an error message back in, but I do not read source. The moment I read source to fix something, the experiment is over and I am no longer vibe coding.

What worked

The first three hours were genuinely startling. By 11:00 Saturday I had a working frontend with the timer, an attendee list, and a running total, all of it generated from maybe a dozen plain-English requests. I never named a framework. Cursor picked React and I let it. When I said "the total should keep counting even if I switch browser tabs," it added a timestamp-based calculation instead of an interval counter, which is the correct fix, and I only know that because it told me, not because I read it. By lunch I had something I would have estimated at a full day of normal work.
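For anyone who wants the difference spelled out, here is a hedged TypeScript sketch of the two approaches. It is not the code Cursor produced; `TickCounter` and `TimestampTimer` are hypothetical names. The underlying fact is that browsers throttle timer callbacks in background tabs, so a counter that adds a fixed step per callback falls behind, while a value derived from the start timestamp does not.

```typescript
// Interval-counter approach: add a fixed step each time the timer callback fires.
// If the browser delays or skips callbacks, the total silently falls behind.
class TickCounter {
  private totalMs = 0;

  tick(stepMs: number): void {
    this.totalMs += stepMs;
  }

  elapsedMs(): number {
    return this.totalMs;
  }
}

// Timestamp approach: remember when the meeting started and derive the elapsed
// time from the clock on demand. Missed callbacks cannot change the answer.
class TimestampTimer {
  private readonly startedAtMs: number;

  constructor(startedAtMs: number) {
    this.startedAtMs = startedAtMs;
  }

  // In a browser, nowMs would typically be Date.now().
  elapsedMs(nowMs: number): number {
    return Math.max(0, nowMs - this.startedAtMs);
  }
}

// Simulate ten real seconds during which only four one-second callbacks ran.
const startedAt = 1700000000000;
const counter = new TickCounter();
for (let i = 0; i < 4; i++) {
  counter.tick(1000);
}
const timer = new TimestampTimer(startedAt);

const counted = counter.elapsedMs();                // 4000: six seconds lost
const derived = timer.elapsedMs(startedAt + 10000); // 10000: correct
console.log(counted, derived);
```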

The model was strongest exactly where you would predict: well-trodden patterns with millions of training examples. CRUD operations, form state, a SQLite schema, a deploy config for Railway. These are problems the model has seen ten thousand times. The generated code for them was, when I later cheated and looked, boringly conventional and correct. For the well-lit parts of the problem, the vibe loop is not just fast, it is faster than I am, and I have been doing this for twenty years.

What broke

The wall arrived Saturday at 15:30. I asked for a feature where the meeting cost persisted across page reloads, so you could close the tab and come back to a still-running meeting. The app started showing the wrong total on reload. Sometimes off by a few cents, sometimes off by an hour of billing. I did what vibe coding tells you to do. I described the symptom and asked the model to fix it. It changed something. The bug moved. I described the new symptom. It changed something else. The bug came back in a different shape.

I spent ninety minutes in that loop. The model and I were both debugging a system neither of us could see. It had lost the thread of its own earlier decisions, because the relevant code had scrolled out of its working context, and I had no thread to lose because I had chosen never to read it. This is the precise failure mode that defines the ceiling. When the bug lives in code nobody has read, you are not debugging, you are negotiating with a slot machine and hoping. The error turned out to be a timezone offset being applied twice, once at write and once at read. I know that because at 17:00, forty minutes past the end of my Saturday timebox, I broke my own rule, opened the file, and found it in four minutes.
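The shape of that bug is worth seeing, because it is invisible from the outside and obvious on the page. The sketch below is a hypothetical TypeScript reconstruction, not the actual file: every function name, the one-hour offset, and the timestamps are assumptions chosen for illustration, and the size of the error is not meant to match what I saw on screen.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Buggy pair: both sides shift the timestamp by the local offset.
function writeStartBuggy(startUtcMs: number, offsetMs: number): number {
  return startUtcMs + offsetMs; // shift number one, at write
}
function readStartBuggy(storedMs: number, offsetMs: number): number {
  return storedMs + offsetMs; // shift number two, at read
}

// Fixed pair: persist UTC milliseconds untouched and apply offsets only for display.
function writeStart(startUtcMs: number): number {
  return startUtcMs;
}
function readStart(storedMs: number): number {
  return storedMs;
}

function elapsedSince(startMs: number, nowUtcMs: number): number {
  return nowUtcMs - startMs;
}

const offsetMs = -1 * HOUR_MS;                 // hypothetical viewer at UTC-1
const startUtcMs = 1700000000000;              // meeting start, stored on first load
const nowUtcMs = startUtcMs + 15 * 60 * 1000;  // page reloaded fifteen minutes later

const correctElapsed = elapsedSince(readStart(writeStart(startUtcMs)), nowUtcMs);
const buggyElapsed = elapsedSince(
  readStartBuggy(writeStartBuggy(startUtcMs, offsetMs), offsetMs),
  nowUtcMs,
);

// The buggy path is wrong by exactly twice the offset.
console.log(correctElapsed, buggyElapsed, buggyElapsed - correctElapsed);
```

Each function looks reasonable on its own, which is why describing symptoms never converged. Seeing the write and the read side by side is what makes the double shift jump out.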

That four-minute fix is the whole lesson. The vibe loop got me ninety percent of an app in three hours and then cost me ninety minutes failing to fix a bug a junior engineer would have caught by reading twenty lines.

Where it works and where it falls apart

The boundary is not "small apps versus big apps" or "frontend versus backend." The boundary is whether the problem fits inside what someone can hold in their head, model or human. Vibe coding works while the entire relevant state of the system is either in the model's context window or visible in the running app's behaviour. It falls apart the instant a defect hides in code that neither party is tracking.

This gives a clean decision rule. Vibe coding is the right tool when all of these hold:

The blast radius of a failure is a redeploy, not a lawsuit. No PII, no money movement, no irreversible side effects.
The lifespan is short. A prototype, a demo, a script you run twice, a tool you will rewrite if it survives contact with real use.
The problem is well-trodden. You are assembling known patterns, not inventing a novel algorithm or a tricky concurrency model.
You are validating an idea, not committing to a system. The output is a yes-or-no on "is this worth building properly," not the proper build.
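Because the rule is a plain conjunction, it can be written as a checklist that has to pass in full. A minimal TypeScript sketch follows; `ProjectProfile`, `reasonsNotToVibeCode`, and `vibeCodingFits` are hypothetical names, and the booleans are judgement calls that someone still has to make.

```typescript
// One flag per condition in the list above; all names are hypothetical.
interface ProjectProfile {
  failureCostsOnlyARedeploy: boolean; // no PII, no money movement, nothing irreversible
  shortLifespan: boolean;             // prototype, demo, throwaway script
  wellTroddenProblem: boolean;        // assembling known patterns
  validatingNotCommitting: boolean;   // answering "is this worth building properly"
}

// Returns every condition that fails. An empty list means vibe coding fits.
function reasonsNotToVibeCode(p: ProjectProfile): string[] {
  const reasons: string[] = [];
  if (!p.failureCostsOnlyARedeploy) reasons.push("a failure costs more than a redeploy");
  if (!p.shortLifespan) reasons.push("the code has to live a long time");
  if (!p.wellTroddenProblem) reasons.push("the problem is novel or tricky");
  if (!p.validatingNotCommitting) reasons.push("this is the real build, not a test of the idea");
  return reasons;
}

function vibeCodingFits(p: ProjectProfile): boolean {
  return reasonsNotToVibeCode(p).length === 0;
}

// A weekend prototype passes; a long-lived billing service does not.
const weekendPrototype: ProjectProfile = {
  failureCostsOnlyARedeploy: true,
  shortLifespan: true,
  wellTroddenProblem: true,
  validatingNotCommitting: true,
};
const billingService: ProjectProfile = {
  failureCostsOnlyARedeploy: false,
  shortLifespan: false,
  wellTroddenProblem: true,
  validatingNotCommitting: false,
};
console.log(vibeCodingFits(weekendPrototype), reasonsNotToVibeCode(billingService));
```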

It is the wrong tool the moment any of those flip. The most expensive mistake I see teams make is using a prototyping technique for a production job and being surprised when the prototype's properties show up in production. Unreviewed code is fine in a sketch. In a system you have to evolve for two years, the understandability of the code is the asset, and vibe coding spends that asset to buy speed you only needed on day one.

Willison made the same point from the maintenance angle: most of the work of software engineering is evolving existing systems, where the quality and understandability of the underlying code is exactly what lets you change it safely. Vibe coding optimizes for the one moment that requirement is absent, the green-field sketch, and quietly mortgages every moment after it.

So is it real engineering?

The question is framed wrong, which is why it generates so much heat and so little light. Engineering was never the typing. The typing is the part we have spent roughly seventy years trying to automate, from compilers to autocomplete to this. The engineering is the decomposition, the specification, the choice of what to build and what to leave out, and the validation that the thing does what you claimed. Vibe coding keeps all of that. It deletes the typing and, critically, it also deletes the reading.

The reading is where it stops being engineering. Reviewing the code is how you build the mental model that lets you change the system later, reason about its failure modes, and take responsibility for what it does. A discipline that ships artifacts no human has read and no human understands is not engineering, whatever else it is. So the honest answer is that vibe coding is real engineering right up until the point where it asks you to own an outcome you cannot explain. My weekend app was engineering through Saturday lunch and stopped being engineering at 15:30, in the exact same session, on the exact same codebase. Nothing about the tool changed. What changed was that I now owned a behaviour I could not account for.

What it means for your career

The 2025 Stack Overflow Developer Survey found that 84% of developers now use or plan to use AI tools, up from 76% the year before. The same survey found something more interesting underneath that headline: trust in AI output is at an all-time low, with more developers actively distrusting the accuracy of AI tools (46%) than trusting it (33%). Read those two numbers together. Adoption is near-universal and confidence is falling. The survey data describes a workforce that uses the thing constantly and does not believe it.

That gap is the job. If the tool produced trustworthy output, the human in the loop would be redundant. Because it does not, someone has to decide what is correct, and the model cannot do that for you. This is why the career impact is a barbell and not a cliff. Vibe coding compresses the value of the bottom of the skill curve, the part where you reproduce known patterns and assemble boilerplate, because the model now does that faster than you. It simultaneously raises the premium on the top of the curve, the judgement to know what to read, what to trust, what to throw away, and what a correct system even looks like.

The exposed group is narrow and specific. It is not juniors. It is people whose entire contribution was producing code without understanding it, at any level. A junior who uses vibe coding to ship faster and then reads the output to learn why it worked is compounding their skill at a rate I would have killed for at their stage. A junior who only generates and never reads is training themselves to be the exact function the model already performs. The dividing line is not seniority. It is whether you use the loop to outsource thinking or to accelerate it.

For anyone making hiring or org decisions, the implication is concrete. Stop interviewing for the ability to produce code from a spec, because that skill is now cheap. Interview for the ability to look at generated code and say what is wrong with it, what it will cost to maintain, and whether it should exist at all. That is the skill that just became scarce, and the survey's trust gap is the market telling you the price of it is going up.

The honest verdict

Vibe coding is a real technique with a real and narrow zone of value, wrapped in a name that invites people to use it everywhere, which is exactly where it hurts them. I would reach for it again without hesitation for the next prototype, the next throwaway tool, the next "is this idea even worth building" question. It got me to a working answer in three hours for $69, and that is a genuinely new capability, not hype.

I would not let it within a mile of production code that holds data, moves money, or has to be maintained by a team next year, and after my Saturday afternoon I can tell you precisely why. Not because the code fails to run. Because when it breaks, you are debugging a system you have never read, and the only other party who could read it has already forgotten what it wrote. The four minutes it took me to fix a bug by reading the file, against the ninety minutes I lost not reading it, is the entire economics of the thing.

Use it to find out whether to build. Do not use it to build the thing you have to keep. And if you are early in your career, the move is not to avoid it or to surrender to it. The move is to vibe code fast, then read what it gave you, every time, until reading code at speed is the skill that makes you expensive. That is the one part of this job the model still cannot do for you, which is exactly why it is the part worth getting good at.

FAQ

What is vibe coding?

Vibe coding is writing software by describing what you want in natural language and letting a large language model generate the code, without reviewing the code it produces. Andrej Karpathy coined the term on February 2, 2025, describing a workflow where you "fully give in to the vibes" and "forget that the code even exists." The defining feature is the absence of code review: you accept the output, run it, and respond to what you see rather than to what the code says.

Is vibe coding real engineering?

It depends on what you keep from the engineering discipline. The act of decomposing a problem, specifying behaviour, and validating output is real engineering. The act of shipping code you have never read to a system that has to survive is not. In my weekend test, vibe coding was genuinely engineering while I was prototyping and stopped being engineering the moment I had a bug inside code I never looked at. The label is less useful than the question: did you understand what you shipped?

Is vibe coding bad?

It is not bad, it is mismatched to most production work. For prototypes, internal tools, one-off scripts, and validating an idea before you commit, it is the fastest path I have used and I would reach for it again. For anything carrying real data, real money, or a multi-year maintenance horizon, shipping unreviewed code is a liability you will pay for later. The risk is not the tool, it is using a prototyping technique for a production job.

Can you vibe code production apps?

You can deploy a vibe-coded app to production, and people do it daily. Whether you should depends on the blast radius. My test app went live and served real requests, but it held no sensitive data and nobody depended on it. The problem with vibe-coded production code is not that it fails to run; it is that when it breaks, you are debugging a system you do not understand, and so is the model. For anything where a failure costs more than a redeploy, read the code.

Will vibe coding replace developers?

No, but it will change what developers are paid for. The 2025 Stack Overflow survey found 84% of developers use or plan to use AI tools, yet trust in their output is at an all-time low. That gap is the job: someone has to decide what is correct, and the model cannot. Vibe coding compresses the value of producing code and raises the value of judging it. Developers who can specify, review, and own a system get more leverage; those who only generate code without understanding it get commoditized.