

Aravind Putrevu
January 23, 2026
9 min read

In the 1996 film Jerry Maguire, Tom Cruise’s famous phone call, where he shouts “Show me the money!”, cuts through everything else. It’s the moment accountability enters the room.

In AI-assisted software development, “show me the prompt” should play a similar role.
As more code is generated by large language models (LLMs), accountability does not disappear. It moves upstream. The question facing modern engineering teams is not whether AI-generated code can be reviewed, but where and how review should happen when intent is increasingly expressed before code exists at all.
Earlier this week, Gergely Orosz of Pragmatic Engineer shared a quote on Twitter (or X, if you prefer) from an upcoming podcast with Peter Steinberger, creator of the self-hosted AI agent Clawdbot.
Steinberger’s point was straightforward but provocative: as more code is produced with LLMs, traditional pull requests may no longer be the best way to review changes. Instead, he suggested, reviewers should be given the prompt that generated the change.
That idea quickly triggered a polarized response.
Supporters argued that reviewing large, AI-generated diffs is becoming increasingly impractical.
From their perspective, the prompt captures intent more directly than the output. It tells reviewers what the developer was trying to accomplish, what constraints they set, and what scope they intended. In addition, a prompt can be re-run or adjusted, which makes it easier to validate the approach without combing through thousands of lines of generated code.
Critics, however, pointed to issues that prompts alone do not solve: determinism, reproducibility, git blame, and legal accountability.
Because LLM outputs can vary across runs, models, and configurations, approving a prompt does not necessarily mean approving the exact code that ultimately ships. For audits, ownership, and downstream liability, that distinction matters. In their view, code review cannot be replaced by “prompt approval” without weakening the guarantees that PR-based workflows were designed to provide.
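To make the critics’ concern concrete, here is a minimal sketch (using the OpenAI Python SDK; the model name and prompt are placeholders, not part of anyone’s proposal) of why approving a prompt is not the same as approving an artifact: two runs of the same approved prompt can produce different code.

```python
# A minimal sketch of the determinism problem: the same approved prompt,
# run twice against the same model, need not produce identical code.
# Assumes the OpenAI Python SDK; the model name and prompt are placeholders.
import hashlib

from openai import OpenAI

client = OpenAI()
PROMPT = "Write a Python function that deduplicates a list while preserving order."

def generate() -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,   # even temperature=0 does not guarantee identical outputs
    )
    return response.choices[0].message.content

run_a, run_b = generate(), generate()

# Approving PROMPT says nothing about which of these two artifacts ships.
print(hashlib.sha256(run_a.encode()).hexdigest())
print(hashlib.sha256(run_b.encode()).hexdigest())
```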
The core disagreement, then, is not whether prompts should be part of review. It is where accountability should live in an AI-assisted workflow: primarily in the prompt, primarily in the code, or in a deliberately structured combination of both.
A prompt request is exactly what it sounds like: a request by a developer for a peer review of their prompt before feeding it into an LLM to generate code. Or, in the case of multi-shot or conversational prompts, a review of the conversation between the developer and the agent.
Instead of starting review at the diff level, a prompt request asks reviewers to evaluate the instructions given to the LLM so they can sign off on or contribute to the context, intent, constraints, and assumptions that guide the model’s output. A typical prompt request may include:
The system and user prompts
Relevant repository or architectural context
Model selection and configuration
Constraints, invariants, or non-goals
Examples of expected behavior
The goal is to make explicit what the model was asked to do before evaluating how well it did it.
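To make that concrete, here is one hypothetical shape a prompt request could take as a structured, diffable artifact. None of these field names are a standard; they simply mirror the checklist above.

```python
# A hypothetical sketch of a prompt request as a reviewable, diffable artifact.
# The field names and values are illustrative, not an established format.
from dataclasses import dataclass, field

@dataclass
class PromptRequest:
    system_prompt: str                                       # system prompt given to the model
    user_prompt: str                                         # user prompt describing the change
    context_files: list[str] = field(default_factory=list)   # repository or architectural context
    model: str = "gpt-4o"                                    # model selection (placeholder name)
    model_config: dict = field(default_factory=dict)         # temperature, max tokens, etc.
    constraints: list[str] = field(default_factory=list)     # constraints, invariants, non-goals
    examples: list[str] = field(default_factory=list)        # examples of expected behavior

pr = PromptRequest(
    system_prompt="You are working in a Django 5 monolith. Never touch migrations.",
    user_prompt="Add rate limiting to the /login endpoint.",
    context_files=["auth/views.py", "docs/architecture.md"],
    model_config={"temperature": 0.2},
    constraints=["No new dependencies", "Preserve existing session behavior"],
    examples=["Six failed attempts within five minutes returns HTTP 429"],
)
```

An artifact like this can be committed, diffed, and commented on with the same tooling teams already use for code review.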
In this sense, a prompt request functions more like a design artifact than a code artifact. It captures intent at the moment of generation and helps ensure the prompt is comprehensive and explicit enough to address the requirements. It can help teams better align around how they prompt and ensure that everyone is using the same context to generate code.
Much of the debate this week stemmed from treating prompt requests and pull requests as competitors. Either you do a prompt request or a pull request, some commenters suggested.
However, they shouldn’t be treated that way.
After all, they address different failure modes at different stages of the development lifecycle. Just like you’re not going to skip testing because you did a code review, you shouldn’t skip a code review because you did a prompt request.
Prompt requests are valuable because they ensure alignment and best practices early, before any code is generated or committed. They help teams align on what should be built, define boundaries, and constrain agent behavior. Because large language models are non-deterministic, capturing intent explicitly becomes even more important upstream, where variability is highest.
A prompt request can also help ensure that a prompt is optimized for the specific model or tool that will generate the code. As models continue to diverge in behavior (something we’ve consistently found in our evals), that optimization is essential to the quality of the output.
Pull requests remain essential later, when teams review the exact code that will ship. They preserve determinism, traceability, testing, auditing, and accountability. One captures intent. The other captures execution.
Treating prompt requests as replacements for pull requests creates a false tension. Used together, they complement each other. Doing a prompt request and then skipping a pull request is reckless and tempts fate, since the actual code produced never gets validated.
When done as part of a regular software development workflow that includes a thorough code review, prompt requests are a way to shift left and catch issues early. They ensure a team is aligned on the goals of the feature, help optimize the prompt for the model in use, and ensure that the proper context is supplied to improve the generated output. This can cut down significantly on review effort and issues later on.
When prompt requests are used alone, without a pull request after the code is generated, their primary appeal is cognitive efficiency and speed.
AI has dramatically increased the speed at which developers can produce code, but the review process has not kept pace. As AI-authored changes grow larger and more frequent, line-by-line review becomes increasingly difficult to complete. Subtle defects slip through not because engineers don’t care, but because reviewing enormous, machine-generated diffs is mentally taxing.
Prompts, by contrast, are typically shorter and more declarative. Reviewing a prompt allows engineers to reason directly about scope, intent, and constraints without getting buried in implementation details produced by the model.
Prompt-first review works particularly well for:
Scaffolding and boilerplate generation
Small changes
Greenfield prototypes
Fast-moving teams optimizing for iteration speed
Hobby projects where defects in prod aren’t that consequential
In these cases, the most important question is often not “is every line correct?” but “is this what we meant to build?”
When used in concert with pull requests, prompt requests have few downsides, since they simply offer another opportunity to review the proposed change before generation. The biggest is the time and cognitive effort they demand: if it takes too long to get a review, prompt requests could become a new bottleneck ahead of code generation.
When treated as a replacement for pull requests, the biggest limitation of prompt requests is non-determinism.
After all, the same prompt can produce different outputs across runs or models. That makes reviewing prompts a weak substitute for reviewing an auditable record of what actually shipped. From the perspective of git blame, compliance, or legal accountability, prompt reviews alone are insufficient.
There are also real security and correctness risks. You might think you covered everything in your prompt, but it may encode unsafe assumptions, omit edge cases, or fail to account for system-specific constraints that would normally be caught during careful code review. Reviewing intent does not guarantee that the generated output is secure, performant, or compliant.
Finally, prompts are highly contextual. A prompt that looks reasonable in isolation can still produce problematic implementations if the reviewer lacks deep familiarity with the codebase, infrastructure, or runtime environment. Prompt reviews are designed to limit this by bringing additional sets of eyes to the prompt, but human reviewers make mistakes all the time on actual code; add in the unpredictability of a model, and that’s a recipe for bugs and downtime. These risks increase as prompts are reused or gradually modified over time, or if you change models.
Used together, prompt requests and pull requests offset each other’s weaknesses.
A practical workflow might look like this:
A developer proposes a prompt request describing the intended change, constraints, and assumptions. This can involve a single prompt or a series of prompts for different parts of the code being generated. In the case of conversational prompts, the dev might propose a conversational response or share the transcript of their conversation with the LLM after the fact; the review could then help reprompt the agent to generate a better result.
The team reviews and aligns on the prompt(s) before code generation.
The code is generated and committed.
A traditional pull request reviews the concrete output for correctness, safety, and fit.
In this model, prompt requests act as an upstream alignment step for AI-generated work. They reduce ambiguity early, potentially shrink downstream diffs, and make pull requests easier to review.
Prompt requests do not replace the later rigor needed in pull requests. They just add more rigor earlier.
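One way to wire the two stages together, sketched below with entirely hypothetical conventions, is to fingerprint the approved prompt request and have the eventual pull request reference that fingerprint, so the intent review and the code review stay linked.

```python
# A sketch of linking the two review stages. The "Prompt-Request" trailer is an
# invented convention; the idea is only that the PR carries a stable fingerprint
# of the approved prompt, model, and configuration.
import hashlib
import json

def fingerprint(prompt_request: dict) -> str:
    """Return a stable hash of the approved prompt request."""
    canonical = json.dumps(prompt_request, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

approved = {
    "system_prompt": "You are working in a Django 5 monolith. Never touch migrations.",
    "user_prompt": "Add rate limiting to the /login endpoint.",
    "model": "gpt-4o",
    "model_config": {"temperature": 0.2},
}

# Steps 1-2: the team reviews and approves the prompt request, producing a fingerprint.
tag = fingerprint(approved)

# Steps 3-4: the generated code is committed, and the PR description or a commit
# trailer references the fingerprint. A CI check could refuse AI-generated PRs
# that lack a matching, approved prompt request on record.
print(f"Prompt-Request: {tag}")
```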
Let’s be honest: prompt requests are unlikely to fully replace pull requests. No one thinks a large publicly traded company is going to trust AI-generated output so completely that it will bet its revenue (and future) on it without careful review.
While we are bullish on prompt requests at CodeRabbit, the industry is still in the early stages of their adoption, and today’s LLMs are not capable of fully replacing pull requests.
Will prompt requests work instead of pull requests for smaller open-source or single-maintainer projects? We are likely heading toward that reality sooner rather than later, but pull requests remain an essential part of the current software development lifecycle. This is especially true for production systems, regulated environments, or large teams with shared ownership and long-lived, complex codebases.
Pull requests exist because software development ultimately involves shipping specific, deterministic artifacts into production. As long as that remains true, teams will need a concrete mechanism to review, test, audit, and approve the exact code that runs.
The more realistic future is not prompt requests instead of pull requests. It is prompt requests before pull requests.
What is becoming clear is that the quality of the prompt increasingly determines the quality of the output. Treating prompts as first-class artifacts acknowledges that reality without abandoning the safeguards that traditional code review provides.
In that sense, “show me the prompt” does not remove accountability. It shifts some of it earlier, where it can reduce rework, surface intent, and make the pull request stage easier rather than unnecessary.
Interested in trying CodeRabbit? Get a 14-day free trial.