We just shipped a new feature in QEEK: the ability to load a brief directly into your IDE via MCP. A single command—work on brief id: xxx—and every spec, diagram, and decision from your QEEK session lands in your AI coding assistant's context.
To test it, we used QEEK to diagnose a real production bug, generated a brief, and handed it to an AI coding agent. Here's what actually happened—the parts that worked, the parts that didn't, and the parts that surprised us.
The Bug
We had a production error. Here's the log entry that started everything (reconstructed below from the details in this post; the payload specifics are illustrative):
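```json
{
  "severity": "ERROR",
  "resource": { "type": "cloud_run_job", "labels": { "job_name": "file-processor" } },
  "textPayload": "Error: ENOENT: no such file or directory, open '/workspace/pandas-dev__pandas/<one of ~2,800 missing files>'"
}
```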
Files were listed in the processing manifest but weren't present in Google Cloud Storage when Cloud Run Jobs tried to read them. Smaller repos worked fine. The pandas-dev/pandas repo failed consistently with hundreds of missing files.
The confusing part: a 14,000-file repo synced perfectly. But pandas (~6,000 files) broke every time. Scale alone couldn't explain it.
We didn't know the root cause yet. We just had error logs.
Step 1: Diagnose in QEEK
We didn't write a detailed bug report. We opened QEEK's Architect and pasted the error log (the JSON above) plus the contents of the failing file. That was the entire input. QEEK had already synced our codebase, so it could cross-reference the error against our actual sync pipeline code.
From that raw paste, QEEK produced:
Product Spec
Problem diagnosis, user impact, success criteria, and acceptance tests
Technical Spec
Root cause analysis, two solution options with trade-offs, implementation plan with code-level detail
Architecture Context
File paths, method signatures, data flow from GitHub API through GCS to Cloud Run Jobs
The brief was grounded in our actual code—specific methods, specific files, specific data flows. But it wasn't perfect. It diagnosed the symptom correctly but missed the root cause. More on that later.
Step 2: Load the Brief via MCP
This is the new part. Instead of copy-pasting specs into chat, we typed one command:
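```
work on brief id: xxx
```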
The MCP tool fetched every spec from the QEEK session and injected it into the AI agent's context. The agent immediately had:
Full problem diagnosis
Implementation plan
File-level architecture context
No pasting. No “here's the context” preamble. No explaining the codebase from scratch. The agent started working immediately—reading the right files, proposing edits to the right methods, in the right order.
Step 3: Implementation
The agent followed the brief's recommended approach (“Option 1: Individual Upload with Retry”) and implemented the fix in a single file. Here's what was built:
uploadFileWithRetry method
Per-file GCS upload with 3 retries and exponential backoff (1s, 2s, 4s); a sketch follows this list
Success/failure tracking
Each upload tracked individually instead of failing the entire batch on one error
Manifest verification
Manifest built from actual GCS uploads, not the GitHub API file list
Graceful degradation
Partial uploads proceed with available files; total failure aborts cleanly
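For a sense of what that retry shape looks like, here's a minimal sketch. The method name matches the brief's terminology, but the bucket wiring and types are simplified stand-ins, not our production code:

```typescript
import { Bucket } from "@google-cloud/storage";

// Sketch of the per-file retry described above: 3 attempts with
// exponential backoff (1s, 2s, 4s). Simplified stand-in, not the real method.
async function uploadFileWithRetry(
  bucket: Bucket,
  destPath: string,
  contents: Buffer,
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await bucket.file(destPath).save(contents);
      return true; // tracked individually as a success
    } catch (err) {
      if (attempt === maxAttempts - 1) return false; // tracked as a failure; the batch continues
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt)); // 1s, 2s, 4s
    }
  }
  return false;
}
```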
The agent didn't hallucinate method names, didn't invent APIs, and didn't need to ask “which file should I edit?” because the brief already contained that context.
But it didn't compile cleanly on the first pass.
TypeScript flagged two errors: a possibly-undefined f.path property that needed a guard, and a progress counter using the wrong array length. Both were one-line fixes, but claiming “it just worked” would be a lie.
The Unexpected Discovery
Here's where the brief's limitation became a strength. The brief had diagnosed “upload failures need retry”—the right symptom, wrong root cause. But because the agent had architectural context, when we asked “Why does this break on pandas but not on a 14K-file repo?” it could investigate.
The agent checked the pandas repo's .gitattributes file and found the real smoking gun:
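(The exact patterns live in the pandas repo's .gitattributes; these entries are illustrative.)

```
# Patterns marked export-ignore are left out of archives produced by
# `git archive`, which is what backs GitHub's zip download:
doc/            export-ignore
ci/             export-ignore
*.ipynb         export-ignore
```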
Git's export-ignore attribute tells GitHub's zip download to exclude those files from archives. But our sync pipeline was building the manifest from the Git Tree API, which lists all tracked files regardless of export rules.
Two data sources. One invisible mismatch.
GitHub Tree API (manifest)
Lists all 6,000 files
GitHub Zip Download (GCS)
Excludes ~2,800 files
The manifest promised 6,000 files. GCS only had ~3,200. Cloud Run tried to process all 6,000—ENOENT for every missing one.
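To make the mismatch concrete, here's a hedged sketch; listRepoTree and listZipEntries are hypothetical stand-ins for the pipeline's two data sources:

```typescript
// Hypothetical stand-ins for the two data sources described above.
declare function listRepoTree(owner: string, repo: string): Promise<string[]>;   // Git Tree API: all tracked files
declare function listZipEntries(owner: string, repo: string): Promise<string[]>; // zip download: export-ignored files absent

// Files the manifest promises but the archive never contained.
async function findPhantomFiles(owner: string, repo: string): Promise<string[]> {
  const manifestPaths = await listRepoTree(owner, repo);
  const zipPaths = new Set(await listZipEntries(owner, repo));
  return manifestPaths.filter((path) => !zipPaths.has(path));
}
```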
This wasn't in the original brief. QEEK never mentioned export-ignore. The brief prescribed retry + manifest verification—a reasonable treatment for the symptom, but not the disease. The architectural context from the brief gave the agent enough understanding to discover something the brief itself had missed.
The Final Fix: Three Buckets
The first implementation wasn't the final one. The agent initially filtered the manifest against uploaded files, but that still conflated export-ignored files (expected, harmless) with real upload failures (unexpected, needs alerting). We had to ask for a second refinement pass to distinguish three categories (sketched below, after the list):
Verified files
Present in both the zip and GCS. Included in the manifest.
Export-ignored files
Listed by Tree API but absent from zip. Expected behavior—logged at info level.
Upload failures
Present in zip but failed to upload to GCS after retries. Real errors—logged at error level.
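Here's a minimal sketch of that three-bucket split, assuming the three inputs described above; all names are illustrative, not our production code:

```typescript
// Minimal sketch of the three-bucket categorization.
interface SyncBuckets {
  verified: string[];      // in the zip AND uploaded to GCS: goes in the manifest
  exportIgnored: string[]; // listed by the Tree API but absent from the zip: info log
  uploadFailed: string[];  // in the zip but failed to upload after retries: error log
}

function categorize(
  treePaths: string[],        // from the Git Tree API
  zipPaths: Set<string>,      // entries actually present in the zip download
  uploadedPaths: Set<string>, // confirmed GCS uploads
): SyncBuckets {
  const buckets: SyncBuckets = { verified: [], exportIgnored: [], uploadFailed: [] };
  for (const path of treePaths) {
    if (!zipPaths.has(path)) buckets.exportIgnored.push(path); // expected, harmless
    else if (uploadedPaths.has(path)) buckets.verified.push(path);
    else buckets.uploadFailed.push(path); // unexpected, needs alerting
  }
  return buckets;
}
```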
One file changed. Three iterations total (initial implementation, TS error fixes, three-bucket refinement). The final version handles both the symptom (transient upload failures) and the root cause (manifest/zip mismatch from export-ignore).
The Code Review: Enter the Third Amigo
We thought we were done. Build passed, tests passed, the three-bucket logic was clean. Then we ran the branch through CODEX for an automated code review. It came back with two P1 blockers that both QEEK and the implementing agent had missed.
P1: Export-ignore-only syncs were marked as failed
When every pending file was absent from the zip (all export-ignored), the code treated it as a hard failure. A repo change containing only export-ignored files would repeatedly fail sync instead of completing with zero processable files. The three-bucket categorization was correct—but the control flow after it was wrong.
P1: Partial upload failures could still complete the sync
If some files failed to upload but at least one succeeded, the code logged the failures but continued to write a manifest and launch the Cloud Run job for the successful subset. This created a silent data gap—files expected to be processed simply weren't, with no failure signal.
Both fixes were small—reordering the decision logic so upload failures fail the sync first, and export-ignore-only runs complete as a no-op instead of erroring. But the bugs were real, and neither the brief nor the agent caught them.
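Roughly, the corrected decision order looks like this (outcome names are hypothetical):

```typescript
// Sketch of the reordered control flow: real upload failures fail the
// sync FIRST; an export-ignore-only run completes as a no-op.
type SyncOutcome = "failed" | "completed_empty" | "completed";

function decideSyncOutcome(buckets: {
  verified: string[];      // in the zip and in GCS
  exportIgnored: string[]; // absent from the zip, expected
  uploadFailed: string[];  // failed after retries, unexpected
}): SyncOutcome {
  if (buckets.uploadFailed.length > 0) return "failed";        // no silent data gaps
  if (buckets.verified.length === 0) return "completed_empty"; // everything export-ignored: no-op, not an error
  return "completed";                                          // write manifest, launch the Cloud Run job
}
```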
The Three Amigos
QEEK (Kimi 2.5)
Diagnosed the problem, produced the brief, provided architectural context
Windsurf (Claude Opus 4.6)
Implemented the fix, discovered the root cause, iterated through three passes
CODEX (GPT 5.4)
Caught two release-blocking logic bugs that both the brief and the agent missed
The Timeline
We can't give you a precise before/after measurement because we didn't run this bug through both paths. The “without brief” estimate is a guess based on how long this kind of cross-service debugging usually takes us. We didn't A/B test this.
Honest Assessment
What worked
Starting from an error log paste—not a polished bug report—QEEK still produced a brief with specific file paths, method signatures, and data flows.
The agent didn't need to explore the codebase from scratch. It went straight to the right file and the right methods.
The recommended solution option was sound. Retry with exponential backoff was the right pattern for the symptom.
Architectural context enabled deeper discovery. The .gitattributes root cause emerged because the agent understood the full pipeline, not just the failing method.
One command to load everything. No copy-paste, no “here's the background” preamble.
CODEX caught two release-blocking logic bugs that both the brief and the implementing agent missed. Different AI tools have genuinely different strengths.
The fix landed clean. No “we thought we fixed it but it didn't actually work” next-iteration loops. No misdirection, no 2-hour frustration spirals. It shipped and it worked.
The outcome wasn't just a bug fix—it was a system improvement. The three-bucket categorization (verified, export-ignored, upload-failed) gives the sync pipeline explicit handling for edge cases it used to silently mishandle.
What didn't
The brief missed the root cause entirely. It never mentioned .gitattributes or export-ignore. It treated “files missing from GCS” as an upload reliability problem, not a data source mismatch. A human had to ask the right follow-up question.
First compilation failed. The agent produced code with two TypeScript errors: an undefined check and a wrong variable reference. Small fixes, but “it compiled cleanly on the first pass” would have been false.
Three iterations, not one. Initial implementation → TS fixes → three-bucket refinement. The brief's plan suggested a single implementation pass. Reality required back-and-forth.
The brief's testing section was aspirational. It suggested unit tests, integration tests, and load tests. None were written. The agent correctly prioritized the fix itself, but the brief set expectations it couldn't deliver on.
Neither the brief nor the implementing agent caught the control flow bugs after the three-bucket categorization. The architecture was right but the decision logic was wrong. It took a separate code review tool to spot it.
The Takeaway
It took three AI tools to ship this fix. QEEK (Kimi 2.5) diagnosed the problem and produced the brief. Windsurf (Claude Opus 4.6) implemented the fix, discovered the root cause, and iterated through three passes. CODEX (GPT 5.4) caught two release-blocking logic bugs that the first two missed entirely.
No single tool did it all. The brief missed the root cause. The agent missed the control flow bugs. The code reviewer couldn't have built the fix from scratch. But together, they covered each other's blind spots—and the fix landed clean on the first deploy. No misdirection loops, no “we thought we fixed it” false starts.
And the result wasn't just a patch. The three-bucket categorization replaced silent mishandling with explicit control flow. The system is better than it was before the bug was filed.
A brief doesn't remove iteration. It removes the cold start. And when you layer diagnosis, implementation, and review across the right tools, each one catches what the others miss—and the fix ships clean.
QEEK turns expert thinking into structured briefs that AI agents can act on. Connect your repos and try it.
Try QEEK