We just shipped a new feature in QEEK: the ability to load a brief directly into your IDE via MCP. A single command—work on brief id: xxx—and every spec, diagram, and decision from your QEEK session lands in your AI coding assistant's context.
To test it, we used QEEK to diagnose a real production bug, generated a brief, and handed it to an AI coding agent. Here's what actually happened—the parts that worked, the parts that didn't, and the parts that surprised us.
The Bug
We had a production error. Here's the log entry that started everything (reconstructed below from the details in this post; the payload specifics are illustrative):
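```json
{
  "severity": "ERROR",
  "resource": { "type": "cloud_run_job", "labels": { "job_name": "file-processor" } },
  "textPayload": "Error: ENOENT: no such file or directory, open '/workspace/pandas-dev__pandas/<one of ~2,800 missing files>'"
}
```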
Files were listed in the processing manifest but weren't present in Google Cloud Storage when Cloud Run Jobs tried to read them. Smaller repos worked fine. The pandas-dev/pandas repo failed consistently with hundreds of missing files.
The confusing part: a 14,000-file repo synced perfectly. But pandas (~6,000 files) broke every time. Scale alone couldn't explain it.
We didn't know the root cause yet. We just had error logs.
Step 1: Diagnose in QEEK
We didn't write a detailed bug report. We opened QEEK's Architect and pasted the error log (the JSON above) plus the contents of the failing file. That was the entire input. QEEK had already synced our codebase, so it could cross-reference the error against our actual sync pipeline code.
From that raw paste, QEEK produced:
Product Spec
Problem diagnosis, user impact, success criteria, and acceptance tests
Technical Spec
Root cause analysis, two solution options with trade-offs, implementation plan with code-level detail
Architecture Context
File paths, method signatures, data flow from GitHub API through GCS to Cloud Run Jobs
The brief was grounded in our actual code—specific methods, specific files, specific data flows. But it wasn't perfect. It diagnosed the symptom correctly but missed the root cause. More on that later.
Step 2: Load the Brief via MCP
This is the new part. Instead of copy-pasting specs into chat, we typed one command:
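```
work on brief id: xxx
```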
The MCP tool fetched every spec from the QEEK session and injected it into the AI agent's context. The agent immediately had:
Full problem diagnosis
Implementation plan
File-level architecture context
No pasting. No “here's the context” preamble. No explaining the codebase from scratch. The agent started working immediately—reading the right files, proposing edits to the right methods, in the right order.
Step 3: Implementation
The agent followed the brief's recommended approach (“Option 1: Individual Upload with Retry”) and implemented the fix in a single file. Here's what was built:
uploadFileWithRetry method
Per-file GCS upload with 3 retries and exponential backoff (1s, 2s, 4s); a sketch follows this list
Success/failure tracking
Each upload tracked individually instead of failing the entire batch on one error
Manifest verification
Manifest built from actual GCS uploads, not the GitHub API file list
Graceful degradation
Partial uploads proceed with available files; total failure aborts cleanly
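For a sense of what that retry shape looks like, here's a minimal sketch. The method name matches the brief's terminology, but the bucket wiring and types are simplified stand-ins, not our production code:

```typescript
import { Bucket } from "@google-cloud/storage";

// Sketch of the per-file retry described above: 3 attempts with
// exponential backoff (1s, 2s, 4s). Simplified stand-in, not the real method.
async function uploadFileWithRetry(
  bucket: Bucket,
  destPath: string,
  contents: Buffer,
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await bucket.file(destPath).save(contents);
      return true; // tracked individually as a success
    } catch (err) {
      if (attempt === maxAttempts - 1) return false; // tracked as a failure; the batch continues
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt)); // 1s, 2s, 4s
    }
  }
  return false;
}
```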
The agent didn't hallucinate method names, didn't invent APIs, and didn't need to ask “which file should I edit?” because the brief already contained that context.
But it didn't compile cleanly on the first pass.
TypeScript flagged two errors: a possibly-undefined f.path property that needed a guard, and a progress counter using the wrong array length. Both were one-line fixes, but claiming “it just worked” would be a lie.
The Unexpected Discovery
Here's where the brief's limitation became a strength. The brief had diagnosed “upload failures need retry”—the right symptom, wrong root cause. But because the agent had architectural context, when we asked “Why does this break on pandas but not on a 14K-file repo?” it could investigate.
The agent checked the pandas repo's .gitattributes file and found the real smoking gun:
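(The exact patterns live in the pandas repo's .gitattributes; these entries are illustrative.)

```
# Patterns marked export-ignore are left out of archives produced by
# `git archive`, which is what backs GitHub's zip download:
doc/            export-ignore
ci/             export-ignore
*.ipynb         export-ignore
```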
Git's export-ignore attribute tells GitHub's zip download to exclude those files from archives. But our sync pipeline was building the manifest from the Git Tree API, which lists all tracked files regardless of export rules.
Two data sources. One invisible mismatch.
GitHub Tree API (manifest)
Lists all 6,000 files
GitHub Zip Download (GCS)
Excludes ~2,800 files
The manifest promised 6,000 files. GCS only had ~3,200. Cloud Run tried to process all 6,000—ENOENT for every missing one.
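To make the mismatch concrete, here's a hedged sketch; listRepoTree and listZipEntries are hypothetical stand-ins for the pipeline's two data sources:

```typescript
// Hypothetical stand-ins for the two data sources described above.
declare function listRepoTree(owner: string, repo: string): Promise<string[]>;   // Git Tree API: all tracked files
declare function listZipEntries(owner: string, repo: string): Promise<string[]>; // zip download: export-ignored files absent

// Files the manifest promises but the archive never contained.
async function findPhantomFiles(owner: string, repo: string): Promise<string[]> {
  const manifestPaths = await listRepoTree(owner, repo);
  const zipPaths = new Set(await listZipEntries(owner, repo));
  return manifestPaths.filter((path) => !zipPaths.has(path));
}
```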
This wasn't in the original brief. QEEK never mentioned export-ignore. The brief prescribed retry + manifest verification—a reasonable treatment for the symptom, but not the disease. The architectural context from the brief gave the agent enough understanding to discover something the brief itself had missed.
The Final Fix: Three Buckets
The first implementation wasn't the final one. The agent initially filtered the manifest against uploaded files, but that still conflated export-ignored files (expected, harmless) with real upload failures (unexpected, needs alerting). We had to ask for a second refinement pass to distinguish three categories (sketched below, after the list):
Verified files
Present in both the zip and GCS. Included in the manifest.
Export-ignored files
Listed by Tree API but absent from zip. Expected behavior—logged at info level.
Upload failures
Present in zip but failed to upload to GCS after retries. Real errors—logged at error level.
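Here's a minimal sketch of that three-bucket split, assuming the three inputs described above; all names are illustrative, not our production code:

```typescript
// Minimal sketch of the three-bucket categorization.
interface SyncBuckets {
  verified: string[];      // in the zip AND uploaded to GCS: goes in the manifest
  exportIgnored: string[]; // listed by the Tree API but absent from the zip: info log
  uploadFailed: string[];  // in the zip but failed to upload after retries: error log
}

function categorize(
  treePaths: string[],        // from the Git Tree API
  zipPaths: Set<string>,      // entries actually present in the zip download
  uploadedPaths: Set<string>, // confirmed GCS uploads
): SyncBuckets {
  const buckets: SyncBuckets = { verified: [], exportIgnored: [], uploadFailed: [] };
  for (const path of treePaths) {
    if (!zipPaths.has(path)) buckets.exportIgnored.push(path); // expected, harmless
    else if (uploadedPaths.has(path)) buckets.verified.push(path);
    else buckets.uploadFailed.push(path); // unexpected, needs alerting
  }
  return buckets;
}
```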
One file changed. Three iterations total (initial implementation, TS error fixes, three-bucket refinement). The final version handles both the symptom (transient upload failures) and the root cause (manifest/zip mismatch from export-ignore).
The Code Review: Enter the Third Amigo
We thought we were done. Build passed, tests passed, the three-bucket logic was clean. Then we ran the branch through CODEX for an automated code review. It came back with two P1 blockers that both QEEK and the implementing agent had missed.
P1: Export-ignore-only syncs were marked as failed
When every pending file was absent from the zip (all export-ignored), the code treated it as a hard failure. A repo change containing only export-ignored files would repeatedly fail sync instead of completing with zero processable files. The three-bucket categorization was correct—but the control flow after it was wrong.
P1: Partial upload failures could still complete the sync
If some files failed to upload but at least one succeeded, the code logged the failures but continued to write a manifest and launch the Cloud Run job for the successful subset. This created a silent data gap—files expected to be processed simply weren't, with no failure signal.
Both fixes were small—reordering the decision logic so upload failures fail the sync first, and export-ignore-only runs complete as a no-op instead of erroring. But the bugs were real, and neither the brief nor the agent caught them.
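Roughly, the corrected decision order looks like this (outcome names are hypothetical):

```typescript
// Sketch of the reordered control flow: real upload failures fail the
// sync FIRST; an export-ignore-only run completes as a no-op.
type SyncOutcome = "failed" | "completed_empty" | "completed";

function decideSyncOutcome(buckets: {
  verified: string[];      // in the zip and in GCS
  exportIgnored: string[]; // absent from the zip, expected
  uploadFailed: string[];  // failed after retries, unexpected
}): SyncOutcome {
  if (buckets.uploadFailed.length > 0) return "failed";        // no silent data gaps
  if (buckets.verified.length === 0) return "completed_empty"; // everything export-ignored: no-op, not an error
  return "completed";                                          // write manifest, launch the Cloud Run job
}
```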
The Three Amigos
QEEK (Kimi 2.5)
Diagnosed the problem, produced the brief, provided architectural context
Windsurf (Claude Opus 4.6)
Implemented the fix, discovered the root cause, iterated through three passes
CODEX (GPT 5.4)
Caught two release-blocking logic bugs that both the brief and the agent missed
The Timeline
We can't give you a precise before/after measurement because we didn't run this bug through both paths. The “without brief” estimate is a guess based on how long this kind of cross-service debugging usually takes us. We didn't A/B test this.
Honest Assessment
What worked
Starting from an error log paste—not a polished bug report—QEEK still produced a brief with specific file paths, method signatures, and data flows.
The agent didn't need to explore the codebase from scratch. It went straight to the right file and the right methods.
The recommended solution option was sound. Retry with exponential backoff was the right pattern for the symptom.
Architectural context enabled deeper discovery. The .gitattributes root cause emerged because the agent understood the full pipeline, not just the failing method.
One command to load everything. No copy-paste, no “here's the background” preamble.
CODEX caught two release-blocking logic bugs that both the brief and the implementing agent missed. Different AI tools have genuinely different strengths.
The fix landed clean. No “we thought we fixed it but it didn't actually work” next-iteration loops. No misdirection, no 2-hour frustration spirals. It shipped and it worked.
The outcome wasn't just a bug fix—it was a system improvement. The three-bucket categorization (verified, export-ignored, upload-failed) gives the sync pipeline explicit handling for edge cases it used to silently mishandle.
What didn't
The brief missed the root cause entirely. It never mentioned .gitattributes or export-ignore. It treated “files missing from GCS” as an upload reliability problem, not a data source mismatch. A human had to ask the right follow-up question.
First compilation failed. The agent produced code with two TypeScript errors: an undefined check and a wrong variable reference. Small fixes, but “it compiled cleanly on the first pass” would have been false.
Three iterations, not one. Initial implementation → TS fixes → three-bucket refinement. The brief's plan suggested a single implementation pass. Reality required back-and-forth.
The brief's testing section was aspirational. It suggested unit tests, integration tests, and load tests. None were written. The agent correctly prioritized the fix itself, but the brief set expectations it couldn't deliver on.
Neither the brief nor the implementing agent caught the control flow bugs after the three-bucket categorization. The architecture was right but the decision logic was wrong. It took a separate code review tool to spot it.
The Takeaway
It took three AI tools to ship this fix. QEEK (Kimi 2.5) diagnosed the problem and produced the brief. Windsurf (Claude Opus 4.6) implemented the fix, discovered the root cause, and iterated through three passes. CODEX (GPT 5.4) caught two release-blocking logic bugs that the first two missed entirely.
No single tool did it all. The brief missed the root cause. The agent missed the control flow bugs. The code reviewer couldn't have built the fix from scratch. But together, they covered each other's blind spots—and the fix landed clean on the first deploy. No misdirection loops, no “we thought we fixed it” false starts.
And the result wasn't just a patch. The three-bucket categorization replaced silent mishandling with explicit control flow. The system is better than it was before the bug was filed.
A brief doesn't remove iteration. It removes the cold start. And when you layer diagnosis, implementation, and review across the right tools, each one catches what the others miss—and the fix ships clean.
QEEK turns expert thinking into structured briefs that AI agents can act on. Connect your repos and try it.
Try QEEK