Your AI writes the PR.
Reflow proves it works.
Reflow is an MCP server your coding agent connects to. The agent writes flows in plain language; Reflow runs them in real browsers against every PR preview — with your own model key — and posts the evidence where you merge.
Works with the agent you already use
Reflow is an MCP server. Add it to Claude Code, Cursor, Codex — anything that speaks MCP — and your agent can author flows, run them in real browsers, and bring back the evidence.
Connect the agent you already use
One click for the popular ones, one JSON block for everything else. No SDK, no test framework, nothing to learn up front — your agent reads the tools and knows what to do.
any other MCP client — raw configuration
{
  "mcpServers": {
    "reflow": {
      "url": "https://mcp.reflow.io",
      "headers": { "Authorization": "Bearer $REFLOW_TOKEN" }
    }
  }
}
The tools, exactly as your agent reads them
These are the live tool descriptions, not marketing copy. The workflow — author, estimate, run, read evidence, commit the fix — is taught entirely in here.
list_flows read-only idempotent
List every flow Reflow tracks for a repo: slug, path, branches seen, and the latest run per branch. Call this first when working in a repo — it tells you what coverage already exists before you write anything. An empty result means no flows yet: author one in .reflow/flows/ (see get_flow on any other repo for the format) and run it with create_run. Args: repo (owner/name), branch? — defaults to all branches.
get_flow read-only idempotent
Fetch one flow in full: the markdown source (intent, plain-language steps, RFL blocks, expectations), recent runs, and the current visual baselines per device. Use before editing a flow or diagnosing a failure — the prose intent tells you what the flow is protecting, and the RFL blocks show how each step currently compiles. Args: repo, slug | path.
create_run billable
Execute one or more flows against a target URL in real browsers. Returns run ids immediately; poll get_run for results. Provide the flow content you want executed (it is upserted content-addressed — safe to send repeatedly) plus repo, branch, commit_sha, and target_url — a PR preview, staging, or your local dev server. Each declared device in a flow's frontmatter runs separately. Idempotent per idempotency_key: reuse owner/repo:pr:sha:slug to make retries safe. Args: repo, ref, commit_sha, target_url, flows[], pr_number?, idempotency_key.
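The retry key described above can be assembled mechanically. A minimal sketch, assuming the four parts are joined with colons exactly as written (owner/repo:pr:sha:slug); the helper name `idempotency_key` is hypothetical and not part of Reflow:

```python
def idempotency_key(repo: str, pr_number: int, commit_sha: str, slug: str) -> str:
    """Build a retry-safe key in the owner/repo:pr:sha:slug shape (hypothetical helper)."""
    if "/" not in repo:
        raise ValueError("repo must look like owner/name")
    # Same inputs always give the same key, so a retried create_run can reuse it.
    return f"{repo}:{pr_number}:{commit_sha}:{slug}"
```

Because the key is derived only from the repo, PR, commit, and flow slug, a retry after a network error would carry the same key as the first attempt.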
example
create_run(flows: [signup], devices from frontmatter)
→ { runs: [
    { id: "run_188", flow: "signup", device: "iphone-15", status: "queued" },
    { id: "run_189", flow: "signup", device: "desktop", status: "queued" }
  ] }
get_run read-only
Status and evidence for a run: state (queued|running|succeeded|visual_diff|failed), the step-by-step timeline with which lines ran as RFL vs free text, per-step screenshots, token usage against budget, and proposed_changes if the run healed anything. Poll until terminal. On "failed", the failing step names the expectation that could not be met — treat that as a product bug to report, not a flow to patch. On "visual_diff", fetch get_run_proposed_changes and show the human the comparison. Args: run_id.
example
get_run(run_188) → status: "running", events:
  ✓ compiled 4 steps from cached RFL
  ✓ step 1/4 fill the signup form rfl · 1.8s
  ✓ step 2/4 submit rfl · 0.6s
  ◐ step 3/4 read email, follow magic link rfl · waiting on inbox
  · step 4/4 welcome screen greets by name · validate end state satisfies intent
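"Poll until terminal" can be sketched as a small loop. This is an illustration under assumptions, not Reflow's client: `get_run` here is a hypothetical callable wrapping the MCP tool call, the result is assumed to expose a `status` field as in the example above, and the interval and timeout values are arbitrary.

```python
import time
from typing import Callable, Mapping

# Terminal states, taken from the get_run description.
TERMINAL_STATES = {"succeeded", "visual_diff", "failed"}


def wait_for_run(
    get_run: Callable[[str], Mapping[str, object]],  # hypothetical wrapper around the tool call
    run_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, object]:
    """Poll get_run until the run reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = get_run(run_id)
        if run["status"] in TERMINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{run_id} still {run['status']} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a caller would then branch on the returned status, for example fetching get_run_proposed_changes on "visual_diff".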
get_run_screenshot read-only idempotent
Presigned URL for the screenshot at any step of a run — what the browser actually saw, per device. Use when explaining a failure to a human, or when you need to look at the page state yourself before proposing a fix. Args: run_id, step_index, device?.
get_run_proposed_changes read-only idempotent
The heal as a unified diff against the flow file: RFL rewrites (new anchor chains prepended, old kept as fallbacks) and plain-language expectation edits. Review it, then commit it to the PR branch with your own git tooling — Reflow never writes to your repo. If the diff changes an expectation, surface that line to the human: it is a product-behavior decision, not a mechanical fix. Args: run_id. Returns: path, before, after, unified_diff.
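An agent could decide whether a heal touches an expectation by comparing the returned before and after text. A rough sketch, assuming expectations are the bullet lines under an "Always true" heading in the flow markdown (an assumption drawn from the flow examples on this page); both function names are hypothetical:

```python
def expectations(flow_markdown: str) -> list[str]:
    """Collect bullet lines under an 'Always true' heading (assumed expectation section)."""
    found, in_section = [], False
    for line in flow_markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading switches the section on or off.
            in_section = stripped.lstrip("#").strip().lower() == "always true"
        elif in_section and stripped.startswith("- "):
            found.append(stripped[2:])
    return found


def expectation_changes(before: str, after: str) -> dict[str, list[str]]:
    """Return expectations removed and added between two versions of a flow file."""
    old, new = expectations(before), expectations(after)
    return {
        "removed": [e for e in old if e not in new],
        "added": [e for e in new if e not in old],
    }
```

If either list is non-empty, the change is a product-behavior decision and the affected lines should be shown to the human before anything is committed.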
estimate_run_cost read-only idempotent
Worst-case cost in USD for a batch of flows (runner minutes × devices + storage headroom) against the team's current balance — before anything is spent. This is the only tool that talks money: runs hold their worst case up front and settle to actual seconds on completion, so what you see here is the most a batch can cost. Call before create_run when the balance is low or the batch is large; if the estimate exceeds the balance, tell the human to top up rather than letting create_run fail with insufficient_credit. Args: flows[], devices?.
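The "top up rather than fail" rule reduces to one comparison. A minimal sketch, assuming the estimate arrives as a mapping with the `est_max_usd` and `current_balance_usd` fields shown in this tool's example response; `check_budget` is a hypothetical helper:

```python
def check_budget(estimate: dict[str, float]) -> tuple[bool, str]:
    """Compare the worst-case estimate with the balance and explain the outcome."""
    worst = estimate["est_max_usd"]
    balance = estimate["current_balance_usd"]
    if worst > balance:
        shortfall = worst - balance
        return False, (
            f"Top up at least ${shortfall:.2f} before running "
            f"(worst case ${worst:.2f}, balance ${balance:.2f})."
        )
    return True, f"Worst case ${worst:.2f} fits within the ${balance:.2f} balance."
```

An agent would call create_run only when the first element is true, and otherwise relay the message to the human.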
example
estimate_run_cost(flows: [signup, checkout], devices: [iphone-15, desktop])
→ { est_max_usd: 0.09, current_balance_usd: 24.61 }
cancel_run idempotent
Stop an in-flight run. The browser stops, billing settles to actual seconds used, and the hold releases. Use when a newer commit supersedes the one being tested. Args: run_id.
list_provider_keys read-only idempotent
The model keys this team has registered: label, provider, last used. Never returns key material. Use to pick a provider_key label for a flow's frontmatter, or to tell the human they need to add a key in the dashboard before runs can execute (keys cannot be created over MCP by design).
Steps in plain language, executed for real
“Fill out the signup form with realistic values” is a complete, runnable step — and so is “read the verification email and sign in with the magic link”: every run gets its own inbox. Reflow compiles steps to RFL once, then replays them deterministically — a real browser, realistic data, no model in the loop while they pass.
# Steps
1. Fill out the signup form with realistic values
2. Submit the form
3. Read the verification email and sign in with the magic link
4. The welcome screen greets the new user by name
# Always true
- no field ever shows a validation error
Full name
City
Verify your email — your-app
https://your-app.dev/auth/magic?token=…
✓ “Welcome, Maya” — signed in, intent satisfied
Every viewport that matters
Declare devices in the flow's frontmatter and the same plain-language steps run on each — phone, tablet, desktop — with per-device visual baselines and per-device evidence in the PR.
iphone-15
ipad
desktop
devices: [iphone-15, ipad, desktop]
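Since each declared device runs separately, the number of runs is the sum of device counts across flows. A small illustration, assuming flows are represented as a mapping from slug to the declared device list; the representation and the name `expand_runs` are hypothetical:

```python
def expand_runs(flows: dict[str, list[str]]) -> list[tuple[str, str]]:
    """One (flow, device) pair per declared device, in declaration order."""
    return [(slug, device) for slug, devices in flows.items() for device in devices]
```

For a single flow declaring the three devices above, this yields three pairs, which is also the number of per-device baselines and evidence sets to expect in the PR.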
Every pixel, diffed against main
Every run snapshots every step on every device and compares it to the baseline from your main branch — full page and per component. Comparison is perceptual: SSIM weighted the way human eyes weigh difference, so anti-aliasing and sub-pixel shifts never cry wolf, while a moved button always does. Intentional changes arrive as a reviewable comparison — one click commits the new baseline to the branch, so it merges with the code that caused it.
✓ snapshot every step, every device
✓ diff against the baseline from main
◐ 2 regions changed — intentional?
✓ approved — baseline travels with the branch
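For readers unfamiliar with SSIM, the sketch below computes the textbook structural similarity index over two equally sized grayscale images given as flat pixel lists. It is only an illustration of the metric: it uses a single global window and the conventional constants, whereas Reflow's actual windowing, weighting, and thresholds are not described here.

```python
from statistics import fmean


def global_ssim(a: list[float], b: list[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two grayscale images given as flat pixel lists."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    # Conventional stabilizing constants: (0.01 * L)^2 and (0.03 * L)^2.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Identical images score 1.0, and the score drops as luminance, contrast, or structure diverge. Production implementations usually average the index over small local windows instead of the whole image.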
Snapshots are captured when the page stops changing — adaptive stabilization, not fixed sleeps — and replays are deterministic: same viewport, compiled RFL. A diff means the UI changed, not that the test got flaky.
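One way to capture "when the page stops changing" instead of sleeping for a fixed time is to wait for several consecutive identical frames. This is a sketch of that general idea, not Reflow's algorithm: `capture` is a hypothetical callable returning the current frame as bytes, and the defaults are arbitrary.

```python
import time
from typing import Callable


def capture_when_stable(
    capture: Callable[[], bytes],  # hypothetical: returns the current frame as bytes
    quiet_frames: int = 3,
    interval_s: float = 0.1,
    timeout_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Return a frame once `quiet_frames` consecutive captures are identical."""
    deadline = time.monotonic() + timeout_s
    last = capture()
    streak = 1
    while streak < quiet_frames:
        if time.monotonic() >= deadline:
            raise TimeoutError("page never stopped changing")
        sleep(interval_s)
        frame = capture()
        # Extend the streak on an identical frame, otherwise start over.
        streak = streak + 1 if frame == last else 1
        last = frame
    return last
```

A fast page returns after a few captures, while a slow one gets as long as the timeout allows, which is the advantage over a fixed sleep.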
◐ visual differences detected
A moved button or a new banner is a warning you review, not a failure. Approve it and the baseline updates in the same PR — nothing to reconcile after merge.
✗ checkout did not succeed
Product breakage fails loud and blocks the merge. Visual review never buries a real failure — the two outcomes are never mixed.
Works on localhost
The same flows run against your dev server before a PR even exists. Point target_url at localhost and the MCP connection opens an ephemeral, run-scoped tunnel: the cloud browser reaches your machine, nothing is exposed beyond the run, and your agent verifies the change before you push it.
⏺ reflow:create_run (target_url: http://localhost:3000) → run_201
⏺ tunnel up: run-201.tunnel.reflow.io → localhost:3000 (closes with the run)
⏺ run_201: 4/4 steps ✓ · magic link handled · intent satisfied
✓ safe to push — opening the PR
Set up by your agent, not by you
Connect the MCP server and ask your agent to add Reflow. It discovers the tools, writes the flow, estimates the cost, runs it against your preview, and posts the check.
It lives where your PRs live
Reflow posts into the pull request: a merge-blocking check per flow, the proposed fix as a reviewable diff, screenshot deltas against the main baseline — and one button that commits the accepted changes back to the branch.
- ✓ reflow / signup — 14s, intent satisfied
- ◐ reflow / browse-and-search — significant visual differences detected, review below
- ✗ reflow / checkout-happy-path — checkout did not succeed
browse-and-search
Layout changed against the main baseline: a promo banner was added and the cart badge moved. Every stated expectation still holds — this looks intentional. Approving commits the new baseline and the expectation update below.
baseline · main
this PR
delta · 2 regions
## Always true
- the cart badge shows the running item count
- - the page shows no banners above the product grid
+ - a single promo banner may appear above the product grid
checkout-happy-path
Failed at “pay with the standard test card”: submitting payment returned 500 from /api/pay and no confirmation rendered. The expectation “the confirmation page shows an order number and the correct total” cannot be met. This looks like a product regression, not a flow problem: nothing to heal.
step 3/3 · pay with the standard test card
POST /api/pay → 500 Internal Server Error
page state: spinner persisted 30s, no confirmation, console: TypeError in pay.ts
✗ checkout did not succeed
A flow is just markdown
Plain language, no selectors, no test framework. Reflow compiles steps to Playwright on the fly and caches the compilation per flow version — when your app changes, the compilation adapts instead of the file rotting. Assertions are about state that matters, in your words.
---
slug: checkout-happy-path
url: /checkout
devices: [desktop, iphone-15]
budgets: { minutes: 5, tokens: 50000 }
---
A signed-in user buys a single item.
## Steps
1. Add the first product on the page to the cart
2. Check out with the standard test card
3. The confirmation page shows an order number and the correct total
## Always true
- the cart badge shows the running item count
- prices never render as NaN or $0.00
- no error banners appear at any point
How it works
- 01 — Flows live in your repo as markdown: intent, Cucumber-style steps, and expectations, all in plain language, compiled to Playwright on the fly and cached.
- 02 — On every PR, Reflow runs each flow against your preview URL in a real browser, on every declared viewport. Your model key drives recovery; you pay your provider, not a token markup.
- 03 — Green check with screenshots; visual differences arrive as a reviewable comparison; real breakage fails loud with the exact state that went wrong. Git stays the source of truth.