reflow

Your AI writes the PR.
Reflow proves it works.

Reflow is an MCP server your coding agent connects to. The agent writes flows in plain language; Reflow runs them in real browsers against every PR preview — with your own model key — and posts the evidence where you merge.

Works with the agent you already use

Add Reflow to Claude Code, Cursor, Codex, or anything else that speaks MCP, and your agent can author flows, run them in real browsers, and bring back the evidence.

mcp.reflow.io

Connect the agent you already use

One click for the popular ones, one JSON block for everything else. No SDK, no test framework, nothing to learn up front — your agent reads the tools and knows what to do.

any other MCP client — raw configuration
{
  "mcpServers": {
    "reflow": {
      "url": "https://mcp.reflow.io",
      "headers": { "Authorization": "Bearer $REFLOW_TOKEN" }
    }
  }
}

The tools, exactly as your agent reads them

These are the live tool descriptions, not marketing copy. The whole workflow (author, estimate, run, read the evidence, commit the fix) is taught by the descriptions themselves.

list_flows read-only idempotent

List every flow Reflow tracks for a repo: slug, path, branches seen, and the latest run per branch. Call this first when working in a repo — it tells you what coverage already exists before you write anything. An empty result means no flows yet: author one in .reflow/flows/ (see get_flow on any other repo for the format) and run it with create_run. Args: repo (owner/name), branch? — defaults to all branches.

get_flow read-only idempotent

Fetch one flow in full: the markdown source (intent, plain-language steps, RFL blocks, expectations), recent runs, and the current visual baselines per device. Use before editing a flow or diagnosing a failure — the prose intent tells you what the flow is protecting, and the RFL blocks show how each step currently compiles. Args: repo, slug | path.

create_run billable

Execute one or more flows against a target URL in real browsers. Returns run ids immediately; poll get_run for results. Provide the flow content you want executed (it is upserted content-addressed, so it is safe to send repeatedly) plus repo, ref (the branch), commit_sha, and target_url: a PR preview, staging, or your local dev server. Each device declared in a flow's frontmatter runs separately. Idempotent per idempotency_key: reuse a key of the form owner/repo:pr:sha:slug to make retries safe. Args: repo, ref, commit_sha, target_url, flows[], pr_number?, idempotency_key.

example

create_run(flows: [signup], devices from frontmatter)
→ { runs: [
    { id: "run_188", flow: "signup", device: "iphone-15", status: "queued" },
    { id: "run_189", flow: "signup", device: "desktop",   status: "queued" }
  ] }
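Why retries are safe, as a minimal sketch. It assumes the idempotency key is the repo, PR number, commit SHA, and flow slug joined with colons, and that "content-addressed" means the flow source is keyed by its SHA-256 hash. FakeRunService is a hypothetical in-memory stand-in, not Reflow's server.

```python
import hashlib


def idempotency_key(repo: str, pr_number: int, commit_sha: str, slug: str) -> str:
    """Assumed key layout: owner/repo:pr:sha:slug joined with colons."""
    return f"{repo}:{pr_number}:{commit_sha}:{slug}"


def content_address(flow_markdown: str) -> str:
    """Identical flow source always maps to the same SHA-256 id."""
    return hashlib.sha256(flow_markdown.encode("utf-8")).hexdigest()


class FakeRunService:
    """Hypothetical in-memory stand-in for create_run; not Reflow's server."""

    def __init__(self) -> None:
        self.flows: dict[str, str] = {}  # content address -> flow source
        self.runs: dict[str, str] = {}   # idempotency key -> run id

    def create_run(self, key: str, flow_markdown: str) -> str:
        # Upsert by content: re-sending the same flow stores it only once.
        self.flows[content_address(flow_markdown)] = flow_markdown
        # A retry with a known key returns the original run instead of a new one.
        if key not in self.runs:
            self.runs[key] = f"run_{len(self.runs) + 1}"
        return self.runs[key]


svc = FakeRunService()
key = idempotency_key("your-org/your-app", 142, "a1b2c3d", "signup")
first = svc.create_run(key, "## Steps\n1. Submit the form\n")
retry = svc.create_run(key, "## Steps\n1. Submit the form\n")
print(first, retry, len(svc.flows))  # run_1 run_1 1
```

A new commit SHA produces a new key and therefore a new run, while the unchanged flow content is still stored only once.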
get_run read-only

Status and evidence for a run: status (queued|running|succeeded|visual_diff|failed), the step-by-step timeline showing which lines ran as RFL and which as free text, per-step screenshots, token usage against budget, and proposed_changes if the run healed anything. Poll until the status is terminal (succeeded, visual_diff, or failed). On "failed", the failing step names the expectation that could not be met: treat that as a product bug to report, not a flow to patch. On "visual_diff", fetch get_run_proposed_changes and show the human the comparison. Args: run_id.

example

get_run(run_188) → status: "running", events:
  ✓ compiled      4 steps from cached RFL
  ✓ step 1/4      fill the signup form          rfl · 1.8s
  ✓ step 2/4      submit                        rfl · 0.6s
  ◐ step 3/4      read email, follow magic link rfl · waiting on inbox
  · step 4/4      welcome screen greets by name
  · validate      end state satisfies intent
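A polling loop an agent might run against get_run, sketched with a hypothetical get_run callable that returns a dict with a status field. The backoff schedule (1 s doubling to a 15 s cap) and the 10-minute timeout are arbitrary choices, not Reflow defaults; next_action just restates what the tool description asks for on each terminal status.

```python
import time
from typing import Callable

TERMINAL = {"succeeded", "visual_diff", "failed"}


def wait_for_run(
    get_run: Callable[[str], dict],
    run_id: str,
    timeout_s: float = 600.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_run until the status is terminal, backing off between calls."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        run = get_run(run_id)
        if run["status"] in TERMINAL:
            return run
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"{run_id} still {run['status']} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


def next_action(run: dict) -> str:
    """What the tool description says to do with each terminal status."""
    return {
        "succeeded": "done: post the evidence",
        "visual_diff": "fetch get_run_proposed_changes and show the human",
        "failed": "report the failing expectation as a product bug",
    }[run["status"]]
```

Injecting sleep keeps the loop testable without real waiting.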
get_run_screenshot read-only idempotent

Presigned URL for the screenshot at any step of a run — what the browser actually saw, per device. Use when explaining a failure to a human, or when you need to look at the page state yourself before proposing a fix. Args: run_id, step_index, device?.

get_run_proposed_changes read-only idempotent

The heal as a unified diff against the flow file: RFL rewrites (new anchor chains prepended, old kept as fallbacks) and plain-language expectation edits. Review it, then commit it to the PR branch with your own git tooling — Reflow never writes to your repo. If the diff changes an expectation, surface that line to the human: it is a product-behavior decision, not a mechanical fix. Args: run_id. Returns: path, before, after, unified_diff.
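One way to surface expectation edits before committing a heal, assuming expectations are the bullets under an "Always true" heading, as in the flow files on this page. review_heal is a hypothetical helper: it rebuilds a unified diff from before and after with Python's difflib and lists the expectation lines that were removed or added. It does not inspect step text.

```python
import difflib


def expectation_lines(flow_markdown: str) -> set[str]:
    """Bullets under an 'Always true' heading (assumed flow layout)."""
    found: set[str] = set()
    in_section = False
    for line in flow_markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            in_section = stripped.lstrip("#").strip().lower() == "always true"
        elif in_section and stripped.startswith("- "):
            found.add(stripped[2:])
    return found


def review_heal(path: str, before: str, after: str) -> tuple[str, list[str]]:
    """Return a unified diff plus the expectation edits a human must approve."""
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    old, new = expectation_lines(before), expectation_lines(after)
    flagged = [f"- {e}" for e in sorted(old - new)] + [f"+ {e}" for e in sorted(new - old)]
    return diff, flagged
```

An empty flagged list means the heal only touched RFL and can be committed mechanically; anything flagged goes to the human.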

estimate_run_cost read-only idempotent

Worst-case cost in USD for a batch of flows (runner minutes × devices + storage headroom) against the team's current balance — before anything is spent. This is the only tool that talks money: runs hold their worst case up front and settle to actual seconds on completion, so what you see here is the most a batch can cost. Call before create_run when the balance is low or the batch is large; if the estimate exceeds the balance, tell the human to top up rather than letting create_run fail with insufficient_credit. Args: flows[], devices?.

example

estimate_run_cost(flows: [signup, checkout], devices: [iphone-15, desktop])
→ { est_max_usd: 0.09, current_balance_usd: 24.61 }
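A back-of-the-envelope version of the worst-case formula, with made-up prices: RATE_PER_RUNNER_MINUTE_USD and STORAGE_HEADROOM_USD_PER_RUN are placeholders, and the minute budget is assumed to come from budgets.minutes in each flow's frontmatter. It does not reproduce Reflow's real numbers; it only shows the shape of estimate, hold, and settle.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder prices for illustration only; these are NOT Reflow's rates.
RATE_PER_RUNNER_MINUTE_USD = 0.004
STORAGE_HEADROOM_USD_PER_RUN = 0.001


@dataclass
class FlowPlan:
    slug: str
    budget_minutes: float  # assumed to come from budgets.minutes in frontmatter
    devices: list[str]


def hold_per_run(plan: FlowPlan) -> float:
    """Worst case for one device run: the full minute budget plus storage headroom."""
    return plan.budget_minutes * RATE_PER_RUNNER_MINUTE_USD + STORAGE_HEADROOM_USD_PER_RUN


def estimate_max_usd(plans: list[FlowPlan]) -> float:
    """Sum the worst case over every (flow, device) run in the batch."""
    return sum(hold_per_run(p) * len(p.devices) for p in plans)


def preflight(plans: list[FlowPlan], balance_usd: float) -> Optional[str]:
    """Ask the human to top up instead of letting create_run fail."""
    est = estimate_max_usd(plans)
    if est > balance_usd:
        return f"Worst case ${est:.2f} exceeds balance ${balance_usd:.2f}: please top up."
    return None


def settle(hold_usd: float, seconds_used: float) -> tuple[float, float]:
    """Charge runner seconds actually used (capped at the hold); release the rest.

    Storage settlement is left out of this sketch.
    """
    charged = min(hold_usd, seconds_used / 60 * RATE_PER_RUNNER_MINUTE_USD)
    return charged, hold_usd - charged
```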
cancel_run idempotent

Stop an in-flight run. The browser stops, billing settles to actual seconds used, and the hold releases. Use when a newer commit supersedes the one being tested. Args: run_id.
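A sketch of the supersede rule, assuming the agent keeps its own list of in-flight runs. InFlightRun and cancel_superseded are hypothetical names; the cancel_run argument stands in for the MCP tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InFlightRun:
    run_id: str
    pr_number: int
    slug: str
    commit_sha: str


def cancel_superseded(
    in_flight: list[InFlightRun],
    pr_number: int,
    new_sha: str,
    cancel_run: Callable[[str], None],
) -> list[str]:
    """Cancel runs on this PR that test any commit other than new_sha."""
    cancelled = []
    for run in in_flight:
        if run.pr_number == pr_number and run.commit_sha != new_sha:
            cancel_run(run.run_id)  # the tool is idempotent, so a retry is harmless
            cancelled.append(run.run_id)
    return cancelled
```

Runs on other PRs are left alone, so a push to one branch never stops evidence gathering on another.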

list_provider_keys read-only idempotent

The model keys this team has registered: label, provider, last used. Never returns key material. Use to pick a provider_key label for a flow's frontmatter, or to tell the human they need to add a key in the dashboard before runs can execute (keys cannot be created over MCP by design).

Steps in plain language, executed for real

“Fill out the signup form with realistic values” is a complete, runnable step, and so is “read the verification email and sign in with the magic link”, because every run gets its own inbox. Reflow compiles steps to RFL once, then replays them deterministically: a real browser, realistic data, and no model in the loop as long as the steps keep passing.

.reflow/flows/signup.md

## Steps
1. Fill out the signup form with realistic values
2. Submit the form
3. Read the verification email and sign in with the magic link
4. The welcome screen greets the new user by name

## Always true
- no field ever shows a validation error

Run preview (your-app.dev/signup · chromium · run_188): a signup form with Full name, City, and Email fields and a Create account button, next to the run's own inbox, run-188@inbox.reflow.io.

✓ “Welcome, Maya”: signed in, intent satisfied
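What the magic-link step has to do once the message lands in the run's inbox, sketched with Python's standard email and re modules. find_magic_link is hypothetical, and recognizing the link by the app's hostname is an assumption for illustration; this is not how Reflow implements the step.

```python
import re
from email import message_from_string
from typing import Optional
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https://[^\s\"'<>]+")


def body_text(raw_email: str) -> str:
    """Decode every text/plain and text/html part of a raw message."""
    chunks = []
    for part in message_from_string(raw_email).walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def find_magic_link(raw_email: str, app_host: str) -> Optional[str]:
    """First https link in the message that points at the app under test."""
    for url in URL_PATTERN.findall(body_text(raw_email)):
        url = url.rstrip(".,;)")  # drop punctuation that trails a link in prose
        if urlparse(url).hostname == app_host:
            return url
    return None
```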

Every viewport that matters

Declare devices in the flow's frontmatter and the same plain-language steps run on each — phone, tablet, desktop — with per-device visual baselines and per-device evidence in the PR.

iphone-15

ipad

desktop

frontmatter — one flow, three viewports
devices: [iphone-15, ipad, desktop]
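How one flow becomes one run per device, matching the create_run example earlier on this page. This sketch reads only the inline-list form `devices: [...]` with a regex; a real implementation would use a YAML parser. declared_devices and fan_out are hypothetical names.

```python
import re

DEVICES_LINE = re.compile(r"^devices:\s*\[(.*)\]\s*$", re.MULTILINE)


def declared_devices(frontmatter: str) -> list[str]:
    """Read the inline list form `devices: [a, b, c]` (no full YAML parsing)."""
    match = DEVICES_LINE.search(frontmatter)
    if match is None:
        raise ValueError("flow declares no devices")
    return [d.strip() for d in match.group(1).split(",") if d.strip()]


def fan_out(slug: str, frontmatter: str) -> list[dict[str, str]]:
    """One run request per declared device, in declaration order."""
    return [{"flow": slug, "device": d} for d in declared_devices(frontmatter)]
```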

Every pixel, diffed against main

Every run snapshots every step on every device and compares each snapshot to the baseline from your main branch, both full page and per component. Comparison is perceptual: SSIM weighted the way human eyes weigh difference, so anti-aliasing and sub-pixel shifts don't raise false alarms, while a moved button does. Intentional changes arrive as a reviewable comparison, and one click commits the new baseline to the branch, so it merges with the code that caused it.
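To make the perceptual comparison concrete, here is plain SSIM (the standard luminance, contrast, and structure formula) over non-overlapping 8×8 tiles of grayscale images given as lists of rows. Reflow's perceptual weighting is not described on this page, so the sketch uses the unweighted formula; the tile size and the 0.9 threshold are arbitrary. Tiles that score below the threshold play the role of "changed regions".

```python
from statistics import fmean

DYNAMIC_RANGE = 255.0
C1 = (0.01 * DYNAMIC_RANGE) ** 2  # stabilizes the luminance term
C2 = (0.03 * DYNAMIC_RANGE) ** 2  # stabilizes the contrast/structure term

Image = list[list[float]]  # grayscale, one list per pixel row


def tile_ssim(a: list[float], b: list[float]) -> float:
    """Standard SSIM between two equal-size pixel tiles (1.0 means identical)."""
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean([(x - mu_a) ** 2 for x in a])
    var_b = fmean([(y - mu_b) ** 2 for y in b])
    cov = fmean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a**2 + mu_b**2 + C1) * (var_a + var_b + C2)
    )


def changed_tiles(
    base: Image, head: Image, tile: int = 8, threshold: float = 0.9
) -> list[tuple[int, int]]:
    """(top, left) corners of tiles whose SSIM falls below the threshold."""
    changed = []
    for top in range(0, len(base) - tile + 1, tile):
        for left in range(0, len(base[0]) - tile + 1, tile):
            rows, cols = range(top, top + tile), range(left, left + tile)
            a = [base[r][c] for r in rows for c in cols]
            b = [head[r][c] for r in rows for c in cols]
            if tile_ssim(a, b) < threshold:
                changed.append((top, left))
    return changed
```

A uniform one-level brightness shift keeps every tile near 1.0, while blanking a tile drops its score close to 0; that asymmetry is what lets a structural comparison ignore noise but catch real layout changes.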

run_204 · checkout-happy-path · desktop

✓ snapshot every step, every device

✓ diff against the baseline from main

◐ 2 regions changed — intentional?

✓ approved — baseline travels with the branch

Snapshots are captured when the page stops changing — adaptive stabilization, not fixed sleeps — and replays are deterministic: same viewport, compiled RFL. A diff means the UI changed, not that the test got flaky.
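A minimal sketch of adaptive stabilization, assuming a hypothetical capture() that returns screenshot bytes: keep capturing until several consecutive frames hash identically, instead of sleeping for a fixed time. The frame count, interval, and timeout are illustrative values, not Reflow's.

```python
import hashlib
import time
from typing import Callable


def wait_until_stable(
    capture: Callable[[], bytes],
    quiet_frames: int = 3,
    interval_s: float = 0.1,
    timeout_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Return a screenshot once `quiet_frames` consecutive captures are identical."""
    deadline = time.monotonic() + timeout_s
    last_digest = None
    streak = 0
    while time.monotonic() < deadline:
        frame = capture()
        digest = hashlib.sha256(frame).digest()
        # Extend the streak on an unchanged frame; restart it on any change.
        streak = streak + 1 if digest == last_digest else 1
        last_digest = digest
        if streak >= quiet_frames:
            return frame
        sleep(interval_s)
    raise TimeoutError("page never stopped changing")
```

A fast page settles after a few short intervals, and a slow one gets as long as it needs up to the timeout, which is the advantage over a fixed sleep.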

baseline · main: 2 regions changed (+ promo banner · cart badge), with an Approve & commit delta button

◐ visual differences detected

A moved button or a new banner is a warning you review, not a failure. Approve it and the baseline updates in the same PR — nothing to reconcile after merge.

✗ checkout did not succeed

Product breakage fails loud and blocks the merge. Visual review never buries a real failure: a run that breaks is reported as a failure, never as a visual difference to review.
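The separation between the two outcomes can be read as a precedence order, sketched here with a hypothetical per-step record: any unmet expectation makes the run failed, and only a run in which every expectation held can end in visual_diff. The status names are the ones get_run reports.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    expectation_met: bool
    changed_regions: int = 0  # visual regions that differ from the main baseline


def run_status(steps: list[StepResult]) -> str:
    """Breakage takes precedence; visual review only applies to passing runs."""
    if not all(step.expectation_met for step in steps):
        return "failed"
    if any(step.changed_regions > 0 for step in steps):
        return "visual_diff"
    return "succeeded"
```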

Works on localhost

The same flows run against your dev server before a PR even exists. Point target_url at localhost and the MCP connection opens an ephemeral, run-scoped tunnel — the cloud browser reaches your machine, nothing is exposed beyond the run, and your agent verifies the change before you push it.

your agent, mid dev-loop
⏺ reflow:create_run (target_url: http://localhost:3000) → run_201
⏺ tunnel up: run-201.tunnel.reflow.io → localhost:3000 (closes with the run)
⏺ run_201: 4/4 steps ✓ · magic link handled · intent satisfied
✓ safe to push — opening the PR
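A rough check for when a target_url needs the tunnel, sketched with the standard library. Treating private-network addresses like localhost is an assumption (the text only mentions localhost), and needs_tunnel is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit


def needs_tunnel(target_url: str) -> bool:
    """True when a cloud browser could not reach target_url on its own."""
    host = urlsplit(target_url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name such as a public PR preview; assumed reachable
    return address.is_loopback or address.is_private
```

Hostnames that only resolve on a private network would slip through this check; resolving the name first would catch them.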

Set up by your agent, not by you

Connect the MCP server and ask your agent to add Reflow. It discovers the tools, writes the flow, estimates the cost, runs it against your preview, and posts the check.

your agent, setting reflow up
$ claude "add reflow to this repo and verify checkout still works"
reflow:list_flows — none found for this repo
Write(.reflow/config.yml) — team, default provider key
Write(.reflow/flows/checkout-happy-path.md) — intent, 3 plain-language steps, 3 expectations
reflow:estimate_run_cost — $0.04 max, balance $25.00
reflow:create_run (chromium · preview url) → run_142
add-to-cart · shipping · pay · confirm
end-state validation — intent satisfied, 1 model call
check posted: reflow / checkout-happy-path — 31s, screenshots attached

It lives where your PRs live

Reflow posts into the pull request: a merge-blocking check per flow, the proposed fix as a reviewable diff, screenshot deltas against the main baseline — and one button that commits the accepted changes back to the branch.

what your PR sees — github.com/your-org/your-app/pull/142
Some checks were not successful — merge blocked
  • reflow / signup — 14s, intent satisfied
  • reflow / browse-and-search — significant visual differences detected, review below
  • reflow / checkout-happy-path — checkout did not succeed
reflow bot commented 2 minutes ago · edited

browse-and-search

Layout changed against the main baseline: a promo banner was added and the cart badge moved. Every stated expectation still holds — this looks intentional. Approving commits the new baseline and the expectation update below.

Screenshots: baseline · main | this PR | delta · 2 regions (+ promo banner · cart badge)
proposed change — .reflow/flows/browse-and-search.md
  ## Always true
  - the cart badge shows the running item count
- - the page shows no banners above the product grid
+ - a single promo banner may appear above the product grid
Approve & commit delta · View side-by-side: approving commits the new visual baseline and the expectation edit to this branch

checkout-happy-path

Failed at “pay with the standard test card”: submitting payment returned 500 from /api/pay and no confirmation rendered. The expectation “the confirmation page shows an order number and the correct total” cannot be met. This looks like a product regression, not a flow problem — nothing to heal.

step 3/3 · pay with the standard test card
  POST /api/pay → 500 Internal Server Error
  page state: spinner persisted 30s, no confirmation, console: TypeError in pay.ts
  ✗ checkout did not succeed
View run · screenshots · trace · console

A flow is just markdown

Plain language, no selectors, no test framework. Reflow compiles steps to Playwright on the fly and caches the compilation per flow version — when your app changes, the compilation adapts instead of the file rotting. Assertions are about state that matters, in your words.

.reflow/flows/checkout-happy-path.md
---
slug: checkout-happy-path
url: /checkout
devices: [desktop, iphone-15]
budgets: { minutes: 5, tokens: 50000 }
---

A signed-in user buys a single item.

## Steps
1. Add the first product on the page to the cart
2. Check out with the standard test card
3. The confirmation page shows an order number and the correct total

## Always true
- the cart badge shows the running item count
- prices never render as NaN or $0.00
- no error banners appear at any point
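A sketch of how a tool could split this file into its parts, assuming the layout shown: `---`-delimited frontmatter, an intent paragraph, a Steps numbered list, and an Always true bullet list. Frontmatter values stay raw strings (no YAML parsing), and parse_flow is a hypothetical name, not Reflow's parser.

```python
import re
from dataclasses import dataclass, field

STEP = re.compile(r"^(\d+)\.\s+(.*)$")


@dataclass
class Flow:
    meta: dict[str, str]
    intent: str = ""
    steps: list[str] = field(default_factory=list)
    always_true: list[str] = field(default_factory=list)


def parse_flow(text: str) -> Flow:
    """Split a flow file into frontmatter, intent, steps, and expectations."""
    meta: dict[str, str] = {}
    body = text
    if text.startswith("---\n"):
        head, _, body = text[4:].partition("\n---\n")
        for line in head.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()  # raw string, no YAML parsing
    flow = Flow(meta=meta)
    section = None  # None means we are still in the intent paragraph
    intent = []
    for line in body.splitlines():
        s = line.strip()
        if not s:
            continue
        if s.startswith("#"):
            section = s.lstrip("#").strip().lower()
        elif section == "steps" and (m := STEP.match(s)):
            flow.steps.append(m.group(2))
        elif section == "always true" and s.startswith("- "):
            flow.always_true.append(s[2:])
        elif section is None:
            intent.append(s)
    flow.intent = " ".join(intent)
    return flow
```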

How it works

  1. Flows live in your repo as markdown: intent, Cucumber-style steps, and expectations, all in plain language, compiled to Playwright on the fly and cached.
  2. On every PR, Reflow runs each flow against your preview URL in a real browser, on every declared viewport. Your model key drives recovery; you pay your provider, not a token markup.
  3. A pass gets a green check with screenshots; visual differences arrive as a reviewable comparison; real breakage fails loud with the exact state that went wrong. Git stays the source of truth.