2022-05-04

Make your end-to-end tests fast

How reflow v4.14.0 cut average end-to-end sequence time by ~70% — parallel flows, conditional stability, culling explicit waits, worker-thread image comparison, and running tests once per release.

If your end-to-end tests are slow, you will either avoid running them or waste time waiting on them. A fast end-to-end suite is a productivity asset; a slow one quietly sets your release cadence.

Reflow v4.14.0 shipped today and reduces our average end-to-end sequence time by roughly 70%. This post explains how. Reflow replays recorded browser flows with Playwright under the hood, so every technique here applies whether or not you use it.

Embarrassingly parallel testing

A flow must run its user steps sequentially, but a suite should run its independent flows in parallel. Reflow’s architecture embraces this (test composition with pipelines creates N servers for N user flows), yet sequential code creeps in anyway.

This code provides progressive real-time updates by executing an array of mutations sequentially:

for (const action of toUpdate) {
  await queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
    mutation: shallowUpdateActionInContext,
    variables: {
      input: action,
    },
  });
}

It rewrites trivially to run in parallel:

await Promise.all(toUpdate.map((action) =>
  queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
    mutation: shallowUpdateActionInContext,
    variables: {
      input: action,
    },
  }))
);

Reflow’s APIs run on AppSync resolvers over serverless DynamoDB, which scale with demand, and we measured no downside to the added parallelism. v4.12.1 fixed this case; on top of that, v4.14.0 adds a caching layer that cuts S3 pushes for screenshot images.
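To make the difference concrete, here is a minimal, self-contained sketch of the same rewrite. `fakeMutation` is a hypothetical stand-in for a network call (it is not part of reflow), and the latency is simulated with a timer.

```typescript
// Hypothetical stand-in for a network mutation: resolves with `id` after `ms` milliseconds.
function fakeMutation(id: number, ms: number): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(id), ms));
}

// Sequential: each call starts only after the previous one has finished,
// so total time is roughly ids.length * ms.
async function runSequentially(ids: number[], ms: number): Promise<number[]> {
  const results: number[] = [];
  for (const id of ids) {
    results.push(await fakeMutation(id, ms));
  }
  return results;
}

// Parallel: every call starts immediately, so total time is roughly ms.
// Promise.all keeps the results in input order.
async function runInParallel(ids: number[], ms: number): Promise<number[]> {
  return Promise.all(ids.map((id) => fakeMutation(id, ms)));
}
```

One design note: `Promise.all` rejects as soon as any call rejects. If every mutation should be attempted regardless of failures, `Promise.allSettled` is the closer fit.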

Conditional stability

Reflow learns about your application and uses that knowledge to keep replays stable — but stability checks cost time. v4.14.0 makes them conditional: applied when the evidence says they’re needed, skipped when not.

Before v4.14.0, three waits ran after every action (load, networkidle, and screenshotStable):

private async pageStable(baselineAction): Promise<void> {
  // Each load-state wait is best-effort: a timeout is logged and replay continues.
  try {
    await this.page.waitForLoadState('load', { timeout: 30000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('load')", e);
  }
  try {
    await this.page.waitForLoadState('networkidle', { timeout: 5000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('networkidle')", e);
  }

  // Finally, wait for the page to match the baseline screenshot or stop changing.
  await this.screenshotStable(baselineAction?.preStepScreenshot?.image);
}

load — all markup, stylesheets, scripts and static assets loaded.
networkidle — no network connections for at least 500ms.
screenshotStable — the page matches a baseline screenshot, or two consecutive screenshots match each other (the page stopped animating).

The trap: a page that is already stable but has background network chatter eats the full 5s networkidle timeout on every action. Reflow now waits for networkidle only when the action historically involved a navigation (or when a run opts into full stability checks).
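A minimal sketch of that decision, assuming the history is available as a simple flag. `ActionHistory`, `StabilityOptions`, and `maybeWaitForNetworkIdle` are hypothetical names for illustration, not reflow's internals; `PageLike` is the small slice of Playwright's `Page` that the sketch needs.

```typescript
// The part of Playwright's Page API this sketch relies on.
interface PageLike {
  waitForLoadState(
    state: 'load' | 'domcontentloaded' | 'networkidle',
    options?: { timeout?: number },
  ): Promise<void>;
}

// Hypothetical: what earlier runs recorded about this action.
interface ActionHistory {
  navigated: boolean;
}

// Hypothetical: per-run switch that forces every stability check.
interface StabilityOptions {
  fullStability?: boolean;
}

// Returns true if the networkidle wait was attempted, false if it was skipped.
async function maybeWaitForNetworkIdle(
  page: PageLike,
  history: ActionHistory,
  options: StabilityOptions = {},
): Promise<boolean> {
  if (!history.navigated && !options.fullStability) {
    return false; // no navigation on record: skip the wait and its possible 5s timeout
  }
  try {
    await page.waitForLoadState('networkidle', { timeout: 5000 });
  } catch {
    // A timeout is not fatal; replay continues as in the original pageStable.
  }
  return true;
}
```

The check is cheap, and the saving is up to five seconds per action on pages with constant background traffic.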

The lesson generalizes: stability methods are not seasoning to sprinkle everywhere. Add them where an action needs one, and nowhere else.

No more explicit waits

The common fix for an unstable test is await wait(1000). This is a worse sin than over-applying stability methods — a stability wait exits early when its event fires; a sleep never does. At minimum, reach for waitForLoadState before reaching for wait.

Better: write a waitUntil clause for explicit page state. Reflow supports a raw wait action, but we advise a visual assertion instead — it waits (up to a configurable maximum) until an element matches a recorded baseline, then proceeds immediately. The suite spends exactly as long as the application needs, never longer.
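For readers rolling their own, a condition-based wait can be sketched in a few lines. This `waitUntil` helper is a hypothetical illustration, not reflow's implementation: it polls a predicate and returns the moment the predicate holds, falling back to a timeout only in the worst case.

```typescript
// Polls `predicate` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves to true as soon as the condition holds, false on timeout.
async function waitUntil(
  predicate: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await predicate()) return true; // exit early: no time is spent beyond what the app needs
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike `wait(1000)`, the cost here is bounded by the application rather than by a guess. In Playwright itself, `page.waitForFunction` and `locator.waitFor` fill the same role without a hand-written loop.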

We weren’t innocent here either. wait(X) statements had crept into reflow’s own hot paths; it’s tempting to ship a feature with a lazy sleep for stability. v4.14.0 culls all of them, replacing them with dynamic wait times tuned to the application under test.

Move compute off the hot path

Reflow computes visual changes with an SSIM-weighted pixel diff over full-height page captures. That work is CPU-bound and slow — a 1080×10000px page can block the main thread for 2–3 seconds. It had become our largest performance bottleneck on big applications.

Some comparisons gate stability and must block. Comparisons that only provide visual feedback to the user don’t need to block, so they now return a promise that is awaited only when the result is actually needed:

private async screenshotStable(baselineScreenshot: S3ObjectInput | undefined): Promise<{
  diff: Promise<ComparisonModel | undefined>;
  current?: { image: S3ObjectInput };
}> {
/* ... */
}
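The pattern behind that signature can be sketched as follows. All names here (`compareImages`, `finishStep`, `StepResult`) are hypothetical, and the comparison is simulated with a timer and a byte count rather than a real SSIM diff.

```typescript
interface Comparison {
  changedBytes: number;
}

// Hypothetical stand-in for the expensive comparison; the timer simulates the work.
function compareImages(baseline: Uint8Array, current: Uint8Array): Promise<Comparison> {
  return new Promise((resolve) => {
    setTimeout(() => {
      let changedBytes = 0;
      const length = Math.max(baseline.length, current.length);
      for (let i = 0; i < length; i++) {
        if (baseline[i] !== current[i]) changedBytes++;
      }
      resolve({ changedBytes });
    }, 10);
  });
}

interface StepResult {
  current: Uint8Array;
  diff: Promise<Comparison | undefined>;
}

// Starts the comparison but does not await it, so the caller can move on to the next step.
function finishStep(baseline: Uint8Array | undefined, current: Uint8Array): StepResult {
  const diff = baseline
    ? compareImages(baseline, current).catch(() => undefined) // feedback only: a failed diff must not fail the step
    : Promise.resolve(undefined);
  return { current, diff };
}
```

Whoever renders the visual feedback awaits `diff` later. Note that a promise alone does not free the main thread when the work is CPU-bound; it only changes when the result is consumed.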

Move compute into worker threads

Node.js is single-threaded by default, and for most workloads, distributing across processes beats managing threads. Image comparison is the exception: compute-heavy enough that we moved it into a worker thread, so realtime uploads of test progress proceed in parallel with the comparison instead of halting behind it.

We used the npm threads wrapper with esbuild: comparison code moved to a minimal-import imageCompare.worker.js, pre-compiled into a bundle, spawned as a blob worker, and called through the threads promise interface.

// imageCompare.worker.js: runs inside the worker thread
import fs from 'fs';
import { expose } from 'threads/worker';
import { isMainThread } from 'worker_threads';

/* ... */

const workerExports = {
  configureWorker,
  compareFiles,
  compareScreenshots,
};

if (!isMainThread) {
  expose(workerExports);
}
export type ImageCompareWorkerExports = typeof workerExports;

// Main-thread module: spawns the worker and wraps its exports
import { spawn, BlobWorker } from 'threads';

import type { ImageCompareWorkerExports } from './imageCompare.worker';
import { source as workerBlob } from '../../generated/imageCompare.workerSource';
import logger, { getLevel } from '../logger';

let worker: ImageCompareWorkerExports;

export async function bootImageCompareWorker() {
  try {
    worker = await spawn<ImageCompareWorkerExports>(BlobWorker.fromText(workerBlob));
    return worker.configureWorker(getLevel(process?.env?.LOG_LEVEL));
  } catch (e) {
    logger.fatal('Error starting worker', e);
  }
}

export async function compareFiles(imageA: string, imageB: string, outFile: string): Promise<void> {
  return worker.compareFiles(imageA, imageB, outFile);
}

export async function compareScreenshots(preData: Buffer, postData: Buffer, options): Promise<ScreenshotCompareOutput> {
  return worker.compareScreenshots(preData, postData, options);
}
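If you would rather not add a dependency, the same idea can be tried with Node's built-in worker_threads module. This is a simplified sketch, not what reflow ships: it swaps the threads package and the esbuild blob for an inline worker script, starts one worker per call, and counts differing bytes instead of running an SSIM diff. `countChangedBytes` is a hypothetical name.

```typescript
import { Worker } from 'worker_threads';

// Worker body kept as a string so the sketch fits in one file.
// Code passed with `eval: true` runs as CommonJS, hence require().
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { a, b } = workerData;
  let changed = 0;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) changed++;
  }
  parentPort.postMessage(changed);
`;

// Runs the comparison off the main thread and resolves with the number of differing bytes.
function countChangedBytes(a: Uint8Array, b: Uint8Array): Promise<number> {
  return new Promise((resolve, reject) => {
    // workerData is copied into the worker with the structured clone algorithm.
    const worker = new Worker(workerSource, { eval: true, workerData: { a, b } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Starting a worker per comparison has a startup cost, which is one reason to keep a single long-lived worker as the code above does.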

Run end-to-end tests once per release

End-to-end tests exist to catch regressions, not heisenbugs. If a suite passed against a release, running it again against the same release buys nothing.

v4.14.0 takes a first step toward making this automatic: source-map-powered release tracking. Reflow hashes the source maps associated with a deployment (optionally filtered by a regular expression to application code) into a version identifier, so it can tell when your application actually changed — and therefore whether a sequence has already passed on the running release. It’s opt-in for now, while we work out how to download source maps without slowing test execution when a deployment exposes large numbers of them.
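As a rough illustration of the idea, the sketch below derives an identifier from a set of source maps. It makes several assumptions that the release notes do not specify: SHA-256 as the hash, the map URL being part of the input, and sorting by URL so that download order does not matter. `SourceMapFile` and `releaseId` are hypothetical names.

```typescript
import { createHash } from 'crypto';

// Hypothetical shape: one downloaded source map and where it came from.
interface SourceMapFile {
  url: string;
  content: string;
}

// Hashes the selected source maps into one identifier (assumption: SHA-256, hex-encoded).
function releaseId(maps: SourceMapFile[], filter?: RegExp): string {
  const selected = maps
    // String.prototype.search ignores the regex's global flag and lastIndex,
    // so a /g filter cannot skip entries the way a repeated RegExp.test could.
    .filter((map) => !filter || map.url.search(filter) >= 0)
    // Sort so the identifier does not depend on the order maps were fetched in.
    .sort((x, y) => (x.url < y.url ? -1 : x.url > y.url ? 1 : 0));

  const hash = createHash('sha256');
  for (const map of selected) {
    // NUL separators keep adjacent fields from running together.
    hash.update(map.url);
    hash.update('\0');
    hash.update(map.content);
    hash.update('\0');
  }
  return hash.digest('hex');
}
```

With an identifier like this, skipping a rerun is a lookup: if the sequence already passed for the current identifier, there is nothing new to test.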

TL;DR

Cull every sleep statement; wait for specific events, and add stability methods only where needed.
Run independent flows in parallel.
Track releases and run each test once per release.

Caveat for reflow cloud users: self-host for maximum speed. Cloud recording uses AWS Fargate to spin up per-user ephemeral browser instances, which adds roughly a minute of cold start on first use.