Plugin API Test Suite
A Penpot plugin that is a launcher + runner for a battery of tests exercising the Penpot Plugin API against a live Penpot instance. It doubles as living documentation of what the public API actually does at runtime.
- A plain TypeScript + Vite Penpot plugin living in
plugins/apps/plugin-api-test-suite. - The UI (an iframe) lists auto-discovered tests and lets you run all / a subset / one. Each test shows green (pass) or red (fail, with the error message).
- It reports API coverage: which members of the public Plugin API the tests
exercised, measured against
libs/plugin-types/index.d.ts. - The same test files run both in the plugin UI and in a headless CI runner, so a test is never written twice.
This document is the context a developer (or agent) needs to add tests. Read it fully before writing any test.
The one rule that matters most
Always call the API through
ctx.penpot, never the globalpenpot.
ctx.penpot is a recording proxy. Calls made through it are what count towards
coverage and are correctly attributed to the right interface. Calls on the global
penpot still work but are invisible to coverage. Same for shapes: operate on the
objects returned by ctx.penpot.* (and on ctx.board), not on objects obtained
some other way.
Running and iterating
From plugins/:
- Dev server:
pnpm run start:plugin:api-test-suite(serves on port 4202). - In Penpot: open the Plugin Manager (Ctrl+Alt+P) and install
http://localhost:4202/manifest.json. - Hot-reloading tests: after editing a
*.test.ts, click Reload in the plugin UI. It fetches the freshly built test bundle and swaps in your changes — no need to close/reopen the plugin. (The dev server rebuilds the bundle on save.) - Adding a new test file: tests are discovered via
import.meta.globat build time, andvite build --watchdoes not reliably pick up a brand-new file (only edits to files already in its graph). After creating a new*.test.ts, restart the watch process (pnpm run watchorpnpm run init) and then click Reload (or reopen the plugin). Editing an existing test file does not need this. - The UI: tests are shown in collapsible groups (from
describe) with per-group passed/failed/total counts. Run with Run all, Run selected (per-test or per-group checkboxes), the per-group Run group, or the per-row Run button. Failures expand to show the error. The coverage panel shows the percentage, a progress bar, and per-interface get/set/call targets.
Running in CI
A headless runner executes the same tests against a live instance via Playwright:
E2E_LOGIN_EMAIL=… E2E_LOGIN_PASSWORD=… \
pnpm --filter plugin-api-test-suite run test:ci
- It builds
headless.js, logs in, creates a scratch file, injects the test bundle, and prints per-test results + the coverage report. - Exit code is non-zero iff any test failed (coverage does not affect it).
- Optional env:
PENPOT_BASE_URL(defaulthttps://localhost:3449). Against a local devenv with a self-signed certificate, prefix the command withNODE_TLS_REJECT_UNAUTHORIZED=0to avoid afetch failedTLS error. PRINT_UNCOVERED=1dumps the uncovered targets per interface;PRINT_STATIC=1dumps the statically-covered ones (see Coverage).
CI entry points reuse the exact same test files (src/ci/headless.ts discovers
them the same way the plugin does).
Mocked-backend mode
The same runner can run without a live instance — it serves the prebuilt
frontend via the frontend e2e static server and intercepts every backend RPC
with Playwright page.route, reusing the frontend e2e mock fixtures:
pnpm --filter plugin-api-test-suite run test:ci:mocked
(equivalently MOCK_BACKEND=1 … run test:ci). No login or backend is needed.
This validates the frontend Plugin API binding + in-memory store only, so it
can't faithfully reproduce results that depend on real backend behaviour
(validation, persistence, generated ids, …). Tests that need the real backend
opt out of this mode by tagging themselves skipIfMocked:
test.skipIfMocked('depends on backend validation', (ctx) => {
/* … */
});
// or a whole group:
describe.skipIfMocked('Backend-dependent', () => {
/* … */
});
Skipped tests are listed in the runner output. The wiring (fixtures, RPC mocks,
WebSocket mock) lives in ci/run-ci.ts; mocked-mode fidelity is its main
limitation, so prefer the live test:ci for anything backend-sensitive.
Anatomy of a test
Tests live in src/tests/*.test.ts and are auto-discovered (via
import.meta.glob) — just create a file matching that glob, no registration list
to update. A file registers one or more tests by calling test(name, fn).
import { expect } from '../framework/expect';
import { test } from '../framework/registry';
test('creates a rectangle', (ctx) => {
const rect = ctx.penpot.createRectangle();
ctx.board.appendChild(rect);
expect(rect.type).toBe('rectangle');
rect.name = 'sample-rect';
expect(rect.name).toBe('sample-rect');
});
Grouping tests
Wrap related tests in describe(groupName, fn) to group them. In the UI each group
is a collapsible section showing its own passed / failed / total counts, with a
"Run group" button and a select-all checkbox. Tests not inside any describe fall
into the General group.
import { expect } from '../framework/expect';
import { describe, test } from '../framework/registry';
describe('Shapes', () => {
test('creates a rectangle', (ctx) => {
/* … */
});
test('creates an ellipse', (ctx) => {
/* … */
});
});
describe blocks may be nested in a file. Nested names are joined into a single
group path with " / ", so the group reveals the file/area it lives in — e.g.
describe('Layout', () => describe('Flex', …)) produces the group Layout / Flex.
Wrap each file's tests in a top-level describe named after its area so every
group is recognizable. Several files may contribute to the same group path (they
merge in the UI). Prefer one clear group per feature area.
In the UI each group header shows an aggregate status dot rolled up from its tests: it turns purple while any test in the group is running, red if any failed, green only once every test passed, and grey until then.
The test context (ctx)
fn receives a TestContext (src/framework/types.ts):
ctx.penpot— the recording proxy over the realpenpotglobal. Use it for every API call.ctx.board— a fresh scratchBoardcreated for this test and removed automatically afterwards. Append shapes you create to it (ctx.board.appendChild(shape)) so the user's canvas is left clean. Do not rely on it persisting between tests.
The runner also resets shared state between tests: the selection is cleared and the
active page is restored to whatever was active when the run started (both through
the raw penpot, so they aren't credited toward coverage). A test that changes the
active page therefore won't leak into later tests.
Sync or async
fn may be void or Promise<void>; async tests are awaited. Use async (ctx) =>
and await when the API call is asynchronous (e.g. uploadMediaUrl,
library.availableLibraries(), token application — see notes below).
Naming
The test name becomes its id (slugified) and is shown in the UI. Keep names unique and descriptive; duplicates are de-duplicated automatically but that's confusing.
Assertions
Import expect from ../framework/expect. It is a small, dependency-free,
jest-like matcher set (it must stay dependency-free — it runs inside the SES
sandbox). Available matchers:
toBe(expected)—Object.isequalitytoEqual(expected)— deep structural equalitytoBeTruthy()/toBeFalsy()toBeNull()/toBeUndefined()/toBeDefined()toContain(item)— substring or array membershiptoHaveLength(n)toBeGreaterThan(n)/toBeLessThan(n)toBeCloseTo(n, numDigits?)— for floatstoThrow(expected?)—expectedis a substring orRegExpmatched against the error message; pass a function as the value:expect(() => …).toThrow('msg').notnegates any matcher:expect(x).not.toBeNull()
For asynchronous failures use expectReject(promiseOrThunk, expected?): toThrow
calls its argument synchronously, so it can't catch a rejected promise, whereas
expectReject awaits and asserts the rejection (string includes / RegExp on the
message).
A failing matcher throws; the runner turns that into a red test with the message.
You can also just throw new Error('…') to fail a test.
Do not add other assertion libraries. Anything imported here is bundled into the sandbox and must be SES-safe and dependency-free.
How coverage works (and how to write tests that move it)
Coverage is type-aware and tracks three separate targets per member:
name (get)— reading a property (const n = shape.name)name (set)— writing a property (shape.name = 'x')appendChild()— calling a method (credited only when actually called, not when merely referenced)
Implications when writing tests:
- A property has independent get/set targets. To cover both, read it and write
it. Read-only properties (declared
readonlyin the d.ts) only have a get target; methods only have a call target. - Accessing a member through a value you got from
ctx.penpotis what counts. Reaching a nested object also counts: e.g.ctx.board.children[0].typerecordsBoard.children (get)and then the element'stypeget, resolved to the concrete shape type at runtime. - Coverage accumulates across a run. Running all tests aggregates every test's accesses. Running a single test shows only that test's accesses.
Recorded vs. effective coverage
The report distinguishes three states per target:
- Covered (recorded) — credited by the recording proxy (green).
- Statically covered — exercised behaviourally by the tests but the proxy
structurally cannot credit it (shown in a distinct colour). These come from a
curated allowlist in
src/framework/static-coverage.ts, keyed byInterface.member#mode. See Coverage notes for which members and why. - Uncovered — neither.
The header shows two numbers: the recorded percentage (what the proxy actually credited) and the effective percentage (recorded + statically covered). Recorded coverage always wins, so listing a target in the static allowlist that turns out to be recorded is harmless — it simply never shows as static. Coverage is report-only; it never fails a run or the build.
The denominator comes from src/generated/api-surface.json, generated from
libs/plugin-types/index.d.ts. If the Plugin API types change, regenerate it:
pnpm --filter plugin-api-test-suite run gen:api
Runtime details you need to know
- Shape
typevalues returned at runtime:Board→'board',Rectangle→'rectangle',Ellipse→'ellipse', plus'text','path','group','image','svg-raw'. (createRectangle().type === 'rectangle'.) createText(str)returnsText | null— guard the result (if (text) { … }).width/heightare read-only; useresize(w, h).x/yare writable.- The plugin manifest already requests broad permissions (
content:*,library:*,user:read,comment:*,allow:downloads,allow:localstorage), so most of the API is callable from tests without changes. - The runner sets
throwValidationErrors = trueandnaturalChildOrdering = true, so invalid API usage throws (surfacing as a red test) andchildrenis always in z-index order. - The runtime is SES-sandboxed: no Node APIs, no DOM, no extra npm deps inside
tests. Stick to the Plugin API,
expect, and plain JS.
Coverage notes
The suite covers a large majority of the type surface. The remaining members are
uncovered or only statically covered for the reasons below — not missing
tests. Note these notes can drift as the API is fixed: when in doubt, write the
test asserting the documented correct behaviour and run test:ci to see what
actually happens.
Exercised behaviourally but not creditable by the recorder (statically covered)
Listed in src/framework/static-coverage.ts:
ContextTypesUtils.*andContextGeometryUtils.center—penpot.utils.typesandpenpot.utils.geometryare frozen (SES) data properties, so the recording proxy must return them raw and cannot wrap their members. Both are exercised behaviourally inplatform.test.ts.ColorShapeInfo.shapesInfo,ColorShapeInfoEntry.*—shapesColors()has an unresolved return type in the generated surface (type: null), so the recorder hands the result back raw and can't attribute nested access. Exercised incolors.test.ts. (Alternatively, resolving the return type intools/gen-api-surface.tswould make these genuinely recorded.)EventsMap.*— a type map, not a runtime object.on/offare credited onPenpot, never asEventsMapmembers. The deterministic events (selectionchange,shapechange) are exercised inevents.test.ts.ShapeBase.fills— every concrete shape redeclaresfills, so accesses are attributed to the concrete type (Rectangle.fills, …); the base-interface target is never the attribution.LibraryVariantComponent.*— the recorder types a component asLibraryComponentand can't narrow toLibraryVariantComponentvia theisVariant()type-guard. The behaviour is exercised viaVariantContainer.variantsinvariants.test.ts.
Read-only at runtime
Members that have no setter in the runtime binding (frontend/src/app/plugins/*.cljs)
are now marked readonly in the Plugin API d.ts (Font.*, FontVariant.*,
FontsContext.all, Image/Ellipse/SvgRaw.type, File.name/pages/revn, Page.root,
TokenTheme.activeSets, Variants.properties, ImageData.*, and the board guide
value objects GuideColumn/GuideRow/GuideSquare and their params — board.guides
returns a formatted snapshot, so guides are reconfigured by reassigning the whole
array, not by mutating a returned guide), the Point/Bounds value objects, the
Penpot.ui/Penpot.utils subcontexts, and the derived Boolean path data
(d/content/commands are computed from the operands — a Boolean isn't editable
like a Path). They therefore have only a (get) target and need no runtime
assertion — the type system enforces the contract.
Members that do have a runtime setter stay writable, even when the setter
rejects some inputs (that's input validation, not read-only-ness): Board.children
(assigning a reordered array reorders the children), Path.d/content/commands
(editing the path), and FileVersion.label (relabels the version).
Excluded from coverage
tools/gen-api-surface.ts drops two categories from the denominator so they never
count:
@deprecatedinterfaces and members — the legacyImageshape interface (images live in aFillviafillImage),Color.refId/refFile, and theBoolean/PathtoD()/contentpath accessors.- Members removed by the public interface via
Omit—Contextis the internal interface and the publicPenpotisOmit<Context, 'addListener' | 'removeListener'>(those are superseded byon/off). The generator honors theOmit, soContext.addListener/removeListeneraren't reachable surface and don't count.
Red tests pinning confirmed API bugs
When a member is confirmed broken, add a test that asserts its correct behaviour
and comment it as blocked-by-bug; it stays red until the API is fixed and then turns
green (at which point drop the "API bug" framing). There are currently no such red
tests — e.g. the fontFamilies token resolvedValue bug (it used to leak the raw
tokenscript structure instead of string[]) has since been fixed.
d.ts / runtime mismatches
strokeStyle: 'none' is listed in the d.ts but rejected at runtime ("Value not
valid"); fills-strokes.test.ts pins this with a toThrow.
External state / not reachable headless
ActiveUser.position/zoom— needs a second collaborator in the file.LibrarySummary.*,LibraryContext.connectLibrary— need a published shared library.FileVersion.restore,Penpot.closePlugin,Penpot.ui,Context.openViewer— tear down or navigate away from the running plugin/workspace.FileVersion.pin— only converts a system autosave to a permanent version; a plugin can only create manual versions (saveVersion), sopin()always rejects.Context.addListener/removeListener— omitted from thepenpotglobal (Omit<Context, 'addListener' | 'removeListener'>), so unreachable viapenpot.EventsMapeventspagechange/filechange/themechange/contentsave/finish— can't be triggered deterministically in the headless runner.
Checklist before finishing
- Test file is
src/tests/<name>.test.tsand usestest(...)+expect, ideally wrapped in adescribe('<Group>', …). - All API calls go through
ctx.penpot; shapes are appended toctx.board. - Created shapes don't leak (rely on the scratch board cleanup; don't touch the user's existing content).
- Lint/format/typecheck pass:
pnpm --filter plugin-api-test-suite run lintand, fromplugins/,pnpm exec prettier --check "apps/plugin-api-test-suite/**/*.{ts,css,json}". - If you relied on new API members,
gen:apiwas re-run so coverage reflects them.
Where things live (for deeper changes)
src/framework/registry.ts—test(),describe(),getTests(),setTests()(reload).src/framework/runner.ts— runs tests, scratch board lifecycle, per-test state reset, coverage.src/framework/coverage.ts— the recording proxy + coverage computation.src/framework/static-coverage.ts— the statically-covered allowlist.src/framework/expect.ts— the assertion library.src/framework/types.ts—TestContext,TestResult,CoverageReport, etc.tools/gen-api-surface.ts— generatessrc/generated/api-surface.json.src/plugin.ts(sandbox),src/ui.ts(iframe),src/model.ts(messages).src/ci/headless.ts+ci/run-ci.ts— CI path.
Writing tests should only ever require touching src/tests/.