mirror of https://github.com/bytedance/deer-flow.git synced 2026-05-30 04:18:09 +00:00

test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO (#3313)

* test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO

#3084 offloaded `JsonlRunEventStore`'s file IO via `asyncio.to_thread` and added
a mock-based offload assertion (`tests/test_jsonl_event_store_async_io.py`) that
covers `put()` only. That guard is not part of the Blockbuster runtime gate
(`tests/blocking_io/`) run by `backend-blocking-io-tests.yml`.

Add a runtime anchor that drives the full async surface (`put`, `put_batch`,
`list_messages`, `list_events`, `list_messages_by_run`, `count_messages`,
`delete_by_run`, `delete_by_thread`) under the strict Blockbuster gate, so any
blocking IO reintroduced on the event loop in any of these methods fails CI —
not only removal of a specific `to_thread` call. Verified each offloaded method
goes red when its offload is reverted. Test-only; no production change.

* test(runtime): exercise list_events event_types filter branch

Per review feedback: the anchor called list_events without event_types,
so the filter branch never ran after _read_run_events' filesystem IO.
Add a second list_events call with event_types=["message"] so the full
read path -- including the filter branch -- executes under the gate.

2026-05-29 23:02:41 +08:00

4.5 KiB


Blocking IO detection usage and maintenance

This document describes how to use and maintain DeerFlow backend blocking-IO detection for async event-loop safety.

The goal is narrow: find and prevent synchronous IO from blocking backend async event-loop paths. Static and runtime detection are complementary, but they have different jobs.

Static detector

The static detector is the discovery tool. It scans backend source code and reports candidate blocking-IO call sites that may need human review.

Run it from the repository root:

make detect-blocking-io

Or from backend/:

make detect-blocking-io

The report is written to:

.deer-flow/blocking-io-findings.json

Use this output for review and triage. A static finding is a candidate, not proof that production blocks the event loop at runtime. The current static rules are intentionally broad; prefer triaging existing output before adding new static rules.

Add a static rule only when review finds a recurring high-risk blocking pattern that is invisible to the current detector.
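To make the idea of a "candidate" concrete, here is a minimal sketch of the kind of scan a static detector can perform. It is not DeerFlow's detector: SUSPECT_CALLS, call_name, and find_candidates are hypothetical names, and the two-entry rule set is illustrative only. The sketch uses Python's ast module to report open() and time.sleep() calls that appear inside async def bodies.

```python
import ast

# Sample source to scan; stands in for a backend module.
SOURCE = '''
import time

async def handler(path):
    time.sleep(1)
    with open(path) as fh:
        return fh.read()

def sync_helper(path):
    return open(path).read()
'''

# Illustrative rule set (an assumption, not the project's real rules).
SUSPECT_CALLS = {"open", "time.sleep"}


def call_name(node: ast.Call) -> str:
    """Return a dotted name for simple calls such as open() or time.sleep()."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_candidates(source: str) -> list[tuple[str, int, str]]:
    """List (function, line, call) for suspect calls inside async functions."""
    findings = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.AsyncFunctionDef):
            continue
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and call_name(node) in SUSPECT_CALLS:
                findings.append((fn.name, node.lineno, call_name(node)))
    # Sort by line number so the report order is stable.
    return sorted(findings, key=lambda finding: finding[1])


candidates = find_candidates(SOURCE)
```

The scan reports both calls in handler and ignores sync_helper, even though an async caller could still run sync_helper on the loop. A purely syntactic scan can both over-report and under-report, which is why its output needs review before anything is treated as a bug.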

Runtime detector

The runtime detector is the CI regression guard. It uses Blockbuster to fail a focused test when code under app.* or deerflow.* performs blocking IO on the asyncio event-loop thread.

Run it from backend/:

make test-blocking-io

The runtime gate starts from confirmed production bugs and protects those paths from regressing. It does not prove that the entire backend is free of blocking IO; it only covers the production paths exercised by backend/tests/blocking_io/.
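The phrase "on the asyncio event-loop thread" is the key condition. The following sketch uses only the standard library, not Blockbuster, to show the difference the gate cares about: the same function runs on the loop thread when called directly from a coroutine, and on a worker thread when offloaded with asyncio.to_thread. The names read_thread_id and compare_threads are hypothetical.

```python
import asyncio
import threading


def read_thread_id() -> int:
    """Stand-in for a blocking call; reports which thread executed it."""
    return threading.get_ident()


async def compare_threads() -> tuple[bool, bool]:
    loop_thread = threading.get_ident()
    # Called directly: runs on the event-loop thread, the case the gate rejects.
    direct_on_loop = read_thread_id() == loop_thread
    # Offloaded: runs on a worker thread, so the loop stays free to schedule.
    offloaded_on_loop = await asyncio.to_thread(read_thread_id) == loop_thread
    return direct_on_loop, offloaded_on_loop


result = asyncio.run(compare_threads())
```

Here result is (True, False): only the direct call executed on the loop thread. A runtime detector applies this kind of check at the blocking primitive itself, so it only fires for code paths that a test actually runs.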

Maintenance workflow

Use the static detector to find candidates, then use review to decide which async production paths are worth protecting in CI.

The normal workflow is:

Run the static detector to find backend blocking-IO candidates.
Use human review to pick high-risk production async paths.
Add or update a focused runtime anchor in backend/tests/blocking_io/.
Let CI prevent that path from regressing.

Runtime detection has two maintenance paths: adding a runtime rule and adding a runtime anchor.

Add a runtime rule

Add a runtime rule when Blockbuster's default rules do not cover a generic blocking primitive used by production code.

Rules belong in:

backend/tests/support/detectors/blocking_io_runtime.py

Add them to _PROJECT_BLOCKING_RULES, not directly inside individual tests. Keeping rules centralized makes it clear which extra primitives DeerFlow expects Blockbuster to catch.

Example shape:

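import subprocess

from blockbuster import BlockBusterFunction

# Each entry pairs a rule name with the BlockBusterFunction that patches it.
_PROJECT_BLOCKING_RULES = (
    (
        "subprocess.Popen.__init__",
        BlockBusterFunction(
            subprocess.Popen,
            "__init__",
            # Only flag calls that originate from DeerFlow's own packages.
            scanned_modules=["app", "deerflow"],
        ),
    ),
)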

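Do not add a runtime rule just because a business path is not tested. A rule only widens the set of calls Blockbuster can intercept; it catches nothing until a test actually executes the code that makes the call. An untested path needs a runtime anchor, not a rule.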

Add a runtime anchor

Add a runtime anchor when a high-risk async production path should be protected by CI but no existing backend/tests/blocking_io/ test executes it.

Anchors belong in:

backend/tests/blocking_io/

A good anchor should:

Call the real production async entry point.
Avoid bypassing the blocking surface with test-only asyncio.to_thread wrappers.
Use real local filesystem inputs when the bug shape is filesystem IO.
Mock only the external dependency boundary, such as a network service or third-party saver class.
Fail if a future change moves the blocking operation back onto the event loop.

Avoid testing only the low-level helper unless that helper is the production async entry point. The runtime gate is most useful when it protects the caller that production actually executes.
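The checklist above can be sketched as a self-contained example. Everything here is an assumption for illustration: TinyJsonlStore is a hypothetical store, not JsonlRunEventStore, and forbid_open_on_thread is a toy guard that stands in for the Blockbuster gate by making open() raise on one chosen thread. A real anchor would call the production entry point and rely on the project's gate instead.

```python
import asyncio
import builtins
import json
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def forbid_open_on_thread(thread_id: int):
    """Toy stand-in for the gate: open() raises when called on thread_id."""
    real_open = builtins.open

    def guarded_open(*args, **kwargs):
        if threading.get_ident() == thread_id:
            raise RuntimeError("blocking open() on the event-loop thread")
        return real_open(*args, **kwargs)

    builtins.open = guarded_open
    try:
        yield
    finally:
        builtins.open = real_open  # restore the patch even after an exception


class TinyJsonlStore:
    """Hypothetical JSONL store used only for this sketch."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def _append(self, event: dict) -> None:
        with open(self._path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    async def put(self, event: dict) -> None:
        # Production shape: file IO is offloaded to a worker thread.
        await asyncio.to_thread(self._append, event)

    async def put_on_loop(self, event: dict) -> None:
        # Regression shape: the same IO runs on the event-loop thread.
        self._append(event)


async def run_anchor() -> tuple[int, bool]:
    """Drive the async entry points against a real temp file under the guard."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "events.jsonl"
        store = TinyJsonlStore(path)
        with forbid_open_on_thread(threading.get_ident()):
            await store.put({"type": "message"})  # offloaded, so it passes
            try:
                await store.put_on_loop({"type": "message"})
                regression_caught = False
            except RuntimeError:
                regression_caught = True
        lines_written = len(path.read_text(encoding="utf-8").splitlines())
    return lines_written, regression_caught


outcome = asyncio.run(run_anchor())
```

The anchor calls the async methods directly and never wraps them in its own asyncio.to_thread, so the guard sees exactly what the loop would see. The offloaded put writes one line, while the on-loop variant is rejected before it touches the file.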

Current runtime coverage

The runtime anchors protect confirmed blocking-IO bug shapes:

SQLite checkpointer setup, including path resolution and parent-directory creation.
Subagent skill metadata loading through SubagentExecutor._load_skills().
JsonlRunEventStore async API (put / list_* / delete_*). The JSONL run-event backend offloads its synchronous file IO via asyncio.to_thread (fix #3084). The anchor drives the real async API under the gate, so blocking IO reintroduced on the event loop in any covered method fails CI, not only the removal of one to_thread call.
Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and patches are restored after exceptions.

As static detection and review identify more high-risk async paths, add new runtime anchors incrementally.
