deer-flow/backend/docs/BLOCKING_IO_DETECTION.md
AochenShen99 052b1e2102
test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO (#3313)
* test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO

#3084 offloaded `JsonlRunEventStore`'s file IO via `asyncio.to_thread` and added
a mock-based offload assertion (`tests/test_jsonl_event_store_async_io.py`) that
covers `put()` only. That guard is not part of the Blockbuster runtime gate
(`tests/blocking_io/`) run by `backend-blocking-io-tests.yml`.

Add a runtime anchor that drives the full async surface (`put`, `put_batch`,
`list_messages`, `list_events`, `list_messages_by_run`, `count_messages`,
`delete_by_run`, `delete_by_thread`) under the strict Blockbuster gate, so any
blocking IO reintroduced on the event loop in any of these methods fails CI —
not only removal of a specific `to_thread` call. Verified each offloaded method
goes red when its offload is reverted. Test-only; no production change.

* test(runtime): exercise list_events event_types filter branch

Per review feedback: the anchor called list_events without event_types,
so the filter branch never ran after _read_run_events' filesystem IO.
Add a second list_events call with event_types=["message"] so the full
read path -- including the filter branch -- executes under the gate.
2026-05-29 23:02:41 +08:00

152 lines
4.5 KiB
Markdown

# Blocking IO detection usage and maintenance
This document describes how to use and maintain DeerFlow backend blocking-IO
detection for async event-loop safety.
The goal is narrow: find and prevent synchronous IO from blocking backend
async event-loop paths. Static and runtime detection are complementary, but
they have different jobs.
## Static detector
The static detector is the discovery tool. It scans backend source code and
reports candidate blocking-IO call sites that may need human review.
Run it from the repository root:
```bash
make detect-blocking-io
```
Or from `backend/`:
```bash
make detect-blocking-io
```
The report is written to:
```text
.deer-flow/blocking-io-findings.json
```
Use this output for review and triage. A static finding is a candidate, not
proof that production blocks the event loop at runtime. The current static
rules are intentionally broad; prefer triaging existing output before adding
new static rules.
Add a static rule only when review finds a recurring high-risk blocking
pattern that is invisible to the current detector.
## Runtime detector
The runtime detector is the CI regression guard. It uses Blockbuster to fail a
focused test when code under `app.*` or `deerflow.*` performs blocking IO on
the asyncio event-loop thread.
Run it from `backend/`:
```bash
make test-blocking-io
```
The runtime gate starts from confirmed production bugs and protects those
paths from regressing. It does not prove that the entire backend is free of
blocking IO; it only covers the production paths exercised by
`backend/tests/blocking_io/`.
## Maintenance workflow
Use the static detector to find candidates, then use review to decide which
async production paths are worth protecting in CI.
The normal workflow is:
1. Run the static detector to find backend blocking-IO candidates.
2. Use human review to pick high-risk production async paths.
3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
4. Let CI prevent that path from regressing.
Runtime detection has two maintenance paths.
### Add a runtime rule
Add a runtime rule when Blockbuster's default rules do not cover a generic
blocking primitive used by production code.
Rules belong in:
```text
backend/tests/support/detectors/blocking_io_runtime.py
```
Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
Keeping rules centralized makes it clear which extra primitives DeerFlow
expects Blockbuster to catch.
Example shape:
```python
import subprocess
from blockbuster import BlockBusterFunction
_PROJECT_BLOCKING_RULES = (
(
"subprocess.Popen.__init__",
BlockBusterFunction(
subprocess.Popen,
"__init__",
scanned_modules=["app", "deerflow"],
),
),
)
```
Do not add a runtime rule just because a business path is not tested. A rule
only expands what Blockbuster can intercept after code runs.
### Add a runtime anchor
Add a runtime anchor when a high-risk async production path should be protected
by CI but no existing `backend/tests/blocking_io/` test executes it.
Anchors belong in:
```text
backend/tests/blocking_io/
```
A good anchor should:
- Call the real production async entry point.
- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
wrappers.
- Use real local filesystem inputs when the bug shape is filesystem IO.
- Mock only the external dependency boundary, such as a network service or
third-party saver class.
- Fail if a future change moves the blocking operation back onto the event
loop.
Avoid testing only the low-level helper unless that helper is the production
async entry point. The runtime gate is most useful when it protects the caller
that production actually executes.
## Current runtime coverage
The runtime anchors protect confirmed blocking-IO bug shapes:
- SQLite checkpointer setup, including path resolution and parent-directory
creation.
- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
- `JsonlRunEventStore` async API (`put` / `list_*` / `delete_*`): the JSONL
run-event backend offloads its synchronous file IO via `asyncio.to_thread`
(fix #3084); this anchor drives the real async API under the gate so any
blocking IO reintroduced on the loop fails, not only removal of one
`to_thread` call.
- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
patches are restored after exceptions.
As static detection and review identify more high-risk async paths, add new
runtime anchors incrementally.