# Blocking IO detection usage and maintenance

This document describes how to use and maintain DeerFlow backend blocking-IO
detection for async event-loop safety.

The goal is narrow: find and prevent synchronous IO from blocking backend
async event-loop paths. Static and runtime detection are complementary, but
they have different jobs.

## Static detector

The static detector is the discovery tool. It scans backend source code and
reports candidate blocking-IO call sites that may need human review.

Run it from the repository root:

```bash
make detect-blocking-io
```

Or from `backend/`:

```bash
make detect-blocking-io
```

The report is written to:

```text
.deer-flow/blocking-io-findings.json
```

Use this output for review and triage. A static finding is a candidate, not
proof that production blocks the event loop at runtime. The current static
rules are intentionally broad; prefer triaging existing output before adding
new static rules.

Add a static rule only when review finds a recurring high-risk blocking
pattern that is invisible to the current detector.

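As a rough illustration of what a "candidate call site" means, the following sketch uses Python's standard `ast` module to flag a few blocking-looking calls made directly inside `async def` bodies. It is not the implementation behind `make detect-blocking-io`: `BLOCKING_CALLS`, `find_candidates`, and the finding fields are hypothetical, and the real detector's rules and report schema may differ.

```python
from __future__ import annotations

import ast
import json

# Hypothetical set of dotted call names treated as blocking for this sketch.
BLOCKING_CALLS = {"open", "time.sleep", "subprocess.run", "os.makedirs"}


def _dotted_name(func: ast.expr) -> str | None:
    """Turn call targets such as `time.sleep` or `open` into a dotted string."""
    parts: list[str] = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on call results


def _direct_calls(fn: ast.AsyncFunctionDef):
    """Yield Call nodes in fn's body, skipping nested function scopes."""
    stack: list[ast.AST] = list(fn.body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            continue  # nested scopes may run elsewhere (e.g. via to_thread)
        if isinstance(node, ast.Call):
            yield node
        stack.extend(ast.iter_child_nodes(node))


def find_candidates(source: str, filename: str = "<string>") -> list[dict]:
    """Report blocking-looking calls made directly inside `async def` bodies."""
    findings = []
    for fn in ast.walk(ast.parse(source, filename)):
        if isinstance(fn, ast.AsyncFunctionDef):
            for call in _direct_calls(fn):
                name = _dotted_name(call.func)
                if name in BLOCKING_CALLS:
                    findings.append(
                        {"file": filename, "line": call.lineno, "function": fn.name, "call": name}
                    )
    return sorted(findings, key=lambda f: f["line"])


example = '''
import asyncio, time

async def handler():
    time.sleep(1)                           # candidate: blocks the loop
    await asyncio.to_thread(time.sleep, 1)  # not flagged: passed, not called
'''
print(json.dumps(find_candidates(example, "demo.py"), indent=2))
```

A purely syntactic scan like this misses blocking calls hidden behind sync helpers that a coroutine calls, and it cannot tell how often a flagged call runs or how long it blocks, which is one reason static findings are candidates that need triage.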
## Runtime detector

The runtime detector is the CI regression guard. It uses Blockbuster to fail a
focused test when code under `app.*` or `deerflow.*` performs blocking IO on
the asyncio event-loop thread.

Run it from `backend/`:

```bash
make test-blocking-io
```

In CI, the `backend-blocking-io-tests.yml` workflow runs the tests in
`backend/tests/blocking_io/` under this gate.

The runtime gate starts from confirmed production bugs and protects those
paths from regressing. It does not prove that the entire backend is free of
blocking IO; it only covers the production paths exercised by
`backend/tests/blocking_io/`.

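The core idea behind a runtime gate like Blockbuster is to patch known blocking functions so that calling them from the thread running the asyncio loop raises, instead of silently stalling the loop. The toy context manager below shows that idea with the standard library only. `forbid_on_event_loop` and `BlockingOnLoopError` are hypothetical names; this is not Blockbuster's API, and Blockbuster covers many more primitives and call paths.

```python
import asyncio
import contextlib
import time


class BlockingOnLoopError(RuntimeError):
    """Raised when a guarded primitive runs on the event-loop thread."""


@contextlib.contextmanager
def forbid_on_event_loop(owner, attr: str):
    """Patch owner.attr so that calling it while a loop runs in this thread raises.

    Toy illustration of the idea only; not Blockbuster's API.
    """
    original = getattr(owner, attr)

    def guarded(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop running in this thread (e.g. a to_thread worker): allow it.
            return original(*args, **kwargs)
        raise BlockingOnLoopError(f"{owner.__name__}.{attr} called on the event loop")

    setattr(owner, attr, guarded)
    try:
        yield
    finally:
        # Restore the real function even if the guarded code raised.
        setattr(owner, attr, original)


async def bad_handler() -> None:
    time.sleep(0.01)  # runs on the loop thread: the guard raises


async def good_handler() -> None:
    await asyncio.to_thread(time.sleep, 0.01)  # runs in a worker thread: allowed


with forbid_on_event_loop(time, "sleep"):
    asyncio.run(good_handler())
    try:
        asyncio.run(bad_handler())
    except BlockingOnLoopError as exc:
        print("caught:", exc)
```

The offloaded call passes because the patched function executes in a worker thread that has no running loop. The `finally` clause mirrors the gate health check that patches are restored after exceptions.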
## Maintenance workflow

Use the static detector to find candidates, then use review to decide which
async production paths are worth protecting in CI.

The normal workflow is:

1. Run the static detector to find backend blocking-IO candidates.
2. Use human review to pick high-risk production async paths.
3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
4. Let CI prevent that path from regressing.

Runtime detection has two maintenance paths: adding a runtime rule and adding
a runtime anchor.

### Add a runtime rule

Add a runtime rule when Blockbuster's default rules do not cover a generic
blocking primitive used by production code.

Rules belong in:

```text
backend/tests/support/detectors/blocking_io_runtime.py
```

Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
Keeping rules centralized makes it clear which extra primitives DeerFlow
expects Blockbuster to catch.

Example shape:

```python
import subprocess

from blockbuster import BlockBusterFunction

# Each entry pairs a rule name with the BlockBusterFunction Blockbuster patches.
_PROJECT_BLOCKING_RULES = (
    (
        "subprocess.Popen.__init__",
        BlockBusterFunction(
            subprocess.Popen,
            "__init__",
            # Only flag calls that originate from DeerFlow's own packages.
            scanned_modules=["app", "deerflow"],
        ),
    ),
)
```

Do not add a runtime rule just because a business path is not tested. A rule
only expands which primitives Blockbuster can intercept once code runs; it
does not make an untested path execute. Untested high-risk paths need a
runtime anchor instead.

### Add a runtime anchor

Add a runtime anchor when a high-risk async production path should be protected
by CI but no existing `backend/tests/blocking_io/` test executes it.

Anchors belong in:

```text
backend/tests/blocking_io/
```

A good anchor should:

- Call the real production async entry point.
- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
  wrappers.
- Use real local filesystem inputs when the bug shape is filesystem IO.
- Mock only the external dependency boundary, such as a network service or
  third-party saver class.
- Fail if a future change moves the blocking operation back onto the event
  loop.

Avoid testing only the low-level helper unless that helper is the production
async entry point. The runtime gate is most useful when it protects the caller
that production actually executes.

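The sketch below applies the anchor guidance above to a JSONL-shaped store. It is a hedged illustration rather than the real `JsonlRunEventStore` anchor: `ToyJsonlStore`, `gate_open`, `BlockedOnLoop`, and `anchor` are hypothetical, and the toy gate watches only `builtins.open`. The anchor calls the public async methods against a real temporary directory, and `offload=False` simulates a change that moves file IO back onto the event loop.

```python
import asyncio
import builtins
import contextlib
import json
import tempfile
from pathlib import Path


class BlockedOnLoop(RuntimeError):
    """Raised when the toy gate sees open() on the event-loop thread."""


@contextlib.contextmanager
def gate_open():
    """Toy gate: only watches builtins.open (Blockbuster watches far more)."""
    real_open = builtins.open

    def guarded_open(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return real_open(*args, **kwargs)  # worker thread: allowed
        raise BlockedOnLoop("open() called on the event loop")

    builtins.open = guarded_open
    try:
        yield
    finally:
        builtins.open = real_open


class ToyJsonlStore:
    """Hypothetical JSONL-backed store; not DeerFlow's JsonlRunEventStore."""

    def __init__(self, path: Path, offload: bool = True) -> None:
        self.path = path
        self.offload = offload  # False simulates reverting the to_thread fix

    def _append(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def _read_all(self) -> list[dict]:
        if not self.path.exists():
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    async def _io(self, fn, *args):
        if self.offload:
            return await asyncio.to_thread(fn, *args)
        return fn(*args)  # regression: sync file IO on the loop thread

    async def put(self, event: dict) -> None:
        await self._io(self._append, event)

    async def list_events(self) -> list[dict]:
        return await self._io(self._read_all)


async def anchor(store: ToyJsonlStore) -> None:
    """Drive the public async API, never the sync helpers directly."""
    await store.put({"type": "message", "text": "hi"})
    events = await store.list_events()
    assert [e["type"] for e in events] == ["message"]


with tempfile.TemporaryDirectory() as tmp:  # real local filesystem input
    with gate_open():
        asyncio.run(anchor(ToyJsonlStore(Path(tmp) / "ok.jsonl")))
        try:
            asyncio.run(anchor(ToyJsonlStore(Path(tmp) / "bad.jsonl", offload=False)))
        except BlockedOnLoop as exc:
            print("gate caught the regression:", exc)
```

Because `anchor` goes through `put` and `list_events` rather than `_append` or `_read_all`, it fails whenever either method reintroduces `open()` on the loop, not only when one particular `to_thread` call is removed.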
## Current runtime coverage

The runtime anchors protect confirmed blocking-IO bug shapes:

- SQLite checkpointer setup, including path resolution and parent-directory
  creation.
- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
- `JsonlRunEventStore` async API (`put`, `put_batch`, `list_messages`,
  `list_events`, `list_messages_by_run`, `count_messages`, `delete_by_run`,
  `delete_by_thread`): the JSONL run-event backend offloads its synchronous
  file IO via `asyncio.to_thread` (fix #3084). The anchor drives the real
  async API under the gate, including the `list_events` `event_types` filter
  branch, so any blocking IO reintroduced on the loop fails, not only removal
  of one `to_thread` call.
- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
  patches are restored after exceptions.

As static detection and review identify more high-risk async paths, add new
runtime anchors incrementally.