From 8330b244a9e337e7ccf3b60393f6452ddfb01ae3 Mon Sep 17 00:00:00 2001
From: AochenShen99
Date: Thu, 28 May 2026 18:26:26 +0800
Subject: [PATCH] docs: add blocking IO detection usage and maintenance (#3233)

* docs: add blocking IO detection usage and maintenance

* docs: address blocking io doc review feedback
---
 backend/docs/BLOCKING_IO_DETECTION.md | 146 ++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 backend/docs/BLOCKING_IO_DETECTION.md

diff --git a/backend/docs/BLOCKING_IO_DETECTION.md b/backend/docs/BLOCKING_IO_DETECTION.md
new file mode 100644
index 000000000..8cfc49ada
--- /dev/null
+++ b/backend/docs/BLOCKING_IO_DETECTION.md
@@ -0,0 +1,146 @@
+# Blocking IO detection usage and maintenance
+
+This document describes how to use and maintain DeerFlow backend blocking-IO
+detection for async event-loop safety.
+
+The goal is narrow: find and prevent synchronous IO from blocking backend
+async event-loop paths. Static and runtime detection are complementary, but
+they have different jobs.
+
+## Static detector
+
+The static detector is the discovery tool. It scans backend source code and
+reports candidate blocking-IO call sites that may need human review.
+
+Run it from the repository root:
+
+```bash
+make detect-blocking-io
+```
+
+Or from `backend/`:
+
+```bash
+make detect-blocking-io
+```
+
+The report is written to:
+
+```text
+.deer-flow/blocking-io-findings.json
+```
+
+Use this output for review and triage. A static finding is a candidate, not
+proof that production blocks the event loop at runtime. The current static
+rules are intentionally broad; prefer triaging existing output before adding
+new static rules.
+
+Add a static rule only when review finds a recurring high-risk blocking
+pattern that is invisible to the current detector.
+
+## Runtime detector
+
+The runtime detector is the CI regression guard. It uses Blockbuster to fail a
+focused test when code under `app.*` or `deerflow.*` performs blocking IO on
+the asyncio event-loop thread.
+
+Run it from `backend/`:
+
+```bash
+make test-blocking-io
+```
+
+The runtime gate starts from confirmed production bugs and protects those
+paths from regressing. It does not prove that the entire backend is free of
+blocking IO; it only covers the production paths exercised by
+`backend/tests/blocking_io/`.
+
+## Maintenance workflow
+
+Use the static detector to find candidates, then use review to decide which
+async production paths are worth protecting in CI.
+
+The normal workflow is:
+
+1. Run the static detector to find backend blocking-IO candidates.
+2. Use human review to pick high-risk production async paths.
+3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
+4. Let CI prevent that path from regressing.
+
+Runtime detection has two maintenance paths.
+
+### Add a runtime rule
+
+Add a runtime rule when Blockbuster's default rules do not cover a generic
+blocking primitive used by production code.
+
+Rules belong in:
+
+```text
+backend/tests/support/detectors/blocking_io_runtime.py
+```
+
+Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
+Keeping rules centralized makes it clear which extra primitives DeerFlow
+expects Blockbuster to catch.
+
+Example shape:
+
+```python
+import subprocess
+
+from blockbuster import BlockBusterFunction
+
+_PROJECT_BLOCKING_RULES = (
+    (
+        "subprocess.Popen.__init__",
+        BlockBusterFunction(
+            subprocess.Popen,
+            "__init__",
+            scanned_modules=["app", "deerflow"],
+        ),
+    ),
+)
+```
+
+Do not add a runtime rule just because a business path is not tested. A rule
+only expands what Blockbuster can intercept after code runs.
+
+### Add a runtime anchor
+
+Add a runtime anchor when a high-risk async production path should be protected
+by CI but no existing `backend/tests/blocking_io/` test executes it.
+
+Anchors belong in:
+
+```text
+backend/tests/blocking_io/
+```
+
+A good anchor should:
+
+- Call the real production async entry point.
+- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
+  wrappers.
+- Use real local filesystem inputs when the bug shape is filesystem IO.
+- Mock only the external dependency boundary, such as a network service or
+  third-party saver class.
+- Fail if a future change moves the blocking operation back onto the event
+  loop.
+
+Avoid testing only the low-level helper unless that helper is the production
+async entry point. The runtime gate is most useful when it protects the caller
+that production actually executes.
+
+## Current runtime coverage
+
+The initial runtime anchors protect confirmed blocking-IO bug shapes:
+
+- SQLite checkpointer setup, including path resolution and parent-directory
+  creation.
+- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
+- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
+  patches are restored after exceptions.
+
+As static detection and review identify more high-risk async paths, add new
+runtime anchors incrementally.
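
The idea behind the runtime gate described in the patch (fail when a blocking call runs on the asyncio event-loop thread, allow it once it is offloaded, and restore patches after exceptions) can be sketched with the standard library alone. This is only an illustration of the concept, not how Blockbuster or the DeerFlow detector is implemented. Every name in it (`guard_blocking`, `BlockingCallError`, `load_blocking`, `load_offloaded`, `run_guarded`) is hypothetical, and `time.sleep` stands in for a real blocking primitive.

```python
import asyncio
import contextlib
import time


class BlockingCallError(RuntimeError):
    """Raised when a guarded blocking call runs on an event-loop thread."""


@contextlib.contextmanager
def guard_blocking(module, name):
    """Temporarily wrap module.name so event-loop callers are rejected."""
    original = getattr(module, name)

    def wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No running loop in this thread (for example a to_thread worker),
            # so the blocking call is allowed through.
            return original(*args, **kwargs)
        raise BlockingCallError(f"{module.__name__}.{name} called on the event loop")

    setattr(module, name, wrapper)
    try:
        yield
    finally:
        # Restore the original attribute even if the guarded code raised.
        setattr(module, name, original)


async def load_blocking():
    # Hypothetical bug shape: a blocking call made directly in a coroutine.
    time.sleep(0.01)


async def load_offloaded():
    # Hypothetical fix: the same call offloaded to a worker thread.
    await asyncio.to_thread(time.sleep, 0.01)


def run_guarded(coro_fn):
    """Run an async entry point under the guard and report the outcome."""
    with guard_blocking(time, "sleep"):
        try:
            asyncio.run(coro_fn())
        except BlockingCallError:
            return "blocked"
    return "ok"
```

Under these assumptions, `run_guarded(load_blocking)` reports the violation, while `run_guarded(load_offloaded)` passes because the offload lives in the production coroutine rather than in the test. That mirrors the anchor guidance above: the test calls the async entry point as-is and does not add its own `asyncio.to_thread` wrapper.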