mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-09 17:12:01 +00:00
* fix(subagent): structured subagent_status field over text parsing Closes #3146. ## Why The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again. ## Design 1. **Backend → frontend contract**: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed | failed | cancelled | timed_out | polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`. 2. **Centralised stamping, not 8 sprinkled stamps**: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp. 3. **Cross-language contract fixture**: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test. 4. **Round-trip serialisation pinned**: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs. 5. **Frontend status collapse documented**: the backend has five status values, the frontend card has three (`completed | failed | in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned. ## Changes Backend: - `deerflow.subagents.status_contract` — new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`. - `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched. - New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved). Frontend: - `parseSubtaskResult(text, additionalKwargs?)` — prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values. - `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse. - `message-list.tsx` passes `message.additional_kwargs` through. - `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path. Contract: - `contracts/subagent_status_contract.json` — single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there. ## Test plan - `make lint` clean (backend + frontend). - `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed. - `pnpm test --run` → 103 passed (was 76, +27 new). ## Migration / fallback retirement The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue). * fix(subtask): back-fill result/error from text when structured status present Three follow-ups on the PR #3154 review: 1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking. 2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory. 3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree. Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise.
152 lines
5.6 KiB
Python
152 lines
5.6 KiB
Python
"""Regression tests for ToolErrorHandlingMiddleware's subagent status stamp.
|
|
|
|
Bytedance/deer-flow issue #3146: rather than stamp
|
|
``ToolMessage.additional_kwargs.subagent_status`` from each of
|
|
task_tool.py's 5 normal returns + 3 pre-execution Error: returns (which
|
|
would be 8 separate places to drift over time), the middleware that
|
|
already wraps every tool call does the stamping in one place. These
|
|
tests pin that centralisation.
|
|
|
|
For non-``task`` tools the middleware must not touch additional_kwargs
|
|
— other tools have their own conventions and we do not want to leak a
|
|
``subagent_status`` field onto them.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import asyncio
|
|
import json
|
|
from pathlib import Path
|
|
|
|
import pytest
|
|
from langchain_core.messages import ToolMessage
|
|
|
|
from deerflow.agents.middlewares.tool_error_handling_middleware import (
|
|
ToolErrorHandlingMiddleware,
|
|
)
|
|
from deerflow.subagents.status_contract import (
|
|
SUBAGENT_ERROR_KEY,
|
|
SUBAGENT_STATUS_KEY,
|
|
)
|
|
|
|
_CONTRACT_PATH = Path(__file__).resolve().parents[2] / "contracts" / "subagent_status_contract.json"
|
|
|
|
|
|
def _load_terminal_cases() -> list[dict]:
|
|
"""Load only the cases that should produce a terminal status stamp."""
|
|
data = json.loads(_CONTRACT_PATH.read_text(encoding="utf-8"))
|
|
return [c for c in data["cases"] if c["expected_status"] is not None]
|
|
|
|
|
|
class _FakeRequest:
|
|
"""Stand-in for ``ToolCallRequest`` used by the middleware."""
|
|
|
|
def __init__(self, tool_name: str, tool_call_id: str = "call-1") -> None:
|
|
self.tool_call = {"name": tool_name, "id": tool_call_id}
|
|
|
|
|
|
@pytest.mark.parametrize("case", _load_terminal_cases(), ids=lambda c: c["name"])
|
|
def test_stamps_subagent_status_on_successful_task_return(case):
|
|
"""Every terminal task tool result string stamps the matching status."""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("task")
|
|
|
|
def handler(_req):
|
|
return ToolMessage(content=case["content"], tool_call_id="call-1", name="task")
|
|
|
|
result = middleware.wrap_tool_call(request, handler)
|
|
assert isinstance(result, ToolMessage)
|
|
assert result.additional_kwargs.get(SUBAGENT_STATUS_KEY) == case["expected_status"]
|
|
|
|
|
|
def test_does_not_stamp_unknown_streaming_chunk():
|
|
"""Non-terminal content leaves additional_kwargs alone."""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("task")
|
|
|
|
def handler(_req):
|
|
return ToolMessage(content="Investigating ...", tool_call_id="call-1", name="task")
|
|
|
|
result = middleware.wrap_tool_call(request, handler)
|
|
assert SUBAGENT_STATUS_KEY not in (result.additional_kwargs or {})
|
|
|
|
|
|
def test_does_not_stamp_non_task_tool():
|
|
"""A non-task tool returning a string that happens to start with
|
|
``Error:`` must not pick up a subagent stamp."""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("bash")
|
|
|
|
def handler(_req):
|
|
return ToolMessage(content="Error: command not found", tool_call_id="call-1", name="bash")
|
|
|
|
result = middleware.wrap_tool_call(request, handler)
|
|
assert SUBAGENT_STATUS_KEY not in (result.additional_kwargs or {})
|
|
|
|
|
|
def test_stamps_failed_when_task_tool_raises():
|
|
"""The exception path goes through ``_build_error_message`` which is
|
|
the only place ToolErrorHandlingMiddleware ever emits a brand-new
|
|
ToolMessage. It must stamp ``failed`` for task too, since the wrapper
|
|
text starts with ``Error:``.
|
|
"""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("task")
|
|
|
|
def handler(_req):
|
|
raise RuntimeError("blew up during execution")
|
|
|
|
result = middleware.wrap_tool_call(request, handler)
|
|
assert isinstance(result, ToolMessage)
|
|
assert result.additional_kwargs.get(SUBAGENT_STATUS_KEY) == "failed"
|
|
assert "RuntimeError" in result.additional_kwargs.get(SUBAGENT_ERROR_KEY, "")
|
|
|
|
|
|
def test_async_wrap_also_stamps():
|
|
"""The async wrap path must behave identically."""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("task")
|
|
|
|
async def handler(_req):
|
|
return ToolMessage(content="Task Succeeded. Result: ok", tool_call_id="call-1", name="task")
|
|
|
|
result = asyncio.run(middleware.awrap_tool_call(request, handler))
|
|
assert result.additional_kwargs.get(SUBAGENT_STATUS_KEY) == "completed"
|
|
|
|
|
|
def test_preserves_existing_additional_kwargs():
|
|
"""The stamper must not clobber unrelated fields the tool already set."""
|
|
middleware = ToolErrorHandlingMiddleware()
|
|
request = _FakeRequest("task")
|
|
|
|
def handler(_req):
|
|
return ToolMessage(
|
|
content="Task Succeeded. Result: ok",
|
|
tool_call_id="call-1",
|
|
name="task",
|
|
additional_kwargs={"existing_field": "must_survive"},
|
|
)
|
|
|
|
result = middleware.wrap_tool_call(request, handler)
|
|
assert result.additional_kwargs.get("existing_field") == "must_survive"
|
|
assert result.additional_kwargs.get(SUBAGENT_STATUS_KEY) == "completed"
|
|
|
|
|
|
def test_additional_kwargs_round_trip_via_json():
|
|
"""Pydantic dump → JSON → restore must keep the stamp intact.
|
|
|
|
``ToolMessage`` is what LangGraph serialises into the checkpoint and
|
|
what the frontend deserialises off the stream. If a future Pydantic /
|
|
LangChain upgrade silently strips unknown ``additional_kwargs`` we
|
|
want that to fail loudly here rather than in the wild.
|
|
"""
|
|
msg = ToolMessage(
|
|
content="Task Succeeded. Result: ok",
|
|
tool_call_id="call-1",
|
|
name="task",
|
|
additional_kwargs={SUBAGENT_STATUS_KEY: "completed", SUBAGENT_ERROR_KEY: ""},
|
|
)
|
|
serialised = msg.model_dump_json()
|
|
restored = ToolMessage.model_validate_json(serialised)
|
|
assert restored.additional_kwargs.get(SUBAGENT_STATUS_KEY) == "completed"
|