deer-flow/backend/tests/test_setup_agent_http_e2e_real_server.py
Xinmin Zeng 68d8caec1f
fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867)
* fix(agents): make update_agent honor runtime.context user_id like setup_agent

PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by
inject_authenticated_user_context from the auth-validated request) over the
contextvar, so an agent created during the bootstrap flow always lands under
users/<auth_uid>/agents/<name>. update_agent was left calling
get_effective_user_id() unconditionally — the same class of bug that produced
issues #2782 / #2862 still applies whenever the contextvar is not available
on the executing task (background work, future cross-process drivers,
checkpoint resume on a different task). In that regime update_agent silently
routes writes to users/default/agents/<name>, corrupting the shared default
bucket and losing the user's edit.

Extract the resolution policy into a shared resolve_runtime_user_id helper
on deerflow.runtime.user_context and route both setup_agent and update_agent
through it so the two halves of the lifecycle stay in lockstep.
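
A minimal sketch of the shared policy (assumed shape; the helper name and
module come from this PR, but the body here is illustrative only):

    # hypothetical sketch of deerflow.runtime.user_context.resolve_runtime_user_id
    def resolve_runtime_user_id(runtime) -> str:
        """Prefer the auth-validated runtime.context["user_id"] over the contextvar."""
        ctx = getattr(runtime, "context", None) or {}
        user_id = ctx.get("user_id") if isinstance(ctx, dict) else None
        if user_id:
            return user_id
        # Contextvar fallback: only correct when the executing task inherited
        # the request's context (not guaranteed for background work or
        # checkpoint resume on a different task).
        return get_effective_user_id()  # existing contextvar-based helper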

Add load-bearing end-to-end tests that drive a real langchain.agents
create_agent graph with a fake LLM, exercising the full pipeline:

  HTTP wire format
    -> app.gateway.services.start_run config-assembly
    -> deerflow.runtime.runs.worker._build_runtime_context
    -> langchain.agents create_agent graph
    -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor)
    -> setup_agent / update_agent

The negative-control tests intentionally land in users/default/ to prove the
positive tests are actually load-bearing rather than vacuously passing.

The new test_update_agent_e2e_user_isolation suite includes a test that
failed against main and passes after this fix.

* style: ruff format on new e2e tests

* test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack

Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing
test that drives the entire FastAPI gateway through
starlette.testclient.TestClient with no mocks above the LLM:

  - lifespan boots (config, sqlite engine, LangGraph runtime, channels)
  - POST /api/v1/auth/register (real password hash, real sqlite write,
    issues access_token + csrf_token cookies)
  - POST /api/threads (real thread_meta + checkpoint creation)
  - POST /api/threads/{id}/runs/stream with the exact wire shape the React
    frontend sends (assistant_id + input + config + context with
    agent_name/is_bootstrap)
  - AuthMiddleware -> CSRFMiddleware -> require_permission ->
    start_run -> inject_authenticated_user_context ->
    asyncio.create_task(run_agent) -> worker._build_runtime_context ->
    Runtime injection -> ToolNode dispatch -> real setup_agent
  - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/
    and NOT under users/default/agents/<name>/.

DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test
never touches the real .deer-flow directory or developer database. The only
patch above the LLM boundary is replacing create_chat_model with a fake that
emits a single setup_agent tool_call.

This is the "real verification" (真实验证) answer: it reproduces what
curl-against-uvicorn would do, minus the network socket layer.

* test: address Copilot review on user-isolation e2e tests

- Drop "currently expected to FAIL" wording from update_agent e2e docstring
  and header (Copilot review): the fix is in this PR, the test pins the
  corrected behaviour rather than driving a future change.
- Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to
  match the test's role on the fixed branch.
- Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an
  early break on the "event: end" SSE frame (Copilot review). Stops the
  test from hanging on a stuck run or runaway heartbeat loop.
- Replace the misleading "patch both module aliases" comment with an
  explanation of why patching lead_agent.agent.create_chat_model is the
  only correct target (Copilot review): lead_agent rebinds the symbol
  into its own namespace at import time, so patching deerflow.models is
  too late.
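
The mock-target rule at play, as a minimal illustration (not code from this
repo):

    # deerflow/agents/lead_agent/agent.py (runs once, at import time)
    from deerflow.models import create_chat_model  # binds a local alias

    # In a test, patch the name the call site actually resolves:
    patch("deerflow.agents.lead_agent.agent.create_chat_model", new=fake)  # intercepts
    # Patching "deerflow.models.create_chat_model" instead would leave the
    # already-bound alias in lead_agent.agent untouched.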

* test(refactor): address WillemJiang review on user-isolation e2e tests

- Extract the duplicated FakeToolCallingModel (and a
  build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py.
  All three e2e files now import from the shared module instead of
  redefining the shim locally.
- Convert the manual p.start() / p.stop() try/finally blocks in
  test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so
  patch lifecycle is Pythonic and exception-safe (see the sketch after
  this list).
- Lift the isolated_app fixture's private-attribute resets into a
  named _reset_process_singletons helper with a comment block
  explaining why each singleton has to be invalidated for true e2e
  isolation, and why raising=False is intentional. Makes the
  fragility visible and the intent self-documenting rather than
  leaving the resets inline as opaque monkeypatch calls.
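
Illustrative shape of the ExitStack conversion referenced above (the second
patch target is hypothetical):

    with contextlib.ExitStack() as stack:
        stack.enter_context(
            patch("deerflow.agents.lead_agent.agent.create_chat_model", new=fake)
        )
        stack.enter_context(patch("deerflow.example.other_dep", new=stub))  # hypothetical
        run_flow()  # patches unwind in reverse order even if this raises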

Net change: -59 lines (143 -> 84) across the three test files, with
every assertion intact. Full suite remains 69 passed / lint clean.

* test(e2e): make real-server test self-supply its config

CI's actions/checkout only ships config.example.yaml (the real config.yaml
is gitignored), so the production config-discovery search
(./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing
and the test fails at lifespan boot with FileNotFoundError. The dev-machine
run passed only because a local config.yaml happened to exist.

Write a minimal AppConfig-valid yaml into tmp_path and pin
DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires
(a single fake-test-model entry, LocalSandboxProvider, sqlite database).
The LLM never gets instantiated because the test patches create_chat_model
on the lead agent module, so the api_key/base_url stay placeholders.

Verified by hiding the local config.yaml to mirror the CI checkout — the
test now passes in both environments.
2026-05-12 23:18:54 +08:00

"""Real HTTP end-to-end verification for issue #2862's setup_agent path.
This test drives the **entire** FastAPI gateway through ``starlette.testclient.TestClient``:
starlette.testclient.TestClient (real ASGI stack)
-> AuthMiddleware (real cookie parsing, real JWT decode)
-> /api/v1/auth/register endpoint (real password hash + sqlite write)
-> /api/threads/{id}/runs/stream endpoint (real start_run config-assembly)
-> background asyncio.create_task(run_agent) (real worker, real Runtime)
-> langchain.agents.create_agent graph (real, with fake LLM)
-> ToolNode dispatch (real)
-> setup_agent tool (real file I/O)
The only mock is the LLM (no API key needed). Every layer that participates
in ``user_id`` propagation — auth, ContextVar, ``inject_authenticated_user_context``,
``worker._build_runtime_context``, ``Runtime.merge`` — is the real production
code path. If the chain is broken at any layer, this test fails.
This is what "真实验证" looks like for a server that lives behind authentication:
register a user, log in (cookie), POST to /runs/stream, wait for the run to
finish, then read the filesystem.
"""

from __future__ import annotations

from pathlib import Path
from typing import Any
from unittest.mock import patch

import pytest

from _agent_e2e_helpers import FakeToolCallingModel, build_single_tool_call_model

def _build_fake_create_chat_model(agent_name: str):
    """Return a callable matching the real ``create_chat_model`` signature.

    Whenever the lead agent constructs a chat model during the bootstrap flow,
    we hand it a fake that emits a single setup_agent tool_call on its first
    turn, then a benign final answer on its second turn.
    """

    def fake_create_chat_model(*args: Any, **kwargs: Any) -> FakeToolCallingModel:
        return build_single_tool_call_model(
            tool_name="setup_agent",
            tool_args={
                "soul": f"# Real HTTP E2E SOUL for {agent_name}",
                "description": "real-http-e2e agent",
            },
            tool_call_id="call_real_http_1",
            final_text=f"Agent {agent_name} created via real HTTP e2e.",
        )

    return fake_create_chat_model

@pytest.fixture
def isolated_deer_flow_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
    """Stand up an isolated DeerFlow data root + config under tmp_path.

    - Sets ``DEER_FLOW_HOME`` so paths land under tmp_path, not the real
      ``.deer-flow`` directory.
    - Writes a minimal, self-sufficient ``config.yaml`` into tmp_path and pins
      ``DEER_FLOW_CONFIG_PATH`` to it, so lifespan boot doesn't depend on the
      developer's local config layout (a fresh CI checkout only ships
      ``config.example.yaml``; the real ``config.yaml`` is gitignored).
    - Sets a placeholder OPENAI_API_KEY because the config references
      ``$OPENAI_API_KEY``, which gets resolved at parse time; the LLM itself
      is mocked, so any non-empty value works.
    """
    home = tmp_path / "deer-flow-home"
    home.mkdir()
    monkeypatch.setenv("DEER_FLOW_HOME", str(home))
    monkeypatch.setenv("OPENAI_API_KEY", "sk-fake-key-not-used-because-llm-is-mocked")
    monkeypatch.setenv("OPENAI_API_BASE", "https://example.invalid")

    # Hermetic config: do not depend on whether the dev machine has a real
    # ``config.yaml`` at the repo root. CI's ``actions/checkout`` only ships
    # ``config.example.yaml`` (and its ``models:`` list is commented out, so
    # AppConfig validation would reject it). Write a minimal, self-sufficient
    # config to tmp_path and pin ``DEER_FLOW_CONFIG_PATH`` to it.
    staged_config = tmp_path / "config.yaml"
    staged_config.write_text(_MINIMAL_CONFIG_YAML, encoding="utf-8")
    monkeypatch.setenv("DEER_FLOW_CONFIG_PATH", str(staged_config))
    return home

# Minimal config that satisfies AppConfig + LeadAgent's _resolve_model_name.
# The model `use` path must resolve to a real class for config parsing to
# succeed; the test patches ``create_chat_model`` on the lead agent module,
# so the model is never actually instantiated. SandboxConfig.use is required
# at schema level; LocalSandboxProvider is the only sandbox that runs without
# Docker.
_MINIMAL_CONFIG_YAML = """\
log_level: info
models:
  - name: fake-test-model
    display_name: Fake Test Model
    use: langchain_openai:ChatOpenAI
    model: gpt-4o-mini
    api_key: $OPENAI_API_KEY
    base_url: $OPENAI_API_BASE
sandbox:
  use: deerflow.sandbox.local:LocalSandboxProvider
agents_api:
  enabled: true
database:
  backend: sqlite
"""

def _reset_process_singletons(monkeypatch: pytest.MonkeyPatch) -> None:
    """Reset every process-wide cache that would survive across tests.

    The ``isolated_app`` fixture (which calls this helper) stands up a full
    FastAPI app + sqlite DB + LangGraph runtime inside ``tmp_path``. To get
    true per-test isolation we have to invalidate a handful of module-level
    caches that production normally never resets, so they pick up our
    test-only ``DEER_FLOW_HOME`` and sqlite path:

    - ``deerflow.config.app_config`` caches the parsed ``config.yaml``.
    - ``deerflow.config.paths`` caches the ``Paths`` singleton derived from
      ``DEER_FLOW_HOME`` at first access.
    - ``deerflow.persistence.engine`` caches the SQLAlchemy engine and
      session factory after the first call to ``init_engine_from_config``.

    ``raising=False`` keeps the fixture resilient if upstream renames or
    drops one of these attributes — the test will simply skip that reset
    instead of failing with a confusing AttributeError, and the next test
    to call ``get_app_config()``/``get_paths()`` will surface the real
    incompatibility loudly.
    """
    from deerflow.config import app_config as app_config_module
    from deerflow.config import paths as paths_module
    from deerflow.persistence import engine as engine_module

    for module, attr in (
        (app_config_module, "_app_config"),
        (app_config_module, "_app_config_path"),
        (app_config_module, "_app_config_mtime"),
        (paths_module, "_paths_singleton"),
        (engine_module, "_engine"),
        (engine_module, "_session_factory"),
    ):
        monkeypatch.setattr(module, attr, None, raising=False)

@pytest.fixture
def isolated_app(isolated_deer_flow_home: Path, monkeypatch: pytest.MonkeyPatch):
    """Build a fresh FastAPI app inside a clean DEER_FLOW_HOME.

    Each test gets its own sqlite DB and checkpoint store under ``tmp_path``,
    with no cross-test contamination.
    """
    _reset_process_singletons(monkeypatch)

    # Re-resolve the config from the test-only DEER_FLOW_HOME and pin its
    # sqlite path into tmp_path so the lifespan-time engine init lands there.
    from deerflow.config import app_config as app_config_module

    cfg = app_config_module.get_app_config()
    cfg.database.sqlite_dir = str(isolated_deer_flow_home / "db")

    from app.gateway.app import create_app

    return create_app()

def _drain_stream(response, *, timeout: float = 30.0, max_bytes: int = 4 * 1024 * 1024) -> str:
    """Consume an SSE response body until the run terminates and return the text.

    Bounded to keep the test fail-fast:

    - Stops as soon as an ``event: end`` SSE frame is observed (the gateway
      sends this when the background run finishes — see ``services.format_sse``
      and ``StreamBridge.publish_end``).
    - Stops at ``timeout`` seconds wall-clock so a stuck run / runaway heartbeat
      loop surfaces a real failure instead of hanging pytest.
    - Stops at ``max_bytes`` so a runaway producer can't OOM the test process.
    """
    import time as _time

    deadline = _time.monotonic() + timeout
    body = b""
    for chunk in response.iter_bytes():
        body += chunk
        if b"event: end" in body:
            break
        if len(body) >= max_bytes:
            break
        if _time.monotonic() >= deadline:
            break
    return body.decode("utf-8", errors="replace")

def _wait_for_file(path: Path, *, timeout: float = 10.0) -> bool:
    """Block until *path* exists or *timeout* elapses.

    The run completes inside ``asyncio.create_task`` after start_run returns,
    so the test must wait for the background task to flush its writes.
    """
    import time as _time

    deadline = _time.monotonic() + timeout
    while _time.monotonic() < deadline:
        if path.exists():
            return True
        _time.sleep(0.05)
    return False

@pytest.mark.no_auto_user
def test_real_http_create_agent_lands_in_authenticated_user_dir(
    isolated_app: Any,
    isolated_deer_flow_home: Path,
    monkeypatch: pytest.MonkeyPatch,
):
    """The full real-server contract test.

    1. Register a real user via POST /api/v1/auth/register (also auto-logs in).
    2. POST to /api/threads/{tid}/runs/stream with the **exact** body shape the
       frontend (LangGraph SDK) sends during the bootstrap flow.
    3. Wait for the background run to finish.
    4. Assert SOUL.md exists under users/<authenticated_uid>/agents/<name>/.
    5. Assert NOTHING exists under users/default/agents/<name>/.
    """
    # ``deerflow.agents.lead_agent.agent`` imports ``create_chat_model`` with
    # ``from deerflow.models import create_chat_model`` at module load time,
    # rebinding the symbol into its own namespace. So the only patch that
    # intercepts the call is the bound name on ``lead_agent.agent`` — patching
    # ``deerflow.models.create_chat_model`` would be too late.
    agent_name = "real-http-agent"

    from starlette.testclient import TestClient

    with (
        patch(
            "deerflow.agents.lead_agent.agent.create_chat_model",
            new=_build_fake_create_chat_model(agent_name),
        ),
        TestClient(isolated_app) as client,
    ):
        # --- 1. Register & auto-login ---
        register = client.post(
            "/api/v1/auth/register",
            json={"email": "e2e-user@example.com", "password": "very-strong-password-123"},
        )
        assert register.status_code == 201, register.text
        registered = register.json()
        auth_uid = registered["id"]

        # The endpoint sets both access_token (auth) and csrf_token (CSRF Double
        # Submit Cookie) cookies; the TestClient cookie jar propagates them.
        assert client.cookies.get("access_token"), "register endpoint must set session cookie"
        csrf_token = client.cookies.get("csrf_token")
        assert csrf_token, "register endpoint must set csrf_token cookie"

        # --- 2. Create a thread (require_existing=True on /runs/stream means
        # we must call POST /api/threads first; the React frontend does the
        # same via the LangGraph SDK's threads.create) ---
        import uuid as _uuid

        thread_id = str(_uuid.uuid4())
        created = client.post(
            "/api/threads",
            json={"thread_id": thread_id, "metadata": {}},
            headers={"X-CSRF-Token": csrf_token},
        )
        assert created.status_code == 200, created.text

        # --- 3. POST /runs/stream with the bootstrap wire format ---
        # This is the EXACT shape the React frontend sends after PR #2784:
        #   thread.submit(input, {config, context}) ->
        #   POST /api/threads/{id}/runs/stream body =
        #     {assistant_id, input, config, context}
        body = {
            "assistant_id": "lead_agent",
            "input": {
                "messages": [
                    {
                        "role": "user",
                        "content": (
                            f"The new custom agent name is {agent_name}. "
                            "Help me design its SOUL.md before saving it."
                        ),
                    }
                ]
            },
            "config": {"recursion_limit": 50},
            "context": {
                "agent_name": agent_name,
                "is_bootstrap": True,
                "mode": "flash",
                "thinking_enabled": False,
                "is_plan_mode": False,
                "subagent_enabled": False,
            },
            "stream_mode": ["values"],
        }

        # The /stream endpoint returns SSE; we drain it so the server-side
        # background task (run_agent) gets to completion before we look at disk.
        with client.stream(
            "POST",
            f"/api/threads/{thread_id}/runs/stream",
            json=body,
            headers={"X-CSRF-Token": csrf_token},
        ) as resp:
            assert resp.status_code == 200, resp.read().decode()
            transcript = _drain_stream(resp)

        # Sanity: the stream should have produced at least one event.
        assert "event:" in transcript, f"no SSE events in response: {transcript[:500]!r}"

        # --- 4. Verify filesystem outcome ---
        expected_dir = isolated_deer_flow_home / "users" / auth_uid / "agents" / agent_name
        default_dir = isolated_deer_flow_home / "users" / "default" / "agents" / agent_name

        # The setup_agent tool runs inside the background asyncio task spawned
        # by start_run; SSE-drain typically waits for it, but we add a bounded
        # poll to be robust against scheduler jitter.
        assert _wait_for_file(expected_dir / "SOUL.md", timeout=15.0), (
            "SOUL.md did not appear under users/<auth_uid>/agents/. "
            f"Expected: {expected_dir / 'SOUL.md'}. "
            f"tmp tree: {sorted(str(p.relative_to(isolated_deer_flow_home)) for p in isolated_deer_flow_home.rglob('SOUL.md'))}. "
            f"SSE transcript tail: {transcript[-1000:]!r}"
        )
        soul_text = (expected_dir / "SOUL.md").read_text()
        assert agent_name in soul_text, f"unexpected SOUL content: {soul_text!r}"

        # The smoking-gun assertion: the agent must NOT have landed in default/.
        assert not default_dir.exists(), (
            f"REGRESSION: agent landed under users/default/{agent_name} instead of "
            "the authenticated user. Default-dir contents: "
            f"{list(default_dir.rglob('*')) if default_dir.exists() else 'n/a'}"
        )