From 68d8caec1f6b543fa7936d8a0c382f33726e00b0 Mon Sep 17 00:00:00 2001
From: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Date: Tue, 12 May 2026 23:18:54 +0800
Subject: [PATCH] fix(agents): make update_agent honor runtime.context user_id
 like setup_agent (#2867)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(agents): make update_agent honor runtime.context user_id like setup_agent

PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by
inject_authenticated_user_context from the auth-validated request) over the
contextvar, so an agent created during the bootstrap flow always lands under
users/<auth_uid>/agents/<name>. update_agent was left calling
get_effective_user_id() unconditionally — the same class of bug that produced
issues #2782 / #2862 still applies whenever the contextvar is not available
on the executing task (background work, future cross-process drivers,
checkpoint resume on a different task). In that regime update_agent silently
routes writes to users/default/agents/<name>, corrupting the shared default
bucket and losing the user's edit.

Extract the resolution policy into a shared resolve_runtime_user_id helper
on deerflow.runtime.user_context and route both setup_agent and update_agent
through it so the two halves of the lifecycle stay in lockstep.

Add load-bearing end-to-end tests that drive a real langchain.agents
create_agent graph with a fake LLM, exercising the full pipeline:

  HTTP wire format
    -> app.gateway.services.start_run config-assembly
    -> deerflow.runtime.runs.worker._build_runtime_context
    -> langchain.agents create_agent graph
    -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor)
    -> setup_agent / update_agent

The negative-control tests intentionally land in users/default/ to prove the
positive tests are actually load-bearing rather than vacuously passing.

The new test_update_agent_e2e_user_isolation suite included a test that
failed against main and now passes after this fix.

* style: ruff format on new e2e tests

* test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack

Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing
test that drives the entire FastAPI gateway through starlette.testclient.
TestClient with no mocks above the LLM:

  - lifespan boots (config, sqlite engine, LangGraph runtime, channels)
  - POST /api/v1/auth/register (real password hash, real sqlite write,
    issues access_token + csrf_token cookies)
  - POST /api/threads (real thread_meta + checkpoint creation)
  - POST /api/threads/{id}/runs/stream with the exact wire shape the React
    frontend sends (assistant_id + input + config + context with
    agent_name/is_bootstrap)
  - AuthMiddleware -> CSRFMiddleware -> require_permission ->
    start_run -> inject_authenticated_user_context ->
    asyncio.create_task(run_agent) -> worker._build_runtime_context ->
    Runtime injection -> ToolNode dispatch -> real setup_agent
  - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/
    and NOT under users/default/agents/<name>/.

DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test
never touches the real .deer-flow directory or developer database. The only
patch above the LLM boundary is replacing create_chat_model with a fake that
emits a single setup_agent tool_call.

This is the "真实验证" answer: it reproduces what curl-against-uvicorn would
do, minus the network socket layer.

* test: address Copilot review on user-isolation e2e tests

- Drop "currently expected to FAIL" wording from update_agent e2e docstring
  and header (Copilot review): the fix is in this PR, the test pins the
  corrected behaviour rather than driving a future change.
- Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to
  match the test's role on the fixed branch.
- Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an
  early break on the "event: end" SSE frame (Copilot review). Stops the
  test from hanging on a stuck run or runaway heartbeat loop.
- Replace the misleading "patch both module aliases" comment with an
  explanation of why patching lead_agent.agent.create_chat_model is the
  only correct target (Copilot review): lead_agent rebinds the symbol
  into its own namespace at import time, so patching deerflow.models is
  too late.

* test(refactor): address WillemJiang review on user-isolation e2e tests

- Extract the duplicated FakeToolCallingModel (and a
  build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py.
  All three e2e files now import from the shared module instead of
  redefining the shim locally.
- Convert the manual p.start() / p.stop() try/finally blocks in
  test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so
  patch lifecycle is Pythonic and exception-safe.
- Lift the isolated_app fixture's private-attribute resets into a
  named _reset_process_singletons helper with a comment block
  explaining why each singleton has to be invalidated for true e2e
  isolation, and why raising=False is intentional. Makes the
  fragility visible and the intent self-documenting rather than
  leaving the resets inline as opaque monkeypatch calls.

Net change: -59 lines (143 -> 84) across the three test files, with
every assertion intact. Full suite remains 69 passed / lint clean.

* test(e2e): make real-server test self-supply its config

CI's actions/checkout only ships config.example.yaml (the real config.yaml
is gitignored), so the production config-discovery search
(./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing
and the test fails at lifespan boot with FileNotFoundError. The dev-machine
run passed only because a local config.yaml happened to exist.

Write a minimal AppConfig-valid yaml into tmp_path and pin
DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires
(a single fake-test-model entry, LocalSandboxProvider, sqlite database).
The LLM never gets instantiated because the test patches create_chat_model
on the lead agent module, so the api_key/base_url stay placeholders.

Verified by hiding the local config.yaml to mirror the CI checkout — the
test now passes in both environments.
---
 .../harness/deerflow/runtime/user_context.py  |  28 ++
 .../tools/builtins/setup_agent_tool.py        |  11 +-
 .../tools/builtins/update_agent_tool.py       |  12 +-
 backend/tests/_agent_e2e_helpers.py           |  68 +++
 .../test_setup_agent_e2e_user_isolation.py    | 429 ++++++++++++++++++
 .../test_setup_agent_http_e2e_real_server.py  | 326 +++++++++++++
 .../test_update_agent_e2e_user_isolation.py   | 253 +++++++++++
 7 files changed, 1114 insertions(+), 13 deletions(-)
 create mode 100644 backend/tests/_agent_e2e_helpers.py
 create mode 100644 backend/tests/test_setup_agent_e2e_user_isolation.py
 create mode 100644 backend/tests/test_setup_agent_http_e2e_real_server.py
 create mode 100644 backend/tests/test_update_agent_e2e_user_isolation.py
diff --git a/backend/packages/harness/deerflow/runtime/user_context.py b/backend/packages/harness/deerflow/runtime/user_context.py
index ffe4be690..cfbb68c94 100644
--- a/backend/packages/harness/deerflow/runtime/user_context.py
+++ b/backend/packages/harness/deerflow/runtime/user_context.py
@@ -109,6 +109,34 @@ def get_effective_user_id() -> str:
     return str(user.id)
 
 
+def resolve_runtime_user_id(runtime: object | None) -> str:
+    """Single source of truth for a tool/middleware's effective user_id.
+
+    Resolution order (most authoritative first):
+      1. ``runtime.context["user_id"]`` — set by ``inject_authenticated_user_context``
+         in the gateway from the auth-validated ``request.state.user``. This is
+         the only source that survives boundaries where the contextvar may have
+         been lost (background tasks scheduled outside the request task,
+         worker pools that don't copy_context, future cross-process drivers).
+      2. The ``_current_user`` ContextVar — set by the auth middleware at
+         request entry. Reliable for in-task work; copied by ``asyncio``
+         child tasks and by ``ContextThreadPoolExecutor``.
+      3. ``DEFAULT_USER_ID`` — last-resort fallback so unauthenticated
+         CLI / migration / test paths keep working without raising.
+
+    Tools that persist user-scoped state (custom agents, memory, uploads)
+    MUST call this instead of ``get_effective_user_id()`` directly so they
+    benefit from the runtime.context channel that ``setup_agent`` already
+    relies on.
+    """
+    context = getattr(runtime, "context", None)
+    if isinstance(context, dict):
+        ctx_user_id = context.get("user_id")
+        if ctx_user_id:
+            return str(ctx_user_id)
+    return get_effective_user_id()
+
+
 # ---------------------------------------------------------------------------
 # Sentinel-based user_id resolution
 # ---------------------------------------------------------------------------
diff --git a/backend/packages/harness/deerflow/tools/builtins/setup_agent_tool.py b/backend/packages/harness/deerflow/tools/builtins/setup_agent_tool.py
index 2f796b005..dfbcf8b6e 100644
--- a/backend/packages/harness/deerflow/tools/builtins/setup_agent_tool.py
+++ b/backend/packages/harness/deerflow/tools/builtins/setup_agent_tool.py
@@ -7,19 +7,12 @@ from langgraph.types import Command
 
 from deerflow.config.agents_config import validate_agent_name
 from deerflow.config.paths import get_paths
-from deerflow.runtime.user_context import get_effective_user_id
+from deerflow.runtime.user_context import resolve_runtime_user_id
 from deerflow.tools.types import Runtime
 
 logger = logging.getLogger(__name__)
 
 
-def _get_runtime_user_id(runtime: Runtime) -> str:
-    context_user_id = runtime.context.get("user_id") if runtime.context else None
-    if context_user_id:
-        return str(context_user_id)
-    return get_effective_user_id()
-
-
 @tool(parse_docstring=True)
 def setup_agent(
     soul: str,
@@ -45,7 +38,7 @@ def setup_agent(
         if agent_name:
             # Custom agents are persisted under the current user's bucket so
             # different users do not see each other's agents.
-            user_id = _get_runtime_user_id(runtime)
+            user_id = resolve_runtime_user_id(runtime)
             agent_dir = paths.user_agent_dir(user_id, agent_name)
         else:
             # Default agent (no agent_name): SOUL.md lives at the global base dir.
diff --git a/backend/packages/harness/deerflow/tools/builtins/update_agent_tool.py b/backend/packages/harness/deerflow/tools/builtins/update_agent_tool.py
index b2dc8ca72..18500a248 100644
--- a/backend/packages/harness/deerflow/tools/builtins/update_agent_tool.py
+++ b/backend/packages/harness/deerflow/tools/builtins/update_agent_tool.py
@@ -27,7 +27,7 @@ from langgraph.types import Command
 from deerflow.config.agents_config import load_agent_config, validate_agent_name
 from deerflow.config.app_config import get_app_config
 from deerflow.config.paths import get_paths
-from deerflow.runtime.user_context import get_effective_user_id
+from deerflow.runtime.user_context import resolve_runtime_user_id
 from deerflow.tools.types import Runtime
 
 logger = logging.getLogger(__name__)
@@ -118,9 +118,13 @@ def update_agent(
         return _err("update_agent is only available inside a custom agent's chat. There is no agent_name in the current runtime context, so there is nothing to update. If you are inside the bootstrap flow, use setup_agent instead.")
 
     # Resolve the active user so that updates only affect this user's agent.
-    # ``get_effective_user_id`` returns DEFAULT_USER_ID when no auth context
-    # is set (matching how memory and thread storage behave).
-    user_id = get_effective_user_id()
+    # ``resolve_runtime_user_id`` prefers ``runtime.context["user_id"]`` (set by
+    # the gateway from the auth-validated request) and falls back to the
+    # contextvar, then DEFAULT_USER_ID. This matches setup_agent so a user
+    # creating an agent and later refining it always touches the same files,
+    # even if the contextvar gets lost across an async/thread boundary
+    # (issue #2782 / #2862 class of bugs).
+    user_id = resolve_runtime_user_id(runtime)
 
     # Reject an unknown ``model`` *before* touching the filesystem. Otherwise
     # ``_resolve_model_name`` silently falls back to the default at runtime
diff --git a/backend/tests/_agent_e2e_helpers.py b/backend/tests/_agent_e2e_helpers.py
new file mode 100644
index 000000000..2f28390a9
--- /dev/null
+++ b/backend/tests/_agent_e2e_helpers.py
@@ -0,0 +1,68 @@
+"""Shared helpers for user-isolation e2e tests on the custom-agent tooling.
+
+Centralises the small fake-LLM shim and a few test-data builders that the
+three e2e files in this PR (``test_setup_agent_e2e_user_isolation``,
+``test_update_agent_e2e_user_isolation``, ``test_setup_agent_http_e2e_real_server``)
+all need. The shim is what lets a real ``langchain.agents.create_agent``
+graph run without an API key — every other layer in those tests is real
+production code, which is the entire point of the test design.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from langchain_core.language_models.fake_chat_models import FakeMessagesListChatModel
+from langchain_core.messages import AIMessage
+from langchain_core.runnables import Runnable
+
+
+class FakeToolCallingModel(FakeMessagesListChatModel):
+    """FakeMessagesListChatModel plus a no-op ``bind_tools`` for create_agent.
+
+    ``langchain.agents.create_agent`` calls ``model.bind_tools(...)`` to
+    expose the tool schemas to the model; the upstream fake raises
+    ``NotImplementedError`` there. We just return ``self`` because we
+    drive deterministic tool_call output via ``responses=...``, no schema
+    handling needed.
+    """
+
+    def bind_tools(  # type: ignore[override]
+        self,
+        tools: Any,
+        *,
+        tool_choice: Any = None,
+        **kwargs: Any,
+    ) -> Runnable:
+        return self
+
+
+def build_single_tool_call_model(
+    *,
+    tool_name: str,
+    tool_args: dict[str, Any],
+    tool_call_id: str = "call_e2e_1",
+    final_text: str = "done",
+) -> FakeToolCallingModel:
+    """Build a fake model that emits exactly one tool_call then finishes.
+
+    Two-turn behaviour, identical across our e2e tests:
+      turn 1 → AIMessage with a single tool_call for *tool_name*
+      turn 2 → AIMessage with *final_text* (terminates the agent loop)
+    """
+    return FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": tool_name,
+                        "args": tool_args,
+                        "id": tool_call_id,
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content=final_text),
+        ]
+    )
diff --git a/backend/tests/test_setup_agent_e2e_user_isolation.py b/backend/tests/test_setup_agent_e2e_user_isolation.py
new file mode 100644
index 000000000..034d4da84
--- /dev/null
+++ b/backend/tests/test_setup_agent_e2e_user_isolation.py
@@ -0,0 +1,429 @@
+"""End-to-end verification for issue #2862 (and the regression of #2782).
+
+Goal: prove — without trusting any single layer's claim — that an authenticated
+user creating a custom agent through the real ``setup_agent`` tool, driven by a
+real LangGraph ``create_agent`` graph, ends up with files under
+``users/<auth_uid>/agents/<name>`` and **not** under ``users/default/agents/...``.
+
+We intentionally exercise the full pipeline:
+
+    HTTP body shape (mimics LangGraph SDK wire format)
+      -> app.gateway.services.start_run config-assembly chain
+      -> deerflow.runtime.runs.worker._build_runtime_context
+      -> langchain.agents.create_agent graph
+      -> ToolNode dispatch
+      -> setup_agent tool
+
+The only thing we mock is the LLM (FakeMessagesListChatModel) — every layer
+that handles ``user_id`` is the real production code path. If the
+``user_id`` propagation is broken anywhere in this chain, these tests will
+fail.
+
+These tests intentionally ``no_auto_user`` so that the ``contextvar``
+fallback would put files into ``default/`` if propagation breaks.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import patch
+from uuid import UUID
+
+import pytest
+from _agent_e2e_helpers import FakeToolCallingModel
+from langchain_core.messages import AIMessage, HumanMessage
+
+from app.gateway.services import (
+    build_run_config,
+    inject_authenticated_user_context,
+    merge_run_context_overrides,
+)
+from deerflow.runtime.runs.worker import _build_runtime_context, _install_runtime_context
+
+# ---------------------------------------------------------------------------
+# Helpers — real production code paths
+# ---------------------------------------------------------------------------
+
+
+def _make_request(user_id_str: str | None) -> SimpleNamespace:
+    """Build a fake FastAPI Request that carries an authenticated user."""
+    if user_id_str is None:
+        user = None
+    else:
+        # User.id is UUID in production; honour that
+        user = SimpleNamespace(id=UUID(user_id_str), email="alice@local")
+    return SimpleNamespace(state=SimpleNamespace(user=user))
+
+
+def _assemble_config(
+    *,
+    body_config: dict | None,
+    body_context: dict | None,
+    request_user_id: str | None,
+    thread_id: str = "thread-e2e",
+    assistant_id: str = "lead_agent",
+) -> dict:
+    """Replay the **exact** start_run config-assembly sequence."""
+    config = build_run_config(thread_id, body_config, None, assistant_id=assistant_id)
+    merge_run_context_overrides(config, body_context)
+    inject_authenticated_user_context(config, _make_request(request_user_id))
+    return config
+
+
+def _make_paths_mock(tmp_path: Path):
+    """Mirror the production paths.user_agent_dir signature."""
+    from unittest.mock import MagicMock
+
+    paths = MagicMock()
+    paths.base_dir = tmp_path
+    paths.agent_dir = lambda name: tmp_path / "agents" / name
+    paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
+    return paths
+
+
+# ---------------------------------------------------------------------------
+# L1-L3: HTTP wire format → start_run → worker._build_runtime_context
+# ---------------------------------------------------------------------------
+
+
+class TestConfigAssembly:
+    """Covers L1-L3: validate that user_id reaches runtime_ctx for every wire shape."""
+
+    def test_typical_wire_format_user_id_in_runtime_ctx(self):
+        """Real frontend: body.config={recursion_limit}, body.context={agent_name,...}."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent", "is_bootstrap": True, "mode": "flash"},
+            request_user_id="11111111-2222-3333-4444-555555555555",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "11111111-2222-3333-4444-555555555555"
+        assert runtime_ctx["agent_name"] == "myagent"
+
+    def test_body_context_none_still_injects_user_id(self):
+        """If frontend omits body.context entirely, inject must still create it."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context=None,
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_body_context_empty_dict_still_injects_user_id(self):
+        """body.context={} (falsy) path: inject must still produce user_id."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={},
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_body_config_already_contains_context_field(self):
+        """body.config={'context': {...}} (LG 0.6 alt wire): inject still wins."""
+        config = _assemble_config(
+            body_config={"context": {"agent_name": "myagent"}, "recursion_limit": 1000},
+            body_context=None,
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_client_supplied_user_id_is_overridden(self):
+        """Spoofed client user_id must be overwritten by inject (auth-trusted source)."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent", "user_id": "spoofed"},
+            request_user_id="11111111-2222-3333-4444-555555555555",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "11111111-2222-3333-4444-555555555555"
+
+    def test_unauthenticated_request_does_not_inject(self):
+        """If request.state.user is missing (impossible under fail-closed auth, but
+        verify defensively), inject must not write user_id and runtime_ctx must
+        therefore lack it — forcing the tool fallback path to reveal itself."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent"},
+            request_user_id=None,
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert "user_id" not in runtime_ctx
+
+
+# ---------------------------------------------------------------------------
+# L4-L7: Real LangGraph create_agent driving the real setup_agent tool
+# ---------------------------------------------------------------------------
+
+
+def _build_real_bootstrap_graph(authenticated_user_id: str):
+    """Construct a real LangGraph using create_agent + the real setup_agent tool.
+
+    The LLM is faked (FakeMessagesListChatModel) so we don't need an API key.
+    Everything else — ToolNode dispatch, runtime injection, middleware — is
+    the real production code path.
+    """
+    from langchain.agents import create_agent
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    # First model turn: emit a tool_call for setup_agent
+    # Second model turn (after tool result): final answer (terminates the loop)
+    fake_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {
+                            "soul": "# My E2E Agent\n\nA SOUL written by the model.",
+                            "description": "End-to-end test agent",
+                        },
+                        "id": "call_setup_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content=f"Done. Agent created for user {authenticated_user_id}."),
+        ]
+    )
+
+    graph = create_agent(
+        model=fake_model,
+        tools=[setup_agent],
+        system_prompt="You are a bootstrap agent. Call setup_agent immediately.",
+    )
+    return graph
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_real_graph_real_setup_agent_writes_to_authenticated_user_dir(tmp_path: Path):
+    """The smoking-gun test for issue #2862.
+
+    Under no_auto_user (contextvar = empty), if user_id propagation through
+    runtime.context is broken, setup_agent will fall back to DEFAULT_USER_ID
+    and write to users/default/agents/... The assertion that this directory
+    DOES NOT exist is what makes this test load-bearing.
+    """
+    from langgraph.runtime import Runtime
+
+    auth_uid = "abcdef01-2345-6789-abcd-ef0123456789"
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "e2e-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-1",
+    )
+
+    # Replay worker.run_agent's runtime construction. This is the key step:
+    # it is what makes ToolRuntime.context contain user_id when the tool
+    # actually fires.
+    runtime_ctx = _build_runtime_context("thread-e2e-1", "run-1", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_real_bootstrap_graph(auth_uid)
+
+    # Patch get_paths only (the file-system rooting); everything else is real
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Drive the real graph. This goes through real ToolNode + real Runtime merge.
+        final_state = await graph.ainvoke(
+            {"messages": [HumanMessage(content="Create an agent named e2e-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "e2e-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "e2e-agent"
+
+    # Load-bearing assertions:
+    assert expected_dir.exists(), f"Agent directory not found at the authenticated user's path. Expected: {expected_dir}. tmp_path tree: {[str(p) for p in tmp_path.rglob('*')]}"
+    assert (expected_dir / "SOUL.md").read_text() == "# My E2E Agent\n\nA SOUL written by the model."
+    assert (expected_dir / "config.yaml").exists()
+    assert not default_dir.exists(), "REGRESSION: agent landed under users/default/. user_id propagation broke somewhere between HTTP layer and ToolRuntime.context."
+
+    # And final state should reflect tool success
+    last = final_state["messages"][-1]
+    assert "Done" in (last.content if isinstance(last.content, str) else str(last.content))
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_inject_failure_falls_back_to_default_proving_test_is_load_bearing(tmp_path: Path):
+    """Negative control: if inject does NOT happen (no user in request), and
+    contextvar is empty (no_auto_user), setup_agent must land in default/.
+
+    This proves the positive test is actually load-bearing — i.e. it would
+    have failed before PR #2784, not passed accidentally.
+    """
+    from langgraph.runtime import Runtime
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "fallback-agent", "is_bootstrap": True},
+        request_user_id=None,  # no auth — inject is a no-op
+        thread_id="thread-e2e-2",
+    )
+
+    runtime_ctx = _build_runtime_context("thread-e2e-2", "run-2", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_real_bootstrap_graph("does-not-matter")
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        await graph.ainvoke(
+            {"messages": [HumanMessage(content="Create fallback-agent")]},
+            config=config,
+        )
+
+    default_dir = tmp_path / "users" / "default" / "agents" / "fallback-agent"
+    assert default_dir.exists(), "Negative control failed: even without inject + contextvar, agent did not land in default/. The test infrastructure may not be reproducing the bug condition."
+
+
+# ---------------------------------------------------------------------------
+# L5: Sub-graph runtime propagation (the task tool case)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_subgraph_invocation_preserves_user_id_in_runtime(tmp_path: Path):
+    """When a parent graph invokes a child graph (the pattern used by
+    subagents), parent_runtime.merge() must keep user_id intact.
+
+    We construct a child graph that contains setup_agent and call it from
+    a parent graph's tool. If LangGraph re-creates the Runtime and drops
+    user_id at the sub-graph boundary, this fails.
+    """
+    from langchain.agents import create_agent
+    from langgraph.runtime import Runtime
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    auth_uid = "deadbeef-0000-1111-2222-333344445555"
+
+    # Inner graph: same as the bootstrap flow
+    inner_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {"soul": "# Inner", "description": "subgraph"},
+                        "id": "call_inner_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content="inner done"),
+        ]
+    )
+    inner_graph = create_agent(
+        model=inner_model,
+        tools=[setup_agent],
+        system_prompt="inner",
+    )
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "subgraph-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-3",
+    )
+    runtime_ctx = _build_runtime_context("thread-e2e-3", "run-3", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Direct sub-graph invoke (mimics what a subagent invocation looks like
+        # — distinct ainvoke call, but parent config carries the same runtime).
+        await inner_graph.ainvoke(
+            {"messages": [HumanMessage(content="Create subgraph-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "subgraph-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "subgraph-agent"
+    assert expected_dir.exists()
+    assert not default_dir.exists()
+
+
+# ---------------------------------------------------------------------------
+# L6: Sync tool path through ContextThreadPoolExecutor
+# ---------------------------------------------------------------------------
+
+
+def test_sync_tool_dispatch_through_thread_pool_uses_runtime_context(tmp_path: Path):
+    """setup_agent is a sync function. When dispatched through ToolNode's
+    ContextThreadPoolExecutor, runtime.context must still carry user_id —
+    not via thread-local copy_context (which only carries contextvars), but
+    because it was passed in as the ToolRuntime constructor argument.
+    """
+    from langchain.agents import create_agent
+    from langgraph.runtime import Runtime
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    auth_uid = "11112222-3333-4444-5555-666677778888"
+
+    fake_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {"soul": "# Sync", "description": "sync path"},
+                        "id": "call_sync_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content="sync done"),
+        ]
+    )
+    graph = create_agent(model=fake_model, tools=[setup_agent], system_prompt="sync")
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "sync-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-4",
+    )
+    runtime_ctx = _build_runtime_context("thread-e2e-4", "run-4", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Use SYNC invoke to hit the ContextThreadPoolExecutor path
+        graph.invoke(
+            {"messages": [HumanMessage(content="Create sync-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "sync-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "sync-agent"
+    assert expected_dir.exists()
+    assert not default_dir.exists()
diff --git a/backend/tests/test_setup_agent_http_e2e_real_server.py b/backend/tests/test_setup_agent_http_e2e_real_server.py
new file mode 100644
index 000000000..950d040a0
--- /dev/null
+++ b/backend/tests/test_setup_agent_http_e2e_real_server.py
@@ -0,0 +1,326 @@
+"""Real HTTP end-to-end verification for issue #2862's setup_agent path.
+
+This test drives the **entire** FastAPI gateway through ``starlette.testclient.TestClient``:
+
+  starlette.testclient.TestClient (real ASGI stack)
+    -> AuthMiddleware (real cookie parsing, real JWT decode)
+    -> /api/v1/auth/register endpoint (real password hash + sqlite write)
+    -> /api/threads/{id}/runs/stream endpoint (real start_run config-assembly)
+    -> background asyncio.create_task(run_agent) (real worker, real Runtime)
+    -> langchain.agents.create_agent graph (real, with fake LLM)
+    -> ToolNode dispatch (real)
+    -> setup_agent tool (real file I/O)
+
+The only mock is the LLM (no API key needed). Every layer that participates
+in ``user_id`` propagation — auth, ContextVar, ``inject_authenticated_user_context``,
+``worker._build_runtime_context``, ``Runtime.merge`` — is the real production
+code path. If the chain is broken at any layer, this test fails.
+
+This is what "真实验证" looks like for a server that lives behind authentication:
+register a user, log in (cookie), POST to /runs/stream, wait for the run to
+finish, then read the filesystem.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+from unittest.mock import patch
+
+import pytest
+from _agent_e2e_helpers import FakeToolCallingModel, build_single_tool_call_model
+
+
+def _build_fake_create_chat_model(agent_name: str):
+    """Return a callable matching the real ``create_chat_model`` signature.
+
+    Whenever the lead agent constructs a chat model during the bootstrap flow,
+    we hand it a fake that emits a single setup_agent tool_call on its first
+    turn, then a benign final answer on its second turn.
+    """
+
+    def fake_create_chat_model(*args: Any, **kwargs: Any) -> FakeToolCallingModel:
+        return build_single_tool_call_model(
+            tool_name="setup_agent",
+            tool_args={
+                "soul": f"# Real HTTP E2E SOUL for {agent_name}",
+                "description": "real-http-e2e agent",
+            },
+            tool_call_id="call_real_http_1",
+            final_text=f"Agent {agent_name} created via real HTTP e2e.",
+        )
+
+    return fake_create_chat_model
+
+
+@pytest.fixture
+def isolated_deer_flow_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
+    """Stand up an isolated DeerFlow data root + config under tmp_path.
+
+    - Sets ``DEER_FLOW_HOME`` so paths land under tmp_path, not the real
+      ``.deer-flow`` directory.
+    - Stages a copy of the project's ``config.yaml`` (or ``config.example.yaml``
+      on a fresh CI checkout where ``config.yaml`` is gitignored) and pins
+      ``DEER_FLOW_CONFIG_PATH`` to it, so lifespan boot doesn't depend on the
+      developer's local config layout.
+    - Sets a placeholder OPENAI_API_KEY because the config has
+      ``$OPENAI_API_KEY`` that gets resolved at parse time; the LLM itself is
+      mocked, so any non-empty value works.
+    """
+    home = tmp_path / "deer-flow-home"
+    home.mkdir()
+    monkeypatch.setenv("DEER_FLOW_HOME", str(home))
+    monkeypatch.setenv("OPENAI_API_KEY", "sk-fake-key-not-used-because-llm-is-mocked")
+    monkeypatch.setenv("OPENAI_API_BASE", "https://example.invalid")
+
+    # Hermetic config: do not depend on whether the dev machine has a real
+    # ``config.yaml`` at the repo root. CI's ``actions/checkout`` only ships
+    # ``config.example.yaml`` (and its ``models:`` list is commented out, so
+    # AppConfig validation would reject it). Write a minimal, self-sufficient
+    # config to tmp_path and pin ``DEER_FLOW_CONFIG_PATH`` to it.
+    staged_config = tmp_path / "config.yaml"
+    staged_config.write_text(_MINIMAL_CONFIG_YAML, encoding="utf-8")
+    monkeypatch.setenv("DEER_FLOW_CONFIG_PATH", str(staged_config))
+
+    return home
+
+
+# Minimal config that satisfies AppConfig + LeadAgent's _resolve_model_name.
+# The model `use` path must resolve to a real class for config parsing to
+# succeed; the test patches ``create_chat_model`` on the lead agent module,
+# so the model is never actually instantiated. SandboxConfig.use is required
+# at schema level; LocalSandboxProvider is the only sandbox that runs without
+# Docker.
+_MINIMAL_CONFIG_YAML = """\
+log_level: info
+models:
+  - name: fake-test-model
+    display_name: Fake Test Model
+    use: langchain_openai:ChatOpenAI
+    model: gpt-4o-mini
+    api_key: $OPENAI_API_KEY
+    base_url: $OPENAI_API_BASE
+sandbox:
+  use: deerflow.sandbox.local:LocalSandboxProvider
+agents_api:
+  enabled: true
+database:
+  backend: sqlite
+"""
+
+
+def _reset_process_singletons(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Reset every process-wide cache that would survive across tests.
+
+    This fixture stands up a full FastAPI app + sqlite DB + LangGraph runtime
+    inside ``tmp_path``. To get true per-test isolation we have to invalidate
+    a handful of module-level caches that production normally never resets,
+    so they pick up our test-only ``DEER_FLOW_HOME`` and sqlite path:
+
+    - ``deerflow.config.app_config`` caches the parsed ``config.yaml``.
+    - ``deerflow.config.paths`` caches the ``Paths`` singleton derived from
+      ``DEER_FLOW_HOME`` at first access.
+    - ``deerflow.persistence.engine`` caches the SQLAlchemy engine and
+      session factory after the first call to ``init_engine_from_config``.
+
+    ``raising=False`` keeps the fixture resilient if upstream renames or
+    drops one of these attributes — the test will simply skip that reset
+    instead of failing with a confusing AttributeError, and the next test
+    to call ``get_app_config()``/``get_paths()`` will surface the real
+    incompatibility loudly.
+    """
+    from deerflow.config import app_config as app_config_module
+    from deerflow.config import paths as paths_module
+    from deerflow.persistence import engine as engine_module
+
+    for module, attr in (
+        (app_config_module, "_app_config"),
+        (app_config_module, "_app_config_path"),
+        (app_config_module, "_app_config_mtime"),
+        (paths_module, "_paths_singleton"),
+        (engine_module, "_engine"),
+        (engine_module, "_session_factory"),
+    ):
+        monkeypatch.setattr(module, attr, None, raising=False)
+
+
+@pytest.fixture
+def isolated_app(isolated_deer_flow_home: Path, monkeypatch: pytest.MonkeyPatch):
+    """Build a fresh FastAPI app inside a clean DEER_FLOW_HOME.
+
+    Each test gets its own sqlite DB and checkpoint store under ``tmp_path``,
+    with no cross-test contamination.
+    """
+    _reset_process_singletons(monkeypatch)
+
+    # Re-resolve the config from the test-only DEER_FLOW_HOME and pin its
+    # sqlite path into tmp_path so the lifespan-time engine init lands there.
+    from deerflow.config import app_config as app_config_module
+
+    cfg = app_config_module.get_app_config()
+    cfg.database.sqlite_dir = str(isolated_deer_flow_home / "db")
+
+    from app.gateway.app import create_app
+
+    return create_app()
+
+
+def _drain_stream(response, *, timeout: float = 30.0, max_bytes: int = 4 * 1024 * 1024) -> str:
+    """Consume an SSE response body until the run terminates and return the text.
+
+    Bounded to keep the test fail-fast:
+      - Stops as soon as an ``event: end`` SSE frame is observed (the gateway
+        sends this when the background run finishes — see ``services.format_sse``
+        and ``StreamBridge.publish_end``).
+      - Stops at ``timeout`` seconds wall-clock so a stuck run / runaway heartbeat
+        loop surfaces a real failure instead of hanging pytest.
+      - Stops at ``max_bytes`` so a runaway producer can't OOM the test process.
+    """
+    import time as _time
+
+    deadline = _time.monotonic() + timeout
+    body = b""
+    for chunk in response.iter_bytes():
+        body += chunk
+        if b"event: end" in body:
+            break
+        if len(body) >= max_bytes:
+            break
+        if _time.monotonic() >= deadline:
+            break
+    return body.decode("utf-8", errors="replace")
+
+
+def _wait_for_file(path: Path, *, timeout: float = 10.0) -> bool:
+    """Block until *path* exists or *timeout* elapses.
+
+    The run completes inside ``asyncio.create_task`` after start_run returns,
+    so the test must wait for the background task to flush its writes.
+    """
+    import time as _time
+
+    deadline = _time.monotonic() + timeout
+    while _time.monotonic() < deadline:
+        if path.exists():
+            return True
+        _time.sleep(0.05)
+    return False
+
+
+@pytest.mark.no_auto_user
+def test_real_http_create_agent_lands_in_authenticated_user_dir(
+    isolated_app: Any,
+    isolated_deer_flow_home: Path,
+    monkeypatch: pytest.MonkeyPatch,
+):
+    """The full real-server contract test.
+
+    1. Register a real user via POST /api/v1/auth/register (also auto-logs in)
+    2. POST to /api/threads/{tid}/runs/stream with the **exact** body shape the
+       frontend (LangGraph SDK) sends during the bootstrap flow.
+    3. Wait for the background run to finish.
+    4. Assert SOUL.md exists under users/<authenticated_uid>/agents/<name>/.
+    5. Assert NOTHING exists under users/default/agents/<name>/.
+    """
+    # ``deerflow.agents.lead_agent.agent`` imports ``create_chat_model`` with
+    # ``from deerflow.models import create_chat_model`` at module load time,
+    # rebinding the symbol into its own namespace. So the only patch that
+    # intercepts the call is the bound name on ``lead_agent.agent`` — patching
+    # ``deerflow.models.create_chat_model`` would be too late.
+    agent_name = "real-http-agent"
+
+    from starlette.testclient import TestClient
+
+    with (
+        patch(
+            "deerflow.agents.lead_agent.agent.create_chat_model",
+            new=_build_fake_create_chat_model(agent_name),
+        ),
+        TestClient(isolated_app) as client,
+    ):
+        # --- 1. Register & auto-login ---
+        register = client.post(
+            "/api/v1/auth/register",
+            json={"email": "e2e-user@example.com", "password": "very-strong-password-123"},
+        )
+        assert register.status_code == 201, register.text
+        registered = register.json()
+        auth_uid = registered["id"]
+        # The endpoint sets both access_token (auth) and csrf_token (CSRF Double
+        # Submit Cookie) cookies; the TestClient cookie jar propagates them.
+        assert client.cookies.get("access_token"), "register endpoint must set session cookie"
+        csrf_token = client.cookies.get("csrf_token")
+        assert csrf_token, "register endpoint must set csrf_token cookie"
+
+        # --- 2. Create a thread (require_existing=True on /runs/stream means
+        # we must call POST /api/threads first; the React frontend does the
+        # same via the LangGraph SDK's threads.create) ---
+        import uuid as _uuid
+
+        thread_id = str(_uuid.uuid4())
+        created = client.post(
+            "/api/threads",
+            json={"thread_id": thread_id, "metadata": {}},
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert created.status_code == 200, created.text
+
+        # --- 3. POST /runs/stream with the bootstrap wire format ---
+        # This is the EXACT shape the React frontend sends after PR #2784:
+        #   thread.submit(input, {config, context}) ->
+        #   POST /api/threads/{id}/runs/stream body =
+        #     {assistant_id, input, config, context}
+        body = {
+            "assistant_id": "lead_agent",
+            "input": {
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": (f"The new custom agent name is {agent_name}. Help me design its SOUL.md before saving it."),
+                    }
+                ]
+            },
+            "config": {"recursion_limit": 50},
+            "context": {
+                "agent_name": agent_name,
+                "is_bootstrap": True,
+                "mode": "flash",
+                "thinking_enabled": False,
+                "is_plan_mode": False,
+                "subagent_enabled": False,
+            },
+            "stream_mode": ["values"],
+        }
+        # The /stream endpoint returns SSE; we drain it so the server-side
+        # background task (run_agent) gets to completion before we look at disk.
+        with client.stream(
+            "POST",
+            f"/api/threads/{thread_id}/runs/stream",
+            json=body,
+            headers={"X-CSRF-Token": csrf_token},
+        ) as resp:
+            assert resp.status_code == 200, resp.read().decode()
+            transcript = _drain_stream(resp)
+
+        # Sanity: the stream should have produced at least one event
+        assert "event:" in transcript, f"no SSE events in response: {transcript[:500]!r}"
+
+        # --- 4. Verify filesystem outcome ---
+        expected_dir = isolated_deer_flow_home / "users" / auth_uid / "agents" / agent_name
+        default_dir = isolated_deer_flow_home / "users" / "default" / "agents" / agent_name
+
+        # The setup_agent tool runs inside the background asyncio task spawned
+        # by start_run; SSE-drain typically waits for it, but we add a bounded
+        # poll to be robust against scheduler jitter.
+        assert _wait_for_file(expected_dir / "SOUL.md", timeout=15.0), (
+            "SOUL.md did not appear under users/<auth_uid>/agents/. "
+            f"Expected: {expected_dir / 'SOUL.md'}. "
+            f"tmp tree: {sorted(str(p.relative_to(isolated_deer_flow_home)) for p in isolated_deer_flow_home.rglob('SOUL.md'))}. "
+            f"SSE transcript tail: {transcript[-1000:]!r}"
+        )
+
+        soul_text = (expected_dir / "SOUL.md").read_text()
+        assert agent_name in soul_text, f"unexpected SOUL content: {soul_text!r}"
+
+        # The smoking-gun assertion: the agent must NOT have landed in default/
+        assert not default_dir.exists(), f"REGRESSION: agent landed under users/default/{agent_name} instead of the authenticated user. Default-dir contents: {list(default_dir.rglob('*')) if default_dir.exists() else 'n/a'}"
diff --git a/backend/tests/test_update_agent_e2e_user_isolation.py b/backend/tests/test_update_agent_e2e_user_isolation.py
new file mode 100644
index 000000000..7fa725352
--- /dev/null
+++ b/backend/tests/test_update_agent_e2e_user_isolation.py
@@ -0,0 +1,253 @@
+"""End-to-end verification for update_agent's user_id resolution.
+
+PR #2784 hardened setup_agent to prefer runtime.context["user_id"] over the
+contextvar. update_agent had the same latent gap: it unconditionally called
+get_effective_user_id() at module level, so any scenario where the contextvar
+was unavailable while runtime.context carried user_id (a background task
+scheduled outside the request task, a worker pool that doesn't copy_context,
+checkpoint resume on a different task) would silently route writes to
+users/default/agents/...
+
+These tests are load-bearing under @no_auto_user (contextvar empty):
+
+- The negative-control test confirms the fixture actually puts the tool in
+  the regime where the contextvar fallback would land in users/default/.
+  Without that, the positive test would be vacuously satisfied.
+- The positive test verifies update_agent honours runtime.context["user_id"]
+  injected by inject_authenticated_user_context in the gateway. Before the
+  fix in this PR, this test failed; now it passes.
+"""
+
+from __future__ import annotations
+
+from contextlib import ExitStack
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+from uuid import UUID
+
+import pytest
+import yaml
+from _agent_e2e_helpers import build_single_tool_call_model
+from langchain_core.messages import HumanMessage
+
+from app.gateway.services import (
+    build_run_config,
+    inject_authenticated_user_context,
+    merge_run_context_overrides,
+)
+from deerflow.runtime.runs.worker import _build_runtime_context, _install_runtime_context
+
+
+def _make_request(user_id_str: str | None) -> SimpleNamespace:
+    user = SimpleNamespace(id=UUID(user_id_str), email="alice@local") if user_id_str else None
+    return SimpleNamespace(state=SimpleNamespace(user=user))
+
+
+def _assemble_config(*, body_context: dict | None, request_user_id: str | None, thread_id: str) -> dict:
+    config = build_run_config(thread_id, {"recursion_limit": 50}, None, assistant_id="lead_agent")
+    merge_run_context_overrides(config, body_context)
+    inject_authenticated_user_context(config, _make_request(request_user_id))
+    return config
+
+
+def _seed_existing_agent(tmp_path: Path, user_id: str, agent_name: str, soul: str = "# Original"):
+    """Pre-create an agent on disk for update_agent to overwrite."""
+    agent_dir = tmp_path / "users" / user_id / "agents" / agent_name
+    agent_dir.mkdir(parents=True, exist_ok=True)
+    (agent_dir / "config.yaml").write_text(
+        yaml.dump({"name": agent_name, "description": "old"}, allow_unicode=True),
+        encoding="utf-8",
+    )
+    (agent_dir / "SOUL.md").write_text(soul, encoding="utf-8")
+    return agent_dir
+
+
+def _make_paths_mock(tmp_path: Path):
+    paths = MagicMock()
+    paths.base_dir = tmp_path
+    paths.agent_dir = lambda name: tmp_path / "agents" / name
+    paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
+    return paths
+
+
+def _patch_update_agent_dependencies(tmp_path: Path):
+    """update_agent reads load_agent_config + get_app_config — stub them
+    minimally so the tool can run without a real config file or LLM."""
+    fake_model_cfg = SimpleNamespace(name="fake-model")
+    fake_app_cfg = MagicMock()
+    fake_app_cfg.get_model_config = lambda name: fake_model_cfg if name == "fake-model" else None
+
+    return [
+        patch(
+            "deerflow.tools.builtins.update_agent_tool.get_paths",
+            return_value=_make_paths_mock(tmp_path),
+        ),
+        patch(
+            "deerflow.tools.builtins.update_agent_tool.get_app_config",
+            return_value=fake_app_cfg,
+        ),
+        # load_agent_config (used by update_agent to read existing config) also
+        # reads paths via its own module-level get_paths reference. Patch it too
+        # or the tool returns "Agent does not exist" before touching disk.
+        patch(
+            "deerflow.config.agents_config.get_paths",
+            return_value=_make_paths_mock(tmp_path),
+        ),
+    ]
+
+
+def _build_update_graph(*, soul_payload: str):
+    from langchain.agents import create_agent
+
+    from deerflow.tools.builtins.update_agent_tool import update_agent
+
+    fake_model = build_single_tool_call_model(
+        tool_name="update_agent",
+        tool_args={"soul": soul_payload, "description": "refined"},
+        tool_call_id="call_update_1",
+        final_text="updated",
+    )
+    return create_agent(model=fake_model, tools=[update_agent], system_prompt="updater")
+
+
+# ---------------------------------------------------------------------------
+# Negative control — proves the test environment puts update_agent in the
+# regime where the contextvar fallback would land in default/.
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+def test_update_agent_falls_back_to_default_when_no_inject_and_no_contextvar(tmp_path: Path):
+    """No request.state.user, no contextvar — update_agent must look in
+    users/default/agents/. We seed the file there so the tool succeeds and
+    we know which directory it actually consulted."""
+    from langgraph.runtime import Runtime
+
+    _seed_existing_agent(tmp_path, "default", "fallback-target")
+
+    config = _assemble_config(
+        body_context={"agent_name": "fallback-target"},
+        request_user_id=None,  # no auth, inject is no-op
+        thread_id="thread-update-1",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-1", "run-1", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# Fallback Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        graph.invoke(
+            {"messages": [HumanMessage(content="update fallback-target")]},
+            config=config,
+        )
+
+    soul = (tmp_path / "users" / "default" / "agents" / "fallback-target" / "SOUL.md").read_text()
+    assert soul == "# Fallback Updated", "Sanity: tool should have written under default/"
+
+
+# ---------------------------------------------------------------------------
+# Regression guard — passes on this branch, would fail on main before the fix.
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+def test_update_agent_should_use_runtime_context_user_id_when_contextvar_missing(tmp_path: Path):
+    """update_agent prefers the authenticated user_id carried in
+    runtime.context (placed there by inject_authenticated_user_context)
+    over the contextvar — same contract as setup_agent (PR #2784).
+
+    Before this PR's fix, update_agent unconditionally called
+    get_effective_user_id() and landed in default/ whenever the contextvar
+    was unavailable. This test pins the corrected behaviour.
+    """
+    from langgraph.runtime import Runtime
+
+    auth_uid = "abcdef01-2345-6789-abcd-ef0123456789"
+
+    # Seed the agent in BOTH locations so we can prove which one was opened.
+    auth_dir = _seed_existing_agent(tmp_path, auth_uid, "shared-name", soul="# Auth Original")
+    default_dir = _seed_existing_agent(tmp_path, "default", "shared-name", soul="# Default Original")
+
+    config = _assemble_config(
+        body_context={"agent_name": "shared-name"},
+        request_user_id=auth_uid,
+        thread_id="thread-update-2",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-2", "run-2", config.get("context"), None)
+    assert runtime_ctx["user_id"] == auth_uid, "Pre-condition: inject must have placed user_id into runtime_ctx"
+
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# Auth Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        graph.invoke(
+            {"messages": [HumanMessage(content="update shared-name")]},
+            config=config,
+        )
+
+    auth_soul = (auth_dir / "SOUL.md").read_text()
+    default_soul = (default_dir / "SOUL.md").read_text()
+
+    assert auth_soul == "# Auth Updated", f"REGRESSION: update_agent ignored runtime.context['user_id']={auth_uid!r} and routed the write to users/default/ instead. auth_soul={auth_soul!r}, default_soul={default_soul!r}"
+    assert default_soul == "# Default Original", "REGRESSION: update_agent corrupted the shared default-user agent. It should have written under the authenticated user's path."
+
+
+# ---------------------------------------------------------------------------
+# Positive — when contextvar IS the auth user (the normal HTTP case), things
+# already work. Pin it as a regression guard so future refactors don't
+# accidentally break the contextvar path in pursuit of the runtime-context fix.
+# ---------------------------------------------------------------------------
+
+
+def test_update_agent_uses_contextvar_when_present(tmp_path: Path, monkeypatch):
+    """The normal HTTP case: contextvar is set by auth_middleware. This must
+    keep working regardless of how runtime.context is populated."""
+    from types import SimpleNamespace as _SN
+
+    from deerflow.runtime.user_context import reset_current_user, set_current_user
+
+    auth_uid = "11112222-3333-4444-5555-666677778888"
+    user = _SN(id=auth_uid, email="ctxvar@local")
+
+    _seed_existing_agent(tmp_path, auth_uid, "ctxvar-agent", soul="# Original")
+
+    from langgraph.runtime import Runtime
+
+    config = _assemble_config(
+        body_context={"agent_name": "ctxvar-agent"},
+        request_user_id=auth_uid,
+        thread_id="thread-update-3",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-3", "run-3", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# CtxVar Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        token = set_current_user(user)
+        try:
+            final = graph.invoke(
+                {"messages": [HumanMessage(content="update ctxvar-agent")]},
+                config=config,
+            )
+        finally:
+            reset_current_user(token)
+
+    # surface the tool's reply for debug if it errored
+    tool_replies = [m.content for m in final["messages"] if getattr(m, "type", "") == "tool"]
+    soul = (tmp_path / "users" / auth_uid / "agents" / "ctxvar-agent" / "SOUL.md").read_text()
+    assert soul == "# CtxVar Updated", f"tool replies: {tool_replies}"