mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-15 13:13:45 +00:00
* fix(agents): make update_agent honor runtime.context user_id like setup_agent PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by inject_authenticated_user_context from the auth-validated request) over the contextvar, so an agent created during the bootstrap flow always lands under users/<auth_uid>/agents/<name>. update_agent was left calling get_effective_user_id() unconditionally — the same class of bug that produced issues #2782 / #2862 still applies whenever the contextvar is not available on the executing task (background work, future cross-process drivers, checkpoint resume on a different task). In that regime update_agent silently routes writes to users/default/agents/<name>, corrupting the shared default bucket and losing the user's edit. Extract the resolution policy into a shared resolve_runtime_user_id helper on deerflow.runtime.user_context and route both setup_agent and update_agent through it so the two halves of the lifecycle stay in lockstep. Add load-bearing end-to-end tests that drive a real langchain.agents create_agent graph with a fake LLM, exercising the full pipeline: HTTP wire format -> app.gateway.services.start_run config-assembly -> deerflow.runtime.runs.worker._build_runtime_context -> langchain.agents create_agent graph -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor) -> setup_agent / update_agent The negative-control tests intentionally land in users/default/ to prove the positive tests are actually load-bearing rather than vacuously passing. The new test_update_agent_e2e_user_isolation suite included a test that failed against main and now passes after this fix. * style: ruff format on new e2e tests * test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing test that drives the entire FastAPI gateway through starlette.testclient. TestClient with no mocks above the LLM: - lifespan boots (config, sqlite engine, LangGraph runtime, channels) - POST /api/v1/auth/register (real password hash, real sqlite write, issues access_token + csrf_token cookies) - POST /api/threads (real thread_meta + checkpoint creation) - POST /api/threads/{id}/runs/stream with the exact wire shape the React frontend sends (assistant_id + input + config + context with agent_name/is_bootstrap) - AuthMiddleware -> CSRFMiddleware -> require_permission -> start_run -> inject_authenticated_user_context -> asyncio.create_task(run_agent) -> worker._build_runtime_context -> Runtime injection -> ToolNode dispatch -> real setup_agent - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/ and NOT under users/default/agents/<name>/. DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test never touches the real .deer-flow directory or developer database. The only patch above the LLM boundary is replacing create_chat_model with a fake that emits a single setup_agent tool_call. This is the "真实验证" answer: it reproduces what curl-against-uvicorn would do, minus the network socket layer. * test: address Copilot review on user-isolation e2e tests - Drop "currently expected to FAIL" wording from update_agent e2e docstring and header (Copilot review): the fix is in this PR, the test pins the corrected behaviour rather than driving a future change. - Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to match the test's role on the fixed branch. - Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an early break on the "event: end" SSE frame (Copilot review). Stops the test from hanging on a stuck run or runaway heartbeat loop. - Replace the misleading "patch both module aliases" comment with an explanation of why patching lead_agent.agent.create_chat_model is the only correct target (Copilot review): lead_agent rebinds the symbol into its own namespace at import time, so patching deerflow.models is too late. * test(refactor): address WillemJiang review on user-isolation e2e tests - Extract the duplicated FakeToolCallingModel (and a build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py. All three e2e files now import from the shared module instead of redefining the shim locally. - Convert the manual p.start() / p.stop() try/finally blocks in test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so patch lifecycle is Pythonic and exception-safe. - Lift the isolated_app fixture's private-attribute resets into a named _reset_process_singletons helper with a comment block explaining why each singleton has to be invalidated for true e2e isolation, and why raising=False is intentional. Makes the fragility visible and the intent self-documenting rather than leaving the resets inline as opaque monkeypatch calls. Net change: -59 lines (143 -> 84) across the three test files, with every assertion intact. Full suite remains 69 passed / lint clean. * test(e2e): make real-server test self-supply its config CI's actions/checkout only ships config.example.yaml (the real config.yaml is gitignored), so the production config-discovery search (./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing and the test fails at lifespan boot with FileNotFoundError. The dev-machine run passed only because a local config.yaml happened to exist. Write a minimal AppConfig-valid yaml into tmp_path and pin DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires (a single fake-test-model entry, LocalSandboxProvider, sqlite database). The LLM never gets instantiated because the test patches create_chat_model on the lead agent module, so the api_key/base_url stay placeholders. Verified by hiding the local config.yaml to mirror the CI checkout — the test now passes in both environments.
196 lines
7.5 KiB
Python
196 lines
7.5 KiB
Python
"""Request-scoped user context for user-based authorization.
|
|
|
|
This module holds a :class:`~contextvars.ContextVar` that the gateway's
|
|
auth middleware sets after a successful authentication. Repository
|
|
methods read the contextvar via a sentinel default parameter, letting
|
|
routers stay free of ``user_id`` boilerplate.
|
|
|
|
Three-state semantics for the repository ``user_id`` parameter (the
|
|
consumer side of this module lives in ``deerflow.persistence.*``):
|
|
|
|
- ``_AUTO`` (module-private sentinel, default): read from contextvar;
|
|
raise :class:`RuntimeError` if unset.
|
|
- Explicit ``str``: use the provided value, overriding contextvar.
|
|
- Explicit ``None``: no WHERE clause — used only by migration scripts
|
|
and admin CLIs that intentionally bypass isolation.
|
|
|
|
Dependency direction
|
|
--------------------
|
|
``persistence`` (lower layer) reads from this module; ``gateway.auth``
|
|
(higher layer) writes to it. ``CurrentUser`` is defined here as a
|
|
:class:`typing.Protocol` so that ``persistence`` never needs to import
|
|
the concrete ``User`` class from ``gateway.auth.models``. Any object
|
|
with an ``.id: str`` attribute structurally satisfies the protocol.
|
|
|
|
Asyncio semantics
|
|
-----------------
|
|
``ContextVar`` is task-local under asyncio, not thread-local. Each
|
|
FastAPI request runs in its own task, so the context is naturally
|
|
isolated. ``asyncio.create_task`` and ``asyncio.to_thread`` inherit the
|
|
parent task's context, which is typically the intended behaviour; if
|
|
a background task must *not* see the foreground user, wrap it with
|
|
``contextvars.copy_context()`` to get a clean copy.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
from contextvars import ContextVar, Token
|
|
from typing import Final, Protocol, runtime_checkable
|
|
|
|
|
|
@runtime_checkable
|
|
class CurrentUser(Protocol):
|
|
"""Structural type for the current authenticated user.
|
|
|
|
Any object with an ``.id: str`` attribute satisfies this protocol.
|
|
Concrete implementations live in ``app.gateway.auth.models.User``.
|
|
"""
|
|
|
|
id: str
|
|
|
|
|
|
_current_user: Final[ContextVar[CurrentUser | None]] = ContextVar("deerflow_current_user", default=None)
|
|
|
|
|
|
def set_current_user(user: CurrentUser) -> Token[CurrentUser | None]:
|
|
"""Set the current user for this async task.
|
|
|
|
Returns a reset token that should be passed to
|
|
:func:`reset_current_user` in a ``finally`` block to restore the
|
|
previous context.
|
|
"""
|
|
return _current_user.set(user)
|
|
|
|
|
|
def reset_current_user(token: Token[CurrentUser | None]) -> None:
|
|
"""Restore the context to the state captured by ``token``."""
|
|
_current_user.reset(token)
|
|
|
|
|
|
def get_current_user() -> CurrentUser | None:
|
|
"""Return the current user, or ``None`` if unset.
|
|
|
|
Safe to call in any context. Used by code paths that can proceed
|
|
without a user (e.g. migration scripts, public endpoints).
|
|
"""
|
|
return _current_user.get()
|
|
|
|
|
|
def require_current_user() -> CurrentUser:
|
|
"""Return the current user, or raise :class:`RuntimeError`.
|
|
|
|
Used by repository code that must not be called outside a
|
|
request-authenticated context. The error message is phrased so
|
|
that a caller debugging a stack trace can locate the offending
|
|
code path.
|
|
"""
|
|
user = _current_user.get()
|
|
if user is None:
|
|
raise RuntimeError("repository accessed without user context")
|
|
return user
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Effective user_id helpers (filesystem isolation)
|
|
# ---------------------------------------------------------------------------
|
|
|
|
DEFAULT_USER_ID: Final[str] = "default"
|
|
|
|
|
|
def get_effective_user_id() -> str:
|
|
"""Return the current user's id as a string, or DEFAULT_USER_ID if unset.
|
|
|
|
Unlike :func:`require_current_user` this never raises — it is designed
|
|
for filesystem-path resolution where a valid user bucket is always needed.
|
|
"""
|
|
user = _current_user.get()
|
|
if user is None:
|
|
return DEFAULT_USER_ID
|
|
return str(user.id)
|
|
|
|
|
|
def resolve_runtime_user_id(runtime: object | None) -> str:
|
|
"""Single source of truth for a tool/middleware's effective user_id.
|
|
|
|
Resolution order (most authoritative first):
|
|
1. ``runtime.context["user_id"]`` — set by ``inject_authenticated_user_context``
|
|
in the gateway from the auth-validated ``request.state.user``. This is
|
|
the only source that survives boundaries where the contextvar may have
|
|
been lost (background tasks scheduled outside the request task,
|
|
worker pools that don't copy_context, future cross-process drivers).
|
|
2. The ``_current_user`` ContextVar — set by the auth middleware at
|
|
request entry. Reliable for in-task work; copied by ``asyncio``
|
|
child tasks and by ``ContextThreadPoolExecutor``.
|
|
3. ``DEFAULT_USER_ID`` — last-resort fallback so unauthenticated
|
|
CLI / migration / test paths keep working without raising.
|
|
|
|
Tools that persist user-scoped state (custom agents, memory, uploads)
|
|
MUST call this instead of ``get_effective_user_id()`` directly so they
|
|
benefit from the runtime.context channel that ``setup_agent`` already
|
|
relies on.
|
|
"""
|
|
context = getattr(runtime, "context", None)
|
|
if isinstance(context, dict):
|
|
ctx_user_id = context.get("user_id")
|
|
if ctx_user_id:
|
|
return str(ctx_user_id)
|
|
return get_effective_user_id()
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Sentinel-based user_id resolution
|
|
# ---------------------------------------------------------------------------
|
|
#
|
|
# Repository methods accept a ``user_id`` keyword-only argument that
|
|
# defaults to ``AUTO``. The three possible values drive distinct
|
|
# behaviours; see the docstring on :func:`resolve_user_id`.
|
|
|
|
|
|
class _AutoSentinel:
|
|
"""Singleton marker meaning 'resolve user_id from contextvar'."""
|
|
|
|
_instance: _AutoSentinel | None = None
|
|
|
|
def __new__(cls) -> _AutoSentinel:
|
|
if cls._instance is None:
|
|
cls._instance = super().__new__(cls)
|
|
return cls._instance
|
|
|
|
def __repr__(self) -> str:
|
|
return "<AUTO>"
|
|
|
|
|
|
AUTO: Final[_AutoSentinel] = _AutoSentinel()
|
|
|
|
|
|
def resolve_user_id(
|
|
value: str | None | _AutoSentinel,
|
|
*,
|
|
method_name: str = "repository method",
|
|
) -> str | None:
|
|
"""Resolve the user_id parameter passed to a repository method.
|
|
|
|
Three-state semantics:
|
|
|
|
- :data:`AUTO` (default): read from contextvar; raise
|
|
:class:`RuntimeError` if no user is in context. This is the
|
|
common case for request-scoped calls.
|
|
- Explicit ``str``: use the provided id verbatim, overriding any
|
|
contextvar value. Useful for tests and admin-override flows.
|
|
- Explicit ``None``: no filter — the repository should skip the
|
|
user_id WHERE clause entirely. Reserved for migration scripts
|
|
and CLI tools that intentionally bypass isolation.
|
|
"""
|
|
if isinstance(value, _AutoSentinel):
|
|
user = _current_user.get()
|
|
if user is None:
|
|
raise RuntimeError(f"{method_name} called with user_id=AUTO but no user context is set; pass an explicit user_id, set the contextvar via auth middleware, or opt out with user_id=None for migration/CLI paths.")
|
|
# Coerce to ``str`` at the boundary: ``User.id`` is typed as
|
|
# ``UUID`` for the API surface, but the persistence layer
|
|
# stores ``user_id`` as ``String(64)`` and aiosqlite cannot
|
|
# bind a raw UUID object to a VARCHAR column ("type 'UUID' is
|
|
# not supported"). Honour the documented return type here
|
|
# rather than ripple a type change through every caller.
|
|
return str(user.id)
|
|
return value
|