mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-04-25 11:18:22 +00:00
refactor(runtime): restructure runs module with new execution architecture

Major refactoring of deerflow/runtime/:
- runs/callbacks/ - new callback system (builder, events, title, tokens)
- runs/internal/ - execution internals (executor, supervisor, stream_logic, registry)
- runs/internal/execution/ - execution artifacts and events handling
- runs/facade.py - high-level run facade
- runs/observer.py - run observation protocol
- runs/types.py - type definitions
- runs/store/ - simplified store interfaces (create, delete, query, event)

Refactor stream_bridge/:
- Replace old providers with contract.py and exceptions.py
- Remove async_provider.py, base.py, memory.py

Add documentation:
- README.md and README_zh.md for runtime module

Remove deprecated:
- manager.py moved to internal/
- worker.py, schemas.py
- user_context.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
parent 39a575617b
commit 9d0a42c1fb
594 backend/packages/harness/deerflow/runtime/README.md (new file)
@@ -0,0 +1,594 @@
# deerflow.runtime Design Overview

This document describes the current implementation of `backend/packages/harness/deerflow/runtime`, including its overall design, boundary model, the collaboration between `runs` and `stream_bridge`, how it interacts with external infrastructure and the `app` layer, and how `actor_context` is dynamically injected to provide user isolation.

## 1. Overall Role

`deerflow.runtime` is the runtime kernel layer of DeerFlow.

It sits below agents / tools / middlewares and above app / gateway / infra. Its purpose is to define runtime semantics and boundary contracts, without directly owning web endpoints, ORM models, or concrete infrastructure implementations.

Its public surface is re-exported from [`__init__.py`](backend/packages/harness/deerflow/runtime/__init__.py) and currently exposes four main capability areas:

1. `runs`
   - Run domain types, execution facade, lifecycle observers, and store protocols
2. `stream_bridge`
   - Stream event bridge contract and public stream types
3. `actor_context`
   - Request/task-scoped actor context and user-isolation bridge
4. `serialization`
   - Runtime serialization helpers for LangChain / LangGraph data and outward-facing events

Structurally, the current package looks like:

```text
runtime
├─ runs
│  ├─ facade / types / observer / store
│  ├─ internal/*
│  └─ callbacks/*
├─ stream_bridge
│  ├─ contract
│  └─ exceptions
├─ actor_context
└─ serialization / converters
```

## 2. Overall Design and Constraint Model

### 2.1 Design Goal

The core goal of `runtime` is to decouple runtime control-plane semantics from infrastructure implementations.

It only cares about:

1. What a run is and how run state changes over time
2. What lifecycle events and stream events are produced during execution
3. Which capabilities must be injected from the outside, such as checkpointer, event store, stream bridge, and durable stores
4. Who the current actor is, and how lower layers can use that for isolation

It deliberately does not care about:

1. Whether events are stored in memory, Redis, or another transport
2. How run / thread / feedback data is persisted
3. HTTP / SSE / FastAPI details
4. How the auth plugin resolves the request user

### 2.2 Boundary Rules

The current package has a fairly clear boundary model:

1. `runs` owns execution orchestration, not ORM or SQL writes
2. `stream_bridge` defines stream semantics, not app-level bridge construction
3. `actor_context` defines runtime context, not auth-plugin behavior
4. Durable data enters only through boundary protocols:
   - `RunCreateStore`
   - `RunQueryStore`
   - `RunDeleteStore`
   - `RunEventStore`
5. Lifecycle side effects enter only through `RunObserver`
6. User isolation is not implemented ad hoc in each module; it is propagated through actor context

In one sentence:

`runtime` defines semantics and contracts; `app.infra` provides implementations.

## 3. runs Subsystem Design

### 3.1 Purpose

`runtime/runs` is the run orchestration domain. It is responsible for:

1. Defining run domain objects and status transitions
2. Organizing create / stream / wait / join / cancel / delete behavior
3. Maintaining the in-process runtime control plane
4. Emitting stream events and lifecycle events during execution
5. Collecting trace, token, title, and message data through callbacks

### 3.2 Core Objects

See [`runs/types.py`](backend/packages/harness/deerflow/runtime/runs/types.py).

The most important types are:

1. `RunSpec`
   - Built by the app-side input layer
   - The real execution input
2. `RunRecord`
   - The runtime record managed by `RunRegistry`
3. `RunStatus`
   - `pending`, `starting`, `running`, `success`, `error`, `interrupted`, `timeout`
4. `RunScope`
   - Distinguishes stateful vs stateless execution and temporary thread behavior

### 3.3 Current Constraints

The current implementation explicitly limits some parts of the problem space:

1. `multitask_strategy` currently supports only `reject` and `interrupt` on the main path
2. `enqueue`, `after_seconds`, and batch execution are not on the current primary path
3. `RunRegistry` is an in-process state source, not a durable source of truth
4. External queries may use durable stores, but the live control plane still centers on the in-memory registry

### 3.4 Facade and Internal Components

`RunsFacade` in [`runs/facade.py`](backend/packages/harness/deerflow/runtime/runs/facade.py) provides the unified API:

1. `create_background`
2. `create_and_stream`
3. `create_and_wait`
4. `join_stream`
5. `join_wait`
6. `cancel`
7. `get_run`
8. `list_runs`
9. `delete_run`

Internally it composes:

1. `RunRegistry`
2. `ExecutionPlanner`
3. `RunSupervisor`
4. `RunStreamService`
5. `RunWaitService`
6. `RunCreateStore` / `RunQueryStore` / `RunDeleteStore`
7. `RunObserver`

So `RunsFacade` is the public entry point, while execution and state transitions are distributed across smaller components.

## 4. stream_bridge Design and Implementation

### 4.1 Why stream_bridge Is a Separate Abstraction

`StreamBridge` is defined in [`stream_bridge/contract.py`](backend/packages/harness/deerflow/runtime/stream_bridge/contract.py).

It exists because run execution needs an event channel that is:

1. Subscribable
2. Replayable
3. Terminal-state aware
4. Resume-capable

That behavior must not be hard-coupled to HTTP SSE, in-memory queues, or Redis-specific details.

So:

1. harness defines stream semantics
2. the app layer owns backend selection and implementation

### 4.2 Contract Contents

The abstract `StreamBridge` currently exposes:

1. `publish(run_id, event, data)`
2. `publish_end(run_id)`
3. `publish_terminal(run_id, kind, data)`
4. `subscribe(run_id, last_event_id, heartbeat_interval)`
5. `cleanup(run_id, delay=0)`
6. `cancel(run_id)`
7. `mark_awaiting_input(run_id)`
8. `start()`
9. `close()`

Public types include:

1. `StreamEvent`
2. `StreamStatus`
3. `ResumeResult`
4. `HEARTBEAT_SENTINEL`
5. `END_SENTINEL`
6. `CANCELLED_SENTINEL`

### 4.3 Semantic Boundary

The contract explicitly distinguishes:

1. `end` / `cancel` / `error`
   - Real business-level terminal events for a run
2. `close()`
   - Bridge-level shutdown
   - Not equivalent to run cancellation

### 4.4 Current Implementation Style

The concrete implementation currently used is the app-layer [`MemoryStreamBridge`](backend/app/infra/stream_bridge/adapters/memory.py).

Its design is effectively “one in-memory event log per run”:

1. `_RunStream` stores the event list, offset mapping, status, subscriber count, and awaiting-input state
2. `publish()` generates increasing event IDs and appends to the per-run log
3. `subscribe()` supports replay, heartbeat, resume, and terminal exit
4. `cleanup_loop()` handles:
   - old streams
   - active streams with no publish activity
   - orphan terminal streams
   - TTL expiration
5. `mark_awaiting_input()` extends timeout behavior for HITL flows

The Redis implementation is still only a placeholder in [`RedisStreamBridge`](backend/app/infra/stream_bridge/adapters/redis.py).
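
The “one in-memory event log per run” idea can be reduced to a small sketch. This is a stripped-down stand-in, not the real `MemoryStreamBridge` (which adds heartbeats, async subscription, status tracking, and the cleanup loop); it shows only the monotonic IDs and replay-from-`last_event_id` mechanics:

```python
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _RunStream:
    """Per-run log: events plus a monotonically increasing ID counter."""
    events: list[tuple[int, str, Any]] = field(default_factory=list)
    counter: itertools.count = field(default_factory=lambda: itertools.count(1))
    terminal: bool = False


class MiniMemoryBridge:
    def __init__(self) -> None:
        self._streams: dict[str, _RunStream] = {}

    def publish(self, run_id: str, event: str, data: Any) -> int:
        stream = self._streams.setdefault(run_id, _RunStream())
        event_id = next(stream.counter)          # increasing per-run event ID
        stream.events.append((event_id, event, data))
        return event_id

    def publish_end(self, run_id: str) -> None:
        self.publish(run_id, "end", None)        # business-level terminal event
        self._streams[run_id].terminal = True

    def replay(self, run_id: str, last_event_id: int = 0):
        """Resume support: yield only events after last_event_id."""
        stream = self._streams.get(run_id)
        if stream is None:
            return
        for event_id, event, data in stream.events:
            if event_id > last_event_id:
                yield (event_id, event, data)
```

A reconnecting subscriber passes the last ID it saw and receives only the gap, which is the behavior `subscribe(run_id, last_event_id, ...)` relies on.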

### 4.5 Call Chain

The stream bridge participates in the execution chain like this:

```text
RunsFacade
  -> RunStreamService
  -> StreamBridge
  -> app route converts events to SSE
```

More concretely:

1. `_RunExecution._start()` publishes `metadata`
2. `_RunExecution._stream()` converts agent `astream()` output into bridge events
3. `_RunExecution._finish_success()` / `_finish_failed()` / `_finish_aborted()` publish terminal events
4. `RunWaitService` waits by subscribing for `values`, `error`, or terminal events
5. The app route layer converts those events into outward-facing SSE

### 4.6 Future Extensions

Likely future directions include:

1. A real Redis bridge for cross-process / multi-instance streaming
2. Stronger Last-Event-ID gap recovery behavior
3. Richer HITL state handling
4. Cross-node run coordination and more explicit dead-letter strategies

## 5. External Communication and Store Read/Write Boundaries

### 5.1 Two Main Outward Boundaries

`runtime` does not send HTTP requests directly and does not write ORM models directly; it communicates outward through two main boundaries:

1. `StreamBridge`
   - For outward-facing stream events
2. `store` / `observer`
   - For durable data and lifecycle side effects

### 5.2 Store Boundary Protocols

Under [`runs/store`](backend/packages/harness/deerflow/runtime/runs/store), the harness layer defines:

1. `RunCreateStore`
2. `RunQueryStore`
3. `RunDeleteStore`
4. `RunEventStore`

These are not harness-internal persistence implementations. They are app-facing contracts declared by the runtime.

### 5.3 How the app Layer Supplies Store Implementations

The app layer currently provides:

1. [`AppRunCreateStore`](backend/app/gateway/services/runs/store/create_store.py)
2. [`AppRunQueryStore`](backend/app/gateway/services/runs/store/query_store.py)
3. [`AppRunDeleteStore`](backend/app/gateway/services/runs/store/delete_store.py)
4. [`AppRunEventStore`](backend/app/infra/storage/run_events.py)
5. [`JsonlRunEventStore`](backend/app/infra/run_events/jsonl_store.py)

The shared pattern is:

1. harness depends only on protocols
2. the app layer owns session lifecycle, commit behavior, access control, and backend choice
3. durable data eventually lands in `store.repositories.*` or JSONL files

### 5.4 How Run Lifecycle Data Leaves the Runtime

The single-run executor [`_RunExecution`](backend/packages/harness/deerflow/runtime/runs/internal/execution/executor.py) does not write to the database directly.

It exports data through three paths:

1. bridge events
   - Streamed outward to subscribers
2. callback -> `RunEventStore`
   - Execution trace / message / tool / custom events are persisted in batches
3. lifecycle event -> `RunObserver`
   - Run started, completed, failed, cancelled, and thread-status updates are emitted for app observers

### 5.5 `RunEventStore` Backends

The app-side factory [`app/infra/run_events/factory.py`](backend/app/infra/run_events/factory.py) currently selects:

1. `run_events.backend == "db"`
   - `AppRunEventStore`
2. `run_events.backend == "jsonl"`
   - `JsonlRunEventStore`

So the runtime does not care whether events end up in a database or in files. It only requires the event-store protocol.

## 6. Run Lifecycle Data, Callbacks, Write-Back, and Query Flow

### 6.1 Main Single-Run Flow

The main `_RunExecution.run()` flow is:

1. `_start()`
2. `_prepare()`
3. `_stream()`
4. `_finish_after_stream()`
5. `finally`
   - `_emit_final_thread_status()`
   - `callbacks.flush()`
   - `bridge.cleanup(run_id)`

### 6.2 What the Start Phase Records

`_start()`:

1. sets run status to `running`
2. emits `RUN_STARTED`
3. extracts the first human message and emits `HUMAN_MESSAGE`
4. captures the pre-run checkpoint ID
5. publishes a `metadata` stream event

### 6.3 What the Callbacks Collect

Callbacks live under [`runs/callbacks`](backend/packages/harness/deerflow/runtime/runs/callbacks).

The main ones are:

1. `RunEventCallback`
   - Records `run_start`, `run_end`, `llm_request`, `llm_response`, `tool_start`, `tool_end`, `tool_result`, `custom_event`, and more
   - Flushes batches into `RunEventStore`
2. `RunTokenCallback`
   - Aggregates token usage, LLM call counts, the lead/subagent/middleware token split, message counts, the first human message, and the last AI message
3. `RunTitleCallback`
   - Extracts the thread title from title-middleware output or custom events

### 6.4 How completion_data Is Produced

`RunTokenCallback.completion_data()` yields `RunCompletionData`, including:

1. `total_input_tokens`
2. `total_output_tokens`
3. `total_tokens`
4. `llm_call_count`
5. `lead_agent_tokens`
6. `subagent_tokens`
7. `middleware_tokens`
8. `message_count`
9. `last_ai_message`
10. `first_human_message`

The executor includes this data in lifecycle payloads on success, failure, and cancellation.

### 6.5 How the app Layer Writes Lifecycle Results Back

The executor emits `RunLifecycleEvent` objects through [`RunEventEmitter`](backend/packages/harness/deerflow/runtime/runs/internal/execution/events.py).

The app-layer [`StorageRunObserver`](backend/app/infra/storage/runs.py) then persists durable state:

1. `RUN_STARTED`
   - Marks the run as `running`
2. `RUN_COMPLETED`
   - Writes completion data
   - Syncs the thread title if present
3. `RUN_FAILED`
   - Writes error and completion data
4. `RUN_CANCELLED`
   - Writes `interrupted` state and completion data
5. `THREAD_STATUS_UPDATED`
   - Syncs thread status
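
The observer side of this pattern can be sketched as an event-type dispatch. The class and field names here are illustrative, not the real `RunLifecycleEvent` / `StorageRunObserver` definitions:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RunLifecycleEventType(str, Enum):
    RUN_STARTED = "run_started"
    RUN_COMPLETED = "run_completed"
    RUN_FAILED = "run_failed"


@dataclass
class RunLifecycleEvent:
    type: RunLifecycleEventType
    run_id: str
    payload: dict[str, Any] = field(default_factory=dict)


class RecordingRunObserver:
    """Stand-in for an app-side observer that persists lifecycle state;
    here a dict plays the role of the durable store."""

    def __init__(self) -> None:
        self.runs: dict[str, dict[str, Any]] = {}

    def on_event(self, event: RunLifecycleEvent) -> None:
        record = self.runs.setdefault(event.run_id, {})
        if event.type is RunLifecycleEventType.RUN_STARTED:
            record["status"] = "running"
        elif event.type is RunLifecycleEventType.RUN_COMPLETED:
            record["status"] = "success"
            record.update(event.payload)          # completion data
        elif event.type is RunLifecycleEventType.RUN_FAILED:
            record["status"] = "error"
            record["error"] = event.payload.get("error")
```

The executor only emits events; all database knowledge lives on the observer side, which is exactly the boundary §5.4 describes.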

### 6.6 Query Paths

`RunsFacade.get_run()` and `list_runs()` have two paths:

1. If a `RunQueryStore` is injected, durable state is used first
2. Otherwise, the facade falls back to `RunRegistry`

So:

1. the in-memory registry is the control plane
2. the durable store is the preferred query surface

## 7. How actor_context Is Dynamically Injected for User Isolation

### 7.1 Design Goal

`actor_context` is defined in [`actor_context.py`](backend/packages/harness/deerflow/runtime/actor_context.py).

Its purpose is to let the runtime and lower-level infrastructure modules depend on a stable notion of “who the current actor is” without importing the auth plugin, FastAPI request objects, or a specific user model.

### 7.2 Current Implementation

The current implementation is a request/task-scoped context built on top of `ContextVar`:

1. `ActorContext`
   - Currently carries only `user_id`
2. `_current_actor`
   - A `ContextVar[ActorContext | None]`
3. `bind_actor_context(actor)`
   - Binds the current actor
4. `reset_actor_context(token)`
   - Restores the previous context
5. `get_actor_context()`
   - Returns the current actor
6. `get_effective_user_id()`
   - Returns the current user ID or `DEFAULT_USER_ID`
7. `resolve_user_id(value=AUTO | explicit | None)`
   - Resolves repository/storage-facing user IDs consistently

### 7.3 How the app Layer Injects It Dynamically

Dynamic injection currently happens at the app/auth boundary.

For HTTP request flows:

1. [`app.plugins.auth.security.middleware`](backend/app/plugins/auth/security/middleware.py)
   - Builds `ActorContext(user_id=...)` from the authenticated request user
   - Binds and resets the runtime actor context around request handling
2. [`app.plugins.auth.security.actor_context`](backend/app/plugins/auth/security/actor_context.py)
   - Provides `bind_request_actor_context(request)` and `bind_user_actor_context(user_id)`
   - Allows routes and non-HTTP entry points to bind runtime actor context explicitly

For non-HTTP / external channel flows:

1. [`app/channels/manager.py`](backend/app/channels/manager.py)
2. [`app/channels/feishu.py`](backend/app/channels/feishu.py)

Those entry points also wrap execution with `bind_user_actor_context(user_id)` before they enter runtime-facing code. This matters because:

1. the runtime does not need to distinguish HTTP from Feishu or other channels
2. any entry point that can resolve a user ID can inject the same isolation semantics
3. the same runtime/store/path/memory code can stay protocol-agnostic

So the runtime itself does not know what a request is, and it does not know the auth plugin's user model. It only knows whether an `ActorContext` is currently bound in the `ContextVar`.

### 7.4 Propagation Semantics After Injection

In practice, “dynamic injection” here does not mean manually threading `user_id` through every function signature. The app boundary binds the actor into a `ContextVar`, and runtime-facing code reads it only where isolation is actually needed.

The current semantics are:

1. an entry boundary calls `bind_actor_context(...)`
2. the async call chain created inside that context sees the same actor view
3. the boundary restores the previous value with `reset_actor_context(token)` when the request/task exits

That gives two practical outcomes:

1. most runtime interfaces do not need to carry `user_id` as an explicit parameter through every layer
2. boundaries that do need durable isolation or path isolation can still read explicitly via `resolve_user_id()` or `get_effective_user_id()`

### 7.5 How User Isolation Actually Works

User isolation is implemented through “dynamic injection + boundary-specific reads”.

The main paths are:

1. path / uploads / sandbox / memory
   - Use `get_effective_user_id()` to derive per-user directories and resource scopes
2. app storage adapters
   - Use `resolve_user_id(AUTO)` in `RunStoreAdapter`, `ThreadMetaStorage`, and related boundaries
3. run event store
   - `AppRunEventStore` reads `get_actor_context()` and decides whether the current actor may see a thread

So user isolation is not centralized in a single middleware and then forgotten. Instead:

1. the app boundary dynamically binds the actor into runtime context
2. runtime and lower layers read that context when they need isolation input
3. each boundary applies the user ID according to its own responsibility

### 7.6 Why This Approach Works Well

The current design has several practical strengths:

1. The runtime does not depend on a specific auth implementation
2. HTTP and non-HTTP entry points can reuse the same isolation mechanism
3. The same user ID propagates naturally into paths, memory, store access, and event visibility
4. Where stronger enforcement is needed, `AUTO` + `resolve_user_id()` can require a bound actor context

### 7.7 Future Extensions

`ActorContext` already contains explicit future-extension hints. The current pattern can be extended without changing the architecture:

1. `tenant_id`
   - For multi-tenant isolation
2. `subject_id`
   - For a more stable identity key
3. `scopes`
   - For finer-grained authorization
4. `auth_source`
   - To track the source channel or auth mechanism

The recommended extension model is to preserve the current shape:

1. The app/auth boundary binds a richer `ActorContext`
2. The runtime depends only on abstract context fields, never on request/user objects
3. Lower layers read only the fields they actually need
4. Store / path / sandbox / stream / memory boundaries can gradually become tenant-aware or scope-aware

More concretely, stronger isolation can be added incrementally at the boundaries:

1. store boundaries
   - add `tenant_id` filtering in `RunStoreAdapter`, `ThreadMetaStorage`, and feedback/event stores
2. path and sandbox boundaries
   - shard directories by `tenant_id/user_id` instead of `user_id` alone
3. event-visibility boundaries
   - layer `scopes` or `subject_id` checks into run-event and thread queries
4. external-channel boundaries
   - populate `auth_source` so API, channel, and internal-job traffic can be distinguished

That keeps the runtime dependent on the abstract “current actor context” concept, not on FastAPI request objects or a specific auth implementation.

## 8. Interaction with the app Layer

### 8.1 How the app Layer Wires the Runtime

The app composition root for runs is [`app/gateway/services/runs/facade_factory.py`](backend/app/gateway/services/runs/facade_factory.py).

It assembles:

1. `RunRegistry`
2. `ExecutionPlanner`
3. `RunSupervisor`
4. `RunStreamService`
5. `RunWaitService`
6. `RunsRuntime`
   - `bridge`
   - `checkpointer`
   - `store`
   - `event_store`
   - `agent_factory_resolver`
7. `StorageRunObserver`
8. `AppRunCreateStore`
9. `AppRunQueryStore`
10. `AppRunDeleteStore`

### 8.2 How app.state Provides Infrastructure

In [`app/gateway/registrar.py`](backend/app/gateway/registrar.py):

1. `init_persistence()` creates:
   - `persistence`
   - `checkpointer`
   - `run_store`
   - `thread_meta_storage`
   - `run_event_store`
2. `init_runtime()` creates:
   - `stream_bridge`

Those objects are then attached to `app.state` for dependency injection and facade construction.

### 8.3 The app Boundary for `stream_bridge`

Concrete stream bridge construction now belongs entirely to the app layer:

1. harness exports only the `StreamBridge` contract
2. [`app.infra.stream_bridge.build_stream_bridge`](backend/app/infra/stream_bridge/factory.py) constructs the actual implementation

That is a very explicit boundary:

1. harness defines runtime semantics and interfaces
2. app selects and constructs infrastructure

## 9. Summary

The most accurate one-line summary of `deerflow.runtime` today is:

It is a runtime kernel built around run orchestration, a stream bridge as the streaming boundary, actor context as the dynamic isolation bridge, and store / observer protocols as the durable and side-effect boundaries.

More concretely:

1. `runs` owns orchestration and lifecycle progression
2. `stream_bridge` owns stream semantics
3. `actor_context` owns runtime-scoped user context and isolation bridging
4. `serialization` / `converters` own outward event and message formatting
5. the app layer owns real persistence, stream infrastructure, and auth-driven context injection

The main strengths of this structure are:

1. Runtime semantics are decoupled from infrastructure implementations
2. Request identity is decoupled from runtime logic
3. HTTP, CLI, and channel-worker entry points can reuse the same runtime boundaries
4. The system can grow toward multi-tenancy, cross-process stream bridges, and richer durable backends without changing the core model

The current limitations are also clear:

1. `RunRegistry` is still an in-process control plane
2. The Redis bridge is not implemented yet
3. Some multitask strategies and batch capabilities are still outside the main path
4. `ActorContext` currently carries only `user_id`, not richer fields such as tenant, scopes, or auth source

So the best way to understand the current code is not as a final platform, but as a runtime kernel with clear semantics and extension boundaries.

584 backend/packages/harness/deerflow/runtime/README_zh.md (new file)
@@ -0,0 +1,584 @@
# deerflow.runtime 设计说明
|
||||
|
||||
本文基于当前代码实现,说明 `backend/packages/harness/deerflow/runtime` 的总体设计、约束边界、`stream_bridge` 与 `runs` 的协作方式、与外部基础设施和 `app` 层的交互方式,以及 `actor_context` 如何通过动态注入实现用户隔离。
|
||||
|
||||
## 1. 总体定位
|
||||
|
||||
`deerflow.runtime` 是 DeerFlow 的运行时内核层。它位于 agent / tool / middleware 之下、app / gateway / infra 之上,主要负责定义“运行时语义”和“基础边界契约”,而不直接拥有 Web 接口、数据库模型或具体基础设施实现。
|
||||
|
||||
当前 `runtime` 的公开表面由 [`__init__.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/__init__.py) 统一导出,主要包括四类能力:
|
||||
|
||||
1. `runs`
|
||||
- run 领域类型、执行 façade、生命周期观察者、store 协议
|
||||
2. `stream_bridge`
|
||||
- 流式事件桥接契约与公共类型
|
||||
3. `actor_context`
|
||||
- 请求/任务级的 actor 上下文与用户隔离桥
|
||||
4. `serialization`
|
||||
- 运行时对外事件与 LangChain / LangGraph 数据的序列化能力
|
||||
|
||||
从结构上看,可以把当前 `runtime` 理解成:
|
||||
|
||||
```text
|
||||
runtime
|
||||
├─ runs
|
||||
│ ├─ facade / types / observer / store
|
||||
│ ├─ internal/*
|
||||
│ └─ callbacks/*
|
||||
├─ stream_bridge
|
||||
│ ├─ contract
|
||||
│ └─ exceptions
|
||||
├─ actor_context
|
||||
└─ serialization / converters
|
||||
```
|
||||
|
||||
## 2. 总体设计与约束范式
|
||||
|
||||
### 2.1 设计目标
|
||||
|
||||
`runtime` 当前最核心的设计目标是把“运行时控制面”和“基础设施实现”解耦。
|
||||
|
||||
它自己只关心:
|
||||
|
||||
1. run 是什么、状态如何变化
|
||||
2. 执行时会产出哪些生命周期事件和流式事件
|
||||
3. 哪些能力必须由外部注入,例如 checkpointer、event store、stream bridge、durable store
|
||||
4. 当前 actor 是谁,以及下游如何据此做隔离
|
||||
|
||||
它刻意不关心:
|
||||
|
||||
1. 事件是落到内存、Redis 还是别的消息介质
|
||||
2. run / thread / feedback 是怎么持久化的
|
||||
3. HTTP / SSE / FastAPI 细节
|
||||
4. 认证插件如何识别 request user
|
||||
|
||||
### 2.2 约束边界
|
||||
|
||||
当前 `runtime` 的边界约束比较明确:
|
||||
|
||||
1. `runs` 负责运行编排,不直接写 ORM 或 SQL。
|
||||
2. `stream_bridge` 只定义流语义,不提供 app 级基础设施装配。
|
||||
3. `actor_context` 只定义运行时上下文,不依赖 auth plugin。
|
||||
4. durable 数据只能通过协议边界接入:
|
||||
- `RunCreateStore`
|
||||
- `RunQueryStore`
|
||||
- `RunDeleteStore`
|
||||
- `RunEventStore`
|
||||
5. 生命周期副作用只能通过 `RunObserver` 接入。
|
||||
6. 用户隔离不是散落在每个模块里做,而是通过 actor context 自上而下传递。
|
||||
|
||||
这套范式可以概括成一句话:
|
||||
|
||||
`runtime` 定义语义和边界,`app.infra` 提供实现和装配。
|
||||
|
||||
## 3. runs 子系统的设计
|
||||
|
||||
### 3.1 作用
|
||||
|
||||
`runtime/runs` 是运行编排域。它负责:
|
||||
|
||||
1. 定义 run 的领域对象与状态机
|
||||
2. 组织 create / stream / wait / join / cancel / delete 等操作
|
||||
3. 维护进程内运行控制面
|
||||
4. 在执行期间发出流式事件与生命周期事件
|
||||
5. 通过 callbacks 收集 trace、token、title、message 等运行数据
|
||||
|
||||
### 3.2 核心对象
|
||||
|
||||
见 [`runs/types.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/types.py)。
|
||||
|
||||
关键对象有:
|
||||
|
||||
1. `RunSpec`
|
||||
- 由 app 输入层构建,是执行器输入
|
||||
2. `RunRecord`
|
||||
- 运行中的记录对象,由 `RunRegistry` 管理
|
||||
3. `RunStatus`
|
||||
- `pending` / `starting` / `running` / `success` / `error` / `interrupted` / `timeout`
|
||||
4. `RunScope`
|
||||
- 区分 stateful / stateless 与临时 thread
|
||||
|
||||
### 3.3 当前约束
|
||||
|
||||
当前 `runs` 明确限制了一些能力范围:
|
||||
|
||||
1. `multitask_strategy` 当前主路径只支持 `reject` 和 `interrupt`
|
||||
2. `enqueue`、`after_seconds`、批量执行等尚未进入当前主路径
|
||||
3. `RunRegistry` 是进程内状态,不是 durable source of truth
|
||||
4. 外部查询可以走 durable store,但控制面仍然以内存 registry 为中心
|
||||
|
||||
### 3.4 façade 与内部组件
|
||||
|
||||
`RunsFacade` 在 [`runs/facade.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/facade.py) 中暴露统一入口:
|
||||
|
||||
1. `create_background`
|
||||
2. `create_and_stream`
|
||||
3. `create_and_wait`
|
||||
4. `join_stream`
|
||||
5. `join_wait`
|
||||
6. `cancel`
|
||||
7. `get_run`
|
||||
8. `list_runs`
|
||||
9. `delete_run`
|
||||
|
||||
它底层组合了:
|
||||
|
||||
1. `RunRegistry`
|
||||
2. `ExecutionPlanner`
|
||||
3. `RunSupervisor`
|
||||
4. `RunStreamService`
|
||||
5. `RunWaitService`
|
||||
6. `RunCreateStore` / `RunQueryStore` / `RunDeleteStore`
|
||||
7. `RunObserver`
|
||||
|
||||
也就是说,`RunsFacade` 是 public entry point,而真正的执行和状态推进拆散在内部组件中。
|
||||
|
||||
## 4. Design and Implementation of stream_bridge

### 4.1 Why a Separate Abstraction

`StreamBridge` is defined in [`stream_bridge/contract.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/stream_bridge/contract.py).

The reason for abstracting it separately: during run execution we need an event channel that is subscribable, replayable, terminable, and resumable, and that capability must not be bound directly to HTTP SSE, in-memory queues, or Redis details.

Therefore:

1. The harness defines the streaming semantics
2. The app layer chooses and implements the streaming backend

### 4.2 Contract Surface

`StreamBridge` currently provides these key methods:

1. `publish(run_id, event, data)`
2. `publish_end(run_id)`
3. `publish_terminal(run_id, kind, data)`
4. `subscribe(run_id, last_event_id, heartbeat_interval)`
5. `cleanup(run_id, delay=0)`
6. `cancel(run_id)`
7. `mark_awaiting_input(run_id)`
8. `start()`
9. `close()`

The public types include:

1. `StreamEvent`
2. `StreamStatus`
3. `ResumeResult`
4. `HEARTBEAT_SENTINEL`
5. `END_SENTINEL`
6. `CANCELLED_SENTINEL`
### 4.3 Semantic Boundaries

The contract explicitly distinguishes two kinds of termination semantics:

1. `end` / `cancel` / `error`
   - Real run-level business termination events
2. `close()`
   - Shutdown of the bridge itself
   - Must not be interpreted as the run being cancelled

### 4.4 Current Implementation

The implementation actually in use is the app-layer [`MemoryStreamBridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/adapters/memory.py).

Its design is "one in-memory event log per run":

1. `_RunStream` holds the event list, offset mapping, status, subscriber count, and awaiting-input flag
2. `publish()` generates monotonically increasing event IDs and appends to the per-run log
3. `subscribe()` supports replay, heartbeat, resume, and terminal exit
4. `cleanup_loop()` handles:
   - streams that are too old
   - active streams with no publish for a long time
   - orphaned terminal streams
   - TTL-expired streams
5. `mark_awaiting_input()` extends the timeout for HITL scenarios
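The per-run log with replay can be illustrated with a minimal stand-in. This is a simplification of the described design, not the actual `MemoryStreamBridge` code:

```python
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StreamEvent:
    id: int
    event: str
    data: Any


@dataclass
class RunStreamLog:
    """Simplified per-run event log: monotonically increasing ids plus replay."""

    events: list[StreamEvent] = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def publish(self, event: str, data: Any) -> StreamEvent:
        ev = StreamEvent(id=next(self._ids), event=event, data=data)
        self.events.append(ev)
        return ev

    def replay(self, last_event_id: int = 0) -> list[StreamEvent]:
        # A reconnecting subscriber passes the last id it saw and receives
        # everything published after it.
        return [ev for ev in self.events if ev.id > last_event_id]
```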
The Redis variant currently remains a placeholder in [`RedisStreamBridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/adapters/redis.py).

### 4.5 Call Path

The stream bridge's role in the run path can be summarized as:

```text
RunsFacade
  -> RunStreamService
    -> StreamBridge
      -> app route converts events to SSE
```

More concretely:

1. `_RunExecution._start()` publishes `metadata`
2. `_RunExecution._stream()` converts the agent's `astream()` output into unified bridge events
3. `_RunExecution._finish_success()` / `_finish_failed()` / `_finish_aborted()` publish terminal events
4. `RunWaitService` waits for `values` / `error` / terminal via `subscribe()`
5. The app routing layer then converts these events into outbound SSE

### 4.6 Future Extensions

Several directions for future extension:

1. Land the Redis implementation to support cross-process / multi-instance stream bridging
2. More complete Last-Event-ID gap recovery
3. Finer-grained HITL state management
4. Cross-node run coordination and dead-letter policies
## 5. How It Communicates Externally, and How Stores Read and Write Data

### 5.1 Two Main External Boundaries

`runtime` itself neither issues HTTP requests nor writes ORM models directly, but it interacts with the outside world through two main boundaries:

1. `StreamBridge`
   - Emits streaming run events outward
2. `store` / `observer`
   - Emits durable data and lifecycle side effects outward

### 5.2 Store Boundary Protocols

[`runs/store`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/store) defines four protocols:

1. `RunCreateStore`
2. `RunQueryStore`
3. `RunDeleteStore`
4. `RunEventStore`

These protocols are not an internal data layer of the harness; they are the harness's dependency declarations toward the app layer.
### 5.3 How the App Layer Provides Store Implementations

The app layer currently provides these implementations:

1. [`AppRunCreateStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/create_store.py)
2. [`AppRunQueryStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/query_store.py)
3. [`AppRunDeleteStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/delete_store.py)
4. [`AppRunEventStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/storage/run_events.py)
5. [`JsonlRunEventStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/run_events/jsonl_store.py)

The unified pattern here is:

1. The harness only sees the protocols
2. The app layer decides sessions, commits, access control, and backend selection on its own
3. Durable data ultimately lands in the database through `store.repositories.*`, or on disk as JSONL

### 5.4 How Run Lifecycle Data Is Written Out

The single-run executor [`_RunExecution`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/internal/execution/executor.py) does not write to the database directly.

It writes data out through three channels:

1. Bridge events
   - Published to subscribers as a stream
2. Callback -> `RunEventStore`
   - Execution traces / messages / tool events / custom events land in batches
3. Lifecycle event -> `RunObserver`
   - Run started / completed / failed / cancelled events and thread status updates go to app-layer observers

### 5.5 `RunEventStore` Backends

`RunEventStore` is currently constructed uniformly by the app-layer factory [`app/infra/run_events/factory.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/run_events/factory.py):

1. `run_events.backend == "db"`
   - uses `AppRunEventStore`
2. `run_events.backend == "jsonl"`
   - uses `JsonlRunEventStore`

So `runtime` does not care whether events ultimately land in a database or a file; it only requires support for `put_batch()` and the related read methods.
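The backend dispatch can be sketched as follows. The class bodies are stand-ins; only the backend names and the `put_batch()` requirement come from the text above:

```python
from typing import Any, Protocol


class RunEventStore(Protocol):
    # The runtime only requires put_batch() plus related read methods.
    async def put_batch(self, events: list[dict[str, Any]]) -> None: ...


class AppRunEventStore:
    """Stand-in for the DB-backed store."""

    async def put_batch(self, events: list[dict[str, Any]]) -> None:
        ...


class JsonlRunEventStore:
    """Stand-in for the JSONL file-backed store."""

    async def put_batch(self, events: list[dict[str, Any]]) -> None:
        ...


def build_run_event_store(backend: str) -> RunEventStore:
    # Mirrors the described dispatch in app/infra/run_events/factory.py.
    if backend == "db":
        return AppRunEventStore()
    if backend == "jsonl":
        return JsonlRunEventStore()
    raise ValueError(f"unknown run_events backend: {backend}")
```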
## 6. Run Lifecycle Data, Callbacks, and Query Write-Back

### 6.1 Main Flow of a Single Run

The main flow of `_RunExecution.run()` is:

1. `_start()`
2. `_prepare()`
3. `_stream()`
4. `_finish_after_stream()`
5. `finally`
   - `_emit_final_thread_status()`
   - `callbacks.flush()`
   - `bridge.cleanup(run_id)`
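The `finally` block guarantees that status emission, flush, and cleanup run even when streaming fails. A skeleton under the method names listed above (mirroring the documented steps, not the real executor code):

```python
import asyncio


async def run(execution) -> None:
    # Skeleton of the documented _RunExecution.run() flow; the method names
    # mirror the list above, and `execution` is any object providing them.
    try:
        await execution._start()
        await execution._prepare()
        await execution._stream()
        await execution._finish_after_stream()
    finally:
        # Always runs, even if streaming raised or was cancelled.
        await execution._emit_final_thread_status()
        await execution.callbacks.flush()
        await execution.bridge.cleanup(execution.run_id)
```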
### 6.2 What the Start Phase Records

`_start()` will:

1. Set the run status to `running`
2. Emit `RUN_STARTED`
3. Extract the first human message and emit `HUMAN_MESSAGE`
4. Capture the pre-run checkpoint id
5. Publish the `metadata` stream event

### 6.3 What the Callbacks Collect

The callbacks currently live in [`runs/callbacks`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/callbacks).

There are three main kinds:

1. `RunEventCallback`
   - Records run_start / run_end / llm_request / llm_response / tool_start / tool_end / tool_result / custom_event, etc.
   - Flushes to `RunEventStore` in batches
2. `RunTokenCallback`
   - Aggregates token usage, LLM call count, lead/subagent/middleware tokens, message_count, the first human message, and the last AI message
3. `RunTitleCallback`
   - Extracts the thread title from the title middleware response or a custom event

### 6.4 How completion_data Is Formed

`RunTokenCallback.completion_data()` yields a `RunCompletionData` containing:

1. `total_input_tokens`
2. `total_output_tokens`
3. `total_tokens`
4. `llm_call_count`
5. `lead_agent_tokens`
6. `subagent_tokens`
7. `middleware_tokens`
8. `message_count`
9. `last_ai_message`
10. `first_human_message`

The executor carries this data into the lifecycle payload on completion / failure / cancellation.

### 6.5 How the App Layer Writes Back

The executor emits `RunLifecycleEvent` through [`RunEventEmitter`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/internal/execution/events.py).

The app-layer [`StorageRunObserver`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/storage/runs.py) then writes durable state back according to the event type:

1. `RUN_STARTED`
   - Update the run status to `running`
2. `RUN_COMPLETED`
   - Write completion_data
   - Sync the title into thread metadata
3. `RUN_FAILED`
   - Write the error and completion_data
4. `RUN_CANCELLED`
   - Write the `interrupted` status and completion_data
5. `THREAD_STATUS_UPDATED`
   - Sync the thread status

### 6.6 Query Paths

`RunsFacade.get_run()` / `list_runs()` have two paths:

1. When a `RunQueryStore` is injected, query the durable store first
2. Otherwise fall back to `RunRegistry`

This means:

1. The in-memory registry owns the control plane
2. The durable store owns the externally visible query plane
## 7. How actor_context Is Dynamically Injected to Provide User Isolation

### 7.1 Design Goal

`actor_context` is defined in [`actor_context.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/actor_context.py).

Its goal is to let the runtime and downstream infrastructure modules depend on the runtime fact "who the current actor is" without depending directly on the auth plugin, the FastAPI request, or a concrete user model.

### 7.2 Current Implementation

The current implementation is a request/task-scoped context based on `ContextVar`:

1. `ActorContext`
   - Currently carries only `user_id`
2. `_current_actor`
   - `ContextVar[ActorContext | None]`
3. `bind_actor_context(actor)`
   - Binds the current actor
4. `reset_actor_context(token)`
   - Restores the previous context
5. `get_actor_context()`
   - Returns the current actor
6. `get_effective_user_id()`
   - Returns the current user_id, falling back to `DEFAULT_USER_ID` when unset
7. `resolve_user_id(value=AUTO | explicit | None)`
   - Resolves user_id uniformly at repository / storage boundaries
### 7.3 How the App Injects It Dynamically

The dynamic injection path is currently implemented on the auth plugin side.

HTTP request path:

1. [`app.plugins.auth.security.middleware`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/plugins/auth/security/middleware.py)
   - Builds `ActorContext(user_id=...)` from the authenticated request user
   - Binds / resets the runtime actor context for the duration of request handling
2. [`app.plugins.auth.security.actor_context`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/plugins/auth/security/actor_context.py)
   - Provides `bind_request_actor_context(request)` and `bind_user_actor_context(user_id)`
   - Explicitly binds the runtime actor in routes or non-HTTP entry points

Non-HTTP / external channel path:

1. [`app/channels/manager.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/channels/manager.py)
2. [`app/channels/feishu.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/channels/feishu.py)

Before handing external messages to the runtime, these entry points also wrap execution with `bind_user_actor_context(user_id)`. The point of doing this:

1. The runtime does not distinguish whether a request came from HTTP, Feishu, or another channel
2. As long as the entry point can resolve a user_id, the same isolation semantics can be injected
3. The same runtime/store/path/memory code never needs to know the upstream protocol

So the runtime has no idea what a request is or what the auth plugin's user model looks like; it only knows whether an `ActorContext` is bound in the current `ContextVar`.

### 7.4 Propagation Semantics After Injection

"Dynamic injection" here does not mean threading `user_id` down through every function signature. Instead, the app boundary binds the actor into a `ContextVar`, and runtime code within the current request/task context reads it on demand.

The current semantics can be understood as:

1. The entry boundary calls `bind_actor_context(...)` first
2. Async call chains created within that context share the same actor view
3. When the request ends or the task exits, `reset_actor_context(token)` restores the previous state

This has two direct effects:

1. Most interfaces along the run path do not need `user_id` stuffed into every function signature
2. Boundaries that genuinely need durable or path isolation can still fetch the value explicitly via `resolve_user_id()` / `get_effective_user_id()`
### 7.5 How User Isolation Takes Effect

User isolation is currently achieved through "dynamic injection + uniform reads downstream".

The key paths:

1. path / uploads / sandbox / memory
   - `get_effective_user_id()` carries the user_id into path resolution and directory isolation
2. app storage adapters
   - `resolve_user_id(AUTO)` isolates queries and writes in `RunStoreAdapter`, `ThreadMetaStorage`, and similar places
3. run event store
   - `AppRunEventStore` reads `get_actor_context()` to decide whether the current actor may see a given thread

In other words, user isolation is not done "all at once" by a single middleware. Instead:

1. The app boundary dynamically binds the actor into the runtime context
2. The runtime and its downstream modules read that context when needed
3. Each boundary decides, within its own responsibility, how to use the user_id

### 7.6 Advantages of This Approach

The current design has several clear advantages:

1. The runtime does not depend on a concrete auth implementation
2. HTTP and non-HTTP entry points reuse the same isolation mechanism
3. The user_id flows naturally into paths, memory, stores, event visibility, and other boundaries
4. When strong guarantees are needed, `AUTO` + `resolve_user_id()` can force an actor context to exist

### 7.7 Future Extensions

The `ActorContext` file already reserves extension-point comments, so it can be extended later without breaking the current pattern:

1. `tenant_id`
   - For multi-tenant isolation
2. `subject_id`
   - For a more stable subject identifier
3. `scopes`
   - For finer-grained authorization
4. `auth_source`
   - For recording the source channel

The recommended extension path keeps the existing pattern unchanged:

1. The app/auth boundary keeps the responsibility of binding a richer `ActorContext`
2. The runtime depends only on abstract context fields, never on request/user objects
3. Downstream infrastructure modules read only the fields they need
4. Tenant-aware or scope-aware behavior is introduced gradually at the store / path / sandbox / stream / memory boundaries

More concretely, for multi-tenancy and stronger isolation, extend boundary by boundary:

1. Store boundary
   - Introduce `tenant_id` filtering in `RunStoreAdapter`, `ThreadMetaStorage`, and the feedback/event stores
2. Path and sandbox boundary
   - Extend directory sharding from `user_id` to `tenant_id/user_id`
3. Event visibility boundary
   - Layer `scopes` or `subject_id` onto run event queries and thread queries
4. External channel boundary
   - Populate `auth_source` per origin to distinguish API / channel / internal jobs

This way the runtime keeps depending only on the abstract "current actor context" and never couples back to a FastAPI request or a specific auth implementation.
## 8. Interaction with the App Layer

### 8.1 How the App Assembles the Runtime

The app layer currently assembles `RunsFacade` in [`app/gateway/services/runs/facade_factory.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/facade_factory.py).

It wires together:

1. `RunRegistry`
2. `ExecutionPlanner`
3. `RunSupervisor`
4. `RunStreamService`
5. `RunWaitService`
6. `RunsRuntime`
   - `bridge`
   - `checkpointer`
   - `store`
   - `event_store`
   - `agent_factory_resolver`
7. `StorageRunObserver`
8. `AppRunCreateStore`
9. `AppRunQueryStore`
10. `AppRunDeleteStore`

### 8.2 How app.state Provides the Infrastructure

In [`app/gateway/registrar.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/registrar.py):

1. `init_persistence()` creates:
   - `persistence`
   - `checkpointer`
   - `run_store`
   - `thread_meta_storage`
   - `run_event_store`
2. `init_runtime()` creates:
   - `stream_bridge`

These objects are then attached to `app.state` for dependency injection and façade construction.

### 8.3 The App Boundary of `stream_bridge`

Assembly of the concrete stream bridge now belongs entirely to the app layer:

1. The harness exports only the `StreamBridge` contract
2. The concrete implementation is built by [`app.infra.stream_bridge.build_stream_bridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/factory.py)

This boundary is clear-cut:

1. The harness defines run semantics and interfaces
2. The app selects and constructs the infrastructure implementation
## 9. Design Summary

The current `deerflow.runtime` can be summarized in one sentence:

It is a runtime kernel layer with run orchestration at its core, the stream bridge as its streaming boundary, the actor context as its dynamic isolation bridge, and store / observer as its durable-data and side-effect boundaries.

More concretely:

1. `runs` handles orchestration and lifecycle progression
2. `stream_bridge` handles streaming semantics
3. `actor_context` handles the runtime user context and the isolation bridge
4. `serialization` / `converters` handle conversion of outbound events and message formats
5. The app layer, via infra, handles actual persistence, streaming infrastructure, and auth injection

The strengths of this structure:

1. Run semantics are decoupled from infrastructure implementations
2. Request identity is decoupled from runtime logic
3. HTTP, CLI, channel workers, and other entry points can all reuse the same runtime boundaries
4. It can smoothly extend to multi-tenancy, cross-process stream bridges, and more durable backends

The main current limitations are equally clear:

1. `RunRegistry` is still an in-process control plane
2. The Redis bridge has not landed yet
3. Some multitask strategies and batch capabilities are not yet on the main path
4. `actor_context` currently carries only `user_id`, with no tenant / scopes / auth_source or other richer context

So the most accurate way to read the current state is not "a finished platform" but "a runtime kernel that already has clear semantics and extension boundaries".
@@ -5,42 +5,98 @@ Re-exports the public API of :mod:`~deerflow.runtime.runs` and
directly from ``deerflow.runtime``.
"""

from .checkpointer import checkpointer_context, get_checkpointer, make_checkpointer, reset_checkpointer
from .runs import ConflictError, DisconnectMode, RunContext, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
from .runs import (
    CallbackObserver,
    CompositeObserver,
    CancelAction,
    LifecycleEventType,
    NullObserver,
    ObserverBinding,
    ObserverLike,
    RunEventCallback,
    RunCreateStore,
    RunDeleteStore,
    RunEventStore,
    RunManager,
    RunRecord,
    RunQueryStore,
    RunScope,
    RunSpec,
    RunLifecycleEvent,
    RunObserver,
    RunResult,
    RunsFacade,
    RunStatus,
    WaitResult,
    ensure_observer,
)
from .actor_context import (
    AUTO,
    ActorContext,
    DEFAULT_USER_ID,
    bind_actor_context,
    get_actor_context,
    get_effective_user_id,
    require_actor_context,
    reset_actor_context,
    resolve_user_id,
)
from .serialization import serialize, serialize_channel_values, serialize_lc_object, serialize_messages_tuple
from .store import get_store, make_store, reset_store, store_context
from .stream_bridge import END_SENTINEL, HEARTBEAT_SENTINEL, MemoryStreamBridge, StreamBridge, StreamEvent, make_stream_bridge
from .stream_bridge import (
    CANCELLED_SENTINEL,
    END_SENTINEL,
    HEARTBEAT_SENTINEL,
    StreamBridge,
    StreamEvent,
    StreamStatus,
)

__all__ = [
    # checkpointer
    "checkpointer_context",
    "get_checkpointer",
    "make_checkpointer",
    "reset_checkpointer",
    # runs
    "ConflictError",
    "DisconnectMode",
    "RunContext",
    # runs - hooks
    "RunsFacade",
    "RunCreateStore",
    "RunDeleteStore",
    "RunEventStore",
    "RunManager",
    "RunQueryStore",
    "CallbackObserver",
    "CompositeObserver",
    "ensure_observer",
    "LifecycleEventType",
    "NullObserver",
    "ObserverBinding",
    "ObserverLike",
    "RunEventCallback",
    "RunLifecycleEvent",
    "RunObserver",
    "RunResult",
    # runs - types
    "CancelAction",
    "RunScope",
    "RunSpec",
    "WaitResult",
    "RunRecord",
    "RunStatus",
    "UnsupportedStrategyError",
    "run_agent",
    # actor context
    "AUTO",
    "ActorContext",
    "DEFAULT_USER_ID",
    "bind_actor_context",
    "get_actor_context",
    "get_effective_user_id",
    "require_actor_context",
    "reset_actor_context",
    "resolve_user_id",
    # serialization
    "serialize",
    "serialize_channel_values",
    "serialize_lc_object",
    "serialize_messages_tuple",
    # store
    "get_store",
    "make_store",
    "reset_store",
    "store_context",
    # stream_bridge
    "CANCELLED_SENTINEL",
    "END_SENTINEL",
    "HEARTBEAT_SENTINEL",
    "MemoryStreamBridge",
    "StreamBridge",
    "StreamEvent",
    "make_stream_bridge",
    "StreamStatus",
]
117
backend/packages/harness/deerflow/runtime/actor_context.py
Normal file
@@ -0,0 +1,117 @@
"""Request/task-scoped actor context for runtime-facing user isolation.

This module defines a runtime-owned context bridge that lower layers can
depend on without importing the auth plugin. The app/auth boundary maps
``request.user`` into :class:`ActorContext` and binds it before entering
runtime-facing code.
"""

from __future__ import annotations

from contextvars import ContextVar, Token
from dataclasses import dataclass
from typing import Final


@dataclass(frozen=True)
class ActorContext:
    user_id: str | None = None
    # Future extension points:
    # subject_id: str | None = None
    # tenant_id: str | None = None
    # scopes: frozenset[str] = frozenset()
    # auth_source: str | None = None


_current_actor: Final[ContextVar[ActorContext | None]] = ContextVar(
    "deerflow_actor_context",
    default=None,
)


def bind_actor_context(actor: ActorContext) -> Token[ActorContext | None]:
    """Bind the current actor for this async task."""

    return _current_actor.set(actor)


def reset_actor_context(token: Token[ActorContext | None]) -> None:
    """Restore the actor context captured by ``token``."""

    _current_actor.reset(token)


def get_actor_context() -> ActorContext | None:
    """Return the current actor context, or ``None`` if unset."""

    return _current_actor.get()


def require_actor_context() -> ActorContext:
    """Return the current actor context, or raise if unset."""

    actor = _current_actor.get()
    if actor is None:
        raise RuntimeError("runtime accessed without actor context")
    return actor


DEFAULT_USER_ID: Final[str] = "default"


def get_effective_user_id() -> str:
    """Return the effective user id, or ``DEFAULT_USER_ID`` if unset."""

    actor = _current_actor.get()
    if actor is None or actor.user_id is None:
        return DEFAULT_USER_ID
    return str(actor.user_id)


class _AutoSentinel:
    """Singleton marker meaning 'resolve user_id from actor context'."""

    _instance: _AutoSentinel | None = None

    def __new__(cls) -> _AutoSentinel:
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "<AUTO>"


AUTO: Final[_AutoSentinel] = _AutoSentinel()


def resolve_user_id(
    value: str | None | _AutoSentinel,
    *,
    method_name: str = "repository method",
) -> str | None:
    """Resolve a repository ``user_id`` argument against the current actor."""

    if isinstance(value, _AutoSentinel):
        actor = _current_actor.get()
        if actor is None or actor.user_id is None:
            raise RuntimeError(
                f"{method_name} called with user_id=AUTO but no actor context is set; "
                "pass an explicit user_id, bind ActorContext at the app/runtime boundary, "
                "or opt out with user_id=None for migration/CLI paths."
            )
        return str(actor.user_id)
    return value


__all__ = [
    "AUTO",
    "ActorContext",
    "DEFAULT_USER_ID",
    "bind_actor_context",
    "get_actor_context",
    "get_effective_user_id",
    "require_actor_context",
    "reset_actor_context",
    "resolve_user_id",
]
@@ -1,6 +1,6 @@
"""Pure functions to convert LangChain message objects to OpenAI Chat Completions format.

Used by RunJournal to build content dicts for event storage.
Used by run callbacks to build content dicts for event storage.
"""

from __future__ import annotations

@@ -1,16 +1,48 @@
"""Run lifecycle management for LangGraph Platform API compatibility."""
"""Public runs API."""

from .manager import ConflictError, RunManager, RunRecord, UnsupportedStrategyError
from .schemas import DisconnectMode, RunStatus
from .worker import RunContext, run_agent
from .facade import RunsFacade
from .internal.manager import RunManager
from .observer import (
    CallbackObserver,
    CompositeObserver,
    LifecycleEventType,
    NullObserver,
    ObserverBinding,
    ObserverLike,
    RunEventCallback,
    RunLifecycleEvent,
    RunObserver,
    RunResult,
    ensure_observer,
)
from .store import RunCreateStore, RunDeleteStore, RunEventStore, RunQueryStore
from .types import CancelAction, RunRecord, RunScope, RunSpec, RunStatus, WaitResult

__all__ = [
    "ConflictError",
    "DisconnectMode",
    "RunContext",
    # facade
    "RunsFacade",
    "RunManager",
    "RunCreateStore",
    "RunDeleteStore",
    "RunEventStore",
    "RunQueryStore",
    # hooks
    "CallbackObserver",
    "CompositeObserver",
    "LifecycleEventType",
    "NullObserver",
    "ObserverBinding",
    "ObserverLike",
    "RunEventCallback",
    "RunLifecycleEvent",
    "RunObserver",
    "RunResult",
    "ensure_observer",
    # types
    "CancelAction",
    "RunRecord",
    "RunScope",
    "RunSpec",
    "WaitResult",
    "RunStatus",
    "UnsupportedStrategyError",
    "run_agent",
]
@@ -0,0 +1,15 @@
"""Runs execution callbacks."""

from .builder import RunCallbackArtifacts, build_run_callbacks
from .events import RunEventCallback
from .title import RunTitleCallback
from .tokens import RunCompletionData, RunTokenCallback

__all__ = [
    "RunCallbackArtifacts",
    "RunCompletionData",
    "RunEventCallback",
    "RunTitleCallback",
    "RunTokenCallback",
    "build_run_callbacks",
]
@@ -0,0 +1,138 @@
"""Callback assembly for runs execution."""

from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler

from ..store import RunEventStore
from ..types import RunRecord
from .events import RunEventCallback
from .title import RunTitleCallback
from .tokens import RunCompletionData, RunTokenCallback


@dataclass
class RunCallbackArtifacts:
    """Callbacks plus handles used by the executor after callbacks run."""

    callbacks: list[BaseCallbackHandler]
    event_callback: RunEventCallback | None = None
    token_callback: RunTokenCallback | None = None
    title_callback: RunTitleCallback | None = None

    async def flush(self) -> None:
        for callback in self.callbacks:
            flush = getattr(callback, "flush", None)
            if flush is None:
                continue
            result = flush()
            if hasattr(result, "__await__"):
                await result

    def completion_data(self) -> RunCompletionData:
        if self.token_callback is None:
            return RunCompletionData()
        return self.token_callback.completion_data()

    def title(self) -> str | None:
        if self.title_callback is None:
            return None
        return self.title_callback.title()


def build_run_callbacks(
    *,
    record: RunRecord,
    graph_input: dict[str, Any],
    event_store: RunEventStore | None,
    existing_callbacks: Iterable[BaseCallbackHandler] = (),
) -> RunCallbackArtifacts:
    """Build execution callbacks for a run.

    Reference callbacks are intentionally not assembled here yet; they remain
    in the existing artifacts path until that integration is migrated.
    """
    callbacks = list(existing_callbacks)

    event_callback = None
    if event_store is not None:
        event_callback = RunEventCallback(
            run_id=record.run_id,
            thread_id=record.thread_id,
            event_store=event_store,
        )
        callbacks.append(event_callback)

    token_callback = RunTokenCallback(track_token_usage=True)
    _set_first_human_message(token_callback, graph_input)
    callbacks.append(token_callback)

    title_callback = RunTitleCallback()
    callbacks.append(title_callback)

    return RunCallbackArtifacts(
        callbacks=callbacks,
        event_callback=event_callback,
        token_callback=token_callback,
        title_callback=title_callback,
    )


def _set_first_human_message(token_callback: RunTokenCallback, graph_input: dict[str, Any]) -> None:
    messages = graph_input.get("messages")
    if not isinstance(messages, list) or not messages:
        return

    first = messages[0]
    content = _extract_first_human_text(first)
    if content:
        token_callback.set_first_human_message(content)


def _extract_first_human_text(message: Any) -> str | None:
    if isinstance(message, str):
        return message

    content = getattr(message, "content", None)
    if content is not None:
        return _extract_text_content(content)

    if isinstance(message, dict):
        return _extract_text_content(message.get("content"))

    return None


def _extract_text_content(content: Any) -> str | None:
    if isinstance(content, str):
        return content

    if isinstance(content, list):
        parts: list[str] = []
        for item in content:
            if isinstance(item, str):
                parts.append(item)
                continue
            if not isinstance(item, dict):
                continue
            if item.get("type") == "text" and isinstance(item.get("text"), str):
                parts.append(item["text"])
                continue
            if isinstance(item.get("content"), str):
                parts.append(item["content"])
        joined = "".join(parts).strip()
        return joined or None

    if isinstance(content, dict):
        text = content.get("text")
        if isinstance(text, str):
            return text
        nested = content.get("content")
        if isinstance(nested, str):
            return nested

    return None
@@ -0,0 +1,353 @@
"""Run execution event recording callback."""

from __future__ import annotations

import asyncio
import logging
import time
from datetime import UTC, datetime
from typing import Any
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import HumanMessage

from deerflow.runtime.converters import langchain_messages_to_openai, langchain_to_openai_completion

from ..store import RunEventStore

logger = logging.getLogger(__name__)


class RunEventCallback(BaseCallbackHandler):
    """Capture LangChain execution events into the run event store."""

    def __init__(
        self,
        *,
        run_id: str,
        thread_id: str,
        event_store: RunEventStore,
        flush_threshold: int = 5,
        max_trace_content: int = 10240,
    ) -> None:
        super().__init__()
        self.run_id = run_id
        self.thread_id = thread_id
        self._store = event_store
        self._flush_threshold = flush_threshold
        self._max_trace_content = max_trace_content
        self._buffer: list[dict[str, Any]] = []
        self._llm_start_times: dict[str, float] = {}
        self._llm_call_index = 0
        self._cached_prompts: dict[str, list[dict[str, Any]]] = {}
        self._tool_call_ids: dict[str, str] = {}
        self._human_message_recorded = False

    def on_chain_start(self, serialized: dict, inputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
        if kwargs.get("parent_run_id") is not None:
            return
        self._put(
            event_type="run_start",
            category="lifecycle",
            metadata={"input_preview": str(inputs)[:500]},
        )

    def on_chain_end(self, outputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
        if kwargs.get("parent_run_id") is not None:
            return
        self._put(event_type="run_end", category="lifecycle", metadata={"status": "success"})
        self._flush_sync()

    def on_chain_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
        if kwargs.get("parent_run_id") is not None:
            return
        self._put(
            event_type="run_error",
            category="lifecycle",
            content=str(error),
            metadata={"error_type": type(error).__name__},
        )
        self._flush_sync()

    def on_chat_model_start(self, serialized: dict, messages: list[list], *, run_id: UUID, **kwargs: Any) -> None:
        rid = str(run_id)
        self._llm_start_times[rid] = time.monotonic()
        self._llm_call_index += 1

        prompt_msgs = messages[0] if messages else []
        openai_msgs = langchain_messages_to_openai(prompt_msgs)
        self._cached_prompts[rid] = openai_msgs
        caller = self._identify_caller(kwargs)

        self._record_first_human_message(prompt_msgs, caller=caller)

        self._put(
            event_type="llm_request",
            category="trace",
            content={"model": serialized.get("name", ""), "messages": openai_msgs},
            metadata={
                "caller": caller,
                "llm_call_index": self._llm_call_index,
            },
        )

    def on_llm_start(self, serialized: dict, prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None:
        self._llm_start_times[str(run_id)] = time.monotonic()

    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
        try:
            message = response.generations[0][0].message
        except (IndexError, AttributeError):
            logger.debug("on_llm_end: could not extract message from response")
            return

        rid = str(run_id)
        start = self._llm_start_times.pop(rid, None)
        latency_ms = int((time.monotonic() - start) * 1000) if start else None
        usage = dict(getattr(message, "usage_metadata", None) or {})
        caller = self._identify_caller(kwargs)

        call_index = self._llm_call_index
        if rid not in self._cached_prompts:
            self._llm_call_index += 1
            call_index = self._llm_call_index
        self._cached_prompts.pop(rid, None)

        self._put(
            event_type="llm_response",
            category="trace",
            content=langchain_to_openai_completion(message),
            metadata={
                "caller": caller,
                "usage": usage,
                "latency_ms": latency_ms,
                "llm_call_index": call_index,
            },
        )

        content = getattr(message, "content", "")
        tool_calls = getattr(message, "tool_calls", None) or []
        if caller != "lead_agent":
            return
        if tool_calls:
            self._put(
                event_type="ai_tool_call",
                category="message",
                content=message.model_dump(),
                metadata={"finish_reason": "tool_calls"},
            )
        elif isinstance(content, str) and content:
            self._put(
                event_type="ai_message",
                category="message",
                content=message.model_dump(),
                metadata={"finish_reason": "stop"},
            )

    def on_llm_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
|
||||
self._llm_start_times.pop(str(run_id), None)
|
||||
self._put(event_type="llm_error", category="trace", content=str(error))
|
||||
|
||||
def on_tool_start(self, serialized: dict, input_str: str, *, run_id: UUID, **kwargs: Any) -> None:
|
||||
tool_call_id = kwargs.get("tool_call_id")
|
||||
if tool_call_id:
|
||||
self._tool_call_ids[str(run_id)] = tool_call_id
|
||||
self._put(
|
||||
event_type="tool_start",
|
||||
category="trace",
|
||||
metadata={
|
||||
"tool_name": serialized.get("name", ""),
|
||||
"tool_call_id": tool_call_id,
|
||||
"args": str(input_str)[:2000],
|
||||
},
|
||||
)
|
||||
|
||||
def on_tool_end(self, output: Any, *, run_id: UUID, **kwargs: Any) -> None:
|
||||
from langchain_core.messages import ToolMessage
|
||||
|
||||
if isinstance(output, ToolMessage):
|
||||
tool_call_id = output.tool_call_id or kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
|
||||
tool_name = output.name or kwargs.get("name", "")
|
||||
status = getattr(output, "status", "success") or "success"
|
||||
content_str = output.content if isinstance(output.content, str) else str(output.content)
|
||||
msg_content = output.model_dump()
|
||||
if msg_content.get("tool_call_id") != tool_call_id:
|
||||
msg_content["tool_call_id"] = tool_call_id
|
||||
else:
|
||||
tool_call_id = kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
|
||||
tool_name = kwargs.get("name", "")
|
||||
status = "success"
|
||||
content_str = str(output)
|
||||
msg_content = ToolMessage(
|
||||
content=content_str,
|
||||
tool_call_id=tool_call_id or "",
|
||||
name=tool_name,
|
||||
status=status,
|
||||
).model_dump()
|
||||
|
||||
self._put(
|
||||
event_type="tool_end",
|
||||
category="trace",
|
||||
content=content_str,
|
||||
metadata={
|
||||
"tool_name": tool_name,
|
||||
"tool_call_id": tool_call_id,
|
||||
"status": status,
|
||||
},
|
||||
)
|
||||
self._put(
|
||||
event_type="tool_result",
|
||||
category="message",
|
||||
content=msg_content,
|
||||
metadata={"tool_name": tool_name, "status": status},
|
||||
)
|
||||
|
||||
def on_tool_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
|
||||
from langchain_core.messages import ToolMessage
|
||||
|
||||
tool_call_id = kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
|
||||
tool_name = kwargs.get("name", "")
|
||||
self._put(
|
||||
event_type="tool_error",
|
||||
category="trace",
|
||||
content=str(error),
|
||||
metadata={"tool_name": tool_name, "tool_call_id": tool_call_id},
|
||||
)
|
||||
self._put(
|
||||
event_type="tool_result",
|
||||
category="message",
|
||||
content=ToolMessage(
|
||||
content=str(error),
|
||||
tool_call_id=tool_call_id or "",
|
||||
name=tool_name,
|
||||
status="error",
|
||||
).model_dump(),
|
||||
metadata={"tool_name": tool_name, "status": "error"},
|
||||
)
|
||||
|
||||
def on_custom_event(self, name: str, data: Any, *, run_id: UUID, **kwargs: Any) -> None:
|
||||
from deerflow.runtime.serialization import serialize_lc_object
|
||||
|
||||
if name == "summarization":
|
||||
data_dict = data if isinstance(data, dict) else {}
|
||||
self._put(
|
||||
event_type="summarization",
|
||||
category="trace",
|
||||
content=data_dict.get("summary", ""),
|
||||
metadata={
|
||||
"replaced_message_ids": data_dict.get("replaced_message_ids", []),
|
||||
"replaced_count": data_dict.get("replaced_count", 0),
|
||||
},
|
||||
)
|
||||
self._put(
|
||||
event_type="middleware:summarize",
|
||||
category="middleware",
|
||||
content={"role": "system", "content": data_dict.get("summary", "")},
|
||||
metadata={"replaced_count": data_dict.get("replaced_count", 0)},
|
||||
)
|
||||
return
|
||||
|
||||
event_data = serialize_lc_object(data) if not isinstance(data, dict) else data
|
||||
self._put(
|
||||
event_type=name,
|
||||
category="trace",
|
||||
metadata=event_data if isinstance(event_data, dict) else {"data": event_data},
|
||||
)
|
||||

    async def flush(self) -> None:
        if self._buffer:
            batch = self._buffer.copy()
            self._buffer.clear()
            await self._store.put_batch(batch)

    def _put(
        self,
        *,
        event_type: str,
        category: str,
        content: Any = "",
        metadata: dict[str, Any] | None = None,
    ) -> None:
        normalized_metadata = dict(metadata or {})
        if category != "message" and isinstance(content, str) and len(content) > self._max_trace_content:
            normalized_metadata["content_truncated"] = True
            normalized_metadata["original_content_length"] = len(content)
            content = content[: self._max_trace_content]

        self._buffer.append(
            {
                "thread_id": self.thread_id,
                "run_id": self.run_id,
                "event_type": event_type,
                "category": category,
                "content": content,
                "metadata": normalized_metadata,
                "created_at": datetime.now(UTC).isoformat(),
            }
        )
        if len(self._buffer) >= self._flush_threshold:
            self._flush_sync()

    def _flush_sync(self) -> None:
        if not self._buffer:
            return
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            return
        batch = self._buffer.copy()
        self._buffer.clear()
        task = loop.create_task(self._flush_async(batch))
        task.add_done_callback(self._on_flush_done)

    async def _flush_async(self, batch: list[dict[str, Any]]) -> None:
        try:
            await self._store.put_batch(batch)
        except Exception:
            logger.warning(
                "Failed to flush %d events for run %s; returning to buffer",
                len(batch),
                self.run_id,
                exc_info=True,
            )
            self._buffer = batch + self._buffer

    @staticmethod
    def _on_flush_done(task: asyncio.Task) -> None:
        if task.cancelled():
            return
        exc = task.exception()
        if exc:
            logger.warning("Run event flush task failed: %s", exc)

    def _identify_caller(self, kwargs: dict[str, Any]) -> str:
        for tag in kwargs.get("tags") or []:
            if isinstance(tag, str) and (
                tag.startswith("subagent:")
                or tag.startswith("middleware:")
                or tag == "lead_agent"
            ):
                return tag
        return "lead_agent"

    def _record_first_human_message(self, messages: list[Any], *, caller: str) -> None:
        if self._human_message_recorded:
            return

        for message in messages:
            if not isinstance(message, HumanMessage):
                continue
            if message.name == "summary":
                continue
            self._put(
                event_type="human_message",
                category="message",
                content=message.model_dump(),
                metadata={
                    "caller": caller,
                    "source": "chat_model_start",
                },
            )
            self._human_message_recorded = True
            return
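The `_put` path above truncates long non-message trace content and flushes the buffer once a threshold is reached. A self-contained sketch of that policy against an in-memory sink (function and parameter names are illustrative, not the source API):

```python
def buffer_event(
    buffer: list[dict],
    sink: list[list[dict]],
    *,
    category: str,
    content: str,
    max_trace_content: int = 10,
    flush_threshold: int = 3,
) -> None:
    """Truncate non-message trace content, record truncation metadata,
    and flush the whole buffer to the sink at the threshold."""
    metadata: dict = {}
    if category != "message" and len(content) > max_trace_content:
        metadata["content_truncated"] = True
        metadata["original_content_length"] = len(content)
        content = content[:max_trace_content]
    buffer.append({"category": category, "content": content, "metadata": metadata})
    if len(buffer) >= flush_threshold:
        sink.append(buffer.copy())
        buffer.clear()
```

Message-category events are never truncated, matching the `category != "message"` guard in `_put`.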
@@ -0,0 +1,51 @@
"""Title capture callback for runs."""

from __future__ import annotations

from typing import Any
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler


class RunTitleCallback(BaseCallbackHandler):
    """Capture title generated by title middleware LLM calls or custom events."""

    def __init__(self) -> None:
        super().__init__()
        self._title: str | None = None

    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
        if self._identify_caller(kwargs) != "middleware:title":
            return
        try:
            message = response.generations[0][0].message
        except (IndexError, AttributeError):
            return
        content = getattr(message, "content", "")
        if isinstance(content, str) and content:
            self._title = content.strip().strip('"').strip("'")[:200]

    def on_custom_event(self, name: str, data: Any, *, run_id: UUID, **kwargs: Any) -> None:
        if name not in {"title", "thread_title", "middleware:title"}:
            return
        if isinstance(data, str):
            self._title = data.strip()[:200]
            return
        if isinstance(data, dict):
            title = data.get("title")
            if isinstance(title, str):
                self._title = title.strip()[:200]

    def title(self) -> str | None:
        return self._title

    def _identify_caller(self, kwargs: dict[str, Any]) -> str:
        for tag in kwargs.get("tags") or []:
            if isinstance(tag, str) and (
                tag.startswith("subagent:")
                or tag.startswith("middleware:")
                or tag == "lead_agent"
            ):
                return tag
        return "lead_agent"
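`RunTitleCallback` cleans whatever the title middleware produces before storing it. The trimming on its own is a one-liner; a stdlib sketch (the standalone function name is illustrative):

```python
def normalize_title(raw: str) -> str:
    """Strip whitespace and surrounding quotes, then cap at 200
    characters, mirroring how captured titles are cleaned."""
    return raw.strip().strip('"').strip("'")[:200]
```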
@@ -0,0 +1,122 @@
"""Token and message summary callback for runs."""

from __future__ import annotations

from dataclasses import dataclass
from typing import Any
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler


@dataclass(frozen=True)
class RunCompletionData:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_tokens: int = 0
    llm_call_count: int = 0
    lead_agent_tokens: int = 0
    subagent_tokens: int = 0
    middleware_tokens: int = 0
    message_count: int = 0
    last_ai_message: str | None = None
    first_human_message: str | None = None

    def to_dict(self) -> dict[str, object]:
        return {
            "total_input_tokens": self.total_input_tokens,
            "total_output_tokens": self.total_output_tokens,
            "total_tokens": self.total_tokens,
            "llm_call_count": self.llm_call_count,
            "lead_agent_tokens": self.lead_agent_tokens,
            "subagent_tokens": self.subagent_tokens,
            "middleware_tokens": self.middleware_tokens,
            "message_count": self.message_count,
            "last_ai_message": self.last_ai_message,
            "first_human_message": self.first_human_message,
        }


class RunTokenCallback(BaseCallbackHandler):
    """Aggregate token and message summary data for one run."""

    def __init__(self, *, track_token_usage: bool = True) -> None:
        super().__init__()
        self._track_token_usage = track_token_usage
        self._total_input_tokens = 0
        self._total_output_tokens = 0
        self._total_tokens = 0
        self._llm_call_count = 0
        self._lead_agent_tokens = 0
        self._subagent_tokens = 0
        self._middleware_tokens = 0
        self._message_count = 0
        self._last_ai_message: str | None = None
        self._first_human_message: str | None = None

    def set_first_human_message(self, content: str) -> None:
        self._first_human_message = content[:2000] if content else None

    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
        try:
            message = response.generations[0][0].message
        except (IndexError, AttributeError):
            return

        self._record_ai_message(message, kwargs)
        if not self._track_token_usage:
            return

        usage = dict(getattr(message, "usage_metadata", None) or {})
        input_tk = usage.get("input_tokens", 0) or 0
        output_tk = usage.get("output_tokens", 0) or 0
        total_tk = usage.get("total_tokens", 0) or input_tk + output_tk
        if total_tk <= 0:
            return

        self._total_input_tokens += input_tk
        self._total_output_tokens += output_tk
        self._total_tokens += total_tk
        self._llm_call_count += 1

        caller = self._identify_caller(kwargs)
        if caller.startswith("subagent:"):
            self._subagent_tokens += total_tk
        elif caller.startswith("middleware:"):
            self._middleware_tokens += total_tk
        else:
            self._lead_agent_tokens += total_tk

    def completion_data(self) -> RunCompletionData:
        return RunCompletionData(
            total_input_tokens=self._total_input_tokens,
            total_output_tokens=self._total_output_tokens,
            total_tokens=self._total_tokens,
            llm_call_count=self._llm_call_count,
            lead_agent_tokens=self._lead_agent_tokens,
            subagent_tokens=self._subagent_tokens,
            middleware_tokens=self._middleware_tokens,
            message_count=self._message_count,
            last_ai_message=self._last_ai_message,
            first_human_message=self._first_human_message,
        )

    def _record_ai_message(self, message: Any, kwargs: dict[str, Any]) -> None:
        if self._identify_caller(kwargs) != "lead_agent":
            return
        if getattr(message, "tool_calls", None):
            return
        content = getattr(message, "content", "")
        if isinstance(content, str) and content:
            self._last_ai_message = content[:2000]
            self._message_count += 1

    def _identify_caller(self, kwargs: dict[str, Any]) -> str:
        for tag in kwargs.get("tags") or []:
            if isinstance(tag, str) and (
                tag.startswith("subagent:")
                or tag.startswith("middleware:")
                or tag == "lead_agent"
            ):
                return tag
        return "lead_agent"
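The per-caller split in `RunTokenCallback` reduces to a small bucketing rule over the resolved caller tag. A stdlib sketch of that rule (class name is illustrative):

```python
from dataclasses import dataclass


@dataclass
class TokenTally:
    """Bucket token totals the way RunTokenCallback splits them."""

    lead_agent: int = 0
    subagent: int = 0
    middleware: int = 0

    def add(self, caller: str, total_tokens: int) -> None:
        # Mirror the guard that skips calls with no usable usage metadata.
        if total_tokens <= 0:
            return
        if caller.startswith("subagent:"):
            self.subagent += total_tokens
        elif caller.startswith("middleware:"):
            self.middleware += total_tokens
        else:
            self.lead_agent += total_tokens
```

Everything that is neither a subagent nor a middleware tag lands in the lead-agent bucket, so the three buckets always sum to the tracked total.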
240
backend/packages/harness/deerflow/runtime/runs/facade.py
Normal file
@@ -0,0 +1,240 @@
"""Public runs facade."""

from __future__ import annotations

from dataclasses import dataclass
from typing import Any, AsyncIterator, Callable

from deerflow.runtime.stream_bridge import StreamEvent

from .internal.execution.executor import _RunExecution
from .internal.execution.supervisor import RunSupervisor
from .internal.planner import ExecutionPlanner
from .internal.registry import RunRegistry
from .internal.streams import RunStreamService
from .internal.wait import RunWaitService, WaitErrorResult
from .observer import ObserverLike
from .store import RunCreateStore, RunDeleteStore, RunEventStore, RunQueryStore
from .types import CancelAction, RunRecord, RunSpec


class MultitaskRejectError(Exception):
    """Raised when multitask_strategy is reject and the thread has inflight runs."""


@dataclass(frozen=True)
class RunsRuntime:
    """Runtime dependencies needed to execute a run."""

    bridge: Any
    checkpointer: Any
    store: Any | None
    event_store: RunEventStore | None
    agent_factory_resolver: Callable[[str | None], Any]


class _RegistryStatusAdapter:
    """Minimal adapter so execution can update registry-backed run status."""

    def __init__(self, registry: RunRegistry) -> None:
        self._registry = registry

    async def set_status(self, run_id: str, status: Any, *, error: str | None = None) -> None:
        await self._registry.set_status(run_id, status, error=error)


class RunsFacade:
    """
    Phase 1 runs domain facade.

    Provides a unified interface for:
    - create_background
    - create_and_stream
    - create_and_wait
    - join_stream
    - join_wait

    Orchestrates the registry, planner, supervisor, stream, and wait services.
    Execution now flows through ExecutionPlanner + RunSupervisor rather than
    the legacy RunManager create/start path.
    """

    def __init__(
        self,
        registry: RunRegistry,
        planner: ExecutionPlanner,
        supervisor: RunSupervisor,
        stream_service: RunStreamService,
        wait_service: RunWaitService,
        runtime: RunsRuntime,
        observer: ObserverLike = None,
        query_store: RunQueryStore | None = None,
        create_store: RunCreateStore | None = None,
        delete_store: RunDeleteStore | None = None,
    ) -> None:
        self._registry = registry
        self._planner = planner
        self._supervisor = supervisor
        self._stream = stream_service
        self._wait = wait_service
        self._runtime = runtime
        self._observer = observer
        self._query_store = query_store
        self._create_store = create_store
        self._delete_store = delete_store

    async def create_background(self, spec: RunSpec) -> RunRecord:
        """
        Create a run in background mode.

        Returns immediately with the run record; the run executes asynchronously.
        """
        return await self._create_run(spec)

    async def create_and_stream(
        self,
        spec: RunSpec,
    ) -> tuple[RunRecord, AsyncIterator[StreamEvent]]:
        """
        Create a run and return its event stream.

        Returns (record, stream_iterator).
        """
        record = await self._create_run(spec)

        stream = self._stream.subscribe(record.run_id)
        return record, stream

    async def create_and_wait(
        self,
        spec: RunSpec,
    ) -> tuple[RunRecord, dict[str, Any] | WaitErrorResult | None]:
        """
        Create a run and wait for completion.

        Returns (record, final_values_or_error).
        """
        record = await self._create_run(spec)

        result = await self._wait.wait_for_values_or_error(record.run_id)
        return record, result

    async def join_stream(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
    ) -> AsyncIterator[StreamEvent]:
        """
        Join an existing run stream.

        Supports resumption via last_event_id.
        """
        return self._stream.subscribe(run_id, last_event_id=last_event_id)

    async def join_wait(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
    ) -> dict[str, Any] | WaitErrorResult | None:
        """Join an existing run and wait for completion."""
        return await self._wait.wait_for_values_or_error(
            run_id,
            last_event_id=last_event_id,
        )

    async def cancel(
        self,
        run_id: str,
        *,
        action: CancelAction = "interrupt",
    ) -> bool:
        """Request cancellation for an active run."""
        return await self._supervisor.cancel(run_id, action=action)

    async def get_run(self, run_id: str) -> RunRecord | None:
        """Get a run record by ID."""
        if self._query_store is not None:
            return await self._query_store.get_run(run_id)
        return self._registry.get(run_id)

    async def list_runs(self, thread_id: str) -> list[RunRecord]:
        """List runs for a thread."""
        if self._query_store is not None:
            return await self._query_store.list_runs(thread_id)
        return await self._registry.list_by_thread(thread_id)

    async def delete_run(self, run_id: str) -> bool:
        """Delete a run from durable storage and local runtime state."""
        record = await self.get_run(run_id)
        if record is None:
            return False

        await self._supervisor.cancel(run_id, action="interrupt")
        await self._registry.delete(run_id)

        if self._delete_store is not None:
            return await self._delete_store.delete_run(run_id)

        return True

    async def _create_run(self, spec: RunSpec) -> RunRecord:
        """Create a run record and hand it to the execution backend."""
        await self._apply_multitask_strategy(spec)
        record = await self._registry.create(spec)
        if self._create_store is not None:
            await self._create_store.create_run(record)
        await self._start_execution(record, spec)
        return record

    async def _apply_multitask_strategy(self, spec: RunSpec) -> None:
        """Apply the multitask strategy before creating a run."""
        has_inflight = await self._registry.has_inflight(spec.scope.thread_id)

        if not has_inflight:
            return

        if spec.multitask_strategy == "reject":
            raise MultitaskRejectError(
                f"Thread {spec.scope.thread_id} has inflight runs"
            )
        elif spec.multitask_strategy == "interrupt":
            interrupted = await self._registry.interrupt_inflight(spec.scope.thread_id)
            for run_id in interrupted:
                await self._supervisor.cancel(run_id, action="interrupt")

    async def _start_execution(self, record: RunRecord, spec: RunSpec) -> None:
        """Start run execution via planner + supervisor."""
        # Update status to starting
        await self._registry.set_status(record.run_id, "starting")

        plan = self._planner.build(record, spec)
        status_adapter = _RegistryStatusAdapter(self._registry)
        agent_factory = self._runtime.agent_factory_resolver(spec.assistant_id)

        async def _runner(handle) -> Any:
            return await _RunExecution(
                bridge=self._runtime.bridge,
                run_manager=status_adapter,  # type: ignore[arg-type]
                record=record,
                checkpointer=self._runtime.checkpointer,
                store=self._runtime.store,
                event_store=self._runtime.event_store,
                agent_factory=agent_factory,
                graph_input=plan.graph_input,
                config=plan.runnable_config,
                observer=self._observer,
                stream_modes=plan.stream_modes,
                stream_subgraphs=plan.stream_subgraphs,
                interrupt_before=plan.interrupt_before,
                interrupt_after=plan.interrupt_after,
                handle=handle,
            ).run()

        await self._supervisor.launch(record.run_id, runner=_runner)
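The `_apply_multitask_strategy` branching above can be summarized as a pure decision function over the strategy and the thread's inflight state. A sketch under the same three strategies (`reject`, `interrupt`, and the default enqueue; the function and return labels are illustrative):

```python
def resolve_multitask(strategy: str, has_inflight: bool) -> str:
    """Return the facade's action for a new run on a thread:
    'proceed' (no conflict or enqueue alongside), 'reject'
    (raise MultitaskRejectError), or 'interrupt_inflight'
    (cancel existing runs, then proceed)."""
    if not has_inflight:
        return "proceed"
    if strategy == "reject":
        return "reject"
    if strategy == "interrupt":
        return "interrupt_inflight"
    return "proceed"
```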
@@ -0,0 +1,4 @@
"""Internal runs implementation modules.

These modules are implementation details behind the public runs surface.
"""
@@ -0,0 +1 @@
"""Internal execution components for runs domain."""
@@ -0,0 +1,64 @@
"""Execution preparation helpers for a single run."""

from __future__ import annotations

from dataclasses import dataclass
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.runnables import RunnableConfig
from langgraph.runtime import Runtime

from deerflow.runtime.stream_bridge import StreamBridge


@dataclass
class RunBuildArtifacts:
    """Assembled agent runtime pieces for a single run."""

    agent: Any
    runnable_config: dict[str, Any]
    reference_store: Any | None = None


def build_run_artifacts(
    *,
    thread_id: str,
    run_id: str,
    checkpointer: Any | None,
    store: Any | None,
    agent_factory: Any,
    config: dict[str, Any],
    bridge: StreamBridge,
    interrupt_before: list[str] | None = None,
    interrupt_after: list[str] | None = None,
    callbacks: list[BaseCallbackHandler] | None = None,
) -> RunBuildArtifacts:
    """Assemble all components needed for agent execution."""
    runtime = Runtime(context={"thread_id": thread_id}, store=store)
    if "context" in config and isinstance(config["context"], dict):
        config["context"].setdefault("thread_id", thread_id)
    config.setdefault("configurable", {})["__pregel_runtime"] = runtime

    config_callbacks = config.setdefault("callbacks", [])
    if callbacks:
        config_callbacks.extend(callbacks)

    runnable_config = RunnableConfig(**config)
    agent = agent_factory(config=runnable_config)

    if checkpointer is not None:
        agent.checkpointer = checkpointer
    if store is not None:
        agent.store = store

    if interrupt_before:
        agent.interrupt_before_nodes = interrupt_before
    if interrupt_after:
        agent.interrupt_after_nodes = interrupt_after

    return RunBuildArtifacts(
        agent=agent,
        runnable_config=dict(runnable_config),
        reference_store=store,
    )
@@ -0,0 +1,45 @@
"""Lifecycle event helpers for run execution."""

from __future__ import annotations

from datetime import UTC, datetime
from typing import Any

from ...observer import LifecycleEventType, RunLifecycleEvent, RunObserver


class RunEventEmitter:
    """Build and dispatch lifecycle events for a single run."""

    def __init__(
        self,
        *,
        run_id: str,
        thread_id: str,
        observer: RunObserver,
    ) -> None:
        self._run_id = run_id
        self._thread_id = thread_id
        self._observer = observer
        self._sequence = 0

    @property
    def sequence(self) -> int:
        return self._sequence

    async def emit(
        self,
        event_type: LifecycleEventType,
        payload: dict[str, Any] | None = None,
    ) -> None:
        self._sequence += 1
        event = RunLifecycleEvent(
            event_id=f"{self._run_id}:{event_type.value}:{self._sequence}",
            event_type=event_type,
            run_id=self._run_id,
            thread_id=self._thread_id,
            sequence=self._sequence,
            occurred_at=datetime.now(UTC),
            payload=payload or {},
        )
        await self._observer.on_event(event)
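`RunEventEmitter` guarantees per-run ordering by pairing a monotonically increasing sequence with the run ID in each event ID. The ID scheme in isolation (class name is illustrative):

```python
class SequencedIds:
    """Generate '<run_id>:<event_type>:<sequence>' IDs with a
    monotonically increasing per-run sequence, as RunEventEmitter does."""

    def __init__(self, run_id: str) -> None:
        self._run_id = run_id
        self._sequence = 0

    def next_id(self, event_type: str) -> str:
        self._sequence += 1
        return f"{self._run_id}:{event_type}:{self._sequence}"
```

Because the counter lives on the per-run instance, IDs are unique within a run even when the same event type is emitted twice.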
@@ -0,0 +1,376 @@
"""Single-run execution orchestrator and execution-local helpers."""

from __future__ import annotations

import asyncio
import logging
from typing import Any, Literal

from langchain_core.runnables import RunnableConfig

from deerflow.runtime.serialization import serialize
from deerflow.runtime.stream_bridge import StreamBridge, StreamStatus

from ...callbacks.builder import RunCallbackArtifacts, build_run_callbacks
from ...observer import LifecycleEventType, RunObserver, RunResult
from ...store import RunEventStore
from ...types import RunStatus
from .artifacts import build_run_artifacts
from .events import RunEventEmitter
from .stream_logic import external_stream_event_name, normalize_stream_modes, should_filter_event, unpack_stream_item
from .supervisor import RunHandle

logger = logging.getLogger(__name__)


class _RunExecution:
    """Encapsulate the lifecycle of a single run."""

    def __init__(
        self,
        *,
        bridge: StreamBridge,
        run_manager: Any,
        record: Any,
        checkpointer: Any | None = None,
        store: Any | None = None,
        event_store: RunEventStore | None = None,
        ctx: Any | None = None,
        agent_factory: Any,
        graph_input: dict,
        config: dict,
        observer: RunObserver,
        stream_modes: list[str] | None,
        stream_subgraphs: bool,
        interrupt_before: list[str] | Literal["*"] | None,
        interrupt_after: list[str] | Literal["*"] | None,
        handle: RunHandle | None = None,
    ) -> None:
        if ctx is not None:
            checkpointer = getattr(ctx, "checkpointer", checkpointer)
            store = getattr(ctx, "store", store)

        self.bridge = bridge
        self.run_manager = run_manager
        self.record = record
        self.checkpointer = checkpointer
        self.store = store
        self.event_store = event_store
        self.agent_factory = agent_factory
        self.graph_input = graph_input
        self.config = config
        self.observer = observer
        self.stream_modes = stream_modes
        self.stream_subgraphs = stream_subgraphs
        self.interrupt_before = interrupt_before
        self.interrupt_after = interrupt_after
        self.handle = handle

        self.run_id = record.run_id
        self.thread_id = record.thread_id
        self._pre_run_checkpoint_id: str | None = None
        self._emitter = RunEventEmitter(
            run_id=self.run_id,
            thread_id=self.thread_id,
            observer=observer,
        )
        self.result = RunResult(
            run_id=self.run_id,
            thread_id=self.thread_id,
            status=RunStatus.pending,
        )
        self._agent: Any = None
        self._runnable_config: dict[str, Any] = {}
        self._lg_modes: list[str] = []
        self._callback_artifacts: RunCallbackArtifacts | None = None

    @property
    def _event_sequence(self) -> int:
        return self._emitter.sequence

    async def _emit(
        self,
        event_type: LifecycleEventType,
        payload: dict[str, Any] | None = None,
    ) -> None:
        await self._emitter.emit(event_type, payload)

    async def _start(self) -> None:
        await self.run_manager.set_status(self.run_id, RunStatus.running)
        await self._emit(LifecycleEventType.RUN_STARTED, {})

        human_msg = self._extract_human_message()
        if human_msg is not None:
            await self._emit(
                LifecycleEventType.HUMAN_MESSAGE,
                {"message": human_msg.model_dump()},
            )

        await self._capture_pre_run_checkpoint()
        await self.bridge.publish(
            self.run_id,
            "metadata",
            {"run_id": self.run_id, "thread_id": self.thread_id},
        )

    def _extract_human_message(self) -> Any:
        from langchain_core.messages import HumanMessage

        messages = self.graph_input.get("messages")
        if not messages:
            return None
        last = messages[-1] if isinstance(messages, list) else messages
        if isinstance(last, HumanMessage):
            return last
        if isinstance(last, str):
            return HumanMessage(content=last) if last else None
        if hasattr(last, "content"):
            return HumanMessage(content=last.content)
        if isinstance(last, dict):
            content = last.get("content", "")
            return HumanMessage(content=content) if content else None
        return None

    async def _capture_pre_run_checkpoint(self) -> None:
        try:
            config_for_check = {"configurable": {"thread_id": self.thread_id, "checkpoint_ns": ""}}
            ckpt_tuple = await self.checkpointer.aget_tuple(config_for_check)
            if ckpt_tuple is not None:
                self._pre_run_checkpoint_id = (
                    getattr(ckpt_tuple, "config", {})
                    .get("configurable", {})
                    .get("checkpoint_id")
                )
        except Exception:
            logger.debug("Could not get pre-run checkpoint_id for run %s", self.run_id)

    async def _prepare(self) -> None:
        config = dict(self.config)
        existing_callbacks = config.pop("callbacks", [])
        if existing_callbacks is None:
            existing_callbacks = []
        elif not isinstance(existing_callbacks, list):
            existing_callbacks = [existing_callbacks]

        self._callback_artifacts = build_run_callbacks(
            record=self.record,
            graph_input=self.graph_input,
            event_store=self.event_store,
            existing_callbacks=existing_callbacks,
        )

        artifacts = build_run_artifacts(
            thread_id=self.thread_id,
            run_id=self.run_id,
            checkpointer=self.checkpointer,
            store=self.store,
            agent_factory=self.agent_factory,
            config=config,
            bridge=self.bridge,
            interrupt_before=self.interrupt_before,
            interrupt_after=self.interrupt_after,
            callbacks=self._callback_artifacts.callbacks,
        )

        self._agent = artifacts.agent
        self._runnable_config = artifacts.runnable_config
        self._lg_modes = normalize_stream_modes(self.stream_modes)
        logger.info(
            "Run %s: streaming with modes %s (requested: %s)",
            self.run_id,
            self._lg_modes,
            self.stream_modes,
        )

    async def _finish_success(self) -> None:
        await self.run_manager.set_status(self.run_id, RunStatus.success)
        await self.bridge.publish_terminal(self.run_id, StreamStatus.ENDED)
        self.result.status = RunStatus.success
        completion_data = self._completion_data()
        title = self._callback_title() or await self._extract_title_from_checkpoint()
|
||||
self.result.title = title
|
||||
self.result.completion_data = completion_data
|
||||
await self._emit(
|
||||
LifecycleEventType.RUN_COMPLETED,
|
||||
{
|
||||
"title": title,
|
||||
"completion_data": completion_data,
|
||||
},
|
||||
)
|
||||
|
||||
async def _finish_aborted(self, cancel_mode: str) -> None:
|
||||
payload = {
|
||||
"cancel_mode": cancel_mode,
|
||||
"pre_run_checkpoint_id": self._pre_run_checkpoint_id,
|
||||
"completion_data": self._completion_data(),
|
||||
}
|
||||
|
||||
if cancel_mode == "rollback":
|
||||
await self.run_manager.set_status(
|
||||
self.run_id,
|
||||
RunStatus.error,
|
||||
error="Rolled back by user",
|
||||
)
|
||||
await self.bridge.publish_terminal(
|
||||
self.run_id,
|
||||
StreamStatus.CANCELLED,
|
||||
{"cancel_mode": "rollback", "message": "Rolled back by user"},
|
||||
)
|
||||
self.result.status = RunStatus.error
|
||||
self.result.error = "Rolled back by user"
|
||||
logger.info("Run %s rolled back", self.run_id)
|
||||
else:
|
||||
await self.run_manager.set_status(self.run_id, RunStatus.interrupted)
|
||||
await self.bridge.publish_terminal(
|
||||
self.run_id,
|
||||
StreamStatus.CANCELLED,
|
||||
{"cancel_mode": cancel_mode},
|
||||
)
|
||||
self.result.status = RunStatus.interrupted
|
||||
logger.info("Run %s cancelled (mode=%s)", self.run_id, cancel_mode)
|
||||
|
||||
await self._emit(LifecycleEventType.RUN_CANCELLED, payload)
|
||||
|
||||
async def _finish_failed(self, exc: Exception) -> None:
|
||||
error_msg = str(exc)
|
||||
logger.exception("Run %s failed: %s", self.run_id, error_msg)
|
||||
|
||||
await self.run_manager.set_status(self.run_id, RunStatus.error, error=error_msg)
|
||||
await self.bridge.publish_terminal(
|
||||
self.run_id,
|
||||
StreamStatus.ERRORED,
|
||||
{"message": error_msg, "name": type(exc).__name__},
|
||||
)
|
||||
self.result.status = RunStatus.error
|
||||
self.result.error = error_msg
|
||||
|
||||
await self._emit(
|
||||
LifecycleEventType.RUN_FAILED,
|
||||
{
|
||||
"error": error_msg,
|
||||
"error_type": type(exc).__name__,
|
||||
"completion_data": self._completion_data(),
|
||||
},
|
||||
)
|
||||
|
||||
def _completion_data(self) -> dict[str, object]:
|
||||
if self._callback_artifacts is None:
|
||||
return {}
|
||||
return self._callback_artifacts.completion_data().to_dict()
|
||||
|
||||
def _callback_title(self) -> str | None:
|
||||
if self._callback_artifacts is None:
|
||||
return None
|
||||
return self._callback_artifacts.title()
|
||||
|
||||
async def _extract_title_from_checkpoint(self) -> str | None:
|
||||
if self.checkpointer is None:
|
||||
return None
|
||||
try:
|
||||
ckpt_config = {"configurable": {"thread_id": self.thread_id, "checkpoint_ns": ""}}
|
||||
ckpt_tuple = await self.checkpointer.aget_tuple(ckpt_config)
|
||||
if ckpt_tuple is not None:
|
||||
ckpt = getattr(ckpt_tuple, "checkpoint", {}) or {}
|
||||
return ckpt.get("channel_values", {}).get("title")
|
||||
except Exception:
|
||||
logger.debug("Failed to extract title from checkpoint for thread %s", self.thread_id)
|
||||
return None
|
||||
|
||||
def _map_run_status_to_thread_status(self, status: RunStatus) -> str:
|
||||
if status == RunStatus.success:
|
||||
return "idle"
|
||||
if status == RunStatus.interrupted:
|
||||
return "interrupted"
|
||||
if status in (RunStatus.error, RunStatus.timeout):
|
||||
return "error"
|
||||
return "running"
|
||||
|
||||
def _abort_requested(self) -> bool:
|
||||
if self.handle is not None:
|
||||
return self.handle.cancel_event.is_set()
|
||||
return self.record.abort_event.is_set()
|
||||
|
||||
def _abort_action(self) -> str:
|
||||
if self.handle is not None:
|
||||
return self.handle.cancel_action
|
||||
return self.record.abort_action
|
||||
|
||||
async def _stream(self) -> None:
|
||||
runnable_config = RunnableConfig(**self._runnable_config)
|
||||
|
||||
if len(self._lg_modes) == 1 and not self.stream_subgraphs:
|
||||
single_mode = self._lg_modes[0]
|
||||
async for chunk in self._agent.astream(
|
||||
self.graph_input,
|
||||
config=runnable_config,
|
||||
stream_mode=single_mode,
|
||||
):
|
||||
if self._abort_requested():
|
||||
logger.info("Run %s abort requested - stopping", self.run_id)
|
||||
break
|
||||
if should_filter_event(single_mode, chunk):
|
||||
continue
|
||||
await self.bridge.publish(
|
||||
self.run_id,
|
||||
external_stream_event_name(single_mode),
|
||||
serialize(chunk, mode=single_mode),
|
||||
)
|
||||
return
|
||||
|
||||
async for item in self._agent.astream(
|
||||
self.graph_input,
|
||||
config=runnable_config,
|
||||
stream_mode=self._lg_modes,
|
||||
subgraphs=self.stream_subgraphs,
|
||||
):
|
||||
if self._abort_requested():
|
||||
logger.info("Run %s abort requested - stopping", self.run_id)
|
||||
break
|
||||
|
||||
mode, chunk = unpack_stream_item(item, self._lg_modes, stream_subgraphs=self.stream_subgraphs)
|
||||
if mode is None:
|
||||
continue
|
||||
if should_filter_event(mode, chunk):
|
||||
continue
|
||||
await self.bridge.publish(
|
||||
self.run_id,
|
||||
external_stream_event_name(mode),
|
||||
serialize(chunk, mode=mode),
|
||||
)
|
||||
|
||||
async def _finish_after_stream(self) -> None:
|
||||
if self._abort_requested():
|
||||
action = self._abort_action()
|
||||
cancel_mode = "rollback" if action == "rollback" else "interrupt"
|
||||
await self._finish_aborted(cancel_mode)
|
||||
return
|
||||
|
||||
await self._finish_success()
|
||||
|
||||
async def _emit_final_thread_status(self) -> None:
|
||||
final_thread_status = self._map_run_status_to_thread_status(self.result.status)
|
||||
await self._emit(
|
||||
LifecycleEventType.THREAD_STATUS_UPDATED,
|
||||
{"status": final_thread_status},
|
||||
)
|
||||
|
||||
async def run(self) -> RunResult:
|
||||
try:
|
||||
await self._start()
|
||||
await self._prepare()
|
||||
await self._stream()
|
||||
await self._finish_after_stream()
|
||||
except asyncio.CancelledError:
|
||||
await self._finish_aborted("task_cancelled")
|
||||
except Exception as exc:
|
||||
await self._finish_failed(exc)
|
||||
finally:
|
||||
await self._emit_final_thread_status()
|
||||
if self._callback_artifacts is not None:
|
||||
await self._callback_artifacts.flush()
|
||||
await self.bridge.cleanup(self.run_id)
|
||||
|
||||
return self.result
|
||||
|
||||
|
||||
__all__ = ["_RunExecution"]
|
||||
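The `run()` method above routes every outcome through exactly one finisher (`_finish_success`, `_finish_aborted`, or `_finish_failed`) and always reaches the `finally` cleanup. A minimal, self-contained sketch of that control flow; all names here are illustrative stand-ins, not the real executor:

```python
import asyncio


async def run_lifecycle(work, on_success, on_cancelled, on_failed, on_finally):
    """Mirror run()'s control flow: each exit path reaches exactly one
    finisher, and the finally block (status emit + cleanup) always runs."""
    try:
        await work()
        await on_success()
    except asyncio.CancelledError:
        await on_cancelled("task_cancelled")
    except Exception as exc:  # broad catch mirrored from run()
        await on_failed(exc)
    finally:
        await on_finally()


def trace(outcome: str) -> list[str]:
    """Drive the lifecycle with a work coroutine that ends in `outcome`."""
    calls: list[str] = []

    async def work():
        if outcome == "error":
            raise ValueError("boom")

    async def record(name: str):
        calls.append(name)

    asyncio.run(
        run_lifecycle(
            work,
            on_success=lambda: record("success"),
            on_cancelled=lambda mode: record(f"cancelled:{mode}"),
            on_failed=lambda exc: record(f"failed:{type(exc).__name__}"),
            on_finally=lambda: record("cleanup"),
        )
    )
    return calls
```

`trace("ok")` yields `["success", "cleanup"]`; `trace("error")` yields `["failed:ValueError", "cleanup"]` — cleanup runs on every path, which is what keeps the bridge from leaking per-run resources.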
@@ -0,0 +1,93 @@
"""Execution-local stream processing helpers."""

from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class StreamItem:
    """Normalized stream item from LangGraph."""

    mode: str
    chunk: Any


_FILTERED_NODES = frozenset({"__start__", "__end__"})
_VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}


def normalize_stream_modes(requested_modes: list[str] | None) -> list[str]:
    """Normalize requested stream modes to valid LangGraph modes."""
    input_modes: list[str] = list(requested_modes or ["values"])

    lg_modes: list[str] = []
    for mode in input_modes:
        if mode == "messages-tuple":
            lg_modes.append("messages")
        elif mode == "events":
            logger.info("'events' stream_mode not supported (requires astream_events). Skipping.")
            continue
        elif mode in _VALID_LG_MODES:
            lg_modes.append(mode)

    if not lg_modes:
        lg_modes = ["values"]

    seen: set[str] = set()
    deduped: list[str] = []
    for mode in lg_modes:
        if mode not in seen:
            seen.add(mode)
            deduped.append(mode)

    return deduped


def unpack_stream_item(
    item: Any,
    lg_modes: list[str],
    *,
    stream_subgraphs: bool,
) -> tuple[str | None, Any]:
    """Unpack a multi-mode or subgraph stream item into ``(mode, chunk)``."""
    if stream_subgraphs:
        if isinstance(item, tuple) and len(item) == 3:
            _namespace, mode, chunk = item
            return str(mode), chunk
        if isinstance(item, tuple) and len(item) == 2:
            mode, chunk = item
            return str(mode), chunk
        return None, None

    if isinstance(item, tuple) and len(item) == 2:
        mode, chunk = item
        return str(mode), chunk

    return lg_modes[0] if lg_modes else None, item


def should_filter_event(mode: str, chunk: Any) -> bool:
    """Determine whether a stream event should be filtered before publish."""
    if mode == "updates" and isinstance(chunk, dict):
        node_names = set(chunk.keys())
        if node_names & _FILTERED_NODES:
            return True

    if mode == "messages" and isinstance(chunk, tuple) and len(chunk) == 2:
        _, metadata = chunk
        if isinstance(metadata, dict):
            node = metadata.get("langgraph_node", "")
            if node in _FILTERED_NODES:
                return True

    return False


def external_stream_event_name(mode: str) -> str:
    """Map LangGraph internal modes to the external SSE event contract."""
    return mode
@@ -0,0 +1,78 @@
"""Active execution handle management for runs domain."""

from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from typing import Any

from ...types import CancelAction


@dataclass
class RunHandle:
    """In-process control handle for an active run."""

    run_id: str
    task: asyncio.Task[Any] | None = None
    cancel_event: asyncio.Event = field(default_factory=asyncio.Event)
    cancel_action: CancelAction = "interrupt"


class RunSupervisor:
    """Own and control active run handles within the current process."""

    def __init__(self) -> None:
        self._handles: dict[str, RunHandle] = {}
        self._lock = asyncio.Lock()

    async def launch(
        self,
        run_id: str,
        *,
        runner: Callable[[RunHandle], Awaitable[Any]],
    ) -> RunHandle:
        """Create a handle and start a background task for it."""
        handle = RunHandle(run_id=run_id)

        async with self._lock:
            if run_id in self._handles:
                raise RuntimeError(f"Run {run_id} is already active")
            self._handles[run_id] = handle

        task = asyncio.create_task(runner(handle))
        handle.task = task
        task.add_done_callback(lambda _: asyncio.create_task(self.cleanup(run_id)))
        return handle

    async def cancel(
        self,
        run_id: str,
        *,
        action: CancelAction = "interrupt",
    ) -> bool:
        """Signal cancellation for an active handle."""
        async with self._lock:
            handle = self._handles.get(run_id)
            if handle is None:
                return False

            handle.cancel_action = action
            handle.cancel_event.set()
            if handle.task is not None and not handle.task.done():
                handle.task.cancel()

        return True

    def get_handle(self, run_id: str) -> RunHandle | None:
        """Return the active handle for a run, if any."""
        return self._handles.get(run_id)

    async def cleanup(self, run_id: str, *, delay: float = 0) -> None:
        """Remove a handle after optional delay."""
        if delay > 0:
            await asyncio.sleep(delay)

        async with self._lock:
            self._handles.pop(run_id, None)
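`RunSupervisor.cancel()` pairs a cooperative signal (`cancel_event`) with a hard `task.cancel()`, and the executor's `_stream()` loop polls the event between chunks via `_abort_requested()`. A minimal sketch of the cooperative path alone (toy names, not the real supervisor):

```python
import asyncio


async def demo() -> list[str]:
    """Cooperative cancellation: the streaming loop checks a cancel event
    between chunks and stops cleanly, like _abort_requested() in _stream()."""
    log: list[str] = []
    cancel_event = asyncio.Event()

    async def stream_loop():
        for chunk in range(100):
            if cancel_event.is_set():  # mirrors _abort_requested()
                log.append("abort-requested")
                break
            log.append(f"chunk-{chunk}")
            await asyncio.sleep(0)  # yield, like awaiting the next chunk

    task = asyncio.create_task(stream_loop())
    await asyncio.sleep(0)   # let the loop publish a first chunk
    cancel_event.set()       # what RunSupervisor.cancel() does first
    await task
    return log


log = asyncio.run(demo())
```

The loop stops between chunks rather than mid-publish; the supervisor additionally calls `task.cancel()` for the hard-stop case where the task is blocked inside an await.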
@@ -7,12 +7,9 @@ import logging
 import uuid
 from dataclasses import dataclass, field
 from datetime import UTC, datetime
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Any, Literal

-from .schemas import DisconnectMode, RunStatus

-if TYPE_CHECKING:
-    from deerflow.runtime.runs.store.base import RunStore
+from ..types import RunStatus

 logger = logging.getLogger(__name__)

@@ -29,7 +26,7 @@ class RunRecord:
     thread_id: str
     assistant_id: str | None
     status: RunStatus
-    on_disconnect: DisconnectMode
+    on_disconnect: Literal["cancel", "continue"]
     multitask_strategy: str = "reject"
     metadata: dict = field(default_factory=dict)
     kwargs: dict = field(default_factory=dict)
@@ -49,12 +46,12 @@ class RunManager:
     that run history survives process restarts.
     """

-    def __init__(self, store: RunStore | None = None) -> None:
+    def __init__(self, store: Any | None = None) -> None:
         self._runs: dict[str, RunRecord] = {}
         self._lock = asyncio.Lock()
         self._store = store

-    async def _persist_to_store(self, record: RunRecord) -> None:
+    async def _persist_to_store(self, record: RunRecord, *, follow_up_to_run_id: str | None = None) -> None:
         """Best-effort persist run record to backing store."""
         if self._store is None:
             return
@@ -68,6 +65,7 @@ class RunManager:
                 metadata=record.metadata or {},
                 kwargs=record.kwargs or {},
                 created_at=record.created_at,
+                follow_up_to_run_id=follow_up_to_run_id,
             )
         except Exception:
             logger.warning("Failed to persist run %s to store", record.run_id, exc_info=True)
@@ -85,10 +83,11 @@ class RunManager:
         thread_id: str,
         assistant_id: str | None = None,
         *,
-        on_disconnect: DisconnectMode = DisconnectMode.cancel,
+        on_disconnect: Literal["cancel", "continue"] = "cancel",
         metadata: dict | None = None,
         kwargs: dict | None = None,
         multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
     ) -> RunRecord:
         """Create a new pending run and register it."""
         run_id = str(uuid.uuid4())
@@ -107,7 +106,7 @@
         )
         async with self._lock:
             self._runs[run_id] = record
-        await self._persist_to_store(record)
+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
         logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
         return record

@@ -120,7 +119,7 @@
         async with self._lock:
             # Dict insertion order matches creation order, so reversing it gives
             # us deterministic newest-first results even when timestamps tie.
-            return [r for r in self._runs.values() if r.thread_id == thread_id]
+            return [r for r in reversed(self._runs.values()) if r.thread_id == thread_id]

     async def set_status(self, run_id: str, status: RunStatus, *, error: str | None = None) -> None:
         """Transition a run to a new status."""
@@ -170,10 +169,11 @@
         thread_id: str,
         assistant_id: str | None = None,
         *,
-        on_disconnect: DisconnectMode = DisconnectMode.cancel,
+        on_disconnect: Literal["cancel", "continue"] = "cancel",
         metadata: dict | None = None,
         kwargs: dict | None = None,
         multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
     ) -> RunRecord:
         """Atomically check for inflight runs and create a new one.

@@ -227,7 +227,7 @@
         )
         self._runs[run_id] = record

-        await self._persist_to_store(record)
+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
         logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
         return record
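The changed list comprehension leans on two CPython guarantees: dicts preserve insertion order, and since Python 3.8 dict views support `reversed()`, so iterating `reversed(self._runs.values())` yields newest-first records without consulting timestamps. A quick check with toy records:

```python
# Dicts preserve insertion order; reversed() on a values() view (Python 3.8+)
# therefore walks records newest-first, even when created_at timestamps tie.
runs: dict[str, dict] = {}
for i in range(3):
    runs[f"run-{i}"] = {"run_id": f"run-{i}", "thread_id": "t2" if i == 1 else "t1"}

newest_first = [r["run_id"] for r in reversed(runs.values()) if r["thread_id"] == "t1"]
# newest_first == ["run-2", "run-0"]
```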
@@ -0,0 +1,42 @@
"""Execution plan builder for runs domain."""

from __future__ import annotations

from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Literal

from ..types import RunRecord, RunSpec


@dataclass(frozen=True)
class ExecutionPlan:
    """Normalized execution inputs derived from a run record and spec."""

    record: RunRecord
    graph_input: dict[str, Any]
    runnable_config: dict[str, Any]
    stream_modes: list[str]
    stream_subgraphs: bool
    interrupt_before: list[str] | Literal["*"] | None
    interrupt_after: list[str] | Literal["*"] | None


class ExecutionPlanner:
    """Build executor-ready plans from public run specs."""

    def build(self, record: RunRecord, spec: RunSpec) -> ExecutionPlan:
        return ExecutionPlan(
            record=record,
            graph_input=self._normalize_graph_input(spec.input),
            runnable_config=deepcopy(spec.runnable_config),
            stream_modes=list(spec.stream_modes),
            stream_subgraphs=spec.stream_subgraphs,
            interrupt_before=spec.interrupt_before,
            interrupt_after=spec.interrupt_after,
        )

    def _normalize_graph_input(self, raw_input: dict[str, Any] | None) -> dict[str, Any]:
        if raw_input is None:
            return {}
        return deepcopy(raw_input)
@@ -0,0 +1,146 @@
"""In-memory run registry for runs domain state."""

from __future__ import annotations

import asyncio
import uuid
from datetime import datetime, timezone
from typing import Any

from ..types import INFLIGHT_STATUSES, RunRecord, RunSpec, RunStatus


class RunRegistry:
    """In-memory source of truth for run records and their status."""

    def __init__(self) -> None:
        self._records: dict[str, RunRecord] = {}
        self._thread_index: dict[str, set[str]] = {}  # thread_id -> set[run_id]
        self._lock = asyncio.Lock()

    async def create(self, spec: RunSpec) -> RunRecord:
        """Create a new RunRecord from RunSpec."""
        run_id = str(uuid.uuid4())
        now = datetime.now(timezone.utc).isoformat()

        record = RunRecord(
            run_id=run_id,
            thread_id=spec.scope.thread_id,
            assistant_id=spec.assistant_id,
            status="pending",
            temporary=spec.scope.temporary,
            multitask_strategy=spec.multitask_strategy,
            metadata=dict(spec.metadata),
            follow_up_to_run_id=spec.follow_up_to_run_id,
            created_at=now,
            updated_at=now,
        )

        async with self._lock:
            self._records[run_id] = record
            # Update thread index
            if spec.scope.thread_id not in self._thread_index:
                self._thread_index[spec.scope.thread_id] = set()
            self._thread_index[spec.scope.thread_id].add(run_id)

        return record

    def get(self, run_id: str) -> RunRecord | None:
        """Get RunRecord by run_id."""
        return self._records.get(run_id)

    async def list_by_thread(self, thread_id: str) -> list[RunRecord]:
        """List all RunRecords for a thread."""
        async with self._lock:
            run_ids = self._thread_index.get(thread_id, set())
            return [self._records[rid] for rid in run_ids if rid in self._records]

    async def set_status(
        self,
        run_id: str,
        status: RunStatus,
        *,
        error: str | None = None,
        started_at: str | None = None,
        ended_at: str | None = None,
    ) -> None:
        """Update run status and optional fields."""
        async with self._lock:
            record = self._records.get(run_id)
            if record is None:
                return

            record.status = status
            record.updated_at = datetime.now(timezone.utc).isoformat()

            if error is not None:
                record.error = error
            if started_at is not None:
                record.started_at = started_at
            if ended_at is not None:
                record.ended_at = ended_at

    async def has_inflight(self, thread_id: str) -> bool:
        """Check if thread has any inflight runs."""
        async with self._lock:
            run_ids = self._thread_index.get(thread_id, set())
            for rid in run_ids:
                record = self._records.get(rid)
                if record and record.status in INFLIGHT_STATUSES:
                    return True
            return False

    async def interrupt_inflight(self, thread_id: str) -> list[str]:
        """
        Mark all inflight runs for a thread as interrupted.

        Returns list of interrupted run_ids.
        """
        interrupted: list[str] = []
        now = datetime.now(timezone.utc).isoformat()

        async with self._lock:
            run_ids = self._thread_index.get(thread_id, set())
            for rid in run_ids:
                record = self._records.get(rid)
                if record and record.status in INFLIGHT_STATUSES:
                    record.status = "interrupted"
                    record.updated_at = now
                    record.ended_at = now
                    interrupted.append(rid)

        return interrupted

    async def update_metadata(self, run_id: str, metadata: dict[str, Any]) -> None:
        """Update run metadata."""
        async with self._lock:
            record = self._records.get(run_id)
            if record is not None:
                record.metadata.update(metadata)
                record.updated_at = datetime.now(timezone.utc).isoformat()

    async def delete(self, run_id: str) -> bool:
        """Delete a run record. Returns True if deleted."""
        async with self._lock:
            record = self._records.pop(run_id, None)
            if record is None:
                return False

            # Update thread index
            thread_runs = self._thread_index.get(record.thread_id)
            if thread_runs:
                thread_runs.discard(run_id)

            return True

    def count(self) -> int:
        """Return total number of records."""
        return len(self._records)

    def count_by_status(self, status: RunStatus) -> int:
        """Return count of records with given status."""
        return sum(1 for r in self._records.values() if r.status == status)


# Compatibility alias during the refactor.
RuntimeRunRegistry = RunRegistry
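The registry's secondary `thread_id -> run_ids` index is what makes `has_inflight` and `interrupt_inflight` cheap: they scan only one thread's runs, not every record. A toy synchronous version of the same bookkeeping (the real class also holds an `asyncio.Lock` and full `RunRecord` objects):

```python
INFLIGHT = {"pending", "running"}  # stand-in for INFLIGHT_STATUSES


class MiniRegistry:
    """Toy sketch of RunRegistry: records plus a thread_id -> run_ids index."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}
        self.by_thread: dict[str, set[str]] = {}

    def create(self, run_id: str, thread_id: str) -> None:
        self.records[run_id] = {"run_id": run_id, "thread_id": thread_id, "status": "pending"}
        self.by_thread.setdefault(thread_id, set()).add(run_id)

    def has_inflight(self, thread_id: str) -> bool:
        # Only this thread's run ids are scanned, thanks to the index.
        return any(
            self.records[rid]["status"] in INFLIGHT
            for rid in self.by_thread.get(thread_id, set())
        )

    def interrupt_inflight(self, thread_id: str) -> list[str]:
        hit = []
        for rid in self.by_thread.get(thread_id, set()):
            if self.records[rid]["status"] in INFLIGHT:
                self.records[rid]["status"] = "interrupted"
                hit.append(rid)
        return hit


reg = MiniRegistry()
reg.create("r1", "t1")
reg.create("r2", "t1")
was_inflight = reg.has_inflight("t1")          # True
interrupted = sorted(reg.interrupt_inflight("t1"))  # ["r1", "r2"]
now_inflight = reg.has_inflight("t1")          # False
```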
@@ -0,0 +1,76 @@
"""Internal run stream adapter over StreamBridge."""

from __future__ import annotations

from collections.abc import AsyncIterator
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from deerflow.runtime.stream_bridge import JSONValue, StreamBridge, StreamEvent

from deerflow.runtime.stream_bridge import StreamStatus


class RunStreamService:
    """Thin runs-domain adapter over the harness stream bridge contract."""

    def __init__(self, bridge: "StreamBridge") -> None:
        self._bridge = bridge

    async def publish_event(
        self,
        run_id: str,
        *,
        event: str,
        data: "JSONValue",
    ) -> str:
        """Publish a replayable run event."""
        return await self._bridge.publish(run_id, event, data)

    async def publish_end(self, run_id: str) -> str:
        """Publish a successful terminal signal."""
        return await self._bridge.publish_terminal(run_id, StreamStatus.ENDED)

    async def publish_cancelled(
        self,
        run_id: str,
        *,
        data: "JSONValue" = None,
    ) -> str:
        """Publish a cancelled terminal signal."""
        return await self._bridge.publish_terminal(
            run_id,
            StreamStatus.CANCELLED,
            data,
        )

    async def publish_error(
        self,
        run_id: str,
        *,
        data: "JSONValue",
    ) -> str:
        """Publish a failed terminal signal."""
        return await self._bridge.publish_terminal(
            run_id,
            StreamStatus.ERRORED,
            data,
        )

    def subscribe(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
        heartbeat_interval: float = 15.0,
    ) -> AsyncIterator[StreamEvent]:
        """Subscribe to a run stream with resume support."""
        return self._bridge.subscribe(
            run_id,
            last_event_id=last_event_id,
            heartbeat_interval=heartbeat_interval,
        )

    async def cleanup(self, run_id: str, *, delay: float = 0) -> None:
        """Release per-run bridge resources after completion."""
        await self._bridge.cleanup(run_id, delay=delay)
@@ -0,0 +1,95 @@
"""Internal run wait helpers based on stream events."""

from __future__ import annotations

from typing import Any

from deerflow.runtime.stream_bridge import StreamEvent

from .streams import RunStreamService


class WaitTimeoutError(TimeoutError):
    """Raised when wait times out."""


class WaitErrorResult:
    """Represents an error result from wait."""

    def __init__(self, error: str, details: dict[str, Any] | None = None) -> None:
        self.error = error
        self.details = details or {}

    def to_dict(self) -> dict[str, Any]:
        return {"error": self.error, **self.details}


class RunWaitService:
    """
    Wait service for runs domain.

    Based on RunStreamService.subscribe(), implements wait semantics.

    Phase 1 behavior:
    - Records last 'values' event
    - On 'error', returns unified error structure
    - On 'end' only, returns last values
    """

    TERMINAL_EVENTS = frozenset({"end", "error", "cancel"})

    def __init__(self, stream_service: RunStreamService) -> None:
        self._stream_service = stream_service

    async def wait_for_terminal(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
    ) -> StreamEvent | None:
        """Block until the next terminal event for a run is observed."""
        async for event in self._stream_service.subscribe(
            run_id,
            last_event_id=last_event_id,
        ):
            if event.event in self.TERMINAL_EVENTS:
                return event

        return None

    async def wait_for_values_or_error(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
    ) -> dict[str, Any] | WaitErrorResult | None:
        """
        Wait for run to complete and return final values or error.

        Returns:
            - dict: Final values if successful
            - WaitErrorResult: If run failed
            - None: If no values were produced
        """
        last_values: dict[str, Any] | None = None

        async for event in self._stream_service.subscribe(
            run_id,
            last_event_id=last_event_id,
        ):
            if event.event == "values":
                last_values = event.data

            elif event.event == "error":
                return WaitErrorResult(
                    error=str(event.data) if event.data else "Unknown error",
                    details={"run_id": run_id},
                )

            elif event.event in self.TERMINAL_EVENTS:
                # Stream ended, return last values
                break

        return last_values
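`wait_for_values_or_error` is a fold over the event stream: remember the last `values` payload, short-circuit on `error`, and stop at any terminal event. The same fold over a fake in-memory stream (`FakeEvent` is a stand-in for `StreamEvent`, and this synchronous sketch is not the real async service):

```python
from dataclasses import dataclass
from typing import Any

TERMINAL = {"end", "error", "cancel"}


@dataclass
class FakeEvent:  # stand-in for StreamEvent: an event name plus payload
    event: str
    data: Any


def wait_for_values_or_error(events) -> Any:
    """Same fold as RunWaitService: keep the last 'values' payload,
    short-circuit on 'error', stop at any terminal event."""
    last_values = None
    for ev in events:
        if ev.event == "values":
            last_values = ev.data
        elif ev.event == "error":
            return {"error": str(ev.data)}
        elif ev.event in TERMINAL:
            break
    return last_values


ok = wait_for_values_or_error(
    [FakeEvent("values", {"n": 1}), FakeEvent("values", {"n": 2}), FakeEvent("end", None)]
)   # {"n": 2}
bad = wait_for_values_or_error(
    [FakeEvent("values", {"n": 1}), FakeEvent("error", "boom")]
)   # {"error": "boom"}
```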
203  backend/packages/harness/deerflow/runtime/runs/observer.py  Normal file
@@ -0,0 +1,203 @@
"""Run lifecycle observer types for decoupled observation.

Defines the RunObserver protocol and lifecycle event types that allow
the harness layer to emit notifications without directly calling
storage implementations.

The app layer provides concrete observers (e.g., StorageObserver) that
map lifecycle events to persistence operations.
"""

from __future__ import annotations

import logging
from collections.abc import Awaitable, Callable, Mapping
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Protocol, runtime_checkable

from .types import RunStatus

# Callback type for lightweight observer registration
type RunEventCallback = Callable[["RunLifecycleEvent"], Awaitable[None]]


class LifecycleEventType(str, Enum):
    """Lifecycle event types emitted during run execution."""

    # Run lifecycle
    RUN_STARTED = "run_started"
    RUN_COMPLETED = "run_completed"
    RUN_FAILED = "run_failed"
    RUN_CANCELLED = "run_cancelled"

    # Human message (for event store)
    HUMAN_MESSAGE = "human_message"

    # Thread status updates
    THREAD_STATUS_UPDATED = "thread_status_updated"


@dataclass(frozen=True)
class RunLifecycleEvent:
    """A single lifecycle event emitted during run execution.

    Attributes:
        event_type: The type of lifecycle event.
        run_id: The run that emitted this event.
        thread_id: The thread this run belongs to.
        payload: Event-specific data (varies by event_type).
    """

    event_id: str
    event_type: LifecycleEventType
    run_id: str
    thread_id: str
    sequence: int
    occurred_at: datetime
    payload: Mapping[str, Any] = field(default_factory=dict)


@dataclass
class RunResult:
    """Minimal result returned after run execution.

    Contains only the data needed for the caller to understand
    what happened. Detailed events are delivered via observer.

    Attributes:
        run_id: The run ID.
        thread_id: The thread ID.
        status: Final status (success, error, interrupted, etc.).
        error: Error message if status is error.
        completion_data: Token usage and message counts from journal.
        title: Thread title extracted from checkpoint (if available).
    """

    run_id: str
    thread_id: str
    status: RunStatus
    error: str | None = None
    completion_data: dict[str, Any] = field(default_factory=dict)
    title: str | None = None


@runtime_checkable
class RunObserver(Protocol):
    """Protocol for observing run lifecycle events.

    Implementations receive events as they occur during execution
    and can perform side effects (storage, logging, metrics, etc.)
    without coupling the worker to specific implementations.

    Methods are async to support IO-bound operations like database writes.
    """

    async def on_event(self, event: RunLifecycleEvent) -> None:
        """Called when a lifecycle event occurs.

        Args:
            event: The lifecycle event with type, IDs, and payload.

        Implementations should be explicit about failure handling.
        CompositeObserver can be configured to either swallow or raise
        observer failures based on each binding's ``required`` flag.
        """
        ...


@dataclass(frozen=True)
class ObserverBinding:
    """Observer registration with failure policy.

    Attributes:
        observer: Observer instance to invoke.
        required: When True, observer failures are raised to the caller.
            When False, failures are logged and dispatch continues.
    """

    observer: RunObserver
    required: bool = False


class CompositeObserver:
    """Observer that delegates to multiple child observers.

    Useful for combining storage, metrics, and logging observers.
    Optional observers are logged on failure; required observers raise.
    """

    def __init__(
        self,
        observers: list[RunObserver | ObserverBinding] | None = None,
    ) -> None:
        self._observers: list[ObserverBinding] = [
            obs if isinstance(obs, ObserverBinding) else ObserverBinding(obs)
            for obs in (observers or [])
        ]
def add(self, observer: RunObserver, *, required: bool = False) -> None:
|
||||
"""Add an observer to the composite."""
|
||||
self._observers.append(ObserverBinding(observer=observer, required=required))
|
||||
|
||||
async def on_event(self, event: RunLifecycleEvent) -> None:
|
||||
"""Dispatch event to all child observers."""
|
||||
logger = logging.getLogger(__name__)
|
||||
for binding in self._observers:
|
||||
try:
|
||||
await binding.observer.on_event(event)
|
||||
except Exception:
|
||||
if binding.required:
|
||||
raise
|
||||
logger.warning(
|
||||
"Observer %s failed on event %s",
|
||||
type(binding.observer).__name__,
|
||||
event.event_type.value,
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
|
||||
class NullObserver:
|
||||
"""No-op observer for when no observation is needed."""
|
||||
|
||||
async def on_event(self, event: RunLifecycleEvent) -> None:
|
||||
"""Do nothing."""
|
||||
pass
|
||||
|
||||
|
||||
@dataclass(slots=True)
|
||||
class CallbackObserver:
|
||||
"""Adapter that wraps a callback function as a RunObserver.
|
||||
|
||||
Allows lightweight callback functions to participate in the
|
||||
observer protocol without defining a full class.
|
||||
"""
|
||||
|
||||
callback: RunEventCallback
|
||||
|
||||
async def on_event(self, event: RunLifecycleEvent) -> None:
|
||||
"""Invoke the wrapped callback with the event."""
|
||||
await self.callback(event)
|
||||
|
||||
|
||||
type ObserverLike = RunObserver | RunEventCallback | None
|
||||
|
||||
|
||||
def ensure_observer(observer: ObserverLike) -> RunObserver:
|
||||
"""Normalize an observer-like value to a RunObserver.
|
||||
|
||||
Args:
|
||||
observer: Can be:
|
||||
- None: returns NullObserver
|
||||
- A callable: wraps in CallbackObserver
|
||||
- A RunObserver: returns as-is
|
||||
|
||||
Returns:
|
||||
A RunObserver instance.
|
||||
"""
|
||||
if observer is None:
|
||||
return NullObserver()
|
||||
if callable(observer) and not isinstance(observer, RunObserver):
|
||||
return CallbackObserver(observer)
|
||||
return observer
|
||||
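The `ensure_observer` / `CallbackObserver` pair above lets callers pass a bare async function where an observer is expected. A minimal self-contained sketch of that adapter pattern (the `Event` and `CallbackObserver` names here are illustrative stand-ins mirroring `observer.py`, not the real imports):

```python
# Sketch: wrapping a lightweight async callback so it satisfies an
# on_event-style observer interface. Stand-in definitions only.
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class Event:
    event_type: str
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class CallbackObserver:
    callback: Callable[[Event], Awaitable[None]]

    async def on_event(self, event: Event) -> None:
        # Delegate to the wrapped callback; no full class needed caller-side.
        await self.callback(event)


seen: list[str] = []


async def record(event: Event) -> None:
    seen.append(event.event_type)


async def main() -> None:
    observer = CallbackObserver(record)
    await observer.on_event(Event("run_started"))
    await observer.on_event(Event("run_completed"))


asyncio.run(main())
print(seen)  # ['run_started', 'run_completed']
```

The real module goes one step further: `ensure_observer` also maps `None` to a `NullObserver`, so call sites never branch on whether observation was requested.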
@@ -1,21 +0,0 @@
"""Run status and disconnect mode enums."""

from enum import StrEnum


class RunStatus(StrEnum):
    """Lifecycle status of a single run."""

    pending = "pending"
    running = "running"
    success = "success"
    error = "error"
    timeout = "timeout"
    interrupted = "interrupted"


class DisconnectMode(StrEnum):
    """Behaviour when the SSE consumer disconnects."""

    cancel = "cancel"
    continue_ = "continue"
@@ -1,4 +1,13 @@
-from deerflow.runtime.runs.store.base import RunStore
-from deerflow.runtime.runs.store.memory import MemoryRunStore
-
-__all__ = ["MemoryRunStore", "RunStore"]
+"""Store boundary protocols for runs."""
+
+from .create_store import RunCreateStore
+from .delete_store import RunDeleteStore
+from .event_store import RunEventStore
+from .query_store import RunQueryStore
+
+__all__ = [
+    "RunCreateStore",
+    "RunDeleteStore",
+    "RunEventStore",
+    "RunQueryStore",
+]
@@ -1,95 +0,0 @@
"""Abstract interface for run metadata storage.

RunManager depends on this interface. Implementations:
- MemoryRunStore: in-memory dict (development, tests)
- Future: RunRepository backed by SQLAlchemy ORM

All methods accept an optional user_id for user isolation.
When user_id is None, no user filtering is applied (single-user mode).
"""

from __future__ import annotations

import abc
from typing import Any


class RunStore(abc.ABC):
    @abc.abstractmethod
    async def put(
        self,
        run_id: str,
        *,
        thread_id: str,
        assistant_id: str | None = None,
        user_id: str | None = None,
        status: str = "pending",
        multitask_strategy: str = "reject",
        metadata: dict[str, Any] | None = None,
        kwargs: dict[str, Any] | None = None,
        error: str | None = None,
        created_at: str | None = None,
    ) -> None:
        pass

    @abc.abstractmethod
    async def get(self, run_id: str) -> dict[str, Any] | None:
        pass

    @abc.abstractmethod
    async def list_by_thread(
        self,
        thread_id: str,
        *,
        user_id: str | None = None,
        limit: int = 100,
    ) -> list[dict[str, Any]]:
        pass

    @abc.abstractmethod
    async def update_status(
        self,
        run_id: str,
        status: str,
        *,
        error: str | None = None,
    ) -> None:
        pass

    @abc.abstractmethod
    async def delete(self, run_id: str) -> None:
        pass

    @abc.abstractmethod
    async def update_run_completion(
        self,
        run_id: str,
        *,
        status: str,
        total_input_tokens: int = 0,
        total_output_tokens: int = 0,
        total_tokens: int = 0,
        llm_call_count: int = 0,
        lead_agent_tokens: int = 0,
        subagent_tokens: int = 0,
        middleware_tokens: int = 0,
        message_count: int = 0,
        last_ai_message: str | None = None,
        first_human_message: str | None = None,
        error: str | None = None,
    ) -> None:
        pass

    @abc.abstractmethod
    async def list_pending(self, *, before: str | None = None) -> list[dict[str, Any]]:
        pass

    @abc.abstractmethod
    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
        """Aggregate token usage for completed runs in a thread.

        Returns a dict with keys: total_tokens, total_input_tokens,
        total_output_tokens, total_runs, by_model (model_name → {tokens, runs}),
        by_caller ({lead_agent, subagent, middleware}).
        """
        pass
@@ -0,0 +1,13 @@
"""Create-side boundary for durable run initialization."""

from __future__ import annotations

from typing import Protocol

from ..types import RunRecord


class RunCreateStore(Protocol):
    """Persist the initial durable row for a newly created run."""

    async def create_run(self, record: RunRecord) -> None: ...
@@ -0,0 +1,11 @@
"""Delete-side durable boundary for runs."""

from __future__ import annotations

from typing import Protocol


class RunDeleteStore(Protocol):
    """Minimal protocol for removing durable run records."""

    async def delete_run(self, run_id: str) -> bool: ...
@@ -0,0 +1,11 @@
"""Run event store boundary used by runs callbacks."""

from __future__ import annotations

from typing import Any, Protocol


class RunEventStore(Protocol):
    """Minimal append-only event store protocol for execution callbacks."""

    async def put_batch(self, events: list[dict[str, Any]]) -> list[dict[str, Any]]: ...
@@ -1,98 +0,0 @@
"""In-memory RunStore. Used when database.backend=memory (default) and in tests.

Equivalent to the original RunManager._runs dict behavior.
"""

from __future__ import annotations

from datetime import UTC, datetime
from typing import Any

from deerflow.runtime.runs.store.base import RunStore


class MemoryRunStore(RunStore):
    def __init__(self) -> None:
        self._runs: dict[str, dict[str, Any]] = {}

    async def put(
        self,
        run_id,
        *,
        thread_id,
        assistant_id=None,
        user_id=None,
        status="pending",
        multitask_strategy="reject",
        metadata=None,
        kwargs=None,
        error=None,
        created_at=None,
    ):
        now = datetime.now(UTC).isoformat()
        self._runs[run_id] = {
            "run_id": run_id,
            "thread_id": thread_id,
            "assistant_id": assistant_id,
            "user_id": user_id,
            "status": status,
            "multitask_strategy": multitask_strategy,
            "metadata": metadata or {},
            "kwargs": kwargs or {},
            "error": error,
            "created_at": created_at or now,
            "updated_at": now,
        }

    async def get(self, run_id):
        return self._runs.get(run_id)

    async def list_by_thread(self, thread_id, *, user_id=None, limit=100):
        results = [
            r
            for r in self._runs.values()
            if r["thread_id"] == thread_id and (user_id is None or r.get("user_id") == user_id)
        ]
        results.sort(key=lambda r: r["created_at"], reverse=True)
        return results[:limit]

    async def update_status(self, run_id, status, *, error=None):
        if run_id in self._runs:
            self._runs[run_id]["status"] = status
            if error is not None:
                self._runs[run_id]["error"] = error
            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()

    async def delete(self, run_id):
        self._runs.pop(run_id, None)

    async def update_run_completion(self, run_id, *, status, **kwargs):
        if run_id in self._runs:
            self._runs[run_id]["status"] = status
            for key, value in kwargs.items():
                if value is not None:
                    self._runs[run_id][key] = value
            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()

    async def list_pending(self, *, before=None):
        now = before or datetime.now(UTC).isoformat()
        results = [
            r for r in self._runs.values() if r["status"] == "pending" and r["created_at"] <= now
        ]
        results.sort(key=lambda r: r["created_at"])
        return results

    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
        completed = [
            r
            for r in self._runs.values()
            if r["thread_id"] == thread_id and r.get("status") in ("success", "error")
        ]
        by_model: dict[str, dict] = {}
        for r in completed:
            model = r.get("model_name") or "unknown"
            entry = by_model.setdefault(model, {"tokens": 0, "runs": 0})
            entry["tokens"] += r.get("total_tokens", 0)
            entry["runs"] += 1
        return {
            "total_tokens": sum(r.get("total_tokens", 0) for r in completed),
            "total_input_tokens": sum(r.get("total_input_tokens", 0) for r in completed),
            "total_output_tokens": sum(r.get("total_output_tokens", 0) for r in completed),
            "total_runs": len(completed),
            "by_model": by_model,
            "by_caller": {
                "lead_agent": sum(r.get("lead_agent_tokens", 0) for r in completed),
                "subagent": sum(r.get("subagent_tokens", 0) for r in completed),
                "middleware": sum(r.get("middleware_tokens", 0) for r in completed),
            },
        }
@@ -0,0 +1,20 @@
"""Read-side boundary for durable run queries."""

from __future__ import annotations

from typing import Protocol

from ..types import RunRecord


class RunQueryStore(Protocol):
    """Read durable run records for public query APIs."""

    async def get_run(self, run_id: str) -> RunRecord | None: ...

    async def list_runs(
        self,
        thread_id: str,
        *,
        limit: int = 100,
    ) -> list[RunRecord]: ...
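The new store boundaries are `typing.Protocol` classes rather than an ABC, so any adapter with matching async methods satisfies them structurally, with no inheritance. A self-contained sketch (`RunDeleteStoreLike` and `InMemoryDeleteStore` are illustrative stand-ins mirroring `RunDeleteStore` above):

```python
# Sketch: structural typing against a Protocol-based store boundary.
# No subclassing: the adapter only needs a matching delete_run signature.
import asyncio
from typing import Protocol


class RunDeleteStoreLike(Protocol):
    async def delete_run(self, run_id: str) -> bool: ...


class InMemoryDeleteStore:
    def __init__(self) -> None:
        self._rows: dict[str, dict] = {"run-1": {}}

    async def delete_run(self, run_id: str) -> bool:
        # True if a durable row was actually removed.
        return self._rows.pop(run_id, None) is not None


async def purge(store: RunDeleteStoreLike, run_id: str) -> bool:
    # Callers type against the protocol, not the concrete store.
    return await store.delete_run(run_id)


store = InMemoryDeleteStore()
deleted = asyncio.run(purge(store, "run-1"))
missing = asyncio.run(purge(store, "missing"))
print(deleted, missing)  # True False
```

Splitting create/delete/query/event into separate narrow protocols means each caller can depend on exactly the capability it uses, instead of the monolithic `RunStore` ABC that this commit removes.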
117 backend/packages/harness/deerflow/runtime/runs/types.py Normal file
@@ -0,0 +1,117 @@
"""Public runs domain types."""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import StrEnum
from typing import Any, Literal

# Intent: what the caller wants the request to do
RunIntent = Literal[
    "create_background",
    "create_and_stream",
    "create_and_wait",
    "join_stream",
    "join_wait",
]

# Scope kind: stateful (requires a thread_id) vs stateless (temporary thread)
RunScopeKind = Literal["stateful", "stateless"]


class RunStatus(StrEnum):
    pending = "pending"
    starting = "starting"
    running = "running"
    success = "success"
    error = "error"
    interrupted = "interrupted"
    timeout = "timeout"


CancelAction = Literal["interrupt", "rollback"]


@dataclass(frozen=True)
class RunScope:
    """Scope of a run: stateful requires a thread_id; stateless auto-creates a temporary thread."""

    kind: RunScopeKind
    thread_id: str
    temporary: bool = False


@dataclass(frozen=True)
class CheckpointRequest:
    """Checkpoint restore request. Phase 1 accepts it but does not implement restore."""

    checkpoint_id: str | None = None
    checkpoint: dict[str, Any] | None = None


@dataclass(frozen=True)
class RunSpec:
    """
    Run specification, built by the app input layer; the executor's input.

    Phase 1 limitations:
    - multitask_strategy supports only reject/interrupt
    - enqueue/rollback/after_seconds/batch are not supported
    """

    intent: RunIntent
    scope: RunScope
    assistant_id: str | None
    input: dict[str, Any] | None
    command: dict[str, Any] | None
    runnable_config: dict[str, Any]
    context: dict[str, Any] | None
    metadata: dict[str, Any]
    stream_modes: list[str]
    stream_subgraphs: bool
    stream_resumable: bool
    on_disconnect: Literal["cancel", "continue"]
    on_completion: Literal["delete", "keep"]
    multitask_strategy: Literal["reject", "interrupt"]
    interrupt_before: list[str] | Literal["*"] | None
    interrupt_after: list[str] | Literal["*"] | None
    checkpoint_request: CheckpointRequest | None
    follow_up_to_run_id: str | None = None
    webhook: str | None = None
    feedback_keys: list[str] | None = None


type WaitResult = dict[str, Any] | None


@dataclass
class RunRecord:
    """
    Runtime run record, managed by RuntimeRunRegistry.

    Decoupled from the ORM models; maintained in memory only.
    """

    run_id: str
    thread_id: str
    assistant_id: str | None
    status: RunStatus
    temporary: bool
    multitask_strategy: str
    metadata: dict[str, Any] = field(default_factory=dict)
    follow_up_to_run_id: str | None = None
    created_at: str = ""
    updated_at: str = ""
    started_at: str | None = None
    ended_at: str | None = None
    error: str | None = None

    def __post_init__(self) -> None:
        if not self.created_at:
            now = datetime.now(timezone.utc).isoformat()
            self.created_at = now
            self.updated_at = now


# Terminal statuses for quick checks
TERMINAL_STATUSES: frozenset[RunStatus] = frozenset(
    {RunStatus.success, RunStatus.error, RunStatus.interrupted}
)
INFLIGHT_STATUSES: frozenset[RunStatus] = frozenset(
    {RunStatus.pending, RunStatus.starting, RunStatus.running}
)
@@ -1,493 +0,0 @@
"""Background agent execution.

Runs an agent graph inside an ``asyncio.Task``, publishing events to
a :class:`StreamBridge` as they are produced.

Uses ``graph.astream(stream_mode=[...])`` which gives correct full-state
snapshots for ``values`` mode, proper ``{node: writes}`` for ``updates``,
and ``(chunk, metadata)`` tuples for ``messages`` mode.

Note: ``events`` mode is not supported through the gateway — it requires
``graph.astream_events()`` which cannot simultaneously produce ``values``
snapshots. The JS open-source LangGraph API server works around this via
internal checkpoint callbacks that are not exposed in the Python public API.
"""

from __future__ import annotations

import asyncio
import copy
import inspect
import logging
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any, Literal

if TYPE_CHECKING:
    from langchain_core.messages import HumanMessage

from deerflow.runtime.serialization import serialize
from deerflow.runtime.stream_bridge import StreamBridge

from .manager import RunManager, RunRecord
from .schemas import RunStatus

logger = logging.getLogger(__name__)

# Valid stream_mode values for LangGraph's graph.astream()
_VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}


@dataclass(frozen=True)
class RunContext:
    """Infrastructure dependencies for a single agent run.

    Groups checkpointer, store, and persistence-related singletons so that
    ``run_agent`` (and any future callers) receive one object instead of a
    growing list of keyword arguments.
    """

    checkpointer: Any
    store: Any | None = field(default=None)
    event_store: Any | None = field(default=None)
    run_events_config: Any | None = field(default=None)
    thread_store: Any | None = field(default=None)


async def run_agent(
    bridge: StreamBridge,
    run_manager: RunManager,
    record: RunRecord,
    *,
    ctx: RunContext,
    agent_factory: Any,
    graph_input: dict,
    config: dict,
    stream_modes: list[str] | None = None,
    stream_subgraphs: bool = False,
    interrupt_before: list[str] | Literal["*"] | None = None,
    interrupt_after: list[str] | Literal["*"] | None = None,
) -> None:
    """Execute an agent in the background, publishing events to *bridge*."""

    # Unpack infrastructure dependencies from RunContext.
    checkpointer = ctx.checkpointer
    store = ctx.store
    event_store = ctx.event_store
    run_events_config = ctx.run_events_config
    thread_store = ctx.thread_store

    run_id = record.run_id
    thread_id = record.thread_id
    requested_modes: set[str] = set(stream_modes or ["values"])
    pre_run_checkpoint_id: str | None = None
    pre_run_snapshot: dict[str, Any] | None = None
    snapshot_capture_failed = False

    journal = None

    # Track whether "events" was requested but skipped
    if "events" in requested_modes:
        logger.info(
            "Run %s: 'events' stream_mode not supported in gateway (requires astream_events + checkpoint callbacks). Skipping.",
            run_id,
        )
    try:
        # Initialize RunJournal + write human_message event.
        # These are inside the try block so any exception (e.g. a DB
        # error writing the event) flows through the except/finally
        # path that publishes an "end" event to the SSE bridge —
        # otherwise a failure here would leave the stream hanging
        # with no terminator.
        if event_store is not None:
            from deerflow.runtime.journal import RunJournal

            journal = RunJournal(
                run_id=run_id,
                thread_id=thread_id,
                event_store=event_store,
                track_token_usage=getattr(run_events_config, "track_token_usage", True),
            )

        # 1. Mark running
        await run_manager.set_status(run_id, RunStatus.running)

        # Snapshot the latest pre-run checkpoint so rollback can restore it.
        if checkpointer is not None:
            try:
                config_for_check = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
                ckpt_tuple = await checkpointer.aget_tuple(config_for_check)
                if ckpt_tuple is not None:
                    ckpt_config = getattr(ckpt_tuple, "config", {}).get("configurable", {})
                    pre_run_checkpoint_id = ckpt_config.get("checkpoint_id")
                    pre_run_snapshot = {
                        "checkpoint_ns": ckpt_config.get("checkpoint_ns", ""),
                        "checkpoint": copy.deepcopy(getattr(ckpt_tuple, "checkpoint", {})),
                        "metadata": copy.deepcopy(getattr(ckpt_tuple, "metadata", {})),
                        "pending_writes": copy.deepcopy(getattr(ckpt_tuple, "pending_writes", []) or []),
                    }
            except Exception:
                snapshot_capture_failed = True
                logger.warning("Could not capture pre-run checkpoint snapshot for run %s", run_id, exc_info=True)

        # 2. Publish metadata — useStream needs both run_id AND thread_id
        await bridge.publish(
            run_id,
            "metadata",
            {
                "run_id": run_id,
                "thread_id": thread_id,
            },
        )

        # 3. Build the agent
        from langchain_core.runnables import RunnableConfig
        from langgraph.runtime import Runtime

        # Inject runtime context so middlewares can access thread_id
        # (langgraph-cli does this automatically; we must do it manually)
        runtime = Runtime(context={"thread_id": thread_id, "run_id": run_id}, store=store)
        # If the caller already set a ``context`` key (LangGraph >= 0.6.0
        # prefers it over ``configurable`` for thread-level data), make
        # sure ``thread_id`` is available there too.
        if "context" in config and isinstance(config["context"], dict):
            config["context"].setdefault("thread_id", thread_id)
            config["context"].setdefault("run_id", run_id)
        config.setdefault("configurable", {})["__pregel_runtime"] = runtime

        # Inject RunJournal as a LangChain callback handler.
        # on_llm_end captures token usage; on_chain_start/end captures lifecycle.
        if journal is not None:
            config.setdefault("callbacks", []).append(journal)

        runnable_config = RunnableConfig(**config)
        agent = agent_factory(config=runnable_config)

        # 4. Attach checkpointer and store
        if checkpointer is not None:
            agent.checkpointer = checkpointer
        if store is not None:
            agent.store = store

        # 5. Set interrupt nodes
        if interrupt_before:
            agent.interrupt_before_nodes = interrupt_before
        if interrupt_after:
            agent.interrupt_after_nodes = interrupt_after

        # 6. Build LangGraph stream_mode list
        # "events" is NOT a valid astream mode — skip it
        # "messages-tuple" maps to LangGraph's "messages" mode
        lg_modes: list[str] = []
        for m in requested_modes:
            if m == "messages-tuple":
                lg_modes.append("messages")
            elif m == "events":
                # Skipped — see log above
                continue
            elif m in _VALID_LG_MODES:
                lg_modes.append(m)
        if not lg_modes:
            lg_modes = ["values"]

        # Deduplicate while preserving order
        seen: set[str] = set()
        deduped: list[str] = []
        for m in lg_modes:
            if m not in seen:
                seen.add(m)
                deduped.append(m)
        lg_modes = deduped

        logger.info("Run %s: streaming with modes %s (requested: %s)", run_id, lg_modes, requested_modes)
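Step 6 above can be factored into a standalone normalization helper, sketched below. The `normalize_stream_modes` name is illustrative, not part of the real module; unlike the worker (which iterates a set), the sketch takes a list so the output order is deterministic:

```python
# Sketch: gateway stream-mode normalization as a pure function.
# "messages-tuple" is aliased to LangGraph's "messages"; "events" is dropped.
VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}


def normalize_stream_modes(requested: list[str]) -> list[str]:
    lg_modes: list[str] = []
    for m in requested:
        if m == "messages-tuple":
            # Gateway alias for LangGraph's "messages" mode.
            lg_modes.append("messages")
        elif m == "events":
            # Not a valid graph.astream() mode; skipped (logged upstream).
            continue
        elif m in VALID_LG_MODES:
            lg_modes.append(m)
    if not lg_modes:
        lg_modes = ["values"]
    # Deduplicate while preserving first-seen order.
    seen: set[str] = set()
    return [m for m in lg_modes if not (m in seen or seen.add(m))]


print(normalize_stream_modes(["messages-tuple", "values", "events", "messages"]))
# ['messages', 'values']
print(normalize_stream_modes(["events"]))
# ['values']
```

The fallback to `["values"]` guarantees the worker always has at least one valid mode to hand to `graph.astream`, even when every requested mode was unknown or skipped.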
        # 7. Stream using graph.astream
        if len(lg_modes) == 1 and not stream_subgraphs:
            # Single mode, no subgraphs: astream yields raw chunks
            single_mode = lg_modes[0]
            async for chunk in agent.astream(graph_input, config=runnable_config, stream_mode=single_mode):
                if record.abort_event.is_set():
                    logger.info("Run %s abort requested — stopping", run_id)
                    break
                sse_event = _lg_mode_to_sse_event(single_mode)
                await bridge.publish(run_id, sse_event, serialize(chunk, mode=single_mode))
        else:
            # Multiple modes or subgraphs: astream yields tuples
            async for item in agent.astream(
                graph_input,
                config=runnable_config,
                stream_mode=lg_modes,
                subgraphs=stream_subgraphs,
            ):
                if record.abort_event.is_set():
                    logger.info("Run %s abort requested — stopping", run_id)
                    break

                mode, chunk = _unpack_stream_item(item, lg_modes, stream_subgraphs)
                if mode is None:
                    continue

                sse_event = _lg_mode_to_sse_event(mode)
                await bridge.publish(run_id, sse_event, serialize(chunk, mode=mode))

        # 8. Final status
        if record.abort_event.is_set():
            action = record.abort_action
            if action == "rollback":
                await run_manager.set_status(run_id, RunStatus.error, error="Rolled back by user")
                try:
                    await _rollback_to_pre_run_checkpoint(
                        checkpointer=checkpointer,
                        thread_id=thread_id,
                        run_id=run_id,
                        pre_run_checkpoint_id=pre_run_checkpoint_id,
                        pre_run_snapshot=pre_run_snapshot,
                        snapshot_capture_failed=snapshot_capture_failed,
                    )
                    logger.info("Run %s rolled back to pre-run checkpoint %s", run_id, pre_run_checkpoint_id)
                except Exception:
                    logger.warning("Failed to rollback checkpoint for run %s", run_id, exc_info=True)
            else:
                await run_manager.set_status(run_id, RunStatus.interrupted)
        else:
            await run_manager.set_status(run_id, RunStatus.success)
    except asyncio.CancelledError:
        action = record.abort_action
        if action == "rollback":
            await run_manager.set_status(run_id, RunStatus.error, error="Rolled back by user")
            try:
                await _rollback_to_pre_run_checkpoint(
                    checkpointer=checkpointer,
                    thread_id=thread_id,
                    run_id=run_id,
                    pre_run_checkpoint_id=pre_run_checkpoint_id,
                    pre_run_snapshot=pre_run_snapshot,
                    snapshot_capture_failed=snapshot_capture_failed,
                )
                logger.info("Run %s was cancelled and rolled back", run_id)
            except Exception:
                logger.warning("Run %s cancellation rollback failed", run_id, exc_info=True)
        else:
            await run_manager.set_status(run_id, RunStatus.interrupted)
            logger.info("Run %s was cancelled", run_id)

    except Exception as exc:
        error_msg = f"{exc}"
        logger.exception("Run %s failed: %s", run_id, error_msg)
        await run_manager.set_status(run_id, RunStatus.error, error=error_msg)
        await bridge.publish(
            run_id,
            "error",
            {
                "message": error_msg,
                "name": type(exc).__name__,
            },
        )
    finally:
        # Flush any buffered journal events and persist completion data
        if journal is not None:
            try:
                await journal.flush()
            except Exception:
                logger.warning("Failed to flush journal for run %s", run_id, exc_info=True)

            try:
                # Persist token usage + convenience fields to RunStore
                completion = journal.get_completion_data()
                await run_manager.update_run_completion(run_id, status=record.status.value, **completion)
            except Exception:
                logger.warning("Failed to persist run completion for %s (non-fatal)", run_id, exc_info=True)

        # Sync title from checkpoint to threads_meta.display_name
        if checkpointer is not None and thread_store is not None:
            try:
                ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
                ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
                if ckpt_tuple is not None:
                    ckpt = getattr(ckpt_tuple, "checkpoint", {}) or {}
                    title = ckpt.get("channel_values", {}).get("title")
                    if title:
                        await thread_store.update_display_name(thread_id, title)
            except Exception:
                logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id)

        # Update threads_meta status based on run outcome
        if thread_store is not None:
            try:
                final_status = "idle" if record.status == RunStatus.success else record.status.value
                await thread_store.update_status(thread_id, final_status)
            except Exception:
                logger.debug("Failed to update thread_meta status for %s (non-fatal)", thread_id)

        await bridge.publish_end(run_id)
        asyncio.create_task(bridge.cleanup(run_id, delay=60))
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
async def _call_checkpointer_method(checkpointer: Any, async_name: str, sync_name: str, *args: Any, **kwargs: Any) -> Any:
|
||||
"""Call a checkpointer method, supporting async and sync variants."""
|
||||
method = getattr(checkpointer, async_name, None) or getattr(checkpointer, sync_name, None)
|
||||
if method is None:
|
||||
raise AttributeError(f"Missing checkpointer method: {async_name}/{sync_name}")
|
||||
result = method(*args, **kwargs)
|
||||
if inspect.isawaitable(result):
|
||||
return await result
|
||||
return result
|
||||
|
||||
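The async-or-sync dispatch in `_call_checkpointer_method` can be sketched in isolation: prefer the async variant, fall back to sync, and await only when the result is awaitable. The checkpointer classes below are illustrative stand-ins, not real LangGraph types:

```python
# Sketch: dispatching to an async or sync method variant uniformly.
import asyncio
import inspect
from typing import Any


async def call_method(obj: Any, async_name: str, sync_name: str, *args: Any) -> Any:
    method = getattr(obj, async_name, None) or getattr(obj, sync_name, None)
    if method is None:
        raise AttributeError(f"Missing method: {async_name}/{sync_name}")
    result = method(*args)
    # Await only if the chosen variant returned a coroutine/awaitable.
    if inspect.isawaitable(result):
        return await result
    return result


class AsyncCheckpointer:
    async def adelete_thread(self, thread_id: str) -> str:
        return f"async-deleted:{thread_id}"


class SyncCheckpointer:
    def delete_thread(self, thread_id: str) -> str:
        return f"sync-deleted:{thread_id}"


a = asyncio.run(call_method(AsyncCheckpointer(), "adelete_thread", "delete_thread", "t1"))
b = asyncio.run(call_method(SyncCheckpointer(), "adelete_thread", "delete_thread", "t1"))
print(a, b)  # async-deleted:t1 sync-deleted:t1
```

This keeps the rollback path working against any checkpointer implementation, whether it exposes `aput`/`adelete_thread` or only the synchronous `put`/`delete_thread`.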
async def _rollback_to_pre_run_checkpoint(
    *,
    checkpointer: Any,
    thread_id: str,
    run_id: str,
    pre_run_checkpoint_id: str | None,
    pre_run_snapshot: dict[str, Any] | None,
    snapshot_capture_failed: bool,
) -> None:
    """Restore thread state to the checkpoint snapshot captured before run start."""
    if checkpointer is None:
        logger.info("Run %s rollback requested but no checkpointer is configured", run_id)
        return

    if snapshot_capture_failed:
        logger.warning("Run %s rollback skipped: pre-run checkpoint snapshot capture failed", run_id)
        return

    if pre_run_snapshot is None:
        await _call_checkpointer_method(checkpointer, "adelete_thread", "delete_thread", thread_id)
        logger.info("Run %s rollback reset thread %s to empty state", run_id, thread_id)
        return

    checkpoint_to_restore = None
    metadata_to_restore: dict[str, Any] = {}
    checkpoint_ns = ""
    checkpoint = pre_run_snapshot.get("checkpoint")
    if not isinstance(checkpoint, dict):
        logger.warning("Run %s rollback skipped: invalid pre-run checkpoint snapshot", run_id)
        return
    checkpoint_to_restore = checkpoint
    if checkpoint_to_restore.get("id") is None and pre_run_checkpoint_id is not None:
        checkpoint_to_restore = {**checkpoint_to_restore, "id": pre_run_checkpoint_id}
    if checkpoint_to_restore.get("id") is None:
        logger.warning("Run %s rollback skipped: pre-run checkpoint has no checkpoint id", run_id)
        return
    metadata = pre_run_snapshot.get("metadata", {})
    metadata_to_restore = metadata if isinstance(metadata, dict) else {}
    raw_checkpoint_ns = pre_run_snapshot.get("checkpoint_ns")
    checkpoint_ns = raw_checkpoint_ns if isinstance(raw_checkpoint_ns, str) else ""

    channel_versions = checkpoint_to_restore.get("channel_versions")
    new_versions = dict(channel_versions) if isinstance(channel_versions, dict) else {}

    restore_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": checkpoint_ns}}
    restored_config = await _call_checkpointer_method(
        checkpointer,
        "aput",
        "put",
        restore_config,
        checkpoint_to_restore,
        metadata_to_restore if isinstance(metadata_to_restore, dict) else {},
        new_versions,
    )
    if not isinstance(restored_config, dict):
        raise RuntimeError(f"Run {run_id} rollback restore returned invalid config: expected dict")
    restored_configurable = restored_config.get("configurable", {})
    if not isinstance(restored_configurable, dict):
        raise RuntimeError(f"Run {run_id} rollback restore returned invalid config payload")
    restored_checkpoint_id = restored_configurable.get("checkpoint_id")
    if not restored_checkpoint_id:
        raise RuntimeError(f"Run {run_id} rollback restore did not return checkpoint_id")

    pending_writes = pre_run_snapshot.get("pending_writes", [])
    if not pending_writes:
        return

    writes_by_task: dict[str, list[tuple[str, Any]]] = {}
    for item in pending_writes:
        if not isinstance(item, (tuple, list)) or len(item) != 3:
|
||||
raise RuntimeError(f"Run {run_id} rollback failed: pending_write is not a 3-tuple: {item!r}")
|
||||
task_id, channel, value = item
|
||||
if not isinstance(channel, str):
|
||||
raise RuntimeError(f"Run {run_id} rollback failed: pending_write has non-string channel: task_id={task_id!r}, channel={channel!r}")
|
||||
writes_by_task.setdefault(str(task_id), []).append((channel, value))
|
||||
|
||||
for task_id, writes in writes_by_task.items():
|
||||
await _call_checkpointer_method(
|
||||
checkpointer,
|
||||
"aput_writes",
|
||||
"put_writes",
|
||||
restored_config,
|
||||
writes,
|
||||
task_id=task_id,
|
||||
)
|
||||
|
||||
|
||||
def _lg_mode_to_sse_event(mode: str) -> str:
|
||||
"""Map LangGraph internal stream_mode name to SSE event name.
|
||||
|
||||
LangGraph's ``astream(stream_mode="messages")`` produces message
|
||||
tuples. The SSE protocol calls this ``messages-tuple`` when the
|
||||
client explicitly requests it, but the default SSE event name used
|
||||
by LangGraph Platform is simply ``"messages"``.
|
||||
"""
|
||||
# All LG modes map 1:1 to SSE event names — "messages" stays "messages"
|
||||
return mode
|
||||
|
||||
|
||||
def _extract_human_message(graph_input: dict) -> HumanMessage | None:
|
||||
"""Extract or construct a HumanMessage from graph_input for event recording.
|
||||
|
||||
Returns a LangChain HumanMessage so callers can use .model_dump() to get
|
||||
the checkpoint-aligned serialization format.
|
||||
"""
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
messages = graph_input.get("messages")
|
||||
if not messages:
|
||||
return None
|
||||
last = messages[-1] if isinstance(messages, list) else messages
|
||||
if isinstance(last, HumanMessage):
|
||||
return last
|
||||
if isinstance(last, str):
|
||||
return HumanMessage(content=last) if last else None
|
||||
if hasattr(last, "content"):
|
||||
content = last.content
|
||||
return HumanMessage(content=content)
|
||||
if isinstance(last, dict):
|
||||
content = last.get("content", "")
|
||||
return HumanMessage(content=content) if content else None
|
||||
return None
|
||||
|
||||
|
||||
def _unpack_stream_item(
|
||||
item: Any,
|
||||
lg_modes: list[str],
|
||||
stream_subgraphs: bool,
|
||||
) -> tuple[str | None, Any]:
|
||||
"""Unpack a multi-mode or subgraph stream item into (mode, chunk).
|
||||
|
||||
Returns ``(None, None)`` if the item cannot be parsed.
|
||||
"""
|
||||
if stream_subgraphs:
|
||||
if isinstance(item, tuple) and len(item) == 3:
|
||||
_ns, mode, chunk = item
|
||||
return str(mode), chunk
|
||||
if isinstance(item, tuple) and len(item) == 2:
|
||||
mode, chunk = item
|
||||
return str(mode), chunk
|
||||
return None, None
|
||||
|
||||
if isinstance(item, tuple) and len(item) == 2:
|
||||
mode, chunk = item
|
||||
return str(mode), chunk
|
||||
|
||||
# Fallback: single-element output from first mode
|
||||
return lg_modes[0] if lg_modes else None, item
|
||||
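The async/sync dual-dispatch pattern used by `_call_checkpointer_method` above can be demonstrated standalone. The snippet is an illustrative sketch (the class names `SyncSaver`/`AsyncSaver` are hypothetical, not part of the codebase); it shows the core rule: prefer the async method name, fall back to the sync one, and only `await` when the call actually returned an awaitable.

```python
# Standalone illustration of the async/sync dual-dispatch used by
# _call_checkpointer_method: prefer the async method name, fall back
# to the sync one, and await the result only when it is awaitable.
import asyncio
import inspect
from typing import Any


async def call_dual(obj: Any, async_name: str, sync_name: str, *args: Any) -> Any:
    method = getattr(obj, async_name, None) or getattr(obj, sync_name, None)
    if method is None:
        raise AttributeError(f"Missing method: {async_name}/{sync_name}")
    result = method(*args)
    if inspect.isawaitable(result):
        return await result
    return result


class SyncSaver:
    # Hypothetical sync-only checkpointer stand-in.
    def put(self, value: int) -> int:
        return value + 1


class AsyncSaver:
    # Hypothetical async-only checkpointer stand-in.
    async def aput(self, value: int) -> int:
        return value + 1


async def main() -> tuple[int, int]:
    return (
        await call_dual(SyncSaver(), "aput", "put", 1),
        await call_dual(AsyncSaver(), "aput", "put", 1),
    )


print(asyncio.run(main()))  # → (2, 2)
```

Both objects resolve through the same call site, which is why the rollback helper can stay agnostic to whether the configured checkpointer is sync or async.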
@@ -4,8 +4,8 @@ Provides a single source of truth for converting LangChain message
objects, Pydantic models, and LangGraph state dicts into plain
JSON-serialisable Python structures.

Consumers: ``deerflow.runtime.runs.worker`` (SSE publishing) and
``app.gateway.routers.threads`` (REST responses).
Consumers: runs execution internals (SSE publishing) and
gateway thread state/history responses.
"""

from __future__ import annotations

@@ -1,21 +1,47 @@
"""Stream bridge — decouples agent workers from SSE endpoints.
"""Stream bridge public surface.

A ``StreamBridge`` sits between the background task that runs an agent
(producer) and the HTTP endpoint that pushes Server-Sent Events to
the client (consumer). This package provides an abstract protocol
(:class:`StreamBridge`) plus a default in-memory implementation backed
by :mod:`asyncio.Queue`.
The harness package owns the stream abstraction and event semantics.
Concrete backends are intentionally not part of the public API here so
applications can inject infra-specific implementations.
"""

from .async_provider import make_stream_bridge
from .base import END_SENTINEL, HEARTBEAT_SENTINEL, StreamBridge, StreamEvent
from .memory import MemoryStreamBridge
from .contract import (
    CANCELLED_SENTINEL,
    END_SENTINEL,
    HEARTBEAT_SENTINEL,
    JSONScalar,
    JSONValue,
    TERMINAL_STATES,
    ResumeResult,
    StreamBridge,
    StreamEvent,
    StreamStatus,
)
from .exceptions import (
    BridgeClosedError,
    StreamBridgeError,
    StreamCapacityExceededError,
    StreamNotFoundError,
    StreamTerminatedError,
)

__all__ = [
    # Sentinels
    "CANCELLED_SENTINEL",
    "END_SENTINEL",
    "HEARTBEAT_SENTINEL",
    "MemoryStreamBridge",
    # Types
    "JSONScalar",
    "JSONValue",
    "ResumeResult",
    "StreamBridge",
    "StreamEvent",
    "make_stream_bridge",
    "StreamStatus",
    "TERMINAL_STATES",
    # Exceptions
    "BridgeClosedError",
    "StreamBridgeError",
    "StreamCapacityExceededError",
    "StreamNotFoundError",
    "StreamTerminatedError",
]

@@ -1,52 +0,0 @@
"""Async stream bridge factory.

Provides an **async context manager** aligned with
:func:`deerflow.runtime.checkpointer.async_provider.make_checkpointer`.

Usage (e.g. FastAPI lifespan)::

    from deerflow.agents.stream_bridge import make_stream_bridge

    async with make_stream_bridge() as bridge:
        app.state.stream_bridge = bridge
"""

from __future__ import annotations

import contextlib
import logging
from collections.abc import AsyncIterator

from deerflow.config.stream_bridge_config import get_stream_bridge_config

from .base import StreamBridge

logger = logging.getLogger(__name__)


@contextlib.asynccontextmanager
async def make_stream_bridge(config=None) -> AsyncIterator[StreamBridge]:
    """Async context manager that yields a :class:`StreamBridge`.

    Falls back to :class:`MemoryStreamBridge` when no configuration is
    provided and nothing is set globally.
    """
    if config is None:
        config = get_stream_bridge_config()

    if config is None or config.type == "memory":
        from deerflow.runtime.stream_bridge.memory import MemoryStreamBridge

        maxsize = config.queue_maxsize if config is not None else 256
        bridge = MemoryStreamBridge(queue_maxsize=maxsize)
        logger.info("Stream bridge initialised: memory (queue_maxsize=%d)", maxsize)
        try:
            yield bridge
        finally:
            await bridge.close()
        return

    if config.type == "redis":
        raise NotImplementedError("Redis stream bridge planned for Phase 2")

    raise ValueError(f"Unknown stream bridge type: {config.type!r}")
@@ -1,72 +0,0 @@
"""Abstract stream bridge protocol.

StreamBridge decouples agent workers (producers) from SSE endpoints
(consumers), aligning with LangGraph Platform's Queue + StreamManager
architecture.
"""

from __future__ import annotations

import abc
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StreamEvent:
    """Single stream event.

    Attributes:
        id: Monotonically increasing event ID (used as SSE ``id:`` field,
            supports ``Last-Event-ID`` reconnection).
        event: SSE event name, e.g. ``"metadata"``, ``"updates"``,
            ``"events"``, ``"error"``, ``"end"``.
        data: JSON-serialisable payload.
    """

    id: str
    event: str
    data: Any


HEARTBEAT_SENTINEL = StreamEvent(id="", event="__heartbeat__", data=None)
END_SENTINEL = StreamEvent(id="", event="__end__", data=None)


class StreamBridge(abc.ABC):
    """Abstract base for stream bridges."""

    @abc.abstractmethod
    async def publish(self, run_id: str, event: str, data: Any) -> None:
        """Enqueue a single event for *run_id* (producer side)."""

    @abc.abstractmethod
    async def publish_end(self, run_id: str) -> None:
        """Signal that no more events will be produced for *run_id*."""

    @abc.abstractmethod
    def subscribe(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
        heartbeat_interval: float = 15.0,
    ) -> AsyncIterator[StreamEvent]:
        """Async iterator that yields events for *run_id* (consumer side).

        Yields :data:`HEARTBEAT_SENTINEL` when no event arrives within
        *heartbeat_interval* seconds. Yields :data:`END_SENTINEL` once
        the producer calls :meth:`publish_end`.
        """

    @abc.abstractmethod
    async def cleanup(self, run_id: str, *, delay: float = 0) -> None:
        """Release resources associated with *run_id*.

        If *delay* > 0 the implementation should wait before releasing,
        giving late subscribers a chance to drain remaining events.
        """

    async def close(self) -> None:
        """Release backend resources. Default is a no-op."""
@@ -0,0 +1,112 @@
"""Stream bridge contract and public types."""

from __future__ import annotations

import abc
from collections.abc import AsyncIterator
from dataclasses import dataclass
from enum import Enum
from typing import Literal

type JSONScalar = None | bool | int | float | str
type JSONValue = JSONScalar | list["JSONValue"] | dict[str, "JSONValue"]


class StreamStatus(str, Enum):
    """Stream lifecycle states."""

    ACTIVE = "active"
    ENDED = "ended"
    CANCELLED = "cancelled"
    ERRORED = "errored"
    CLOSED = "closed"


TERMINAL_STATES = frozenset({
    StreamStatus.ENDED,
    StreamStatus.CANCELLED,
    StreamStatus.ERRORED,
})


@dataclass(frozen=True, slots=True)
class StreamEvent:
    """Single stream event."""

    id: str
    event: str
    data: JSONValue


@dataclass(frozen=True, slots=True)
class ResumeResult:
    """Result of resolving Last-Event-ID."""

    next_offset: int
    status: Literal["fresh", "resumed", "evicted", "invalid", "unknown"]
    gap_count: int = 0


HEARTBEAT_SENTINEL = StreamEvent(id="", event="__heartbeat__", data=None)
END_SENTINEL = StreamEvent(id="", event="__end__", data=None)
CANCELLED_SENTINEL = StreamEvent(id="", event="__cancelled__", data=None)


class StreamBridge(abc.ABC):
    """Abstract base for stream bridges.

    ``StreamBridge`` defines runtime stream semantics, not storage semantics.
    Concrete backends may live outside the harness package and be injected by
    the application composition root.

    Important boundary rules:
    - Terminal run events (``end``/``cancel``/``error``) are real replayable
      events and belong to run-level semantics.
    - ``close()`` is bridge-level shutdown and must not be treated as a run
      cancellation signal.
    """

    @abc.abstractmethod
    async def publish(self, run_id: str, event: str, data: JSONValue) -> str:
        """Enqueue a single event for *run_id* and return its event ID."""

    @abc.abstractmethod
    async def publish_end(self, run_id: str) -> str:
        """Signal that no more events will be produced for *run_id*."""

    async def publish_terminal(
        self,
        run_id: str,
        kind: StreamStatus,
        data: JSONValue = None,
    ) -> str:
        """Publish a terminal event (end/cancel/error)."""
        await self.publish_end(run_id)
        return ""

    @abc.abstractmethod
    def subscribe(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
        heartbeat_interval: float = 15.0,
    ) -> AsyncIterator[StreamEvent]:
        """Yield replayable stream events for *run_id*."""

    @abc.abstractmethod
    async def cleanup(self, run_id: str, *, delay: float = 0) -> None:
        """Release resources associated with *run_id*."""

    async def cancel(self, run_id: str) -> None:
        """Cancel a run and notify all subscribers."""
        await self.publish_terminal(run_id, StreamStatus.CANCELLED)

    async def mark_awaiting_input(self, run_id: str) -> None:
        """Mark stream as awaiting human input."""

    async def start(self) -> None:
        """Start background tasks, if needed."""

    async def close(self) -> None:
        """Release bridge-level backend resources."""
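Since the new contract deliberately ships no concrete backend, exercising it requires an application-supplied implementation. The sketch below is hypothetical and self-contained (a throwaway `MiniBridge`, not part of the codebase); it assumes only the publish/publish_end/subscribe semantics stated in the contract docstrings.

```python
# Hypothetical minimal backend for illustration only — the harness ships no
# concrete StreamBridge; applications inject their own at the composition root.
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamEvent:
    id: str
    event: str
    data: object


END_SENTINEL = StreamEvent(id="", event="__end__", data=None)


@dataclass
class _Stream:
    events: list[StreamEvent] = field(default_factory=list)
    ended: bool = False
    changed: asyncio.Event = field(default_factory=asyncio.Event)


class MiniBridge:
    def __init__(self) -> None:
        self._streams: dict[str, _Stream] = {}

    def _stream(self, run_id: str) -> _Stream:
        return self._streams.setdefault(run_id, _Stream())

    async def publish(self, run_id: str, event: str, data: object) -> str:
        s = self._stream(run_id)
        eid = str(len(s.events))
        s.events.append(StreamEvent(id=eid, event=event, data=data))
        s.changed.set()
        return eid  # contract: publish returns the event ID

    async def publish_end(self, run_id: str) -> str:
        s = self._stream(run_id)
        s.ended = True
        s.changed.set()
        return ""

    async def subscribe(self, run_id: str):
        # Replay buffered events, then wait for new ones until the end marker.
        s = self._stream(run_id)
        offset = 0
        while True:
            if offset < len(s.events):
                yield s.events[offset]
                offset += 1
            elif s.ended:
                yield END_SENTINEL
                return
            else:
                s.changed.clear()
                await s.changed.wait()


async def demo() -> list[str]:
    bridge = MiniBridge()
    await bridge.publish("run-1", "updates", {"step": 1})
    await bridge.publish("run-1", "updates", {"step": 2})
    await bridge.publish_end("run-1")
    seen: list[str] = []
    async for ev in bridge.subscribe("run-1"):
        if ev is END_SENTINEL:
            break
        seen.append(ev.event)
    return seen


print(asyncio.run(demo()))  # → ['updates', 'updates']
```

Note how the end marker is delivered through the same replay path as ordinary events, matching the contract's rule that terminal events are real replayable events rather than side-channel signals.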
@@ -0,0 +1,23 @@
"""Stream bridge exceptions."""

from __future__ import annotations


class StreamBridgeError(Exception):
    """Base exception for stream bridge errors."""


class BridgeClosedError(StreamBridgeError):
    """Raised when operating on a closed bridge."""


class StreamCapacityExceededError(StreamBridgeError):
    """Raised when max_active_streams is reached and eviction is not possible."""


class StreamTerminatedError(StreamBridgeError):
    """Raised when publishing to a terminal stream."""


class StreamNotFoundError(StreamBridgeError):
    """Raised when referencing a non-existent stream."""
@@ -1,133 +0,0 @@
"""In-memory stream bridge backed by an in-process event log."""

from __future__ import annotations

import asyncio
import logging
import time
from collections.abc import AsyncIterator
from dataclasses import dataclass, field
from typing import Any

from .base import END_SENTINEL, HEARTBEAT_SENTINEL, StreamBridge, StreamEvent

logger = logging.getLogger(__name__)


@dataclass
class _RunStream:
    events: list[StreamEvent] = field(default_factory=list)
    condition: asyncio.Condition = field(default_factory=asyncio.Condition)
    ended: bool = False
    start_offset: int = 0


class MemoryStreamBridge(StreamBridge):
    """Per-run in-memory event log implementation.

    Events are retained for a bounded time window per run so late subscribers
    and reconnecting clients can replay buffered events from ``Last-Event-ID``.
    """

    def __init__(self, *, queue_maxsize: int = 256) -> None:
        self._maxsize = queue_maxsize
        self._streams: dict[str, _RunStream] = {}
        self._counters: dict[str, int] = {}

    # -- helpers ---------------------------------------------------------------

    def _get_or_create_stream(self, run_id: str) -> _RunStream:
        if run_id not in self._streams:
            self._streams[run_id] = _RunStream()
            self._counters[run_id] = 0
        return self._streams[run_id]

    def _next_id(self, run_id: str) -> str:
        self._counters[run_id] = self._counters.get(run_id, 0) + 1
        ts = int(time.time() * 1000)
        seq = self._counters[run_id] - 1
        return f"{ts}-{seq}"

    def _resolve_start_offset(self, stream: _RunStream, last_event_id: str | None) -> int:
        if last_event_id is None:
            return stream.start_offset

        for index, entry in enumerate(stream.events):
            if entry.id == last_event_id:
                return stream.start_offset + index + 1

        if stream.events:
            logger.warning(
                "last_event_id=%s not found in retained buffer; replaying from earliest retained event",
                last_event_id,
            )
        return stream.start_offset

    # -- StreamBridge API ------------------------------------------------------

    async def publish(self, run_id: str, event: str, data: Any) -> None:
        stream = self._get_or_create_stream(run_id)
        entry = StreamEvent(id=self._next_id(run_id), event=event, data=data)
        async with stream.condition:
            stream.events.append(entry)
            if len(stream.events) > self._maxsize:
                overflow = len(stream.events) - self._maxsize
                del stream.events[:overflow]
                stream.start_offset += overflow
            stream.condition.notify_all()

    async def publish_end(self, run_id: str) -> None:
        stream = self._get_or_create_stream(run_id)
        async with stream.condition:
            stream.ended = True
            stream.condition.notify_all()

    async def subscribe(
        self,
        run_id: str,
        *,
        last_event_id: str | None = None,
        heartbeat_interval: float = 15.0,
    ) -> AsyncIterator[StreamEvent]:
        stream = self._get_or_create_stream(run_id)
        async with stream.condition:
            next_offset = self._resolve_start_offset(stream, last_event_id)

        while True:
            async with stream.condition:
                if next_offset < stream.start_offset:
                    logger.warning(
                        "subscriber for run %s fell behind retained buffer; resuming from offset %s",
                        run_id,
                        stream.start_offset,
                    )
                    next_offset = stream.start_offset

                local_index = next_offset - stream.start_offset
                if 0 <= local_index < len(stream.events):
                    entry = stream.events[local_index]
                    next_offset += 1
                elif stream.ended:
                    entry = END_SENTINEL
                else:
                    try:
                        await asyncio.wait_for(stream.condition.wait(), timeout=heartbeat_interval)
                    except TimeoutError:
                        entry = HEARTBEAT_SENTINEL
                    else:
                        continue

            if entry is END_SENTINEL:
                yield END_SENTINEL
                return
            yield entry

    async def cleanup(self, run_id: str, *, delay: float = 0) -> None:
        if delay > 0:
            await asyncio.sleep(delay)
        self._streams.pop(run_id, None)
        self._counters.pop(run_id, None)

    async def close(self) -> None:
        self._streams.clear()
        self._counters.clear()
@@ -1,167 +0,0 @@
"""Request-scoped user context for user-based authorization.

This module holds a :class:`~contextvars.ContextVar` that the gateway's
auth middleware sets after a successful authentication. Repository
methods read the contextvar via a sentinel default parameter, letting
routers stay free of ``user_id`` boilerplate.

Three-state semantics for the repository ``user_id`` parameter (the
consumer side of this module lives in ``deerflow.persistence.*``):

- ``_AUTO`` (module-private sentinel, default): read from contextvar;
  raise :class:`RuntimeError` if unset.
- Explicit ``str``: use the provided value, overriding contextvar.
- Explicit ``None``: no WHERE clause — used only by migration scripts
  and admin CLIs that intentionally bypass isolation.

Dependency direction
--------------------
``persistence`` (lower layer) reads from this module; ``gateway.auth``
(higher layer) writes to it. ``CurrentUser`` is defined here as a
:class:`typing.Protocol` so that ``persistence`` never needs to import
the concrete ``User`` class from ``gateway.auth.models``. Any object
with an ``.id: str`` attribute structurally satisfies the protocol.

Asyncio semantics
-----------------
``ContextVar`` is task-local under asyncio, not thread-local. Each
FastAPI request runs in its own task, so the context is naturally
isolated. ``asyncio.create_task`` and ``asyncio.to_thread`` inherit the
parent task's context, which is typically the intended behaviour; if
a background task must *not* see the foreground user, wrap it with
``contextvars.copy_context()`` to get a clean copy.
"""

from __future__ import annotations

from contextvars import ContextVar, Token
from typing import Final, Protocol, runtime_checkable


@runtime_checkable
class CurrentUser(Protocol):
    """Structural type for the current authenticated user.

    Any object with an ``.id: str`` attribute satisfies this protocol.
    Concrete implementations live in ``app.gateway.auth.models.User``.
    """

    id: str


_current_user: Final[ContextVar[CurrentUser | None]] = ContextVar("deerflow_current_user", default=None)


def set_current_user(user: CurrentUser) -> Token[CurrentUser | None]:
    """Set the current user for this async task.

    Returns a reset token that should be passed to
    :func:`reset_current_user` in a ``finally`` block to restore the
    previous context.
    """
    return _current_user.set(user)


def reset_current_user(token: Token[CurrentUser | None]) -> None:
    """Restore the context to the state captured by ``token``."""
    _current_user.reset(token)


def get_current_user() -> CurrentUser | None:
    """Return the current user, or ``None`` if unset.

    Safe to call in any context. Used by code paths that can proceed
    without a user (e.g. migration scripts, public endpoints).
    """
    return _current_user.get()


def require_current_user() -> CurrentUser:
    """Return the current user, or raise :class:`RuntimeError`.

    Used by repository code that must not be called outside a
    request-authenticated context. The error message is phrased so
    that a caller debugging a stack trace can locate the offending
    code path.
    """
    user = _current_user.get()
    if user is None:
        raise RuntimeError("repository accessed without user context")
    return user


# ---------------------------------------------------------------------------
# Effective user_id helpers (filesystem isolation)
# ---------------------------------------------------------------------------

DEFAULT_USER_ID: Final[str] = "default"


def get_effective_user_id() -> str:
    """Return the current user's id as a string, or DEFAULT_USER_ID if unset.

    Unlike :func:`require_current_user` this never raises — it is designed
    for filesystem-path resolution where a valid user bucket is always needed.
    """
    user = _current_user.get()
    if user is None:
        return DEFAULT_USER_ID
    return str(user.id)


# ---------------------------------------------------------------------------
# Sentinel-based user_id resolution
# ---------------------------------------------------------------------------
#
# Repository methods accept a ``user_id`` keyword-only argument that
# defaults to ``AUTO``. The three possible values drive distinct
# behaviours; see the docstring on :func:`resolve_user_id`.


class _AutoSentinel:
    """Singleton marker meaning 'resolve user_id from contextvar'."""

    _instance: _AutoSentinel | None = None

    def __new__(cls) -> _AutoSentinel:
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "<AUTO>"


AUTO: Final[_AutoSentinel] = _AutoSentinel()


def resolve_user_id(
    value: str | None | _AutoSentinel,
    *,
    method_name: str = "repository method",
) -> str | None:
    """Resolve the user_id parameter passed to a repository method.

    Three-state semantics:

    - :data:`AUTO` (default): read from contextvar; raise
      :class:`RuntimeError` if no user is in context. This is the
      common case for request-scoped calls.
    - Explicit ``str``: use the provided id verbatim, overriding any
      contextvar value. Useful for tests and admin-override flows.
    - Explicit ``None``: no filter — the repository should skip the
      user_id WHERE clause entirely. Reserved for migration scripts
      and CLI tools that intentionally bypass isolation.
    """
    if isinstance(value, _AutoSentinel):
        user = _current_user.get()
        if user is None:
            raise RuntimeError(f"{method_name} called with user_id=AUTO but no user context is set; pass an explicit user_id, set the contextvar via auth middleware, or opt out with user_id=None for migration/CLI paths.")
        # Coerce to ``str`` at the boundary: ``User.id`` is typed as
        # ``UUID`` for the API surface, but the persistence layer
        # stores ``user_id`` as ``String(64)`` and aiosqlite cannot
        # bind a raw UUID object to a VARCHAR column ("type 'UUID' is
        # not supported"). Honour the documented return type here
        # rather than ripple a type change through every caller.
        return str(user.id)
    return value