Merge branch 'main' into rayhpeng/persistence-scaffold

# Conflicts:
#	.env.example
#	backend/packages/harness/deerflow/agents/middlewares/title_middleware.py
rayhpeng 2026-04-04 21:28:07 +08:00
commit 4737fc3aa9
180 changed files with 10945 additions and 787 deletions


@ -32,6 +32,11 @@ INFOQUEST_API_KEY=your-infoquest-api-key
# GitHub API Token
# GITHUB_TOKEN=your-github-token
<<<<<<< HEAD
# Database (only needed when config.yaml has database.backend: postgres)
# DATABASE_URL=postgresql://deerflow:password@localhost:5432/deerflow
=======
# WECOM_BOT_ID=your-wecom-bot-id
# WECOM_BOT_SECRET=your-wecom-bot-secret
>>>>>>> main

.gitignore

@ -54,3 +54,5 @@ web/
# Deployment artifacts
backend/Dockerfile.langgraph
config.yaml.bak
.playwright-mcp
.gstack/


@ -2,12 +2,14 @@
.PHONY: help config config-upgrade check install dev dev-daemon start stop up down clean docker-init docker-start docker-stop docker-logs docker-logs-frontend docker-logs-gateway
PYTHON ?= python
BASH ?= bash
# Detect OS for Windows compatibility
ifeq ($(OS),Windows_NT)
SHELL := cmd.exe
PYTHON ?= python
else
PYTHON ?= python3
endif
help:
@ -96,6 +98,7 @@ setup-sandbox:
# Start all services in development mode (with hot-reloading)
dev:
@$(PYTHON) ./scripts/check.py
ifeq ($(OS),Windows_NT)
@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --dev
else
@ -104,6 +107,7 @@ endif
# Start all services in production mode (with optimizations)
start:
@$(PYTHON) ./scripts/check.py
ifeq ($(OS),Windows_NT)
@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --prod
else
@ -112,7 +116,12 @@ endif
# Start all services in daemon mode (background)
dev-daemon:
@$(PYTHON) ./scripts/check.py
ifeq ($(OS),Windows_NT)
@call scripts\run-with-git-bash.cmd ./scripts/start-daemon.sh
else
@./scripts/start-daemon.sh
endif
# Stop all services
stop:


@ -46,6 +46,7 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe
- [🦌 DeerFlow - 2.0](#-deerflow---20)
- [Official Website](#official-website)
- [Coding Plan from ByteDance Volcengine](#coding-plan-from-bytedance-volcengine)
- [InfoQuest](#infoquest)
- [Table of Contents](#table-of-contents)
- [One-Line Agent Setup](#one-line-agent-setup)
@ -59,6 +60,8 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe
- [MCP Server](#mcp-server)
- [IM Channels](#im-channels)
- [LangSmith Tracing](#langsmith-tracing)
- [Langfuse Tracing](#langfuse-tracing)
- [Using Both Providers](#using-both-providers)
- [From Deep Research to Super Agent Harness](#from-deep-research-to-super-agent-harness)
- [Core Features](#core-features)
- [Skills \& Tools](#skills--tools)
@ -71,6 +74,8 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe
- [Embedded Python Client](#embedded-python-client)
- [Documentation](#documentation)
- [⚠️ Security Notice](#-security-notice)
- [Improper Deployment May Introduce Security Risks](#improper-deployment-may-introduce-security-risks)
- [Security Recommendations](#security-recommendations)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgments](#acknowledgments)
@ -243,6 +248,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.
If you prefer running services locally:
Prerequisite: complete the "Configuration" steps above first (`make config` and model API keys). `make dev` requires a valid configuration file (defaults to `config.yaml` in the project root; can be overridden via `DEER_FLOW_CONFIG_PATH`).
On Windows, run the local development flow from Git Bash. Native `cmd.exe` and PowerShell shells are not supported for the bash-based service scripts, and WSL is not guaranteed because some scripts rely on Git for Windows utilities such as `cygpath`.
1. **Check prerequisites**:
```bash
@ -301,6 +307,7 @@ DeerFlow supports receiving tasks from messaging apps. Channels auto-start when
| Telegram | Bot API (long-polling) | Easy |
| Slack | Socket Mode | Moderate |
| Feishu / Lark | WebSocket | Moderate |
| WeCom | WebSocket | Moderate |
**Configuration in `config.yaml`:**
@ -328,6 +335,11 @@ channels:
# domain: https://open.feishu.cn # China (default)
# domain: https://open.larksuite.com # International
wecom:
enabled: true
bot_id: $WECOM_BOT_ID
bot_secret: $WECOM_BOT_SECRET
slack:
enabled: true
bot_token: $SLACK_BOT_TOKEN # xoxb-...
@ -371,6 +383,10 @@ SLACK_APP_TOKEN=xapp-...
# Feishu / Lark
FEISHU_APP_ID=cli_xxxx
FEISHU_APP_SECRET=your_app_secret
# WeCom
WECOM_BOT_ID=your_bot_id
WECOM_BOT_SECRET=your_bot_secret
```
**Telegram Setup**
@ -393,6 +409,14 @@ FEISHU_APP_SECRET=your_app_secret
3. Under **Events**, subscribe to `im.message.receive_v1` and select **Long Connection** mode.
4. Copy the App ID and App Secret. Set `FEISHU_APP_ID` and `FEISHU_APP_SECRET` in `.env` and enable the channel in `config.yaml`.
**WeCom Setup**
1. Create a bot on the WeCom AI Bot platform and obtain the `bot_id` and `bot_secret`.
2. Enable `channels.wecom` in `config.yaml` and fill in `bot_id` / `bot_secret`.
3. Set `WECOM_BOT_ID` and `WECOM_BOT_SECRET` in `.env`.
4. Make sure backend dependencies include `wecom-aibot-python-sdk`. The channel uses a WebSocket long connection and does not require a public callback URL.
5. The current integration supports inbound text, image, and file messages. Final images/files generated by the agent are also sent back to the WeCom conversation.
When DeerFlow runs in Docker Compose, IM channels execute inside the `gateway` container. In that case, do not point `channels.langgraph_url` or `channels.gateway_url` at `localhost`; use container service names such as `http://langgraph:2024` and `http://gateway:8001`, or set `DEER_FLOW_CHANNELS_LANGGRAPH_URL` and `DEER_FLOW_CHANNELS_GATEWAY_URL`.
**Commands**
@ -422,6 +446,27 @@ LANGSMITH_API_KEY=lsv2_pt_xxxxxxxxxxxxxxxx
LANGSMITH_PROJECT=xxx
```
#### Langfuse Tracing
DeerFlow also supports [Langfuse](https://langfuse.com) observability for LangChain-compatible runs.
Add the following to your `.env` file:
```bash
LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_BASE_URL=https://cloud.langfuse.com
```
If you are using a self-hosted Langfuse instance, set `LANGFUSE_BASE_URL` to your deployment URL.
#### Using Both Providers
If both LangSmith and Langfuse are enabled, DeerFlow attaches both tracing callbacks and reports the same model activity to both systems.
If a provider is explicitly enabled but missing required credentials, or if its callback fails to initialize, DeerFlow fails fast when tracing is initialized during model creation and the error message names the provider that caused the failure.
For Docker deployments, tracing is disabled by default. Set `LANGSMITH_TRACING=true` and `LANGSMITH_API_KEY` in your `.env` to enable it.
## From Deep Research to Super Agent Harness


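@ -180,6 +180,7 @@ make down # Stop and remove containers
If you prefer to start the services locally:
Prerequisite: complete the "Configuration" steps above first (`make config` and model API key setup). `make dev` requires a valid configuration file; it reads `config.yaml` in the project root by default, which can be overridden via `DEER_FLOW_CONFIG_PATH`.
On Windows, run the local development flow from Git Bash. The bash-based service scripts cannot be run directly in native `cmd.exe` or PowerShell, and WSL is not guaranteed to work because some scripts rely on Git for Windows utilities such as `cygpath`.
1. **Check prerequisites**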
```bash
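@ -231,6 +232,7 @@ DeerFlow supports receiving tasks from instant messaging apps. Once configuration is complete, the corresponding
| Telegram | Bot API (long-polling) | Easy |
| Slack | Socket Mode | Moderate |
| Feishu / Lark | WebSocket | Moderate |
| WeCom AI Bot | WebSocket | Moderate |
**Configuration example in `config.yaml`:**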
@ -258,6 +260,11 @@ channels:
# domain: https://open.feishu.cn # China (default)
# domain: https://open.larksuite.com # International
wecom:
enabled: true
bot_id: $WECOM_BOT_ID
bot_secret: $WECOM_BOT_SECRET
slack:
enabled: true
bot_token: $SLACK_BOT_TOKEN # xoxb-...
@ -301,6 +308,10 @@ SLACK_APP_TOKEN=xapp-...
# Feishu / Lark
FEISHU_APP_ID=cli_xxxx
FEISHU_APP_SECRET=your_app_secret
# WeCom AI Bot
WECOM_BOT_ID=your_bot_id
WECOM_BOT_SECRET=your_bot_secret
```
**Telegram Setup**
@ -323,6 +334,14 @@ FEISHU_APP_SECRET=your_app_secret
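3. Under **Events**, subscribe to `im.message.receive_v1` and select **Long Connection** as the connection mode.
4. Copy the App ID and App Secret. Set `FEISHU_APP_ID` and `FEISHU_APP_SECRET` in `.env`, and enable the channel in `config.yaml`.
**WeCom AI Bot Setup**
1. Create a bot on the WeCom AI Bot platform and obtain the `bot_id` and `bot_secret`.
2. Enable `channels.wecom` in `config.yaml` and fill in `bot_id` / `bot_secret`.
3. Set `WECOM_BOT_ID` and `WECOM_BOT_SECRET` in `.env`.
4. When installing backend dependencies, make sure `wecom-aibot-python-sdk` is included. The channel receives messages over a WebSocket long connection and does not need a public callback URL.
5. Inbound text, image, and file messages are currently supported. Final images/files generated by the agent are also sent back to the WeCom conversation.
**Commands**
Once a channel is connected, you can interact with DeerFlow directly in the chat window: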


@ -232,7 +232,7 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
- `ls` - Directory listing (tree format, max 2 levels)
- `read_file` - Read file contents with optional line range
- `write_file` - Write/append to files, creates directories
- `str_replace` - Substring replacement (single or all occurrences)
- `str_replace` - Substring replacement (single or all occurrences); same-path serialization is scoped to `(sandbox.id, path)` so isolated sandboxes do not contend on identical virtual paths inside one process
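The `(sandbox.id, path)` scoping described for `str_replace` can be sketched as a small lock registry. This is a minimal illustration, not the project's implementation: `get_path_lock`, the module-level dictionary, and the in-memory `files` map are all hypothetical names chosen for the example.

```python
import threading

# Hypothetical registry: one lock per (sandbox_id, path) pair.
_registry_guard = threading.Lock()
_path_locks: dict[tuple[str, str], threading.Lock] = {}


def get_path_lock(sandbox_id: str, path: str) -> threading.Lock:
    """Return the lock that serializes edits to `path` inside one sandbox."""
    key = (sandbox_id, path)
    with _registry_guard:
        lock = _path_locks.get(key)
        if lock is None:
            lock = threading.Lock()
            _path_locks[key] = lock
        return lock


def str_replace(files: dict[str, str], sandbox_id: str, path: str, old: str, new: str) -> None:
    """Read-modify-write on an in-memory file map, held under the per-key lock."""
    with get_path_lock(sandbox_id, path):
        files[path] = files[path].replace(old, new, 1)
```

Because the key includes the sandbox id, two sandboxes editing the same virtual path obtain different locks and do not block each other, while two edits to the same path in one sandbox are serialized.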
### Subagent System (`packages/harness/deerflow/subagents/`)


@ -2,7 +2,7 @@ install:
uv sync
dev:
uv run langgraph dev --no-browser --allow-blocking --no-reload
uv run langgraph dev --no-browser --no-reload --n-jobs-per-worker 10
gateway:
PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001


@ -78,6 +78,7 @@ Per-thread isolated execution with virtual path translation:
- **Virtual paths**: `/mnt/user-data/{workspace,uploads,outputs}` → thread-specific physical directories
- **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
- **Skills loading**: Recursively discovers nested `SKILL.md` files under `skills/{public,custom}` and preserves nested container paths
- **File-write safety**: `str_replace` serializes read-modify-write per `(sandbox.id, path)` so isolated sandboxes keep concurrency even when virtual paths match
- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace` (`bash` is disabled by default when using `LocalSandboxProvider`; use `AioSandboxProvider` for isolated shell access)
### Subagent System
@ -330,7 +331,28 @@ LANGSMITH_PROJECT=xxx
**Legacy variables:** The `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`, and `LANGCHAIN_ENDPOINT` variables are also supported for backward compatibility. `LANGSMITH_*` variables take precedence when both are set.
**Docker:** In `docker-compose.yaml`, tracing is disabled by default (`LANGSMITH_TRACING=false`). Set `LANGSMITH_TRACING=true` and provide `LANGSMITH_API_KEY` in your `.env` to enable it in containerized deployments.
### Langfuse Tracing
DeerFlow also supports [Langfuse](https://langfuse.com) observability for LangChain-compatible runs.
Add the following to your `.env` file:
```bash
LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_BASE_URL=https://cloud.langfuse.com
```
If you are using a self-hosted Langfuse deployment, set `LANGFUSE_BASE_URL` to your Langfuse host.
### Dual Provider Behavior
If both LangSmith and Langfuse are enabled, DeerFlow initializes and attaches both callbacks so the same run data is reported to both systems.
If a provider is explicitly enabled but required credentials are missing, or the provider callback cannot be initialized, DeerFlow raises an error when tracing is initialized during model creation instead of silently disabling tracing.
**Docker:** In `docker-compose.yaml`, tracing is disabled by default (`LANGSMITH_TRACING=false`). Set `LANGSMITH_TRACING=true` and/or `LANGFUSE_TRACING=true` in your `.env`, together with the required credentials, to enable tracing in containerized deployments.
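The fail-fast behavior described above can be sketched as a validation step over the environment. This is an illustration under stated assumptions: the function name `enabled_tracing_providers` is hypothetical, and treating only the literal value `true` (case-insensitive) as "enabled" is an assumption, not something the README specifies. The variable names are the ones listed in this section.

```python
from collections.abc import Mapping

# Flag and required credentials per provider; variable names come from the README text.
REQUIRED_VARS = {
    "langsmith": ("LANGSMITH_TRACING", ("LANGSMITH_API_KEY",)),
    "langfuse": ("LANGFUSE_TRACING", ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")),
}


def enabled_tracing_providers(env: Mapping[str, str]) -> list[str]:
    """Return the enabled providers, raising if an enabled one lacks credentials."""
    enabled: list[str] = []
    for provider, (flag, required) in REQUIRED_VARS.items():
        # Assumption: only the literal "true" turns a provider on.
        if env.get(flag, "").strip().lower() != "true":
            continue
        missing = [name for name in required if not env.get(name)]
        if missing:
            # Name the provider so the failure is easy to attribute.
            raise ValueError(f"{provider} tracing is enabled but missing: {', '.join(missing)}")
        enabled.append(provider)
    return enabled
```

Calling it with `os.environ` at startup would surface a misconfigured provider immediately rather than silently running without traces.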
---


@ -0,0 +1,20 @@
"""Shared command definitions used by all channel implementations.
Keeping the authoritative command set in one place ensures that channel
parsers (e.g. Feishu) and the ChannelManager dispatcher stay in sync
automatically; adding or removing a command here is the single edit
required.
"""
from __future__ import annotations
KNOWN_CHANNEL_COMMANDS: frozenset[str] = frozenset(
{
"/bootstrap",
"/new",
"/status",
"/models",
"/memory",
"/help",
}
)


@ -9,11 +9,18 @@ import threading
from typing import Any
from app.channels.base import Channel
from app.channels.commands import KNOWN_CHANNEL_COMMANDS
from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
logger = logging.getLogger(__name__)
def _is_feishu_command(text: str) -> bool:
if not text.startswith("/"):
return False
return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS
class FeishuChannel(Channel):
"""Feishu/Lark IM channel using the ``lark-oapi`` WebSocket client.
@ -199,7 +206,9 @@ class FeishuChannel(Channel):
await asyncio.sleep(delay)
logger.error("[Feishu] send failed after %d attempts: %s", _max_retries, last_exc)
raise last_exc # type: ignore[misc]
if last_exc is None:
raise RuntimeError("Feishu send failed without an exception from any attempt")
raise last_exc
async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
if not self._api_client:
@ -509,8 +518,9 @@ class FeishuChannel(Channel):
logger.info("[Feishu] empty text, ignoring message")
return
# Check if it's a command
if text.startswith("/"):
# Only treat known slash commands as commands; absolute paths and
# other slash-prefixed text should be handled as normal chat.
if _is_feishu_command(text):
msg_type = InboundMessageType.COMMAND
else:
msg_type = InboundMessageType.CHAT
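The effect of the command check in this file can be seen in isolation. The sketch below copies the command set and the matching logic from the diff into a standalone function (renamed `is_known_command` for the example) so that it runs without the `app.channels` package.

```python
# Command set as defined in app/channels/commands.py in this commit.
KNOWN_CHANNEL_COMMANDS = frozenset(
    {"/bootstrap", "/new", "/status", "/models", "/memory", "/help"}
)


def is_known_command(text: str) -> bool:
    """True only when the first whitespace-delimited token is a known slash command."""
    if not text.startswith("/"):
        return False
    # Compare the first token case-insensitively against the shared command set.
    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS


for sample in ("/help", "/Status now", "/mnt/user-data/uploads/a.txt please read this"):
    print(sample, "->", "COMMAND" if is_known_command(sample) else "CHAT")
```

An absolute path such as `/mnt/user-data/uploads/a.txt` starts with a slash but is not in the command set, so it is routed as normal chat.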


@ -7,11 +7,13 @@ import logging
import mimetypes
import re
import time
from collections.abc import Mapping
from collections.abc import Awaitable, Callable, Mapping
from typing import Any
import httpx
from langgraph_sdk.errors import ConflictError
from app.channels.commands import KNOWN_CHANNEL_COMMANDS
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
from app.channels.store import ChannelStore
@ -35,8 +37,49 @@ CHANNEL_CAPABILITIES = {
"feishu": {"supports_streaming": True},
"slack": {"supports_streaming": False},
"telegram": {"supports_streaming": False},
"wecom": {"supports_streaming": True},
}
InboundFileReader = Callable[[dict[str, Any], httpx.AsyncClient], Awaitable[bytes | None]]
INBOUND_FILE_READERS: dict[str, InboundFileReader] = {}
def register_inbound_file_reader(channel_name: str, reader: InboundFileReader) -> None:
INBOUND_FILE_READERS[channel_name] = reader
async def _read_http_inbound_file(file_info: dict[str, Any], client: httpx.AsyncClient) -> bytes | None:
url = file_info.get("url")
if not isinstance(url, str) or not url:
return None
resp = await client.get(url)
resp.raise_for_status()
return resp.content
async def _read_wecom_inbound_file(file_info: dict[str, Any], client: httpx.AsyncClient) -> bytes | None:
data = await _read_http_inbound_file(file_info, client)
if data is None:
return None
aeskey = file_info.get("aeskey") if isinstance(file_info.get("aeskey"), str) else None
if not aeskey:
return data
try:
from aibot.crypto_utils import decrypt_file
except Exception:
logger.exception("[Manager] failed to import WeCom decrypt_file")
return None
return decrypt_file(data, aeskey)
register_inbound_file_reader("wecom", _read_wecom_inbound_file)
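The registry pattern above, where a channel-specific reader wraps the default download and falls back to it for unregistered channels, can be sketched without network access. Everything here is hypothetical scaffolding for illustration: `FakeClient` stands in for `httpx.AsyncClient`, and the `"example"` channel and its byte-reversing reader stand in for a real post-processing step such as decryption.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any, Optional

Reader = Callable[[dict[str, Any], Any], Awaitable[Optional[bytes]]]
READERS: dict[str, Reader] = {}


def register_reader(channel_name: str, reader: Reader) -> None:
    READERS[channel_name] = reader


class FakeClient:
    """Hypothetical stand-in for an HTTP client so the sketch needs no network."""

    async def get_bytes(self, url: str) -> bytes:
        return f"payload:{url}".encode()


async def read_plain(file_info: dict[str, Any], client: Any) -> bytes | None:
    """Default reader: download the URL, or return None when there is none."""
    url = file_info.get("url")
    if not isinstance(url, str) or not url:
        return None
    return await client.get_bytes(url)


async def read_reversed(file_info: dict[str, Any], client: Any) -> bytes | None:
    """Channel-specific reader: reuse the default download, then post-process."""
    data = await read_plain(file_info, client)
    return None if data is None else data[::-1]


register_reader("example", read_reversed)


async def read_for_channel(channel_name: str, file_info: dict[str, Any], client: Any) -> bytes | None:
    # Unregistered channels fall back to the default reader.
    reader = READERS.get(channel_name, read_plain)
    return await reader(file_info, client)
```

Keeping the lookup in one dictionary means a new channel only has to register its reader; the ingestion loop itself does not change.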
class InvalidChannelSessionConfigError(ValueError):
"""Raised when IM channel session overrides contain invalid agent config."""
@ -341,6 +384,105 @@ def _prepare_artifact_delivery(
return response_text, attachments
async def _ingest_inbound_files(thread_id: str, msg: InboundMessage) -> list[dict[str, Any]]:
if not msg.files:
return []
from deerflow.uploads.manager import claim_unique_filename, ensure_uploads_dir, normalize_filename
uploads_dir = ensure_uploads_dir(thread_id)
seen_names = {entry.name for entry in uploads_dir.iterdir() if entry.is_file()}
created: list[dict[str, Any]] = []
file_reader = INBOUND_FILE_READERS.get(msg.channel_name, _read_http_inbound_file)
async with httpx.AsyncClient(timeout=httpx.Timeout(20.0)) as client:
for idx, f in enumerate(msg.files):
if not isinstance(f, dict):
continue
ftype = f.get("type") if isinstance(f.get("type"), str) else "file"
filename = f.get("filename") if isinstance(f.get("filename"), str) else ""
try:
data = await file_reader(f, client)
except Exception:
logger.exception(
"[Manager] failed to read inbound file: channel=%s, file=%s",
msg.channel_name,
f.get("url") or filename or idx,
)
continue
if data is None:
logger.warning(
"[Manager] inbound file reader returned no data: channel=%s, file=%s",
msg.channel_name,
f.get("url") or filename or idx,
)
continue
if not filename:
ext = ".bin"
if ftype == "image":
ext = ".png"
filename = f"{msg.thread_ts or 'msg'}_{idx}{ext}"
try:
safe_name = claim_unique_filename(normalize_filename(filename), seen_names)
except ValueError:
logger.warning(
"[Manager] skipping inbound file with unsafe filename: channel=%s, file=%r",
msg.channel_name,
filename,
)
continue
dest = uploads_dir / safe_name
try:
dest.write_bytes(data)
except Exception:
logger.exception("[Manager] failed to write inbound file: %s", dest)
continue
created.append(
{
"filename": safe_name,
"size": len(data),
"path": f"/mnt/user-data/uploads/{safe_name}",
"is_image": ftype == "image",
}
)
return created
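When an inbound file arrives without a filename, the function above synthesizes one from the message id, the file's index, and its type. A standalone sketch of that rule follows; `fallback_filename` is a hypothetical helper name, while the `.png` / `.bin` extensions and the `msg` placeholder are taken from the code above.

```python
from __future__ import annotations


def fallback_filename(thread_ts: str | None, index: int, file_type: str) -> str:
    """Build a name for an inbound file that arrived without one."""
    # Images default to .png; everything else is treated as opaque binary.
    ext = ".png" if file_type == "image" else ".bin"
    return f"{thread_ts or 'msg'}_{index}{ext}"


print(fallback_filename("1712240887", 0, "image"))
print(fallback_filename(None, 2, "file"))
```

The generated name is still passed through filename normalization and uniqueness checks before anything is written, so it only needs to be a reasonable starting point.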
def _format_uploaded_files_block(files: list[dict[str, Any]]) -> str:
lines = [
"<uploaded_files>",
"The following files were uploaded in this message:",
"",
]
if not files:
lines.append("(empty)")
else:
for f in files:
filename = f.get("filename", "")
size = int(f.get("size") or 0)
size_kb = size / 1024 if size else 0
size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
path = f.get("path", "")
is_image = bool(f.get("is_image"))
file_kind = "image" if is_image else "file"
lines.append(f"- {filename} ({size_str})")
lines.append(f" Type: {file_kind}")
lines.append(f" Path: {path}")
lines.append("")
lines.append("Use `read_file` for text-based files and documents.")
lines.append("Use `view_image` for image files (jpg, jpeg, png, webp) so the model can inspect the image content.")
lines.append("</uploaded_files>")
return "\n".join(lines)
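The size string in each entry switches from KB to MB once the value reaches 1024 KB. The rule can be checked in isolation with the sketch below; `format_size` is a hypothetical helper that copies the two expressions used in the function above.

```python
def format_size(size_bytes: int) -> str:
    """Render a byte count as KB below 1024 KB, otherwise as MB."""
    size_kb = size_bytes / 1024 if size_bytes else 0
    return f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"


for size in (0, 1536, 1024 * 1024, 5 * 1024 * 1024):
    print(size, "->", format_size(size))
```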
class ChannelManager:
"""Core dispatcher that bridges IM channels to the DeerFlow agent.
@ -535,6 +677,11 @@ class ChannelManager:
assistant_id, run_config, run_context = self._resolve_run_params(msg, thread_id)
if extra_context:
run_context.update(extra_context)
uploaded = await _ingest_inbound_files(thread_id, msg)
if uploaded:
msg.text = f"{_format_uploaded_files_block(uploaded)}\n\n{msg.text}".strip()
if self._channel_supports_streaming(msg.channel_name):
await self._handle_streaming_chat(
client,
@ -735,7 +882,8 @@ class ChannelManager:
"/help — Show this help"
)
else:
reply = f"Unknown command: /{command}. Type /help for available commands."
available = " | ".join(sorted(KNOWN_CHANNEL_COMMANDS))
reply = f"Unknown command: /{command}. Available commands: {available}"
outbound = OutboundMessage(
channel_name=msg.channel_name,


@ -17,6 +17,7 @@ _CHANNEL_REGISTRY: dict[str, str] = {
"feishu": "app.channels.feishu:FeishuChannel",
"slack": "app.channels.slack:SlackChannel",
"telegram": "app.channels.telegram:TelegramChannel",
"wecom": "app.channels.wecom:WeComChannel",
}
_CHANNELS_LANGGRAPH_URL_ENV = "DEER_FLOW_CHANNELS_LANGGRAPH_URL"


@ -126,7 +126,9 @@ class SlackChannel(Channel):
)
except Exception:
pass
raise last_exc # type: ignore[misc]
if last_exc is None:
raise RuntimeError("Slack send failed without an exception from any attempt")
raise last_exc
async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
if not self._web_client:


@ -125,7 +125,9 @@ class TelegramChannel(Channel):
await asyncio.sleep(delay)
logger.error("[Telegram] send failed after %d attempts: %s", _max_retries, last_exc)
raise last_exc # type: ignore[misc]
if last_exc is None:
raise RuntimeError("Telegram send failed without an exception from any attempt")
raise last_exc
async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
if not self._application:


@ -0,0 +1,394 @@
from __future__ import annotations
import asyncio
import base64
import hashlib
import logging
from collections.abc import Awaitable, Callable
from typing import Any, cast
from app.channels.base import Channel
from app.channels.message_bus import (
InboundMessageType,
MessageBus,
OutboundMessage,
ResolvedAttachment,
)
logger = logging.getLogger(__name__)
class WeComChannel(Channel):
def __init__(self, bus: MessageBus, config: dict[str, Any]) -> None:
super().__init__(name="wecom", bus=bus, config=config)
self._bot_id: str | None = None
self._bot_secret: str | None = None
self._ws_client = None
self._ws_task: asyncio.Task | None = None
self._ws_frames: dict[str, dict[str, Any]] = {}
self._ws_stream_ids: dict[str, str] = {}
self._working_message = "Working on it..."
def _clear_ws_context(self, thread_ts: str | None) -> None:
if not thread_ts:
return
self._ws_frames.pop(thread_ts, None)
self._ws_stream_ids.pop(thread_ts, None)
async def _send_ws_upload_command(self, req_id: str, body: dict[str, Any], cmd: str) -> dict[str, Any]:
if not self._ws_client:
raise RuntimeError("WeCom WebSocket client is not available")
ws_manager = getattr(self._ws_client, "_ws_manager", None)
send_reply = getattr(ws_manager, "send_reply", None)
if not callable(send_reply):
raise RuntimeError("Installed wecom-aibot-python-sdk does not expose the WebSocket media upload API expected by DeerFlow. Use wecom-aibot-python-sdk==0.1.6 or update the adapter.")
send_reply_async = cast(Callable[[str, dict[str, Any], str], Awaitable[dict[str, Any]]], send_reply)
return await send_reply_async(req_id, body, cmd)
async def start(self) -> None:
if self._running:
return
bot_id = self.config.get("bot_id")
bot_secret = self.config.get("bot_secret")
working_message = self.config.get("working_message")
self._bot_id = bot_id if isinstance(bot_id, str) and bot_id else None
self._bot_secret = bot_secret if isinstance(bot_secret, str) and bot_secret else None
self._working_message = working_message if isinstance(working_message, str) and working_message else "Working on it..."
if not self._bot_id or not self._bot_secret:
logger.error("WeCom channel requires bot_id and bot_secret")
return
try:
from aibot import WSClient, WSClientOptions
except ImportError:
logger.error("wecom-aibot-python-sdk is not installed. Install it with: uv add wecom-aibot-python-sdk")
return
else:
self._ws_client = WSClient(WSClientOptions(bot_id=self._bot_id, secret=self._bot_secret, logger=logger))
self._ws_client.on("message.text", self._on_ws_text)
self._ws_client.on("message.mixed", self._on_ws_mixed)
self._ws_client.on("message.image", self._on_ws_image)
self._ws_client.on("message.file", self._on_ws_file)
self._ws_task = asyncio.create_task(self._ws_client.connect())
self._running = True
self.bus.subscribe_outbound(self._on_outbound)
logger.info("WeCom channel started")
async def stop(self) -> None:
self._running = False
self.bus.unsubscribe_outbound(self._on_outbound)
if self._ws_task:
try:
self._ws_task.cancel()
except Exception:
pass
self._ws_task = None
if self._ws_client:
try:
self._ws_client.disconnect()
except Exception:
pass
self._ws_client = None
self._ws_frames.clear()
self._ws_stream_ids.clear()
logger.info("WeCom channel stopped")
async def send(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
if self._ws_client:
await self._send_ws(msg, _max_retries=_max_retries)
return
logger.warning("[WeCom] send called but WebSocket client is not available")
async def _on_outbound(self, msg: OutboundMessage) -> None:
if msg.channel_name != self.name:
return
try:
await self.send(msg)
except Exception:
logger.exception("Failed to send outbound message on channel %s", self.name)
if msg.is_final:
self._clear_ws_context(msg.thread_ts)
return
for attachment in msg.attachments:
try:
success = await self.send_file(msg, attachment)
if not success:
logger.warning("[%s] file upload skipped for %s", self.name, attachment.filename)
except Exception:
logger.exception("[%s] failed to upload file %s", self.name, attachment.filename)
if msg.is_final:
self._clear_ws_context(msg.thread_ts)
async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
if not msg.is_final:
return True
if not self._ws_client:
return False
if not msg.thread_ts:
return False
frame = self._ws_frames.get(msg.thread_ts)
if not frame:
return False
media_type = "image" if attachment.is_image else "file"
size_limit = 2 * 1024 * 1024 if attachment.is_image else 20 * 1024 * 1024
if attachment.size > size_limit:
logger.warning(
"[WeCom] %s too large (%d bytes), skipping: %s",
media_type,
attachment.size,
attachment.filename,
)
return False
try:
media_id = await self._upload_media_ws(
media_type=media_type,
filename=attachment.filename,
path=str(attachment.actual_path),
size=attachment.size,
)
if not media_id:
return False
body = {media_type: {"media_id": media_id}, "msgtype": media_type}
await self._ws_client.reply(frame, body)
logger.debug("[WeCom] %s sent via ws: %s", media_type, attachment.filename)
return True
except Exception:
logger.exception("[WeCom] failed to upload/send file via ws: %s", attachment.filename)
return False
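The size gate in `send_file` uses two limits: 2 MiB for images and 20 MiB for other files, with a file exactly at the limit still accepted. A minimal standalone version of that check is shown below; the constant and function names are hypothetical.

```python
IMAGE_LIMIT = 2 * 1024 * 1024   # limit applied when the attachment is an image
FILE_LIMIT = 20 * 1024 * 1024   # limit applied to every other attachment


def fits_wecom_limit(size: int, is_image: bool) -> bool:
    """Mirror the check above: only sizes strictly above the limit are skipped."""
    limit = IMAGE_LIMIT if is_image else FILE_LIMIT
    return size <= limit
```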
async def _on_ws_text(self, frame: dict[str, Any]) -> None:
body = frame.get("body", {}) or {}
text = ((body.get("text") or {}).get("content") or "").strip()
# Guard each level with `or {}` so a null "quote" or "text" does not raise.
quote = (((body.get("quote") or {}).get("text") or {}).get("content") or "").strip()
if not text and not quote:
return
await self._publish_ws_inbound(frame, text + (f"\nQuote message: {quote}" if quote else ""))
async def _on_ws_mixed(self, frame: dict[str, Any]) -> None:
body = frame.get("body", {}) or {}
mixed = body.get("mixed") or {}
items = mixed.get("msg_item") or []
parts: list[str] = []
files: list[dict[str, Any]] = []
for item in items:
item_type = (item or {}).get("msgtype")
if item_type == "text":
content = (((item or {}).get("text") or {}).get("content") or "").strip()
if content:
parts.append(content)
elif item_type in ("image", "file"):
payload = (item or {}).get(item_type) or {}
url = payload.get("url")
aeskey = payload.get("aeskey")
if isinstance(url, str) and url:
files.append(
{
"type": item_type,
"url": url,
"aeskey": (aeskey if isinstance(aeskey, str) and aeskey else None),
}
)
text = "\n\n".join(parts).strip()
if not text and not files:
return
if not text:
text = "receive image/file"
await self._publish_ws_inbound(frame, text, files=files)
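The mixed-message handler above can be exercised as a pure function. The sketch below extracts the same parsing rules into a hypothetical `parse_mixed` helper; the frame shape (`body.mixed.msg_item`, with `msgtype`, `text.content`, `url`, and `aeskey`) is inferred from the handler code rather than from SDK documentation, and the sample URL is a placeholder.

```python
from typing import Any


def parse_mixed(frame: dict[str, Any]) -> tuple[str, list[dict[str, Any]]]:
    """Split a mixed frame into joined text and a list of file descriptors."""
    body = frame.get("body") or {}
    items = (body.get("mixed") or {}).get("msg_item") or []
    parts: list[str] = []
    files: list[dict[str, Any]] = []
    for raw in items:
        item = raw or {}  # tolerate null entries
        kind = item.get("msgtype")
        if kind == "text":
            content = ((item.get("text") or {}).get("content") or "").strip()
            if content:
                parts.append(content)
        elif kind in ("image", "file"):
            payload = item.get(kind) or {}
            url = payload.get("url")
            aeskey = payload.get("aeskey")
            # Entries without a usable URL are dropped.
            if isinstance(url, str) and url:
                files.append(
                    {
                        "type": kind,
                        "url": url,
                        "aeskey": aeskey if isinstance(aeskey, str) and aeskey else None,
                    }
                )
    return "\n\n".join(parts), files


sample = {
    "body": {
        "mixed": {
            "msg_item": [
                {"msgtype": "text", "text": {"content": "  summarize this  "}},
                {"msgtype": "image", "image": {"url": "https://example.invalid/a.png"}},
                {"msgtype": "file", "file": {}},
                None,
            ]
        }
    }
}
print(parse_mixed(sample))
```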
async def _on_ws_image(self, frame: dict[str, Any]) -> None:
body = frame.get("body", {}) or {}
image = body.get("image") or {}
url = image.get("url")
aeskey = image.get("aeskey")
if not isinstance(url, str) or not url:
return
await self._publish_ws_inbound(
frame,
"receive image",
files=[
{
"type": "image",
"url": url,
"aeskey": aeskey if isinstance(aeskey, str) and aeskey else None,
}
],
)
async def _on_ws_file(self, frame: dict[str, Any]) -> None:
body = frame.get("body", {}) or {}
file_obj = body.get("file") or {}
url = file_obj.get("url")
aeskey = file_obj.get("aeskey")
if not isinstance(url, str) or not url:
return
await self._publish_ws_inbound(
frame,
"receive file",
files=[
{
"type": "file",
"url": url,
"aeskey": aeskey if isinstance(aeskey, str) and aeskey else None,
}
],
)
async def _publish_ws_inbound(
self,
frame: dict[str, Any],
text: str,
*,
files: list[dict[str, Any]] | None = None,
) -> None:
if not self._ws_client:
return
try:
from aibot import generate_req_id
except Exception:
return
body = frame.get("body", {}) or {}
msg_id = body.get("msgid")
if not msg_id:
return
user_id = (body.get("from") or {}).get("userid")
inbound_type = InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT
inbound = self._make_inbound(
chat_id=user_id, # keep user's conversation in memory
user_id=user_id,
text=text,
msg_type=inbound_type,
thread_ts=msg_id,
files=files or [],
metadata={"aibotid": body.get("aibotid"), "chattype": body.get("chattype")},
)
inbound.topic_id = user_id # keep the same thread
stream_id = generate_req_id("stream")
self._ws_frames[msg_id] = frame
self._ws_stream_ids[msg_id] = stream_id
try:
await self._ws_client.reply_stream(frame, stream_id, self._working_message, False)
except Exception:
pass
await self.bus.publish_inbound(inbound)
async def _send_ws(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
if not self._ws_client:
return
try:
from aibot import generate_req_id
except Exception:
generate_req_id = None
if msg.thread_ts and msg.thread_ts in self._ws_frames:
frame = self._ws_frames[msg.thread_ts]
stream_id = self._ws_stream_ids.get(msg.thread_ts)
if not stream_id and generate_req_id:
stream_id = generate_req_id("stream")
self._ws_stream_ids[msg.thread_ts] = stream_id
if not stream_id:
return
last_exc: Exception | None = None
for attempt in range(_max_retries):
try:
await self._ws_client.reply_stream(frame, stream_id, msg.text, bool(msg.is_final))
return
except Exception as exc:
last_exc = exc
if attempt < _max_retries - 1:
await asyncio.sleep(2**attempt)
if last_exc:
raise last_exc
body = {"msgtype": "markdown", "markdown": {"content": msg.text}}
last_exc = None
for attempt in range(_max_retries):
try:
await self._ws_client.send_message(msg.chat_id, body)
return
except Exception as exc:
last_exc = exc
if attempt < _max_retries - 1:
await asyncio.sleep(2**attempt)
if last_exc:
raise last_exc
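Both loops in `_send_ws` follow the same retry shape: try, remember the last exception, sleep `2**attempt` seconds between attempts, and re-raise after the final one. A generic sketch of that pattern is below. `retry_with_backoff` and its injectable `sleep` parameter are hypothetical; the injection exists only so the delays can be observed without waiting.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    *,
    max_retries: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `op` up to `max_retries` times, sleeping 1s, 2s, 4s, ... between attempts."""
    last_exc: Exception | None = None
    for attempt in range(max_retries):
        try:
            return await op()
        except Exception as exc:
            last_exc = exc
            # No sleep after the final attempt.
            if attempt < max_retries - 1:
                await sleep(2**attempt)
    if last_exc is None:
        raise RuntimeError("retry_with_backoff needs max_retries >= 1")
    raise last_exc
```

With the default of three attempts, the caller waits at most 1 + 2 = 3 seconds in backoff before the last exception propagates.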
async def _upload_media_ws(
self,
*,
media_type: str,
filename: str,
path: str,
size: int,
) -> str | None:
if not self._ws_client:
return None
try:
from aibot import generate_req_id
except Exception:
return None
chunk_size = 512 * 1024
total_chunks = (size + chunk_size - 1) // chunk_size
if total_chunks < 1 or total_chunks > 100:
logger.warning("[WeCom] invalid total_chunks=%d for %s", total_chunks, filename)
return None
md5_hasher = hashlib.md5()
with open(path, "rb") as f:
for chunk in iter(lambda: f.read(1024 * 1024), b""):
md5_hasher.update(chunk)
md5 = md5_hasher.hexdigest()
init_req_id = generate_req_id("aibot_upload_media_init")
init_body = {
"type": media_type,
"filename": filename,
"total_size": int(size),
"total_chunks": int(total_chunks),
"md5": md5,
}
init_ack = await self._send_ws_upload_command(init_req_id, init_body, "aibot_upload_media_init")
upload_id = (init_ack.get("body") or {}).get("upload_id")
if not upload_id:
logger.warning("[WeCom] upload init returned no upload_id: %s", init_ack)
return None
with open(path, "rb") as f:
for idx in range(total_chunks):
data = f.read(chunk_size)
if not data:
break
chunk_req_id = generate_req_id("aibot_upload_media_chunk")
chunk_body = {
"upload_id": upload_id,
"chunk_index": int(idx),
"base64_data": base64.b64encode(data).decode("utf-8"),
}
await self._send_ws_upload_command(chunk_req_id, chunk_body, "aibot_upload_media_chunk")
finish_req_id = generate_req_id("aibot_upload_media_finish")
finish_ack = await self._send_ws_upload_command(finish_req_id, {"upload_id": upload_id}, "aibot_upload_media_finish")
media_id = (finish_ack.get("body") or {}).get("media_id")
if not media_id:
logger.warning("[WeCom] upload finish returned no media_id: %s", finish_ack)
return None
return media_id
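The upload flow above is init, then one message per chunk, then finish, with 512 KiB chunks, a cap of 100 chunks, and an MD5 of the whole file sent at init. The planning part can be sketched as a pure function. This is a simplification under stated assumptions: `plan_upload` is a hypothetical helper that works on in-memory bytes instead of streaming from disk, and it only builds the payload fields that appear in the code above without sending anything.

```python
import base64
import hashlib
from typing import Any

CHUNK_SIZE = 512 * 1024  # chunk size used by the channel code
MAX_CHUNKS = 100         # upper bound enforced before the upload starts


def plan_upload(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict[str, Any]:
    """Compute the init fields and per-chunk bodies for a chunked upload."""
    # Ceiling division: a partial trailing chunk still counts as one chunk.
    total_chunks = (len(data) + chunk_size - 1) // chunk_size
    if total_chunks < 1 or total_chunks > MAX_CHUNKS:
        raise ValueError(f"invalid total_chunks={total_chunks}")
    chunks = [
        {
            "chunk_index": idx,
            "base64_data": base64.b64encode(
                data[idx * chunk_size : (idx + 1) * chunk_size]
            ).decode("utf-8"),
        }
        for idx in range(total_chunks)
    ]
    return {
        "total_size": len(data),
        "total_chunks": total_chunks,
        "md5": hashlib.md5(data).hexdigest(),
        "chunks": chunks,
    }
```

At the default chunk size, the 100-chunk cap corresponds to 50 MiB, which is above the 20 MiB file limit checked in `send_file`, so the cap acts as a secondary guard.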


@ -49,6 +49,7 @@ class Fact(BaseModel):
confidence: float = Field(default=0.5, description="Confidence score (0-1)")
createdAt: str = Field(default="", description="Creation timestamp")
source: str = Field(default="unknown", description="Source thread ID")
sourceError: str | None = Field(default=None, description="Optional description of the prior mistake or wrong approach")
class MemoryResponse(BaseModel):
@ -108,6 +109,7 @@ class MemoryStatusResponse(BaseModel):
@router.get(
"/memory",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Get Memory Data",
description="Retrieve the current global memory data including user context, history, and facts.",
)
@ -152,6 +154,7 @@ async def get_memory() -> MemoryResponse:
@router.post(
"/memory/reload",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Reload Memory Data",
description="Reload memory data from the storage file, refreshing the in-memory cache.",
)
@ -171,6 +174,7 @@ async def reload_memory() -> MemoryResponse:
@router.delete(
"/memory",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Clear All Memory Data",
description="Delete all saved memory data and reset the memory structure to an empty state.",
)
@ -187,6 +191,7 @@ async def clear_memory() -> MemoryResponse:
@router.post(
"/memory/facts",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Create Memory Fact",
description="Create a single saved memory fact manually.",
)
@ -209,6 +214,7 @@ async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryRespo
@router.delete(
"/memory/facts/{fact_id}",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Delete Memory Fact",
description="Delete a single saved memory fact by its fact id.",
)
@ -227,6 +233,7 @@ async def delete_memory_fact_endpoint(fact_id: str) -> MemoryResponse:
@router.patch(
"/memory/facts/{fact_id}",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Patch Memory Fact",
description="Partially update a single saved memory fact by its fact id while preserving omitted fields.",
)
@ -252,6 +259,7 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
@router.get(
"/memory/export",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Export Memory Data",
description="Export the current global memory data as JSON for backup or transfer.",
)
@ -264,6 +272,7 @@ async def export_memory() -> MemoryResponse:
@router.post(
"/memory/import",
response_model=MemoryResponse,
response_model_exclude_none=True,
summary="Import Memory Data",
description="Import and overwrite the current global memory data from a JSON payload.",
)
@ -317,6 +326,7 @@ async def get_memory_config_endpoint() -> MemoryConfigResponse:
@router.get(
"/memory/status",
response_model=MemoryStatusResponse,
response_model_exclude_none=True,
summary="Get Memory Status",
description="Retrieve both memory configuration and current data in a single request.",
)

View File

@ -2,6 +2,7 @@ import json
import logging
from fastapi import APIRouter
from langchain_core.messages import HumanMessage, SystemMessage
from pydantic import BaseModel, Field
from deerflow.models import create_chat_model
@ -106,22 +107,21 @@ async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> S
if not conversation:
return SuggestionsResponse(suggestions=[])
prompt = (
system_instruction = (
"You are generating follow-up questions to help the user continue the conversation.\n"
f"Based on the conversation below, produce EXACTLY {n} short questions the user might ask next.\n"
"Requirements:\n"
"- Questions must be relevant to the conversation.\n"
"- Questions must be relevant to the preceding conversation.\n"
"- Questions must be written in the same language as the user.\n"
"- Keep each question concise (ideally <= 20 words / <= 40 Chinese characters).\n"
"- Do NOT include numbering, markdown, or any extra text.\n"
"- Output MUST be a JSON array of strings only.\n\n"
"Conversation:\n"
f"{conversation}\n"
"- Output MUST be a JSON array of strings only.\n"
)
user_content = f"Conversation Context:\n{conversation}\n\nGenerate {n} follow-up questions"
try:
model = create_chat_model(name=request.model_name, thinking_enabled=False)
response = model.invoke(prompt)
response = await model.ainvoke([SystemMessage(content=system_instruction), HumanMessage(content=user_content)])
raw = _extract_response_text(response.content)
suggestions = _parse_json_string_list(raw) or []
cleaned = [s.replace("\n", " ").strip() for s in suggestions if s.strip()]

View File

@ -38,6 +38,7 @@ class RunCreateRequest(BaseModel):
command: dict[str, Any] | None = Field(default=None, description="LangGraph Command")
metadata: dict[str, Any] | None = Field(default=None, description="Run metadata")
config: dict[str, Any] | None = Field(default=None, description="RunnableConfig overrides")
context: dict[str, Any] | None = Field(default=None, description="DeerFlow context overrides (model_name, thinking_enabled, etc.)")
webhook: str | None = Field(default=None, description="Completion callback URL")
checkpoint_id: str | None = Field(default=None, description="Resume from checkpoint")
checkpoint: dict[str, Any] | None = Field(default=None, description="Full checkpoint object")

View File

@ -413,16 +413,19 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
"metadata": {k: v for k, v in ckpt_meta.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")},
}
status = _derive_thread_status(checkpoint_tuple) if checkpoint_tuple is not None else record.get("status", "idle") # type: ignore[union-attr]
if record is None:
raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")
status = _derive_thread_status(checkpoint_tuple) if checkpoint_tuple is not None else record.get("status", "idle")
checkpoint = getattr(checkpoint_tuple, "checkpoint", {}) or {} if checkpoint_tuple is not None else {}
channel_values = checkpoint.get("channel_values", {})
return ThreadResponse(
thread_id=thread_id,
status=status,
created_at=str(record.get("created_at", "")), # type: ignore[union-attr]
updated_at=str(record.get("updated_at", "")), # type: ignore[union-attr]
metadata=record.get("metadata", {}), # type: ignore[union-attr]
created_at=str(record.get("created_at", "")),
updated_at=str(record.get("updated_at", "")),
metadata=record.get("metadata", {}),
values=serialize_channel_values(channel_values),
)

View File

@ -129,26 +129,38 @@ def build_run_config(
the LangGraph Platform-compatible HTTP API and the IM channel path behave
identically.
"""
configurable: dict[str, Any] = {"thread_id": thread_id}
config: dict[str, Any] = {"recursion_limit": 100}
if request_config:
configurable.update(request_config.get("configurable", {}))
# LangGraph >= 0.6.0 introduced ``context`` as the preferred way to
# pass thread-level data and rejects requests that include both
# ``configurable`` and ``context``. If the caller already sends
# ``context``, honour it and skip our own ``configurable`` dict.
if "context" in request_config:
if "configurable" in request_config:
logger.warning(
"build_run_config: client sent both 'context' and 'configurable'; preferring 'context' (LangGraph >= 0.6.0). thread_id=%s, caller_configurable keys=%s",
thread_id,
list(request_config.get("configurable", {}).keys()),
)
config["context"] = request_config["context"]
else:
configurable = {"thread_id": thread_id}
configurable.update(request_config.get("configurable", {}))
config["configurable"] = configurable
for k, v in request_config.items():
if k not in ("configurable", "context"):
config[k] = v
else:
config["configurable"] = {"thread_id": thread_id}
# Inject custom agent name when the caller specified a non-default assistant.
# Honour an explicit configurable["agent_name"] in the request if already set.
if assistant_id and assistant_id != _DEFAULT_ASSISTANT_ID and "agent_name" not in configurable:
# Normalize the same way ChannelManager does: strip, lowercase,
# replace underscores with hyphens, then validate to prevent path
# traversal and invalid agent directory lookups.
normalized = assistant_id.strip().lower().replace("_", "-")
if not normalized or not re.fullmatch(r"[a-z0-9-]+", normalized):
raise ValueError(f"Invalid assistant_id {assistant_id!r}: must contain only letters, digits, and hyphens after normalization.")
configurable["agent_name"] = normalized
config: dict[str, Any] = {"configurable": configurable, "recursion_limit": 100}
if request_config:
for k, v in request_config.items():
if k != "configurable":
config[k] = v
if assistant_id and assistant_id != _DEFAULT_ASSISTANT_ID and "configurable" in config:
if "agent_name" not in config["configurable"]:
normalized = assistant_id.strip().lower().replace("_", "-")
if not normalized or not re.fullmatch(r"[a-z0-9-]+", normalized):
raise ValueError(f"Invalid assistant_id {assistant_id!r}: must contain only letters, digits, and hyphens after normalization.")
config["configurable"]["agent_name"] = normalized
if metadata:
config.setdefault("metadata", {}).update(metadata)
return config
@ -304,6 +316,27 @@ async def start_run(
agent_factory = resolve_agent_factory(body.assistant_id)
graph_input = normalize_input(body.input)
config = build_run_config(thread_id, body.config, body.metadata, assistant_id=body.assistant_id)
# Merge DeerFlow-specific context overrides into configurable.
# The ``context`` field is a custom extension for the langgraph-compat layer
# that carries agent configuration (model_name, thinking_enabled, etc.).
# Only agent-relevant keys are forwarded; unknown keys (e.g. thread_id) are ignored.
context = getattr(body, "context", None)
if context:
_CONTEXT_CONFIGURABLE_KEYS = {
"model_name",
"mode",
"thinking_enabled",
"reasoning_effort",
"is_plan_mode",
"subagent_enabled",
"max_concurrent_subagents",
}
configurable = config.setdefault("configurable", {})
for key in _CONTEXT_CONFIGURABLE_KEYS:
if key in context:
configurable.setdefault(key, context[key])
stream_modes = normalize_stream_modes(body.stream_mode)
task = asyncio.create_task(

View File

@ -278,6 +278,12 @@ skills:
- Skills are automatically discovered and loaded
- Available in both local and Docker sandbox via path mapping
**Per-Agent Skill Filtering**:
Custom agents can restrict which skills they load by defining a `skills` field in their `config.yaml` (located at `workspace/agents/<agent_name>/config.yaml`):
- **Omitted or `null`**: Loads all globally enabled skills (default fallback).
- **`[]` (empty list)**: Disables all skills for this specific agent.
- **`["skill-name"]`**: Loads only the explicitly specified skills.
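The three cases above can be sketched as a small filter. This is an illustration only: `select_skills` and its parameters are hypothetical names, not DeerFlow's actual loader, and skills are assumed to be identified by plain name strings.

```python
from typing import Optional


def select_skills(enabled: list, configured: Optional[list]) -> list:
    """Apply the three documented cases to a list of globally enabled skill names."""
    if configured is None:
        # Field omitted or null: fall back to every globally enabled skill.
        return list(enabled)
    allowed = set(configured)
    # An empty list yields no skills; otherwise keep only the named ones.
    return [name for name in enabled if name in allowed]
```

Under this sketch, a name that is listed in the agent config but is not globally enabled is simply dropped.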
### Title Generation
Automatic conversation title generation:

View File

@ -0,0 +1,446 @@
# [RFC] Add `grep` and `glob` File Search Tools to DeerFlow
## Summary
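I think this direction is right, and it is worth doing.
If DeerFlow wants to get closer to the real workflow of coding agents such as Claude Code, `ls` / `read_file` / `write_file` / `str_replace` alone are not enough. Before it starts editing, the model usually needs two more capabilities:
- `glob`: quickly find files by path pattern
- `grep`: quickly find candidate locations by content pattern
The value of these two tools is not that "bash can do this too". It is that they can replace the model's habit of repeatedly reaching for `bash find` / `bash grep` / `rg`, at lower token cost, with tighter constraints and a more stable output format.
That only holds if they are implemented the right way: **they should be read-only, structured, bounded, auditable native tools, not thin wrappers around shell commands.**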
## Problem
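DeerFlow's file tool layer currently covers:
- `ls`: browse the directory structure
- `read_file`: read file contents
- `write_file`: write files
- `str_replace`: perform local string replacement
- `bash`: run commands as a fallback
This set of capabilities gets the job done, but it is inefficient during codebase exploration.
Typical problems:
1. When the model wants to find "all `*.tsx` page files", it can only call `ls` repeatedly across nested directories, or fall back to `bash find`
2. When the model wants to find "where a symbol / piece of copy / config key appears", it can only call `read_file` file by file, or fall back to `bash grep` / `rg`
3. Once it falls back to `bash`, the tool call loses structured output, and the results become harder to trim, paginate, audit, and keep consistent across sandboxes
4. In local mode without host bash enabled, `bash` may not even be available, which leaves no sufficiently strong read-only search capability
Conclusion: what DeerFlow lacks today is not "one more shell command" but a **filesystem search layer**.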
## Goals
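- Give agents stable path search and content search capabilities
- Reduce reliance on `bash`, especially during repository exploration
- Stay consistent with the existing sandbox security model
- Produce structured output that the model can chain into `read_file` / `str_replace`
- Let the local sandbox, container sandboxes, and future MCP filesystem tools all follow the same semantics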
## Non-Goals
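- No general-purpose shell compatibility layer
- No exposure of the full grep/find/rg CLI syntax
- No heavy features in the first version, such as binary search, complex PCRE features, or highlighted context-window rendering
- No "search any disk" behavior; searches remain limited to paths DeerFlow has already authorized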
## Why This Is Worth Doing
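Following the design approach of agents like Claude Code, the core value of `glob` and `grep` is not the new capability itself, but moving the common actions of "exploring a codebase" down from an open-ended shell into a controlled tool layer.
This brings several direct benefits:
1. **Lower burden on the model**
The model does not have to assemble command details such as `find`, `grep`, `rg`, `xargs`, and quoting by itself.
2. **More stable behavior across environments**
Local, Docker, and AIO sandboxes need not depend on whether `rg` is installed in the container, and behavior will not drift because of shell differences.
3. **Stronger security and auditing**
The call arguments are simply "what to search for, where to search, and how many results to return at most", which is inherently easier to audit and rate-limit than arbitrary commands.
4. **Better token efficiency**
`grep` returns match summaries rather than whole file bodies, so the model calls `read_file` only on a few candidate paths.
5. **Friendly to `tool_search`**
As DeerFlow keeps expanding its tool set, `grep` / `glob` will become very high-frequency basic tools. They are worth keeping as built-ins rather than making the model always fall back to general-purpose bash.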
## Proposal
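Add two built-in sandbox tools:
- `glob`
- `grep`
They should preferably continue to live in:
- `backend/packages/harness/deerflow/sandbox/tools.py`
and be added to the `file:read` group by default in `config.example.yaml`.
### 1. The `glob` tool
Purpose: find files or directories by path pattern.
Suggested schema: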
```python
@tool("glob", parse_docstring=True)
def glob_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    include_dirs: bool = False,
    max_results: int = 200,
) -> str:
    ...
```
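Parameter semantics:
- `description`: consistent with the existing tools
- `pattern`: glob pattern, for example `**/*.py` or `src/**/test_*.ts`
- `path`: search root directory; must be an absolute path
- `include_dirs`: whether to return directories
- `max_results`: maximum number of results returned, to avoid flooding the context in a single call
Suggested return format: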
```text
Found 3 paths under /mnt/user-data/workspace
1. /mnt/user-data/workspace/backend/app.py
2. /mnt/user-data/workspace/backend/tests/test_app.py
3. /mnt/user-data/workspace/scripts/build.py
```
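If we later want something better suited to frontend consumption, this could become a JSON string. For the first version, readable text is enough and stays compatible with the existing tool style.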
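A minimal sketch of how such a tool body might produce the text format above, assuming plain `pathlib` traversal on a local filesystem. `glob_paths` is a hypothetical helper, not the proposed `glob_tool`; it omits path validation and ignore rules, and the truncation wording is illustrative.

```python
from pathlib import Path


def glob_paths(root: str, pattern: str, include_dirs: bool = False, max_results: int = 200) -> str:
    """Return a numbered, capped listing of paths under root that match pattern."""
    base = Path(root)
    if not base.is_dir():
        return f"Error: {root} is not a directory"
    matches = []
    truncated = False
    # Sorting materializes every match first; acceptable for a sketch, costly on huge trees.
    for candidate in sorted(base.glob(pattern)):
        if candidate.is_dir() and not include_dirs:
            continue
        if len(matches) >= max_results:
            truncated = True
            break
        matches.append(str(candidate))
    if not matches:
        return "No files matched"
    header = f"Found {len(matches)} paths under {root}"
    if truncated:
        header = f"Found more than {max_results} paths, showing first {max_results}"
    lines = [f"{i}. {p}" for i, p in enumerate(matches, start=1)]
    return "\n".join([header, *lines])
```

Sorting before capping keeps the output stable between calls, at the price of walking the whole tree even when only a few results are returned.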
### 2. The `grep` tool
Purpose: search files by content pattern and return a summary of match locations.
Suggested schema:
```python
@tool("grep", parse_docstring=True)
def grep_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    glob: str | None = None,
    literal: bool = False,
    case_sensitive: bool = False,
    max_results: int = 100,
) -> str:
    ...
```
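Parameter semantics:
- `pattern`: search term or regular expression
- `path`: search root directory; must be an absolute path
- `glob`: optional path filter, for example `**/*.py`
- `literal`: when `True`, match as a plain string without interpreting it as a regex
- `case_sensitive`: whether matching is case-sensitive
- `max_results`: maximum number of matches returned (matches, not files)
Suggested return format: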
```text
Found 4 matches under /mnt/user-data/workspace
/mnt/user-data/workspace/backend/config.py:12: TOOL_GROUPS = [...]
/mnt/user-data/workspace/backend/config.py:48: def load_tool_config(...):
/mnt/user-data/workspace/backend/tools.py:91: "tool_groups"
/mnt/user-data/workspace/backend/tests/test_config.py:22: assert "tool_groups" in data
```
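For the first version, return only:
- file path
- line number
- a summary of the matching line
Do not return context blocks, so results do not grow too large. If the model needs context, it can then call `read_file(path, start_line, end_line)`.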
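The scan itself could look roughly like the following, assuming Python's `re` module and UTF-8 text files. `grep_lines` is a hypothetical helper, not the proposed `grep_tool`; it returns tuples instead of formatted text and treats any file that fails to decode as non-text.

```python
import re
from pathlib import Path


def grep_lines(root, pattern, glob="**/*", literal=False, case_sensitive=False, max_results=100):
    """Return up to max_results (path, line_number, line) tuples for matching lines."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # literal=True escapes the pattern so regex metacharacters match themselves.
    regex = re.compile(re.escape(pattern) if literal else pattern, flags)
    hits = []
    for file in sorted(Path(root).glob(glob)):
        if not file.is_file():
            continue
        try:
            text = file.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Undecodable or unreadable files are treated as non-text and skipped.
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(file), number, line.strip()))
                if len(hits) >= max_results:
                    return hits
    return hits
```

An invalid regex raises `re.error` from `re.compile`, which a real tool would presumably catch and report as a parameter error.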
## Design Principles
### A. Do not build a shell wrapper
Implementing `grep` like this is not recommended:
```python
subprocess.run("grep ...")
```
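Assembling `find` / `rg` commands directly inside the container is not recommended either.
Reasons:
- It introduces shell quoting issues and an injection surface
- It depends on every sandbox image having the same set of commands installed
- Behavior differs across Windows / macOS / Linux
- It is hard to control the number and format of output entries reliably
The right direction is:
- `glob` uses Python standard-library path traversal
- `grep` scans files one by one in Python
- DeerFlow formats the output itself
If we later want to prefer `rg` for performance, it should be wrapped inside the provider with the external semantics unchanged, rather than exposing the CLI to the model.
### B. Keep using DeerFlow's path permission model
Both tools must reuse the path validation logic that `ls` / `read_file` use today:
- Local mode goes through `validate_local_tool_path(..., read_only=True)`
- Support `/mnt/skills/...`
- Support `/mnt/acp-workspace/...`
- Support virtual path resolution for thread workspace / uploads / outputs
- Explicitly reject out-of-scope paths and path traversal
In other words, they belong to **file:read** and are not a privilege-escalating substitute for `bash`.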
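To make the traversal rule concrete, here is a generic containment check. It is a sketch under the assumption of a single allowed root on a local filesystem; `resolve_within` is a hypothetical name and does not reproduce `validate_local_tool_path` or DeerFlow's virtual path mapping.

```python
from pathlib import Path


def resolve_within(root: str, requested: str) -> Path:
    """Resolve requested against root and reject anything that escapes root."""
    base = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (base / requested).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{requested!r} is outside the allowed root")
    return target
```

Resolving before comparing matters: a purely string-based prefix check would accept `root/../elsewhere`.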
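### C. Results must be hard-limited
`glob` / `grep` without hard limits can easily blow up the context.
The first version should limit at least:
- `glob.max_results`: default 200, maximum 1000
- `grep.max_results`: default 100, maximum 500
- Maximum length of a single-line summary, for example 200 characters
- Binary files are skipped
- Oversized files are skipped, for example any single file larger than 1 MB, or as configured
In addition, when the number of matches exceeds the threshold, the tool should return:
- the number of entries shown
- the fact that the results were truncated
- a suggestion that the user narrow the search scope
For example: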
```text
Found more than 100 matches, showing first 100. Narrow the path or add a glob filter.
```
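The limits listed in this section could be enforced with a few small helpers. The names below are hypothetical and the defaults simply mirror the `grep` numbers suggested above; treating non-positive input as "use the default" is an assumption of this sketch.

```python
def clamp_max_results(requested: int, default: int = 100, ceiling: int = 500) -> int:
    """Fall back to the default for non-positive values and cap at the ceiling."""
    if requested <= 0:
        return default
    return min(requested, ceiling)


def summarize_line(line: str, max_chars: int = 200) -> str:
    """Trim surrounding whitespace and shorten long lines, marking the cut with '...'."""
    text = line.strip()
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3] + "..."


def truncation_notice(limit: int) -> str:
    """Build the message shown when more matches exist than the limit allows."""
    return f"Found more than {limit} matches, showing first {limit}. Narrow the path or add a glob filter."
```

Clamping on the tool side means a model that asks for an excessive `max_results` still receives a bounded response rather than an error.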
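### D. Tool semantics should complement each other
The recommended model workflow is:
1. `glob` to find candidate files
2. `grep` to find candidate locations
3. `read_file` to read local context
4. `str_replace` / `write_file` to apply the change
This keeps tool boundaries clear and makes it easier for the prompt to teach the model stable habits.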
## Implementation Approach
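## Option A: Implement the first version directly in `sandbox/tools.py`
This is the starting approach I recommend.
Approach:
- Add `glob_tool` and `grep_tool` to `sandbox/tools.py`
- In the local sandbox case, use Python filesystem APIs directly
- In non-local sandbox cases, preferably also go through a path access layer that DeerFlow controls
Pros:
- Small change
- Validates the effect on agents quickly
- No need to change the `Sandbox` abstraction first
Cons:
- `tools.py` keeps growing
- Provider-side performance optimization later would require another round of abstraction
## Option B: Extend the `Sandbox` abstraction first
For example, add: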
```python
class Sandbox(ABC):
    def glob(self, path: str, pattern: str, include_dirs: bool = False, max_results: int = 200) -> list[str]:
        ...

    def grep(
        self,
        path: str,
        pattern: str,
        *,
        glob: str | None = None,
        literal: bool = False,
        case_sensitive: bool = False,
        max_results: int = 100,
    ) -> list[GrepMatch]:
        ...
```
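Pros:
- Cleaner abstraction
- Container / remote sandboxes can each optimize on their own
Cons:
- Higher up-front cost
- Every sandbox provider must be changed in step
Conclusion:
**For the first version, go with Option A, and push the tools down into the `Sandbox` abstraction layer once their value has been validated.**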
## Detailed Behavior
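### `glob` behavior
- Input root directory does not exist: return a clear error
- Root path is not a directory: return a clear error
- Invalid pattern: return a clear error
- Empty result: return `No files matched`
- Default ignores should align with the current `list_dir` as far as possible, for example:
  - `.git`
  - `node_modules`
  - `__pycache__`
  - `.venv`
  - build output directories
A shared ignore set is recommended here, so that `ls` and `glob` results do not diverge in style.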
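One way to share such a set, sketched with `os.walk`. `IGNORED_DIR_NAMES` and `walk_files` are hypothetical names, and the entries are only the examples listed above; the real set would need to mirror whatever `list_dir` actually skips.

```python
import os

# Hypothetical shared set; the real entries should mirror what list_dir skips.
IGNORED_DIR_NAMES = frozenset({".git", "node_modules", "__pycache__", ".venv"})


def walk_files(root: str):
    """Yield file paths under root, never descending into ignored directories."""
    for current, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIR_NAMES)
        for name in sorted(filenames):
            yield os.path.join(current, name)
```

Because the pruning happens during traversal, ignored trees such as `node_modules` are never read at all, which also helps with the performance concern raised under Risks.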
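### `grep` behavior
- Scan text files only by default
- Skip binary files as soon as they are detected
- Skip oversized files, or scan only their first N KB
- Return a parameter error when the regex fails to compile
- Paths in the output keep using virtual paths rather than exposing real host paths
- Sort by file path and line number by default to keep output stable
## Prompting Guidance
If these two tools are introduced, the file-operation advice in the system prompt should be updated at the same time:
- Prefer `glob` when looking for file name patterns
- Prefer `grep` when looking for code symbols, config keys, or copy
- Fall back to `bash` only when the tools are not enough to reach the goal
Otherwise the model will still call `bash` first out of habit.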
## Risks
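### 1. Overlap with `bash` capabilities
This is true, but it is not a problem.
`ls` and `read_file` can both be replaced by `bash` too, yet we keep them because structured tools suit agents better.
### 2. Performance
On large repositories, pure-Python `grep` may be slower than `rg`.
Mitigations:
- Add a result cap and a file size cap in the first version
- Require a root path
- Offer the `glob` filter to narrow the scan
- If needed later, optimize with `rg` inside the provider while keeping the same schema
### 3. Inconsistent ignore rules
If `ls` can see a path that `glob` cannot, the model will be confused.
Mitigations:
- Unify the ignore rules
- State clearly in the docs that common dependency and build directories are skipped by default
### 4. Overly complex regex search
If the first version supports many grep dialects, the boundaries will get messy.
Mitigations:
- Support only Python `re` in the first version
- Also provide the simple `literal=True` mode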
## Alternatives Considered
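### A. Add no tools and rely entirely on `bash`
Not recommended.
This would leave DeerFlow permanently behind on the code exploration experience, and would weaken it in scenarios with no bash or restricted bash.
### B. Add only `glob`, not `grep`
Not recommended.
That solves "finding files" but not "finding locations". The model would still end up falling back to `bash grep`.
### C. Add only `grep`, not `glob`
Not recommended either.
Without path-pattern filtering, `grep` often scans too wide a range; `glob` is its natural precursor.
### D. Hook directly into the search capability of an MCP filesystem server
Not recommended as the main path in the short term.
MCP can be a supplement, but as basic DeerFlow coding tools, `glob` / `grep` are best kept built-in so they are reliably available in a default installation.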
## Acceptance Criteria
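- `glob` and `grep` can be enabled by default in `config.example.yaml`
- Both tools belong to `file:read`
- Existing path permissions are strictly enforced in the local sandbox
- Output does not leak real host paths
- Large result sets are truncated with an explicit notice
- The model can complete a typical code-change flow via `glob -> grep -> read_file -> str_replace`
- Repository exploration improves noticeably in local mode with host bash disabled
## Rollout Plan
1. Implement `glob_tool` and `grep_tool` in `sandbox/tools.py`
2. Extract ignore rules consistent with `list_dir` to avoid behavior drift
3. Add the tool configuration to `config.example.yaml` by default
4. Add tests for local path validation, virtual path mapping, result truncation, and binary skipping
5. Update README / backend docs / prompt guidance
6. Collect real agent call data, then decide whether to push the tools down into the `Sandbox` abstraction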
## Suggested Config
```yaml
tools:
  - name: glob
    group: file:read
    use: deerflow.sandbox.tools:glob_tool
  - name: grep
    group: file:read
    use: deerflow.sandbox.tools:grep_tool
```
## Final Recommendation
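The conclusion: **we can add this, and we should.**
But I would hold three boundaries firmly:
1. `grep` / `glob` must be built-in, read-only, structured tools
2. The first version must not be a shell wrapper and must not expose CLI dialects directly to the model
3. Validate the value in `sandbox/tools.py` first, then consider pushing the tools down into the `Sandbox` provider abstraction
Done this way, it will noticeably improve DeerFlow's usability in coding / repo exploration scenarios, with controllable risk.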

View File

@ -345,6 +345,8 @@ def make_lead_agent(config: RunnableConfig):
model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort),
tools=get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled),
middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name),
system_prompt=apply_prompt_template(subagent_enabled=subagent_enabled, max_concurrent_subagents=max_concurrent_subagents, agent_name=agent_name),
system_prompt=apply_prompt_template(
subagent_enabled=subagent_enabled, max_concurrent_subagents=max_concurrent_subagents, agent_name=agent_name, available_skills=set(agent_config.skills) if agent_config and agent_config.skills is not None else None
),
state_schema=ThreadState,
)

View File

@ -8,6 +8,14 @@ from deerflow.subagents import get_available_subagent_names
logger = logging.getLogger(__name__)
def _get_enabled_skills():
try:
return list(load_skills(enabled_only=True))
except Exception:
logger.exception("Failed to load enabled skills for prompt injection")
return []
def _build_subagent_section(max_concurrent: int) -> str:
"""Build the subagent system prompt section with dynamic concurrency limit.
@ -386,7 +394,7 @@ def get_skills_prompt_section(available_skills: set[str] | None = None) -> str:
Returns the <skill_system>...</skill_system> block listing all enabled skills,
suitable for injection into any agent's system prompt.
"""
skills = load_skills(enabled_only=True)
skills = _get_enabled_skills()
try:
from deerflow.config import get_app_config
@ -402,6 +410,10 @@ def get_skills_prompt_section(available_skills: set[str] | None = None) -> str:
if available_skills is not None:
skills = [skill for skill in skills if skill.name in available_skills]
# Check again after filtering
if not skills:
return ""
skill_items = "\n".join(
f" <skill>\n <name>{skill.name}</name>\n <description>{skill.description}</description>\n <location>{skill.get_container_file_path(container_base_path)}</location>\n </skill>" for skill in skills
)
@ -446,7 +458,7 @@ def get_deferred_tools_prompt_section() -> str:
if not get_app_config().tool_search.enabled:
return ""
except FileNotFoundError:
except Exception:
return ""
registry = get_deferred_registry()

View File

@ -29,6 +29,17 @@ Instructions:
2. Extract relevant facts, preferences, and context with specific details (numbers, names, technologies)
3. Update the memory sections as needed following the detailed length guidelines below
Before extracting facts, perform a structured reflection on the conversation:
1. Error/Retry Detection: Did the agent encounter errors, require retries, or produce incorrect results?
If yes, record the root cause and correct approach as a high-confidence fact with category "correction".
2. User Correction Detection: Did the user correct the agent's direction, understanding, or output?
If yes, record the correct interpretation or approach as a high-confidence fact with category "correction".
Include what went wrong in "sourceError" only when category is "correction" and the mistake is explicit in the conversation.
3. Project Constraint Discovery: Were any project-specific constraints discovered during the conversation?
If yes, record them as facts with the most appropriate category and confidence.
{correction_hint}
Memory Section Guidelines:
**User Context** (Current state - concise summaries):
@ -62,6 +73,7 @@ Memory Section Guidelines:
* context: Background facts (job title, projects, locations, languages)
* behavior: Working patterns, communication habits, problem-solving approaches
* goal: Stated objectives, learning targets, project ambitions
* correction: Explicit agent mistakes or user corrections, including the correct approach
- Confidence levels:
* 0.9-1.0: Explicitly stated facts ("I work on X", "My role is Y")
* 0.7-0.8: Strongly implied from actions/discussions
@ -94,7 +106,7 @@ Output Format (JSON):
"longTermBackground": {{ "summary": "...", "shouldUpdate": true/false }}
}},
"newFacts": [
{{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
{{ "content": "...", "category": "preference|knowledge|context|behavior|goal|correction", "confidence": 0.0-1.0 }}
],
"factsToRemove": ["fact_id_1", "fact_id_2"]
}}
@ -104,6 +116,8 @@ Important Rules:
- Follow length guidelines: workContext/personalContext are concise (1-3 sentences), topOfMind and history sections are detailed (paragraphs)
- Include specific metrics, version numbers, and proper nouns in facts
- Only add facts that are clearly stated (0.9+) or strongly implied (0.7+)
- Use category "correction" for explicit agent mistakes or user corrections; assign confidence >= 0.95 when the correction is explicit
- Include "sourceError" only for explicit correction facts when the prior mistake or wrong approach is clearly stated; omit it otherwise
- Remove facts that are contradicted by new information
- When updating topOfMind, integrate new focus areas while removing completed/abandoned ones
Keep 3-5 concurrent focus themes that are still active and relevant
@ -126,7 +140,7 @@ Message:
Extract facts in this JSON format:
{{
"facts": [
{{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
{{ "content": "...", "category": "preference|knowledge|context|behavior|goal|correction", "confidence": 0.0-1.0 }}
]
}}
@ -136,6 +150,7 @@ Categories:
- context: Background context (location, job, projects)
- behavior: Behavioral patterns
- goal: User's goals or objectives
- correction: Explicit corrections or mistakes to avoid repeating
Rules:
- Only extract clear, specific facts
@ -231,6 +246,10 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
if earlier.get("summary"):
history_sections.append(f"Earlier: {earlier['summary']}")
background = history_data.get("longTermBackground", {})
if background.get("summary"):
history_sections.append(f"Background: {background['summary']}")
if history_sections:
sections.append("History:\n" + "\n".join(f"- {s}" for s in history_sections))
@ -262,7 +281,11 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
continue
category = str(fact.get("category", "context")).strip() or "context"
confidence = _coerce_confidence(fact.get("confidence"), default=0.0)
line = f"- [{category} | {confidence:.2f}] {content}"
source_error = fact.get("sourceError")
if category == "correction" and isinstance(source_error, str) and source_error.strip():
line = f"- [{category} | {confidence:.2f}] {content} (avoid: {source_error.strip()})"
else:
line = f"- [{category} | {confidence:.2f}] {content}"
# Each additional line is preceded by a newline (except the first).
line_text = ("\n" + line) if fact_lines else line

View File

@ -20,6 +20,7 @@ class ConversationContext:
messages: list[Any]
timestamp: datetime = field(default_factory=datetime.utcnow)
agent_name: str | None = None
correction_detected: bool = False
class MemoryUpdateQueue:
@ -37,25 +38,38 @@ class MemoryUpdateQueue:
self._timer: threading.Timer | None = None
self._processing = False
def add(self, thread_id: str, messages: list[Any], agent_name: str | None = None) -> None:
def add(
self,
thread_id: str,
messages: list[Any],
agent_name: str | None = None,
correction_detected: bool = False,
) -> None:
"""Add a conversation to the update queue.
Args:
thread_id: The thread ID.
messages: The conversation messages.
agent_name: If provided, memory is stored per-agent. If None, uses global memory.
correction_detected: Whether recent turns include an explicit correction signal.
"""
config = get_memory_config()
if not config.enabled:
return
context = ConversationContext(
thread_id=thread_id,
messages=messages,
agent_name=agent_name,
)
with self._lock:
existing_context = next(
(context for context in self._queue if context.thread_id == thread_id),
None,
)
merged_correction_detected = correction_detected or (existing_context.correction_detected if existing_context is not None else False)
context = ConversationContext(
thread_id=thread_id,
messages=messages,
agent_name=agent_name,
correction_detected=merged_correction_detected,
)
# Check if this thread already has a pending update
# If so, replace it with the newer one
self._queue = [c for c in self._queue if c.thread_id != thread_id]
@ -115,6 +129,7 @@ class MemoryUpdateQueue:
messages=context.messages,
thread_id=context.thread_id,
agent_name=context.agent_name,
correction_detected=context.correction_detected,
)
if success:
logger.info("Memory updated successfully for thread %s", context.thread_id)

View File

@ -266,13 +266,20 @@ class MemoryUpdater:
model_name = self._model_name or config.model_name
return create_chat_model(name=model_name, thinking_enabled=False)
def update_memory(self, messages: list[Any], thread_id: str | None = None, agent_name: str | None = None) -> bool:
def update_memory(
self,
messages: list[Any],
thread_id: str | None = None,
agent_name: str | None = None,
correction_detected: bool = False,
) -> bool:
"""Update memory based on conversation messages.
Args:
messages: List of conversation messages.
thread_id: Optional thread ID for tracking source.
agent_name: If provided, updates per-agent memory. If None, updates global memory.
correction_detected: Whether recent turns include an explicit correction signal.
Returns:
True if update was successful, False otherwise.
@ -295,9 +302,19 @@ class MemoryUpdater:
return False
# Build prompt
correction_hint = ""
if correction_detected:
correction_hint = (
"IMPORTANT: Explicit correction signals were detected in this conversation. "
"Pay special attention to what the agent got wrong, what the user corrected, "
"and record the correct approach as a fact with category "
'"correction" and confidence >= 0.95 when appropriate.'
)
prompt = MEMORY_UPDATE_PROMPT.format(
current_memory=json.dumps(current_memory, indent=2),
conversation=conversation_text,
correction_hint=correction_hint,
)
# Call LLM
@ -383,6 +400,8 @@ class MemoryUpdater:
confidence = fact.get("confidence", 0.5)
if confidence >= config.fact_confidence_threshold:
raw_content = fact.get("content", "")
if not isinstance(raw_content, str):
continue
normalized_content = raw_content.strip()
fact_key = _fact_content_key(normalized_content)
if fact_key is not None and fact_key in existing_fact_keys:
@ -396,6 +415,11 @@ class MemoryUpdater:
"createdAt": now,
"source": thread_id or "unknown",
}
source_error = fact.get("sourceError")
if isinstance(source_error, str):
normalized_source_error = source_error.strip()
if normalized_source_error:
fact_entry["sourceError"] = normalized_source_error
current_memory["facts"].append(fact_entry)
if fact_key is not None:
existing_fact_keys.add(fact_key)
@ -412,16 +436,22 @@ class MemoryUpdater:
return current_memory
def update_memory_from_conversation(messages: list[Any], thread_id: str | None = None, agent_name: str | None = None) -> bool:
def update_memory_from_conversation(
messages: list[Any],
thread_id: str | None = None,
agent_name: str | None = None,
correction_detected: bool = False,
) -> bool:
"""Convenience function to update memory from a conversation.
Args:
messages: List of conversation messages.
thread_id: Optional thread ID.
agent_name: If provided, updates per-agent memory. If None, updates global memory.
correction_detected: Whether recent turns include an explicit correction signal.
Returns:
True if successful, False otherwise.
"""
updater = MemoryUpdater()
return updater.update_memory(messages, thread_id, agent_name)
return updater.update_memory(messages, thread_id, agent_name, correction_detected)

View File

@ -0,0 +1,275 @@
"""LLM error handling middleware with retry/backoff and user-facing fallbacks."""
from __future__ import annotations
import asyncio
import logging
import time
from collections.abc import Awaitable, Callable
from email.utils import parsedate_to_datetime
from typing import Any, override
from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langchain.agents.middleware.types import (
ModelCallResult,
ModelRequest,
ModelResponse,
)
from langchain_core.messages import AIMessage
from langgraph.errors import GraphBubbleUp
logger = logging.getLogger(__name__)
_RETRIABLE_STATUS_CODES = {408, 409, 425, 429, 500, 502, 503, 504}
_BUSY_PATTERNS = (
"server busy",
"temporarily unavailable",
"try again later",
"please retry",
"please try again",
"overloaded",
"high demand",
"rate limit",
"负载较高",
"服务繁忙",
"稍后重试",
"请稍后重试",
)
_QUOTA_PATTERNS = (
"insufficient_quota",
"quota",
"billing",
"credit",
"payment",
"余额不足",
"超出限额",
"额度不足",
"欠费",
)
_AUTH_PATTERNS = (
"authentication",
"unauthorized",
"invalid api key",
"invalid_api_key",
"permission",
"forbidden",
"access denied",
"无权",
"未授权",
)
class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
"""Retry transient LLM errors and surface graceful assistant messages."""
retry_max_attempts: int = 3
retry_base_delay_ms: int = 1000
retry_cap_delay_ms: int = 8000
def _classify_error(self, exc: BaseException) -> tuple[bool, str]:
detail = _extract_error_detail(exc)
lowered = detail.lower()
error_code = _extract_error_code(exc)
status_code = _extract_status_code(exc)
if _matches_any(lowered, _QUOTA_PATTERNS) or _matches_any(str(error_code).lower(), _QUOTA_PATTERNS):
return False, "quota"
if _matches_any(lowered, _AUTH_PATTERNS):
return False, "auth"
exc_name = exc.__class__.__name__
if exc_name in {
"APITimeoutError",
"APIConnectionError",
"InternalServerError",
}:
return True, "transient"
if status_code in _RETRIABLE_STATUS_CODES:
return True, "transient"
if _matches_any(lowered, _BUSY_PATTERNS):
return True, "busy"
return False, "generic"
def _build_retry_delay_ms(self, attempt: int, exc: BaseException) -> int:
retry_after = _extract_retry_after_ms(exc)
if retry_after is not None:
return retry_after
backoff = self.retry_base_delay_ms * (2 ** max(0, attempt - 1))
return min(backoff, self.retry_cap_delay_ms)
def _build_retry_message(self, attempt: int, wait_ms: int, reason: str) -> str:
seconds = max(1, round(wait_ms / 1000))
reason_text = "provider is busy" if reason == "busy" else "provider request failed temporarily"
return f"LLM request retry {attempt}/{self.retry_max_attempts}: {reason_text}. Retrying in {seconds}s."
def _build_user_message(self, exc: BaseException, reason: str) -> str:
detail = _extract_error_detail(exc)
if reason == "quota":
return "The configured LLM provider rejected the request because the account is out of quota, billing is unavailable, or usage is restricted. Please fix the provider account and try again."
if reason == "auth":
return "The configured LLM provider rejected the request because authentication or access is invalid. Please check the provider credentials and try again."
if reason in {"busy", "transient"}:
return "The configured LLM provider is temporarily unavailable after multiple retries. Please wait a moment and continue the conversation."
return f"LLM request failed: {detail}"
def _emit_retry_event(self, attempt: int, wait_ms: int, reason: str) -> None:
try:
from langgraph.config import get_stream_writer
writer = get_stream_writer()
writer(
{
"type": "llm_retry",
"attempt": attempt,
"max_attempts": self.retry_max_attempts,
"wait_ms": wait_ms,
"reason": reason,
"message": self._build_retry_message(attempt, wait_ms, reason),
}
)
except Exception:
logger.debug("Failed to emit llm_retry event", exc_info=True)
@override
def wrap_model_call(
self,
request: ModelRequest,
handler: Callable[[ModelRequest], ModelResponse],
) -> ModelCallResult:
attempt = 1
while True:
try:
return handler(request)
except GraphBubbleUp:
# Preserve LangGraph control-flow signals (interrupt/pause/resume).
raise
except Exception as exc:
retriable, reason = self._classify_error(exc)
if retriable and attempt < self.retry_max_attempts:
wait_ms = self._build_retry_delay_ms(attempt, exc)
logger.warning(
"Transient LLM error on attempt %d/%d; retrying in %dms: %s",
attempt,
self.retry_max_attempts,
wait_ms,
_extract_error_detail(exc),
)
self._emit_retry_event(attempt, wait_ms, reason)
time.sleep(wait_ms / 1000)
attempt += 1
continue
logger.warning(
"LLM call failed after %d attempt(s): %s",
attempt,
_extract_error_detail(exc),
exc_info=exc,
)
return AIMessage(content=self._build_user_message(exc, reason))
@override
async def awrap_model_call(
self,
request: ModelRequest,
handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
) -> ModelCallResult:
attempt = 1
while True:
try:
return await handler(request)
except GraphBubbleUp:
# Preserve LangGraph control-flow signals (interrupt/pause/resume).
raise
except Exception as exc:
retriable, reason = self._classify_error(exc)
if retriable and attempt < self.retry_max_attempts:
wait_ms = self._build_retry_delay_ms(attempt, exc)
logger.warning(
"Transient LLM error on attempt %d/%d; retrying in %dms: %s",
attempt,
self.retry_max_attempts,
wait_ms,
_extract_error_detail(exc),
)
self._emit_retry_event(attempt, wait_ms, reason)
await asyncio.sleep(wait_ms / 1000)
attempt += 1
continue
logger.warning(
"LLM call failed after %d attempt(s): %s",
attempt,
_extract_error_detail(exc),
exc_info=exc,
)
return AIMessage(content=self._build_user_message(exc, reason))
def _matches_any(detail: str, patterns: tuple[str, ...]) -> bool:
return any(pattern in detail for pattern in patterns)
def _extract_error_code(exc: BaseException) -> Any:
for attr in ("code", "error_code"):
value = getattr(exc, attr, None)
if value not in (None, ""):
return value
body = getattr(exc, "body", None)
if isinstance(body, dict):
error = body.get("error")
if isinstance(error, dict):
for key in ("code", "type"):
value = error.get(key)
if value not in (None, ""):
return value
return None
def _extract_status_code(exc: BaseException) -> int | None:
for attr in ("status_code", "status"):
value = getattr(exc, attr, None)
if isinstance(value, int):
return value
response = getattr(exc, "response", None)
status = getattr(response, "status_code", None)
return status if isinstance(status, int) else None
def _extract_retry_after_ms(exc: BaseException) -> int | None:
response = getattr(exc, "response", None)
headers = getattr(response, "headers", None)
if headers is None:
return None
raw = None
header_name = ""
for key in ("retry-after-ms", "Retry-After-Ms", "retry-after", "Retry-After"):
header_name = key
if hasattr(headers, "get"):
raw = headers.get(key)
if raw:
break
if not raw:
return None
try:
multiplier = 1 if "ms" in header_name.lower() else 1000
return max(0, int(float(raw) * multiplier))
except (TypeError, ValueError):
try:
target = parsedate_to_datetime(str(raw))
delta = target.timestamp() - time.time()
return max(0, int(delta * 1000))
except (TypeError, ValueError, OverflowError):
return None
def _extract_error_detail(exc: BaseException) -> str:
detail = str(exc).strip()
if detail:
return detail
message = getattr(exc, "message", None)
if isinstance(message, str) and message.strip():
return message.strip()
return exc.__class__.__name__
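
Note on the retry schedule: the delay logic in `_build_retry_delay_ms` can be sketched in isolation as below. This is a minimal standalone sketch, not the middleware itself; the function name `retry_delay_ms` and its keyword parameters are hypothetical, and the defaults mirror the class attributes shown above (`retry_base_delay_ms = 1000`, `retry_cap_delay_ms = 8000`).

```python
from typing import Optional


def retry_delay_ms(
    attempt: int,
    base_ms: int = 1000,
    cap_ms: int = 8000,
    retry_after_ms: Optional[int] = None,
) -> int:
    """Return the wait in milliseconds before retrying after 1-based `attempt`."""
    # A server-provided Retry-After value takes precedence over backoff.
    if retry_after_ms is not None:
        return retry_after_ms
    # Exponential backoff: base, 2x base, 4x base, ... clamped to the cap.
    backoff = base_ms * (2 ** max(0, attempt - 1))
    return min(backoff, cap_ms)


# Delays for attempts 1..5 with the default base and cap.
delays = [retry_delay_ms(a) for a in range(1, 6)]
```

With the default `retry_max_attempts = 3`, the loop in `wrap_model_call` only sleeps after attempts 1 and 2, so in practice only the first two delays in this schedule are used unless the attribute is raised.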

@ -182,6 +182,23 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
return None, False
@staticmethod
def _append_text(content: str | list | None, text: str) -> str | list:
"""Append *text* to AIMessage content, handling str, list, and None.
When content is a list of content blocks (e.g. Anthropic thinking mode),
we append a new ``{"type": "text", ...}`` block instead of concatenating
a string to a list, which would raise ``TypeError``.
"""
if content is None:
return text
if isinstance(content, list):
return [*content, {"type": "text", "text": f"\n\n{text}"}]
if isinstance(content, str):
return content + f"\n\n{text}"
# Fallback: coerce unexpected types to str to avoid TypeError
return str(content) + f"\n\n{text}"
def _apply(self, state: AgentState, runtime: Runtime) -> dict | None:
warning, hard_stop = self._track_and_check(state, runtime)
@ -192,7 +209,7 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
stripped_msg = last_msg.model_copy(
update={
"tool_calls": [],
"content": (last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}",
"content": self._append_text(last_msg.content, _HARD_STOP_MSG),
}
)
return {"messages": [stripped_msg]}
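
Note on the content fix: the replaced expression `(last_msg.content or "") + "..."` fails when `content` is a list of content blocks, because Python cannot concatenate a list and a string. A minimal standalone sketch of the new behaviour follows; `append_text` here is a free function written for illustration, and the `thinking` block is only a made-up example of a non-text block.

```python
from typing import Any, Union


def append_text(content: Any, text: str) -> Union[str, list]:
    """Append `text` to message content that may be None, a str, or a list."""
    if content is None:
        return text
    if isinstance(content, list):
        # Add a new text block; the original list is left unmodified.
        return [*content, {"type": "text", "text": f"\n\n{text}"}]
    # Strings, and any unexpected type coerced to str.
    return str(content) + f"\n\n{text}"


blocks = [{"type": "thinking", "thinking": "..."}]
result = append_text(blocks, "stopped")
```

Because the list branch builds a new list, the caller's original blocks are not mutated, which matters when the message is copied with `model_copy(update=...)`.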

@ -14,6 +14,21 @@ from deerflow.config.memory_config import get_memory_config
logger = logging.getLogger(__name__)
_UPLOAD_BLOCK_RE = re.compile(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", re.IGNORECASE)
_CORRECTION_PATTERNS = (
re.compile(r"\bthat(?:'s| is) (?:wrong|incorrect)\b", re.IGNORECASE),
re.compile(r"\byou misunderstood\b", re.IGNORECASE),
re.compile(r"\btry again\b", re.IGNORECASE),
re.compile(r"\bredo\b", re.IGNORECASE),
re.compile(r"不对"),
re.compile(r"你理解错了"),
re.compile(r"你理解有误"),
re.compile(r"重试"),
re.compile(r"重新来"),
re.compile(r"换一种"),
re.compile(r"改用"),
)
class MemoryMiddlewareState(AgentState):
"""Compatible with the `ThreadState` schema."""
@ -21,6 +36,22 @@ class MemoryMiddlewareState(AgentState):
pass
def _extract_message_text(message: Any) -> str:
"""Extract plain text from message content for filtering and signal detection."""
content = getattr(message, "content", "")
if isinstance(content, list):
text_parts: list[str] = []
for part in content:
if isinstance(part, str):
text_parts.append(part)
elif isinstance(part, dict):
text_val = part.get("text")
if isinstance(text_val, str):
text_parts.append(text_val)
return " ".join(text_parts)
return str(content)
def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
"""Filter messages to keep only user inputs and final assistant responses.
@ -44,18 +75,13 @@ def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
Returns:
Filtered list containing only user inputs and final assistant responses.
"""
_UPLOAD_BLOCK_RE = re.compile(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", re.IGNORECASE)
filtered = []
skip_next_ai = False
for msg in messages:
msg_type = getattr(msg, "type", None)
if msg_type == "human":
content = getattr(msg, "content", "")
if isinstance(content, list):
content = " ".join(p.get("text", "") for p in content if isinstance(p, dict))
content_str = str(content)
content_str = _extract_message_text(msg)
if "<uploaded_files>" in content_str:
# Strip the ephemeral upload block; keep the user's real question.
stripped = _UPLOAD_BLOCK_RE.sub("", content_str).strip()
@ -87,6 +113,25 @@ def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
return filtered
def detect_correction(messages: list[Any]) -> bool:
"""Detect explicit user corrections in recent conversation turns.
The queue keeps only one pending context per thread, so callers pass the
latest filtered message list. Checking only recent user turns keeps signal
detection conservative while avoiding stale corrections from long histories.
"""
recent_user_msgs = [msg for msg in messages[-6:] if getattr(msg, "type", None) == "human"]
for msg in recent_user_msgs:
content = _extract_message_text(msg).strip()
if not content:
continue
if any(pattern.search(content) for pattern in _CORRECTION_PATTERNS):
return True
return False
class MemoryMiddleware(AgentMiddleware[MemoryMiddlewareState]):
"""Middleware that queues conversation for memory update after agent execution.
@ -150,7 +195,13 @@ class MemoryMiddleware(AgentMiddleware[MemoryMiddlewareState]):
return None
# Queue the filtered conversation for memory update
correction_detected = detect_correction(filtered_messages)
queue = get_memory_queue()
queue.add(thread_id=thread_id, messages=filtered_messages, agent_name=self._agent_name)
queue.add(
thread_id=thread_id,
messages=filtered_messages,
agent_name=self._agent_name,
correction_detected=correction_detected,
)
return None
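
Note on correction detection: the behaviour of `detect_correction` can be illustrated with a reduced pattern set. This is a sketch under assumptions: `Msg` is a hypothetical stand-in for the real message objects, `has_correction` is an illustrative name, and only three of the patterns from `_CORRECTION_PATTERNS` are reproduced.

```python
import re
from dataclasses import dataclass

# A subset of the patterns from the diff above.
PATTERNS = (
    re.compile(r"\bthat(?:'s| is) (?:wrong|incorrect)\b", re.IGNORECASE),
    re.compile(r"\btry again\b", re.IGNORECASE),
    re.compile(r"重试"),
)


@dataclass
class Msg:
    """Hypothetical stand-in exposing the two attributes the detector reads."""
    type: str
    content: str


def has_correction(messages, window=6):
    """Return True if a recent human message matches a correction pattern."""
    recent = [m for m in messages[-window:] if m.type == "human"]
    return any(p.search(m.content) for m in recent for p in PATTERNS)


history = [
    Msg("human", "Summarize the report"),
    Msg("ai", "Here is a summary. Please try again later if it is incomplete."),
    Msg("human", "That's wrong, try again"),
]
```

Only human messages inside the window are inspected, so an assistant message that happens to contain "try again" does not set the flag, and a correction older than the last six messages is ignored.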

@ -116,44 +116,33 @@ class TitleMiddleware(AgentMiddleware[TitleMiddlewareState]):
return config
def _generate_title_result(self, state: TitleMiddlewareState) -> dict | None:
"""Synchronously generate a title. Returns state update or None."""
"""Generate a local fallback title without blocking on an LLM call."""
if not self._should_generate_title(state):
return None
prompt, user_msg = self._build_title_prompt(state)
config = get_title_config()
model = create_chat_model(name=config.model_name, thinking_enabled=False)
try:
response = model.invoke(prompt, config=self._get_runnable_config())
title = self._parse_title(response.content)
if not title:
title = self._fallback_title(user_msg)
except Exception:
logger.exception("Failed to generate title (sync)")
title = self._fallback_title(user_msg)
return {"title": title}
_, user_msg = self._build_title_prompt(state)
return {"title": self._fallback_title(user_msg)}
async def _agenerate_title_result(self, state: TitleMiddlewareState) -> dict | None:
"""Asynchronously generate a title. Returns state update or None."""
"""Generate a title asynchronously and fall back locally on failure."""
if not self._should_generate_title(state):
return None
prompt, user_msg = self._build_title_prompt(state)
config = get_title_config()
model = create_chat_model(name=config.model_name, thinking_enabled=False)
prompt, user_msg = self._build_title_prompt(state)
try:
response = await model.ainvoke(prompt, config=self._get_runnable_config())
if config.model_name:
model = create_chat_model(name=config.model_name, thinking_enabled=False)
else:
model = create_chat_model(thinking_enabled=False)
response = await model.ainvoke(prompt)
title = self._parse_title(response.content)
if not title:
title = self._fallback_title(user_msg)
if title:
return {"title": title}
except Exception:
logger.exception("Failed to generate title (async)")
title = self._fallback_title(user_msg)
return {"title": title}
logger.debug("Failed to generate async title; falling back to local title", exc_info=True)
return {"title": self._fallback_title(user_msg)}
@override
def after_model(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:

@ -72,6 +72,7 @@ def _build_runtime_middlewares(
lazy_init: bool = True,
) -> list[AgentMiddleware]:
"""Build shared base middlewares for agent execution."""
from deerflow.agents.middlewares.llm_error_handling_middleware import LLMErrorHandlingMiddleware
from deerflow.agents.middlewares.thread_data_middleware import ThreadDataMiddleware
from deerflow.sandbox.middleware import SandboxMiddleware
@ -90,6 +91,8 @@ def _build_runtime_middlewares(
middlewares.append(DanglingToolCallMiddleware())
middlewares.append(LLMErrorHandlingMiddleware())
# Guardrail middleware (if configured)
from deerflow.config.guardrails_config import get_guardrails_config
@ -135,6 +138,6 @@ def build_subagent_runtime_middlewares(*, lazy_init: bool = True) -> list[AgentM
"""Middlewares shared by subagent runtime before subagent-only middlewares."""
return _build_runtime_middlewares(
include_uploads=False,
include_dangling_tool_call_patch=False,
include_dangling_tool_call_patch=True,
lazy_init=lazy_init,
)

@ -10,10 +10,52 @@ from langchain_core.messages import HumanMessage
from langgraph.runtime import Runtime
from deerflow.config.paths import Paths, get_paths
from deerflow.utils.file_conversion import extract_outline
logger = logging.getLogger(__name__)
_OUTLINE_PREVIEW_LINES = 5
def _extract_outline_for_file(file_path: Path) -> tuple[list[dict], list[str]]:
"""Return the document outline and fallback preview for *file_path*.
Looks for a sibling ``<stem>.md`` file produced by the upload conversion
pipeline.
Returns:
(outline, preview) where:
- outline: list of ``{title, line}`` dicts (plus optional sentinel).
Empty when no headings are found or no .md exists.
- preview: first few non-empty lines of the .md, used as a content
anchor when outline is empty so the agent has some context.
Empty when outline is non-empty (no fallback needed).
"""
md_path = file_path.with_suffix(".md")
if not md_path.is_file():
return [], []
outline = extract_outline(md_path)
if outline:
logger.debug("Extracted %d outline entries from %s", len(outline), file_path.name)
return outline, []
# outline is empty — read the first few non-empty lines as a content preview
preview: list[str] = []
try:
with md_path.open(encoding="utf-8") as f:
for line in f:
stripped = line.strip()
if stripped:
preview.append(stripped)
if len(preview) >= _OUTLINE_PREVIEW_LINES:
break
except Exception:
logger.debug("Failed to read preview lines from %s", md_path, exc_info=True)
return [], preview
class UploadsMiddlewareState(AgentState):
"""State schema for uploads middleware."""
@ -39,12 +81,38 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
super().__init__()
self._paths = Paths(base_dir) if base_dir else get_paths()
def _format_file_entry(self, file: dict, lines: list[str]) -> None:
"""Append a single file entry (name, size, path, optional outline) to lines."""
size_kb = file["size"] / 1024
size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
lines.append(f"- {file['filename']} ({size_str})")
lines.append(f" Path: {file['path']}")
outline = file.get("outline") or []
if outline:
truncated = outline[-1].get("truncated", False)
visible = [e for e in outline if not e.get("truncated")]
lines.append(" Document outline (use `read_file` with line ranges to read sections):")
for entry in visible:
lines.append(f" L{entry['line']}: {entry['title']}")
if truncated:
lines.append(f" ... (showing first {len(visible)} headings; use `read_file` to explore further)")
else:
preview = file.get("outline_preview") or []
if preview:
lines.append(" No structural headings detected. Document begins with:")
for text in preview:
lines.append(f" > {text}")
lines.append(" Use `grep` to search for keywords (e.g. `grep(pattern='keyword', path='/mnt/user-data/uploads/')`).")
lines.append("")
def _create_files_message(self, new_files: list[dict], historical_files: list[dict]) -> str:
"""Create a formatted message listing uploaded files.
Args:
new_files: Files uploaded in the current message.
historical_files: Files uploaded in previous messages.
Each file dict may contain an optional ``outline`` key: a list of
``{title, line}`` dicts extracted from the converted Markdown file.
Returns:
Formatted string inside <uploaded_files> tags.
@ -55,25 +123,24 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
lines.append("")
if new_files:
for file in new_files:
size_kb = file["size"] / 1024
size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
lines.append(f"- {file['filename']} ({size_str})")
lines.append(f" Path: {file['path']}")
lines.append("")
self._format_file_entry(file, lines)
else:
lines.append("(empty)")
lines.append("")
if historical_files:
lines.append("The following files were uploaded in previous messages and are still available:")
lines.append("")
for file in historical_files:
size_kb = file["size"] / 1024
size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
lines.append(f"- {file['filename']} ({size_str})")
lines.append(f" Path: {file['path']}")
lines.append("")
self._format_file_entry(file, lines)
lines.append("You can read these files using the `read_file` tool with the paths shown above.")
lines.append("To work with these files:")
lines.append("- Read from the file first — use the outline line numbers and `read_file` to locate relevant sections.")
lines.append("- Use `grep` to search for keywords when you are not sure which section to look at")
lines.append(" (e.g. `grep(pattern='revenue', path='/mnt/user-data/uploads/')`).")
lines.append("- Use `glob` to find files by name pattern")
lines.append(" (e.g. `glob(pattern='**/*.md', path='/mnt/user-data/uploads/')`).")
lines.append("- Only fall back to web search if the file content is clearly insufficient to answer the question.")
lines.append("</uploaded_files>")
return "\n".join(lines)
@ -147,6 +214,13 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
# Resolve uploads directory for existence checks
thread_id = (runtime.context or {}).get("thread_id")
if thread_id is None:
try:
from langgraph.config import get_config
thread_id = get_config().get("configurable", {}).get("thread_id")
except RuntimeError:
pass # get_config() raises outside a runnable context (e.g. unit tests)
uploads_dir = self._paths.sandbox_uploads_dir(thread_id) if thread_id else None
# Get newly uploaded files from the current message's additional_kwargs.files
@ -159,15 +233,26 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
for file_path in sorted(uploads_dir.iterdir()):
if file_path.is_file() and file_path.name not in new_filenames:
stat = file_path.stat()
outline, preview = _extract_outline_for_file(file_path)
historical_files.append(
{
"filename": file_path.name,
"size": stat.st_size,
"path": f"/mnt/user-data/uploads/{file_path.name}",
"extension": file_path.suffix,
"outline": outline,
"outline_preview": preview,
}
)
# Attach outlines to new files as well
if uploads_dir:
for file in new_files:
phys_path = uploads_dir / file["filename"]
outline, preview = _extract_outline_for_file(phys_path)
file["outline"] = outline
file["outline_preview"] = preview
if not new_files and not historical_files:
return None
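
Note on outline rendering: the outline branch of `_format_file_entry` can be sketched on its own as below. The sentinel shape `{"truncated": True}` is an assumption inferred from the `.get("truncated")` calls in the diff; the exact output of `extract_outline` is not shown here, and `render_outline` is a hypothetical name.

```python
def render_outline(outline):
    """Render outline entries as 'L<line>: <title>' lines, noting truncation."""
    lines = []
    # The last entry may be a sentinel marking that headings were cut off.
    truncated = bool(outline) and outline[-1].get("truncated", False)
    visible = [e for e in outline if not e.get("truncated")]
    for entry in visible:
        lines.append(f"L{entry['line']}: {entry['title']}")
    if truncated:
        lines.append(f"... (showing first {len(visible)} headings)")
    return lines


outline = [
    {"title": "Introduction", "line": 1},
    {"title": "Methods", "line": 14},
    {"truncated": True},  # assumed sentinel shape
]
rendered = render_outline(outline)
```

The line numbers give the agent a direct target for `read_file` line ranges, which is the point of attaching the outline to each file entry.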

@ -117,6 +117,7 @@ class DeerFlowClient:
subagent_enabled: bool = False,
plan_mode: bool = False,
agent_name: str | None = None,
available_skills: set[str] | None = None,
middlewares: Sequence[AgentMiddleware] | None = None,
):
"""Initialize the client.
@ -133,6 +134,7 @@ class DeerFlowClient:
subagent_enabled: Enable subagent delegation.
plan_mode: Enable TodoList middleware for plan mode.
agent_name: Name of the agent to use.
available_skills: Optional set of skill names to make available. If None (default), all scanned skills are available.
middlewares: Optional list of custom middlewares to inject into the agent.
"""
if config_path is not None:
@ -148,6 +150,7 @@ class DeerFlowClient:
self._subagent_enabled = subagent_enabled
self._plan_mode = plan_mode
self._agent_name = agent_name
self._available_skills = set(available_skills) if available_skills is not None else None
self._middlewares = list(middlewares) if middlewares else []
# Lazy agent — created on first call, recreated when config changes.
@ -208,6 +211,8 @@ class DeerFlowClient:
cfg.get("thinking_enabled"),
cfg.get("is_plan_mode"),
cfg.get("subagent_enabled"),
self._agent_name,
frozenset(self._available_skills) if self._available_skills is not None else None,
)
if self._agent is not None and self._agent_config_key == key:
@ -226,6 +231,7 @@ class DeerFlowClient:
subagent_enabled=subagent_enabled,
max_concurrent_subagents=max_concurrent_subagents,
agent_name=self._agent_name,
available_skills=self._available_skills,
),
"state_schema": ThreadState,
}
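
Note on the cache key: adding `available_skills` to the agent config key means the lazily created agent is rebuilt when the skill set changes. A minimal sketch of why `frozenset` and the `None` check are both needed is below; `config_key` is a hypothetical helper, and the real key contains more fields than the two shown.

```python
def config_key(agent_name, available_skills):
    """Build a comparable key; None (all skills) stays distinct from an empty set."""
    skills = frozenset(available_skills) if available_skills is not None else None
    return (agent_name, skills)


same_a = config_key("researcher", {"search", "charts"})
same_b = config_key("researcher", {"charts", "search"})
all_skills = config_key("researcher", None)
no_skills = config_key("researcher", set())
```

The frozen copy is an immutable snapshot, so later mutation of the caller's set cannot silently change a stored key, and "no restriction" (`None`) never compares equal to "no skills" (an empty set).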

@ -7,6 +7,7 @@ import uuid
from agent_sandbox import Sandbox as AioSandboxClient
from deerflow.sandbox.sandbox import Sandbox
from deerflow.sandbox.search import GrepMatch, path_matches, should_ignore_path, truncate_line
logger = logging.getLogger(__name__)
@ -124,16 +125,96 @@ class AioSandbox(Sandbox):
content: The text content to write to the file.
append: Whether to append the content to the file.
"""
try:
if append:
# Read existing content first and append
existing = self.read_file(path)
if not existing.startswith("Error:"):
content = existing + content
self._client.file.write_file(file=path, content=content)
except Exception as e:
logger.error(f"Failed to write file in sandbox: {e}")
raise
with self._lock:
try:
if append:
existing = self.read_file(path)
if not existing.startswith("Error:"):
content = existing + content
self._client.file.write_file(file=path, content=content)
except Exception as e:
logger.error(f"Failed to write file in sandbox: {e}")
raise
def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
if not include_dirs:
result = self._client.file.find_files(path=path, glob=pattern)
files = result.data.files if result.data and result.data.files else []
filtered = [file_path for file_path in files if not should_ignore_path(file_path)]
truncated = len(filtered) > max_results
return filtered[:max_results], truncated
result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
entries = result.data.files if result.data and result.data.files else []
matches: list[str] = []
root_path = path.rstrip("/") or "/"
root_prefix = root_path if root_path == "/" else f"{root_path}/"
for entry in entries:
if entry.path != root_path and not entry.path.startswith(root_prefix):
continue
if should_ignore_path(entry.path):
continue
rel_path = entry.path[len(root_path) :].lstrip("/")
if path_matches(pattern, rel_path):
matches.append(entry.path)
if len(matches) >= max_results:
return matches, True
return matches, False
def grep(
self,
path: str,
pattern: str,
*,
glob: str | None = None,
literal: bool = False,
case_sensitive: bool = False,
max_results: int = 100,
) -> tuple[list[GrepMatch], bool]:
import re as _re
regex_source = _re.escape(pattern) if literal else pattern
# Validate the pattern locally so an invalid regex raises re.error
# (caught by grep_tool's except re.error handler) rather than a
# generic remote API error.
_re.compile(regex_source, 0 if case_sensitive else _re.IGNORECASE)
regex = regex_source if case_sensitive else f"(?i){regex_source}"
if glob is not None:
find_result = self._client.file.find_files(path=path, glob=glob)
candidate_paths = find_result.data.files if find_result.data and find_result.data.files else []
else:
list_result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
entries = list_result.data.files if list_result.data and list_result.data.files else []
candidate_paths = [entry.path for entry in entries if not entry.is_directory]
matches: list[GrepMatch] = []
truncated = False
for file_path in candidate_paths:
if should_ignore_path(file_path):
continue
search_result = self._client.file.search_in_file(file=file_path, regex=regex)
data = search_result.data
if data is None:
continue
line_numbers = data.line_numbers or []
matched_lines = data.matches or []
for line_number, line in zip(line_numbers, matched_lines):
matches.append(
GrepMatch(
path=file_path,
line_number=line_number if isinstance(line_number, int) else 0,
line=truncate_line(line),
)
)
if len(matches) >= max_results:
truncated = True
return matches, truncated
return matches, truncated
def update_file(self, path: str, content: bytes) -> None:
"""Update a file with binary content in the sandbox.
@ -142,9 +223,10 @@ class AioSandbox(Sandbox):
path: The absolute path of the file to update.
content: The binary content to write to the file.
"""
try:
base64_content = base64.b64encode(content).decode("utf-8")
self._client.file.write_file(file=path, content=base64_content, encoding="base64")
except Exception as e:
logger.error(f"Failed to update file in sandbox: {e}")
raise
with self._lock:
try:
base64_content = base64.b64encode(content).decode("utf-8")
self._client.file.write_file(file=path, content=base64_content, encoding="base64")
except Exception as e:
logger.error(f"Failed to update file in sandbox: {e}")
raise
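
Note on regex handling in `grep`: the pattern is validated locally before any remote call, and case-insensitivity is passed to the sandbox as an inline `(?i)` flag. The steps can be sketched as follows; `build_remote_regex` is a hypothetical name, and the sketch makes no call to the sandbox client.

```python
import re


def build_remote_regex(pattern: str, *, literal: bool = False, case_sensitive: bool = False) -> str:
    """Validate `pattern` locally and return the regex string to send remotely."""
    # Literal searches escape regex metacharacters first.
    source = re.escape(pattern) if literal else pattern
    # Compile only to validate; an invalid pattern raises re.error here.
    re.compile(source, 0 if case_sensitive else re.IGNORECASE)
    # Case-insensitive matching is encoded inline for the remote side.
    return source if case_sensitive else f"(?i){source}"


literal_regex = build_remote_regex("a.b", literal=True)
exact_regex = build_remote_regex("foo", case_sensitive=True)
```

Validating first means a malformed pattern surfaces as `re.error`, which the calling tool can report clearly, rather than as a generic remote API failure.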

@ -1,13 +1,16 @@
import logging
import os
import requests
import httpx
logger = logging.getLogger(__name__)
_api_key_warned = False
class JinaClient:
def crawl(self, url: str, return_format: str = "html", timeout: int = 10) -> str:
async def crawl(self, url: str, return_format: str = "html", timeout: int = 10) -> str:
global _api_key_warned
headers = {
"Content-Type": "application/json",
"X-Return-Format": return_format,
@ -15,11 +18,13 @@ class JinaClient:
}
if os.getenv("JINA_API_KEY"):
headers["Authorization"] = f"Bearer {os.getenv('JINA_API_KEY')}"
else:
elif not _api_key_warned:
_api_key_warned = True
logger.warning("Jina API key is not set. Provide your own key to access a higher rate limit. See https://jina.ai/reader for more information.")
data = {"url": url}
try:
response = requests.post("https://r.jina.ai/", headers=headers, json=data)
async with httpx.AsyncClient() as client:
response = await client.post("https://r.jina.ai/", headers=headers, json=data, timeout=timeout)
if response.status_code != 200:
error_message = f"Jina API returned status {response.status_code}: {response.text}"
@ -34,5 +39,5 @@ class JinaClient:
return response.text
except Exception as e:
error_message = f"Request to Jina API failed: {str(e)}"
logger.error(error_message)
logger.exception(error_message)
return f"Error: {error_message}"
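
Note on the warn-once flag: the `_api_key_warned` module global keeps the missing-key warning from being logged on every crawl. The pattern can be sketched without any network call as below; `build_headers`, the logger name, and the `_Capture` handler are hypothetical and exist only to make the behaviour observable.

```python
import logging

logger = logging.getLogger("jina_sketch")
_api_key_warned = False


class _Capture(logging.Handler):
    """Collect log records in memory so the warning count can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def build_headers(api_key):
    """Build request headers, warning about a missing key only once per process."""
    global _api_key_warned
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    elif not _api_key_warned:
        _api_key_warned = True
        logger.warning("API key is not set; requests use the anonymous rate limit.")
    return headers


capture = _Capture()
logger.addHandler(capture)
for _ in range(3):
    build_headers(None)
```

The flag is process-wide, so the warning appears once per process rather than once per client instance.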

@ -8,7 +8,7 @@ readability_extractor = ReadabilityExtractor()
@tool("web_fetch", parse_docstring=True)
def web_fetch_tool(url: str) -> str:
async def web_fetch_tool(url: str) -> str:
"""Fetch the contents of a web page at a given URL.
Only fetch EXACT URLs that have been provided directly by the user or have been returned in results from the web_search and web_fetch tools.
This tool can NOT access content that requires authentication, such as private Google Docs or pages behind login walls.
@ -23,6 +23,8 @@ def web_fetch_tool(url: str) -> str:
config = get_app_config().get_tool_config("web_fetch")
if config is not None and "timeout" in config.model_extra:
timeout = config.model_extra.get("timeout")
html_content = jina_client.crawl(url, return_format="html", timeout=timeout)
html_content = await jina_client.crawl(url, return_format="html", timeout=timeout)
if isinstance(html_content, str) and html_content.startswith("Error:"):
return html_content
article = readability_extractor.extract_article(html_content)
return article.to_markdown()[:4096]

@ -3,7 +3,13 @@ from .extensions_config import ExtensionsConfig, get_extensions_config
from .memory_config import MemoryConfig, get_memory_config
from .paths import Paths, get_paths
from .skills_config import SkillsConfig
from .tracing_config import get_tracing_config, is_tracing_enabled
from .tracing_config import (
get_enabled_tracing_providers,
get_explicitly_enabled_tracing_providers,
get_tracing_config,
is_tracing_enabled,
validate_enabled_tracing_providers,
)
__all__ = [
"get_app_config",
@ -15,5 +21,8 @@ __all__ = [
"MemoryConfig",
"get_memory_config",
"get_tracing_config",
"get_explicitly_enabled_tracing_providers",
"get_enabled_tracing_providers",
"is_tracing_enabled",
"validate_enabled_tracing_providers",
]

@ -22,6 +22,11 @@ class AgentConfig(BaseModel):
description: str = ""
model: str | None = None
tool_groups: list[str] | None = None
# skills controls which skills are loaded into the agent's prompt:
# - None (or omitted): load all enabled skills (default fallback behavior)
# - [] (explicit empty list): disable all skills
# - ["skill1", "skill2"]: load only the specified skills
skills: list[str] | None = None
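
Note on the three `skills` states: the comment above distinguishes an omitted value, an explicit empty list, and a list of names. A minimal sketch of how a loader might apply that rule is below. `select_skills` is a hypothetical helper, and silently ignoring unknown names is an assumption rather than something the diff specifies.

```python
from typing import List, Optional


def select_skills(enabled: List[str], requested: Optional[List[str]]) -> List[str]:
    """Apply the tri-state rule: None = all, [] = none, list = only those named."""
    if requested is None:
        return list(enabled)
    wanted = set(requested)
    # Preserve the order of `enabled`; names not in `enabled` are dropped (assumption).
    return [name for name in enabled if name in wanted]


enabled = ["search", "charts", "pdf"]
all_loaded = select_skills(enabled, None)
none_loaded = select_skills(enabled, [])
some_loaded = select_skills(enabled, ["pdf", "unknown"])
```

The check must be `is None` rather than a truthiness test, otherwise an explicit empty list would fall through to the "load everything" default.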
def load_agent_config(name: str | None) -> AgentConfig | None:

@ -1,5 +1,6 @@
import logging
import os
from contextvars import ContextVar
from pathlib import Path
from typing import Any, Self
@ -11,16 +12,16 @@ from deerflow.config.acp_config import load_acp_config_from_dict
from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
from deerflow.config.database_config import DatabaseConfig
from deerflow.config.extensions_config import ExtensionsConfig
from deerflow.config.guardrails_config import load_guardrails_config_from_dict
from deerflow.config.memory_config import load_memory_config_from_dict
from deerflow.config.guardrails_config import GuardrailsConfig, load_guardrails_config_from_dict
from deerflow.config.memory_config import MemoryConfig, load_memory_config_from_dict
from deerflow.config.model_config import ModelConfig
from deerflow.config.run_events_config import RunEventsConfig
from deerflow.config.sandbox_config import SandboxConfig
from deerflow.config.skills_config import SkillsConfig
from deerflow.config.stream_bridge_config import StreamBridgeConfig, load_stream_bridge_config_from_dict
from deerflow.config.subagents_config import load_subagents_config_from_dict
from deerflow.config.summarization_config import load_summarization_config_from_dict
from deerflow.config.title_config import load_title_config_from_dict
from deerflow.config.subagents_config import SubagentsAppConfig, load_subagents_config_from_dict
from deerflow.config.summarization_config import SummarizationConfig, load_summarization_config_from_dict
from deerflow.config.title_config import TitleConfig, load_title_config_from_dict
from deerflow.config.token_usage_config import TokenUsageConfig
from deerflow.config.tool_config import ToolConfig, ToolGroupConfig
from deerflow.config.tool_search_config import ToolSearchConfig, load_tool_search_config_from_dict
@ -30,6 +31,13 @@ load_dotenv()
logger = logging.getLogger(__name__)
def _default_config_candidates() -> tuple[Path, ...]:
"""Return deterministic config.yaml locations without relying on cwd."""
backend_dir = Path(__file__).resolve().parents[4]
repo_root = backend_dir.parent
return (backend_dir / "config.yaml", repo_root / "config.yaml")
class AppConfig(BaseModel):
"""Config for the DeerFlow application"""
@ -42,6 +50,11 @@ class AppConfig(BaseModel):
skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
extensions: ExtensionsConfig = Field(default_factory=ExtensionsConfig, description="Extensions configuration (MCP servers and skills state)")
tool_search: ToolSearchConfig = Field(default_factory=ToolSearchConfig, description="Tool search / deferred loading configuration")
title: TitleConfig = Field(default_factory=TitleConfig, description="Automatic title generation configuration")
summarization: SummarizationConfig = Field(default_factory=SummarizationConfig, description="Conversation summarization configuration")
memory: MemoryConfig = Field(default_factory=MemoryConfig, description="Memory subsystem configuration")
subagents: SubagentsAppConfig = Field(default_factory=SubagentsAppConfig, description="Subagent runtime configuration")
guardrails: GuardrailsConfig = Field(default_factory=GuardrailsConfig, description="Guardrail middleware configuration")
model_config = ConfigDict(extra="allow", frozen=False)
database: DatabaseConfig = Field(default_factory=DatabaseConfig, description="Unified database backend configuration")
run_events: RunEventsConfig = Field(default_factory=RunEventsConfig, description="Run event storage configuration")
@ -55,7 +68,7 @@ class AppConfig(BaseModel):
Priority:
1. If the `config_path` argument is provided, use it.
2. Otherwise, if the `DEER_FLOW_CONFIG_PATH` environment variable is set, use it.
3. Otherwise, first check the `config.yaml` in the current directory, then fallback to `config.yaml` in the parent directory.
3. Otherwise, search deterministic backend/repository-root defaults from `_default_config_candidates()`.
"""
if config_path:
path = Path(config_path)
@ -68,14 +81,10 @@ class AppConfig(BaseModel):
raise FileNotFoundError(f"Config file specified by environment variable `DEER_FLOW_CONFIG_PATH` not found at {path}")
return path
else:
# Check if the config.yaml is in the current directory
path = Path(os.getcwd()) / "config.yaml"
if not path.exists():
# Check if the config.yaml is in the parent directory of CWD
path = Path(os.getcwd()).parent / "config.yaml"
if not path.exists():
raise FileNotFoundError("`config.yaml` file not found at the current directory nor its parent directory")
return path
for path in _default_config_candidates():
if path.exists():
return path
raise FileNotFoundError("`config.yaml` file not found at the default backend or repository root locations")
@classmethod
def from_file(cls, config_path: str | None = None) -> Self:
@ -248,6 +257,8 @@ _app_config: AppConfig | None = None
_app_config_path: Path | None = None
_app_config_mtime: float | None = None
_app_config_is_custom = False
_current_app_config: ContextVar[AppConfig | None] = ContextVar("deerflow_current_app_config", default=None)
_current_app_config_stack: ContextVar[tuple[AppConfig | None, ...]] = ContextVar("deerflow_current_app_config_stack", default=())
def _get_config_mtime(config_path: Path) -> float | None:
@ -280,6 +291,10 @@ def get_app_config() -> AppConfig:
"""
global _app_config, _app_config_path, _app_config_mtime
runtime_override = _current_app_config.get()
if runtime_override is not None:
return runtime_override
if _app_config is not None and _app_config_is_custom:
return _app_config
@ -341,3 +356,26 @@ def set_app_config(config: AppConfig) -> None:
_app_config_path = None
_app_config_mtime = None
_app_config_is_custom = True
def peek_current_app_config() -> AppConfig | None:
"""Return the runtime-scoped AppConfig override, if one is active."""
return _current_app_config.get()
def push_current_app_config(config: AppConfig) -> None:
"""Push a runtime-scoped AppConfig override for the current execution context."""
stack = _current_app_config_stack.get()
_current_app_config_stack.set(stack + (_current_app_config.get(),))
_current_app_config.set(config)
def pop_current_app_config() -> None:
"""Pop the latest runtime-scoped AppConfig override for the current execution context."""
stack = _current_app_config_stack.get()
if not stack:
_current_app_config.set(None)
return
previous = stack[-1]
_current_app_config_stack.set(stack[:-1])
_current_app_config.set(previous)

View File

@ -80,6 +80,12 @@ class ExtensionsConfig(BaseModel):
Args:
config_path: Optional path to extensions config file.
Resolution order:
1. If the `config_path` argument is provided, use it.
2. Otherwise, if the `DEER_FLOW_EXTENSIONS_CONFIG_PATH` environment variable is set, use it.
3. Otherwise, search backend/repository-root defaults for
`extensions_config.json`, then legacy `mcp_config.json`.
Returns:
Path to the extensions config file if found, otherwise None.
"""
@ -94,24 +100,16 @@ class ExtensionsConfig(BaseModel):
raise FileNotFoundError(f"Extensions config file specified by environment variable `DEER_FLOW_EXTENSIONS_CONFIG_PATH` not found at {path}")
return path
else:
# Check if the extensions_config.json is in the current directory
path = Path(os.getcwd()) / "extensions_config.json"
if path.exists():
return path
# Check if the extensions_config.json is in the parent directory of CWD
path = Path(os.getcwd()).parent / "extensions_config.json"
if path.exists():
return path
# Backward compatibility: check for mcp_config.json
path = Path(os.getcwd()) / "mcp_config.json"
if path.exists():
return path
path = Path(os.getcwd()).parent / "mcp_config.json"
if path.exists():
return path
backend_dir = Path(__file__).resolve().parents[4]
repo_root = backend_dir.parent
for path in (
backend_dir / "extensions_config.json",
repo_root / "extensions_config.json",
backend_dir / "mcp_config.json",
repo_root / "mcp_config.json",
):
if path.exists():
return path
# Extensions are optional, so return None if not found
return None

View File

@ -9,6 +9,12 @@ VIRTUAL_PATH_PREFIX = "/mnt/user-data"
_SAFE_THREAD_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
def _default_local_base_dir() -> Path:
"""Return the repo-local DeerFlow state directory without relying on cwd."""
backend_dir = Path(__file__).resolve().parents[4]
return backend_dir / ".deer-flow"
def _validate_thread_id(thread_id: str) -> str:
"""Validate a thread ID before using it in filesystem paths."""
if not _SAFE_THREAD_ID_RE.match(thread_id):
@ -67,8 +73,7 @@ class Paths:
BaseDir resolution (in priority order):
1. Constructor argument `base_dir`
2. DEER_FLOW_HOME environment variable
3. Local dev fallback: cwd/.deer-flow (when cwd is the backend/ dir)
4. Default: $HOME/.deer-flow
3. Repo-local fallback derived from this module path: `{backend_dir}/.deer-flow`
"""
def __init__(self, base_dir: str | Path | None = None) -> None:
@ -104,11 +109,7 @@ class Paths:
if env_home := os.getenv("DEER_FLOW_HOME"):
return Path(env_home).resolve()
cwd = Path.cwd()
if cwd.name == "backend" or (cwd / "pyproject.toml").exists():
return cwd / ".deer-flow"
return Path.home() / ".deer-flow"
return _default_local_base_dir()
@property
def memory_file(self) -> Path:

View File

@ -64,4 +64,15 @@ class SandboxConfig(BaseModel):
description="Environment variables to inject into the sandbox container. Values starting with $ will be resolved from host environment variables.",
)
bash_output_max_chars: int = Field(
default=20000,
ge=0,
description="Maximum characters to keep from bash tool output. Output exceeding this limit is middle-truncated (head + tail), preserving the first and last half. Set to 0 to disable truncation.",
)
read_file_output_max_chars: int = Field(
default=50000,
ge=0,
description="Maximum characters to keep from read_file tool output. Output exceeding this limit is head-truncated. Set to 0 to disable truncation.",
)
model_config = ConfigDict(extra="allow")
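
The truncation code that consumes `bash_output_max_chars` is not part of this hunk. As a rough illustration of what "middle-truncated (head + tail)" could mean, here is a sketch under stated assumptions: `middle_truncate` is a hypothetical helper, the character budget is split evenly between head and tail, and no marker is inserted at the cut, although a real implementation might add one.

```python
def middle_truncate(text: str, max_chars: int) -> str:
    """Keep the head and tail of text and drop the middle.

    Hypothetical helper: max_chars == 0 disables truncation, matching the
    documented meaning of bash_output_max_chars.
    """
    if max_chars == 0 or len(text) <= max_chars:
        return text
    head_len = max_chars // 2
    tail_len = max_chars - head_len  # at least 1 whenever max_chars >= 1
    return text[:head_len] + text[len(text) - tail_len:]
```

Keeping both ends suits shell output, where the command echo tends to sit at the start and the error message at the end.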

View File

@ -3,6 +3,11 @@ from pathlib import Path
from pydantic import BaseModel, Field
def _default_repo_root() -> Path:
"""Resolve the repo root without relying on the current working directory."""
return Path(__file__).resolve().parents[5]
class SkillsConfig(BaseModel):
"""Configuration for skills system"""
@ -26,8 +31,8 @@ class SkillsConfig(BaseModel):
# Use configured path (can be absolute or relative)
path = Path(self.path)
if not path.is_absolute():
# If relative, resolve from current working directory
path = Path.cwd() / path
# If relative, resolve from the repo root for deterministic behavior.
path = _default_repo_root() / path
return path.resolve()
else:
# Default: ../skills relative to backend directory

View File

@ -1,14 +1,12 @@
import logging
import os
import threading
from pydantic import BaseModel, Field
logger = logging.getLogger(__name__)
_config_lock = threading.Lock()
class TracingConfig(BaseModel):
class LangSmithTracingConfig(BaseModel):
"""Configuration for LangSmith tracing."""
enabled: bool = Field(...)
@ -18,9 +16,69 @@ class TracingConfig(BaseModel):
@property
def is_configured(self) -> bool:
"""Check if tracing is fully configured (enabled and has API key)."""
return self.enabled and bool(self.api_key)
def validate(self) -> None:
if self.enabled and not self.api_key:
raise ValueError("LangSmith tracing is enabled but LANGSMITH_API_KEY (or LANGCHAIN_API_KEY) is not set.")
class LangfuseTracingConfig(BaseModel):
"""Configuration for Langfuse tracing."""
enabled: bool = Field(...)
public_key: str | None = Field(...)
secret_key: str | None = Field(...)
host: str = Field(...)
@property
def is_configured(self) -> bool:
return self.enabled and bool(self.public_key) and bool(self.secret_key)
def validate(self) -> None:
if not self.enabled:
return
missing: list[str] = []
if not self.public_key:
missing.append("LANGFUSE_PUBLIC_KEY")
if not self.secret_key:
missing.append("LANGFUSE_SECRET_KEY")
if missing:
raise ValueError(f"Langfuse tracing is enabled but required settings are missing: {', '.join(missing)}")
class TracingConfig(BaseModel):
"""Tracing configuration for supported providers."""
langsmith: LangSmithTracingConfig = Field(...)
langfuse: LangfuseTracingConfig = Field(...)
@property
def is_configured(self) -> bool:
return bool(self.enabled_providers)
@property
def explicitly_enabled_providers(self) -> list[str]:
enabled: list[str] = []
if self.langsmith.enabled:
enabled.append("langsmith")
if self.langfuse.enabled:
enabled.append("langfuse")
return enabled
@property
def enabled_providers(self) -> list[str]:
enabled: list[str] = []
if self.langsmith.is_configured:
enabled.append("langsmith")
if self.langfuse.is_configured:
enabled.append("langfuse")
return enabled
def validate_enabled(self) -> None:
self.langsmith.validate()
self.langfuse.validate()
_tracing_config: TracingConfig | None = None
@ -29,12 +87,7 @@ _TRUTHY_VALUES = {"1", "true", "yes", "on"}
def _env_flag_preferred(*names: str) -> bool:
"""Return the boolean value of the first env var that is present and non-empty.
Accepted truthy values (case-insensitive): ``1``, ``true``, ``yes``, ``on``.
Any other non-empty value is treated as falsy. If none of the named
variables is set, returns ``False``.
"""
"""Return the boolean value of the first env var that is present and non-empty."""
for name in names:
value = os.environ.get(name)
if value is not None and value.strip():
@ -52,43 +105,45 @@ def _first_env_value(*names: str) -> str | None:
def get_tracing_config() -> TracingConfig:
"""Get the current tracing configuration from environment variables.
``LANGSMITH_*`` variables take precedence over their legacy ``LANGCHAIN_*``
counterparts. For boolean flags (``enabled``), the *first* variable that is
present and non-empty in the priority list is the sole authority its value
is parsed and returned without consulting the remaining candidates. Accepted
truthy values are ``1``, ``true``, ``yes``, and ``on`` (case-insensitive);
any other non-empty value is treated as falsy.
Priority order:
enabled : LANGSMITH_TRACING > LANGCHAIN_TRACING_V2 > LANGCHAIN_TRACING
api_key : LANGSMITH_API_KEY > LANGCHAIN_API_KEY
project : LANGSMITH_PROJECT > LANGCHAIN_PROJECT (default: "deer-flow")
endpoint : LANGSMITH_ENDPOINT > LANGCHAIN_ENDPOINT (default: https://api.smith.langchain.com)
Returns:
TracingConfig with current settings.
"""
"""Get the current tracing configuration from environment variables."""
global _tracing_config
if _tracing_config is not None:
return _tracing_config
with _config_lock:
if _tracing_config is not None: # Double-check after acquiring lock
if _tracing_config is not None:
return _tracing_config
_tracing_config = TracingConfig(
# Keep compatibility with both legacy LANGCHAIN_* and newer LANGSMITH_* variables.
enabled=_env_flag_preferred("LANGSMITH_TRACING", "LANGCHAIN_TRACING_V2", "LANGCHAIN_TRACING"),
api_key=_first_env_value("LANGSMITH_API_KEY", "LANGCHAIN_API_KEY"),
project=_first_env_value("LANGSMITH_PROJECT", "LANGCHAIN_PROJECT") or "deer-flow",
endpoint=_first_env_value("LANGSMITH_ENDPOINT", "LANGCHAIN_ENDPOINT") or "https://api.smith.langchain.com",
langsmith=LangSmithTracingConfig(
enabled=_env_flag_preferred("LANGSMITH_TRACING", "LANGCHAIN_TRACING_V2", "LANGCHAIN_TRACING"),
api_key=_first_env_value("LANGSMITH_API_KEY", "LANGCHAIN_API_KEY"),
project=_first_env_value("LANGSMITH_PROJECT", "LANGCHAIN_PROJECT") or "deer-flow",
endpoint=_first_env_value("LANGSMITH_ENDPOINT", "LANGCHAIN_ENDPOINT") or "https://api.smith.langchain.com",
),
langfuse=LangfuseTracingConfig(
enabled=_env_flag_preferred("LANGFUSE_TRACING"),
public_key=_first_env_value("LANGFUSE_PUBLIC_KEY"),
secret_key=_first_env_value("LANGFUSE_SECRET_KEY"),
host=_first_env_value("LANGFUSE_BASE_URL") or "https://cloud.langfuse.com",
),
)
return _tracing_config
def get_enabled_tracing_providers() -> list[str]:
"""Return the configured tracing providers that are enabled and complete."""
return get_tracing_config().enabled_providers
def get_explicitly_enabled_tracing_providers() -> list[str]:
"""Return tracing providers explicitly enabled by config, even if incomplete."""
return get_tracing_config().explicitly_enabled_providers
def validate_enabled_tracing_providers() -> None:
"""Validate that any explicitly enabled providers are fully configured."""
get_tracing_config().validate_enabled()
def is_tracing_enabled() -> bool:
"""Check if LangSmith tracing is enabled and configured.
Returns:
True if tracing is enabled and has an API key.
"""
"""Check if any tracing provider is enabled and fully configured."""
return get_tracing_config().is_configured
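
The precedence rule for the boolean flags (the first variable that is present and non-empty decides, and later names are not consulted) can be sketched as below. This is an illustration only: the hypothetical `env_flag_preferred` takes the environment as a plain dict so it can be tested without touching `os.environ`, and its return line is inferred from the removed docstring, not copied from the source.

```python
_TRUTHY_VALUES = {"1", "true", "yes", "on"}


def env_flag_preferred(env: dict[str, str], *names: str) -> bool:
    """Return the flag value of the first name that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value is not None and value.strip():
            # The first non-empty variable is the sole authority.
            return value.strip().lower() in _TRUTHY_VALUES
    return False
```

With this rule, `LANGSMITH_TRACING=false` would disable tracing even if the legacy `LANGCHAIN_TRACING_V2=true` is still set.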

View File

@ -2,8 +2,9 @@ import logging
from langchain.chat_models import BaseChatModel
from deerflow.config import get_app_config, get_tracing_config, is_tracing_enabled
from deerflow.config import get_app_config
from deerflow.reflection import resolve_class
from deerflow.tracing import build_tracing_callbacks
logger = logging.getLogger(__name__)
@ -88,17 +89,9 @@ def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *
model_instance = model_class(**kwargs, **model_settings_from_config)
if is_tracing_enabled():
try:
from langchain_core.tracers.langchain import LangChainTracer
tracing_config = get_tracing_config()
tracer = LangChainTracer(
project_name=tracing_config.project,
)
existing_callbacks = model_instance.callbacks or []
model_instance.callbacks = [*existing_callbacks, tracer]
logger.debug(f"LangSmith tracing attached to model '{name}' (project='{tracing_config.project}')")
except Exception as e:
logger.warning(f"Failed to attach LangSmith tracing to model '{name}': {e}")
callbacks = build_tracing_callbacks()
if callbacks:
existing_callbacks = model_instance.callbacks or []
model_instance.callbacks = [*existing_callbacks, *callbacks]
logger.debug(f"Tracing attached to model '{name}' with providers={len(callbacks)}")
return model_instance

View File

@ -123,6 +123,11 @@ async def run_agent(
# Inject runtime context so middlewares can access thread_id
# (langgraph-cli does this automatically; we must do it manually)
runtime = Runtime(context={"thread_id": thread_id}, store=store)
# If the caller already set a ``context`` key (LangGraph >= 0.6.0
# prefers it over ``configurable`` for thread-level data), make
# sure ``thread_id`` is available there too.
if "context" in config and isinstance(config["context"], dict):
config["context"].setdefault("thread_id", thread_id)
config.setdefault("configurable", {})["__pregel_runtime"] = runtime
# Inject RunJournal as a LangChain callback handler.

View File

@ -25,6 +25,7 @@ class MemoryStreamBridge(StreamBridge):
self._maxsize = queue_maxsize
self._queues: dict[str, asyncio.Queue[StreamEvent]] = {}
self._counters: dict[str, int] = {}
self._dropped_counts: dict[str, int] = {}
# -- helpers ---------------------------------------------------------------
@ -32,6 +33,7 @@ class MemoryStreamBridge(StreamBridge):
if run_id not in self._queues:
self._queues[run_id] = asyncio.Queue(maxsize=self._maxsize)
self._counters[run_id] = 0
self._dropped_counts[run_id] = 0
return self._queues[run_id]
def _next_id(self, run_id: str) -> str:
@ -48,14 +50,41 @@ class MemoryStreamBridge(StreamBridge):
try:
await asyncio.wait_for(queue.put(entry), timeout=_PUBLISH_TIMEOUT)
except TimeoutError:
logger.warning("Stream bridge queue full for run %s — dropping event %s", run_id, event)
self._dropped_counts[run_id] = self._dropped_counts.get(run_id, 0) + 1
logger.warning(
"Stream bridge queue full for run %s — dropping event %s (total dropped: %d)",
run_id,
event,
self._dropped_counts[run_id],
)
async def publish_end(self, run_id: str) -> None:
queue = self._get_or_create_queue(run_id)
try:
await asyncio.wait_for(queue.put(END_SENTINEL), timeout=_PUBLISH_TIMEOUT)
except TimeoutError:
logger.warning("Stream bridge queue full for run %s — dropping END sentinel", run_id)
# END sentinel is critical — it is the only signal that allows
# subscribers to terminate. If the queue is full we evict the
# oldest *regular* events to make room rather than dropping END,
# which would cause the SSE connection to hang forever and leak
# the queue/counter resources for this run_id.
if queue.full():
evicted = 0
while queue.full():
try:
queue.get_nowait()
evicted += 1
except asyncio.QueueEmpty:
break # pragma: no cover defensive
if evicted:
logger.warning(
"Stream bridge queue full for run %s — evicted %d event(s) to guarantee END sentinel delivery",
run_id,
evicted,
)
# After eviction the queue has free space, so a non-blocking put
# would suffice. We still await put() (which blocks until space is
# available) as a defensive measure.
await queue.put(END_SENTINEL)
async def subscribe(
self,
@ -84,7 +113,18 @@ class MemoryStreamBridge(StreamBridge):
await asyncio.sleep(delay)
self._queues.pop(run_id, None)
self._counters.pop(run_id, None)
self._dropped_counts.pop(run_id, None)
async def close(self) -> None:
self._queues.clear()
self._counters.clear()
self._dropped_counts.clear()
def dropped_count(self, run_id: str) -> int:
"""Return the number of events dropped for *run_id*."""
return self._dropped_counts.get(run_id, 0)
@property
def dropped_total(self) -> int:
"""Return the total number of events dropped across all runs."""
return sum(self._dropped_counts.values())
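
The END-sentinel handling in `publish_end` can be exercised in isolation. The sketch below is a simplified, hypothetical version that works on a bare `asyncio.Queue` and returns the number of evicted items; `END` stands in for `END_SENTINEL`, and the per-run bookkeeping and logging of `MemoryStreamBridge` are omitted.

```python
import asyncio

END = object()  # hypothetical stand-in for END_SENTINEL


async def publish_end(queue: asyncio.Queue, timeout: float) -> int:
    """Put END on the queue, evicting the oldest items if it stays full.

    Returns the number of evicted items.
    """
    evicted = 0
    try:
        await asyncio.wait_for(queue.put(END), timeout=timeout)
    except asyncio.TimeoutError:
        # Make room instead of dropping END; subscribers need it to stop.
        while queue.full():
            try:
                queue.get_nowait()
                evicted += 1
            except asyncio.QueueEmpty:
                break
        await queue.put(END)
    return evicted


async def demo() -> tuple[int, list]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    queue.put_nowait("a")
    queue.put_nowait("b")
    evicted = await publish_end(queue, timeout=0.01)
    items = [queue.get_nowait() for _ in range(queue.qsize())]
    return evicted, items
```

In this sketch only as many items are evicted as are needed to make the queue non-full, so at most the oldest pending event is lost in exchange for a guaranteed END.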

View File

@ -0,0 +1,23 @@
import threading
from deerflow.sandbox.sandbox import Sandbox
_FILE_OPERATION_LOCKS: dict[tuple[str, str], threading.Lock] = {}
_FILE_OPERATION_LOCKS_GUARD = threading.Lock()
def get_file_operation_lock_key(sandbox: Sandbox, path: str) -> tuple[str, str]:
sandbox_id = getattr(sandbox, "id", None)
if not sandbox_id:
sandbox_id = f"instance:{id(sandbox)}"
return sandbox_id, path
def get_file_operation_lock(sandbox: Sandbox, path: str) -> threading.Lock:
lock_key = get_file_operation_lock_key(sandbox, path)
with _FILE_OPERATION_LOCKS_GUARD:
lock = _FILE_OPERATION_LOCKS.get(lock_key)
if lock is None:
lock = threading.Lock()
_FILE_OPERATION_LOCKS[lock_key] = lock
return lock
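
A caller would typically hold the per-file lock around a read-modify-write sequence so that concurrent tool calls on the same path do not interleave. The following is a self-contained sketch of that usage: `FakeSandbox`, `get_lock`, and `append_line` are hypothetical names, and an in-memory dict stands in for the sandbox filesystem.

```python
import threading

_LOCKS: dict[tuple[str, str], threading.Lock] = {}
_LOCKS_GUARD = threading.Lock()


class FakeSandbox:
    """Hypothetical stand-in for a Sandbox; only the id attribute matters."""

    def __init__(self, sandbox_id: str) -> None:
        self.id = sandbox_id


def get_lock(sandbox: FakeSandbox, path: str) -> threading.Lock:
    # Key on (sandbox id, path); fall back to object identity without an id.
    sandbox_id = getattr(sandbox, "id", None) or f"instance:{id(sandbox)}"
    key = (sandbox_id, path)
    with _LOCKS_GUARD:
        lock = _LOCKS.get(key)
        if lock is None:
            lock = threading.Lock()
            _LOCKS[key] = lock
        return lock


def append_line(sandbox: FakeSandbox, path: str, store: dict[str, str], line: str) -> None:
    # Read-modify-write under the per-file lock.
    with get_lock(sandbox, path):
        current = store.get(path, "")
        store[path] = current + line + "\n"
```

Note that the lock table only grows in this sketch, as in the module above; locks for paths that are no longer used are never removed.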

View File

@ -1,72 +1,6 @@
import fnmatch
from pathlib import Path
IGNORE_PATTERNS = [
# Version Control
".git",
".svn",
".hg",
".bzr",
# Dependencies
"node_modules",
"__pycache__",
".venv",
"venv",
".env",
"env",
".tox",
".nox",
".eggs",
"*.egg-info",
"site-packages",
# Build outputs
"dist",
"build",
".next",
".nuxt",
".output",
".turbo",
"target",
"out",
# IDE & Editor
".idea",
".vscode",
"*.swp",
"*.swo",
"*~",
".project",
".classpath",
".settings",
# OS generated
".DS_Store",
"Thumbs.db",
"desktop.ini",
"*.lnk",
# Logs & temp files
"*.log",
"*.tmp",
"*.temp",
"*.bak",
"*.cache",
".cache",
"logs",
# Coverage & test artifacts
".coverage",
"coverage",
".nyc_output",
"htmlcov",
".pytest_cache",
".mypy_cache",
".ruff_cache",
]
def _should_ignore(name: str) -> bool:
"""Check if a file/directory name matches any ignore pattern."""
for pattern in IGNORE_PATTERNS:
if fnmatch.fnmatch(name, pattern):
return True
return False
from deerflow.sandbox.search import should_ignore_name
def list_dir(path: str, max_depth: int = 2) -> list[str]:
@ -95,7 +29,7 @@ def list_dir(path: str, max_depth: int = 2) -> list[str]:
try:
for item in current_path.iterdir():
if _should_ignore(item.name):
if should_ignore_name(item.name):
continue
post_fix = "/" if item.is_dir() else ""

View File

@ -1,11 +1,23 @@
import errno
import ntpath
import os
import shutil
import subprocess
from dataclasses import dataclass
from pathlib import Path
from deerflow.sandbox.local.list_dir import list_dir
from deerflow.sandbox.sandbox import Sandbox
from deerflow.sandbox.search import GrepMatch, find_glob_matches, find_grep_matches
@dataclass(frozen=True)
class PathMapping:
"""A path mapping from a container path to a local path with optional read-only flag."""
container_path: str
local_path: str
read_only: bool = False
class LocalSandbox(Sandbox):
@ -39,17 +51,42 @@ class LocalSandbox(Sandbox):
return None
def __init__(self, id: str, path_mappings: dict[str, str] | None = None):
def __init__(self, id: str, path_mappings: list[PathMapping] | None = None):
"""
Initialize local sandbox with optional path mappings.
Args:
id: Sandbox identifier
path_mappings: Dictionary mapping container paths to local paths
Example: {"/mnt/skills": "/absolute/path/to/skills"}
path_mappings: List of path mappings with optional read-only flag.
Skills directory is read-only by default.
"""
super().__init__(id)
self.path_mappings = path_mappings or {}
self.path_mappings = path_mappings or []
def _is_read_only_path(self, resolved_path: str) -> bool:
"""Check if a resolved path is under a read-only mount.
When multiple mappings match (nested mounts), prefer the most specific
mapping (i.e. the one whose local_path is the longest prefix of the
resolved path), similar to how ``_resolve_path`` handles container paths.
"""
resolved = str(Path(resolved_path).resolve())
best_mapping: PathMapping | None = None
best_prefix_len = -1
for mapping in self.path_mappings:
local_resolved = str(Path(mapping.local_path).resolve())
if resolved == local_resolved or resolved.startswith(local_resolved + os.sep):
prefix_len = len(local_resolved)
if prefix_len > best_prefix_len:
best_prefix_len = prefix_len
best_mapping = mapping
if best_mapping is None:
return False
return best_mapping.read_only
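
The "most specific mount wins" rule described in the docstring above can be shown with a small sketch. `Mount` and `is_read_only` are hypothetical names; unlike the method above, this version skips `Path.resolve()` and assumes its inputs are already normalized absolute paths, so it does not touch the filesystem.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical stand-in for PathMapping (local side only)."""

    local_path: str
    read_only: bool = False


def is_read_only(resolved: str, mounts: list[Mount]) -> bool:
    best: Mount | None = None
    for mount in mounts:
        root = mount.local_path
        # Match the mount root itself or anything below it, at a
        # path-segment boundary.
        if resolved == root or resolved.startswith(root + os.sep):
            if best is None or len(root) > len(best.local_path):
                best = mount
    return best.read_only if best is not None else False
```

A writable mount nested inside a read-only one therefore stays writable, while sibling paths that merely share a name prefix are treated as unmapped.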
def _resolve_path(self, path: str) -> str:
"""
@ -64,7 +101,9 @@ class LocalSandbox(Sandbox):
path_str = str(path)
# Try each mapping (longest prefix first for more specific matches)
for container_path, local_path in sorted(self.path_mappings.items(), key=lambda x: len(x[0]), reverse=True):
for mapping in sorted(self.path_mappings, key=lambda m: len(m.container_path), reverse=True):
container_path = mapping.container_path
local_path = mapping.local_path
if path_str == container_path or path_str.startswith(container_path + "/"):
# Replace the container path prefix with local path
relative = path_str[len(container_path) :].lstrip("/")
@ -84,15 +123,16 @@ class LocalSandbox(Sandbox):
Returns:
Container path if mapping exists, otherwise original path
"""
path_str = str(Path(path).resolve())
normalized_path = path.replace("\\", "/")
path_str = str(Path(normalized_path).resolve())
# Try each mapping (longest local path first for more specific matches)
for container_path, local_path in sorted(self.path_mappings.items(), key=lambda x: len(x[1]), reverse=True):
local_path_resolved = str(Path(local_path).resolve())
if path_str.startswith(local_path_resolved):
for mapping in sorted(self.path_mappings, key=lambda m: len(m.local_path), reverse=True):
local_path_resolved = str(Path(mapping.local_path).resolve())
if path_str == local_path_resolved or path_str.startswith(local_path_resolved + "/"):
# Replace the local path prefix with container path
relative = path_str[len(local_path_resolved) :].lstrip("/")
resolved = f"{container_path}/{relative}" if relative else container_path
resolved = f"{mapping.container_path}/{relative}" if relative else mapping.container_path
return resolved
# No mapping found, return original path
@ -111,7 +151,7 @@ class LocalSandbox(Sandbox):
import re
# Sort mappings by local path length (longest first) for correct prefix matching
sorted_mappings = sorted(self.path_mappings.items(), key=lambda x: len(x[1]), reverse=True)
sorted_mappings = sorted(self.path_mappings, key=lambda m: len(m.local_path), reverse=True)
if not sorted_mappings:
return output
@ -119,12 +159,11 @@ class LocalSandbox(Sandbox):
# Create pattern that matches absolute paths
# Match paths like /Users/... or other absolute paths
result = output
for container_path, local_path in sorted_mappings:
local_path_resolved = str(Path(local_path).resolve())
for mapping in sorted_mappings:
# Escape the local path for use in regex
escaped_local = re.escape(local_path_resolved)
# Match the local path followed by optional path components
pattern = re.compile(escaped_local + r"(?:/[^\s\"';&|<>()]*)?")
escaped_local = re.escape(str(Path(mapping.local_path).resolve()))
# Match the local path followed by optional path components with either separator
pattern = re.compile(escaped_local + r"(?:[/\\][^\s\"';&|<>()]*)?")
def replace_match(match: re.Match) -> str:
matched_path = match.group(0)
@ -147,7 +186,7 @@ class LocalSandbox(Sandbox):
import re
# Sort mappings by length (longest first) for correct prefix matching
sorted_mappings = sorted(self.path_mappings.items(), key=lambda x: len(x[0]), reverse=True)
sorted_mappings = sorted(self.path_mappings, key=lambda m: len(m.container_path), reverse=True)
# Build regex pattern to match all container paths
# Match container path followed by optional path components
@ -157,7 +196,7 @@ class LocalSandbox(Sandbox):
# Create pattern that matches any of the container paths.
# The lookahead (?=/|$|...) ensures we only match at a path-segment boundary,
# preventing /mnt/skills from matching inside /mnt/skills-extra.
patterns = [re.escape(container_path) + r"(?=/|$|[\s\"';&|<>()])(?:/[^\s\"';&|<>()]*)?" for container_path, _ in sorted_mappings]
patterns = [re.escape(m.container_path) + r"(?=/|$|[\s\"';&|<>()])(?:/[^\s\"';&|<>()]*)?" for m in sorted_mappings]
pattern = re.compile("|".join(f"({p})" for p in patterns))
def replace_match(match: re.Match) -> str:
@ -248,6 +287,8 @@ class LocalSandbox(Sandbox):
def write_file(self, path: str, content: str, append: bool = False) -> None:
resolved_path = self._resolve_path(path)
if self._is_read_only_path(resolved_path):
raise OSError(errno.EROFS, "Read-only file system", path)
try:
dir_path = os.path.dirname(resolved_path)
if dir_path:
@ -259,8 +300,43 @@ class LocalSandbox(Sandbox):
# Re-raise with the original path for clearer error messages, hiding internal resolved paths
raise type(e)(e.errno, e.strerror, path) from None
def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
resolved_path = Path(self._resolve_path(path))
matches, truncated = find_glob_matches(resolved_path, pattern, include_dirs=include_dirs, max_results=max_results)
return [self._reverse_resolve_path(match) for match in matches], truncated
def grep(
self,
path: str,
pattern: str,
*,
glob: str | None = None,
literal: bool = False,
case_sensitive: bool = False,
max_results: int = 100,
) -> tuple[list[GrepMatch], bool]:
resolved_path = Path(self._resolve_path(path))
matches, truncated = find_grep_matches(
resolved_path,
pattern,
glob_pattern=glob,
literal=literal,
case_sensitive=case_sensitive,
max_results=max_results,
)
return [
GrepMatch(
path=self._reverse_resolve_path(match.path),
line_number=match.line_number,
line=match.line,
)
for match in matches
], truncated
def update_file(self, path: str, content: bytes) -> None:
resolved_path = self._resolve_path(path)
if self._is_read_only_path(resolved_path):
raise OSError(errno.EROFS, "Read-only file system", path)
try:
dir_path = os.path.dirname(resolved_path)
if dir_path:

View File

@ -1,6 +1,7 @@
import logging
from pathlib import Path
from deerflow.sandbox.local.local_sandbox import LocalSandbox
from deerflow.sandbox.local.local_sandbox import LocalSandbox, PathMapping
from deerflow.sandbox.sandbox import Sandbox
from deerflow.sandbox.sandbox_provider import SandboxProvider
@ -14,16 +15,17 @@ class LocalSandboxProvider(SandboxProvider):
"""Initialize the local sandbox provider with path mappings."""
self._path_mappings = self._setup_path_mappings()
def _setup_path_mappings(self) -> dict[str, str]:
def _setup_path_mappings(self) -> list[PathMapping]:
"""
Setup path mappings for local sandbox.
Maps container paths to actual local paths, including skills directory.
Maps container paths to actual local paths, including skills directory
and any custom mounts configured in config.yaml.
Returns:
Dictionary of path mappings
List of path mappings
"""
mappings = {}
mappings: list[PathMapping] = []
# Map skills container path to local skills directory
try:
@ -35,10 +37,63 @@ class LocalSandboxProvider(SandboxProvider):
# Only add mapping if skills directory exists
if skills_path.exists():
mappings[container_path] = str(skills_path)
mappings.append(
PathMapping(
container_path=container_path,
local_path=str(skills_path),
read_only=True, # Skills directory is always read-only
)
)
# Map custom mounts from sandbox config
_RESERVED_CONTAINER_PREFIXES = [container_path, "/mnt/acp-workspace", "/mnt/user-data"]
sandbox_config = config.sandbox
if sandbox_config and sandbox_config.mounts:
for mount in sandbox_config.mounts:
host_path = Path(mount.host_path)
container_path = mount.container_path.rstrip("/") or "/"
if not host_path.is_absolute():
logger.warning(
"Mount host_path must be absolute, skipping: %s -> %s",
mount.host_path,
mount.container_path,
)
continue
if not container_path.startswith("/"):
logger.warning(
"Mount container_path must be absolute, skipping: %s -> %s",
mount.host_path,
mount.container_path,
)
continue
# Reject mounts that conflict with reserved container paths
if any(container_path == p or container_path.startswith(p + "/") for p in _RESERVED_CONTAINER_PREFIXES):
logger.warning(
"Mount container_path conflicts with reserved prefix, skipping: %s",
mount.container_path,
)
continue
# Ensure the host path exists before adding mapping
if host_path.exists():
mappings.append(
PathMapping(
container_path=container_path,
local_path=str(host_path.resolve()),
read_only=mount.read_only,
)
)
else:
logger.warning(
"Mount host_path does not exist, skipping: %s -> %s",
mount.host_path,
mount.container_path,
)
except Exception as e:
# Log but don't fail if config loading fails
logger.warning("Could not setup skills path mapping: %s", e, exc_info=True)
logger.warning("Could not setup path mappings: %s", e, exc_info=True)
return mappings
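
The reserved-prefix check applied to custom mounts can be isolated into a small predicate. This is a sketch only: `conflicts_with_reserved` is a hypothetical helper, and `/mnt/skills` is an assumed value for the skills container path, which the code above reads from configuration.

```python
def conflicts_with_reserved(container_path: str, reserved: list[str]) -> bool:
    """Return True if container_path equals or lies under a reserved prefix."""
    normalized = container_path.rstrip("/") or "/"
    return any(normalized == p or normalized.startswith(p + "/") for p in reserved)


# Assumed values for illustration; the first entry comes from config in the diff.
RESERVED = ["/mnt/skills", "/mnt/acp-workspace", "/mnt/user-data"]
```

Comparing against `p + "/"` rather than the bare prefix keeps a mount such as `/mnt/skills-extra` from being rejected.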

View File

@ -1,5 +1,7 @@
from abc import ABC, abstractmethod
from deerflow.sandbox.search import GrepMatch
class Sandbox(ABC):
"""Abstract base class for sandbox environments"""
@ -61,6 +63,25 @@ class Sandbox(ABC):
"""
pass
@abstractmethod
def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
"""Find paths that match a glob pattern under a root directory."""
pass
@abstractmethod
def grep(
self,
path: str,
pattern: str,
*,
glob: str | None = None,
literal: bool = False,
case_sensitive: bool = False,
max_results: int = 100,
) -> tuple[list[GrepMatch], bool]:
"""Search for matches inside text files under a directory."""
pass
@abstractmethod
def update_file(self, path: str, content: bytes) -> None:
"""Update a file with binary content.

View File

@ -0,0 +1,210 @@
import fnmatch
import os
import re
from dataclasses import dataclass
from pathlib import Path, PurePosixPath
IGNORE_PATTERNS = [
".git",
".svn",
".hg",
".bzr",
"node_modules",
"__pycache__",
".venv",
"venv",
".env",
"env",
".tox",
".nox",
".eggs",
"*.egg-info",
"site-packages",
"dist",
"build",
".next",
".nuxt",
".output",
".turbo",
"target",
"out",
".idea",
".vscode",
"*.swp",
"*.swo",
"*~",
".project",
".classpath",
".settings",
".DS_Store",
"Thumbs.db",
"desktop.ini",
"*.lnk",
"*.log",
"*.tmp",
"*.temp",
"*.bak",
"*.cache",
".cache",
"logs",
".coverage",
"coverage",
".nyc_output",
"htmlcov",
".pytest_cache",
".mypy_cache",
".ruff_cache",
]
DEFAULT_MAX_FILE_SIZE_BYTES = 1_000_000
DEFAULT_LINE_SUMMARY_LENGTH = 200
@dataclass(frozen=True)
class GrepMatch:
path: str
line_number: int
line: str
def should_ignore_name(name: str) -> bool:
for pattern in IGNORE_PATTERNS:
if fnmatch.fnmatch(name, pattern):
return True
return False
def should_ignore_path(path: str) -> bool:
return any(should_ignore_name(segment) for segment in path.replace("\\", "/").split("/") if segment)
def path_matches(pattern: str, rel_path: str) -> bool:
path = PurePosixPath(rel_path)
if path.match(pattern):
return True
if pattern.startswith("**/"):
return path.match(pattern[3:])
return False
def truncate_line(line: str, max_chars: int = DEFAULT_LINE_SUMMARY_LENGTH) -> str:
line = line.rstrip("\n\r")
if len(line) <= max_chars:
return line
return line[: max_chars - 3] + "..."
def is_binary_file(path: Path, sample_size: int = 8192) -> bool:
try:
with path.open("rb") as handle:
return b"\0" in handle.read(sample_size)
except OSError:
return True
def find_glob_matches(root: Path, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
matches: list[str] = []
truncated = False
root = root.resolve()
if not root.exists():
raise FileNotFoundError(root)
if not root.is_dir():
raise NotADirectoryError(root)
for current_root, dirs, files in os.walk(root):
dirs[:] = [name for name in dirs if not should_ignore_name(name)]
# root is already resolved; os.walk builds current_root by joining under root,
# so relative_to() works without an extra stat()/resolve() per directory.
rel_dir = Path(current_root).relative_to(root)
if include_dirs:
for name in dirs:
rel_path = (rel_dir / name).as_posix()
if path_matches(pattern, rel_path):
matches.append(str(Path(current_root) / name))
if len(matches) >= max_results:
truncated = True
return matches, truncated
for name in files:
if should_ignore_name(name):
continue
rel_path = (rel_dir / name).as_posix()
if path_matches(pattern, rel_path):
matches.append(str(Path(current_root) / name))
if len(matches) >= max_results:
truncated = True
return matches, truncated
return matches, truncated
def find_grep_matches(
root: Path,
pattern: str,
*,
glob_pattern: str | None = None,
literal: bool = False,
case_sensitive: bool = False,
max_results: int = 100,
max_file_size: int = DEFAULT_MAX_FILE_SIZE_BYTES,
line_summary_length: int = DEFAULT_LINE_SUMMARY_LENGTH,
) -> tuple[list[GrepMatch], bool]:
matches: list[GrepMatch] = []
truncated = False
root = root.resolve()
if not root.exists():
raise FileNotFoundError(root)
if not root.is_dir():
raise NotADirectoryError(root)
regex_source = re.escape(pattern) if literal else pattern
flags = 0 if case_sensitive else re.IGNORECASE
regex = re.compile(regex_source, flags)
# Skip lines longer than this to prevent ReDoS on minified / no-newline files.
_max_line_chars = line_summary_length * 10
for current_root, dirs, files in os.walk(root):
dirs[:] = [name for name in dirs if not should_ignore_name(name)]
rel_dir = Path(current_root).relative_to(root)
for name in files:
if should_ignore_name(name):
continue
candidate_path = Path(current_root) / name
rel_path = (rel_dir / name).as_posix()
if glob_pattern is not None and not path_matches(glob_pattern, rel_path):
continue
try:
if candidate_path.is_symlink():
continue
file_path = candidate_path.resolve()
if not file_path.is_relative_to(root):
continue
if file_path.stat().st_size > max_file_size or is_binary_file(file_path):
continue
with file_path.open(encoding="utf-8", errors="replace") as handle:
for line_number, line in enumerate(handle, start=1):
if len(line) > _max_line_chars:
continue
if regex.search(line):
matches.append(
GrepMatch(
path=str(file_path),
line_number=line_number,
line=truncate_line(line, line_summary_length),
)
)
if len(matches) >= max_results:
truncated = True
return matches, truncated
except OSError:
continue
return matches, truncated
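
Note on `path_matches`: the `**/` fallback is easy to misread, so here is a minimal standalone sketch of its behaviour. The function body is restated from the hunk above so the snippet runs without the `deerflow` package; the sample paths are made up for illustration.

```python
from pathlib import PurePosixPath


def path_matches(pattern: str, rel_path: str) -> bool:
    """Restated from the diff so the snippet runs on its own."""
    path = PurePosixPath(rel_path)
    if path.match(pattern):
        return True
    # PurePosixPath.match() matches relative patterns from the right, one
    # segment at a time, so "**/*.py" can miss a file that sits directly
    # under the search root. Retrying without the "**/" prefix covers it.
    if pattern.startswith("**/"):
        return path.match(pattern[3:])
    return False


# Illustrative inputs (hypothetical paths, not taken from the repository).
checks = {
    "top-level file": path_matches("**/*.py", "main.py"),
    "nested file": path_matches("**/*.py", "src/pkg/main.py"),
    "bare pattern, nested file": path_matches("*.py", "src/main.py"),
    "wrong extension": path_matches("**/*.md", "src/main.py"),
}
for label, matched in checks.items():
    print(f"{label}: {matched}")
```

Because relative patterns are matched from the right, a bare `*.py` also matches nested files, which may be broader than a caller of the `glob` filter expects.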

View File

@ -7,17 +7,21 @@ from langchain.tools import ToolRuntime, tool
from langgraph.typing import ContextT
from deerflow.agents.thread_state import ThreadDataState, ThreadState
from deerflow.config import get_app_config
from deerflow.config.paths import VIRTUAL_PATH_PREFIX
from deerflow.sandbox.exceptions import (
SandboxError,
SandboxNotFoundError,
SandboxRuntimeError,
)
from deerflow.sandbox.file_operation_lock import get_file_operation_lock
from deerflow.sandbox.sandbox import Sandbox
from deerflow.sandbox.sandbox_provider import get_sandbox_provider
from deerflow.sandbox.search import GrepMatch
from deerflow.sandbox.security import LOCAL_HOST_BASH_DISABLED_MESSAGE, is_host_bash_allowed
_ABSOLUTE_PATH_PATTERN = re.compile(r"(?<![:\w])/(?:[^\s\"'`;&|<>()]+)")
_ABSOLUTE_PATH_PATTERN = re.compile(r"(?<![:\w])(?<!:/)/(?:[^\s\"'`;&|<>()]+)")
_FILE_URL_PATTERN = re.compile(r"\bfile://\S+", re.IGNORECASE)
_LOCAL_BASH_SYSTEM_PATH_PREFIXES = (
"/bin/",
"/usr/bin/",
@ -29,6 +33,10 @@ _LOCAL_BASH_SYSTEM_PATH_PREFIXES = (
_DEFAULT_SKILLS_CONTAINER_PATH = "/mnt/skills"
_ACP_WORKSPACE_VIRTUAL_PATH = "/mnt/acp-workspace"
_DEFAULT_GLOB_MAX_RESULTS = 200
_MAX_GLOB_MAX_RESULTS = 1000
_DEFAULT_GREP_MAX_RESULTS = 100
_MAX_GREP_MAX_RESULTS = 500
def _get_skills_container_path() -> str:
@ -111,6 +119,54 @@ def _is_acp_workspace_path(path: str) -> bool:
return path == _ACP_WORKSPACE_VIRTUAL_PATH or path.startswith(f"{_ACP_WORKSPACE_VIRTUAL_PATH}/")
def _get_custom_mounts():
"""Get custom volume mounts from sandbox config.
Result is cached after the first successful config load. If config loading
fails an empty list is returned *without* caching so that a later call can
pick up the real value once the config is available.
"""
cached = getattr(_get_custom_mounts, "_cached", None)
if cached is not None:
return cached
try:
from pathlib import Path
from deerflow.config import get_app_config
config = get_app_config()
mounts = []
if config.sandbox and config.sandbox.mounts:
# Only include mounts whose host_path exists, consistent with
# LocalSandboxProvider._setup_path_mappings() which also filters
# by host_path.exists().
mounts = [m for m in config.sandbox.mounts if Path(m.host_path).exists()]
_get_custom_mounts._cached = mounts # type: ignore[attr-defined]
return mounts
except Exception:
# If config loading fails, return an empty list without caching so that
# a later call can retry once the config is available.
return []
def _is_custom_mount_path(path: str) -> bool:
"""Check if path is under a custom mount container_path."""
for mount in _get_custom_mounts():
if path == mount.container_path or path.startswith(f"{mount.container_path}/"):
return True
return False
def _get_custom_mount_for_path(path: str):
"""Get the mount config matching this path (longest prefix first)."""
best = None
for mount in _get_custom_mounts():
if path == mount.container_path or path.startswith(f"{mount.container_path}/"):
if best is None or len(mount.container_path) > len(best.container_path):
best = mount
return best
def _extract_thread_id_from_thread_data(thread_data: "ThreadDataState | None") -> str | None:
"""Extract thread_id from thread_data by inspecting workspace_path.
@ -243,6 +299,69 @@ def _get_mcp_allowed_paths() -> list[str]:
return allowed_paths
def _get_tool_config_int(name: str, key: str, default: int) -> int:
try:
tool_config = get_app_config().get_tool_config(name)
if tool_config is not None and key in tool_config.model_extra:
value = tool_config.model_extra.get(key)
if isinstance(value, int):
return value
except Exception:
pass
return default
def _clamp_max_results(value: int, *, default: int, upper_bound: int) -> int:
if value <= 0:
return default
return min(value, upper_bound)
def _resolve_max_results(name: str, requested: int, *, default: int, upper_bound: int) -> int:
requested_max_results = _clamp_max_results(requested, default=default, upper_bound=upper_bound)
configured_max_results = _clamp_max_results(
_get_tool_config_int(name, "max_results", default),
default=default,
upper_bound=upper_bound,
)
return min(requested_max_results, configured_max_results)
def _resolve_local_read_path(path: str, thread_data: ThreadDataState) -> str:
validate_local_tool_path(path, thread_data, read_only=True)
if _is_skills_path(path):
return _resolve_skills_path(path)
if _is_acp_workspace_path(path):
return _resolve_acp_workspace_path(path, _extract_thread_id_from_thread_data(thread_data))
return _resolve_and_validate_user_data_path(path, thread_data)
def _format_glob_results(root_path: str, matches: list[str], truncated: bool) -> str:
if not matches:
return f"No files matched under {root_path}"
lines = [f"Found {len(matches)} paths under {root_path}"]
if truncated:
lines[0] += f" (showing first {len(matches)})"
lines.extend(f"{index}. {path}" for index, path in enumerate(matches, start=1))
if truncated:
lines.append("Results truncated. Narrow the path or pattern to see fewer matches.")
return "\n".join(lines)
def _format_grep_results(root_path: str, matches: list[GrepMatch], truncated: bool) -> str:
if not matches:
return f"No matches found under {root_path}"
lines = [f"Found {len(matches)} matches under {root_path}"]
if truncated:
lines[0] += f" (showing first {len(matches)})"
lines.extend(f"{match.path}:{match.line_number}: {match.line}" for match in matches)
if truncated:
lines.append("Results truncated. Narrow the path or add a glob filter.")
return "\n".join(lines)
def _path_variants(path: str) -> set[str]:
return {path, path.replace("\\", "/"), path.replace("/", "\\")}
@ -377,6 +496,8 @@ def mask_local_paths_in_output(output: str, thread_data: ThreadDataState | None)
result = pattern.sub(replace_acp, result)
# Custom mount host paths are masked by LocalSandbox._reverse_resolve_paths_in_output()
# Mask user-data host paths
if thread_data is None:
return result
@ -425,6 +546,7 @@ def validate_local_tool_path(path: str, thread_data: ThreadDataState | None, *,
- ``/mnt/user-data/*`` always allowed (read + write)
- ``/mnt/skills/*`` allowed only when *read_only* is True
- ``/mnt/acp-workspace/*`` allowed only when *read_only* is True
- Custom mount paths (from config.yaml) respect the per-mount ``read_only`` flag
Args:
path: The virtual path to validate.
@ -456,7 +578,14 @@ def validate_local_tool_path(path: str, thread_data: ThreadDataState | None, *,
if path.startswith(f"{VIRTUAL_PATH_PREFIX}/"):
return
raise PermissionError(f"Only paths under {VIRTUAL_PATH_PREFIX}/, {_get_skills_container_path()}/, or {_ACP_WORKSPACE_VIRTUAL_PATH}/ are allowed")
# Custom mount paths — respect read_only config
if _is_custom_mount_path(path):
mount = _get_custom_mount_for_path(path)
if mount and mount.read_only and not read_only:
raise PermissionError(f"Write access to read-only mount is not allowed: {path}")
return
raise PermissionError(f"Only paths under {VIRTUAL_PATH_PREFIX}/, {_get_skills_container_path()}/, {_ACP_WORKSPACE_VIRTUAL_PATH}/, or configured mount paths are allowed")
def _validate_resolved_user_data_path(resolved: Path, thread_data: ThreadDataState) -> None:
@ -506,15 +635,21 @@ def validate_local_bash_command_paths(command: str, thread_data: ThreadDataState
boundary and must not be treated as isolation from the host filesystem.
In local mode, commands must use virtual paths under /mnt/user-data for
user data access. Skills paths under /mnt/skills and ACP workspace paths
under /mnt/acp-workspace are allowed (path-traversal checks only; write
prevention for bash commands is not enforced here).
user data access. Skills paths under /mnt/skills, ACP workspace paths
under /mnt/acp-workspace, and custom mount container paths (configured in
config.yaml) are allowed (path-traversal checks only; write prevention
for bash commands is not enforced here).
A small allowlist of common system path prefixes is kept for executable
and device references (e.g. /bin/sh, /dev/null).
"""
if thread_data is None:
raise SandboxRuntimeError("Thread data not available for local sandbox")
# Block file:// URLs which bypass the absolute-path regex but allow local file exfiltration
file_url_match = _FILE_URL_PATTERN.search(command)
if file_url_match:
raise PermissionError(f"Unsafe file:// URL in command: {file_url_match.group()}. Use paths under {VIRTUAL_PATH_PREFIX}")
unsafe_paths: list[str] = []
allowed_paths = _get_mcp_allowed_paths()
@ -538,6 +673,11 @@ def validate_local_bash_command_paths(command: str, thread_data: ThreadDataState
_reject_path_traversal(absolute_path)
continue
# Allow custom mount container paths
if _is_custom_mount_path(absolute_path):
_reject_path_traversal(absolute_path)
continue
if any(absolute_path == prefix.rstrip("/") or absolute_path.startswith(prefix) for prefix in _LOCAL_BASH_SYSTEM_PATH_PREFIXES):
continue
@ -582,6 +722,8 @@ def replace_virtual_paths_in_command(command: str, thread_data: ThreadDataState
result = acp_pattern.sub(replace_acp_match, result)
# Custom mount paths are resolved by LocalSandbox._resolve_paths_in_command()
# Replace user-data paths
if VIRTUAL_PATH_PREFIX in result and thread_data is not None:
pattern = re.compile(rf"{re.escape(VIRTUAL_PATH_PREFIX)}(/[^\s\"';&|<>()]*)?")
@ -757,6 +899,59 @@ def ensure_thread_directories_exist(runtime: ToolRuntime[ContextT, ThreadState]
runtime.state["thread_directories_created"] = True
def _truncate_bash_output(output: str, max_chars: int) -> str:
"""Middle-truncate bash output, preserving head and tail (50/50 split).
bash output may have errors at either end (stderr/stdout ordering is
non-deterministic), so both ends are preserved equally.
The returned string (including the truncation marker) is guaranteed to be
no longer than max_chars characters. Pass max_chars=0 to disable truncation
and return the full output unchanged.
"""
if max_chars == 0:
return output
if len(output) <= max_chars:
return output
total_len = len(output)
# Compute the exact worst-case marker length: skipped chars is at most
# total_len, so this is a tight upper bound.
marker_max_len = len(f"\n... [middle truncated: {total_len} chars skipped] ...\n")
kept = max(0, max_chars - marker_max_len)
if kept == 0:
return output[:max_chars]
head_len = kept // 2
tail_len = kept - head_len
skipped = total_len - kept
marker = f"\n... [middle truncated: {skipped} chars skipped] ...\n"
return f"{output[:head_len]}{marker}{output[-tail_len:] if tail_len > 0 else ''}"
def _truncate_read_file_output(output: str, max_chars: int) -> str:
"""Head-truncate read_file output, preserving the beginning of the file.
Source code and documents are read top-to-bottom; the head contains the
most context (imports, class definitions, function signatures).
The returned string (including the truncation marker) is guaranteed to be
no longer than max_chars characters. Pass max_chars=0 to disable truncation
and return the full output unchanged.
"""
if max_chars == 0:
return output
if len(output) <= max_chars:
return output
total = len(output)
# Compute the exact worst-case marker length: both numeric fields are at
# their maximum (total chars), so this is a tight upper bound.
marker_max_len = len(f"\n... [truncated: showing first {total} of {total} chars. Use start_line/end_line to read a specific range] ...")
kept = max(0, max_chars - marker_max_len)
if kept == 0:
return output[:max_chars]
marker = f"\n... [truncated: showing first {kept} of {total} chars. Use start_line/end_line to read a specific range] ..."
return f"{output[:kept]}{marker}"
@tool("bash", parse_docstring=True)
def bash_tool(runtime: ToolRuntime[ContextT, ThreadState], description: str, command: str) -> str:
"""Execute a bash command in a Linux environment.
@ -781,9 +976,23 @@ def bash_tool(runtime: ToolRuntime[ContextT, ThreadState], description: str, com
command = replace_virtual_paths_in_command(command, thread_data)
command = _apply_cwd_prefix(command, thread_data)
output = sandbox.execute_command(command)
return mask_local_paths_in_output(output, thread_data)
try:
from deerflow.config.app_config import get_app_config
sandbox_cfg = get_app_config().sandbox
max_chars = sandbox_cfg.bash_output_max_chars if sandbox_cfg else 20000
except Exception:
max_chars = 20000
return _truncate_bash_output(mask_local_paths_in_output(output, thread_data), max_chars)
ensure_thread_directories_exist(runtime)
return sandbox.execute_command(command)
try:
from deerflow.config.app_config import get_app_config
sandbox_cfg = get_app_config().sandbox
max_chars = sandbox_cfg.bash_output_max_chars if sandbox_cfg else 20000
except Exception:
max_chars = 20000
return _truncate_bash_output(sandbox.execute_command(command), max_chars)
except SandboxError as e:
return f"Error: {e}"
except PermissionError as e:
@ -811,8 +1020,9 @@ def ls_tool(runtime: ToolRuntime[ContextT, ThreadState], description: str, path:
path = _resolve_skills_path(path)
elif _is_acp_workspace_path(path):
path = _resolve_acp_workspace_path(path, _extract_thread_id_from_thread_data(thread_data))
else:
elif not _is_custom_mount_path(path):
path = _resolve_and_validate_user_data_path(path, thread_data)
# Custom mount paths are resolved by LocalSandbox._resolve_path()
children = sandbox.list_dir(path)
if not children:
return "(empty)"
@ -827,6 +1037,126 @@ def ls_tool(runtime: ToolRuntime[ContextT, ThreadState], description: str, path:
return f"Error: Unexpected error listing directory: {_sanitize_error(e, runtime)}"
@tool("glob", parse_docstring=True)
def glob_tool(
runtime: ToolRuntime[ContextT, ThreadState],
description: str,
pattern: str,
path: str,
include_dirs: bool = False,
max_results: int = _DEFAULT_GLOB_MAX_RESULTS,
) -> str:
"""Find files or directories that match a glob pattern under a root directory.
Args:
description: Explain why you are searching for these paths in short words. ALWAYS PROVIDE THIS PARAMETER FIRST.
pattern: The glob pattern to match relative to the root path, for example `**/*.py`.
path: The **absolute** root directory to search under.
include_dirs: Whether matching directories should also be returned. Default is False.
max_results: Maximum number of paths to return. Default is 200.
"""
try:
sandbox = ensure_sandbox_initialized(runtime)
ensure_thread_directories_exist(runtime)
requested_path = path
effective_max_results = _resolve_max_results(
"glob",
max_results,
default=_DEFAULT_GLOB_MAX_RESULTS,
upper_bound=_MAX_GLOB_MAX_RESULTS,
)
thread_data = None
if is_local_sandbox(runtime):
thread_data = get_thread_data(runtime)
if thread_data is None:
raise SandboxRuntimeError("Thread data not available for local sandbox")
path = _resolve_local_read_path(path, thread_data)
matches, truncated = sandbox.glob(path, pattern, include_dirs=include_dirs, max_results=effective_max_results)
if thread_data is not None:
matches = [mask_local_paths_in_output(match, thread_data) for match in matches]
return _format_glob_results(requested_path, matches, truncated)
except SandboxError as e:
return f"Error: {e}"
except FileNotFoundError:
return f"Error: Directory not found: {requested_path}"
except NotADirectoryError:
return f"Error: Path is not a directory: {requested_path}"
except PermissionError:
return f"Error: Permission denied: {requested_path}"
except Exception as e:
return f"Error: Unexpected error searching paths: {_sanitize_error(e, runtime)}"
@tool("grep", parse_docstring=True)
def grep_tool(
runtime: ToolRuntime[ContextT, ThreadState],
description: str,
pattern: str,
path: str,
glob: str | None = None,
literal: bool = False,
case_sensitive: bool = False,
max_results: int = _DEFAULT_GREP_MAX_RESULTS,
) -> str:
"""Search for matching lines inside text files under a root directory.
Args:
description: Explain why you are searching file contents in short words. ALWAYS PROVIDE THIS PARAMETER FIRST.
pattern: The string or regex pattern to search for.
path: The **absolute** root directory to search under.
glob: Optional glob filter for candidate files, for example `**/*.py`.
literal: Whether to treat `pattern` as a plain string. Default is False.
case_sensitive: Whether matching is case-sensitive. Default is False.
max_results: Maximum number of matching lines to return. Default is 100.
"""
try:
sandbox = ensure_sandbox_initialized(runtime)
ensure_thread_directories_exist(runtime)
requested_path = path
effective_max_results = _resolve_max_results(
"grep",
max_results,
default=_DEFAULT_GREP_MAX_RESULTS,
upper_bound=_MAX_GREP_MAX_RESULTS,
)
thread_data = None
if is_local_sandbox(runtime):
thread_data = get_thread_data(runtime)
if thread_data is None:
raise SandboxRuntimeError("Thread data not available for local sandbox")
path = _resolve_local_read_path(path, thread_data)
matches, truncated = sandbox.grep(
path,
pattern,
glob=glob,
literal=literal,
case_sensitive=case_sensitive,
max_results=effective_max_results,
)
if thread_data is not None:
matches = [
GrepMatch(
path=mask_local_paths_in_output(match.path, thread_data),
line_number=match.line_number,
line=match.line,
)
for match in matches
]
return _format_grep_results(requested_path, matches, truncated)
except SandboxError as e:
return f"Error: {e}"
except FileNotFoundError:
return f"Error: Directory not found: {requested_path}"
except NotADirectoryError:
return f"Error: Path is not a directory: {requested_path}"
except re.error as e:
return f"Error: Invalid regex pattern: {e}"
except PermissionError:
return f"Error: Permission denied: {requested_path}"
except Exception as e:
return f"Error: Unexpected error searching file contents: {_sanitize_error(e, runtime)}"
@tool("read_file", parse_docstring=True)
def read_file_tool(
runtime: ToolRuntime[ContextT, ThreadState],
@ -854,14 +1184,22 @@ def read_file_tool(
path = _resolve_skills_path(path)
elif _is_acp_workspace_path(path):
path = _resolve_acp_workspace_path(path, _extract_thread_id_from_thread_data(thread_data))
else:
elif not _is_custom_mount_path(path):
path = _resolve_and_validate_user_data_path(path, thread_data)
# Custom mount paths are resolved by LocalSandbox._resolve_path()
content = sandbox.read_file(path)
if not content:
return "(empty)"
if start_line is not None and end_line is not None:
content = "\n".join(content.splitlines()[start_line - 1 : end_line])
return content
try:
from deerflow.config.app_config import get_app_config
sandbox_cfg = get_app_config().sandbox
max_chars = sandbox_cfg.read_file_output_max_chars if sandbox_cfg else 50000
except Exception:
max_chars = 50000
return _truncate_read_file_output(content, max_chars)
except SandboxError as e:
return f"Error: {e}"
except FileNotFoundError:
@ -896,8 +1234,11 @@ def write_file_tool(
if is_local_sandbox(runtime):
thread_data = get_thread_data(runtime)
validate_local_tool_path(path, thread_data)
path = _resolve_and_validate_user_data_path(path, thread_data)
sandbox.write_file(path, content, append)
if not _is_custom_mount_path(path):
path = _resolve_and_validate_user_data_path(path, thread_data)
# Custom mount paths are resolved by LocalSandbox._resolve_path()
with get_file_operation_lock(sandbox, path):
sandbox.write_file(path, content, append)
return "OK"
except SandboxError as e:
return f"Error: {e}"
@ -937,17 +1278,20 @@ def str_replace_tool(
if is_local_sandbox(runtime):
thread_data = get_thread_data(runtime)
validate_local_tool_path(path, thread_data)
path = _resolve_and_validate_user_data_path(path, thread_data)
content = sandbox.read_file(path)
if not content:
return "OK"
if old_str not in content:
return f"Error: String to replace not found in file: {requested_path}"
if replace_all:
content = content.replace(old_str, new_str)
else:
content = content.replace(old_str, new_str, 1)
sandbox.write_file(path, content)
if not _is_custom_mount_path(path):
path = _resolve_and_validate_user_data_path(path, thread_data)
# Custom mount paths are resolved by LocalSandbox._resolve_path()
with get_file_operation_lock(sandbox, path):
content = sandbox.read_file(path)
if not content:
return "OK"
if old_str not in content:
return f"Error: String to replace not found in file: {requested_path}"
if replace_all:
content = content.replace(old_str, new_str)
else:
content = content.replace(old_str, new_str, 1)
sandbox.write_file(path, content)
return "OK"
except SandboxError as e:
return f"Error: {e}"
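
Note on `_truncate_bash_output`: the docstring promises that the result, marker included, never exceeds `max_chars`. The sketch below restates the same logic under a hypothetical name (`truncate_middle`) so it can be run standalone and the length bound checked; the sample input is invented for illustration.

```python
def truncate_middle(output: str, max_chars: int) -> str:
    """Keep the head and tail of `output`, replacing the middle with a marker."""
    if max_chars == 0 or len(output) <= max_chars:
        return output
    total_len = len(output)
    # Budget for the longest marker we could emit: the skipped count can
    # never have more digits than total_len itself.
    marker_max_len = len(f"\n... [middle truncated: {total_len} chars skipped] ...\n")
    kept = max(0, max_chars - marker_max_len)
    if kept == 0:
        # No room for a marker at all, so fall back to a plain head cut.
        return output[:max_chars]
    head_len = kept // 2
    tail_len = kept - head_len  # always >= 1 when kept >= 1
    marker = f"\n... [middle truncated: {total_len - kept} chars skipped] ...\n"
    return f"{output[:head_len]}{marker}{output[-tail_len:]}"


sample = "A" * 500 + "B" * 500  # hypothetical command output
shortened = truncate_middle(sample, 200)
print(len(shortened) <= 200)
print(shortened[:5], shortened[-5:])
```

Reserving space for the worst-case marker before splitting the budget is what keeps the bound strict; sizing the marker after the split could push the result over the limit by a few characters.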

View File

@ -33,15 +33,72 @@ def parse_skill_file(skill_file: Path, category: str, relative_path: Path | None
front_matter = front_matter_match.group(1)
# Parse YAML front matter (simple key-value parsing)
# Parse YAML front matter with basic multiline string support
metadata = {}
for line in front_matter.split("\n"):
line = line.strip()
if not line:
lines = front_matter.split("\n")
current_key = None
current_value = []
is_multiline = False
multiline_style = None
indent_level = None
for line in lines:
if is_multiline:
if not line.strip():
current_value.append("")
continue
current_indent = len(line) - len(line.lstrip())
if indent_level is None:
if current_indent > 0:
indent_level = current_indent
current_value.append(line[indent_level:])
continue
elif current_indent >= indent_level:
current_value.append(line[indent_level:])
continue
# If we reach here, it's either a new key or the end of multiline
if current_key and is_multiline:
if multiline_style == "|":
metadata[current_key] = "\n".join(current_value).rstrip()
else:
text = "\n".join(current_value).rstrip()
# Replace single newlines with spaces for folded blocks
metadata[current_key] = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
current_key = None
current_value = []
is_multiline = False
multiline_style = None
indent_level = None
if not line.strip():
continue
if ":" in line:
# Handle nested dicts simply by ignoring indentation for now,
# or just extracting top-level keys
key, value = line.split(":", 1)
metadata[key.strip()] = value.strip()
key = key.strip()
value = value.strip()
if value in (">", "|"):
current_key = key
is_multiline = True
multiline_style = value
current_value = []
indent_level = None
else:
metadata[key] = value
if current_key and is_multiline:
if multiline_style == "|":
metadata[current_key] = "\n".join(current_value).rstrip()
else:
text = "\n".join(current_value).rstrip()
metadata[current_key] = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
# Extract required fields
name = metadata.get("name")
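
Note on the multiline handling: the two block styles differ only in the final join. A minimal sketch, assuming the collected lines have already been de-indented as in the loop above; the helper names `literal_block` and `fold_block` and the sample text are hypothetical.

```python
import re


def literal_block(lines: list[str]) -> str:
    """Style "|": keep every line break as written."""
    return "\n".join(lines).rstrip()


def fold_block(lines: list[str]) -> str:
    """Style ">": join wrapped lines with spaces, keep blank-line breaks."""
    text = "\n".join(lines).rstrip()
    # A newline is folded only when it is not adjacent to another newline,
    # so paragraph breaks (two newlines in a row) are left untouched.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


# Hypothetical description block from a skill's front matter.
collected = ["Summarize long documents", "into short briefs.", "", "Works on PDFs."]
print(repr(literal_block(collected)))
print(repr(fold_block(collected)))
```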

View File

@ -57,6 +57,42 @@ def _build_mcp_servers() -> dict[str, dict[str, Any]]:
return build_servers_config(ExtensionsConfig.from_file())
def _build_acp_mcp_servers() -> list[dict[str, Any]]:
"""Build ACP ``mcpServers`` payload for ``new_session``.
The ACP client expects a list of server objects, while DeerFlow's MCP helper
returns a name -> config mapping for the LangChain MCP adapter. This helper
converts the enabled servers into the ACP wire format.
"""
from deerflow.config.extensions_config import ExtensionsConfig
extensions_config = ExtensionsConfig.from_file()
enabled_servers = extensions_config.get_enabled_mcp_servers()
mcp_servers: list[dict[str, Any]] = []
for name, server_config in enabled_servers.items():
transport_type = server_config.type or "stdio"
payload: dict[str, Any] = {"name": name, "type": transport_type}
if transport_type == "stdio":
if not server_config.command:
raise ValueError(f"MCP server '{name}' with stdio transport requires 'command' field")
payload["command"] = server_config.command
payload["args"] = server_config.args
payload["env"] = [{"name": key, "value": value} for key, value in server_config.env.items()]
elif transport_type in ("http", "sse"):
if not server_config.url:
raise ValueError(f"MCP server '{name}' with {transport_type} transport requires 'url' field")
payload["url"] = server_config.url
payload["headers"] = [{"name": key, "value": value} for key, value in server_config.headers.items()]
else:
raise ValueError(f"MCP server '{name}' has unsupported transport type: {transport_type}")
mcp_servers.append(payload)
return mcp_servers
def _build_permission_response(options: list[Any], *, auto_approve: bool) -> Any:
"""Build an ACP permission response.
@ -173,7 +209,15 @@ def build_invoke_acp_agent_tool(agents: dict) -> BaseTool:
cmd = agent_config.command
args = agent_config.args or []
physical_cwd = _get_work_dir(thread_id)
mcp_servers = _build_mcp_servers()
try:
mcp_servers = _build_acp_mcp_servers()
except ValueError as exc:
logger.warning(
"Invalid MCP server configuration for ACP agent '%s'; continuing without MCP servers: %s",
agent,
exc,
)
mcp_servers = []
agent_env: dict[str, str] | None = None
if agent_config.env:
agent_env = {k: (os.environ.get(v[1:], "") if v.startswith("$") else v) for k, v in agent_config.env.items()}
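
Note on `_build_acp_mcp_servers`: the key change is the shape of `env` and `headers`, which go from a mapping to a list of name/value objects. A minimal sketch of that conversion for a stdio server; the helper names and the sample server values are hypothetical and stand in for the real `ExtensionsConfig` objects.

```python
from typing import Any


def to_name_value_list(mapping: dict[str, str]) -> list[dict[str, str]]:
    """Turn {"KEY": "value"} into [{"name": "KEY", "value": "value"}]."""
    return [{"name": key, "value": value} for key, value in mapping.items()]


def build_stdio_payload(name: str, command: str, args: list[str], env: dict[str, str]) -> dict[str, Any]:
    """Hypothetical stand-in for the stdio branch of the helper in the diff."""
    if not command:
        raise ValueError(f"MCP server '{name}' with stdio transport requires 'command' field")
    return {
        "name": name,
        "type": "stdio",
        "command": command,
        "args": args,
        "env": to_name_value_list(env),
    }


# Illustrative values only; not taken from any real configuration.
payload = build_stdio_payload("files", "npx", ["some-mcp-server"], {"ROOT": "/tmp"})
print(payload)
```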

View File

@ -0,0 +1,3 @@
from .factory import build_tracing_callbacks
__all__ = ["build_tracing_callbacks"]

View File

@ -0,0 +1,54 @@
from __future__ import annotations
from typing import Any
from deerflow.config import (
get_enabled_tracing_providers,
get_tracing_config,
validate_enabled_tracing_providers,
)
def _create_langsmith_tracer(config) -> Any:
from langchain_core.tracers.langchain import LangChainTracer
return LangChainTracer(project_name=config.project)
def _create_langfuse_handler(config) -> Any:
from langfuse import Langfuse
from langfuse.langchain import CallbackHandler as LangfuseCallbackHandler
# langfuse>=4 initializes project-specific credentials through the client
# singleton; the LangChain callback then attaches to that configured client.
Langfuse(
secret_key=config.secret_key,
public_key=config.public_key,
host=config.host,
)
return LangfuseCallbackHandler(public_key=config.public_key)
def build_tracing_callbacks() -> list[Any]:
"""Build callbacks for all explicitly enabled tracing providers."""
validate_enabled_tracing_providers()
enabled_providers = get_enabled_tracing_providers()
if not enabled_providers:
return []
tracing_config = get_tracing_config()
callbacks: list[Any] = []
for provider in enabled_providers:
if provider == "langsmith":
try:
callbacks.append(_create_langsmith_tracer(tracing_config.langsmith))
except Exception as exc: # pragma: no cover - exercised via tests with monkeypatch
raise RuntimeError(f"LangSmith tracing initialization failed: {exc}") from exc
elif provider == "langfuse":
try:
callbacks.append(_create_langfuse_handler(tracing_config.langfuse))
except Exception as exc: # pragma: no cover - exercised via tests with monkeypatch
raise RuntimeError(f"Langfuse tracing initialization failed: {exc}") from exc
return callbacks

View File

@ -1,10 +1,22 @@
"""File conversion utilities.
Converts document files (PDF, PPT, Excel, Word) to Markdown using markitdown.
Converts document files (PDF, PPT, Excel, Word) to Markdown.
PDF conversion strategy (auto mode):
1. Try pymupdf4llm if installed (better heading detection, faster on most files).
2. If output is suspiciously short (< _MIN_CHARS_PER_PAGE chars/page, or < 200 chars
total when page count is unavailable), treat as image-based and fall back to MarkItDown.
3. If pymupdf4llm is not installed, use MarkItDown directly (existing behaviour).
Large files (> _ASYNC_THRESHOLD_BYTES) are converted in a thread pool via
asyncio.to_thread() to avoid blocking the event loop (fixes #1569).
No FastAPI or HTTP dependencies; pure utility functions.
"""
import asyncio
import logging
import re
from pathlib import Path
logger = logging.getLogger(__name__)
@ -20,28 +32,278 @@ CONVERTIBLE_EXTENSIONS = {
".docx",
}
# Files larger than this threshold are converted in a background thread.
# Small files complete in < 1s synchronously; spawning a thread adds unnecessary
# scheduling overhead for them.
_ASYNC_THRESHOLD_BYTES = 1 * 1024 * 1024 # 1 MB
# If pymupdf4llm produces fewer characters *per page* than this threshold,
# the PDF is likely image-based or encrypted — fall back to MarkItDown.
# Rationale: normal text PDFs yield 200-2000 chars/page; image-based PDFs
# yield close to 0. 50 chars/page gives a wide safety margin.
# Falls back to absolute 200-char check when page count is unavailable.
_MIN_CHARS_PER_PAGE = 50
def _pymupdf_output_too_sparse(text: str, file_path: Path) -> bool:
"""Return True if pymupdf4llm output is suspiciously short (image-based PDF).
Uses chars-per-page rather than an absolute threshold so that both short
documents (few pages, few chars) and long documents (many pages, many chars)
are handled correctly.
"""
chars = len(text.strip())
doc = None
pages: int | None = None
try:
import pymupdf
doc = pymupdf.open(str(file_path))
pages = len(doc)
except Exception:
pass
finally:
if doc is not None:
try:
doc.close()
except Exception:
pass
if pages is not None and pages > 0:
return (chars / pages) < _MIN_CHARS_PER_PAGE
# Fallback: absolute threshold when page count is unavailable
return chars < 200
def _convert_pdf_with_pymupdf4llm(file_path: Path) -> str | None:
"""Attempt PDF conversion with pymupdf4llm.
Returns the markdown text, or None if pymupdf4llm is not installed or
if conversion fails (e.g. encrypted/corrupt PDF).
"""
try:
import pymupdf4llm
except ImportError:
return None
try:
return pymupdf4llm.to_markdown(str(file_path))
except Exception:
logger.exception("pymupdf4llm failed to convert %s; falling back to MarkItDown", file_path.name)
return None
def _convert_with_markitdown(file_path: Path) -> str:
"""Convert any supported file to markdown text using MarkItDown."""
from markitdown import MarkItDown
md = MarkItDown()
return md.convert(str(file_path)).text_content
def _do_convert(file_path: Path, pdf_converter: str) -> str:
"""Synchronous conversion — called directly or via asyncio.to_thread.
Args:
file_path: Path to the file.
pdf_converter: "auto" | "pymupdf4llm" | "markitdown"
"""
is_pdf = file_path.suffix.lower() == ".pdf"
if is_pdf and pdf_converter != "markitdown":
# Try pymupdf4llm first (auto or explicit)
pymupdf_text = _convert_pdf_with_pymupdf4llm(file_path)
if pymupdf_text is not None:
# pymupdf4llm is installed and the conversion succeeded
if pdf_converter == "pymupdf4llm":
# Explicit — use as-is regardless of output length
return pymupdf_text
# auto mode: fall back if output looks like a failed parse.
# Use chars-per-page to distinguish image-based PDFs (near 0) from
# legitimately short documents.
if not _pymupdf_output_too_sparse(pymupdf_text, file_path):
return pymupdf_text
logger.warning(
"pymupdf4llm produced only %d chars for %s (likely image-based PDF); falling back to MarkItDown",
len(pymupdf_text.strip()),
file_path.name,
)
# pymupdf4llm not installed, conversion failed, or fallback triggered → use MarkItDown
return _convert_with_markitdown(file_path)
async def convert_file_to_markdown(file_path: Path) -> Path | None:
"""Convert a file to markdown using markitdown.
"""Convert a supported document file to Markdown.
PDF files are handled with a two-converter strategy (see module docstring).
Large files (> 1 MB) are offloaded to a thread pool to avoid blocking the
event loop.
Args:
file_path: Path to the file to convert.
Returns:
Path to the markdown file if conversion was successful, None otherwise.
Path to the generated .md file, or None if conversion failed.
"""
try:
from markitdown import MarkItDown
pdf_converter = _get_pdf_converter()
file_size = file_path.stat().st_size
md = MarkItDown()
result = md.convert(str(file_path))
if file_size > _ASYNC_THRESHOLD_BYTES:
text = await asyncio.to_thread(_do_convert, file_path, pdf_converter)
else:
text = _do_convert(file_path, pdf_converter)
# Save as .md file with same name
md_path = file_path.with_suffix(".md")
md_path.write_text(result.text_content, encoding="utf-8")
md_path.write_text(text, encoding="utf-8")
logger.info(f"Converted {file_path.name} to markdown: {md_path.name}")
logger.info("Converted %s to markdown: %s (%d chars)", file_path.name, md_path.name, len(text))
return md_path
except Exception as e:
logger.error(f"Failed to convert {file_path.name} to markdown: {e}")
logger.error("Failed to convert %s to markdown: %s", file_path.name, e)
return None
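The size-based offloading described in the docstring above can be sketched as follows. This is a simplified illustration under stated assumptions: `convert_sync` is a hypothetical placeholder for `_do_convert`, and the threshold constant mirrors `_ASYNC_THRESHOLD_BYTES`.

```python
import asyncio
from pathlib import Path

ASYNC_THRESHOLD_BYTES = 1 * 1024 * 1024  # mirrors _ASYNC_THRESHOLD_BYTES (1 MB)


def convert_sync(path: Path) -> str:
    """Hypothetical blocking converter; a placeholder for _do_convert."""
    return path.read_text(encoding="utf-8").upper()


async def convert(path: Path) -> str:
    """Run the blocking converter inline for small files, in a thread for large ones."""
    if path.stat().st_size > ASYNC_THRESHOLD_BYTES:
        # Large input: hand the blocking call to the default thread pool so the
        # event loop stays free to serve other coroutines.
        return await asyncio.to_thread(convert_sync, path)
    # Small input: the thread hand-off would cost more than it saves.
    return convert_sync(path)
```

The trade-off is the one the comment on `_ASYNC_THRESHOLD_BYTES` names: thread scheduling overhead is only worth paying when the conversion is slow enough to stall the loop.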
# Regex for bold-only lines that look like section headings.
# Targets SEC filing structural headings that pymupdf4llm renders as **bold**
# rather than # Markdown headings (because they use same font size as body text,
# distinguished only by bold+caps formatting).
#
# Pattern requires ALL of:
# 1. Entire line is a single **...** block (no surrounding prose)
# 2. Starts with a recognised structural keyword:
# - ITEM / PART / SECTION (with optional number/letter after)
# - SCHEDULE, EXHIBIT, APPENDIX, ANNEX, CHAPTER
# All-caps addresses, boilerplate ("CURRENT REPORT", "SIGNATURES",
# "WASHINGTON, DC 20549") do NOT start with these keywords and are excluded.
#
# Chinese headings (第三节...) are already captured as standard # headings
# by pymupdf4llm, so they don't need this pattern.
_BOLD_HEADING_RE = re.compile(r"^\*\*((ITEM|PART|SECTION|SCHEDULE|EXHIBIT|APPENDIX|ANNEX|CHAPTER)\b[A-Z0-9 .,\-]*)\*\*\s*$")
# Regex for split-bold headings produced by pymupdf4llm when a heading spans
# multiple text spans in the PDF (e.g. section number and title are separate spans).
# Matches lines like: **1** **Introduction** or **3.2** **Multi-Head Attention**
# Requirements:
# 1. Entire line consists only of **...** blocks separated by whitespace (no prose)
# 2. First block is a section number (digits and dots, e.g. "1", "3.2", "A.1")
# 3. Second block must not be purely numeric/punctuation — excludes financial table
# headers like **2023** **2022** **2021** while allowing non-ASCII titles such as
# **1** **概述** or accented words (negative lookahead instead of [A-Za-z])
# 4. At most two additional blocks (four total) with [^*]+ (no * inside) to keep
# the regex linear and avoid ReDoS on attacker-controlled content
_SPLIT_BOLD_HEADING_RE = re.compile(r"^\*\*[\dA-Z][\d\.]*\*\*\s+\*\*(?!\d[\d\s.,\-–—/:()%]*\*\*)[^*]+\*\*(?:\s+\*\*[^*]+\*\*){0,2}\s*$")
# Maximum number of outline entries injected into the agent context.
# Keeps prompt size bounded even for very long documents.
MAX_OUTLINE_ENTRIES = 50
_ALLOWED_PDF_CONVERTERS = {"auto", "pymupdf4llm", "markitdown"}
def _clean_bold_title(raw: str) -> str:
"""Normalise a title string that may contain pymupdf4llm bold artefacts.
pymupdf4llm sometimes emits adjacent bold spans as ``**A** **B**`` instead
of a single ``**A B**`` block. This helper merges those fragments and then
strips the outermost ``**...**`` wrapper so the caller gets plain text.
Examples::
"**Overview**" → "Overview"
"**UNITED STATES** **SECURITIES**" → "UNITED STATES SECURITIES"
"plain text" → "plain text" (unchanged)
"""
# Merge adjacent bold spans: "** **" → " "
merged = re.sub(r"\*\*\s*\*\*", " ", raw).strip()
# Strip outermost **...** if the whole string is wrapped
if m := re.fullmatch(r"\*\*(.+?)\*\*", merged, re.DOTALL):
return m.group(1).strip()
return merged
def extract_outline(md_path: Path) -> list[dict]:
"""Extract document outline (headings) from a Markdown file.
Recognises three heading styles produced by pymupdf4llm:
1. Standard Markdown headings: lines starting with one or more '#'.
Inline ``**...**`` wrappers and adjacent bold spans (``** **``) are
cleaned so the title is plain text.
2. Bold-only structural headings: ``**ITEM 1. BUSINESS**``, ``**PART II**``,
etc. SEC filings use bold+caps for section headings with the same font
size as body text, so pymupdf4llm cannot promote them to # headings.
3. Split-bold headings: ``**1** **Introduction**``, ``**3.2** **Attention**``.
pymupdf4llm emits these when the section number and title text are
separate spans in the underlying PDF (common in academic papers).
Args:
md_path: Path to the .md file.
Returns:
List of dicts with keys: title (str), line (int, 1-based).
When the outline is truncated at MAX_OUTLINE_ENTRIES, a sentinel entry
``{"truncated": True}`` is appended as the last element so callers can
render a "showing first N headings" hint without re-scanning the file.
Returns an empty list if the file cannot be read or has no headings.
"""
outline: list[dict] = []
try:
with md_path.open(encoding="utf-8") as f:
for lineno, line in enumerate(f, 1):
stripped = line.strip()
if not stripped:
continue
# Style 1: standard Markdown heading
if stripped.startswith("#"):
title = _clean_bold_title(stripped.lstrip("#").strip())
if title:
outline.append({"title": title, "line": lineno})
# Style 2: single bold block with SEC structural keyword
elif m := _BOLD_HEADING_RE.match(stripped):
title = m.group(1).strip()
if title:
outline.append({"title": title, "line": lineno})
# Style 3: split-bold heading — **<num>** **<title>**
# Regex already enforces max 4 blocks and non-numeric second block.
elif _SPLIT_BOLD_HEADING_RE.match(stripped):
title = " ".join(re.findall(r"\*\*([^*]+)\*\*", stripped))
if title:
outline.append({"title": title, "line": lineno})
if len(outline) >= MAX_OUTLINE_ENTRIES:
outline.append({"truncated": True})
break
except Exception:
return []
return outline
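The three heading styles can be exercised on sample lines with a small standalone sketch. The two regular expressions are copied from the diff above; `heading_title` is a hypothetical helper that simplifies style 1 (it does not apply `_clean_bold_title`) and ignores line numbers and truncation.

```python
import re
from typing import Optional

# Both patterns are copied verbatim from the module shown in the diff.
BOLD_HEADING_RE = re.compile(r"^\*\*((ITEM|PART|SECTION|SCHEDULE|EXHIBIT|APPENDIX|ANNEX|CHAPTER)\b[A-Z0-9 .,\-]*)\*\*\s*$")
SPLIT_BOLD_HEADING_RE = re.compile(r"^\*\*[\dA-Z][\d\.]*\*\*\s+\*\*(?!\d[\d\s.,\-–—/:()%]*\*\*)[^*]+\*\*(?:\s+\*\*[^*]+\*\*){0,2}\s*$")


def heading_title(line: str) -> Optional[str]:
    """Hypothetical helper: return the heading title for a line, or None."""
    stripped = line.strip()
    if stripped.startswith("#"):
        # Style 1: standard Markdown heading (bold clean-up omitted here).
        return stripped.lstrip("#").strip() or None
    m = BOLD_HEADING_RE.match(stripped)
    if m:
        # Style 2: a single bold block starting with a structural keyword.
        return m.group(1).strip()
    if SPLIT_BOLD_HEADING_RE.match(stripped):
        # Style 3: join the separate bold spans into one title.
        return " ".join(re.findall(r"\*\*([^*]+)\*\*", stripped))
    return None


if __name__ == "__main__":
    for sample in ("## Section 1.1", "**ITEM 1. BUSINESS**",
                   "**3.2** **Multi-Head Attention**",
                   "**2023** **2022** **2021**", "**SIGNATURES**"):
        print(repr(sample), "->", repr(heading_title(sample)))
```

The last two samples are the false positives the comments call out: a financial table header whose second block is purely numeric, and an all-caps boilerplate line that does not start with a structural keyword.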
def _get_pdf_converter() -> str:
"""Read pdf_converter setting from app config, defaulting to 'auto'.
Normalizes the value to lowercase and validates it against the allowed set
so that values like 'AUTO' or 'MarkItDown' from config.yaml don't silently
fall through to unexpected behaviour.
"""
try:
from deerflow.config.app_config import get_app_config
cfg = get_app_config()
uploads_cfg = getattr(cfg, "uploads", None)
if uploads_cfg is not None:
raw = str(getattr(uploads_cfg, "pdf_converter", "auto")).strip().lower()
if raw not in _ALLOWED_PDF_CONVERTERS:
logger.warning("Invalid pdf_converter value %r; falling back to 'auto'", raw)
return "auto"
return raw
except Exception:
pass
return "auto"


@ -14,6 +14,7 @@ dependencies = [
"langchain-deepseek>=1.0.1",
"langchain-mcp-adapters>=0.1.0",
"langchain-openai>=1.1.7",
"langfuse>=3.4.1",
"langgraph>=1.0.6,<1.0.10",
"langgraph-api>=0.7.0,<0.8.0",
"langgraph-cli>=0.4.14",
@ -44,6 +45,9 @@ postgres = [
"psycopg-pool>=3.3.0",
]
[project.optional-dependencies]
pymupdf = ["pymupdf4llm>=0.0.17"]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"


@ -16,6 +16,7 @@ dependencies = [
"python-telegram-bot>=21.0",
"langgraph-sdk>=0.1.51",
"markdown-to-mrkdwn>=0.3.1",
"wecom-aibot-python-sdk>=0.1.6",
]
[project.optional-dependencies]


@ -131,3 +131,53 @@ class TestListDirSerialization:
result = sandbox.list_dir("/test")
assert result == ["/a", "/b"]
assert lock_was_held == [True], "list_dir must hold the lock during exec_command"
class TestConcurrentFileWrites:
"""Verify file write paths do not lose concurrent updates."""
def test_append_should_preserve_both_parallel_writes(self, sandbox):
storage = {"content": "seed\n"}
active_reads = 0
state_lock = threading.Lock()
overlap_detected = threading.Event()
def overlapping_read_file(path):
nonlocal active_reads
with state_lock:
active_reads += 1
snapshot = storage["content"]
if active_reads == 2:
overlap_detected.set()
overlap_detected.wait(0.05)
with state_lock:
active_reads -= 1
return snapshot
def write_back(*, file, content, **kwargs):
storage["content"] = content
return SimpleNamespace(data=SimpleNamespace())
sandbox.read_file = overlapping_read_file
sandbox._client.file.write_file = write_back
barrier = threading.Barrier(2)
def writer(payload: str):
barrier.wait()
sandbox.write_file("/tmp/shared.log", payload, append=True)
threads = [
threading.Thread(target=writer, args=("A\n",)),
threading.Thread(target=writer, args=("B\n",)),
]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
assert storage["content"] in {"seed\nA\nB\n", "seed\nB\nA\n"}
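The property this test checks, that two parallel appends both survive, is usually obtained by holding one lock across the whole read-modify-write sequence. The sketch below illustrates that idea only; `AppendOnlyStore` is a hypothetical class and not the sandbox implementation, whose write path is not shown in this diff.

```python
import threading


class AppendOnlyStore:
    """Hypothetical in-memory store illustrating a serialized append."""

    def __init__(self, seed: str = "") -> None:
        self._content = seed
        self._lock = threading.Lock()

    def append(self, payload: str) -> None:
        # Read and write back under the same lock so that a concurrent
        # appender cannot overwrite this update with a stale snapshot.
        with self._lock:
            snapshot = self._content
            self._content = snapshot + payload

    def read(self) -> str:
        with self._lock:
            return self._content


def run_parallel_appends() -> str:
    """Append "A" and "B" lines from two threads and return the final content."""
    store = AppendOnlyStore("seed\n")
    threads = [
        threading.Thread(target=store.append, args=("A\n",)),
        threading.Thread(target=store.append, args=("B\n",)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.read()
```

Either ordering of the two payloads is acceptable; what the lock rules out is a result containing only one of them.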


@ -12,7 +12,7 @@ from unittest.mock import AsyncMock, MagicMock
import pytest
from app.channels.base import Channel
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
from app.channels.store import ChannelStore
@ -1718,6 +1718,159 @@ class TestFeishuChannel:
_run(go())
class TestWeComChannel:
def test_publish_ws_inbound_starts_stream_and_publishes_message(self, monkeypatch):
from app.channels.wecom import WeComChannel
async def go():
bus = MessageBus()
bus.publish_inbound = AsyncMock()
channel = WeComChannel(bus, config={})
channel._ws_client = SimpleNamespace(reply_stream=AsyncMock())
monkeypatch.setitem(
__import__("sys").modules,
"aibot",
SimpleNamespace(generate_req_id=lambda prefix: "stream-1"),
)
frame = {
"body": {
"msgid": "msg-1",
"from": {"userid": "user-1"},
"aibotid": "bot-1",
"chattype": "single",
}
}
files = [{"type": "image", "url": "https://example.com/image.png"}]
await channel._publish_ws_inbound(frame, "hello", files=files)
channel._ws_client.reply_stream.assert_awaited_once_with(frame, "stream-1", "Working on it...", False)
bus.publish_inbound.assert_awaited_once()
inbound = bus.publish_inbound.await_args.args[0]
assert inbound.channel_name == "wecom"
assert inbound.chat_id == "user-1"
assert inbound.user_id == "user-1"
assert inbound.text == "hello"
assert inbound.thread_ts == "msg-1"
assert inbound.topic_id == "user-1"
assert inbound.files == files
assert inbound.metadata == {"aibotid": "bot-1", "chattype": "single"}
assert channel._ws_frames["msg-1"] is frame
assert channel._ws_stream_ids["msg-1"] == "stream-1"
_run(go())
def test_publish_ws_inbound_uses_configured_working_message(self, monkeypatch):
from app.channels.wecom import WeComChannel
async def go():
bus = MessageBus()
bus.publish_inbound = AsyncMock()
channel = WeComChannel(bus, config={"working_message": "Please wait..."})
channel._ws_client = SimpleNamespace(reply_stream=AsyncMock())
channel._working_message = "Please wait..."
monkeypatch.setitem(
__import__("sys").modules,
"aibot",
SimpleNamespace(generate_req_id=lambda prefix: "stream-1"),
)
frame = {
"body": {
"msgid": "msg-1",
"from": {"userid": "user-1"},
}
}
await channel._publish_ws_inbound(frame, "hello")
channel._ws_client.reply_stream.assert_awaited_once_with(frame, "stream-1", "Please wait...", False)
_run(go())
def test_on_outbound_sends_attachment_before_clearing_context(self, tmp_path):
from app.channels.wecom import WeComChannel
async def go():
bus = MessageBus()
channel = WeComChannel(bus, config={})
frame = {"body": {"msgid": "msg-1"}}
ws_client = SimpleNamespace(
reply_stream=AsyncMock(),
reply=AsyncMock(),
)
channel._ws_client = ws_client
channel._ws_frames["msg-1"] = frame
channel._ws_stream_ids["msg-1"] = "stream-1"
channel._upload_media_ws = AsyncMock(return_value="media-1")
attachment_path = tmp_path / "image.png"
attachment_path.write_bytes(b"png")
attachment = ResolvedAttachment(
virtual_path="/mnt/user-data/outputs/image.png",
actual_path=attachment_path,
filename="image.png",
mime_type="image/png",
size=attachment_path.stat().st_size,
is_image=True,
)
msg = OutboundMessage(
channel_name="wecom",
chat_id="user-1",
thread_id="thread-1",
text="done",
attachments=[attachment],
is_final=True,
thread_ts="msg-1",
)
await channel._on_outbound(msg)
ws_client.reply_stream.assert_awaited_once_with(frame, "stream-1", "done", True)
channel._upload_media_ws.assert_awaited_once_with(
media_type="image",
filename="image.png",
path=str(attachment_path),
size=attachment.size,
)
ws_client.reply.assert_awaited_once_with(frame, {"image": {"media_id": "media-1"}, "msgtype": "image"})
assert "msg-1" not in channel._ws_frames
assert "msg-1" not in channel._ws_stream_ids
_run(go())
def test_send_falls_back_to_send_message_without_thread_context(self):
from app.channels.wecom import WeComChannel
async def go():
bus = MessageBus()
channel = WeComChannel(bus, config={})
channel._ws_client = SimpleNamespace(send_message=AsyncMock())
msg = OutboundMessage(
channel_name="wecom",
chat_id="user-1",
thread_id="thread-1",
text="hello",
thread_ts=None,
)
await channel.send(msg)
channel._ws_client.send_message.assert_awaited_once_with(
"user-1",
{"msgtype": "markdown", "markdown": {"content": "hello"}},
)
_run(go())
class TestChannelService:
def test_get_status_no_channels(self):
from app.channels.service import ChannelService
@ -1854,6 +2007,20 @@ class TestSlackSendRetry:
_run(go())
def test_raises_runtime_error_when_no_attempts_configured(self):
from app.channels.slack import SlackChannel
async def go():
bus = MessageBus()
ch = SlackChannel(bus=bus, config={"bot_token": "xoxb-test", "app_token": "xapp-test"})
ch._web_client = MagicMock()
msg = OutboundMessage(channel_name="slack", chat_id="C123", thread_id="t1", text="hello")
with pytest.raises(RuntimeError, match="without an exception"):
await ch.send(msg, _max_retries=0)
_run(go())
# ---------------------------------------------------------------------------
# Telegram send retry tests
@ -1912,6 +2079,36 @@ class TestTelegramSendRetry:
_run(go())
def test_raises_runtime_error_when_no_attempts_configured(self):
from app.channels.telegram import TelegramChannel
async def go():
bus = MessageBus()
ch = TelegramChannel(bus=bus, config={"bot_token": "test-token"})
ch._application = MagicMock()
msg = OutboundMessage(channel_name="telegram", chat_id="12345", thread_id="t1", text="hello")
with pytest.raises(RuntimeError, match="without an exception"):
await ch.send(msg, _max_retries=0)
_run(go())
class TestFeishuSendRetry:
def test_raises_runtime_error_when_no_attempts_configured(self):
from app.channels.feishu import FeishuChannel
async def go():
bus = MessageBus()
ch = FeishuChannel(bus=bus, config={"app_id": "id", "app_secret": "secret"})
ch._api_client = MagicMock()
msg = OutboundMessage(channel_name="feishu", chat_id="chat", thread_id="t1", text="hello")
with pytest.raises(RuntimeError, match="without an exception"):
await ch.send(msg, _max_retries=0)
_run(go())
# ---------------------------------------------------------------------------
# Telegram private-chat thread context tests


@ -59,18 +59,20 @@ class TestClientInit:
assert client._subagent_enabled is False
assert client._plan_mode is False
assert client._agent_name is None
assert client._available_skills is None
assert client._checkpointer is None
assert client._agent is None
def test_custom_params(self, mock_app_config):
mock_middleware = MagicMock()
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
c = DeerFlowClient(model_name="gpt-4", thinking_enabled=False, subagent_enabled=True, plan_mode=True, agent_name="test-agent", middlewares=[mock_middleware])
c = DeerFlowClient(model_name="gpt-4", thinking_enabled=False, subagent_enabled=True, plan_mode=True, agent_name="test-agent", available_skills={"skill1", "skill2"}, middlewares=[mock_middleware])
assert c._model_name == "gpt-4"
assert c._thinking_enabled is False
assert c._subagent_enabled is True
assert c._plan_mode is True
assert c._agent_name == "test-agent"
assert c._available_skills == {"skill1", "skill2"}
assert c._middlewares == [mock_middleware]
def test_invalid_agent_name(self, mock_app_config):
@ -394,8 +396,10 @@ class TestEnsureAgent:
patch("deerflow.client._build_middlewares", return_value=[]) as mock_build_middlewares,
patch("deerflow.client.apply_prompt_template", return_value="prompt") as mock_apply_prompt,
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.agents.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._agent_name = "custom-agent"
client._available_skills = {"test_skill"}
client._ensure_agent(config)
assert client._agent is mock_agent
@ -404,6 +408,7 @@ class TestEnsureAgent:
assert mock_build_middlewares.call_args.kwargs.get("agent_name") == "custom-agent"
mock_apply_prompt.assert_called_once()
assert mock_apply_prompt.call_args.kwargs.get("agent_name") == "custom-agent"
assert mock_apply_prompt.call_args.kwargs.get("available_skills") == {"test_skill"}
def test_uses_default_checkpointer_when_available(self, client):
mock_agent = MagicMock()
@ -441,6 +446,7 @@ class TestEnsureAgent:
patch("deerflow.client._build_middlewares", side_effect=fake_build_middlewares),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.agents.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
@ -469,7 +475,7 @@ class TestEnsureAgent:
"""_ensure_agent does not recreate if config key unchanged."""
mock_agent = MagicMock()
client._agent = mock_agent
client._agent_config_key = (None, True, False, False)
client._agent_config_key = (None, True, False, False, None, None)
config = client._get_runnable_config("t1")
client._ensure_agent(config)
@ -1276,6 +1282,7 @@ class TestScenarioAgentRecreation:
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.agents.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config_a)
first_agent = client._agent
@ -1303,6 +1310,7 @@ class TestScenarioAgentRecreation:
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.agents.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
client._ensure_agent(config)
@ -1327,6 +1335,7 @@ class TestScenarioAgentRecreation:
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.agents.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
client.reset_agent()


@ -164,6 +164,28 @@ class TestLoadAgentConfig:
assert cfg.tool_groups == ["file:read", "file:write"]
def test_load_config_with_skills_empty_list(self, tmp_path):
config_dict = {"name": "no-skills-agent", "skills": []}
_write_agent(tmp_path, "no-skills-agent", config_dict)
with patch("deerflow.config.agents_config.get_paths", return_value=_make_paths(tmp_path)):
from deerflow.config.agents_config import load_agent_config
cfg = load_agent_config("no-skills-agent")
assert cfg.skills == []
def test_load_config_with_skills_omitted(self, tmp_path):
config_dict = {"name": "default-skills-agent"}
_write_agent(tmp_path, "default-skills-agent", config_dict)
with patch("deerflow.config.agents_config.get_paths", return_value=_make_paths(tmp_path)):
from deerflow.config.agents_config import load_agent_config
cfg = load_agent_config("default-skills-agent")
assert cfg.skills is None
def test_legacy_prompt_file_field_ignored(self, tmp_path):
"""Unknown fields like the old prompt_file should be silently ignored."""
agent_dir = tmp_path / "agents" / "legacy-agent"


@ -3,6 +3,7 @@ from unittest.mock import MagicMock
import pytest
from app.channels.commands import KNOWN_CHANNEL_COMMANDS
from app.channels.feishu import FeishuChannel
from app.channels.message_bus import MessageBus
@ -68,3 +69,57 @@ def test_feishu_on_message_rich_text():
assert "Paragraph 1, part 1. Paragraph 1, part 2." in parsed_text
assert "@bot Paragraph 2." in parsed_text
assert "\n\n" in parsed_text
@pytest.mark.parametrize("command", sorted(KNOWN_CHANNEL_COMMANDS))
def test_feishu_recognizes_all_known_slash_commands(command):
"""Every entry in KNOWN_CHANNEL_COMMANDS must be classified as a command."""
bus = MessageBus()
config = {"app_id": "test", "app_secret": "test"}
channel = FeishuChannel(bus, config)
event = MagicMock()
event.event.message.chat_id = "chat_1"
event.event.message.message_id = "msg_1"
event.event.message.root_id = None
event.event.sender.sender_id.open_id = "user_1"
event.event.message.content = json.dumps({"text": command})
with pytest.MonkeyPatch.context() as m:
mock_make_inbound = MagicMock()
m.setattr(channel, "_make_inbound", mock_make_inbound)
channel._on_message(event)
mock_make_inbound.assert_called_once()
assert mock_make_inbound.call_args[1]["msg_type"].value == "command", f"{command!r} should be classified as COMMAND"
@pytest.mark.parametrize(
"text",
[
"/unknown",
"/mnt/user-data/outputs/prd/technical-design.md",
"/etc/passwd",
"/not-a-command at all",
],
)
def test_feishu_treats_unknown_slash_text_as_chat(text):
"""Slash-prefixed text that is not a known command must be classified as CHAT."""
bus = MessageBus()
config = {"app_id": "test", "app_secret": "test"}
channel = FeishuChannel(bus, config)
event = MagicMock()
event.event.message.chat_id = "chat_1"
event.event.message.message_id = "msg_1"
event.event.message.root_id = None
event.event.sender.sender_id.open_id = "user_1"
event.event.message.content = json.dumps({"text": text})
with pytest.MonkeyPatch.context() as m:
mock_make_inbound = MagicMock()
m.setattr(channel, "_make_inbound", mock_make_inbound)
channel._on_message(event)
mock_make_inbound.assert_called_once()
assert mock_make_inbound.call_args[1]["msg_type"].value == "chat", f"{text!r} should be classified as CHAT"


@ -0,0 +1,459 @@
"""Tests for file_conversion utilities (PR1: pymupdf4llm + asyncio.to_thread; PR2: extract_outline)."""
from __future__ import annotations
import asyncio
import sys
from types import ModuleType
from unittest.mock import MagicMock, patch
from deerflow.utils.file_conversion import (
_ASYNC_THRESHOLD_BYTES,
_MIN_CHARS_PER_PAGE,
MAX_OUTLINE_ENTRIES,
_do_convert,
_pymupdf_output_too_sparse,
convert_file_to_markdown,
extract_outline,
)
def _make_pymupdf_mock(page_count: int) -> ModuleType:
"""Return a fake *pymupdf* module whose ``open()`` reports *page_count* pages."""
mock_doc = MagicMock()
mock_doc.__len__ = MagicMock(return_value=page_count)
fake_pymupdf = ModuleType("pymupdf")
fake_pymupdf.open = MagicMock(return_value=mock_doc) # type: ignore[attr-defined]
return fake_pymupdf
def _run(coro):
loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(coro)
finally:
loop.close()
# ---------------------------------------------------------------------------
# _pymupdf_output_too_sparse
# ---------------------------------------------------------------------------
class TestPymupdfOutputTooSparse:
"""Check the chars-per-page sparsity heuristic."""
def test_dense_text_pdf_not_sparse(self, tmp_path):
"""Normal text PDF: many chars per page → not sparse."""
pdf = tmp_path / "dense.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
# 10 000 chars / 10 pages → 1000 chars/page ≫ threshold
with patch.dict(sys.modules, {"pymupdf": _make_pymupdf_mock(page_count=10)}):
result = _pymupdf_output_too_sparse("x" * 10_000, pdf)
assert result is False
def test_image_based_pdf_is_sparse(self, tmp_path):
"""Image-based PDF: near-zero chars per page → sparse."""
pdf = tmp_path / "image.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
# 612 chars / 31 pages ≈ 19.7/page < _MIN_CHARS_PER_PAGE (50)
with patch.dict(sys.modules, {"pymupdf": _make_pymupdf_mock(page_count=31)}):
result = _pymupdf_output_too_sparse("x" * 612, pdf)
assert result is True
def test_fallback_when_pymupdf_unavailable(self, tmp_path):
"""When pymupdf is not installed, fall back to absolute 200-char threshold."""
pdf = tmp_path / "broken.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
# Set sys.modules["pymupdf"] to None so that the `import pymupdf` inside the
# function raises ImportError, triggering the absolute-threshold fallback.
with patch.dict(sys.modules, {"pymupdf": None}):
sparse = _pymupdf_output_too_sparse("x" * 100, pdf)
not_sparse = _pymupdf_output_too_sparse("x" * 300, pdf)
assert sparse is True
assert not_sparse is False
def test_exactly_at_threshold_is_not_sparse(self, tmp_path):
"""Chars-per-page == threshold is treated as NOT sparse (boundary inclusive)."""
pdf = tmp_path / "boundary.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
# 2 pages × _MIN_CHARS_PER_PAGE chars = exactly at threshold
with patch.dict(sys.modules, {"pymupdf": _make_pymupdf_mock(page_count=2)}):
result = _pymupdf_output_too_sparse("x" * (_MIN_CHARS_PER_PAGE * 2), pdf)
assert result is False
# ---------------------------------------------------------------------------
# _do_convert — routing logic
# ---------------------------------------------------------------------------
class TestDoConvert:
"""Verify that _do_convert routes to the right sub-converter."""
def test_non_pdf_always_uses_markitdown(self, tmp_path):
"""DOCX / XLSX / PPTX always go through MarkItDown regardless of setting."""
docx = tmp_path / "report.docx"
docx.write_bytes(b"PK fake docx")
with patch(
"deerflow.utils.file_conversion._convert_with_markitdown",
return_value="# Markdown from MarkItDown",
) as mock_md:
result = _do_convert(docx, "auto")
mock_md.assert_called_once_with(docx)
assert result == "# Markdown from MarkItDown"
def test_pdf_auto_uses_pymupdf4llm_when_dense(self, tmp_path):
"""auto mode: use pymupdf4llm output when it's dense enough."""
pdf = tmp_path / "report.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
dense_text = "# Heading\n" + "word " * 2000 # clearly dense
with (
patch(
"deerflow.utils.file_conversion._convert_pdf_with_pymupdf4llm",
return_value=dense_text,
),
patch(
"deerflow.utils.file_conversion._pymupdf_output_too_sparse",
return_value=False,
),
patch("deerflow.utils.file_conversion._convert_with_markitdown") as mock_md,
):
result = _do_convert(pdf, "auto")
mock_md.assert_not_called()
assert result == dense_text
def test_pdf_auto_falls_back_when_sparse(self, tmp_path):
"""auto mode: fall back to MarkItDown when pymupdf4llm output is sparse."""
pdf = tmp_path / "scanned.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
with (
patch(
"deerflow.utils.file_conversion._convert_pdf_with_pymupdf4llm",
return_value="x" * 612, # 19.7 chars/page for 31-page doc
),
patch(
"deerflow.utils.file_conversion._pymupdf_output_too_sparse",
return_value=True,
),
patch(
"deerflow.utils.file_conversion._convert_with_markitdown",
return_value="OCR result via MarkItDown",
) as mock_md,
):
result = _do_convert(pdf, "auto")
mock_md.assert_called_once_with(pdf)
assert result == "OCR result via MarkItDown"
def test_pdf_explicit_pymupdf4llm_skips_sparsity_check(self, tmp_path):
"""'pymupdf4llm' mode: use output as-is even if sparse."""
pdf = tmp_path / "explicit.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
sparse_text = "x" * 10 # very short
with (
patch(
"deerflow.utils.file_conversion._convert_pdf_with_pymupdf4llm",
return_value=sparse_text,
),
patch("deerflow.utils.file_conversion._convert_with_markitdown") as mock_md,
):
result = _do_convert(pdf, "pymupdf4llm")
mock_md.assert_not_called()
assert result == sparse_text
def test_pdf_explicit_markitdown_skips_pymupdf4llm(self, tmp_path):
"""'markitdown' mode: never attempt pymupdf4llm."""
pdf = tmp_path / "force_md.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
with (
patch("deerflow.utils.file_conversion._convert_pdf_with_pymupdf4llm") as mock_pymu,
patch(
"deerflow.utils.file_conversion._convert_with_markitdown",
return_value="MarkItDown result",
),
):
result = _do_convert(pdf, "markitdown")
mock_pymu.assert_not_called()
assert result == "MarkItDown result"
def test_pdf_auto_falls_back_when_pymupdf4llm_not_installed(self, tmp_path):
"""auto mode: if pymupdf4llm is not installed, use MarkItDown directly."""
pdf = tmp_path / "no_pymupdf.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
with (
patch(
"deerflow.utils.file_conversion._convert_pdf_with_pymupdf4llm",
return_value=None, # None signals not installed (or a failed conversion)
),
patch(
"deerflow.utils.file_conversion._convert_with_markitdown",
return_value="MarkItDown fallback",
) as mock_md,
):
result = _do_convert(pdf, "auto")
mock_md.assert_called_once_with(pdf)
assert result == "MarkItDown fallback"
# ---------------------------------------------------------------------------
# convert_file_to_markdown — async + file writing
# ---------------------------------------------------------------------------
class TestConvertFileToMarkdown:
def test_small_file_runs_synchronously(self, tmp_path):
"""Small files (at or below the 1 MB threshold) are converted in the event loop thread."""
pdf = tmp_path / "small.pdf"
pdf.write_bytes(b"%PDF-1.4 " + b"x" * 100) # well under 1 MB
with (
patch("deerflow.utils.file_conversion._get_pdf_converter", return_value="auto"),
patch(
"deerflow.utils.file_conversion._do_convert",
return_value="# Small PDF",
) as mock_convert,
patch("asyncio.to_thread") as mock_thread,
):
md_path = _run(convert_file_to_markdown(pdf))
# asyncio.to_thread must NOT have been called
mock_thread.assert_not_called()
mock_convert.assert_called_once()
assert md_path == pdf.with_suffix(".md")
assert md_path.read_text() == "# Small PDF"
def test_large_file_offloaded_to_thread(self, tmp_path):
"""Large files (> 1 MB) are offloaded via asyncio.to_thread."""
pdf = tmp_path / "large.pdf"
# Write slightly more than the threshold
pdf.write_bytes(b"%PDF-1.4 " + b"x" * (_ASYNC_THRESHOLD_BYTES + 1))
async def fake_to_thread(fn, *args, **kwargs):
return fn(*args, **kwargs)
with (
patch("deerflow.utils.file_conversion._get_pdf_converter", return_value="auto"),
patch(
"deerflow.utils.file_conversion._do_convert",
return_value="# Large PDF",
),
patch("asyncio.to_thread", side_effect=fake_to_thread) as mock_thread,
):
md_path = _run(convert_file_to_markdown(pdf))
mock_thread.assert_called_once()
assert md_path == pdf.with_suffix(".md")
assert md_path.read_text() == "# Large PDF"
def test_returns_none_on_conversion_error(self, tmp_path):
"""If conversion raises, return None without propagating the exception."""
pdf = tmp_path / "broken.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
with (
patch("deerflow.utils.file_conversion._get_pdf_converter", return_value="auto"),
patch(
"deerflow.utils.file_conversion._do_convert",
side_effect=RuntimeError("conversion failed"),
),
):
result = _run(convert_file_to_markdown(pdf))
assert result is None
def test_writes_utf8_markdown_file(self, tmp_path):
"""Generated .md file is written with UTF-8 encoding."""
pdf = tmp_path / "report.pdf"
pdf.write_bytes(b"%PDF-1.4 fake")
chinese_content = "# 中文报告\n\n这是测试内容。"
with (
patch("deerflow.utils.file_conversion._get_pdf_converter", return_value="auto"),
patch(
"deerflow.utils.file_conversion._do_convert",
return_value=chinese_content,
),
):
md_path = _run(convert_file_to_markdown(pdf))
assert md_path is not None
assert md_path.read_text(encoding="utf-8") == chinese_content
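The tests in this class pin down a size-based dispatch: small files are converted inline, large ones go through `asyncio.to_thread`, and any conversion error yields `None`. A minimal sketch of that pattern follows, assuming a 1 MB cut-off; the names `ASYNC_THRESHOLD_BYTES` and `convert_with_threshold` are hypothetical and are not taken from `deerflow.utils.file_conversion`.

```python
import asyncio
from pathlib import Path
from typing import Callable, Optional

# Hypothetical cut-off; the tests above only imply a threshold of about 1 MB.
ASYNC_THRESHOLD_BYTES = 1 * 1024 * 1024


async def convert_with_threshold(path: Path, convert: Callable[[Path], str]) -> Optional[Path]:
    """Convert `path` to Markdown and write it next to the source file.

    Files larger than the threshold are converted in a worker thread so the
    event loop is not blocked; smaller files are converted inline.
    Returns the path of the .md file, or None if anything fails.
    """
    try:
        if path.stat().st_size > ASYNC_THRESHOLD_BYTES:
            text = await asyncio.to_thread(convert, path)
        else:
            text = convert(path)
        md_path = path.with_suffix(".md")
        md_path.write_text(text, encoding="utf-8")
        return md_path
    except Exception:
        # Swallow the error, mirroring the "return None on failure" contract.
        return None
```

Writing with an explicit `encoding="utf-8"` is what keeps non-ASCII output (such as the Chinese report in the last test) independent of the platform's default encoding.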
# ---------------------------------------------------------------------------
# extract_outline
# ---------------------------------------------------------------------------
class TestExtractOutline:
"""Tests for extract_outline()."""
def test_empty_file_returns_empty(self, tmp_path):
"""Empty markdown file yields no outline entries."""
md = tmp_path / "empty.md"
md.write_text("", encoding="utf-8")
assert extract_outline(md) == []
def test_missing_file_returns_empty(self, tmp_path):
"""Non-existent path returns [] without raising."""
assert extract_outline(tmp_path / "nonexistent.md") == []
def test_standard_markdown_headings(self, tmp_path):
"""# / ## / ### headings are all recognised."""
md = tmp_path / "doc.md"
md.write_text(
"# Chapter One\n\nSome text.\n\n## Section 1.1\n\nMore text.\n\n### Sub 1.1.1\n",
encoding="utf-8",
)
outline = extract_outline(md)
assert len(outline) == 3
assert outline[0] == {"title": "Chapter One", "line": 1}
assert outline[1] == {"title": "Section 1.1", "line": 5}
assert outline[2] == {"title": "Sub 1.1.1", "line": 9}
def test_bold_sec_item_heading(self, tmp_path):
"""**ITEM N. TITLE** lines in SEC filings are recognised."""
md = tmp_path / "10k.md"
md.write_text(
"Cover page text.\n\n**ITEM 1. BUSINESS**\n\nBody.\n\n**ITEM 1A. RISK FACTORS**\n",
encoding="utf-8",
)
outline = extract_outline(md)
assert len(outline) == 2
assert outline[0] == {"title": "ITEM 1. BUSINESS", "line": 3}
assert outline[1] == {"title": "ITEM 1A. RISK FACTORS", "line": 7}
def test_bold_part_heading(self, tmp_path):
"""**PART I** / **PART II** headings are recognised."""
md = tmp_path / "10k.md"
md.write_text("**PART I**\n\n**PART II**\n\n**PART III**\n", encoding="utf-8")
outline = extract_outline(md)
assert len(outline) == 3
titles = [e["title"] for e in outline]
assert "PART I" in titles
assert "PART II" in titles
assert "PART III" in titles
def test_sec_cover_page_boilerplate_excluded(self, tmp_path):
"""Address lines and short cover boilerplate must NOT appear in outline."""
md = tmp_path / "8k.md"
md.write_text(
"## **UNITED STATES SECURITIES AND EXCHANGE COMMISSION**\n\n**WASHINGTON, DC 20549**\n\n**CURRENT REPORT**\n\n**SIGNATURES**\n\n**TESLA, INC.**\n\n**ITEM 2.02. RESULTS OF OPERATIONS**\n",
encoding="utf-8",
)
outline = extract_outline(md)
titles = [e["title"] for e in outline]
# Cover-page boilerplate should be excluded
assert "WASHINGTON, DC 20549" not in titles
assert "CURRENT REPORT" not in titles
assert "SIGNATURES" not in titles
assert "TESLA, INC." not in titles
# Real SEC heading must be included
assert "ITEM 2.02. RESULTS OF OPERATIONS" in titles
def test_chinese_headings_via_standard_markdown(self, tmp_path):
"""Chinese annual report headings emitted as # by pymupdf4llm are captured."""
md = tmp_path / "annual.md"
md.write_text(
"# 第一节 公司简介\n\n内容。\n\n## 第三节 管理层讨论与分析\n\n分析内容。\n",
encoding="utf-8",
)
outline = extract_outline(md)
assert len(outline) == 2
assert outline[0]["title"] == "第一节 公司简介"
assert outline[1]["title"] == "第三节 管理层讨论与分析"
def test_outline_capped_at_max_entries(self, tmp_path):
"""When truncated, result has MAX_OUTLINE_ENTRIES real entries + 1 sentinel."""
lines = [f"# Heading {i}" for i in range(MAX_OUTLINE_ENTRIES + 10)]
md = tmp_path / "long.md"
md.write_text("\n".join(lines), encoding="utf-8")
outline = extract_outline(md)
# Last entry is the truncation sentinel
assert outline[-1] == {"truncated": True}
# Visible entries are exactly MAX_OUTLINE_ENTRIES
visible = [e for e in outline if not e.get("truncated")]
assert len(visible) == MAX_OUTLINE_ENTRIES
def test_no_truncation_sentinel_when_under_limit(self, tmp_path):
"""Short documents produce no sentinel entry."""
lines = [f"# Heading {i}" for i in range(5)]
md = tmp_path / "short.md"
md.write_text("\n".join(lines), encoding="utf-8")
outline = extract_outline(md)
assert len(outline) == 5
assert not any(e.get("truncated") for e in outline)
def test_blank_lines_and_whitespace_ignored(self, tmp_path):
"""Blank lines between headings do not produce empty entries."""
md = tmp_path / "spaced.md"
md.write_text("\n\n# Title One\n\n\n\n# Title Two\n\n", encoding="utf-8")
outline = extract_outline(md)
assert len(outline) == 2
assert all(e["title"] for e in outline)
def test_inline_bold_not_confused_with_heading(self, tmp_path):
"""Mid-sentence bold text must not be mistaken for a heading."""
md = tmp_path / "prose.md"
md.write_text(
"This sentence has **bold words** inside it.\n\nAnother with **MULTIPLE CAPS** inline.\n",
encoding="utf-8",
)
outline = extract_outline(md)
assert outline == []
def test_split_bold_heading_academic_paper(self, tmp_path):
"""**<num>** **<title>** lines from academic papers are recognised (Style 3)."""
md = tmp_path / "paper.md"
md.write_text(
"## **Attention Is All You Need**\n\n**1** **Introduction**\n\nBody text.\n\n**2** **Background**\n\nMore text.\n\n**3.1** **Encoder and Decoder Stacks**\n",
encoding="utf-8",
)
outline = extract_outline(md)
titles = [e["title"] for e in outline]
assert "1 Introduction" in titles
assert "2 Background" in titles
assert "3.1 Encoder and Decoder Stacks" in titles
def test_split_bold_year_columns_excluded(self, tmp_path):
"""Financial table headers like **2023** **2022** **2021** are NOT headings."""
md = tmp_path / "annual.md"
md.write_text(
"# Financial Summary\n\n**2023** **2022** **2021**\n\nRevenue 100 90 80\n",
encoding="utf-8",
)
outline = extract_outline(md)
titles = [e["title"] for e in outline]
# Only the # heading should appear, not the year-column row
assert titles == ["Financial Summary"]
def test_adjacent_bold_spans_merged_in_markdown_heading(self, tmp_path):
"""** ** artefacts inside a # heading are merged into clean plain text."""
md = tmp_path / "sec.md"
md.write_text(
"## **UNITED STATES** **SECURITIES AND EXCHANGE COMMISSION**\n\nBody text.\n",
encoding="utf-8",
)
outline = extract_outline(md)
assert len(outline) == 1
# Title must be clean — no ** ** artefacts
assert outline[0]["title"] == "UNITED STATES SECURITIES AND EXCHANGE COMMISSION"
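As a rough illustration of the behaviour these tests describe, the sketch below covers only the standard `#` heading case: 1-based line numbers, merging of adjacent bold spans, an empty result for missing files, and the truncation sentinel. It is not the project's `extract_outline`; the function name, the regex, and the value of `MAX_OUTLINE_ENTRIES` are assumptions, and the SEC-style and split-bold heading rules are deliberately left out.

```python
import re
from pathlib import Path

MAX_OUTLINE_ENTRIES = 50  # hypothetical limit; the real constant is not shown here

# One to six '#' characters, whitespace, then the heading text.
_HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def outline_from_markdown(path):
    """Return [{'title': ..., 'line': ...}, ...] for '#'-style headings only."""
    try:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
    except OSError:
        return []  # missing or unreadable file: empty outline, no exception

    outline = []
    for lineno, line in enumerate(lines, start=1):
        match = _HEADING_RE.match(line.strip())
        if not match:
            continue
        # Drop bold markers and collapse the whitespace they leave behind.
        title = " ".join(match.group(1).replace("**", "").split())
        if not title:
            continue
        if len(outline) >= MAX_OUTLINE_ENTRIES:
            outline.append({"truncated": True})  # sentinel marks the cut-off
            break
        outline.append({"title": title, "line": lineno})
    return outline
```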


@ -109,17 +109,11 @@ def test_build_run_config_with_overrides():
def test_build_run_config_custom_agent_injects_agent_name():
-"""Custom assistant_id must be forwarded as configurable['agent_name'].
-Regression test for #1644: when the LangGraph Platform-compatible
-/runs endpoint receives a custom assistant_id (e.g. 'finalis'), the
-Gateway must inject configurable['agent_name'] so that make_lead_agent
-loads the correct agents/finalis/SOUL.md.
-"""
+"""Custom assistant_id must be forwarded as configurable['agent_name']."""
from app.gateway.services import build_run_config
config = build_run_config("thread-1", None, None, assistant_id="finalis")
-assert config["configurable"]["agent_name"] == "finalis", "Custom assistant_id must be forwarded as configurable['agent_name'] so that make_lead_agent loads the correct SOUL.md"
+assert config["configurable"]["agent_name"] == "finalis"
def test_build_run_config_lead_agent_no_agent_name():
@ -148,7 +142,7 @@ def test_build_run_config_explicit_agent_name_not_overwritten():
None,
assistant_id="other-agent",
)
-assert config["configurable"]["agent_name"] == "explicit-agent", "An explicit configurable['agent_name'] in the request body must not be overwritten by the assistant_id mapping"
+assert config["configurable"]["agent_name"] == "explicit-agent"
def test_resolve_agent_factory_returns_make_lead_agent():
@ -160,3 +154,189 @@ def test_resolve_agent_factory_returns_make_lead_agent():
assert resolve_agent_factory("lead_agent") is make_lead_agent
assert resolve_agent_factory("finalis") is make_lead_agent
assert resolve_agent_factory("custom-agent-123") is make_lead_agent
# ---------------------------------------------------------------------------
# ---------------------------------------------------------------------------
# Regression tests for issue #1699:
# context field in langgraph-compat requests not merged into configurable
# ---------------------------------------------------------------------------
def test_run_create_request_accepts_context():
"""RunCreateRequest must accept the ``context`` field without dropping it."""
from app.gateway.routers.thread_runs import RunCreateRequest
body = RunCreateRequest(
input={"messages": [{"role": "user", "content": "hi"}]},
context={
"model_name": "deepseek-v3",
"thinking_enabled": True,
"is_plan_mode": True,
"subagent_enabled": True,
"thread_id": "some-thread-id",
},
)
assert body.context is not None
assert body.context["model_name"] == "deepseek-v3"
assert body.context["is_plan_mode"] is True
assert body.context["subagent_enabled"] is True
def test_run_create_request_context_defaults_to_none():
"""RunCreateRequest without context should default to None (backward compat)."""
from app.gateway.routers.thread_runs import RunCreateRequest
body = RunCreateRequest(input=None)
assert body.context is None
def test_context_merges_into_configurable():
"""Context values must be merged into config['configurable'] by start_run.
Since start_run is async and requires many dependencies, we test the
merging logic directly by simulating what start_run does.
"""
from app.gateway.services import build_run_config
# Simulate the context merging logic from start_run
config = build_run_config("thread-1", None, None)
context = {
"model_name": "deepseek-v3",
"mode": "ultra",
"reasoning_effort": "high",
"thinking_enabled": True,
"is_plan_mode": True,
"subagent_enabled": True,
"max_concurrent_subagents": 5,
"thread_id": "should-be-ignored",
}
_CONTEXT_CONFIGURABLE_KEYS = {
"model_name",
"mode",
"thinking_enabled",
"reasoning_effort",
"is_plan_mode",
"subagent_enabled",
"max_concurrent_subagents",
}
configurable = config.setdefault("configurable", {})
for key in _CONTEXT_CONFIGURABLE_KEYS:
if key in context:
configurable.setdefault(key, context[key])
assert config["configurable"]["model_name"] == "deepseek-v3"
assert config["configurable"]["thinking_enabled"] is True
assert config["configurable"]["is_plan_mode"] is True
assert config["configurable"]["subagent_enabled"] is True
assert config["configurable"]["max_concurrent_subagents"] == 5
assert config["configurable"]["reasoning_effort"] == "high"
assert config["configurable"]["mode"] == "ultra"
# thread_id from context should NOT override the one from build_run_config
assert config["configurable"]["thread_id"] == "thread-1"
# Non-allowlisted keys should not appear
assert "thread_id" not in {k for k in context if k in _CONTEXT_CONFIGURABLE_KEYS}
def test_context_does_not_override_existing_configurable():
"""Values already in config.configurable must NOT be overridden by context."""
from app.gateway.services import build_run_config
config = build_run_config(
"thread-1",
{"configurable": {"model_name": "gpt-4", "is_plan_mode": False}},
None,
)
context = {
"model_name": "deepseek-v3",
"is_plan_mode": True,
"subagent_enabled": True,
}
_CONTEXT_CONFIGURABLE_KEYS = {
"model_name",
"mode",
"thinking_enabled",
"reasoning_effort",
"is_plan_mode",
"subagent_enabled",
"max_concurrent_subagents",
}
configurable = config.setdefault("configurable", {})
for key in _CONTEXT_CONFIGURABLE_KEYS:
if key in context:
configurable.setdefault(key, context[key])
# Existing values must NOT be overridden
assert config["configurable"]["model_name"] == "gpt-4"
assert config["configurable"]["is_plan_mode"] is False
# New values should be added
assert config["configurable"]["subagent_enabled"] is True
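The two tests above inline the same merge loop. It could be factored into a helper along the following lines; this is a sketch only, the function name `merge_context_into_configurable` is hypothetical, and the allowlist is copied from the tests rather than from `start_run`.

```python
from __future__ import annotations

from typing import Any

# Allowlist copied from the tests above; keys outside it (e.g. thread_id) are ignored.
CONTEXT_CONFIGURABLE_KEYS = frozenset(
    {
        "model_name",
        "mode",
        "thinking_enabled",
        "reasoning_effort",
        "is_plan_mode",
        "subagent_enabled",
        "max_concurrent_subagents",
    }
)


def merge_context_into_configurable(config: dict[str, Any], context: dict[str, Any] | None) -> dict[str, Any]:
    """Copy allowlisted context keys into config['configurable'] in place.

    Values already present in 'configurable' win, because setdefault()
    only writes a key when it is missing.
    """
    if not context:
        return config
    configurable = config.setdefault("configurable", {})
    for key in CONTEXT_CONFIGURABLE_KEYS:
        if key in context:
            configurable.setdefault(key, context[key])
    return config
```

Using `setdefault` rather than `update` is what gives explicit `configurable` values precedence over `context`.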
# ---------------------------------------------------------------------------
# build_run_config — context / configurable precedence (LangGraph >= 0.6.0)
# ---------------------------------------------------------------------------
def test_build_run_config_with_context():
"""When caller sends 'context', prefer it over 'configurable'."""
from app.gateway.services import build_run_config
config = build_run_config(
"thread-1",
{"context": {"user_id": "u-42", "thread_id": "thread-1"}},
None,
)
assert "context" in config
assert config["context"]["user_id"] == "u-42"
assert "configurable" not in config
assert config["recursion_limit"] == 100
def test_build_run_config_context_plus_configurable_warns(caplog):
"""When caller sends both 'context' and 'configurable', prefer 'context' and log a warning."""
import logging
from app.gateway.services import build_run_config
with caplog.at_level(logging.WARNING, logger="app.gateway.services"):
config = build_run_config(
"thread-1",
{
"context": {"user_id": "u-42"},
"configurable": {"model_name": "gpt-4"},
},
None,
)
assert "context" in config
assert config["context"]["user_id"] == "u-42"
assert "configurable" not in config
assert any("both 'context' and 'configurable'" in r.message for r in caplog.records)
def test_build_run_config_context_passthrough_other_keys():
"""Non-conflicting keys from request_config are still passed through when context is used."""
from app.gateway.services import build_run_config
config = build_run_config(
"thread-1",
{"context": {"thread_id": "thread-1"}, "tags": ["prod"]},
None,
)
assert config["context"]["thread_id"] == "thread-1"
assert "configurable" not in config
assert config["tags"] == ["prod"]
def test_build_run_config_no_request_config():
"""When request_config is None, fall back to basic configurable with thread_id."""
from app.gateway.services import build_run_config
config = build_run_config("thread-abc", None, None)
assert config["configurable"] == {"thread_id": "thread-abc"}
assert "context" not in config
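The precedence rules exercised by the four `build_run_config` tests above can be summarised in a small stand-alone function. This is a sketch under stated assumptions, not the gateway's implementation: `build_config_sketch` is a hypothetical name, the default `recursion_limit` of 100 is inferred from one assertion, and the exact warning text is only known to contain the phrase checked by the test.

```python
import logging

logger = logging.getLogger(__name__)


def build_config_sketch(thread_id, request_config=None):
    """Build a run config where 'context' takes precedence over 'configurable'."""
    config = {"recursion_limit": 100}  # assumed default, inferred from the tests
    if not request_config:
        # No request config at all: fall back to a basic configurable.
        config["configurable"] = {"thread_id": thread_id}
        return config

    if "context" in request_config:
        if "configurable" in request_config:
            logger.warning("Request has both 'context' and 'configurable'; using 'context'")
        config["context"] = dict(request_config["context"])
    else:
        configurable = {"thread_id": thread_id}
        configurable.update(request_config.get("configurable", {}))
        config["configurable"] = configurable

    # Pass through every other key (tags, metadata, ...) unchanged.
    for key, value in request_config.items():
        if key not in ("context", "configurable"):
            config[key] = value
    return config
```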


@ -8,6 +8,7 @@ import pytest
from deerflow.config.acp_config import ACPAgentConfig
from deerflow.config.extensions_config import ExtensionsConfig, McpServerConfig, set_extensions_config
from deerflow.tools.builtins.invoke_acp_agent_tool import (
_build_acp_mcp_servers,
_build_mcp_servers,
_build_permission_response,
_get_work_dir,
@ -42,6 +43,43 @@ def test_build_mcp_servers_filters_disabled_and_maps_transports():
set_extensions_config(ExtensionsConfig(mcp_servers={}, skills={}))
def test_build_acp_mcp_servers_formats_list_payload():
set_extensions_config(ExtensionsConfig(mcp_servers={"stale": McpServerConfig(enabled=True, type="stdio", command="echo")}, skills={}))
fresh_config = ExtensionsConfig(
mcp_servers={
"stdio": McpServerConfig(enabled=True, type="stdio", command="npx", args=["srv"], env={"FOO": "bar"}),
"http": McpServerConfig(enabled=True, type="http", url="https://example.com/mcp", headers={"Authorization": "Bearer token"}),
"disabled": McpServerConfig(enabled=False, type="stdio", command="echo"),
},
skills={},
)
monkeypatch = pytest.MonkeyPatch()
monkeypatch.setattr(
"deerflow.config.extensions_config.ExtensionsConfig.from_file",
classmethod(lambda cls: fresh_config),
)
try:
assert _build_acp_mcp_servers() == [
{
"name": "stdio",
"type": "stdio",
"command": "npx",
"args": ["srv"],
"env": [{"name": "FOO", "value": "bar"}],
},
{
"name": "http",
"type": "http",
"url": "https://example.com/mcp",
"headers": [{"name": "Authorization", "value": "Bearer token"}],
},
]
finally:
monkeypatch.undo()
set_extensions_config(ExtensionsConfig(mcp_servers={}, skills={}))
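The expected payload in this test is a list of server entries in which `env` and `headers` are lists of name/value pairs and disabled servers are dropped. A minimal sketch of that mapping is shown below. It works on plain dictionaries instead of `McpServerConfig`, and the name `to_acp_mcp_servers` and the validation messages are assumptions, not the tool's actual code.

```python
def _pairs(mapping):
    """Turn {'FOO': 'bar'} into [{'name': 'FOO', 'value': 'bar'}]."""
    return [{"name": key, "value": value} for key, value in (mapping or {}).items()]


def to_acp_mcp_servers(servers):
    """Format enabled MCP server configs as a list payload.

    `servers` maps a server name to a plain dict with the keys used in the
    test above (enabled, type, command, args, env, url, headers).
    """
    result = []
    for name, cfg in servers.items():
        if not cfg.get("enabled", True):
            continue  # disabled servers are left out entirely
        kind = cfg.get("type", "stdio")
        if kind == "stdio":
            if not cfg.get("command"):
                raise ValueError(f"MCP server {name!r}: missing command")
            result.append(
                {
                    "name": name,
                    "type": "stdio",
                    "command": cfg["command"],
                    "args": list(cfg.get("args") or []),
                    "env": _pairs(cfg.get("env")),
                }
            )
        else:
            if not cfg.get("url"):
                raise ValueError(f"MCP server {name!r}: missing url")
            result.append(
                {
                    "name": name,
                    "type": kind,
                    "url": cfg["url"],
                    "headers": _pairs(cfg.get("headers")),
                }
            )
    return result
```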
def test_build_permission_response_prefers_allow_once():
response = _build_permission_response(
[
@ -251,9 +289,15 @@ async def test_invoke_acp_agent_uses_fixed_acp_workspace(monkeypatch, tmp_path):
assert captured["spawn"] == {"cmd": "codex-acp", "args": ["--json"], "cwd": expected_cwd}
assert captured["new_session"] == {
"cwd": expected_cwd,
-"mcp_servers": {
-"github": {"transport": "stdio", "command": "npx", "args": ["github-mcp"]},
-},
+"mcp_servers": [
+{
+"name": "github",
+"type": "stdio",
+"command": "npx",
+"args": ["github-mcp"],
+"env": [],
+}
+],
"model": "gpt-5-codex",
}
assert captured["prompt"] == {
@ -448,6 +492,94 @@ async def test_invoke_acp_agent_passes_env_to_spawn(monkeypatch, tmp_path):
assert captured["env"] == {"OPENAI_API_KEY": "sk-from-env", "FOO": "bar"}
@pytest.mark.anyio
async def test_invoke_acp_agent_skips_invalid_mcp_servers(monkeypatch, tmp_path, caplog):
"""Invalid MCP config should be logged and skipped instead of failing ACP invocation."""
from deerflow.config import paths as paths_module
monkeypatch.setattr(paths_module, "get_paths", lambda: paths_module.Paths(base_dir=tmp_path))
monkeypatch.setattr(
"deerflow.tools.builtins.invoke_acp_agent_tool._build_acp_mcp_servers",
lambda: (_ for _ in ()).throw(ValueError("missing command")),
)
captured: dict[str, object] = {}
class DummyClient:
def __init__(self) -> None:
self._chunks: list[str] = []
@property
def collected_text(self) -> str:
return ""
async def session_update(self, session_id, update, **kwargs):
pass
async def request_permission(self, options, session_id, tool_call, **kwargs):
raise AssertionError("should not be called")
class DummyConn:
async def initialize(self, **kwargs):
pass
async def new_session(self, **kwargs):
captured["new_session"] = kwargs
return SimpleNamespace(session_id="s1")
async def prompt(self, **kwargs):
pass
class DummyProcessContext:
def __init__(self, client, cmd, *args, env=None, cwd=None):
captured["spawn"] = {"cmd": cmd, "args": list(args), "env": env, "cwd": cwd}
async def __aenter__(self):
return DummyConn(), object()
async def __aexit__(self, exc_type, exc, tb):
return False
class DummyRequestError(Exception):
@staticmethod
def method_not_found(method):
return DummyRequestError(method)
monkeypatch.setitem(
sys.modules,
"acp",
SimpleNamespace(
PROTOCOL_VERSION="2026-03-24",
Client=DummyClient,
RequestError=DummyRequestError,
spawn_agent_process=lambda client, cmd, *args, env=None, cwd: DummyProcessContext(client, cmd, *args, env=env, cwd=cwd),
text_block=lambda text: {"type": "text", "text": text},
),
)
monkeypatch.setitem(
sys.modules,
"acp.schema",
SimpleNamespace(
ClientCapabilities=lambda: {},
Implementation=lambda **kwargs: kwargs,
TextContentBlock=type("TextContentBlock", (), {"__init__": lambda self, text: setattr(self, "text", text)}),
),
)
tool = build_invoke_acp_agent_tool({"codex": ACPAgentConfig(command="codex-acp", description="Codex CLI")})
caplog.set_level("WARNING")
try:
await tool.coroutine(agent="codex", prompt="Do something")
finally:
sys.modules.pop("acp", None)
sys.modules.pop("acp.schema", None)
assert captured["new_session"]["mcp_servers"] == []
assert "continuing without MCP servers" in caplog.text
assert "missing command" in caplog.text
@pytest.mark.anyio
async def test_invoke_acp_agent_passes_none_env_when_not_configured(monkeypatch, tmp_path):
"""When env is empty, None is passed to spawn_agent_process (subprocess inherits parent env)."""


@ -0,0 +1,177 @@
"""Tests for JinaClient async crawl method."""
import logging
from unittest.mock import MagicMock
import httpx
import pytest
import deerflow.community.jina_ai.jina_client as jina_client_module
from deerflow.community.jina_ai.jina_client import JinaClient
from deerflow.community.jina_ai.tools import web_fetch_tool
@pytest.fixture
def jina_client():
return JinaClient()
@pytest.mark.anyio
async def test_crawl_success(jina_client, monkeypatch):
"""Test successful crawl returns response text."""
async def mock_post(self, url, **kwargs):
return httpx.Response(200, text="<html><body>Hello</body></html>", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
result = await jina_client.crawl("https://example.com")
assert result == "<html><body>Hello</body></html>"
@pytest.mark.anyio
async def test_crawl_non_200_status(jina_client, monkeypatch):
"""Test that non-200 status returns error message."""
async def mock_post(self, url, **kwargs):
return httpx.Response(429, text="Rate limited", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
result = await jina_client.crawl("https://example.com")
assert result.startswith("Error:")
assert "429" in result
@pytest.mark.anyio
async def test_crawl_empty_response(jina_client, monkeypatch):
"""Test that empty response returns error message."""
async def mock_post(self, url, **kwargs):
return httpx.Response(200, text="", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
result = await jina_client.crawl("https://example.com")
assert result.startswith("Error:")
assert "empty" in result.lower()
@pytest.mark.anyio
async def test_crawl_whitespace_only_response(jina_client, monkeypatch):
"""Test that whitespace-only response returns error message."""
async def mock_post(self, url, **kwargs):
return httpx.Response(200, text=" \n ", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
result = await jina_client.crawl("https://example.com")
assert result.startswith("Error:")
assert "empty" in result.lower()
@pytest.mark.anyio
async def test_crawl_network_error(jina_client, monkeypatch):
"""Test that network errors are handled gracefully."""
async def mock_post(self, url, **kwargs):
raise httpx.ConnectError("Connection refused")
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
result = await jina_client.crawl("https://example.com")
assert result.startswith("Error:")
assert "failed" in result.lower()
@pytest.mark.anyio
async def test_crawl_passes_headers(jina_client, monkeypatch):
"""Test that correct headers are sent."""
captured_headers = {}
async def mock_post(self, url, **kwargs):
captured_headers.update(kwargs.get("headers", {}))
return httpx.Response(200, text="ok", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
await jina_client.crawl("https://example.com", return_format="markdown", timeout=30)
assert captured_headers["X-Return-Format"] == "markdown"
assert captured_headers["X-Timeout"] == "30"
@pytest.mark.anyio
async def test_crawl_includes_api_key_when_set(jina_client, monkeypatch):
"""Test that Authorization header is set when JINA_API_KEY is available."""
captured_headers = {}
async def mock_post(self, url, **kwargs):
captured_headers.update(kwargs.get("headers", {}))
return httpx.Response(200, text="ok", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
monkeypatch.setenv("JINA_API_KEY", "test-key-123")
await jina_client.crawl("https://example.com")
assert captured_headers["Authorization"] == "Bearer test-key-123"
@pytest.mark.anyio
async def test_crawl_warns_once_when_api_key_missing(jina_client, monkeypatch, caplog):
"""Test that the missing API key warning is logged only once."""
jina_client_module._api_key_warned = False
async def mock_post(self, url, **kwargs):
return httpx.Response(200, text="ok", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
monkeypatch.delenv("JINA_API_KEY", raising=False)
with caplog.at_level(logging.WARNING, logger="deerflow.community.jina_ai.jina_client"):
await jina_client.crawl("https://example.com")
await jina_client.crawl("https://example.com")
warning_count = sum(1 for record in caplog.records if "Jina API key is not set" in record.message)
assert warning_count == 1
@pytest.mark.anyio
async def test_crawl_no_auth_header_without_api_key(jina_client, monkeypatch):
"""Test that no Authorization header is set when JINA_API_KEY is not available."""
jina_client_module._api_key_warned = False
captured_headers = {}
async def mock_post(self, url, **kwargs):
captured_headers.update(kwargs.get("headers", {}))
return httpx.Response(200, text="ok", request=httpx.Request("POST", url))
monkeypatch.setattr(httpx.AsyncClient, "post", mock_post)
monkeypatch.delenv("JINA_API_KEY", raising=False)
await jina_client.crawl("https://example.com")
assert "Authorization" not in captured_headers
@pytest.mark.anyio
async def test_web_fetch_tool_returns_error_on_crawl_failure(monkeypatch):
"""Test that web_fetch_tool short-circuits and returns the error string when crawl fails."""
async def mock_crawl(self, url, **kwargs):
return "Error: Jina API returned status 429: Rate limited"
mock_config = MagicMock()
mock_config.get_tool_config.return_value = None
monkeypatch.setattr("deerflow.community.jina_ai.tools.get_app_config", lambda: mock_config)
monkeypatch.setattr(JinaClient, "crawl", mock_crawl)
result = await web_fetch_tool.ainvoke("https://example.com")
assert result.startswith("Error:")
assert "429" in result
@pytest.mark.anyio
async def test_web_fetch_tool_returns_markdown_on_success(monkeypatch):
"""Test that web_fetch_tool returns extracted markdown on successful crawl."""
async def mock_crawl(self, url, **kwargs):
return "<html><body><p>Hello world</p></body></html>"
mock_config = MagicMock()
mock_config.get_tool_config.return_value = None
monkeypatch.setattr("deerflow.community.jina_ai.tools.get_app_config", lambda: mock_config)
monkeypatch.setattr(JinaClient, "crawl", mock_crawl)
result = await web_fetch_tool.ainvoke("https://example.com")
assert "Hello world" in result
assert not result.startswith("Error:")
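Several of the tests above check how request headers are assembled: the return format and timeout are sent as strings, the `Authorization` header appears only when `JINA_API_KEY` is set, and the missing-key warning is logged once per process. A sketch of that logic follows; `build_headers`, its default arguments, and the logger name are hypothetical, and the warning text beyond "Jina API key is not set" is an assumption.

```python
import logging
import os

logger = logging.getLogger("jina_client_sketch")

# Module-level flag so the warning is emitted once, not on every request.
_api_key_warned = False


def build_headers(return_format="html", timeout=10):
    """Build request headers, adding a Bearer token only when a key is configured."""
    global _api_key_warned
    headers = {
        "X-Return-Format": return_format,
        "X-Timeout": str(timeout),  # header values must be strings
    }
    api_key = os.getenv("JINA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    elif not _api_key_warned:
        _api_key_warned = True
        logger.warning("Jina API key is not set; requests will be sent without authentication.")
    return headers
```

Because the flag lives at module level, tests that count warnings have to reset it first, which is why two of the tests above assign `jina_client_module._api_key_warned = False` before running.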


@ -0,0 +1,96 @@
from pathlib import Path
from deerflow.agents.lead_agent.prompt import get_skills_prompt_section
from deerflow.config.agents_config import AgentConfig
from deerflow.skills.types import Skill
def _make_skill(name: str) -> Skill:
return Skill(
name=name,
description=f"Description for {name}",
license="MIT",
skill_dir=Path(f"/tmp/{name}"),
skill_file=Path(f"/tmp/{name}/SKILL.md"),
relative_path=Path(name),
category="public",
enabled=True,
)
def test_get_skills_prompt_section_returns_empty_when_no_skills_match(monkeypatch):
skills = [_make_skill("skill1"), _make_skill("skill2")]
monkeypatch.setattr("deerflow.agents.lead_agent.prompt.load_skills", lambda enabled_only: skills)
result = get_skills_prompt_section(available_skills={"non_existent_skill"})
assert result == ""
def test_get_skills_prompt_section_returns_empty_when_available_skills_empty(monkeypatch):
skills = [_make_skill("skill1"), _make_skill("skill2")]
monkeypatch.setattr("deerflow.agents.lead_agent.prompt.load_skills", lambda enabled_only: skills)
result = get_skills_prompt_section(available_skills=set())
assert result == ""
def test_get_skills_prompt_section_returns_skills(monkeypatch):
skills = [_make_skill("skill1"), _make_skill("skill2")]
monkeypatch.setattr("deerflow.agents.lead_agent.prompt.load_skills", lambda enabled_only: skills)
result = get_skills_prompt_section(available_skills={"skill1"})
assert "skill1" in result
assert "skill2" not in result
def test_get_skills_prompt_section_returns_all_when_available_skills_is_none(monkeypatch):
skills = [_make_skill("skill1"), _make_skill("skill2")]
monkeypatch.setattr("deerflow.agents.lead_agent.prompt.load_skills", lambda enabled_only: skills)
result = get_skills_prompt_section(available_skills=None)
assert "skill1" in result
assert "skill2" in result
def test_make_lead_agent_empty_skills_passed_correctly(monkeypatch):
from unittest.mock import MagicMock
from deerflow.agents.lead_agent import agent as lead_agent_module
# Mock dependencies
monkeypatch.setattr(lead_agent_module, "get_app_config", lambda: MagicMock())
monkeypatch.setattr(lead_agent_module, "_resolve_model_name", lambda x=None: "default-model")
monkeypatch.setattr(lead_agent_module, "create_chat_model", lambda **kwargs: "model")
monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
monkeypatch.setattr(lead_agent_module, "_build_middlewares", lambda *args, **kwargs: [])
monkeypatch.setattr(lead_agent_module, "create_agent", lambda **kwargs: kwargs)
class MockModelConfig:
supports_thinking = False
mock_app_config = MagicMock()
mock_app_config.get_model_config.return_value = MockModelConfig()
monkeypatch.setattr(lead_agent_module, "get_app_config", lambda: mock_app_config)
captured_skills = []
def mock_apply_prompt_template(**kwargs):
captured_skills.append(kwargs.get("available_skills"))
return "mock_prompt"
monkeypatch.setattr(lead_agent_module, "apply_prompt_template", mock_apply_prompt_template)
# Case 1: Empty skills list
monkeypatch.setattr(lead_agent_module, "load_agent_config", lambda x: AgentConfig(name="test", skills=[]))
lead_agent_module.make_lead_agent({"configurable": {"agent_name": "test"}})
assert captured_skills[-1] == set()
# Case 2: None skills list
monkeypatch.setattr(lead_agent_module, "load_agent_config", lambda x: AgentConfig(name="test", skills=None))
lead_agent_module.make_lead_agent({"configurable": {"agent_name": "test"}})
assert captured_skills[-1] is None
# Case 3: Some skills list
monkeypatch.setattr(lead_agent_module, "load_agent_config", lambda x: AgentConfig(name="test", skills=["skill1"]))
lead_agent_module.make_lead_agent({"configurable": {"agent_name": "test"}})
assert captured_skills[-1] == {"skill1"}
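These tests hinge on the difference between `None` (no restriction, all skills shown) and an empty set (no skills allowed). The sketch below makes that distinction explicit; both function names are hypothetical and are not part of `deerflow.agents.lead_agent`.

```python
def to_available_skills(configured):
    """Map an agent's configured skill list to the filter passed downstream.

    None stays None (no restriction); a list, even an empty one, becomes a set.
    """
    return None if configured is None else set(configured)


def filter_skill_names(skill_names, available_skills):
    """Keep only the allowed skill names; None means every skill is allowed."""
    if available_skills is None:
        return list(skill_names)
    return [name for name in skill_names if name in available_skills]
```

Testing with `is None` rather than truthiness matters here: `if not available_skills` would treat an empty set like `None` and expose every skill to an agent configured with `skills=[]`.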


@ -0,0 +1,136 @@
from __future__ import annotations
import asyncio
from types import SimpleNamespace
import pytest
from langchain_core.messages import AIMessage
from langgraph.errors import GraphBubbleUp
from deerflow.agents.middlewares.llm_error_handling_middleware import (
LLMErrorHandlingMiddleware,
)
class FakeError(Exception):
def __init__(
self,
message: str,
*,
status_code: int | None = None,
code: str | None = None,
headers: dict[str, str] | None = None,
body: dict | None = None,
) -> None:
super().__init__(message)
self.status_code = status_code
self.code = code
self.body = body
self.response = SimpleNamespace(status_code=status_code, headers=headers or {}) if status_code is not None or headers else None
def _build_middleware(**attrs: int) -> LLMErrorHandlingMiddleware:
middleware = LLMErrorHandlingMiddleware()
for key, value in attrs.items():
setattr(middleware, key, value)
return middleware
def test_async_model_call_retries_busy_provider_then_succeeds(
monkeypatch: pytest.MonkeyPatch,
) -> None:
middleware = _build_middleware(retry_max_attempts=3, retry_base_delay_ms=25, retry_cap_delay_ms=25)
attempts = 0
waits: list[float] = []
events: list[dict] = []
async def fake_sleep(delay: float) -> None:
waits.append(delay)
def fake_writer():
return events.append
async def handler(_request) -> AIMessage:
nonlocal attempts
attempts += 1
if attempts < 3:
raise FakeError("当前服务集群负载较高,请稍后重试,感谢您的耐心等待。 (2064)")
return AIMessage(content="ok")
monkeypatch.setattr("asyncio.sleep", fake_sleep)
monkeypatch.setattr(
"langgraph.config.get_stream_writer",
fake_writer,
)
result = asyncio.run(middleware.awrap_model_call(SimpleNamespace(), handler))
assert isinstance(result, AIMessage)
assert result.content == "ok"
assert attempts == 3
assert waits == [0.025, 0.025]
assert [event["type"] for event in events] == ["llm_retry", "llm_retry"]
def test_async_model_call_returns_user_message_for_quota_errors() -> None:
middleware = _build_middleware(retry_max_attempts=3)
async def handler(_request) -> AIMessage:
raise FakeError(
"insufficient_quota: account balance is empty",
status_code=429,
code="insufficient_quota",
)
result = asyncio.run(middleware.awrap_model_call(SimpleNamespace(), handler))
assert isinstance(result, AIMessage)
assert "out of quota" in str(result.content)
def test_sync_model_call_uses_retry_after_header(monkeypatch: pytest.MonkeyPatch) -> None:
middleware = _build_middleware(retry_max_attempts=2, retry_base_delay_ms=10, retry_cap_delay_ms=10)
waits: list[float] = []
attempts = 0
def fake_sleep(delay: float) -> None:
waits.append(delay)
def handler(_request) -> AIMessage:
nonlocal attempts
attempts += 1
if attempts == 1:
raise FakeError(
"server busy",
status_code=503,
headers={"Retry-After": "2"},
)
return AIMessage(content="ok")
monkeypatch.setattr("time.sleep", fake_sleep)
result = middleware.wrap_model_call(SimpleNamespace(), handler)
assert isinstance(result, AIMessage)
assert result.content == "ok"
assert waits == [2.0]
def test_sync_model_call_propagates_graph_bubble_up() -> None:
middleware = _build_middleware()
def handler(_request) -> AIMessage:
raise GraphBubbleUp()
with pytest.raises(GraphBubbleUp):
middleware.wrap_model_call(SimpleNamespace(), handler)
def test_async_model_call_propagates_graph_bubble_up() -> None:
middleware = _build_middleware()
async def handler(_request) -> AIMessage:
raise GraphBubbleUp()
with pytest.raises(GraphBubbleUp):
asyncio.run(middleware.awrap_model_call(SimpleNamespace(), handler))
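
The retry tests above pin down two observable behaviours: waits are capped (`retry_base_delay_ms=25`, `retry_cap_delay_ms=25` yields `[0.025, 0.025]`), and a numeric `Retry-After` header takes precedence over the cap (`"2"` yields `[2.0]`). A minimal sketch of a delay calculation consistent with those assertions follows. The function name `retry_delay_seconds` and the doubling schedule are assumptions for illustration, not the middleware's actual implementation.

```python
from __future__ import annotations


def retry_delay_seconds(
    attempt: int,
    base_delay_ms: int,
    cap_delay_ms: int,
    retry_after: str | None = None,
) -> float:
    """Return how long to wait before retry number `attempt` (1-based).

    Hypothetical helper: the doubling schedule is an assumption.
    """
    if retry_after is not None:
        try:
            # A numeric Retry-After header is a delay in seconds and,
            # as in the test above, is not limited by the cap.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # The HTTP-date form is not handled in this sketch.
    # Exponential backoff: base, 2*base, 4*base, ... capped at cap_delay_ms.
    delay_ms = min(cap_delay_ms, base_delay_ms * 2 ** (attempt - 1))
    return delay_ms / 1000.0
```

With equal base and cap, every attempt waits the same amount, which is why the async test can assert on an exact list of waits.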


@ -0,0 +1,388 @@
import errno
from types import SimpleNamespace
from unittest.mock import patch
import pytest
from deerflow.sandbox.local.local_sandbox import LocalSandbox, PathMapping
from deerflow.sandbox.local.local_sandbox_provider import LocalSandboxProvider
class TestPathMapping:
def test_path_mapping_dataclass(self):
mapping = PathMapping(container_path="/mnt/skills", local_path="/home/user/skills", read_only=True)
assert mapping.container_path == "/mnt/skills"
assert mapping.local_path == "/home/user/skills"
assert mapping.read_only is True
def test_path_mapping_defaults_to_false(self):
mapping = PathMapping(container_path="/mnt/data", local_path="/home/user/data")
assert mapping.read_only is False
class TestLocalSandboxPathResolution:
def test_resolve_path_exact_match(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills"),
],
)
resolved = sandbox._resolve_path("/mnt/skills")
assert resolved == "/home/user/skills"
def test_resolve_path_nested_path(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills"),
],
)
resolved = sandbox._resolve_path("/mnt/skills/agent/prompt.py")
assert resolved == "/home/user/skills/agent/prompt.py"
def test_resolve_path_no_mapping(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills"),
],
)
resolved = sandbox._resolve_path("/mnt/other/file.txt")
assert resolved == "/mnt/other/file.txt"
def test_resolve_path_longest_prefix_first(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills"),
PathMapping(container_path="/mnt", local_path="/var/mnt"),
],
)
resolved = sandbox._resolve_path("/mnt/skills/file.py")
# Should match /mnt/skills first (longer prefix)
assert resolved == "/home/user/skills/file.py"
def test_reverse_resolve_path_exact_match(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path=str(skills_dir)),
],
)
resolved = sandbox._reverse_resolve_path(str(skills_dir))
assert resolved == "/mnt/skills"
def test_reverse_resolve_path_nested(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
file_path = skills_dir / "agent" / "prompt.py"
file_path.parent.mkdir()
file_path.write_text("test")
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path=str(skills_dir)),
],
)
resolved = sandbox._reverse_resolve_path(str(file_path))
assert resolved == "/mnt/skills/agent/prompt.py"
class TestReadOnlyPath:
def test_is_read_only_true(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills", read_only=True),
],
)
assert sandbox._is_read_only_path("/home/user/skills/file.py") is True
def test_is_read_only_false_for_writable(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/data", local_path="/home/user/data", read_only=False),
],
)
assert sandbox._is_read_only_path("/home/user/data/file.txt") is False
def test_is_read_only_false_for_unmapped_path(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills", read_only=True),
],
)
# Path not under any mapping
assert sandbox._is_read_only_path("/tmp/other/file.txt") is False
def test_is_read_only_true_for_exact_match(self):
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path="/home/user/skills", read_only=True),
],
)
assert sandbox._is_read_only_path("/home/user/skills") is True
def test_write_file_blocked_on_read_only(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path=str(skills_dir), read_only=True),
],
)
# Skills dir is read-only, write should be blocked
with pytest.raises(OSError) as exc_info:
sandbox.write_file("/mnt/skills/new_file.py", "content")
assert exc_info.value.errno == errno.EROFS
def test_write_file_allowed_on_writable_mount(self, tmp_path):
data_dir = tmp_path / "data"
data_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/data", local_path=str(data_dir), read_only=False),
],
)
sandbox.write_file("/mnt/data/file.txt", "content")
assert (data_dir / "file.txt").read_text() == "content"
def test_update_file_blocked_on_read_only(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
existing_file = skills_dir / "existing.py"
existing_file.write_bytes(b"original")
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path=str(skills_dir), read_only=True),
],
)
with pytest.raises(OSError) as exc_info:
sandbox.update_file("/mnt/skills/existing.py", b"updated")
assert exc_info.value.errno == errno.EROFS
class TestMultipleMounts:
def test_multiple_read_write_mounts(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
data_dir = tmp_path / "data"
data_dir.mkdir()
external_dir = tmp_path / "external"
external_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/skills", local_path=str(skills_dir), read_only=True),
PathMapping(container_path="/mnt/data", local_path=str(data_dir), read_only=False),
PathMapping(container_path="/mnt/external", local_path=str(external_dir), read_only=True),
],
)
# Skills is read-only
with pytest.raises(OSError):
sandbox.write_file("/mnt/skills/file.py", "content")
# Data is writable
sandbox.write_file("/mnt/data/file.txt", "data content")
assert (data_dir / "file.txt").read_text() == "data content"
# External is read-only
with pytest.raises(OSError):
sandbox.write_file("/mnt/external/file.txt", "content")
def test_nested_mounts_writable_under_readonly(self, tmp_path):
"""A writable mount nested under a read-only mount should allow writes."""
ro_dir = tmp_path / "ro"
ro_dir.mkdir()
rw_dir = ro_dir / "writable"
rw_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/repo", local_path=str(ro_dir), read_only=True),
PathMapping(container_path="/mnt/repo/writable", local_path=str(rw_dir), read_only=False),
],
)
# Parent mount is read-only
with pytest.raises(OSError):
sandbox.write_file("/mnt/repo/file.txt", "content")
# Nested writable mount should allow writes
sandbox.write_file("/mnt/repo/writable/file.txt", "content")
assert (rw_dir / "file.txt").read_text() == "content"
def test_execute_command_path_replacement(self, tmp_path, monkeypatch):
data_dir = tmp_path / "data"
data_dir.mkdir()
test_file = data_dir / "test.txt"
test_file.write_text("hello")
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/data", local_path=str(data_dir)),
],
)
# Mock subprocess to capture the resolved command
captured = {}
original_run = __import__("subprocess").run
def mock_run(*args, **kwargs):
if len(args) > 0:
captured["command"] = args[0]
return original_run(*args, **kwargs)
monkeypatch.setattr("deerflow.sandbox.local.local_sandbox.subprocess.run", mock_run)
monkeypatch.setattr("deerflow.sandbox.local.local_sandbox.LocalSandbox._get_shell", lambda self: "/bin/sh")
sandbox.execute_command("cat /mnt/data/test.txt")
# Verify the command received the resolved local path
assert str(data_dir) in captured.get("command", "")
def test_reverse_resolve_path_does_not_match_partial_prefix(self, tmp_path):
foo_dir = tmp_path / "foo"
foo_dir.mkdir()
foobar_dir = tmp_path / "foobar"
foobar_dir.mkdir()
target = foobar_dir / "file.txt"
target.write_text("test")
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/foo", local_path=str(foo_dir)),
],
)
resolved = sandbox._reverse_resolve_path(str(target))
assert resolved == str(target.resolve())
def test_reverse_resolve_paths_in_output_supports_backslash_separator(self, tmp_path):
mount_dir = tmp_path / "mount"
mount_dir.mkdir()
sandbox = LocalSandbox(
"test",
[
PathMapping(container_path="/mnt/data", local_path=str(mount_dir)),
],
)
output = f"Copied: {mount_dir}\\file.txt"
masked = sandbox._reverse_resolve_paths_in_output(output)
assert "/mnt/data/file.txt" in masked
assert str(mount_dir) not in masked
class TestLocalSandboxProviderMounts:
def test_setup_path_mappings_uses_configured_skills_container_path_as_reserved_prefix(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
custom_dir = tmp_path / "custom"
custom_dir.mkdir()
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
sandbox_config = SandboxConfig(
use="deerflow.sandbox.local:LocalSandboxProvider",
mounts=[
VolumeMountConfig(host_path=str(custom_dir), container_path="/custom-skills/nested", read_only=False),
],
)
config = SimpleNamespace(
skills=SimpleNamespace(container_path="/custom-skills", get_skills_path=lambda: skills_dir),
sandbox=sandbox_config,
)
with patch("deerflow.config.get_app_config", return_value=config):
provider = LocalSandboxProvider()
assert [m.container_path for m in provider._path_mappings] == ["/custom-skills"]
def test_setup_path_mappings_skips_relative_host_path(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
sandbox_config = SandboxConfig(
use="deerflow.sandbox.local:LocalSandboxProvider",
mounts=[
VolumeMountConfig(host_path="relative/path", container_path="/mnt/data", read_only=False),
],
)
config = SimpleNamespace(
skills=SimpleNamespace(container_path="/mnt/skills", get_skills_path=lambda: skills_dir),
sandbox=sandbox_config,
)
with patch("deerflow.config.get_app_config", return_value=config):
provider = LocalSandboxProvider()
assert [m.container_path for m in provider._path_mappings] == ["/mnt/skills"]
def test_setup_path_mappings_skips_non_absolute_container_path(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
custom_dir = tmp_path / "custom"
custom_dir.mkdir()
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
sandbox_config = SandboxConfig(
use="deerflow.sandbox.local:LocalSandboxProvider",
mounts=[
VolumeMountConfig(host_path=str(custom_dir), container_path="mnt/data", read_only=False),
],
)
config = SimpleNamespace(
skills=SimpleNamespace(container_path="/mnt/skills", get_skills_path=lambda: skills_dir),
sandbox=sandbox_config,
)
with patch("deerflow.config.get_app_config", return_value=config):
provider = LocalSandboxProvider()
assert [m.container_path for m in provider._path_mappings] == ["/mnt/skills"]
def test_setup_path_mappings_normalizes_container_path_trailing_slash(self, tmp_path):
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
custom_dir = tmp_path / "custom"
custom_dir.mkdir()
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
sandbox_config = SandboxConfig(
use="deerflow.sandbox.local:LocalSandboxProvider",
mounts=[
VolumeMountConfig(host_path=str(custom_dir), container_path="/mnt/data/", read_only=False),
],
)
config = SimpleNamespace(
skills=SimpleNamespace(container_path="/mnt/skills", get_skills_path=lambda: skills_dir),
sandbox=sandbox_config,
)
with patch("deerflow.config.get_app_config", return_value=config):
provider = LocalSandboxProvider()
assert [m.container_path for m in provider._path_mappings] == ["/mnt/skills", "/mnt/data"]
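
The path-resolution tests above expect three things: the longest container prefix wins, unmapped paths pass through unchanged, and a prefix only matches on a path-segment boundary. A small sketch of that lookup is shown below. `Mapping` and `resolve_path` are hypothetical stand-ins for illustration and are not the `LocalSandbox` code itself.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    # Hypothetical stand-in for PathMapping, reduced to what the sketch needs.
    container_path: str
    local_path: str
    read_only: bool = False


def resolve_path(path: str, mappings: list[Mapping]) -> str:
    """Translate a container path to a host path using the longest matching prefix."""
    # Try longer container prefixes first so /mnt/skills wins over /mnt.
    for m in sorted(mappings, key=lambda m: len(m.container_path), reverse=True):
        prefix = m.container_path.rstrip("/")
        if path == prefix:
            return m.local_path
        # Require a "/" boundary so /mnt/skills does not match /mnt/skillsets.
        if path.startswith(prefix + "/"):
            return m.local_path + path[len(prefix):]
    return path  # No mapping applies; leave the path unchanged.
```

Sorting by prefix length on each call keeps the sketch short; a real implementation could sort once when the mappings are configured.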


@ -1,5 +1,6 @@
"""Tests for LoopDetectionMiddleware."""
import copy
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
@ -19,8 +20,13 @@ def _make_runtime(thread_id="test-thread"):
def _make_state(tool_calls=None, content=""):
"""Build a minimal AgentState dict with an AIMessage."""
msg = AIMessage(content=content, tool_calls=tool_calls or [])
"""Build a minimal AgentState dict with an AIMessage.
Deep-copies *content* when it is mutable (e.g. list) so that
successive calls never share the same object reference.
"""
safe_content = copy.deepcopy(content) if isinstance(content, list) else content
msg = AIMessage(content=safe_content, tool_calls=tool_calls or [])
return {"messages": [msg]}
@ -229,3 +235,114 @@ class TestLoopDetection:
mw._apply(_make_state(tool_calls=call), runtime)
assert "default" in mw._history
class TestAppendText:
"""Unit tests for LoopDetectionMiddleware._append_text."""
def test_none_content_returns_text(self):
result = LoopDetectionMiddleware._append_text(None, "hello")
assert result == "hello"
def test_str_content_concatenates(self):
result = LoopDetectionMiddleware._append_text("existing", "appended")
assert result == "existing\n\nappended"
def test_empty_str_content_concatenates(self):
result = LoopDetectionMiddleware._append_text("", "appended")
assert result == "\n\nappended"
def test_list_content_appends_text_block(self):
"""List content (e.g. Anthropic thinking mode) should get a new text block."""
content = [
{"type": "thinking", "text": "Let me think..."},
{"type": "text", "text": "Here is my answer"},
]
result = LoopDetectionMiddleware._append_text(content, "stop msg")
assert isinstance(result, list)
assert len(result) == 3
assert result[0] == content[0]
assert result[1] == content[1]
assert result[2] == {"type": "text", "text": "\n\nstop msg"}
def test_empty_list_content_appends_text_block(self):
result = LoopDetectionMiddleware._append_text([], "stop msg")
assert isinstance(result, list)
assert len(result) == 1
assert result[0] == {"type": "text", "text": "\n\nstop msg"}
def test_unexpected_type_coerced_to_str(self):
"""Unexpected content types should be coerced to str as a fallback."""
result = LoopDetectionMiddleware._append_text(42, "stop msg")
assert isinstance(result, str)
assert result == "42\n\nstop msg"
def test_list_content_not_mutated_in_place(self):
"""_append_text must not modify the original list."""
original = [{"type": "text", "text": "hello"}]
result = LoopDetectionMiddleware._append_text(original, "appended")
assert len(original) == 1 # original unchanged
assert len(result) == 2 # new list has the appended block
class TestHardStopWithListContent:
"""Regression tests: hard stop must not crash when AIMessage.content is a list."""
def test_hard_stop_with_list_content(self):
"""Hard stop on list content should not raise TypeError (regression)."""
mw = LoopDetectionMiddleware(warn_threshold=2, hard_limit=4)
runtime = _make_runtime()
call = [_bash_call("ls")]
# Build state with list content (e.g. Anthropic thinking mode)
list_content = [
{"type": "thinking", "text": "Let me think..."},
{"type": "text", "text": "I'll run ls"},
]
for _ in range(3):
mw._apply(_make_state(tool_calls=call, content=list_content), runtime)
# Fourth call triggers hard stop — must not raise TypeError
result = mw._apply(_make_state(tool_calls=call, content=list_content), runtime)
assert result is not None
msg = result["messages"][0]
assert isinstance(msg, AIMessage)
assert msg.tool_calls == []
# Content should remain a list with the stop message appended
assert isinstance(msg.content, list)
assert len(msg.content) == 3
assert msg.content[2]["type"] == "text"
assert _HARD_STOP_MSG in msg.content[2]["text"]
def test_hard_stop_with_none_content(self):
"""Hard stop on default (empty-string) content should produce a plain string."""
mw = LoopDetectionMiddleware(warn_threshold=2, hard_limit=4)
runtime = _make_runtime()
call = [_bash_call("ls")]
for _ in range(3):
mw._apply(_make_state(tool_calls=call), runtime)
# Fourth call with default empty-string content
result = mw._apply(_make_state(tool_calls=call), runtime)
assert result is not None
msg = result["messages"][0]
assert isinstance(msg.content, str)
assert _HARD_STOP_MSG in msg.content
def test_hard_stop_with_str_content(self):
"""Hard stop on str content should concatenate the stop message."""
mw = LoopDetectionMiddleware(warn_threshold=2, hard_limit=4)
runtime = _make_runtime()
call = [_bash_call("ls")]
for _ in range(3):
mw._apply(_make_state(tool_calls=call, content="thinking..."), runtime)
result = mw._apply(_make_state(tool_calls=call, content="thinking..."), runtime)
assert result is not None
msg = result["messages"][0]
assert isinstance(msg.content, str)
assert msg.content.startswith("thinking...")
assert _HARD_STOP_MSG in msg.content
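
The `TestAppendText` cases above fully describe the expected behaviour for each content type. A minimal sketch that satisfies them is given here; the free function `append_text` is a hypothetical stand-in for `LoopDetectionMiddleware._append_text`, written only from what the tests assert.

```python
from typing import Any


def append_text(content: Any, text: str) -> Any:
    """Append `text` to message content that may be None, a str, or a list of blocks."""
    if content is None:
        return text
    if isinstance(content, list):
        # Build a new list so the caller's list is never mutated in place.
        return [*content, {"type": "text", "text": f"\n\n{text}"}]
    if isinstance(content, str):
        return content + f"\n\n{text}"
    # Fallback: coerce unexpected types to str before concatenating.
    return str(content) + f"\n\n{text}"
```

Returning a new list rather than calling `append` is what makes the "not mutated in place" test pass.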


@ -119,3 +119,57 @@ def test_format_memory_skips_non_string_content_facts() -> None:
# The formatted line for a list content would be "- [knowledge | 0.85] ['list']".
assert "| 0.85]" not in result
assert "Valid fact" in result
def test_format_memory_renders_correction_source_error() -> None:
memory_data = {
"facts": [
{
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
"sourceError": "The agent previously suggested npm start.",
}
]
}
result = format_memory_for_injection(memory_data, max_tokens=2000)
assert "Use make dev for local development." in result
assert "avoid: The agent previously suggested npm start." in result
def test_format_memory_renders_correction_without_source_error_normally() -> None:
memory_data = {
"facts": [
{
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
}
]
}
result = format_memory_for_injection(memory_data, max_tokens=2000)
assert "Use make dev for local development." in result
assert "avoid:" not in result
def test_format_memory_includes_long_term_background() -> None:
"""longTermBackground in history must be injected into the prompt."""
memory_data = {
"user": {},
"history": {
"recentMonths": {"summary": "Recent activity summary"},
"earlierContext": {"summary": "Earlier context summary"},
"longTermBackground": {"summary": "Core expertise in distributed systems"},
},
"facts": [],
}
result = format_memory_for_injection(memory_data, max_tokens=2000)
assert "Background: Core expertise in distributed systems" in result
assert "Recent: Recent activity summary" in result
assert "Earlier: Earlier context summary" in result


@ -0,0 +1,50 @@
from unittest.mock import MagicMock, patch
from deerflow.agents.memory.queue import ConversationContext, MemoryUpdateQueue
from deerflow.config.memory_config import MemoryConfig
def _memory_config(**overrides: object) -> MemoryConfig:
config = MemoryConfig()
for key, value in overrides.items():
setattr(config, key, value)
return config
def test_queue_add_preserves_existing_correction_flag_for_same_thread() -> None:
queue = MemoryUpdateQueue()
with (
patch("deerflow.agents.memory.queue.get_memory_config", return_value=_memory_config(enabled=True)),
patch.object(queue, "_reset_timer"),
):
queue.add(thread_id="thread-1", messages=["first"], correction_detected=True)
queue.add(thread_id="thread-1", messages=["second"], correction_detected=False)
assert len(queue._queue) == 1
assert queue._queue[0].messages == ["second"]
assert queue._queue[0].correction_detected is True
def test_process_queue_forwards_correction_flag_to_updater() -> None:
queue = MemoryUpdateQueue()
queue._queue = [
ConversationContext(
thread_id="thread-1",
messages=["conversation"],
agent_name="lead_agent",
correction_detected=True,
)
]
mock_updater = MagicMock()
mock_updater.update_memory.return_value = True
with patch("deerflow.agents.memory.updater.MemoryUpdater", return_value=mock_updater):
queue._process_queue()
mock_updater.update_memory.assert_called_once_with(
messages=["conversation"],
thread_id="thread-1",
agent_name="lead_agent",
correction_detected=True,
)
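
The first queue test above expects that a second `add` for the same thread replaces the pending messages but keeps an earlier `correction_detected=True`. One way to get that behaviour is to OR the new flag with the previous entry's flag. The sketch below uses hypothetical `Context` and `PendingUpdates` classes and leaves out the timer and config checks of the real `MemoryUpdateQueue`.

```python
from dataclasses import dataclass


@dataclass
class Context:
    # Hypothetical, reduced version of ConversationContext.
    thread_id: str
    messages: list
    correction_detected: bool = False


class PendingUpdates:
    """Keeps at most one pending update per thread."""

    def __init__(self) -> None:
        self._queue: list[Context] = []

    def add(self, thread_id: str, messages: list, correction_detected: bool = False) -> None:
        previous = next((c for c in self._queue if c.thread_id == thread_id), None)
        # Once a correction was seen for this thread, keep the flag sticky.
        merged = correction_detected or (previous is not None and previous.correction_detected)
        # Replace the older entry for the thread with the newest messages.
        self._queue = [c for c in self._queue if c.thread_id != thread_id]
        self._queue.append(Context(thread_id, messages, merged))
```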


@ -72,6 +72,56 @@ def test_import_memory_route_returns_imported_memory() -> None:
assert response.json()["facts"] == imported_memory["facts"]
def test_export_memory_route_preserves_source_error() -> None:
app = FastAPI()
app.include_router(memory.router)
exported_memory = _sample_memory(
facts=[
{
"id": "fact_correction",
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
"createdAt": "2026-03-20T00:00:00Z",
"source": "thread-1",
"sourceError": "The agent previously suggested npm start.",
}
]
)
with patch("app.gateway.routers.memory.get_memory_data", return_value=exported_memory):
with TestClient(app) as client:
response = client.get("/api/memory/export")
assert response.status_code == 200
assert response.json()["facts"][0]["sourceError"] == "The agent previously suggested npm start."
def test_import_memory_route_preserves_source_error() -> None:
app = FastAPI()
app.include_router(memory.router)
imported_memory = _sample_memory(
facts=[
{
"id": "fact_correction",
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
"createdAt": "2026-03-20T00:00:00Z",
"source": "thread-1",
"sourceError": "The agent previously suggested npm start.",
}
]
)
with patch("app.gateway.routers.memory.import_memory_data", return_value=imported_memory):
with TestClient(app) as client:
response = client.post("/api/memory/import", json=imported_memory)
assert response.status_code == 200
assert response.json()["facts"][0]["sourceError"] == "The agent previously suggested npm start."
def test_clear_memory_route_returns_cleared_memory() -> None:
app = FastAPI()
app.include_router(memory.router)


@ -146,6 +146,53 @@ def test_apply_updates_preserves_threshold_and_max_facts_trimming() -> None:
assert result["facts"][1]["source"] == "thread-9"
def test_apply_updates_preserves_source_error() -> None:
updater = MemoryUpdater()
current_memory = _make_memory()
update_data = {
"newFacts": [
{
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
"sourceError": "The agent previously suggested npm start.",
}
]
}
with patch(
"deerflow.agents.memory.updater.get_memory_config",
return_value=_memory_config(max_facts=100, fact_confidence_threshold=0.7),
):
result = updater._apply_updates(current_memory, update_data, thread_id="thread-correction")
assert result["facts"][0]["sourceError"] == "The agent previously suggested npm start."
assert result["facts"][0]["category"] == "correction"
def test_apply_updates_ignores_empty_source_error() -> None:
updater = MemoryUpdater()
current_memory = _make_memory()
update_data = {
"newFacts": [
{
"content": "Use make dev for local development.",
"category": "correction",
"confidence": 0.95,
"sourceError": " ",
}
]
}
with patch(
"deerflow.agents.memory.updater.get_memory_config",
return_value=_memory_config(max_facts=100, fact_confidence_threshold=0.7),
):
result = updater._apply_updates(current_memory, update_data, thread_id="thread-correction")
assert "sourceError" not in result["facts"][0]
def test_clear_memory_data_resets_all_sections() -> None:
with patch("deerflow.agents.memory.updater._save_memory_to_file", return_value=True):
result = clear_memory_data()
@ -522,3 +569,53 @@ class TestUpdateMemoryStructuredResponse:
result = updater.update_memory([msg, ai_msg])
assert result is True
def test_correction_hint_injected_when_detected(self):
updater = MemoryUpdater()
valid_json = '{"user": {}, "history": {}, "newFacts": [], "factsToRemove": []}'
model = self._make_mock_model(valid_json)
with (
patch.object(updater, "_get_model", return_value=model),
patch("deerflow.agents.memory.updater.get_memory_config", return_value=_memory_config(enabled=True)),
patch("deerflow.agents.memory.updater.get_memory_data", return_value=_make_memory()),
patch("deerflow.agents.memory.updater.get_memory_storage", return_value=MagicMock(save=MagicMock(return_value=True))),
):
msg = MagicMock()
msg.type = "human"
msg.content = "No, that's wrong."
ai_msg = MagicMock()
ai_msg.type = "ai"
ai_msg.content = "Understood"
ai_msg.tool_calls = []
result = updater.update_memory([msg, ai_msg], correction_detected=True)
assert result is True
prompt = model.invoke.call_args[0][0]
assert "Explicit correction signals were detected" in prompt
def test_correction_hint_empty_when_not_detected(self):
updater = MemoryUpdater()
valid_json = '{"user": {}, "history": {}, "newFacts": [], "factsToRemove": []}'
model = self._make_mock_model(valid_json)
with (
patch.object(updater, "_get_model", return_value=model),
patch("deerflow.agents.memory.updater.get_memory_config", return_value=_memory_config(enabled=True)),
patch("deerflow.agents.memory.updater.get_memory_data", return_value=_make_memory()),
patch("deerflow.agents.memory.updater.get_memory_storage", return_value=MagicMock(save=MagicMock(return_value=True))),
):
msg = MagicMock()
msg.type = "human"
msg.content = "Let's talk about memory."
ai_msg = MagicMock()
ai_msg.type = "ai"
ai_msg.content = "Sure"
ai_msg.tool_calls = []
result = updater.update_memory([msg, ai_msg], correction_detected=False)
assert result is True
prompt = model.invoke.call_args[0][0]
assert "Explicit correction signals were detected" not in prompt


@ -10,7 +10,7 @@ persisting in long-term memory:
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from deerflow.agents.memory.updater import _strip_upload_mentions_from_memory
from deerflow.agents.middlewares.memory_middleware import _filter_messages_for_memory
from deerflow.agents.middlewares.memory_middleware import _filter_messages_for_memory, detect_correction
# ---------------------------------------------------------------------------
# Helpers
@ -134,6 +134,64 @@ class TestFilterMessagesForMemory:
assert "<uploaded_files>" not in all_content
# ===========================================================================
# detect_correction
# ===========================================================================
class TestDetectCorrection:
def test_detects_english_correction_signal(self):
msgs = [
_human("Please help me run the project."),
_ai("Use npm start."),
_human("That's wrong, use make dev instead."),
_ai("Understood."),
]
assert detect_correction(msgs) is True
def test_detects_chinese_correction_signal(self):
msgs = [
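_human("帮我启动项目"),  # "Help me start the project"
_ai("用 npm start"),  # "Use npm start"
_human("不对,改用 make dev"),  # "That's wrong, use make dev instead"
_ai("明白了"),  # "Understood"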
]
assert detect_correction(msgs) is True
def test_returns_false_without_signal(self):
msgs = [
_human("Please explain the build setup."),
_ai("Here is the build setup."),
_human("Thanks, that makes sense."),
]
assert detect_correction(msgs) is False
def test_only_checks_recent_messages(self):
msgs = [
_human("That is wrong, use make dev instead."),
_ai("Noted."),
_human("Let's discuss tests."),
_ai("Sure."),
_human("What about linting?"),
_ai("Use ruff."),
_human("And formatting?"),
_ai("Use make format."),
]
assert detect_correction(msgs) is False
def test_handles_list_content(self):
msgs = [
HumanMessage(content=["That is wrong,", {"type": "text", "text": "use make dev instead."}]),
_ai("Updated."),
]
assert detect_correction(msgs) is True
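
The `TestDetectCorrection` cases above imply a detector that scans only recent human messages, recognises English and Chinese correction phrases, and flattens list-style content before matching. A rough sketch along those lines follows. The function name `looks_like_correction`, the two patterns, the window of six messages, and the dict-based message shape are all assumptions for illustration; the real `detect_correction` may use different signals and limits.

```python
import re

# Hypothetical signal list; the real middleware may recognise many more phrases.
_CORRECTION_PATTERNS = (
    re.compile(r"\bthat(?:'s| is) wrong\b", re.IGNORECASE),
    re.compile(r"不对"),
)
_RECENT_WINDOW = 6  # Assumed number of trailing messages to inspect.


def _text_of(content) -> str:
    """Flatten str or list-of-blocks content into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and isinstance(block.get("text"), str):
                parts.append(block["text"])
        return " ".join(parts)
    return ""


def looks_like_correction(messages: list[dict]) -> bool:
    """Return True if a recent human message contains a correction phrase."""
    for message in messages[-_RECENT_WINDOW:]:
        if message.get("role") != "human":
            continue
        text = _text_of(message.get("content"))
        if any(p.search(text) for p in _CORRECTION_PATTERNS):
            return True
    return False
```

Limiting the scan to a trailing window keeps an old, already-handled correction from flagging every later update for the same thread.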
# ===========================================================================
# _strip_upload_mentions_from_memory
# ===========================================================================


@ -73,7 +73,7 @@ def _patch_factory(monkeypatch, app_config: AppConfig, model_class=FakeChatModel
"""Patch get_app_config, resolve_class, and tracing for isolated unit tests."""
monkeypatch.setattr(factory_module, "get_app_config", lambda: app_config)
monkeypatch.setattr(factory_module, "resolve_class", lambda path, base: model_class)
monkeypatch.setattr(factory_module, "is_tracing_enabled", lambda: False)
monkeypatch.setattr(factory_module, "build_tracing_callbacks", lambda: [])
# ---------------------------------------------------------------------------
@ -95,12 +95,23 @@ def test_uses_first_model_when_name_is_none(monkeypatch):
def test_raises_when_model_not_found(monkeypatch):
cfg = _make_app_config([_make_model("only-model")])
monkeypatch.setattr(factory_module, "get_app_config", lambda: cfg)
monkeypatch.setattr(factory_module, "is_tracing_enabled", lambda: False)
monkeypatch.setattr(factory_module, "build_tracing_callbacks", lambda: [])
with pytest.raises(ValueError, match="ghost-model"):
factory_module.create_chat_model(name="ghost-model")
def test_appends_all_tracing_callbacks(monkeypatch):
cfg = _make_app_config([_make_model("alpha")])
_patch_factory(monkeypatch, cfg)
monkeypatch.setattr(factory_module, "build_tracing_callbacks", lambda: ["smith-callback", "langfuse-callback"])
FakeChatModel.captured_kwargs = {}
model = factory_module.create_chat_model(name="alpha")
assert model.callbacks == ["smith-callback", "langfuse-callback"]
# ---------------------------------------------------------------------------
# thinking_enabled=True
# ---------------------------------------------------------------------------


@ -0,0 +1,393 @@
from types import SimpleNamespace
from unittest.mock import patch
from deerflow.community.aio_sandbox.aio_sandbox import AioSandbox
from deerflow.sandbox.local.local_sandbox import LocalSandbox
from deerflow.sandbox.search import GrepMatch, find_glob_matches, find_grep_matches
from deerflow.sandbox.tools import glob_tool, grep_tool
def _make_runtime(tmp_path):
workspace = tmp_path / "workspace"
uploads = tmp_path / "uploads"
outputs = tmp_path / "outputs"
workspace.mkdir()
uploads.mkdir()
outputs.mkdir()
return SimpleNamespace(
state={
"sandbox": {"sandbox_id": "local"},
"thread_data": {
"workspace_path": str(workspace),
"uploads_path": str(uploads),
"outputs_path": str(outputs),
},
},
context={"thread_id": "thread-1"},
)
def test_glob_tool_returns_virtual_paths_and_ignores_common_dirs(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "app.py").write_text("print('hi')\n", encoding="utf-8")
(workspace / "pkg").mkdir()
(workspace / "pkg" / "util.py").write_text("print('util')\n", encoding="utf-8")
(workspace / "node_modules").mkdir()
(workspace / "node_modules" / "skip.py").write_text("ignored\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
result = glob_tool.func(
runtime=runtime,
description="find python files",
pattern="**/*.py",
path="/mnt/user-data/workspace",
)
assert "/mnt/user-data/workspace/app.py" in result
assert "/mnt/user-data/workspace/pkg/util.py" in result
assert "node_modules" not in result
assert str(workspace) not in result
def test_glob_tool_supports_skills_virtual_paths(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
skills_dir = tmp_path / "skills"
(skills_dir / "public" / "demo").mkdir(parents=True)
(skills_dir / "public" / "demo" / "SKILL.md").write_text("# Demo\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
with (
patch("deerflow.sandbox.tools._get_skills_container_path", return_value="/mnt/skills"),
patch("deerflow.sandbox.tools._get_skills_host_path", return_value=str(skills_dir)),
):
result = glob_tool.func(
runtime=runtime,
description="find skills",
pattern="**/SKILL.md",
path="/mnt/skills",
)
assert "/mnt/skills/public/demo/SKILL.md" in result
assert str(skills_dir) not in result
def test_grep_tool_filters_by_glob_and_skips_binary_files(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "main.py").write_text("TODO = 'ship it'\nprint(TODO)\n", encoding="utf-8")
(workspace / "notes.txt").write_text("TODO in txt should be filtered\n", encoding="utf-8")
(workspace / "image.bin").write_bytes(b"\0binary TODO")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
result = grep_tool.func(
runtime=runtime,
description="find todo references",
pattern="TODO",
path="/mnt/user-data/workspace",
glob="**/*.py",
)
assert "/mnt/user-data/workspace/main.py:1: TODO = 'ship it'" in result
assert "notes.txt" not in result
assert "image.bin" not in result
assert str(workspace) not in result
def test_grep_tool_truncates_results(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "main.py").write_text("TODO one\nTODO two\nTODO three\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
# Prevent config.yaml tool config from overriding the caller-supplied max_results=2.
monkeypatch.setattr("deerflow.sandbox.tools.get_app_config", lambda: SimpleNamespace(get_tool_config=lambda name: None))
result = grep_tool.func(
runtime=runtime,
description="limit matches",
pattern="TODO",
path="/mnt/user-data/workspace",
max_results=2,
)
assert "Found 2 matches under /mnt/user-data/workspace (showing first 2)" in result
assert "TODO one" in result
assert "TODO two" in result
assert "TODO three" not in result
assert "Results truncated." in result
def test_glob_tool_include_dirs_filters_nested_ignored_paths(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "src").mkdir()
(workspace / "src" / "main.py").write_text("x\n", encoding="utf-8")
(workspace / "node_modules").mkdir()
(workspace / "node_modules" / "lib").mkdir()
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
result = glob_tool.func(
runtime=runtime,
description="find dirs",
pattern="**",
path="/mnt/user-data/workspace",
include_dirs=True,
)
assert "src" in result
assert "node_modules" not in result
def test_grep_tool_literal_mode(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "file.py").write_text("price = (a+b)\nresult = a+b\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
# literal=True should treat (a+b) as a plain string, not a regex group
result = grep_tool.func(
runtime=runtime,
description="literal search",
pattern="(a+b)",
path="/mnt/user-data/workspace",
literal=True,
)
assert "price = (a+b)" in result
assert "result = a+b" not in result
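The literal-mode test above can be illustrated with a small standalone sketch. It assumes literal mode is implemented by escaping the pattern before compiling it, and that matching is case-insensitive unless `case_sensitive=True` is passed; `compile_pattern` is a hypothetical helper, not a function from `deerflow.sandbox`.

```python
import re


def compile_pattern(pattern: str, literal: bool = False, case_sensitive: bool = False) -> re.Pattern:
    """Hypothetical helper: build the regex used to scan each line."""
    # In literal mode every regex metacharacter is escaped, so "(a+b)" is
    # matched character by character instead of as a group with a quantifier.
    source = re.escape(pattern) if literal else pattern
    # Assumption: searches ignore case unless the caller asks otherwise.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(source, flags)


lines = ["price = (a+b)", "result = a+b"]
literal_hits = [line for line in lines if compile_pattern("(a+b)", literal=True).search(line)]
regex_hits = [line for line in lines if compile_pattern("(a+b)").search(line)]
```

Read as a regex, `(a+b)` means "one or more `a` followed by `b`", which neither sample line contains, so only the literal search finds the first line.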
def test_grep_tool_case_sensitive(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "file.py").write_text("TODO: fix\ntodo: also fix\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
result = grep_tool.func(
runtime=runtime,
description="case sensitive search",
pattern="TODO",
path="/mnt/user-data/workspace",
case_sensitive=True,
)
assert "TODO: fix" in result
assert "todo: also fix" not in result
def test_grep_tool_invalid_regex_returns_error(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
result = grep_tool.func(
runtime=runtime,
description="bad pattern",
pattern="[invalid",
path="/mnt/user-data/workspace",
)
assert "Invalid regex pattern" in result
def test_aio_sandbox_glob_include_dirs_filters_nested_ignored(monkeypatch) -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
monkeypatch.setattr(
sandbox._client.file,
"list_path",
lambda **kwargs: SimpleNamespace(
data=SimpleNamespace(
files=[
SimpleNamespace(name="src", path="/mnt/workspace/src"),
SimpleNamespace(name="node_modules", path="/mnt/workspace/node_modules"),
# child of node_modules — should be filtered via should_ignore_path
SimpleNamespace(name="lib", path="/mnt/workspace/node_modules/lib"),
]
)
),
)
matches, truncated = sandbox.glob("/mnt/workspace", "**", include_dirs=True)
assert "/mnt/workspace/src" in matches
assert "/mnt/workspace/node_modules" not in matches
assert "/mnt/workspace/node_modules/lib" not in matches
assert truncated is False
def test_aio_sandbox_grep_invalid_regex_raises() -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
import re
try:
sandbox.grep("/mnt/workspace", "[invalid")
assert False, "Expected re.error"
except re.error:
pass
def test_aio_sandbox_glob_parses_json(monkeypatch) -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
monkeypatch.setattr(
sandbox._client.file,
"find_files",
lambda **kwargs: SimpleNamespace(data=SimpleNamespace(files=["/mnt/user-data/workspace/app.py", "/mnt/user-data/workspace/node_modules/skip.py"])),
)
matches, truncated = sandbox.glob("/mnt/user-data/workspace", "**/*.py")
assert matches == ["/mnt/user-data/workspace/app.py"]
assert truncated is False
def test_aio_sandbox_grep_parses_json(monkeypatch) -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
monkeypatch.setattr(
sandbox._client.file,
"list_path",
lambda **kwargs: SimpleNamespace(
data=SimpleNamespace(
files=[
SimpleNamespace(
name="app.py",
path="/mnt/user-data/workspace/app.py",
is_directory=False,
)
]
)
),
)
monkeypatch.setattr(
sandbox._client.file,
"search_in_file",
lambda **kwargs: SimpleNamespace(data=SimpleNamespace(line_numbers=[7], matches=["TODO = True"])),
)
matches, truncated = sandbox.grep("/mnt/user-data/workspace", "TODO")
assert matches == [GrepMatch(path="/mnt/user-data/workspace/app.py", line_number=7, line="TODO = True")]
assert truncated is False
def test_find_glob_matches_raises_not_a_directory(tmp_path) -> None:
file_path = tmp_path / "file.txt"
file_path.write_text("x\n", encoding="utf-8")
try:
find_glob_matches(file_path, "**/*.py")
assert False, "Expected NotADirectoryError"
except NotADirectoryError:
pass
def test_find_grep_matches_raises_not_a_directory(tmp_path) -> None:
file_path = tmp_path / "file.txt"
file_path.write_text("TODO\n", encoding="utf-8")
try:
find_grep_matches(file_path, "TODO")
assert False, "Expected NotADirectoryError"
except NotADirectoryError:
pass
def test_find_grep_matches_skips_symlink_outside_root(tmp_path) -> None:
workspace = tmp_path / "workspace"
workspace.mkdir()
outside = tmp_path / "outside.txt"
outside.write_text("TODO outside\n", encoding="utf-8")
(workspace / "outside-link.txt").symlink_to(outside)
matches, truncated = find_grep_matches(workspace, "TODO")
assert matches == []
assert truncated is False
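The symlink test above expects files that resolve outside the search root to be skipped. A minimal sketch of one way to make that check, assuming the decision is made on fully resolved paths; `is_within_root` is a hypothetical name, and the real `find_grep_matches` may do this differently.

```python
import tempfile
from pathlib import Path


def is_within_root(candidate: Path, root: Path) -> bool:
    """Return True only if candidate, with symlinks resolved, stays under root."""
    resolved_root = root.resolve()
    resolved = candidate.resolve()
    return resolved == resolved_root or resolved_root in resolved.parents


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    workspace = base / "workspace"
    workspace.mkdir()
    (workspace / "inside.txt").write_text("TODO inside\n", encoding="utf-8")
    outside = base / "outside.txt"
    outside.write_text("TODO outside\n", encoding="utf-8")
    link = workspace / "outside-link.txt"
    link.symlink_to(outside)  # lives in the workspace, points outside it

    inside_ok = is_within_root(workspace / "inside.txt", workspace)
    link_ok = is_within_root(link, workspace)
```

Comparing resolved paths rather than raw strings is what lets the link be rejected even though its own location is inside the workspace.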
def test_glob_tool_honors_smaller_requested_max_results(tmp_path, monkeypatch) -> None:
runtime = _make_runtime(tmp_path)
workspace = tmp_path / "workspace"
(workspace / "a.py").write_text("print('a')\n", encoding="utf-8")
(workspace / "b.py").write_text("print('b')\n", encoding="utf-8")
(workspace / "c.py").write_text("print('c')\n", encoding="utf-8")
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
monkeypatch.setattr(
"deerflow.sandbox.tools.get_app_config",
lambda: SimpleNamespace(get_tool_config=lambda name: SimpleNamespace(model_extra={"max_results": 50})),
)
result = glob_tool.func(
runtime=runtime,
description="limit glob matches",
pattern="**/*.py",
path="/mnt/user-data/workspace",
max_results=2,
)
assert "Found 2 paths under /mnt/user-data/workspace (showing first 2)" in result
assert "Results truncated." in result
def test_aio_sandbox_glob_include_dirs_enforces_root_boundary(monkeypatch) -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
monkeypatch.setattr(
sandbox._client.file,
"list_path",
lambda **kwargs: SimpleNamespace(
data=SimpleNamespace(
files=[
SimpleNamespace(name="src", path="/mnt/workspace/src"),
SimpleNamespace(name="src2", path="/mnt/workspace2/src2"),
]
)
),
)
matches, truncated = sandbox.glob("/mnt/workspace", "**", include_dirs=True)
assert matches == ["/mnt/workspace/src"]
assert truncated is False
def test_aio_sandbox_grep_skips_mismatched_line_number_payloads(monkeypatch) -> None:
with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
monkeypatch.setattr(
sandbox._client.file,
"list_path",
lambda **kwargs: SimpleNamespace(
data=SimpleNamespace(
files=[
SimpleNamespace(
name="app.py",
path="/mnt/user-data/workspace/app.py",
is_directory=False,
)
]
)
),
)
monkeypatch.setattr(
sandbox._client.file,
"search_in_file",
lambda **kwargs: SimpleNamespace(data=SimpleNamespace(line_numbers=[7], matches=["TODO = True", "extra"])),
)
matches, truncated = sandbox.grep("/mnt/user-data/workspace", "TODO")
assert matches == [GrepMatch(path="/mnt/user-data/workspace/app.py", line_number=7, line="TODO = True")]
assert truncated is False


@ -1,3 +1,4 @@
import threading
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch
@ -7,7 +8,10 @@ import pytest
from deerflow.sandbox.tools import (
VIRTUAL_PATH_PREFIX,
_apply_cwd_prefix,
_get_custom_mount_for_path,
_get_custom_mounts,
_is_acp_workspace_path,
_is_custom_mount_path,
_is_skills_path,
_reject_path_traversal,
_resolve_acp_workspace_path,
@ -17,8 +21,10 @@ from deerflow.sandbox.tools import (
mask_local_paths_in_output,
replace_virtual_path,
replace_virtual_paths_in_command,
str_replace_tool,
validate_local_bash_command_paths,
validate_local_tool_path,
write_file_tool,
)
_THREAD_DATA = {
@ -93,6 +99,25 @@ def test_validate_local_tool_path_rejects_non_virtual_path() -> None:
validate_local_tool_path("/Users/someone/config.yaml", _THREAD_DATA)
def test_validate_local_tool_path_rejects_non_virtual_path_mentions_configured_mounts() -> None:
with pytest.raises(PermissionError, match="configured mount paths"):
validate_local_tool_path("/Users/someone/config.yaml", _THREAD_DATA)
def test_validate_local_tool_path_prioritizes_user_data_before_custom_mounts() -> None:
from deerflow.config.sandbox_config import VolumeMountConfig
mounts = [
VolumeMountConfig(host_path="/tmp/host-user-data", container_path=VIRTUAL_PATH_PREFIX, read_only=False),
]
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=mounts):
validate_local_tool_path(f"{VIRTUAL_PATH_PREFIX}/workspace/file.txt", _THREAD_DATA, read_only=True)
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=mounts):
with pytest.raises(PermissionError, match="path traversal"):
validate_local_tool_path(f"{VIRTUAL_PATH_PREFIX}/workspace/../../etc/passwd", _THREAD_DATA, read_only=True)
def test_validate_local_tool_path_rejects_bare_virtual_root() -> None:
"""The bare /mnt/user-data root without trailing slash is not a valid sub-path."""
with pytest.raises(PermissionError, match="Only paths under"):
@ -321,6 +346,56 @@ def test_validate_local_bash_command_paths_allows_skills_path() -> None:
)
def test_validate_local_bash_command_paths_allows_urls() -> None:
"""URLs in bash commands should not be mistaken for absolute paths (issue #1385)."""
# HTTPS URLs
validate_local_bash_command_paths(
"curl -X POST https://example.com/api/v1/risk/check",
_THREAD_DATA,
)
# HTTP URLs
validate_local_bash_command_paths(
"curl http://localhost:8080/health",
_THREAD_DATA,
)
# URLs with query strings
validate_local_bash_command_paths(
"curl https://api.example.com/v2/search?q=test",
_THREAD_DATA,
)
# FTP URLs
validate_local_bash_command_paths(
"curl ftp://ftp.example.com/pub/file.tar.gz",
_THREAD_DATA,
)
# URL mixed with valid virtual path
validate_local_bash_command_paths(
"curl https://example.com/data -o /mnt/user-data/workspace/data.json",
_THREAD_DATA,
)
def test_validate_local_bash_command_paths_blocks_file_urls() -> None:
"""file:// URLs should be treated as unsafe and blocked."""
with pytest.raises(PermissionError):
validate_local_bash_command_paths("curl file:///etc/passwd", _THREAD_DATA)
def test_validate_local_bash_command_paths_blocks_file_urls_case_insensitive() -> None:
"""file:// URL detection should be case-insensitive."""
with pytest.raises(PermissionError):
validate_local_bash_command_paths("curl FILE:///etc/shadow", _THREAD_DATA)
def test_validate_local_bash_command_paths_blocks_file_urls_mixed_with_valid() -> None:
"""file:// URLs should be blocked even when mixed with valid paths."""
with pytest.raises(PermissionError):
validate_local_bash_command_paths(
"curl file:///etc/passwd -o /mnt/user-data/workspace/out.txt",
_THREAD_DATA,
)
def test_validate_local_bash_command_paths_still_blocks_other_paths() -> None:
"""Paths outside virtual and system prefixes must still be blocked."""
with patch("deerflow.sandbox.tools._get_skills_container_path", return_value="/mnt/skills"):
@ -512,3 +587,371 @@ def test_validate_local_bash_command_paths_allows_mcp_filesystem_paths() -> None
with patch("deerflow.config.extensions_config.get_extensions_config", return_value=disabled_config):
with pytest.raises(PermissionError, match="Unsafe absolute paths"):
validate_local_bash_command_paths("ls /mnt/d/workspace", _THREAD_DATA)
# ---------- Custom mount path tests ----------
def _mock_custom_mounts():
"""Create mock VolumeMountConfig objects for testing."""
from deerflow.config.sandbox_config import VolumeMountConfig
return [
VolumeMountConfig(host_path="/home/user/code-read", container_path="/mnt/code-read", read_only=True),
VolumeMountConfig(host_path="/home/user/data", container_path="/mnt/data", read_only=False),
]
def test_is_custom_mount_path_recognises_configured_mounts() -> None:
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
assert _is_custom_mount_path("/mnt/code-read") is True
assert _is_custom_mount_path("/mnt/code-read/src/main.py") is True
assert _is_custom_mount_path("/mnt/data") is True
assert _is_custom_mount_path("/mnt/data/file.txt") is True
assert _is_custom_mount_path("/mnt/code-read-extra/foo") is False
assert _is_custom_mount_path("/mnt/other") is False
def test_get_custom_mount_for_path_returns_longest_prefix() -> None:
from deerflow.config.sandbox_config import VolumeMountConfig
mounts = [
VolumeMountConfig(host_path="/var/mnt", container_path="/mnt", read_only=False),
VolumeMountConfig(host_path="/home/user/code", container_path="/mnt/code", read_only=True),
]
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=mounts):
mount = _get_custom_mount_for_path("/mnt/code/file.py")
assert mount is not None
assert mount.container_path == "/mnt/code"
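The longest-prefix behaviour checked above, together with the path-boundary rule (`/mnt/code-read` must not match `/mnt/code-read-extra`), can be sketched as follows. `Mount` is a stand-in for `VolumeMountConfig` and `find_mount` is a hypothetical name; this is an illustration under those assumptions, not the code in `deerflow.sandbox.tools`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mount:
    """Stand-in for VolumeMountConfig with only the fields this sketch needs."""

    host_path: str
    container_path: str
    read_only: bool


def find_mount(path: str, mounts: list[Mount]) -> Optional[Mount]:
    """Return the mount with the longest container_path that contains path."""
    best: Optional[Mount] = None
    best_len = -1
    for mount in mounts:
        prefix = mount.container_path.rstrip("/")
        # Match the mount root itself or anything below it; requiring the
        # "/" separator keeps sibling directories with a shared name prefix out.
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best, best_len = mount, len(prefix)
    return best


nested = [
    Mount("/var/mnt", "/mnt", read_only=False),
    Mount("/home/user/code", "/mnt/code", read_only=True),
]
siblings = [
    Mount("/home/user/code-read", "/mnt/code-read", read_only=True),
    Mount("/home/user/data", "/mnt/data", read_only=False),
]

nested_hit = find_mount("/mnt/code/file.py", nested)
boundary_hit = find_mount("/mnt/code-read-extra/foo", siblings)
```

Picking the longest match matters because the read-only flag of the most specific mount is the one that should apply to a write.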
def test_validate_local_tool_path_allows_custom_mount_read() -> None:
"""read_file / ls should be able to access custom mount paths."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
validate_local_tool_path("/mnt/code-read/src/main.py", _THREAD_DATA, read_only=True)
validate_local_tool_path("/mnt/data/file.txt", _THREAD_DATA, read_only=True)
def test_validate_local_tool_path_blocks_read_only_mount_write() -> None:
"""write_file / str_replace must NOT write to read-only custom mounts."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
with pytest.raises(PermissionError, match="Write access to read-only mount is not allowed"):
validate_local_tool_path("/mnt/code-read/src/main.py", _THREAD_DATA, read_only=False)
def test_validate_local_tool_path_allows_writable_mount_write() -> None:
"""write_file / str_replace should succeed on writable custom mounts."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
validate_local_tool_path("/mnt/data/file.txt", _THREAD_DATA, read_only=False)
def test_validate_local_tool_path_blocks_traversal_in_custom_mount() -> None:
"""Path traversal via .. in custom mount paths must be rejected."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
with pytest.raises(PermissionError, match="path traversal"):
validate_local_tool_path("/mnt/code-read/../../etc/passwd", _THREAD_DATA, read_only=True)
def test_validate_local_bash_command_paths_allows_custom_mount() -> None:
"""bash commands referencing custom mount paths should be allowed."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
validate_local_bash_command_paths("cat /mnt/code-read/src/main.py", _THREAD_DATA)
validate_local_bash_command_paths("ls /mnt/data", _THREAD_DATA)
def test_validate_local_bash_command_paths_blocks_traversal_in_custom_mount() -> None:
"""Bash commands with traversal in custom mount paths should be blocked."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
with pytest.raises(PermissionError, match="path traversal"):
validate_local_bash_command_paths("cat /mnt/code-read/../../etc/passwd", _THREAD_DATA)
def test_validate_local_bash_command_paths_still_blocks_non_mount_paths() -> None:
"""Paths not matching any custom mount should still be blocked."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
with pytest.raises(PermissionError, match="Unsafe absolute paths"):
validate_local_bash_command_paths("cat /etc/shadow", _THREAD_DATA)
def test_get_custom_mounts_caching(monkeypatch, tmp_path) -> None:
"""_get_custom_mounts should cache after first successful load."""
# Clear any existing cache
if hasattr(_get_custom_mounts, "_cached"):
monkeypatch.delattr(_get_custom_mounts, "_cached")
# Use real directories so host_path.exists() filtering passes
dir_a = tmp_path / "code-read"
dir_a.mkdir()
dir_b = tmp_path / "data"
dir_b.mkdir()
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
mounts = [
VolumeMountConfig(host_path=str(dir_a), container_path="/mnt/code-read", read_only=True),
VolumeMountConfig(host_path=str(dir_b), container_path="/mnt/data", read_only=False),
]
mock_sandbox = SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider", mounts=mounts)
mock_config = SimpleNamespace(sandbox=mock_sandbox)
with patch("deerflow.config.get_app_config", return_value=mock_config):
result = _get_custom_mounts()
assert len(result) == 2
# After caching, should return cached value even without mock
assert hasattr(_get_custom_mounts, "_cached")
assert len(_get_custom_mounts()) == 2
# Cleanup
monkeypatch.delattr(_get_custom_mounts, "_cached")
def test_get_custom_mounts_filters_nonexistent_host_path(monkeypatch, tmp_path) -> None:
"""_get_custom_mounts should only return mounts whose host_path exists."""
if hasattr(_get_custom_mounts, "_cached"):
monkeypatch.delattr(_get_custom_mounts, "_cached")
from deerflow.config.sandbox_config import SandboxConfig, VolumeMountConfig
existing_dir = tmp_path / "existing"
existing_dir.mkdir()
mounts = [
VolumeMountConfig(host_path=str(existing_dir), container_path="/mnt/existing", read_only=True),
VolumeMountConfig(host_path="/nonexistent/path/12345", container_path="/mnt/ghost", read_only=False),
]
mock_sandbox = SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider", mounts=mounts)
mock_config = SimpleNamespace(sandbox=mock_sandbox)
with patch("deerflow.config.get_app_config", return_value=mock_config):
result = _get_custom_mounts()
assert len(result) == 1
assert result[0].container_path == "/mnt/existing"
# Cleanup
monkeypatch.delattr(_get_custom_mounts, "_cached")
def test_get_custom_mount_for_path_boundary_no_false_prefix_match() -> None:
"""_get_custom_mount_for_path must not match /mnt/code-read-extra for /mnt/code-read."""
with patch("deerflow.sandbox.tools._get_custom_mounts", return_value=_mock_custom_mounts()):
mount = _get_custom_mount_for_path("/mnt/code-read-extra/foo")
assert mount is None
def test_str_replace_parallel_updates_should_preserve_both_edits(monkeypatch) -> None:
class SharedSandbox:
def __init__(self) -> None:
self.content = "alpha\nbeta\n"
self._active_reads = 0
self._state_lock = threading.Lock()
self._overlap_detected = threading.Event()
def read_file(self, path: str) -> str:
with self._state_lock:
self._active_reads += 1
snapshot = self.content
if self._active_reads == 2:
self._overlap_detected.set()
self._overlap_detected.wait(0.05)
with self._state_lock:
self._active_reads -= 1
return snapshot
def write_file(self, path: str, content: str, append: bool = False) -> None:
self.content = content
sandbox = SharedSandbox()
runtimes = [
SimpleNamespace(state={}, context={"thread_id": "thread-1"}, config={}),
SimpleNamespace(state={}, context={"thread_id": "thread-1"}, config={}),
]
failures: list[BaseException] = []
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: sandbox)
monkeypatch.setattr("deerflow.sandbox.tools.ensure_thread_directories_exist", lambda runtime: None)
monkeypatch.setattr("deerflow.sandbox.tools.is_local_sandbox", lambda runtime: False)
def worker(runtime: SimpleNamespace, old_str: str, new_str: str) -> None:
try:
result = str_replace_tool.func(
runtime=runtime,
description="concurrently replace text in the same file",
path="/mnt/user-data/workspace/shared.txt",
old_str=old_str,
new_str=new_str,
)
assert result == "OK"
except BaseException as exc: # pragma: no cover - failure is asserted below
failures.append(exc)
threads = [
threading.Thread(target=worker, args=(runtimes[0], "alpha", "ALPHA")),
threading.Thread(target=worker, args=(runtimes[1], "beta", "BETA")),
]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
assert failures == []
assert "ALPHA" in sandbox.content
assert "BETA" in sandbox.content
def test_str_replace_parallel_updates_in_isolated_sandboxes_should_not_share_path_lock(monkeypatch) -> None:
class IsolatedSandbox:
def __init__(self, sandbox_id: str, shared_state: dict[str, object]) -> None:
self.id = sandbox_id
self.content = "alpha\nbeta\n"
self._shared_state = shared_state
def read_file(self, path: str) -> str:
state_lock = self._shared_state["state_lock"]
with state_lock:
active_reads = self._shared_state["active_reads"]
self._shared_state["active_reads"] = active_reads + 1
snapshot = self.content
if self._shared_state["active_reads"] == 2:
overlap_detected = self._shared_state["overlap_detected"]
overlap_detected.set()
overlap_detected = self._shared_state["overlap_detected"]
overlap_detected.wait(0.05)
with state_lock:
active_reads = self._shared_state["active_reads"]
self._shared_state["active_reads"] = active_reads - 1
return snapshot
def write_file(self, path: str, content: str, append: bool = False) -> None:
self.content = content
shared_state: dict[str, object] = {
"active_reads": 0,
"state_lock": threading.Lock(),
"overlap_detected": threading.Event(),
}
sandboxes = {
"sandbox-a": IsolatedSandbox("sandbox-a", shared_state),
"sandbox-b": IsolatedSandbox("sandbox-b", shared_state),
}
runtimes = [
SimpleNamespace(state={}, context={"thread_id": "thread-1", "sandbox_key": "sandbox-a"}, config={}),
SimpleNamespace(state={}, context={"thread_id": "thread-2", "sandbox_key": "sandbox-b"}, config={}),
]
failures: list[BaseException] = []
monkeypatch.setattr(
"deerflow.sandbox.tools.ensure_sandbox_initialized",
lambda runtime: sandboxes[runtime.context["sandbox_key"]],
)
monkeypatch.setattr("deerflow.sandbox.tools.ensure_thread_directories_exist", lambda runtime: None)
monkeypatch.setattr("deerflow.sandbox.tools.is_local_sandbox", lambda runtime: False)
def worker(runtime: SimpleNamespace, old_str: str, new_str: str) -> None:
try:
result = str_replace_tool.func(
runtime=runtime,
description="concurrently replace text at the same path in isolated sandboxes",
path="/mnt/user-data/workspace/shared.txt",
old_str=old_str,
new_str=new_str,
)
assert result == "OK"
except BaseException as exc: # pragma: no cover - failure is asserted below
failures.append(exc)
threads = [
threading.Thread(target=worker, args=(runtimes[0], "alpha", "ALPHA")),
threading.Thread(target=worker, args=(runtimes[1], "beta", "BETA")),
]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
assert failures == []
assert sandboxes["sandbox-a"].content == "ALPHA\nbeta\n"
assert sandboxes["sandbox-b"].content == "alpha\nBETA\n"
assert shared_state["overlap_detected"].is_set()
def test_str_replace_and_append_on_same_path_should_preserve_both_updates(monkeypatch) -> None:
class SharedSandbox:
def __init__(self) -> None:
self.id = "sandbox-1"
self.content = "alpha\n"
self.state_lock = threading.Lock()
self.str_replace_has_snapshot = threading.Event()
self.append_finished = threading.Event()
def read_file(self, path: str) -> str:
with self.state_lock:
snapshot = self.content
self.str_replace_has_snapshot.set()
self.append_finished.wait(0.05)
return snapshot
def write_file(self, path: str, content: str, append: bool = False) -> None:
with self.state_lock:
if append:
self.content += content
self.append_finished.set()
else:
self.content = content
sandbox = SharedSandbox()
runtimes = [
SimpleNamespace(state={}, context={"thread_id": "thread-1"}, config={}),
SimpleNamespace(state={}, context={"thread_id": "thread-1"}, config={}),
]
failures: list[BaseException] = []
monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: sandbox)
monkeypatch.setattr("deerflow.sandbox.tools.ensure_thread_directories_exist", lambda runtime: None)
monkeypatch.setattr("deerflow.sandbox.tools.is_local_sandbox", lambda runtime: False)
def replace_worker() -> None:
try:
result = str_replace_tool.func(
runtime=runtimes[0],
description="replace the old content",
path="/mnt/user-data/workspace/shared.txt",
old_str="alpha",
new_str="ALPHA",
)
assert result == "OK"
except BaseException as exc: # pragma: no cover - failure is asserted below
failures.append(exc)
def append_worker() -> None:
try:
sandbox.str_replace_has_snapshot.wait(0.05)
result = write_file_tool.func(
runtime=runtimes[1],
description="append new content",
path="/mnt/user-data/workspace/shared.txt",
content="tail\n",
append=True,
)
assert result == "OK"
except BaseException as exc: # pragma: no cover - failure is asserted below
failures.append(exc)
replace_thread = threading.Thread(target=replace_worker)
append_thread = threading.Thread(target=append_worker)
replace_thread.start()
append_thread.start()
replace_thread.join()
append_thread.join()
assert failures == []
assert sandbox.content == "ALPHA\ntail\n"
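The three concurrency tests above imply that read-modify-write operations on one file are serialized, while the same path in different sandboxes is not. A minimal sketch of that idea, assuming a lock registry keyed by `(sandbox_id, path)`; `lock_for`, `locked_str_replace`, `locked_append`, and `InMemorySandbox` are hypothetical names, not the actual implementation.

```python
import threading

_registry_guard = threading.Lock()
_path_locks = {}  # (sandbox_id, path) -> threading.Lock


def lock_for(sandbox_id: str, path: str) -> threading.Lock:
    """Return the lock for one file in one sandbox, creating it on first use."""
    with _registry_guard:
        return _path_locks.setdefault((sandbox_id, path), threading.Lock())


class InMemorySandbox:
    """Tiny fake with the read_file/write_file shape used by the tests."""

    def __init__(self, sandbox_id: str, content: str) -> None:
        self.id = sandbox_id
        self.content = content

    def read_file(self, path: str) -> str:
        return self.content

    def write_file(self, path: str, content: str, append: bool = False) -> None:
        self.content = self.content + content if append else content


def locked_str_replace(sandbox: InMemorySandbox, path: str, old: str, new: str) -> None:
    # Hold the lock across read and write so no other edit can slip in between.
    with lock_for(sandbox.id, path):
        snapshot = sandbox.read_file(path)
        sandbox.write_file(path, snapshot.replace(old, new))


def locked_append(sandbox: InMemorySandbox, path: str, text: str) -> None:
    with lock_for(sandbox.id, path):
        sandbox.write_file(path, text, append=True)


shared = InMemorySandbox("sandbox-1", "alpha\nbeta\n")
target = "/mnt/user-data/workspace/shared.txt"
workers = [
    threading.Thread(target=locked_str_replace, args=(shared, target, "alpha", "ALPHA")),
    threading.Thread(target=locked_str_replace, args=(shared, target, "beta", "BETA")),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
final_content = shared.content
```

Keying the registry on the sandbox as well as the path is what keeps two isolated sandboxes from blocking each other when they edit files with the same virtual path.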


@ -93,6 +93,27 @@ class TestParseSkillFile:
assert result is not None
assert result.description == "A skill: does things"
def test_multiline_yaml_folded_description(self, tmp_path):
skill_file = _write_skill(
tmp_path,
"---\nname: multiline-skill\ndescription: >\n This is a multiline\n description for a skill.\n\n It spans multiple lines.\nlicense: MIT\n---\n\nBody\n",
)
result = parse_skill_file(skill_file, "public")
assert result is not None
assert result.name == "multiline-skill"
assert result.description == "This is a multiline description for a skill.\n\nIt spans multiple lines."
assert result.license == "MIT"
def test_multiline_yaml_literal_description(self, tmp_path):
skill_file = _write_skill(
tmp_path,
"---\nname: pipe-skill\ndescription: |\n First line.\n Second line.\n---\n\nBody\n",
)
result = parse_skill_file(skill_file, "public")
assert result is not None
assert result.name == "pipe-skill"
assert result.description == "First line.\nSecond line."
def test_empty_front_matter_returns_none(self, tmp_path):
skill_file = _write_skill(tmp_path, "---\n\n---\n\nBody\n")
assert parse_skill_file(skill_file, "public") is None


@ -140,6 +140,193 @@ async def test_event_id_format(bridge: MemoryStreamBridge):
assert re.match(r"^\d+-\d+$", event.id), f"Expected timestamp-seq format, got {event.id}"
# ---------------------------------------------------------------------------
# END sentinel guarantee tests
# ---------------------------------------------------------------------------
@pytest.mark.anyio
async def test_end_sentinel_delivered_when_queue_full():
"""END sentinel must always be delivered, even when the queue is completely full.
This is the critical regression test for the bug where publish_end()
would silently drop the END sentinel when the queue was full, causing
subscribe() to hang forever and leaking resources.
"""
bridge = MemoryStreamBridge(queue_maxsize=2)
run_id = "run-end-full"
# Fill the queue to capacity
await bridge.publish(run_id, "event-1", {"n": 1})
await bridge.publish(run_id, "event-2", {"n": 2})
assert bridge._queues[run_id].full()
# publish_end should succeed by evicting old events
await bridge.publish_end(run_id)
# Subscriber must receive END_SENTINEL
events = []
async for entry in bridge.subscribe(run_id, heartbeat_interval=0.1):
events.append(entry)
if entry is END_SENTINEL:
break
assert any(e is END_SENTINEL for e in events), "END sentinel was not delivered"
@pytest.mark.anyio
async def test_end_sentinel_evicts_oldest_events():
"""When queue is full, publish_end evicts the oldest events to make room."""
bridge = MemoryStreamBridge(queue_maxsize=1)
run_id = "run-evict"
# Fill queue with one event
await bridge.publish(run_id, "will-be-evicted", {})
assert bridge._queues[run_id].full()
# publish_end must succeed
await bridge.publish_end(run_id)
# The only event we should get is END_SENTINEL (the regular event was evicted)
events = []
async for entry in bridge.subscribe(run_id, heartbeat_interval=0.1):
events.append(entry)
if entry is END_SENTINEL:
break
assert len(events) == 1
assert events[0] is END_SENTINEL
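The eviction behaviour exercised by the two tests above can be sketched with a plain `asyncio.Queue`. This is an illustration only: `END_MARKER` and `put_end_sentinel` are hypothetical names, and the real `MemoryStreamBridge.publish_end()` may be implemented differently.

```python
import asyncio

END_MARKER = object()  # stand-in for the bridge's END_SENTINEL


def put_end_sentinel(queue: asyncio.Queue) -> int:
    """Enqueue END_MARKER, evicting the oldest entries if the queue is full.

    Returns the number of entries that were evicted to make room.
    """
    evicted = 0
    while True:
        try:
            queue.put_nowait(END_MARKER)
            return evicted
        except asyncio.QueueFull:
            # Drop the oldest entry; delivering the end marker matters more
            # than keeping one stale event, since a missing marker would leave
            # the subscriber waiting forever.
            queue.get_nowait()
            evicted += 1


async def demo() -> tuple:
    queue = asyncio.Queue(maxsize=1)
    queue.put_nowait("will-be-evicted")
    evicted = put_end_sentinel(queue)
    return evicted, queue.get_nowait() is END_MARKER


evicted_count, got_end = asyncio.run(demo())
```

Because `put_nowait` and `get_nowait` never yield to the event loop, the evict-and-retry loop runs without another coroutine refilling the queue in between.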
@pytest.mark.anyio
async def test_end_sentinel_no_eviction_when_space_available():
"""When queue has space, publish_end should not evict anything."""
bridge = MemoryStreamBridge(queue_maxsize=10)
run_id = "run-no-evict"
await bridge.publish(run_id, "event-1", {"n": 1})
await bridge.publish(run_id, "event-2", {"n": 2})
await bridge.publish_end(run_id)
events = []
async for entry in bridge.subscribe(run_id, heartbeat_interval=0.1):
events.append(entry)
if entry is END_SENTINEL:
break
# All events plus END should be present
assert len(events) == 3
assert events[0].event == "event-1"
assert events[1].event == "event-2"
assert events[2] is END_SENTINEL
@pytest.mark.anyio
async def test_concurrent_tasks_end_sentinel():
"""Multiple concurrent producer/consumer pairs should all terminate properly.
Simulates the production scenario where multiple runs share a single
bridge instance; each must receive its own END sentinel.
"""
bridge = MemoryStreamBridge(queue_maxsize=4)
num_runs = 4
async def producer(run_id: str):
for i in range(10): # More events than queue capacity
await bridge.publish(run_id, f"event-{i}", {"i": i})
await bridge.publish_end(run_id)
async def consumer(run_id: str) -> list:
events = []
async for entry in bridge.subscribe(run_id, heartbeat_interval=0.1):
events.append(entry)
if entry is END_SENTINEL:
return events
return events # pragma: no cover
# Run producers and consumers concurrently
run_ids = [f"concurrent-{i}" for i in range(num_runs)]
producers = [producer(rid) for rid in run_ids]
consumers = [consumer(rid) for rid in run_ids]
# Start consumers first, then producers
consumer_tasks = [asyncio.create_task(c) for c in consumers]
await asyncio.gather(*producers)
results = await asyncio.wait_for(
asyncio.gather(*consumer_tasks),
timeout=10.0,
)
for i, events in enumerate(results):
assert events[-1] is END_SENTINEL, f"Run {run_ids[i]} did not receive END sentinel"
# ---------------------------------------------------------------------------
# Drop counter tests
# ---------------------------------------------------------------------------
@pytest.mark.anyio
async def test_dropped_count_tracking():
"""Dropped events should be tracked per run_id."""
bridge = MemoryStreamBridge(queue_maxsize=1)
run_id = "run-drop-count"
# Fill the queue
await bridge.publish(run_id, "first", {})
# A second publish() on the full queue would time out and be dropped.
# Rather than patching the timeout here, we verify the counter after publish_end eviction.
await bridge.publish_end(run_id)
# dropped_count tracks publish() drops, not publish_end evictions
assert bridge.dropped_count(run_id) == 0
# cleanup should also clear the counter
await bridge.cleanup(run_id)
assert bridge.dropped_count(run_id) == 0
@pytest.mark.anyio
async def test_dropped_total():
"""dropped_total should sum across all runs."""
bridge = MemoryStreamBridge(queue_maxsize=256)
# No drops yet
assert bridge.dropped_total == 0
# Manually set some counts to verify the property
bridge._dropped_counts["run-a"] = 3
bridge._dropped_counts["run-b"] = 7
assert bridge.dropped_total == 10
@pytest.mark.anyio
async def test_cleanup_clears_dropped_counts():
"""cleanup() should clear the dropped counter for the run."""
bridge = MemoryStreamBridge(queue_maxsize=256)
run_id = "run-cleanup-drops"
bridge._get_or_create_queue(run_id)
bridge._dropped_counts[run_id] = 5
await bridge.cleanup(run_id)
assert run_id not in bridge._dropped_counts
@pytest.mark.anyio
async def test_close_clears_dropped_counts():
"""close() should clear all dropped counters."""
bridge = MemoryStreamBridge(queue_maxsize=256)
bridge._dropped_counts["run-x"] = 10
bridge._dropped_counts["run-y"] = 20
await bridge.close()
assert bridge.dropped_total == 0
assert len(bridge._dropped_counts) == 0
# ---------------------------------------------------------------------------
# Factory tests
# ---------------------------------------------------------------------------
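The END-sentinel tests above depend on `publish_end` always finding room for the terminator, even when the per-run queue is full. A minimal sketch of that eviction idea follows, assuming one bounded `asyncio.Queue` per run. `END`, `put_end_evicting`, and `demo` are hypothetical names for illustration and are not taken from `MemoryStreamBridge`.

```python
import asyncio

END = object()  # hypothetical stand-in for END_SENTINEL


def put_end_evicting(queue: asyncio.Queue) -> int:
    """Enqueue END without blocking, evicting the oldest entries if the queue is full.

    Returns the number of entries that were evicted to make room.
    """
    evicted = 0
    while True:
        try:
            queue.put_nowait(END)
            return evicted
        except asyncio.QueueFull:
            # A full bounded queue is never empty, so this cannot raise QueueEmpty.
            queue.get_nowait()
            evicted += 1


async def demo(maxsize: int, n_events: int) -> tuple:
    """Fill a queue with n_events items, append END, and drain it."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    for i in range(n_events):
        queue.put_nowait(f"event-{i}")
    evicted = put_end_evicting(queue)
    drained = [queue.get_nowait() for _ in range(queue.qsize())]
    return evicted, drained
```

With `maxsize=1` and one buffered event, the sketch evicts that event and leaves only the terminator, which matches the full-queue test. With spare capacity nothing is evicted, which matches `test_end_sentinel_no_eviction_when_space_available`.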


@ -1,5 +1,5 @@
import asyncio
from unittest.mock import MagicMock
from unittest.mock import AsyncMock, MagicMock
from app.gateway.routers import suggestions
@ -43,7 +43,7 @@ def test_generate_suggestions_parses_and_limits(monkeypatch):
model_name=None,
)
fake_model = MagicMock()
fake_model.invoke.return_value = MagicMock(content='```json\n["Q1", "Q2", "Q3", "Q4"]\n```')
fake_model.ainvoke = AsyncMock(return_value=MagicMock(content='```json\n["Q1", "Q2", "Q3", "Q4"]\n```'))
monkeypatch.setattr(suggestions, "create_chat_model", lambda **kwargs: fake_model)
result = asyncio.run(suggestions.generate_suggestions("t1", req))
@ -61,7 +61,7 @@ def test_generate_suggestions_parses_list_block_content(monkeypatch):
model_name=None,
)
fake_model = MagicMock()
fake_model.invoke.return_value = MagicMock(content=[{"type": "text", "text": '```json\n["Q1", "Q2"]\n```'}])
fake_model.ainvoke = AsyncMock(return_value=MagicMock(content=[{"type": "text", "text": '```json\n["Q1", "Q2"]\n```'}]))
monkeypatch.setattr(suggestions, "create_chat_model", lambda **kwargs: fake_model)
result = asyncio.run(suggestions.generate_suggestions("t1", req))
@ -79,7 +79,7 @@ def test_generate_suggestions_parses_output_text_block_content(monkeypatch):
model_name=None,
)
fake_model = MagicMock()
fake_model.invoke.return_value = MagicMock(content=[{"type": "output_text", "text": '```json\n["Q1", "Q2"]\n```'}])
fake_model.ainvoke = AsyncMock(return_value=MagicMock(content=[{"type": "output_text", "text": '```json\n["Q1", "Q2"]\n```'}]))
monkeypatch.setattr(suggestions, "create_chat_model", lambda **kwargs: fake_model)
result = asyncio.run(suggestions.generate_suggestions("t1", req))
@ -94,7 +94,7 @@ def test_generate_suggestions_returns_empty_on_model_error(monkeypatch):
model_name=None,
)
fake_model = MagicMock()
fake_model.invoke.side_effect = RuntimeError("boom")
fake_model.ainvoke = AsyncMock(side_effect=RuntimeError("boom"))
monkeypatch.setattr(suggestions, "create_chat_model", lambda **kwargs: fake_model)
result = asyncio.run(suggestions.generate_suggestions("t1", req))
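The hunks above swap `fake_model.invoke` for `fake_model.ainvoke = AsyncMock(...)` because the code under test now awaits the model. A self-contained sketch of that mocking pattern follows. `ask` is a hypothetical coroutine standing in for the router code, not the real `generate_suggestions`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def ask(model, prompt: str) -> str:
    """Hypothetical caller: awaits the model and returns the response content."""
    response = await model.ainvoke(prompt)
    return response.content


def run_with_fake_model(prompt: str):
    """Build a fake model whose ainvoke is awaitable, then call ask() with it."""
    fake_model = MagicMock()
    # A plain MagicMock attribute returns a MagicMock, which cannot be awaited;
    # AsyncMock returns a coroutine that resolves to return_value.
    fake_model.ainvoke = AsyncMock(return_value=MagicMock(content="ok"))
    result = asyncio.run(ask(fake_model, prompt))
    return result, fake_model
```

`AsyncMock` also records awaits, so a test can check `fake_model.ainvoke.assert_awaited_once_with(prompt)` in addition to the return value.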


@ -5,6 +5,7 @@ from unittest.mock import AsyncMock, MagicMock
from langchain_core.messages import AIMessage, HumanMessage
from deerflow.agents.middlewares import title_middleware as title_middleware_module
from deerflow.agents.middlewares.title_middleware import TitleMiddleware
from deerflow.config.title_config import TitleConfig, get_title_config, set_title_config
@ -73,37 +74,32 @@ class TestTitleMiddlewareCoreLogic:
assert middleware._should_generate_title(state) is False
def test_generate_title_trims_quotes_and_respects_max_chars(self, monkeypatch):
def test_generate_title_uses_async_model_and_respects_max_chars(self, monkeypatch):
_set_test_title_config(max_chars=12)
middleware = TitleMiddleware()
fake_model = MagicMock()
fake_model.ainvoke = AsyncMock(return_value=MagicMock(content='"A very long generated title"'))
monkeypatch.setattr("deerflow.agents.middlewares.title_middleware.create_chat_model", lambda **kwargs: fake_model)
model = MagicMock()
model.ainvoke = AsyncMock(return_value=AIMessage(content="短标题"))
monkeypatch.setattr(title_middleware_module, "create_chat_model", MagicMock(return_value=model))
state = {
"messages": [
HumanMessage(content="请帮我写一个脚本"),
HumanMessage(content="请帮我写一个很长很长的脚本标题"),
AIMessage(content="好的,先确认需求"),
]
}
result = asyncio.run(middleware._agenerate_title_result(state))
title = result["title"]
assert '"' not in title
assert "'" not in title
assert len(title) == 12
assert title == "短标题"
title_middleware_module.create_chat_model.assert_called_once_with(thinking_enabled=False)
model.ainvoke.assert_awaited_once()
def test_generate_title_normalizes_structured_message_and_response_content(self, monkeypatch):
def test_generate_title_normalizes_structured_message_content(self, monkeypatch):
_set_test_title_config(max_chars=20)
middleware = TitleMiddleware()
fake_model = MagicMock()
fake_model.ainvoke = AsyncMock(
return_value=MagicMock(content=[{"type": "text", "text": '"结构总结"'}]),
)
monkeypatch.setattr(
"deerflow.agents.middlewares.title_middleware.create_chat_model",
lambda **kwargs: fake_model,
)
model = MagicMock()
model.ainvoke = AsyncMock(return_value=AIMessage(content="请帮我总结这段代码"))
monkeypatch.setattr(title_middleware_module, "create_chat_model", MagicMock(return_value=model))
state = {
"messages": [
@ -115,21 +111,14 @@ class TestTitleMiddlewareCoreLogic:
result = asyncio.run(middleware._agenerate_title_result(state))
title = result["title"]
prompt = fake_model.ainvoke.await_args.args[0]
assert "请帮我总结这段代码" in prompt
assert "好的,先看结构" in prompt
# Ensure structured message dict/JSON reprs are not leaking into the prompt.
assert "{'type':" not in prompt
assert "'type':" not in prompt
assert '"type":' not in prompt
assert title == "结构总结"
assert title == "请帮我总结这段代码"
def test_generate_title_fallback_when_model_fails(self, monkeypatch):
def test_generate_title_fallback_for_long_message(self, monkeypatch):
_set_test_title_config(max_chars=20)
middleware = TitleMiddleware()
fake_model = MagicMock()
fake_model.ainvoke = AsyncMock(side_effect=RuntimeError("LLM unavailable"))
monkeypatch.setattr("deerflow.agents.middlewares.title_middleware.create_chat_model", lambda **kwargs: fake_model)
model = MagicMock()
model.ainvoke = AsyncMock(side_effect=RuntimeError("model unavailable"))
monkeypatch.setattr(title_middleware_module, "create_chat_model", MagicMock(return_value=model))
state = {
"messages": [
@ -164,13 +153,10 @@ class TestTitleMiddlewareCoreLogic:
monkeypatch.setattr(middleware, "_generate_title_result", MagicMock(return_value=None))
assert middleware.after_model({"messages": []}, runtime=MagicMock()) is None
def test_sync_generate_title_with_model(self, monkeypatch):
"""Sync path calls model.invoke and produces a title."""
def test_sync_generate_title_uses_fallback_without_model(self):
"""Sync path avoids LLM calls and derives a local fallback title."""
_set_test_title_config(max_chars=20)
middleware = TitleMiddleware()
fake_model = MagicMock()
fake_model.invoke = MagicMock(return_value=MagicMock(content='"同步生成的标题"'))
monkeypatch.setattr("deerflow.agents.middlewares.title_middleware.create_chat_model", lambda **kwargs: fake_model)
state = {
"messages": [
@ -179,22 +165,19 @@ class TestTitleMiddlewareCoreLogic:
]
}
result = middleware._generate_title_result(state)
assert result == {"title": "同步生成的标题"}
fake_model.invoke.assert_called_once()
assert result == {"title": "请帮我写测试"}
def test_empty_title_falls_back(self, monkeypatch):
"""Empty model response triggers fallback title."""
def test_sync_generate_title_respects_fallback_truncation(self):
"""Sync fallback path still respects max_chars truncation rules."""
_set_test_title_config(max_chars=50)
middleware = TitleMiddleware()
fake_model = MagicMock()
fake_model.invoke = MagicMock(return_value=MagicMock(content=" "))
monkeypatch.setattr("deerflow.agents.middlewares.title_middleware.create_chat_model", lambda **kwargs: fake_model)
state = {
"messages": [
HumanMessage(content="空标题测试"),
HumanMessage(content="这是一个非常长的问题描述需要被截断以形成fallback标题而且这里继续补充更多上下文确保超过本地fallback截断阈值"),
AIMessage(content="回复"),
]
}
result = middleware._generate_title_result(state)
assert result["title"] == "空标题测试"
assert result["title"].endswith("...")
assert result["title"].startswith("这是一个非常长的问题描述")


@ -0,0 +1,161 @@
"""Unit tests for tool output truncation functions.
These functions truncate long tool outputs to prevent context window overflow.
- _truncate_bash_output: middle-truncation (head + tail), for bash tool
- _truncate_read_file_output: head-truncation, for read_file tool
"""
from deerflow.sandbox.tools import _truncate_bash_output, _truncate_read_file_output
# ---------------------------------------------------------------------------
# _truncate_bash_output
# ---------------------------------------------------------------------------
class TestTruncateBashOutput:
def test_short_output_returned_unchanged(self):
output = "hello world"
assert _truncate_bash_output(output, 20000) == output
def test_output_equal_to_limit_returned_unchanged(self):
output = "A" * 20000
assert _truncate_bash_output(output, 20000) == output
def test_long_output_is_truncated(self):
output = "A" * 30000
result = _truncate_bash_output(output, 20000)
assert len(result) < len(output)
def test_result_never_exceeds_max_chars(self):
output = "A" * 30000
max_chars = 20000
result = _truncate_bash_output(output, max_chars)
assert len(result) <= max_chars
def test_head_is_preserved(self):
head = "HEAD_CONTENT"
output = head + "M" * 30000
result = _truncate_bash_output(output, 20000)
assert result.startswith(head)
def test_tail_is_preserved(self):
tail = "TAIL_CONTENT"
output = "M" * 30000 + tail
result = _truncate_bash_output(output, 20000)
assert result.endswith(tail)
def test_middle_truncation_marker_present(self):
output = "A" * 30000
result = _truncate_bash_output(output, 20000)
assert "[middle truncated:" in result
assert "chars skipped" in result
def test_skipped_chars_count_is_correct(self):
output = "A" * 25000
result = _truncate_bash_output(output, 20000)
# Extract the reported skipped count and verify it equals len(output) - kept.
# (kept = max_chars - marker_max_len, where marker_max_len is computed from
# the worst-case marker string — so the exact value is implementation-defined,
# but it must equal len(output) minus the chars actually preserved.)
import re
m = re.search(r"(\d+) chars skipped", result)
assert m is not None
reported_skipped = int(m.group(1))
# Verify the number is self-consistent: head + skipped + tail == total
assert reported_skipped > 0
# The marker reports exactly the chars between head and tail
head_and_tail = len(output) - reported_skipped
assert result.startswith(output[: head_and_tail // 2])
def test_max_chars_zero_disables_truncation(self):
output = "A" * 100000
assert _truncate_bash_output(output, 0) == output
def test_50_50_split(self):
# head and tail should each be roughly max_chars // 2
output = "H" * 20000 + "M" * 10000 + "T" * 20000
result = _truncate_bash_output(output, 20000)
assert result[:100] == "H" * 100
assert result[-100:] == "T" * 100
def test_small_max_chars_does_not_crash(self):
output = "A" * 1000
result = _truncate_bash_output(output, 10)
assert len(result) <= 10
def test_result_never_exceeds_max_chars_various_sizes(self):
output = "X" * 50000
for max_chars in [100, 1000, 5000, 20000, 49999]:
result = _truncate_bash_output(output, max_chars)
assert len(result) <= max_chars, f"failed for max_chars={max_chars}"
# ---------------------------------------------------------------------------
# _truncate_read_file_output
# ---------------------------------------------------------------------------
class TestTruncateReadFileOutput:
def test_short_output_returned_unchanged(self):
output = "def foo():\n pass\n"
assert _truncate_read_file_output(output, 50000) == output
def test_output_equal_to_limit_returned_unchanged(self):
output = "X" * 50000
assert _truncate_read_file_output(output, 50000) == output
def test_long_output_is_truncated(self):
output = "X" * 60000
result = _truncate_read_file_output(output, 50000)
assert len(result) < len(output)
def test_result_never_exceeds_max_chars(self):
output = "X" * 60000
max_chars = 50000
result = _truncate_read_file_output(output, max_chars)
assert len(result) <= max_chars
def test_head_is_preserved(self):
head = "import os\nimport sys\n"
output = head + "X" * 60000
result = _truncate_read_file_output(output, 50000)
assert result.startswith(head)
def test_truncation_marker_present(self):
output = "X" * 60000
result = _truncate_read_file_output(output, 50000)
assert "[truncated:" in result
assert "showing first" in result
def test_total_chars_reported_correctly(self):
output = "X" * 60000
result = _truncate_read_file_output(output, 50000)
assert "of 60000 chars" in result
def test_start_line_hint_present(self):
output = "X" * 60000
result = _truncate_read_file_output(output, 50000)
assert "start_line" in result
assert "end_line" in result
def test_max_chars_zero_disables_truncation(self):
output = "X" * 100000
assert _truncate_read_file_output(output, 0) == output
def test_tail_is_not_preserved(self):
# head-truncation: tail should be cut off
output = "H" * 50000 + "TAIL_SHOULD_NOT_APPEAR"
result = _truncate_read_file_output(output, 50000)
assert "TAIL_SHOULD_NOT_APPEAR" not in result
def test_small_max_chars_does_not_crash(self):
output = "X" * 1000
result = _truncate_read_file_output(output, 10)
assert len(result) <= 10
def test_result_never_exceeds_max_chars_various_sizes(self):
output = "X" * 50000
for max_chars in [100, 1000, 5000, 20000, 49999]:
result = _truncate_read_file_output(output, max_chars)
assert len(result) <= max_chars, f"failed for max_chars={max_chars}"
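The bash-output tests above pin down a contract: output at or under the limit is returned unchanged, `max_chars == 0` disables truncation, head and tail survive, and the result never exceeds `max_chars` even for tiny limits. One way to satisfy that contract is sketched below. It is an illustration only: `truncate_middle` and the exact marker wording are assumptions, and the real `_truncate_bash_output` may differ.

```python
def truncate_middle(output: str, max_chars: int) -> str:
    """Keep the head and tail of `output`, replacing the middle with a marker."""
    if max_chars <= 0 or len(output) <= max_chars:
        return output

    # Budget for the longest marker we could emit: the skipped count can never
    # have more digits than len(output) itself.
    worst_marker = f"\n... [middle truncated: {len(output)} chars skipped] ...\n"
    kept = max_chars - len(worst_marker)
    if kept <= 0:
        # No room for a marker at all: fall back to a plain head cut.
        return output[:max_chars]

    head_len = kept // 2
    tail_len = kept - head_len  # always >= 1 because kept >= 1
    skipped = len(output) - kept
    marker = f"\n... [middle truncated: {skipped} chars skipped] ...\n"
    return output[:head_len] + marker + output[len(output) - tail_len:]
```

Sizing the budget from the worst-case marker is what keeps `len(result) <= max_chars` true for every limit, at the cost of occasionally returning a few characters fewer than allowed.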


@ -2,6 +2,8 @@
from __future__ import annotations
import pytest
from deerflow.config import tracing_config as tracing_module
@ -9,6 +11,29 @@ def _reset_tracing_cache() -> None:
tracing_module._tracing_config = None
@pytest.fixture(autouse=True)
def clear_tracing_env(monkeypatch):
for name in (
"LANGSMITH_TRACING",
"LANGCHAIN_TRACING_V2",
"LANGCHAIN_TRACING",
"LANGSMITH_API_KEY",
"LANGCHAIN_API_KEY",
"LANGSMITH_PROJECT",
"LANGCHAIN_PROJECT",
"LANGSMITH_ENDPOINT",
"LANGCHAIN_ENDPOINT",
"LANGFUSE_TRACING",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
):
monkeypatch.delenv(name, raising=False)
_reset_tracing_cache()
yield
_reset_tracing_cache()
def test_prefers_langsmith_env_names(monkeypatch):
monkeypatch.setenv("LANGSMITH_TRACING", "true")
monkeypatch.setenv("LANGSMITH_API_KEY", "lsv2_key")
@ -18,11 +43,12 @@ def test_prefers_langsmith_env_names(monkeypatch):
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.enabled is True
assert cfg.api_key == "lsv2_key"
assert cfg.project == "smith-project"
assert cfg.endpoint == "https://smith.example.com"
assert cfg.langsmith.enabled is True
assert cfg.langsmith.api_key == "lsv2_key"
assert cfg.langsmith.project == "smith-project"
assert cfg.langsmith.endpoint == "https://smith.example.com"
assert tracing_module.is_tracing_enabled() is True
assert tracing_module.get_enabled_tracing_providers() == ["langsmith"]
def test_falls_back_to_langchain_env_names(monkeypatch):
@ -39,11 +65,12 @@ def test_falls_back_to_langchain_env_names(monkeypatch):
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.enabled is True
assert cfg.api_key == "legacy-key"
assert cfg.project == "legacy-project"
assert cfg.endpoint == "https://legacy.example.com"
assert cfg.langsmith.enabled is True
assert cfg.langsmith.api_key == "legacy-key"
assert cfg.langsmith.project == "legacy-project"
assert cfg.langsmith.endpoint == "https://legacy.example.com"
assert tracing_module.is_tracing_enabled() is True
assert tracing_module.get_enabled_tracing_providers() == ["langsmith"]
def test_langsmith_tracing_false_overrides_langchain_tracing_v2_true(monkeypatch):
@ -55,8 +82,9 @@ def test_langsmith_tracing_false_overrides_langchain_tracing_v2_true(monkeypatch
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.enabled is False
assert cfg.langsmith.enabled is False
assert tracing_module.is_tracing_enabled() is False
assert tracing_module.get_enabled_tracing_providers() == []
def test_defaults_when_project_not_set(monkeypatch):
@ -68,4 +96,51 @@ def test_defaults_when_project_not_set(monkeypatch):
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.project == "deer-flow"
assert cfg.langsmith.project == "deer-flow"
def test_langfuse_config_is_loaded(monkeypatch):
monkeypatch.setenv("LANGFUSE_TRACING", "true")
monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
monkeypatch.setenv("LANGFUSE_BASE_URL", "https://langfuse.example.com")
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.langfuse.enabled is True
assert cfg.langfuse.public_key == "pk-lf-test"
assert cfg.langfuse.secret_key == "sk-lf-test"
assert cfg.langfuse.host == "https://langfuse.example.com"
assert tracing_module.get_enabled_tracing_providers() == ["langfuse"]
def test_dual_provider_config_is_loaded(monkeypatch):
monkeypatch.setenv("LANGSMITH_TRACING", "true")
monkeypatch.setenv("LANGSMITH_API_KEY", "lsv2_key")
monkeypatch.setenv("LANGFUSE_TRACING", "true")
monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
_reset_tracing_cache()
cfg = tracing_module.get_tracing_config()
assert cfg.langsmith.is_configured is True
assert cfg.langfuse.is_configured is True
assert tracing_module.is_tracing_enabled() is True
assert tracing_module.get_enabled_tracing_providers() == ["langsmith", "langfuse"]
def test_langfuse_enabled_requires_public_and_secret_keys(monkeypatch):
monkeypatch.setenv("LANGFUSE_TRACING", "true")
monkeypatch.delenv("LANGFUSE_PUBLIC_KEY", raising=False)
monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
_reset_tracing_cache()
assert tracing_module.get_tracing_config().is_configured is False
assert tracing_module.get_enabled_tracing_providers() == []
assert tracing_module.get_tracing_config().explicitly_enabled_providers == ["langfuse"]
with pytest.raises(ValueError, match="LANGFUSE_PUBLIC_KEY"):
tracing_module.validate_enabled_tracing_providers()
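The LangSmith tests above describe a precedence rule: the `LANGSMITH_*` name wins when it is set, the legacy `LANGCHAIN_*` names are a fallback, and the project defaults to `deer-flow`. A minimal sketch of that lookup follows. The helper names, the set of truthy strings, and the choice to treat empty values as unset are assumptions for illustration, not the behavior of `deerflow.config.tracing_config`.

```python
import os
from typing import Mapping, Optional, Sequence

_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted "enabled" values


def first_set(names: Sequence[str], env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value is not None and value.strip():
            return value.strip()
    return None


def langsmith_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """The preferred name decides; legacy names are consulted only if it is unset."""
    raw = first_set(("LANGSMITH_TRACING", "LANGCHAIN_TRACING_V2", "LANGCHAIN_TRACING"), env)
    return raw is not None and raw.lower() in _TRUTHY


def langsmith_project(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the project name, defaulting to 'deer-flow' when neither name is set."""
    return first_set(("LANGSMITH_PROJECT", "LANGCHAIN_PROJECT"), env) or "deer-flow"
```

Because the first variable that is set decides the outcome, `LANGSMITH_TRACING=false` overrides `LANGCHAIN_TRACING_V2=true`, as `test_langsmith_tracing_false_overrides_langchain_tracing_v2_true` expects.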


@ -0,0 +1,173 @@
"""Tests for deerflow.tracing.factory."""
from __future__ import annotations
import sys
import types
import pytest
from deerflow.tracing import factory as tracing_factory
@pytest.fixture(autouse=True)
def clear_tracing_env(monkeypatch):
from deerflow.config import tracing_config as tracing_module
for name in (
"LANGSMITH_TRACING",
"LANGCHAIN_TRACING_V2",
"LANGCHAIN_TRACING",
"LANGSMITH_API_KEY",
"LANGCHAIN_API_KEY",
"LANGSMITH_PROJECT",
"LANGCHAIN_PROJECT",
"LANGSMITH_ENDPOINT",
"LANGCHAIN_ENDPOINT",
"LANGFUSE_TRACING",
"LANGFUSE_PUBLIC_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_BASE_URL",
):
monkeypatch.delenv(name, raising=False)
tracing_module._tracing_config = None
yield
tracing_module._tracing_config = None
def test_build_tracing_callbacks_returns_empty_list_when_disabled(monkeypatch):
monkeypatch.setattr(tracing_factory, "validate_enabled_tracing_providers", lambda: None)
monkeypatch.setattr(tracing_factory, "get_enabled_tracing_providers", lambda: [])
callbacks = tracing_factory.build_tracing_callbacks()
assert callbacks == []
def test_build_tracing_callbacks_creates_langsmith_and_langfuse(monkeypatch):
class FakeLangSmithTracer:
def __init__(self, *, project_name: str):
self.project_name = project_name
class FakeLangfuseHandler:
def __init__(self, *, public_key: str):
self.public_key = public_key
monkeypatch.setattr(tracing_factory, "get_enabled_tracing_providers", lambda: ["langsmith", "langfuse"])
monkeypatch.setattr(tracing_factory, "validate_enabled_tracing_providers", lambda: None)
monkeypatch.setattr(
tracing_factory,
"get_tracing_config",
lambda: type(
"Cfg",
(),
{
"langsmith": type("LangSmithCfg", (), {"project": "smith-project"})(),
"langfuse": type(
"LangfuseCfg",
(),
{
"secret_key": "sk-lf-test",
"public_key": "pk-lf-test",
"host": "https://langfuse.example.com",
},
)(),
},
)(),
)
monkeypatch.setattr(tracing_factory, "_create_langsmith_tracer", lambda cfg: FakeLangSmithTracer(project_name=cfg.project))
monkeypatch.setattr(
tracing_factory,
"_create_langfuse_handler",
lambda cfg: FakeLangfuseHandler(public_key=cfg.public_key),
)
callbacks = tracing_factory.build_tracing_callbacks()
assert len(callbacks) == 2
assert callbacks[0].project_name == "smith-project"
assert callbacks[1].public_key == "pk-lf-test"
def test_build_tracing_callbacks_raises_when_enabled_provider_fails(monkeypatch):
monkeypatch.setattr(tracing_factory, "get_enabled_tracing_providers", lambda: ["langfuse"])
monkeypatch.setattr(tracing_factory, "validate_enabled_tracing_providers", lambda: None)
monkeypatch.setattr(
tracing_factory,
"get_tracing_config",
lambda: type(
"Cfg",
(),
{
"langfuse": type(
"LangfuseCfg",
(),
{"secret_key": "sk-lf-test", "public_key": "pk-lf-test", "host": "https://langfuse.example.com"},
)(),
},
)(),
)
monkeypatch.setattr(tracing_factory, "_create_langfuse_handler", lambda cfg: (_ for _ in ()).throw(RuntimeError("boom")))
with pytest.raises(RuntimeError, match="Langfuse tracing initialization failed"):
tracing_factory.build_tracing_callbacks()
def test_build_tracing_callbacks_raises_for_explicitly_enabled_misconfigured_provider(monkeypatch):
from deerflow.config import tracing_config as tracing_module
monkeypatch.setenv("LANGFUSE_TRACING", "true")
monkeypatch.delenv("LANGFUSE_PUBLIC_KEY", raising=False)
monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
tracing_module._tracing_config = None
with pytest.raises(ValueError, match="LANGFUSE_PUBLIC_KEY"):
tracing_factory.build_tracing_callbacks()
def test_create_langfuse_handler_initializes_client_before_handler(monkeypatch):
calls: list[tuple[str, dict]] = []
class FakeLangfuse:
def __init__(self, **kwargs):
calls.append(("client", kwargs))
class FakeCallbackHandler:
def __init__(self, **kwargs):
calls.append(("handler", kwargs))
fake_langfuse_module = types.ModuleType("langfuse")
fake_langfuse_module.Langfuse = FakeLangfuse
fake_langfuse_langchain_module = types.ModuleType("langfuse.langchain")
fake_langfuse_langchain_module.CallbackHandler = FakeCallbackHandler
monkeypatch.setitem(sys.modules, "langfuse", fake_langfuse_module)
monkeypatch.setitem(sys.modules, "langfuse.langchain", fake_langfuse_langchain_module)
cfg = type(
"LangfuseCfg",
(),
{
"secret_key": "sk-lf-test",
"public_key": "pk-lf-test",
"host": "https://langfuse.example.com",
},
)()
tracing_factory._create_langfuse_handler(cfg)
assert calls == [
(
"client",
{
"secret_key": "sk-lf-test",
"public_key": "pk-lf-test",
"host": "https://langfuse.example.com",
},
),
(
"handler",
{
"public_key": "pk-lf-test",
},
),
]
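The factory tests above expect an empty list when no provider is enabled, callbacks in provider order otherwise, and a `RuntimeError` naming the provider when its initialization fails. A minimal sketch of that fail-fast loop follows. `build_callbacks` and its `factories` mapping are hypothetical and do not reproduce the signature of `build_tracing_callbacks`.

```python
def build_callbacks(enabled: list, factories: dict) -> list:
    """Create one callback per enabled provider, failing loudly on the first error.

    `enabled` is an ordered list of provider names; `factories` maps each name
    to a zero-argument callable that returns the provider's callback object.
    """
    callbacks = []
    for name in enabled:
        try:
            callbacks.append(factories[name]())
        except Exception as exc:
            # Chain the original error so the root cause stays in the traceback.
            raise RuntimeError(f"{name} tracing initialization failed: {exc}") from exc
    return callbacks
```

Raising instead of silently skipping a broken provider means an explicitly enabled tracer cannot disappear without notice, which appears to be the intent of `test_build_tracing_callbacks_raises_when_enabled_provider_fails`.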


@ -289,6 +289,8 @@ class TestBeforeAgent:
"size": 5,
"path": "/mnt/user-data/uploads/notes.txt",
"extension": ".txt",
"outline": [],
"outline_preview": [],
}
]
@ -339,3 +341,130 @@ class TestBeforeAgent:
result = mw.before_agent(self._state(msg), _runtime())
assert result["messages"][-1].id == "original-id-42"
def test_outline_injected_when_md_file_exists(self, tmp_path):
"""When a converted .md file exists alongside the upload, its outline is injected."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "report.pdf").write_bytes(b"%PDF fake")
# Simulate the .md produced by the conversion pipeline
(uploads_dir / "report.md").write_text(
"# PART I\n\n## ITEM 1. BUSINESS\n\nBody text.\n\n## ITEM 2. RISK\n",
encoding="utf-8",
)
msg = _human("summarise", files=[{"filename": "report.pdf", "size": 9, "path": "/mnt/user-data/uploads/report.pdf"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert "Document outline" in content
assert "PART I" in content
assert "ITEM 1. BUSINESS" in content
assert "ITEM 2. RISK" in content
assert "read_file" in content
def test_no_outline_when_no_md_file(self, tmp_path):
"""Files without a sibling .md have no outline section."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "data.xlsx").write_bytes(b"fake-xlsx")
msg = _human("analyse", files=[{"filename": "data.xlsx", "size": 9, "path": "/mnt/user-data/uploads/data.xlsx"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert "Document outline" not in content
def test_outline_truncation_hint_shown(self, tmp_path):
"""When outline is truncated, a hint line is appended after the last visible entry."""
from deerflow.utils.file_conversion import MAX_OUTLINE_ENTRIES
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "big.pdf").write_bytes(b"%PDF fake")
# Write MAX_OUTLINE_ENTRIES + 5 headings so truncation is triggered
headings = "\n".join(f"# Heading {i}" for i in range(MAX_OUTLINE_ENTRIES + 5))
(uploads_dir / "big.md").write_text(headings, encoding="utf-8")
msg = _human("read", files=[{"filename": "big.pdf", "size": 9, "path": "/mnt/user-data/uploads/big.pdf"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert f"showing first {MAX_OUTLINE_ENTRIES} headings" in content
assert "use `read_file` to explore further" in content
def test_no_truncation_hint_for_short_outline(self, tmp_path):
"""Short outlines (under the cap) must not show a truncation hint."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "short.pdf").write_bytes(b"%PDF fake")
(uploads_dir / "short.md").write_text("# Intro\n\n# Conclusion\n", encoding="utf-8")
msg = _human("read", files=[{"filename": "short.pdf", "size": 9, "path": "/mnt/user-data/uploads/short.pdf"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert "showing first" not in content
def test_historical_file_outline_injected(self, tmp_path):
"""Outline is also shown for historical (previously uploaded) files."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
# Historical file with .md
(uploads_dir / "old_report.pdf").write_bytes(b"%PDF old")
(uploads_dir / "old_report.md").write_text(
"# Chapter 1\n\n# Chapter 2\n",
encoding="utf-8",
)
# New file without .md
(uploads_dir / "new.txt").write_bytes(b"new")
msg = _human("go", files=[{"filename": "new.txt", "size": 3, "path": "/mnt/user-data/uploads/new.txt"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert "Chapter 1" in content
assert "Chapter 2" in content
def test_fallback_preview_shown_when_outline_empty(self, tmp_path):
"""When .md exists but has no headings, first lines are shown as a preview."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "report.pdf").write_bytes(b"%PDF fake")
# .md with no # headings — plain prose only
(uploads_dir / "report.md").write_text(
"Annual Financial Report 2024\n\nThis document summarises key findings.\n\nRevenue grew by 12%.\n",
encoding="utf-8",
)
msg = _human("analyse", files=[{"filename": "report.pdf", "size": 9, "path": "/mnt/user-data/uploads/report.pdf"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
# Outline section must NOT appear
assert "Document outline" not in content
# Preview lines must appear
assert "Annual Financial Report 2024" in content
assert "No structural headings detected" in content
# grep hint must appear
assert "grep" in content
def test_fallback_grep_hint_shown_when_no_md_file(self, tmp_path):
"""Files with no sibling .md still get the grep hint (outline is empty)."""
mw = _middleware(tmp_path)
uploads_dir = _uploads_dir(tmp_path)
(uploads_dir / "data.csv").write_bytes(b"a,b,c\n1,2,3\n")
msg = _human("analyse", files=[{"filename": "data.csv", "size": 12, "path": "/mnt/user-data/uploads/data.csv"}])
result = mw.before_agent(self._state(msg), _runtime())
assert result is not None
content = result["messages"][-1].content
assert "Document outline" not in content
assert "grep" in content
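The outline tests above assume that headings are collected from the converted `.md` file, capped at `MAX_OUTLINE_ENTRIES`, and flagged when the cap cuts the list short. A minimal sketch of such an extractor follows. `extract_outline`, its return shape, and the default cap of 50 are assumptions for illustration; the real helper in `deerflow.utils.file_conversion` and the real value of `MAX_OUTLINE_ENTRIES` are not shown in this diff.

```python
import re

# ATX-style Markdown heading: one to six '#' characters, whitespace, then a title.
_HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def extract_outline(markdown: str, max_entries: int = 50):
    """Return (entries, truncated) for the headings found in `markdown`.

    Each entry records the heading level, title, and 1-based line number.
    `truncated` is True only if at least one heading was left out by the cap.
    """
    entries = []
    truncated = False
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        match = _HEADING.match(line)
        if not match:
            continue
        if len(entries) >= max_entries:
            truncated = True
            break
        entries.append({"level": len(match.group(1)), "title": match.group(2), "line": line_no})
    return entries, truncated
```

An empty `entries` list is the signal for the fallback path exercised by `test_fallback_preview_shown_when_outline_empty`. Note that this sketch does not skip `#` lines inside fenced code blocks, which a production version would probably need to handle.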

249
backend/uv.lock generated

@ -53,7 +53,7 @@ wheels = [
[[package]]
name = "aiohttp"
version = "3.13.3"
version = "3.13.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohappyeyeballs" },
@ -64,76 +64,76 @@ dependencies = [
{ name = "propcache" },
{ name = "yarl" },
]
sdist = { url = "https://files.pythonhosted.org/packages/50/42/32cf8e7704ceb4481406eb87161349abb46a57fee3f008ba9cb610968646/aiohttp-3.13.3.tar.gz", hash = "sha256:a949eee43d3782f2daae4f4a2819b2cb9b0c5d3b7f7a927067cc84dafdbb9f88", size = 7844556, upload-time = "2026-01-03T17:33:05.204Z" }
sdist = { url = "https://files.pythonhosted.org/packages/45/4a/064321452809dae953c1ed6e017504e72551a26b6f5708a5a80e4bf556ff/aiohttp-3.13.4.tar.gz", hash = "sha256:d97a6d09c66087890c2ab5d49069e1e570583f7ac0314ecf98294c1b6aaebd38", size = 7859748, upload-time = "2026-03-28T17:19:40.6Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a0/be/4fc11f202955a69e0db803a12a062b8379c970c7c84f4882b6da17337cc1/aiohttp-3.13.3-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:b903a4dfee7d347e2d87697d0713be59e0b87925be030c9178c5faa58ea58d5c", size = 739732, upload-time = "2026-01-03T17:30:14.23Z" },
{ url = "https://files.pythonhosted.org/packages/97/2c/621d5b851f94fa0bb7430d6089b3aa970a9d9b75196bc93bb624b0db237a/aiohttp-3.13.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a45530014d7a1e09f4a55f4f43097ba0fd155089372e105e4bff4ca76cb1b168", size = 494293, upload-time = "2026-01-03T17:30:15.96Z" },
{ url = "https://files.pythonhosted.org/packages/5d/43/4be01406b78e1be8320bb8316dc9c42dbab553d281c40364e0f862d5661c/aiohttp-3.13.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:27234ef6d85c914f9efeb77ff616dbf4ad2380be0cda40b4db086ffc7ddd1b7d", size = 493533, upload-time = "2026-01-03T17:30:17.431Z" },
{ url = "https://files.pythonhosted.org/packages/8d/a8/5a35dc56a06a2c90d4742cbf35294396907027f80eea696637945a106f25/aiohttp-3.13.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d32764c6c9aafb7fb55366a224756387cd50bfa720f32b88e0e6fa45b27dcf29", size = 1737839, upload-time = "2026-01-03T17:30:19.422Z" },
{ url = "https://files.pythonhosted.org/packages/bf/62/4b9eeb331da56530bf2e198a297e5303e1c1ebdceeb00fe9b568a65c5a0c/aiohttp-3.13.3-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b1a6102b4d3ebc07dad44fbf07b45bb600300f15b552ddf1851b5390202ea2e3", size = 1703932, upload-time = "2026-01-03T17:30:21.756Z" },
{ url = "https://files.pythonhosted.org/packages/7c/f6/af16887b5d419e6a367095994c0b1332d154f647e7dc2bd50e61876e8e3d/aiohttp-3.13.3-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c014c7ea7fb775dd015b2d3137378b7be0249a448a1612268b5a90c2d81de04d", size = 1771906, upload-time = "2026-01-03T17:30:23.932Z" },
{ url = "https://files.pythonhosted.org/packages/ce/83/397c634b1bcc24292fa1e0c7822800f9f6569e32934bdeef09dae7992dfb/aiohttp-3.13.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2b8d8ddba8f95ba17582226f80e2de99c7a7948e66490ef8d947e272a93e9463", size = 1871020, upload-time = "2026-01-03T17:30:26Z" },
{ url = "https://files.pythonhosted.org/packages/86/f6/a62cbbf13f0ac80a70f71b1672feba90fdb21fd7abd8dbf25c0105fb6fa3/aiohttp-3.13.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9ae8dd55c8e6c4257eae3a20fd2c8f41edaea5992ed67156642493b8daf3cecc", size = 1755181, upload-time = "2026-01-03T17:30:27.554Z" },
{ url = "https://files.pythonhosted.org/packages/0a/87/20a35ad487efdd3fba93d5843efdfaa62d2f1479eaafa7453398a44faf13/aiohttp-3.13.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:01ad2529d4b5035578f5081606a465f3b814c542882804e2e8cda61adf5c71bf", size = 1561794, upload-time = "2026-01-03T17:30:29.254Z" },
{ url = "https://files.pythonhosted.org/packages/de/95/8fd69a66682012f6716e1bc09ef8a1a2a91922c5725cb904689f112309c4/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bb4f7475e359992b580559e008c598091c45b5088f28614e855e42d39c2f1033", size = 1697900, upload-time = "2026-01-03T17:30:31.033Z" },
{ url = "https://files.pythonhosted.org/packages/e5/66/7b94b3b5ba70e955ff597672dad1691333080e37f50280178967aff68657/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:c19b90316ad3b24c69cd78d5c9b4f3aa4497643685901185b65166293d36a00f", size = 1728239, upload-time = "2026-01-03T17:30:32.703Z" },
{ url = "https://files.pythonhosted.org/packages/47/71/6f72f77f9f7d74719692ab65a2a0252584bf8d5f301e2ecb4c0da734530a/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:96d604498a7c782cb15a51c406acaea70d8c027ee6b90c569baa6e7b93073679", size = 1740527, upload-time = "2026-01-03T17:30:34.695Z" },
{ url = "https://files.pythonhosted.org/packages/fa/b4/75ec16cbbd5c01bdaf4a05b19e103e78d7ce1ef7c80867eb0ace42ff4488/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:084911a532763e9d3dd95adf78a78f4096cd5f58cdc18e6fdbc1b58417a45423", size = 1554489, upload-time = "2026-01-03T17:30:36.864Z" },
{ url = "https://files.pythonhosted.org/packages/52/8f/bc518c0eea29f8406dcf7ed1f96c9b48e3bc3995a96159b3fc11f9e08321/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:7a4a94eb787e606d0a09404b9c38c113d3b099d508021faa615d70a0131907ce", size = 1767852, upload-time = "2026-01-03T17:30:39.433Z" },
{ url = "https://files.pythonhosted.org/packages/9d/f2/a07a75173124f31f11ea6f863dc44e6f09afe2bca45dd4e64979490deab1/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:87797e645d9d8e222e04160ee32aa06bc5c163e8499f24db719e7852ec23093a", size = 1722379, upload-time = "2026-01-03T17:30:41.081Z" },
{ url = "https://files.pythonhosted.org/packages/3c/4a/1a3fee7c21350cac78e5c5cef711bac1b94feca07399f3d406972e2d8fcd/aiohttp-3.13.3-cp312-cp312-win32.whl", hash = "sha256:b04be762396457bef43f3597c991e192ee7da460a4953d7e647ee4b1c28e7046", size = 428253, upload-time = "2026-01-03T17:30:42.644Z" },
{ url = "https://files.pythonhosted.org/packages/d9/b7/76175c7cb4eb73d91ad63c34e29fc4f77c9386bba4a65b53ba8e05ee3c39/aiohttp-3.13.3-cp312-cp312-win_amd64.whl", hash = "sha256:e3531d63d3bdfa7e3ac5e9b27b2dd7ec9df3206a98e0b3445fa906f233264c57", size = 455407, upload-time = "2026-01-03T17:30:44.195Z" },
{ url = "https://files.pythonhosted.org/packages/97/8a/12ca489246ca1faaf5432844adbfce7ff2cc4997733e0af120869345643a/aiohttp-3.13.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:5dff64413671b0d3e7d5918ea490bdccb97a4ad29b3f311ed423200b2203e01c", size = 734190, upload-time = "2026-01-03T17:30:45.832Z" },
{ url = "https://files.pythonhosted.org/packages/32/08/de43984c74ed1fca5c014808963cc83cb00d7bb06af228f132d33862ca76/aiohttp-3.13.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:87b9aab6d6ed88235aa2970294f496ff1a1f9adcd724d800e9b952395a80ffd9", size = 491783, upload-time = "2026-01-03T17:30:47.466Z" },
{ url = "https://files.pythonhosted.org/packages/17/f8/8dd2cf6112a5a76f81f81a5130c57ca829d101ad583ce57f889179accdda/aiohttp-3.13.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:425c126c0dc43861e22cb1c14ba4c8e45d09516d0a3ae0a3f7494b79f5f233a3", size = 490704, upload-time = "2026-01-03T17:30:49.373Z" },
{ url = "https://files.pythonhosted.org/packages/6d/40/a46b03ca03936f832bc7eaa47cfbb1ad012ba1be4790122ee4f4f8cba074/aiohttp-3.13.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7f9120f7093c2a32d9647abcaf21e6ad275b4fbec5b55969f978b1a97c7c86bf", size = 1720652, upload-time = "2026-01-03T17:30:50.974Z" },
{ url = "https://files.pythonhosted.org/packages/f7/7e/917fe18e3607af92657e4285498f500dca797ff8c918bd7d90b05abf6c2a/aiohttp-3.13.3-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:697753042d57f4bf7122cab985bf15d0cef23c770864580f5af4f52023a56bd6", size = 1692014, upload-time = "2026-01-03T17:30:52.729Z" },
{ url = "https://files.pythonhosted.org/packages/71/b6/cefa4cbc00d315d68973b671cf105b21a609c12b82d52e5d0c9ae61d2a09/aiohttp-3.13.3-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:6de499a1a44e7de70735d0b39f67c8f25eb3d91eb3103be99ca0fa882cdd987d", size = 1759777, upload-time = "2026-01-03T17:30:54.537Z" },
{ url = "https://files.pythonhosted.org/packages/fb/e3/e06ee07b45e59e6d81498b591fc589629be1553abb2a82ce33efe2a7b068/aiohttp-3.13.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:37239e9f9a7ea9ac5bf6b92b0260b01f8a22281996da609206a84df860bc1261", size = 1861276, upload-time = "2026-01-03T17:30:56.512Z" },
{ url = "https://files.pythonhosted.org/packages/7c/24/75d274228acf35ceeb2850b8ce04de9dd7355ff7a0b49d607ee60c29c518/aiohttp-3.13.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f76c1e3fe7d7c8afad7ed193f89a292e1999608170dcc9751a7462a87dfd5bc0", size = 1743131, upload-time = "2026-01-03T17:30:58.256Z" },
{ url = "https://files.pythonhosted.org/packages/04/98/3d21dde21889b17ca2eea54fdcff21b27b93f45b7bb94ca029c31ab59dc3/aiohttp-3.13.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fc290605db2a917f6e81b0e1e0796469871f5af381ce15c604a3c5c7e51cb730", size = 1556863, upload-time = "2026-01-03T17:31:00.445Z" },
{ url = "https://files.pythonhosted.org/packages/9e/84/da0c3ab1192eaf64782b03971ab4055b475d0db07b17eff925e8c93b3aa5/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4021b51936308aeea0367b8f006dc999ca02bc118a0cc78c303f50a2ff6afb91", size = 1682793, upload-time = "2026-01-03T17:31:03.024Z" },
{ url = "https://files.pythonhosted.org/packages/ff/0f/5802ada182f575afa02cbd0ec5180d7e13a402afb7c2c03a9aa5e5d49060/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:49a03727c1bba9a97d3e93c9f93ca03a57300f484b6e935463099841261195d3", size = 1716676, upload-time = "2026-01-03T17:31:04.842Z" },
{ url = "https://files.pythonhosted.org/packages/3f/8c/714d53bd8b5a4560667f7bbbb06b20c2382f9c7847d198370ec6526af39c/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:3d9908a48eb7416dc1f4524e69f1d32e5d90e3981e4e37eb0aa1cd18f9cfa2a4", size = 1733217, upload-time = "2026-01-03T17:31:06.868Z" },
{ url = "https://files.pythonhosted.org/packages/7d/79/e2176f46d2e963facea939f5be2d26368ce543622be6f00a12844d3c991f/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:2712039939ec963c237286113c68dbad80a82a4281543f3abf766d9d73228998", size = 1552303, upload-time = "2026-01-03T17:31:08.958Z" },
{ url = "https://files.pythonhosted.org/packages/ab/6a/28ed4dea1759916090587d1fe57087b03e6c784a642b85ef48217b0277ae/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:7bfdc049127717581866fa4708791220970ce291c23e28ccf3922c700740fdc0", size = 1763673, upload-time = "2026-01-03T17:31:10.676Z" },
{ url = "https://files.pythonhosted.org/packages/e8/35/4a3daeb8b9fab49240d21c04d50732313295e4bd813a465d840236dd0ce1/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8057c98e0c8472d8846b9c79f56766bcc57e3e8ac7bfd510482332366c56c591", size = 1721120, upload-time = "2026-01-03T17:31:12.575Z" },
{ url = "https://files.pythonhosted.org/packages/bc/9f/d643bb3c5fb99547323e635e251c609fbbc660d983144cfebec529e09264/aiohttp-3.13.3-cp313-cp313-win32.whl", hash = "sha256:1449ceddcdbcf2e0446957863af03ebaaa03f94c090f945411b61269e2cb5daf", size = 427383, upload-time = "2026-01-03T17:31:14.382Z" },
{ url = "https://files.pythonhosted.org/packages/4e/f1/ab0395f8a79933577cdd996dd2f9aa6014af9535f65dddcf88204682fe62/aiohttp-3.13.3-cp313-cp313-win_amd64.whl", hash = "sha256:693781c45a4033d31d4187d2436f5ac701e7bbfe5df40d917736108c1cc7436e", size = 453899, upload-time = "2026-01-03T17:31:15.958Z" },
{ url = "https://files.pythonhosted.org/packages/99/36/5b6514a9f5d66f4e2597e40dea2e3db271e023eb7a5d22defe96ba560996/aiohttp-3.13.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:ea37047c6b367fd4bd632bff8077449b8fa034b69e812a18e0132a00fae6e808", size = 737238, upload-time = "2026-01-03T17:31:17.909Z" },
{ url = "https://files.pythonhosted.org/packages/f7/49/459327f0d5bcd8c6c9ca69e60fdeebc3622861e696490d8674a6d0cb90a6/aiohttp-3.13.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:6fc0e2337d1a4c3e6acafda6a78a39d4c14caea625124817420abceed36e2415", size = 492292, upload-time = "2026-01-03T17:31:19.919Z" },
{ url = "https://files.pythonhosted.org/packages/e8/0b/b97660c5fd05d3495b4eb27f2d0ef18dc1dc4eff7511a9bf371397ff0264/aiohttp-3.13.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c685f2d80bb67ca8c3837823ad76196b3694b0159d232206d1e461d3d434666f", size = 493021, upload-time = "2026-01-03T17:31:21.636Z" },
{ url = "https://files.pythonhosted.org/packages/54/d4/438efabdf74e30aeceb890c3290bbaa449780583b1270b00661126b8aae4/aiohttp-3.13.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:48e377758516d262bde50c2584fc6c578af272559c409eecbdd2bae1601184d6", size = 1717263, upload-time = "2026-01-03T17:31:23.296Z" },
{ url = "https://files.pythonhosted.org/packages/71/f2/7bddc7fd612367d1459c5bcf598a9e8f7092d6580d98de0e057eb42697ad/aiohttp-3.13.3-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:34749271508078b261c4abb1767d42b8d0c0cc9449c73a4df494777dc55f0687", size = 1669107, upload-time = "2026-01-03T17:31:25.334Z" },
{ url = "https://files.pythonhosted.org/packages/00/5a/1aeaecca40e22560f97610a329e0e5efef5e0b5afdf9f857f0d93839ab2e/aiohttp-3.13.3-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:82611aeec80eb144416956ec85b6ca45a64d76429c1ed46ae1b5f86c6e0c9a26", size = 1760196, upload-time = "2026-01-03T17:31:27.394Z" },
{ url = "https://files.pythonhosted.org/packages/f8/f8/0ff6992bea7bd560fc510ea1c815f87eedd745fe035589c71ce05612a19a/aiohttp-3.13.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2fff83cfc93f18f215896e3a190e8e5cb413ce01553901aca925176e7568963a", size = 1843591, upload-time = "2026-01-03T17:31:29.238Z" },
{ url = "https://files.pythonhosted.org/packages/e3/d1/e30e537a15f53485b61f5be525f2157da719819e8377298502aebac45536/aiohttp-3.13.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bbe7d4cecacb439e2e2a8a1a7b935c25b812af7a5fd26503a66dadf428e79ec1", size = 1720277, upload-time = "2026-01-03T17:31:31.053Z" },
{ url = "https://files.pythonhosted.org/packages/84/45/23f4c451d8192f553d38d838831ebbc156907ea6e05557f39563101b7717/aiohttp-3.13.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b928f30fe49574253644b1ca44b1b8adbd903aa0da4b9054a6c20fc7f4092a25", size = 1548575, upload-time = "2026-01-03T17:31:32.87Z" },
{ url = "https://files.pythonhosted.org/packages/6a/ed/0a42b127a43712eda7807e7892c083eadfaf8429ca8fb619662a530a3aab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7b5e8fe4de30df199155baaf64f2fcd604f4c678ed20910db8e2c66dc4b11603", size = 1679455, upload-time = "2026-01-03T17:31:34.76Z" },
{ url = "https://files.pythonhosted.org/packages/2e/b5/c05f0c2b4b4fe2c9d55e73b6d3ed4fd6c9dc2684b1d81cbdf77e7fad9adb/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:8542f41a62bcc58fc7f11cf7c90e0ec324ce44950003feb70640fc2a9092c32a", size = 1687417, upload-time = "2026-01-03T17:31:36.699Z" },
{ url = "https://files.pythonhosted.org/packages/c9/6b/915bc5dad66aef602b9e459b5a973529304d4e89ca86999d9d75d80cbd0b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5e1d8c8b8f1d91cd08d8f4a3c2b067bfca6ec043d3ff36de0f3a715feeedf926", size = 1729968, upload-time = "2026-01-03T17:31:38.622Z" },
{ url = "https://files.pythonhosted.org/packages/11/3b/e84581290a9520024a08640b63d07673057aec5ca548177a82026187ba73/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:90455115e5da1c3c51ab619ac57f877da8fd6d73c05aacd125c5ae9819582aba", size = 1545690, upload-time = "2026-01-03T17:31:40.57Z" },
{ url = "https://files.pythonhosted.org/packages/f5/04/0c3655a566c43fd647c81b895dfe361b9f9ad6d58c19309d45cff52d6c3b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:042e9e0bcb5fba81886c8b4fbb9a09d6b8a00245fd8d88e4d989c1f96c74164c", size = 1746390, upload-time = "2026-01-03T17:31:42.857Z" },
{ url = "https://files.pythonhosted.org/packages/1f/53/71165b26978f719c3419381514c9690bd5980e764a09440a10bb816ea4ab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2eb752b102b12a76ca02dff751a801f028b4ffbbc478840b473597fc91a9ed43", size = 1702188, upload-time = "2026-01-03T17:31:44.984Z" },
{ url = "https://files.pythonhosted.org/packages/29/a7/cbe6c9e8e136314fa1980da388a59d2f35f35395948a08b6747baebb6aa6/aiohttp-3.13.3-cp314-cp314-win32.whl", hash = "sha256:b556c85915d8efaed322bf1bdae9486aa0f3f764195a0fb6ee962e5c71ef5ce1", size = 433126, upload-time = "2026-01-03T17:31:47.463Z" },
{ url = "https://files.pythonhosted.org/packages/de/56/982704adea7d3b16614fc5936014e9af85c0e34b58f9046655817f04306e/aiohttp-3.13.3-cp314-cp314-win_amd64.whl", hash = "sha256:9bf9f7a65e7aa20dd764151fb3d616c81088f91f8df39c3893a536e279b4b984", size = 459128, upload-time = "2026-01-03T17:31:49.2Z" },
{ url = "https://files.pythonhosted.org/packages/6c/2a/3c79b638a9c3d4658d345339d22070241ea341ed4e07b5ac60fb0f418003/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:05861afbbec40650d8a07ea324367cb93e9e8cc7762e04dd4405df99fa65159c", size = 769512, upload-time = "2026-01-03T17:31:51.134Z" },
{ url = "https://files.pythonhosted.org/packages/29/b9/3e5014d46c0ab0db8707e0ac2711ed28c4da0218c358a4e7c17bae0d8722/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:2fc82186fadc4a8316768d61f3722c230e2c1dcab4200d52d2ebdf2482e47592", size = 506444, upload-time = "2026-01-03T17:31:52.85Z" },
{ url = "https://files.pythonhosted.org/packages/90/03/c1d4ef9a054e151cd7839cdc497f2638f00b93cbe8043983986630d7a80c/aiohttp-3.13.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:0add0900ff220d1d5c5ebbf99ed88b0c1bbf87aa7e4262300ed1376a6b13414f", size = 510798, upload-time = "2026-01-03T17:31:54.91Z" },
{ url = "https://files.pythonhosted.org/packages/ea/76/8c1e5abbfe8e127c893fe7ead569148a4d5a799f7cf958d8c09f3eedf097/aiohttp-3.13.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:568f416a4072fbfae453dcf9a99194bbb8bdeab718e08ee13dfa2ba0e4bebf29", size = 1868835, upload-time = "2026-01-03T17:31:56.733Z" },
{ url = "https://files.pythonhosted.org/packages/8e/ac/984c5a6f74c363b01ff97adc96a3976d9c98940b8969a1881575b279ac5d/aiohttp-3.13.3-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:add1da70de90a2569c5e15249ff76a631ccacfe198375eead4aadf3b8dc849dc", size = 1720486, upload-time = "2026-01-03T17:31:58.65Z" },
{ url = "https://files.pythonhosted.org/packages/b2/9a/b7039c5f099c4eb632138728828b33428585031a1e658d693d41d07d89d1/aiohttp-3.13.3-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:10b47b7ba335d2e9b1239fa571131a87e2d8ec96b333e68b2a305e7a98b0bae2", size = 1847951, upload-time = "2026-01-03T17:32:00.989Z" },
{ url = "https://files.pythonhosted.org/packages/3c/02/3bec2b9a1ba3c19ff89a43a19324202b8eb187ca1e928d8bdac9bbdddebd/aiohttp-3.13.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3dd4dce1c718e38081c8f35f323209d4c1df7d4db4bab1b5c88a6b4d12b74587", size = 1941001, upload-time = "2026-01-03T17:32:03.122Z" },
{ url = "https://files.pythonhosted.org/packages/37/df/d879401cedeef27ac4717f6426c8c36c3091c6e9f08a9178cc87549c537f/aiohttp-3.13.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:34bac00a67a812570d4a460447e1e9e06fae622946955f939051e7cc895cfab8", size = 1797246, upload-time = "2026-01-03T17:32:05.255Z" },
{ url = "https://files.pythonhosted.org/packages/8d/15/be122de1f67e6953add23335c8ece6d314ab67c8bebb3f181063010795a7/aiohttp-3.13.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a19884d2ee70b06d9204b2727a7b9f983d0c684c650254679e716b0b77920632", size = 1627131, upload-time = "2026-01-03T17:32:07.607Z" },
{ url = "https://files.pythonhosted.org/packages/12/12/70eedcac9134cfa3219ab7af31ea56bc877395b1ac30d65b1bc4b27d0438/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5f8ca7f2bb6ba8348a3614c7918cc4bb73268c5ac2a207576b7afea19d3d9f64", size = 1795196, upload-time = "2026-01-03T17:32:09.59Z" },
{ url = "https://files.pythonhosted.org/packages/32/11/b30e1b1cd1f3054af86ebe60df96989c6a414dd87e27ad16950eee420bea/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:b0d95340658b9d2f11d9697f59b3814a9d3bb4b7a7c20b131df4bcef464037c0", size = 1782841, upload-time = "2026-01-03T17:32:11.445Z" },
{ url = "https://files.pythonhosted.org/packages/88/0d/d98a9367b38912384a17e287850f5695c528cff0f14f791ce8ee2e4f7796/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:a1e53262fd202e4b40b70c3aff944a8155059beedc8a89bba9dc1f9ef06a1b56", size = 1795193, upload-time = "2026-01-03T17:32:13.705Z" },
{ url = "https://files.pythonhosted.org/packages/43/a5/a2dfd1f5ff5581632c7f6a30e1744deda03808974f94f6534241ef60c751/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:d60ac9663f44168038586cab2157e122e46bdef09e9368b37f2d82d354c23f72", size = 1621979, upload-time = "2026-01-03T17:32:15.965Z" },
{ url = "https://files.pythonhosted.org/packages/fa/f0/12973c382ae7c1cccbc4417e129c5bf54c374dfb85af70893646e1f0e749/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:90751b8eed69435bac9ff4e3d2f6b3af1f57e37ecb0fbeee59c0174c9e2d41df", size = 1822193, upload-time = "2026-01-03T17:32:18.219Z" },
{ url = "https://files.pythonhosted.org/packages/3c/5f/24155e30ba7f8c96918af1350eb0663e2430aad9e001c0489d89cd708ab1/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:fc353029f176fd2b3ec6cfc71be166aba1936fe5d73dd1992ce289ca6647a9aa", size = 1769801, upload-time = "2026-01-03T17:32:20.25Z" },
{ url = "https://files.pythonhosted.org/packages/eb/f8/7314031ff5c10e6ece114da79b338ec17eeff3a079e53151f7e9f43c4723/aiohttp-3.13.3-cp314-cp314t-win32.whl", hash = "sha256:2e41b18a58da1e474a057b3d35248d8320029f61d70a37629535b16a0c8f3767", size = 466523, upload-time = "2026-01-03T17:32:22.215Z" },
{ url = "https://files.pythonhosted.org/packages/b4/63/278a98c715ae467624eafe375542d8ba9b4383a016df8fdefe0ae28382a7/aiohttp-3.13.3-cp314-cp314t-win_amd64.whl", hash = "sha256:44531a36aa2264a1860089ffd4dce7baf875ee5a6079d5fb42e261c704ef7344", size = 499694, upload-time = "2026-01-03T17:32:24.546Z" },
{ url = "https://files.pythonhosted.org/packages/1e/bd/ede278648914cabbabfdf95e436679b5d4156e417896a9b9f4587169e376/aiohttp-3.13.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ee62d4471ce86b108b19c3364db4b91180d13fe3510144872d6bad5401957360", size = 752158, upload-time = "2026-03-28T17:16:06.901Z" },
{ url = "https://files.pythonhosted.org/packages/90/de/581c053253c07b480b03785196ca5335e3c606a37dc73e95f6527f1591fe/aiohttp-3.13.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c0fd8f41b54b58636402eb493afd512c23580456f022c1ba2db0f810c959ed0d", size = 501037, upload-time = "2026-03-28T17:16:08.82Z" },
{ url = "https://files.pythonhosted.org/packages/fa/f9/a5ede193c08f13cc42c0a5b50d1e246ecee9115e4cf6e900d8dbd8fd6acb/aiohttp-3.13.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4baa48ce49efd82d6b1a0be12d6a36b35e5594d1dd42f8bfba96ea9f8678b88c", size = 501556, upload-time = "2026-03-28T17:16:10.63Z" },
{ url = "https://files.pythonhosted.org/packages/d6/10/88ff67cd48a6ec36335b63a640abe86135791544863e0cfe1f065d6cef7a/aiohttp-3.13.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d738ebab9f71ee652d9dbd0211057690022201b11197f9a7324fd4dba128aa97", size = 1757314, upload-time = "2026-03-28T17:16:12.498Z" },
{ url = "https://files.pythonhosted.org/packages/8b/15/fdb90a5cf5a1f52845c276e76298c75fbbcc0ac2b4a86551906d54529965/aiohttp-3.13.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:0ce692c3468fa831af7dceed52edf51ac348cebfc8d3feb935927b63bd3e8576", size = 1731819, upload-time = "2026-03-28T17:16:14.558Z" },
{ url = "https://files.pythonhosted.org/packages/ec/df/28146785a007f7820416be05d4f28cc207493efd1e8c6c1068e9bdc29198/aiohttp-3.13.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8e08abcfe752a454d2cb89ff0c08f2d1ecd057ae3e8cc6d84638de853530ebab", size = 1793279, upload-time = "2026-03-28T17:16:16.594Z" },
{ url = "https://files.pythonhosted.org/packages/10/47/689c743abf62ea7a77774d5722f220e2c912a77d65d368b884d9779ef41b/aiohttp-3.13.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5977f701b3fff36367a11087f30ea73c212e686d41cd363c50c022d48b011d8d", size = 1891082, upload-time = "2026-03-28T17:16:18.71Z" },
{ url = "https://files.pythonhosted.org/packages/b0/b6/f7f4f318c7e58c23b761c9b13b9a3c9b394e0f9d5d76fbc6622fa98509f6/aiohttp-3.13.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:54203e10405c06f8b6020bd1e076ae0fe6c194adcee12a5a78af3ffa3c57025e", size = 1773938, upload-time = "2026-03-28T17:16:21.125Z" },
{ url = "https://files.pythonhosted.org/packages/aa/06/f207cb3121852c989586a6fc16ff854c4fcc8651b86c5d3bd1fc83057650/aiohttp-3.13.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:358a6af0145bc4dda037f13167bef3cce54b132087acc4c295c739d05d16b1c3", size = 1579548, upload-time = "2026-03-28T17:16:23.588Z" },
{ url = "https://files.pythonhosted.org/packages/6c/58/e1289661a32161e24c1fe479711d783067210d266842523752869cc1d9c2/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:898ea1850656d7d61832ef06aa9846ab3ddb1621b74f46de78fbc5e1a586ba83", size = 1714669, upload-time = "2026-03-28T17:16:25.713Z" },
{ url = "https://files.pythonhosted.org/packages/96/0a/3e86d039438a74a86e6a948a9119b22540bae037d6ba317a042ae3c22711/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:7bc30cceb710cf6a44e9617e43eebb6e3e43ad855a34da7b4b6a73537d8a6763", size = 1754175, upload-time = "2026-03-28T17:16:28.18Z" },
{ url = "https://files.pythonhosted.org/packages/f4/30/e717fc5df83133ba467a560b6d8ef20197037b4bb5d7075b90037de1018e/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4a31c0c587a8a038f19a4c7e60654a6c899c9de9174593a13e7cc6e15ff271f9", size = 1762049, upload-time = "2026-03-28T17:16:30.941Z" },
{ url = "https://files.pythonhosted.org/packages/e4/28/8f7a2d4492e336e40005151bdd94baf344880a4707573378579f833a64c1/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2062f675f3fe6e06d6113eb74a157fb9df58953ffed0cdb4182554b116545758", size = 1570861, upload-time = "2026-03-28T17:16:32.953Z" },
{ url = "https://files.pythonhosted.org/packages/78/45/12e1a3d0645968b1c38de4b23fdf270b8637735ea057d4f84482ff918ad9/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:3d1ba8afb847ff80626d5e408c1fdc99f942acc877d0702fe137015903a220a9", size = 1790003, upload-time = "2026-03-28T17:16:35.468Z" },
{ url = "https://files.pythonhosted.org/packages/eb/0f/60374e18d590de16dcb39d6ff62f39c096c1b958e6f37727b5870026ea30/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:b08149419994cdd4d5eecf7fd4bc5986b5a9380285bcd01ab4c0d6bfca47b79d", size = 1737289, upload-time = "2026-03-28T17:16:38.187Z" },
{ url = "https://files.pythonhosted.org/packages/02/bf/535e58d886cfbc40a8b0013c974afad24ef7632d645bca0b678b70033a60/aiohttp-3.13.4-cp312-cp312-win32.whl", hash = "sha256:fc432f6a2c4f720180959bc19aa37259651c1a4ed8af8afc84dd41c60f15f791", size = 434185, upload-time = "2026-03-28T17:16:40.735Z" },
{ url = "https://files.pythonhosted.org/packages/1e/1a/d92e3325134ebfff6f4069f270d3aac770d63320bd1fcd0eca023e74d9a8/aiohttp-3.13.4-cp312-cp312-win_amd64.whl", hash = "sha256:6148c9ae97a3e8bff9a1fc9c757fa164116f86c100468339730e717590a3fb77", size = 461285, upload-time = "2026-03-28T17:16:42.713Z" },
{ url = "https://files.pythonhosted.org/packages/e3/ac/892f4162df9b115b4758d615f32ec63d00f3084c705ff5526630887b9b42/aiohttp-3.13.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:63dd5e5b1e43b8fb1e91b79b7ceba1feba588b317d1edff385084fcc7a0a4538", size = 745744, upload-time = "2026-03-28T17:16:44.67Z" },
{ url = "https://files.pythonhosted.org/packages/97/a9/c5b87e4443a2f0ea88cb3000c93a8fdad1ee63bffc9ded8d8c8e0d66efc6/aiohttp-3.13.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:746ac3cc00b5baea424dacddea3ec2c2702f9590de27d837aa67004db1eebc6e", size = 498178, upload-time = "2026-03-28T17:16:46.766Z" },
{ url = "https://files.pythonhosted.org/packages/94/42/07e1b543a61250783650df13da8ddcdc0d0a5538b2bd15cef6e042aefc61/aiohttp-3.13.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bda8f16ea99d6a6705e5946732e48487a448be874e54a4f73d514660ff7c05d3", size = 498331, upload-time = "2026-03-28T17:16:48.9Z" },
{ url = "https://files.pythonhosted.org/packages/20/d6/492f46bf0328534124772d0cf58570acae5b286ea25006900650f69dae0e/aiohttp-3.13.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4b061e7b5f840391e3f64d0ddf672973e45c4cfff7a0feea425ea24e51530fc2", size = 1744414, upload-time = "2026-03-28T17:16:50.968Z" },
{ url = "https://files.pythonhosted.org/packages/e2/4d/e02627b2683f68051246215d2d62b2d2f249ff7a285e7a858dc47d6b6a14/aiohttp-3.13.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b252e8d5cd66184b570d0d010de742736e8a4fab22c58299772b0c5a466d4b21", size = 1719226, upload-time = "2026-03-28T17:16:53.173Z" },
{ url = "https://files.pythonhosted.org/packages/7b/6c/5d0a3394dd2b9f9aeba6e1b6065d0439e4b75d41f1fb09a3ec010b43552b/aiohttp-3.13.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:20af8aad61d1803ff11152a26146d8d81c266aa8c5aa9b4504432abb965c36a0", size = 1782110, upload-time = "2026-03-28T17:16:55.362Z" },
{ url = "https://files.pythonhosted.org/packages/0d/2d/c20791e3437700a7441a7edfb59731150322424f5aadf635602d1d326101/aiohttp-3.13.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:13a5cc924b59859ad2adb1478e31f410a7ed46e92a2a619d6d1dd1a63c1a855e", size = 1884809, upload-time = "2026-03-28T17:16:57.734Z" },
{ url = "https://files.pythonhosted.org/packages/c8/94/d99dbfbd1924a87ef643833932eb2a3d9e5eee87656efea7d78058539eff/aiohttp-3.13.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:534913dfb0a644d537aebb4123e7d466d94e3be5549205e6a31f72368980a81a", size = 1764938, upload-time = "2026-03-28T17:17:00.221Z" },
{ url = "https://files.pythonhosted.org/packages/49/61/3ce326a1538781deb89f6cf5e094e2029cd308ed1e21b2ba2278b08426f6/aiohttp-3.13.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:320e40192a2dcc1cf4b5576936e9652981ab596bf81eb309535db7e2f5b5672f", size = 1570697, upload-time = "2026-03-28T17:17:02.985Z" },
{ url = "https://files.pythonhosted.org/packages/b6/77/4ab5a546857bb3028fbaf34d6eea180267bdab022ee8b1168b1fcde4bfdd/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9e587fcfce2bcf06526a43cb705bdee21ac089096f2e271d75de9c339db3100c", size = 1702258, upload-time = "2026-03-28T17:17:05.28Z" },
{ url = "https://files.pythonhosted.org/packages/79/63/d8f29021e39bc5af8e5d5e9da1b07976fb9846487a784e11e4f4eeda4666/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:9eb9c2eea7278206b5c6c1441fdd9dc420c278ead3f3b2cc87f9b693698cc500", size = 1740287, upload-time = "2026-03-28T17:17:07.712Z" },
{ url = "https://files.pythonhosted.org/packages/55/3a/cbc6b3b124859a11bc8055d3682c26999b393531ef926754a3445b99dfef/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:29be00c51972b04bf9d5c8f2d7f7314f48f96070ca40a873a53056e652e805f7", size = 1753011, upload-time = "2026-03-28T17:17:10.053Z" },
{ url = "https://files.pythonhosted.org/packages/e0/30/836278675205d58c1368b21520eab9572457cf19afd23759216c04483048/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:90c06228a6c3a7c9f776fe4fc0b7ff647fffd3bed93779a6913c804ae00c1073", size = 1566359, upload-time = "2026-03-28T17:17:12.433Z" },
{ url = "https://files.pythonhosted.org/packages/50/b4/8032cc9b82d17e4277704ba30509eaccb39329dc18d6a35f05e424439e32/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:a533ec132f05fd9a1d959e7f34184cd7d5e8511584848dab85faefbaac573069", size = 1785537, upload-time = "2026-03-28T17:17:14.721Z" },
{ url = "https://files.pythonhosted.org/packages/17/7d/5873e98230bde59f493bf1f7c3e327486a4b5653fa401144704df5d00211/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1c946f10f413836f82ea4cfb90200d2a59578c549f00857e03111cf45ad01ca5", size = 1740752, upload-time = "2026-03-28T17:17:17.387Z" },
{ url = "https://files.pythonhosted.org/packages/7b/f2/13e46e0df051494d7d3c68b7f72d071f48c384c12716fc294f75d5b1a064/aiohttp-3.13.4-cp313-cp313-win32.whl", hash = "sha256:48708e2706106da6967eff5908c78ca3943f005ed6bcb75da2a7e4da94ef8c70", size = 433187, upload-time = "2026-03-28T17:17:19.523Z" },
{ url = "https://files.pythonhosted.org/packages/ea/c0/649856ee655a843c8f8664592cfccb73ac80ede6a8c8db33a25d810c12db/aiohttp-3.13.4-cp313-cp313-win_amd64.whl", hash = "sha256:74a2eb058da44fa3a877a49e2095b591d4913308bb424c418b77beb160c55ce3", size = 459778, upload-time = "2026-03-28T17:17:21.964Z" },
{ url = "https://files.pythonhosted.org/packages/6d/29/6657cc37ae04cacc2dbf53fb730a06b6091cc4cbe745028e047c53e6d840/aiohttp-3.13.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:e0a2c961fc92abeff61d6444f2ce6ad35bb982db9fc8ff8a47455beacf454a57", size = 749363, upload-time = "2026-03-28T17:17:24.044Z" },
{ url = "https://files.pythonhosted.org/packages/90/7f/30ccdf67ca3d24b610067dc63d64dcb91e5d88e27667811640644aa4a85d/aiohttp-3.13.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:153274535985a0ff2bff1fb6c104ed547cec898a09213d21b0f791a44b14d933", size = 499317, upload-time = "2026-03-28T17:17:26.199Z" },
{ url = "https://files.pythonhosted.org/packages/93/13/e372dd4e68ad04ee25dafb050c7f98b0d91ea643f7352757e87231102555/aiohttp-3.13.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:351f3171e2458da3d731ce83f9e6b9619e325c45cbd534c7759750cabf453ad7", size = 500477, upload-time = "2026-03-28T17:17:28.279Z" },
{ url = "https://files.pythonhosted.org/packages/e5/fe/ee6298e8e586096fb6f5eddd31393d8544f33ae0792c71ecbb4c2bef98ac/aiohttp-3.13.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f989ac8bc5595ff761a5ccd32bdb0768a117f36dd1504b1c2c074ed5d3f4df9c", size = 1737227, upload-time = "2026-03-28T17:17:30.587Z" },
{ url = "https://files.pythonhosted.org/packages/b0/b9/a7a0463a09e1a3fe35100f74324f23644bfc3383ac5fd5effe0722a5f0b7/aiohttp-3.13.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:d36fc1709110ec1e87a229b201dd3ddc32aa01e98e7868083a794609b081c349", size = 1694036, upload-time = "2026-03-28T17:17:33.29Z" },
{ url = "https://files.pythonhosted.org/packages/57/7c/8972ae3fb7be00a91aee6b644b2a6a909aedb2c425269a3bfd90115e6f8f/aiohttp-3.13.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:42adaeea83cbdf069ab94f5103ce0787c21fb1a0153270da76b59d5578302329", size = 1786814, upload-time = "2026-03-28T17:17:36.035Z" },
{ url = "https://files.pythonhosted.org/packages/93/01/c81e97e85c774decbaf0d577de7d848934e8166a3a14ad9f8aa5be329d28/aiohttp-3.13.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:92deb95469928cc41fd4b42a95d8012fa6df93f6b1c0a83af0ffbc4a5e218cde", size = 1866676, upload-time = "2026-03-28T17:17:38.441Z" },
{ url = "https://files.pythonhosted.org/packages/5a/5f/5b46fe8694a639ddea2cd035bf5729e4677ea882cb251396637e2ef1590d/aiohttp-3.13.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0c0c7c07c4257ef3a1df355f840bc62d133bcdef5c1c5ba75add3c08553e2eed", size = 1740842, upload-time = "2026-03-28T17:17:40.783Z" },
{ url = "https://files.pythonhosted.org/packages/20/a2/0d4b03d011cca6b6b0acba8433193c1e484efa8d705ea58295590fe24203/aiohttp-3.13.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f062c45de8a1098cb137a1898819796a2491aec4e637a06b03f149315dff4d8f", size = 1566508, upload-time = "2026-03-28T17:17:43.235Z" },
{ url = "https://files.pythonhosted.org/packages/98/17/e689fd500da52488ec5f889effd6404dece6a59de301e380f3c64f167beb/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:76093107c531517001114f0ebdb4f46858ce818590363e3e99a4a2280334454a", size = 1700569, upload-time = "2026-03-28T17:17:46.165Z" },
{ url = "https://files.pythonhosted.org/packages/d8/0d/66402894dbcf470ef7db99449e436105ea862c24f7ea4c95c683e635af35/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:6f6ec32162d293b82f8b63a16edc80769662fbd5ae6fbd4936d3206a2c2cc63b", size = 1707407, upload-time = "2026-03-28T17:17:48.825Z" },
{ url = "https://files.pythonhosted.org/packages/2f/eb/af0ab1a3650092cbd8e14ef29e4ab0209e1460e1c299996c3f8288b3f1ff/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5903e2db3d202a00ad9f0ec35a122c005e85d90c9836ab4cda628f01edf425e2", size = 1752214, upload-time = "2026-03-28T17:17:51.206Z" },
{ url = "https://files.pythonhosted.org/packages/5a/bf/72326f8a98e4c666f292f03c385545963cc65e358835d2a7375037a97b57/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2d5bea57be7aca98dbbac8da046d99b5557c5cf4e28538c4c786313078aca09e", size = 1562162, upload-time = "2026-03-28T17:17:53.634Z" },
{ url = "https://files.pythonhosted.org/packages/67/9f/13b72435f99151dd9a5469c96b3b5f86aa29b7e785ca7f35cf5e538f74c0/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:bcf0c9902085976edc0232b75006ef38f89686901249ce14226b6877f88464fb", size = 1768904, upload-time = "2026-03-28T17:17:55.991Z" },
{ url = "https://files.pythonhosted.org/packages/18/bc/28d4970e7d5452ac7776cdb5431a1164a0d9cf8bd2fffd67b4fb463aa56d/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:c3295f98bfeed2e867cab588f2a146a9db37a85e3ae9062abf46ba062bd29165", size = 1723378, upload-time = "2026-03-28T17:17:58.348Z" },
{ url = "https://files.pythonhosted.org/packages/53/74/b32458ca1a7f34d65bdee7aef2036adbe0438123d3d53e2b083c453c24dd/aiohttp-3.13.4-cp314-cp314-win32.whl", hash = "sha256:a598a5c5767e1369d8f5b08695cab1d8160040f796c4416af76fd773d229b3c9", size = 438711, upload-time = "2026-03-28T17:18:00.728Z" },
{ url = "https://files.pythonhosted.org/packages/40/b2/54b487316c2df3e03a8f3435e9636f8a81a42a69d942164830d193beb56a/aiohttp-3.13.4-cp314-cp314-win_amd64.whl", hash = "sha256:c555db4bc7a264bead5a7d63d92d41a1122fcd39cc62a4db815f45ad46f9c2c8", size = 464977, upload-time = "2026-03-28T17:18:03.367Z" },
{ url = "https://files.pythonhosted.org/packages/47/fb/e41b63c6ce71b07a59243bb8f3b457ee0c3402a619acb9d2c0d21ef0e647/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:45abbbf09a129825d13c18c7d3182fecd46d9da3cfc383756145394013604ac1", size = 781549, upload-time = "2026-03-28T17:18:05.779Z" },
{ url = "https://files.pythonhosted.org/packages/97/53/532b8d28df1e17e44c4d9a9368b78dcb6bf0b51037522136eced13afa9e8/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:74c80b2bc2c2adb7b3d1941b2b60701ee2af8296fc8aad8b8bc48bc25767266c", size = 514383, upload-time = "2026-03-28T17:18:08.096Z" },
{ url = "https://files.pythonhosted.org/packages/1b/1f/62e5d400603e8468cd635812d99cb81cfdc08127a3dc474c647615f31339/aiohttp-3.13.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c97989ae40a9746650fa196894f317dafc12227c808c774929dda0ff873a5954", size = 518304, upload-time = "2026-03-28T17:18:10.642Z" },
{ url = "https://files.pythonhosted.org/packages/90/57/2326b37b10896447e3c6e0cbef4fe2486d30913639a5cfd1332b5d870f82/aiohttp-3.13.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dae86be9811493f9990ef44fff1685f5c1a3192e9061a71a109d527944eed551", size = 1893433, upload-time = "2026-03-28T17:18:13.121Z" },
{ url = "https://files.pythonhosted.org/packages/d2/b4/a24d82112c304afdb650167ef2fe190957d81cbddac7460bedd245f765aa/aiohttp-3.13.4-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:1db491abe852ca2fa6cc48a3341985b0174b3741838e1341b82ac82c8bd9e871", size = 1755901, upload-time = "2026-03-28T17:18:16.21Z" },
{ url = "https://files.pythonhosted.org/packages/9e/2d/0883ef9d878d7846287f036c162a951968f22aabeef3ac97b0bea6f76d5d/aiohttp-3.13.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:0e5d701c0aad02a7dce72eef6b93226cf3734330f1a31d69ebbf69f33b86666e", size = 1876093, upload-time = "2026-03-28T17:18:18.703Z" },
{ url = "https://files.pythonhosted.org/packages/ad/52/9204bb59c014869b71971addad6778f005daa72a96eed652c496789d7468/aiohttp-3.13.4-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8ac32a189081ae0a10ba18993f10f338ec94341f0d5df8fff348043962f3c6f8", size = 1970815, upload-time = "2026-03-28T17:18:21.858Z" },
{ url = "https://files.pythonhosted.org/packages/d6/b5/e4eb20275a866dde0f570f411b36c6b48f7b53edfe4f4071aa1b0728098a/aiohttp-3.13.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98e968cdaba43e45c73c3f306fca418c8009a957733bac85937c9f9cf3f4de27", size = 1816223, upload-time = "2026-03-28T17:18:24.729Z" },
{ url = "https://files.pythonhosted.org/packages/d8/23/e98075c5bb146aa61a1239ee1ac7714c85e814838d6cebbe37d3fe19214a/aiohttp-3.13.4-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ca114790c9144c335d538852612d3e43ea0f075288f4849cf4b05d6cd2238ce7", size = 1649145, upload-time = "2026-03-28T17:18:27.269Z" },
{ url = "https://files.pythonhosted.org/packages/d6/c1/7bad8be33bb06c2bb224b6468874346026092762cbec388c3bdb65a368ee/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:ea2e071661ba9cfe11eabbc81ac5376eaeb3061f6e72ec4cc86d7cdd1ffbdbbb", size = 1816562, upload-time = "2026-03-28T17:18:29.847Z" },
{ url = "https://files.pythonhosted.org/packages/5c/10/c00323348695e9a5e316825969c88463dcc24c7e9d443244b8a2c9cf2eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:34e89912b6c20e0fd80e07fa401fd218a410aa1ce9f1c2f1dad6db1bd0ce0927", size = 1800333, upload-time = "2026-03-28T17:18:32.269Z" },
{ url = "https://files.pythonhosted.org/packages/84/43/9b2147a1df3559f49bd723e22905b46a46c068a53adb54abdca32c4de180/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0e217cf9f6a42908c52b46e42c568bd57adc39c9286ced31aaace614b6087965", size = 1820617, upload-time = "2026-03-28T17:18:35.238Z" },
{ url = "https://files.pythonhosted.org/packages/a9/7f/b3481a81e7a586d02e99387b18c6dafff41285f6efd3daa2124c01f87eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:0c296f1221e21ba979f5ac1964c3b78cfde15c5c5f855ffd2caab337e9cd9182", size = 1643417, upload-time = "2026-03-28T17:18:37.949Z" },
{ url = "https://files.pythonhosted.org/packages/8f/72/07181226bc99ce1124e0f89280f5221a82d3ae6a6d9d1973ce429d48e52b/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:d99a9d168ebaffb74f36d011750e490085ac418f4db926cce3989c8fe6cb6b1b", size = 1849286, upload-time = "2026-03-28T17:18:40.534Z" },
{ url = "https://files.pythonhosted.org/packages/1a/e6/1b3566e103eca6da5be4ae6713e112a053725c584e96574caf117568ffef/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:cb19177205d93b881f3f89e6081593676043a6828f59c78c17a0fd6c1fbed2ba", size = 1782635, upload-time = "2026-03-28T17:18:43.073Z" },
{ url = "https://files.pythonhosted.org/packages/37/58/1b11c71904b8d079eb0c39fe664180dd1e14bebe5608e235d8bfbadc8929/aiohttp-3.13.4-cp314-cp314t-win32.whl", hash = "sha256:c606aa5656dab6552e52ca368e43869c916338346bfaf6304e15c58fb113ea30", size = 472537, upload-time = "2026-03-28T17:18:46.286Z" },
{ url = "https://files.pythonhosted.org/packages/bc/8f/87c56a1a1977d7dddea5b31e12189665a140fdb48a71e9038ff90bb564ec/aiohttp-3.13.4-cp314-cp314t-win_amd64.whl", hash = "sha256:014dcc10ec8ab8db681f0d68e939d1e9286a5aa2b993cbbdb0db130853e02144", size = 506381, upload-time = "2026-03-28T17:18:48.74Z" },
]
[[package]]
@ -370,6 +370,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/83/7b/5652771e24fff12da9dde4c20ecf4682e606b104f26419d139758cc935a6/azure_identity-1.25.1-py3-none-any.whl", hash = "sha256:e9edd720af03dff020223cd269fa3a61e8f345ea75443858273bcb44844ab651", size = 191317, upload-time = "2025-10-06T20:30:04.251Z" },
]
[[package]]
name = "backoff"
version = "2.2.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/47/d7/5bbeb12c44d7c4f2fb5b56abce497eb5ed9f34d85701de869acedd602619/backoff-2.2.1.tar.gz", hash = "sha256:03f829f5bb1923180821643f8753b0502c3b682293992485b0eef2807afa5cba", size = 17001, upload-time = "2022-10-05T19:19:32.061Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148, upload-time = "2022-10-05T19:19:30.546Z" },
]
[[package]]
name = "beautifulsoup4"
version = "4.14.3"
@ -726,6 +735,7 @@ dependencies = [
{ name = "slack-sdk" },
{ name = "sse-starlette" },
{ name = "uvicorn", extra = ["standard"] },
{ name = "wecom-aibot-python-sdk" },
]
[package.optional-dependencies]
@ -753,6 +763,7 @@ requires-dist = [
{ name = "slack-sdk", specifier = ">=3.33.0" },
{ name = "sse-starlette", specifier = ">=2.1.0" },
{ name = "uvicorn", extras = ["standard"], specifier = ">=0.34.0" },
{ name = "wecom-aibot-python-sdk", specifier = ">=0.1.6" },
]
provides-extras = ["postgres"]
@ -783,6 +794,7 @@ dependencies = [
{ name = "langchain-google-genai" },
{ name = "langchain-mcp-adapters" },
{ name = "langchain-openai" },
{ name = "langfuse" },
{ name = "langgraph" },
{ name = "langgraph-api" },
{ name = "langgraph-checkpoint-sqlite" },
@ -826,6 +838,7 @@ requires-dist = [
{ name = "langchain-google-genai", specifier = ">=4.2.1" },
{ name = "langchain-mcp-adapters", specifier = ">=0.1.0" },
{ name = "langchain-openai", specifier = ">=1.1.7" },
{ name = "langfuse", specifier = ">=3.4.1" },
{ name = "langgraph", specifier = ">=1.0.6,<1.0.10" },
{ name = "langgraph-api", specifier = ">=0.7.0,<0.8.0" },
{ name = "langgraph-checkpoint-postgres", marker = "extra == 'postgres'", specifier = ">=3.0.5" },
@ -1720,6 +1733,25 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/64/a1/50e7596aca775d8c3883eceeaf47489fac26c57c1abe243c00174f715a8a/langchain_openai-1.1.7-py3-none-any.whl", hash = "sha256:34e9cd686aac1a120d6472804422792bf8080a2103b5d21ee450c9e42d053815", size = 84753, upload-time = "2026-01-07T19:44:58.629Z" },
]
[[package]]
name = "langfuse"
version = "4.0.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "backoff" },
{ name = "httpx" },
{ name = "opentelemetry-api" },
{ name = "opentelemetry-exporter-otlp-proto-http" },
{ name = "opentelemetry-sdk" },
{ name = "packaging" },
{ name = "pydantic" },
{ name = "wrapt" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f9/de/b319a127e231e6ac10fad7a75e040b0c961669d9aa1f372f131d48ee4835/langfuse-4.0.5.tar.gz", hash = "sha256:f07fc88526d0699b3696df6ff606bc3c509c86419b5f551dea3d95ed31b4b7f8", size = 273892, upload-time = "2026-04-01T11:05:48.135Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e3/92/b4699c9ce5f2e1ab04e7fc1c656cc14a522f10f2c7170d6e427013ce0d37/langfuse-4.0.5-py3-none-any.whl", hash = "sha256:48ef89fec839b40f0f0e68b26c160e7bc0178cf10c8e53932895f4aed428b4df", size = 472730, upload-time = "2026-04-01T11:05:46.948Z" },
]
[[package]]
name = "langgraph"
version = "1.0.9"
@ -3159,6 +3191,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a6/53/d78dc063216e62fc55f6b2eebb447f6a4b0a59f55c8406376f76bf959b08/pydub-0.25.1-py2.py3-none-any.whl", hash = "sha256:65617e33033874b59d87db603aa1ed450633288aefead953b30bded59cb599a6", size = 32327, upload-time = "2021-03-10T02:09:53.503Z" },
]
[[package]]
name = "pyee"
version = "13.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/8b/04/e7c1fe4dc78a6fdbfd6c337b1c3732ff543b8a397683ab38378447baa331/pyee-13.0.1.tar.gz", hash = "sha256:0b931f7c14535667ed4c7e0d531716368715e860b988770fc7eb8578d1f67fc8", size = 31655, upload-time = "2026-02-14T21:12:28.044Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a0/c4/b4d4827c93ef43c01f599ef31453ccc1c132b353284fc6c87d535c233129/pyee-13.0.1-py3-none-any.whl", hash = "sha256:af2f8fede4171ef667dfded53f96e2ed0d6e6bd7ee3bb46437f77e3b57689228", size = 15659, upload-time = "2026-02-14T21:12:26.263Z" },
]
[[package]]
name = "pygments"
version = "2.19.2"
@ -4188,6 +4232,71 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
]
[[package]]
name = "wecom-aibot-python-sdk"
version = "1.0.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohttp" },
{ name = "certifi" },
{ name = "cryptography" },
{ name = "pyee" },
{ name = "websockets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/23/b4/df93b46006e5c1900703aefa59004e6d524a4e73ba56ae73bcce24ff4184/wecom_aibot_python_sdk-1.0.2.tar.gz", hash = "sha256:f8cd9920c0b6cb88bf8a50742fca1e834e5c49e06c3ae861d0f128672c17697b", size = 31706, upload-time = "2026-03-23T07:44:53.949Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ee/39/f2fab475f15d5bf596c4fa998ddd321b1400bcc6ae2e73d3e935db939379/wecom_aibot_python_sdk-1.0.2-py3-none-any.whl", hash = "sha256:03df207c72021157506647cd9f4ee51b865a7f37d3b5df7f7af1b1c7e677db84", size = 23228, upload-time = "2026-03-23T07:44:52.555Z" },
]
[[package]]
name = "wrapt"
version = "1.17.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/95/8f/aeb76c5b46e273670962298c23e7ddde79916cb74db802131d49a85e4b7d/wrapt-1.17.3.tar.gz", hash = "sha256:f66eb08feaa410fe4eebd17f2a2c8e2e46d3476e9f8c783daa8e09e0faa666d0", size = 55547, upload-time = "2025-08-12T05:53:21.714Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9f/41/cad1aba93e752f1f9268c77270da3c469883d56e2798e7df6240dcb2287b/wrapt-1.17.3-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ab232e7fdb44cdfbf55fc3afa31bcdb0d8980b9b95c38b6405df2acb672af0e0", size = 53998, upload-time = "2025-08-12T05:51:47.138Z" },
{ url = "https://files.pythonhosted.org/packages/60/f8/096a7cc13097a1869fe44efe68dace40d2a16ecb853141394047f0780b96/wrapt-1.17.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:9baa544e6acc91130e926e8c802a17f3b16fbea0fd441b5a60f5cf2cc5c3deba", size = 39020, upload-time = "2025-08-12T05:51:35.906Z" },
{ url = "https://files.pythonhosted.org/packages/33/df/bdf864b8997aab4febb96a9ae5c124f700a5abd9b5e13d2a3214ec4be705/wrapt-1.17.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6b538e31eca1a7ea4605e44f81a48aa24c4632a277431a6ed3f328835901f4fd", size = 39098, upload-time = "2025-08-12T05:51:57.474Z" },
{ url = "https://files.pythonhosted.org/packages/9f/81/5d931d78d0eb732b95dc3ddaeeb71c8bb572fb01356e9133916cd729ecdd/wrapt-1.17.3-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:042ec3bb8f319c147b1301f2393bc19dba6e176b7da446853406d041c36c7828", size = 88036, upload-time = "2025-08-12T05:52:34.784Z" },
{ url = "https://files.pythonhosted.org/packages/ca/38/2e1785df03b3d72d34fc6252d91d9d12dc27a5c89caef3335a1bbb8908ca/wrapt-1.17.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3af60380ba0b7b5aeb329bc4e402acd25bd877e98b3727b0135cb5c2efdaefe9", size = 88156, upload-time = "2025-08-12T05:52:13.599Z" },
{ url = "https://files.pythonhosted.org/packages/b3/8b/48cdb60fe0603e34e05cffda0b2a4adab81fd43718e11111a4b0100fd7c1/wrapt-1.17.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:0b02e424deef65c9f7326d8c19220a2c9040c51dc165cddb732f16198c168396", size = 87102, upload-time = "2025-08-12T05:52:14.56Z" },
{ url = "https://files.pythonhosted.org/packages/3c/51/d81abca783b58f40a154f1b2c56db1d2d9e0d04fa2d4224e357529f57a57/wrapt-1.17.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:74afa28374a3c3a11b3b5e5fca0ae03bef8450d6aa3ab3a1e2c30e3a75d023dc", size = 87732, upload-time = "2025-08-12T05:52:36.165Z" },
{ url = "https://files.pythonhosted.org/packages/9e/b1/43b286ca1392a006d5336412d41663eeef1ad57485f3e52c767376ba7e5a/wrapt-1.17.3-cp312-cp312-win32.whl", hash = "sha256:4da9f45279fff3543c371d5ababc57a0384f70be244de7759c85a7f989cb4ebe", size = 36705, upload-time = "2025-08-12T05:53:07.123Z" },
{ url = "https://files.pythonhosted.org/packages/28/de/49493f962bd3c586ab4b88066e967aa2e0703d6ef2c43aa28cb83bf7b507/wrapt-1.17.3-cp312-cp312-win_amd64.whl", hash = "sha256:e71d5c6ebac14875668a1e90baf2ea0ef5b7ac7918355850c0908ae82bcb297c", size = 38877, upload-time = "2025-08-12T05:53:05.436Z" },
{ url = "https://files.pythonhosted.org/packages/f1/48/0f7102fe9cb1e8a5a77f80d4f0956d62d97034bbe88d33e94699f99d181d/wrapt-1.17.3-cp312-cp312-win_arm64.whl", hash = "sha256:604d076c55e2fdd4c1c03d06dc1a31b95130010517b5019db15365ec4a405fc6", size = 36885, upload-time = "2025-08-12T05:52:54.367Z" },
{ url = "https://files.pythonhosted.org/packages/fc/f6/759ece88472157acb55fc195e5b116e06730f1b651b5b314c66291729193/wrapt-1.17.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:a47681378a0439215912ef542c45a783484d4dd82bac412b71e59cf9c0e1cea0", size = 54003, upload-time = "2025-08-12T05:51:48.627Z" },
{ url = "https://files.pythonhosted.org/packages/4f/a9/49940b9dc6d47027dc850c116d79b4155f15c08547d04db0f07121499347/wrapt-1.17.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:54a30837587c6ee3cd1a4d1c2ec5d24e77984d44e2f34547e2323ddb4e22eb77", size = 39025, upload-time = "2025-08-12T05:51:37.156Z" },
{ url = "https://files.pythonhosted.org/packages/45/35/6a08de0f2c96dcdd7fe464d7420ddb9a7655a6561150e5fc4da9356aeaab/wrapt-1.17.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:16ecf15d6af39246fe33e507105d67e4b81d8f8d2c6598ff7e3ca1b8a37213f7", size = 39108, upload-time = "2025-08-12T05:51:58.425Z" },
{ url = "https://files.pythonhosted.org/packages/0c/37/6faf15cfa41bf1f3dba80cd3f5ccc6622dfccb660ab26ed79f0178c7497f/wrapt-1.17.3-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:6fd1ad24dc235e4ab88cda009e19bf347aabb975e44fd5c2fb22a3f6e4141277", size = 88072, upload-time = "2025-08-12T05:52:37.53Z" },
{ url = "https://files.pythonhosted.org/packages/78/f2/efe19ada4a38e4e15b6dff39c3e3f3f73f5decf901f66e6f72fe79623a06/wrapt-1.17.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0ed61b7c2d49cee3c027372df5809a59d60cf1b6c2f81ee980a091f3afed6a2d", size = 88214, upload-time = "2025-08-12T05:52:15.886Z" },
{ url = "https://files.pythonhosted.org/packages/40/90/ca86701e9de1622b16e09689fc24b76f69b06bb0150990f6f4e8b0eeb576/wrapt-1.17.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:423ed5420ad5f5529db9ce89eac09c8a2f97da18eb1c870237e84c5a5c2d60aa", size = 87105, upload-time = "2025-08-12T05:52:17.914Z" },
{ url = "https://files.pythonhosted.org/packages/fd/e0/d10bd257c9a3e15cbf5523025252cc14d77468e8ed644aafb2d6f54cb95d/wrapt-1.17.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e01375f275f010fcbf7f643b4279896d04e571889b8a5b3f848423d91bf07050", size = 87766, upload-time = "2025-08-12T05:52:39.243Z" },
{ url = "https://files.pythonhosted.org/packages/e8/cf/7d848740203c7b4b27eb55dbfede11aca974a51c3d894f6cc4b865f42f58/wrapt-1.17.3-cp313-cp313-win32.whl", hash = "sha256:53e5e39ff71b3fc484df8a522c933ea2b7cdd0d5d15ae82e5b23fde87d44cbd8", size = 36711, upload-time = "2025-08-12T05:53:10.074Z" },
{ url = "https://files.pythonhosted.org/packages/57/54/35a84d0a4d23ea675994104e667ceff49227ce473ba6a59ba2c84f250b74/wrapt-1.17.3-cp313-cp313-win_amd64.whl", hash = "sha256:1f0b2f40cf341ee8cc1a97d51ff50dddb9fcc73241b9143ec74b30fc4f44f6cb", size = 38885, upload-time = "2025-08-12T05:53:08.695Z" },
{ url = "https://files.pythonhosted.org/packages/01/77/66e54407c59d7b02a3c4e0af3783168fff8e5d61def52cda8728439d86bc/wrapt-1.17.3-cp313-cp313-win_arm64.whl", hash = "sha256:7425ac3c54430f5fc5e7b6f41d41e704db073309acfc09305816bc6a0b26bb16", size = 36896, upload-time = "2025-08-12T05:52:55.34Z" },
{ url = "https://files.pythonhosted.org/packages/02/a2/cd864b2a14f20d14f4c496fab97802001560f9f41554eef6df201cd7f76c/wrapt-1.17.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:cf30f6e3c077c8e6a9a7809c94551203c8843e74ba0c960f4a98cd80d4665d39", size = 54132, upload-time = "2025-08-12T05:51:49.864Z" },
{ url = "https://files.pythonhosted.org/packages/d5/46/d011725b0c89e853dc44cceb738a307cde5d240d023d6d40a82d1b4e1182/wrapt-1.17.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:e228514a06843cae89621384cfe3a80418f3c04aadf8a3b14e46a7be704e4235", size = 39091, upload-time = "2025-08-12T05:51:38.935Z" },
{ url = "https://files.pythonhosted.org/packages/2e/9e/3ad852d77c35aae7ddebdbc3b6d35ec8013af7d7dddad0ad911f3d891dae/wrapt-1.17.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:5ea5eb3c0c071862997d6f3e02af1d055f381b1d25b286b9d6644b79db77657c", size = 39172, upload-time = "2025-08-12T05:51:59.365Z" },
{ url = "https://files.pythonhosted.org/packages/c3/f7/c983d2762bcce2326c317c26a6a1e7016f7eb039c27cdf5c4e30f4160f31/wrapt-1.17.3-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:281262213373b6d5e4bb4353bc36d1ba4084e6d6b5d242863721ef2bf2c2930b", size = 87163, upload-time = "2025-08-12T05:52:40.965Z" },
{ url = "https://files.pythonhosted.org/packages/e4/0f/f673f75d489c7f22d17fe0193e84b41540d962f75fce579cf6873167c29b/wrapt-1.17.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dc4a8d2b25efb6681ecacad42fca8859f88092d8732b170de6a5dddd80a1c8fa", size = 87963, upload-time = "2025-08-12T05:52:20.326Z" },
{ url = "https://files.pythonhosted.org/packages/df/61/515ad6caca68995da2fac7a6af97faab8f78ebe3bf4f761e1b77efbc47b5/wrapt-1.17.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:373342dd05b1d07d752cecbec0c41817231f29f3a89aa8b8843f7b95992ed0c7", size = 86945, upload-time = "2025-08-12T05:52:21.581Z" },
{ url = "https://files.pythonhosted.org/packages/d3/bd/4e70162ce398462a467bc09e768bee112f1412e563620adc353de9055d33/wrapt-1.17.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d40770d7c0fd5cbed9d84b2c3f2e156431a12c9a37dc6284060fb4bec0b7ffd4", size = 86857, upload-time = "2025-08-12T05:52:43.043Z" },
{ url = "https://files.pythonhosted.org/packages/2b/b8/da8560695e9284810b8d3df8a19396a6e40e7518059584a1a394a2b35e0a/wrapt-1.17.3-cp314-cp314-win32.whl", hash = "sha256:fbd3c8319de8e1dc79d346929cd71d523622da527cca14e0c1d257e31c2b8b10", size = 37178, upload-time = "2025-08-12T05:53:12.605Z" },
{ url = "https://files.pythonhosted.org/packages/db/c8/b71eeb192c440d67a5a0449aaee2310a1a1e8eca41676046f99ed2487e9f/wrapt-1.17.3-cp314-cp314-win_amd64.whl", hash = "sha256:e1a4120ae5705f673727d3253de3ed0e016f7cd78dc463db1b31e2463e1f3cf6", size = 39310, upload-time = "2025-08-12T05:53:11.106Z" },
{ url = "https://files.pythonhosted.org/packages/45/20/2cda20fd4865fa40f86f6c46ed37a2a8356a7a2fde0773269311f2af56c7/wrapt-1.17.3-cp314-cp314-win_arm64.whl", hash = "sha256:507553480670cab08a800b9463bdb881b2edeed77dc677b0a5915e6106e91a58", size = 37266, upload-time = "2025-08-12T05:52:56.531Z" },
{ url = "https://files.pythonhosted.org/packages/77/ed/dd5cf21aec36c80443c6f900449260b80e2a65cf963668eaef3b9accce36/wrapt-1.17.3-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:ed7c635ae45cfbc1a7371f708727bf74690daedc49b4dba310590ca0bd28aa8a", size = 56544, upload-time = "2025-08-12T05:51:51.109Z" },
{ url = "https://files.pythonhosted.org/packages/8d/96/450c651cc753877ad100c7949ab4d2e2ecc4d97157e00fa8f45df682456a/wrapt-1.17.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:249f88ed15503f6492a71f01442abddd73856a0032ae860de6d75ca62eed8067", size = 40283, upload-time = "2025-08-12T05:51:39.912Z" },
{ url = "https://files.pythonhosted.org/packages/d1/86/2fcad95994d9b572db57632acb6f900695a648c3e063f2cd344b3f5c5a37/wrapt-1.17.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:5a03a38adec8066d5a37bea22f2ba6bbf39fcdefbe2d91419ab864c3fb515454", size = 40366, upload-time = "2025-08-12T05:52:00.693Z" },
{ url = "https://files.pythonhosted.org/packages/64/0e/f4472f2fdde2d4617975144311f8800ef73677a159be7fe61fa50997d6c0/wrapt-1.17.3-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:5d4478d72eb61c36e5b446e375bbc49ed002430d17cdec3cecb36993398e1a9e", size = 108571, upload-time = "2025-08-12T05:52:44.521Z" },
{ url = "https://files.pythonhosted.org/packages/cc/01/9b85a99996b0a97c8a17484684f206cbb6ba73c1ce6890ac668bcf3838fb/wrapt-1.17.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:223db574bb38637e8230eb14b185565023ab624474df94d2af18f1cdb625216f", size = 113094, upload-time = "2025-08-12T05:52:22.618Z" },
{ url = "https://files.pythonhosted.org/packages/25/02/78926c1efddcc7b3aa0bc3d6b33a822f7d898059f7cd9ace8c8318e559ef/wrapt-1.17.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e405adefb53a435f01efa7ccdec012c016b5a1d3f35459990afc39b6be4d5056", size = 110659, upload-time = "2025-08-12T05:52:24.057Z" },
{ url = "https://files.pythonhosted.org/packages/dc/ee/c414501ad518ac3e6fe184753632fe5e5ecacdcf0effc23f31c1e4f7bfcf/wrapt-1.17.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:88547535b787a6c9ce4086917b6e1d291aa8ed914fdd3a838b3539dc95c12804", size = 106946, upload-time = "2025-08-12T05:52:45.976Z" },
{ url = "https://files.pythonhosted.org/packages/be/44/a1bd64b723d13bb151d6cc91b986146a1952385e0392a78567e12149c7b4/wrapt-1.17.3-cp314-cp314t-win32.whl", hash = "sha256:41b1d2bc74c2cac6f9074df52b2efbef2b30bdfe5f40cb78f8ca22963bc62977", size = 38717, upload-time = "2025-08-12T05:53:15.214Z" },
{ url = "https://files.pythonhosted.org/packages/79/d9/7cfd5a312760ac4dd8bf0184a6ee9e43c33e47f3dadc303032ce012b8fa3/wrapt-1.17.3-cp314-cp314t-win_amd64.whl", hash = "sha256:73d496de46cd2cdbdbcce4ae4bcdb4afb6a11234a1df9c085249d55166b95116", size = 41334, upload-time = "2025-08-12T05:53:14.178Z" },
{ url = "https://files.pythonhosted.org/packages/46/78/10ad9781128ed2f99dbc474f43283b13fea8ba58723e98844367531c18e9/wrapt-1.17.3-cp314-cp314t-win_arm64.whl", hash = "sha256:f38e60678850c42461d4202739f9bf1e3a737c7ad283638251e79cc49effb6b6", size = 38471, upload-time = "2025-08-12T05:52:57.784Z" },
{ url = "https://files.pythonhosted.org/packages/1f/f6/a933bd70f98e9cf3e08167fc5cd7aaaca49147e48411c0bd5ae701bb2194/wrapt-1.17.3-py3-none-any.whl", hash = "sha256:7171ae35d2c33d326ac19dd8facb1e82e5fd04ef8c6c0e394d7af55a55051c22", size = 23591, upload-time = "2025-08-12T05:53:20.674Z" },
]
[[package]]
name = "xlrd"
version = "2.0.2"


@ -12,7 +12,7 @@
# ============================================================================
# Bump this number when the config schema changes.
# Run `make config-upgrade` to merge new fields into your local config.yaml.
config_version: 4
config_version: 5
# ============================================================================
# Logging
@ -324,6 +324,16 @@ tools:
group: file:read
use: deerflow.sandbox.tools:read_file_tool
- name: glob
group: file:read
use: deerflow.sandbox.tools:glob_tool
max_results: 200
- name: grep
group: file:read
use: deerflow.sandbox.tools:grep_tool
max_results: 100
- name: write_file
group: file:write
use: deerflow.sandbox.tools:write_file_tool
@ -358,12 +368,34 @@ tool_search:
# Option 1: Local Sandbox (Default)
# Executes commands directly on the host machine
uploads:
# PDF-to-Markdown converter used when a PDF is uploaded.
# auto — prefer pymupdf4llm when installed; fall back to MarkItDown for
# image-based or encrypted PDFs (recommended default).
# pymupdf4llm — always use pymupdf4llm (must be installed: uv add pymupdf4llm).
# Better heading/table extraction; faster on most files.
# markitdown — always use MarkItDown (original behaviour, no extra dependency).
pdf_converter: auto
sandbox:
use: deerflow.sandbox.local:LocalSandboxProvider
# Host bash execution is disabled by default because LocalSandboxProvider is
# not a secure isolation boundary for shell access. Enable only for fully
# trusted, single-user local workflows.
allow_host_bash: false
# Optional: Mount additional host directories into the sandbox.
# Each mount maps a host path to a virtual container path accessible by the agent.
# mounts:
# - host_path: /home/user/my-project # Absolute path on the host machine
# container_path: /mnt/my-project # Virtual path inside the sandbox
# read_only: true # Whether the mount is read-only (default: false)
# Tool output truncation limits (characters).
# bash uses middle-truncation (head + tail) since errors can appear anywhere in the output.
# read_file uses head-truncation since source code context is front-loaded.
# Set to 0 to disable truncation.
bash_output_max_chars: 20000
read_file_output_max_chars: 50000
# Option 2: Container-based AIO Sandbox
# Executes commands in isolated containers (Docker or Apple Container)
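Note on the truncation limits added in this hunk: the comments describe two strategies, middle-truncation for bash output and head-truncation for read_file output, with 0 disabling truncation. A minimal sketch of the difference follows. The function names and the marker string are hypothetical and are not taken from the DeerFlow code; the real implementation may split or label the output differently.

```python
def truncate_head(text: str, max_chars: int) -> str:
    """Keep only the first max_chars characters. 0 disables truncation."""
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    return text[:max_chars]


def truncate_middle(text: str, max_chars: int, marker: str = "\n...[truncated]...\n") -> str:
    """Keep the head and the tail and drop the middle. 0 disables truncation.

    The marker is added on top of max_chars, so the result can be slightly
    longer than the limit (an assumption made for this sketch).
    """
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    head_len = max_chars // 2          # first half of the budget
    tail_len = max_chars - head_len    # remainder goes to the tail
    return text[:head_len] + marker + text[len(text) - tail_len:]
```

Keeping both ends matches the stated rationale: an error in shell output can appear at the start or at the end, while a source file usually carries its most relevant context at the top.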
@ -476,6 +508,12 @@ skills:
# Default: /mnt/skills
container_path: /mnt/skills
# Note: To restrict which skills are loaded for a specific custom agent,
# define a `skills` list in that agent's `config.yaml` (e.g. `agents/my-agent/config.yaml`):
# - Omitted or null: load all globally enabled skills (default)
# - []: disable all skills for this agent
# - ["skill-name"]: load only specific skills
# ============================================================================
# Title Generation Configuration
# ============================================================================
@ -685,6 +723,10 @@ run_events:
# context:
# thinking_enabled: true
# subagent_enabled: true
# wecom:
# enabled: false
# bot_id: $WECOM_BOT_ID
# bot_secret: $WECOM_BOT_SECRET
# ============================================================================
# Guardrails Configuration


@ -149,6 +149,7 @@ services:
working_dir: /app
environment:
- CI=true
- DEER_FLOW_HOME=/app/backend/.deer-flow
- DEER_FLOW_CHANNELS_LANGGRAPH_URL=${DEER_FLOW_CHANNELS_LANGGRAPH_URL:-http://langgraph:2024}
- DEER_FLOW_CHANNELS_GATEWAY_URL=${DEER_FLOW_CHANNELS_GATEWAY_URL:-http://gateway:8001}
- DEER_FLOW_HOST_BASE_DIR=${DEER_FLOW_ROOT}/backend/.deer-flow
@ -174,7 +175,7 @@ services:
UV_IMAGE: ${UV_IMAGE:-ghcr.io/astral-sh/uv:0.7.20}
UV_INDEX_URL: ${UV_INDEX_URL:-https://pypi.org/simple}
container_name: deer-flow-langgraph
command: sh -c "cd backend && uv sync && uv run langgraph dev --no-browser --allow-blocking --host 0.0.0.0 --port 2024 --n-jobs-per-worker 10 > /app/logs/langgraph.log 2>&1"
command: sh -c "cd backend && uv sync && allow_blocking='' && if [ \"\${LANGGRAPH_ALLOW_BLOCKING:-0}\" = '1' ]; then allow_blocking='--allow-blocking'; fi && uv run langgraph dev --no-browser \${allow_blocking} --host 0.0.0.0 --port 2024 --n-jobs-per-worker \${LANGGRAPH_JOBS_PER_WORKER:-10} > /app/logs/langgraph.log 2>&1"
volumes:
- ../backend/:/app/backend/
# Preserve the .venv built during Docker image build — mounting the full backend/
@ -204,6 +205,7 @@ services:
working_dir: /app
environment:
- CI=true
- DEER_FLOW_HOME=/app/backend/.deer-flow
- DEER_FLOW_HOST_BASE_DIR=${DEER_FLOW_ROOT}/backend/.deer-flow
- DEER_FLOW_HOST_SKILLS_PATH=${DEER_FLOW_ROOT}/skills
- DEER_FLOW_SANDBOX_HOST=host.docker.internal


@ -121,7 +121,7 @@ services:
UV_INDEX_URL: ${UV_INDEX_URL:-https://pypi.org/simple}
UV_EXTRAS: ${UV_EXTRAS:-}
container_name: deer-flow-langgraph
command: sh -c "cd /app/backend && uv run langgraph dev --no-browser --allow-blocking --no-reload --host 0.0.0.0 --port 2024 --n-jobs-per-worker 10"
command: sh -c 'cd /app/backend && allow_blocking_flag="" && if [ "${LANGGRAPH_ALLOW_BLOCKING:-0}" = "1" ]; then allow_blocking_flag="--allow-blocking"; fi && uv run langgraph dev --no-browser ${allow_blocking_flag} --no-reload --host 0.0.0.0 --port 2024 --n-jobs-per-worker ${LANGGRAPH_JOBS_PER_WORKER:-10}'
volumes:
- ${DEER_FLOW_CONFIG_PATH}:/app/backend/config.yaml:ro
- ${DEER_FLOW_EXTENSIONS_CONFIG_PATH}:/app/backend/extensions_config.json:ro


@ -18,21 +18,20 @@ http {
resolver 127.0.0.11 valid=10s ipv6=off;
# Upstream servers (using Docker service names)
# NOTE: add `resolve` so nginx re-resolves container IPs after restarts.
# Otherwise nginx may keep stale DNS results and proxy to the wrong container.
# NOTE: `zone` and `resolve` are nginx Plus-only features and are not
# available in the standard nginx:alpine image. Docker's internal DNS
# (127.0.0.11) handles service discovery; upstreams are resolved at
# nginx startup and remain valid for the lifetime of the deployment.
upstream gateway {
zone gateway 64k;
server gateway:8001 resolve;
server gateway:8001;
}
upstream langgraph {
zone langgraph 64k;
server langgraph:2024 resolve;
server langgraph:2024;
}
upstream frontend {
zone frontend 64k;
server frontend:3000 resolve;
server frontend:3000;
}
# ── Main server (path-based routing) ─────────────────────────────────


@ -179,8 +179,8 @@ http {
}
# API Documentation: Swagger UI
location /docs {
proxy_pass http://gateway;
location /api/docs {
proxy_pass http://gateway/docs;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@ -189,8 +189,8 @@ http {
}
# API Documentation: ReDoc
location /redoc {
proxy_pass http://gateway;
location /api/redoc {
proxy_pass http://gateway/redoc;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;


@ -0,0 +1,105 @@
# Langfuse Tracing Implementation Plan
**Goal:** Add optional Langfuse observability support to DeerFlow while preserving existing LangSmith tracing and allowing both providers to be enabled at the same time.
**Architecture:** Extend tracing configuration from a single LangSmith-only shape to a multi-provider config, add a tracing callback factory that builds zero, one, or two callbacks based on environment variables, and update model creation to attach those callbacks. If a provider is explicitly enabled but misconfigured or fails to initialize, tracing initialization during model creation should fail with a clear error naming that provider.
**Tech Stack:** Python 3.12, Pydantic, LangChain callbacks, LangSmith, Langfuse, pytest
---
### Task 1: Add failing tracing config tests
**Files:**
- Modify: `backend/tests/test_tracing_config.py`
**Step 1: Write the failing tests**
Add tests covering:
- Langfuse-only config parsing
- dual-provider parsing
- explicit enable with missing required Langfuse fields
- provider enable detection without relying on LangSmith-only helpers
**Step 2: Run tests to verify they fail**
Run: `cd backend && uv run pytest tests/test_tracing_config.py -q`
Expected: FAIL because tracing config only supports LangSmith today.
**Step 3: Write minimal implementation**
Update tracing config code to represent multiple providers and expose helper functions needed by the tests.
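A minimal sketch of what a multi-provider config shape could look like. It uses stdlib dataclasses rather than Pydantic so that it stays self-contained, and every name in it (`ProviderConfig`, `TracingConfig`, `from_env`, `enabled_providers`, `validate`, and the `LANGFUSE_TRACING` flag in particular) is an assumption for illustration, not the project's actual API:

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# NOTE: class names, helper names, and the *_TRACING flag names are
# hypothetical; the real tracing config may read different variables.


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Treat 1/true/yes (case-insensitive) as enabled; anything else as off."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    enabled: bool
    # Maps each required env var name to its value (None when unset).
    required: Mapping[str, Optional[str]]

    def missing_fields(self) -> list[str]:
        return [key for key, value in self.required.items() if not value]


@dataclass(frozen=True)
class TracingConfig:
    providers: tuple[ProviderConfig, ...]

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "TracingConfig":
        langsmith = ProviderConfig(
            name="langsmith",
            enabled=_flag(env, "LANGSMITH_TRACING"),
            required={"LANGSMITH_API_KEY": env.get("LANGSMITH_API_KEY")},
        )
        langfuse_keys = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
        langfuse = ProviderConfig(
            name="langfuse",
            enabled=_flag(env, "LANGFUSE_TRACING"),
            required={key: env.get(key) for key in langfuse_keys},
        )
        return cls(providers=(langsmith, langfuse))

    def enabled_providers(self) -> list[str]:
        """Provider-agnostic enable detection (no LangSmith-only helper)."""
        return [p.name for p in self.providers if p.enabled]

    def validate(self) -> None:
        """Fail loudly, naming the provider, when an enabled one is misconfigured."""
        for p in self.providers:
            missing = p.missing_fields()
            if p.enabled and missing:
                raise ValueError(
                    f"{p.name} tracing is enabled but missing: {', '.join(missing)}"
                )
```

Under these assumptions, the four test cases listed in Step 1 map onto `from_env` (Langfuse-only and dual-provider parsing), `validate` (explicit enable with missing fields), and `enabled_providers` (enable detection).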
**Step 4: Run tests to verify they pass**
Run: `cd backend && uv run pytest tests/test_tracing_config.py -q`
Expected: PASS
### Task 2: Add failing callback factory and model attachment tests
**Files:**
- Modify: `backend/tests/test_model_factory.py`
- Create: `backend/tests/test_tracing_factory.py`
**Step 1: Write the failing tests**
Add tests covering:
- LangSmith callback creation
- Langfuse callback creation
- dual callback creation
- startup failure when an explicitly enabled provider cannot initialize
- model factory appends all tracing callbacks to model callbacks
**Step 2: Run tests to verify they fail**
Run: `cd backend && uv run pytest tests/test_model_factory.py tests/test_tracing_factory.py -q`
Expected: FAIL because there is no provider factory and model creation only attaches LangSmith.
**Step 3: Write minimal implementation**
Create tracing callback factory module and update model factory to use it.
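One possible shape for the factory and the model-side attachment, sketched with injected builder functions so the example runs without LangSmith or Langfuse installed. `TracingInitError`, `build_tracing_callbacks`, and `attach_callbacks` are hypothetical names; in a real module each builder would presumably construct that provider's LangChain callback handler:

```python
from typing import Any, Callable, Iterable, Mapping


class TracingInitError(RuntimeError):
    """Raised when an explicitly enabled tracing provider cannot initialize."""


def build_tracing_callbacks(
    enabled: Iterable[str],
    builders: Mapping[str, Callable[[], Any]],
) -> list[Any]:
    """Return zero, one, or two callbacks, one per enabled provider.

    Any builder failure is re-raised with the provider's name in the message
    instead of being swallowed.
    """
    callbacks: list[Any] = []
    for name in enabled:
        try:
            callbacks.append(builders[name]())
        except Exception as exc:
            raise TracingInitError(
                f"Tracing provider '{name}' failed to initialize: {exc}"
            ) from exc
    return callbacks


def attach_callbacks(
    model_kwargs: Mapping[str, Any],
    tracing_callbacks: Iterable[Any],
) -> dict[str, Any]:
    """Append tracing callbacks after any callbacks the caller already set."""
    merged = dict(model_kwargs)
    existing = list(model_kwargs.get("callbacks") or [])
    merged["callbacks"] = existing + list(tracing_callbacks)
    return merged
```

Passing builders in as plain callables keeps the tests in Step 1 simple: a stub that returns a sentinel object stands in for a working provider, and a stub that raises stands in for one that cannot initialize.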
**Step 4: Run tests to verify they pass**
Run: `cd backend && uv run pytest tests/test_model_factory.py tests/test_tracing_factory.py -q`
Expected: PASS
### Task 3: Wire dependency and docs
**Files:**
- Modify: `backend/packages/harness/pyproject.toml`
- Modify: `README.md`
- Modify: `backend/README.md`
**Step 1: Update dependency**
Add `langfuse` to the harness dependencies.
**Step 2: Update docs**
Document:
- Langfuse environment variables
- dual-provider behavior
- failure behavior for explicitly enabled providers
**Step 3: Run targeted verification**
Run: `cd backend && uv run pytest tests/test_tracing_config.py tests/test_model_factory.py tests/test_tracing_factory.py -q`
Expected: PASS
### Task 4: Run broader regression checks
**Files:**
- No code changes required
**Step 1: Run relevant suite**
Run: `cd backend && uv run pytest tests/test_tracing_config.py tests/test_model_factory.py tests/test_tracing_factory.py -q`
**Step 2: Run lint if needed**
Run: `cd backend && uv run ruff check packages/harness/deerflow/config/tracing_config.py packages/harness/deerflow/models/factory.py packages/harness/deerflow/tracing`
**Step 3: Review diff**
Run: `git diff -- backend/packages/harness backend/tests README.md backend/README.md`


@ -10,9 +10,16 @@ function getInternalServiceURL(envKey, fallbackURL) {
? configured.replace(/\/+$/, "")
: fallbackURL;
}
import nextra from "nextra";
const withNextra = nextra({});
/** @type {import("next").NextConfig} */
const config = {
i18n: {
locales: ["en", "zh"],
defaultLocale: "en",
},
devIndicators: false,
async rewrites() {
const rewrites = [];
@ -51,4 +58,4 @@ const config = {
},
};
export default config;
export default withNextra(config);


@ -69,6 +69,8 @@
"nanoid": "^5.1.6",
"next": "^16.1.7",
"next-themes": "^0.4.6",
"nextra": "^4.6.1",
"nextra-theme-docs": "^4.6.1",
"nuxt-og-image": "^5.1.13",
"ogl": "^1.0.11",
"react": "^19.0.0",

Some files were not shown because too many files have changed in this diff.