mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-04-25 11:18:22 +00:00
feat(sandbox): add built-in grep and glob tools (#1784)
* feat(sandbox): add grep and glob tools
* refactor(aio-sandbox): use native file search APIs
* fix(sandbox): address review issues in grep/glob tools
  - aio_sandbox: use should_ignore_path() instead of should_ignore_name() for the include_dirs=True branch to filter nested ignored paths correctly
  - aio_sandbox: add early exit when max_results reached in glob loop
  - aio_sandbox: guard entry.path.startswith(path) before stripping prefix
  - aio_sandbox: validate regex locally before sending to remote API
  - search: skip lines exceeding max_line_chars to prevent ReDoS
  - search: remove resolve() syscall in os.walk loop
  - tools: avoid double get_thread_data() call in glob_tool/grep_tool
  - tests: add 6 new cases covering the above code paths
  - tests: patch get_app_config in truncation test to isolate config
* Fix sandbox grep/glob review feedback
* Remove unrelated Langfuse RFC from PR
This commit is contained in:
parent
9735d73b83
commit
c6cdf200ce
3
.gitignore
vendored
@ -54,4 +54,5 @@ web/
 # Deployment artifacts
 backend/Dockerfile.langgraph
 config.yaml.bak
 .playwright-mcp
+.gstack/
446
backend/docs/rfc-grep-glob-tools.md
Normal file
@ -0,0 +1,446 @@

# [RFC] Add `grep` and `glob` file search tools to DeerFlow

## Summary

I believe this direction is right and worth doing.

If DeerFlow wants to get closer to the actual workflow of coding agents such as Claude Code, `ls` / `read_file` / `write_file` / `str_replace` alone are not enough. Before making changes, the model typically needs two more capabilities:

- `glob`: quickly find files by path pattern
- `grep`: quickly find candidate locations by content pattern

The value of these tools is not that "bash could do this too"; it is that they replace the model's habit of repeatedly shelling out to `bash find` / `bash grep` / `rg` with lower token cost, tighter constraints, and a stable output format.

The prerequisite is getting the implementation right: **they should be read-only, structured, constrained, auditable native tools, not thin wrappers around shell commands.**
## Problem

DeerFlow's file tool layer currently covers:

- `ls`: browse directory structure
- `read_file`: read file contents
- `write_file`: write files
- `str_replace`: make local string replacements
- `bash`: fallback command execution

This set of capabilities gets tasks done, but it is inefficient during the codebase exploration phase.

Typical problems:

1. To find "all `*.tsx` page files", the model can only `ls` its way through directory levels repeatedly, or fall back to `bash find`
2. To find "where a symbol / string / config key appears", the model can only `read_file` one file at a time, or fall back to `bash grep` / `rg`
3. Once it falls back to `bash`, the tool call loses structured output, and results become harder to truncate, paginate, audit, and keep consistent across sandboxes
4. In local mode without host bash enabled, `bash` may not even be available, leaving no sufficiently strong read-only search capability

Conclusion: what DeerFlow is missing is not "one more shell command" but a **filesystem search layer**.
## Goals

- Give the agent stable path search and content search capabilities
- Reduce reliance on `bash`, especially during repository exploration
- Stay consistent with the existing sandbox security model
- Produce structured output that the model can chain into `read_file` / `str_replace`
- Make the local sandbox, container sandbox, and future MCP filesystem tools follow the same semantics

## Non-Goals

- No general-purpose shell compatibility layer
- No exposure of the full grep/find/rg CLI syntax
- No binary search, complex PCRE features, or context-window highlight rendering in the first version
- No "search anywhere on disk"; execution stays within paths DeerFlow has already authorized
## Why This Is Worth Doing

Following the design of agents like Claude Code, the core value of `glob` and `grep` is not new capability per se; it is moving the common "explore the codebase" actions from an open-ended shell down to a controlled tool layer.

This yields several direct benefits:

1. **Lower burden on the model**
   The model no longer has to assemble `find`, `grep`, `rg`, `xargs`, quoting, and other command details itself.

2. **More stable cross-environment behavior**
   Local, Docker, and AIO sandboxes no longer depend on whether `rg` is installed in the container, and behavior cannot drift due to shell differences.

3. **Stronger security and auditing**
   The call parameters amount to "what to search, where, and how many results at most", which is inherently easier to audit and rate-limit than arbitrary commands.

4. **Better token efficiency**
   `grep` returns match summaries rather than whole files; the model then calls `read_file` only on a few candidate paths.

5. **Friendly to `tool_search`**
   As DeerFlow keeps growing its tool set, `grep` / `glob` will be very high-frequency base tools, worth keeping as built-ins rather than having the model always fall back to generic bash.
## Proposal

Add two built-in sandbox tools:

- `glob`
- `grep`

Recommended location:

- `backend/packages/harness/deerflow/sandbox/tools.py`

and add them to the `file:read` group by default in `config.example.yaml`.

### 1. `glob` tool

Purpose: find files or directories by path pattern.

Suggested schema:

```python
@tool("glob", parse_docstring=True)
def glob_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    include_dirs: bool = False,
    max_results: int = 200,
) -> str:
    ...
```
Parameter semantics:

- `description`: consistent with existing tools
- `pattern`: a glob pattern, e.g. `**/*.py`, `src/**/test_*.ts`
- `path`: the search root directory; must be an absolute path
- `include_dirs`: whether to return directories
- `max_results`: maximum number of returned entries, to avoid blowing up the context in one call

Suggested return format:

```text
Found 3 paths under /mnt/user-data/workspace
1. /mnt/user-data/workspace/backend/app.py
2. /mnt/user-data/workspace/backend/tests/test_app.py
3. /mnt/user-data/workspace/scripts/build.py
```

If better frontend consumption is wanted later, this could become a JSON string; for the first version, readable text keeps it consistent with the existing tool style.
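Producing the text layout above is a few lines of string assembly; a minimal sketch, assuming a hypothetical `format_glob_results` helper that is not part of this patch:

```python
def format_glob_results(root: str, paths: list[str], truncated: bool) -> str:
    # Mirror the "Found N paths under <root>" layout proposed above.
    if not paths:
        return "No files matched"
    lines = [f"Found {len(paths)} paths under {root}"]
    lines.extend(f"{index}. {path}" for index, path in enumerate(paths, start=1))
    if truncated:
        lines.append("Results truncated. Narrow the path or pattern.")
    return "\n".join(lines)
```

Keeping the formatting in one place also makes it trivial to swap the text output for JSON later without touching the tool logic.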
### 2. `grep` tool

Purpose: search file contents by pattern and return a summary of match locations.

Suggested schema:

```python
@tool("grep", parse_docstring=True)
def grep_tool(
    runtime: ToolRuntime[ContextT, ThreadState],
    description: str,
    pattern: str,
    path: str,
    glob: str | None = None,
    literal: bool = False,
    case_sensitive: bool = False,
    max_results: int = 100,
) -> str:
    ...
```
Parameter semantics:

- `pattern`: a search term or regular expression
- `path`: the search root directory; must be an absolute path
- `glob`: optional path filter, e.g. `**/*.py`
- `literal`: when `True`, match as a plain string instead of interpreting the pattern as a regex
- `case_sensitive`: whether matching is case-sensitive
- `max_results`: maximum number of matches returned (matches, not files)

Suggested return format:

```text
Found 4 matches under /mnt/user-data/workspace
/mnt/user-data/workspace/backend/config.py:12: TOOL_GROUPS = [...]
/mnt/user-data/workspace/backend/config.py:48: def load_tool_config(...):
/mnt/user-data/workspace/backend/tools.py:91: "tool_groups"
/mnt/user-data/workspace/backend/tests/test_config.py:22: assert "tool_groups" in data
```

The first version should return only:

- the file path
- the line number
- a summary of the matching line

No context blocks, to keep results small. If the model needs context, it can call `read_file(path, start_line, end_line)`.
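The `literal` and `case_sensitive` flags compose directly in Python's `re` module; a minimal sketch of the pattern construction described above (the helper name is illustrative):

```python
import re


def build_regex(pattern: str, *, literal: bool = False, case_sensitive: bool = False) -> re.Pattern[str]:
    # literal=True escapes metacharacters so the pattern matches as plain text.
    source = re.escape(pattern) if literal else pattern
    # Case-insensitive by default, matching the proposed tool semantics.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(source, flags)
```

An invalid regex raises `re.error` at `re.compile`, which the tool can translate into a clear parameter error.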
## Design Principles

### A. No shell wrapper

Do not implement `grep` as:

```python
subprocess.run("grep ...")
```

and do not assemble `find` / `rg` commands inside the container either.

Reasons:

- It introduces shell quoting issues and an injection surface
- It depends on whether different sandbox images ship the same set of commands
- Behavior differs across Windows / macOS / Linux
- It is hard to reliably control the number and format of output entries

The right direction is:

- `glob` uses Python standard library path traversal
- `grep` scans files one by one in Python
- DeerFlow formats the output itself

If calling `rg` ever becomes desirable for performance, it should be encapsulated inside the provider with unchanged external semantics, not exposed to the model as a CLI.
### B. Reuse DeerFlow's existing path permission model

These two tools must reuse the path validation logic of the current `ls` / `read_file`:

- Local mode goes through `validate_local_tool_path(..., read_only=True)`
- Support `/mnt/skills/...`
- Support `/mnt/acp-workspace/...`
- Support virtual path resolution for thread workspace / uploads / outputs
- Explicitly reject out-of-scope paths and path traversal

In other words, they belong to **file:read**; they are not a privilege-escalation entry point that bypasses `bash` restrictions.
### C. Results must be hard-limited

Without hard limits, `glob` / `grep` can easily blow up the context.

The first version should limit at least:

- `glob.max_results`: default 200, maximum 1000
- `grep.max_results`: default 100, maximum 500
- Maximum length per line summary, e.g. 200 characters
- Skip binary files
- Skip very large files, e.g. over 1 MB per file, or configurable

In addition, when the match count exceeds the threshold, the result should state:

- how many entries are shown
- the fact that output was truncated
- a suggestion to narrow the search

For example:

```text
Found more than 100 matches, showing first 100. Narrow the path or add a glob filter.
```
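The per-line summary cap is a small helper; a sketch matching the 200-character limit above (this mirrors the shape of the `truncate_line` helper introduced later in this patch):

```python
def truncate_line(line: str, max_chars: int = 200) -> str:
    # Strip the trailing newline, then cap the summary length with an ellipsis.
    line = line.rstrip("\r\n")
    if len(line) <= max_chars:
        return line
    return line[: max_chars - 3] + "..."
```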
### D. Tool semantics should complement each other

The recommended model workflow is:

1. `glob` to find candidate files
2. `grep` to find candidate locations
3. `read_file` to read local context
4. `str_replace` / `write_file` to perform the change

This keeps tool boundaries clean and makes it easier to teach the model a stable habit in the prompt.
## Implementation Approach

### Option A: implement the first version directly in `sandbox/tools.py`

This is my recommended starting point.

Approach:

- Add `glob_tool` and `grep_tool` in `sandbox/tools.py`
- In the local sandbox, use Python filesystem APIs directly
- In non-local sandboxes, prefer implementing through DeerFlow's own path access layer as well

Pros:

- Small change
- Validates the agent impact quickly
- Does not require changing the `Sandbox` abstraction first

Cons:

- `tools.py` keeps growing
- If provider-side performance optimization is wanted later, another round of abstraction is needed
### Option B: extend the `Sandbox` abstraction first

For example, add:

```python
class Sandbox(ABC):
    def glob(self, path: str, pattern: str, include_dirs: bool = False, max_results: int = 200) -> list[str]:
        ...

    def grep(
        self,
        path: str,
        pattern: str,
        *,
        glob: str | None = None,
        literal: bool = False,
        case_sensitive: bool = False,
        max_results: int = 100,
    ) -> list[GrepMatch]:
        ...
```

Pros:

- Cleaner abstraction
- Container / remote sandboxes can each optimize independently

Cons:

- Higher up-front cost
- Requires updating all sandbox providers at once

Conclusion:

**Start with Option A; once the tools prove their value, push them down into the `Sandbox` abstraction layer.**
## Detailed Behavior

### `glob` behavior

- Root directory does not exist: return a clear error
- Root path is not a directory: return a clear error
- Invalid pattern: return a clear error
- Empty result: return `No files matched`
- Default ignores should align with the current `list_dir` as much as possible, e.g.:
  - `.git`
  - `node_modules`
  - `__pycache__`
  - `.venv`
  - build output directories

A shared ignore set should be extracted here, so `ls` and `glob` results do not diverge in style.

### `grep` behavior

- Scan only text files by default
- Skip files detected as binary
- Skip very large files, or scan only the first N KB
- Return a parameter error when the regex fails to compile
- Paths in the output keep using virtual paths rather than exposing real host paths
- Sort by file path and line number by default for stable output
## Prompting Guidance

If these two tools are introduced, the file-operation guidance in the system prompt should be updated in the same change:

- Prefer `glob` when searching by filename pattern
- Prefer `grep` when searching for code symbols, config keys, or copy text
- Fall back to `bash` only when the tools cannot accomplish the goal

Otherwise the model will keep habitually reaching for `bash` first.
## Risks

### 1. Overlap with `bash`

True, but not a problem.

`ls` and `read_file` can also be replaced by `bash`, yet we keep them because structured tools suit agents better.

### 2. Performance

On large repositories, pure-Python `grep` may be slower than `rg`.

Mitigations:

- Add result caps and file-size caps in the first version
- Require a root path
- Provide the `glob` filter to narrow the scan scope
- If necessary later, optimize with `rg` inside the provider while keeping the same schema

### 3. Inconsistent ignore rules

If `glob` cannot see a path that `ls` can, the model gets confused.

Mitigations:

- Unify the ignore rules
- Document that common dependency and build directories are skipped by default

### 4. Regex search becoming too complex

Supporting many grep dialects in the first version would blur the boundaries.

Mitigations:

- Support only Python `re` in the first version
- Provide `literal=True` as the simple mode
## Alternatives Considered

### A. Add no tools; rely entirely on `bash`

Not recommended.

This leaves DeerFlow persistently behind on code-exploration experience, and weakens it in no-bash or restricted-bash scenarios.

### B. Add only `glob`, not `grep`

Not recommended.

It solves "find files" but not "find locations"; the model would still end up falling back to `bash grep`.

### C. Add only `grep`, not `glob`

Also not recommended.

Without path-pattern filtering, `grep` scans are often far too broad; `glob` is its natural companion tool.

### D. Plug in an MCP filesystem server's search capabilities

Not recommended as the primary path in the short term.

MCP can be a supplement, but `glob` / `grep` as DeerFlow's base coding tools are best kept built-in, so they are reliably available in a default installation.
## Acceptance Criteria

- `glob` and `grep` can be enabled by default in `config.example.yaml`
- Both tools belong to the `file:read` group
- The local sandbox strictly obeys the existing path permissions
- Output never leaks real host paths
- Large result sets are truncated with an explicit notice
- The model can complete a typical code-change flow via `glob -> grep -> read_file -> str_replace`
- Repository exploration improves noticeably in local mode with host bash disabled

## Rollout Plan

1. Implement `glob_tool` and `grep_tool` in `sandbox/tools.py`
2. Extract ignore rules shared with `list_dir` to avoid behavior drift
3. Add the tool configuration to `config.example.yaml` by default
4. Add tests for local path validation, virtual path mapping, result truncation, and binary skipping
5. Update README / backend docs / prompt guidance
6. Collect real agent call data, then decide whether to push down into the `Sandbox` abstraction
## Suggested Config

```yaml
tools:
  - name: glob
    group: file:read
    use: deerflow.sandbox.tools:glob_tool

  - name: grep
    group: file:read
    use: deerflow.sandbox.tools:grep_tool
```
## Final Recommendation

The conclusion: **yes, these tools can and should be added.**

But I would enforce three clear boundaries:

1. `grep` / `glob` must be built-in, read-only, structured tools
2. The first version must not be a shell wrapper, and CLI dialects must not be exposed directly to the model
3. Validate the value in `sandbox/tools.py` first, then consider pushing down into the `Sandbox` provider abstraction

Done this way, it will noticeably improve DeerFlow's usability in coding / repo exploration scenarios, and the risk stays controllable.
@ -7,6 +7,7 @@ import uuid
 from agent_sandbox import Sandbox as AioSandboxClient
 
 from deerflow.sandbox.sandbox import Sandbox
+from deerflow.sandbox.search import GrepMatch, path_matches, should_ignore_path, truncate_line
 
 logger = logging.getLogger(__name__)
@ -135,6 +136,86 @@ class AioSandbox(Sandbox):
             logger.error(f"Failed to write file in sandbox: {e}")
             raise
 
+    def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+        if not include_dirs:
+            result = self._client.file.find_files(path=path, glob=pattern)
+            files = result.data.files if result.data and result.data.files else []
+            filtered = [file_path for file_path in files if not should_ignore_path(file_path)]
+            truncated = len(filtered) > max_results
+            return filtered[:max_results], truncated
+
+        result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
+        entries = result.data.files if result.data and result.data.files else []
+        matches: list[str] = []
+        root_path = path.rstrip("/") or "/"
+        root_prefix = root_path if root_path == "/" else f"{root_path}/"
+        for entry in entries:
+            if entry.path != root_path and not entry.path.startswith(root_prefix):
+                continue
+            if should_ignore_path(entry.path):
+                continue
+            rel_path = entry.path[len(root_path) :].lstrip("/")
+            if path_matches(pattern, rel_path):
+                matches.append(entry.path)
+                if len(matches) >= max_results:
+                    return matches, True
+        return matches, False
+
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> tuple[list[GrepMatch], bool]:
+        import re as _re
+
+        regex_source = _re.escape(pattern) if literal else pattern
+        # Validate the pattern locally so an invalid regex raises re.error
+        # (caught by grep_tool's except re.error handler) rather than a
+        # generic remote API error.
+        _re.compile(regex_source, 0 if case_sensitive else _re.IGNORECASE)
+        regex = regex_source if case_sensitive else f"(?i){regex_source}"
+
+        if glob is not None:
+            find_result = self._client.file.find_files(path=path, glob=glob)
+            candidate_paths = find_result.data.files if find_result.data and find_result.data.files else []
+        else:
+            list_result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
+            entries = list_result.data.files if list_result.data and list_result.data.files else []
+            candidate_paths = [entry.path for entry in entries if not entry.is_directory]
+
+        matches: list[GrepMatch] = []
+        truncated = False
+
+        for file_path in candidate_paths:
+            if should_ignore_path(file_path):
+                continue
+
+            search_result = self._client.file.search_in_file(file=file_path, regex=regex)
+            data = search_result.data
+            if data is None:
+                continue
+
+            line_numbers = data.line_numbers or []
+            matched_lines = data.matches or []
+            for line_number, line in zip(line_numbers, matched_lines):
+                matches.append(
+                    GrepMatch(
+                        path=file_path,
+                        line_number=line_number if isinstance(line_number, int) else 0,
+                        line=truncate_line(line),
+                    )
+                )
+                if len(matches) >= max_results:
+                    truncated = True
+                    return matches, truncated
+
+        return matches, truncated
+
     def update_file(self, path: str, content: bytes) -> None:
         """Update a file with binary content in the sandbox.
@ -1,72 +1,6 @@
-import fnmatch
 from pathlib import Path
 
-IGNORE_PATTERNS = [
-    # Version Control
-    ".git",
-    ".svn",
-    ".hg",
-    ".bzr",
-    # Dependencies
-    "node_modules",
-    "__pycache__",
-    ".venv",
-    "venv",
-    ".env",
-    "env",
-    ".tox",
-    ".nox",
-    ".eggs",
-    "*.egg-info",
-    "site-packages",
-    # Build outputs
-    "dist",
-    "build",
-    ".next",
-    ".nuxt",
-    ".output",
-    ".turbo",
-    "target",
-    "out",
-    # IDE & Editor
-    ".idea",
-    ".vscode",
-    "*.swp",
-    "*.swo",
-    "*~",
-    ".project",
-    ".classpath",
-    ".settings",
-    # OS generated
-    ".DS_Store",
-    "Thumbs.db",
-    "desktop.ini",
-    "*.lnk",
-    # Logs & temp files
-    "*.log",
-    "*.tmp",
-    "*.temp",
-    "*.bak",
-    "*.cache",
-    ".cache",
-    "logs",
-    # Coverage & test artifacts
-    ".coverage",
-    "coverage",
-    ".nyc_output",
-    "htmlcov",
-    ".pytest_cache",
-    ".mypy_cache",
-    ".ruff_cache",
-]
-
-
-def _should_ignore(name: str) -> bool:
-    """Check if a file/directory name matches any ignore pattern."""
-    for pattern in IGNORE_PATTERNS:
-        if fnmatch.fnmatch(name, pattern):
-            return True
-    return False
-
+from deerflow.sandbox.search import should_ignore_name
 
 
 def list_dir(path: str, max_depth: int = 2) -> list[str]:
@ -95,7 +29,7 @@ def list_dir(path: str, max_depth: int = 2) -> list[str]:
 
         try:
             for item in current_path.iterdir():
-                if _should_ignore(item.name):
+                if should_ignore_name(item.name):
                     continue
 
                 post_fix = "/" if item.is_dir() else ""
@ -6,6 +6,7 @@ from pathlib import Path
 
 from deerflow.sandbox.local.list_dir import list_dir
 from deerflow.sandbox.sandbox import Sandbox
+from deerflow.sandbox.search import GrepMatch, find_glob_matches, find_grep_matches
 
 
 class LocalSandbox(Sandbox):
@ -259,6 +260,39 @@ class LocalSandbox(Sandbox):
             # Re-raise with the original path for clearer error messages, hiding internal resolved paths
             raise type(e)(e.errno, e.strerror, path) from None
 
+    def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+        resolved_path = Path(self._resolve_path(path))
+        matches, truncated = find_glob_matches(resolved_path, pattern, include_dirs=include_dirs, max_results=max_results)
+        return [self._reverse_resolve_path(match) for match in matches], truncated
+
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> tuple[list[GrepMatch], bool]:
+        resolved_path = Path(self._resolve_path(path))
+        matches, truncated = find_grep_matches(
+            resolved_path,
+            pattern,
+            glob_pattern=glob,
+            literal=literal,
+            case_sensitive=case_sensitive,
+            max_results=max_results,
+        )
+        return [
+            GrepMatch(
+                path=self._reverse_resolve_path(match.path),
+                line_number=match.line_number,
+                line=match.line,
+            )
+            for match in matches
+        ], truncated
+
     def update_file(self, path: str, content: bytes) -> None:
         resolved_path = self._resolve_path(path)
         try:
@ -1,5 +1,7 @@
 from abc import ABC, abstractmethod
 
+from deerflow.sandbox.search import GrepMatch
+
 
 class Sandbox(ABC):
     """Abstract base class for sandbox environments"""
@ -61,6 +63,25 @@ class Sandbox(ABC):
         """
         pass
 
+    @abstractmethod
+    def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+        """Find paths that match a glob pattern under a root directory."""
+        pass
+
+    @abstractmethod
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> tuple[list[GrepMatch], bool]:
+        """Search for matches inside text files under a directory."""
+        pass
+
     @abstractmethod
     def update_file(self, path: str, content: bytes) -> None:
         """Update a file with binary content.
210
backend/packages/harness/deerflow/sandbox/search.py
Normal file
@ -0,0 +1,210 @@
+import fnmatch
+import os
+import re
+from dataclasses import dataclass
+from pathlib import Path, PurePosixPath
+
+IGNORE_PATTERNS = [
+    ".git",
+    ".svn",
+    ".hg",
+    ".bzr",
+    "node_modules",
+    "__pycache__",
+    ".venv",
+    "venv",
+    ".env",
+    "env",
+    ".tox",
+    ".nox",
+    ".eggs",
+    "*.egg-info",
+    "site-packages",
+    "dist",
+    "build",
+    ".next",
+    ".nuxt",
+    ".output",
+    ".turbo",
+    "target",
+    "out",
+    ".idea",
+    ".vscode",
+    "*.swp",
+    "*.swo",
+    "*~",
+    ".project",
+    ".classpath",
+    ".settings",
+    ".DS_Store",
+    "Thumbs.db",
+    "desktop.ini",
+    "*.lnk",
+    "*.log",
+    "*.tmp",
+    "*.temp",
+    "*.bak",
+    "*.cache",
+    ".cache",
+    "logs",
+    ".coverage",
+    "coverage",
+    ".nyc_output",
+    "htmlcov",
+    ".pytest_cache",
+    ".mypy_cache",
+    ".ruff_cache",
+]
+
+DEFAULT_MAX_FILE_SIZE_BYTES = 1_000_000
+DEFAULT_LINE_SUMMARY_LENGTH = 200
+
+
+@dataclass(frozen=True)
+class GrepMatch:
+    path: str
+    line_number: int
+    line: str
+
+
+def should_ignore_name(name: str) -> bool:
+    for pattern in IGNORE_PATTERNS:
+        if fnmatch.fnmatch(name, pattern):
+            return True
+    return False
+
+
+def should_ignore_path(path: str) -> bool:
+    return any(should_ignore_name(segment) for segment in path.replace("\\", "/").split("/") if segment)
+
+
+def path_matches(pattern: str, rel_path: str) -> bool:
+    path = PurePosixPath(rel_path)
+    if path.match(pattern):
+        return True
+    if pattern.startswith("**/"):
+        return path.match(pattern[3:])
+    return False
+
+
+def truncate_line(line: str, max_chars: int = DEFAULT_LINE_SUMMARY_LENGTH) -> str:
+    line = line.rstrip("\n\r")
+    if len(line) <= max_chars:
+        return line
+    return line[: max_chars - 3] + "..."
+
+
+def is_binary_file(path: Path, sample_size: int = 8192) -> bool:
+    try:
+        with path.open("rb") as handle:
+            return b"\0" in handle.read(sample_size)
+    except OSError:
+        return True
+
+
+def find_glob_matches(root: Path, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+    matches: list[str] = []
+    truncated = False
+    root = root.resolve()
+
+    if not root.exists():
+        raise FileNotFoundError(root)
+    if not root.is_dir():
+        raise NotADirectoryError(root)
+
+    for current_root, dirs, files in os.walk(root):
+        dirs[:] = [name for name in dirs if not should_ignore_name(name)]
+        # root is already resolved; os.walk builds current_root by joining under root,
+        # so relative_to() works without an extra stat()/resolve() per directory.
+        rel_dir = Path(current_root).relative_to(root)
+
+        if include_dirs:
+            for name in dirs:
+                rel_path = (rel_dir / name).as_posix()
+                if path_matches(pattern, rel_path):
+                    matches.append(str(Path(current_root) / name))
+                    if len(matches) >= max_results:
+                        truncated = True
+                        return matches, truncated
+
+        for name in files:
+            if should_ignore_name(name):
+                continue
+            rel_path = (rel_dir / name).as_posix()
+            if path_matches(pattern, rel_path):
+                matches.append(str(Path(current_root) / name))
+                if len(matches) >= max_results:
+                    truncated = True
+                    return matches, truncated
+
+    return matches, truncated
+
+
+def find_grep_matches(
+    root: Path,
+    pattern: str,
+    *,
+    glob_pattern: str | None = None,
+    literal: bool = False,
+    case_sensitive: bool = False,
+    max_results: int = 100,
+    max_file_size: int = DEFAULT_MAX_FILE_SIZE_BYTES,
+    line_summary_length: int = DEFAULT_LINE_SUMMARY_LENGTH,
+) -> tuple[list[GrepMatch], bool]:
+    matches: list[GrepMatch] = []
+    truncated = False
+    root = root.resolve()
+
+    if not root.exists():
+        raise FileNotFoundError(root)
+    if not root.is_dir():
+        raise NotADirectoryError(root)
+
+    regex_source = re.escape(pattern) if literal else pattern
+    flags = 0 if case_sensitive else re.IGNORECASE
+    regex = re.compile(regex_source, flags)
+
+    # Skip lines longer than this to prevent ReDoS on minified / no-newline files.
+    _max_line_chars = line_summary_length * 10
+
+    for current_root, dirs, files in os.walk(root):
+        dirs[:] = [name for name in dirs if not should_ignore_name(name)]
+        rel_dir = Path(current_root).relative_to(root)
+
+        for name in files:
+            if should_ignore_name(name):
+                continue
+
+            candidate_path = Path(current_root) / name
+            rel_path = (rel_dir / name).as_posix()
+
+            if glob_pattern is not None and not path_matches(glob_pattern, rel_path):
+                continue
+
+            try:
+                if candidate_path.is_symlink():
+                    continue
+                file_path = candidate_path.resolve()
+                if not file_path.is_relative_to(root):
+                    continue
+                if file_path.stat().st_size > max_file_size or is_binary_file(file_path):
+                    continue
+                with file_path.open(encoding="utf-8", errors="replace") as handle:
+                    for line_number, line in enumerate(handle, start=1):
+                        if len(line) > _max_line_chars:
+                            continue
+                        if regex.search(line):
+                            matches.append(
+                                GrepMatch(
+                                    path=str(file_path),
+                                    line_number=line_number,
+                                    line=truncate_line(line, line_summary_length),
+                                )
+                            )
+                            if len(matches) >= max_results:
+                                truncated = True
+                                return matches, truncated
+            except OSError:
+                continue
+
+    return matches, truncated
@@ -7,6 +7,7 @@ from langchain.tools import ToolRuntime, tool
 from langgraph.typing import ContextT
 
 from deerflow.agents.thread_state import ThreadDataState, ThreadState
+from deerflow.config import get_app_config
 from deerflow.config.paths import VIRTUAL_PATH_PREFIX
 from deerflow.sandbox.exceptions import (
     SandboxError,
@@ -16,6 +17,7 @@ from deerflow.sandbox.exceptions import (
 from deerflow.sandbox.file_operation_lock import get_file_operation_lock
 from deerflow.sandbox.sandbox import Sandbox
 from deerflow.sandbox.sandbox_provider import get_sandbox_provider
+from deerflow.sandbox.search import GrepMatch
 from deerflow.sandbox.security import LOCAL_HOST_BASH_DISABLED_MESSAGE, is_host_bash_allowed
 
 _ABSOLUTE_PATH_PATTERN = re.compile(r"(?<![:\w])(?<!:/)/(?:[^\s\"'`;&|<>()]+)")
@@ -31,6 +33,10 @@ _LOCAL_BASH_SYSTEM_PATH_PREFIXES = (
 
 _DEFAULT_SKILLS_CONTAINER_PATH = "/mnt/skills"
 _ACP_WORKSPACE_VIRTUAL_PATH = "/mnt/acp-workspace"
+_DEFAULT_GLOB_MAX_RESULTS = 200
+_MAX_GLOB_MAX_RESULTS = 1000
+_DEFAULT_GREP_MAX_RESULTS = 100
+_MAX_GREP_MAX_RESULTS = 500
 
 
 def _get_skills_container_path() -> str:
@@ -245,6 +251,69 @@ def _get_mcp_allowed_paths() -> list[str]:
     return allowed_paths
 
 
+def _get_tool_config_int(name: str, key: str, default: int) -> int:
+    try:
+        tool_config = get_app_config().get_tool_config(name)
+        if tool_config is not None and key in tool_config.model_extra:
+            value = tool_config.model_extra.get(key)
+            if isinstance(value, int):
+                return value
+    except Exception:
+        pass
+    return default
+
+
+def _clamp_max_results(value: int, *, default: int, upper_bound: int) -> int:
+    if value <= 0:
+        return default
+    return min(value, upper_bound)
+
+
+def _resolve_max_results(name: str, requested: int, *, default: int, upper_bound: int) -> int:
+    requested_max_results = _clamp_max_results(requested, default=default, upper_bound=upper_bound)
+    configured_max_results = _clamp_max_results(
+        _get_tool_config_int(name, "max_results", default),
+        default=default,
+        upper_bound=upper_bound,
+    )
+    return min(requested_max_results, configured_max_results)
+
+
+def _resolve_local_read_path(path: str, thread_data: ThreadDataState) -> str:
+    validate_local_tool_path(path, thread_data, read_only=True)
+    if _is_skills_path(path):
+        return _resolve_skills_path(path)
+    if _is_acp_workspace_path(path):
+        return _resolve_acp_workspace_path(path, _extract_thread_id_from_thread_data(thread_data))
+    return _resolve_and_validate_user_data_path(path, thread_data)
+
+
+def _format_glob_results(root_path: str, matches: list[str], truncated: bool) -> str:
+    if not matches:
+        return f"No files matched under {root_path}"
+
+    lines = [f"Found {len(matches)} paths under {root_path}"]
+    if truncated:
+        lines[0] += f" (showing first {len(matches)})"
+    lines.extend(f"{index}. {path}" for index, path in enumerate(matches, start=1))
+    if truncated:
+        lines.append("Results truncated. Narrow the path or pattern to see fewer matches.")
+    return "\n".join(lines)
+
+
+def _format_grep_results(root_path: str, matches: list[GrepMatch], truncated: bool) -> str:
+    if not matches:
+        return f"No matches found under {root_path}"
+
+    lines = [f"Found {len(matches)} matches under {root_path}"]
+    if truncated:
+        lines[0] += f" (showing first {len(matches)})"
+    lines.extend(f"{match.path}:{match.line_number}: {match.line}" for match in matches)
+    if truncated:
+        lines.append("Results truncated. Narrow the path or add a glob filter.")
+    return "\n".join(lines)
+
+
 def _path_variants(path: str) -> set[str]:
     return {path, path.replace("\\", "/"), path.replace("/", "\\")}
 
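The helpers in the hunk above compose into one rule: the effective limit is the smaller of what the caller requested and what config.yaml allows, with non-positive values falling back to the default and everything capped by the hard upper bound. A self-contained sketch of that arithmetic (the config lookup is replaced by a plain `configured` argument; function names mirror the diff for readability):

```python
def clamp_max_results(value: int, *, default: int, upper_bound: int) -> int:
    """Non-positive requests fall back to the default; everything is capped."""
    if value <= 0:
        return default
    return min(value, upper_bound)


def resolve_max_results(requested: int, configured: int, *, default: int, upper_bound: int) -> int:
    """Effective limit = min(clamped caller request, clamped config override)."""
    requested_max = clamp_max_results(requested, default=default, upper_bound=upper_bound)
    configured_max = clamp_max_results(configured, default=default, upper_bound=upper_bound)
    return min(requested_max, configured_max)


# grep's constants from the diff: default 100, hard upper bound 500.
print(resolve_max_results(2, 50, default=100, upper_bound=500))     # 2: caller asks for less
print(resolve_max_results(9999, 50, default=100, upper_bound=500))  # 50: config override wins
print(resolve_max_results(-1, 0, default=100, upper_bound=500))     # 100: both fall back to default
```

This is why the test `test_glob_tool_honors_smaller_requested_max_results` can pass `max_results=2` against a configured limit of 50 and still get exactly two results.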
@@ -901,6 +970,126 @@ def ls_tool(runtime: ToolRuntime[ContextT, ThreadState], description: str, path:
     return f"Error: Unexpected error listing directory: {_sanitize_error(e, runtime)}"
 
 
+@tool("glob", parse_docstring=True)
+def glob_tool(
+    runtime: ToolRuntime[ContextT, ThreadState],
+    description: str,
+    pattern: str,
+    path: str,
+    include_dirs: bool = False,
+    max_results: int = _DEFAULT_GLOB_MAX_RESULTS,
+) -> str:
+    """Find files or directories that match a glob pattern under a root directory.
+
+    Args:
+        description: Explain why you are searching for these paths in short words. ALWAYS PROVIDE THIS PARAMETER FIRST.
+        pattern: The glob pattern to match relative to the root path, for example `**/*.py`.
+        path: The **absolute** root directory to search under.
+        include_dirs: Whether matching directories should also be returned. Default is False.
+        max_results: Maximum number of paths to return. Default is 200.
+    """
+    try:
+        sandbox = ensure_sandbox_initialized(runtime)
+        ensure_thread_directories_exist(runtime)
+        requested_path = path
+        effective_max_results = _resolve_max_results(
+            "glob",
+            max_results,
+            default=_DEFAULT_GLOB_MAX_RESULTS,
+            upper_bound=_MAX_GLOB_MAX_RESULTS,
+        )
+        thread_data = None
+        if is_local_sandbox(runtime):
+            thread_data = get_thread_data(runtime)
+            if thread_data is None:
+                raise SandboxRuntimeError("Thread data not available for local sandbox")
+            path = _resolve_local_read_path(path, thread_data)
+        matches, truncated = sandbox.glob(path, pattern, include_dirs=include_dirs, max_results=effective_max_results)
+        if thread_data is not None:
+            matches = [mask_local_paths_in_output(match, thread_data) for match in matches]
+        return _format_glob_results(requested_path, matches, truncated)
+    except SandboxError as e:
+        return f"Error: {e}"
+    except FileNotFoundError:
+        return f"Error: Directory not found: {requested_path}"
+    except NotADirectoryError:
+        return f"Error: Path is not a directory: {requested_path}"
+    except PermissionError:
+        return f"Error: Permission denied: {requested_path}"
+    except Exception as e:
+        return f"Error: Unexpected error searching paths: {_sanitize_error(e, runtime)}"
+
+
+@tool("grep", parse_docstring=True)
+def grep_tool(
+    runtime: ToolRuntime[ContextT, ThreadState],
+    description: str,
+    pattern: str,
+    path: str,
+    glob: str | None = None,
+    literal: bool = False,
+    case_sensitive: bool = False,
+    max_results: int = _DEFAULT_GREP_MAX_RESULTS,
+) -> str:
+    """Search for matching lines inside text files under a root directory.
+
+    Args:
+        description: Explain why you are searching file contents in short words. ALWAYS PROVIDE THIS PARAMETER FIRST.
+        pattern: The string or regex pattern to search for.
+        path: The **absolute** root directory to search under.
+        glob: Optional glob filter for candidate files, for example `**/*.py`.
+        literal: Whether to treat `pattern` as a plain string. Default is False.
+        case_sensitive: Whether matching is case-sensitive. Default is False.
+        max_results: Maximum number of matching lines to return. Default is 100.
+    """
+    try:
+        sandbox = ensure_sandbox_initialized(runtime)
+        ensure_thread_directories_exist(runtime)
+        requested_path = path
+        effective_max_results = _resolve_max_results(
+            "grep",
+            max_results,
+            default=_DEFAULT_GREP_MAX_RESULTS,
+            upper_bound=_MAX_GREP_MAX_RESULTS,
+        )
+        thread_data = None
+        if is_local_sandbox(runtime):
+            thread_data = get_thread_data(runtime)
+            if thread_data is None:
+                raise SandboxRuntimeError("Thread data not available for local sandbox")
+            path = _resolve_local_read_path(path, thread_data)
+        matches, truncated = sandbox.grep(
+            path,
+            pattern,
+            glob=glob,
+            literal=literal,
+            case_sensitive=case_sensitive,
+            max_results=effective_max_results,
+        )
+        if thread_data is not None:
+            matches = [
+                GrepMatch(
+                    path=mask_local_paths_in_output(match.path, thread_data),
+                    line_number=match.line_number,
+                    line=match.line,
+                )
+                for match in matches
+            ]
+        return _format_grep_results(requested_path, matches, truncated)
+    except SandboxError as e:
+        return f"Error: {e}"
+    except FileNotFoundError:
+        return f"Error: Directory not found: {requested_path}"
+    except NotADirectoryError:
+        return f"Error: Path is not a directory: {requested_path}"
+    except re.error as e:
+        return f"Error: Invalid regex pattern: {e}"
+    except PermissionError:
+        return f"Error: Permission denied: {requested_path}"
+    except Exception as e:
+        return f"Error: Unexpected error searching file contents: {_sanitize_error(e, runtime)}"
+
+
 @tool("read_file", parse_docstring=True)
 def read_file_tool(
     runtime: ToolRuntime[ContextT, ThreadState],
393 backend/tests/test_sandbox_search_tools.py Normal file
@@ -0,0 +1,393 @@
from types import SimpleNamespace
from unittest.mock import patch

from deerflow.community.aio_sandbox.aio_sandbox import AioSandbox
from deerflow.sandbox.local.local_sandbox import LocalSandbox
from deerflow.sandbox.search import GrepMatch, find_glob_matches, find_grep_matches
from deerflow.sandbox.tools import glob_tool, grep_tool


def _make_runtime(tmp_path):
    workspace = tmp_path / "workspace"
    uploads = tmp_path / "uploads"
    outputs = tmp_path / "outputs"
    workspace.mkdir()
    uploads.mkdir()
    outputs.mkdir()
    return SimpleNamespace(
        state={
            "sandbox": {"sandbox_id": "local"},
            "thread_data": {
                "workspace_path": str(workspace),
                "uploads_path": str(uploads),
                "outputs_path": str(outputs),
            },
        },
        context={"thread_id": "thread-1"},
    )


def test_glob_tool_returns_virtual_paths_and_ignores_common_dirs(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "app.py").write_text("print('hi')\n", encoding="utf-8")
    (workspace / "pkg").mkdir()
    (workspace / "pkg" / "util.py").write_text("print('util')\n", encoding="utf-8")
    (workspace / "node_modules").mkdir()
    (workspace / "node_modules" / "skip.py").write_text("ignored\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    result = glob_tool.func(
        runtime=runtime,
        description="find python files",
        pattern="**/*.py",
        path="/mnt/user-data/workspace",
    )

    assert "/mnt/user-data/workspace/app.py" in result
    assert "/mnt/user-data/workspace/pkg/util.py" in result
    assert "node_modules" not in result
    assert str(workspace) not in result


def test_glob_tool_supports_skills_virtual_paths(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    skills_dir = tmp_path / "skills"
    (skills_dir / "public" / "demo").mkdir(parents=True)
    (skills_dir / "public" / "demo" / "SKILL.md").write_text("# Demo\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    with (
        patch("deerflow.sandbox.tools._get_skills_container_path", return_value="/mnt/skills"),
        patch("deerflow.sandbox.tools._get_skills_host_path", return_value=str(skills_dir)),
    ):
        result = glob_tool.func(
            runtime=runtime,
            description="find skills",
            pattern="**/SKILL.md",
            path="/mnt/skills",
        )

    assert "/mnt/skills/public/demo/SKILL.md" in result
    assert str(skills_dir) not in result


def test_grep_tool_filters_by_glob_and_skips_binary_files(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "main.py").write_text("TODO = 'ship it'\nprint(TODO)\n", encoding="utf-8")
    (workspace / "notes.txt").write_text("TODO in txt should be filtered\n", encoding="utf-8")
    (workspace / "image.bin").write_bytes(b"\0binary TODO")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    result = grep_tool.func(
        runtime=runtime,
        description="find todo references",
        pattern="TODO",
        path="/mnt/user-data/workspace",
        glob="**/*.py",
    )

    assert "/mnt/user-data/workspace/main.py:1: TODO = 'ship it'" in result
    assert "notes.txt" not in result
    assert "image.bin" not in result
    assert str(workspace) not in result


def test_grep_tool_truncates_results(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "main.py").write_text("TODO one\nTODO two\nTODO three\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
    # Prevent config.yaml tool config from overriding the caller-supplied max_results=2.
    monkeypatch.setattr("deerflow.sandbox.tools.get_app_config", lambda: SimpleNamespace(get_tool_config=lambda name: None))

    result = grep_tool.func(
        runtime=runtime,
        description="limit matches",
        pattern="TODO",
        path="/mnt/user-data/workspace",
        max_results=2,
    )

    assert "Found 2 matches under /mnt/user-data/workspace (showing first 2)" in result
    assert "TODO one" in result
    assert "TODO two" in result
    assert "TODO three" not in result
    assert "Results truncated." in result


def test_glob_tool_include_dirs_filters_nested_ignored_paths(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "src").mkdir()
    (workspace / "src" / "main.py").write_text("x\n", encoding="utf-8")
    (workspace / "node_modules").mkdir()
    (workspace / "node_modules" / "lib").mkdir()

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    result = glob_tool.func(
        runtime=runtime,
        description="find dirs",
        pattern="**",
        path="/mnt/user-data/workspace",
        include_dirs=True,
    )

    assert "src" in result
    assert "node_modules" not in result


def test_grep_tool_literal_mode(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "file.py").write_text("price = (a+b)\nresult = a+b\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    # literal=True should treat (a+b) as a plain string, not a regex group
    result = grep_tool.func(
        runtime=runtime,
        description="literal search",
        pattern="(a+b)",
        path="/mnt/user-data/workspace",
        literal=True,
    )

    assert "price = (a+b)" in result
    assert "result = a+b" not in result


def test_grep_tool_case_sensitive(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "file.py").write_text("TODO: fix\ntodo: also fix\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    result = grep_tool.func(
        runtime=runtime,
        description="case sensitive search",
        pattern="TODO",
        path="/mnt/user-data/workspace",
        case_sensitive=True,
    )

    assert "TODO: fix" in result
    assert "todo: also fix" not in result


def test_grep_tool_invalid_regex_returns_error(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))

    result = grep_tool.func(
        runtime=runtime,
        description="bad pattern",
        pattern="[invalid",
        path="/mnt/user-data/workspace",
    )

    assert "Invalid regex pattern" in result


def test_aio_sandbox_glob_include_dirs_filters_nested_ignored(monkeypatch) -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
        monkeypatch.setattr(
            sandbox._client.file,
            "list_path",
            lambda **kwargs: SimpleNamespace(
                data=SimpleNamespace(
                    files=[
                        SimpleNamespace(name="src", path="/mnt/workspace/src"),
                        SimpleNamespace(name="node_modules", path="/mnt/workspace/node_modules"),
                        # child of node_modules — should be filtered via should_ignore_path
                        SimpleNamespace(name="lib", path="/mnt/workspace/node_modules/lib"),
                    ]
                )
            ),
        )

        matches, truncated = sandbox.glob("/mnt/workspace", "**", include_dirs=True)

        assert "/mnt/workspace/src" in matches
        assert "/mnt/workspace/node_modules" not in matches
        assert "/mnt/workspace/node_modules/lib" not in matches
        assert truncated is False


def test_aio_sandbox_grep_invalid_regex_raises() -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")

        import re

        try:
            sandbox.grep("/mnt/workspace", "[invalid")
            assert False, "Expected re.error"
        except re.error:
            pass


def test_aio_sandbox_glob_parses_json(monkeypatch) -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
        monkeypatch.setattr(
            sandbox._client.file,
            "find_files",
            lambda **kwargs: SimpleNamespace(data=SimpleNamespace(files=["/mnt/user-data/workspace/app.py", "/mnt/user-data/workspace/node_modules/skip.py"])),
        )

        matches, truncated = sandbox.glob("/mnt/user-data/workspace", "**/*.py")

        assert matches == ["/mnt/user-data/workspace/app.py"]
        assert truncated is False


def test_aio_sandbox_grep_parses_json(monkeypatch) -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
        monkeypatch.setattr(
            sandbox._client.file,
            "list_path",
            lambda **kwargs: SimpleNamespace(
                data=SimpleNamespace(
                    files=[
                        SimpleNamespace(
                            name="app.py",
                            path="/mnt/user-data/workspace/app.py",
                            is_directory=False,
                        )
                    ]
                )
            ),
        )
        monkeypatch.setattr(
            sandbox._client.file,
            "search_in_file",
            lambda **kwargs: SimpleNamespace(data=SimpleNamespace(line_numbers=[7], matches=["TODO = True"])),
        )

        matches, truncated = sandbox.grep("/mnt/user-data/workspace", "TODO")

        assert matches == [GrepMatch(path="/mnt/user-data/workspace/app.py", line_number=7, line="TODO = True")]
        assert truncated is False


def test_find_glob_matches_raises_not_a_directory(tmp_path) -> None:
    file_path = tmp_path / "file.txt"
    file_path.write_text("x\n", encoding="utf-8")

    try:
        find_glob_matches(file_path, "**/*.py")
        assert False, "Expected NotADirectoryError"
    except NotADirectoryError:
        pass


def test_find_grep_matches_raises_not_a_directory(tmp_path) -> None:
    file_path = tmp_path / "file.txt"
    file_path.write_text("TODO\n", encoding="utf-8")

    try:
        find_grep_matches(file_path, "TODO")
        assert False, "Expected NotADirectoryError"
    except NotADirectoryError:
        pass


def test_find_grep_matches_skips_symlink_outside_root(tmp_path) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    outside = tmp_path / "outside.txt"
    outside.write_text("TODO outside\n", encoding="utf-8")
    (workspace / "outside-link.txt").symlink_to(outside)

    matches, truncated = find_grep_matches(workspace, "TODO")

    assert matches == []
    assert truncated is False


def test_glob_tool_honors_smaller_requested_max_results(tmp_path, monkeypatch) -> None:
    runtime = _make_runtime(tmp_path)
    workspace = tmp_path / "workspace"
    (workspace / "a.py").write_text("print('a')\n", encoding="utf-8")
    (workspace / "b.py").write_text("print('b')\n", encoding="utf-8")
    (workspace / "c.py").write_text("print('c')\n", encoding="utf-8")

    monkeypatch.setattr("deerflow.sandbox.tools.ensure_sandbox_initialized", lambda runtime: LocalSandbox(id="local"))
    monkeypatch.setattr(
        "deerflow.sandbox.tools.get_app_config",
        lambda: SimpleNamespace(get_tool_config=lambda name: SimpleNamespace(model_extra={"max_results": 50})),
    )

    result = glob_tool.func(
        runtime=runtime,
        description="limit glob matches",
        pattern="**/*.py",
        path="/mnt/user-data/workspace",
        max_results=2,
    )

    assert "Found 2 paths under /mnt/user-data/workspace (showing first 2)" in result
    assert "Results truncated." in result


def test_aio_sandbox_glob_include_dirs_enforces_root_boundary(monkeypatch) -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
        monkeypatch.setattr(
            sandbox._client.file,
            "list_path",
            lambda **kwargs: SimpleNamespace(
                data=SimpleNamespace(
                    files=[
                        SimpleNamespace(name="src", path="/mnt/workspace/src"),
                        SimpleNamespace(name="src2", path="/mnt/workspace2/src2"),
                    ]
                )
            ),
        )

        matches, truncated = sandbox.glob("/mnt/workspace", "**", include_dirs=True)

        assert matches == ["/mnt/workspace/src"]
        assert truncated is False


def test_aio_sandbox_grep_skips_mismatched_line_number_payloads(monkeypatch) -> None:
    with patch("deerflow.community.aio_sandbox.aio_sandbox.AioSandboxClient"):
        sandbox = AioSandbox(id="test-sandbox", base_url="http://localhost:8080")
        monkeypatch.setattr(
            sandbox._client.file,
            "list_path",
            lambda **kwargs: SimpleNamespace(
                data=SimpleNamespace(
                    files=[
                        SimpleNamespace(
                            name="app.py",
                            path="/mnt/user-data/workspace/app.py",
                            is_directory=False,
                        )
                    ]
                )
            ),
        )
        monkeypatch.setattr(
            sandbox._client.file,
            "search_in_file",
            lambda **kwargs: SimpleNamespace(data=SimpleNamespace(line_numbers=[7], matches=["TODO = True", "extra"])),
        )

        matches, truncated = sandbox.grep("/mnt/user-data/workspace", "TODO")

        assert matches == [GrepMatch(path="/mnt/user-data/workspace/app.py", line_number=7, line="TODO = True")]
        assert truncated is False
@@ -325,6 +325,16 @@ tools:
     group: file:read
     use: deerflow.sandbox.tools:read_file_tool
 
+  - name: glob
+    group: file:read
+    use: deerflow.sandbox.tools:glob_tool
+    max_results: 200
+
+  - name: grep
+    group: file:read
+    use: deerflow.sandbox.tools:grep_tool
+    max_results: 100
+
   - name: write_file
     group: file:write
     use: deerflow.sandbox.tools:write_file_tool
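The `max_results` entries added above are read back by `_get_tool_config_int` through the tool config's `model_extra`, so a deployment can tighten either tool's cap without code changes. A minimal sketch of that lookup, with a plain dict standing in for the parsed config (assumption for illustration; the real code goes through `get_app_config().get_tool_config(name)` and pydantic `model_extra`):

```python
# Plain-dict stand-in for the `tools:` section of config.yaml shown above.
TOOLS_CONFIG = {
    "glob": {"max_results": 200},
    "grep": {"max_results": 100},
}


def get_tool_config_int(name: str, key: str, default: int) -> int:
    """Return the configured integer for (tool, key), or the default if the
    tool has no entry or the value is not an int."""
    tool_config = TOOLS_CONFIG.get(name)
    if tool_config is not None and key in tool_config:
        value = tool_config[key]
        if isinstance(value, int):
            return value
    return default


print(get_tool_config_int("grep", "max_results", 999))       # 100: taken from config
print(get_tool_config_int("read_file", "max_results", 999))  # 999: no entry, default used
```

Because the lookup falls back silently, tools without a `max_results` key (such as `read_file` above) keep their built-in defaults.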