mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-09 09:02:02 +00:00
* fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459) #3426 switched the dev gateway's --reload-exclude patterns to absolute paths. uvicorn only excludes an absolute path directly when it already exists as a directory; otherwise it globs the pattern, and Python 3.12's pathlib raises NotImplementedError("Non-relative patterns are unsupported") for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but not backend/sandbox, so `make dev` crashed on startup on a fresh checkout under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap. Create backend/sandbox in both launchers so every absolute exclude stays on uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn mechanism (crash on missing dir, safe once created) and enforces that every absolute --reload-exclude is mkdir'd before launch. Closes #3459 * test(dev): harden reload-exclude invariant parser against false pass/negatives The launcher invariant test parsed shell with a "mkdir -p" line filter and a substring membership check. Two latent gaps (sub-threshold for this fix, but this code guards a user-facing startup path, so close them): - A `\`-continued multi-line `mkdir` would drop arguments on continuation lines, silently weakening coverage. - Substring membership could false-pass when an exclude is a path-prefix of a different created dir (e.g. `/app/backend/sandbox` "found" inside `/app/backend/sandbox-other`). Fold line-continuations, drop comments, and shlex-tokenize each `mkdir` argument list into an exact set (quotes stripped, `$VAR` literal); assert exact set membership. Same shlex handling for `--reload-exclude` values. Verified the parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no longer false-passes on a path-prefix. * fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch Address two review findings on the #3459 fix: - backend/sandbox was described as "gitignored runtime state" but no ignore rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore (anchored so it does NOT shadow the source package backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at runtime can't pollute the working tree or be committed by accident. New test asserts content under backend/sandbox is ignored, making the claim verifiable. - The launcher invariant test only proved the sandbox mkdir exists somewhere, not that it runs before uvicorn starts. Add an order test (sandbox mkdir line must precede the `uv run uvicorn` launch) so a future edit can't move the mkdir below the launch and silently reintroduce the crash. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle The previous autofix tokenized each whole line with shlex, but serve.sh packs every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment. shlex collapses that into one token, so no `--reload-exclude` flag is found and `test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]` failed CI with "expected at least one absolute reload-exclude". Parse `--reload-exclude` with a regex that matches a balanced single/double quoted group or a bare token, so the assignment's surrounding `"` is never swallowed into the value. This recovers all three serve.sh excludes (the prior regex also silently dropped the last `$BACKEND_RUNTIME_HOME` because the adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and the space-separated `--reload-exclude <value>` form. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
parent
0fb18e368c
commit
93e3281cbf
5
backend/.gitignore
vendored
5
backend/.gitignore
vendored
@ -24,5 +24,10 @@ config.yaml
|
||||
# Langgraph
|
||||
.langgraph_api
|
||||
|
||||
# Sandbox runtime working dir — pre-created and excluded from uvicorn reload
|
||||
# (scripts/serve.sh, docker/dev-entrypoint.sh). Anchored so it does not match
|
||||
# the source package backend/packages/harness/deerflow/sandbox/.
|
||||
/sandbox/
|
||||
|
||||
# Claude Code settings
|
||||
.claude/settings.local.json
|
||||
|
||||
@ -44,7 +44,8 @@ def test_entrypoint_excludes_runtime_state_from_uvicorn_reload():
|
||||
content = ENTRYPOINT.read_text(encoding="utf-8")
|
||||
|
||||
assert ': "${DEER_FLOW_HOME:=/app/backend/.deer-flow}"' in content
|
||||
assert 'mkdir -p "$DEER_FLOW_HOME" /app/backend/.deer-flow' in content
|
||||
# sandbox must be created too, not just .deer-flow (#3459 / #3454).
|
||||
assert 'mkdir -p "$DEER_FLOW_HOME" /app/backend/.deer-flow /app/backend/sandbox' in content
|
||||
assert "--reload-include='*.yaml .env'" not in content
|
||||
assert "--reload-include='*.yaml'" in content
|
||||
assert "--reload-include='.env'" in content
|
||||
|
||||
@ -49,7 +49,9 @@ def test_local_dev_gateway_reload_excludes_runtime_state_with_absolute_dirs():
|
||||
assert 'export DEER_FLOW_PROJECT_ROOT="$REPO_ROOT"' in serve_sh
|
||||
assert 'BACKEND_RUNTIME_HOME="$REPO_ROOT/backend/.deer-flow"' in serve_sh
|
||||
assert 'export DEER_FLOW_HOME="$BACKEND_RUNTIME_HOME"' in serve_sh
|
||||
assert 'mkdir -p "$DEER_FLOW_HOME" "$BACKEND_RUNTIME_HOME"' in serve_sh
|
||||
# Every absolute reload-exclude must be pre-created, including backend/sandbox
|
||||
# (#3459 / #3454) — see test_uvicorn_reload_exclude.py for the mechanism.
|
||||
assert 'mkdir -p "$DEER_FLOW_HOME" "$BACKEND_RUNTIME_HOME" "$REPO_ROOT/backend/sandbox"' in serve_sh
|
||||
assert "--reload-exclude='$DEER_FLOW_HOME'" in serve_sh
|
||||
assert "--reload-exclude='$BACKEND_RUNTIME_HOME'" in serve_sh
|
||||
assert "--reload-exclude='sandbox/'" not in serve_sh
|
||||
|
||||
185
backend/tests/test_uvicorn_reload_exclude.py
Normal file
185
backend/tests/test_uvicorn_reload_exclude.py
Normal file
@ -0,0 +1,185 @@
|
||||
"""Regression for #3459 / #3454 — dev gateway reload-exclude must not crash.
|
||||
|
||||
#3426 switched the dev gateway's ``--reload-exclude`` patterns from relative
|
||||
(``sandbox/``) to absolute (``$REPO_ROOT/backend/sandbox``). uvicorn only
|
||||
excludes such a path directly when it already exists as a directory; otherwise
|
||||
it falls back to ``Path.cwd().glob(pattern)``, and on **Python 3.12**
|
||||
``pathlib.Path.glob()`` raises ``NotImplementedError: Non-relative patterns are
|
||||
unsupported`` for an absolute pattern. ``serve.sh`` created the ``.deer-flow``
|
||||
excludes but not ``backend/sandbox``, so a fresh checkout crashed ``make dev``
|
||||
on startup.
|
||||
|
||||
Two layers of coverage:
|
||||
|
||||
* ``test_*_resolve_*`` exercises uvicorn's real ``resolve_reload_patterns`` to
|
||||
pin the failure mode and the fix's mechanism.
|
||||
* ``test_launcher_precreates_every_absolute_reload_exclude`` enforces the actual
|
||||
invariant on both launchers: every absolute exclude dir is ``mkdir -p``'d
|
||||
before uvicorn starts. This encodes the root cause, so any future absolute
|
||||
exclude that forgets its ``mkdir`` fails here.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
import shlex
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
from uvicorn.config import resolve_reload_patterns
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parents[2]
|
||||
|
||||
LAUNCHERS = {
|
||||
"scripts/serve.sh": REPO_ROOT / "scripts" / "serve.sh",
|
||||
"docker/dev-entrypoint.sh": REPO_ROOT / "docker" / "dev-entrypoint.sh",
|
||||
}
|
||||
|
||||
# Shell terminators / redirects that end a simple command's argument list.
|
||||
_CMD_BOUNDARY = re.compile(r"[;&|<>]")
|
||||
|
||||
|
||||
def _logical_lines(script: str) -> list[str]:
|
||||
"""Fold ``\\``-continuations and drop comment lines, yielding logical lines.
|
||||
|
||||
A ``mkdir`` or ``--reload-exclude`` list split across lines with a trailing
|
||||
backslash becomes one line here, so an argument on a continuation line can't
|
||||
be silently dropped by per-line scanning.
|
||||
"""
|
||||
folded = script.replace("\\\n", " ")
|
||||
return [line for line in folded.splitlines() if not line.lstrip().startswith("#")]
|
||||
|
||||
|
||||
def _shlex(fragment: str) -> list[str]:
|
||||
"""Tokenize a shell fragment (quotes stripped, ``$VAR`` kept literal,
|
||||
trailing ``# comment`` honored); tolerate pathological quoting."""
|
||||
try:
|
||||
return shlex.split(fragment, comments=True)
|
||||
except ValueError:
|
||||
return fragment.split()
|
||||
|
||||
|
||||
# ``--reload-exclude`` followed by ``=`` or whitespace, then a value that is a
|
||||
# single-quoted group, a double-quoted group, or a bare token. The quoted
|
||||
# alternatives match a *balanced* pair first, so serve.sh's surrounding
|
||||
# ``GATEWAY_EXTRA_FLAGS="..."`` closing quote is never swallowed into the value.
|
||||
_RELOAD_EXCLUDE = re.compile(r"""--reload-exclude[=\s]+('[^']*'|"[^"]*"|[^\s'"]+)""")
|
||||
|
||||
|
||||
def _reload_exclude_values(script: str) -> list[str]:
|
||||
"""Every ``--reload-exclude`` value, with surrounding quotes removed.
|
||||
|
||||
Handles both CLI forms (``--reload-exclude=<value>`` and the space form
|
||||
``--reload-exclude <value>``) and both shell quotings the launchers use:
|
||||
|
||||
* ``docker/dev-entrypoint.sh`` puts each flag on its own line.
|
||||
* ``scripts/serve.sh`` packs every flag into a single double-quoted
|
||||
``GATEWAY_EXTRA_FLAGS="... --reload-exclude='$X' ..."`` assignment. A
|
||||
whole-line ``shlex`` would collapse that assignment into one token and
|
||||
find no flags (this is what regressed serve.sh in CI); matching balanced
|
||||
inner quotes here keeps the assignment's closing ``"`` out of the value,
|
||||
so every exclude — including the last ``$BACKEND_RUNTIME_HOME`` — is seen.
|
||||
"""
|
||||
values: list[str] = []
|
||||
for line in _logical_lines(script):
|
||||
for raw in _RELOAD_EXCLUDE.findall(line):
|
||||
values.append(raw.strip("\"'"))
|
||||
return values
|
||||
|
||||
|
||||
def _mkdir_dirs(script: str) -> set[str]:
|
||||
"""Exact set of directories created by every ``mkdir`` command.
|
||||
|
||||
Tokenizes each ``mkdir`` argument list rather than substring-matching, so
|
||||
``/app/backend/sandbox`` is not falsely considered created by, say,
|
||||
``mkdir -p /app/backend/sandbox-other``.
|
||||
"""
|
||||
dirs: set[str] = set()
|
||||
for line in _logical_lines(script):
|
||||
match = re.search(r"\bmkdir\b(.*)", line)
|
||||
if not match:
|
||||
continue
|
||||
args = _CMD_BOUNDARY.split(match.group(1), maxsplit=1)[0]
|
||||
for token in _shlex(args):
|
||||
if token.startswith("-"): # skip flags such as -p
|
||||
continue
|
||||
dirs.add(token)
|
||||
return dirs
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
sys.version_info >= (3, 13),
|
||||
reason="pathlib accepts absolute glob patterns on 3.13+, so the crash is 3.12-only",
|
||||
)
|
||||
def test_resolve_reload_patterns_crashes_on_missing_absolute_dir(tmp_path):
|
||||
"""The exact #3454 failure: absolute exclude + missing dir on Python 3.12."""
|
||||
missing = tmp_path / "sandbox" # absolute path that does not exist yet
|
||||
assert not missing.exists()
|
||||
with pytest.raises(NotImplementedError):
|
||||
resolve_reload_patterns([str(missing)], [])
|
||||
|
||||
|
||||
def test_resolve_reload_patterns_is_safe_once_dir_exists(tmp_path):
|
||||
"""The fix's mechanism: a pre-created dir takes uvicorn's is_dir() path."""
|
||||
sandbox = tmp_path / "sandbox"
|
||||
sandbox.mkdir()
|
||||
_patterns, directories = resolve_reload_patterns([str(sandbox)], [])
|
||||
resolved = {d.resolve() for d in directories}
|
||||
assert sandbox.resolve() in resolved
|
||||
|
||||
|
||||
@pytest.mark.parametrize("name", list(LAUNCHERS))
|
||||
def test_launcher_precreates_every_absolute_reload_exclude(name):
|
||||
"""Every absolute ``--reload-exclude`` dir must be created by ``mkdir`` first.
|
||||
|
||||
Relative glob patterns (``*.pyc``, ``__pycache__``) are safe and skipped;
|
||||
anything anchored at ``/`` or a shell variable is an absolute path that
|
||||
uvicorn would glob — and crash on — unless it already exists. Membership is
|
||||
an exact match against the parsed ``mkdir`` argument set (not a substring
|
||||
test), so a path-prefix can't produce a false pass.
|
||||
"""
|
||||
script = LAUNCHERS[name].read_text(encoding="utf-8")
|
||||
created = _mkdir_dirs(script)
|
||||
|
||||
absolute_excludes = [v for v in _reload_exclude_values(script) if v.startswith(("/", "$"))]
|
||||
assert absolute_excludes, f"{name}: expected at least one absolute reload-exclude"
|
||||
|
||||
for value in absolute_excludes:
|
||||
assert value in created, f"{name}: absolute reload-exclude {value!r} is never created via mkdir (created dirs: {sorted(created)})"
|
||||
|
||||
|
||||
@pytest.mark.parametrize("name", list(LAUNCHERS))
|
||||
def test_sandbox_mkdir_precedes_uvicorn_launch(name):
|
||||
"""The sandbox mkdir must come before the uvicorn launch, not just exist.
|
||||
|
||||
``_mkdir_dirs`` only proves the mkdir is present somewhere; this pins script
|
||||
order so a future edit can't move (or guard) the mkdir below the launch and
|
||||
silently reintroduce the #3454 crash on a fresh checkout. ``uv run uvicorn``
|
||||
matches the launch but not serve.sh's ``stop_all`` kill line.
|
||||
"""
|
||||
lines = LAUNCHERS[name].read_text(encoding="utf-8").splitlines()
|
||||
launch_idx = next((i for i, ln in enumerate(lines) if "uv run uvicorn" in ln), None)
|
||||
mkdir_idx = next((i for i, ln in enumerate(lines) if re.search(r"\bmkdir\b", ln) and "sandbox" in ln), None)
|
||||
|
||||
assert launch_idx is not None, f"{name}: could not locate the 'uv run uvicorn' launch line"
|
||||
assert mkdir_idx is not None, f"{name}: could not locate the sandbox mkdir line"
|
||||
assert mkdir_idx < launch_idx, f"{name}: sandbox mkdir (line {mkdir_idx + 1}) must precede uvicorn launch (line {launch_idx + 1})"
|
||||
|
||||
|
||||
def test_precreated_sandbox_artifacts_are_gitignored():
|
||||
"""backend/sandbox is runtime state — its contents must stay out of git so
|
||||
sandbox artifacts can't be accidentally committed (matches the reload-exclude
|
||||
intent). A content path is existence-independent, unlike the bare dir path.
|
||||
|
||||
Guards against the inaccurate "gitignored" claim by making it verifiable.
|
||||
"""
|
||||
probe = "backend/sandbox/__artifact_probe__"
|
||||
result = subprocess.run(
|
||||
["git", "-C", str(REPO_ROOT), "check-ignore", "-q", probe],
|
||||
capture_output=True,
|
||||
)
|
||||
if result.returncode == 128: # not a git checkout (e.g. packaged install)
|
||||
pytest.skip("not inside a git working tree")
|
||||
assert result.returncode == 0, "backend/sandbox/* should be gitignored (see backend/.gitignore '/sandbox/')"
|
||||
@ -64,12 +64,14 @@ if [ -n "$EXTRAS_FLAGS" ]; then
|
||||
echo "[startup] uv extras:$EXTRAS_FLAGS"
|
||||
fi
|
||||
|
||||
# Keep runtime-owned files out of uvicorn's reload watcher. The directory must
|
||||
# exist before uvicorn starts so watchfiles treats it as an excluded directory,
|
||||
# not as a plain glob pattern.
|
||||
# Keep runtime-owned files out of uvicorn's reload watcher. Each excluded path
|
||||
# must exist before uvicorn starts so watchfiles treats it as an excluded
|
||||
# directory, not as a plain glob pattern — on Python 3.12, globbing an absolute
|
||||
# pattern raises NotImplementedError and crashes startup (#3459 / #3454). That
|
||||
# means `sandbox` must be created here too, not just `.deer-flow`.
|
||||
: "${DEER_FLOW_HOME:=/app/backend/.deer-flow}"
|
||||
export DEER_FLOW_HOME
|
||||
mkdir -p "$DEER_FLOW_HOME" /app/backend/.deer-flow
|
||||
mkdir -p "$DEER_FLOW_HOME" /app/backend/.deer-flow /app/backend/sandbox
|
||||
|
||||
# ── Sync dependencies (with self-heal) ──────────────────────────────────────
|
||||
|
||||
|
||||
@ -297,7 +297,12 @@ if [ -z "$DEER_FLOW_HOME" ]; then
|
||||
export DEER_FLOW_HOME="$BACKEND_RUNTIME_HOME"
|
||||
fi
|
||||
|
||||
mkdir -p "$DEER_FLOW_HOME" "$BACKEND_RUNTIME_HOME"
|
||||
# `backend/sandbox` is excluded from uvicorn's reload watcher below. uvicorn only
|
||||
# excludes an absolute path directly when it already exists as a directory;
|
||||
# otherwise it globs the pattern, and Python 3.12's pathlib rejects absolute glob
|
||||
# patterns with NotImplementedError, crashing `make dev` on a fresh checkout
|
||||
# (#3459 / #3454). Creating it here keeps every absolute exclude on the is_dir path.
|
||||
mkdir -p "$DEER_FLOW_HOME" "$BACKEND_RUNTIME_HOME" "$REPO_ROOT/backend/sandbox"
|
||||
DEER_FLOW_HOME="$(cd "$DEER_FLOW_HOME" && pwd -P)"
|
||||
BACKEND_RUNTIME_HOME="$(cd "$BACKEND_RUNTIME_HOME" && pwd -P)"
|
||||
export DEER_FLOW_HOME
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user