deer-flow/docker/dev-entrypoint.sh
AochenShen99 93e3281cbf
fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459) (#3460)
* fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459)

#3426 switched the dev gateway's --reload-exclude patterns to absolute
paths. uvicorn only excludes an absolute path directly when it already
exists as a directory; otherwise it globs the pattern, and Python 3.12's
pathlib raises NotImplementedError("Non-relative patterns are unsupported")
for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but
not backend/sandbox, so `make dev` crashed on startup on a fresh checkout
under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap.

Create backend/sandbox in both launchers so every absolute exclude stays on
uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn
mechanism (crash on missing dir, safe once created) and enforces that every
absolute --reload-exclude is mkdir'd before launch.
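
As a rough illustration of that invariant (not the launchers' actual code), the sketch below creates each exclude directory before it emits the matching flag. The helper name `precreate_excludes` and the temporary root are hypothetical; the real scripts use fixed paths under /app/backend.

```shell
# Sketch only: `precreate_excludes` is a hypothetical helper, not part of
# serve.sh or dev-entrypoint.sh.
set -e

# Create every directory first, then print one --reload-exclude flag per
# line, so no flag can name a directory that does not exist yet.
precreate_excludes() {
    for dir in "$@"; do
        mkdir -p "$dir"
        printf '%s%s\n' '--reload-exclude=' "$dir"
    done
}

# Demo under a throwaway root instead of /app/backend.
root=$(mktemp -d)
flags=$(precreate_excludes "$root/backend/sandbox" "$root/backend/.deer-flow")
printf '%s\n' "$flags"
```

Deriving the flags from the same list that feeds `mkdir -p` is one way to keep the two from drifting apart; the launchers in this change keep them as separate literals and rely on the regression test instead.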

Closes #3459

* test(dev): harden reload-exclude invariant parser against false pass/negatives

The launcher invariant test parsed shell with a "mkdir -p" line filter and a
substring membership check. Two latent gaps (sub-threshold for this fix, but
this code guards a user-facing startup path, so close them):

- A `\`-continued multi-line `mkdir` would drop arguments on continuation
  lines, silently weakening coverage.
- Substring membership could false-pass when an exclude is a path-prefix of a
  different created dir (e.g. `/app/backend/sandbox` "found" inside
  `/app/backend/sandbox-other`).

Fold line-continuations, drop comments, and shlex-tokenize each `mkdir`
argument list into an exact set (quotes stripped, `$VAR` literal); assert exact
set membership. Same shlex handling for `--reload-exclude` values. Verified the
parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no
longer false-passes on a path-prefix.
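
The path-prefix false pass can be reproduced in a few lines. The real test is written in Python with shlex; this shell sketch only illustrates the difference between substring and exact membership, and its variable names are made up for the example.

```shell
# Sketch only: the directory list and variable names are illustrative.
created="/app/backend/sandbox-other /app/backend/.deer-flow"
exclude="/app/backend/sandbox"

# Substring membership: "found" because the exclude is a prefix of
# /app/backend/sandbox-other, even though it was never created itself.
substring_hit=no
case "$created" in
    *"$exclude"*) substring_hit=yes ;;
esac

# Exact membership: compare against each created directory as a whole token.
exact_hit=no
for dir in $created; do
    if [ "$dir" = "$exclude" ]; then
        exact_hit=yes
    fi
done

echo "substring: $substring_hit, exact: $exact_hit"
# prints "substring: yes, exact: no"
```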

* fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch

Address two review findings on the #3459 fix:

- backend/sandbox was described as "gitignored runtime state" but no ignore
  rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore
  (anchored so it does NOT shadow the source package
  backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at
  runtime can't pollute the working tree or be committed by accident. New test
  asserts content under backend/sandbox is ignored, making the claim verifiable.

- The launcher invariant test only proved the sandbox mkdir exists somewhere,
  not that it runs before uvicorn starts. Add an order test (sandbox mkdir line
  must precede the `uv run uvicorn` launch) so a future edit can't move the
  mkdir below the launch and silently reintroduce the crash.
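
A minimal sketch of such an order check, assuming a line-number comparison is enough. The sample script body below is a stand-in written to a temporary file, not the real launcher, and the project's actual test lives in the Python test suite.

```shell
# Sketch only: the sample launcher is a stand-in for the real script.
script=$(mktemp)
cat >"$script" <<'EOF'
mkdir -p /app/backend/.deer-flow /app/backend/sandbox
cd /app/backend
exec uv run uvicorn app.gateway.app:app --reload
EOF

# First line that creates the sandbox dir, and first line that launches uvicorn.
mkdir_line=$(grep -n 'mkdir.*backend/sandbox' "$script" | head -n 1 | cut -d: -f1)
launch_line=$(grep -n 'uv run uvicorn' "$script" | head -n 1 | cut -d: -f1)

# The invariant holds only if both lines exist and the mkdir comes first.
if [ -n "$mkdir_line" ] && [ -n "$launch_line" ] && [ "$mkdir_line" -lt "$launch_line" ]; then
    order_ok=yes
else
    order_ok=no
fi
echo "mkdir at line $mkdir_line, launch at line $launch_line, order ok: $order_ok"
```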

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle

The previous autofix tokenized each whole line with shlex, but serve.sh packs
every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment.
shlex collapses that into one token, so no `--reload-exclude` flag is found and
`test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]`
failed CI with "expected at least one absolute reload-exclude".

Parse `--reload-exclude` with a regex that matches a balanced single/double
quoted group or a bare token, so the assignment's surrounding `"` is never
swallowed into the value. This recovers all three serve.sh excludes (the prior
parser also silently dropped the last `$BACKEND_RUNTIME_HOME` because the
adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and
the space-separated `--reload-exclude <value>` form.
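
The matching rule can be approximated in shell with an extended regex. This is a sketch under assumptions: the sample assignment is illustrative rather than serve.sh's real content, the project's parser is Python, and `grep -o` is a common extension (GNU, BSD, BusyBox) rather than strict POSIX.

```shell
# Sketch only: the sample line is illustrative, not copied from serve.sh.
line=$(cat <<'EOF'
GATEWAY_EXTRA_FLAGS="--reload --reload-exclude='/app/backend/sandbox' --reload-exclude /app/backend/.deer-flow --reload-exclude=$BACKEND_RUNTIME_HOME"
EOF
)

# Value is a single-quoted group, a double-quoted group, or a bare token
# that stops at whitespace or any quote character.
pattern="--reload-exclude[= ]('[^']*'|\"[^\"]*\"|[^[:space:]\"']+)"

# Extract each flag, then strip the flag name and any surrounding quotes.
excludes=$(printf '%s\n' "$line" \
    | grep -oE -e "$pattern" \
    | sed -e 's/^--reload-exclude[= ]//' -e "s/^['\"]//" -e "s/['\"]\$//")

printf '%s\n' "$excludes"
```

Because the bare-token alternative refuses quote characters, the closing `"` of the assignment is left out of the last value instead of being swallowed into it.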

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-09 15:29:40 +08:00

#!/usr/bin/env sh
#
# DeerFlow gateway dev entrypoint — runs inside the docker-compose-dev gateway
# container. Extracted from docker/docker-compose-dev.yaml's inline `command:`
# (PR #2767, addressing review on Issue #2754).
#
# Responsibilities:
#   1. Resolve `--extra X` flags from UV_EXTRAS (comma- or whitespace-separated,
#      mirroring scripts/detect_uv_extras.py for parity with local `make dev`).
#   2. Validate each extra against [A-Za-z][A-Za-z0-9_-]* so a stray shell
#      metacharacter in `.env` cannot reach `uv sync`.
#   3. `uv sync --all-packages` so workspace member extras (deerflow-harness's
#      postgres extra in particular) are installed; see PR #2584.
#   4. Self-heal: if the first sync fails, recreate .venv and retry once.
#   5. Hand off to uvicorn with reload, replacing this shell so uvicorn becomes
#      PID 1 inside the container.
#
# Anchored at /bin/sh (not bash) since alpine-based base images may not ship
# bash. Uses POSIX-only constructs throughout.
set -e
# `--print-extras` is a dry-run hook: parse + validate UV_EXTRAS, print the
# resulting `--extra X` flags to stdout, and exit. Used by the unit test in
# backend/tests/test_dev_entrypoint.py and useful for ad-hoc debugging.
PRINT_EXTRAS_ONLY=0
if [ "${1:-}" = "--print-extras" ]; then
  PRINT_EXTRAS_ONLY=1
fi
# Mirror the legacy command's behavior: redirect both stdout and stderr to the
# host-mounted log file (../logs/gateway.log → /app/logs/gateway.log). Skip
# the redirect under --print-extras so the test runner can capture stdout.
if [ "$PRINT_EXTRAS_ONLY" = "0" ]; then
  exec >/app/logs/gateway.log 2>&1
fi
# ── Resolve extras ──────────────────────────────────────────────────────────
EXTRAS_FLAGS=""
if [ -n "${UV_EXTRAS:-}" ]; then
  # Normalize comma → space, then split on whitespace via the unquoted `for`.
  for raw in $(printf '%s' "$UV_EXTRAS" | tr ',' ' '); do
    [ -z "$raw" ] && continue
    # Reject anything that does not look like an identifier.
    # Two patterns: leading non-letter, or any non-[A-Za-z0-9_-] character.
    case "$raw" in
      [!A-Za-z]* | *[!A-Za-z0-9_-]*)
        echo "[startup] UV_EXTRAS entry '$raw' is invalid (must match [A-Za-z][A-Za-z0-9_-]*) — aborting" >&2
        exit 1
        ;;
    esac
    EXTRAS_FLAGS="$EXTRAS_FLAGS --extra $raw"
  done
fi
if [ "$PRINT_EXTRAS_ONLY" = "1" ]; then
  # Trim leading space for tidier output, then exit.
  printf '%s\n' "${EXTRAS_FLAGS# }"
  exit 0
fi
if [ -n "$EXTRAS_FLAGS" ]; then
  echo "[startup] uv extras:$EXTRAS_FLAGS"
fi
# Keep runtime-owned files out of uvicorn's reload watcher. Each excluded path
# must exist before uvicorn starts so uvicorn treats it as an excluded
# directory rather than a glob pattern: on Python 3.12, globbing an absolute
# pattern raises NotImplementedError and crashes startup (#3459 / #3454). That
# means `sandbox` must be created here too, not just `.deer-flow`.
: "${DEER_FLOW_HOME:=/app/backend/.deer-flow}"
export DEER_FLOW_HOME
mkdir -p "$DEER_FLOW_HOME" /app/backend/.deer-flow /app/backend/sandbox
# ── Sync dependencies (with self-heal) ──────────────────────────────────────
cd /app/backend
# `--all-packages` propagates extras into workspace members (PR #2584).
# `$EXTRAS_FLAGS` intentionally unquoted so each `--extra X` becomes its own arg.
# shellcheck disable=SC2086 # word-splitting is intentional here
if ! uv sync --all-packages $EXTRAS_FLAGS; then
  echo "[startup] uv sync failed; recreating .venv and retrying once"
  uv venv --allow-existing .venv
  # shellcheck disable=SC2086
  uv sync --all-packages $EXTRAS_FLAGS
fi
# ── Hand off to uvicorn ─────────────────────────────────────────────────────
PYTHONPATH=. exec uv run uvicorn app.gateway.app:app \
  --host 0.0.0.0 --port 8001 \
  --reload \
  --reload-include='*.yaml' \
  --reload-include='.env' \
  --reload-exclude=/app/backend/sandbox \
  --reload-exclude="$DEER_FLOW_HOME" \
  --reload-exclude=/app/backend/.deer-flow