5 Commits

Author SHA1 Message Date
KiteEater
17447fccbe
fix(runtime): make rollback restore checkpoint supersede newer checkpoints (#2582)
* Restore rollback checkpoints with fresh ids

* Tighten rollback checkpoint tests and imports

* Update test_run_worker_rollback.py

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-02 11:25:45 +08:00
greatmengqi
8ba01dfd83
refactor: thread app_config through lead and subagent task path (#2666)
* refactor: thread app config through lead prompt

* fix: honor explicit app config across runtime paths

* style: format subagent executor tests

* fix: thread resolved app config and guard subagents-only fallback

Address two PR review findings:

1. _create_summarization_middleware passed the original (possibly None)
   app_config into create_chat_model, forcing the model factory back to
   ambient get_app_config() and risking config drift between the
   middleware's resolved view and the model's view. Pass the resolved
   AppConfig instance through end-to-end.

2. get_available_subagent_names accepted Any-typed config and forwarded
   it to is_host_bash_allowed, which reads ``.sandbox``. A
   SubagentsAppConfig (also accepted upstream as a sum-type input) has
   no ``.sandbox`` attribute and would be silently treated as "no
   sandbox configured", incorrectly disabling the bash subagent. Guard
   on hasattr and fall back to ambient lookup otherwise.

Adds regression tests for both paths.

* chore: simplify hasattr guard and tighten regression tests

- Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant.
- Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent).
- Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison.

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-05-02 06:37:49 +08:00
yangzheli
78633c69ac
fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2679)
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677)

When creating a custom agent via the web UI, SOUL.md was always written
to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md.

Root cause: the bootstrap flow sends agent_name via body.context, but
two layers were broken:

1. services.py only forwarded body.context keys into config["configurable"];
   config["context"] was never populated.
2. worker.py constructed the parent Runtime with a hard-coded
   {thread_id, run_id} context, ignoring config["context"] entirely.

After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no
longer falls back to configurable, so setup_agent's
runtime.context.get("agent_name") returned None and the tool's silent
agent_name=None -> base_dir fallback kicked in, overwriting the global
SOUL.md.

Fix:
- services.py: extract merge_run_context_overrides() and write the
  whitelisted context keys into both configurable (legacy readers) and
  context (langgraph 1.1+ ToolRuntime consumers).
- worker.py: extract _build_runtime_context() and merge config["context"]
  into the Runtime's context (without letting callers override
  thread_id/run_id).

The base_dir fallback in setup_agent_tool.py is left in place because
the IM /bootstrap channel command depends on it. That code path can
be tightened in a follow-up.

Adds regression tests covering both helpers.

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-01 16:00:11 +08:00
greatmengqi
e82940c03d
refactor: thread release config through lead path (#2612)
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-04-28 14:53:18 +08:00
luo jiyin
35f141fc48
feat: implement full checkpoint rollback on user cancellation (#1867)
* feat: implement full checkpoint rollback on user cancellation

- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete

This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.

* fix: address rollback review feedback

* fix: strengthen checkpoint rollback validation and error handling

- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation

This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.

* test: add comprehensive coverage for checkpoint rollback edge cases

- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure

Covers all scenarios from PR #1867 review feedback.

* test: format rollback worker tests
2026-04-09 17:56:36 +08:00