greatmengqi 3e6a34297d refactor(config): eliminate global mutable state — explicit parameter passing on top of main
Squashes 25 PR commits onto current main. AppConfig becomes a pure value
object with no ambient lookup. Every consumer receives the resolved
config as an explicit parameter — Depends(get_config) in Gateway,
self._app_config in DeerFlowClient, runtime.context.app_config in agent
runs, AppConfig.from_file() at the LangGraph Server registration
boundary.

Phase 1 — frozen data + typed context

- All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become
  frozen=True; no sub-module globals.
- AppConfig.from_file() is pure (no side-effect singleton loaders).
- Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name)
  — frozen dataclass injected via LangGraph Runtime.
- Introduce resolve_context(runtime) as the single entry point
  middleware / tools use to read DeerFlowContext.

Phase 2 — pure explicit parameter passing

- Gateway: app.state.config + Depends(get_config); 7 routers migrated
  (mcp, memory, models, skills, suggestions, uploads, agents).
- DeerFlowClient: __init__(config=...) captures config locally.
- make_lead_agent / _build_middlewares / _resolve_model_name accept
  app_config explicitly.
- RunContext.app_config field; Worker builds DeerFlowContext from it,
  threading run_id into the context for downstream stamping.
- Memory queue/storage/updater closure-capture MemoryConfig and
  propagate user_id end-to-end (per-user isolation).
- Sandbox/skills/community/factories/tools thread app_config.
- resolve_context() rejects non-typed runtime.context.
- Test suite migrated off AppConfig.current() monkey-patches.
- AppConfig.current() classmethod deleted.

Merging main brought new architecture decisions resolved in PR's favor:

- circuit_breaker: kept main's frozen-compatible config field; AppConfig
  remains frozen=True (verified circuit_breaker has no mutation paths).
- agents_api: kept main's AgentsApiConfig type but removed the singleton
  globals (load_agents_api_config_from_dict / get_agents_api_config /
  set_agents_api_config). 8 routes in agents.py now read via
  Depends(get_config).
- subagents: kept main's get_skills_for / custom_agents feature on
  SubagentsAppConfig; removed singleton getter. registry.py now reads
  app_config.subagents directly.
- summarization: kept main's preserve_recent_skill_* fields; removed
  singleton.
- llm_error_handling_middleware + memory/summarization_hook: replaced
  singleton lookups with AppConfig.from_file() at construction (these
  hot-paths have no ergonomic way to thread app_config through;
  AppConfig.from_file is a pure load).
- worker.py + thread_data_middleware.py: DeerFlowContext.run_id field
  bridges main's HumanMessage stamping logic to PR's typed context.

Trade-offs (follow-up work):

- main's #2138 (async memory updater) reverted to PR's sync
  implementation. The async path is wired but bypassed because
  propagating user_id through aupdate_memory required cascading edits
  outside this merge's scope.
- tests/test_subagent_skills_config.py removed: it relied heavily on
  the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict).
  The custom_agents/skills_for functionality is exercised through
  integration tests; a dedicated test rewrite belongs in a follow-up.

Verification: backend test suite — 2560 passed, 4 skipped, 84 failures.
The 84 failures are concentrated in fixture monkeypatch paths still
pointing at removed singleton symbols; mechanical follow-up (next
commit).
2026-04-26 21:45:02 +08:00

171 lines
7.1 KiB
Python

import logging
from langchain.tools import BaseTool
from deerflow.config.app_config import AppConfig
from deerflow.reflection import resolve_variable
from deerflow.sandbox.security import is_host_bash_allowed
from deerflow.tools.builtins import ask_clarification_tool, present_file_tool, task_tool, view_image_tool
from deerflow.tools.builtins.tool_search import reset_deferred_registry
logger = logging.getLogger(__name__)
BUILTIN_TOOLS = [
present_file_tool,
ask_clarification_tool,
]
SUBAGENT_TOOLS = [
task_tool,
# task_status_tool is no longer exposed to LLM (backend handles polling internally)
]
def _is_host_bash_tool(tool: object) -> bool:
"""Return True if the tool config represents a host-bash execution surface."""
group = getattr(tool, "group", None)
use = getattr(tool, "use", None)
if group == "bash":
return True
if use == "deerflow.sandbox.tools:bash_tool":
return True
return False
def get_available_tools(
groups: list[str] | None = None,
include_mcp: bool = True,
model_name: str | None = None,
subagent_enabled: bool = False,
*,
app_config: AppConfig,
) -> list[BaseTool]:
"""Get all available tools from config.
Note: MCP tools should be initialized at application startup using
`initialize_mcp_tools()` from deerflow.mcp module.
Args:
groups: Optional list of tool groups to filter by.
include_mcp: Whether to include tools from MCP servers (default: True).
model_name: Optional model name to determine if vision tools should be included.
subagent_enabled: Whether to include subagent tools (task, task_status).
app_config: Application config — required.
Returns:
List of available tools.
"""
config = app_config
tool_configs = [tool for tool in config.tools if groups is None or tool.group in groups]
# Do not expose host bash by default when LocalSandboxProvider is active.
if not is_host_bash_allowed(config):
tool_configs = [tool for tool in tool_configs if not _is_host_bash_tool(tool)]
loaded_tools_raw = [(cfg, resolve_variable(cfg.use, BaseTool)) for cfg in tool_configs]
# Warn when the config ``name`` field and the tool object's ``.name``
# attribute diverge — this mismatch is the root cause of issue #1803 where
# the LLM receives one name in its tool schema but the runtime router
# recognises a different name, producing "not a valid tool" errors.
for cfg, loaded in loaded_tools_raw:
if cfg.name != loaded.name:
logger.warning(
"Tool name mismatch: config name %r does not match tool .name %r (use: %s). The tool's own .name will be used for binding.",
cfg.name,
loaded.name,
cfg.use,
)
loaded_tools = [t for _, t in loaded_tools_raw]
# Conditionally add tools based on config
builtin_tools = BUILTIN_TOOLS.copy()
skill_evolution_config = getattr(config, "skill_evolution", None)
if getattr(skill_evolution_config, "enabled", False):
from deerflow.tools.skill_manage_tool import skill_manage_tool
builtin_tools.append(skill_manage_tool)
# Add subagent tools only if enabled via runtime parameter
if subagent_enabled:
builtin_tools.extend(SUBAGENT_TOOLS)
logger.info("Including subagent tools (task)")
# If no model_name specified, use the first model (default)
if model_name is None and config.models:
model_name = config.models[0].name
# Add view_image_tool only if the model supports vision
model_config = config.get_model_config(model_name) if model_name else None
if model_config is not None and model_config.supports_vision:
builtin_tools.append(view_image_tool)
logger.info(f"Including view_image_tool for model '{model_name}' (supports_vision=True)")
# Get cached MCP tools if enabled
# NOTE: We use ExtensionsConfig.from_file() instead of config.extensions
# to always read the latest configuration from disk. This ensures that changes
# made through the Gateway API (which runs in a separate process) are immediately
# reflected when loading MCP tools.
mcp_tools = []
# Reset deferred registry upfront to prevent stale state from previous calls
reset_deferred_registry()
if include_mcp:
try:
from deerflow.config.extensions_config import ExtensionsConfig
from deerflow.mcp.cache import get_cached_mcp_tools
extensions_config = ExtensionsConfig.from_file()
if extensions_config.get_enabled_mcp_servers():
mcp_tools = get_cached_mcp_tools()
if mcp_tools:
logger.info(f"Using {len(mcp_tools)} cached MCP tool(s)")
# When tool_search is enabled, register MCP tools in the
# deferred registry and add tool_search to builtin tools.
if config.tool_search.enabled:
from deerflow.tools.builtins.tool_search import DeferredToolRegistry, set_deferred_registry
from deerflow.tools.builtins.tool_search import tool_search as tool_search_tool
registry = DeferredToolRegistry()
for t in mcp_tools:
registry.register(t)
set_deferred_registry(registry)
builtin_tools.append(tool_search_tool)
logger.info(f"Tool search active: {len(mcp_tools)} tools deferred")
except ImportError:
logger.warning("MCP module not available. Install 'langchain-mcp-adapters' package to enable MCP tools.")
except Exception as e:
logger.error(f"Failed to get cached MCP tools: {e}")
# Add invoke_acp_agent tool if any ACP agents are configured
acp_tools: list[BaseTool] = []
try:
from deerflow.tools.builtins.invoke_acp_agent_tool import build_invoke_acp_agent_tool
acp_agents = config.acp_agents
if acp_agents:
acp_tools.append(build_invoke_acp_agent_tool(acp_agents))
logger.info(f"Including invoke_acp_agent tool ({len(acp_agents)} agent(s): {list(acp_agents.keys())})")
except Exception as e:
logger.warning(f"Failed to load ACP tool: {e}")
logger.info(f"Total tools loaded: {len(loaded_tools)}, built-in tools: {len(builtin_tools)}, MCP tools: {len(mcp_tools)}, ACP tools: {len(acp_tools)}")
# Deduplicate by tool name — config-loaded tools take priority, followed by
# built-ins, MCP tools, and ACP tools. Duplicate names cause the LLM to
# receive ambiguous or concatenated function schemas (issue #1803).
all_tools = loaded_tools + builtin_tools + mcp_tools + acp_tools
seen_names: set[str] = set()
unique_tools: list[BaseTool] = []
for t in all_tools:
if t.name not in seen_names:
unique_tools.append(t)
seen_names.add(t.name)
else:
logger.warning(
"Duplicate tool name %r detected and skipped — check your config.yaml and MCP server registrations (issue #1803).",
t.name,
)
return unique_tools