The exception handler in JinaClient.crawl used logger.exception, which
emits an ERROR-level record with the full httpx/httpcore/anyio traceback
for every transient network failure (timeout, connection refused). Other
search/crawl providers in the project log the same class of recoverable
failures as a single line. One offline/slow-network session could produce
dozens of multi-frame ERROR stack traces, drowning out real problems.
Switch to logger.warning with a concise message that includes the
exception type and its str, matching the style used elsewhere for
recoverable transient failures (aio_sandbox, ddg, etc.). The exception
type now also surfaces into the returned "Error: ..." string so callers
retain diagnostic signal.
Add a regression test that asserts the log record is WARNING level,
carries no exc_info, and includes the exception class name.
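
For illustration, the logging change described above might look roughly like the sketch below. The `crawl` function, its injected `fetch` coroutine, the logger name, and the message wording are all hypothetical stand-ins, not the actual JinaClient code.

```python
import logging

logger = logging.getLogger("jina_sketch")  # hypothetical logger name


async def crawl(fetch, url: str) -> str:
    """Hypothetical stand-in for JinaClient.crawl; `fetch` is an injected coroutine."""
    try:
        return await fetch(url)
    except Exception as e:
        # One WARNING line with the exception type and message; no traceback,
        # because logger.warning does not attach exc_info by default.
        logger.warning("Jina crawl failed for %s: %s: %s", url, type(e).__name__, e)
        # The exception type is also surfaced to the caller in the error string.
        return f"Error: {type(e).__name__}: {e}"
```

With logger.exception in the except block, the same failure would instead produce an ERROR record carrying the full traceback.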
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: wrap blocking readability call with asyncio.to_thread in web_fetch
The readability extractor internally spawns a Node.js subprocess via
readabilipy, which blocks the async event loop and causes a
BlockingError when web_fetch is invoked inside LangGraph's async
runtime.
Wrap the synchronous extract_article call with asyncio.to_thread to
offload it to a thread pool, unblocking the event loop.
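
A minimal sketch of the pattern, assuming a blocking extractor; the `extract_article` and `web_fetch` functions here are simplified, hypothetical versions and do not call readabilipy:

```python
import asyncio
import time


def extract_article(html: str) -> str:
    """Hypothetical blocking extractor standing in for the readability call."""
    time.sleep(0.05)  # simulates the blocking subprocess work
    return html.strip()


async def web_fetch(html: str) -> str:
    # Run the synchronous call in a worker thread so the event loop stays free
    # to schedule other coroutines while extraction is in progress.
    return await asyncio.to_thread(extract_article, html)
```

Calling `extract_article(html)` directly inside `web_fetch` would return the same value but would hold the event loop for the full duration of the call.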
Note: community/infoquest/tools.py has the same latent issue and
should be addressed in a follow-up PR.
Closes #2152
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: verify web_fetch offloads extraction via asyncio.to_thread
Add a regression test that monkeypatches asyncio.to_thread to confirm
readability extraction is offloaded to a worker thread, preventing
future refactors from reintroducing the blocking call.
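
One way such a check could be written is sketched below using unittest.mock from the standard library (the real test may use pytest's monkeypatch fixture instead). The `extract_article` and `web_fetch` definitions are hypothetical stand-ins for the code under test.

```python
import asyncio
from unittest.mock import patch


def extract_article(html: str) -> str:
    """Hypothetical synchronous extractor."""
    return html.upper()


async def web_fetch(html: str) -> str:
    return await asyncio.to_thread(extract_article, html)


def check_extraction_is_offloaded() -> list:
    calls = []

    async def fake_to_thread(func, *args, **kwargs):
        # Record which function was handed to to_thread, then run it inline.
        calls.append(func)
        return func(*args, **kwargs)

    # Replace asyncio.to_thread only for the duration of the call.
    with patch.object(asyncio, "to_thread", fake_to_thread):
        result = asyncio.run(web_fetch("hello"))

    assert result == "HELLO"
    # Fails if a refactor calls extract_article directly instead of offloading it.
    assert calls == [extract_article]
    return calls
```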
Addresses Copilot review feedback on #2157.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: replace sync requests with async httpx in Jina AI client
Replace synchronous `requests.post()` with `httpx.AsyncClient` in
JinaClient.crawl() and make web_fetch_tool async. This is part of the
planned async concurrency optimization for the agent hot path
(see docs/TODO.md).
* fix: address Copilot review feedback on async Jina client
- Short-circuit error strings in web_fetch_tool before passing to
ReadabilityExtractor, preventing misleading extraction results
- Log missing JINA_API_KEY warning only once per process to reduce
noise under concurrent async fetching
- Use logger.exception instead of logger.error in crawl exception
handler to preserve stack traces for debugging
- Add async web_fetch_tool tests and warn-once coverage
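
The first two items above could be sketched as follows. The function names, the module-level flag, the header layout, and the warning text are assumptions made for illustration; only the JINA_API_KEY variable and the "Error:" prefix come from this change.

```python
import logging
import os

logger = logging.getLogger("jina_sketch")  # hypothetical logger name
_api_key_warning_logged = False  # hypothetical module-level warn-once flag


def build_headers() -> dict:
    """Build request headers, warning about a missing key at most once per process."""
    global _api_key_warning_logged
    headers = {"Content-Type": "application/json"}
    api_key = os.getenv("JINA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    elif not _api_key_warning_logged:
        _api_key_warning_logged = True
        logger.warning("JINA_API_KEY is not set; requests will be unauthenticated")
    return headers


def extract_or_passthrough(raw: str, extract) -> str:
    """Return crawl error strings unchanged instead of parsing them as HTML."""
    if raw.startswith("Error:"):
        return raw
    return extract(raw)
```

A plain boolean is enough in this sketch because asyncio tasks run on a single thread and there is no await between the check and the assignment; code that also runs in worker threads would need a lock.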
* fix: mock get_app_config in web_fetch_tool tests for CI
The web_fetch_tool tests failed in CI because get_app_config requires
a config.yaml file that isn't present in the test environment. Mock
the config loader to remove the filesystem dependency.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>