refactor: update Mem0Memory to use independent user/agent scoping and exclude assistant output

This commit is contained in:
kartik-mem0 2026-04-06 13:16:46 +05:30
parent adc00f4faf
commit 531741545e
5 changed files with 159 additions and 61 deletions

View File

@ -111,9 +111,9 @@ This schema lets multimodal outputs flow into Memory/Thinking modules without ex
### 5.4 Mem0Memory ### 5.4 Mem0Memory
- **Config** Requires `api_key` (from [app.mem0.ai](https://app.mem0.ai)). Optional `user_id`, `agent_id`, `org_id`, `project_id` for scoping. - **Config** Requires `api_key` (from [app.mem0.ai](https://app.mem0.ai)). Optional `user_id`, `agent_id`, `org_id`, `project_id` for scoping.
- **Important**: `user_id` and `agent_id` are mutually exclusive in Mem0 API calls. If both are configured, two separate searches are made and results merged. For writes, `agent_id` takes precedence. Agent-generated content is stored with `role: "assistant"`. - **Entity scoping**: `user_id` and `agent_id` are independent dimensions — both can be included simultaneously in `add()` and `search()` calls. When both are configured, retrieval uses an OR filter (`{"OR": [{"user_id": ...}, {"agent_id": ...}]}`) to search across both scopes. Writes include both IDs when available.
- **Retrieval** Uses Mem0's server-side semantic search. Supports `top_k` and `similarity_threshold` via `MemoryAttachmentConfig`. - **Retrieval** Uses Mem0's server-side semantic search. Supports `top_k` and `similarity_threshold` via `MemoryAttachmentConfig`.
- **Write** `update()` sends conversation messages to Mem0 via the SDK. Agent outputs use `role: "assistant"`, user inputs use `role: "user"`. - **Write** `update()` sends only user input to Mem0 via the SDK (as `role: "user"` messages). Assistant output is excluded to prevent noise memories from the LLM's responses being extracted as facts.
- **Persistence** Fully cloud-managed. `load()` and `save()` are no-ops. Memories persist across runs and sessions automatically. - **Persistence** Fully cloud-managed. `load()` and `save()` are no-ops. Memories persist across runs and sessions automatically.
- **Dependencies** Requires `mem0ai` package (`pip install mem0ai`). - **Dependencies** Requires `mem0ai` package (`pip install mem0ai`).

View File

@ -32,12 +32,23 @@ memory:
model: text-embedding-3-small model: text-embedding-3-small
``` ```
### Mem0 Memory 配置
```yaml
memory:
- name: agent_memory
type: mem0
config:
api_key: ${MEM0_API_KEY}
agent_id: my-agent
```
## 3. 内置 Memory Store 对比 ## 3. 内置 Memory Store 对比
| 类型 | 路径 | 特点 | 适用场景 | | 类型 | 路径 | 特点 | 适用场景 |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `simple` | `node/agent/memory/simple_memory.py` | 运行结束后可选择落盘JSON使用向量搜索FAISS+语义重打分;支持读写 | 小规模对话记忆、快速原型 | | `simple` | `node/agent/memory/simple_memory.py` | 运行结束后可选择落盘JSON使用向量搜索FAISS+语义重打分;支持读写 | 小规模对话记忆、快速原型 |
| `file` | `node/agent/memory/file_memory.py` | 将指定文件/目录切片为向量索引,只读;自动检测文件变更并更新索引 | 知识库、文档问答 | | `file` | `node/agent/memory/file_memory.py` | 将指定文件/目录切片为向量索引,只读;自动检测文件变更并更新索引 | 知识库、文档问答 |
| `blackboard` | `node/agent/memory/blackboard_memory.py` | 轻量附加日志,按时间/条数裁剪;不依赖向量检索 | 简易广播板、流水线调试 | | `blackboard` | `node/agent/memory/blackboard_memory.py` | 轻量附加日志,按时间/条数裁剪;不依赖向量检索 | 简易广播板、流水线调试 |
| `mem0` | `node/agent/memory/mem0_memory.py` | 由 Mem0 云端托管;支持语义搜索 + 图关系;无需本地 embedding 或持久化。需安装 `mem0ai` 包。 | 生产级记忆、跨会话持久化、多 Agent 记忆共享 |
> 所有内置 store 都会在 `register_memory_store()` 中注册,摘要可通过 `MemoryStoreConfig.field_specs()` 在 UI 中展示。 > 所有内置 store 都会在 `register_memory_store()` 中注册,摘要可通过 `MemoryStoreConfig.field_specs()` 在 UI 中展示。
@ -100,6 +111,14 @@ nodes:
- **检索**:直接返回最近 `top_k` 条,按时间排序。 - **检索**:直接返回最近 `top_k` 条,按时间排序。
- **写入**`update()` 以 append 方式存储最新的输入/输出 snapshot文本 + 块 + 附件信息),不生成向量,适合事件流或人工批注。 - **写入**`update()` 以 append 方式存储最新的输入/输出 snapshot文本 + 块 + 附件信息),不生成向量,适合事件流或人工批注。
### 5.4 Mem0Memory
- **配置**:必须提供 `api_key`(从 [app.mem0.ai](https://app.mem0.ai) 获取)。可选参数 `user_id``agent_id``org_id``project_id` 用于记忆范围控制。
- **实体范围**`user_id``agent_id` 是独立的维度,可在 `add()``search()` 调用中同时使用。若同时配置,检索时使用 OR 过滤器(`{"OR": [{"user_id": ...}, {"agent_id": ...}]}`)在一次 API 调用中搜索两个范围。写入时两个 ID 同时包含。
- **检索**:使用 Mem0 服务端语义搜索。通过 `MemoryAttachmentConfig` 中的 `top_k``similarity_threshold` 控制。
- **写入**`update()` 仅将用户输入(`role: "user"` 消息)发送至 Mem0。不包含 Agent 输出,以避免 LLM 响应中的内容被提取为噪声记忆。
- **持久化**:完全由云端托管。`load()``save()` 为空操作no-op。记忆在不同运行和会话间自动持久化。
- **依赖**:需安装 `mem0ai` 包(`pip install mem0ai`)。
## 6. EmbeddingConfig 提示 ## 6. EmbeddingConfig 提示
- 字段:`provider`, `model`, `api_key`, `base_url`, `params` - 字段:`provider`, `model`, `api_key`, `base_url`, `params`
- `provider=openai` 时使用 `openai.OpenAI` 客户端,可配置 `base_url` 以兼容兼容层。 - `provider=openai` 时使用 `openai.OpenAI` 客户端,可配置 `base_url` 以兼容兼容层。

View File

@ -1,6 +1,7 @@
"""Mem0 managed memory store implementation.""" """Mem0 managed memory store implementation."""
import logging import logging
import re
import time import time
import uuid import uuid
from typing import Any, Dict, List from typing import Any, Dict, List
@ -45,8 +46,8 @@ class Mem0Memory(MemoryBase):
Important API constraints: Important API constraints:
- Agent memories use role="assistant" + agent_id - Agent memories use role="assistant" + agent_id
- user_id and agent_id are stored as separate records in Mem0; - user_id and agent_id are independent scoping dimensions and can be
if both are configured, an OR filter is used to search across both scopes. combined in both add() and search() calls.
- search() uses filters dict; add() uses top-level kwargs. - search() uses filters dict; add() uses top-level kwargs.
- SDK returns {"memories": [...]} from search. - SDK returns {"memories": [...]} from search.
""" """
@ -151,53 +152,68 @@ class Mem0Memory(MemoryBase):
# -------- Update -------- # -------- Update --------
def update(self, payload: MemoryWritePayload) -> None: def update(self, payload: MemoryWritePayload) -> None:
"""Store a memory in Mem0. """Store user input as a memory in Mem0.
Uses role="assistant" + agent_id for agent-generated memories, Only user input is sent for extraction. Assistant output is excluded
and role="user" + user_id for user-scoped memories. to prevent noise memories from the LLM's responses.
""" """
snapshot = payload.output_snapshot or payload.input_snapshot raw_input = payload.inputs_text or ""
if not snapshot or not snapshot.text.strip(): if not raw_input.strip():
return return
messages = self._build_messages(payload) messages = self._build_messages(payload)
if not messages: if not messages:
return return
add_kwargs: Dict[str, Any] = {"messages": messages} add_kwargs: Dict[str, Any] = {
"messages": messages,
"infer": True,
}
# Determine scoping: agent_id takes precedence for agent-generated content # Include both user_id and agent_id when available — they are
# independent scoping dimensions in Mem0, not mutually exclusive.
if self.agent_id: if self.agent_id:
add_kwargs["agent_id"] = self.agent_id add_kwargs["agent_id"] = self.agent_id
elif self.user_id: if self.user_id:
add_kwargs["user_id"] = self.user_id add_kwargs["user_id"] = self.user_id
else:
# Default: use agent_role as agent_id # Fallback when neither is configured
if "agent_id" not in add_kwargs and "user_id" not in add_kwargs:
add_kwargs["agent_id"] = payload.agent_role add_kwargs["agent_id"] = payload.agent_role
try: try:
self.client.add(**add_kwargs) result = self.client.add(**add_kwargs)
logger.info("Mem0 add result: %s", result)
except Exception as e: except Exception as e:
logger.error("Mem0 add failed: %s", e) logger.error("Mem0 add failed: %s", e)
@staticmethod
def _clean_pipeline_text(text: str) -> str:
"""Strip ChatDev pipeline headers so Mem0 sees clean conversational text.
The executor wraps each input with '=== INPUT FROM <source> (<role>) ==='
headers. Mem0's extraction LLM treats these as system metadata and skips
them, resulting in zero memories extracted.
"""
cleaned = re.sub(r"===\s*INPUT FROM\s+\S+\s*\(\w+\)\s*===\s*", "", text)
return cleaned.strip()
def _build_messages(self, payload: MemoryWritePayload) -> List[Dict[str, str]]: def _build_messages(self, payload: MemoryWritePayload) -> List[Dict[str, str]]:
"""Build Mem0-compatible message list from write payload. """Build Mem0-compatible message list from write payload.
Agent-generated content uses role="assistant". Only sends user input to Mem0. Assistant output is excluded because
User input uses role="user". Mem0's extraction LLM processes ALL messages and extracts facts from
assistant responses too, creating noise memories like "Assistant says
Python is fascinating" instead of actual user facts.
""" """
messages: List[Dict[str, str]] = [] messages: List[Dict[str, str]] = []
if payload.inputs_text and payload.inputs_text.strip(): raw_input = payload.inputs_text or ""
clean_input = self._clean_pipeline_text(raw_input)
if clean_input:
messages.append({ messages.append({
"role": "user", "role": "user",
"content": payload.inputs_text.strip(), "content": clean_input,
})
if payload.output_snapshot and payload.output_snapshot.text.strip():
messages.append({
"role": "assistant",
"content": payload.output_snapshot.text.strip(),
}) })
return messages return messages

View File

@ -197,8 +197,8 @@ class TestMem0MemoryRetrieve:
class TestMem0MemoryUpdate: class TestMem0MemoryUpdate:
def test_update_with_agent_id_uses_assistant_role(self): def test_update_sends_only_user_input(self):
"""Agent-scoped update sends role=assistant messages with agent_id.""" """Update sends only user input, not assistant output, to prevent noise."""
memory, client = _make_mem0_memory(agent_id="agent-1") memory, client = _make_mem0_memory(agent_id="agent-1")
client.add.return_value = [{"id": "new", "event": "ADD"}] client.add.return_value = [{"id": "new", "event": "ADD"}]
@ -215,8 +215,26 @@ class TestMem0MemoryUpdate:
assert call_kwargs["agent_id"] == "agent-1" assert call_kwargs["agent_id"] == "agent-1"
assert "user_id" not in call_kwargs assert "user_id" not in call_kwargs
messages = call_kwargs["messages"] messages = call_kwargs["messages"]
assert len(messages) == 1
assert messages[0]["role"] == "user" assert messages[0]["role"] == "user"
assert messages[1]["role"] == "assistant" assert messages[0]["content"] == "Write about AI"
def test_update_does_not_send_async_mode(self):
"""Update does not send deprecated async_mode parameter."""
memory, client = _make_mem0_memory(agent_id="agent-1")
client.add.return_value = []
payload = MemoryWritePayload(
agent_role="writer",
inputs_text="test",
input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="output"),
)
memory.update(payload)
call_kwargs = client.add.call_args[1]
assert "async_mode" not in call_kwargs
assert call_kwargs["infer"] is True
def test_update_with_user_id(self): def test_update_with_user_id(self):
"""User-scoped update uses user_id, not agent_id.""" """User-scoped update uses user_id, not agent_id."""
@ -227,7 +245,7 @@ class TestMem0MemoryUpdate:
agent_role="writer", agent_role="writer",
inputs_text="I prefer Python", inputs_text="I prefer Python",
input_snapshot=None, input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="Noted your preference"), output_snapshot=None,
) )
memory.update(payload) memory.update(payload)
@ -244,15 +262,15 @@ class TestMem0MemoryUpdate:
agent_role="coder", agent_role="coder",
inputs_text="test input", inputs_text="test input",
input_snapshot=None, input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="test output"), output_snapshot=None,
) )
memory.update(payload) memory.update(payload)
call_kwargs = client.add.call_args[1] call_kwargs = client.add.call_args[1]
assert call_kwargs["agent_id"] == "coder" assert call_kwargs["agent_id"] == "coder"
def test_update_with_both_ids_prefers_agent_id(self): def test_update_with_both_ids_includes_both(self):
"""When both user_id and agent_id configured, agent_id takes precedence for writes.""" """When both user_id and agent_id configured, both are included in add() call."""
memory, client = _make_mem0_memory(user_id="user-1", agent_id="agent-1") memory, client = _make_mem0_memory(user_id="user-1", agent_id="agent-1")
client.add.return_value = [] client.add.return_value = []
@ -260,37 +278,37 @@ class TestMem0MemoryUpdate:
agent_role="writer", agent_role="writer",
inputs_text="input", inputs_text="input",
input_snapshot=None, input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="output"), output_snapshot=None,
) )
memory.update(payload) memory.update(payload)
call_kwargs = client.add.call_args[1] call_kwargs = client.add.call_args[1]
assert call_kwargs["agent_id"] == "agent-1" assert call_kwargs["agent_id"] == "agent-1"
assert "user_id" not in call_kwargs assert call_kwargs["user_id"] == "user-1"
def test_update_empty_output_is_noop(self): def test_update_empty_input_is_noop(self):
"""Empty output snapshot skips API call.""" """Empty inputs_text skips API call."""
memory, client = _make_mem0_memory(agent_id="a1")
payload = MemoryWritePayload(
agent_role="writer",
inputs_text=" ",
input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="some output"),
)
memory.update(payload)
client.add.assert_not_called()
def test_update_no_input_is_noop(self):
"""No inputs_text skips API call."""
memory, client = _make_mem0_memory(agent_id="a1") memory, client = _make_mem0_memory(agent_id="a1")
payload = MemoryWritePayload( payload = MemoryWritePayload(
agent_role="writer", agent_role="writer",
inputs_text="", inputs_text="",
input_snapshot=None, input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text=" "), output_snapshot=MemoryContentSnapshot(text="output"),
)
memory.update(payload)
client.add.assert_not_called()
def test_update_no_snapshot_is_noop(self):
"""No snapshot at all skips API call."""
memory, client = _make_mem0_memory(agent_id="a1")
payload = MemoryWritePayload(
agent_role="writer",
inputs_text="test",
input_snapshot=None,
output_snapshot=None,
) )
memory.update(payload) memory.update(payload)
@ -303,14 +321,63 @@ class TestMem0MemoryUpdate:
payload = MemoryWritePayload( payload = MemoryWritePayload(
agent_role="writer", agent_role="writer",
inputs_text="test", inputs_text="test user input",
input_snapshot=None, input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="output"), output_snapshot=None,
) )
# Should not raise # Should not raise
memory.update(payload) memory.update(payload)
class TestMem0MemoryPipelineTextCleaning:
def test_strips_input_from_task_header(self):
"""Pipeline headers like '=== INPUT FROM TASK (user) ===' are stripped."""
memory, client = _make_mem0_memory(agent_id="a1")
client.add.return_value = []
payload = MemoryWritePayload(
agent_role="writer",
inputs_text="=== INPUT FROM TASK (user) ===\n\nMy name is Alex, I love Python",
input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="Nice to meet you Alex!"),
)
memory.update(payload)
call_kwargs = client.add.call_args[1]
messages = call_kwargs["messages"]
assert messages[0]["role"] == "user"
assert messages[0]["content"] == "My name is Alex, I love Python"
assert "INPUT FROM" not in messages[0]["content"]
def test_strips_multiple_input_headers(self):
"""Multiple pipeline headers from different sources are all stripped."""
memory, client = _make_mem0_memory(agent_id="a1")
client.add.return_value = []
payload = MemoryWritePayload(
agent_role="writer",
inputs_text=(
"=== INPUT FROM TASK (user) ===\n\nHello\n\n"
"=== INPUT FROM reviewer (assistant) ===\n\nWorld"
),
input_snapshot=None,
output_snapshot=MemoryContentSnapshot(text="Hi!"),
)
memory.update(payload)
call_kwargs = client.add.call_args[1]
user_content = call_kwargs["messages"][0]["content"]
assert "INPUT FROM" not in user_content
assert "Hello" in user_content
assert "World" in user_content
def test_clean_text_without_headers_unchanged(self):
"""Text without pipeline headers passes through unchanged."""
from runtime.node.agent.memory.mem0_memory import Mem0Memory
assert Mem0Memory._clean_pipeline_text("Just normal text") == "Just normal text"
class TestMem0MemoryLoadSave: class TestMem0MemoryLoadSave:
def test_load_is_noop(self): def test_load_is_noop(self):

View File

@ -28,19 +28,15 @@ graph:
write: true write: true
edges: [] edges: []
memory: memory:
# Agent-scoped memory: uses agent_id for storing and retrieving # User-scoped: extracts facts about the user (name, preferences, etc.)
# Agent-scoped: extracts what the agent learned (decisions, context)
# Both can be used together for different memory dimensions.
- name: mem0_store - name: mem0_store
type: mem0 type: mem0
config: config:
api_key: ${MEM0_API_KEY} api_key: ${MEM0_API_KEY}
user_id: project-user-123
agent_id: writer-agent agent_id: writer-agent
# Alternative: User-scoped memory (uncomment to use instead)
# - name: mem0_store
# type: mem0
# config:
# api_key: ${MEM0_API_KEY}
# user_id: project-user-123
start: start:
- writer - writer
end: [] end: []