mirror of https://github.com/mtayfur/openwebui-memory-system.git
synced 2026-01-22 06:51:01 +01:00
Merge branch 'dev' of https://github.com/GlisseManTV/openwebui-memory-system into dev
README.md (27 lines changed)
@@ -2,6 +2,18 @@
 A long-term memory system that learns from conversations and personalizes responses without requiring external APIs or tokens.
 
+## ⚠️ Important Notices
+
+**🔒 Privacy & Data Sharing:**
+- User messages and stored memories are shared with your configured LLM for memory consolidation and retrieval
+- If using remote embedding models (like OpenAI text-embedding-3-small), memories will also be sent to those external providers
+- All data is processed through Open WebUI's built-in models using your existing configuration
+
+**💰 Cost & Model Requirements:**
+- The system uses complex prompts and sends relevant memories to the LLM, which increases token usage and costs
+- Requires public models configured in OpenWebUI - you can use any public model ID from your instance
+- **Recommended cost-effective models:** `gpt-5-nano`, `gemini-2.5-flash-lite`, `qwen3-instruct`, or your local LLMs
+
 ## Core Features
 
 **Zero External Dependencies**
@@ -21,7 +33,7 @@ Avoids wasting resources on irrelevant messages through two-stage detection:
 Categories automatically skipped: technical discussions, formatting requests, calculations, translation tasks, proofreading, and non-personal queries.
 
 **Multi-Layer Caching**
-Three specialized caches (embeddings, retrieval results, memory lookups) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
+Three specialized caches (embeddings, retrieval, memory) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
 
 **Real-Time Status Updates**
 Emits progress messages during operations: memory retrieval progress, consolidation status, operation summaries — keeping users informed without overwhelming them.
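The wording change above only tightens the cache names, but the pattern behind it is simple enough to sketch. A minimal LRU cache over `collections.OrderedDict` (an import this commit's Python file already has); the class name is illustrative, and the 500-entry default mirrors the `MAX_CACHE_ENTRIES_PER_TYPE` constant that appears later in this commit rather than anything the README guarantees:

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Bounded key-value cache that evicts the least-recently-used entry."""

    def __init__(self, max_entries: int = 500) -> None:
        self._store: "OrderedDict[str, Any]" = OrderedDict()
        self._max_entries = max_entries

    def get(self, key: str) -> Optional[Any]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # drop the oldest entry
```

One such instance per cache type and per user would give the bounded, isolated storage the README describes.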
@@ -32,10 +44,10 @@ All prompts and logic work language-agnostically. Stores memories in English but
 ## Model Support
 
 **LLM Support**
-Tested with Gemini 2.5 Flash Lite, GPT-4o-mini, Qwen2.5-Instruct, and Mistral-Small. Should work with any model that supports structured outputs.
+Tested with gemini-2.5-flash-lite, gpt-5-nano, and qwen3-instruct. Should work with any model that supports structured outputs.
 
 **Embedding Model Support**
-Supports any sentence-transformers model. The default `gte-multilingual-base` works well for diverse languages and is efficient enough for real-time use. Make sure to tweak thresholds if you switch to a different model.
+Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure OpenAI, and local sentence-transformers). Configure embedding models through OpenWebUI's RAG settings. The memory system automatically uses whatever embedding backend you've configured in OpenWebUI.
 
 ## How It Works
 
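With embedding generation delegated to OpenWebUI's backend, retrieval itself reduces to plain vector math over the stored memory embeddings. A hedged sketch, not the plugin's actual code: the names are illustrative, the 0.5 and 10 defaults come from the valves documented below, and the vectors are assumed to be L2-normalized (see the Performance Optimizations list):

```python
import numpy as np


def retrieve_memories(
    query_vec: np.ndarray,    # embedding of the incoming message
    memory_vecs: np.ndarray,  # one row per stored memory
    memory_texts: list,       # texts aligned with memory_vecs rows
    threshold: float = 0.5,   # semantic_retrieval_threshold valve
    max_returned: int = 10,   # max_memories_returned valve
) -> list:
    """Return the best-scoring memories that clear the similarity threshold."""
    # With L2-normalized vectors, cosine similarity is one matrix-vector product.
    scores = memory_vecs @ query_vec
    ranked = np.argsort(scores)[::-1]  # highest similarity first
    hits = [i for i in ranked if scores[i] >= threshold][:max_returned]
    return [memory_texts[i] for i in hits]
```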
@@ -54,18 +66,19 @@ Supports any sentence-transformers model. The default `gte-multilingual-base` wo
 ## Configuration
 
 Customize behavior through valves:
-- **model**: LLM for consolidation and reranking (default: `gemini-2.5-flash-lite`)
-- **embedding_model**: Sentence transformer (default: `gte-multilingual-base`)
+- **model**: LLM for consolidation and reranking (default: `google/gemini-2.5-flash-lite`)
+- **max_message_chars**: Maximum message length before skipping operations (default: 2500)
 - **max_memories_returned**: Context injection limit (default: 10)
 - **semantic_retrieval_threshold**: Minimum similarity score (default: 0.5)
+- **relaxed_semantic_threshold_multiplier**: Adjusts threshold for consolidation (default: 0.9)
 - **enable_llm_reranking**: Toggle smart reranking (default: true)
-- **llm_reranking_trigger_multiplier**: When to activate LLM (default: 0.5 = 50%)
+- **llm_reranking_trigger_multiplier**: When to activate LLM reranking (default: 0.5 = 50%)
 
 ## Performance Optimizations
 
 - Batched embedding generation for efficiency
 - Normalized embeddings for faster similarity computation
-- Cached embeddings prevent redundant model calls
+- Cached embeddings prevent redundant API calls to OpenWebUI's embedding backend
 - LRU eviction keeps memory footprint bounded
 - Fast-path skip detection for instant filtering
 - Selective LLM usage based on candidate count
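Open WebUI filters conventionally declare settings like these as a nested pydantic `Valves` model. A sketch of how the list above could map onto that convention; field names and defaults follow the README, while the types, descriptions, and exact layout are assumptions rather than the plugin's verbatim code:

```python
from pydantic import BaseModel, Field


class Filter:
    class Valves(BaseModel):
        model: str = Field(
            default="google/gemini-2.5-flash-lite",
            description="LLM for consolidation and reranking",
        )
        max_message_chars: int = Field(
            default=2500, description="Skip operations for longer messages"
        )
        max_memories_returned: int = Field(
            default=10, description="Context injection limit"
        )
        semantic_retrieval_threshold: float = Field(
            default=0.5, description="Minimum similarity score"
        )
        relaxed_semantic_threshold_multiplier: float = Field(
            default=0.9, description="Threshold adjustment for consolidation"
        )
        enable_llm_reranking: bool = Field(
            default=True, description="Toggle smart reranking"
        )
        llm_reranking_trigger_multiplier: float = Field(
            default=0.5, description="When to activate LLM reranking"
        )

    def __init__(self):
        # Valves are instantiated with defaults; Open WebUI overrides them
        # from the admin UI at runtime.
        self.valves = self.Valves()
```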
@@ -8,6 +8,7 @@ import hashlib
 import json
 import logging
 import statistics
+import statistics
 import time
 from collections import OrderedDict
 from datetime import datetime, timezone
@@ -27,6 +28,7 @@ from open_webui.models.users import Users
 from open_webui.routers.memories import Memories
 from fastapi import Request
 
+logger = logging.getLogger(__name__)
 logger = logging.getLogger(__name__)
 
 _SHARED_SKIP_DETECTOR_CACHE = {}
@@ -47,6 +49,8 @@ class Constants:
     # Cache System
     MAX_CACHE_ENTRIES_PER_TYPE = 500  # Maximum cache entries per cache type
     MAX_CONCURRENT_USER_CACHES = 50  # Maximum concurrent user cache instances
+    MAX_CACHE_ENTRIES_PER_TYPE = 500  # Maximum cache entries per cache type
+    MAX_CONCURRENT_USER_CACHES = 50  # Maximum concurrent user cache instances
     CACHE_KEY_HASH_PREFIX_LENGTH = 10  # Hash prefix length for cache keys
 
     # Retrieval & Similarity
@@ -104,6 +108,7 @@ Build precise memories of the user's personal narrative with factual, temporal s
 - Ensure Memory Quality:
   - High Bar for Creation: Only CREATE memories for significant life facts, relationships, events, or core personal attributes. Skip trivial details or passing interests.
   - Contextual Completeness: Combine related information into cohesive statements. Group connected facts (same topic, person, event, or timeframe) into single memories rather than fragmenting. Include supporting details while respecting boundaries. Only combine directly related facts. Avoid bare statements and never merge unrelated information.
+  - Contextual Completeness: Combine related information into cohesive statements. Group connected facts (same topic, person, event, or timeframe) into single memories rather than fragmenting. Include supporting details while respecting boundaries. Only combine directly related facts. Avoid bare statements and never merge unrelated information.
   - Mandatory Semantic Enhancement: Enhance entities with descriptive categorical nouns for better retrieval.
   - Verify Nouns/Pronouns: Link pronouns (he, she, they) and nouns to specific entities.
   - First-Person Format: Write all memories in English from the user's perspective.
@@ -488,6 +493,8 @@ class SkipDetector:
     def _initialize_reference_embeddings(self) -> None:
         """Compute and cache embeddings for category descriptions."""
         try:
+            technical_embeddings = self.embedding_function(
+                self.TECHNICAL_CATEGORY_DESCRIPTIONS
             technical_embeddings = self.embedding_function(
                 self.TECHNICAL_CATEGORY_DESCRIPTIONS
             )
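The two added lines appear to duplicate the opening of the embedding call that was already there, leaving an unbalanced expression; the reference embeddings this method computes are what semantic skip detection compares against. For orientation, a sketch of the two-stage check the README describes, with an illustrative pattern list and threshold that are assumptions, not the plugin's actual values:

```python
import numpy as np


def should_skip(message: str, message_vec: np.ndarray,
                category_vecs: np.ndarray, threshold: float = 0.5) -> bool:
    """Two-stage skip check: cheap pattern pass first, embeddings second."""
    # Stage 1: fast path — obvious non-personal requests need no embedding work.
    fast_patterns = ("translate", "proofread", "calculate", "format this")
    if any(p in message.lower() for p in fast_patterns):
        return True
    # Stage 2: semantic pass — compare against the cached reference category
    # embeddings (e.g. technical discussions); vectors assumed L2-normalized.
    similarities = category_vecs @ message_vec
    return float(np.max(similarities)) >= threshold
```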
@@ -862,6 +869,7 @@ CANDIDATE MEMORIES:
 
         selected_memories = []
         for memory in candidate_memories:
+            if memory["id"] in response.ids and len(selected_memories) < max_count:
             if memory["id"] in response.ids and len(selected_memories) < max_count:
                 selected_memories.append(memory)
 
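The added line duplicates the selection condition directly above it. The surrounding loop is still the clearest statement of the reranking contract, restated here as a standalone helper (the helper name is illustrative): iterating the candidates rather than `response.ids` preserves the original similarity order, silently drops any ids the LLM invented, and caps the result at `max_count` so a malformed response can never inflate the injected context.

```python
def select_reranked(candidate_memories: list, response_ids: set, max_count: int) -> list:
    """Keep original candidate order, ignore unknown ids, cap at max_count."""
    selected = []
    for memory in candidate_memories:
        if memory["id"] in response_ids and len(selected) < max_count:
            selected.append(memory)
    return selected
```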
@@ -1252,6 +1260,7 @@ class LLMConsolidationService:
             )
             logger.info(f"🔄 Memory Operations: {', '.join(operation_details)}")
             await self.memory_system._refresh_user_cache(user_id)
+            await self.memory_system._refresh_user_cache(user_id)
 
         return created_count, updated_count, deleted_count, failed_count
 
@@ -1377,6 +1386,8 @@ class Filter:
         self._background_tasks: set = set()
         self._shutdown_event = asyncio.Event()
 
+        self._embedding_function = None
+        self._skip_detector = None
         self._embedding_function = None
         self._skip_detector = None
 
@@ -1508,6 +1519,10 @@ class Filter:
         self, embedding: Union[List[float], np.ndarray]
     ) -> np.ndarray:
         """Normalize embedding vector."""
+        if isinstance(embedding, list):
+            embedding = np.array(embedding, dtype=np.float16)
+        else:
+            embedding = embedding.astype(np.float16)
         if isinstance(embedding, list):
             embedding = np.array(embedding, dtype=np.float16)
         else:
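This hunk duplicates the float16 cast at the top of `_normalize_embedding`, and the diff cuts off before the normalization itself. For reference, a sketch of what such a helper typically does; the division by the vector norm is an assumption based on the docstring, not something shown in this diff:

```python
from typing import List, Union

import numpy as np


def normalize_embedding(embedding: Union[List[float], np.ndarray]) -> np.ndarray:
    """Cast to float16 to halve memory use, then scale to unit length."""
    vec = np.asarray(embedding, dtype=np.float16)
    norm = np.linalg.norm(vec)
    # Unit-length vectors make cosine similarity a plain dot product.
    return vec / norm if norm > 0 else vec
```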
@@ -1674,6 +1689,7 @@ class Filter:
         top_score = max(scores)
         lowest_score = min(scores)
         median_score = statistics.median(scores)
+        median_score = statistics.median(scores)
 
         context_label = (
             "📊 Consolidation candidate memories"
@@ -2079,6 +2095,8 @@ class Filter:
 
         await self._cache_manager.clear_all_caches()
 
+    async def _refresh_user_cache(self, user_id: str) -> None:
+        """Refresh user cache - clear stale caches and update with fresh embeddings."""
     async def _refresh_user_cache(self, user_id: str) -> None:
         """Refresh user cache - clear stale caches and update with fresh embeddings."""
         start_time = time.time()
@@ -2137,6 +2155,8 @@ class Filter:
         """Execute a single memory operation."""
         try:
             if operation.operation == Models.MemoryOperationType.CREATE:
+                content_stripped = operation.content.strip()
+                if not content_stripped:
                 content_stripped = operation.content.strip()
                 if not content_stripped:
                     logger.warning(f"⚠️ Skipping CREATE operation: empty content")
@@ -2151,6 +2171,8 @@ class Filter:
                 return Models.MemoryOperationType.CREATE.value
 
             elif operation.operation == Models.MemoryOperationType.UPDATE:
+                id_stripped = operation.id.strip()
+                if not id_stripped:
                 id_stripped = operation.id.strip()
                 if not id_stripped:
                     logger.warning(f"⚠️ Skipping UPDATE operation: empty ID")
@@ -2175,6 +2197,8 @@ class Filter:
                 return Models.MemoryOperationType.UPDATE.value
 
             elif operation.operation == Models.MemoryOperationType.DELETE:
+                id_stripped = operation.id.strip()
+                if not id_stripped:
                 id_stripped = operation.id.strip()
                 if not id_stripped:
                     logger.warning(f"⚠️ Skipping DELETE operation: empty ID")