mirror of https://github.com/mtayfur/openwebui-memory-system.git synced 2026-01-22 06:51:01 +01:00

mtayfur 59ff55d662 docs(memory_system): clarify example outputs and explanations for memory operations

Improves clarity in example 3 by specifying the origin city in the CREATE
operation for more complete context, and updates example 6 to better reflect
the distinction between technical requests and personal statements, ensuring
the documentation accurately guides memory handling logic.

refactor(memory_system): rewrite category descriptions for clarity and conciseness

Category descriptions in NON_PERSONAL_CATEGORY_DESCRIPTIONS and
PERSONAL_CATEGORY_DESCRIPTIONS are rewritten to be more concise,
generalized, and easier to parse, reducing verbosity and removing
example-heavy phrasing. This improves maintainability, readability,
and consistency, making the intent of each category clearer for
future development and review.

docs: expand and clarify examples of personal information categories

Additional examples are added to better illustrate the types of
personal information covered, improving clarity for users and
developers about what constitutes sensitive data in various contexts.

2025-11-26 16:45:07 +03:00

.gitignore

♻️ (memory_system): refactor skip detection and add semantic deduplication

2025-10-27 00:27:33 +03:00

.python-version

feat(memory_system): add configurable status message verbosity levels

2025-11-07 00:19:50 +03:00

dev-check.sh

🔧 (dev-check.sh, pyproject.toml, requirements.txt): add development tooling and configuration

2025-10-27 00:20:05 +03:00

memory_system.py

docs(memory_system): clarify example outputs and explanations for memory operations

2025-11-26 16:45:07 +03:00

pyproject.toml

🔧 (dev-check.sh, pyproject.toml, requirements.txt): add development tooling and configuration

2025-10-27 00:20:05 +03:00

README.md

refactor(memory): remove redundant valve options and clarify reranking controls

2025-11-09 16:52:56 +03:00

requirements.txt

♻️ (memory_system): refactor skip detection and add semantic deduplication

2025-10-27 00:27:33 +03:00

README.md

Memory System for Open WebUI

A long-term memory system that learns from conversations and personalizes responses without requiring external APIs or tokens.

⚠️ Important Notices

🔒 Privacy & Data Sharing:

User messages and stored memories are shared with your configured LLM for memory consolidation and retrieval
If using remote embedding models (like OpenAI text-embedding-3-small), memories will also be sent to those external providers
All data is processed through Open WebUI's built-in models using your existing configuration

💰 Cost & Model Requirements:

The system uses complex prompts and sends relevant memories to the LLM, which increases token usage and costs
Requires public models configured in Open WebUI; you can use any public model ID from your instance
Recommended cost-effective models: gpt-5-nano, gemini-2.5-flash-lite, qwen3-instruct, or your local LLMs

Core Features

Zero External Dependencies
Uses Open WebUI's built-in models (LLM and embeddings) — no API keys, no external services.

Intelligent Memory Consolidation
Automatically processes conversations in the background to create, update, or delete memories. The LLM analyzes context and decides when to store personal facts, enriching existing memories rather than creating duplicates.

Hybrid Memory Retrieval
Starts with fast semantic search and adds LLM-powered reranking only when needed. Reranking triggers automatically when the candidate count exceeds 50% of the maximum retrieval limit (the default trigger multiplier), balancing speed and accuracy.
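
The trigger rule can be sketched as follows. This is a minimal illustration rather than the filter's actual code: the function name `should_rerank` is hypothetical, and it assumes the "max retrieval limit" corresponds to the `max_memories_returned` valve.

```python
def should_rerank(candidate_count: int,
                  max_memories_returned: int = 10,
                  trigger_multiplier: float = 0.5) -> bool:
    """Decide whether semantic-search candidates should go through LLM reranking.

    Assumption: the limit being compared against is max_memories_returned.
    """
    if trigger_multiplier <= 0.0:
        # A multiplier of 0.0 disables LLM reranking entirely.
        return False
    threshold = max_memories_returned * trigger_multiplier
    return candidate_count > threshold
```

With the defaults, six candidates would trigger reranking and five would not, so small candidate sets stay on the fast semantic-only path.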

Smart Skip Detection
Avoids wasting resources on irrelevant messages through two-stage detection:

Fast-path: Regex patterns catch technical content (code, logs, URLs, commands) instantly
Semantic: Zero-shot classification identifies instructions, math, translations, and grammar requests

Categories automatically skipped: technical discussions, formatting requests, calculations, translation tasks, proofreading, and non-personal queries.
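
A rough sketch of how the two stages might combine is shown below. The regex patterns, the function names, and the margin rule (skip when the best non-personal score beats the best personal score by more than the margin) are all assumptions made for illustration; the real patterns and classifier live in memory_system.py and may differ.

```python
import re

# Hypothetical fast-path patterns; the filter's real patterns may differ.
FAST_PATH_PATTERNS = [
    re.compile(r"`{3}"),                                              # fenced code
    re.compile(r"https?://\S+"),                                      # URLs
    re.compile(r"^\s*(\$|sudo|git|pip|npm|docker)\s", re.MULTILINE),  # shell commands
    re.compile(r"Traceback \(most recent call last\)"),               # Python stack traces
]


def fast_path_skip(message: str) -> bool:
    """Stage 1: cheap regex check for obviously technical content."""
    return any(p.search(message) for p in FAST_PATH_PATTERNS)


def semantic_skip(non_personal_score: float,
                  personal_score: float,
                  margin: float = 0.20) -> bool:
    """Stage 2 (assumed rule): skip when the best non-personal category
    similarity exceeds the best personal one by more than the margin."""
    return non_personal_score - personal_score > margin


def should_skip(message: str, non_personal_score: float,
                personal_score: float, margin: float = 0.20) -> bool:
    # The regex stage runs first so that the scores only matter
    # for messages the fast path did not already reject.
    return fast_path_skip(message) or semantic_skip(
        non_personal_score, personal_score, margin)
```

In practice the two scores would come from comparing the message embedding against the category descriptions; here they are passed in directly to keep the sketch self-contained.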

Multi-Layer Caching
Three specialized caches (embeddings, retrieval, memory) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
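
The idea of per-user caches with LRU eviction can be sketched like this. The class names `LRUCache` and `UserCaches` and the default size are hypothetical and chosen for illustration; the actual cache implementation may be structured differently.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional, Tuple


class LRUCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry


class UserCaches:
    """Keeps a separate LRU cache per (user, cache name) pair."""

    def __init__(self, max_size: int = 128) -> None:
        self.max_size = max_size
        self._caches: Dict[Tuple[str, str], LRUCache] = {}

    def cache_for(self, user_id: str, name: str) -> LRUCache:
        key = (user_id, name)
        if key not in self._caches:
            self._caches[key] = LRUCache(self.max_size)
        return self._caches[key]
```

For example, `UserCaches().cache_for("user-1", "embeddings")` returns a cache that no other user's lookups can read from or evict.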

Real-Time Status Updates
Emits progress messages during operations: memory retrieval progress, consolidation status, operation summaries — keeping users informed without overwhelming them.

Multilingual by Design
All prompts and logic are language-agnostic. Memories are stored in English, but input in any language is processed.

Model Support

LLM Support
Tested with gemini-2.5-flash-lite, gpt-5-nano, and qwen3-instruct. Should work with any model that supports structured outputs.

Embedding Model Support
Uses Open WebUI's configured embedding model (Ollama, OpenAI, Azure OpenAI, or local sentence-transformers). Configure the embedding model through Open WebUI's RAG settings; the memory system automatically uses whichever backend is set there.

How It Works

During Chat (Inlet)

Checks if message should be skipped (technical/instruction content)
Retrieves relevant memories using semantic search
Applies LLM reranking if the candidate count exceeds the reranking trigger
Injects top memories into context for personalized responses
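
The semantic retrieval step in the inlet flow above can be approximated as below. This is a simplified sketch with plain Python lists standing in for embeddings; `retrieve` and its parameter names are hypothetical, with defaults mirroring the `semantic_retrieval_threshold` and `max_memories_returned` valves.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float],
             memories: List[Tuple[str, Sequence[float]]],
             threshold: float = 0.5,
             limit: int = 10) -> List[Tuple[float, str]]:
    """Score every memory, keep those at or above the threshold,
    and return the best `limit` results, highest score first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:limit]
```

The returned memories would then be either injected directly or passed to the LLM reranker, depending on how many candidates survive the threshold.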

After Response (Outlet)

Runs consolidation in background without blocking
Gathers candidate memories using a relaxed similarity threshold
The LLM generates operations (CREATE/UPDATE/DELETE)
Executes validated operations and clears affected caches
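
The validate-then-execute step for the outlet flow above might look like the following sketch. The `Operation` shape, the validation rules, and the in-memory `store` dictionary are assumptions for illustration only; the real system persists memories through Open WebUI and may validate differently.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Operation:
    """Hypothetical shape of one LLM-proposed memory operation."""
    operation: str                 # "CREATE", "UPDATE", or "DELETE"
    id: Optional[str] = None       # target memory id for UPDATE/DELETE
    content: Optional[str] = None  # new text for CREATE/UPDATE


def is_valid(op: Operation, store: Dict[str, str]) -> bool:
    has_content = bool(op.content and op.content.strip())
    if op.operation == "CREATE":
        return has_content
    if op.operation == "UPDATE":
        return op.id in store and has_content
    if op.operation == "DELETE":
        return op.id in store
    return False  # unknown operation type


def apply_operations(store: Dict[str, str], ops: List[Operation]) -> int:
    """Apply valid operations in order; return how many were applied."""
    applied = 0
    for op in ops:
        if not is_valid(op, store):
            continue  # silently drop operations that fail validation
        if op.operation == "CREATE":
            store[uuid.uuid4().hex] = op.content.strip()
        elif op.operation == "UPDATE":
            store[op.id] = op.content.strip()
        else:  # DELETE
            del store[op.id]
        applied += 1
    return applied
```

Validating against the current state of the store before each operation guards against the LLM referencing memory ids that do not exist.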

Configuration

Customize behavior through valves:

model: LLM used for consolidation and reranking. Set to "Default" to use the current chat model, or specify a model ID
max_memories_returned: Context injection limit (default: 10)
semantic_retrieval_threshold: Minimum similarity score (default: 0.5)
llm_reranking_trigger_multiplier: Fraction of the max retrieval limit above which LLM reranking activates (0.0 = disabled, default: 0.5 = 50%)
skip_category_margin: Margin for skip detection classification (default: 0.20)
status_emit_level: Status message verbosity - Basic or Detailed (default: Detailed)
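
For reference, the valves and their documented defaults can be summarized as a small settings object. This sketch uses a standard-library dataclass to stay self-contained; the default of "Default" for `model` and the validation ranges are assumptions, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Valves:
    """Illustrative mirror of the valves listed above."""
    model: str = "Default"                         # assumed default value
    max_memories_returned: int = 10
    semantic_retrieval_threshold: float = 0.5
    llm_reranking_trigger_multiplier: float = 0.5  # 0.0 disables reranking
    skip_category_margin: float = 0.20
    status_emit_level: str = "Detailed"            # "Basic" or "Detailed"

    def __post_init__(self) -> None:
        # Validation bounds below are assumptions for this sketch.
        if self.max_memories_returned < 1:
            raise ValueError("max_memories_returned must be at least 1")
        if not 0.0 <= self.semantic_retrieval_threshold <= 1.0:
            raise ValueError("semantic_retrieval_threshold must be in [0, 1]")
        if self.llm_reranking_trigger_multiplier < 0.0:
            raise ValueError("llm_reranking_trigger_multiplier must be >= 0")
        if self.status_emit_level not in ("Basic", "Detailed"):
            raise ValueError("status_emit_level must be Basic or Detailed")
```

A stricter setup could then be expressed as `Valves(max_memories_returned=5, semantic_retrieval_threshold=0.6)`.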

Performance Optimizations

Batched embedding generation for efficiency
Normalized embeddings for faster similarity computation
Cached embeddings prevent redundant API calls to Open WebUI's embedding backend
LRU eviction keeps memory footprint bounded
Fast-path skip detection for instant filtering
Selective LLM usage based on candidate count
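
The first three optimizations above (batching, normalization, and embedding caching) can be combined in one small helper. This is a sketch under stated assumptions: `embed_with_cache` and `embed_batch` are hypothetical names, and `embed_batch` stands in for whatever call reaches the configured embedding backend.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def embed_with_cache(texts: List[str],
                     cache: Dict[str, Vector],
                     embed_batch: Callable[[List[str]], List[Vector]]) -> List[Vector]:
    """Embed only the texts missing from the cache, in a single batched call."""
    # dict.fromkeys removes duplicates while preserving order.
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    if missing:
        for text, vec in zip(missing, embed_batch(missing)):
            cache[text] = normalize(vec)
    return [cache[t] for t in texts]
```

Because vectors are normalized once at insertion time, later similarity checks reduce to a plain `dot` call with no repeated norm computation.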

Memory Quality

The system maintains high-quality memories through:

Temporal tracking with date anchoring
Entity enrichment (combining names with descriptions)
Relationship completeness (never stores partial connections)
Contextual grouping (related facts stored together)
Historical preservation (superseded facts converted to past tense)