Update README.md for improved clarity and accuracy; revise privacy notice, cache descriptions, and model support details.

Author: mtayfur
Date: 2025-10-15 14:13:55 +03:00
Commit: 0726293446 (parent e3709fe677)


@@ -2,6 +2,10 @@
A long-term memory system that learns from conversations and personalizes responses without requiring external APIs or tokens.
+ ## Important Notice
+ **Privacy Consideration:** This system shares user messages and stored memories with your configured LLM for memory consolidation and retrieval operations. All data is processed through Open WebUI's built-in models using your existing configuration. No data is sent to external services beyond what your LLM provider configuration already allows.
## Core Features
**Zero External Dependencies**
@@ -21,7 +25,7 @@ Avoids wasting resources on irrelevant messages through two-stage detection:
Categories automatically skipped: technical discussions, formatting requests, calculations, translation tasks, proofreading, and non-personal queries.
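A hedged sketch of how such a two-stage gate could look (the split into a cheap pattern prefilter followed by an LLM check is an assumption about the design; `SKIP_PATTERNS` and `llm_classify` are hypothetical names):

```python
import re

# Hypothetical stage-1 patterns; the plugin's real heuristics are not shown here.
SKIP_PATTERNS = re.compile(
    r"\b(translate|proofread|calculate|reformat|convert)\b", re.IGNORECASE
)

def should_process(message: str, llm_classify) -> bool:
    """Two-stage gate: cheap pattern check first, LLM classification second."""
    if SKIP_PATTERNS.search(message):  # stage 1: skip obvious non-personal tasks
        return False
    return llm_classify(message)       # stage 2: LLM decides if memory-worthy
```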
**Multi-Layer Caching**
- Three specialized caches (embeddings, retrieval results, memory lookups) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
+ Three specialized caches (embeddings, retrieval, memory) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
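As an illustration, per-user LRU caching along these lines could be structured as below (a minimal sketch; the class and cache names are placeholders, not the plugin's actual code):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""
    def __init__(self, max_size: int = 256):
        self.max_size = max_size
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

# One cache triple per user keeps cache storage isolated between users.
user_caches: dict[str, dict[str, LRUCache]] = {}

def caches_for(user_id: str) -> dict[str, LRUCache]:
    if user_id not in user_caches:
        user_caches[user_id] = {
            "embeddings": LRUCache(),
            "retrieval": LRUCache(),
            "memory": LRUCache(),
        }
    return user_caches[user_id]
```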
**Real-Time Status Updates**
Emits progress messages during operations (memory retrieval, consolidation status, operation summaries), keeping users informed without overwhelming them.
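In an Open WebUI filter, status updates like these are typically sent through the `__event_emitter__` callback; a minimal sketch (the message strings are illustrative):

```python
async def emit_status(__event_emitter__, description: str, done: bool = False):
    """Send a progress update to the chat UI via Open WebUI's event emitter."""
    await __event_emitter__({
        "type": "status",
        "data": {"description": description, "done": done},
    })

# Example usage during memory retrieval:
# await emit_status(__event_emitter__, "Retrieving relevant memories...")
# await emit_status(__event_emitter__, "Injected 4 memories into context", done=True)
```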
@@ -32,7 +36,7 @@ All prompts and logic work language-agnostically. Stores memories in English but
## Model Support
**LLM Support**
- Tested with Gemini 2.5 Flash Lite, GPT-4o-mini, Qwen2.5-Instruct, and Mistral-Small. Should work with any model that supports structured outputs.
+ Tested with gemini-2.5-flash-lite, gpt-5-nano, and qwen3-instruct. Should work with any model that supports structured outputs.
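Here, "structured outputs" means constraining the model to return JSON matching a schema. A hedged sketch of what such a schema could look like with Pydantic (all class and field names are hypothetical, not the plugin's actual contract):

```python
from typing import Literal
from pydantic import BaseModel

class MemoryOperation(BaseModel):
    # Hypothetical fields: what a consolidation step might ask the LLM to emit.
    action: Literal["create", "update", "delete", "skip"]
    memory_id: str | None = None
    content: str | None = None

class ConsolidationResult(BaseModel):
    operations: list[MemoryOperation]
```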
**Embedding Model Support**
Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure OpenAI, and local sentence-transformers). Configure embedding models through OpenWebUI's RAG settings. The memory system automatically uses whatever embedding backend you've configured in OpenWebUI.
@@ -54,11 +58,13 @@ Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure Open
## Configuration
Customize behavior through valves (see the sketch after this list):
- - **model**: LLM for consolidation and reranking (default: `gemini-2.5-flash-lite`)
+ - **model**: LLM for consolidation and reranking (default: `google/gemini-2.5-flash-lite`)
- **max_message_chars**: Maximum message length before skipping operations (default: 2500)
- **max_memories_returned**: Context injection limit (default: 10)
- **semantic_retrieval_threshold**: Minimum similarity score (default: 0.5)
- **relaxed_semantic_threshold_multiplier**: Adjusts threshold for consolidation (default: 0.9)
- **enable_llm_reranking**: Toggle smart reranking (default: true)
- - **llm_reranking_trigger_multiplier**: When to activate LLM (default: 0.5 = 50%)
+ - **llm_reranking_trigger_multiplier**: When to activate LLM reranking (default: 0.5 = 50%)
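For reference, this valve set maps onto a Pydantic model roughly as follows (Open WebUI filters conventionally declare a nested `Valves(BaseModel)`; this sketch mirrors the defaults listed above but is not the plugin's exact source):

```python
from pydantic import BaseModel

class Valves(BaseModel):
    # Defaults mirror the list above; the surrounding Filter class is omitted.
    model: str = "google/gemini-2.5-flash-lite"
    max_message_chars: int = 2500
    max_memories_returned: int = 10
    semantic_retrieval_threshold: float = 0.5
    relaxed_semantic_threshold_multiplier: float = 0.9
    enable_llm_reranking: bool = True
    llm_reranking_trigger_multiplier: float = 0.5
```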
## Performance Optimizations