diff --git a/README.md b/README.md
index 4423c8d..eb24629 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,10 @@
 
 A long-term memory system that learns from conversations and personalizes responses without requiring external APIs or tokens.
 
+## Important Notice
+
+**Privacy Consideration:** This system shares user messages and stored memories with your configured LLM for memory consolidation and retrieval operations. All data is processed through Open WebUI's built-in models using your existing configuration. No data is sent to external services beyond what your LLM provider configuration already allows.
+
 ## Core Features
 
 **Zero External Dependencies**
@@ -21,7 +25,7 @@ Avoids wasting resources on irrelevant messages through two-stage detection:
 Categories automatically skipped: technical discussions, formatting requests, calculations, translation tasks, proofreading, and non-personal queries.
 
 **Multi-Layer Caching**
-Three specialized caches (embeddings, retrieval results, memory lookups) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
+Three specialized caches (embeddings, retrieval, memory) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
 
 **Real-Time Status Updates**
 Emits progress messages during operations: memory retrieval progress, consolidation status, operation summaries — keeping users informed without overwhelming them.
@@ -32,7 +36,7 @@ All prompts and logic work language-agnostically. Stores memories in English but
 ## Model Support
 
 **LLM Support**
-Tested with Gemini 2.5 Flash Lite, GPT-4o-mini, Qwen2.5-Instruct, and Mistral-Small. Should work with any model that supports structured outputs.
+Tested with gemini-2.5-flash-lite, gpt-5-nano, and qwen3-instruct. Should work with any model that supports structured outputs.
 
 **Embedding Model Support**
 Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure OpenAI, and local sentence-transformers). Configure embedding models through OpenWebUI's RAG settings. The memory system automatically uses whatever embedding backend you've configured in OpenWebUI.
@@ -54,11 +58,13 @@ Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure Open
 ## Configuration
 
 Customize behavior through valves:
-- **model**: LLM for consolidation and reranking (default: `gemini-2.5-flash-lite`)
+- **model**: LLM for consolidation and reranking (default: `google/gemini-2.5-flash-lite`)
+- **max_message_chars**: Maximum message length before skipping operations (default: 2500)
 - **max_memories_returned**: Context injection limit (default: 10)
 - **semantic_retrieval_threshold**: Minimum similarity score (default: 0.5)
+- **relaxed_semantic_threshold_multiplier**: Adjusts threshold for consolidation (default: 0.9)
 - **enable_llm_reranking**: Toggle smart reranking (default: true)
-- **llm_reranking_trigger_multiplier**: When to activate LLM (default: 0.5 = 50%)
+- **llm_reranking_trigger_multiplier**: When to activate LLM reranking (default: 0.5 = 50%)
 
 ## Performance Optimizations