32 Commits

Author SHA1 Message Date
mtayfur
577f6d6406 Enhance skip detection logic with additional checks for structured content and code indicators 2025-10-28 03:44:26 +03:00
mtayfur
fe3c47f6e4 Update README and logging messages for model configuration clarity 2025-10-28 03:03:11 +03:00
M. Tayfur
551b0c571b ♻️ Skip detection now binary (#4) 2025-10-28 02:35:29 +03:00
M. Tayfur
e6e3f7ab99 Merge pull request #3 from GlisseManTV/dev
Way to use current model instead of dedicated model.
2025-10-27 23:39:06 +03:00
iTConsult4Care
cca8079b94 Fix formatting and add metadata to memory_system.py 2025-10-27 21:14:35 +01:00
GlissemanTV
55a8c70bac add support of current chat model 2025-10-27 21:06:35 +01:00
GlissemanTV
e12aa7b776 Merge branch 'dev' of https://github.com/GlisseManTV/openwebui-memory-system into dev 2025-10-27 21:05:26 +01:00
mtayfur
b5a4872096 📝 (memory_system): clarify and strengthen intent filtering and memory consolidation guidelines
Expand and clarify the "Filter for Intent" rule to ensure only direct,
personally significant facts are stored, explicitly excluding messages
where the user's primary intent is instructional, technical, or
analytical. Update processing and decision frameworks to reinforce
selectivity based on user intent. Revise and annotate examples to
demonstrate correct application of the new rules, making it clear that
requests for advice, recommendations, or technical tasks are ignored.
These changes improve the precision and reliability of memory
consolidation, reducing the risk of storing irrelevant or transient
information.
2025-10-27 00:57:26 +03:00
mtayfur
3f9b4c6d48 ♻️ (memory_system): refactor skip detection and add semantic deduplication
- Unify skip detection to a binary classifier (personal vs non-personal)
 for improved maintainability and clarity. Remove multiple technical/
 instruction/translation/etc. categories and consolidate into
 NON_PERSONAL and PERSONAL.
- Adjust skip detection margin for more precise classification.
- Add semantic deduplication for memory operations using embedding
 similarity, preventing duplicate memory creation and updates.
- Normalize and validate embedding dimensions for robustness.
- Add per-user async locks to prevent race conditions during memory
 consolidation.
- Refactor requirements.txt to remove version pinning for easier
 dependency management.
- Improve logging and error handling for embedding and deduplication
 operations.

These changes improve the reliability and accuracy of memory
classification and deduplication, reduce false positives in skip
detection, and prevent duplicate or conflicting memory operations in
concurrent environments. Dependency management is simplified for
compatibility.
2025-10-27 00:27:33 +03:00
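The semantic deduplication this commit describes — comparing embeddings before creating a memory — can be sketched as a cosine-similarity check against the stored embeddings. This is a minimal illustration, not the project's actual API; the function names and the 0.95 threshold are assumptions.

```python
import numpy as np

def _normalize(vec):
    # L2-normalize an embedding; guard against zero vectors.
    v = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def is_duplicate(new_embedding, existing_embeddings, threshold=0.95):
    """Return True if the candidate memory is semantically close to any stored one.

    Embeddings are normalized first, so the dot product equals cosine
    similarity. `threshold` is an illustrative value, not the project's setting.
    """
    new_vec = _normalize(new_embedding)
    for existing in existing_embeddings:
        if float(np.dot(new_vec, _normalize(existing))) >= threshold:
            return True
    return False
```

A consolidation step would call `is_duplicate` before inserting, skipping creation (or routing to an update) when it returns True.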
mtayfur
bb1bd01222 ♻️ (memory_system.py): reformat code for consistency, readability, and maintainability
- Reorder and group imports for clarity and PEP8 compliance.
- Standardize string quoting and whitespace for consistency.
- Refactor long function signatures and dictionary constructions for better readability.
- Use double quotes for all string literals and dictionary keys.
- Improve formatting of multiline statements and function calls.
- Add or adjust line breaks to keep lines within recommended length.
- Reformat class and method docstrings for clarity.
- Use consistent indentation and spacing throughout the file.

These changes improve code readability, maintainability, and consistency, making it easier for future contributors to understand and modify the codebase. No functional logic is changed.
2025-10-27 00:20:35 +03:00
mtayfur
189c6d4226 🔧 (dev-check.sh, pyproject.toml, requirements.txt): add development tooling and configuration
Introduce a dev-check.sh script to automate code formatting and import
sorting using Black and isort. Add a pyproject.toml file to configure
Black and isort settings for consistent code style. Update
requirements.txt to include Black and isort as development dependencies
and remove version pinning for easier dependency management.

These changes streamline the development workflow, enforce code style
consistency, and make it easier for contributors to run formatting and
import checks locally.
2025-10-27 00:20:05 +03:00
iTConsult4Care
7630259ce1 Delete memory_system_ollama.py 2025-10-26 16:48:26 +01:00
GlissemanTV
a3a627339a Merge branch 'dev' of https://github.com/GlisseManTV/openwebui-memory-system into dev 2025-10-26 16:46:06 +01:00
iTConsult4Care
390747c1d1 Delete memory_system_ollama.py 2025-10-26 16:45:30 +01:00
iTConsult4Care
b6b4c5fde8 Add files via upload 2025-10-26 16:44:24 +01:00
iTConsult4Care
4328e4b79c Add current model usecase 2025-10-26 16:42:36 +01:00
GlissemanTV
fcae27e840 add current model usecase 2025-10-26 16:14:05 +01:00
GlissemanTV
1f779d86ec Merge branch 'dev' of https://github.com/GlisseManTV/openwebui-memory-system into dev 2025-10-26 16:12:12 +01:00
GlissemanTV
d07a853aeb add current model using case 2025-10-26 16:05:35 +01:00
GlissemanTV
89399f57cc add current model workflow with checkbox 2025-10-26 16:01:13 +01:00
mtayfur
c0bfb3927b Refactor memory creation guidelines for improved clarity and conciseness in contextual completeness section. 2025-10-19 05:13:45 +03:00
mtayfur
d05ed8a16e Update semantic retrieval thresholds in Constants class for improved accuracy 2025-10-19 04:57:36 +03:00
mtayfur
7e2209633d Refactor logger initialization in memory_system.py to use module name for better context in log messages. 2025-10-18 19:25:22 +03:00
mtayfur
505c443050 Update README.md to enhance clarity on privacy and cost considerations; restructure sections for better readability and add relevant details. 2025-10-15 14:33:33 +03:00
mtayfur
0726293446 Update README.md for improved clarity and accuracy; revise privacy notice, cache descriptions, and model support details. 2025-10-15 14:13:55 +03:00
mtayfur
e3709fe677 Refactor cache management in Filter class; reduce maximum cache entries and concurrent user caches for improved performance and clarity. Update cache management methods for consistency and better logging. 2025-10-15 14:05:01 +03:00
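The LRU-evicting caches this commit tunes can be illustrated with an `OrderedDict`-backed cache: reads refresh recency, and inserts evict the least recently used entry once the cap is hit. A generic sketch — the real class names and entry limits in memory_system.py may differ.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: get/set refresh recency; eviction drops the oldest."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key not in self._store:
            return default
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def set(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Per-user isolation, as described in the README, would then be one `LRUCache` instance per user rather than a shared one.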
mtayfur
2deba4fb2c Refactor Filter class to use async for pipeline context setup; implement locking mechanism for shared skip detector cache to enhance concurrency safety. 2025-10-12 23:24:58 +03:00
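The locking pattern referenced here — serializing access to shared state in async code — is commonly built from a dict of `asyncio.Lock` objects keyed by user or resource. An illustrative pattern; the actual Filter implementation may structure this differently.

```python
import asyncio
from collections import defaultdict

class PerUserLocks:
    """Hand out one asyncio.Lock per key so concurrent requests for the
    same user serialize, while different users proceed independently."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def for_user(self, user_id):
        return self._locks[user_id]

async def consolidate(locks, user_id, log):
    # Critical section: only one consolidation per user runs at a time.
    async with locks.for_user(user_id):
        log.append(user_id)

async def main():
    locks = PerUserLocks()
    log = []
    await asyncio.gather(
        consolidate(locks, "u1", log),
        consolidate(locks, "u1", log),
        consolidate(locks, "u2", log),
    )
    return log
```

The same shape covers the shared skip-detector cache: guard reads-then-writes with the lock so two coroutines can't both miss the cache and recompute.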
mtayfur
849dd71a01 Refactor memory selection logic in LLMRerankingService for improved clarity; streamline response handling by directly using response.ids. 2025-10-12 23:03:36 +03:00
mtayfur
158f0d1983 Refactor memory operations in Filter class for improved readability and consistency; utilize statistics.median for score calculation and streamline operation details formatting. 2025-10-12 22:54:18 +03:00
mtayfur
2db2d3f2c8 Refactor SkipDetector to streamline skip detection logic and improve clarity; update method signature for better integration with memory system. 2025-10-12 21:44:51 +03:00
GlissemanTV
08155816ff add memory_system_ollama.py 2025-10-10 09:10:03 +02:00
mtayfur
840d4c59ca Refactor SkipDetector to use a callable embedding function instead of SentenceTransformer; update requirements to remove unnecessary dependencies. 2025-10-09 23:36:27 +03:00
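Accepting a callable embedding function, as this commit describes, decouples the detector from any one backend: anything that maps text to a vector can be injected. A sketch under that assumption — the factory name, anchor phrase, and threshold below are illustrative, not the project's actual values.

```python
import math

def make_skip_detector(embed_fn, personal_anchor="my name is", threshold=0.5):
    """Build a skip detector around an injected embedding callable.

    `embed_fn` maps text -> vector; any backend (Ollama, OpenAI, local
    sentence-transformers) works as long as it matches that signature.
    """

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    anchor_vec = embed_fn(personal_anchor)

    def should_skip(message):
        # Skip (return True) when the message is dissimilar to the
        # personal anchor, i.e. likely non-personal content.
        return cosine(embed_fn(message), anchor_vec) < threshold

    return should_skip
```

Swapping backends then means passing a different `embed_fn`, with no change to the detector itself — which is what dropping the hard SentenceTransformer dependency buys.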
6 changed files with 693 additions and 442 deletions

.gitignore vendored — 2 changes

@@ -1,4 +1,4 @@
__pycache__/
.github/instructions/*
.venv/
**AGENTS.md
tests/

README.md

@@ -2,6 +2,18 @@
A long-term memory system that learns from conversations and personalizes responses without requiring external APIs or tokens.
+## ⚠️ Important Notices
+**🔒 Privacy & Data Sharing:**
+- User messages and stored memories are shared with your configured LLM for memory consolidation and retrieval
+- If using remote embedding models (like OpenAI text-embedding-3-small), memories will also be sent to those external providers
+- All data is processed through Open WebUI's built-in models using your existing configuration
+**💰 Cost & Model Requirements:**
+- The system uses complex prompts and sends relevant memories to the LLM, which increases token usage and costs
+- Requires public models configured in OpenWebUI - you can use any public model ID from your instance
+- **Recommended cost-effective models:** `gpt-5-nano`, `gemini-2.5-flash-lite`, `qwen3-instruct`, or your local LLMs
## Core Features
**Zero External Dependencies**
@@ -21,7 +33,7 @@ Avoids wasting resources on irrelevant messages through two-stage detection:
Categories automatically skipped: technical discussions, formatting requests, calculations, translation tasks, proofreading, and non-personal queries.
**Multi-Layer Caching**
-Three specialized caches (embeddings, retrieval results, memory lookups) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
+Three specialized caches (embeddings, retrieval, memory) with LRU eviction keep responses fast while managing memory efficiently. Each user gets isolated cache storage.
**Real-Time Status Updates**
Emits progress messages during operations: memory retrieval progress, consolidation status, operation summaries — keeping users informed without overwhelming them.
@@ -32,7 +44,7 @@ All prompts and logic work language-agnostically. Stores memories in English but
## Model Support
**LLM Support**
-Tested with Gemini 2.5 Flash Lite, GPT-4o-mini, Qwen2.5-Instruct, and Mistral-Small. Should work with any model that supports structured outputs.
+Tested with gemini-2.5-flash-lite, gpt-5-nano, and qwen3-instruct. Should work with any model that supports structured outputs.
**Embedding Model Support**
Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure OpenAI, and local sentence-transformers). Configure embedding models through OpenWebUI's RAG settings. The memory system automatically uses whatever embedding backend you've configured in OpenWebUI.
@@ -54,11 +66,13 @@ Uses OpenWebUI's configured embedding model (supports Ollama, OpenAI, Azure Open
## Configuration
Customize behavior through valves:
-- **model**: LLM for consolidation and reranking (default: `gemini-2.5-flash-lite`)
+- **model**: LLM for consolidation and reranking. Set to "Default" to use the current chat model, or specify a model ID to use that specific model
- **max_message_chars**: Maximum message length before skipping operations (default: 2500)
- **max_memories_returned**: Context injection limit (default: 10)
- **semantic_retrieval_threshold**: Minimum similarity score (default: 0.5)
- **relaxed_semantic_threshold_multiplier**: Adjusts threshold for consolidation (default: 0.9)
- **enable_llm_reranking**: Toggle smart reranking (default: true)
-- **llm_reranking_trigger_multiplier**: When to activate LLM (default: 0.5 = 50%)
+- **llm_reranking_trigger_multiplier**: When to activate LLM reranking (default: 0.5 = 50%)
## Performance Optimizations

dev-check.sh Executable file — 25 additions

@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Development tools script for openwebui-memory-system
set -e
if [ -f "./.venv/bin/python" ]; then
PYTHON="./.venv/bin/python"
elif command -v python3 &> /dev/null; then
PYTHON="python3"
elif command -v python &> /dev/null; then
PYTHON="python"
else
echo "Python 3 is not installed. Please install Python 3 to proceed."
exit 1
fi
echo "🔧 Running development tools..."
echo "🎨 Formatting with Black..."
$PYTHON -m black .
echo "📦 Sorting imports with isort..."
$PYTHON -m isort .
echo "✅ All checks passed!"

memory_system.py: file diff suppressed because it is too large.

pyproject.toml Normal file — 24 additions

@@ -0,0 +1,24 @@
[tool.black]
line-length = 160
target-version = ['py38']
include = '\.pyi?$'
extend-exclude = '''
/(
\.eggs
| \.git
| \.hg
| \.mypy_cache
| \.tox
| \.venv
| build
| dist
)/
'''
[tool.isort]
line_length = 160
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
ensure_newline_before_comments = true

requirements.txt

@@ -1,5 +1,7 @@
-aiohttp>=3.12.15
-pydantic>=2.11.7
-numpy>=2.0.0
-open-webui>=0.6.32
-tiktoken>=0.11.0
+aiohttp
+pydantic
+numpy
+tiktoken
+black
+isort