# Crawlab Task Worker Configuration Examples

## Basic Configuration

**config.yml**

```yaml
# Task execution configuration
task:
  workers: 20  # Number of concurrent task workers (default: 10)

# Node configuration (optional)
node:
  maxRunners: 50  # Maximum total tasks per node (0 = unlimited)
```
## Environment Variables

```bash
# Set via environment variables
export CRAWLAB_TASK_WORKERS=20
export CRAWLAB_NODE_MAXRUNNERS=50
```
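The environment variable names follow directly from the config keys shown above: uppercase the key, replace dots with underscores, and prefix with `CRAWLAB_`. A minimal Go sketch of that naming convention; the helper name is ours for illustration, not a Crawlab API:

```go
package main

import (
	"fmt"
	"strings"
)

// envVarFor converts a dotted config key such as "task.workers" into the
// corresponding environment variable name, following the CRAWLAB_<KEY>
// convention visible in the examples above.
func envVarFor(key string) string {
	return "CRAWLAB_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

func main() {
	fmt.Println(envVarFor("task.workers"))    // CRAWLAB_TASK_WORKERS
	fmt.Println(envVarFor("node.maxRunners")) // CRAWLAB_NODE_MAXRUNNERS
}
```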
## Configuration Guidelines

### Worker Count Recommendations

| Scenario | Task Workers | Queue Size | Memory Usage |
|---|---|---|---|
| Development | 5-10 | 25-50 | ~100MB |
| Small Production | 15-20 | 75-100 | ~200MB |
| Medium Production | 25-35 | 125-175 | ~400MB |
| Large Production | 40-60 | 200-300 | ~800MB |
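The rows follow a roughly 5:1 queue-to-worker ratio, and the "Optimal Configuration" log further down shows the same shape (20 workers, queue size 100). A small Go sketch of that sizing rule; the function and the fixed ratio are our reading of this table, not a documented Crawlab formula:

```go
package main

import "fmt"

// recommendedQueueSize applies the ~5x queue-to-worker ratio implied
// by the table above. This is a reading of this document's examples,
// not a Crawlab API.
func recommendedQueueSize(workers int) int {
	return workers * 5
}

func main() {
	for _, w := range []int{10, 20, 35, 60} {
		fmt.Printf("workers=%d -> queue=%d\n", w, recommendedQueueSize(w))
	}
}
```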
### Factors to Consider

- **Task Complexity**: CPU/memory-intensive tasks need fewer workers (see the sizing sketch after this list)
- **Task Duration**: long-running tasks need more workers to sustain throughput
- **System Resources**: balance the worker count against available CPU and memory
- **Database Load**: more workers mean more concurrent database connections
- **External Dependencies**: network-bound tasks can tolerate more workers
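These factors can be folded into the classic pool-sizing rule of thumb, workers ≈ cores × (1 + wait/compute): CPU-bound tasks stay near the core count, while network-bound tasks tolerate many more workers. A sketch under that assumption; the formula is a general heuristic, not Crawlab-specific:

```go
package main

import (
	"fmt"
	"runtime"
)

// estimateWorkers applies the generic pool-sizing heuristic
// workers ≈ cores × (1 + blockingRatio), where blockingRatio is
// waitTime/computeTime: near 0 for CPU-bound tasks, high for
// network-bound ones. Not a Crawlab formula.
func estimateWorkers(cores int, blockingRatio float64) int {
	return int(float64(cores) * (1 + blockingRatio))
}

func main() {
	cores := runtime.NumCPU()
	fmt.Println("CPU-bound:    ", estimateWorkers(cores, 0.1))
	fmt.Println("Network-bound:", estimateWorkers(cores, 9.0)) // ~90% of time spent waiting
}
```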
## Performance Tuning

### Too Few Workers (Queue Full Errors)

```
WARN task queue is full (50/50), consider increasing task.workers configuration
```

**Solution**: increase the `task.workers` value.

### Too Many Workers (Resource Exhaustion)

```
ERROR failed to create task runner: out of memory
ERROR database connection pool exhausted
```

**Solution**: decrease the `task.workers` value.

### Optimal Configuration

```
INFO Task handler service started with 20 workers and queue size 100
DEBUG task[abc123] queued, queue usage: 5/100
```
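The INFO and DEBUG lines above describe a fixed pool of workers draining a bounded queue. A minimal Go sketch of that pattern, assuming a buffered channel as the queue; this illustrates the mechanism the logs describe, not Crawlab's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const workers, queueSize = 20, 100

	queue := make(chan string, queueSize) // bounded task queue
	var wg sync.WaitGroup

	// Fixed pool of workers draining the queue.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for task := range queue {
				time.Sleep(10 * time.Millisecond) // stand-in for real work
				_ = task
			}
		}()
	}

	// Enqueue tasks; a full buffer is the "queue is full" condition.
	for i := 0; i < 500; i++ {
		task := fmt.Sprintf("task-%d", i)
		select {
		case queue <- task:
			// queued; len(queue) is the "queue usage" in the DEBUG line
		default:
			fmt.Printf("queue is full (%d/%d), rejecting %s\n", len(queue), cap(queue), task)
		}
	}
	close(queue)
	wg.Wait()
}
```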
## Docker Configuration

**docker-compose.yml**

```yaml
version: '3'
services:
  crawlab-master:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=25
      - CRAWLAB_NODE_MAXRUNNERS=100
      # ... other config
  crawlab-worker:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=30  # Workers can be different per node
      - CRAWLAB_NODE_MAXRUNNERS=150
      # ... other config
```
## Kubernetes ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: crawlab-config
data:
  config.yml: |
    task:
      workers: 25
    node:
      maxRunners: 100
```
## Monitoring Worker Performance

### Log Monitoring

```bash
# Monitor worker pool status
grep -E "(workers|queue usage|queue is full)" /var/log/crawlab/crawlab.log

# Monitor task throughput
grep -E "(task.*queued|task.*finished)" /var/log/crawlab/crawlab.log | wc -l
```
### Metrics to Track

- Queue utilization percentage (computable from the DEBUG log line, as sketched after this list)
- Average task execution time
- Worker pool saturation
- Memory usage per worker
- Task success/failure rates
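The first metric can be read straight off the DEBUG line quoted earlier ("queue usage: 5/100"). A small Go sketch that parses that fragment and computes the percentage; the regex targets the log format as quoted in this document:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// usageRe matches the "queue usage: 5/100" fragment of the DEBUG log line.
var usageRe = regexp.MustCompile(`queue usage: (\d+)/(\d+)`)

// queueUtilization returns the queue fill level as a percentage,
// or false if the line does not contain a usage fragment.
func queueUtilization(line string) (float64, bool) {
	m := usageRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	used, _ := strconv.Atoi(m[1])
	capacity, _ := strconv.Atoi(m[2])
	return float64(used) / float64(capacity) * 100, true
}

func main() {
	line := "DEBUG task[abc123] queued, queue usage: 5/100"
	if pct, ok := queueUtilization(line); ok {
		fmt.Printf("queue utilization: %.1f%%\n", pct)
	}
}
```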
## Troubleshooting

### Queue Always Full

- Increase the worker count (`task.workers`)
- Check task complexity and optimization
- Verify database performance
- Consider scaling horizontally (more nodes)
### High Memory Usage

- Decrease worker count
- Optimize task memory usage
- Implement task batching (a generic sketch follows this list)
- Add memory monitoring alerts
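Task batching here means processing items in fixed-size chunks so that only one chunk is resident in memory at a time. A generic Go sketch of the pattern; the helper is illustrative, not a Crawlab API:

```go
package main

import "fmt"

// processInBatches walks items in chunks of batchSize so only one
// batch needs to be held and processed in memory at a time.
func processInBatches(items []string, batchSize int, handle func([]string)) {
	for start := 0; start < len(items); start += batchSize {
		end := start + batchSize
		if end > len(items) {
			end = len(items)
		}
		handle(items[start:end])
	}
}

func main() {
	items := make([]string, 0, 10)
	for i := 0; i < 10; i++ {
		items = append(items, fmt.Sprintf("url-%d", i))
	}
	processInBatches(items, 4, func(batch []string) {
		fmt.Println("processing batch:", batch)
	})
}
```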
### Slow Task Processing

- Profile individual tasks
- Check database query performance
- Optimize external API calls
- Consider async task patterns (see the sketch below)
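For the external-call case, the usual async pattern is to overlap requests instead of issuing them sequentially. A minimal Go sketch using goroutines with a bounded concurrency limit and a per-request timeout; the URLs are placeholders and this is a generic pattern, not Crawlab code:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	urls := []string{ // placeholder endpoints a task might depend on
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/c",
	}

	client := &http.Client{Timeout: 5 * time.Second} // bound each call
	sem := make(chan struct{}, 2)                    // at most 2 requests in flight
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a concurrency slot
			defer func() { <-sem }() // release it when done

			resp, err := client.Get(url)
			if err != nil {
				fmt.Println(url, "error:", err)
				return
			}
			resp.Body.Close()
			fmt.Println(url, resp.Status)
		}(url)
	}
	wg.Wait()
}
```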
## Testing Configuration Changes

```bash
# Test new configuration
export CRAWLAB_TASK_WORKERS=30
./scripts/test_goroutine_fixes.sh 900 10

# Monitor during peak load
./scripts/test_goroutine_fixes.sh 3600 5
```
## Best Practices

- **Start Conservative**: begin with default values and monitor
- **Load Test**: always test configuration changes under load
- **Monitor Metrics**: track queue utilization and task throughput
- **Scale Gradually**: increase worker count in small increments
- **Resource Limits**: set appropriate memory/CPU limits in containers
- **High Availability**: configure different worker counts per node type