# Crawlab Task Worker Configuration Examples

## Basic Configuration

### config.yml

```yaml
# Task execution configuration
task:
  workers: 20  # Number of concurrent task workers (default: 10)

# Node configuration (optional)
node:
  maxRunners: 50  # Maximum total tasks per node (0 = unlimited)
```

### Environment Variables

```bash
# Set via environment variables
export CRAWLAB_TASK_WORKERS=20
export CRAWLAB_NODE_MAXRUNNERS=50
```

## Configuration Guidelines

### Worker Count Recommendations

| Scenario | Task Workers | Queue Size | Memory Usage |
|----------|--------------|------------|--------------|
| Development | 5-10 | 25-50 | ~100MB |
| Small Production | 15-20 | 75-100 | ~200MB |
| Medium Production | 25-35 | 125-175 | ~400MB |
| Large Production | 40-60 | 200-300 | ~800MB |

### Factors to Consider

1. **Task Complexity**: CPU/memory-intensive tasks need fewer workers
2. **Task Duration**: Long-running tasks need more workers for throughput
3. **System Resources**: Balance workers with available CPU/memory
4. **Database Load**: More workers = more database connections
5. **External Dependencies**: Network-bound tasks can handle more workers

### Performance Tuning

#### Too Few Workers (Queue Full Errors)

```log
WARN task queue is full (50/50), consider increasing task.workers configuration
```

**Solution**: Increase the `task.workers` value

#### Too Many Workers (Resource Exhaustion)

```log
ERROR failed to create task runner: out of memory
ERROR database connection pool exhausted
```

**Solution**: Decrease the `task.workers` value

#### Optimal Configuration

```log
INFO Task handler service started with 20 workers and queue size 100
DEBUG task[abc123] queued, queue usage: 5/100
```

## Docker Configuration

### docker-compose.yml

```yaml
version: '3'
services:
  crawlab-master:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=25
      - CRAWLAB_NODE_MAXRUNNERS=100
    # ... other config

  crawlab-worker:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=30  # Workers can be different per node
      - CRAWLAB_NODE_MAXRUNNERS=150
    # ... other config
```

### Kubernetes ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: crawlab-config
data:
  config.yml: |
    task:
      workers: 25
    node:
      maxRunners: 100
```

## Monitoring Worker Performance

### Log Monitoring

```bash
# Monitor worker pool status
grep -E "(workers|queue usage|queue is full)" /var/log/crawlab/crawlab.log

# Monitor task throughput
grep -E "(task.*queued|task.*finished)" /var/log/crawlab/crawlab.log | wc -l
```

### Metrics to Track

- Queue utilization percentage
- Average task execution time
- Worker pool saturation
- Memory usage per worker
- Task success/failure rates

## Troubleshooting

### Queue Always Full

1. Increase the worker count (`task.workers`)
2. Check task complexity and optimization
3. Verify database performance
4. Consider scaling horizontally (more nodes)

### High Memory Usage

1. Decrease the worker count
2. Optimize task memory usage
3. Implement task batching
4. Add memory monitoring alerts

### Slow Task Processing

1. Profile individual tasks
2. Check database query performance
3. Optimize external API calls
4. Consider async task patterns

## Testing Configuration Changes

```bash
# Test new configuration
export CRAWLAB_TASK_WORKERS=30
./scripts/test_goroutine_fixes.sh 900 10

# Monitor during peak load
./scripts/test_goroutine_fixes.sh 3600 5
```
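To compare several settings under the same load, the test invocation above can be wrapped in a loop. The sketch below is illustrative only: it assumes `test_goroutine_fixes.sh` takes the `<duration> <interval>` arguments shown above and that the worker setting is picked up from the `CRAWLAB_TASK_WORKERS` environment variable; the candidate values are placeholders.

```bash
# Sketch: sweep a few candidate worker counts with the same load test.
# The values 20/25/30 are placeholders; adjust them to your hardware.
for workers in 20 25 30; do
  echo "=== CRAWLAB_TASK_WORKERS=${workers} ==="
  CRAWLAB_TASK_WORKERS="${workers}" ./scripts/test_goroutine_fixes.sh 900 10
done
```

Comparing queue-usage and memory figures across runs makes it easier to settle on a value before rolling it out.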
## Best Practices

1. **Start Conservative**: Begin with default values and monitor
2. **Load Test**: Always test configuration changes under load
3. **Monitor Metrics**: Track queue utilization and task throughput
4. **Scale Gradually**: Increase worker count in small increments
5. **Resource Limits**: Set appropriate memory/CPU limits in containers
6. **High Availability**: Configure different worker counts per node type
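For best practice 3 (**Monitor Metrics**), queue utilization can be read directly from the `queue usage: <used>/<capacity>` debug lines shown in the monitoring section. A minimal sketch, assuming the log path and format used earlier in this document:

```bash
# Print the most recent queue utilization as a percentage,
# assuming the "queue usage: <used>/<capacity>" log format shown above
grep -o 'queue usage: [0-9]*/[0-9]*' /var/log/crawlab/crawlab.log | tail -n 1 \
  | awk -F'[ /]' '{ printf "queue utilization: %d%% (%s/%s)\n", $3 * 100 / $4, $3, $4 }'
```

Sustained high utilization is a practical trigger for revisiting `task.workers` or adding nodes, as described in the troubleshooting section.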