Crawlab Task Worker Configuration Examples

Basic Configuration

config.yml

# Task execution configuration
task:
  workers: 20  # Number of concurrent task workers (default: 10)

# Node configuration (optional)
node:
  maxRunners: 50  # Maximum total tasks per node (0 = unlimited)

Environment Variables

# Set via environment variables
export CRAWLAB_TASK_WORKERS=20
export CRAWLAB_NODE_MAXRUNNERS=50
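
The variable names follow a simple convention: prefix the YAML key path with CRAWLAB_, upper-case it, and replace dots with underscores (task.workers becomes CRAWLAB_TASK_WORKERS). The sketch below shows how such a mapping is typically wired up with spf13/viper; it is illustrative only and assumes a Viper-style loader, which may differ from Crawlab's actual configuration code.

package main

import (
    "fmt"
    "strings"

    "github.com/spf13/viper"
)

func main() {
    // Defaults mirror the documented defaults.
    viper.SetDefault("task.workers", 10)
    viper.SetDefault("node.maxRunners", 0) // 0 = unlimited

    // CRAWLAB_TASK_WORKERS overrides task.workers, CRAWLAB_NODE_MAXRUNNERS
    // overrides node.maxRunners: prefix "CRAWLAB_", dots become underscores.
    // (Illustrative wiring, not Crawlab's actual loader.)
    viper.SetEnvPrefix("crawlab")
    viper.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
    viper.AutomaticEnv()

    fmt.Println("task.workers =", viper.GetInt("task.workers"))
    fmt.Println("node.maxRunners =", viper.GetInt("node.maxRunners"))
}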

Configuration Guidelines

Worker Count Recommendations

Scenario            Task Workers    Queue Size    Memory Usage
Development         5-10            25-50         ~100MB
Small Production    15-20           75-100        ~200MB
Medium Production   25-35           125-175       ~400MB
Large Production    40-60           200-300       ~800MB

Factors to Consider

  1. Task Complexity: CPU- or memory-intensive tasks need fewer workers
  2. Task Duration: Long-running tasks occupy workers longer, so more workers are needed to sustain throughput
  3. System Resources: Balance the worker count against available CPU and memory
  4. Database Load: Every additional worker adds concurrent database connections
  5. External Dependencies: Network-bound tasks spend most of their time waiting on I/O, so they can tolerate higher worker counts
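
The sizing guidance above comes down to one mechanism: a fixed pool of workers draining a bounded queue that holds roughly five times as many tasks as there are workers (20 workers and queue size 100 in the log examples below). The following Go sketch is a minimal illustration of that mechanism, not Crawlab's actual task handler; the taskPool type and submit method are invented for the example.

package main

import (
    "fmt"
    "log"
    "sync"
    "time"
)

// taskPool is an illustrative bounded worker pool: a fixed number of
// goroutines drain a buffered queue sized at 5x the worker count.
// Names (taskPool, submit) are made up for this example.
type taskPool struct {
    queue chan string
    wg    sync.WaitGroup
}

func newTaskPool(workers int) *taskPool {
    p := &taskPool{queue: make(chan string, workers*5)}
    for i := 0; i < workers; i++ {
        p.wg.Add(1)
        go func() {
            defer p.wg.Done()
            for range p.queue {
                time.Sleep(50 * time.Millisecond) // simulated task work
            }
        }()
    }
    return p
}

// submit enqueues without blocking; a full buffer is the condition behind
// the "task queue is full" warning discussed under Performance Tuning.
func (p *taskPool) submit(id string) bool {
    select {
    case p.queue <- id:
        log.Printf("task[%s] queued, queue usage: %d/%d", id, len(p.queue), cap(p.queue))
        return true
    default:
        log.Printf("task queue is full (%d/%d), consider increasing task.workers", cap(p.queue), cap(p.queue))
        return false
    }
}

func main() {
    p := newTaskPool(20) // task.workers: 20 -> queue capacity 100
    for i := 0; i < 30; i++ {
        p.submit(fmt.Sprintf("task-%d", i))
    }
    close(p.queue)
    p.wg.Wait()
}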

Performance Tuning

Too Few Workers (Queue Full Warnings)

WARN task queue is full (50/50), consider increasing task.workers configuration

Solution: Increase task.workers value

Too Many Workers (Resource Exhaustion)

ERROR failed to create task runner: out of memory
ERROR database connection pool exhausted

Solution: Decrease task.workers value

Optimal Configuration

INFO Task handler service started with 20 workers and queue size 100
DEBUG task[abc123] queued, queue usage: 5/100

Docker Configuration

docker-compose.yml

version: '3'
services:
  crawlab-master:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=25
      - CRAWLAB_NODE_MAXRUNNERS=100
    # ... other config

  crawlab-worker:
    image: crawlab/crawlab:latest
    environment:
      - CRAWLAB_TASK_WORKERS=30  # Workers can be different per node
      - CRAWLAB_NODE_MAXRUNNERS=150
    # ... other config

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: crawlab-config
data:
  config.yml: |
    task:
      workers: 25
    node:
      maxRunners: 100

Monitoring Worker Performance

Log Monitoring

# Monitor worker pool status
grep -E "(workers|queue usage|queue is full)" /var/log/crawlab/crawlab.log

# Count queued/finished task events as a rough throughput measure
grep -E "(task.*queued|task.*finished)" /var/log/crawlab/crawlab.log | wc -l

Metrics to Track

  • Queue utilization percentage (see the sketch below)
  • Average task execution time
  • Worker pool saturation
  • Memory usage per worker
  • Task success/failure rates
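
Queue utilization and worker pool saturation are simple ratios once you have the raw numbers, for example from the "queue usage: n/m" log lines above. A minimal sketch, with hypothetical struct and field names:

package main

import "fmt"

// poolStats holds point-in-time numbers taken from the logs above
// ("queue usage: n/m") or from whatever metrics source you have.
// The struct and field names are hypothetical.
type poolStats struct {
    queued   int // tasks currently waiting in the queue
    queueCap int // configured queue capacity (5x task.workers)
    busy     int // workers currently executing a task
    workers  int // configured task.workers
}

// queueUtilization and saturation are the first and third metrics in the
// list above; sustained values near 100% mean the pool is undersized.
func queueUtilization(s poolStats) float64 { return float64(s.queued) / float64(s.queueCap) * 100 }
func saturation(s poolStats) float64       { return float64(s.busy) / float64(s.workers) * 100 }

func main() {
    s := poolStats{queued: 85, queueCap: 100, busy: 20, workers: 20}
    fmt.Printf("queue utilization: %.0f%%\n", queueUtilization(s)) // 85%
    fmt.Printf("worker saturation: %.0f%%\n", saturation(s))       // 100%
}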

Troubleshooting

Queue Always Full

  1. Increase worker count: task.workers
  2. Review task complexity and optimize slow tasks
  3. Verify database performance
  4. Consider scaling horizontally (more nodes)

High Memory Usage

  1. Decrease worker count
  2. Optimize task memory usage
  3. Implement task batching (see the sketch after this list)
  4. Add memory monitoring alerts
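
Task batching (item 3) means grouping many small work items into one task run so that per-task overhead such as process startup, connections, and result writes is paid once per batch rather than once per item. A minimal sketch with hypothetical helper names:

package main

import "fmt"

// processBatch is a placeholder for whatever one task run does with a
// group of items (for example, writing results in a single bulk insert).
func processBatch(items []string) {
    fmt.Printf("processing batch of %d items\n", len(items))
}

// inBatches splits work into fixed-size groups so each task run handles
// `size` items instead of one, reducing per-task overhead and peak memory.
func inBatches(items []string, size int) {
    for start := 0; start < len(items); start += size {
        end := start + size
        if end > len(items) {
            end = len(items)
        }
        processBatch(items[start:end])
    }
}

func main() {
    items := make([]string, 0, 95)
    for i := 0; i < 95; i++ {
        items = append(items, fmt.Sprintf("item-%d", i))
    }
    inBatches(items, 25) // 4 batches: 25, 25, 25, 20
}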

Slow Task Processing

  1. Profile individual tasks
  2. Check database query performance
  3. Optimize external API calls
  4. Consider async task patterns
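
For items 3 and 4, a common Go pattern is to fan out a task's network calls concurrently with a bounded limit instead of issuing them sequentially, so one slow endpoint does not hold a worker slot for the sum of all its calls. The sketch below uses golang.org/x/sync/errgroup; the fetchAll helper and URLs are placeholders, not part of Crawlab.

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"

    "golang.org/x/sync/errgroup"
)

// fetchAll issues one task's HTTP calls concurrently, capped at `limit`
// in-flight requests. Placeholder example, not a Crawlab API.
func fetchAll(ctx context.Context, urls []string, limit int) error {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(limit)
    client := &http.Client{Timeout: 10 * time.Second}
    for _, u := range urls {
        u := u // capture loop variable (needed before Go 1.22)
        g.Go(func() error {
            req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
            if err != nil {
                return err
            }
            resp, err := client.Do(req)
            if err != nil {
                return err
            }
            defer resp.Body.Close()
            fmt.Println(u, resp.Status)
            return nil
        })
    }
    return g.Wait()
}

func main() {
    urls := []string{"https://example.com/a", "https://example.com/b"}
    if err := fetchAll(context.Background(), urls, 5); err != nil {
        fmt.Println("fetch error:", err)
    }
}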

Testing Configuration Changes

# Test new configuration
export CRAWLAB_TASK_WORKERS=30
./scripts/test_goroutine_fixes.sh 900 10

# Monitor during peak load
./scripts/test_goroutine_fixes.sh 3600 5

Best Practices

  1. Start Conservative: Begin with default values and monitor
  2. Load Test: Always test configuration changes under load
  3. Monitor Metrics: Track queue utilization and task throughput
  4. Scale Gradually: Increase worker count in small increments
  5. Resource Limits: Set appropriate memory/CPU limits in containers
  6. High Availability: Configure different worker counts per node type