# Crawlab MCP Server
A Model Context Protocol (MCP) server for interacting with Crawlab, a distributed web crawler management platform. This server provides tools to manage spiders, tasks, and schedules, and to monitor your Crawlab cluster through an AI assistant.
## Features

### Spider Management
- List, create, update, and delete spiders
- Run spiders with custom parameters
- Browse and edit spider files
- View spider execution history
### Task Management
- Monitor running and completed tasks
- Cancel, restart, and delete tasks
- View task logs and results
- Filter tasks by spider, status, or time range
### Schedule Management
- Create and manage cron-based schedules
- Enable/disable schedules
- View scheduled task history
### Node Monitoring
- List cluster nodes and their status
- Monitor node health and availability
### System Monitoring
- Health checks and system status
- Comprehensive cluster overview
## Installation

```bash
npm install
npm run build
```
## Usage

### Basic Usage

```bash
# Start the MCP server
mcp-server-crawlab <crawlab_url> [api_token]

# Examples:
mcp-server-crawlab http://localhost:8080
mcp-server-crawlab https://crawlab.example.com your-api-token
```
### Environment Variables

You can also set the API token via an environment variable:

```bash
export CRAWLAB_API_TOKEN=your-api-token
mcp-server-crawlab http://localhost:8080
```
### With MCP Inspector

For development and testing, you can use the MCP Inspector:

```bash
npm run inspect
```
## Integration with AI Assistants
This MCP server is designed to work with AI assistants that support the Model Context Protocol. Configure your AI assistant to connect to this server to enable Crawlab management capabilities.
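For programmatic access outside a full assistant, here is a minimal sketch using the official `@modelcontextprotocol/sdk` client to launch this server over stdio and list its tools. Run it as an ES module (e.g., a `.mjs` script); the client name and Crawlab URL are placeholders:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a child process and talk to it over stdio.
const transport = new StdioClientTransport({
  command: "mcp-server-crawlab",
  args: ["http://localhost:8080"], // placeholder Crawlab URL
});

const client = new Client({ name: "crawlab-example-client", version: "0.1.0" });
await client.connect(transport);

// Enumerate the tools the server exposes (crawlab_list_spiders, crawlab_run_spider, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

await client.close();
```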
## Available Tools

### Spider Tools
- `crawlab_list_spiders` - List all spiders with optional pagination
- `crawlab_get_spider` - Get detailed information about a specific spider
- `crawlab_create_spider` - Create a new spider
- `crawlab_update_spider` - Update spider configuration
- `crawlab_delete_spider` - Delete a spider
- `crawlab_run_spider` - Execute a spider
- `crawlab_list_spider_files` - Browse spider files and directories
- `crawlab_get_spider_file_content` - Read spider file content
- `crawlab_save_spider_file` - Save content to spider files
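As a sketch of how these tools are invoked, the snippet below runs a spider through the connected `client` from the earlier example. The `spider_id` argument name is an assumption; confirm the actual schema via `listTools()`:

```typescript
// Continuing with the connected `client` from the sketch above.
// NOTE: the "spider_id" argument name is assumed; verify it against the
// tool's input schema reported by client.listTools().
const run = await client.callTool({
  name: "crawlab_run_spider",
  arguments: { spider_id: "your-spider-id" },
});
console.log(run.content); // MCP content blocks describing the started task
```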
### Task Tools
- `crawlab_list_tasks` - List tasks with filtering options
- `crawlab_get_task` - Get detailed task information
- `crawlab_cancel_task` - Cancel a running task
- `crawlab_restart_task` - Restart a completed or failed task
- `crawlab_delete_task` - Delete a task
- `crawlab_get_task_logs` - Retrieve task execution logs
- `crawlab_get_task_results` - Get data collected by a task
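A typical debugging flow chains a filtered listing with a log fetch. The argument names (`status`, `task_id`) below are assumptions to verify against the tool schemas:

```typescript
// Assumed argument names ("status", "task_id"); verify against the tool
// schemas from client.listTools() for your server version.
const failedTasks = await client.callTool({
  name: "crawlab_list_tasks",
  arguments: { status: "error" },
});
console.log(failedTasks.content);

// Then pull the logs for a specific task to inspect what went wrong.
const logs = await client.callTool({
  name: "crawlab_get_task_logs",
  arguments: { task_id: "abc123" }, // task ID taken from the listing above
});
console.log(logs.content);
```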
### Schedule Tools
- `crawlab_list_schedules` - List all schedules
- `crawlab_get_schedule` - Get schedule details
- `crawlab_create_schedule` - Create a new cron schedule
- `crawlab_update_schedule` - Update schedule configuration
- `crawlab_delete_schedule` - Delete a schedule
- `crawlab_enable_schedule` - Enable a schedule
- `crawlab_disable_schedule` - Disable a schedule
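Creating a schedule pairs a spider with a standard cron expression. Again a sketch with assumed argument names, using the same connected `client`:

```typescript
// Assumed argument names. A standard 5-field cron expression reads:
// minute  hour  day-of-month  month  day-of-week
await client.callTool({
  name: "crawlab_create_schedule",
  arguments: {
    name: "nightly-news-crawl",   // placeholder schedule name
    spider_id: "your-spider-id",  // assumed argument name
    cron: "0 2 * * *",            // every day at 02:00
    enabled: true,
  },
});
```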
### Node Tools
- `crawlab_list_nodes` - List cluster nodes
- `crawlab_get_node` - Get node details and status
### System Tools
- `crawlab_health_check` - Check system health
- `crawlab_system_status` - Get comprehensive system overview
## Available Prompts
The server includes several helpful prompts for common workflows:
### spider-analysis
Analyze spider performance and provide optimization insights.
Parameters:
- `spider_id` (required) - ID of the spider to analyze
- `time_range` (optional) - Time range for analysis (e.g., '7d', '30d', '90d')
### task-debugging
Debug failed tasks and identify root causes.
Parameters:
- `task_id` (required) - ID of the failed task
### spider-setup
Guide for creating and configuring new spiders.
Parameters:
- `spider_name` (required) - Name for the new spider
- `target_website` (optional) - Target website to scrape
- `spider_type` (optional) - Type of spider (scrapy, selenium, custom)
### system-monitoring
Monitor system health and performance.
Parameters:
- `focus_area` (optional) - Area to focus on (nodes, tasks, storage, overall)
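Prompts are fetched much like tools. A minimal sketch with the same SDK client, using the `task_id` parameter from the list above:

```typescript
// Fetch the task-debugging prompt; "task_id" comes from the parameter list above.
const prompt = await client.getPrompt({
  name: "task-debugging",
  arguments: { task_id: "abc123" },
});

// The result is a list of chat messages used to seed the assistant.
for (const message of prompt.messages) {
  console.log(message.role, message.content);
}
```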
## Example Interactions

### Create and Run a Spider

```
AI: I'll help you create a new spider for scraping news articles.

[Uses crawlab_create_spider with appropriate parameters]
[Uses crawlab_run_spider to test the spider]
[Uses crawlab_get_task_logs to check execution]
```
### Debug a Failed Task

```
User: "My task abc123 failed, can you help me debug it?"

[Uses task-debugging prompt]
[AI retrieves task details, logs, and provides analysis]
```
### Monitor System Health

```
User: "How is my Crawlab cluster performing?"

[Uses system-monitoring prompt]
[AI provides comprehensive health overview and recommendations]
```
## Configuration

### Crawlab Setup
Ensure your Crawlab instance is reachable and, if needed, configure API authentication:

- Crawlab must be running and accessible at the URL you pass to the server
- If authentication is enabled, obtain an API token from your Crawlab instance
- Pass the token via the command-line argument or the `CRAWLAB_API_TOKEN` environment variable
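If the server can't connect, a quick reachability check helps separate network problems from token problems. This sketch assumes an `/api/spiders` endpoint and a bare-token `Authorization` header, both of which vary by Crawlab version; the `CRAWLAB_URL` variable name is also hypothetical:

```typescript
// Quick reachability check against the Crawlab API (Node.js 18+ has fetch).
// ASSUMPTIONS: the endpoint path and Authorization header format differ
// across Crawlab versions; consult your instance's API docs for real values.
const url = process.env.CRAWLAB_URL ?? "http://localhost:8080";
const token = process.env.CRAWLAB_API_TOKEN;

const response = await fetch(`${url}/api/spiders`, {
  headers: token ? { Authorization: token } : {},
});
console.log(response.ok ? "Crawlab is reachable" : `HTTP ${response.status}`);
```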
### MCP Client Configuration

Add this server to your MCP client configuration:

```json
{
  "servers": {
    "crawlab": {
      "command": "mcp-server-crawlab",
      "args": ["http://localhost:8080", "your-api-token"]
    }
  }
}
```
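To keep the token out of the `args` list, many MCP clients also accept an `env` map for the server process; whether your client supports this field is an assumption to verify:

```json
{
  "servers": {
    "crawlab": {
      "command": "mcp-server-crawlab",
      "args": ["http://localhost:8080"],
      "env": { "CRAWLAB_API_TOKEN": "your-api-token" }
    }
  }
}
```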
## Development

### Building

```bash
npm run build
```

### Watching for Changes

```bash
npm run watch
```

### Testing

```bash
npm test
```

### Linting

```bash
npm run lint
npm run lint:fix
```
## Requirements

- Node.js 18+
- A running Crawlab instance
- Network access to the Crawlab API
## License
MIT License
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Support
For issues and questions:
- Check the Crawlab documentation
- Review the MCP specification
- Open an issue in this repository