Crawlab

Celery-based web crawler admin platform for managing distributed web spiders, regardless of language or framework.

Prerequisites

  • Python3
  • MongoDB
  • Redis

Installation

pip install -r requirements.txt

Configuration

Edit the configuration file config.py to set up the API and database connections.
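
As a rough illustration, config.py might look something like the sketch below. PROJECT_SOURCE_FILE_FOLDER is referenced later in this README; the other setting names and values are illustrative assumptions, not the actual schema.

# Illustrative sketch of config.py -- only PROJECT_SOURCE_FILE_FOLDER is
# named elsewhere in this README; the other settings are assumptions.

# Web app / API
FLASK_HOST = '0.0.0.0'
FLASK_PORT = 8000

# MongoDB connection (spider and task metadata)
MONGO_HOST = 'localhost'
MONGO_PORT = 27017
MONGO_DB = 'crawlab'

# Redis connection (Celery broker and result backend)
REDIS_HOST = 'localhost'
REDIS_PORT = 6379

# Directory containing your spider projects (see "Auto Discovery" below)
PROJECT_SOURCE_FILE_FOLDER = '/path/to/spiders'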

Quick Start

# run web app
python app.py

# run flower app
python ./bin/run_flower.py

# run worker
python ./bin/run_worker.py
# TODO: frontend

Nodes

Nodes are the workers defined in Celery. A node runs continuously, connected to a task queue (Redis, for example) to receive and execute tasks. Because spiders are deployed to nodes, users should specify each node's IP address and port before deployment.
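
For intuition, here is a minimal Celery sketch (not Crawlab's actual code) of how a worker connects to Redis as its task queue; the app name and connection URLs are assumptions.

from celery import Celery

# The broker is the queue the worker listens on; the backend stores results.
app = Celery(
    'crawlab',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)

# A node then runs a worker process against this app (shell):
#   celery -A tasks worker --loglevel=info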

Spiders

Auto Discovery

In config.py, set PROJECT_SOURCE_FILE_FOLDER to the directory where your spider projects are located. The web app will discover spider projects automatically.
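
Conceptually, the discovery step could look like the sketch below, assuming each subdirectory of PROJECT_SOURCE_FILE_FOLDER is one spider project; this is an illustration, not Crawlab's actual implementation.

import os

import config  # the config.py described above

def discover_spider_projects():
    # Treat each subdirectory of the source folder as a spider project.
    folder = config.PROJECT_SOURCE_FILE_FOLDER
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    )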

Deploy Spiders

All spiders need to be deployed to a specific node before crawling. Simply click the "Deploy" button on the spider detail page and select the right node for the deployment.

Run Spiders

After deploying the spider, you can click the "Run" button on the spider detail page and select a specific node to start crawling. This triggers a crawling task, which you can inspect in detail on the tasks page.

Tasks

Tasks are triggered and run by the workers. Users can check task status and logs on the task detail page.
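
To sketch the flow with Celery, triggering a run roughly amounts to enqueuing a task and polling its state; the task name and arguments below are hypothetical, for illustration only.

from celery import Celery
from celery.result import AsyncResult

app = Celery('crawlab',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')

# Enqueue a crawl; the task name 'tasks.execute_spider' is an assumption.
result = app.send_task('tasks.execute_spider', args=['my_spider'])

# Poll the state that the task detail page would display.
print(AsyncResult(result.id, app=app).state)  # PENDING, STARTED, SUCCESS, ...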
