updated README.md

Marvin Zhang
2019-03-02 10:07:05 +08:00
parent 307a1872f1
commit d26f43e09e

@@ -1,2 +1,54 @@
# crawlab
Centralized admin platform for managing web crawlers including scrapy, pyspider, webmagic and much more
# Crawlab
Celery-based web crawler admin platform for managing distributed web spiders, regardless of language or framework.
## Prerequisites
- Python3
- MongoDB
- Redis
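Before installing, it can help to confirm that MongoDB and Redis are reachable; a minimal sketch, assuming default local ports and that the `pymongo` and `redis` Python packages are installed:
```python
# Optional connectivity check -- assumes MongoDB and Redis on their default local ports.
from pymongo import MongoClient
from redis import Redis

MongoClient('mongodb://localhost:27017', serverSelectionTimeoutMS=2000).admin.command('ping')
Redis(host='localhost', port=6379, socket_connect_timeout=2).ping()
print('MongoDB and Redis are reachable')
```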
## Installation
```bash
pip install -r requirements.txt
```
## Configure
Edit the configuration file `config.py` to set up the API and database connections.
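The exact settings live in `config.py` itself; as a rough, hypothetical sketch (setting names other than `PROJECT_SOURCE_FILE_FOLDER` are illustrative placeholders, not necessarily the real keys), the file covers roughly these areas:
```python
# Hypothetical excerpt of config.py -- consult the real file for the actual setting names.

# MongoDB, used to store spiders, deployments and task records
MONGO_HOST = 'localhost'
MONGO_PORT = 27017
MONGO_DB = 'crawlab'

# Redis, used as the Celery broker / result backend
BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/1'

# Address and port the web API listens on
API_HOST = '0.0.0.0'
API_PORT = 8000

# Directory scanned for spider projects (see "Auto Discovery" below)
PROJECT_SOURCE_FILE_FOLDER = '/path/to/your/spiders'
```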
## Quick Start
```bash
# run the web app
python app.py
# run Flower (Celery monitoring dashboard)
python ./bin/run_flower.py
# run a Celery worker (node)
python ./bin/run_worker.py
```
```bash
# TODO: frontend
```
## Nodes
Nodes are the workers defined in Celery. A node runs continuously and connects to a task queue (Redis, for example) to receive and execute tasks. Because spiders are deployed to nodes, users should specify each node's IP address and port before deployment.
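Conceptually, a node is just a Celery worker process bound to the shared broker. The following minimal sketch (hypothetical app and task names, local Redis broker; Crawlab's own worker is started via `./bin/run_worker.py` as shown in Quick Start) illustrates that relationship:
```python
# Minimal illustration of a "node": a Celery worker connected to a Redis task queue.
# App and task names are hypothetical -- this is not Crawlab's actual module layout.
from celery import Celery

app = Celery('crawler_node', broker='redis://localhost:6379/0')

@app.task
def crawl(spider_name):
    # On a real node, the deployed spider project would be executed here.
    print(f'running spider: {spider_name}')

# A worker for this node could then be started with, e.g.:
#   celery -A <this_module> worker --loglevel=info
```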
## Spiders
#### Auto Discovery
In `config.py`, set `PROJECT_SOURCE_FILE_FOLDER` to the directory where your spider projects are located. The web app will discover spider projects automatically.
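As a conceptual illustration of what auto discovery amounts to (a sketch only, not Crawlab's actual implementation), the web app essentially scans `PROJECT_SOURCE_FILE_FOLDER` and treats each sub-directory as a spider project:
```python
# Conceptual sketch of spider auto discovery -- not the actual Crawlab code.
import os

PROJECT_SOURCE_FILE_FOLDER = '/path/to/your/spiders'  # configured in config.py

def discover_spider_projects(folder=PROJECT_SOURCE_FILE_FOLDER):
    """Return sub-directory names, each treated as a spider project."""
    return [
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    ]

print(discover_spider_projects())
```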
#### Deploy Spiders
All spiders need to be deployed to a specific node before crawling. Simply click the "Deploy" button on the spider detail page and select the target node for the deployment.
#### Run Spiders
After deploying a spider, you can click the "Run" button on the spider detail page and select a specific node to start crawling. This triggers a crawling task, which you can view in detail on the tasks page.
## Tasks
Tasks are triggered and run by the workers. Users can check task status information and logs on the task detail page.
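Since the platform is Celery-based, task state can also be inspected programmatically through Celery's result API; a minimal sketch, assuming a Redis result backend and a known task id (both placeholders here):
```python
# Sketch: inspecting a task's state via Celery's result API.
# Broker/backend URLs and the task id below are placeholders.
from celery import Celery
from celery.result import AsyncResult

app = Celery('crawler_node',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')

result = AsyncResult('your-task-id', app=app)
print(result.status)  # e.g. PENDING, STARTED, SUCCESS, FAILURE
print(result.result)  # return value or exception info once the task finishes
```
Flower, started in Quick Start, offers a similar web-based view of workers and task states.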