Merge branch 'develop' of github.com:tikazyq/crawlab into develop

Marvin Zhang
2019-08-04 21:19:29 +08:00
7 changed files with 51 additions and 29 deletions

View File

@@ -52,7 +52,7 @@ docker run -d --rm --name crawlab \
Of course, you can also use `docker-compose` to start everything with a single command, without even having to configure the MongoDB and Redis databases (**and we recommend doing it this way**). Create a `docker-compose.yml` file in the current directory and enter the following content.
```bash
```yaml
version: '3.3'
services:
master:
@@ -97,49 +97,49 @@ For details on Docker deployment, please refer to the [relevant documentation](https://tikazyq.github.io/crawlab/I
#### Login
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/login.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/login.png)
#### Home Page
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/home.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/home.png)
#### Node List
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/node-list.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/node-list.png)
#### Node Network
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/node-network.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/node-network.png)
#### Spider List
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-list.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-list.png)
#### Spider Overview
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-overview.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-overview.png)
#### Spider Analytics
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-analytics.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-analytics.png)
#### Spider Files
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-file.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-file.png)
#### Task Detail - Crawl Results
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/task-results.png?v0.3.0_1">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/task-results.png)
#### Cron Job
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/schedule.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/schedule.png)
## Architecture
The architecture of Crawlab consists of a Master Node, multiple Worker Nodes, and the Redis and MongoDB databases that handle communication and data storage.
![](https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/architecture.png)
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/architecture.png)
The frontend app requests data from the Master Node, which dispatches and schedules tasks and deploys spiders through MongoDB and Redis. After receiving a task, a Worker Node starts executing the spider task and stores the results in MongoDB. Compared with the Celery-based versions before `v0.3.0`, the architecture has been streamlined: the unnecessary node monitoring module (Flower) has been removed, and node monitoring is now mainly done by Redis.

View File

@@ -53,7 +53,7 @@ docker run -d --rm --name crawlab \
Of course, you can also use `docker-compose` to start everything with a single command. By doing so, you don't even have to configure the MongoDB and Redis databases. Create a file named `docker-compose.yml` and enter the code below.
```bash
```yaml
version: '3.3'
services:
master:
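    # The hunk is truncated here; below is a hedged sketch of how the rest of
    # the file might continue. The image tag, ports, and environment variable
    # names and values are assumptions based on a typical Crawlab setup, not
    # the exact contents of the repository's docker-compose.yml.
    image: tikazyq/crawlab:latest
    environment:
      CRAWLAB_API_ADDRESS: "localhost:8000"  # where the frontend reaches the API
      CRAWLAB_MONGO_HOST: "mongo"            # assumed key for the MongoDB host
      CRAWLAB_REDIS_ADDRESS: "redis"         # assumed key for the Redis address
    ports:
      - "8080:8080"  # assumed frontend port
    depends_on:
      - mongo
      - redis
  mongo:
    image: mongo:latest
  redis:
    image: redis:latest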
@@ -95,49 +95,49 @@ For Docker Deployment details, please refer to [relevant documentation](https://
#### Login
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/login.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/login.png)
#### Home Page
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/home.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/home.png)
#### Node List
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/node-list.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/node-list.png)
#### Node Network
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/node-network.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/node-network.png)
#### Spider List
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-list.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-list.png)
#### Spider Overview
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-overview.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-overview.png)
#### Spider Analytics
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-analytics.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-analytics.png)
#### Spider Files
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/spider-file.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/spider-file.png)
#### Task Results
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/task-results.png?v0.3.0_1">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/task-results.png)
#### Cron Job
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/schedule.png?v0.3.0">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/schedule.png)
## Architecture
The architecture of Crawlab consists of a Master Node, multiple Worker Nodes, and the Redis and MongoDB databases, which mainly handle node communication and data storage.
<img src="https://crawlab.oss-cn-hangzhou.aliyuncs.com/v0.3.0/architecture.png">
![](https://raw.githubusercontent.com/tikazyq/crawlab-docs/master/images/architecture.png)
The frontend app makes requests to the Master Node, which assigns tasks and deploys spiders through MongoDB and Redis. When a Worker Node receives a task, it begins to execute the crawling task and stores the results in MongoDB. The architecture is much more concise compared with versions before `v0.3.0`: the unnecessary Flower module, which offered node monitoring services, has been removed, and node monitoring is now done by Redis.
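To make the flow concrete, here is a conceptual sketch in Go of the worker loop described above. This is not Crawlab's actual code; all function names are illustrative stand-ins for the Redis queue read, the spider execution, and the MongoDB write.
```go
package main

import "fmt"

// Task is a minimal stand-in for a Crawlab task assignment.
type Task struct {
	SpiderName string
}

// popTaskFromRedis stands in for receiving a task the Master pushed via Redis.
func popTaskFromRedis() Task {
	return Task{SpiderName: "example-spider"}
}

// runSpider stands in for executing the crawling task.
func runSpider(t Task) []string {
	return []string{"result-1", "result-2"}
}

// saveToMongo stands in for persisting results to MongoDB,
// where the frontend later queries them through the Master.
func saveToMongo(results []string) {
	fmt.Printf("saved %d items\n", len(results))
}

func main() {
	task := popTaskFromRedis()
	results := runSpider(task)
	saveToMongo(results)
}
```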
@@ -169,7 +169,7 @@ Redis is a very popular Key-Value database. It offers node communication service
### Frontend
Frontend is a SPA based on
[Vue-Element-Admin](https://github.com/PanJiaChen/vue-element-admin). It has re-used many Element-UI components to support correspoinding display.
[Vue-Element-Admin](https://github.com/PanJiaChen/vue-element-admin). It has re-used many Element-UI components to support corresponding display.
## Integration with Other Frameworks
@@ -206,7 +206,7 @@ class JuejinPipeline(object):
There are existing spider management frameworks. So why use Crawlab?
The reason is that most of the existing platforms are depending on Scrapyd, which limits the choice only within python and scrapy. Surely scrapy is a great web crawl frameowrk, but it cannot do everything.
The reason is that most of the existing platforms are depending on Scrapyd, which limits the choice only within python and scrapy. Surely scrapy is a great web crawl framework, but it cannot do everything.
Crawlab is easy to use and general enough to adapt to spiders in any language and any framework. It also has a beautiful frontend interface that lets users manage spiders much more easily.

View File

@@ -17,6 +17,7 @@ func main() {
// Initialize configuration
if err := config.InitConfig(""); err != nil {
log.Error("init config error:" + err.Error())
panic(err)
}
log.Info("初始化配置成功")
@@ -30,6 +31,7 @@ func main() {
// Initialize MongoDB database
if err := database.InitMongo(); err != nil {
log.Error("init mongodb error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -37,6 +39,7 @@ func main() {
// Initialize Redis database
if err := database.InitRedis(); err != nil {
log.Error("init redis error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -45,6 +48,7 @@ func main() {
if services.IsMaster() {
// Initialize scheduler
if err := services.InitScheduler(); err != nil {
log.Error("init scheduler error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -53,6 +57,7 @@ func main() {
// Initialize task executor
if err := services.InitTaskExecutor(); err != nil {
log.Error("init task executor error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -60,12 +65,14 @@ func main() {
// Initialize node service
if err := services.InitNodeService(); err != nil {
log.Error("init node service error:" + err.Error())
panic(err)
}
log.Info("初始化节点配置成功")
// 初始化爬虫服务
if err := services.InitSpiderService(); err != nil {
log.Error("init spider service error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -73,6 +80,7 @@ func main() {
// Initialize user service
if err := services.InitUserService(); err != nil {
log.Error("init user service error:" + err.Error())
debug.PrintStack()
panic(err)
}
@@ -91,7 +99,7 @@ func main() {
app.POST("/nodes/:id", routes.PostNode) // 修改节点
app.GET("/nodes/:id/tasks", routes.GetNodeTaskList) // 节点任务列表
app.GET("/nodes/:id/system", routes.GetSystemInfo) // 节点任务列表
app.DELETE("/nodes/:id", routes.DeleteNode) // 删除节点
app.DELETE("/nodes/:id", routes.DeleteNode) // 删除节点
// 爬虫
app.GET("/spiders", routes.GetSpiderList) // 爬虫列表
app.GET("/spiders/:id", routes.GetSpider) // 爬虫详情
@@ -138,6 +146,7 @@ func main() {
host := viper.GetString("server.host")
port := viper.GetString("server.port")
if err := app.Run(host + ":" + port); err != nil {
log.Error("run server error:" + err.Error())
panic(err)
}
}
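The repeated pattern these hunks add to each init step (log the error, print the stack trace, then panic) could be factored into a helper. A minimal runnable sketch follows; the `mustInit` helper is hypothetical and not part of Crawlab, and `fmt.Println` stands in for the project's `log.Error`:
```go
package main

import (
	"errors"
	"fmt"
	"runtime/debug"
)

// mustInit applies the fail-fast pattern from the diff to one init step:
// log the error, dump the stack for debugging, then panic so the process
// stops instead of running half-initialized.
func mustInit(name string, fn func() error) {
	if err := fn(); err != nil {
		fmt.Println("init " + name + " error:" + err.Error())
		debug.PrintStack()
		panic(err)
	}
}

func main() {
	mustInit("config", func() error { return nil })
	mustInit("mongodb", func() error { return errors.New("connection refused") })
}
```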

View File

@@ -7,7 +7,7 @@ then
else
jspath=`ls /app/dist/js/app.*.js`
cp ${jspath} ${jspath}.bak
sed -i "s/localhost:8000/${CRAWLAB_API_ADDRESS}/g" ${jspath}
sed -i "s?localhost:8000?${CRAWLAB_API_ADDRESS}?g" ${jspath}
fi
# start nginx
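The delimiter change in the `sed` hunk above matters because `CRAWLAB_API_ADDRESS` may itself contain slashes (for example, a full URL); with `/` as the delimiter those slashes terminate the expression early, while `?` leaves them literal. A small illustration, with a hypothetical address value:
```sh
CRAWLAB_API_ADDRESS="http://api.example.com/crawlab"  # hypothetical value containing slashes
# Fails: the slashes inside the variable break the s/// expression
sed -i "s/localhost:8000/${CRAWLAB_API_ADDRESS}/g" app.js
# Works: with '?' as the delimiter, '/' in the value is treated literally
sed -i "s?localhost:8000?${CRAWLAB_API_ADDRESS}?g" app.js
```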

View File

@@ -21,3 +21,6 @@ docker build -t crawlab:worker .
```
docker-compose up -d
```
If you use `docker-compose.yml` to orchestrate nodes across multiple servers, nodes may fail to register because their MAC addresses conflict.
You can use `networks` to define the IP range of the current node, so that it registers with Redis normally.

Binary file not shown.

View File

@@ -5,4 +5,14 @@ services:
container_name: crawlab-worker
volumes:
- $PWD/conf/config.yml:/opt/crawlab/conf/config.yml
- $PWD/crawlab:/usr/local/bin/crawlab
# the binary is built from the source code
- $PWD/crawlab:/usr/local/bin/crawlab
networks:
- crawlabnet
networks:
crawlabnet:
ipam:
driver: default
config:
- subnet: 172.30.0.0/16
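Following the note about MAC address conflicts above, each server can be given its own non-overlapping subnet. A hedged sketch for a second server; the subnet value is an arbitrary assumption and only needs to differ from the first server's `172.30.0.0/16`:
```yaml
networks:
  crawlabnet:
    ipam:
      driver: default
      config:
        - subnet: 172.31.0.0/16  # differs from the first server's 172.30.0.0/16
```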