diff --git a/CHANGELOG-zh.md b/CHANGELOG-zh.md new file mode 100644 index 00000000..c00c4fc1 --- /dev/null +++ b/CHANGELOG-zh.md @@ -0,0 +1,149 @@ +# 0.4.3 (2020-01-07) + +### 功能 / 优化 +- **依赖安装**. 允许用户在平台 Web 界面安装/卸载依赖以及添加编程语言(暂时只有 Node.js)。 +- **Docker 中预装编程语言**. 允许 Docker 用户通过设置 `CRAWLAB_SERVER_LANG_NODE` 为 `Y` 来预装 `Node.js` 环境. +- **在爬虫详情页添加定时任务列表**. 允许用户在爬虫详情页查看、添加、编辑定时任务. [#360](https://github.com/crawlab-team/crawlab/issues/360) +- **Cron 表达式与 Linux 一致**. 将表达式从 6 元素改为 5 元素,与 Linux 一致. +- **启用/禁用定时任务**. 允许用户启用/禁用定时任务. [#297](https://github.com/crawlab-team/crawlab/issues/297) +- **优化任务管理**. 允许用户批量删除任务. [#341](https://github.com/crawlab-team/crawlab/issues/341) +- **优化爬虫管理**. 允许用户在爬虫列表页对爬虫进行筛选和排序. +- **添加中文版 `CHANGELOG`**. +- **在顶部添加 Github 加星按钮**. + +### Bug 修复 +- **定时任务问题**. [#423](https://github.com/crawlab-team/crawlab/issues/423) +- **上传爬虫zip文件问题**. [#403](https://github.com/crawlab-team/crawlab/issues/403) [#407](https://github.com/crawlab-team/crawlab/issues/407) +- **因为网络原因导致崩溃**. [#340](https://github.com/crawlab-team/crawlab/issues/340) + +# 0.4.2 (2019-12-26) +### 功能 / 优化 +- **免责声明**. 加入免责声明. +- **通过 API 获取版本号**. [#371](https://github.com/crawlab-team/crawlab/issues/371) +- **通过配置来允许用户注册**. [#346](https://github.com/crawlab-team/crawlab/issues/346) +- **允许添加新用户**. +- **更高级的文件管理**. 允许用户添加、编辑、重命名、删除代码文件. [#286](https://github.com/crawlab-team/crawlab/issues/286) +- **优化爬虫创建流程**. 允许用户在上传 zip 文件前创建空的自定义爬虫. +- **优化任务管理**. 允许用户通过选择条件过滤任务. [#341](https://github.com/crawlab-team/crawlab/issues/341) + +### Bug 修复 +- **重复节点**. [#391](https://github.com/crawlab-team/crawlab/issues/391) +- **"mongodb no reachable" 错误**. [#373](https://github.com/crawlab-team/crawlab/issues/373) + +# 0.4.1 (2019-12-13) +### 功能 / 优化 +- **Spiderfile 优化**. 将阶段由数组更换为字典. [#358](https://github.com/crawlab-team/crawlab/issues/358) +- **百度统计更新**. + +### Bug 修复 +- **无法展示定时任务**. [#353](https://github.com/crawlab-team/crawlab/issues/353) +- **重复节点注册**. 
[#334](https://github.com/crawlab-team/crawlab/issues/334) + +# 0.4.0 (2019-12-06) +### 功能 / 优化 +- **可配置爬虫**. 允许用户添加 `Spiderfile` 来配置抓取规则. +- **执行模式**. 允许用户选择 3 种任务执行模式: *所有节点*、*指定节点*、*随机*. + +### Bug 修复 +- **任务意外被杀死**. [#306](https://github.com/crawlab-team/crawlab/issues/306) +- **文档更正**. [#258](https://github.com/crawlab-team/crawlab/issues/258) +- **直接部署与 Windows 不兼容**. [#288](https://github.com/crawlab-team/crawlab/issues/288) +- **日志文件丢失**. [#269](https://github.com/crawlab-team/crawlab/issues/269) + +# 0.3.5 (2019-10-28) +### 功能 / 优化 +- **优雅关闭**. [详情](https://github.com/crawlab-team/crawlab/commit/63fab3917b5a29fd9770f9f51f1572b9f0420385) +- **节点信息优化**. [详情](https://github.com/crawlab-team/crawlab/commit/973251a0fbe7a2184ac0da09e0404a17c736aee7) +- **将系统环境变量添加到任务**. [详情](https://github.com/crawlab-team/crawlab/commit/4ab4892471965d6342d30385578ca60dc51f8ad3) +- **自动刷新任务日志**. [详情](https://github.com/crawlab-team/crawlab/commit/4ab4892471965d6342d30385578ca60dc51f8ad3) +- **允许 HTTPS 部署**. [详情](https://github.com/crawlab-team/crawlab/commit/5d8f6f0c56768a6e58f5e46cbf5adff8c7819228) + +### Bug 修复 +- **定时任务中无法获取爬虫列表**. [详情](https://github.com/crawlab-team/crawlab/commit/311f72da19094e3fa05ab4af49812f58843d8d93) +- **无法获取工作节点信息**. [详情](https://github.com/crawlab-team/crawlab/commit/6af06efc17685a9e232e8c2b5fd819ec7d2d1674) +- **运行爬虫任务时无法选择节点**. [详情](https://github.com/crawlab-team/crawlab/commit/31f8e03234426e97aed9b0bce6a50562f957edad) +- **结果量很大时无法获取结果数量**. [#260](https://github.com/crawlab-team/crawlab/issues/260) +- **定时任务中的节点问题**. [#244](https://github.com/crawlab-team/crawlab/issues/244) + + +# 0.3.1 (2019-08-25) +### 功能 / 优化 +- **Docker 镜像优化**. 将 Docker 镜像进一步分割成 alpine 镜像版本的 master、worker、frontend. +- **单元测试**. 用单元测试覆盖部分后端代码. +- **前端优化**. 登录页、按钮大小、上传 UI 提示. +- **更灵活的节点注册**. 允许用户传一个变量作为注册 key,而不是默认的 MAC 地址. + +### Bug 修复 +- **上传大爬虫文件错误**. 
上传大爬虫文件时的内存崩溃问题. [#150](https://github.com/crawlab-team/crawlab/issues/150) +- **无法同步爬虫**. 通过提高写权限等级来修复同步爬虫文件时的问题. [#114](https://github.com/crawlab-team/crawlab/issues/114) +- **爬虫页问题**. 通过删除 `Site` 字段来修复. [#112](https://github.com/crawlab-team/crawlab/issues/112) +- **节点展示问题**. 当在多个机器上跑 Docker 容器时,节点无法正确展示. [#99](https://github.com/crawlab-team/crawlab/issues/99) + +# 0.3.0 (2019-07-31) +### 功能 / 优化 +- **Golang 后端**: 将后端由 Python 重构为 Golang,大大提高了稳定性和性能. +- **节点网络图**: 节点拓扑图可视化. +- **节点系统信息**: 可以查看包括操作系统、CPU 数量、可执行文件在内的系统信息. +- **节点监控改进**: 节点通过 Redis 来监控和注册. +- **文件管理**: 可以在线编辑爬虫文件,包括代码高亮. +- **登录页/注册页/用户管理**: 要求用户登录后才能使用 Crawlab,允许用户注册和管理用户,并包含一些基于角色的鉴权机制. +- **自动部署爬虫**: 爬虫将被自动部署或同步到所有在线节点. +- **更小的 Docker 镜像**: 瘦身版 Docker 镜像,通过多阶段构建将 Docker 镜像大小从 1.3G 减小到 700M 左右. + +### Bug 修复 +- **节点状态**. 节点状态不会随着节点下线而更新. [#87](https://github.com/tikazyq/crawlab/issues/87) +- **爬虫部署错误**. 通过自动爬虫部署来修复 [#83](https://github.com/tikazyq/crawlab/issues/83) +- **节点无法显示**. 节点无法显示在线 [#81](https://github.com/tikazyq/crawlab/issues/81) +- **定时任务无法工作**. 通过 Golang 后端修复 [#64](https://github.com/tikazyq/crawlab/issues/64) +- **Flower 错误**. 通过 Golang 后端修复 [#57](https://github.com/tikazyq/crawlab/issues/57) + +# 0.2.4 (2019-07-07) +### 功能 / 优化 +- **文档**: 更优和更详细的文档. +- **更好的 Crontab**: 通过 UI 界面生成 Cron 表达式. +- **更优的性能**: 从原生 Flask 引擎切换到 `gunicorn`. [#78](https://github.com/tikazyq/crawlab/issues/78) + +### Bug 修复 +- **删除爬虫**. 删除爬虫时不仅在数据库中删除,还会删除相关的文件夹、任务和定时任务. [#69](https://github.com/tikazyq/crawlab/issues/69) +- **MongoDB 授权**. 允许用户注明 `authenticationDatabase` 来连接 `mongodb`. [#68](https://github.com/tikazyq/crawlab/issues/68) +- **Windows 兼容性**. 加入 `eventlet` 到 `requirements.txt`. [#59](https://github.com/tikazyq/crawlab/issues/59) + + +# 0.2.3 (2019-06-12) +### 功能 / 优化 +- **Docker**: 用户能够运行 Docker 镜像来加快部署. +- **CLI**: 允许用户通过命令行来执行 Crawlab 程序. +- **上传爬虫**: 允许用户上传自定义爬虫到 Crawlab. +- **预览时编辑字段**: 允许用户在可配置爬虫中预览数据时编辑字段. + +### Bug 修复 +- **爬虫分页**. 修复爬虫列表页的分页问题. 
+ +# 0.2.2 (2019-05-30) +### 功能 / 优化 +- **自动抓取字段**: 在可配置爬虫列表页中自动抓取字段. +- **下载结果**: 允许下载结果为 CSV 文件. +- **百度统计**: 允许用户选择是否向百度统计发送统计数据. + +### Bug 修复 +- **结果页分页**. [#45](https://github.com/tikazyq/crawlab/issues/45) +- **定时任务重复触发**: 将 Flask DEBUG 设置为 False 以防止定时任务重复触发. [#32](https://github.com/tikazyq/crawlab/issues/32) +- **前端环境**: 添加 `VUE_APP_BASE_URL` 作为生产环境模式变量,这样 API 地址就不会始终指向 `localhost` [#30](https://github.com/tikazyq/crawlab/issues/30) + +# 0.2.1 (2019-05-27) +- **可配置爬虫**: 允许用户创建爬虫来抓取数据,而不用编写代码. + +# 0.2 (2019-05-10) + +- **高级数据统计**: 爬虫详情页的高级数据统计. +- **网站数据**: 加入网站列表(中国),允许用户查看 robots.txt、首页响应时间等信息. + +# 0.1.1 (2019-04-23) + +- **基础统计**: 用户可以查看基础统计数据,包括爬虫和任务页中的失败任务数、结果数. +- **近实时任务信息**: 周期性(5 秒)向服务器轮询数据来实现近实时查看任务信息. +- **定时任务**: 利用 apscheduler 实现定时任务,允许用户设置类似 Cron 的定时任务. + +# 0.1 (2019-04-17) + +- **首次发布** diff --git a/CHANGELOG.md b/CHANGELOG.md index aa2682ce..6c64fbd8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,21 @@ +# 0.4.3 (2020-01-07) + +### Features / Enhancement +- **Dependency Installation**. Allow users to install/uninstall dependencies and add programming languages (Node.js only for now) on the platform web interface. +- **Pre-install Programming Languages in Docker**. Allow Docker users to set `CRAWLAB_SERVER_LANG_NODE` as `Y` to pre-install `Node.js` environments. +- **Add Schedule List in Spider Detail Page**. Allow users to view / add / edit schedule cron jobs in the spider detail page. [#360](https://github.com/crawlab-team/crawlab/issues/360) +- **Align Cron Expression with Linux**. Changed the cron expression from 6 fields to 5 fields to align with Linux. +- **Enable/Disable Schedule Cron**. Allow users to enable/disable schedule jobs. [#297](https://github.com/crawlab-team/crawlab/issues/297) +- **Better Task Management**. Allow users to batch delete tasks. [#341](https://github.com/crawlab-team/crawlab/issues/341) +- **Better Spider Management**. Allow users to sort and filter spiders in the spider list page. 
+- **Added Chinese `CHANGELOG`**. +- **Added Github Star Button at Nav Bar**. + +### Bug Fixes +- **Schedule Cron Task Issue**. [#423](https://github.com/crawlab-team/crawlab/issues/423) +- **Upload Spider Zip File Issue**. [#403](https://github.com/crawlab-team/crawlab/issues/403) [#407](https://github.com/crawlab-team/crawlab/issues/407) +- **Exit due to Network Failure**. [#340](https://github.com/crawlab-team/crawlab/issues/340) + # 0.4.2 (2019-12-26) ### Features / Enhancement - **Disclaimer**. Added page for Disclaimer. diff --git a/Dockerfile b/Dockerfile index cf8ab174..73883c64 100644 --- a/Dockerfile +++ b/Dockerfile @@ -59,4 +59,4 @@ EXPOSE 8080 EXPOSE 8000 # start backend -CMD ["/bin/sh", "/app/docker_init.sh"] +CMD ["/bin/bash", "/app/docker_init.sh"] diff --git a/Dockerfile.local b/Dockerfile.local index d99010a4..59b8736d 100644 --- a/Dockerfile.local +++ b/Dockerfile.local @@ -57,4 +57,4 @@ EXPOSE 8080 EXPOSE 8000 # start backend -CMD ["/bin/sh", "/app/docker_init.sh"] +CMD ["/bin/bash", "/app/docker_init.sh"] diff --git a/README-zh.md b/README-zh.md index 5b9acf29..9057fcc3 100644 --- a/README-zh.md +++ b/README-zh.md @@ -1,16 +1,16 @@ # Crawlab - - + + +  - - - + +  中文 | [English](https://github.com/crawlab-team/crawlab) -[安装](#安装) | [运行](#运行) | [截图](#截图) | [架构](#架构) | [集成](#与其他框架的集成) | [比较](#与其他框架比较) | [相关文章](#相关文章) | [社区&赞助](#社区--赞助) | [免责声明](https://github.com/crawlab-team/crawlab/blob/master/DISCLAIMER-zh.md) +[安装](#安装) | [运行](#运行) | [截图](#截图) | [架构](#架构) | [集成](#与其他框架的集成) | [比较](#与其他框架比较) | [相关文章](#相关文章) | [社区&赞助](#社区--赞助) | [更新日志](https://github.com/crawlab-team/crawlab/blob/master/CHANGELOG-zh.md) | [免责声明](https://github.com/crawlab-team/crawlab/blob/master/DISCLAIMER-zh.md) 基于Golang的分布式爬虫管理平台,支持Python、NodeJS、Go、Java、PHP等多种编程语言以及多种爬虫框架。 @@ -19,9 +19,9 @@ ## 安装 三种方式: -1. [Docker](https://tikazyq.github.io/crawlab-docs/Installation/Docker.html)(推荐) -2. [直接部署](https://tikazyq.github.io/crawlab-docs/Installation/Direct.html)(了解内核) -3. 
[Kubernetes](https://mp.weixin.qq.com/s/3Q1BQATUIEE_WXcHPqhYbA) +1. [Docker](http://docs.crawlab.cn/Installation/Docker.html)(推荐) +2. [直接部署](http://docs.crawlab.cn/Installation/Direct.html)(了解内核) +3. [Kubernetes](https://juejin.im/post/5e0a02d851882549884c27ad) (多节点部署) ### 要求(Docker) - Docker 18.03+ @@ -31,9 +31,17 @@ ### 要求(直接部署) - Go 1.12+ - Node 8.12+ -- Redis +- Redis 5.x+ - MongoDB 3.6+ +## 快速开始 + +```bash +git clone https://github.com/crawlab-team/crawlab +cd crawlab +docker-compose up -d +``` + ## 运行 ### Docker @@ -123,6 +131,10 @@ Docker部署的详情,请见[相关文档](https://tikazyq.github.io/crawlab-d  +#### 依赖安装 + + + ## 架构 Crawlab的架构包括了一个主节点(Master Node)和多个工作节点(Worker Node),以及负责通信和数据储存的Redis和MongoDB数据库。 diff --git a/README.md b/README.md index 7b7c3d2d..075a80b5 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,16 @@ # Crawlab - - + + +  - - - + +  [中文](https://github.com/crawlab-team/crawlab/blob/master/README-zh.md) | English -[Installation](#installation) | [Run](#run) | [Screenshot](#screenshot) | [Architecture](#architecture) | [Integration](#integration-with-other-frameworks) | [Compare](#comparison-with-other-frameworks) | [Community & Sponsorship](#community--sponsorship) | [Disclaimer](https://github.com/crawlab-team/crawlab/blob/master/DISCLAIMER.md) +[Installation](#installation) | [Run](#run) | [Screenshot](#screenshot) | [Architecture](#architecture) | [Integration](#integration-with-other-frameworks) | [Compare](#comparison-with-other-frameworks) | [Community & Sponsorship](#community--sponsorship) | [CHANGELOG](https://github.com/crawlab-team/crawlab/blob/master/CHANGELOG.md) | [Disclaimer](https://github.com/crawlab-team/crawlab/blob/master/DISCLAIMER.md) Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. 
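The Quick Start sections added in this diff assume a `docker-compose.yml` at the repository root. The sketch below shows the typical shape of such a file for Crawlab; the image name, environment variable names, and service layout are assumptions here and should be checked against the repo's actual `docker-compose.yml`:

```yaml
version: '3.3'
services:
  master:
    image: tikazyq/crawlab:latest   # image name assumed; verify against the repo
    environment:
      CRAWLAB_SERVER_MASTER: "Y"    # run this container as the master node
      CRAWLAB_MONGO_HOST: "mongo"   # point at the sibling services below
      CRAWLAB_REDIS_ADDRESS: "redis"
    ports:
      - "8080:8080"                 # web UI
    depends_on:
      - mongo
      - redis
  mongo:
    image: mongo:3.6
  redis:
    image: redis:5
```

With a file like this in place, `docker-compose up -d` from the Quick Start brings up the master node together with its MongoDB and Redis dependencies.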
@@ -19,9 +19,9 @@ ## Installation Three methods: -1. [Docker](https://tikazyq.github.io/crawlab-docs/Installation/Docker.html) (Recommended) -2. [Direct Deploy](https://tikazyq.github.io/crawlab-docs/Installation/Direct.html) (Check Internal Kernel) -3. [Kubernetes](https://mp.weixin.qq.com/s/3Q1BQATUIEE_WXcHPqhYbA) +1. [Docker](http://docs.crawlab.cn/Installation/Docker.html) (Recommended) +2. [Direct Deploy](http://docs.crawlab.cn/Installation/Direct.html) (Check Internal Kernel) +3. [Kubernetes](https://juejin.im/post/5e0a02d851882549884c27ad) (Multi-Node Deployment) ### Pre-requisite (Docker) - Docker 18.03+ @@ -31,9 +31,17 @@ ### Pre-requisite (Direct Deploy) - Go 1.12+ - Node 8.12+ -- Redis +- Redis 5.x+ - MongoDB 3.6+ +## Quick Start + +```bash +git clone https://github.com/crawlab-team/crawlab +cd crawlab +docker-compose up -d +``` + ## Run ### Docker @@ -121,6 +129,10 @@ For Docker Deployment details, please refer to [relevant documentation](https://  +#### Dependency Installation + + + ## Architecture The architecture of Crawlab consists of the Master Node and multiple Worker Nodes, and Redis and MongoDB databases which are mainly for node communication and data storage. 
diff --git a/backend/conf/config.yml b/backend/conf/config.yml index 17dc9daf..9fb0dee5 100644 --- a/backend/conf/config.yml +++ b/backend/conf/config.yml @@ -26,12 +26,15 @@ server: # mac地址 或者 ip地址,如果是ip,则需要手动指定IP type: "mac" ip: "" + lang: # 安装语言环境, Y 为安装,N 为不安装,只对 Docker 有效 + python: "Y" + node: "N" spider: path: "./spiders" task: workers: 4 other: - tmppath: "./tmp" -version: 0.4.1 + tmppath: "/tmp" +version: 0.4.3 setting: allowRegister: "N" diff --git a/backend/constants/common.go b/backend/constants/common.go new file mode 100644 index 00000000..9ac6cdbc --- /dev/null +++ b/backend/constants/common.go @@ -0,0 +1,6 @@ +package constants + +const ( + ASCENDING = "ascending" + DESCENDING = "descending" +) diff --git a/backend/constants/rpc.go b/backend/constants/rpc.go new file mode 100644 index 00000000..6eebf0d5 --- /dev/null +++ b/backend/constants/rpc.go @@ -0,0 +1,9 @@ +package constants + +const ( + RpcInstallLang = "install_lang" + RpcInstallDep = "install_dep" + RpcUninstallDep = "uninstall_dep" + RpcGetDepList = "get_dep_list" + RpcGetInstalledDepList = "get_installed_dep_list" +) diff --git a/backend/constants/schedule.go b/backend/constants/schedule.go index c3104601..520626a9 100644 --- a/backend/constants/schedule.go +++ b/backend/constants/schedule.go @@ -1,7 +1,7 @@ package constants const ( - ScheduleStatusStop = "stop" + ScheduleStatusStop = "stopped" ScheduleStatusRunning = "running" ScheduleStatusError = "error" diff --git a/backend/constants/system.go b/backend/constants/system.go index 70d41063..bec8b8c5 100644 --- a/backend/constants/system.go +++ b/backend/constants/system.go @@ -8,6 +8,6 @@ const ( const ( Python = "python" - NodeJS = "node" + Nodejs = "node" Java = "java" ) diff --git a/backend/database/mongo.go b/backend/database/mongo.go index d646285d..5d205ae4 100644 --- a/backend/database/mongo.go +++ b/backend/database/mongo.go @@ -93,5 +93,14 @@ func InitMongo() error { // 赋值给全局mongo session Session = sess } + //Add Unique index 
for 'key' + keyIndex := mgo.Index{ + Key: []string{"key"}, + Unique: true, + } + s, c := GetCol("nodes") + defer s.Close() + c.EnsureIndex(keyIndex) + return nil } diff --git a/backend/database/redis.go b/backend/database/redis.go index b165aaa3..4ecabbbd 100644 --- a/backend/database/redis.go +++ b/backend/database/redis.go @@ -36,6 +36,19 @@ func (r *Redis) RPush(collection string, value interface{}) error { defer utils.Close(c) if _, err := c.Do("RPUSH", collection, value); err != nil { + log.Error(err.Error()) + debug.PrintStack() + return err + } + return nil +} + +func (r *Redis) LPush(collection string, value interface{}) error { + c := r.pool.Get() + defer utils.Close(c) + + if _, err := c.Do("LPUSH", collection, value); err != nil { + log.Error(err.Error()) debug.PrintStack() return err } @@ -58,6 +71,7 @@ func (r *Redis) HSet(collection string, key string, value string) error { defer utils.Close(c) if _, err := c.Do("HSET", collection, key, value); err != nil { + log.Error(err.Error()) debug.PrintStack() return err } @@ -70,6 +84,8 @@ func (r *Redis) HGet(collection string, key string) (string, error) { value, err2 := redis.String(c.Do("HGET", collection, key)) if err2 != nil { + log.Error(err2.Error()) + debug.PrintStack() return value, err2 } return value, nil @@ -80,6 +96,8 @@ func (r *Redis) HDel(collection string, key string) error { defer utils.Close(c) if _, err := c.Do("HDEL", collection, key); err != nil { + log.Error(err.Error()) + debug.PrintStack() return err } return nil @@ -91,11 +109,29 @@ func (r *Redis) HKeys(collection string) ([]string, error) { value, err2 := redis.Strings(c.Do("HKeys", collection)) if err2 != nil { + log.Error(err2.Error()) + debug.PrintStack() return []string{}, err2 } return value, nil } +func (r *Redis) BRPop(collection string, timeout int) (string, error) { + if timeout <= 0 { + timeout = 60 + } + c := r.pool.Get() + defer utils.Close(c) + + values, err := redis.Strings(c.Do("BRPOP", collection, timeout)) + if err 
!= nil { + log.Error(err.Error()) + debug.PrintStack() + return "", err + } + return values[1], nil +} + func NewRedisPool() *redis.Pool { var address = viper.GetString("redis.address") var port = viper.GetString("redis.port") @@ -112,8 +148,8 @@ func NewRedisPool() *redis.Pool { Dial: func() (conn redis.Conn, e error) { return redis.DialURL(url, redis.DialConnectTimeout(time.Second*10), - redis.DialReadTimeout(time.Second*10), - redis.DialWriteTimeout(time.Second*15), + redis.DialReadTimeout(time.Second*600), + redis.DialWriteTimeout(time.Second*10), ) }, TestOnBorrow: func(c redis.Conn, t time.Time) error { diff --git a/backend/main.go b/backend/main.go index 6a807331..08cdf70f 100644 --- a/backend/main.go +++ b/backend/main.go @@ -110,12 +110,20 @@ func main() { // 初始化依赖服务 if err := services.InitDepsFetcher(); err != nil { - log.Error("init user service error:" + err.Error()) + log.Error("init dependency fetcher error:" + err.Error()) debug.PrintStack() panic(err) } log.Info("initialized dependency fetcher successfully") + // 初始化RPC服务 + if err := services.InitRpcService(); err != nil { + log.Error("init rpc service error:" + err.Error()) + debug.PrintStack() + panic(err) + } + log.Info("initialized rpc service successfully") + // 以下为主节点服务 if model.IsMaster() { // 中间件 @@ -139,6 +147,9 @@ func main() { authGroup.GET("/nodes/:id/langs", routes.GetLangList) // 节点语言环境列表 authGroup.GET("/nodes/:id/deps", routes.GetDepList) // 节点第三方依赖列表 authGroup.GET("/nodes/:id/deps/installed", routes.GetInstalledDepList) // 节点已安装第三方依赖列表 + authGroup.POST("/nodes/:id/deps/install", routes.InstallDep) // 节点安装依赖 + authGroup.POST("/nodes/:id/deps/uninstall", routes.UninstallDep) // 节点卸载依赖 + authGroup.POST("/nodes/:id/langs/install", routes.InstallLang) // 节点安装语言 // 爬虫 authGroup.GET("/spiders", routes.GetSpiderList) // 爬虫列表 authGroup.GET("/spiders/:id", routes.GetSpider) // 爬虫详情 @@ -157,7 +168,7 @@ func main() { authGroup.POST("/spiders/:id/file/rename", routes.RenameSpiderFile) // 爬虫文件重命名 
authGroup.GET("/spiders/:id/dir", routes.GetSpiderDir) // 爬虫目录 authGroup.GET("/spiders/:id/stats", routes.GetSpiderStats) // 爬虫统计数据 - authGroup.GET("/spider/types", routes.GetSpiderTypes) // 爬虫类型 + authGroup.GET("/spiders/:id/schedules", routes.GetSpiderSchedules) // 爬虫定时任务 // 可配置爬虫 authGroup.GET("/config_spiders/:id/config", routes.GetConfigSpiderConfig) // 获取可配置爬虫配置 authGroup.POST("/config_spiders/:id/config", routes.PostConfigSpiderConfig) // 更改可配置爬虫配置 @@ -178,13 +189,13 @@ func main() { authGroup.GET("/tasks/:id/results", routes.GetTaskResults) // 任务结果 authGroup.GET("/tasks/:id/results/download", routes.DownloadTaskResultsCsv) // 下载任务结果 // 定时任务 - authGroup.GET("/schedules", routes.GetScheduleList) // 定时任务列表 - authGroup.GET("/schedules/:id", routes.GetSchedule) // 定时任务详情 - authGroup.PUT("/schedules", routes.PutSchedule) // 创建定时任务 - authGroup.POST("/schedules/:id", routes.PostSchedule) // 修改定时任务 - authGroup.DELETE("/schedules/:id", routes.DeleteSchedule) // 删除定时任务 - authGroup.POST("/schedules/:id/stop", routes.StopSchedule) // 停止定时任务 - authGroup.POST("/schedules/:id/run", routes.RunSchedule) // 运行定时任务 + authGroup.GET("/schedules", routes.GetScheduleList) // 定时任务列表 + authGroup.GET("/schedules/:id", routes.GetSchedule) // 定时任务详情 + authGroup.PUT("/schedules", routes.PutSchedule) // 创建定时任务 + authGroup.POST("/schedules/:id", routes.PostSchedule) // 修改定时任务 + authGroup.DELETE("/schedules/:id", routes.DeleteSchedule) // 删除定时任务 + authGroup.POST("/schedules/:id/disable", routes.DisableSchedule) // 禁用定时任务 + authGroup.POST("/schedules/:id/enable", routes.EnableSchedule) // 启用定时任务 // 统计数据 authGroup.GET("/stats/home", routes.GetHomeStats) // 首页统计数据 // 用户 @@ -196,7 +207,8 @@ func main() { // release版本 authGroup.GET("/version", routes.GetVersion) // 获取发布的版本 // 系统 - authGroup.GET("/system/deps", routes.GetAllDepList) // 节点所有第三方依赖列表 + authGroup.GET("/system/deps/:lang", routes.GetAllDepList) // 节点所有第三方依赖列表 + authGroup.GET("/system/deps/:lang/:dep_name/json", routes.GetDepJson) // 
节点第三方依赖JSON } } diff --git a/backend/model/node.go b/backend/model/node.go index effbfbd0..88c4ed66 100644 --- a/backend/model/node.go +++ b/backend/model/node.go @@ -173,8 +173,8 @@ func GetNode(id bson.ObjectId) (Node, error) { defer s.Close() if err := c.FindId(id).One(&node); err != nil { - log.Errorf("get node error: %s, id: %s", err.Error(), id.Hex()) - debug.PrintStack() + //log.Errorf("get node error: %s, id: %s", err.Error(), id.Hex()) + //debug.PrintStack() return node, err } return node, nil diff --git a/backend/model/schedule.go b/backend/model/schedule.go index c1923885..3b654b74 100644 --- a/backend/model/schedule.go +++ b/backend/model/schedule.go @@ -16,20 +16,17 @@ type Schedule struct { Name string `json:"name" bson:"name"` Description string `json:"description" bson:"description"` SpiderId bson.ObjectId `json:"spider_id" bson:"spider_id"` - //NodeId bson.ObjectId `json:"node_id" bson:"node_id"` - //NodeKey string `json:"node_key" bson:"node_key"` Cron string `json:"cron" bson:"cron"` EntryId cron.EntryID `json:"entry_id" bson:"entry_id"` Param string `json:"param" bson:"param"` RunType string `json:"run_type" bson:"run_type"` NodeIds []bson.ObjectId `json:"node_ids" bson:"node_ids"` - - // 状态 - Status string `json:"status" bson:"status"` + Status string `json:"status" bson:"status"` + Enabled bool `json:"enabled" bson:"enabled"` // 前端展示 SpiderName string `json:"spider_name" bson:"spider_name"` - NodeName string `json:"node_name" bson:"node_name"` + Nodes []Node `json:"nodes" bson:"nodes"` Message string `json:"message" bson:"message"` CreateTs time.Time `json:"create_ts" bson:"create_ts"` @@ -84,20 +81,15 @@ func GetScheduleList(filter interface{}) ([]Schedule, error) { var schs []Schedule for _, schedule := range schedules { - // TODO: 获取节点名称 - //if schedule.NodeId == bson.ObjectIdHex(constants.ObjectIdNull) { - // // 选择所有节点 - // schedule.NodeName = "All Nodes" - //} else { - // // 选择单一节点 - // node, err := GetNode(schedule.NodeId) - // if err != 
nil { - // schedule.Status = constants.ScheduleStatusError - // schedule.Message = constants.ScheduleStatusErrorNotFoundNode - // } else { - // schedule.NodeName = node.Name - // } - //} + // 获取节点名称 + schedule.Nodes = []Node{} + if schedule.RunType == constants.RunTypeSelectedNodes { + for _, nodeId := range schedule.NodeIds { + // 选择单一节点 + node, _ := GetNode(nodeId) + schedule.Nodes = append(schedule.Nodes, node) + } + } // 获取爬虫名称 spider, err := GetSpider(schedule.SpiderId) diff --git a/backend/model/spider.go b/backend/model/spider.go index 02c3aa8d..3026a66b 100644 --- a/backend/model/spider.go +++ b/backend/model/spider.go @@ -107,13 +107,13 @@ func (spider *Spider) Delete() error { } // 获取爬虫列表 -func GetSpiderList(filter interface{}, skip int, limit int) ([]Spider, int, error) { +func GetSpiderList(filter interface{}, skip int, limit int, sortStr string) ([]Spider, int, error) { s, c := database.GetCol("spiders") defer s.Close() // 获取爬虫列表 var spiders []Spider - if err := c.Find(filter).Skip(skip).Limit(limit).Sort("+name").All(&spiders); err != nil { + if err := c.Find(filter).Skip(skip).Limit(limit).Sort(sortStr).All(&spiders); err != nil { debug.PrintStack() return spiders, 0, err } @@ -275,27 +275,7 @@ func GetSpiderCount() (int, error) { return count, nil } -// 获取爬虫类型 -func GetSpiderTypes() ([]*entity.SpiderType, error) { - s, c := database.GetCol("spiders") - defer s.Close() - - group := bson.M{ - "$group": bson.M{ - "_id": "$type", - "count": bson.M{"$sum": 1}, - }, - } - var types []*entity.SpiderType - if err := c.Pipe([]bson.M{group}).All(&types); err != nil { - log.Errorf("get spider types error: %s", err.Error()) - debug.PrintStack() - return nil, err - } - - return types, nil -} - +// 获取爬虫定时任务 func GetConfigSpiderData(spider Spider) (entity.ConfigSpiderData, error) { // 构造配置数据 configData := entity.ConfigSpiderData{} diff --git a/backend/model/task.go b/backend/model/task.go index 299661ed..6762bd54 100644 --- a/backend/model/task.go +++ 
b/backend/model/task.go @@ -117,18 +117,12 @@ func GetTaskList(filter interface{}, skip int, limit int, sortKey string) ([]Tas for i, task := range tasks { // 获取爬虫名称 - spider, err := task.GetSpider() - if err != nil || spider.Id.Hex() == "" { - _ = spider.Delete() - } else { + if spider, err := task.GetSpider(); err == nil { tasks[i].SpiderName = spider.DisplayName } // 获取节点名称 - node, err := task.GetNode() - if node.Id.Hex() == "" || err != nil { - _ = task.Delete() - } else { + if node, err := task.GetNode(); err == nil { tasks[i].NodeName = node.Name } } @@ -142,6 +136,8 @@ func GetTaskListTotal(filter interface{}) (int, error) { var result int result, err := c.Find(filter).Count() if err != nil { + log.Errorf(err.Error()) + debug.PrintStack() return result, err } return result, nil @@ -168,6 +164,8 @@ func AddTask(item Task) error { item.UpdateTs = time.Now() if err := c.Insert(&item); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() return err } return nil @@ -179,6 +177,8 @@ func RemoveTask(id string) error { var result Task if err := c.FindId(id).One(&result); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() return err } diff --git a/backend/routes/schedule.go b/backend/routes/schedule.go index e54c49a3..c7ef474a 100644 --- a/backend/routes/schedule.go +++ b/backend/routes/schedule.go @@ -110,9 +110,9 @@ func DeleteSchedule(c *gin.Context) { } // 停止定时任务 -func StopSchedule(c *gin.Context) { +func DisableSchedule(c *gin.Context) { id := c.Param("id") - if err := services.Sched.Stop(bson.ObjectIdHex(id)); err != nil { + if err := services.Sched.Disable(bson.ObjectIdHex(id)); err != nil { HandleError(http.StatusInternalServerError, c, err) return } @@ -120,9 +120,9 @@ func StopSchedule(c *gin.Context) { } // 运行定时任务 -func RunSchedule(c *gin.Context) { +func EnableSchedule(c *gin.Context) { id := c.Param("id") - if err := services.Sched.Run(bson.ObjectIdHex(id)); err != nil { + if err := services.Sched.Enable(bson.ObjectIdHex(id)); err != nil 
{ HandleError(http.StatusInternalServerError, c, err) return } diff --git a/backend/routes/spider.go b/backend/routes/spider.go index a5623b67..91bab47a 100644 --- a/backend/routes/spider.go +++ b/backend/routes/spider.go @@ -27,22 +27,38 @@ import ( ) func GetSpiderList(c *gin.Context) { - pageNum, _ := c.GetQuery("pageNum") - pageSize, _ := c.GetQuery("pageSize") + pageNum, _ := c.GetQuery("page_num") + pageSize, _ := c.GetQuery("page_size") keyword, _ := c.GetQuery("keyword") t, _ := c.GetQuery("type") + sortKey, _ := c.GetQuery("sort_key") + sortDirection, _ := c.GetQuery("sort_direction") + // 筛选 filter := bson.M{ "name": bson.M{"$regex": bson.RegEx{Pattern: keyword, Options: "im"}}, } - if t != "" && t != "all" { filter["type"] = t } + // 排序 + sortStr := "-_id" + if sortKey != "" && sortDirection != "" { + if sortDirection == constants.DESCENDING { + sortStr = "-" + sortKey + } else if sortDirection == constants.ASCENDING { + sortStr = "+" + sortKey + } else { + HandleErrorF(http.StatusBadRequest, c, "invalid sort_direction") + } + } + + // 分页 page := &entity.Page{} page.GetPage(pageNum, pageSize) - results, count, err := model.GetSpiderList(filter, page.Skip, page.Limit) + + results, count, err := model.GetSpiderList(filter, page.Skip, page.Limit, sortStr) if err != nil { HandleError(http.StatusInternalServerError, c, err) return @@ -693,20 +709,6 @@ func RenameSpiderFile(c *gin.Context) { }) } -// 爬虫类型 -func GetSpiderTypes(c *gin.Context) { - types, err := model.GetSpiderTypes() - if err != nil { - HandleError(http.StatusInternalServerError, c, err) - return - } - c.JSON(http.StatusOK, Response{ - Status: "ok", - Message: "success", - Data: types, - }) -} - func GetSpiderStats(c *gin.Context) { type Overview struct { TaskCount int `json:"task_count" bson:"task_count"` @@ -826,3 +828,25 @@ func GetSpiderStats(c *gin.Context) { }, }) } + +func GetSpiderSchedules(c *gin.Context) { + id := c.Param("id") + + if !bson.IsObjectIdHex(id) { + 
HandleErrorF(http.StatusBadRequest, c, "spider_id is invalid") + return + } + + // 获取定时任务 + list, err := model.GetScheduleList(bson.M{"spider_id": bson.ObjectIdHex(id)}) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + + c.JSON(http.StatusOK, Response{ + Status: "ok", + Message: "success", + Data: list, + }) +} diff --git a/backend/routes/system.go b/backend/routes/system.go index bcd186f8..b4e130a9 100644 --- a/backend/routes/system.go +++ b/backend/routes/system.go @@ -32,24 +32,8 @@ func GetDepList(c *gin.Context) { return } depList = list - } else { - HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", lang)) - return - } - - c.JSON(http.StatusOK, Response{ - Status: "ok", - Message: "success", - Data: depList, - }) -} - -func GetInstalledDepList(c *gin.Context) { - nodeId := c.Param("id") - lang := c.Query("lang") - var depList []entity.Dependency - if lang == constants.Python { - list, err := services.GetPythonInstalledDepList(nodeId) + } else if lang == constants.Nodejs { + list, err := services.GetNodejsDepList(nodeId, depName) if err != nil { HandleError(http.StatusInternalServerError, c, err) return @@ -67,8 +51,56 @@ func GetInstalledDepList(c *gin.Context) { }) } -func GetAllDepList(c *gin.Context) { +func GetInstalledDepList(c *gin.Context) { + nodeId := c.Param("id") lang := c.Query("lang") + var depList []entity.Dependency + if lang == constants.Python { + if services.IsMasterNode(nodeId) { + list, err := services.GetPythonLocalInstalledDepList(nodeId) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + depList = list + } else { + list, err := services.GetPythonRemoteInstalledDepList(nodeId) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + depList = list + } + } else if lang == constants.Nodejs { + if services.IsMasterNode(nodeId) { + list, err := services.GetNodejsLocalInstalledDepList(nodeId) + if err != nil { + 
HandleError(http.StatusInternalServerError, c, err) + return + } + depList = list + } else { + list, err := services.GetNodejsRemoteInstalledDepList(nodeId) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + depList = list + } + } else { + HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", lang)) + return + } + + c.JSON(http.StatusOK, Response{ + Status: "ok", + Message: "success", + Data: depList, + }) +} + +func GetAllDepList(c *gin.Context) { + lang := c.Param("lang") depName := c.Query("dep_name") // 获取所有依赖列表 @@ -108,3 +140,176 @@ func GetAllDepList(c *gin.Context) { Data: returnList, }) } + +func InstallDep(c *gin.Context) { + type ReqBody struct { + Lang string `json:"lang"` + DepName string `json:"dep_name"` + } + + nodeId := c.Param("id") + + var reqBody ReqBody + if err := c.ShouldBindJSON(&reqBody); err != nil { + HandleError(http.StatusBadRequest, c, err) + return + } + + if reqBody.Lang == constants.Python { + if services.IsMasterNode(nodeId) { + _, err := services.InstallPythonLocalDep(reqBody.DepName) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + } else { + _, err := services.InstallPythonRemoteDep(nodeId, reqBody.DepName) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + } + } else if reqBody.Lang == constants.Nodejs { + if services.IsMasterNode(nodeId) { + _, err := services.InstallNodejsLocalDep(reqBody.DepName) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + } else { + _, err := services.InstallNodejsRemoteDep(nodeId, reqBody.DepName) + if err != nil { + HandleError(http.StatusInternalServerError, c, err) + return + } + } + } else { + HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", reqBody.Lang)) + return + } + + // TODO: check if install is successful + + c.JSON(http.StatusOK, Response{ + Status: "ok", + Message: "success", + }) +} + +func 
UninstallDep(c *gin.Context) {
+	type ReqBody struct {
+		Lang    string `json:"lang"`
+		DepName string `json:"dep_name"`
+	}
+
+	nodeId := c.Param("id")
+
+	var reqBody ReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+
+	if reqBody.Lang == constants.Python {
+		if services.IsMasterNode(nodeId) {
+			_, err := services.UninstallPythonLocalDep(reqBody.DepName)
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		} else {
+			_, err := services.UninstallPythonRemoteDep(nodeId, reqBody.DepName)
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		}
+	} else if reqBody.Lang == constants.Nodejs {
+		if services.IsMasterNode(nodeId) {
+			_, err := services.UninstallNodejsLocalDep(reqBody.DepName)
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		} else {
+			_, err := services.UninstallNodejsRemoteDep(nodeId, reqBody.DepName)
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		}
+	} else {
+		HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", reqBody.Lang))
+		return
+	}
+
+	// TODO: check if uninstall is successful
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
+
+func GetDepJson(c *gin.Context) {
+	depName := c.Param("dep_name")
+	lang := c.Param("lang")
+
+	var dep entity.Dependency
+	if lang == constants.Python {
+		_dep, err := services.FetchPythonDepInfo(depName)
+		if err != nil {
+			HandleError(http.StatusInternalServerError, c, err)
+			return
+		}
+		dep = _dep
+	} else {
+		HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", lang))
+		return
+	}
+
+	c.Header("Cache-Control", "max-age=86400")
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+		Data:    dep,
+	})
+}
+
+func InstallLang(c *gin.Context) {
+	type ReqBody struct {
+		Lang string `json:"lang"`
+	}
+
+	nodeId := c.Param("id")
+
+	var reqBody ReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+
+	if reqBody.Lang == constants.Nodejs {
+		if services.IsMasterNode(nodeId) {
+			_, err := services.InstallNodejsLocalLang()
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		} else {
+			_, err := services.InstallNodejsRemoteLang(nodeId)
+			if err != nil {
+				HandleError(http.StatusInternalServerError, c, err)
+				return
+			}
+		}
+	} else {
+		HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("%s is not implemented", reqBody.Lang))
+		return
+	}
+
+	// TODO: check if install is successful
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
diff --git a/backend/scripts/install-nodejs.sh b/backend/scripts/install-nodejs.sh
new file mode 100644
index 00000000..1ca73b2d
--- /dev/null
+++ b/backend/scripts/install-nodejs.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+# install nvm
+curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.2/install.sh | bash
+export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"
+[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
+
+# install Node.js v8.12
+nvm install 8.12
+
+# create soft links
+ln -s $HOME/.nvm/versions/node/v8.12.0/bin/npm /usr/local/bin/npm
+ln -s $HOME/.nvm/versions/node/v8.12.0/bin/node /usr/local/bin/node
+
+# environments manipulation
+export NODE_PATH=$HOME/.nvm/versions/node/v8.12.0/lib/node_modules
+export PATH=$NODE_PATH:$PATH
\ No newline at end of file
diff --git a/backend/services/node.go b/backend/services/node.go
index 47d6185a..d6124205 100644
--- a/backend/services/node.go
+++ b/backend/services/node.go
@@ -12,6 +12,7 @@ import (
 	"encoding/json"
 	"fmt"
 	"github.com/apex/log"
+	"github.com/globalsign/mgo"
 	"github.com/globalsign/mgo/bson"
 	"github.com/gomodule/redigo/redis"
 	"runtime/debug"
@@ -116,7 +117,7 @@ func handleNodeInfo(key string, data *Data) {
 	defer s.Close()
 
 	var node model.Node
-	if err := c.Find(bson.M{"key": key}).One(&node); err != nil {
+	if err := c.Find(bson.M{"key": key}).One(&node); err != nil && err == mgo.ErrNotFound {
 		// 数据库不存在该节点
 		node = model.Node{
 			Key: key,
@@ -133,7 +134,7 @@ func handleNodeInfo(key string, data *Data) {
 			log.Errorf(err.Error())
 			return
 		}
-	} else {
+	} else if node.Key != "" {
 		// 数据库存在该节点
 		node.Status = constants.StatusOnline
 		node.UpdateTs = time.Now()
@@ -190,35 +191,7 @@ func UpdateNodeData() {
 		log.Errorf(err.Error())
 		return
 	}
-
-	// 注释掉,无需这样处理。 直接覆盖key对应的节点信息即可 by xyz 2020.01.01
-	//先获取所有Redis的nodekey
-	/*list, _ := database.RedisClient.HKeys("nodes")
-
-	if i := utils.Contains(list, key); i == false {
-		// 构造节点数据
-		data := Data{
-			Key:          key,
-			Mac:          mac,
-			Ip:           ip,
-			Master:       model.IsMaster(),
-			UpdateTs:     time.Now(),
-			UpdateTsUnix: time.Now().Unix(),
-		}
-
-		// 注册节点到Redis
-		dataBytes, err := json.Marshal(&data)
-		if err != nil {
-			log.Errorf(err.Error())
-			debug.PrintStack()
-			return
-		}
-		if err := database.RedisClient.HSet("nodes", key, utils.BytesToString(dataBytes)); err != nil {
-			log.Errorf(err.Error())
-			return
-		}
-	}*/
-
+
 }
 
 func MasterNodeCallback(message redis.Message) (err 
error) { diff --git a/backend/services/rpc.go b/backend/services/rpc.go new file mode 100644 index 00000000..9b6a5b74 --- /dev/null +++ b/backend/services/rpc.go @@ -0,0 +1,231 @@ +package services + +import ( + "crawlab/constants" + "crawlab/database" + "crawlab/entity" + "crawlab/model" + "crawlab/utils" + "encoding/json" + "fmt" + "github.com/apex/log" + uuid "github.com/satori/go.uuid" + "runtime/debug" +) + +type RpcMessage struct { + Id string `json:"id"` + Method string `json:"method"` + Params map[string]string `json:"params"` + Result string `json:"result"` +} + +func RpcServerInstallLang(msg RpcMessage) RpcMessage { + lang := GetRpcParam("lang", msg.Params) + if lang == constants.Nodejs { + output, _ := InstallNodejsLocalLang() + msg.Result = output + } + return msg +} + +func RpcClientInstallLang(nodeId string, lang string) (output string, err error) { + params := map[string]string{} + params["lang"] = lang + + data, err := RpcClientFunc(nodeId, constants.RpcInstallLang, params, 600)() + if err != nil { + return + } + + output = data + + return +} + +func RpcServerInstallDep(msg RpcMessage) RpcMessage { + lang := GetRpcParam("lang", msg.Params) + depName := GetRpcParam("dep_name", msg.Params) + if lang == constants.Python { + output, _ := InstallPythonLocalDep(depName) + msg.Result = output + } + return msg +} + +func RpcClientInstallDep(nodeId string, lang string, depName string) (output string, err error) { + params := map[string]string{} + params["lang"] = lang + params["dep_name"] = depName + + data, err := RpcClientFunc(nodeId, constants.RpcInstallDep, params, 10)() + if err != nil { + return + } + + output = data + + return +} + +func RpcServerUninstallDep(msg RpcMessage) RpcMessage { + lang := GetRpcParam("lang", msg.Params) + depName := GetRpcParam("dep_name", msg.Params) + if lang == constants.Python { + output, _ := UninstallPythonLocalDep(depName) + msg.Result = output + } + return msg +} + +func RpcClientUninstallDep(nodeId string, lang 
string, depName string) (output string, err error) { + params := map[string]string{} + params["lang"] = lang + params["dep_name"] = depName + + data, err := RpcClientFunc(nodeId, constants.RpcUninstallDep, params, 60)() + if err != nil { + return + } + + output = data + + return +} + +func RpcServerGetInstalledDepList(nodeId string, msg RpcMessage) RpcMessage { + lang := GetRpcParam("lang", msg.Params) + if lang == constants.Python { + depList, _ := GetPythonLocalInstalledDepList(nodeId) + resultStr, _ := json.Marshal(depList) + msg.Result = string(resultStr) + } else if lang == constants.Nodejs { + depList, _ := GetNodejsLocalInstalledDepList(nodeId) + resultStr, _ := json.Marshal(depList) + msg.Result = string(resultStr) + } + return msg +} + +func RpcClientGetInstalledDepList(nodeId string, lang string) (list []entity.Dependency, err error) { + params := map[string]string{} + params["lang"] = lang + + data, err := RpcClientFunc(nodeId, constants.RpcGetInstalledDepList, params, 10)() + if err != nil { + return + } + + // 反序列化结果 + if err := json.Unmarshal([]byte(data), &list); err != nil { + return list, err + } + + return +} + +func RpcClientFunc(nodeId string, method string, params map[string]string, timeout int) func() (string, error) { + return func() (result string, err error) { + // 请求ID + id := uuid.NewV4().String() + + // 构造RPC消息 + msg := RpcMessage{ + Id: id, + Method: method, + Params: params, + Result: "", + } + + // 发送RPC消息 + msgStr := ObjectToString(msg) + if err := database.RedisClient.LPush(fmt.Sprintf("rpc:%s", nodeId), msgStr); err != nil { + return result, err + } + + // 获取RPC回复消息 + dataStr, err := database.RedisClient.BRPop(fmt.Sprintf("rpc:%s", nodeId), timeout) + if err != nil { + return result, err + } + + // 反序列化消息 + if err := json.Unmarshal([]byte(dataStr), &msg); err != nil { + return result, err + } + + return msg.Result, err + } +} + +func GetRpcParam(key string, params map[string]string) string { + return params[key] +} + +func 
ObjectToString(params interface{}) string { + bytes, _ := json.Marshal(params) + return utils.BytesToString(bytes) +} + +var IsRpcStopped = false + +func StopRpcService() { + IsRpcStopped = true +} + +func InitRpcService() error { + go func() { + for { + // 获取当前节点 + node, err := model.GetCurrentNode() + if err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + continue + } + + // 获取获取消息队列信息 + dataStr, err := database.RedisClient.BRPop(fmt.Sprintf("rpc:%s", node.Id.Hex()), 300) + if err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + continue + } + + // 反序列化消息 + var msg RpcMessage + if err := json.Unmarshal([]byte(dataStr), &msg); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + continue + } + + // 根据Method调用本地方法 + var replyMsg RpcMessage + if msg.Method == constants.RpcInstallDep { + replyMsg = RpcServerInstallDep(msg) + } else if msg.Method == constants.RpcUninstallDep { + replyMsg = RpcServerUninstallDep(msg) + } else if msg.Method == constants.RpcInstallLang { + replyMsg = RpcServerInstallLang(msg) + } else if msg.Method == constants.RpcGetInstalledDepList { + replyMsg = RpcServerGetInstalledDepList(node.Id.Hex(), msg) + } else { + continue + } + + // 发送返回消息 + if err := database.RedisClient.LPush(fmt.Sprintf("rpc:%s", node.Id.Hex()), ObjectToString(replyMsg)); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + continue + } + + // 如果停止RPC服务,则返回 + if IsRpcStopped { + return + } + } + }() + return nil +} diff --git a/backend/services/schedule.go b/backend/services/schedule.go index 53938aea..d737c3ac 100644 --- a/backend/services/schedule.go +++ b/backend/services/schedule.go @@ -53,6 +53,8 @@ func AddScheduleTask(s model.Schedule) func() { Param: s.Param, } if err := AddTask(t); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() return } if err := AssignTask(t); err != nil { @@ -137,7 +139,7 @@ func (s *Scheduler) Start() error { func (s *Scheduler) AddJob(job model.Schedule) error { spec := job.Cron - // 添加任务 + // 
添加定时任务 eid, err := s.cron.AddFunc(spec, AddScheduleTask(job)) if err != nil { log.Errorf("add func task error: %s", err.Error()) @@ -147,7 +149,12 @@ func (s *Scheduler) AddJob(job model.Schedule) error { // 更新EntryID job.EntryId = eid + + // 更新状态 job.Status = constants.ScheduleStatusRunning + job.Enabled = true + + // 保存定时任务 if err := job.Save(); err != nil { log.Errorf("job save error: %s", err.Error()) debug.PrintStack() @@ -176,8 +183,8 @@ func ParserCron(spec string) error { return nil } -// 停止定时任务 -func (s *Scheduler) Stop(id bson.ObjectId) error { +// 禁用定时任务 +func (s *Scheduler) Disable(id bson.ObjectId) error { schedule, err := model.GetSchedule(id) if err != nil { return err @@ -185,17 +192,22 @@ func (s *Scheduler) Stop(id bson.ObjectId) error { if schedule.EntryId == 0 { return errors.New("entry id not found") } + + // 从cron服务中删除该任务 s.cron.Remove(schedule.EntryId) + // 更新状态 schedule.Status = constants.ScheduleStatusStop + schedule.Enabled = false + if err = schedule.Save(); err != nil { return err } return nil } -// 运行任务 -func (s *Scheduler) Run(id bson.ObjectId) error { +// 启用定时任务 +func (s *Scheduler) Enable(id bson.ObjectId) error { schedule, err := model.GetSchedule(id) if err != nil { return err diff --git a/backend/services/spider.go b/backend/services/spider.go index 3515afa9..e97c7992 100644 --- a/backend/services/spider.go +++ b/backend/services/spider.go @@ -143,7 +143,7 @@ func ReadFileByStep(filePath string, handle func([]byte, *mgo.GridFile), fileCre // 发布所有爬虫 func PublishAllSpiders() { // 获取爬虫列表 - spiders, _, _ := model.GetSpiderList(nil, 0, constants.Infinite) + spiders, _, _ := model.GetSpiderList(nil, 0, constants.Infinite, "-_id") if len(spiders) == 0 { return } diff --git a/backend/services/system.go b/backend/services/system.go index 045ecbff..12b8744c 100644 --- a/backend/services/system.go +++ b/backend/services/system.go @@ -13,6 +13,7 @@ import ( "github.com/apex/log" "github.com/imroc/req" "os/exec" + "path" "regexp" 
"runtime/debug" "sort" @@ -20,29 +21,10 @@ import ( "sync" ) -type PythonDepJsonData struct { - Info PythonDepJsonDataInfo `json:"info"` -} - -type PythonDepJsonDataInfo struct { - Name string `json:"name"` - Summary string `json:"summary"` - Version string `json:"version"` -} - -type PythonDepNameDict struct { - Name string `json:"name"` - Weight int `json:"weight"` -} - -type PythonDepNameDictSlice []PythonDepNameDict - -func (s PythonDepNameDictSlice) Len() int { return len(s) } -func (s PythonDepNameDictSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] } -func (s PythonDepNameDictSlice) Less(i, j int) bool { return s[i].Weight > s[j].Weight } - +// 系统信息 chan 映射 var SystemInfoChanMap = utils.NewChanMap() +// 从远端获取系统信息 func GetRemoteSystemInfo(nodeId string) (sysInfo entity.SystemInfo, err error) { // 发送消息 msg := entity.NodeMessage{ @@ -70,6 +52,7 @@ func GetRemoteSystemInfo(nodeId string) (sysInfo entity.SystemInfo, err error) { return sysInfo, nil } +// 获取系统信息 func GetSystemInfo(nodeId string) (sysInfo entity.SystemInfo, err error) { if IsMasterNode(nodeId) { sysInfo, err = model.GetLocalSystemInfo() @@ -79,11 +62,12 @@ func GetSystemInfo(nodeId string) (sysInfo entity.SystemInfo, err error) { return } +// 获取语言列表 func GetLangList(nodeId string) []entity.Lang { list := []entity.Lang{ {Name: "Python", ExecutableName: "python", ExecutablePath: "/usr/local/bin/python", DepExecutablePath: "/usr/local/bin/pip"}, - {Name: "NodeJS", ExecutableName: "node", ExecutablePath: "/usr/local/bin/node"}, - {Name: "Java", ExecutableName: "java", ExecutablePath: "/usr/local/bin/java"}, + {Name: "Node.js", ExecutableName: "node", ExecutablePath: "/usr/local/bin/node", DepExecutablePath: "/usr/local/bin/npm"}, + //{Name: "Java", ExecutableName: "java", ExecutablePath: "/usr/local/bin/java"}, } for i, lang := range list { list[i].Installed = IsInstalledLang(nodeId, lang) @@ -91,6 +75,7 @@ func GetLangList(nodeId string) []entity.Lang { return list } +// 根据语言名获取语言实例 func 
GetLangFromLangName(nodeId string, name string) entity.Lang {
 	langList := GetLangList(nodeId)
 	for _, lang := range langList {
@@ -101,6 +86,70 @@ func GetLangFromLangName(nodeId string, name string) entity.Lang {
 	return entity.Lang{}
 }
 
+// 是否已安装该语言
+func IsInstalledLang(nodeId string, lang entity.Lang) bool {
+	sysInfo, err := GetSystemInfo(nodeId)
+	if err != nil {
+		return false
+	}
+	for _, exec := range sysInfo.Executables {
+		if exec.Path == lang.ExecutablePath {
+			return true
+		}
+	}
+	return false
+}
+
+// 是否已安装该依赖
+func IsInstalledDep(installedDepList []entity.Dependency, dep entity.Dependency) bool {
+	for _, _dep := range installedDepList {
+		if strings.ToLower(_dep.Name) == strings.ToLower(dep.Name) {
+			return true
+		}
+	}
+	return false
+}
+
+// 初始化依赖列表定时更新服务
+func InitDepsFetcher() error {
+	c := cron.New(cron.WithSeconds())
+	c.Start()
+	if _, err := c.AddFunc("0 */5 * * * *", UpdatePythonDepList); err != nil {
+		return err
+	}
+
+	go func() {
+		UpdatePythonDepList()
+	}()
+	return nil
+}
+
+// =========
+// Python
+// =========
+
+type PythonDepJsonData struct {
+	Info PythonDepJsonDataInfo `json:"info"`
+}
+
+type PythonDepJsonDataInfo struct {
+	Name    string `json:"name"`
+	Summary string `json:"summary"`
+	Version string `json:"version"`
+}
+
+type PythonDepNameDict struct {
+	Name   string `json:"name"`
+	Weight int    `json:"weight"`
+}
+
+type PythonDepNameDictSlice []PythonDepNameDict
+
+func (s PythonDepNameDictSlice) Len() int           { return len(s) }
+func (s PythonDepNameDictSlice) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
+func (s PythonDepNameDictSlice) Less(i, j int) bool { return s[i].Weight > s[j].Weight }
+
+// 获取Python本地依赖列表
 func GetPythonDepList(nodeId string, searchDepName string) ([]entity.Dependency, error) {
 	var list []entity.Dependency
@@ -129,22 +178,51 @@ func GetPythonDepList(nodeId string, searchDepName string) ([]entity.Dependency,
 		}
 	}
 
-	// 获取已安装依赖
-	installedDepList, err := GetPythonInstalledDepList(nodeId)
-	if err != nil {
-		return list, 
err + // 获取已安装依赖列表 + var installedDepList []entity.Dependency + if IsMasterNode(nodeId) { + installedDepList, err = GetPythonLocalInstalledDepList(nodeId) + if err != nil { + return list, err + } + } else { + installedDepList, err = GetPythonRemoteInstalledDepList(nodeId) + if err != nil { + return list, err + } } - // 从依赖源获取数据 - var goSync sync.WaitGroup + // 根据依赖名排序 sort.Stable(depNameList) + + // 遍历依赖名列表,取前20个 for i, depNameDict := range depNameList { + if i > 20 { + break + } + dep := entity.Dependency{ + Name: depNameDict.Name, + } + dep.Installed = IsInstalledDep(installedDepList, dep) + list = append(list, dep) + } + + // 从依赖源获取信息 + //list, err = GetPythonDepListWithInfo(list) + + return list, nil +} + +// 获取Python依赖的源数据信息 +func GetPythonDepListWithInfo(depList []entity.Dependency) ([]entity.Dependency, error) { + var goSync sync.WaitGroup + for i, dep := range depList { if i > 10 { break } goSync.Add(1) - go func(depName string, n *sync.WaitGroup) { - url := fmt.Sprintf("https://pypi.org/pypi/%s/json", depName) + go func(i int, dep entity.Dependency, depList []entity.Dependency, n *sync.WaitGroup) { + url := fmt.Sprintf("https://pypi.org/pypi/%s/json", dep.Name) res, err := req.Get(url) if err != nil { n.Done() @@ -155,21 +233,38 @@ func GetPythonDepList(nodeId string, searchDepName string) ([]entity.Dependency, n.Done() return } - dep := entity.Dependency{ - Name: depName, - Version: data.Info.Version, - Description: data.Info.Summary, - } - dep.Installed = IsInstalledDep(installedDepList, dep) - list = append(list, dep) + depList[i].Version = data.Info.Version + depList[i].Description = data.Info.Summary n.Done() - }(depNameDict.Name, &goSync) + }(i, dep, depList, &goSync) } goSync.Wait() - - return list, nil + return depList, nil } +func FetchPythonDepInfo(depName string) (entity.Dependency, error) { + url := fmt.Sprintf("https://pypi.org/pypi/%s/json", depName) + res, err := req.Get(url) + if err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + 
return entity.Dependency{}, err + } + var data PythonDepJsonData + if err := res.ToJSON(&data); err != nil { + log.Errorf(err.Error()) + debug.PrintStack() + return entity.Dependency{}, err + } + dep := entity.Dependency{ + Name: depName, + Version: data.Info.Version, + Description: data.Info.Summary, + } + return dep, nil +} + +// 从Redis获取Python依赖列表 func GetPythonDepListFromRedis() ([]string, error) { var list []string @@ -192,28 +287,7 @@ func GetPythonDepListFromRedis() ([]string, error) { return list, nil } -func IsInstalledLang(nodeId string, lang entity.Lang) bool { - sysInfo, err := GetSystemInfo(nodeId) - if err != nil { - return false - } - for _, exec := range sysInfo.Executables { - if exec.Path == lang.ExecutablePath { - return true - } - } - return false -} - -func IsInstalledDep(installedDepList []entity.Dependency, dep entity.Dependency) bool { - for _, _dep := range installedDepList { - if strings.ToLower(_dep.Name) == strings.ToLower(dep.Name) { - return true - } - } - return false -} - +// 从Python依赖源获取依赖列表并返回 func FetchPythonDepList() ([]string, error) { // 依赖URL url := "https://pypi.tuna.tsinghua.edu.cn/simple" @@ -251,6 +325,7 @@ func FetchPythonDepList() ([]string, error) { return list, nil } +// 更新Python依赖列表到Redis func UpdatePythonDepList() { // 从依赖源获取列表 list, _ := FetchPythonDepList() @@ -271,7 +346,8 @@ func UpdatePythonDepList() { } } -func GetPythonInstalledDepList(nodeId string) ([]entity.Dependency, error){ +// 获取Python本地已安装的依赖列表 +func GetPythonLocalInstalledDepList(nodeId string) ([]entity.Dependency, error) { var list []entity.Dependency lang := GetLangFromLangName(nodeId, constants.Python) @@ -301,11 +377,206 @@ func GetPythonInstalledDepList(nodeId string) ([]entity.Dependency, error){ return list, nil } -func InitDepsFetcher() error { - c := cron.New(cron.WithSeconds()) - c.Start() - if _, err := c.AddFunc("0 */5 * * * *", UpdatePythonDepList); err != nil { - return err +// 获取Python远端依赖列表 +func GetPythonRemoteInstalledDepList(nodeId 
string) ([]entity.Dependency, error) {
+	depList, err := RpcClientGetInstalledDepList(nodeId, constants.Python)
+	if err != nil {
+		return depList, err
 	}
-	return nil
+	return depList, nil
+}
+
+// 安装Python本地依赖
+func InstallPythonLocalDep(depName string) (string, error) {
+	// 依赖镜像URL
+	url := "https://pypi.tuna.tsinghua.edu.cn/simple"
+
+	cmd := exec.Command("pip", "install", depName, "-i", url)
+	outputBytes, err := cmd.Output()
+	if err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
+		return fmt.Sprintf("error: %s", err.Error()), err
+	}
+	return string(outputBytes), nil
+}
+
+// 安装Python远端依赖
+func InstallPythonRemoteDep(nodeId string, depName string) (string, error) {
+	output, err := RpcClientInstallDep(nodeId, constants.Python, depName)
+	if err != nil {
+		return output, err
+	}
+	return output, nil
+}
+
+// 卸载Python本地依赖
+func UninstallPythonLocalDep(depName string) (string, error) {
+	cmd := exec.Command("pip", "uninstall", "-y", depName)
+	outputBytes, err := cmd.Output()
+	if err != nil {
+		log.Errorf(string(outputBytes))
+		log.Errorf(err.Error())
+		debug.PrintStack()
+		return fmt.Sprintf("error: %s", err.Error()), err
+	}
+	return string(outputBytes), nil
+}
+
+// 卸载Python远端依赖
+func UninstallPythonRemoteDep(nodeId string, depName string) (string, error) {
+	output, err := RpcClientUninstallDep(nodeId, constants.Python, depName)
+	if err != nil {
+		return output, err
+	}
+	return output, nil
+}
+
+// ==============
+// Node.js
+// ==============
+
+// 安装Node.js本地语言环境
+func InstallNodejsLocalLang() (string, error) {
+	cmd := exec.Command("/bin/bash", path.Join("scripts", "install-nodejs.sh"))
+	output, err := cmd.Output()
+	if err != nil {
+		log.Error(err.Error())
+		debug.PrintStack()
+		return string(output), err
+	}
+
+	// TODO: check if Node.js is installed successfully
+
+	return string(output), nil
+}
+
+// 安装Node.js远端语言环境
+func InstallNodejsRemoteLang(nodeId string) (string, error) {
+	output, err := RpcClientInstallLang(nodeId, constants.Nodejs)
+	if err != nil {
+		return output, err
+	}
+	return output, nil
+}
+
+// 获取Nodejs本地已安装的依赖列表
+func GetNodejsLocalInstalledDepList(nodeId string) ([]entity.Dependency, error) {
+	var list []entity.Dependency
+
+	lang := GetLangFromLangName(nodeId, constants.Nodejs)
+	if !IsInstalledLang(nodeId, lang) {
+		return list, errors.New("nodejs is not installed")
+	}
+	cmd := exec.Command("npm", "ls", "-g", "--depth", "0")
+	outputBytes, _ := cmd.Output()
+	//if err != nil {
+	//	log.Error("error: " + string(outputBytes))
+	//	debug.PrintStack()
+	//	return list, err
+	//}
+
+	regex := regexp.MustCompile("\\s(.*)@(.*)")
+	for _, line := range strings.Split(string(outputBytes), "\n") {
+		arr := regex.FindStringSubmatch(line)
+		if len(arr) < 3 {
+			continue
+		}
+		dep := entity.Dependency{
+			Name:      strings.ToLower(arr[1]),
+			Version:   arr[2],
+			Installed: true,
+		}
+		list = append(list, dep)
+	}
+
+	return list, nil
+}
+
+// 获取Nodejs远端已安装的依赖列表
+func GetNodejsRemoteInstalledDepList(nodeId string) ([]entity.Dependency, error) {
+	depList, err := RpcClientGetInstalledDepList(nodeId, constants.Nodejs)
+	if err != nil {
+		return depList, err
+	}
+	return depList, nil
+}
+
+// 安装Nodejs本地依赖
+func InstallNodejsLocalDep(depName string) (string, error) {
+	// 依赖镜像URL
+	url := "https://registry.npm.taobao.org"
+
+	cmd := exec.Command("npm", "install", depName, "-g", "--registry", url)
+	outputBytes, err := cmd.Output()
+	if err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
+		return fmt.Sprintf("error: %s", err.Error()), err
+	}
+	return string(outputBytes), nil
+}
+
+// 安装Nodejs远端依赖
+func InstallNodejsRemoteDep(nodeId string, depName string) (string, error) {
+	output, err := RpcClientInstallDep(nodeId, constants.Nodejs, depName)
+	if err != nil {
+		return output, err
+	}
+	return output, nil
+}
+
+// 卸载Nodejs本地依赖
+func UninstallNodejsLocalDep(depName string) (string, error) {
+	cmd := exec.Command("npm", "uninstall", depName, "-g")
+	outputBytes, err := cmd.Output()
+	if err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
+		return fmt.Sprintf("error: %s", err.Error()), err
+	}
+	return string(outputBytes), nil
+}
+
+// 卸载Nodejs远端依赖
+func UninstallNodejsRemoteDep(nodeId string, depName string) (string, error) {
+	output, err := RpcClientUninstallDep(nodeId, constants.Nodejs, depName)
+	if err != nil {
+		return output, err
+	}
+	return output, nil
+}
+
+// 获取Nodejs依赖列表
+func GetNodejsDepList(nodeId string, searchDepName string) (depList []entity.Dependency, err error) {
+	// 执行shell命令
+	cmd := exec.Command("npm", "search", "--json", searchDepName)
+	outputBytes, _ := cmd.Output()
+
+	// 获取已安装依赖列表
+	var installedDepList []entity.Dependency
+	if IsMasterNode(nodeId) {
+		installedDepList, err = GetNodejsLocalInstalledDepList(nodeId)
+		if err != nil {
+			return depList, err
+		}
+	} else {
+		installedDepList, err = GetNodejsRemoteInstalledDepList(nodeId)
+		if err != nil {
+			return depList, err
+		}
+	}
+
+	// 反序列化
+	if err := json.Unmarshal(outputBytes, &depList); err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
+		return depList, err
+	}
+
+	// 标记已安装的依赖
+	for i, dep := range depList {
+		depList[i].Installed = IsInstalledDep(installedDepList, dep)
+	}
+
+	return depList, nil
 }
diff --git a/backend/services/task.go b/backend/services/task.go
index 9e6fdbc8..7da6b022 100644
--- a/backend/services/task.go
+++ b/backend/services/task.go
@@ -19,6 +19,7 @@ import (
 	"runtime"
 	"runtime/debug"
 	"strconv"
+	"strings"
 	"sync"
 	"syscall"
 	"time"
@@ -104,6 +105,17 @@ func AssignTask(task model.Task) error {
 
 // 设置环境变量
 func SetEnv(cmd *exec.Cmd, envs []model.Env, taskId string, dataCol string) *exec.Cmd {
+	// 默认把Node.js的全局node_modules加入环境变量
+	envPath := os.Getenv("PATH")
+	for _, _path := range strings.Split(envPath, ":") {
+		if strings.Contains(_path, "/.nvm/versions/node/") {
+			pathNodeModules := strings.Replace(_path, "/bin", "/lib/node_modules", -1)
+			_ = os.Setenv("PATH", pathNodeModules+":"+envPath)
+			_ = os.Setenv("NODE_PATH", pathNodeModules)
+			break
+		}
+	}
+
 	// 默认环境变量
 	cmd.Env = append(os.Environ(), "CRAWLAB_TASK_ID="+taskId)
 	cmd.Env = append(cmd.Env, "CRAWLAB_COLLECTION="+dataCol)
@@ -615,11 +627,15 @@ func AddTask(t model.Task) error {
 
 	// 将任务存入数据库
 	if err := model.AddTask(t); err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
 		return err
 	}
 
 	// 加入任务队列
 	if err := AssignTask(t); err != nil {
+		log.Errorf(err.Error())
+		debug.PrintStack()
 		return err
 	}
diff --git a/docker-compose.yml b/docker-compose.yml
index b4f36e86..5c059f95 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -8,12 +8,15 @@ services:
       CRAWLAB_SERVER_MASTER: "Y" # whether to be master node 是否为主节点,主节点为 Y,工作节点为 N
       CRAWLAB_MONGO_HOST: "mongo" # MongoDB host address MongoDB 的地址,在 docker compose 网络中,直接引用服务名称
       CRAWLAB_REDIS_ADDRESS: "redis" # Redis host address Redis 的地址,在 docker compose 网络中,直接引用服务名称
+      # CRAWLAB_SERVER_LANG_NODE: "Y" # 预安装 Node.js 语言环境
     ports:
       - "8080:8080" # frontend port mapping 前端端口映射
       - "8000:8000" # backend port mapping 后端端口映射
     depends_on:
       - mongo
       - redis
+    volumes:
+      - "/Users/marvzhang/projects/crawlab-team/crawlab/docker_init.sh:/app/docker_init.sh"
   worker:
     image: tikazyq/crawlab:latest
     container_name: worker
diff --git a/docker_init.sh b/docker_init.sh
index 97c505dc..648634cd 100755
--- a/docker_init.sh
+++ b/docker_init.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 
 # replace default api path to new one
 if [ "${CRAWLAB_API_ADDRESS}" = "" ];
@@ -22,5 +22,12 @@ fi
 # start nginx
 service nginx start
 
+# install languages: Node.js
+if [ "${CRAWLAB_SERVER_LANG_NODE}" = "Y" ];
+then
+	echo "installing node.js"
+	/bin/bash /app/backend/scripts/install-nodejs.sh
+fi
+
 # start backend
 crawlab
\ No newline at end of file
diff --git a/frontend/index.html b/frontend/index.html
index 2c943e7e..5066906e 100644
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -6,6 +6,10 @@
+
+
+
+
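A note on the RPC mechanism introduced in `backend/services/rpc.go`: the master `LPush`es a JSON-encoded `RpcMessage` onto the Redis list `rpc:<nodeId>`, the worker's `InitRpcService` loop `BRPop`s it, dispatches on `Method`, and `LPush`es the reply back onto the same list. Because request and reply share one list, a client's `BRPop` can in principle dequeue another caller's request rather than its own reply. The sketch below reproduces the message framing from the diff and shows a hypothetical refinement — routing replies through a per-request key derived from the message's UUID (`rpc:reply:<id>` is an assumption, not code from this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RpcMessage mirrors the struct added in backend/services/rpc.go.
type RpcMessage struct {
	Id     string            `json:"id"`
	Method string            `json:"method"`
	Params map[string]string `json:"params"`
	Result string            `json:"result"`
}

// requestKey is the Redis list the diff uses as a node's inbox ("rpc:<nodeId>").
func requestKey(nodeId string) string {
	return fmt.Sprintf("rpc:%s", nodeId)
}

// replyKey is a hypothetical per-request reply list ("rpc:reply:<id>").
// Keying the reply by the request UUID means a client never pops a
// message that was not addressed to it.
func replyKey(msg RpcMessage) string {
	return fmt.Sprintf("rpc:reply:%s", msg.Id)
}

// encode/decode are the JSON framing used on both sides of the queue.
func encode(msg RpcMessage) string {
	b, _ := json.Marshal(msg)
	return string(b)
}

func decode(s string) (RpcMessage, error) {
	var msg RpcMessage
	err := json.Unmarshal([]byte(s), &msg)
	return msg, err
}

func main() {
	req := RpcMessage{
		Id:     "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
		Method: "InstallDep",
		Params: map[string]string{"lang": "python", "dep_name": "requests"},
	}
	// Client side: LPush(encode(req)) onto requestKey(nodeId), then BRPop
	// replyKey(req) instead of the shared node list; the server would
	// LPush its reply onto replyKey(req) after handling the request.
	wire := encode(req)
	got, err := decode(wire)
	if err != nil || got.Id != req.Id || got.Params["dep_name"] != "requests" {
		panic("roundtrip failed")
	}
	fmt.Println(requestKey("node1"), replyKey(req))
}
```

The framing and `rpc:<nodeId>` key match the diff; everything else here is a design sketch, not the project's actual behavior.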