diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0973aa8a..aa2682ce 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,16 @@
-# 0.4.2 (unknown)
+# 0.4.2 (2019-12-26)
### Features / Enhancement
- **Disclaimer**. Added page for Disclaimer.
+- **Call API to fetch version**. [#371](https://github.com/crawlab-team/crawlab/issues/371)
+- **Configure to allow user registration**. [#346](https://github.com/crawlab-team/crawlab/issues/346)
+- **Allow adding new users**.
+- **More Advanced File Management**. Allow users to add / edit / rename / delete files. [#286](https://github.com/crawlab-team/crawlab/issues/286)
+- **Optimized Spider Creation Process**. Allow users to create an empty customized spider before uploading the zip file.
+- **Better Task Management**. Allow users to filter tasks by selected criteria. [#341](https://github.com/crawlab-team/crawlab/issues/341)
+
+### Bug Fixes
+- **Duplicated nodes**. [#391](https://github.com/crawlab-team/crawlab/issues/391)
+- **"mongodb no reachable" error**. [#373](https://github.com/crawlab-team/crawlab/issues/373)
# 0.4.1 (2019-12-13)
### Features / Enhancement
diff --git a/DISCLAIMER-zh.md b/DISCLAIMER-zh.md
index 6a333d0b..a329e4e9 100644
--- a/DISCLAIMER-zh.md
+++ b/DISCLAIMER-zh.md
@@ -1,6 +1,6 @@
# 免责声明
-本免责及隐私保护声明(以下简称“隐私声明”或“本声明”)适用于 Crawlab 开发组 (以下简称“开发组”)研发的系列软件(以下简称"Crawlab") 在您阅读本声明后若不同意此声明中的任何条款,或对本声明存在质疑,请立刻停止使用我们的软件。若您已经开始或正在使用 Crawlab,则表示您已阅读并同意本声明的所有条款之约定。
+本免责及隐私保护声明(以下简称“免责声明”或“本声明”)适用于 Crawlab 开发组 (以下简称“开发组”)研发的系列软件(以下简称"Crawlab") 在您阅读本声明后若不同意此声明中的任何条款,或对本声明存在质疑,请立刻停止使用我们的软件。若您已经开始或正在使用 Crawlab,则表示您已阅读并同意本声明的所有条款之约定。
1. 总则:您通过安装 Crawlab 并使用 Crawlab 提供的服务与功能即表示您已经同意与开发组立本协议。开发组可随时执行全权决定更改“条款”。经修订的“条款”一经在 Github 免责声明页面上公布后,立即自动生效。
2. 本产品是基于Golang的分布式爬虫管理平台,支持Python、NodeJS、Go、Java、PHP等多种编程语言以及多种爬虫框架。
diff --git a/DISCLAIMER.md b/DISCLAIMER.md
index 8f31739d..72aae961 100644
--- a/DISCLAIMER.md
+++ b/DISCLAIMER.md
@@ -1,6 +1,6 @@
# Disclaimer
-This Disclaimer and privacy protection statement (hereinafter referred to as "privacy statement" or "this statement") is applicable to the series of software (hereinafter referred to as "crawlab") developed by crawlab development group (hereinafter referred to as "development group") after you read this statement, if you do not agree with any terms in this statement or have doubts about this statement, please stop using our software immediately. If you have started or are using crawlab, you have read and agree to all terms of this statement.
+This Disclaimer and privacy protection statement (hereinafter referred to as the "disclaimer statement" or "this statement") applies to the series of software (hereinafter referred to as "Crawlab") developed by the Crawlab development group (hereinafter referred to as the "development group"). If, after reading this statement, you do not agree with any of its terms or have doubts about it, please stop using our software immediately. If you have started or are using Crawlab, you have read and agreed to all terms of this statement.
1. General: by installing crawlab and using the services and functions provided by crawlab, you have agreed to establish this agreement with the development team. The developer group may at any time change the terms at its sole discretion. The amended "terms" shall take effect automatically as soon as they are published on the GitHub disclaimer page.
2. This product is a distributed crawler management platform based on golang, supporting python, nodejs, go, Java, PHP and other programming languages as well as a variety of crawler frameworks.
diff --git a/README-zh.md b/README-zh.md
index 0c943c3e..194720fc 100644
--- a/README-zh.md
+++ b/README-zh.md
@@ -254,6 +254,9 @@ Crawlab使用起来很方便,也很通用,可以适用于几乎任何主流
+
+
+
## 社区 & 赞助
diff --git a/README.md b/README.md
index 11ac8383..4c970cd8 100644
--- a/README.md
+++ b/README.md
@@ -219,6 +219,9 @@ Crawlab is easy to use, general enough to adapt spiders in any language and any
+
+
+
## Community & Sponsorship
diff --git a/backend/conf/config.yml b/backend/conf/config.yml
index 60d2bd41..5ada78f6 100644
--- a/backend/conf/config.yml
+++ b/backend/conf/config.yml
@@ -32,3 +32,6 @@ task:
workers: 4
other:
tmppath: "/tmp"
+version: 0.4.2
+setting:
+ allowRegister: "N"
\ No newline at end of file
diff --git a/backend/database/pubsub.go b/backend/database/pubsub.go
index 7f647cda..444ce91a 100644
--- a/backend/database/pubsub.go
+++ b/backend/database/pubsub.go
@@ -58,9 +58,9 @@ func (r *Redis) subscribe(ctx context.Context, consume ConsumeFunc, channel ...s
}
done <- nil
case <-tick.C:
- //fmt.Printf("ping message \n")
if err := psc.Ping(""); err != nil {
- done <- err
+				fmt.Printf("ping message error: %s\n", err)
+ //done <- err
}
case err := <-done:
close(done)
diff --git a/backend/main.go b/backend/main.go
index 0d7b7cc1..955fb77c 100644
--- a/backend/main.go
+++ b/backend/main.go
@@ -114,9 +114,9 @@ func main() {
app.Use(middlewares.CORSMiddleware())
anonymousGroup := app.Group("/")
{
- anonymousGroup.POST("/login", routes.Login) // 用户登录
- anonymousGroup.PUT("/users", routes.PutUser) // 添加用户
-
+		anonymousGroup.POST("/login", routes.Login)       // user login
+		anonymousGroup.PUT("/users", routes.PutUser)      // add user
+		anonymousGroup.GET("/setting", routes.GetSetting) // get settings
}
authGroup := app.Group("/", middlewares.AuthorizationMiddleware())
{
@@ -129,18 +129,24 @@ func main() {
authGroup.GET("/nodes/:id/system", routes.GetSystemInfo) // 节点任务列表
authGroup.DELETE("/nodes/:id", routes.DeleteNode) // 删除节点
// 爬虫
- authGroup.GET("/spiders", routes.GetSpiderList) // 爬虫列表
- authGroup.GET("/spiders/:id", routes.GetSpider) // 爬虫详情
- authGroup.POST("/spiders", routes.PutSpider) // 上传爬虫 TODO: 名称不对
- authGroup.POST("/spiders/:id", routes.PostSpider) // 修改爬虫
- authGroup.POST("/spiders/:id/publish", routes.PublishSpider) // 发布爬虫
- authGroup.DELETE("/spiders/:id", routes.DeleteSpider) // 删除爬虫
- authGroup.GET("/spiders/:id/tasks", routes.GetSpiderTasks) // 爬虫任务列表
- authGroup.GET("/spiders/:id/file", routes.GetSpiderFile) // 爬虫文件读取
- authGroup.POST("/spiders/:id/file", routes.PostSpiderFile) // 爬虫目录写入
- authGroup.GET("/spiders/:id/dir", routes.GetSpiderDir) // 爬虫目录
- authGroup.GET("/spiders/:id/stats", routes.GetSpiderStats) // 爬虫统计数据
- authGroup.GET("/spider/types", routes.GetSpiderTypes) // 爬虫类型
+		authGroup.GET("/spiders", routes.GetSpiderList)                     // spider list
+		authGroup.GET("/spiders/:id", routes.GetSpider)                     // spider detail
+		authGroup.PUT("/spiders", routes.PutSpider)                         // add spider
+		authGroup.POST("/spiders", routes.UploadSpider)                     // upload spider
+		authGroup.POST("/spiders/:id", routes.PostSpider)                   // update spider
+		authGroup.POST("/spiders/:id/publish", routes.PublishSpider)        // publish spider
+		authGroup.POST("/spiders/:id/upload", routes.UploadSpiderFromId)    // upload spider (by ID)
+		authGroup.DELETE("/spiders/:id", routes.DeleteSpider)               // delete spider
+		authGroup.GET("/spiders/:id/tasks", routes.GetSpiderTasks)          // spider task list
+		authGroup.GET("/spiders/:id/file", routes.GetSpiderFile)            // read spider file
+		authGroup.POST("/spiders/:id/file", routes.PostSpiderFile)          // update spider file
+		authGroup.PUT("/spiders/:id/file", routes.PutSpiderFile)            // create spider file
+		authGroup.PUT("/spiders/:id/dir", routes.PutSpiderDir)              // create spider directory
+		authGroup.DELETE("/spiders/:id/file", routes.DeleteSpiderFile)      // delete spider file
+		authGroup.POST("/spiders/:id/file/rename", routes.RenameSpiderFile) // rename spider file
+		authGroup.GET("/spiders/:id/dir", routes.GetSpiderDir)              // spider directory listing
+		authGroup.GET("/spiders/:id/stats", routes.GetSpiderStats)          // spider stats
+		authGroup.GET("/spider/types", routes.GetSpiderTypes)               // spider types
// 可配置爬虫
authGroup.GET("/config_spiders/:id/config", routes.GetConfigSpiderConfig) // 获取可配置爬虫配置
authGroup.POST("/config_spiders/:id/config", routes.PostConfigSpiderConfig) // 更改可配置爬虫配置
@@ -176,6 +182,8 @@ func main() {
authGroup.POST("/users/:id", routes.PostUser) // 更改用户
authGroup.DELETE("/users/:id", routes.DeleteUser) // 删除用户
authGroup.GET("/me", routes.GetMe) // 获取自己账户
+		// release version
+		authGroup.GET("/version", routes.GetVersion) // get the released version
}
}
diff --git a/backend/model/node.go b/backend/model/node.go
index 2fe810f8..effbfbd0 100644
--- a/backend/model/node.go
+++ b/backend/model/node.go
@@ -63,7 +63,9 @@ func GetCurrentNode() (Node, error) {
// 如果获取失败
if err != nil {
// 如果为主节点,表示为第一次注册,插入节点信息
- if IsMaster() {
+		// update: filter on the specific "not found" error to avoid registering
+		// multiple master nodes; later this should be split so the master-node
+		// info check only runs on the master node itself
+ if IsMaster() && err == mgo.ErrNotFound {
// 获取本机信息
ip, mac, key, err := GetNodeBaseInfo()
if err != nil {
diff --git a/backend/model/spider.go b/backend/model/spider.go
index 78adc4d0..02c3aa8d 100644
--- a/backend/model/spider.go
+++ b/backend/model/spider.go
@@ -157,15 +157,15 @@ func GetSpiderByFileId(fileId bson.ObjectId) *Spider {
}
// 获取爬虫(根据名称)
-func GetSpiderByName(name string) *Spider {
+func GetSpiderByName(name string) Spider {
s, c := database.GetCol("spiders")
defer s.Close()
- var result *Spider
+ var result Spider
if err := c.Find(bson.M{"name": name}).One(&result); err != nil {
log.Errorf("get spider error: %s, spider_name: %s", err.Error(), name)
//debug.PrintStack()
- return nil
+ return result
}
return result
}
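Returning a value instead of a pointer lets callers test the zero value (`spider.Name == ""`) rather than `nil`, which is what the route handlers below switch to. A small sketch of this zero-value-as-not-found pattern (the map stands in for the MongoDB collection; names are illustrative):

```go
package main

import "fmt"

// Spider is a trimmed stand-in for the model type.
type Spider struct {
	Name string
}

// getSpiderByName mimics the revised signature: it returns a value, and a
// zero value (empty Name) signals "not found" instead of a nil pointer.
func getSpiderByName(db map[string]Spider, name string) Spider {
	return db[name] // a missing key yields the zero value
}

func main() {
	db := map[string]Spider{"quotes": {Name: "quotes"}}
	if s := getSpiderByName(db, "quotes"); s.Name != "" {
		fmt.Println("exists:", s.Name)
	}
	if s := getSpiderByName(db, "missing"); s.Name == "" {
		fmt.Println("not found")
	}
}
```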
diff --git a/backend/routes/config_spider.go b/backend/routes/config_spider.go
index e387935a..ac6a11e0 100644
--- a/backend/routes/config_spider.go
+++ b/backend/routes/config_spider.go
@@ -40,7 +40,7 @@ func PutConfigSpider(c *gin.Context) {
}
// 判断爬虫是否存在
- if spider := model.GetSpiderByName(spider.Name); spider != nil {
+ if spider := model.GetSpiderByName(spider.Name); spider.Name != "" {
HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("spider for '%s' already exists", spider.Name))
return
}
diff --git a/backend/routes/setting.go b/backend/routes/setting.go
new file mode 100644
index 00000000..4429873e
--- /dev/null
+++ b/backend/routes/setting.go
@@ -0,0 +1,33 @@
+package routes
+
+import (
+ "github.com/gin-gonic/gin"
+ "github.com/spf13/viper"
+ "net/http"
+)
+
+type SettingBody struct {
+ AllowRegister string `json:"allow_register"`
+}
+
+func GetVersion(c *gin.Context) {
+ version := viper.GetString("version")
+
+ c.JSON(http.StatusOK, Response{
+ Status: "ok",
+ Message: "success",
+ Data: version,
+ })
+}
+
+func GetSetting(c *gin.Context) {
+ allowRegister := viper.GetString("setting.allowRegister")
+
+ body := SettingBody{AllowRegister: allowRegister}
+
+ c.JSON(http.StatusOK, Response{
+ Status: "ok",
+ Message: "success",
+ Data: body,
+ })
+}
diff --git a/backend/routes/spider.go b/backend/routes/spider.go
index 588811e3..1ca45f05 100644
--- a/backend/routes/spider.go
+++ b/backend/routes/spider.go
@@ -7,6 +7,7 @@ import (
"crawlab/model"
"crawlab/services"
"crawlab/utils"
+ "fmt"
"github.com/apex/log"
"github.com/gin-gonic/gin"
"github.com/globalsign/mgo"
@@ -17,6 +18,7 @@ import (
"io/ioutil"
"net/http"
"os"
+ "path"
"path/filepath"
"runtime/debug"
"strconv"
@@ -117,6 +119,64 @@ func PublishSpider(c *gin.Context) {
}
func PutSpider(c *gin.Context) {
+	var spider model.Spider
+	if err := c.ShouldBindJSON(&spider); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+
+	// spider name must not be empty
+	if spider.Name == "" {
+		HandleErrorF(http.StatusBadRequest, c, "spider name should not be empty")
+		return
+	}
+
+	// check whether the spider already exists
+	if spider := model.GetSpiderByName(spider.Name); spider.Name != "" {
+		HandleErrorF(http.StatusBadRequest, c, fmt.Sprintf("spider for '%s' already exists", spider.Name))
+		return
+	}
+
+	// set the spider type
+	spider.Type = constants.Customized
+
+	// reset FileId to the null ObjectId
+	spider.FileId = bson.ObjectIdHex(constants.ObjectIdNull)
+
+	// create the spider directory
+	spiderDir := filepath.Join(viper.GetString("spider.path"), spider.Name)
+	if utils.Exists(spiderDir) {
+		if err := os.RemoveAll(spiderDir); err != nil {
+			HandleError(http.StatusInternalServerError, c, err)
+			return
+		}
+	}
+	if err := os.MkdirAll(spiderDir, 0777); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+	spider.Src = spiderDir
+
+	// add the spider to the database
+	if err := spider.Add(); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// sync to GridFS
+	if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+		Data:    spider,
+	})
+}
+
+func UploadSpider(c *gin.Context) {
// 从body中获取文件
uploadFile, err := c.FormFile("file")
if err != nil {
@@ -178,7 +238,7 @@ func PutSpider(c *gin.Context) {
// 判断爬虫是否存在
spiderName := strings.Replace(targetFilename, ".zip", "", 1)
spider := model.GetSpiderByName(spiderName)
- if spider == nil {
+ if spider.Name == "" {
// 保存爬虫信息
srcPath := viper.GetString("spider.path")
spider := model.Spider{
@@ -195,6 +255,96 @@ func PutSpider(c *gin.Context) {
_ = spider.Save()
}
+	// trigger sync
+	services.PublishAllSpiders()
+
+	// fetch the spider
+	spider = model.GetSpiderByName(spiderName)
+
+ c.JSON(http.StatusOK, Response{
+ Status: "ok",
+ Message: "success",
+ Data: spider,
+ })
+}
+
+func UploadSpiderFromId(c *gin.Context) {
+	// TODO: partially duplicates UploadSpider; refactor later
+	// spider ID
+	spiderId := c.Param("id")
+
+	// fetch the spider
+	spider, err := model.GetSpider(bson.ObjectIdHex(spiderId))
+	if err != nil {
+		if err == mgo.ErrNotFound {
+			HandleErrorF(http.StatusNotFound, c, "cannot find spider")
+		} else {
+			HandleError(http.StatusInternalServerError, c, err)
+		}
+		return
+	}
+
+	// get the file from the request body
+	uploadFile, err := c.FormFile("file")
+	if err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// reject files that are not zip archives
+	if !strings.HasSuffix(uploadFile.Filename, ".zip") {
+		debug.PrintStack()
+		HandleError(http.StatusBadRequest, c, errors.New("Not a valid zip file"))
+		return
+	}
+
+	// make sure the tmp directory exists
+	tmpPath := viper.GetString("other.tmppath")
+	if !utils.Exists(tmpPath) {
+		if err := os.MkdirAll(tmpPath, os.ModePerm); err != nil {
+			log.Error("mkdir other.tmppath dir error:" + err.Error())
+			debug.PrintStack()
+			HandleError(http.StatusBadRequest, c, errors.New("Mkdir other.tmppath dir error"))
+			return
+		}
+	}
+
+	// save to a local temporary file
+	randomId := uuid.NewV4()
+	tmpFilePath := filepath.Join(tmpPath, randomId.String()+".zip")
+	if err := c.SaveUploadedFile(uploadFile, tmpFilePath); err != nil {
+		log.Error("save upload file error: " + err.Error())
+		debug.PrintStack()
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// get a GridFS instance
+	s, gf := database.GetGridFs("files")
+	defer s.Close()
+
+	// check whether the file already exists
+	var gfFile model.GridFs
+	if err := gf.Find(bson.M{"filename": uploadFile.Filename}).One(&gfFile); err == nil {
+		// remove the existing file
+		_ = gf.RemoveId(gfFile.Id)
+	}
+
+	// upload to GridFS
+	fid, err := services.UploadToGridFs(uploadFile.Filename, tmpFilePath)
+	if err != nil {
+		log.Errorf("upload to grid fs error: %s", err.Error())
+		debug.PrintStack()
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// update file_id
+	spider.FileId = fid
+	_ = spider.Save()
+
+	// trigger sync
+	services.PublishSpider(spider)
+
c.JSON(http.StatusOK, Response{
Status: "ok",
Message: "success",
@@ -283,6 +433,14 @@ func GetSpiderDir(c *gin.Context) {
})
}
+// spider file management
+
+type SpiderFileReqBody struct {
+ Path string `json:"path"`
+ Content string `json:"content"`
+ NewPath string `json:"new_path"`
+}
+
func GetSpiderFile(c *gin.Context) {
// 爬虫ID
id := c.Param("id")
@@ -311,11 +469,6 @@ func GetSpiderFile(c *gin.Context) {
})
}
-type SpiderFileReqBody struct {
- Path string `json:"path"`
- Content string `json:"content"`
-}
-
func PostSpiderFile(c *gin.Context) {
// 爬虫ID
id := c.Param("id")
@@ -340,6 +493,12 @@ func PostSpiderFile(c *gin.Context) {
return
}
+	// sync to GridFS
+ if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+ HandleError(http.StatusInternalServerError, c, err)
+ return
+ }
+
// 返回结果
c.JSON(http.StatusOK, Response{
Status: "ok",
@@ -347,6 +506,161 @@ func PostSpiderFile(c *gin.Context) {
})
}
+func PutSpiderFile(c *gin.Context) {
+	spiderId := c.Param("id")
+	var reqBody SpiderFileReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+	spider, err := model.GetSpider(bson.ObjectIdHex(spiderId))
+	if err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// file path
+	filePath := path.Join(spider.Src, reqBody.Path)
+
+	// error out if the file already exists
+	if utils.Exists(filePath) {
+		HandleErrorF(http.StatusInternalServerError, c, fmt.Sprintf(`%s already exists`, filePath))
+		return
+	}
+
+	// write the file
+	if err := ioutil.WriteFile(filePath, []byte(reqBody.Content), 0777); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// sync to GridFS
+	if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
+
+func PutSpiderDir(c *gin.Context) {
+	spiderId := c.Param("id")
+	var reqBody SpiderFileReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+	spider, err := model.GetSpider(bson.ObjectIdHex(spiderId))
+	if err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// directory path
+	filePath := path.Join(spider.Src, reqBody.Path)
+
+	// error out if the path already exists
+	if utils.Exists(filePath) {
+		HandleErrorF(http.StatusInternalServerError, c, fmt.Sprintf(`%s already exists`, filePath))
+		return
+	}
+
+	// create the directory
+	if err := os.MkdirAll(filePath, 0777); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// sync to GridFS
+	if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
+
+func DeleteSpiderFile(c *gin.Context) {
+	spiderId := c.Param("id")
+	var reqBody SpiderFileReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+	spider, err := model.GetSpider(bson.ObjectIdHex(spiderId))
+	if err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+	filePath := path.Join(spider.Src, reqBody.Path)
+	if err := os.RemoveAll(filePath); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// sync to GridFS
+	if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
+
+func RenameSpiderFile(c *gin.Context) {
+	spiderId := c.Param("id")
+	var reqBody SpiderFileReqBody
+	if err := c.ShouldBindJSON(&reqBody); err != nil {
+		HandleError(http.StatusBadRequest, c, err)
+		return
+	}
+	spider, err := model.GetSpider(bson.ObjectIdHex(spiderId))
+	if err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// original and new file paths
+	filePath := path.Join(spider.Src, reqBody.Path)
+	newFilePath := path.Join(spider.Src, reqBody.NewPath)
+
+	// error out if the new file already exists
+	if utils.Exists(newFilePath) {
+		HandleErrorF(http.StatusInternalServerError, c, fmt.Sprintf(`%s already exists`, newFilePath))
+		return
+	}
+
+	// rename; os.Rename moves the file, so no separate delete of the old path is needed
+	if err := os.Rename(filePath, newFilePath); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	// sync to GridFS
+	if err := services.UploadSpiderToGridFsFromMaster(spider); err != nil {
+		HandleError(http.StatusInternalServerError, c, err)
+		return
+	}
+
+	c.JSON(http.StatusOK, Response{
+		Status:  "ok",
+		Message: "success",
+	})
+}
+
// 爬虫类型
func GetSpiderTypes(c *gin.Context) {
types, err := model.GetSpiderTypes()
diff --git a/backend/routes/user.go b/backend/routes/user.go
index a6d44cae..33b6a958 100644
--- a/backend/routes/user.go
+++ b/backend/routes/user.go
@@ -21,6 +21,7 @@ type UserListRequestData struct {
type UserRequestData struct {
Username string `json:"username"`
Password string `json:"password"`
+ Role string `json:"role"`
}
func GetUser(c *gin.Context) {
@@ -88,11 +89,16 @@ func PutUser(c *gin.Context) {
return
}
+	// default to a normal user
+ if reqData.Role == "" {
+ reqData.Role = constants.RoleNormal
+ }
+
// 添加用户
user := model.User{
Username: strings.ToLower(reqData.Username),
Password: utils.EncryptPassword(reqData.Password),
- Role: constants.RoleNormal,
+ Role: reqData.Role,
}
if err := user.Add(); err != nil {
HandleError(http.StatusInternalServerError, c, err)
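`PutUser` now falls back to the normal role only when the request omits one, so self-registration stays non-privileged while the admin UI may pass an explicit role. The default-if-empty pattern in isolation (`roleNormal` stands in for `constants.RoleNormal`):

```go
package main

import "fmt"

// roleNormal stands in for constants.RoleNormal.
const roleNormal = "normal"

// resolveRole mirrors PutUser's new behavior: an empty role in the request
// body falls back to the normal-user role; an explicit role is kept.
func resolveRole(requested string) string {
	if requested == "" {
		return roleNormal
	}
	return requested
}

func main() {
	fmt.Println(resolveRole(""))      // falls back to normal
	fmt.Println(resolveRole("admin")) // explicit role kept
}
```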
diff --git a/backend/services/node.go b/backend/services/node.go
index e6c2ac08..d14ce4ae 100644
--- a/backend/services/node.go
+++ b/backend/services/node.go
@@ -167,27 +167,34 @@ func UpdateNodeData() {
debug.PrintStack()
return
}
- // 构造节点数据
- data := Data{
- Key: key,
- Mac: mac,
- Ip: ip,
- Master: model.IsMaster(),
- UpdateTs: time.Now(),
- UpdateTsUnix: time.Now().Unix(),
+
+	// first fetch all node keys from Redis
+	list, _ := database.RedisClient.HKeys("nodes")
+
+	if !utils.Contains(list, key) {
+		// build the node data
+ data := Data{
+ Key: key,
+ Mac: mac,
+ Ip: ip,
+ Master: model.IsMaster(),
+ UpdateTs: time.Now(),
+ UpdateTsUnix: time.Now().Unix(),
+ }
+
+		// register the node in Redis
+ dataBytes, err := json.Marshal(&data)
+ if err != nil {
+ log.Errorf(err.Error())
+ debug.PrintStack()
+ return
+ }
+ if err := database.RedisClient.HSet("nodes", key, utils.BytesToString(dataBytes)); err != nil {
+ log.Errorf(err.Error())
+ return
+ }
}
- // 注册节点到Redis
- dataBytes, err := json.Marshal(&data)
- if err != nil {
- log.Errorf(err.Error())
- debug.PrintStack()
- return
- }
- if err := database.RedisClient.HSet("nodes", key, utils.BytesToString(dataBytes)); err != nil {
- log.Errorf(err.Error())
- return
- }
}
func MasterNodeCallback(message redis.Message) (err error) {
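The reworked `UpdateNodeData` writes the node record only when its key is absent from the Redis `nodes` hash, which is what prevents the duplicated node entries reported in #391. A sketch of that register-if-absent behavior, with a plain map standing in for `HKeys`/`HSet` (names are illustrative):

```go
package main

import "fmt"

// registerNode mirrors the revised UpdateNodeData: the node record is written
// only when its key is not already present, so repeated heartbeats do not
// recreate (and duplicate) node entries. The map stands in for the Redis
// "nodes" hash.
func registerNode(nodes map[string]string, key, data string) bool {
	if _, ok := nodes[key]; ok {
		return false // already registered; leave the record alone
	}
	nodes[key] = data
	return true
}

func main() {
	nodes := map[string]string{}
	fmt.Println(registerNode(nodes, "node-1", `{"master":true}`))  // first heartbeat registers
	fmt.Println(registerNode(nodes, "node-1", `{"master":false}`)) // later heartbeats are no-ops
	fmt.Println(len(nodes))
}
```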
diff --git a/backend/services/spider.go b/backend/services/spider.go
index 3922d822..3515afa9 100644
--- a/backend/services/spider.go
+++ b/backend/services/spider.go
@@ -12,6 +12,7 @@ import (
"github.com/apex/log"
"github.com/globalsign/mgo"
"github.com/globalsign/mgo/bson"
+ uuid "github.com/satori/go.uuid"
"github.com/spf13/viper"
"os"
"path/filepath"
@@ -30,6 +31,48 @@ type SpiderUploadMessage struct {
SpiderId string
}
+// upload a spider from the master node to GridFS
+func UploadSpiderToGridFsFromMaster(spider model.Spider) error {
+	// the spider's source directory
+	spiderDir := spider.Src
+
+	// package the directory as a zip file
+	files, err := utils.GetFilesFromDir(spiderDir)
+	if err != nil {
+		return err
+	}
+	randomId := uuid.NewV4()
+	tmpFilePath := filepath.Join(viper.GetString("other.tmppath"), spider.Name+"."+randomId.String()+".zip")
+	spiderZipFileName := spider.Name + ".zip"
+	if err := utils.Compress(files, tmpFilePath); err != nil {
+		return err
+	}
+
+	// get a GridFS instance
+	s, gf := database.GetGridFs("files")
+	defer s.Close()
+
+	// check whether the file already exists
+	var gfFile model.GridFs
+	if err := gf.Find(bson.M{"filename": spiderZipFileName}).One(&gfFile); err == nil {
+		// remove the existing file
+		_ = gf.RemoveId(gfFile.Id)
+	}
+
+	// upload to GridFS
+	fid, err := UploadToGridFs(spiderZipFileName, tmpFilePath)
+	if err != nil {
+		log.Errorf("upload to grid fs error: %s", err.Error())
+		return err
+	}
+
+	// save the spider's FileId
+	spider.FileId = fid
+	_ = spider.Save()
+
+	return nil
+}
+
// 上传zip文件到GridFS
func UploadToGridFs(fileName string, filePath string) (fid bson.ObjectId, err error) {
fid = ""
diff --git a/backend/utils/helpers.go b/backend/utils/helpers.go
index 8a80e9e8..e181c66c 100644
--- a/backend/utils/helpers.go
+++ b/backend/utils/helpers.go
@@ -6,6 +6,7 @@ import (
"github.com/apex/log"
"github.com/gomodule/redigo/redis"
"io"
+ "reflect"
"runtime/debug"
"unsafe"
)
@@ -40,3 +41,20 @@ func Close(c io.Closer) {
//log.WithError(err).Error("关闭资源文件失败。")
}
}
+
+// Contains reports whether val is an element of the given slice.
+func Contains(array interface{}, val interface{}) bool {
+	switch reflect.TypeOf(array).Kind() {
+	case reflect.Slice:
+		s := reflect.ValueOf(array)
+		for i := 0; i < s.Len(); i++ {
+			if reflect.DeepEqual(val, s.Index(i).Interface()) {
+				return true
+			}
+		}
+	}
+	return false
+}
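The new `utils.Contains` helper can be exercised in isolation; the reflect-based check below mirrors it and shows what it accepts (any slice type, at the cost of reflection overhead; non-slice inputs simply report false):

```go
package main

import (
	"fmt"
	"reflect"
)

// contains mirrors the reflect-based membership check the patch adds to utils.
func contains(array interface{}, val interface{}) bool {
	if reflect.TypeOf(array).Kind() != reflect.Slice {
		return false
	}
	s := reflect.ValueOf(array)
	for i := 0; i < s.Len(); i++ {
		if reflect.DeepEqual(val, s.Index(i).Interface()) {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"node-1", "node-2"}
	fmt.Println(contains(keys, "node-1"))    // member
	fmt.Println(contains(keys, "node-3"))    // not a member
	fmt.Println(contains([]int{1, 2, 3}, 2)) // works for any slice type
}
```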
diff --git a/docker-compose.yml b/docker-compose.yml
index bea50fb1..b4f36e86 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -6,25 +6,25 @@ services:
environment:
CRAWLAB_API_ADDRESS: "http://localhost:8000" # backend API address 后端 API 地址,设置为 http://<宿主机IP>:<端口>,端口为映射出来的端口
CRAWLAB_SERVER_MASTER: "Y" # whether to be master node 是否为主节点,主节点为 Y,工作节点为 N
- CRAWLAB_MONGO_HOST: "mongo1" # MongoDB host address MongoDB 的地址,在 docker compose 网络中,直接引用服务名称
+ CRAWLAB_MONGO_HOST: "mongo" # MongoDB host address MongoDB 的地址,在 docker compose 网络中,直接引用服务名称
CRAWLAB_REDIS_ADDRESS: "redis" # Redis host address Redis 的地址,在 docker compose 网络中,直接引用服务名称
ports:
- "8080:8080" # frontend port mapping 前端端口映射
- "8000:8000" # backend port mapping 后端端口映射
depends_on:
- - mongo1
+ - mongo
- redis
worker:
image: tikazyq/crawlab:latest
container_name: worker
environment:
CRAWLAB_SERVER_MASTER: "N"
- CRAWLAB_MONGO_HOST: "mongo1"
+ CRAWLAB_MONGO_HOST: "mongo"
CRAWLAB_REDIS_ADDRESS: "redis"
depends_on:
- - mongo1
+ - mongo
- redis
- mongo1:
+ mongo:
image: mongo:latest
restart: always
# volumes:
diff --git a/frontend/package.json b/frontend/package.json
index d11b503b..32432b8f 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,6 +1,6 @@
{
"name": "crawlab",
- "version": "0.4.1",
+ "version": "0.4.2",
"private": true,
"scripts": {
"serve": "vue-cli-service serve --ip=0.0.0.0 --mode=development",
diff --git a/frontend/src/App.vue b/frontend/src/App.vue
index 2a91e61a..a7ba8069 100644
--- a/frontend/src/App.vue
+++ b/frontend/src/App.vue
@@ -6,6 +6,9 @@
diff --git a/frontend/src/views/login/index.vue b/frontend/src/views/login/index.vue
index a21c0f42..664d05e5 100644
--- a/frontend/src/views/login/index.vue
+++ b/frontend/src/views/login/index.vue
@@ -48,7 +48,7 @@
{{$t('You can click "Add" to create an empty spider and upload files later.')}}
+{{$t('OR, you can also click "Upload" and upload a zip file containing your spider project.')}}
+ {{$t('NOTE: When uploading a zip file, please zip your' +
+ ' spider files from the ROOT DIRECTORY.')}}
+