6176 Commits

Author SHA1 Message Date
Marvin Zhang
ee11cd78ec test: add unit tests for IgnoreFileRegexPattern to validate ignored paths 2025-12-03 15:45:41 +08:00
Marvin Zhang
aba7be3b86 fix: update IgnoreFileRegexPattern to exclude .git directory 2025-12-03 15:45:18 +08:00
Marvin Zhang
034fbf1a84 fix(git): include path parameter in listDir dispatch for directory listing 2025-12-03 15:39:03 +08:00
Marvin Zhang
6085ed0db0 fix(git): update listDir dispatch to include path parameter 2025-12-02 17:05:22 +08:00
Marvin Zhang
b2ff8baed8 fix(spider): update node selection to use active nodes instead of all nodes
fix(spider): optimize form update logic to watch specific fields for changes
fix(grpc): adjust sync request ID handling for git and regular spiders
2025-12-02 11:42:29 +08:00
Marvin Zhang
97ab39119c feat(specs): add detailed documentation for gRPC file sync migration and release 0.7.0
- Introduced README.md for the file sync issue after gRPC migration, outlining the problem, root cause, and proposed solutions.
- Added release notes for Crawlab 0.7.0 highlighting community features and improvements.
- Created a README.md for the specs directory to provide an overview and usage instructions for LeanSpec.
2025-11-10 14:07:36 +08:00
Marvin Zhang
18c5eb3956 fix: replace string slicing with filepath.Dir() in gRPC file sync
- Fix directory path calculation bug in downloadFileGRPC()
- Bug caused nested directory creation to fail (e.g., crawlab_project/spiders/)
- String slicing incorrectly truncated paths mid-character
- Now uses filepath.Dir() for correct parent directory extraction
- Fixes 'no such file or directory' errors during worker file sync
- Resolves spider task failures on worker nodes after gRPC migration

Validated by: REL-004, REL-005 test cases
2025-10-30 15:22:53 +08:00
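Note on 18c5eb3956: the fix described above can be illustrated with a minimal sketch. The helper names `parentDir` and `ensureParentDir` and the example path are hypothetical; this is not the project's `downloadFileGRPC()` code, only a demonstration of why `filepath.Dir()` is a safer way to find a parent directory than slicing the path string by hand.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// parentDir returns the directory portion of target. filepath.Dir drops the
// final path element as a whole, so it cannot cut a path mid-component the way
// manual string slicing with a miscalculated index can.
func parentDir(target string) string {
	return filepath.Dir(target)
}

// ensureParentDir (hypothetical helper) creates every missing directory above
// target so that a later file write does not fail with
// "no such file or directory".
func ensureParentDir(target string) (string, error) {
	dir := parentDir(target)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	return dir, nil
}

func main() {
	root, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// A nested path similar to the one quoted in the commit message.
	target := filepath.Join(root, "crawlab_project", "spiders", "example.py")
	dir, err := ensureParentDir(target)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile(target, []byte("# spider\n"), 0o644); err != nil {
		panic(err)
	}
	fmt.Println("created", dir)
}
```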
Marvin Zhang
bd99899182 fix(core): default to _id descending sort when sort is nil or parsing fails
Ensure MustGetSortOption returns a default bson.D{{"_id", -1}} on parse errors and
GetPaginationPipeline uses {_id: -1} when sort is nil to provide consistent default ordering.
2025-10-28 11:55:00 +08:00
Marvin Zhang
851097dc59 fix(node/service): implement worker client wrapper using local TaskServiceServer
Implement createWorkerClient to return a workerTaskClient when a node has an active gRPC stream.
Add workerTaskClient that wraps the server.TaskServiceServer, stubs other client RPCs and forwards
CheckProcess calls directly to server.CheckProcess for querying worker process status.
2025-10-27 16:26:32 +08:00
Marvin Zhang
b8e62c7b6b fix(node/master): add failure counter and grace period to node health checks; increase monitor interval 2025-10-23 14:49:30 +08:00
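Note on b8e62c7b6b: a failure counter combined with a grace period might look roughly like the sketch below. The type `healthTracker`, its fields, and the thresholds are all assumptions made for illustration; the commit message does not state the actual values or structure.

```go
package main

import (
	"fmt"
	"time"
)

// healthTracker is a hypothetical sketch: a node is reported offline only
// after maxFailures consecutive failed checks, and never within gracePeriod
// of its last successful contact.
type healthTracker struct {
	maxFailures int
	gracePeriod time.Duration
	failures    int
	lastSeen    time.Time
}

// record registers one check result observed at now and reports whether the
// node should be treated as offline.
func (h *healthTracker) record(ok bool, now time.Time) bool {
	if ok {
		h.failures = 0
		h.lastSeen = now
		return false
	}
	h.failures++
	if now.Sub(h.lastSeen) < h.gracePeriod {
		return false // still inside the grace period
	}
	return h.failures >= h.maxFailures
}

// simulate replays one success followed by four failures and returns the
// offline decision made after each check.
func simulate() []bool {
	start := time.Unix(0, 0)
	h := &healthTracker{maxFailures: 3, gracePeriod: 30 * time.Second, lastSeen: start}
	steps := []struct {
		ok bool
		at time.Duration
	}{
		{true, 0},
		{false, 10 * time.Second},
		{false, 20 * time.Second},
		{false, 40 * time.Second},
	}
	var out []bool
	for _, s := range steps {
		out = append(out, h.record(s.ok, start.Add(s.at)))
	}
	return out
}

func main() {
	fmt.Println(simulate())
}
```

A single missed check, or a short burst of failures right after a successful contact, does not flip the node to offline in this sketch; only sustained failure past the grace period does.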
Marvin Zhang
ef70312430 fix(grpc/server): add keepalive enforcement and params to match client
Configure server-side keepalive (EnforcementPolicy and ServerParameters) to align with client settings and prevent connection timeouts after network disconnection/reconnection.
2025-10-23 10:58:22 +08:00
Marvin Zhang
893cb3cb8a fix(grpc/server): mark node active on Subscribe and notify on status change
Optimistically mark node as online/active and persist ActiveAt when a node
subscribes, logging success or warning on failure. Send a node notification
in Pro mode if the status changed. Also tidy import ordering.
2025-10-22 22:03:16 +08:00
Marvin Zhang
18fc84afb7 fix(grpc/client): clear reconnecting on failure and requeue reconnection after backoff
Ensure the reconnecting flag is reset on failed attempts so subsequent retries can proceed,
and explicitly trigger a reconnection attempt after the backoff period to keep retrying recovery.
2025-10-21 22:03:17 +08:00
Marvin Zhang
138bed5c05 fix(grpc/client): wait for full reconnection readiness before clearing reconnecting flag
- add maxReconnectionWait and reconnectionCheckInterval constants for reconnection readiness polling
- introduce waitForFullReconnectionReady() to verify: connection READY, clients registered, and ability to obtain critical service clients (model/task) within short timeouts
- ensure reconnecting flag is cleared immediately on reconnection failure and only cleared after full readiness checks on success
- improve logging around reconnection stabilization and readiness checks
2025-10-21 21:26:57 +08:00
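Note on 18fc84afb7 and 138bed5c05: the two commits above describe a reconnecting flag that is always released on failure and is released on success only after readiness has been confirmed by polling. The sketch below shows one way to get that behavior. The constant names `maxReconnectionWait` and `reconnectionCheckInterval` come from the commit message, but their values here, the `conn` type, and the `dial`/`ready` callbacks are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// Values are placeholders chosen so the demo finishes quickly.
const (
	maxReconnectionWait       = 200 * time.Millisecond
	reconnectionCheckInterval = 5 * time.Millisecond
)

// waitUntilReady polls check every interval until it returns true or maxWait
// elapses.
func waitUntilReady(ctx context.Context, maxWait, interval time.Duration, check func() bool) error {
	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("connection not ready before deadline")
		case <-ticker.C:
		}
	}
}

// conn is a hypothetical stand-in for a client holding a reconnecting flag.
type conn struct{ reconnecting atomic.Bool }

// reconnect allows one attempt at a time. The deferred Store clears the flag
// on every exit path: immediately after a failed dial, and on success only
// once the readiness poll has returned.
func (c *conn) reconnect(dial func() error, ready func() bool) error {
	if !c.reconnecting.CompareAndSwap(false, true) {
		return errors.New("reconnection already in progress")
	}
	defer c.reconnecting.Store(false)
	if err := dial(); err != nil {
		return err
	}
	return waitUntilReady(context.Background(), maxReconnectionWait, reconnectionCheckInterval, ready)
}

// reconnectDemo reconnects against a fake connection that reports ready on
// the readyAfter-th readiness check.
func reconnectDemo(readyAfter int) error {
	c := &conn{}
	calls := 0
	ready := func() bool { calls++; return calls >= readyAfter }
	return c.reconnect(func() error { return nil }, ready)
}

func main() {
	fmt.Println("ready after 3 checks:", reconnectDemo(3))
	fmt.Println("never ready:", reconnectDemo(1<<30))
}
```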
Marvin Zhang
ba6d989c7e fix(controllers/health): return after responding OK to avoid falling through; tidy imports 2025-10-20 16:34:55 +08:00
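Note on ba6d989c7e: the fall-through problem named above is easy to reproduce in any HTTP handler. The sketch below uses the standard library's `net/http` rather than the project's own controller code, and the handler name, route, and response bodies are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler responds 200 when isReady reports true and 503 otherwise.
func healthHandler(isReady func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if isReady() {
			w.WriteHeader(http.StatusOK)
			fmt.Fprint(w, "ok")
			// Without this return, execution would continue into the error
			// branch below and append an error body to a 200 response.
			return
		}
		http.Error(w, "not ready", http.StatusServiceUnavailable)
	}
}

// probe calls the handler in-process and returns the status code and body.
func probe(ready bool) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	healthHandler(func() bool { return ready })(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(true)
	fmt.Println(code, body)
	code, body = probe(false)
	fmt.Println(code, body)
}
```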
Marvin Zhang
ec3dd2d077 fix(grpc/client): protect GetGrpcClient with _clientMux lock to avoid race during singleton init 2025-10-20 13:43:55 +08:00
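Note on ec3dd2d077: guarding lazy singleton initialization with a mutex can be sketched as follows. The names `grpcClient`, `clientMux`, `getClient`, and `resetClient` are illustrative stand-ins, not the project's `GetGrpcClient` or `_clientMux`. A mutex is used here instead of `sync.Once` so that the instance can be cleared and rebuilt later; that the project needs this is inferred from the mention of `ResetGrpcClient()` elsewhere in this log and is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// grpcClient is a placeholder type for the shared client.
type grpcClient struct{ id int }

var (
	clientMux sync.Mutex
	client    *grpcClient
	created   int // number of times a client was constructed
)

// getClient returns the shared client, constructing it on first use. The lock
// ensures two goroutines cannot both observe nil and construct twice.
func getClient() *grpcClient {
	clientMux.Lock()
	defer clientMux.Unlock()
	if client == nil {
		created++
		client = &grpcClient{id: created}
	}
	return client
}

// resetClient drops the shared client so the next getClient call rebuilds it.
func resetClient() {
	clientMux.Lock()
	defer clientMux.Unlock()
	client = nil
}

// concurrentInit calls getClient from many goroutines and reports how many
// clients were constructed in total.
func concurrentInit() int {
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getClient()
		}()
	}
	wg.Wait()
	clientMux.Lock()
	defer clientMux.Unlock()
	return created
}

func main() {
	fmt.Println("clients constructed:", concurrentInit())
}
```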
Marvin Zhang
2dfc66743b fix(grpc/client,node/task/handler): add RetryWithBackoff, stabilize reconnection, and retry gRPC ops
- add RetryWithBackoff helper to grpc client for exponential retry with backoff and reconnection-aware handling
- increase reconnectionClientTimeout to 90s and introduce connectionStabilizationDelay; wait briefly after reconnection to avoid immediate flapping
- refresh reconnection flag while waiting for client registration and improve cancellation message
- replace direct heartbeat RPC with RetryWithBackoff in WorkerService (use extended timeout)
- use RetryWithBackoff for worker node status updates in task handler and propagate errors
2025-10-20 13:01:10 +08:00
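Note on 2dfc66743b: exponential retry with backoff, as named in the commit above, can be sketched in a few lines. The function below is deliberately lower-cased (`retryWithBackoff`) to mark it as a stand-in; its signature, delays, and attempt counts are assumptions and may differ from the project's `RetryWithBackoff` helper, which the commit says is also reconnection-aware.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff runs op up to attempts times, doubling the wait between
// attempts from initial up to maxDelay, and stops early if ctx is cancelled.
func retryWithBackoff(ctx context.Context, attempts int, initial, maxDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

// demoRetry runs an operation that fails failures times before succeeding and
// returns how many times it was called along with the final error.
func demoRetry(failures, attempts int) (int, error) {
	calls := 0
	err := retryWithBackoff(context.Background(), attempts, time.Millisecond, 8*time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("transient failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demoRetry(2, 5)
	fmt.Println("calls:", calls, "err:", err)
	calls, err = demoRetry(10, 3)
	fmt.Println("calls:", calls, "err:", err)
}
```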
Marvin Zhang
f441265cc2 feat(sync): add gRPC file synchronization service and integrate end-to-end
- add proto/services/sync_service.proto and generate Go pb + grpc bindings
- implement SyncServiceServer (streaming file scan + download) with:
  - request deduplication, in-memory cache (TTL), chunked streaming
  - concurrent-safe broadcast to waiters and server-side logging
- register SyncSvr in gRPC server and expose sync client in GrpcClient:
  - add syncClient field, registration and safe getters with reconnection-aware timeouts
- integrate gRPC sync into runner:
  - split syncFiles into syncFilesHTTP (legacy) and syncFilesGRPC
  - Runner now chooses implementation via config flag and performs streaming scan/download
- controller improvements:
  - add semaphore-based rate limiting for sync scan requests with in-flight counters and logs
- misc:
  - add utils.IsSyncGrpcEnabled() config helper
  - improve HTTP sync error diagnostics (Content-Type validation, response previews)
  - update/regenerate many protobuf and gRPC generated files (protoc/protoc-gen-go / protoc-gen-go-grpc version bumps)
2025-10-20 12:48:53 +08:00
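Note on f441265cc2: of the pieces listed above, the semaphore-based rate limiting with an in-flight counter is the simplest to illustrate in isolation. The sketch below uses a buffered channel as the semaphore. The `limiter` type, its methods, and the limit of 3 are hypothetical and are not taken from the controller code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter caps the number of concurrently running requests. The buffered
// channel acts as a counting semaphore; inFlight exists for logging.
type limiter struct {
	slots    chan struct{}
	inFlight atomic.Int32
}

func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// do blocks until a slot is free, runs fn, then releases the slot.
func (l *limiter) do(fn func()) {
	l.slots <- struct{}{}
	l.inFlight.Add(1)
	defer func() {
		l.inFlight.Add(-1)
		<-l.slots
	}()
	fn()
}

// peakConcurrency fires requests goroutines through a limiter of size limit
// and returns the highest in-flight count observed.
func peakConcurrency(limit, requests int) int32 {
	l := newLimiter(limit)
	var peak atomic.Int32
	var wg sync.WaitGroup
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() {
				cur := l.inFlight.Load()
				for {
					old := peak.Load()
					if cur <= old || peak.CompareAndSwap(old, cur) {
						break
					}
				}
				time.Sleep(2 * time.Millisecond) // simulated scan work
			})
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println("peak in-flight with limit 3:", peakConcurrency(3, 20))
}
```

Because the counter is incremented only after a slot is acquired and decremented before the slot is released, the observed in-flight value never exceeds the limit.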
Marvin Zhang
61604e1817 fix(task/handler): ensure latest gRPC client is used for task fetch/subscribe
Add svc.getGrpcClient() helper and use it when obtaining TaskClient so task fetch and
subscribe operations don't hold a stale client instance after ResetGrpcClient().
2025-10-20 12:22:34 +08:00
Marvin Zhang
4baa5fad59 fix(grpc/client): trigger reconnection on bad conn state and improve connection logging
- Trigger reconnection proactively from Get*WithTimeout when underlying connection is in
  SHUTDOWN or TRANSIENT_FAILURE to avoid returning stale/unusable clients.
- Add debug/info logs around client registration, connection attempts, closing existing
  connections, connection initiation, reconnection start, backoff retry and successful
  reconnection (including current state and registration status).
- Surface more context in reconnection and connection logs to aid diagnostics.
2025-10-20 11:34:41 +08:00
Marvin Zhang
6020fef30b chore(node): add timing logs and improve node status diagnostics
- master: add TIMING logs in setWorkerNodeOnline to mark start and completed DB update
- handler: log node status for reconnection debugging and include active/enabled values in "node not active or enabled" error
2025-10-20 11:14:55 +08:00
Marvin Zhang
49165b2165 refactor(node): reorganize task reconciliation, prioritize worker cache, add periodic cleanup
- Move and document reconciliation constants and add sectioned organization/comments.
- Split large monolithic logic into smaller functions:
  - reconcileDisconnectedTasks / reconcileDisconnectedTask
  - reconcileAbandonedAssignedTasks
  - reconcileStalePendingTasks / handleStalePendingTask
  - getActualTaskStatus / getStatusFromWorkerCache / triggerWorkerStatusSync
  - queryProcessStatus / requestProcessStatusFromWorker / mapProcessStatusToTaskStatus
  - findTasksByStatus / markTaskDisconnected / findAvailableNodeForTask
  - updateTaskStatus / saveTask / shouldMarkTaskAbnormal / markTaskAbnormal
- Add periodic background workers:
  - StartPeriodicReconciliation -> runPeriodicReconciliation to reconcile running/disconnected tasks
  - runPeriodicAssignedTaskCleanup -> cleanupStuckAssignedTasks to detect and recover stuck assigned tasks
- Prioritize worker-side cached status and attempt sync from task runner before querying worker processes.
- Introduce a placeholder createWorkerClient for future gRPC worker discovery/invocation.
- Replace ad-hoc DB updates with saveTask using retry/backoff and centralize status update logic.
- Improve logging and error messages, and tighten conditions for marking tasks abnormal.

This refactor clarifies responsibilities, improves reliability of status updates, and prepares the codebase for future worker gRPC integration.
2025-10-20 10:54:32 +08:00
Marvin Zhang
883c954b4e feat: update Dockerfile to include 'go mod tidy' before installation 2025-10-09 12:39:12 +08:00
Marvin Zhang
44fd0809e6 feat: disable backend unit tests and document reasons for integration test requirements 2025-10-09 12:35:09 +08:00
Marvin Zhang
5bff8823a8 feat: update test workflows to skip API tests and document controller test status 2025-10-09 11:34:26 +08:00
Marvin Zhang
2a211923da Update core/task/handler/runner_sync.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 11:14:32 +08:00
Marvin Zhang
587d9d0960 Merge branch 'test' into develop 2025-10-09 11:13:51 +08:00
Marvin Zhang
4c508557e7 feat: parameterize ports in Docker Compose for better configurability 2025-09-29 14:31:33 +08:00
Marvin Zhang
29ef8d67da feat: implement synchronization and error handling improvements in task reconciliation and file synchronization 2025-09-28 17:42:23 +08:00
Marvin Zhang
e80256aa61 feat: add support for Chinese locale in Docker setup 2025-09-17 16:00:27 +08:00
Marvin Zhang
b6e14a13fe refactor: remove obsolete task reconciliation service tests 2025-09-17 11:05:27 +08:00
Marvin Zhang
afa5fab4c1 feat: enhance task reconciliation with worker-side status caching and synchronization 2025-09-17 11:03:35 +08:00
Marvin Zhang
8c2c23d9b6 feat: Update gRPC service definitions and implement CheckProcess method
- Downgraded protoc-gen-go-grpc and protoc versions for compatibility.
- Added CheckProcess method to TaskService with corresponding request and response types.
- Updated Subscribe and Connect methods to use new generic client stream types.
- Refactored server and client implementations for Subscribe and Connect methods.
- Ensured backward compatibility by maintaining existing method signatures where applicable.
- Added necessary handler for CheckProcess in the service descriptor.
2025-09-17 10:37:03 +08:00
Marvin Zhang
c6834e9964 feat: enhance task reconciliation logic with improved status handling and error messaging 2025-09-17 10:18:13 +08:00
Marvin Zhang
8ebdd98f99 feat: enhance ORM functionality with toggle support and UI updates 2025-09-16 16:09:33 +08:00
Marvin Zhang
bfe40e7c67 fix: comment out AI Assistant toggle button in Header.vue 2025-09-16 15:45:56 +08:00
Marvin Zhang
39f83d71b1 fix: update NotificationRequestDTO to include BSON field names for setting and channel 2025-09-16 15:43:43 +08:00
Marvin Zhang
196273c423 feat: implement ORM support with toggle functionality and UI updates 2025-09-16 15:18:35 +08:00
Marvin Zhang
875ca290b5 fix: update UseORM description to specify supported databases (MySQL, PostgreSQL, SQL Server) 2025-09-16 14:04:01 +08:00
Marvin Zhang
293e630f6f feat: add UseORM field to Database struct for ORM support 2025-09-16 13:30:50 +08:00
Marvin Zhang
8450b074c0 fix: comment out unused 'models' menu item in SystemDetail.vue 2025-09-16 09:33:15 +08:00
Marvin Zhang
56277e47be fix: remove unused build scripts to streamline the build process 2025-09-16 09:32:23 +08:00
Marvin Zhang
9ade564d0a Remove unused TypeScript declaration files for task, token, and user components in the Crawlab UI, streamlining the codebase and improving maintainability. 2025-09-16 09:31:34 +08:00
Marvin Zhang
7d1a61581e feat: add support for multi-architecture Docker builds with configurable input 2025-09-14 16:39:29 +08:00
Marvin Zhang
72177b2728 fix: update node disconnected status styling and behavior 2025-09-14 15:20:01 +08:00
Marvin Zhang
437c30b699 fix: ensure worker services depend on healthy master service 2025-09-14 15:02:06 +08:00
Marvin Zhang
829fcac3ff feat: add multi-platform support for Docker builds 2025-09-14 14:49:53 +08:00
Marvin Zhang
7c33fec784 refactor: remove unused fields from WorkerService struct 2025-09-12 18:17:36 +08:00
Marvin Zhang
e221e3c640 feat: enhance gRPC client handling with improved reconnection logic and monitoring 2025-09-12 18:16:52 +08:00
Marvin Zhang
07bb7f8ba9 fix: enhance node metrics handling by checking state before accessing metrics map 2025-09-12 16:34:40 +08:00