crawlab

mirror of https://github.com/crawlab-team/crawlab.git synced 2026-01-22 17:31:03 +01:00

Author	SHA1	Message	Date
Marvin Zhang	138bed5c05	fix(grpc/client): wait for full reconnection readiness before clearing reconnecting flag - add maxReconnectionWait and reconnectionCheckInterval constants for reconnection readiness polling - introduce waitForFullReconnectionReady() to verify: connection READY, clients registered, and ability to obtain critical service clients (model/task) within short timeouts - ensure reconnecting flag is cleared immediately on reconnection failure and only cleared after full readiness checks on success - improve logging around reconnection stabilization and readiness checks	2025-10-21 21:26:57 +08:00
Marvin Zhang	ba6d989c7e	fix(controllers/health): return after responding OK to avoid falling through; tidy imports	2025-10-20 16:34:55 +08:00
Marvin Zhang	ec3dd2d077	fix(grpc/client): protect GetGrpcClient with _clientMux lock to avoid race during singleton init	2025-10-20 13:43:55 +08:00
Marvin Zhang	2dfc66743b	fix(grpc/client,node/task/handler): add RetryWithBackoff, stabilize reconnection, and retry gRPC ops - add RetryWithBackoff helper to grpc client for exponential retry with backoff and reconnection-aware handling - increase reconnectionClientTimeout to 90s and introduce connectionStabilizationDelay; wait briefly after reconnection to avoid immediate flapping - refresh reconnection flag while waiting for client registration and improve cancellation message - replace direct heartbeat RPC with RetryWithBackoff in WorkerService (use extended timeout) - use RetryWithBackoff for worker node status updates in task handler and propagate errors	2025-10-20 13:01:10 +08:00
Marvin Zhang	f441265cc2	feat(sync): add gRPC file synchronization service and integrate end-to-end - add proto/services/sync_service.proto and generate Go pb + grpc bindings - implement SyncServiceServer (streaming file scan + download) with: - request deduplication, in-memory cache (TTL), chunked streaming - concurrent-safe broadcast to waiters and server-side logging - register SyncSvr in gRPC server and expose sync client in GrpcClient: - add syncClient field, registration and safe getters with reconnection-aware timeouts - integrate gRPC sync into runner: - split syncFiles into syncFilesHTTP (legacy) and syncFilesGRPC - Runner now chooses implementation via config flag and performs streaming scan/download - controller improvements: - add semaphore-based rate limiting for sync scan requests with in-flight counters and logs - misc: - add utils.IsSyncGrpcEnabled() config helper - improve HTTP sync error diagnostics (Content-Type validation, response previews) - update/regenerate many protobuf and gRPC generated files (protoc/protoc-gen-go / protoc-gen-go-grpc version bumps)	2025-10-20 12:48:53 +08:00
Marvin Zhang	61604e1817	fix(task/handler): ensure latest gRPC client is used for task fetch/subscribe Add svc.getGrpcClient() helper and use it when obtaining TaskClient so task fetch and subscribe operations don't hold a stale client instance after ResetGrpcClient().	2025-10-20 12:22:34 +08:00
Marvin Zhang	4baa5fad59	fix(grpc/client): trigger reconnection on bad conn state and improve connection logging - Trigger reconnection proactively from Get*WithTimeout when underlying connection is in SHUTDOWN or TRANSIENT_FAILURE to avoid returning stale/unusable clients. - Add debug/info logs around client registration, connection attempts, closing existing connections, connection initiation, reconnection start, backoff retry and successful reconnection (including current state and registration status). - Surface more context in reconnection and connection logs to aid diagnostics.	2025-10-20 11:34:41 +08:00
Marvin Zhang	6020fef30b	chore(node): add timing logs and improve node status diagnostics - master: add TIMING logs in setWorkerNodeOnline to mark start and completed DB update - handler: log node status for reconnection debugging and include active/enabled values in "node not active or enabled" error	2025-10-20 11:14:55 +08:00
Marvin Zhang	49165b2165	refactor(node): reorganize task reconciliation, prioritize worker cache, add periodic cleanup - Move and document reconciliation constants and add sectioned organization/comments. - Split large monolithic logic into smaller functions: - reconcileDisconnectedTasks / reconcileDisconnectedTask - reconcileAbandonedAssignedTasks - reconcileStalePendingTasks / handleStalePendingTask - getActualTaskStatus / getStatusFromWorkerCache / triggerWorkerStatusSync - queryProcessStatus / requestProcessStatusFromWorker / mapProcessStatusToTaskStatus - findTasksByStatus / markTaskDisconnected / findAvailableNodeForTask - updateTaskStatus / saveTask / shouldMarkTaskAbnormal / markTaskAbnormal - Add periodic background workers: - StartPeriodicReconciliation -> runPeriodicReconciliation to reconcile running/disconnected tasks - runPeriodicAssignedTaskCleanup -> cleanupStuckAssignedTasks to detect and recover stuck assigned tasks - Prioritize worker-side cached status and attempt sync from task runner before querying worker processes. - Introduce a placeholder createWorkerClient for future gRPC worker discovery/invocation. - Replace ad-hoc DB updates with saveTask using retry/backoff and centralize status update logic. - Improve logging and error messages, and tighten conditions for marking tasks abnormal. This refactor clarifies responsibilities, improves reliability of status updates, and prepares the codebase for future worker gRPC integration.	2025-10-20 10:54:32 +08:00
Marvin Zhang	44fd0809e6	feat: disable backend unit tests and document reasons for integration test requirements	2025-10-09 12:35:09 +08:00
Marvin Zhang	5bff8823a8	feat: update test workflows to skip API tests and document controller test status	2025-10-09 11:34:26 +08:00
Marvin Zhang	2a211923da	Update core/task/handler/runner_sync.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-09 11:14:32 +08:00
Marvin Zhang	587d9d0960	Merge branch 'test' into develop	2025-10-09 11:13:51 +08:00
Marvin Zhang	29ef8d67da	feat: implement synchronization and error handling improvements in task reconciliation and file synchronization	2025-09-28 17:42:23 +08:00
Marvin Zhang	b6e14a13fe	refactor: remove obsolete task reconciliation service tests	2025-09-17 11:05:27 +08:00
Marvin Zhang	afa5fab4c1	feat: enhance task reconciliation with worker-side status caching and synchronization	2025-09-17 11:03:35 +08:00
Marvin Zhang	8c2c23d9b6	feat: Update gRPC service definitions and implement CheckProcess method - Downgraded protoc-gen-go-grpc and protoc versions for compatibility. - Added CheckProcess method to TaskService with corresponding request and response types. - Updated Subscribe and Connect methods to use new generic client stream types. - Refactored server and client implementations for Subscribe and Connect methods. - Ensured backward compatibility by maintaining existing method signatures where applicable. - Added necessary handler for CheckProcess in the service descriptor.	2025-09-17 10:37:03 +08:00
Marvin Zhang	c6834e9964	feat: enhance task reconciliation logic with improved status handling and error messaging	2025-09-17 10:18:13 +08:00
Marvin Zhang	39f83d71b1	fix: update NotificationRequestDTO to include BSON field names for setting and channel	2025-09-16 15:43:43 +08:00
Marvin Zhang	875ca290b5	fix: update UseORM description to specify supported databases (MySQL, PostgreSQL, SQL Server)	2025-09-16 14:04:01 +08:00
Marvin Zhang	293e630f6f	feat: add UseORM field to Database struct for ORM support	2025-09-16 13:30:50 +08:00
Marvin Zhang	7c33fec784	refactor: remove unused fields from WorkerService struct	2025-09-12 18:17:36 +08:00
Marvin Zhang	e221e3c640	feat: enhance gRPC client handling with improved reconnection logic and monitoring	2025-09-12 18:16:52 +08:00
Marvin Zhang	316878e129	test: add comprehensive tests for task reconciliation service handling offline nodes	2025-09-12 16:10:00 +08:00
Marvin Zhang	60be5072e5	feat: add node disconnection handling and update task statuses accordingly	2025-09-12 15:40:29 +08:00
Marvin Zhang	14a94ff798	refactor: enhance error logging in writeLogLines to respect circuit breaker state	2025-09-12 14:34:27 +08:00
Marvin Zhang	c0e230e5d8	refactor: rename PING code to HEARTBEAT in node service and update related proto files	2025-09-12 14:17:49 +08:00
Marvin Zhang	d39c265483	feat: add PING message handling for connection health checks - Implemented PING message handling in TaskServiceServer to acknowledge health check pings. - Updated isConnectionHealthy method in Runner to use a non-blocking approach for health checks, preventing interference with log streams. - Introduced lastConnCheck timestamp to optimize health check frequency based on recent activity. - Added PING code to TaskServiceConnectCode enum in proto definition and generated files. - Updated gRPC client and server interfaces to support new PING functionality.	2025-09-12 13:58:16 +08:00
Marvin Zhang	333dfd44c0	refactor: implement circuit breaker for log connections to prevent flooding during failures	2025-09-12 13:55:44 +08:00
Marvin Zhang	3edd2a1210	refactor: optimize connection health checks to reduce log stream interference; adjust health check intervals and implement non-blocking pings	2025-08-16 17:42:07 +08:00
Marvin Zhang	65aeb3ed8c	feat: add PING mechanism for connection health checks; update proto and generated files - Introduced PING code in TaskServiceConnectCode enum for health checks. - Updated Runner to use proper PING messages instead of fake log messages for connection health checks. - Modified TaskServiceServer to handle PING requests and acknowledge them. - Adjusted generated gRPC files to reflect changes in proto definitions and ensure compatibility.	2025-08-16 17:19:21 +08:00
Marvin Zhang	45913ad7e4	refactor: implement health service for master and worker nodes; add health check script and integrate health checks into service lifecycle	2025-08-08 00:05:00 +08:00
Marvin Zhang	78f9e0ca8d	refactor: update task worker pool to support dynamic max workers and improve queue management; enhance configuration defaults for node runners and task queue size	2025-08-07 18:16:23 +08:00
Marvin Zhang	6340a9b880	refactor: Move context initialization for graceful shutdown to appropriate locations	2025-08-07 17:27:11 +08:00
Marvin Zhang	6912b92501	refactor: enhance context handling across task runner and service components; ensure proper cancellation chains and prevent goroutine leaks	2025-08-07 15:40:48 +08:00
Marvin Zhang	e1251d808b	refactor: update method receivers to value type for cleanup and connection methods; enhance context usage for task client operations	2025-08-07 11:53:42 +08:00
Marvin Zhang	d042bc8cd7	refactor: improve connection readiness check and enhance goroutine management in gRPC client; ensure proper context handling in stream listeners	2025-08-07 11:12:46 +08:00
Marvin Zhang	060396af3d	refactor: enhance data structure annotations and improve layout responsiveness in Home component	2025-08-07 09:58:11 +08:00
Marvin Zhang	44dd68918f	refactor: improve goroutine management and context handling in task and stream operations; ensure graceful shutdown and prevent leaks	2025-08-07 00:16:46 +08:00
Marvin Zhang	784ffc8b52	feat: implement task management service operations, stream manager, and worker pool - Added service_operations.go for task management including run, cancel, and execution logic. - Introduced stream_manager.go to handle task streams and manage cancellation signals. - Created worker_pool.go to manage a bounded pool of workers for executing tasks concurrently. - Implemented graceful shutdown and cleanup mechanisms for task runners and streams. - Enhanced error handling and logging throughout the task management process.	2025-08-06 18:29:08 +08:00
Marvin Zhang	3678d14082	feat: implement bounded goroutine pools for task execution and notification handling; enhance task scheduler with graceful shutdown and cleanup routines; update metric component for new time range options	2025-08-06 17:57:37 +08:00
Marvin Zhang	9745129e33	feat: configure test database as master node for testing	2025-07-23 15:28:51 +08:00
Marvin Zhang	cf5ec81250	fix: add nil checks for logger in config initialization and logging methods	2025-07-23 15:07:39 +08:00
Marvin Zhang	a2d13fae36	feat: temporarily disable batch file saving route and implement alternative handler in spider controller	2025-07-23 14:55:04 +08:00
Marvin Zhang	b4288b08a5	fix: Update FsFileInfo to use pointer slices for children and remove redundant getter methods	2025-07-23 09:42:07 +08:00
Marvin Zhang	3c3ff09723	feat: enhance gRPC client with health check functionality and improved connection handling	2025-07-09 14:38:53 +08:00
Marvin Zhang	20ba390cf6	refactor: improve mongo client connection error logging format and remove redundant gRPC server start in MasterService	2025-07-09 14:06:10 +08:00
Marvin Zhang	46c0cd6298	refactor: update gRPC client access patterns to use safe getter methods for improved error handling	2025-07-08 18:08:46 +08:00
Marvin Zhang	8bd3ef0b72	feat: add goroutine count metric to the MetricService and update related files	2025-07-08 14:08:31 +08:00
Marvin Zhang	00daa0ed96	fix: enhance gRPC client reconnection logic and add goroutine monitoring for potential leaks	2025-07-08 13:39:39 +08:00

386 Commits