380 Commits

Author SHA1 Message Date
Marvin Zhang
4baa5fad59 fix(grpc/client): trigger reconnection on bad conn state and improve connection logging
- Trigger reconnection proactively from Get*WithTimeout when the underlying connection is in
  SHUTDOWN or TRANSIENT_FAILURE, to avoid returning stale/unusable clients.
- Add debug/info logs around client registration, connection attempts, closing existing
  connections, connection initiation, reconnection start, backoff retry and successful
  reconnection (including current state and registration status).
- Surface more context in reconnection and connection logs to aid diagnostics.
2025-10-20 11:34:41 +08:00
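This is a minimal Go sketch of the state check described in 4baa5fad59. It assumes a client that tracks its connection state. The ConnState values mirror the states in gRPC-Go's google.golang.org/grpc/connectivity package, but they are redefined locally so the example has no dependencies. grpcClient, getClient and triggerReconnect are hypothetical names, not the project's API.

```go
package main

import (
	"errors"
	"sync"
)

// ConnState mirrors the states in gRPC-Go's connectivity package
// (redefined locally so this sketch has no external dependencies).
type ConnState int

const (
	Idle ConnState = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// needsReconnect reports whether a connection in state s is unusable.
func needsReconnect(s ConnState) bool {
	return s == Shutdown || s == TransientFailure
}

// grpcClient is a hypothetical client wrapper, not the project's type.
type grpcClient struct {
	mu             sync.Mutex
	state          ConnState
	reconnecting   bool
	reconnectCount int // how many reconnections were started (for illustration)
}

// triggerReconnect starts at most one reconnection at a time.
func (c *grpcClient) triggerReconnect() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.reconnecting {
		return
	}
	c.reconnecting = true
	c.reconnectCount++
	// A real client would redial here with backoff, in a goroutine,
	// and reset c.reconnecting once it finishes.
}

// getClient refuses to hand out a client on a broken connection and
// kicks off reconnection instead of returning a stale client.
func (c *grpcClient) getClient() error {
	c.mu.Lock()
	s := c.state
	c.mu.Unlock()
	if needsReconnect(s) {
		c.triggerReconnect()
		return errors.New("connection unusable; reconnection triggered")
	}
	return nil
}
```

The reconnecting flag keeps repeated getter calls on a broken connection from starting duplicate reconnection attempts.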
Marvin Zhang
6020fef30b chore(node): add timing logs and improve node status diagnostics
- master: add TIMING logs in setWorkerNodeOnline to mark the start and completion of the DB update
- handler: log node status for reconnection debugging and include active/enabled values in "node not active or enabled" error
2025-10-20 11:14:55 +08:00
Marvin Zhang
49165b2165 refactor(node): reorganize task reconciliation, prioritize worker cache, add periodic cleanup
- Move and document reconciliation constants and add sectioned organization/comments.
- Split large monolithic logic into smaller functions:
  - reconcileDisconnectedTasks / reconcileDisconnectedTask
  - reconcileAbandonedAssignedTasks
  - reconcileStalePendingTasks / handleStalePendingTask
  - getActualTaskStatus / getStatusFromWorkerCache / triggerWorkerStatusSync
  - queryProcessStatus / requestProcessStatusFromWorker / mapProcessStatusToTaskStatus
  - findTasksByStatus / markTaskDisconnected / findAvailableNodeForTask
  - updateTaskStatus / saveTask / shouldMarkTaskAbnormal / markTaskAbnormal
- Add periodic background workers:
  - StartPeriodicReconciliation -> runPeriodicReconciliation to reconcile running/disconnected tasks
  - runPeriodicAssignedTaskCleanup -> cleanupStuckAssignedTasks to detect and recover stuck assigned tasks
- Prioritize worker-side cached status and attempt sync from task runner before querying worker processes.
- Introduce a placeholder createWorkerClient for future gRPC worker discovery/invocation.
- Replace ad-hoc DB updates with saveTask using retry/backoff and centralize status update logic.
- Improve logging and error messages, and tighten conditions for marking tasks abnormal.

This refactor clarifies responsibilities, improves reliability of status updates, and prepares the codebase for future worker gRPC integration.
2025-10-20 10:54:32 +08:00
Marvin Zhang
44fd0809e6 feat: disable backend unit tests and document reasons for integration test requirements 2025-10-09 12:35:09 +08:00
Marvin Zhang
5bff8823a8 feat: update test workflows to skip API tests and document controller test status 2025-10-09 11:34:26 +08:00
Marvin Zhang
2a211923da Update core/task/handler/runner_sync.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 11:14:32 +08:00
Marvin Zhang
587d9d0960 Merge branch 'test' into develop 2025-10-09 11:13:51 +08:00
Marvin Zhang
29ef8d67da feat: implement synchronization and error handling improvements in task reconciliation and file synchronization 2025-09-28 17:42:23 +08:00
Marvin Zhang
b6e14a13fe refactor: remove obsolete task reconciliation service tests 2025-09-17 11:05:27 +08:00
Marvin Zhang
afa5fab4c1 feat: enhance task reconciliation with worker-side status caching and synchronization 2025-09-17 11:03:35 +08:00
Marvin Zhang
8c2c23d9b6 feat: Update gRPC service definitions and implement CheckProcess method
- Downgraded protoc-gen-go-grpc and protoc versions for compatibility.
- Added CheckProcess method to TaskService with corresponding request and response types.
- Updated Subscribe and Connect methods to use new generic client stream types.
- Refactored server and client implementations for Subscribe and Connect methods.
- Ensured backward compatibility by maintaining existing method signatures where applicable.
- Added necessary handler for CheckProcess in the service descriptor.
2025-09-17 10:37:03 +08:00
Marvin Zhang
c6834e9964 feat: enhance task reconciliation logic with improved status handling and error messaging 2025-09-17 10:18:13 +08:00
Marvin Zhang
39f83d71b1 fix: update NotificationRequestDTO to include BSON field names for setting and channel 2025-09-16 15:43:43 +08:00
Marvin Zhang
875ca290b5 fix: update UseORM description to specify supported databases (MySQL, PostgreSQL, SQL Server) 2025-09-16 14:04:01 +08:00
Marvin Zhang
293e630f6f feat: add UseORM field to Database struct for ORM support 2025-09-16 13:30:50 +08:00
Marvin Zhang
7c33fec784 refactor: remove unused fields from WorkerService struct 2025-09-12 18:17:36 +08:00
Marvin Zhang
e221e3c640 feat: enhance gRPC client handling with improved reconnection logic and monitoring 2025-09-12 18:16:52 +08:00
Marvin Zhang
316878e129 test: add comprehensive tests for task reconciliation service handling offline nodes 2025-09-12 16:10:00 +08:00
Marvin Zhang
60be5072e5 feat: add node disconnection handling and update task statuses accordingly 2025-09-12 15:40:29 +08:00
Marvin Zhang
14a94ff798 refactor: enhance error logging in writeLogLines to respect circuit breaker state 2025-09-12 14:34:27 +08:00
Marvin Zhang
c0e230e5d8 refactor: rename PING code to HEARTBEAT in node service and update related proto files 2025-09-12 14:17:49 +08:00
Marvin Zhang
d39c265483 feat: add PING message handling for connection health checks
- Implemented PING message handling in TaskServiceServer to acknowledge health check pings.
- Updated isConnectionHealthy method in Runner to use a non-blocking approach for health checks, preventing interference with log streams.
- Introduced lastConnCheck timestamp to optimize health check frequency based on recent activity.
- Added PING code to TaskServiceConnectCode enum in proto definition and generated files.
- Updated gRPC client and server interfaces to support new PING functionality.
2025-09-12 13:58:16 +08:00
Marvin Zhang
333dfd44c0 refactor: implement circuit breaker for log connections to prevent flooding during failures 2025-09-12 13:55:44 +08:00
Marvin Zhang
3edd2a1210 refactor: optimize connection health checks to reduce log stream interference; adjust health check intervals and implement non-blocking pings 2025-08-16 17:42:07 +08:00
Marvin Zhang
65aeb3ed8c feat: add PING mechanism for connection health checks; update proto and generated files
- Introduced PING code in TaskServiceConnectCode enum for health checks.
- Updated Runner to use proper PING messages instead of fake log messages for connection health checks.
- Modified TaskServiceServer to handle PING requests and acknowledge them.
- Adjusted generated gRPC files to reflect changes in proto definitions and ensure compatibility.
2025-08-16 17:19:21 +08:00
Marvin Zhang
45913ad7e4 refactor: implement health service for master and worker nodes; add health check script and integrate health checks into service lifecycle 2025-08-08 00:05:00 +08:00
Marvin Zhang
78f9e0ca8d refactor: update task worker pool to support dynamic max workers and improve queue management; enhance configuration defaults for node runners and task queue size 2025-08-07 18:16:23 +08:00
Marvin Zhang
6340a9b880 refactor: Move context initialization for graceful shutdown to appropriate locations 2025-08-07 17:27:11 +08:00
Marvin Zhang
6912b92501 refactor: enhance context handling across task runner and service components; ensure proper cancellation chains and prevent goroutine leaks 2025-08-07 15:40:48 +08:00
Marvin Zhang
e1251d808b refactor: update method receivers to value type for cleanup and connection methods; enhance context usage for task client operations 2025-08-07 11:53:42 +08:00
Marvin Zhang
d042bc8cd7 refactor: improve connection readiness check and enhance goroutine management in gRPC client; ensure proper context handling in stream listeners 2025-08-07 11:12:46 +08:00
Marvin Zhang
060396af3d refactor: enhance data structure annotations and improve layout responsiveness in Home component 2025-08-07 09:58:11 +08:00
Marvin Zhang
44dd68918f refactor: improve goroutine management and context handling in task and stream operations; ensure graceful shutdown and prevent leaks 2025-08-07 00:16:46 +08:00
Marvin Zhang
784ffc8b52 feat: implement task management service operations, stream manager, and worker pool
- Added service_operations.go for task management including run, cancel, and execution logic.
- Introduced stream_manager.go to handle task streams and manage cancellation signals.
- Created worker_pool.go to manage a bounded pool of workers for executing tasks concurrently.
- Implemented graceful shutdown and cleanup mechanisms for task runners and streams.
- Enhanced error handling and logging throughout the task management process.
2025-08-06 18:29:08 +08:00
Marvin Zhang
3678d14082 feat: implement bounded goroutine pools for task execution and notification handling; enhance task scheduler with graceful shutdown and cleanup routines; update metric component for new time range options 2025-08-06 17:57:37 +08:00
Marvin Zhang
9745129e33 feat: configure test database as master node for testing 2025-07-23 15:28:51 +08:00
Marvin Zhang
cf5ec81250 fix: add nil checks for logger in config initialization and logging methods 2025-07-23 15:07:39 +08:00
Marvin Zhang
a2d13fae36 feat: temporarily disable batch file saving route and implement alternative handler in spider controller 2025-07-23 14:55:04 +08:00
Marvin Zhang
b4288b08a5 fix: Update FsFileInfo to use pointer slices for children and remove redundant getter methods 2025-07-23 09:42:07 +08:00
Marvin Zhang
3c3ff09723 feat: enhance gRPC client with health check functionality and improved connection handling 2025-07-09 14:38:53 +08:00
Marvin Zhang
20ba390cf6 refactor: improve mongo client connection error logging format and remove redundant gRPC server start in MasterService 2025-07-09 14:06:10 +08:00
Marvin Zhang
46c0cd6298 refactor: update gRPC client access patterns to use safe getter methods for improved error handling 2025-07-08 18:08:46 +08:00
Marvin Zhang
8bd3ef0b72 feat: add goroutine count metric to the MetricService and update related files 2025-07-08 14:08:31 +08:00
Marvin Zhang
00daa0ed96 fix: enhance gRPC client reconnection logic and add goroutine monitoring for potential leaks 2025-07-08 13:39:39 +08:00
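One way to monitor for goroutine leaks is to sample runtime.NumGoroutine, a standard-library function, and compare it against a baseline plus a tolerance. This sketch assumes that baseline-plus-tolerance rule. goroutineMonitor and its threshold are hypothetical and are not the project's MetricService.

```go
package main

import "runtime"

// goroutineMonitor flags growth in the live goroutine count beyond
// a baseline plus a fixed tolerance.
type goroutineMonitor struct {
	baseline  int
	tolerance int
}

func newGoroutineMonitor(tolerance int) *goroutineMonitor {
	return &goroutineMonitor{baseline: runtime.NumGoroutine(), tolerance: tolerance}
}

// Check returns the current count and whether it exceeds baseline+tolerance.
func (m *goroutineMonitor) Check() (int, bool) {
	n := runtime.NumGoroutine()
	return n, n > m.baseline+m.tolerance
}
```

A periodic ticker calling Check and logging suspect counts is one way to surface a leak. A sustained rise matters more than a single spike.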
Marvin Zhang
f8e9c45a85 fix: enhance gRPC client connection management with circuit breaker and keep-alive settings 2025-07-08 13:34:43 +08:00
Marvin Zhang
92046a8c2e fix: improve task cancellation and connection health check logic with timeout handling 2025-06-27 14:02:24 +08:00
Marvin Zhang
9f251f3ebe fix: enhance task cancellation logic with graceful termination and stuck task cleanup 2025-06-27 13:50:21 +08:00
Marvin Zhang
89514b0154 feat: implement zombie process prevention and cleanup mechanisms in task runner 2025-06-23 13:54:43 +08:00
Marvin Zhang
1008886715 fix: enhance task service resilience with connection health monitoring and periodic cleanup 2025-06-23 11:57:05 +08:00
Marvin Zhang
5837472de5 fix: update API client constructor and improve error handling for missing API token and URL
chore: update package version to 0.1.1-dev.6 and adjust dependencies in pnpm-lock.yaml
refactor: change import paths to relative in tools.ts for consistency
2025-06-20 15:55:50 +08:00