Marvin Zhang
4baa5fad59
fix(grpc/client): trigger reconnection on bad conn state and improve connection logging
...
- Trigger reconnection proactively from Get*WithTimeout when underlying connection is in
  SHUTDOWN or TRANSIENT_FAILURE to avoid returning stale/unusable clients.
- Add debug/info logs around client registration, connection attempts, closing existing
  connections, connection initiation, reconnection start, backoff retry and successful
  reconnection (including current state and registration status).
- Surface more context in reconnection and connection logs to aid diagnostics.
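
A minimal sketch of the state check described in the first bullet above, not the project's actual code. `ConnState` is a stand-in that only mirrors the names of the gRPC connectivity states so the example compiles without the grpc module; `Client`, `NewClient`, `Usable`, and `triggerReconnect` are hypothetical names chosen for illustration.

```go
package main

// ConnState is a stand-in for the gRPC connectivity states (assumption:
// the real client reads the state from the underlying connection).
type ConnState int

const (
	Idle ConnState = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// needsReconnect reports whether a connection in state s should not be
// handed back to callers.
func needsReconnect(s ConnState) bool {
	return s == Shutdown || s == TransientFailure
}

// Client is a hypothetical wrapper around a connection.
type Client struct {
	state       func() ConnState // returns the current connection state
	reconnectCh chan struct{}    // buffered with capacity 1: at most one pending signal
}

// NewClient builds a Client whose state is read through the given function.
func NewClient(state func() ConnState) *Client {
	return &Client{state: state, reconnectCh: make(chan struct{}, 1)}
}

// triggerReconnect signals the reconnection loop without blocking.
// It returns false when a reconnection request is already pending.
func (c *Client) triggerReconnect() bool {
	select {
	case c.reconnectCh <- struct{}{}:
		return true
	default:
		return false
	}
}

// Usable is what a getter could call before returning a service client:
// a bad state triggers reconnection and reports the client as unusable.
func (c *Client) Usable() bool {
	if needsReconnect(c.state()) {
		c.triggerReconnect()
		return false
	}
	return true
}
```

The buffered channel keeps repeated getter calls from queueing more than one reconnection request while the connection stays in a bad state.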
2025-10-20 11:34:41 +08:00
Marvin Zhang
6020fef30b
chore(node): add timing logs and improve node status diagnostics
...
- master: add TIMING logs in setWorkerNodeOnline to mark the start and completion of the DB update
- handler: log node status for reconnection debugging and include active/enabled values in the "node not active or enabled" error
2025-10-20 11:14:55 +08:00
Marvin Zhang
49165b2165
refactor(node): reorganize task reconciliation, prioritize worker cache, add periodic cleanup
...
- Move and document reconciliation constants and add sectioned organization/comments.
- Split large monolithic logic into smaller functions:
  - reconcileDisconnectedTasks / reconcileDisconnectedTask
  - reconcileAbandonedAssignedTasks
  - reconcileStalePendingTasks / handleStalePendingTask
  - getActualTaskStatus / getStatusFromWorkerCache / triggerWorkerStatusSync
  - queryProcessStatus / requestProcessStatusFromWorker / mapProcessStatusToTaskStatus
  - findTasksByStatus / markTaskDisconnected / findAvailableNodeForTask
  - updateTaskStatus / saveTask / shouldMarkTaskAbnormal / markTaskAbnormal
- Add periodic background workers:
  - StartPeriodicReconciliation -> runPeriodicReconciliation to reconcile running/disconnected tasks
  - runPeriodicAssignedTaskCleanup -> cleanupStuckAssignedTasks to detect and recover stuck assigned tasks
- Prioritize worker-side cached status and attempt sync from task runner before querying worker processes.
- Introduce a placeholder createWorkerClient for future gRPC worker discovery/invocation.
- Replace ad-hoc DB updates with saveTask using retry/backoff and centralize status update logic.
- Improve logging and error messages, and tighten conditions for marking tasks abnormal.
This refactor clarifies responsibilities, improves reliability of status updates, and prepares the codebase for future worker gRPC integration.
2025-10-20 10:54:32 +08:00
Marvin Zhang
44fd0809e6
feat: disable backend unit tests and document reasons for integration test requirements
2025-10-09 12:35:09 +08:00
Marvin Zhang
5bff8823a8
feat: update test workflows to skip API tests and document controller test status
2025-10-09 11:34:26 +08:00
Marvin Zhang
2a211923da
Update core/task/handler/runner_sync.go
...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-09 11:14:32 +08:00
Marvin Zhang
587d9d0960
Merge branch 'test' into develop
2025-10-09 11:13:51 +08:00
Marvin Zhang
29ef8d67da
feat: implement synchronization and error handling improvements in task reconciliation and file synchronization
2025-09-28 17:42:23 +08:00
Marvin Zhang
b6e14a13fe
refactor: remove obsolete task reconciliation service tests
2025-09-17 11:05:27 +08:00
Marvin Zhang
afa5fab4c1
feat: enhance task reconciliation with worker-side status caching and synchronization
2025-09-17 11:03:35 +08:00
Marvin Zhang
8c2c23d9b6
feat: Update gRPC service definitions and implement CheckProcess method
...
- Downgraded protoc-gen-go-grpc and protoc versions for compatibility.
- Added CheckProcess method to TaskService with corresponding request and response types.
- Updated Subscribe and Connect methods to use new generic client stream types.
- Refactored server and client implementations for Subscribe and Connect methods.
- Ensured backward compatibility by maintaining existing method signatures where applicable.
- Added necessary handler for CheckProcess in the service descriptor.
2025-09-17 10:37:03 +08:00
Marvin Zhang
c6834e9964
feat: enhance task reconciliation logic with improved status handling and error messaging
2025-09-17 10:18:13 +08:00
Marvin Zhang
39f83d71b1
fix: update NotificationRequestDTO to include BSON field names for setting and channel
2025-09-16 15:43:43 +08:00
Marvin Zhang
875ca290b5
fix: update UseORM description to specify supported databases (MySQL, PostgreSQL, SQL Server)
2025-09-16 14:04:01 +08:00
Marvin Zhang
293e630f6f
feat: add UseORM field to Database struct for ORM support
2025-09-16 13:30:50 +08:00
Marvin Zhang
7c33fec784
refactor: remove unused fields from WorkerService struct
2025-09-12 18:17:36 +08:00
Marvin Zhang
e221e3c640
feat: enhance gRPC client handling with improved reconnection logic and monitoring
2025-09-12 18:16:52 +08:00
Marvin Zhang
316878e129
test: add comprehensive tests for task reconciliation service handling offline nodes
2025-09-12 16:10:00 +08:00
Marvin Zhang
60be5072e5
feat: add node disconnection handling and update task statuses accordingly
2025-09-12 15:40:29 +08:00
Marvin Zhang
14a94ff798
refactor: enhance error logging in writeLogLines to respect circuit breaker state
2025-09-12 14:34:27 +08:00
Marvin Zhang
c0e230e5d8
refactor: rename PING code to HEARTBEAT in node service and update related proto files
2025-09-12 14:17:49 +08:00
Marvin Zhang
d39c265483
feat: add PING message handling for connection health checks
...
- Implemented PING message handling in TaskServiceServer to acknowledge health check pings.
- Updated isConnectionHealthy method in Runner to use a non-blocking approach for health checks, preventing interference with log streams.
- Introduced lastConnCheck timestamp to optimize health check frequency based on recent activity.
- Added PING code to TaskServiceConnectCode enum in proto definition and generated files.
- Updated gRPC client and server interfaces to support new PING functionality.
2025-09-12 13:58:16 +08:00
Marvin Zhang
333dfd44c0
refactor: implement circuit breaker for log connections to prevent flooding during failures
2025-09-12 13:55:44 +08:00
Marvin Zhang
3edd2a1210
refactor: optimize connection health checks to reduce log stream interference; adjust health check intervals and implement non-blocking pings
2025-08-16 17:42:07 +08:00
Marvin Zhang
65aeb3ed8c
feat: add PING mechanism for connection health checks; update proto and generated files
...
- Introduced PING code in TaskServiceConnectCode enum for health checks.
- Updated Runner to use proper PING messages instead of fake log messages for connection health checks.
- Modified TaskServiceServer to handle PING requests and acknowledge them.
- Adjusted generated gRPC files to reflect changes in proto definitions and ensure compatibility.
2025-08-16 17:19:21 +08:00
Marvin Zhang
45913ad7e4
refactor: implement health service for master and worker nodes; add health check script and integrate health checks into service lifecycle
2025-08-08 00:05:00 +08:00
Marvin Zhang
78f9e0ca8d
refactor: update task worker pool to support dynamic max workers and improve queue management; enhance configuration defaults for node runners and task queue size
2025-08-07 18:16:23 +08:00
Marvin Zhang
6340a9b880
refactor: Move context initialization for graceful shutdown to appropriate locations
2025-08-07 17:27:11 +08:00
Marvin Zhang
6912b92501
refactor: enhance context handling across task runner and service components; ensure proper cancellation chains and prevent goroutine leaks
2025-08-07 15:40:48 +08:00
Marvin Zhang
e1251d808b
refactor: update method receivers to value type for cleanup and connection methods; enhance context usage for task client operations
2025-08-07 11:53:42 +08:00
Marvin Zhang
d042bc8cd7
refactor: improve connection readiness check and enhance goroutine management in gRPC client; ensure proper context handling in stream listeners
2025-08-07 11:12:46 +08:00
Marvin Zhang
060396af3d
refactor: enhance data structure annotations and improve layout responsiveness in Home component
2025-08-07 09:58:11 +08:00
Marvin Zhang
44dd68918f
refactor: improve goroutine management and context handling in task and stream operations; ensure graceful shutdown and prevent leaks
2025-08-07 00:16:46 +08:00
Marvin Zhang
784ffc8b52
feat: implement task management service operations, stream manager, and worker pool
...
- Added service_operations.go for task management including run, cancel, and execution logic.
- Introduced stream_manager.go to handle task streams and manage cancellation signals.
- Created worker_pool.go to manage a bounded pool of workers for executing tasks concurrently.
- Implemented graceful shutdown and cleanup mechanisms for task runners and streams.
- Enhanced error handling and logging throughout the task management process.
2025-08-06 18:29:08 +08:00
Marvin Zhang
3678d14082
feat: implement bounded goroutine pools for task execution and notification handling; enhance task scheduler with graceful shutdown and cleanup routines; update metric component for new time range options
2025-08-06 17:57:37 +08:00
Marvin Zhang
9745129e33
feat: configure test database as master node for testing
2025-07-23 15:28:51 +08:00
Marvin Zhang
cf5ec81250
fix: add nil checks for logger in config initialization and logging methods
2025-07-23 15:07:39 +08:00
Marvin Zhang
a2d13fae36
feat: temporarily disable batch file saving route and implement alternative handler in spider controller
2025-07-23 14:55:04 +08:00
Marvin Zhang
b4288b08a5
fix: Update FsFileInfo to use pointer slices for children and remove redundant getter methods
2025-07-23 09:42:07 +08:00
Marvin Zhang
3c3ff09723
feat: enhance gRPC client with health check functionality and improved connection handling
2025-07-09 14:38:53 +08:00
Marvin Zhang
20ba390cf6
refactor: improve mongo client connection error logging format and remove redundant gRPC server start in MasterService
2025-07-09 14:06:10 +08:00
Marvin Zhang
46c0cd6298
refactor: update gRPC client access patterns to use safe getter methods for improved error handling
2025-07-08 18:08:46 +08:00
Marvin Zhang
8bd3ef0b72
feat: add goroutine count metric to the MetricService and update related files
2025-07-08 14:08:31 +08:00
Marvin Zhang
00daa0ed96
fix: enhance gRPC client reconnection logic and add goroutine monitoring for potential leaks
2025-07-08 13:39:39 +08:00
Marvin Zhang
f8e9c45a85
fix: enhance gRPC client connection management with circuit breaker and keep-alive settings
2025-07-08 13:34:43 +08:00
Marvin Zhang
92046a8c2e
fix: improve task cancellation and connection health check logic with timeout handling
2025-06-27 14:02:24 +08:00
Marvin Zhang
9f251f3ebe
fix: enhance task cancellation logic with graceful termination and stuck task cleanup
2025-06-27 13:50:21 +08:00
Marvin Zhang
89514b0154
feat: implement zombie process prevention and cleanup mechanisms in task runner
2025-06-23 13:54:43 +08:00
Marvin Zhang
1008886715
fix: enhance task service resilience with connection health monitoring and periodic cleanup
2025-06-23 11:57:05 +08:00
Marvin Zhang
5837472de5
fix: update API client constructor and improve error handling for missing API token and URL
...
chore: update package version to 0.1.1-dev.6 and adjust dependencies in pnpm-lock.yaml
refactor: change import paths to relative in tools.ts for consistency
2025-06-20 15:55:50 +08:00