48 Commits

Author SHA1 Message Date
Marvin Zhang
b6e14a13fe refactor: remove obsolete task reconciliation service tests 2025-09-17 11:05:27 +08:00
Marvin Zhang
afa5fab4c1 feat: enhance task reconciliation with worker-side status caching and synchronization 2025-09-17 11:03:35 +08:00
Marvin Zhang
8c2c23d9b6 feat: Update gRPC service definitions and implement CheckProcess method
- Downgraded protoc-gen-go-grpc and protoc versions for compatibility.
- Added CheckProcess method to TaskService with corresponding request and response types.
- Updated Subscribe and Connect methods to use new generic client stream types.
- Refactored server and client implementations for Subscribe and Connect methods.
- Ensured backward compatibility by maintaining existing method signatures where applicable.
- Added necessary handler for CheckProcess in the service descriptor.
2025-09-17 10:37:03 +08:00
Marvin Zhang
c6834e9964 feat: enhance task reconciliation logic with improved status handling and error messaging 2025-09-17 10:18:13 +08:00
Marvin Zhang
7c33fec784 refactor: remove unused fields from WorkerService struct 2025-09-12 18:17:36 +08:00
Marvin Zhang
e221e3c640 feat: enhance gRPC client handling with improved reconnection logic and monitoring 2025-09-12 18:16:52 +08:00
Marvin Zhang
316878e129 test: add comprehensive tests for task reconciliation service handling offline nodes 2025-09-12 16:10:00 +08:00
Marvin Zhang
60be5072e5 feat: add node disconnection handling and update task statuses accordingly 2025-09-12 15:40:29 +08:00
Marvin Zhang
c0e230e5d8 refactor: rename PING code to HEARTBEAT in node service and update related proto files 2025-09-12 14:17:49 +08:00
Marvin Zhang
45913ad7e4 refactor: implement health service for master and worker nodes; add health check script and integrate health checks into service lifecycle 2025-08-08 00:05:00 +08:00
Marvin Zhang
e1251d808b refactor: update method receivers to value type for cleanup and connection methods; enhance context usage for task client operations 2025-08-07 11:53:42 +08:00
Marvin Zhang
20ba390cf6 refactor: improve mongo client connection error logging format and remove redundant gRPC server start in MasterService 2025-07-09 14:06:10 +08:00
Marvin Zhang
46c0cd6298 refactor: update gRPC client access patterns to use safe getter methods for improved error handling 2025-07-08 18:08:46 +08:00
Marvin Zhang
ef499a03e0 fix: improve logging in master and worker services
- Added logging for error handling in the MasterService when setting a worker node offline, replacing the previous trace.PrintError with a more informative log message.
- Enhanced WorkerService subscription method with debug logs to indicate subscription attempts and status, improving traceability during connection processes.
2024-12-29 19:19:36 +08:00
Marvin Zhang
3276083994 refactor: replace apex/log with structured logger across multiple services
- Replaced all instances of apex/log with a structured logger interface in various services, including Api, Server, Config, and others, to enhance logging consistency and context.
- Updated logging calls to utilize the new logger methods, improving error tracking and service monitoring.
- Added logger initialization in services and controllers to ensure proper logging setup.
- Improved error handling and logging messages for better clarity during service operations.
- Removed unused apex/log imports and cleaned up related code for better maintainability.
2024-12-24 19:11:19 +08:00
Marvin Zhang
e064889795 refactor: replace apex/log with structured logger in master and worker services
- Removed direct usage of apex/log in favor of a structured logger interface for improved logging consistency and context.
- Updated logging calls in MasterService and WorkerService to utilize the new logger, enhancing error tracking and service monitoring.
- Added logger initialization in both services to ensure proper logging setup.
- Improved error handling and logging messages for better clarity during service operations.
2024-12-23 21:45:38 +08:00
Marvin Zhang
3cb74d76f9 feat: enhance gRPC client functionality and improve logging
- Added WaitForReady method to GrpcClient for blocking until the client is ready.
- Updated WorkerService to utilize WaitForReady for ensuring gRPC client readiness before starting.
- Refactored ModelService to consistently use GetGrpcClient for context management.
- Changed logging level for received metrics in MetricServiceServer from Info to Debug.
- Modified error handling in HandleError to conditionally print errors based on the environment.
- Cleaned up unused GrpcClient references in various services, improving code clarity.
2024-12-20 20:34:04 +08:00
Marvin Zhang
be93f9d17d feat: added retry for worker node start 2024-12-20 11:40:21 +08:00
Marvin Zhang
1fe74fa8a5 fix: optimized node runners calculation 2024-12-11 20:43:40 +08:00
Marvin Zhang
858e5c2b89 fix: unable to start api 2024-11-22 21:19:17 +08:00
Marvin Zhang
7a322ae6c8 fix: unable to start api 2024-11-22 20:58:01 +08:00
Marvin Zhang
dc9f62dfd0 feat: added health check for worker service 2024-11-19 18:32:50 +08:00
Marvin Zhang
3dc66e48db fix: test case issue 2024-11-19 15:53:40 +08:00
Marvin Zhang
e33fcfc150 refactor: renamed files and services 2024-11-05 11:15:27 +08:00
Marvin Zhang
73674832b8 feat: optimized dependency api 2024-11-04 00:16:42 +08:00
Marvin Zhang
71f0a210ba refactor: fixed dependency errors 2024-11-01 15:19:48 +08:00
Marvin Zhang
68ba84a4e7 refactor: optimized node communication 2024-11-01 15:19:48 +08:00
Marvin Zhang
d9b327de17 refactor: code cleanup 2024-11-01 15:19:48 +08:00
Marvin Zhang
8a5f51de47 refactor: updated grpc services 2024-11-01 15:19:48 +08:00
Marvin Zhang
79ea8a0f88 refactor: updated index related code 2024-10-29 13:18:57 +08:00
Marvin Zhang
1c03cb3e5c refactor: code cleanup 2024-10-29 12:59:45 +08:00
Marvin Zhang
e1170d5612 test: updated test cases 2024-10-20 17:55:57 +08:00
Marvin Zhang
1b852fb96a refactor: code cleanup 2024-10-18 15:03:32 +08:00
Marvin Zhang
7b1fa48fd9 feat: support notification for node 2024-07-24 17:00:35 +08:00
Marvin Zhang
821383a677 refactor: Update SendNotification function to handle old and new settings triggers 2024-07-18 00:05:48 +08:00
Marvin Zhang
b7cafb4623 refactor: Update SendNotification function to handle old and new settings triggers 2024-07-15 17:34:04 +08:00
Marvin Zhang
3a03ac63dc fix: compiling issue 2024-07-12 20:05:14 +08:00
Marvin Zhang
d0611b4567 refactor: removed unnecessary code 2024-07-12 18:00:19 +08:00
Marvin Zhang
aca0c0ebce refactor: removed unnecessary code 2024-07-11 12:45:29 +08:00
Marvin Zhang
40f37e85ef fix: missing name and max runners when registering nodes 2024-07-03 14:57:33 +08:00
Marvin Zhang
023ba27566 fix: unable to sync directories to work nodes 2024-07-01 15:59:20 +08:00
Marvin Zhang
7bdce1af58 feat: added metrics service v2 2024-06-26 23:23:14 +08:00
Marvin Zhang
326a8d67d0 fix: missing data source issue 2024-06-26 12:37:24 +08:00
Marvin Zhang
5daeccb87d fix: unable to sync files and save data issues 2024-06-25 14:58:54 +08:00
Marvin Zhang
972713959f feat: updated grpc for dependencies service 2024-06-15 23:25:24 +08:00
Marvin Zhang
6a60433d25 feat: added modules 2024-06-14 16:37:48 +08:00
Marvin Zhang
dc21bce11f feat: added modules 2024-06-14 15:59:48 +08:00
Marvin Zhang
0b67fd9ece feat: added modules 2024-06-14 15:42:50 +08:00