84 Commits

Author SHA1 Message Date
Marvin Zhang
ef70312430 fix(grpc/server): add keepalive enforcement and params to match client
Configure server-side keepalive (EnforcementPolicy and ServerParameters) to align with client settings and prevent connection timeouts after network disconnection/reconnection.
2025-10-23 10:58:22 +08:00
Marvin Zhang
893cb3cb8a fix(grpc/server): mark node active on Subscribe and notify on status change
Optimistically mark node as online/active and persist ActiveAt when a node
subscribes, logging success or warning on failure. Send a node notification
in Pro mode if the status changed. Also tidy import ordering.
2025-10-22 22:03:16 +08:00
Marvin Zhang
18fc84afb7 fix(grpc/client): clear reconnecting on failure and requeue reconnection after backoff
Ensure the reconnecting flag is reset on failed attempts so subsequent retries can proceed,
and explicitly trigger a reconnection attempt after the backoff period to keep retrying recovery.
2025-10-21 22:03:17 +08:00
Marvin Zhang
138bed5c05 fix(grpc/client): wait for full reconnection readiness before clearing reconnecting flag
- add maxReconnectionWait and reconnectionCheckInterval constants for reconnection readiness polling
- introduce waitForFullReconnectionReady() to verify: connection READY, clients registered, and ability to obtain critical service clients (model/task) within short timeouts
- ensure reconnecting flag is cleared immediately on reconnection failure and only cleared after full readiness checks on success
- improve logging around reconnection stabilization and readiness checks
2025-10-21 21:26:57 +08:00
Marvin Zhang
ec3dd2d077 fix(grpc/client): protect GetGrpcClient with _clientMux lock to avoid race during singleton init 2025-10-20 13:43:55 +08:00
Marvin Zhang
2dfc66743b fix(grpc/client,node/task/handler): add RetryWithBackoff, stabilize reconnection, and retry gRPC ops
- add RetryWithBackoff helper to grpc client for exponential retry with backoff and reconnection-aware handling
- increase reconnectionClientTimeout to 90s and introduce connectionStabilizationDelay; wait briefly after reconnection to avoid immediate flapping
- refresh reconnection flag while waiting for client registration and improve cancellation message
- replace direct heartbeat RPC with RetryWithBackoff in WorkerService (use extended timeout)
- use RetryWithBackoff for worker node status updates in task handler and propagate errors
2025-10-20 13:01:10 +08:00
Marvin Zhang
f441265cc2 feat(sync): add gRPC file synchronization service and integrate end-to-end
- add proto/services/sync_service.proto and generate Go pb + grpc bindings
- implement SyncServiceServer (streaming file scan + download) with:
  - request deduplication, in-memory cache (TTL), chunked streaming
  - concurrent-safe broadcast to waiters and server-side logging
- register SyncSvr in gRPC server and expose sync client in GrpcClient:
  - add syncClient field, registration and safe getters with reconnection-aware timeouts
- integrate gRPC sync into runner:
  - split syncFiles into syncFilesHTTP (legacy) and syncFilesGRPC
  - Runner now chooses implementation via config flag and performs streaming scan/download
- controller improvements:
  - add semaphore-based rate limiting for sync scan requests with in-flight counters and logs
- misc:
  - add utils.IsSyncGrpcEnabled() config helper
  - improve HTTP sync error diagnostics (Content-Type validation, response previews)
  - update/regenerate many protobuf and gRPC generated files (protoc/protoc-gen-go / protoc-gen-go-grpc version bumps)
2025-10-20 12:48:53 +08:00
Marvin Zhang
4baa5fad59 fix(grpc/client): trigger reconnection on bad conn state and improve connection logging
- Trigger reconnection proactively from Get*WithTimeout when underlying connection is in
  SHUTDOWN or TRANSIENT_FAILURE to avoid returning stale/unusable clients.
- Add debug/info logs around client registration, connection attempts, closing existing
  connections, connection initiation, reconnection start, backoff retry and successful
  reconnection (including current state and registration status).
- Surface more context in reconnection and connection logs to aid diagnostics.
2025-10-20 11:34:41 +08:00
Marvin Zhang
8c2c23d9b6 feat: Update gRPC service definitions and implement CheckProcess method
- Downgraded protoc-gen-go-grpc and protoc versions for compatibility.
- Added CheckProcess method to TaskService with corresponding request and response types.
- Updated Subscribe and Connect methods to use new generic client stream types.
- Refactored server and client implementations for Subscribe and Connect methods.
- Ensured backward compatibility by maintaining existing method signatures where applicable.
- Added necessary handler for CheckProcess in the service descriptor.
2025-09-17 10:37:03 +08:00
Marvin Zhang
e221e3c640 feat: enhance gRPC client handling with improved reconnection logic and monitoring 2025-09-12 18:16:52 +08:00
Marvin Zhang
65aeb3ed8c feat: add PING mechanism for connection health checks; update proto and generated files
- Introduced PING code in TaskServiceConnectCode enum for health checks.
- Updated Runner to use proper PING messages instead of fake log messages for connection health checks.
- Modified TaskServiceServer to handle PING requests and acknowledge them.
- Adjusted generated gRPC files to reflect changes in proto definitions and ensure compatibility.
2025-08-16 17:19:21 +08:00
Marvin Zhang
e1251d808b refactor: update method receivers to value type for cleanup and connection methods; enhance context usage for task client operations 2025-08-07 11:53:42 +08:00
Marvin Zhang
d042bc8cd7 refactor: improve connection readiness check and enhance goroutine management in gRPC client; ensure proper context handling in stream listeners 2025-08-07 11:12:46 +08:00
Marvin Zhang
44dd68918f refactor: improve goroutine management and context handling in task and stream operations; ensure graceful shutdown and prevent leaks 2025-08-07 00:16:46 +08:00
Marvin Zhang
3c3ff09723 feat: enhance gRPC client with health check functionality and improved connection handling 2025-07-09 14:38:53 +08:00
Marvin Zhang
20ba390cf6 refactor: improve mongo client connection error logging format and remove redundant gRPC server start in MasterService 2025-07-09 14:06:10 +08:00
Marvin Zhang
46c0cd6298 refactor: update gRPC client access patterns to use safe getter methods for improved error handling 2025-07-08 18:08:46 +08:00
Marvin Zhang
8bd3ef0b72 feat: add goroutine count metric to the MetricService and update related files 2025-07-08 14:08:31 +08:00
Marvin Zhang
00daa0ed96 fix: enhance gRPC client reconnection logic and add goroutine monitoring for potential leaks 2025-07-08 13:39:39 +08:00
Marvin Zhang
f8e9c45a85 fix: enhance gRPC client connection management with circuit breaker and keep-alive settings 2025-07-08 13:34:43 +08:00
Marvin Zhang
1008886715 fix: enhance task service resilience with connection health monitoring and periodic cleanup 2025-06-23 11:57:05 +08:00
Marvin Zhang
25fe273a62 refactor: improve logging in gRPC services by removing service prefixes
- Updated log messages in NodeServiceServer and TaskServiceServer to remove the "[NodeServiceServer]" and "[TaskServiceServer]" prefixes for cleaner output.
- This change enhances log readability and maintains consistency across logging practices in the application.
2024-12-31 13:30:02 +08:00
Marvin Zhang
dc59599509 refactor: remove db module and update imports to core/mongo
- Deleted the db module, consolidating database-related functionality into the core/mongo package for better organization and maintainability.
- Updated all import paths across the codebase to replace references to the removed db module with core/mongo.
- Cleaned up unused code and dependencies, enhancing overall project clarity and reducing complexity.
- This refactor improves the structure of the codebase by centralizing database operations and simplifying module management.
2024-12-25 10:28:21 +08:00
Marvin Zhang
3276083994 refactor: replace apex/log with structured logger across multiple services
- Replaced all instances of apex/log with a structured logger interface in various services, including Api, Server, Config, and others, to enhance logging consistency and context.
- Updated logging calls to utilize the new logger methods, improving error tracking and service monitoring.
- Added logger initialization in services and controllers to ensure proper logging setup.
- Improved error handling and logging messages for better clarity during service operations.
- Removed unused apex/log imports and cleaned up related code for better maintainability.
2024-12-24 19:11:19 +08:00
Marvin Zhang
99ed4396d1 refactor: improve logging messages and update configuration constants
- Updated logging messages in GrpcClient to provide clearer context, changing "ready" to "client is now ready" and "stopped" to "client has stopped".
- Refactored test setup in runner_test.go to remove unnecessary error checks during gRPC client start for cleaner code.
- Renamed GetDependencySetupScriptRoot to GetInstallRoot and updated related constants for better clarity and consistency in configuration management.
2024-12-23 18:19:08 +08:00
Marvin Zhang
c3f4c4ae05 feat: enhance gRPC client with structured logging and dependency actions
- Added DependencyActionSync and DependencyActionSetup constants to improve dependency management.
- Refactored GrpcClient to utilize a logger interface for consistent logging across connection states and errors.
- Updated Start, Stop, and connection methods to replace direct log calls with logger methods, enhancing log context and readability.
- Simplified test cases by removing error checks on gRPC client start, ensuring cleaner test setup.
2024-12-23 17:17:21 +08:00
Marvin Zhang
e44b416e34 feat: enhance model base service with BSON ID normalization
- Added utility function to normalize BSON ObjectId in query parameters for gRPC methods.
- Updated GetOne, GetMany, DeleteOne, DeleteMany, UpdateOne, UpdateMany, ReplaceOne, and UpsertOne methods to utilize the new normalization function.
- Introduced new DependencyConfigSetup model instance in base model definitions for improved structure.
2024-12-21 22:07:54 +08:00
Marvin Zhang
29af5a366b feat: enhance gRPC client with state management and reconnection logic
- Introduced state management in GrpcClient to monitor and handle connection states effectively.
- Added a reconnect channel and a state monitoring goroutine to facilitate automatic reconnections on state changes.
- Updated the connect method to initiate a reconnection loop upon connection loss.
- Enhanced logging for connection state changes and errors during connection attempts.
- Refactored tests to ensure proper initialization of gRPC client and server, improving test reliability and coverage.
2024-12-21 21:41:00 +08:00
Marvin Zhang
3cb74d76f9 feat: enhance gRPC client functionality and improve logging
- Added WaitForReady method to GrpcClient for blocking until the client is ready.
- Updated WorkerService to utilize WaitForReady for ensuring gRPC client readiness before starting.
- Refactored ModelService to consistently use GetGrpcClient for context management.
- Changed logging level for received metrics in MetricServiceServer from Info to Debug.
- Modified error handling in HandleError to conditionally print errors based on the environment.
- Cleaned up unused GrpcClient references in various services, improving code clarity.
2024-12-20 20:34:04 +08:00
Marvin Zhang
f736b2c58e fix: error getting stream for dependency server 2024-12-18 17:43:41 +08:00
Marvin Zhang
79c1d5d14b feat: updated dependency config setup 2024-12-16 21:44:03 +08:00
Marvin Zhang
c5c08dfba6 feat: added dependency config setup (wip) 2024-12-15 23:09:10 +08:00
Marvin Zhang
24561bcbe0 refactor: optimized code 2024-11-24 23:14:26 +08:00
Marvin Zhang
b3261343b8 refactor: code cleanup 2024-11-20 18:22:27 +08:00
Marvin Zhang
3dc66e48db fix: test case issue 2024-11-19 15:53:40 +08:00
Marvin Zhang
a3b286558b refactor: consolidated configs 2024-11-18 16:48:09 +08:00
Marvin Zhang
7731e321ed feat: updated dependency api 2024-11-06 17:15:45 +08:00
Marvin Zhang
6ee5579fe3 fix: import issue 2024-11-06 11:38:55 +08:00
Marvin Zhang
d21b1a89c5 feat: optimized dependency logic 2024-11-05 18:42:33 +08:00
Marvin Zhang
a0989d36db feat: optimized dependency logic 2024-11-05 18:21:52 +08:00
Marvin Zhang
0117794930 fix: test issues 2024-11-05 13:46:22 +08:00
Marvin Zhang
10bb511c5f feat: updated dependency handler logic 2024-11-05 11:40:12 +08:00
Marvin Zhang
e33fcfc150 refactor: renamed files and services 2024-11-05 11:15:27 +08:00
Marvin Zhang
fbf8e5f9f3 feat: optimizing dependency services including grpc, api 2024-11-04 17:45:34 +08:00
Marvin Zhang
73674832b8 feat: optimized dependency api 2024-11-04 00:16:42 +08:00
Marvin Zhang
71f0a210ba refactor: fixed dependency errors 2024-11-01 15:19:48 +08:00
Marvin Zhang
68ba84a4e7 refactor: optimized node communication 2024-11-01 15:19:48 +08:00
Marvin Zhang
d9b327de17 refactor: code cleanup 2024-11-01 15:19:48 +08:00
Marvin Zhang
5deca5e2a2 refactor: updated grpc services 2024-11-01 15:19:48 +08:00
Marvin Zhang
8a5f51de47 refactor: updated grpc services 2024-11-01 15:19:48 +08:00