crawlab/grpc/proto/services/sync_service.proto
Marvin Zhang f441265cc2 feat(sync): add gRPC file synchronization service and integrate end-to-end
- add proto/services/sync_service.proto and generate Go pb + grpc bindings
- implement SyncServiceServer (streaming file scan + download) with:
  - request deduplication, in-memory cache (TTL), chunked streaming
  - concurrent-safe broadcast to waiters and server-side logging
- register SyncSvr in gRPC server and expose sync client in GrpcClient:
  - add syncClient field, registration and safe getters with reconnection-aware timeouts
- integrate gRPC sync into runner:
  - split syncFiles into syncFilesHTTP (legacy) and syncFilesGRPC
  - Runner now chooses implementation via config flag and performs streaming scan/download
- controller improvements:
  - add semaphore-based rate limiting for sync scan requests with in-flight counters and logs
- misc:
  - add utils.IsSyncGrpcEnabled() config helper
  - improve HTTP sync error diagnostics (Content-Type validation, response previews)
  - regenerate many generated protobuf and gRPC files (protoc, protoc-gen-go, and protoc-gen-go-grpc version bumps)
2025-10-20 12:48:53 +08:00


syntax = "proto3";

package grpc;
option go_package = ".;grpc";

// File synchronization request
message FileSyncRequest {
  string spider_id = 1; // or git_id
  string path = 2;      // working directory path
  string node_key = 3;  // worker node key
}

// File information message (streamable)
message FileInfo {
  string name = 1;
  string path = 2;
  string full_path = 3;
  string extension = 4;
  bool is_dir = 5;
  int64 file_size = 6;
  int64 mod_time = 7; // Unix timestamp
  uint32 mode = 8;    // File permissions
  string hash = 9;    // File content hash
}

// Stream response for file scan
message FileScanChunk {
  repeated FileInfo files = 1; // Batch of files
  bool is_complete = 2;        // Last chunk indicator
  string error = 3;            // Error message if any
  int32 total_files = 4;       // Total file count (in last chunk)
}

// Download request
message FileDownloadRequest {
  string spider_id = 1;
  string path = 2;
  string node_key = 3;
}

// Download response (streamed in chunks)
message FileDownloadChunk {
  bytes data = 1;        // File data chunk
  bool is_complete = 2;  // Last chunk indicator
  string error = 3;      // Error if any
  int64 total_bytes = 4; // Total file size (in first chunk)
}

service SyncService {
  // Stream file list for synchronization
  rpc StreamFileScan(FileSyncRequest) returns (stream FileScanChunk);
  // Stream file download
  rpc StreamFileDownload(FileDownloadRequest) returns (stream FileDownloadChunk);
}