Real-time voice conversion service based on Seed-VC, providing WebSocket voice conversion with PCM and Opus audio format support
Features are continuously being updated. Stay tuned for our latest developments...
Fast-VC-Service aims to provide a high-performance, real-time streaming voice conversion cloud service for production environments. Built on the Seed-VC model, it supports the WebSocket protocol and the PCM and Opus audio encodings.
Core Features
- Real-time Conversion: Low-latency streaming voice conversion based on Seed-VC
- WebSocket API: Supports PCM and Opus audio formats
- Performance Monitoring: Real-time performance metrics and statistics
- High Concurrency: Multi-worker concurrent processing suitable for production environments
- Easy Deployment: Simple configuration, one-click startup
Quick Start
One-click Installation
# Clone project
git clone --recursive https://github.com/Leroll/fast-vc-service.git
cd fast-vc-service
# Configure environment
cp .env.example .env
# Install dependencies (Poetry recommended)
poetry install
# Start service
fast-vc serve
Quick Testing
# WebSocket real-time voice conversion
python examples/websocket/ws_client.py \
    --source-wav-path "wavs/sources/low-pitched-male-24k.wav" \
    --encoding PCM
For a detailed installation and usage guide, please refer to the Quick Start documentation.
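To make the streaming idea concrete, here is a minimal sketch of how a client might split raw PCM into fixed-duration chunks before sending them over a WebSocket. The 500 ms default matches the Chunk time column of the Performance table. Everything else is an assumption for illustration: the function name `iter_pcm_chunks` is hypothetical, the audio is assumed to be 16-bit mono, and the 24 kHz sample rate is only suggested by the example file name. It is not the logic of the project's `ws_client.py`.

```python
from typing import Iterator


def iter_pcm_chunks(
    pcm: bytes,
    sample_rate: int = 24_000,  # assumption: taken from the "24k" in the sample file name
    chunk_ms: int = 500,        # chunk time used in the performance table
    sample_width: int = 2,      # assumption: 16-bit mono samples
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the final chunk may be shorter."""
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("sample_rate, chunk_ms and sample_width must yield a positive chunk size")
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


if __name__ == "__main__":
    one_second = bytes(24_000 * 2)  # 1 s of silence, 16-bit mono at 24 kHz
    for index, chunk in enumerate(iter_pcm_chunks(one_second)):
        print(index, len(chunk))
```

Each yielded chunk would then be sent as one binary message. The actual message framing and any handshake are defined by the service and are not shown here.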
Performance
| GPU | Concurrency | Worker | Chunk time | First Token Latency | End-to-End Latency | Avg Chunk Latency | Avg RTF | Median RTF | P95 RTF |
|---|---|---|---|---|---|---|---|---|---|
| 4090D | 1 | 6 | 500 | 136.0 | 143.0 | 105.0 | 0.21 | 0.22 | 0.24 |
| 4090D | 12 | 12 | 500 | 140.1 | 256.6 | 216.6 | 0.44 | 0.45 | 0.51 |
| 1080TI | 1 | 6 | 500 | 157.0 | 272.0 | 252.2 | 0.50 | 0.51 | 0.61 |
| 1080TI | 3 | 6 | 500 | 154.3 | 261.3 | 304.9 | 0.61 | 0.62 | 0.73 |
- Chunk time and latency values are in milliseconds (ms); RTF (real-time factor) is a dimensionless ratio
- Detailed test reports:
  - Performance-Report_4090D
  - Performance-Report_1080ti
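The RTF columns can be related to the latency columns with a short calculation. The sketch below assumes the common definition of real-time factor (processing time divided by the duration of the audio processed) and uses made-up per-chunk latencies. The function names and the nearest-rank percentile method are illustrative choices, not necessarily what the project's test reports use.

```python
import math
import statistics
from typing import Dict, Sequence


def rtf(latency_ms: float, chunk_ms: float = 500.0) -> float:
    """Real-time factor: processing time divided by audio duration (assumed definition)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return latency_ms / chunk_ms


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], chunk_ms: float = 500.0) -> Dict[str, float]:
    """Average, median and P95 RTF over a set of per-chunk latencies."""
    rtfs = [rtf(latency, chunk_ms) for latency in latencies_ms]
    return {
        "avg_rtf": statistics.fmean(rtfs),
        "median_rtf": statistics.median(rtfs),
        "p95_rtf": percentile(rtfs, 95),
    }


if __name__ == "__main__":
    # Hypothetical per-chunk latencies in ms, not measured data.
    print(summarize([100, 110, 105, 120, 90]))
```

Under this assumption, the first table row is self-consistent: an average chunk latency of 105.0 ms over 500 ms chunks gives 105.0 / 500 = 0.21, the reported Avg RTF. The other rows agree to within about 0.01.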
Version Updates
2025-07-02 - v0.1.3: Added Process and Instance Level Concurrency Monitoring
- Added PID record to logs for easier instance tracking
- Added instance concurrency monitoring feature for real-time concurrency viewing
- Optimized performance analysis interface to reduce impact on real-time performance
2025-06-26 - v0.1.2: Persistent Storage Optimization
- Optimized session persistent storage module with asynchronous processing
- Separated time-consuming timeline statistical analysis module to improve response speed
- Optimized timeline recording mechanism to reduce storage overhead
2025-06-19 - v0.1.1: First Packet Performance Optimization
- Added performance monitoring API endpoint /tools/performance-report for real-time performance metrics
- Enhanced timing logs for better performance bottleneck analysis
- Mitigated delay issue caused by first audio packet model invocation
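As a rough illustration of how the /tools/performance-report endpoint might be queried, the sketch below uses only the Python standard library. The endpoint path comes from the notes above; the base URL (host and port), the use of HTTP GET, and the JSON response body are all assumptions, and the helper names are hypothetical. Check the project documentation for the actual request format.

```python
import json
from typing import Any
from urllib.parse import urljoin
from urllib.request import urlopen

REPORT_PATH = "/tools/performance-report"  # path named in the v0.1.1 notes


def report_url(base_url: str) -> str:
    """Join a base URL (hypothetical host and port) with the report path."""
    return urljoin(base_url.rstrip("/") + "/", REPORT_PATH.lstrip("/"))


def fetch_report(base_url: str, timeout: float = 5.0) -> Any:
    """Fetch the report; assumes a plain GET that returns a JSON body."""
    with urlopen(report_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical address; replace with wherever the service is listening.
    print(report_url("http://localhost:8000"))
```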
2025-06-15 - v0.1.0: Basic Service Framework
Completed the core framework of the real-time voice conversion service based on Seed-VC, including WebSocket streaming inference, performance monitoring, multi-format audio support, and other basic functionality.
- Real-time streaming voice conversion service
- WebSocket API support for PCM and Opus formats
- Complete performance monitoring and statistics system
- Flexible configuration management and environment variable support
- Multi-Worker concurrent processing capability
- Concurrent performance testing framework
TODO
- tag - v0.2 - Improve inference efficiency, reduce RTF - v2025-xx
- Optimize timeline_lognize, add delay items for same events
- Add SLOW tags in logs for monitoring receive interval, send interval, and VC-E2E latency
- Optimize session tool's file naming
- Add adaptive pitch extraction functionality with corresponding toggle switch
- Change VAD to use ONNX-GPU to improve inference speed
- Complete support for seed-vc V2.0 model
- Explore solutions to reduce model inference latency (e.g., new model architectures, quantization, etc.)
- Use torchaudio to directly read reference audio to GPU, eliminating transfer steps
- Fix file_vc issue with the last block
- Create Docker image, AutoDL image
Acknowledgements
- Seed-VC - Provides powerful underlying voice conversion model
- RVC - Provides basic streaming voice conversion pipeline