OmniVoice Studio – How to use it
What is OmniVoice Studio?
OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation and speaker diarization. Everything runs locally on your machine. No API key, no cloud account, no subscription required.
- 646 languages supported for TTS via the default OmniVoice engine
- 99 languages for transcription via WhisperX
- Available on macOS, Windows and Linux
- GPU is optional – the entire pipeline runs on CPU
- Free for personal, educational and research use (FSL-1.1-ALv2)
System Requirements
A GPU is optional. Without one, TTS runs approximately 3× slower on CPU. With ≤8 GB VRAM, TTS is automatically offloaded to the CPU during transcription – no configuration required.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 / macOS 12+ / Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM | 4 GB (auto-offload) | 8 GB+ (RTX 3060+) |
| Disk | 10 GB free | 20 GB+ SSD |
| Python | 3.10+ | 3.11–3.12 |
| GPU | Optional | CUDA / MPS / ROCm |
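The minimums in the table can be checked before installing. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and are not part of OmniVoice Studio. It covers only the Python version and free disk space, plus the documented offload rule expressed as a predicate.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)     # minimum Python version from the table
MIN_FREE_DISK_GB = 10    # minimum free disk space from the table
OFFLOAD_VRAM_GB = 8      # VRAM threshold quoted in the text


def check_minimums(python_version, free_disk_gb):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is older than 3.10"
        )
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"only {free_disk_gb:.1f} GB free disk, need {MIN_FREE_DISK_GB} GB"
        )
    return problems


def tts_offloaded_to_cpu(vram_gb, transcribing):
    """Documented rule: with <= 8 GB VRAM, TTS moves to CPU while transcribing."""
    return transcribing and vram_gb <= OFFLOAD_VRAM_GB


def check_this_machine(path="."):
    """Run the checks against the current interpreter and the given directory."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return check_minimums(sys.version_info, free_gb)
```

Calling `check_this_machine()` from the directory where you plan to clone the repository returns any warnings as strings.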
Installation
The project recommends running from source. First install the three prerequisites: ffmpeg, bun (JS runtime), and uv (Python package manager).
git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev
The frontend loads at http://localhost:5173; the API runs on port 8000.
Model weights are automatically downloaded on the first generation.
Pre-built installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb – see the release page on GitHub.
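Before running the commands above from source, it can help to confirm the three prerequisites are on your PATH. This is a minimal sketch using `shutil.which` from the standard library; the helper name `missing_prerequisites` is hypothetical.

```python
import shutil

# Command names as used in the install steps above.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_prerequisites(which=shutil.which):
    """Return the prerequisite commands that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the tools.
    """
    return [tool for tool in PREREQUISITES if which(tool) is None]
```

An empty list from `missing_prerequisites()` means all three commands resolve; otherwise install whatever is listed before running `uv sync`.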
Voice Cloning
Voice cloning uses zero-shot learning: it clones a voice from a short 3-second clip, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.
- Go to the Voice Clone tab in the UI
- Upload or record a 3-second audio clip of the target voice
- Enter your text and select a target language (646 available)
- Click Generate – the output is saved to your project library
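If you prepare reference clips outside the app, you can check their length before uploading. The sketch below assumes the clip is an uncompressed PCM WAV file and treats 3 seconds as a lower bound; both are assumptions for illustration, and the app itself may accept other formats and lengths. It uses only the standard-library `wave` module.

```python
import wave

MIN_CLIP_SECONDS = 3.0  # assumed lower bound, taken from the "3-second clip" guidance


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_CLIP_SECONDS):
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```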
Voice Gallery: Search YouTube, browse categories, and download reference clips right inside the app to build your voice library.
Video Dubbing
The full dubbing pipeline runs locally: transcription → translation → synthesis → mux. Demucs separates the vocals so the original background audio is preserved in the final export.
- Go to the Dub tab – paste a YouTube URL or upload a local file
- WhisperX transcribes the speech with word-level alignment
- Choose a target language; translation runs automatically
- The TTS engine re-voices the transcript; Demucs preserves the background audio
- Export the final MP4 with the dubbed audio mixed in
Batch Queue: Queue up to 50 videos and walk away. Each task has its own progress bar that tracks it through the entire pipeline.
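The stage order and per-task progress tracking described above can be pictured as a simple sequential runner. This is an illustrative sketch only, not the project's implementation: the stage functions are placeholders that record their own name, and `run_pipeline` and `STAGES` are hypothetical names.

```python
def _placeholder(stage_name):
    """Build a stand-in stage that just appends its name to the work item."""
    def step(item):
        return item + [stage_name]
    return step


# Stage order as listed in the text above.
STAGES = [
    (name, _placeholder(name))
    for name in ("transcription", "translation", "synthesis", "mux")
]


def run_pipeline(item, stages=STAGES, on_progress=None):
    """Run each stage in order, reporting the completed fraction after each one."""
    for index, (name, step) in enumerate(stages, start=1):
        item = step(item)
        if on_progress is not None:
            on_progress(name, index / len(stages))
    return item
```

A batch queue would call `run_pipeline` once per video, passing a callback that updates that task's progress bar.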
Dictation and Speaker Diarization
Dictation works system-wide from any application. Diarization uses WhisperX + Pyannote to identify individual speakers in a multi-speaker audio file.
- Press ⌘+⇧+Space (macOS) to open the floating dictation widget
- Speech streams over a WebSocket and is auto-pasted into the active input field
- Upload a multi-speaker file in the Diarization tab
- Pyannote recognizes who said what; each speaker gets an auto-extracted voice profile
- Assign one TTS voice per speaker for per-speaker dubbing
A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
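The per-speaker voice assignment in the steps above amounts to a lookup from speaker label to voice. The sketch below shows that idea with a hypothetical `Segment` record. The field names and speaker labels are assumptions for illustration and do not reflect the app's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized stretch of speech (hypothetical shape)."""
    speaker: str
    start: float
    end: float
    text: str


def speakers_in_order(segments):
    """Unique speaker labels in order of first appearance."""
    seen = []
    for seg in segments:
        if seg.speaker not in seen:
            seen.append(seg.speaker)
    return seen


def assign_voices(segments, voices, default_voice):
    """Pair each segment with the voice chosen for its speaker."""
    return [(seg, voices.get(seg.speaker, default_voice)) for seg in segments]
```

Speakers without an explicit entry in `voices` fall back to `default_voice`, so a partially filled mapping still yields a complete dub.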
TTS Engines
Six TTS engines are built in. Switch between them through Settings → TTS Engine or with an environment variable, for example OMNIVOICE_TTS_BACKEND=cosyvoice
| Engine | Languages | Cloning | Platform |
|---|---|---|---|
| OmniVoice (default) | 600+ | ✓ | CUDA/MPS/CPU |
| CosyVoice 3 | 9 + 18 dialects | ✓ | CUDA/MPS/CPU |
| mlx-audio | Multi | Varies | Apple Silicon only |
| voxcpm2 | 30 | ✓ | CUDA/MPS/CPU |
| mos-tts-nano | 20 | ✓ | CUDA/CPU |
| KittenTTS | English | ✗ | CPU only |
Custom Engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. ~50 lines of Python.
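The sketch below illustrates the general pattern of subclassing a backend, registering it, and selecting it through OMNIVOICE_TTS_BACKEND. It is self-contained and defines its own stand-in `TTSBackend` and `_REGISTRY`; the real base class in backend/services/tts_backend.py may have a different interface. The `synthesize` signature, `SilenceBackend` and `get_backend` are assumptions made for illustration.

```python
import os
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Stand-in base class; the project's real interface may differ."""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return raw audio bytes for the given text."""


class SilenceBackend(TTSBackend):
    """Toy engine: emits one silent 16-bit sample per input character."""

    def synthesize(self, text, language):
        return b"\x00\x00" * len(text)


# Hypothetical registry mapping backend names to classes.
_REGISTRY = {"silence": SilenceBackend}


def get_backend(default="silence"):
    """Instantiate the backend named by OMNIVOICE_TTS_BACKEND, else the default."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown TTS backend {name!r}; known: {known}") from None
```

Keeping selection behind a name-to-class registry means a new engine only needs a subclass and one registry entry, which is consistent with the small line count quoted above.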
MCP Server and Resources
OmniVoice Studio ships a built-in MCP server, exposing voice and dubbing capabilities to any MCP-compatible client – Claude, Cursor, or your own tooling – without opening the desktop UI.
- The MCP server starts with the FastAPI backend when you run bun dev
- Point your MCP client to the local server to access all endpoints
- AudioSeal (Meta AI) embeds an invisible neural watermark in all generated audio for provenance
- GitHub: github.com/debpalash/OmniVoice-Studio
- Install docs: docs/install/ (macOS/Windows/Linux/Docker)
- Troubleshooting: docs/install/troubleshooting.md
- Discord: discord.gg/bzQavDfVV9
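Before pointing an MCP client at the local server, it can help to confirm that the backend is actually listening. This is a minimal sketch using the standard-library `socket` module. It assumes the backend is on 127.0.0.1 port 8000, as in the installation notes, and checks only that the port accepts TCP connections; it does not speak the MCP protocol. The helper name `is_listening` is hypothetical.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

If `is_listening()` returns False, start the backend first and retry before configuring the client.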