Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

OmniVoice Studio – How to use it

What is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine: no API key, no cloud account, and no subscription required.

646 languages supported for TTS via the default OmniVoice engine
99 languages supported for transcription via WhisperX
Available on macOS, Windows, and Linux
GPU is optional: the entire pipeline runs on CPU
Free for personal, educational and research use (FSL-1.1-ALv2)

System Requirements

A GPU is optional. Without one, TTS runs roughly 3× slower on the CPU. With 8 GB of VRAM or less, TTS is automatically offloaded to the CPU during transcription; no configuration is required.

Component	Minimum	Recommended
OS	Windows 10 / macOS 12+ / Ubuntu 20.04+	Any modern 64-bit OS
RAM	8 GB	16 GB+
VRAM	4 GB (auto-offload)	8 GB+ (RTX 3060+)
Disk	10 GB free	20 GB+ SSD
Python	3.10+	3.11–3.12
GPU	Optional	CUDA / MPS / ROCm
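
Two of the minimums in the table can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper name `check_minimums` is hypothetical and not part of OmniVoice Studio. RAM and VRAM are left out because the standard library has no portable way to query them.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)      # "Python 3.10+" from the table
MIN_FREE_DISK_GB = 10     # "10 GB free" from the table


def check_minimums(python_version=None, free_disk_gb=None, path="."):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: values can be passed in for testing, otherwise
    they are read from the running interpreter and the given path.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if free_disk_gb is None:
        free_disk_gb = shutil.disk_usage(path).free / 1024**3

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        found = f"{python_version[0]}.{python_version[1]}"
        problems.append(f"Python {found} found, need 3.10+")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"{free_disk_gb:.1f} GB free disk, need at least {MIN_FREE_DISK_GB} GB"
        )
    return problems


print(check_minimums() or "Python version and disk space meet the minimums")
```

Run it from the directory where you plan to clone the repository so the disk check looks at the right volume.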

Installation

The project recommends running from source. First install the three prerequisites: ffmpeg, Bun (JS runtime), and uv (Python package manager).

git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev

The frontend loads at http://localhost:5173; the API runs on port 8000.
Model weights are automatically downloaded on the first generation.
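
Before running the commands above, it can help to confirm that the prerequisite tools are on your PATH. This is a small sketch, not part of the project; the executable names `ffmpeg`, `bun`, and `uv` are taken from the commands in this section, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Executable names as typed on the command line (see "uv sync" and
# "bun install" in the steps above, plus ffmpeg).
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_tools(tools=PREREQUISITES, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is what you want.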

Pre-built installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb – see the release page on GitHub.

Voice Cloning

Voice cloning uses zero-shot learning: it clones a voice from a short 3-second clip, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

Go to the Voice Clone tab in the UI
Upload or record a 3-second audio clip of the target voice
Enter your text and select a target language (646 available)
Click Generate; the output is saved to your project library
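
If you prepare reference clips outside the app, you can check their length before uploading. The sketch below reads the duration of a WAV file with the standard-library `wave` module. It treats 3 seconds as a lower bound, which is an assumption based on the clip length mentioned above, and it handles uncompressed WAV only; the function names are hypothetical.

```python
import wave

MIN_CLIP_SECONDS = 3.0  # assumed lower bound, from the 3-second clip above


def wav_duration_seconds(path):
    """Duration of an uncompressed WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_CLIP_SECONDS):
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

For example, `is_long_enough("reference.wav")` returns `False` for a one-second recording, a cue to record a longer sample.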

Voice Gallery: Search YouTube, browse categories, and download reference clips right inside the app to build your voice library.

Video Dubbing

The full dubbing pipeline runs locally: transcription → translation → synthesis → mux. Demucs separates the vocals so the original background audio is preserved in the final export.

Go to the Dub tab and paste a YouTube URL or upload a local file
WhisperX transcribes the speech with word-level alignment
Choose a target language; translation runs automatically
The TTS engine re-voices the transcript; Demucs preserves the background audio
Export the final MP4 with the dubbed audio mixed in
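
The steps above form a linear pipeline in which each stage consumes the output of the previous one. The sketch below illustrates that structure only: the stages are placeholders that record their own name, and `DubJob`, `make_stage`, and `run_pipeline` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DubJob:
    """State carried through the pipeline (hypothetical structure)."""
    source: str
    target_language: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[DubJob], DubJob]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(job: DubJob) -> DubJob:
        job.log.append(name)
        return job
    return stage


# Order follows the pipeline described in this section.
PIPELINE: List[Stage] = [
    make_stage("transcription"),
    make_stage("translation"),
    make_stage("synthesis"),
    make_stage("mux"),
]


def run_pipeline(job: DubJob, stages: List[Stage] = PIPELINE) -> DubJob:
    """Run each stage in order and print progress after each one."""
    for index, stage in enumerate(stages, start=1):
        job = stage(job)
        print(f"{job.source}: {index}/{len(stages)} stages done ({job.log[-1]})")
    return job


result = run_pipeline(DubJob(source="talk.mp4", target_language="es"))
```

Keeping each stage as a function with the same signature makes it straightforward to report per-stage progress, which is the kind of tracking a batch queue needs.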

Batch Queue: Drop in up to 50 videos and walk away. Each task has its own progress bar that tracks it through the entire pipeline.

Dictation and Speaker Diarization

Dictation works system-wide from any application. Diarization identifies individual speakers in a multi-speaker audio file using WhisperX + pyannote.

Press ⌘+⇧+Space (macOS) to open the floating dictation widget
Speech streams via WebSocket and is auto-pasted into the active input field
Upload a multi-speaker file in the Diarization tab
pyannote identifies who said what; each speaker gets an auto-extracted voice profile
Assign one TTS voice per speaker for per-speaker dubbing
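
To make "who said what" concrete: diarization produces speaker turns with start and end times, and transcription produces words with timestamps. A common way to combine them is to give each word to the speaker whose turn overlaps it most. The sketch below is a generic illustration of that idea with made-up data, not the code used by WhisperX, pyannote, or OmniVoice Studio.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]   # (text, start_s, end_s)
Turn = Tuple[str, float, float]   # (speaker_label, start_s, end_s)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], turns: List[Turn]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn are labelled None.
    """
    labelled = []
    for text, start, end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, t_start, t_end in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((text, best_speaker))
    return labelled


# Illustrative data only.
words = [("hello", 0.0, 0.4), ("there", 0.5, 0.9), ("hi", 1.2, 1.5)]
turns = [("SPEAKER_00", 0.0, 1.0), ("SPEAKER_01", 1.0, 2.0)]
print(assign_speakers(words, turns))
```

Once every word carries a speaker label, grouping consecutive words by label gives the per-speaker segments that can each be re-voiced with a different TTS voice.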

A Hugging Face token is required for pyannote diarization. See docs/setup/huggingface-token.md in the repo.

TTS Engines

Six TTS engines are built in. Switch between them through Settings → TTS Engine or with an environment variable:
OMNIVOICE_TTS_BACKEND=cosyvoice

Engine	Languages	Cloning	Platform
OmniVoice (default)	600+	✓	CUDA/MPS/CPU
CosyVoice 3	9 + 18 dialects	✓	CUDA/MPS/CPU
mlx-audio	Multi	Varies	Apple Silicon only
voxcpm2	30	✓	CUDA/MPS/CPU
mos-tts-nano	20	✓	CUDA/CPU
KittenTTS	English	✗	CPU only

Custom Engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. ~50 lines of Python.
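
The Custom Engine note describes a registry pattern: backends subclass a common base and are looked up by name. The sketch below shows that pattern in isolation. The `TTSBackend` class here is a stand-in defined for the example; the real interface in backend/services/tts_backend.py may differ. The `synthesize` method, the `SilenceBackend` engine, and the `load_backend` helper are all hypothetical. Only the names `TTSBackend`, `_REGISTRY`, and `OMNIVOICE_TTS_BACKEND` come from this article.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Type


class TTSBackend(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return audio bytes for the text (hypothetical method name)."""


class SilenceBackend(TTSBackend):
    """Toy engine: emits 16-bit silence, 160 samples per input character."""

    def synthesize(self, text: str, language: str) -> bytes:
        samples = 160 * len(text)
        return b"\x00\x00" * samples


# Stand-in for the _REGISTRY mentioned above: backend name -> class.
_REGISTRY: Dict[str, Type[TTSBackend]] = {"silence": SilenceBackend}


def load_backend(default: str = "silence") -> TTSBackend:
    """Pick a backend from OMNIVOICE_TTS_BACKEND, falling back to a default."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown TTS backend {name!r}; known: {known}")
    return _REGISTRY[name]()
```

With `OMNIVOICE_TTS_BACKEND=silence` set in the environment, `load_backend().synthesize("hello", "en")` returns 1600 bytes of silence. Failing loudly on an unknown name, rather than silently falling back, makes a typo in the environment variable easy to spot.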

MCP Servers and Resources

OmniVoice Studio ships a built-in MCP server that exposes its voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tooling) without opening the desktop UI.

The MCP server starts with the FastAPI backend when you run bun dev
Point your MCP client to the local server to access all endpoints
AudioSeal (Meta AI) embeds an invisible neural watermark in all generated audio for provenance

GitHub: github.com/debpalash/OmniVoice-Studio
Install docs: Documents/Install/ (macOS/Windows/Linux/Docker)
Troubleshooting: Documents/Install/Troubleshoot.md
Discord: discord.gg/bzQavDfVV9
