Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

by ai-intensify

OmniVoice Studio – How to use it

What is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine: no API key, cloud account, or subscription is required.

  • 646 languages supported for TTS via the default OmniVoice engine
  • 99 languages supported for transcription via WhisperX
  • Available on macOS, Windows, and Linux
  • GPU optional: the entire pipeline can run on CPU
  • Free for personal, educational, and research use (FSL-1.1-ALv2)

System Requirements

A GPU is optional. Without one, TTS runs roughly 3× slower on CPU. With 8 GB of VRAM or less, TTS is automatically offloaded to the CPU during transcription; no configuration is required.

Component   Minimum                                 Recommended
OS          Windows 10 / macOS 12+ / Ubuntu 20.04+  Any modern 64-bit OS
RAM         8 GB                                    16 GB+
VRAM        4 GB (auto-offload)                     8 GB+ (RTX 3060+)
Disk        10 GB free                              20 GB+ SSD
Python      3.10+                                   3.11–3.12
GPU         Optional                                CUDA / MPS / ROCm
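The offload rule described above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: `plan_placement` and `Placement` are hypothetical names, not OmniVoice Studio's code, and the only facts taken from the article are the 8 GB threshold and that TTS moves to the CPU while transcription runs.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the article: at or below 8 GB of VRAM, TTS is offloaded
# to the CPU during transcription. Everything else here is hypothetical.
OFFLOAD_VRAM_GB = 8.0


@dataclass
class Placement:
    tts_device: str  # where speech synthesis runs
    asr_device: str  # where WhisperX transcription runs


def plan_placement(gpu: Optional[str], vram_gb: float, transcribing: bool) -> Placement:
    """Hypothetical helper: choose devices for TTS and transcription.

    `gpu` is a device label such as "cuda" or "mps", or None on a CPU-only
    machine (PyTorch ROCm builds also expose the GPU as "cuda").
    """
    if gpu is None:
        # No GPU: the whole pipeline runs on CPU (TTS roughly 3x slower).
        return Placement(tts_device="cpu", asr_device="cpu")
    if transcribing and vram_gb <= OFFLOAD_VRAM_GB:
        # Small GPU: keep transcription on the GPU, push TTS to the CPU.
        return Placement(tts_device="cpu", asr_device=gpu)
    # Enough VRAM, or nothing else competing for it: keep both on the GPU.
    return Placement(tts_device=gpu, asr_device=gpu)
```

For example, a 6 GB card that is transcribing gets `Placement(tts_device="cpu", asr_device="cuda")`, while a 12 GB card keeps both on the GPU.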

Installation

The project recommends running from source. First install three prerequisites: ffmpeg, Bun (JavaScript runtime), and uv (Python package manager).

git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev

The frontend is served at http://localhost:5173 and the API runs on port 8000.
Model weights are automatically downloaded on the first generation.
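Before and after running the commands above, a quick preflight check can confirm the prerequisites are on PATH and that both servers are listening. This is a hypothetical helper script, not part of the repo; the tool names, Python minimum, and ports come from this article, and it only uses the standard library.

```python
import shutil
import socket
import sys

# Prerequisites and ports as listed in the article.
PREREQS = ("ffmpeg", "bun", "uv")
PORTS = {"frontend": 5173, "API": 8000}


def missing_prereqs() -> list:
    """Return the prerequisite executables that are not found on PATH."""
    return [tool for tool in PREREQS if shutil.which(tool) is None]


def python_ok(minimum=(3, 10)) -> bool:
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def port_open(port: int, host: str = "localhost", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("missing tools:", missing_prereqs() or "none")
    print("python >= 3.10:", python_ok())
    for name, port in PORTS.items():
        print(f"{name} on :{port}:", "up" if port_open(port) else "down")
```

Run it once before `uv sync` to catch missing tools, and again after `bun dev` to confirm the frontend and API came up.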

Pre-built installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb – see the release page on GitHub.

Voice Cloning

Voice cloning uses zero-shot learning: it clones a voice from a clip as short as 3 seconds, without any prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

  • Go to the Voice Clone tab in the UI
  • Upload or record a 3-second audio clip of the target voice
  • Enter your text and select a target language (646 available)
  • Click Generate; the output is saved to your project library
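Because cloning quality depends on the reference clip, it can help to check a clip's length before uploading it. The sketch below is a hypothetical helper using Python's standard `wave` module; it assumes a PCM WAV file and uses the article's 3-second figure as the minimum. The app itself may accept other formats and apply its own checks.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 3.0  # minimum clip length stated in the article


def clip_duration_seconds(path_or_file) -> float:
    """Duration of a PCM WAV file (path or binary file object) in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path_or_file, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long."""
    return clip_duration_seconds(path_or_file) >= minimum


def silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, handy for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

Calling `is_usable_reference("my_voice.wav")` on a real recording tells you whether it meets the minimum; a longer, clean clip generally gives the model more to work with.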

Voice Gallery: Search YouTube, browse categories, and download reference clips right inside the app to build your voice library.

Video Dubbing

The full dubbing pipeline runs locally: transcription → translation → synthesis → mux. Demucs separates the vocals so the original background audio is preserved in the final export.

  • Go to the Dub tab and paste a YouTube URL or upload a local file
  • WhisperX transcribes the speech with word-level alignment
  • Choose a target language; translation runs automatically
  • The TTS engine re-voices the transcript while Demucs preserves the background audio
  • Export the final MP4 with the dubbed audio mixed in
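The final mux step can be reproduced by hand with ffmpeg. This sketch builds an ffmpeg command that mixes a dubbed voice track over a separated background stem and muxes the result with the original video stream. It is an illustration of the technique, not the app's actual command line; the function name and file names are hypothetical, while the flags (`-filter_complex`, `amix`, `-map`, `-c:v copy`) are standard ffmpeg options.

```python
import shlex


def build_mux_command(video: str, dubbed_voice: str, background: str, output: str) -> list:
    """Hypothetical helper: ffmpeg argv that mixes the dubbed voice with the
    preserved background stem and muxes it with the original video."""
    # Mix input 1 (voice) and input 2 (background) into one labeled stream.
    mix = "[1:a][2:a]amix=inputs=2:duration=longest[aout]"
    return [
        "ffmpeg", "-y",
        "-i", video,          # input 0: original video
        "-i", dubbed_voice,   # input 1: synthesized speech in the target language
        "-i", background,     # input 2: background stem from vocal separation
        "-filter_complex", mix,
        "-map", "0:v",        # keep the original video stream
        "-map", "[aout]",     # use the mixed audio
        "-c:v", "copy",       # no video re-encode
        "-c:a", "aac",
        "-shortest",          # stop at the shorter of video and mixed audio
        output,
    ]


if __name__ == "__main__":
    cmd = build_mux_command("input.mp4", "dubbed_voice.wav", "background.wav", "dubbed.mp4")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Copying the video stream keeps the export fast and lossless for the picture; only the audio is re-encoded.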

Batch Queue: Queue up to 50 videos and walk away. Each job has its own progress bar that tracks it through the entire pipeline.
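A batch queue like this can be modeled as a list of jobs, each advancing through the four pipeline stages and reporting its own progress. The sketch below is a hypothetical model, not the app's implementation: `DubJob`, `BatchQueue`, and the `worker` callback are made-up names, stages run sequentially for clarity, and only the 50-job limit and stage order come from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Stage order mirrors the pipeline described above; class names are hypothetical.
STAGES = ("transcribe", "translate", "synthesize", "mux")
MAX_QUEUE = 50  # queue limit stated in the article


@dataclass
class DubJob:
    source: str         # YouTube URL or local file path
    target_lang: str
    completed: int = 0  # number of finished stages

    @property
    def progress(self) -> float:
        """Fraction of the pipeline finished, 0.0 to 1.0 (drives a progress bar)."""
        return self.completed / len(STAGES)

    @property
    def current_stage(self) -> Optional[str]:
        """Next stage to run, or None once the job is done."""
        return STAGES[self.completed] if self.completed < len(STAGES) else None


class BatchQueue:
    def __init__(self, limit: int = MAX_QUEUE) -> None:
        self.limit = limit
        self.jobs: List[DubJob] = []

    def add(self, source: str, target_lang: str) -> DubJob:
        """Enqueue a job, refusing once the limit is reached."""
        if len(self.jobs) >= self.limit:
            raise OverflowError(f"batch queue is full ({self.limit} jobs)")
        job = DubJob(source, target_lang)
        self.jobs.append(job)
        return job

    def run(self, worker: Callable[[DubJob, str], None]) -> None:
        """Push every job through every stage; `worker` does the real work."""
        for job in self.jobs:
            while job.current_stage is not None:
                worker(job, job.current_stage)
                job.completed += 1
```

A real implementation would likely run jobs concurrently and persist state, but keeping progress as a count of completed stages makes each job's bar easy to compute.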

Dictation and Speaker Diarization

Dictation works system-wide from any application. Diarization uses pyannote together with WhisperX to identify individual speakers in a multi-speaker audio file.

  • Press ⌘+⇧+Space (macOS) to open the floating dictation widget
  • Speech streams over a WebSocket and is auto-pasted into the active input field
  • Upload a multi-speaker file in the Diarization tab
  • pyannote identifies who said what; each speaker gets an auto-extracted voice profile
  • Assign one TTS voice per speaker for per-speaker dubbing

A Hugging Face token is required for pyannote diarization. See docs/setup/huggingface-token.md in the repo.
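Per-speaker dubbing boils down to mapping each diarization label to a chosen voice and refusing to proceed if any speaker is left unassigned. This is a hypothetical sketch: the `Segment` shape, the `SPEAKER_00`-style labels (pyannote's usual naming), and the voice names are assumptions for illustration, not the app's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # diarization label, e.g. "SPEAKER_00"
    start: float  # seconds
    end: float    # seconds
    text: str     # transcript for this turn


def assign_voices(segments, voices):
    """Pair each segment with the TTS voice chosen for its speaker.

    Raises KeyError naming every speaker that has no voice assigned.
    """
    missing = sorted({s.speaker for s in segments} - set(voices))
    if missing:
        raise KeyError(f"no voice assigned for: {', '.join(missing)}")
    return [(seg, voices[seg.speaker]) for seg in segments]


def speaking_time(segments):
    """Total seconds spoken by each speaker."""
    totals = defaultdict(float)
    for s in segments:
        totals[s.speaker] += s.end - s.start
    return dict(totals)
```

Checking speaking time first is a simple way to decide which speakers deserve the most carefully chosen voice.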

TTS Engines

Six TTS engines are built in. Switch between them under Settings → TTS Engine or with an environment variable:
OMNIVOICE_TTS_BACKEND=cosyvoice

Engine               Languages         Cloning   Platforms
OmniVoice (default)  600+                        CUDA / MPS / CPU
CosyVoice 3          9 + 18 dialects             CUDA / MPS / CPU
mlx-audio            Multi             Varies    Apple Silicon only
VoxCPM2              30                          CUDA / MPS / CPU
mos-tts-nano         20                          CUDA / CPU
KittenTTS            English                     CPU only

Custom engines: subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY, roughly 50 lines of Python.
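The subclass-and-register pattern, combined with environment-variable selection, might look roughly like the sketch below. It uses a stand-in `TTSBackend` base class because the real one's methods are not documented here: the `synthesize` signature, `EchoBackend`, `get_backend`, and the `"omnivoice"` default key are all hypothetical. Only the class name `TTSBackend`, the `_REGISTRY` name, and the `OMNIVOICE_TTS_BACKEND` variable come from the article.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class TTSBackend(ABC):
    """Stand-in for the base class in backend/services/tts_backend.py.
    The method name and signature below are assumptions."""

    @abstractmethod
    def synthesize(self, text: str, language: str,
                   reference_wav: Optional[bytes] = None) -> bytes:
        """Return audio bytes for `text`, optionally cloning `reference_wav`."""


class EchoBackend(TTSBackend):
    """Toy engine for illustration: returns the UTF-8 text instead of audio."""

    def synthesize(self, text, language, reference_wav=None):
        return text.encode("utf-8")


# Registry keyed by the names OMNIVOICE_TTS_BACKEND may hold (hypothetical keys).
_REGISTRY: Dict[str, Type[TTSBackend]] = {
    "echo": EchoBackend,
}


def get_backend(default: str = "omnivoice") -> TTSBackend:
    """Instantiate the engine named by OMNIVOICE_TTS_BACKEND, else `default`."""
    key = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(
            f"unknown TTS backend {key!r}; registered: {sorted(_REGISTRY)}"
        ) from None
```

With `OMNIVOICE_TTS_BACKEND=echo` set, `get_backend()` returns an `EchoBackend`; an unknown name fails loudly with the list of registered engines instead of silently falling back.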

MCP Servers and Resources

OmniVoice Studio ships a built-in MCP server that exposes voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tooling) without opening the desktop UI.

  • The MCP server starts together with the FastAPI backend when you run bun dev
  • Point your MCP client at the local server to access all endpoints
  • AudioSeal (Meta AI) embeds an invisible neural watermark in all generated audio for provenance
  • GitHub: github.com/debpalash/OmniVoice-Studio
  • Install docs: docs/install/ (macOS/Windows/Linux/Docker)
  • Troubleshooting: docs/install/troubleshoot.md
  • Discord: discord.gg/bzQavDfVV9
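Pointing a client at the local server usually means adding an entry to the client's MCP config file. The sketch below generates a Cursor-style `mcpServers` entry, which accepts a `url` for HTTP-based servers, and merges it without clobbering other servers. The endpoint URL `http://localhost:8000/mcp` and the server name `omnivoice` are assumptions; check the repo docs for the actual path and transport before using it.

```python
import json
from pathlib import Path

# ASSUMPTION: hypothetical endpoint; the real MCP path/transport may differ.
MCP_URL = "http://localhost:8000/mcp"


def cursor_mcp_config(url: str = MCP_URL, name: str = "omnivoice") -> dict:
    """A Cursor-style mcp.json fragment pointing at a URL-based server."""
    return {"mcpServers": {name: {"url": url}}}


def merge_config(existing: dict, new: dict) -> dict:
    """Add or replace servers from `new` while keeping any others in `existing`."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(new["mcpServers"])
    merged["mcpServers"] = servers
    return merged


def write_config(path: Path, config: dict) -> None:
    """Merge `config` into the JSON file at `path`, creating it if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_config(existing, config), indent=2) + "\n")


if __name__ == "__main__":
    # Cursor reads project-level servers from .cursor/mcp.json.
    write_config(Path(".cursor/mcp.json"), cursor_mcp_config())
```

Merging instead of overwriting keeps any MCP servers you have already configured.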
