Restoring digital sovereignty — one voice sample at a time.
The "Dead Internet" phenomenon has become a tangible reality for many users of today's global digital media space. A state of doubt and uncertainty has taken hold: is the entity I am communicating with, the one sending me information, a real human being or a fabricated persona?
Monopeak took this dilemma seriously and worked tirelessly to devise sovereign technological solutions that restore part of the trust users have lost in the digital world. Our latest project, YourVoices, was born from that effort.
Our platform contains no convoluted privacy agreements or twisted terms, because we neither own nor collect any information about anyone beyond what registration strictly requires. Anything else exists on the platform only if a person, personally and of their own free will, chooses to publish it and confirms it as theirs through voice sample verification.
Many people have come to see the internet as a vast surveillance apparatus and a tool for violating their digital privacy. The prevailing perception is that "our privacy is the cheapest commodity in the global technology market."
This is where the voice comes in: the voice of every human being whose digital rights have been violated and who refuses to be merely a number in an algorithm. Monopeak's philosophy is built on breaking the mould of blind compliance: you should not stand submissively just because everyone else decided to stand when the red light flashed, and you should not accept, whether in code or in life, anything you are not convinced of deep within. YourVoices is the space that restores that trust, and complete sovereignty, to you.
Authenticate using your unique voice sample: no passwords, no recovery emails, no third-party identity providers.
Every user holds a self-sovereign identifier (V-ID), generated on their own device. Your identity belongs to you: the private keys behind it are stored on your device, never in a central database.
End-to-end encrypted whispers reach only the intended recipient. No metadata trails, no server-side message inspection.
Peer-to-peer encrypted voice calls. Your conversations stay between you and the person you trust.
Creating a YourVoices account takes under two minutes and requires nothing but your voice. Here is exactly what happens, step by step.
You record three short voice samples (7 seconds each). The app processes them entirely on your device — no audio is ever sent to our servers.
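As a rough illustration of how three enrolment recordings could be condensed into the single stored centroid, the sketch below averages their embeddings. It is a hypothetical example, not the app's code: it assumes each recording has already been turned into an embedding vector, and that the centroid is the mean of the unit-length embeddings rescaled back to unit length (the text only says the centroid is an average). The function names are invented for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def enrolment_centroid(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average enrolment embeddings into one unit-length centroid (hypothetical)."""
    if not embeddings:
        raise ValueError("need at least one enrolment embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Normalize each sample first so that no single recording dominates the mean.
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    # Assumption: the centroid is stored at unit length so that later cosine
    # scoring reduces to a dot product.
    return l2_normalize(mean)


# Three toy 4-dimensional "embeddings" standing in for 512-dimensional ones.
samples = [[0.9, 0.1, 0.0, 0.4], [0.8, 0.2, 0.1, 0.5], [1.0, 0.0, 0.1, 0.3]]
centroid = enrolment_centroid(samples)
```

Normalizing before averaging is one design choice among several; it keeps a loud or close-to-the-microphone recording from pulling the centroid towards itself.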
You choose an 8-digit PIN. This PIN is never stored in plain text, never sent anywhere. It is combined with your voice embedding via PBKDF2-HMAC-SHA256 (200,000 iterations) to derive your personal MonocodeKey — the encryption key for your local data.
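A minimal sketch of the MonocodeKey derivation, using Python's standard `hashlib.pbkdf2_hmac` with the parameters stated above (HMAC-SHA256, 200,000 iterations). The text does not specify how the PIN and voice data are combined, so the function name `derive_monocode_key`, the use of the stored centroid as salt input, its float32 serialization, and the 32-byte key length are all assumptions.

```python
import hashlib
import re
import struct
from typing import Sequence

PBKDF2_ITERATIONS = 200_000  # iteration count stated in the text
KEY_LENGTH = 32  # assumption: a 256-bit key


def derive_monocode_key(pin: str, centroid: Sequence[float]) -> bytes:
    """Hypothetical MonocodeKey derivation via PBKDF2-HMAC-SHA256."""
    # The PIN must be exactly 8 ASCII digits.
    if not re.fullmatch(r"[0-9]{8}", pin):
        raise ValueError("PIN must be exactly 8 digits")
    # Assumption: the stored centroid is serialized as little-endian float32
    # values and hashed into a fixed-size salt.
    centroid_bytes = struct.pack(f"<{len(centroid)}f", *centroid)
    salt = hashlib.sha256(centroid_bytes).digest()
    return hashlib.pbkdf2_hmac(
        "sha256", pin.encode("ascii"), salt, PBKDF2_ITERATIONS, dklen=KEY_LENGTH
    )
```

Feeding in the stored centroid rather than a fresh recording matters in this sketch: every new embedding differs slightly, and any change in the input bytes yields a completely different key.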
A unique Sovereign ID (V-ID) is generated on your device. Your voice centroid and public encryption key are sent to our server to complete registration. Your private key never leaves your device.
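The split between what leaves the device and what stays on it can be expressed as an explicit allow-list. In this hypothetical sketch, `DeviceIdentity`, `new_v_id`, the `vid-` prefix, and the request field names are invented for illustration; real X25519 keys would come from an on-device cryptography library, which is not shown here.

```python
import base64
import uuid
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DeviceIdentity:
    """Hypothetical on-device identity record."""
    v_id: str
    centroid: List[float]
    public_key: bytes   # X25519 public key (32 bytes)
    private_key: bytes  # X25519 private key: never leaves the device


def new_v_id() -> str:
    # Hypothetical format: the text does not specify what a V-ID looks like.
    return "vid-" + uuid.uuid4().hex


def registration_payload(identity: DeviceIdentity) -> Dict[str, Union[str, List[float]]]:
    """Build the registration request from an explicit allow-list of fields."""
    return {
        "v_id": identity.v_id,
        "centroid": list(identity.centroid),
        "public_key": base64.b64encode(identity.public_key).decode("ascii"),
        # private_key is deliberately absent.
    }
```

Building the payload from an allow-list, rather than serializing the whole identity object, means a secret field added later cannot be sent by accident.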
You may optionally verify an email address. This unlocks an additional 2 GB of Bunker storage, bringing your total to 5 GB. Your email is used solely for this verification; it is never used for marketing or shared with third parties.
In your account settings, you may add a display name, profile photo, and short biography. This information is publicly visible to all platform users — only include what you are comfortable sharing openly.
We believe you have a right to know exactly what data exists about you and where it lives. The table below is the complete picture.
| Data | Where stored | Can we read it? | Can you delete it? |
|---|---|---|---|
| Voice embedding & centroid (mathematical vector, not audio) | Your device; server | No: only a vector | Yes: delete your account |
| PIN & MonocodeKey (PBKDF2-derived encryption key) | Your device only | Never: it stays on your device | Yes: uninstall the app |
| Private keys (X25519 keypair for E2E encryption) | Your device only | Never: they stay on your device | Yes: uninstall the app |
| Sovereign ID (V-ID) | Your device; server | Yes: it is an identifier | Yes: delete your account |
| Public encryption key | Server | Yes: it is public by design | Yes: delete your account |
| Email address (optional) | Server | Yes | Yes: delete your account |
| Profile: name, photo, bio (optional, publicly visible) | Server | Yes, and so can everyone else | Yes: edit it or delete your account |
| Messages & media (Whisper, end-to-end encrypted) | Bunker (Helsinki) | No: encrypted before upload | Yes: individually, or delete your account |
| Call logs (encrypted metadata) | Bunker (Helsinki) | No: encrypted before upload | Yes: delete your account |
How voice sample verification works — experimental results & technical references
A voice sample is a 512-dimensional mathematical vector (embedding) extracted from your voice. Unlike a password that you must remember or a fingerprint that requires special hardware, your voice sample is computed from a short audio recording (7 seconds) and serves as a soft biometric identifier — not a hard cryptographic proof, but a statistically reliable signal that helps confirm your identity.
The term "voice sample" is deliberately chosen over "voice print" to avoid the misleading implication that voice is as unique and immutable as a fingerprint. Voice is behavioural: it changes with health, emotion, age, and recording conditions. It is a sample taken at a moment in time, compared against a stored centroid (the average of several enrolment samples).
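Comparing a fresh sample against the stored centroid comes down to a cosine similarity and a threshold. The sketch below is illustrative only: the `verify` helper and its `threshold` parameter are hypothetical, and the text does not state the acceptance threshold the app uses.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def verify(sample: Sequence[float], centroid: Sequence[float],
           threshold: float) -> Tuple[bool, float]:
    """Hypothetical decision rule: accept if the score reaches the threshold."""
    score = cosine_similarity(sample, centroid)
    return score >= threshold, score
```

Because the result is a similarity score rather than an exact match, a genuine user can occasionally fall below the threshold on a poor recording, which is why the voice sample is treated as a soft signal.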
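From raw audio to similarity score, every step runs entirely on-device: the audio is converted to filterbank (FBank) features, CMVN normalizes them per bin, the CAM++ ONNX model maps them to an embedding, the embedding is L2-normalized, and cosine similarity against the stored centroid produces the score.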
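The system is being evaluated on the VoxCeleb1 development set (1,211 speakers, ~148k utterances) using the production-identical pipeline: Rust DSP → CMVN → CAM++ ONNX (ORT_DISABLE_ALL). The current error estimate comes from a subset of 30 speakers × 5 utterances.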
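The CMVN stage of this pipeline (axis=0, ε=1e-10 in the verification table) normalizes each filterbank bin over time. Below is a pure-Python sketch; the text does not say whether variances are normalized as well as means, so dividing by the per-bin standard deviation plus ε is an assumption, as is the frames-by-bins input layout.

```python
import math
from typing import List, Sequence

EPS = 1e-10  # epsilon listed in the verification table


def cmvn(frames: Sequence[Sequence[float]], eps: float = EPS) -> List[List[float]]:
    """Per-bin mean and variance normalization over time (axis=0).

    frames[t][k] is the filterbank value of bin k at frame t.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    bins = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(bins)]
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n)
        for k in range(bins)
    ]
    # Assumption: eps is added to the standard deviation so that a constant
    # bin (std = 0) maps to zeros instead of dividing by zero.
    return [[(f[k] - means[k]) / (stds[k] + eps) for k in range(bins)] for f in frames]
```

Normalizing along the time axis removes per-recording offsets such as microphone gain, which is what the "zero-mean verified per bin" check in the table confirms.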
Key insight: the equal error rate (EER) of ~9% reflects a frozen CAM++ model used as a feature extractor with cosine similarity scoring, with no fine-tuning and no PLDA backend calibration. Published state-of-the-art results (1–2% EER) are achieved after full training on VoxCeleb with data augmentation and backend calibration. The ~9% figure is an honest, measured estimate for our current pipeline on noisy, real-world data, not a tuned benchmark result.
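The equal error rate is the operating point where the false acceptance rate (impostor trials wrongly accepted) equals the false rejection rate (genuine trials wrongly rejected). The sketch below estimates it by trying every observed score as a threshold and keeping the one where the two rates are closest; it is a simple illustration, not the evaluation script in the repository.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Estimate the EER by sweeping every observed score as a threshold.

    genuine:  similarity scores of same-speaker trials
    impostor: similarity scores of different-speaker trials
    Returns (eer, threshold); a trial is accepted when score >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        candidate = (abs(far - frr), (far + frr) / 2, t)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best[1], best[2]
```

The sweep is quadratic in the number of trials, which is fine for a small subset such as 30 speakers × 5 utterances; larger evaluations usually sort the scores once and walk the ROC curve.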
Two-speaker validation: in a controlled test with two distinct speakers (clean recordings, known ground truth), the system achieved a mean gap of 0.342 between intra-speaker and inter-speaker similarity scores, with zero overlap between the two distributions. That is comfortably above the 0.30 minimum gap we treat as our red line.
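The two-speaker check can be expressed in a few lines. This sketch assumes that "mean gap" means the mean intra-speaker score minus the mean inter-speaker score, and that "zero overlap" means every intra-speaker score exceeds every inter-speaker score; the `passes` rule combining both with the 0.30 red line is our own illustration.

```python
from statistics import mean
from typing import Dict, Sequence

RED_LINE = 0.30  # minimum acceptable mean gap quoted in the text


def separation_report(intra: Sequence[float], inter: Sequence[float]) -> Dict[str, object]:
    """Summarize how well same-speaker scores separate from different-speaker scores."""
    gap = mean(intra) - mean(inter)
    # Zero overlap: the lowest same-speaker score beats the highest
    # different-speaker score.
    overlap = min(intra) <= max(inter)
    return {"mean_gap": gap, "overlap": overlap, "passes": gap >= RED_LINE and not overlap}
```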
| Component | Status | Verification method |
|---|---|---|
| DSP (Rust fbank_cli) | ✅ | Bit-exact with device pipeline (flat FBank match) |
| CMVN (axis=0, ε=1e-10) | ✅ | Zero-mean verified per bin; matches Kotlin implementation exactly |
| ONNX (ORT_DISABLE_ALL) | ✅ | maxAbsDiff < 1e-4 between device and Python reference |
| L2 normalization | ✅ | Every embedding has ‖x‖ = 1.0 ± 1e-6 |
| VoxCeleb1 (dev, ~9% EER) | ⚠️ | Estimated on 30 speakers × 5 utterances; official vox1_test evaluation pending file availability |
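The two numerical criteria in the table (maxAbsDiff < 1e-4 between device and reference ONNX outputs, and ‖x‖ = 1.0 ± 1e-6 for embeddings) translate directly into checks like the ones below. The helper names are hypothetical; the repository's harness may implement them differently.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def check_parity(device_out: Sequence[float], reference_out: Sequence[float],
                 tol: float = 1e-4) -> bool:
    """ONNX parity criterion from the table: maxAbsDiff < 1e-4."""
    return max_abs_diff(device_out, reference_out) < tol


def check_unit_norm(embedding: Sequence[float], tol: float = 1e-6) -> bool:
    """L2 criterion from the table: the embedding norm is 1.0 within tol."""
    norm = math.sqrt(sum(x * x for x in embedding))
    return abs(norm - 1.0) <= tol
```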
The full test harness is in the repository under cmvn_experiment/voxceleb/:
- voxceleb_CAM++.onnx: the CAM++ model, run with ORT_DISABLE_ALL to match device output. The ORT_DISABLE_ALL flag is critical for numerical equivalence.