YourVoices

⚖️

Platform Rules: The Sovereign Privacy Charter

Our platform contains no convoluted privacy agreements or twisted terms, because we neither own nor collect any information about anyone unless they freely choose to publish it and confirm it is theirs through unique voice sample verification.

The vast majority of humanity has come to see the internet as a grand surveillance apparatus and a tool for violating digital privacy. The prevailing sentiment is that "our privacy is the cheapest commodity in the global technology market."

This is where the voice comes in: the voice of every person whose digital rights have been violated and who refuses to be merely a number in an algorithm. Monopeak's philosophy is built on breaking the mould of blind compliance. You should not stand submissively just because everyone else stood when the red light flashed, and you should not accept, whether in software or in life, something you are not convinced of deep within. YourVoices is the space that restores that trust and complete sovereignty to you.

Core Capabilities

🎙️

Voice Sample

Authenticate using your unique voice sample — no passwords, no recovery emails, no third-party identity providers. Learn about the science →

🛡️

Sovereign Identity

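Every user holds a self-sovereign identifier (V-ID). Your identity belongs to you: your private keys are stored on your device, never in a central database. Registration sends only your V-ID, voice centroid, and public key to the server.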

✉️

Private Messaging

End-to-end encrypted whispers reach only the intended recipient. No metadata trails, no server-side message inspection.

📞

Voice Calls

Peer-to-peer encrypted voice calls. Your conversations stay between you and the person you trust.

How Registration Works

Creating a YourVoices account takes under two minutes and requires nothing but your voice. Here is exactly what happens, step by step.

🎙️

Record Your Voice

You record three short voice samples (7 seconds each). The app processes them entirely on your device — no audio is ever sent to our servers.

Each recording passes through our DSP pipeline (FBank → CMVN → CAM++ ONNX) and becomes a 512-dimensional mathematical vector. The three vectors are averaged into a single voice centroid, which is your identity anchor.
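The averaging step can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `l2_normalize` and `voice_centroid` are hypothetical, and normalizing each embedding before averaging (and the result afterwards) is an assumption rather than something stated above.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def voice_centroid(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average enrolment embeddings element-wise into one centroid.

    Assumption: each embedding is L2-normalized before averaging and the
    mean is normalized again, so every sample carries equal weight.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(column) / len(unit) for column in zip(*unit)]
    return l2_normalize(mean)
```

In the registration flow described here, `embeddings` would hold the three 512-dimensional vectors, one per 7-second recording.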

🔐

Set Your PIN

You choose an 8-digit PIN. This PIN is never stored in plain text, never sent anywhere. It is combined with your voice embedding via PBKDF2-HMAC-SHA256 (200,000 iterations) to derive your personal MonocodeKey — the encryption key for your local data.

Your private keys and sensitive data are protected by this PIN-derived key, stored in Android's encrypted storage. If you forget your PIN, there is no recovery — this is a deliberate security design.
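A rough sketch of the key derivation, using `hashlib.pbkdf2_hmac` from the Python standard library. The text only says the PIN and voice embedding are "combined"; hashing the stored centroid into the salt, the 32-byte key length, and the name `derive_key` are all assumptions made for illustration, so the app's actual scheme may differ.

```python
import hashlib
import struct
from typing import Sequence

ITERATIONS = 200_000  # iteration count stated in the text
KEY_LENGTH = 32       # assumption: a 256-bit key


def derive_key(pin: str, centroid: Sequence[float]) -> bytes:
    """Derive a key from an 8-digit PIN with PBKDF2-HMAC-SHA256.

    Assumption: the stored voice centroid is hashed to form the salt.
    """
    if len(pin) != 8 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("PIN must be exactly 8 digits")
    # Pack as little-endian float32 so the salt bytes are reproducible.
    packed = struct.pack(f"<{len(centroid)}f", *centroid)
    salt = hashlib.sha256(packed).digest()
    return hashlib.pbkdf2_hmac(
        "sha256", pin.encode("ascii"), salt, ITERATIONS, dklen=KEY_LENGTH
    )
```

An 8-digit PIN has only 10^8 possible values, which is why a high iteration count matters: each guess costs 200,000 HMAC rounds.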

🆔

Your V-ID is Created

A unique Sovereign ID (V-ID) is generated on your device. Your voice centroid and public encryption key are sent to our server to complete registration. Your private key never leaves your device.

You receive 3 GB of Bunker storage — your personal encrypted cloud vault on Hetzner's Helsinki data centre. All content in your Bunker is end-to-end encrypted; we cannot access it.

✉️

Email Verification (Optional)

You may optionally verify an email address. This unlocks an additional 2 GB of Bunker storage (5 GB in total). Your email is used solely for this verification; it is never used for marketing and never shared with third parties.

Email verification is entirely optional. The core service — voice identity, messaging, and calls — works without it.

👤

Set Up Your Profile (Optional)

In your account settings, you may add a display name, profile photo, and short biography. This information is publicly visible to all platform users — only include what you are comfortable sharing openly.

Profile information is never required. You can use YourVoices entirely anonymously, identified only by your V-ID.

Your Data — What We Hold & Where

We believe you have a right to know exactly what data exists about you and where it lives. The table below is the complete picture.

Data	Where stored	Can we read it?	Can you delete it?
Voice embedding & centroid (mathematical vector, not audio)	Your device, server	No audio, only a vector	Yes (delete account)
PIN & MonocodeKey (PBKDF2-derived encryption key)	Your device only	Never (stays on device)	Yes (uninstall app)
Private keys (X25519 keypair for E2E encryption)	Your device only	Never (stays on device)	Yes (uninstall app)
Sovereign ID (V-ID)	Your device, server	Yes (it is an identifier)	Yes (delete account)
Public encryption key	Server	Yes (it is public by design)	Yes (delete account)
Email address (optional)	Server	Yes	Yes (delete account)
Profile: name, photo, bio (optional, publicly visible)	Server	Yes, and so can everyone else	Yes (edit, or delete account)
Messages & media (Whisper, end-to-end encrypted)	Bunker (Helsinki)	No (encrypted before upload)	Yes (individually, or delete account)
Call logs (encrypted metadata)	Bunker (Helsinki)	No (encrypted before upload)	Yes (delete account)

Read the full Privacy Policy

🎙️ Voice Sample

How voice sample verification works — experimental results & technical references

What Is a Voice Sample?

A voice sample is a 512-dimensional mathematical vector (embedding) extracted from your voice. Unlike a password that you must remember or a fingerprint that requires special hardware, your voice sample is computed from a short audio recording (7 seconds) and serves as a soft biometric identifier — not a hard cryptographic proof, but a statistically reliable signal that helps confirm your identity.

The term "voice sample" is deliberately chosen over "voice print" to avoid the misleading implication that voice is as unique and immutable as a fingerprint. Voice is behavioural: it changes with health, emotion, age, and recording conditions. It is a sample taken at a moment in time, compared against a stored centroid (the average of several enrolment samples).
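The comparison against the stored centroid could look roughly like this. It is a sketch under stated assumptions: the 0.70 threshold is the operating threshold reported in the Experimental Results section, while the names `cosine_similarity` and `matches_centroid` and the choice of `>=` for acceptance are hypothetical.

```python
import math
from typing import Sequence

THRESHOLD = 0.70  # operating threshold reported in the results section


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def matches_centroid(
    probe: Sequence[float],
    centroid: Sequence[float],
    threshold: float = THRESHOLD,
) -> bool:
    """Accept the probe when its similarity reaches the threshold.

    Assumption: acceptance uses >=; the app may use a strict comparison.
    """
    return cosine_similarity(probe, centroid) >= threshold
```

Because the result is a score rather than an exact match, a higher threshold rejects more impostors at the cost of rejecting more genuine attempts.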

The Pipeline

From raw audio to similarity score — every step runs entirely on-device:

# 1. Audio capture (Android AudioRecord)
16 kHz mono 16-bit PCM ── 7 seconds

# 2. DSP: Rust fbank_cli (identical pipeline on device & test harness)
Pre-emphasis (α=0.97) → Framing (25ms, 10ms stride) → Hamming window
→ FFT (512-point) → Mel filterbank (80 filters, 20-8000 Hz) → Log(|FFT|²)
── 80-dimensional FBank × T frames

# 3. CMVN (Cepstral Mean Variance Normalization)
For each mel bin: x ← (x - μ) / max(σ, 1e-10)
→ Zero-mean, unit-variance per frequency bin across time

# 4. ONNX Inference: CAM++ model (voxceleb_CAM++.onnx, 29 MB)
Frozen feature extractor (no fine-tuning), graph optimizations disabled
→ 512-dimensional embedding vector

# 5. L2 Normalization + Cosine Similarity
emb ← emb / ||emb||₂
→ score = cosine(probe_emb, stored_centroid)
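To make steps 2 and 3 more concrete, here is a standard-library sketch of the time-domain front end (pre-emphasis, framing, Hamming window) and of CMVN. It is not the Rust `fbank_cli` code: the function names are hypothetical, and three details are assumptions: pre-emphasis is applied to the whole signal with the first sample left unchanged, trailing samples that do not fill a frame are dropped, and CMVN uses the population standard deviation. The FFT and mel filterbank stages are omitted.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000
FRAME_LEN = 400      # 25 ms at 16 kHz
FRAME_STRIDE = 160   # 10 ms at 16 kHz
PRE_EMPHASIS = 0.97
EPS = 1e-10          # standard-deviation floor from step 3


def pre_emphasize(signal: Sequence[float]) -> List[float]:
    """y[n] = x[n] - 0.97 * x[n-1]; first sample kept as-is (assumption)."""
    if not signal:
        return []
    rest = [signal[n] - PRE_EMPHASIS * signal[n - 1] for n in range(1, len(signal))]
    return [signal[0]] + rest


def frame_count(num_samples: int) -> int:
    """Number of full frames; a partial final frame is dropped (assumption)."""
    if num_samples < FRAME_LEN:
        return 0
    return 1 + (num_samples - FRAME_LEN) // FRAME_STRIDE


def hamming(length: int) -> List[float]:
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def windowed_frames(signal: Sequence[float]) -> List[List[float]]:
    """Pre-emphasize, cut into overlapping frames, and apply the window."""
    window = hamming(FRAME_LEN)
    emphasized = pre_emphasize(signal)
    frames = []
    for i in range(frame_count(len(emphasized))):
        chunk = emphasized[i * FRAME_STRIDE : i * FRAME_STRIDE + FRAME_LEN]
        frames.append([x * w for x, w in zip(chunk, window)])
    return frames


def cmvn(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each mel bin (column) to zero mean, unit variance over time."""
    t = len(features)
    if t == 0:
        raise ValueError("need at least one frame")
    columns = list(zip(*features))  # one tuple per mel bin
    means = [sum(col) / t for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / t)
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / max(s, EPS) for x, m, s in zip(row, means, stds)]
        for row in features
    ]
```

Under these assumptions a 7-second recording (112,000 samples) yields `frame_count(7 * SAMPLE_RATE)` = 698 frames, so T in step 2 is 698.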

Experimental Results

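The system was evaluated on a subset of the VoxCeleb1 dev set (the full dev set has 1,211 speakers and ~148k utterances) using the production-identical pipeline: Rust DSP → CMVN → CAM++ ONNX (ORT_DISABLE_ALL). The subset size is listed under Component Verification Status.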

Mean Similarity Gap: 0.289 (target ≥ 0.25 ✅)

Equal Error Rate: ~9.2% (target ≤ 5% ❌)

Operating Threshold: 0.70 (conservative, security-tilted)

Key insight: The EER of ~9% reflects a frozen CAM++ model used as a feature extractor with cosine similarity, with no fine-tuning and no PLDA backend calibration. Published state-of-the-art results (1-2% EER) are achieved after full training on VoxCeleb with data augmentation and backend calibration. The ~9% figure is our honest current estimate for this pipeline on noisy, real-world data.

Two-speaker validation: In a controlled test with two distinct speakers (clean recordings, known ground truth), the system achieved a mean gap of 0.342 with zero overlap between intra-speaker and inter-speaker similarity distributions — comfortably above the 0.30 red line.
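The gap and overlap figures can be computed along these lines. This sketch assumes that "mean similarity gap" means the mean intra-speaker score minus the mean inter-speaker score, and that "zero overlap" means the lowest intra-speaker score is above the highest inter-speaker score; both definitions and the function names are assumptions, not taken from the test harness.

```python
from statistics import fmean
from typing import Sequence


def similarity_gap(intra: Sequence[float], inter: Sequence[float]) -> float:
    """Mean same-speaker score minus mean different-speaker score."""
    return fmean(intra) - fmean(inter)


def distributions_overlap(intra: Sequence[float], inter: Sequence[float]) -> bool:
    """True when some different-speaker score reaches the same-speaker range."""
    return min(intra) <= max(inter)


# Made-up scores for illustration only, not measured results.
example_intra = [0.80, 0.75, 0.70]
example_inter = [0.40, 0.45, 0.35]
example_gap = similarity_gap(example_intra, example_inter)
example_overlap = distributions_overlap(example_intra, example_inter)
```

With zero overlap, any threshold between the two ranges separates the speakers perfectly on that test set, which says nothing about larger or noisier populations.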

Component Verification Status

Component	Verification	Method
DSP (Rust fbank_cli)	✅	Bit-exact with device pipeline (flat FBank match)
CMVN (axis=0, ε=1e-10)	✅	Zero-mean verified per bin; matches Kotlin implementation exactly
ONNX (ORT_DISABLE_ALL)	✅	maxAbsDiff < 1e-4 between device and Python reference
L2 normalization	✅	Every embedding has ‖x‖ = 1.0 ± 1e-6
VoxCeleb1 (dev, ~9% EER)	⚠️	Estimated on 30 speakers × 5 utterances; official vox1_test evaluation pending file availability
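The two numeric checks in the table (maxAbsDiff < 1e-4 and ‖x‖ = 1.0 ± 1e-6) are simple to express. The helpers below are a hypothetical sketch of such checks, not the project's actual verification scripts.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return max(abs(x - y) for x, y in zip(a, b))


def embeddings_match(
    device: Sequence[float], reference: Sequence[float], tol: float = 1e-4
) -> bool:
    """Device and reference embeddings agree within the table's tolerance."""
    return max_abs_diff(device, reference) < tol


def is_unit_norm(vec: Sequence[float], tol: float = 1e-6) -> bool:
    """Check that the Euclidean norm is 1.0 within the table's tolerance."""
    return abs(math.sqrt(sum(x * x for x in vec)) - 1.0) <= tol
```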

Reproduce the Results

The full test harness is in the repository under cmvn_experiment/voxceleb/:

# 1. Install dependencies
pip install numpy soundfile onnxruntime

# 2. Extract FBank with the Rust CLI (production-equivalent DSP)
cargo build --release --bin fbank_cli
./target/release/fbank_cli input.wav features.json

# 3. Run the analysis pipeline
python3 cmvn_experiment/voxceleb/run_voxceleb_test.py
      

fbank_cli — Rust FBank extractor, identical to the Android app's native DSP
CMVN — Per-bin mean subtraction and variance normalization (Python, matched to Kotlin)
ONNX inference — voxceleb_CAM++.onnx, ORT_DISABLE_ALL to match device output
Cosine similarity — All-pairs intra-speaker vs inter-speaker scoring → EER via threshold sweep
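The last stage, all-pairs scoring followed by a threshold sweep, might be sketched as below. This is an illustration, not the contents of `run_voxceleb_test.py`: the function names are hypothetical, embeddings are assumed to be L2-normalized already (so a dot product equals cosine similarity), and the EER is approximated as the midpoint of FAR and FRR at the threshold where they are closest.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Labeled = Tuple[str, Sequence[float]]  # (speaker id, unit-length embedding)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def split_scores(labeled: Sequence[Labeled]) -> Tuple[List[float], List[float]]:
    """Score every pair once: same-speaker pairs are genuine, others impostor."""
    genuine: List[float] = []
    impostor: List[float] = []
    for (spk_a, emb_a), (spk_b, emb_b) in combinations(labeled, 2):
        (genuine if spk_a == spk_b else impostor).append(dot(emb_a, emb_b))
    return genuine, impostor


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Sweep thresholds over observed scores; return (EER, threshold)."""
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)     # genuine rejected
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostor accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2, thr
    return best_eer, best_thr
```

With few speakers and utterances the number of genuine pairs is small, so the resulting EER estimate moves in coarse steps.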

References & Further Reading

🔗 voice-vector
Python package for voice embedding extraction using CAM++ and ECAPA-TDNN models — useful for cross-validation and experimenting with different back-ends
📄 CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking (2023)
The paper describing the CAM++ architecture used in our ONNX model. Achieves 0.85% EER on VoxCeleb1 with full training and PLDA backend
📊 VoxCeleb1 Dataset
1,251 celebrities and ~153k utterances in total, extracted from YouTube videos; the dev split has 1,211 speakers and ~148k utterances. The standard benchmark for speaker verification research
🔧 VoxCeleb Trainer
Official training and evaluation code for VoxCeleb — the reference implementation for metric learning and PLDA back-end calibration
⚡ ONNX Runtime
Cross-platform inference engine used on both Android (Kotlin) and the test harness (Python). The ORT_DISABLE_ALL flag is critical for numerical equivalence
🎵 Librosa
Python audio analysis library. Not used in our pipeline — we use Rust fbank_cli for DSP equivalence — but useful for understanding FBank theory and visualisation