idle intelligence
NVIDIA PersonaPlex-7B — pruned to 24 layers, recovered with QLoRA, quantized to Q4_K. Client-side inference: WebGPU via Burn + Rust/WASM.
⚠ Work in progress: runs slower than real time without a powerful GPU. Expect delays during audio generation.
Desktop only. Requires a discrete GPU or Apple Silicon with 16 GB+ RAM.
No voice prompt: speaker identity is sampled from the prior distribution.
Audio stays on this device. All inference runs locally in your browser.