idle intelligence

sts-web

NVIDIA PersonaPlex-7B — pruned to 24 layers, recovered with QLoRA, quantized to Q4_K. Client-side inference: WebGPU via Burn + Rust/WASM.

github.com/idle-intelligence/sts-web

⚠ Work in progress — runs slower than realtime without a powerful GPU. Expect delays during audio generation.

status


Desktop only. Requires discrete GPU or Apple Silicon with 16GB+ RAM.

conversation

No voice prompt — speaker identity sampled from prior distribution

Audio stays on this device. All inference runs locally in your browser.

output · inner monologue

output · audio

files · test mode

performance

ms/frame · Frames/sec · RTF · Temporal (ms) · Depth (ms) · Mimi (ms)
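How the performance figures relate to one another can be sketched as follows. This is a minimal illustration, not the project's code. It assumes that the three stage timings (Temporal, Depth, Mimi) add up to the per-frame time, that RTF means processing time divided by the duration of audio produced (so values above 1.0 are slower than realtime), and that one frame covers 80 ms of audio (a 12.5 Hz frame rate, as used by the Mimi codec). The `FrameTiming` struct and `FRAME_AUDIO_MS` constant are hypothetical names chosen for this example.

```rust
/// Audio duration covered by one frame, in milliseconds.
/// Assumption: 12.5 frames per second, i.e. 80 ms of audio per frame.
const FRAME_AUDIO_MS: f64 = 80.0;

/// Per-frame stage timings in milliseconds.
/// Hypothetical struct; the fields mirror the panel labels.
#[derive(Debug, Clone, Copy)]
struct FrameTiming {
    temporal_ms: f64,
    depth_ms: f64,
    mimi_ms: f64,
}

impl FrameTiming {
    /// Total time spent on one frame (the "ms/frame" figure),
    /// assuming the three stages account for all of it.
    fn total_ms(&self) -> f64 {
        self.temporal_ms + self.depth_ms + self.mimi_ms
    }

    /// Frames processed per second of wall-clock time.
    fn frames_per_sec(&self) -> f64 {
        1000.0 / self.total_ms()
    }

    /// Real-time factor: processing time divided by audio duration.
    /// Below 1.0 keeps up with realtime; above 1.0 falls behind.
    fn rtf(&self) -> f64 {
        self.total_ms() / FRAME_AUDIO_MS
    }
}
```

Under these assumptions, a frame that takes 70 ms in the temporal stage, 40 ms in the depth stage, and 10 ms in Mimi costs 120 ms in total, which is about 8.3 frames per second and an RTF of 1.5: each second of audio takes 1.5 seconds to generate.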