Local-first AI orchestration with a host-authoritative Rust runtime, extension-driven capabilities, and first-party support for local LLM, audio, and video workflows.
- Run It Locally First
- Supported Capabilities
  - Video Generation: SVI2-Pro · LongCat · LTX-2.3, RIFE frame-gen, RTX ×2/×4 upscale
  - LLM Inference: llama.cpp + speculative decoding (MTP)
  - Voice Generation (EmotionTTS): IndexTTS-2 emotion vectors + storyboard
  - Image-to-3D: Microsoft TRELLIS.2, single image → watertight GLB mesh
  - Image Generation: Stable Diffusion · FLUX (coming soon)
- What nexus-dnn Is For
- System At A Glance
- UI Screenshots
- Built-in Extensions
- Documentation Map
- Future Roadmap
- License
| What | Minimum |
|---|---|
| Rust | stable |
| Node.js | 20+ |
| pnpm | 8+ |
| uv | latest recommended |
uv matters because several built-in extensions use host-managed Python environments.
GPU / production install? See docs/deployment/container-and-cuda-dependencies.md, section "Installation by platform". On linux-aarch64 (DGX Spark) run the container (`just dgx-deploy`); on Windows use the native build script `dockerfiles/win64.build.ps1` (there is no Windows container; CUDA is Linux-container-only).
git clone <your-fork-or-origin-url> nexus-dnn
cd nexus-dnn
# Host only: browser UI served from the embedded frontend bundle
cargo host
# or
cargo run -p nexus-core --bin nexus-dnn
Open http://127.0.0.1:3000.
curl http://127.0.0.1:3000/api/v1/health
Expected shape:
{
"data": {
"status": "ok",
"details": {
"...": "additional live health fields may appear here"
}
},
"meta": {
"timestamp": "2026-06-12T00:00:00Z"
},
"error": null
}
`nexus-dnn` currently listens on `0.0.0.0:$NEXUS_PORT`, but local usage should still prefer `127.0.0.1`.
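A minimal sketch of checking a response against the envelope shape shown above, assuming only the fields visible in that example (`data.status`, `error`). The helper name `is_healthy` is hypothetical and not part of nexus-dnn.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True when a response body looks like the healthy envelope above.

    Only data.status and error are inspected. Anything under data.details is
    ignored because those fields may vary at runtime. A missing "error" key
    is treated the same as an explicit null (an assumption of this sketch).
    """
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(envelope, dict):
        return False
    data = envelope.get("data")
    return (
        envelope.get("error") is None
        and isinstance(data, dict)
        and data.get("status") == "ok"
    )


sample = (
    '{"data": {"status": "ok", "details": {}}, '
    '"meta": {"timestamp": "2026-06-12T00:00:00Z"}, "error": null}'
)
print(is_healthy(sample))  # True
```

To check a running host, the body could be fetched with `urllib.request.urlopen("http://127.0.0.1:3000/api/v1/health").read().decode()` and passed to the same function.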
| Goal | Command |
|---|---|
| Host only | cargo host |
| Host + TUI | cargo dev |
| TUI drives host | cargo dev-tui |
| Desktop shell | cd apps/web && pnpm install && pnpm tauri dev |
| Rebuild embedded web app | cd apps/web && pnpm install && pnpm build |
The host serves the already-built web bundle from apps/web/dist, so frontend install/build is mainly for frontend or desktop-shell development.
nexus-dnn is a local-first platform for running AI features as structured host-managed systems instead of ad-hoc scripts. The host owns process lifecycle, storage, installs, API routing, workflows, model/runtime leasing, and extension boundaries. Extensions add domain capability without taking control away from the host.
Today that means the repo can host:
- Local chat and RAG workflows
- Host-managed backend runtime leasing
- Emotional TTS pipelines
- Image-to-video and long-video generation flows
- Image-to-3D mesh generation (single image → watertight GLB)
- Extension-owned UI surfaces mounted inside the host app
Everything below runs locally, on a single consumer GPU, behind the same host-managed runtime-lease + model-store foundation.
| Capability | Engines | Highlights | Status |
|---|---|---|---|
| Video Generation | SVI2-Pro · LongCat · LTX-2.3 | Text→Video, Image→Video, infinite length, RIFE frame-gen, RTX ×2/×4 upscale | 🟢 Stable |
| LLM Inference | llama.cpp | Speculative decoding via MTP, GGUF, host-managed runtime leases | 🟢 Stable |
| Voice Generation | IndexTTS-2 (EmotionTTS) | 8-axis emotion vectors, storyboard, custom-voice upload | 🟢 Stable |
| Image-to-3D | Microsoft TRELLIS.2 | Single image → watertight GLB, mesh-only or textured, triangle-budget decimation, in-browser 3D preview | 🟢 Stable |
| Image Generation | Stable Diffusion · FLUX | Text→Image | Coming soon |
nexus.video.svi2-pro · nexus.video.longcat · nexus.video.ltx23
Generate video from a text prompt or a still image, then push it past what a single diffusion pass yields: higher frame rate and higher resolution, all on a local GPU.
Engines
- SVI2-Pro: Stable Video Infinity 2.0 Pro (two SVI LoRAs over the Wan2.2-I2V-A14B dual-expert MoE). Does both Image→Video and Text→Video, with infinite, cross-clip-consistent length: clips are chained with rolling cross-fade + reference anchoring so the subject stays coherent across arbitrarily many segments. The fp8 e4m3fn base fits in 16 GB of VRAM or less.
- LTX-2.3: fast image-to-video with host-managed runtime profiles: RTX 40 FP8, RTX 50 Blackwell FP8 (production), and RTX 50 NVFP4 (experimental). 16 GB-safe by default via external-segment rendering.
- LongCat: 13.6B DiT (UMT5-XXL text encoder, Wan 2.1 VAE) for text→video, image→video, and long-video continuations. FP8 e4m3fn path for 12-16 GB; BF16 path for 24 GB+.
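The clip chaining mentioned for SVI2-Pro can be illustrated with a toy cross-fade. This is only a sketch under stated assumptions: the linear blend weights, the `crossfade_chain` name, and the representation of frames as flat lists of floats are all hypothetical, and reference anchoring is not modeled. The extension's actual blending may differ.

```python
def crossfade_chain(clip_a, clip_b, overlap):
    """Join two clips, linearly blending the last `overlap` frames of clip_a
    with the first `overlap` frames of clip_b.

    Each frame is a flat list of float pixel values, a toy stand-in for
    real video tensors.
    """
    if overlap < 0 or overlap > min(len(clip_a), len(clip_b)):
        raise ValueError("overlap must fit inside both clips")
    if overlap == 0:
        return clip_a + clip_b

    head = clip_a[:-overlap]   # frames taken unchanged from clip_a
    tail = clip_b[overlap:]    # frames taken unchanged from clip_b
    blended = []
    for i in range(overlap):
        # Weight ramps from mostly clip_a to mostly clip_b across the overlap.
        w = (i + 1) / (overlap + 1)
        frame_a = clip_a[len(clip_a) - overlap + i]
        frame_b = clip_b[i]
        blended.append([(1 - w) * a + w * b for a, b in zip(frame_a, frame_b)])
    return head + blended + tail


dark = [[0.0] for _ in range(4)]    # four single-pixel black frames
bright = [[1.0] for _ in range(4)]  # four single-pixel white frames
joined = crossfade_chain(dark, bright, overlap=2)
print(len(joined))  # 6
```

Chaining more segments would repeat the same call, feeding each joined result in as `clip_a` for the next segment.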
Post-processing stack (applies on top of any engine)
- RIFE frame interpolation (frame-gen): torch-RIFE (vendored IFNet HDv3) synthesizes in-between frames to multiply FPS (e.g. 16 → 48 fps) for fluid motion, with no extra diffusion cost.
- RTX ×2 / ×4 upscaling: NVIDIA Maxine RTX super-resolution on RTX GPUs upscales the output in a hardware-accelerated pass (e.g. 1216×768 → 2432×1536).
- Attention backends (SDPA / FlashAttention-2/3 / SageAttention) are auto-selected per GPU architecture + dtype.
16 GB-friendly by design: staged CPU offload, fp8 compute, and external-segment rendering keep peak VRAM under consumer-card budgets.
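The numbers in the post-processing examples above follow from simple arithmetic, sketched here. The function names are hypothetical, and the frame-count formula assumes in-between frames are synthesized only between existing consecutive frames, which may not match the pipeline exactly.

```python
def interpolated_fps(source_fps: int, factor: int) -> int:
    """Playback rate after multiplying the frame rate by `factor`."""
    return source_fps * factor


def interpolated_frame_count(source_frames: int, factor: int) -> int:
    """Frame count after inserting (factor - 1) synthesized frames between
    each pair of consecutive source frames."""
    if source_frames < 1 or factor < 1:
        raise ValueError("source_frames and factor must be positive")
    return (source_frames - 1) * factor + 1


def upscaled_size(width: int, height: int, scale: int) -> tuple:
    """Output resolution after an integer super-resolution pass."""
    return (width * scale, height * scale)


print(interpolated_fps(16, 3))        # 48
print(upscaled_size(1216, 768, 2))    # (2432, 1536)
print(interpolated_frame_count(81, 3))  # 241
```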
nexus.local-llm
Local large-language-model inference and chat, served through host-managed backend runtimes.
- Latest llama.cpp backend with speculative decoding via MTP (Multi-Token Prediction): a draft head proposes several tokens per step and the main model verifies them in one pass, for materially higher tokens/sec at identical output quality.
- GGUF models with quantization-aware install and an on-disk model store.
- Host-managed runtime leases: the host owns process lifecycle, VRAM budgeting, and idle reaping; the extension simply acquires a lease.
- Interactive chat threads with per-thread generation settings, model picker, and RAG workflows.
- Throughput knobs: KV-cache reuse, MoE offload, min-p / DRY sampling, context cram.
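The draft-then-verify idea behind speculative decoding can be shown with a toy greedy version. This is a sketch, not llama.cpp's implementation: the two "models" are plain functions invented for illustration, the draft is a separate function rather than an MTP head on the same model, and verification is a loop where a real engine would score all proposed positions in one batched pass.

```python
def greedy_decode(next_token, prompt, n_new):
    """Baseline: ask the model for one token at a time."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, n_new, k=4):
    """Greedy speculative decoding: the draft proposes up to k tokens, the
    target keeps the longest matching prefix and corrects the first mismatch."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. Draft proposes a short continuation autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. Target verifies each proposed position in order.
        accepted = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's own token replaces the miss
                break
        tokens.extend(accepted)
    return tokens


def target_next(tokens):
    """Hypothetical 'main model': a deterministic toy rule."""
    return (tokens[-1] * 3 + 1) % 7


def draft_next(tokens):
    """Hypothetical 'draft': agrees with the target except after token 5."""
    return 0 if tokens[-1] == 5 else (tokens[-1] * 3 + 1) % 7


baseline = greedy_decode(target_next, [1], 10)
fast = speculative_decode(target_next, draft_next, [1], 10, k=4)
print(fast == baseline)  # True
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding; the speedup in a real engine comes from verifying several positions per target pass.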
nexus.audio.emotiontts
State-of-the-art emotional text-to-speech via IndexTTS-2, running in a host-managed Python subprocess.
- 8-axis emotion vectors: dial the emotional tone (joy, anger, sadness, surprise, …) per line. Optional Qwen text-emotion inference reads the intended emotion straight from the text, and audio-reference transfer copies the feeling from a sample clip.
- Storyboard: author a multi-line script/dialogue, assign a voice and an emotion vector to every line, and batch-synthesize the whole scene in a single run. The lines render as one coherent, ordered sequence with independent per-line control; think of a screenplay that compiles to audio, with each character speaking in their own voice and mood.
- Custom voice upload: drop in your own reference audio to mint a custom voice ("voice asset"). Automatic reference-audio preprocessing, alignment-score observability, and speaker-prefix caching keep quality high and re-synthesis fast.
- Deployment-scoped character→voice mappings, a global content-hash synthesis cache (10 GB LRU), and partial-ZIP install with auto-resume.
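A content-hash cache with LRU eviction under a byte budget, as mentioned in the last bullet, might look roughly like the sketch below. The class name, the choice of SHA-256, and the inputs that feed the hash (text, voice id, emotion vector) are assumptions for illustration; the extension's real cache key and storage are not described here.

```python
import hashlib
from collections import OrderedDict


class ContentHashLruCache:
    """In-memory toy: entries keyed by a content hash, evicted least recently
    used first once the total payload size exceeds max_bytes."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # key -> audio bytes, oldest first

    @staticmethod
    def key_for(text: str, voice_id: str, emotion: list) -> str:
        """Hash every input that affects the synthesized audio (assumed set)."""
        h = hashlib.sha256()
        for part in (text, voice_id, ",".join(f"{v:.4f}" for v in emotion)):
            h.update(part.encode("utf-8"))
            h.update(b"\x00")  # separator so adjacent fields cannot collide
        return h.hexdigest()

    def get(self, key: str):
        audio = self._entries.get(key)
        if audio is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return audio

    def put(self, key: str, audio: bytes) -> None:
        if key in self._entries:
            self.total_bytes -= len(self._entries.pop(key))
        self._entries[key] = audio
        self.total_bytes += len(audio)
        while self.total_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)  # drop oldest
            self.total_bytes -= len(evicted)


cache = ContentHashLruCache(max_bytes=10)  # tiny budget for the demo
key = ContentHashLruCache.key_for("Hello there", "narrator", [0.8, 0.0, 0.1])
cache.put(key, b"RIFF")
print(cache.get(key) == b"RIFF")  # True
```

An identical line, voice, and emotion vector hashes to the same key, so a repeat synthesis request can be served from the cache instead of re-running the model.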
nexus.3d.trellis2 · operator trellis2.generate_3d
Turn a single image into a watertight 3D mesh with Microsoft TRELLIS.2 (a 4B flow-matching model over an O-Voxel sparse structure), then orbit and download the result as a GLB, all on a local GPU.
- Single image → watertight GLB: upload a subject on a clean background; the pipeline runs image → sparse structure → shape → mesh decode → GLB export and returns a downloadable artifact.
- Tunable flow: sparse-structure and shape flow steps, deterministic seed, and a triangle-budget decimation target for light, game-ready meshes.
- Mesh-only or textured: skip the texture pass for a fast `MeshOnly` GLB, or bake a full PBR texture for a shaded result.
- In-browser 3D preview: every result renders in an interactive `<model-viewer>` (orbit, auto-rotate, neutral/ACES tone-mapping, exposure) with a live FORMAT / TRIANGLES / VERTICES readout and one-click GLB download.
- Host-managed: install, runtime lease, storage, and `/media` artifact serving all flow through the same host foundation as the video and LLM stacks.
Validated end-to-end on the DGX Spark (GB10, aarch64 Blackwell sm_121) with vendored native
kernels. See extensions/builtin/trellis2/README.md, or
open the standalone showcase: docs/showcase/trellis2-image-to-3d.html.
Coming soon
Text-to-image generation via Stable Diffusion and FLUX, packaged as host-managed extensions on the same runtime-lease + model-store foundation as the video and LLM stacks, so installs, VRAM budgeting, and UI mounting work exactly the same way.
flowchart LR
UI["Web UI / Tauri / TUI"] --> HOST["nexus-dnn host"]
HOST --> API["HTTP + SSE + WS"]
HOST --> REG["Extension registry"]
HOST --> DEPS["Dependency installer"]
HOST --> STORE["SQLite + artifact store"]
HOST --> RUNTIMES["Backend runtime leases"]
REG --> EXT["Built-in + external extensions"]
EXT --> WORKERS["Native / Python workers"]
WORKERS --> RUNTIMES
WORKERS --> STORE
| Area | Authority |
|---|---|
| HTTP listener, health, API envelopes | Host |
| Extension discovery, validation, enable/disable | Host |
| Dependency install plans | Host |
| Runtime processes and leases | Host |
| Workflow storage, runs, artifacts | Host |
| Extension-specific logic and UX | Extension, but only through host-owned surfaces |
UI screenshots: Image-to-3D (TRELLIS 2 generative surface), Extensions gallery, Models browser, Deployments, Backend runtimes, Dependency installer, Modules, SVI2 recipe, SVI2 recipe graph.
| Extension | Status | What it adds |
|---|---|---|
| nexus.local-llm | 🟢 active product surface | Local chat, RAG, backend-runtime integration, model/browser layouts |
| nexus.audio.emotiontts | 🟢 active product surface | Emotional dialogue TTS, voice assets, batch runs, audio editing |
| nexus.video.ltx23 | 🟢 active product surface | LTX 2.3 image-to-video with host-managed runtime profiles |
| nexus.video.longcat | 🟡 active extension, still evolving | LongCat-based long-video generation paths |
| nexus.video.svi2-pro | 🟡 advanced / high-requirement path | SVI 2.0 Pro image-to-video for Blackwell-focused setups |
| nexus.3d.trellis2 | 🟢 active product surface | Image→3D mesh generation (Microsoft TRELLIS.2), GLB export, in-app 3D viewer |
Several of these extensions ship more than operators:
- Dependency graphs
- Backend runtime manifests
- Storage migrations
- YAML layouts
- Static web assets and custom elements
- Extension routers mounted under `/api/v1/extensions/{ext_id}/...`
sequenceDiagram
participant User as User Surface
participant Host as Host App
participant Ext as Extension Registry/Router
participant Worker as Native or Python Worker
participant Runtime as Backend Runtime Lease
participant Store as Storage + Artifacts
User->>Host: UI action / API request
Host->>Ext: Resolve extension metadata or route
Ext->>Host: Manifest, layouts, storage, router, runtime declarations
Host->>Worker: JSON-RPC / typed host service calls
Worker->>Runtime: Acquire or use host-managed lease
Worker->>Store: Read/write artifacts through host-owned paths
Worker-->>Host: Results, progress, errors
Host-->>User: REST/SSE/WS updates + rendered UI
The important boundary is that extensions do not become mini-hosts. They can contribute routes, UIs, and workers, but the host still owns mounting, serving, validation, and lifecycle.
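To make the lease step in the sequence above more concrete, here is a toy host-side lease table with reference counting and idle reaping. Everything in it is hypothetical (the `LeaseManager` name, its methods, and the idle policy); it only illustrates the idea that workers acquire and release leases while the host decides when a runtime goes away.

```python
import time


class LeaseManager:
    """Toy lease table: one runtime per name, shared by reference count and
    reaped once it has had no holders for idle_seconds."""

    def __init__(self, idle_seconds: float, clock=time.monotonic):
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._runtimes = {}  # name -> {"holders": int, "idle_since": float or None}

    def acquire(self, name: str) -> str:
        """Start tracking the runtime if needed and register one more holder."""
        rt = self._runtimes.setdefault(name, {"holders": 0, "idle_since": None})
        rt["holders"] += 1
        rt["idle_since"] = None
        return name

    def release(self, name: str) -> None:
        """Drop one holder; the runtime becomes idle when none remain."""
        rt = self._runtimes[name]
        if rt["holders"] == 0:
            raise RuntimeError(f"lease for {name!r} released too many times")
        rt["holders"] -= 1
        if rt["holders"] == 0:
            rt["idle_since"] = self._clock()

    def reap_idle(self) -> list:
        """Remove runtimes idle for at least idle_seconds; return their names."""
        now = self._clock()
        reaped = [
            name
            for name, rt in self._runtimes.items()
            if rt["holders"] == 0
            and rt["idle_since"] is not None
            and now - rt["idle_since"] >= self._idle_seconds
        ]
        for name in reaped:
            del self._runtimes[name]
        return reaped


manager = LeaseManager(idle_seconds=60.0)
lease = manager.acquire("llama-runtime")
manager.release(lease)
print(manager.reap_idle())  # [] because the runtime only just became idle
```

The worker side never stops or starts the runtime directly; it only calls `acquire` and `release`, which mirrors the boundary described above.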
The strongest recent validation evidence in the repo is centered on a Windows workstation rather than a broad cross-platform certification matrix.
| Area | Evidence in repo |
|---|---|
| OS | Windows |
| GPU | NVIDIA GeForce RTX 5070 Ti |
| VRAM | ~15.9 GiB |
| Driver family | 570.65+ |
| Python evidence | 3.12.11 |
| Torch evidence | 2.12.0 + CUDA 13.2 |
Architectures: the host targets amd64 Windows, amd64 Linux, and aarch64 Linux (e.g. DGX Spark / GB10). The host, embedded Python, ffmpeg, and the LLM install pipeline are arch-aware across all three; on aarch64, managed llama.cpp is CPU-only (GPU via external llama-server) and GPU video paths are experimental. See the Architecture Support matrix.
For the detailed support notes and caveats, read docs/platform-support.md and docs/requirements.md.
Start here:
- docs/getting-started.md: install, run, verify, and choose the right local launch mode
- docs/platform-support.md: what is validated today, what is experimental, and hardware expectations
- docs/architecture.md: host authority, crate map, runtime model, and data flow
- docs/extension-internals.md: how extensions are discovered, mounted, and constrained
- docs/configuration.md: CLI flags, env vars, config file, and data directories
Reference and deeper dives:
- docs/api-reference.md
- docs/extension-guide.md
- docs/data-model.md
- docs/database-schema.md
- docs/README.md
The next platform-level milestones should reinforce host authority instead of weakening it.
- MCP control: The host should expose and govern tool/runtime control surfaces in a first-class way, instead of scattering capability across ad-hoc extension UX.
- Remote workers: Worker execution should be able to move off-box while still preserving host-owned leases, auditability, and policy checks.
- Extensions SDK: Extension authoring should become a clearer supported product surface, with stable contracts, better scaffolding, and less accidental complexity.
See docs/roadmap.md for the expanded roadmap.
apps/web/ React web frontend + desktop shell frontend
crates/ Rust workspace crates
extensions/builtin/ First-party extensions
docs/ User and architecture documentation
specs/ Detailed feature specs and verification artifacts
graphify-out/ Generated codebase graph and reports
cargo test
cargo clippy
cd apps/web && pnpm test
GPL-3.0. See LICENSE.







