ldilov/nexus-dnn

nexus-dnn

Local-first AI orchestration with a host-authoritative Rust runtime, extension-driven capabilities, and first-party support for local LLM, audio, and video workflows.

🚀 Run It Locally First

1. Install prerequisites

| What | Minimum |
| --- | --- |
| Rust | stable |
| Node.js | 20+ |
| pnpm | 8+ |
| uv | latest recommended |

uv matters because several built-in extensions use host-managed Python environments.

GPU / production install? See docs/deployment/container-and-cuda-dependencies.md → "Installation by platform". On linux-aarch64 (DGX Spark) run the container (just dgx-deploy); on Windows use the native build script dockerfiles/win64.build.ps1 (there is no Windows container, because CUDA is Linux-container-only).

2. Clone and start

git clone <your-fork-or-origin-url> nexus-dnn
cd nexus-dnn

# Host only: browser UI served from the embedded frontend bundle
cargo host

# or

cargo run -p nexus-core --bin nexus-dnn

Open http://127.0.0.1:3000.

3. Verify the host is healthy

curl http://127.0.0.1:3000/api/v1/health

Expected shape:

{
  "data": {
    "status": "ok",
    "details": {
      "...": "additional live health fields may appear here"
    }
  },
  "meta": {
    "timestamp": "2026-06-12T00:00:00Z"
  },
  "error": null
}
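
A minimal sketch of checking that envelope from a script, assuming only the data / meta / error keys shown above. The helper name is_healthy is hypothetical and not part of nexus-dnn; pass it the body returned by the curl command.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True when a health response reports status "ok" and no error.

    Assumes the envelope shape shown above. `is_healthy` is an illustrative
    helper, not a nexus-dnn API.
    """
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(envelope, dict):
        return False
    data = envelope.get("data")
    if not isinstance(data, dict):
        return False
    # Healthy means: data.status is "ok" and the error slot is null.
    return data.get("status") == "ok" and envelope.get("error") is None


sample = (
    '{"data": {"status": "ok", "details": {}}, '
    '"meta": {"timestamp": "2026-06-12T00:00:00Z"}, "error": null}'
)
print(is_healthy(sample))  # True
```

Extra fields under details are ignored, so the check should keep working if the host adds live health fields there.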

nexus-dnn currently binds to 0.0.0.0:$NEXUS_PORT, so it is reachable on every network interface; for local use, still connect through 127.0.0.1.

4. Useful local launch modes

| Goal | Command |
| --- | --- |
| Host only | cargo host |
| Host + TUI | cargo dev |
| TUI drives host | cargo dev-tui |
| Desktop shell | cd apps/web && pnpm install && pnpm tauri dev |
| Rebuild embedded web app | cd apps/web && pnpm install && pnpm build |

The host serves the already-built web bundle from apps/web/dist, so frontend install/build is mainly for frontend or desktop-shell development.

✨ What nexus-dnn Is For

nexus-dnn is a local-first platform for running AI features as structured host-managed systems instead of ad-hoc scripts. The host owns process lifecycle, storage, installs, API routing, workflows, model/runtime leasing, and extension boundaries. Extensions add domain capability without taking control away from the host.

Today that means the repo can host:

  • Local chat and RAG workflows
  • Host-managed backend runtime leasing
  • Emotional TTS pipelines
  • Image-to-video and long-video generation flows
  • Image-to-3D mesh generation (single image β†’ watertight GLB)
  • Extension-owned UI surfaces mounted inside the host app

🧩 Supported Capabilities

Everything below runs locally, on a single consumer GPU, behind the same host-managed runtime-lease + model-store foundation.

| Capability | Engines | Highlights | Status |
| --- | --- | --- | --- |
| 🎬 Video Generation | SVI2-Pro · LongCat · LTX-2.3 | Text→Video, Image→Video, infinite length, RIFE frame-gen, RTX ×2/×4 upscale | 🟢 Stable |
| 🧠 LLM Inference | llama.cpp | Speculative decoding via MTP, GGUF, host-managed runtime leases | 🟢 Stable |
| 🎤 Voice Generation | IndexTTS-2 (EmotionTTS) | 8-axis emotion vectors, storyboard, custom-voice upload | 🟢 Stable |
| 🧊 Image-to-3D | Microsoft TRELLIS.2 | Single image → watertight GLB, mesh-only or textured, triangle-budget decimation, in-browser 3D preview | 🟢 Stable |
| 🎨 Image Generation | Stable Diffusion · FLUX | Text→Image | 🟠 Coming soon |

🎬 Video Generation

nexus.video.svi2-pro · nexus.video.longcat · nexus.video.ltx23

Generate video from a text prompt or a still image, then push it past what a single diffusion pass yields: higher frame rate and higher resolution, all on a local GPU.

Engines

  • SVI2-Pro: Stable Video Infinity 2.0 Pro (two SVI LoRAs over the Wan2.2-I2V-A14B dual-expert MoE). Does both Image→Video and Text→Video, with infinite, cross-clip-consistent length: clips are chained with rolling cross-fade + reference anchoring so the subject stays coherent across arbitrarily many segments. The fp8 e4m3fn base fits in 16 GB of VRAM or less.
  • LTX-2.3: fast image-to-video with host-managed runtime profiles. RTX 40 FP8, RTX 50 Blackwell FP8 (production) + RTX 50 NVFP4 (experimental). 16 GB-safe by default via external-segment rendering.
  • LongCat: 13.6B DiT (UMT5-XXL text encoder, Wan 2.1 VAE) for text→video, image→video, and long-video continuations. FP8 e4m3fn path for 12–16 GB; BF16 path for 24 GB+.
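
The clip-chaining idea described for SVI2-Pro can be illustrated with a plain linear cross-fade. This is a sketch under stated assumptions, not the extension's code: frames are modeled as flat lists of floats, the overlap length is a free parameter, reference anchoring is left out, and the names blend and crossfade_chain are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame as a flat list of pixel values (assumption)


def blend(a: Frame, b: Frame, weight_b: float) -> Frame:
    """Linear mix of two frames: weight_b=0 gives a, weight_b=1 gives b."""
    return [(1.0 - weight_b) * x + weight_b * y for x, y in zip(a, b)]


def crossfade_chain(clips: List[List[Frame]], overlap: int) -> List[Frame]:
    """Join clips, fading the last `overlap` frames of the running video
    into the first `overlap` frames of each following clip."""
    if not clips:
        return []
    video = list(clips[0])
    for clip in clips[1:]:
        n = min(overlap, len(video), len(clip))
        tail = video[len(video) - n:]
        head = clip[:n]
        # Weights ramp from just above 0 to just below 1 across the overlap.
        mixed = [blend(tail[i], head[i], (i + 1) / (n + 1)) for i in range(n)]
        video = video[:len(video) - n] + mixed + list(clip[n:])
    return video


dark = [[0.0] for _ in range(4)]
bright = [[1.0] for _ in range(4)]
print(len(crossfade_chain([dark, bright], overlap=2)))  # 6
```

Each join shortens the total by the overlap, so two 4-frame clips with a 2-frame overlap yield 6 frames. The same loop handles any number of segments.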

Post-processing stack (applies on top of any engine)

  • 🌀 RIFE frame interpolation (frame-gen): torch-RIFE (vendored IFNet HDv3) synthesizes in-between frames to multiply FPS (e.g. 16 → 48 fps) for fluid motion, with no extra diffusion cost.
  • 🔍 RTX ×2 / ×4 upscaling: NVIDIA Maxine RTX super-resolution on RTX GPUs upscales the output in a hardware-accelerated pass (e.g. 1216×768 → 2432×1536).
  • ⚙️ Attention backends (SDPA / FlashAttention-2/3 / SageAttention) are auto-selected per GPU architecture + dtype.
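
The two worked figures in the list above reduce to simple arithmetic, sketched here. The function names are hypothetical, and the frame-count rule (factor - 1 new frames between each adjacent pair) is an assumption about how interpolation is counted, not a statement about torch-RIFE's exact output.

```python
from typing import Tuple


def interpolated_fps(base_fps: int, factor: int) -> int:
    """Playback rate after multiplying the frame rate by `factor`."""
    return base_fps * factor


def interpolated_frame_count(frames: int, factor: int) -> int:
    """Frame count when factor - 1 frames are inserted between each pair."""
    if frames < 2:
        return frames
    return (frames - 1) * factor + 1


def upscaled_size(width: int, height: int, scale: int) -> Tuple[int, int]:
    """Output resolution after an integer super-resolution pass."""
    return width * scale, height * scale


print(interpolated_fps(16, 3))      # 48
print(upscaled_size(1216, 768, 2))  # (2432, 1536)
```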

16 GB-friendly by design: staged CPU offload, fp8 compute, and external-segment rendering keep peak VRAM under consumer-card budgets.
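
A rough way to see why fp8 matters for these budgets is to count weight bytes alone. The sketch below uses the 13.6B parameter count quoted for LongCat and the standard storage widths (1 byte for fp8 e4m3fn, 2 bytes for BF16). It ignores activations, the text encoder, and the VAE, so it is a lower bound rather than a measured peak.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp8_e4m3fn": 1, "bf16": 2}

LONGCAT_PARAMS = 13_600_000_000  # 13.6B, as quoted for the LongCat DiT


def weight_gib(params: int, dtype: str) -> float:
    """Size of the weights alone, in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


print(round(weight_gib(LONGCAT_PARAMS, "fp8_e4m3fn"), 2))  # 12.67
print(round(weight_gib(LONGCAT_PARAMS, "bf16"), 2))        # 25.33
```

At roughly 12.67 GiB the fp8 weights leave little headroom on a 16 GB card, and the BF16 weights alone exceed 24 GB, which is consistent with staged offload being part of the design.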


🧠 LLM Inference

nexus.local-llm

Local large-language-model inference and chat, served through host-managed backend runtimes.

  • ⚡ Latest llama.cpp backend with speculative decoding via MTP (Multi-Token Prediction): a draft head proposes several tokens per step and the main model verifies them in one pass, for materially higher tokens/sec at identical output quality.
  • 📦 GGUF models with quantization-aware install and an on-disk model store.
  • 🛡️ Host-managed runtime leases: the host owns process lifecycle, VRAM budgeting, and idle reaping; the extension simply acquires a lease.
  • 💬 Interactive chat threads with per-thread generation settings, model picker, and RAG workflows.
  • 🎚️ Throughput knobs: KV-cache reuse, MoE offload, min-p / DRY sampling, context cram.
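
The propose-then-verify loop behind speculative decoding can be sketched with toy models. This is not llama.cpp's implementation: the draft and target are plain Python callables that return a greedy next token, and verification calls the target once per position for clarity, whereas a real backend batches those checks into a single forward pass. All names are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function (toy model)


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Draft proposes k tokens; target keeps the matching prefix, then adds one."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        # 1. The cheap draft proposes k tokens autoregressively.
        ctx, proposal = list(seq), []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. The target verifies each proposed position in order.
        ctx, accepted = list(seq), []
        for token in proposal:
            expected = target(ctx)
            if expected != token:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(target(ctx))   # all accepted: one bonus target token
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_new]


def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except after token 5, to force a rejection.
    return 0 if seq[-1] == 5 else toy_target(seq)


print(speculative_decode(toy_target, toy_draft, [1], 8)
      == greedy_decode(toy_target, [1], 8))  # True
```

Every emitted token is one the target itself would have chosen, which is why the output matches plain greedy decoding. The speedup comes only from accepting several tokens per verification step.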

🎤 Voice Generation (EmotionTTS)

nexus.audio.emotiontts

State-of-the-art emotional text-to-speech via IndexTTS-2, running in a host-managed Python subprocess.

  • 🎚️ 8-axis emotion vectors: dial the emotional tone (joy, anger, sadness, surprise, …) per line. Optional Qwen text-emotion inference reads the intended emotion straight from the text, and audio-reference transfer copies the feeling from a sample clip.
  • 🎬 Storyboard: author a multi-line script/dialogue, assign a voice and an emotion vector to every line, and batch-synthesize the whole scene in a single run. The lines render as one coherent, ordered sequence with independent per-line control; think of a screenplay that compiles to audio, with each character speaking in their own voice and mood.
  • 🎙️ Custom voice upload: drop in your own reference audio to mint a custom voice ("voice asset"). Automatic reference-audio preprocessing, alignment-score observability, and speaker-prefix caching keep quality high and re-synthesis fast.
  • 🗂️ Deployment-scoped character→voice mappings, a global content-hash synthesis cache (10 GB LRU), and partial-ZIP install with auto-resume.
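
A content-hash cache with an LRU byte budget, as mentioned in the last bullet, can be sketched as follows. The key fields (text, voice id, emotion vector) and the names synthesis_key and LruByteCache are assumptions for illustration; the extension's real key composition and storage layout are not documented here.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Optional, Sequence


def synthesis_key(text: str, voice_id: str, emotion: Sequence[float]) -> str:
    """Stable SHA-256 over the inputs that determine the audio (assumed fields)."""
    payload = json.dumps(
        {"text": text, "voice": voice_id, "emotion": list(emotion)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LruByteCache:
    """Keeps recently used entries while total size stays within max_bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, audio: bytes) -> None:
        if key in self._items:
            self.used -= len(self._items.pop(key))
        self._items[key] = audio
        self.used += len(audio)
        # Evict least recently used entries until back under budget.
        while self.used > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

Identical inputs hash to the same key, so a repeated line in a storyboard can be served from the cache instead of being synthesized again.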

🧊 Image-to-3D

nexus.3d.trellis2 · operator trellis2.generate_3d

Turn a single image into a watertight 3D mesh with Microsoft TRELLIS.2 (a 4B flow-matching model over an O-Voxel sparse structure), then orbit and download the result as a GLB, all on a local GPU.

TRELLIS 2: image-to-3D generative surface

  • 🧊 Single image → watertight GLB: upload a subject on a clean background; the pipeline runs image → sparse structure → shape → mesh decode → GLB export and returns a downloadable artifact.
  • 🎚️ Tunable flow: sparse-structure and shape flow steps, deterministic seed, and a triangle-budget decimation target for light, game-ready meshes.
  • 🎨 Mesh-only or textured: skip the texture pass for a fast MeshOnly GLB, or bake a full PBR texture for a shaded result.
  • 🧭 In-browser 3D preview: every result renders in an interactive <model-viewer> (orbit, auto-rotate, neutral/ACES tone-mapping, exposure) with a live FORMAT / TRIANGLES / VERTICES readout and one-click GLB download.
  • 🛡️ Host-managed: install, runtime lease, storage, and /media artifact serving all flow through the same host foundation as the video and LLM stacks.

Validated end-to-end on the DGX Spark (GB10, aarch64 Blackwell sm_121) with vendored native kernels. See extensions/builtin/trellis2/README.md, or open the standalone showcase: docs/showcase/trellis2-image-to-3d.html.


🎨 Image Generation

🟠 Coming soon

Text-to-image generation via Stable Diffusion and FLUX, packaged as host-managed extensions on the same runtime-lease + model-store foundation as the video and LLM stacks, so installs, VRAM budgeting, and UI mounting work exactly the same way.

🧭 System At A Glance

flowchart LR
    UI["🖥️ Web UI / Tauri / TUI"] --> HOST["🛡️ nexus-dnn host"]
    HOST --> API["HTTP + SSE + WS"]
    HOST --> REG["Extension registry"]
    HOST --> DEPS["Dependency installer"]
    HOST --> STORE["SQLite + artifact store"]
    HOST --> RUNTIMES["Backend runtime leases"]
    REG --> EXT["Built-in + external extensions"]
    EXT --> WORKERS["Native / Python workers"]
    WORKERS --> RUNTIMES
    WORKERS --> STORE

Host authority is the core design rule

| Area | Authority |
| --- | --- |
| HTTP listener, health, API envelopes | Host |
| Extension discovery, validation, enable/disable | Host |
| Dependency install plans | Host |
| Runtime processes and leases | Host |
| Workflow storage, runs, artifacts | Host |
| Extension-specific logic and UX | Extension, but only through host-owned surfaces |
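
To make "the host owns runtime processes and leases" concrete, here is a minimal sketch of a lease manager with a VRAM budget and idle reaping. It illustrates the pattern only: the class and method names are hypothetical, the units and timeout are arbitrary, and the real host is written in Rust.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Lease:
    runtime_id: str
    vram_mib: int
    last_used: float


class LeaseManager:
    """Grants leases within a VRAM budget and reaps idle ones (illustrative)."""

    def __init__(self, vram_budget_mib: int, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.vram_budget_mib = vram_budget_mib
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.leases: Dict[str, Lease] = {}

    def used_mib(self) -> int:
        return sum(lease.vram_mib for lease in self.leases.values())

    def reap_idle(self) -> List[str]:
        """Drop leases that have not been used within the idle timeout."""
        now = self.clock()
        expired = [rid for rid, lease in self.leases.items()
                   if now - lease.last_used >= self.idle_timeout_s]
        for rid in expired:
            del self.leases[rid]
        return expired

    def acquire(self, runtime_id: str, vram_mib: int) -> Lease:
        """Reuse an existing lease, or grant a new one if the budget allows."""
        now = self.clock()
        lease = self.leases.get(runtime_id)
        if lease is not None:
            lease.last_used = now
            return lease
        self.reap_idle()  # free idle runtimes before checking the budget
        if self.used_mib() + vram_mib > self.vram_budget_mib:
            raise RuntimeError("VRAM budget exceeded")
        lease = Lease(runtime_id, vram_mib, now)
        self.leases[runtime_id] = lease
        return lease
```

An extension in this model only calls acquire; deciding what fits, and when an idle runtime is torn down, stays with the manager.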

🖼️ UI Screenshots

  • Image-to-3D (TRELLIS 2 generative surface)
  • Extensions gallery
  • Models browser
  • Deployments
  • Backend runtimes
  • Dependency installer
  • Modules
  • SVI2 recipe
  • SVI2 recipe graph

🔌 Built-in Extensions

| Extension | Status | What it adds |
| --- | --- | --- |
| nexus.local-llm | 🟢 active product surface | Local chat, RAG, backend-runtime integration, model/browser layouts |
| nexus.audio.emotiontts | 🟢 active product surface | Emotional dialogue TTS, voice assets, batch runs, audio editing |
| nexus.video.ltx23 | 🟢 active product surface | LTX 2.3 image-to-video with host-managed runtime profiles |
| nexus.video.longcat | 🟡 active extension, still evolving | LongCat-based long-video generation paths |
| nexus.video.svi2-pro | 🟡 advanced / high-requirement path | SVI 2.0 Pro image-to-video for Blackwell-focused setups |
| nexus.3d.trellis2 | 🟢 active product surface | Image→3D mesh generation (Microsoft TRELLIS.2), GLB export, in-app 3D viewer |

Several of these extensions ship more than operators:

  • Dependency graphs
  • Backend runtime manifests
  • Storage migrations
  • YAML layouts
  • Static web assets and custom elements
  • Extension routers mounted under /api/v1/extensions/{ext_id}/...

🔄 How The Host And Extensions Communicate

sequenceDiagram
    participant User as User Surface
    participant Host as Host App
    participant Ext as Extension Registry/Router
    participant Worker as Native or Python Worker
    participant Runtime as Backend Runtime Lease
    participant Store as Storage + Artifacts

    User->>Host: UI action / API request
    Host->>Ext: Resolve extension metadata or route
    Ext->>Host: Manifest, layouts, storage, router, runtime declarations
    Host->>Worker: JSON-RPC / typed host service calls
    Worker->>Runtime: Acquire or use host-managed lease
    Worker->>Store: Read/write artifacts through host-owned paths
    Worker-->>Host: Results, progress, errors
    Host-->>User: REST/SSE/WS updates + rendered UI

The important boundary is that extensions do not become mini-hosts. They can contribute routes, UIs, and workers, but the host still owns mounting, serving, validation, and lifecycle.
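
The host-to-worker hop in the diagram is described as JSON-RPC. A minimal sketch of that framing follows, assuming JSON-RPC 2.0 messages; the README does not state the version, and the helper names, the use of an operator name as the method, and the seed parameter are all hypothetical.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # monotonically increasing request ids


def make_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request (framing assumed, not confirmed)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    })


def read_result(raw: str, expected_id: int) -> Any:
    """Return the result of a response, raising on a mismatch or an error."""
    message = json.loads(raw)
    if message.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


# Hypothetical call: the method name and parameter are placeholders.
request = json.loads(make_request("trellis2.generate_3d", {"seed": 42}))
print(request["method"])  # trellis2.generate_3d
```

Matching responses by id lets the host keep several worker calls in flight while still routing each result, progress update, or error back to the right caller.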

🧪 Tested Machine Snapshot

The strongest recent validation evidence in the repo is centered on a Windows workstation rather than a broad cross-platform certification matrix.

| Area | Evidence in repo |
| --- | --- |
| OS | Windows |
| GPU | NVIDIA GeForce RTX 5070 Ti |
| VRAM | ~15.9 GiB |
| Driver family | 570.65+ |
| Python evidence | 3.12.11 |
| Torch evidence | 2.12.0 + CUDA 13.2 |

Architectures: the host targets amd64 Windows, amd64 Linux, and aarch64 Linux (e.g. DGX Spark / GB10). The host, embedded Python, ffmpeg, and the LLM install pipeline are arch-aware across all three; on aarch64, managed llama.cpp is CPU-only (GPU via external llama-server) and GPU video paths are experimental. See the Architecture Support matrix.

For the detailed support notes and caveats, read docs/platform-support.md and docs/requirements.md.

📚 Documentation Map

Start here:

Reference and deeper dives:

🛣️ Future Roadmap

The next platform-level milestones should reinforce host authority instead of weakening it.

  1. MCP control: The host should expose and govern tool/runtime control surfaces in a first-class way, instead of scattering capability across ad-hoc extension UX.
  2. Remote workers: Worker execution should be able to move off-box while still preserving host-owned leases, auditability, and policy checks.
  3. Extensions SDK: Extension authoring should become a clearer supported product surface, with stable contracts, better scaffolding, and thinner accidental complexity.

See docs/roadmap.md for the expanded roadmap.

🧱 Repo Shape

apps/web/                  React web frontend + desktop shell frontend
crates/                    Rust workspace crates
extensions/builtin/        First-party extensions
docs/                      User and architecture documentation
specs/                     Detailed feature specs and verification artifacts
graphify-out/              Generated codebase graph and reports

🔍 Verification Commands

cargo test
cargo clippy
cd apps/web && pnpm test

📄 License

GPL-3.0. See LICENSE.
