# inference-engine

Here are 168 public repositories matching this topic...

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

machine-learning deep-learning inference-engine model-deployment model-serving distributed-training federated-learning mlops edge-ai ai-agent on-device-training

Updated Oct 28, 2025
Python

qualcomm / ai-hub-models

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

machine-learning inference pytorch machinelearning deeplearning demos inference-engine onnx tensorflow-lite qnn inference-api

Updated Jul 1, 2026
Python

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve

Updated Jul 1, 2026
Python

youssofal / MTPLX

2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6. Native MTP speculative decoding on Apple Silicon with no external drafter.

metal mtp mlx inference-engine apple-silicon local-ai qwen speculative-decoding speculative-sampling openai-compatible qwen3-next anthropic-compatible native-mtp mtplx

Updated Jun 27, 2026
Python

insight-platform / Savant

Python Computer Vision & Video Analytics Framework With Batteries Included

opencv machine-learning video computer-vision deep-learning cuda nvidia yolo object-detection deepstream tensorrt inference-engine instance-segmentation edge-computing peoplenet nvidia-deepstream-sdk yolov5-face yolov8 yolov8-face

Updated May 15, 2026
Python

pylint-dev / astroid

A common base representation of Python source code for pylint and other projects

parser static-code-analysis static-analysis ast hacktoberfest inference-engine closember

Updated Jun 30, 2026
Python

HoloClean / holoclean

A Machine Learning System for Data Enrichment.

data-science machine-learning pytorch inference-engine data-enrichment

Updated Jul 20, 2023
Python

buguroo / pyknow

PyKnow: Expert Systems for Python

python3 expert-system inference-engine

Updated Feb 23, 2020
Python

SearchSavior / OpenArc

Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.

transformers inference-engine fastapi openvino-toolkit optimum-intel agentic-ai openvino-genai

Updated Jun 30, 2026
Python

qualcomm / ai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

machine-learning inference pytorch machinelearning deeplearning demos inference-engine onnx tensorflow-lite qnn inference-api

Updated Jul 1, 2026
Python

chengzeyi / ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

flux parallel-computing transformers inference attention inference-engine diffusers hunyuan-video

Updated Jul 5, 2025
Python

zengxiao-he / tessera

From teacher to tiles — a from-scratch LLM distillation & serving engine: custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.

rust cuda pytorch triton quantization knowledge-distillation inference-engine jax kv-cache ml-systems llm mechanistic-interpretability fsdp flash-attention speculative-decoding paged-attention

Updated Jun 5, 2026
Python

interestingLSY / swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

cuda transformers inference pytorch transformer llama gpt inference-engine model-serving mlops llm llmops llm-serving llm-inference

Updated Jun 10, 2025
Python

EfficientMoE / MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

pytorch inference-engine mixture-of-experts huggingface large-language-models llm-inference

Updated Jun 24, 2026
Python

psmarter / mini-infer

LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving

machine-learning cuda inference pytorch transformer triton moe quantization language-model inference-engine kv-cache tensor-parallelism llm speculative-decoding pagedattention continuous-batching

Updated Apr 24, 2026
Python

RamboRogers / mlx-gui

MLX-GUI: MLX inference server for Apple Silicon

ai inference mlx inference-engine

Updated Apr 1, 2026
Python

ovshake / nano-vllm

a fun and educational take on vLLM

python inference-engine vllm

Updated Jan 25, 2026
Python

nilp0inter / experta

Expert Systems for Python

inference python3 knowledge-base clips inference-engine

Updated Feb 9, 2025
Python

midea-ai / Aidget

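AI edge toolbox: an AI model deployment toolchain designed for edge devices, especially embedded RTOS platforms, including a model inference engine and model compression tools.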

deep-learning dsp inference simd pruning mcu rtos asr inference-engine wakeup tflm resrep hifi5

Updated Dec 20, 2023
Python

BMW-InnovationLab / BMW-TensorFlow-Inference-API-CPU

This is a repository for an object detection inference API using the TensorFlow framework.

Updated Jun 28, 2022
Python
