Mohamad Salman

Computer Engineering @ UofT | AI/ML Developer & Researcher

👋 About Me

I'm a Computer Engineering student at the University of Toronto (minor in AI Engineering) focused on AI/ML development and research: designing custom attention mechanisms, fine-tuning large language models, and pushing the limits of GPU-accelerated compute.

I love working at the intersection of systems engineering and machine intelligence, building things that are both fast and smart.

🎓 BASc Computer Engineering @ UofT (PEY Co-op, Class of 2029)
🔬 ML Researcher @ UTMIST — designing sparse attention for BART
💼 Software Developer @ Nodalli — building AI-native infrastructure
🌍 Based in Toronto, ON
📬 mohamad.salman@mail.utoronto.ca

🛠️ Tech Stack

Languages

ML & AI

Backend & Infrastructure

🚀 What I'm Currently Working On

🧠 UTMIST — ML Research

Content-Aware Sparse Attention for BART

Standard attention scales quadratically with sequence length, and existing fixes like BigBird use predefined sparsity patterns that don't adapt to the input. We're building a mechanism that reads the content and dynamically decides which tokens to attend to.

PyTorch, CUDA, Transformers, DeepSpeed, BART
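As a rough illustration of what "content-aware" selection means, here is a minimal pure-Python sketch of top-k sparse attention. It is an assumption-laden toy, not the mechanism under development: the function name `topk_sparse_attention` and the top-k rule are hypothetical, and it still scores every key before selecting, so it shows the selection step rather than the compute savings.

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the highest scores.

    Hypothetical illustration: vectors are plain lists of floats.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
                  for key in keys]
        # Content-dependent selection: indices of the k largest scores.
        keep = sorted(range(len(scores)), key=lambda i: scores[i],
                      reverse=True)[:k]
        # Numerically stable softmax over the kept scores only.
        top = max(scores[i] for i in keep)
        weights = {i: math.exp(scores[i] - top) for i in keep}
        total = sum(weights.values())
        # Weighted sum of the selected value vectors.
        out = [sum(weights[i] / total * values[i][j] for i in keep)
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# With k=1 each query simply copies the value of its best-matching key.
result = topk_sparse_attention([[1.0, 0.0]],
                               [[1.0, 0.0], [0.0, 1.0]],
                               [[10.0, 0.0], [0.0, 10.0]],
                               k=1)
```

Setting `k` to the number of keys recovers ordinary dense softmax attention, which makes the sketch easy to sanity-check.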

⚡ Nodalli — AI Startup

Unified Action Adapter Layer

Building the execution backbone of an AI platform — routing NLP-parsed commands across 4 platform APIs with field-level validation, Redis-backed OAuth, and automated credential management.

Node.js, Redis, PostgreSQL, FastAPI, OAuth 2.0

🌟 Featured Projects

🤖 AI Agent Debate System

Fine-tuned LLaMA 3.1 8B with LoRA + DeepSpeed on 5K debate transcripts, achieving 87% agreement with GPT-4 judgements. Deployed a self-hosted inference pipeline with 70% latency reduction over baseline.

Python, PyTorch, LoRA, DeepSpeed, LLaMA

🔒 Email Fraud Detection

Multi-agent MCP pipeline orchestrating 5 forensic agents. Maps 20 LLM signals to a 0–100 fraud score; an async FastAPI + SendGrid webhook backend dispatches replies in under 30 seconds.

Python, FastAPI, MCP, SendGrid, SQLAlchemy
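One simple way to collapse many per-signal outputs into a single 0–100 score is a normalized weighted sum. The sketch below assumes each signal is a float in [0, 1] and that the weights are chosen by hand; the signal names, weights, and the function `fraud_score` are hypothetical and are not taken from the project.

```python
def fraud_score(signals, weights):
    """Map named signals in [0, 1] to an integer score from 0 to 100.

    Hypothetical aggregation rule: a weight-normalized sum.
    """
    if set(signals) != set(weights):
        raise ValueError("signals and weights must cover the same names")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Clamp each signal to [0, 1] so one bad input cannot escape the range.
    weighted = sum(weights[name] * min(max(signals[name], 0.0), 1.0)
                   for name in signals)
    return round(100 * weighted / total_weight)

# Two made-up signals: a strong spoofing hit outweighs a clean link check.
score = fraud_score({"spoofed_sender": 1.0, "suspicious_link": 0.0},
                    {"spoofed_sender": 3, "suspicious_link": 1})
```

Normalizing by the total weight keeps the result in range no matter how many signals are added later.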

⚡ GPU-Accelerated Soccer Offside Simulator

100× speedup over CPU across 5M simulations using CUDA on an RTX 4060. Engineered 16-byte memory layouts for coalesced memory access, with bit-for-bit CPU/GPU parity validation.

C++, CUDA, CMake, RTX 4060

🎙️ FPGA Neural Network Voice Gate

Trained a Binary Neural Network in PyTorch to classify "open sesame" from raw audio. Deployed fully on-chip to a DE1-SoC FPGA in C, interfaced with a servo motor for a physical gate.

C, Python, PyTorch, FPGA, DE1-SoC
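Binary neural networks are commonly cheap on hardware because a dot product of ±1 vectors reduces to an XOR and a bit count. The sketch below shows that general trick in Python; whether this project uses exactly this formulation is an assumption, and `pack_signs` and `binary_dot` are hypothetical helper names.

```python
def pack_signs(values):
    """Pack a list of numbers into an int: bit i is 1 when values[i] >= 0."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits.

    Matching bits contribute +1 and mismatching bits contribute -1,
    so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
dot = binary_dot(pack_signs(a), pack_signs(b), len(a))
```

The packed result agrees with the ordinary dot product `sum(x * y for x, y in zip(a, b))`, which is the easiest way to check the encoding.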

🗺️ Graphical Navigation System

C++ city-scale mapping engine built on an adjacency-list graph that loads datasets in under 2 seconds. Implemented A* pathfinding with cache-friendly data structures on GTK/EZGL.

C++, CMake, GTK/EZGL, A*
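For readers unfamiliar with A* over an adjacency list, here is a minimal Python sketch. The project itself is written in C++; this toy graph, the straight-line heuristic, and the function name `a_star` are assumptions for illustration. The heuristic stays admissible only when every edge length is at least the straight-line distance between its endpoints.

```python
import heapq
import math

def a_star(adjacency, coords, start, goal):
    """Return (path, cost) for the cheapest route, or (None, inf).

    adjacency: node -> list of (neighbor, edge_length)
    coords:    node -> (x, y), used for the straight-line heuristic
    """
    def heuristic(node):
        return math.dist(coords[node], coords[goal])

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    came_from = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            # Walk the parent links back to the start.
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1], cost
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, length in adjacency.get(node, []):
            new_cost = cost + length
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                came_from[neighbor] = node
                heapq.heappush(
                    open_heap,
                    (new_cost + heuristic(neighbor), new_cost, neighbor))
    return None, math.inf

# Hypothetical three-intersection map: the direct A->C road is slow.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
adjacency = {"A": [("B", 1.0), ("C", 5.0)], "B": [("C", 1.0)]}
path, cost = a_star(adjacency, coords, "A", "C")
```

Storing neighbors in per-node lists keeps edge iteration proportional to a node's degree, which is the usual reason to prefer an adjacency list for sparse road networks.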

☁️ Cloudflare Edge Automation Platform

No-code workflow automation platform on Cloudflare Workers. Managed state for 100+ distributed cron jobs via Durable Objects, with LLaMA-powered auto-generation of workflow steps.

Cloudflare Workers, Durable Objects, LLaMA

"The best way to predict the future is to build it."