
Featured Work
End-to-end systems built for performance and scale.
Edge Vision Systems
Optimizing object detection and segmentation networks through FP16/INT8 quantization, TensorRT compilation, and custom C++ inference pipelines.
Data Engine Architecture
Designing active learning loops that systematically target edge cases, generating synthetic data, and managing heavily imbalanced distributions.
High-Throughput ML Infrastructure
Deploying Triton Inference Server, building distributed data ingestion pipelines, and maintaining observability against strict SLAs in cloud environments.
Engineering Philosophy
I’m an optimization-obsessed engineer who bridges the gap between research and hard physical hardware limits. Whether it’s writing custom CUDA kernels to shave off milliseconds or architecting cloud pipelines that serve millions of predictions daily, I build systems that perform at scale.
Obsesses over inference constraints: memory footprints, FLOPs, and millisecond latency targets.
Bridges Python prototyping with highly optimized C++ and TensorRT deployment graphs.
Treats data engines and active learning loops as first-class citizens in the engineering lifecycle.
Architects robust, distributed cloud infrastructure that meets high-criticality SLAs.


