Research Interests

My research sits at the intersection of 3D computer vision, autonomous systems, and efficient deep learning. I build perception engines that are accurate, deployable on embedded hardware, and safe in the real world.


3D Scene Understanding & Autonomous Perception

I design transformer-based and Bird's-Eye-View (BEV) architectures for multi-camera 3D object detection, open-vocabulary perception, and monocular depth estimation. My work integrates multi-sensor fusion across camera, LiDAR, radar, and IMU modalities for robust autonomous driving. At Neubility, I deployed a MonoDETR-based 3D detection pipeline with Depth Anything V2 metric depth, achieving sub-30 ms end-to-end inference on a single NVIDIA Jetson Orin SoC.

Vision-Language-Action Models & Embodied AI

I develop and evaluate VLM/VLA models for grounded autonomy, cross-modal spatial reasoning, and long-horizon task generalization in embodied AI systems. My work includes distilling large VLM teachers into compact student models (4× compression, 97% accuracy retention) deployable under a 10 W power budget on Jetson Orin. I validate VLA policies in closed-loop simulations using NVIDIA Isaac Sim and CARLA, bridging the sim-to-real gap via systematic domain randomization.
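The distillation step mentioned above can be illustrated with the standard temperature-softened objective. This is a minimal sketch of the generic technique, not the training code behind the reported numbers; the function names and the default temperature of 2.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2
```

In practice this term is usually mixed with the ordinary task loss on ground-truth labels; the mixing weight is a tuning choice and is not shown here.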

Efficient Edge AI & Hardware Deployment

I engineer PTQ/QAT INT8–FP16 quantization pipelines with TensorRT and ONNX graph optimization that achieve 8× inference speedup with <1% mAP degradation versus FP32 baselines. My NanoMST architecture reduced compute by 4.7× over LSTM baselines via hardware-aware multiscale transformer design, targeting TinyML deployment with only 298K parameters and 8-bit quantization support. I apply LoRA/PEFT, knowledge distillation, and neural architecture search to enable real-time AI on resource-constrained hardware.
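The arithmetic at the core of INT8 post-training quantization can be sketched in a few lines. This example assumes symmetric per-tensor quantization with a max-abs scale, which is the simplest scheme; real TensorRT calibration is more elaborate (for example, calibrated ranges and per-channel scales), and the helper names below are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate real values from integer codes."""
    return [c * scale for c in codes]
```

The round-trip error of each value is bounded by half the scale, which is why tensors with a few large outliers quantize poorly under a max-abs scale and benefit from calibrated ranges or quantization-aware training.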

Simulation-to-Real Transfer

I develop closed-loop evaluation frameworks, digital twins, and domain-adaptive learning pipelines for scalable perception across diverse Operational Design Domains (ODDs). Using systematic domain randomization and sensor noise modeling in NVIDIA Isaac Sim and CARLA, I quantify and close the sim-to-real gap for autonomous robot deployment. I define and maintain perception KPI frameworks (mAP, recall at IoU thresholds, latency SLAs) across real and synthetic datasets to guide robustness improvements.
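One of the KPIs named above, recall at an IoU threshold, can be sketched as follows. This is a simplified illustration assuming 2D axis-aligned boxes in (x1, y1, x2, y2) form and greedy one-to-one matching; benchmark implementations typically also rank predictions by confidence, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(ground_truth, predictions, threshold=0.5):
    """Fraction of ground-truth boxes matched by an unused prediction."""
    unused = list(predictions)
    matched = 0
    for gt in ground_truth:
        # Greedily take the best remaining prediction for this ground-truth box.
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= threshold:
            unused.remove(best)  # each prediction may match only once
            matched += 1
    return matched / len(ground_truth) if ground_truth else 0.0
```

Running the same function on real and synthetic splits gives a direct, comparable number for the gap between them.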

Neural Inertial Navigation & Sensor Fusion

I design deep learning architectures for robust inertial navigation and localization across diverse environments. My DeepILS system achieves sub-meter accuracy across diverse environments without environment-specific retraining, while NanoMST runs in real time with only 298K parameters. I also developed a particle-filter Visual-Inertial SLAM hardware accelerator on a PYNQ-Z1 FPGA that runs at 30 FPS and reduces SLAM latency by 60%. This work combines Extended Kalman Filters, particle filters, graph-based SLAM, and neural odometry.
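For readers unfamiliar with particle filters, the predict, weight, and resample cycle can be sketched on a toy 1D localization problem. This is an illustrative example only, not the FPGA accelerator design: the Gaussian motion and measurement models, the parameter names, and the choice of systematic resampling are all assumptions.

```python
import math
import random


def particle_filter_step(particles, control, measurement,
                         motion_std, meas_std, rng):
    """One predict / weight / resample cycle for a 1D position estimate."""
    # Predict: apply the control input with Gaussian motion noise.
    predicted = [p + control + rng.gauss(0.0, motion_std) for p in particles]

    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    n = len(predicted)
    if total == 0.0:
        weights = [1.0 / n] * n  # all likelihoods underflowed; fall back to uniform
    else:
        weights = [w / total for w in weights]

    # Resample (systematic): one random offset, then evenly spaced pointers.
    step = 1.0 / n
    pointer = rng.uniform(0.0, step)
    resampled, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(predicted[i])
        pointer += step
    return resampled
```

Calling the function repeatedly with `rng = random.Random(seed)` and taking the mean of the returned particles gives a position estimate. The weighting loop is independent per particle, which is the property that makes this family of filters attractive for parallel hardware.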

Privacy-Preserving & Federated Learning

I design differentially private and federated learning systems for distributed edge intelligence. ConvXformer provides formal ε-DP privacy guarantees for multimodal sensor fusion. ADP-QFed combines adaptive differential privacy with quantized federated learning for IoT edge sensing. My federated navigation framework (FedNav) enables privacy-preserving collaborative model training without centralizing sensitive sensor data.
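To make the ε-DP guarantee concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a clipped mean. It is not the mechanism used in ConvXformer or ADP-QFed, which this page does not specify. The sketch assumes the number of records is public and that neighboring datasets differ by replacing one record; all names are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def clip(value, bound):
    """Clamp a value to [-bound, bound] so its influence is bounded."""
    return max(-bound, min(bound, value))


def private_mean(values, bound, epsilon, rng):
    """Epsilon-DP mean of values clipped to [-bound, bound]."""
    n = len(values)
    clipped = [clip(v, bound) for v in values]
    # Replacing one record moves the clipped mean by at most 2 * bound / n.
    scale = laplace_scale(2.0 * bound / n, epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise
```

A smaller ε gives stronger privacy and proportionally larger noise, so the clipping bound and the number of contributing records determine how much accuracy is lost.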

Key Projects


Multi-Camera 3D Perception Pipeline @ Neubility

Architected a production multi-camera 3D perception system integrating MonoDETR-based 3D object detection, open-vocabulary detection, semantic segmentation, and Depth Anything V2 metric depth estimation. Achieved sub-30 ms end-to-end inference on a single NVIDIA Jetson Orin SoC. Deployed via a full PTQ/QAT INT8–FP16 TensorRT/ONNX pipeline: 3.2× model size reduction, 8× inference speedup, <1% mAP degradation.

MonoDETR | Depth Anything V2 | TensorRT | Jetson Orin

VPE-Neubie: DINOv3-Backbone Visual Perception Engine

Led the development of a unified visual perception engine using a DINOv3 backbone for on-device deployment. Established cross-modal alignment between visual and language representations for zero-shot generalization across unseen object categories. Deployed on Neubility's autonomous delivery robots in urban environments.

DINOv3 | Zero-Shot | Open-Vocabulary | On-Device

NanoMST: Hardware-Aware Multiscale Transformer for TinyML

Designed a hardware-aware multiscale transformer for inertial motion tracking. NanoMST achieves 4.7× compute reduction over LSTM baselines using only 298K parameters with 8-bit quantization support, enabling real-time TinyML deployment on embedded devices. Published in IEEE Internet of Things Journal (IF: 8.9).

Published: IEEE IoT Journal | TinyML | Quantization | Paper

ConvXformer: Differentially Private Hybrid Architecture

A hybrid ConvNeXt-Transformer architecture with formal ε-DP privacy guarantees for distributed multimodal sensor fusion, maintaining localization accuracy while protecting user data. Under review at IEEE Transactions on Systems, Man, and Cybernetics: Systems.

Under Review: IEEE TSMC | Differential Privacy | Transformer

DeepILS: Domain-Invariant Inertial Localization System

An AIoT-enabled inertial localization system achieving sub-meter accuracy across diverse environments without environment-specific retraining, combining deep domain adaptation with multi-sensor fusion. Published in IEEE Internet of Things Journal (IF: 8.9).

Published: IEEE IoT Journal | Domain Adaptation | Sensor Fusion | Paper | Code

FPGA-Based Particle Filter SLAM Accelerator

A Visual-Inertial SLAM hardware accelerator on a PYNQ-Z1 FPGA that achieves real-time navigation at 30 FPS, with optimized feature-matching kernels that reduce SLAM latency by 60%. This work bridged software SLAM research and hardware deployment for mobile robotics. Published in IEEE Access (IF: 3.6).

Published: IEEE Access | FPGA | SLAM | Paper