Home

Introduction

I am Mandar Deshpande, a Systems ML Engineer @ Meta, currently optimizing GPU kernels and developing and integrating Triton with PyTorch for NVIDIA and AMD hardware. I also specialize in PyTorch-based preprocessing transforms and model serialization for large-scale recsys models. Prior to this, I contributed to the distributed training infrastructure for AWS Inferentia and Trainium, and designed anomaly detection models for AWS Rekognition.

My work spans deep learning, GPU optimization (Triton, CUDA, etc.), and large-scale data processing.

Current Work

Systems ML Engineer @ Meta (2022 - Present):

Triton + PyTorch: Contributing to the Triton GPU compiler and its PyTorch integrations. Optimizing GPU kernels (FlashAttention v2, FlexAttention) for H100 and B200 GPUs.
PyTorch Preproc: Developing pure PyTorch-based preproc transforms for internal customers (Ads, IG, MRS, Feed) and handling the model serialization, model splitting, and export components of the stack in TorchScript and fx-trace.
Authoring and optimizing PyTorch C++ operators and CUDA kernels for accelerated GPU computing.

Past work/Research

Software Engineer @ AWS Inferentia/Trainium (2021): Created distributed training infrastructure for AWS Inferentia and Trainium and developed the Neuron SDK. Graph capture and optimizations for fx-graph/TorchScript in PyTorch and TVM on TensorFlow.
Research Engineer @ AWS Rekognition (2020): Designed and deployed anomaly detection models in PyTorch for manufacturing defects for Amazon Lookout for Vision. Developed a fault visualization tool to validate model performance and feed corrections to the models.
Google Summer of Code 2019 -
- TensorFlow Mentor: Mentored Ryan Lee, along with Oscar Ramirez, in the development of a curiosity module for TF-Agents. Ryan implemented Random Network Distillation (RND), a state-of-the-art bonus-based exploration algorithm for reinforcement learning.
Machine Learning Engineer at Citi (2017-2019): Solved NLP and CV use cases using deep learning and traditional vision techniques for images. Used NLP to parse financial documents and extract relevant information.
Google Summer of Code 2018 -
- Scilab Mentor: Mentored Soumitra Agrawal (student) in the development of a machine learning toolbox for Scilab, extending my GSoC 2017 work on Jupyter integration for remote script execution.
- Gensim Mentor: Co-mentored Aneesh Joshi on neural networks for similarity learning research for Gensim, a topic modelling and NLP toolkit (under NumFOCUS).
Google Summer of Code 2017: Worked with Scilab to develop a modular ML toolbox that allowed remote execution of Python model training over the network and inference on the local machine. Project page

Learning motto

Only persistence fuels my learning engines, and the dream is to always be on the journey towards excellence!

Contact me

All your suggestions and help will be useful, so don’t hold back! :D