Competition Projects, Course Projects and Club Projects
Distributed Data Parallel ML Training | Big Data Systems Course Assignment
Deployed and compared distributed ML training approaches, including gather-and-scatter, allReduce and
PyTorch DDP (Distributed Data Parallel), on a multi-node cluster.
Achieved a 1.4x to 2.8x speedup over single-machine training using these gradient synchronization techniques.
[GitHub]
BadgerDB | Database Management Systems Course Assignment
Implemented Buffer Pool Management in BadgerDB using the Clock Replacement Algorithm and a Buffer Hash Table.
Developed a B+ Tree Index Manager in BadgerDB to speed up key lookups and range scans.
[GitHub]
Cloud Based Real-Time Big Data Analysis | Big Data Lab Project
Implemented Spark Job Computation on a Dataproc Cluster for Data Processing & Model Training.
Computed Real-Time Predictions on Test Data by Streaming the Data to an Apache Kafka Cluster with Spark Streaming.
[Report][Code]
American Express - Cricket Analytics based on Deep Learning | AmEx Ignite Challenge 2019
Developed an LSTM-based Model for Real-Time Win Probability Prediction in the Game of Cricket.
Correctly Predicted 82.7% of Match Outcomes at the End of the 10th Over (out of 50 Overs).
[Paper]
Accident Casualties & Severities Analysis | Data Analytics Lab Project
Investigated Feature Importance using supervised ML techniques: Lasso, LightGBM and Random Forest.
Grouped UK Districts into Clusters by Safety Level using unsupervised ML algorithms.
[Presentation][Code]