I'm Swaminathan Sankaran

I like building ML systems that don't break the moment they hit real data. Currently studying Data Science at UB and looking for internship opportunities.

About

A bit about me

I’m a Master’s student in Data Science at the University at Buffalo, and curiosity has shaped a lot of how I learn and work. I enjoy understanding how things work, whether that is a technical system, a human decision, or a bigger idea in philosophy, psychology, and rationality.

A lot of that mindset comes from the things I’ve always enjoyed outside of work. I used to be a competitive gamer, which taught me how to think strategically, stay composed, and keep improving through practice. I’m also a speedcuber, so I’ve always liked patterns, problem solving, and the challenge of getting better with focus and repetition. I enjoy reading, exploring different perspectives, and learning through food, culture, and the ways people think and live.

That same curiosity is what drew me to data science and machine learning. It also shapes how I approach my work. I enjoy building systems that are thoughtful, useful, and reliable, and I’m especially interested in turning ideas into practical solutions that create real impact.

I’m currently looking for internship opportunities where I can keep learning, contribute meaningfully, and grow as both a problem solver and engineer.

Thanks for stopping by. Don’t hesitate to say hi. I’d love to hear from you.

Projects

Selected Work

MLOps / Production Systems

Drift-Aware MLOps Pipeline

Built an end-to-end MLOps pipeline that watches for silent model degradation and automatically retrains when the data starts drifting. Under simulated real-world distribution shifts, the system recovered PR-AUC from 0.37 to 0.89 without any manual intervention. The whole stack (FastAPI, Airflow, MLflow, Evidently, Prometheus, Grafana) runs in Docker Compose, with Kubernetes manifests ready to go.

Multimodal Deep Learning

Multi-Modal Molecular Similarity Regression

A multimodal system for predicting how similar two molecules are, built for drug discovery. It fuses three views of each molecule (2D images, 3D atomic geometry, and chemical fingerprints) using contrastive learning across ~292K molecules. Hit ~0.92 Pearson correlation on expert-annotated pairs, beating the standard Tanimoto baseline.

Medical Imaging

Patch-Level CT Tamper Classification

Detects localized tampering in CT scans, which is a harder problem than catching whole-scan forgeries. Uses a 3D convolutional compressor paired with a pretrained ResNet-18 to squeeze 16-slice 3D patches into compact 2D feature maps. Trained on 169 volumetric lung CTs and hit 0.95 validation AUC, beating 2.5D, full 3D, and projection baselines while being cheaper to run.

Experience

Where I've Worked

Machine Learning Engineer, Intern

Zolvit

Feb 2024 — Aug 2024
  • Eliminated ~23 hours/week of manual data entry by building an OCR + T5-large extraction pipeline for legal documents
  • Improved legal retrieval precision by ~28% across 10,000+ documents using hybrid Elasticsearch keyword + Pinecone vector search
  • Built a document routing classifier that reached 92% accuracy using TF-IDF and Doc2Vec features, automating intake workflows
  • Containerized ML services with Docker and deployed on AWS EC2, maintaining 99.9% uptime with Grafana monitoring

Education

Academic Background

University at Buffalo

MS in Engineering Science (Data Science)

Aug 2025 — Dec 2026

Relevant Coursework

Statistical Learning I & II
Machine Learning
Probability Theory
Database Data Science
Data Models & Query Languages
Numerical Methods

Vellore Institute of Technology

B.Tech in Computer Science and Engineering (AI & ML)

Aug 2019 — Jul 2023

Relevant Coursework

Machine Learning
Deep Learning
Reinforcement Learning
Computer Vision
Applied Linear Algebra
Statistics
Data Structures & Algorithms
Database Management Systems

Skills

Technical Toolkit

Languages & Databases

Python
C/C++
SQL
R
PostgreSQL
MySQL
Elasticsearch
Pinecone
Snowflake
Linux

AI & Machine Learning

PyTorch
TensorFlow
Scikit-learn
XGBoost
Hugging Face
LangChain
LangGraph
RAG
TensorBoard
NumPy
Pandas

MLOps & Cloud

Docker
Kubernetes
AWS
Airflow
MLflow
Evidently AI
Prometheus
Grafana
Git
Linux

Data & Visualization

Spark
Matplotlib
Seaborn
Plotly
Streamlit
Tableau
Power BI
Pandas
NumPy
Git
RDKit

Awards & Certifications

Recognition