I'm Swaminathan Sankaran

I like building ML systems that don't break the moment they hit real data. Currently studying Data Science at UB and looking for internship opportunities.

About

A bit about me

I'm a Master's student in Data Science at the University at Buffalo, and curiosity has shaped a lot of how I learn and work. I enjoy understanding how things work, whether that is a technical system, a human decision, or a bigger idea in philosophy, psychology, and rationality.

A lot of that mindset comes from the things I've always enjoyed outside of work. I used to be a competitive gamer, which taught me how to think strategically, stay composed, and keep improving through practice. I'm also a speedcuber, so I've always liked patterns, problem solving, and the challenge of getting better with focus and repetition. I enjoy reading, exploring different perspectives, and learning through food, culture, and the ways people think and live.

That same curiosity is what drew me to data science and machine learning. It also shapes how I approach my work. I enjoy building systems that are thoughtful, useful, and reliable, and I'm especially interested in turning ideas into practical solutions that create real impact.

I'm currently looking for internship opportunities where I can keep learning, contribute meaningfully, and grow as both a problem solver and engineer.

Projects

Selected Work

MLOps / Production Systems

Drift-Aware MLOps Pipeline

  • Built a pipeline that detects silent model degradation, triggers automated retraining, and keeps models reliable under changing data distributions
  • Built the system with FastAPI, Airflow, MLflow, PostgreSQL, and Evidently AI, containerized it with Docker Compose, and designed it with Kubernetes orchestration in mind
  • Monitored system health and pipeline behavior in real time using Prometheus and Grafana dashboards
  • Validated end-to-end performance by simulating covariate and concept drift, triggering automated retraining that recovered shifted-distribution PR-AUC from 0.37 to 0.89 while preserving historical ROC-AUC at 0.75
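
As a rough illustration of the kind of check that can gate retraining, here is a minimal sketch of a Population Stability Index (PSI) comparison between a reference sample and a recent sample of one feature. It is not the project's code and not the Evidently AI API; the function names, the 10-bin layout, and the 0.2 threshold (a common rule of thumb) are all assumptions made for the example.

```python
import math
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant
    # Interior bin edges derived from the reference range; values outside the
    # range fall into the first or last bin.
    edges = [lo + i * width for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(reference, current, threshold=0.2):
    """Flag retraining when PSI exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold


reference = [i / 100 for i in range(1000)]   # training-time feature values
shifted = [x + 5 for x in reference]         # simulated covariate drift
print(should_retrain(reference, reference))  # identical data: no drift
print(should_retrain(reference, shifted))    # shifted data: drift flagged
```

In a scheduled pipeline, a check like this would typically run per feature on each new batch, with the retraining task triggered only when the flag is set.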

Multimodal Deep Learning

Multi-Modal Molecular Similarity Regression

  • Modeled molecular similarity for drug discovery to improve candidate selection and ranking of chemical compounds
  • Designed a multi-modal architecture fusing 2D image features using ResNet-18, 3D geometry using SchNet, and fingerprint embeddings, trained with contrastive learning across 291,742 molecules
  • Achieved ~0.92 Pearson correlation on 200 expert-annotated pairs, outperforming the Tanimoto similarity baseline on unseen molecular pairs
  • Built the system to combine complementary molecular views for stronger ranking quality in downstream discovery workflows
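
For context on the baseline mentioned above, Tanimoto similarity is the size of the intersection of two fingerprints divided by the size of their union. The sketch below assumes each fingerprint is given as a collection of on-bit indices; the function name and the toy fingerprints are hypothetical, and a real workflow would more likely compute fingerprints with a cheminformatics library such as RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    return len(a & b) / union


# Toy fingerprints (made-up bit indices, not real molecules)
mol_1 = {1, 2, 3}
mol_2 = {2, 3, 4}
print(tanimoto(mol_1, mol_2))  # 2 shared bits out of 4 distinct bits
```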

Medical Imaging

Patch-Level CT Tamper Classification

  • Built to detect tampering at the patch level within CT scans, where localized manipulation is harder to catch than whole-scan forgeries
  • Jointly trained a 3D convolutional compressor, which converts 16-slice 3D patches into compact 2D feature maps, with an ImageNet-pretrained ResNet-18 on 169 volumetric lung CT scans
  • Achieved 0.95 validation AUC, outperforming 2.5D, 3D, and projection-based baselines at lower computational cost

Experience

Where I've Worked

Machine Learning Engineer, Intern

Zolvit

Feb 2024 — Aug 2024
  • Eliminated ~23 hours/week of manual data entry by building an OCR + T5-large extraction pipeline for legal documents
  • Improved legal retrieval precision by ~28% across 10,000+ documents using hybrid Elasticsearch keyword + Pinecone vector search
  • Built a document routing classifier achieving 92% accuracy on TF-IDF/Doc2Vec features, automating intake workflows
  • Containerized ML services with Docker and deployed on AWS EC2, maintaining 99.9% uptime with Grafana monitoring
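
One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an illustration only: the fusion method actually used in this work is not stated, and the function name, document IDs, and k=60 constant are assumptions. Each input is a list of document IDs already ordered by its own search backend.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list using RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks add more.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a keyword index
vector_hits = ["doc_b", "doc_c", "doc_d"]   # e.g. from a vector index
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is why RRF is a reasonable default when the two backends return scores on incomparable scales.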

Education

Academic Background

University at Buffalo

MS in Engineering Science (Data Science)

Aug 2025 — Dec 2026

Relevant Coursework

Statistical Learning I & II, Machine Learning, Probability Theory, Database, Data Science, Data Models & Query Languages, Numerical Methods

Vellore Institute of Technology

B.Tech in Computer Science and Engineering (AI & ML)

Aug 2019 — Jul 2023

Relevant Coursework

Machine Learning, Deep Learning, Reinforcement Learning, Computer Vision, Applied Linear Algebra, Statistics, Data Structures & Algorithms, Database Management Systems

Skills

Technical Toolkit

Languages & Databases

Python
C/C++
SQL
R
PostgreSQL
MySQL
Elasticsearch
Pinecone
Snowflake
Linux

AI & Machine Learning

PyTorch
TensorFlow
Scikit-learn
XGBoost
Hugging Face
LangChain
LangGraph
RAG
TensorBoard
NumPy
Pandas

MLOps & Cloud

Docker
Kubernetes
AWS
Airflow
MLflow
Evidently AI
Prometheus
Grafana
Git
Linux

Data & Visualization

Spark
Matplotlib
Seaborn
Plotly
Streamlit
Tableau
Power BI
Pandas
NumPy
Git
RDKit

Awards & Certifications

Recognition

Get in Touch

I’d love to hear from you

Thanks for stopping by. If you'd like to connect, talk about opportunities, or just say hello, I'd love to hear from you.