Machine Learning

Stack Overflow Tag Prediction

Multi-label ML system for recommending relevant tags from question content.

Python, Scikit-learn, NLP, Multi-label Classification

Overview

Built at the University of Wisconsin-Madison, this project developed a machine-learning-based tag recommendation system for Stack Overflow questions.

Problem

  • Stack Overflow questions often need multiple relevant tags for discoverability and routing.
  • Manual tagging is inconsistent, time-consuming, and may miss key topics in the question body.

Solution

  • Developed a multi-label classification pipeline that assigns several tags to a question at once, based on its content.
  • Engineered text preprocessing and feature representations tailored for technical Q&A language.
  • Evaluated model quality iteratively and improved performance across frequent and long-tail tags.
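
As an illustration of preprocessing tailored to technical Q&A text, here is a minimal sketch using only the Python standard library. The function name `normalize_question` and the token rules are assumptions made for this example, not the project's actual code. The idea is to strip HTML markup while keeping tokens such as "c#", "c++", ".net", and "node.js" intact, since generic tokenizers tend to break them apart.

```python
import html
import re

# Hypothetical normalizer for illustration; the project's real rules are not documented here.
TAG_RE = re.compile(r"<[^>]+>")
# Optional leading dot (".net"), an alphanumeric core that may contain dots or
# hyphens ("node.js", "scikit-learn"), then optional trailing "+" or "#" ("c++", "c#").
TOKEN_RE = re.compile(r"(?:\.(?=[a-z]))?[a-z0-9_](?:[a-z0-9_.\-]*[a-z0-9_])?[+#]*")


def normalize_question(title, body):
    """Return lowercase tokens from a question's title and HTML body."""
    text = f"{title} {body}"
    text = TAG_RE.sub(" ", text)          # drop markup such as <p> and <code>
    text = html.unescape(text).lower()    # turn &amp; into & and lowercase everything
    return TOKEN_RE.findall(text)


tokens = normalize_question(
    "How to parse JSON in C#?",
    "<p>I use <code>Node.js</code> &amp; C++.</p>",
)
print(tokens)
```

Sentence punctuation is dropped, while symbols that carry meaning in technology names survive as part of the token.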

Architecture

  1. Question title/body preprocessing and normalization
  2. Feature extraction for text representation
  3. Multi-label classification training and validation
  4. Inference pipeline for simultaneous tag recommendation
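
The four steps above could be sketched with scikit-learn roughly as follows. This is a toy example under stated assumptions: the source does not name the feature representation or the classifier, so TF-IDF features and one-vs-rest logistic regression are stand-ins, and the questions, tags, and the `recommend_tags` helper are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data, invented for illustration only.
questions = [
    "how do i merge two pandas dataframes in python",
    "python list comprehension with a condition",
    "javascript promise not resolving in async function",
    "how to center a div with css in html",
    "pandas groupby aggregate multiple columns",
    "javascript array map versus foreach",
]
tags = [
    ["python", "pandas"],
    ["python"],
    ["javascript"],
    ["css", "html"],
    ["python", "pandas"],
    ["javascript"],
]

# Steps 1-2: turn tag lists into a binary indicator matrix and text into TF-IDF features.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)

# Step 3: one binary classifier per tag (assumed model choice).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(questions, Y)


def recommend_tags(question, top_k=3):
    """Step 4: score every tag at once and return the top_k highest-scoring ones."""
    scores = model.predict_proba([question])[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [str(binarizer.classes_[i]) for i in ranked]


print(recommend_tags("pandas dataframe merge in python", top_k=2))
```

Ranking by per-tag probability, rather than applying a fixed 0.5 threshold, guarantees that every question receives some recommendations. Whether the original system used ranking or thresholding is not stated.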

Metrics

  • Validated model behavior using precision, recall, and F1 scores.
  • Improved consistency of tag recommendations compared to manual-only workflows.
  • Enabled fast, scalable inference for multi-tag prediction.
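
For multi-label output, precision, recall, and F1 have to be aggregated across tags. The sketch below computes the micro-averaged variants from sets of true and predicted tags. Micro-averaging is an assumption here, since the source does not say which averaging was used, and the example data is made up.

```python
def micro_prf(true_tags, predicted_tags):
    """Micro-averaged precision, recall and F1 over per-question tag sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, predicted_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # tags correctly recommended
        fp += len(pred - truth)   # recommended but not actually assigned
        fn += len(truth - pred)   # assigned but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 correct tags, 1 spurious tag, 1 missed tag.
true_tags = [["python", "pandas"], ["javascript"], ["css", "html"]]
predicted = [["python"], ["javascript", "html"], ["css", "html"]]
precision, recall, f1 = micro_prf(true_tags, predicted)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.8 0.8 0.8
```

Micro-averaging pools counts over all tags, so frequent tags dominate the score. Macro-averaging, which computes the metrics per tag and then averages them, gives long-tail tags equal weight and may be the better view when rare tags matter.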

Highlights

  • Associated with the University of Wisconsin-Madison.
  • Spearheaded end-to-end model development and evaluation.
  • Focused on practical recommendation quality for real Q&A content.