Digit Recognition

A learning exercise in machine learning using the classic MNIST handwritten digit dataset

Description

This project is an exploratory exercise in machine learning using the classic MNIST handwritten digit dataset. MNIST is commonly used in early machine learning exercises because it is small, accessible, and easy to build a well-performing classifier on. The task here is to build a model that classifies images of handwritten digits as the digits 0 through 9. Accurate handwritten digit classification is a core component of optical character recognition (OCR) software, so this project offers insight into some of the low-hanging fruit in applied machine learning.

Techniques

  • K-nearest neighbors
  • hyperparameter tuning
  • Gaussian blurring
  • generative modeling
  • model calibration
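As a rough illustration of how the first three techniques fit together, the sketch below blurs each image with a Gaussian filter, then tunes the number of neighbors for a k-nearest neighbors classifier with cross-validation. It is not the project's actual code. It assumes scikit-learn's bundled 8x8 digits dataset as a small stand-in for MNIST, and the helper name blur, the sigma value, and the candidate grid of k values are all illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled 8x8 digits stand in for the 28x28 MNIST images.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0,
    stratify=digits.target,
)


def blur(flat_images, sigma=0.5, side=8):
    """Apply a spatial Gaussian blur to each flattened square image.

    sigma is 0 along the first axis so that separate images are never mixed.
    """
    stack = flat_images.reshape(-1, side, side)
    blurred = gaussian_filter(stack, sigma=(0, sigma, sigma))
    return blurred.reshape(len(flat_images), -1)


# Hyperparameter tuning: choose k by 5-fold cross-validation on the training set.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(blur(X_train), y_train)

best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(blur(X_test), y_test)
print(f"best k: {best_k}, held-out accuracy: {test_accuracy:.3f}")
```

The blur is applied identically to the training and test images, so the classifier compares like with like. The accuracy printed here is for the small stand-in dataset and is not comparable to the MNIST figures reported under Outcome.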

Tools

  • scikit-learn
  • NumPy
  • Jupyter notebooks
  • Matplotlib

Outcome

Created a baseline model that classified handwritten digits with 87% overall accuracy. Applying the techniques listed above raised that accuracy to 97%. Naive Bayes was also used as a generative model to produce new examples of "handwritten" digits.
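One way Naive Bayes can act as a generator is sketched below. A Bernoulli Naive Bayes model learns, for each digit class, the probability that each pixel is "on", and a new image is drawn by sampling every pixel independently from those probabilities. This is a minimal sketch under assumptions rather than the project's implementation: it again uses scikit-learn's 8x8 digits as a stand-in for MNIST, and the binarization threshold and the helper name generate_digit are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import BernoulliNB

# Assumption: scikit-learn's bundled 8x8 digits stand in for MNIST.
digits = load_digits()

# Pixel intensities in this dataset range from 0 to 16; threshold at the midpoint
# (an illustrative choice) so each pixel is simply on or off.
X_binary = (digits.data > 8).astype(int)

model = BernoulliNB(alpha=1.0).fit(X_binary, digits.target)

# feature_log_prob_ holds log P(pixel on | class), one row per class.
pixel_on_prob = np.exp(model.feature_log_prob_)


def generate_digit(label, rng, side=8):
    """Sample one binary image by drawing each pixel independently.

    The classes here are 0..9 in order, so the label doubles as the row index.
    """
    draws = rng.random(pixel_on_prob.shape[1])
    return (draws < pixel_on_prob[label]).astype(int).reshape(side, side)


rng = np.random.default_rng(0)
sample = generate_digit(3, rng)

# Render the sampled digit as text: '#' for an on pixel, '.' for an off pixel.
for row in sample:
    print("".join("#" if pixel else "." for pixel in row))
```

Because each pixel is sampled independently given the class, generated digits tend to look noisier than real handwriting, which is a direct consequence of the Naive Bayes independence assumption.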

More Information

More information can be found at the following link:

GitHub Repository: https://github.com/nsylva/digit_recognition