Description
The objective of this project was to experiment with handling unstructured text data in a classification task. The data come from the 20 Newsgroups dataset, which is available through Scikit-Learn and contains over 18,000 newsgroup posts on 20 topics. In particular, I focus on classifying posts from four of those newsgroups: atheism, religion, computer graphics, and space. Unlike more sophisticated natural language processing projects, this one uses a simple bag-of-words method to extract features from each post: every post is represented by the counts of the words it contains, and word order is ignored.
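To make the bag-of-words idea concrete, here is a minimal pure-Python sketch. It is not the project's actual feature pipeline. The helper names (`tokenize`, `build_vocabulary`, `bag_of_words`), the lowercase alphanumeric tokenizer, and the two example posts are all hypothetical and chosen only for illustration. The sketch builds a shared vocabulary and then turns each post into a vector of word counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs):
    # Map each distinct token to a column index; sorting keeps the order reproducible.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(docs, vocab):
    # One count vector per document: word order is discarded, only counts remain.
    vectors = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
        row = [0] * len(vocab)
        for tok, n in counts.items():
            row[vocab[tok]] = n
        vectors.append(row)
    return vectors


# Two made-up posts, loosely in the style of the space and graphics newsgroups.
posts = [
    "The shuttle launch was delayed; the launch window reopens tomorrow.",
    "Rendering 3D graphics with a new shading model.",
]
vocab = build_vocabulary(posts)
X = bag_of_words(posts, vocab)
print(len(vocab))               # 16 distinct tokens across both posts
print(X[0][vocab["launch"]])    # 2: "launch" appears twice in the first post
```

In practice, Scikit-Learn's `CountVectorizer` (in `sklearn.feature_extraction.text`) performs the same tokenize-and-count step and returns a sparse matrix. Note that its default token pattern drops single-character tokens such as "a", so its vocabulary would differ slightly from this sketch's.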