Click here to view the project on GitHub

Overview

This project is a Java implementation of the k-nearest neighbors (KNN) algorithm that classifies the well-known Iris dataset (iris.txt). The program takes two user inputs: the value of k and the training/testing split for the dataset. It uses the four known measurements for each flower to predict its class from three options: {Iris-setosa, Iris-versicolor, Iris-virginica}. The classification results are saved to a results text file, along with the accuracy of the classification.
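To make the classification step concrete, here is a minimal sketch of the core KNN logic. It is not the code from the repository: the class name KnnSketch, the Sample type, and the method names are all hypothetical, and it assumes Euclidean distance with a simple majority vote, which the project itself may or may not use. Reading iris.txt, splitting the data, and writing the results file are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's actual classes.
class KnnSketch {

    /** One labeled flower: its feature measurements plus its class name. */
    static final class Sample {
        final double[] features;
        final String label;

        Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance between two feature vectors of equal length (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Predicts a label by majority vote among the k nearest training samples. */
    static String classify(List<Sample> training, double[] query, int k) {
        // Copy so the caller's list order is left untouched, then sort by distance to the query.
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Sample s) -> distance(s.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        int limit = Math.min(k, sorted.size());
        for (int i = 0; i < limit; i++) {
            String label = sorted.get(i).label;
            int count = votes.merge(label, 1, Integer::sum);
            // On a tie, the label that reached the top count first (nearer neighbors) is kept.
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    /** Fraction of testing samples whose predicted label matches the true label. */
    static double accuracy(List<Sample> training, List<Sample> testing, int k) {
        if (testing.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Sample s : testing) {
            if (classify(training, s.features, k).equals(s.label)) {
                correct++;
            }
        }
        return (double) correct / testing.size();
    }
}
```

In this sketch, the user-supplied k is passed straight to classify, and the training/testing split decides which samples go into each of the two lists given to accuracy.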

What I learned

This project was fascinating because I had the opportunity to implement the KNN algorithm completely by hand. Doing so gave me a full understanding of, and appreciation for, the math that goes into these models. It was also one of the first projects that exposed me to the world of predictive modeling and data science. I think this project, along with my project on Hopfield neural nets (which can be found here), forms the foundation of my curiosity in this space, and both are good indicators of my understanding.