For this project, I entered the IEEE-CIS Fraud Detection Challenge on Kaggle. The objective of the competition was to detect fraudulent credit card transactions and money transfers in a dataset provided by the Vesta Corporation. The dataset was imbalanced (the target class made up 3.5% of the data), 400-dimensional, and anonymized, with the meaning of its features obscured.
I improved on a baseline decision tree model, raising performance from ~0.63 AUROC to 0.87 AUROC with random forest and XGBoost. I applied several feature engineering techniques: feature scaling, null-value correlation analysis, categorical encoding, feature grouping and reduction, principal component analysis, and t-SNE. I also experimented with imputation methods, including KNN, constant-value, and mean-value imputation, and used SMOTE to address the class imbalance.
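To illustrate the idea behind SMOTE, here is a minimal sketch using only the Python standard library. It is not the project's code: the function name `smote_oversample`, its parameters, and the toy data are assumptions made up for this example, and it simplifies the technique to its core step of interpolating between a minority-class sample and one of its nearest minority-class neighbors.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration only, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy data (assumed): 3 "fraud" rows vs 12 "legitimate" rows, two features each.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
legit = [(float(x), float(y)) for x in range(4, 8) for y in range(4, 7)]

# Generate enough synthetic fraud rows to balance the two classes.
new_fraud = smote_oversample(fraud, n_new=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

Because each synthetic row lies on a line segment between two real minority rows, the new points stay within the region the minority class already occupies. Oversampling of this kind is normally applied only to the training split, so that validation scores still reflect the original class ratio.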
More information can be found at the following links:
GitHub Repository: https://github.com/nsylva/fraud-detection