Fraud Detection

Using machine learning to detect fraudulent credit card transactions and money transfers

Description

For this project I entered the IEEE-CIS Fraud Detection Challenge on Kaggle. The objective of the competition was to detect fraudulent credit card transactions and money transfers in a dataset provided by the Vesta Corporation. The dataset was imbalanced (the target class made up 3.5% of the data), 400-dimensional, and anonymized, with the meaning of its features obscured.

Techniques

Feature Engineering:

feature scaling
null value correlation analysis
categorical encoding
feature grouping and reduction
principal component analysis
t-SNE
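
As a rough illustration of three of the techniques listed above (feature scaling, categorical encoding, and principal component analysis), here is a minimal sketch using scikit-learn. The data is synthetic and the category names are made up for the example; it is not the competition dataset or the exact pipeline used in the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption for illustration only):
# 200 rows, 5 numeric columns, and 1 categorical column.
X_num = rng.normal(loc=50.0, scale=10.0, size=(200, 5))
X_cat = rng.choice(["visa", "mastercard", "discover"], size=(200, 1))

# Feature scaling: shift and scale each numeric column to zero mean, unit variance.
X_scaled = StandardScaler().fit_transform(X_num)

# Categorical encoding: map each category string to an integer code.
X_encoded = OrdinalEncoder().fit_transform(X_cat)

# Principal component analysis: keep the first 2 components of the scaled block.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Combine the reduced numeric block with the encoded categorical column.
X_final = np.hstack([X_reduced, X_encoded])
print(X_final.shape)
print(pca.explained_variance_ratio_)
```

Ordinal codes are a reasonable fit for tree-based models; a linear model would usually call for one-hot encoding instead.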

Imputation methods:

K-Nearest Neighbors
constant value
mean and median value
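
The imputation methods above can be compared side by side on a toy matrix. This is a minimal sketch using scikit-learn's `SimpleImputer` and `KNNImputer`; the array values, the sentinel `-999.0`, and `n_neighbors=2` are arbitrary choices for the example, not settings taken from the project.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy matrix with two missing entries (values are made up).
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [5.0, 6.0, 9.0],
    [7.0, 8.0, 12.0],
])

# Constant value: replace every missing entry with a fixed sentinel.
X_constant = SimpleImputer(strategy="constant", fill_value=-999.0).fit_transform(X)

# Mean and median: replace missing entries with the column statistic.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)
X_median = SimpleImputer(strategy="median").fit_transform(X)

# K-Nearest Neighbors: fill each missing entry from the rows most similar
# on the features that are present.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

for name, result in [("constant", X_constant), ("mean", X_mean),
                     ("median", X_median), ("knn", X_knn)]:
    print(name)
    print(result)
```

A sentinel constant keeps "was missing" visible to tree-based models, while mean, median, and KNN imputation hide it unless a separate indicator column is added.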

Class balancing:

SMOTE
undersampling and oversampling
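
To make the resampling idea concrete, here is a minimal sketch of plain random oversampling and undersampling with `sklearn.utils.resample`. It deliberately does not implement SMOTE, which generates synthetic minority samples by interpolating between neighbors and is available as `SMOTE` in the separate imbalanced-learn package. The data is synthetic, with a class ratio chosen to mirror the 3.5% positive rate.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic imbalanced data (an assumption for illustration):
# 965 legitimate rows (label 0) and 35 fraud rows (label 1).
X = rng.normal(size=(1000, 4))
y = np.array([0] * 965 + [1] * 35)

X_major, X_minor = X[y == 0], X[y == 1]

# Oversampling: draw minority rows with replacement up to the majority size.
X_minor_up = resample(X_minor, replace=True,
                      n_samples=len(X_major), random_state=0)
X_over = np.vstack([X_major, X_minor_up])
y_over = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Undersampling: draw majority rows without replacement down to the minority size.
X_major_down = resample(X_major, replace=False,
                        n_samples=len(X_minor), random_state=0)
X_under = np.vstack([X_major_down, X_minor])
y_under = np.array([0] * len(X_major_down) + [1] * len(X_minor))

print(X_over.shape, y_over.mean())
print(X_under.shape, y_under.mean())
```

Resampling of either kind should be applied to the training split only, so that evaluation still reflects the original class ratio.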

Models:

Decision Trees
Random Forest
XGBoost
Logistic Regression
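
A comparison of these model families by AUROC might look like the sketch below. It uses synthetic data from `make_classification` with a roughly 3.5% positive class and default-ish hyperparameters, all of which are assumptions for the example. XGBoost is left out to keep the sketch dependent on scikit-learn alone; its `XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced classification problem (not the competition data).
X, y = make_classification(n_samples=4000, n_features=20, n_informative=6,
                           weights=[0.965, 0.035], random_state=0)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

auroc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUROC is computed from predicted probabilities, not hard labels.
    scores = model.predict_proba(X_test)[:, 1]
    auroc[name] = roc_auc_score(y_test, scores)

for name, value in auroc.items():
    print(f"{name}: {value:.3f}")
```

AUROC is a common choice for imbalanced problems like this one because it measures how well the model ranks fraud above non-fraud, independent of any single decision threshold.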

Tools

scikit-learn
NumPy
Jupyter notebooks

Outcome

Improved on a baseline decision tree model, raising performance from ~0.63 AUROC to 0.87 AUROC with random forest and XGBoost. Applied a range of feature engineering techniques: feature scaling, null value correlation analysis, categorical encoding, feature grouping and reduction, principal component analysis, and t-SNE. Experimented with imputation methods, including KNN, constant value, and mean value imputation. Used SMOTE to address class imbalance.

More Information

More information can be found at the following links:

GitHub Repository: https://github.com/nsylva/fraud-detection