Fraud Detection

Using machine learning to detect fraudulent credit card transactions and money transfers


For this project I entered the IEEE-CIS Fraud Detection Challenge on Kaggle. The objective of the competition was to detect fraudulent credit card transactions and money transfers in a dataset provided by the Vesta Corporation. The dataset is imbalanced (the target class makes up 3.5% of the data), 400-dimensional, and anonymized, so the meaning of the features is obscured.


Feature Engineering:

  • feature scaling
  • null value correlation analysis
  • categorical encoding
  • feature grouping and reduction
  • principal component analysis
  • t-SNE
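
A few of the steps above can be sketched on synthetic data. This is a minimal illustration, not the project's actual pipeline: the array names, column counts, category values, and the choice of 5 principal components are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the competition data (hypothetical columns).
n_rows = 500
X_num = rng.normal(loc=50.0, scale=20.0, size=(n_rows, 10))
X_cat = rng.choice(["visa", "mastercard", "discover"], size=(n_rows, 1))

# Feature scaling: zero mean and unit variance for each numeric column.
X_scaled = StandardScaler().fit_transform(X_num)

# Categorical encoding: one indicator column per observed category.
X_onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(X_cat).toarray()

# Feature reduction: project the scaled numeric columns onto the
# 5 directions of highest variance (5 is an arbitrary choice here).
pca = PCA(n_components=5, random_state=0)
X_reduced = pca.fit_transform(X_scaled)

# Combine the reduced numeric block with the encoded categorical block.
X_final = np.hstack([X_reduced, X_onehot])
print(X_final.shape)
print(pca.explained_variance_ratio_.round(3))
```

Scaling before PCA matters because PCA is driven by variance, so unscaled columns with large ranges would dominate the components.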

Imputation methods:

  • K-Nearest Neighbors
  • constant value
  • mean and median value
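
The imputation methods above can be compared side by side with scikit-learn. This is a small sketch on a toy array; the data, the `-999.0` fill value, and `n_neighbors=2` are assumptions for illustration, not settings taken from the project.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy matrix with one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 40.0],
    [5.0, 50.0],
])

imputers = {
    # Replace every missing entry with a fixed sentinel value.
    "constant": SimpleImputer(strategy="constant", fill_value=-999.0),
    # Replace missing entries with the column mean or median.
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    # Replace missing entries with the average of the nearest rows,
    # measured on the features that are present.
    "knn": KNNImputer(n_neighbors=2),
}

filled = {name: imputer.fit_transform(X) for name, imputer in imputers.items()}

for name, X_filled in filled.items():
    # Show the two entries that were originally missing.
    print(name, X_filled[1, 1], X_filled[3, 0])
```

In practice the imputer would be fit on the training split only and then applied to validation and test data, so that statistics from held-out rows do not leak into training.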

Class balancing:

  • undersampling and oversampling (SMOTE)
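
As a rough illustration of class balancing, the sketch below uses plain random oversampling and undersampling from scikit-learn. It is not SMOTE, which synthesizes new minority rows by interpolating between neighbors rather than duplicating existing ones. The data is synthetic, with a 3.5% positive rate chosen to mirror the competition's imbalance.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic data: 1000 rows, 3.5% labeled as the positive (fraud) class.
n_rows = 1000
X = rng.normal(size=(n_rows, 4))
y = np.zeros(n_rows, dtype=int)
y[:35] = 1

X_min, X_maj = X[y == 1], X[y == 0]

# Random oversampling: draw minority rows with replacement until the
# minority class is as large as the majority class.
X_min_over = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_over = np.vstack([X_maj, X_min_over])
y_over = np.concatenate([
    np.zeros(len(X_maj), dtype=int),
    np.ones(len(X_min_over), dtype=int),
])

# Random undersampling: keep only as many majority rows as there are
# minority rows, drawn without replacement.
X_maj_under = resample(X_maj, replace=False, n_samples=len(X_min), random_state=0)
X_under = np.vstack([X_maj_under, X_min])
y_under = np.concatenate([
    np.zeros(len(X_maj_under), dtype=int),
    np.ones(len(X_min), dtype=int),
])

print(np.bincount(y_over), np.bincount(y_under))
```

Resampling should be applied to the training split only; the validation and test sets keep the original class ratio so that reported metrics reflect real conditions.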


Models:

  • Decision Trees
  • Random Forest
  • XGBoost
  • Logistic Regression
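
A comparison of these model families by AUROC might look like the sketch below. It runs on synthetic imbalanced data from `make_classification`, so the scores it prints say nothing about the competition results. XGBoost is left out to keep the example limited to scikit-learn, and all hyperparameters are assumed defaults rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: roughly 3.5% positives.
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=6,
    weights=[0.965, 0.035],
    random_state=0,
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUROC is computed from predicted probabilities, not hard labels.
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: AUROC = {scores[name]:.3f}")
```

AUROC is a reasonable metric for this kind of problem because it is insensitive to the class ratio, whereas plain accuracy would look high for a model that never predicts fraud.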


Tools:

  • scikit-learn
  • numpy
  • jupyter notebooks


Improved performance from a baseline decision tree at ~0.63 AUROC to 0.87 AUROC with random forest and XGBoost. Applied feature engineering techniques including feature scaling, null value correlation analysis, categorical encoding, feature grouping and reduction, principal component analysis, and t-SNE. Experimented with imputation methods, including KNN, constant value, and mean value imputation. Used SMOTE to address class imbalance.

More Information

More information can be found at the following links:

GitHub Repository: