Fraud Detection

Using machine learning to detect fraudulent credit card transactions and money transfers

Description

For this project I entered the IEEE-CIS Fraud Detection Challenge on Kaggle. The objective of the competition was to detect fraudulent credit card transactions and money transfers in a dataset provided by the Vesta Corporation. The dataset is imbalanced (the target class makes up 3.5% of the data), 400-dimensional, and anonymized, with the meaning of its features obscured.

Techniques

Feature Engineering:

  • feature scaling
  • null value correlation analysis
  • categorical encoding
  • feature grouping and reduction
  • principal component analysis
  • t-SNE
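
A few of the steps above (feature scaling, categorical encoding, and principal component analysis) can be sketched with scikit-learn as follows. This is a minimal illustration on randomly generated stand-in data, not the project's actual pipeline; the column counts, category labels, and number of components are assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 rows, 5 numeric columns, 1 categorical column.
numeric = rng.normal(loc=10.0, scale=3.0, size=(200, 5))
category = rng.choice(["a", "b", "c"], size=(200, 1))

# Feature scaling: zero mean and unit variance per numeric column.
scaled = StandardScaler().fit_transform(numeric)

# Categorical encoding: one indicator column per observed category.
encoded = OneHotEncoder().fit_transform(category).toarray()

# Dimensionality reduction: project the combined features onto 3 components.
features = np.hstack([scaled, encoded])
pca = PCA(n_components=3, random_state=0)
components = pca.fit_transform(features)

print(components.shape)
print(pca.explained_variance_ratio_)
```

On real data, the scaler, encoder, and PCA would be fit on the training split only and then applied to the validation and test splits.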

Imputation methods:

  • K-Nearest Neighbors
  • constant value
  • mean and median value
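
The imputation methods listed above map onto scikit-learn's `SimpleImputer` and `KNNImputer`. The sketch below applies each to a tiny made-up array; the fill value of -1.0 and the choice of 2 neighbors are assumptions for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, 6.0],
    [np.nan, 8.0],
])

# Constant value: replace every missing entry with a sentinel.
constant = SimpleImputer(strategy="constant", fill_value=-1.0).fit_transform(X)

# Mean and median value: replace missing entries with a per-column statistic.
mean = SimpleImputer(strategy="mean").fit_transform(X)
median = SimpleImputer(strategy="median").fit_transform(X)

# K-Nearest Neighbors: fill each gap from the rows closest on the observed columns.
knn = KNNImputer(n_neighbors=2).fit_transform(X)

for name, result in [("constant", constant), ("mean", mean),
                     ("median", median), ("knn", knn)]:
    print(name, result.tolist())
```

A constant sentinel keeps "was missing" visible to tree-based models, while mean, median, and KNN imputation hide it unless a separate indicator column is added.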

Class balancing:

  • SMOTE
  • under/over sampling
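
To make the idea behind SMOTE concrete, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. It is a simplified, hand-written illustration of the technique, not the library implementation and not necessarily what the project used; the function name `smote_like_oversample`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbors because each point is normally its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)
    # Pick a random base point and one of its k neighbors for each new sample.
    base = rng.integers(0, len(X_minority), size=n_new)
    neighbor = idx[base, rng.integers(1, k + 1, size=n_new)]
    # Place the new sample at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[neighbor] - X_minority[base])


# Hypothetical imbalanced data: 1000 rows, 3.5% in the positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.zeros(1000, dtype=int)
y[:35] = 1

X_min = X[y == 1]
n_needed = int((y == 0).sum() - len(X_min))
X_new = smote_like_oversample(X_min, n_new=n_needed)

X_bal = np.vstack([X, X_new])
y_bal = np.concatenate([y, np.ones(len(X_new), dtype=int)])
print(np.bincount(y_bal))
```

Resampling of this kind is normally applied to the training split only, so that validation and test scores reflect the original class distribution.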

Models:

  • Decision Trees
  • Random Forest
  • XGBoost
  • Logistic Regression
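
Comparing these model families by AUROC can be sketched as below. The example uses a synthetic imbalanced dataset from `make_classification` rather than the competition data, and the hyperparameters are assumptions; XGBoost is left out only to keep the example dependent on scikit-learn alone. Scores on this toy data say nothing about the scores reported for the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with roughly 3.5% positives.
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5,
    weights=[0.965], random_state=0,
)
# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUROC is computed from predicted probabilities, not hard class labels.
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

AUROC is a reasonable metric here because plain accuracy is dominated by the majority class when only a few percent of rows are fraudulent.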

Tools

  • scikit-learn
  • NumPy
  • Jupyter notebooks

Outcome

Improved performance from ~0.63 AUROC with a baseline decision tree to 0.87 AUROC with random forest and XGBoost. Applied a range of feature engineering techniques: feature scaling, null value correlation analysis, categorical encoding, feature grouping and reduction, principal component analysis, and t-SNE. Experimented with imputation methods, including KNN, constant value, and mean value imputation. Used SMOTE to address class imbalance.

More Information

More information can be found at the following link:

GitHub Repository: https://github.com/nsylva/fraud-detection