Machine Learning — Fraud Detection

Vatsal Shah
3 min read · Sep 14, 2019


Abstract:

Credit cards play a vital role in many people's daily activity: customers pay for what they need with their cards and through online transactions. To limit the risk of default, banks and financial institutions may deny customers' credit applications. Credit risk here means the debt that builds up when a customer fails to make billing payments for some period. The purpose of this project is to reduce the number of defaulters among a lender's customers, to support the background check that decides whether to grant a loan, and to identify promising customers. Such predictive models would benefit both the lending institutions and the customers, since they would make them more aware of their potential default rate. The task is a binary classification problem: will a customer default on next month's payment or not? Because the dataset is imbalanced, the focus was on precision and recall rather than on accuracy. After comparing the models on their precision-recall curves, logistic regression came out as the best model based on the false negative count in its confusion matrix. Finally, after changing the decision threshold of the logistic regression, a graphical user interface (GUI) was implemented to predict whether a customer is a defaulter or a non-defaulter.
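The emphasis on precision and recall over accuracy can be illustrated with a small sketch. The labels below are made up for illustration and are not taken from the project's dataset; the helper names `confusion_counts` and `summarize` are hypothetical. The code is plain Python so that it runs without any library installed.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means defaulter."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall and the false negative count."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_negatives": fn,
    }


# Made-up, imbalanced labels: 8 non-defaulters (0) and 2 defaulters (1).
y_true = [0] * 8 + [1] * 2

# A useless model that predicts "non-defaulter" for everyone.
always_non_defaulter = [0] * 10

report = summarize(y_true, always_non_defaulter)
print(report)
```

On these made-up labels the trivial model reaches 80% accuracy while its recall is zero and it misses both defaulters, which is why accuracy alone can be misleading on imbalanced data.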

Aims & Objectives:

The problem is to classify customers as defaulters or non-defaulters on their credit payments. This project addresses this real-world problem using several classification techniques. In addition, any user can open the GUI and enter their gender, education, marital status, and payment details to check which category (defaulter or non-defaulter) they are predicted to fall into next month.

The core objectives are to predict whether a customer will be able to pay back their next credit amount, and to identify potential customers for the bank who can settle their credit balance.

The steps followed to meet these goals:

• Selection of dataset

• Display some graphical information and visualize the features.

• Check for null values in the dataset

• Pre-process the data using one-hot encoding and remove extra parameters

• Train with classifiers

• Evaluate the model with test data

• Compare the accuracy, precision, and recall to find the optimal model

• Create a graphical user interface to check real-time customer data and predict whether a customer will default on next month's payment
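The null check and the one-hot encoding step from the list above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual pre-processing code: the column names, the sample rows, and the helpers `count_missing` and `one_hot_encode` are all hypothetical. With Pandas, `DataFrame.isnull()` and `pandas.get_dummies` cover the same two steps.

```python
def count_missing(rows):
    """Count None values per column across a list of row dictionaries."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical customer records, for illustration only.
customers = [
    {"education": "university", "bill_amount": 1200},
    {"education": "high_school", "bill_amount": 300},
    {"education": "graduate", "bill_amount": 4500},
]

missing = count_missing(customers)
encoded = one_hot_encode(customers, "education")

print(missing)
print(encoded[0])
```

Encoding categories as separate indicator columns avoids implying an order between values such as education levels, which a single integer code would do.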

Results:

Graphical User Interface:

We created a graphical user interface using Python and Tkinter. We trained a model and set the decision threshold of the logistic regression to 0.2. When the user submits values for the parameters mentioned below, the model predicts whether the user will be a defaulter or a non-defaulter on next month's payment.
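The effect of lowering the threshold to 0.2 can be shown with a short sketch. The probabilities and labels are made up for illustration, and `classify` and `false_negatives` are hypothetical helpers; in scikit-learn the probabilities would typically come from `LogisticRegression.predict_proba`.

```python
def classify(probabilities, threshold=0.5):
    """Label a customer as defaulter (1) when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def false_negatives(y_true, y_pred):
    """Count defaulters that were predicted as non-defaulters."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


# Made-up predicted probabilities of default and the true outcomes.
probabilities = [0.05, 0.15, 0.25, 0.45, 0.70]
y_true = [0, 0, 1, 1, 1]

at_default = classify(probabilities)         # conventional 0.5 cut-off
at_lowered = classify(probabilities, 0.2)    # threshold used in the article

print(at_default, false_negatives(y_true, at_default))
print(at_lowered, false_negatives(y_true, at_lowered))
```

A lower threshold flags more customers as defaulters, so it tends to reduce false negatives at the cost of more false positives. In this made-up example the two missed defaulters at 0.5 are both caught at 0.2.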

The general steps are mentioned below:

1. Choose the best model and parameters

2. Save the model to a .json file

3. Load the file from disk to predict on new data

4. Call a function on button submit and pass the data to the model

5. Check the probability and the result on the GUI
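The steps above can be sketched as follows. The article does not say how the model was serialized, so this sketch makes an assumption: it stores the coefficients of a logistic regression in a JSON file with Python's standard `json` module. The weights, the file name, and the functions `save_model`, `load_model`, `predict`, and `on_submit` are hypothetical. In a Tkinter GUI, `on_submit` would be the kind of function wired to the submit button.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, bias, threshold):
    """Write the (hypothetical) logistic regression parameters to a JSON file."""
    with open(path, "w") as handle:
        json.dump({"weights": weights, "bias": bias, "threshold": threshold}, handle)


def load_model(path):
    """Read the parameters back from disk."""
    with open(path) as handle:
        return json.load(handle)


def predict(model, features):
    """Return (probability of default, label) for one customer."""
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function
    label = "defaulter" if probability >= model["threshold"] else "non-defaulter"
    return probability, label


def on_submit(features, model_path):
    """What a submit-button callback could do: load the model, then predict."""
    model = load_model(model_path)
    return predict(model, features)


# Made-up parameters; a real model would supply its own fitted values.
with tempfile.TemporaryDirectory() as folder:
    model_path = os.path.join(folder, "model.json")
    save_model(model_path, weights=[1.0, -2.0], bias=0.0, threshold=0.2)
    risky = on_submit([0.0, 0.0], model_path)
    safe = on_submit([0.0, 2.0], model_path)

print(risky)
print(safe)
```

Keeping the threshold in the same file as the weights means the GUI applies the same cut-off that was chosen during evaluation.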

Graphical User Interface to show Defaulter

Language: Python

Libraries: TensorFlow, Keras, NumPy, Pandas, Scikit-learn

A report, the source code, and more information about the project will be available on my website: http://vatsalshah.in/projects.html


Vatsal Shah

Intrapreneur, Machine Learning | AI | Software Engineer | IoT | Voice Applications