Credit Access & Personal Finance Analysis
Analyzed 12 years of Federal Reserve survey data to uncover how credit score and age shape who gets approved, who gets rejected, and who is too discouraged to even apply, then built ML models that predict credit outcomes with an F1 score of up to 0.94.
Tech Stack
Python · Random Forest · XGBoost · Optuna · MICE Imputation · Tableau · Matplotlib/Seaborn · Scikit-learn
12Y · SURVEY DATA SPAN · 2013–2025
0.94 · BEST F1 SCORE · Auto loan model
63% · MISSING DATA RATE · Handled via MICE
4 · ML CLASSIFIERS · Benchmarked
Overview
This project analyzes consumer credit access in the United States using over a decade of survey microdata from the Federal Reserve Bank of New York (2013–2025). The team combined exploratory data storytelling with machine learning to answer a deceptively simple question:
who gets credit in America, and who doesn't even try?
Business Problem
Access to credit is foundational to financial mobility: it affects whether someone can buy a car, own a home, or weather an emergency. Yet traditional metrics like rejection rates capture only part of the story. This project surfaces a hidden population of discouraged borrowers, people who wanted credit but never applied because they expected rejection.
Understanding both rejected applicants and discouraged borrowers is essential for lenders, policymakers, and financial inclusion researchers.
The Data
The data come from the Credit Access Module of the Federal Reserve Bank of New York's Survey of Consumer Expectations (SCE), a nationally representative rotating panel of ~1,300 U.S. household heads surveyed monthly since 2013. The survey produces two complementary datasets: raw microdata with individual-level survey responses (35,000+ records), and aggregated data grouped by credit score and age. In the raw data, 63% of values were missing, which made preprocessing a significant challenge.
Methodology
- Dropped 73 high-missingness columns (>65% missing); identified data types; ordinal-encoded outcomes
- Created a 5-tier credit score variable (Poor → Excellent); encoded application outcomes
- Applied MICE (Multiple Imputation by Chained Equations) to handle the remaining missingness
- Applied a Variance Inflation Factor (VIF) threshold of 10; reduced the set to 42 final features
- Trained 4 classifiers (Logistic Regression, KNN, Random Forest, Gradient Boosting) with Optuna Bayesian hyperparameter tuning; 80/20 train-test split
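The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names and helper functions are hypothetical, scikit-learn's `IterativeImputer` stands in for MICE (it is MICE-inspired but returns a single imputed dataset), and dropping the highest-VIF column one at a time is an assumed strategy for enforcing the threshold of 10.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split


def drop_sparse_columns(df, max_missing=0.65):
    """Keep only columns whose share of missing values is at most max_missing."""
    return df.loc[:, df.isna().mean() <= max_missing]


def impute_chained(df, seed=0):
    """MICE-style imputation: each column is modelled from the others in turn."""
    imputer = IterativeImputer(max_iter=10, random_state=seed)
    return pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)


def vif(df):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(df.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=df.columns)


def drop_collinear(df, threshold=10.0):
    """Repeatedly drop the highest-VIF column until every VIF is <= threshold."""
    kept = df.copy()
    while kept.shape[1] > 1:
        scores = vif(kept)
        if scores.max() <= threshold:
            break
        kept = kept.drop(columns=scores.idxmax())
    return kept


# Synthetic stand-in for the survey microdata (all names are hypothetical).
rng = np.random.default_rng(42)
n = 500
income = rng.normal(size=n)
toy = pd.DataFrame({
    "income": income,
    "income_copy": income + rng.normal(scale=0.01, size=n),  # near-duplicate feature
    "age": rng.normal(size=n),
    "mostly_missing": rng.normal(size=n),
})
toy.loc[rng.random(n) < 0.9, "mostly_missing"] = np.nan  # about 90% missing
toy.loc[rng.random(n) < 0.2, "age"] = np.nan             # about 20% missing
target = (income + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Drop sparse columns, impute what remains, then filter collinear features.
features = drop_collinear(impute_chained(drop_sparse_columns(toy)))

# 80/20 split, stratified so both sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)
print(list(features.columns), X_train.shape, X_test.shape)
```

In this toy run the mostly-empty column is removed by the missingness filter and one of the two near-duplicate income columns is removed by the VIF filter. Imputing before the VIF check matters here because the correlation matrix cannot be computed over missing values.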