Credit Access & Personal Finance Analysis

Analyzed 12 years of Federal Reserve survey data to uncover how credit score and age shape who gets approved, who gets rejected, and who is too discouraged to even apply, then built ML models that predict credit outcomes with F1 scores as high as 0.94.

Tech Stack

Python · Random Forest · XGBoost · Optuna · MICE Imputation · Tableau · Matplotlib/Seaborn · Scikit-learn

Survey data span: 12 years (2013–2025)
Best F1 score: 0.94 (auto loan model)
Missing data rate: 63% (handled via MICE)
ML classifiers benchmarked: 4


Overview

This project analyzes consumer credit access in the United States using over a decade of survey microdata from the Federal Reserve Bank of New York (2013–2025). The team combined exploratory data storytelling with machine learning to answer a deceptively simple question:

who gets credit in America, and who doesn't even try?

Business Problem

Access to credit is foundational to financial mobility: it affects whether someone can buy a car, own a home, or weather an emergency. Yet traditional metrics like rejection rates capture only part of the story. This project surfaces a hidden population: discouraged borrowers, people who wanted credit but never applied because they expected rejection.

Understanding both groups is essential for lenders, policymakers, and financial inclusion researchers.

The Data

The data come from the Credit Access Module of the Federal Reserve Bank of New York's Survey of Consumer Expectations (SCE), a nationally representative rotating panel of about 1,300 U.S. household heads surveyed monthly since 2013. It yields two complementary datasets: raw microdata with individual-level survey responses (more than 35,000 records) and aggregated data grouped by credit score and age. In the raw data, 63% of values were missing, a significant preprocessing challenge.

Methodology

  • Dropped 73 high-missingness columns (>65%); identified datatypes; ordinal-encoded outcomes

  • Created 5-tier credit score variable (Poor → Excellent); encoded application outcomes

  • Applied MICE (Multiple Imputation by Chained Equations) to handle remaining missingness

  • Applied a Variance Inflation Factor (VIF) threshold of 10 to limit multicollinearity; reduced to 42 final features

  • Trained 4 classifiers (Logistic Regression, KNN, Random Forest, Gradient Boosting) with Optuna Bayesian hyperparameter tuning; 80/20 train-test split
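The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's IterativeImputer as a MICE-style stand-in (it produces a single chained-equations imputation rather than multiple imputed datasets), computes VIF directly with NumPy, and assumes features were pruned by repeatedly dropping the highest-VIF column, which the write-up does not specify. The function names and the toy DataFrame are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.65) -> pd.DataFrame:
    """Keep only columns whose share of missing values is at most max_missing."""
    missing_share = df.isna().mean()
    return df.loc[:, missing_share <= max_missing]


def impute_chained(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """MICE-style imputation: each column is modeled from the others in turn."""
    imputer = IterativeImputer(max_iter=10, random_state=seed)
    return pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)


def vif(df: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest."""
    values = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        others = df.drop(columns=col).to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(df)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = y - X @ coef
        r2 = 1.0 - float(residual @ residual) / float(((y - y.mean()) ** 2).sum())
        values[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(values)


def prune_by_vif(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Assumed strategy: drop the highest-VIF column until all are <= threshold."""
    kept = df.copy()
    while kept.shape[1] > 1:
        scores = vif(kept)
        if scores.max() <= threshold:
            break
        kept = kept.drop(columns=scores.idxmax())
    return kept


# Hypothetical toy data standing in for the survey microdata.
rng = np.random.default_rng(0)
n = 200
income = rng.normal(size=n)
demo = pd.DataFrame({
    "income": income,
    "income_copy": income + rng.normal(scale=0.01, size=n),  # nearly collinear
    "age": rng.normal(size=n),
    "mostly_empty": np.nan,  # 100% missing, so it is dropped
})
demo.loc[rng.choice(n, size=40, replace=False), "age"] = np.nan

features = prune_by_vif(impute_chained(drop_sparse_columns(demo)))
```

In the toy data, the all-missing column is removed by the 65% rule, the gaps in `age` are filled by the imputer, and one of the two near-duplicate income columns is removed by the VIF rule. The sketch does not handle constant columns, whose VIF is undefined.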

Credit Access Project Visuals

Application & Rejection Trends

Applications remained relatively stable between 2013 and 2019, generally fluctuating in the mid-40s to low-50s range, before dropping sharply in 2020 and then recovering slightly afterward. Rejections followed a different pattern, declining around 2015, rising again toward 2018, dipping in 2020, and gradually increasing through 2024. Overall, the chart suggests a disruption around 2020 that impacted both applications and rejections, followed by a moderate recovery period.

Legend: Applications · Rejections

Discouragement by Credit Score

Discouragement is consistently highest among borrowers with credit scores below 680, and it generally increases over time, especially after 2022. Borrowers with mid-range credit scores (680–760) show low but slightly fluctuating discouragement levels, while those with scores above 760 show almost no discouragement throughout the period. Overall, the chart indicates that lower credit score borrowers are significantly more likely to feel discouraged from applying for credit, and this gap has widened in recent years.

Legend: less_680 · between_680_760 · over_760
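A discouragement rate per credit score band and year, like the one charted here, could be computed along these lines. This is a rough sketch on made-up rows: the column names `year`, `credit_band`, and `discouraged` are hypothetical, and it ignores survey weights, which a nationally representative estimate would normally need.

```python
import pandas as pd

# Hypothetical respondent-level rows; the values are invented for illustration.
responses = pd.DataFrame({
    "year": [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
    "credit_band": [
        "less_680", "less_680", "between_680_760", "over_760",
        "less_680", "less_680", "between_680_760", "over_760",
    ],
    "discouraged": [1, 0, 0, 0, 1, 1, 0, 0],  # 1 = wanted credit but did not apply
})

# Share of discouraged respondents in each band, one row per year.
discouragement_rate = (
    responses.groupby(["year", "credit_band"])["discouraged"]
    .mean()
    .unstack("credit_band")
)
print(discouragement_rate)
```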

Acceptance Rates by Credit Type

Auto loans and mortgage refinancing have the highest full approval rates, roughly 85–90% or more, indicating these credit types are more likely to be fully granted than others. In contrast, credit line and loan limit increases have much lower full approval rates and the highest rejection rates, suggesting lenders are more cautious when increasing existing credit limits. Overall, new credit products tend to have higher approval rates, while requests to increase existing credit limits face higher rejection and partial approval rates.

Legend: Fully Granted · Partially Granted · Rejected

ML Model F1 Heatmap

Random Forest and Gradient Boosting consistently achieve the highest F1 scores across most credit categories, making them the strongest models overall. All models perform best on Auto Loan and Mortgage predictions, while performance is weakest for Credit Score and Credit Limit predictions. Overall, the ensemble models outperform Logistic Regression and K-Nearest Neighbors, especially on the more complex credit prediction tasks.

Color scale: Lower F1 → Higher F1

Debt Applications by Credit Score & Age

Debt application outcomes vary significantly by both credit score and age group. Applicants with higher credit scores have the highest acceptance counts and very few rejections, while those with scores below 680 experience more rejections than acceptances, indicating credit score is a major factor in approval decisions. By age, individuals between 40–59 have the highest number of accepted applications, while those over 60 have fewer applications overall but relatively low rejection counts compared to younger groups.

Feature Importance Plots (Random Forest)

The feature importance plots show that approval probability variables are consistently among the most important predictors across the Random Forest models. Expected denial, existing balances, and whether the borrower actually needed credit also play significant roles, indicating that both borrower expectations and financial status strongly influence model predictions. Overall, the models rely heavily on approval probability indicators and borrower financial behavior variables rather than demographic factors alone.

Model performance: auto loan F1 (weighted)
Random Forest: 0.94
Gradient Boost: 0.94
k-NN: 0.82
Logistic Reg.: 0.85
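A weighted-F1 comparison of the four classifier families on an 80/20 split might look like the sketch below. It is illustrative only: the data are synthetic (generated with `make_classification`, not the SCE microdata), the models use mostly default hyperparameters with the Optuna tuning step omitted, and scikit-learn's `GradientBoostingClassifier` stands in for whichever gradient boosting implementation the project used. The scores it produces have no relation to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for one credit outcome with three classes.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)

# 80/20 train-test split, as described in the methodology.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Weighted F1 averages per-class F1 scores, weighted by each class's support.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Weighted averaging is a reasonable choice when outcome classes are imbalanced, since each class contributes in proportion to how often it occurs in the test set.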