Food Insecurity in America during COVID
Many groups are at risk during COVID, and food insecurity becomes a major concern when assistance programs are defunded in the middle of a crisis. News outlets such as NPR and the NY Times have started reporting on this. For this reason, I wanted to do a risk assessment of the demographics most vulnerable to food insecurity during COVID. I used SNAP (formerly known as food stamps) QC datasets from the USDA to build a prediction model.
The technical stuff…
- I wanted to do a 10-year gap analysis, emphasizing the effect of the 2008 housing crisis, so I used datasets from 2007 and 2017 to build the model.
- I first focused on New Mexico as an emerging hot spot and Nebraska as an emerging cold spot. A previous GIS spatial analysis had identified them as extreme, contrasting states.
- I started with over 40k rows and 800 columns in each dataset. Using a combination of correlation, high nullity, data insight, and domain knowledge, I whittled the data down to 33 columns and a final dataset of 4k records. I sped up the work with Python reference scripts for frequently used code.
- I used a voting ensemble of Random Forest, Gradient Boosting, and Bagging classifiers, ending with a final CV score of 95%.
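The column pruning and ensemble steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the helper name `prune_columns`, the thresholds (50% nulls, 0.9 correlation), soft voting, and all hyperparameters are assumptions made for the example, and it assumes pandas and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import cross_val_score


def prune_columns(df, max_null_frac=0.5, max_corr=0.9):
    """Drop mostly-empty columns, then one column from each highly correlated pair.

    Hypothetical helper; both thresholds are illustrative assumptions.
    """
    # Keep columns whose share of missing values is at or below the threshold.
    kept = df.loc[:, df.isna().mean() <= max_null_frac]
    # Absolute pairwise correlations between the numeric columns.
    corr = kept.select_dtypes("number").corr().abs()
    # Look only above the diagonal so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return kept.drop(columns=redundant)


# Synthetic stand-in for the SNAP QC data (for illustration only).
X_raw, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(8)])
X["f0_copy"] = X["f0"] * 2.0  # perfectly correlated duplicate of f0
X["mostly_null"] = np.nan     # column with no usable values

X_pruned = prune_columns(X)

# Voting ensemble of the three classifier types named above.
# Soft voting (averaging predicted probabilities) is an assumption.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("bag", BaggingClassifier(random_state=0)),
    ],
    voting="soft",
)

# 5-fold cross-validation; the default score for classifiers is accuracy.
scores = cross_val_score(ensemble, X_pruned, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

On real data, the domain-knowledge part of the pruning would still be manual; a helper like this only covers the nullity and correlation filters.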
What was interesting?
I chose an interpretable model on purpose, because I wanted to see which factors have the biggest impact on communities becoming food insecure. I was surprised throughout my analysis, from the EDA to the final interpreted results. The initial exploration showed me that the households requesting SNAP were not big families: most had one or two children at most and not a lot of elderly members. One correlation stood out early, with almost half of the households headed by single moms. As I got further into the EDA, more correlations emerged, such as fewer working poor in 2017 than in 2007 and more assistance programs available to applicants. At this point, I noticed New Mexico seemed more vulnerable to economic changes than Nebraska.
Ultimately, the biggest surprise was in the interpretable coefficients of the model itself, which told me that the top 4 factors in predicting whether a household requests SNAP all relate to how expensive and how stable its housing situation is.
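For readers curious how a "top 4" ranking like this can be produced, here is one model-agnostic way to do it. This is a sketch under assumptions, not the method used in the project: it swaps in permutation importance from scikit-learn, uses a single random forest as a stand-in for the full ensemble, and runs on synthetic data with placeholder column names (`feature_0` and so on).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with placeholder column names (not the SNAP QC fields).
X_raw, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(6)])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in model; any fitted classifier could be used here.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each column in turn on held-out data and measure the drop in score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Rank features by mean importance, largest first, and keep the top four.
ranking = pd.Series(result.importances_mean, index=X.columns).sort_values(
    ascending=False
)
top_4 = ranking.head(4)
print(top_4)
```

Measuring importance on held-out data rather than the training set keeps the ranking from rewarding features the model has simply memorized.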
If you want to learn more, including the GitHub repository code and a PowerPoint presentation, please visit my website.