Food Insecurity in America during COVID

Image: USDA SNAP

Many groups are at risk during COVID, and food insecurity becomes a major concern when assistance programs start to get defunded during a crisis. News outlets such as NPR and the NY Times have started reporting on this. For this reason, I wanted to do a risk assessment of the demographics vulnerable to food insecurity during COVID. I used SNAP (formerly known as food stamps) QC datasets from the USDA to create a prediction model.

Image: NY Times

The technical stuff…

  • I wanted to do a 10-year gap analysis, emphasizing the effect of the 2008 housing crisis, so I used datasets from 2007 and 2017 to build the model.
  • I first focused on New Mexico, an emerging hot spot, and Nebraska, an emerging cold spot. Both were identified in a previous GIS spatial analysis, and I chose them to highlight extreme, contrasting states.
  • I started with over 40k rows and 800 columns in each dataset. Using a combination of correlation checks, high-nullity filtering, data insight, and domain knowledge, I whittled the data down to 33 columns and a final dataset of 4k records. Along the way, I kept Python reference scripts for frequently used code to speed up the work.
  • I used a voting ensemble of Random Forest, Gradient Boosting, and Bagging classifiers, ending with a final CV score of 95%.
Report visualization generated using the treeinterpreter package with scikit-learn models.
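For anyone who wants to try the same kind of workflow, here is a minimal sketch of the steps above: dropping mostly-null columns, dropping one column from each highly correlated pair, and scoring a voting ensemble with cross-validation. This is not my project code. The data is synthetic, and the helper name `drop_sparse_and_correlated`, the thresholds (0.5 and 0.9), the soft voting, and the 5 folds are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import cross_val_score


def drop_sparse_and_correlated(df, max_null_frac=0.5, max_corr=0.9):
    """Drop mostly-null columns, then the later column of each highly correlated pair.

    Hypothetical helper; the thresholds are illustrative assumptions.
    """
    kept = df.loc[:, df.isna().mean() <= max_null_frac]
    corr = kept.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so each pair of columns is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return kept.drop(columns=redundant)


# Synthetic stand-in data (NOT the SNAP QC data): six independent features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

features = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])
features["f0_copy"] = features["f0"] * 2.0   # perfectly correlated with f0
features["mostly_null"] = np.nan             # almost entirely missing
features.loc[:9, "mostly_null"] = 1.0        # only the first 10 rows have a value

reduced = drop_sparse_and_correlated(features)

# Voting ensemble of the three model families named above.
# Soft voting (averaging predicted probabilities) is an assumption.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("bag", BaggingClassifier(random_state=0)),
    ],
    voting="soft",
)

scores = cross_val_score(ensemble, reduced, y, cv=5)
mean_score = scores.mean()
print(list(reduced.columns), round(mean_score, 3))
```

Automated filters like these only cover the correlation and nullity part of the process. The data insight and domain knowledge steps still need a person looking at the columns.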

What was interesting?

I focused on an interpretable model on purpose. I wanted to see which factors have the biggest impact on communities becoming food insecure. I was surprised throughout my analysis, from the EDA to the final interpreted results. The initial exploration showed me that the households requesting SNAP were not big families. Most had one or two children at most, and not a lot of elderly members. One correlation stood out early: almost half of the households had a single mom as head of household. As I got further into the EDA, more patterns emerged, such as fewer working poor in 2017 than in 2007 and more assistance programs available to applicants. At this point, I noticed New Mexico seemed more vulnerable to economic changes than Nebraska.

Ultimately, the biggest surprise was in the interpreted output of the model itself, which told me that the top 4 factors in predicting whether a household requests SNAP all come down to how expensive and how stable its housing situation is.
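To show what ranking the top factors can look like in code, here is a small sketch that uses scikit-learn's built-in impurity-based `feature_importances_` on a random forest. That is a simpler, swapped-in technique, not the treeinterpreter report I generated. The column names and data are hypothetical and synthetic, and the target is built from the two housing columns on purpose, so the ranking here is true by construction and says nothing about the real SNAP QC data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 500

# Hypothetical, synthetic features; these are NOT columns from the SNAP QC data.
data = pd.DataFrame(
    {
        "housing_cost": rng.normal(size=n),
        "housing_stability": rng.normal(size=n),
        "household_size": rng.normal(size=n),
        "noise": rng.normal(size=n),
    }
)

# Synthetic target driven only by the two housing columns (by construction).
target = (data["housing_cost"] - data["housing_stability"] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data, target)

# Impurity-based importances sum to 1; sort so the strongest factors come first.
ranking = pd.Series(forest.feature_importances_, index=data.columns)
ranking = ranking.sort_values(ascending=False)
top_two = set(ranking.index[:2])
print(ranking)
```

Global importances like these rank features across the whole dataset, while a per-prediction breakdown shows how each feature pushed an individual household's prediction up or down.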

Final predicted results on a map generated using QGIS with a Plotly extension.

If you want to learn more, including the GitHub repository code and a PowerPoint presentation, please visit my website.




GIS/Data Analyst/Data Scientist

Melissa Anthony
