Food Insecurity in America during COVID

Image: USDA SNAP

There are many at-risk groups during COVID, and food insecurity becomes a major concern when assistance programs are defunded in the middle of a crisis. News outlets such as NPR and the NY Times have started reporting on this. For this reason, I wanted to do a risk assessment of the demographics most vulnerable to food insecurity during COVID. I used SNAP (formerly known as food stamps) Quality Control (QC) datasets from the USDA to build a prediction model.

Image: NY Times

The technical stuff…

  • I wanted to do a 10-year gap analysis that emphasized the effect of the 2008 housing crisis, so I built the model on datasets from 2007 and 2017.
  • I first focused on two contrasting extremes identified in a previous GIS spatial analysis: New Mexico, an emerging hot spot, and Nebraska, an emerging cold spot.
  • Each dataset started with over 40k rows and 800 columns. Using a combination of correlation analysis, high nullity, data insight, and domain knowledge, I whittled the data down to 33 columns and a final dataset of 4k records. I streamlined this work with reusable Python reference scripts for frequently used code.
  • I used a voting ensemble of Random Forest, Gradient Boosting, and Bagging classifiers, which reached a final cross-validation (CV) score of 95%.
Report visualization generated using treeinterpreter with scikit-learn
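The column reduction and ensemble steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's code: the thresholds `MAX_NULL_FRACTION` and `MAX_ABS_CORRELATION`, the function names, and the toy data are all hypothetical, and the domain-knowledge part of the reduction cannot be automated this way. `hard_vote` shows majority (hard) voting over three models' predicted labels; the post does not say whether hard or soft voting was used. In scikit-learn the same idea is expressed with `VotingClassifier(estimators=[...], voting="hard")`.

```python
import math
from collections import Counter

# Hypothetical thresholds; the post does not state the values actually used.
MAX_NULL_FRACTION = 0.5
MAX_ABS_CORRELATION = 0.9


def null_fraction(values):
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_columns(columns):
    """columns: dict of name -> list of numeric values (None = missing).

    Drops sparse columns, then greedily keeps a column only if it is not
    highly correlated with any column already kept (dict order decides).
    """
    dense = {name: vals for name, vals in columns.items()
             if null_fraction(vals) <= MAX_NULL_FRACTION}
    kept = []
    for name, vals in dense.items():
        if all(abs(pearson(vals, dense[other])) <= MAX_ABS_CORRELATION
               for other in kept):
            kept.append(name)
    return kept


def hard_vote(*model_predictions):
    """Majority vote across per-model label lists (ties go to the label seen first)."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*model_predictions)]


if __name__ == "__main__":
    # Toy, made-up columns: "rent_plus_utilities" duplicates "rent",
    # and "mostly_missing" is 75% null, so both are dropped.
    toy = {
        "rent": [500, 700, 900, 1100],
        "rent_plus_utilities": [600, 800, 1000, 1200],
        "household_size": [1, 3, 2, 2],
        "mostly_missing": [None, None, None, 4],
    }
    print(filter_columns(toy))
    # Hypothetical 0/1 labels from three models for four households.
    print(hard_vote([1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1]))
```

Greedy correlation filtering keeps whichever column of a correlated pair comes first, so column order matters; choosing which member of a pair to keep is where domain knowledge usually comes in.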

What was interesting?

I deliberately chose an interpretable model because I wanted to see which factors contribute most to food-insecure communities. I was surprised throughout my analysis, from the EDA to the final interpreted results. The initial exploration showed that the households requesting SNAP were not large families: they had one or two children at most and few elderly members. One pattern stood out early: almost half of the households were headed by single mothers. As I got further into the EDA, more correlations emerged, such as fewer working poor in 2017 than in 2007 and more assistance programs available to applicants. At this point, I noticed that New Mexico seemed more vulnerable to economic changes than Nebraska.

Ultimately, the biggest surprise came from interpreting the model itself: the four strongest predictors of a household requesting SNAP all related to how expensive and how stable its housing was.
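treeinterpreter decomposes each prediction of a decision tree into a bias term (the value at the root) plus one contribution per feature, where every split on the decision path credits the change in node value to the feature it split on; for a forest, these per-tree terms are averaged. The sketch below reimplements that idea for a single hand-built tree in plain Python, assuming node values are the share of positive (SNAP-requesting) training households. The tree, thresholds, and feature names such as `shelter_cost` and `months_at_address` are hypothetical and are not the author's model or results.

```python
from collections import defaultdict

# Hypothetical tree: each node's "value" is the share of positive
# (SNAP-requesting) training households that reached that node.
TREE = {
    "value": 0.50,
    "feature": "shelter_cost", "threshold": 800,
    "left": {
        "value": 0.30,
        "feature": "earned_income", "threshold": 1500,
        "left": {"value": 0.45},
        "right": {"value": 0.10},
    },
    "right": {
        "value": 0.70,
        "feature": "months_at_address", "threshold": 12,
        "left": {"value": 0.90},
        "right": {"value": 0.60},
    },
}


def explain(tree, row):
    """Split one prediction into (bias, per-feature contributions, prediction).

    bias + sum(contributions.values()) equals the prediction, up to float error.
    """
    bias = tree["value"]
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:  # internal node: follow the split
        feature = node["feature"]
        child = node["left"] if row[feature] <= node["threshold"] else node["right"]
        # Credit the change in node value to the feature used for this split.
        contributions[feature] += child["value"] - node["value"]
        node = child
    return bias, dict(contributions), node["value"]


def rank_features(tree, rows):
    """Rank features by mean absolute contribution over many rows."""
    totals = defaultdict(float)
    for row in rows:
        _, contributions, _ = explain(tree, row)
        for feature, value in contributions.items():
            totals[feature] += abs(value)
    means = {feature: total / len(rows) for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    household = {"shelter_cost": 1000, "months_at_address": 6, "earned_income": 2000}
    bias, contributions, prediction = explain(TREE, household)
    print(bias, contributions, prediction)
```

Ranking features by mean absolute contribution across many households is one way to arrive at a "top 4" list like the one described above.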

Final predicted results, mapped in QGIS with a Plotly extension.

If you want to learn more, including the GitHub repository code and a PowerPoint presentation, please visit my website.

GIS/Data Analyst/Data Scientist

Melissa Anthony