Predicting the Future: COVID-19 and Contraceptive Supplies
Our earlier work with machine learning showed that it can help us make sense of thousands of journal articles in a short time.
Machine learning can also help us predict outcomes over time for individuals, such as whether they are likely to graduate from college. But what if we need to make predictions for groups: organizations where individuals work, live, or produce goods? Instead of focusing on individual people’s cases and determining who may be in greatest need of an intervention, can we shift the focus to clusters of people and predict their outcomes? We have been exploring such applications of machine learning through two very different use cases: COVID-19 outbreaks in U.S. nursing homes and contraceptive stock levels in Cote d’Ivoire.
Nursing homes in the U.S. have become hotbeds of COVID-19 cases and deaths. By June 2020, nursing home residents accounted for 5% of the total cases and 27% of the total deaths in the U.S. As part of an internal data science sprint, several Abt researchers set out to predict COVID-19 outbreaks in nursing homes. They applied machine learning algorithms that combine multiple decision trees, such as random forest and gradient boosting models, and tuned them using grid search with cross-validation.
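For readers unfamiliar with the technique, here is a minimal sketch of grid search with cross-validation over a random forest, written with scikit-learn. It is not the team's code: the data are synthetic, and the feature count, parameter grid, fold count, and F1 scoring choice are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption): each row plays the role of a facility,
# and roughly 15% of rows are labeled 1 ("outbreak").
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: every combination is fitted and scored with 3-fold
# cross-validation on the training data only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="f1",  # assumed metric; accuracy alone can mislead on imbalanced data
)
search.fit(X_train, y_train)

best_params = search.best_params_           # best combination found by the search
test_f1 = search.score(X_test, y_test)      # F1 of the refitted best model on held-out data
print(best_params, round(test_f1, 3))
```

Swapping in `GradientBoostingClassifier` with its own parameter grid would follow the same pattern.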
The team took advantage of five publicly available datasets at both the facility and county levels to identify (1) which factors contributed to COVID-19 outbreaks in nursing homes, and (2) which facilities were most at risk of having a COVID-19 outbreak three weeks into the future. The COVID-19 pandemic is still evolving in the country, and so is the prediction model. The team’s model is designed to be refreshed every week with new COVID-19 data to improve prediction accuracy.
The preliminary results showed that community spread (measured by the number of new COVID-19 cases in the county where the nursing home is located and in its contiguous counties three to seven weeks prior to the outbreak in a nursing home) was one of the strongest predictors of an outbreak in a nursing home. In addition, the team found that facility size, staffing levels, health inspection five-star ratings, and the percentage of Medicaid residents were among the most important predictors at the nursing home level. The Abt team’s model correctly identified the status of a nursing home three weeks into the future (i.e., COVID-19 outbreak versus no outbreak) in 91% of cases. That accuracy is high in part because most nursing homes did not have an outbreak, so a model can be right most of the time simply by predicting "no outbreak" often. There is room for improvement in reducing false positives (to increase precision) and false negatives (to increase recall).
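To see why accuracy alone can flatter a model when outbreaks are rare, consider this small sketch. The labels and predictions are invented for illustration and are not the team's results; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = outbreak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outbreaks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-outbreaks correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed outbreaks
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Invented example: 100 facilities, 10 of which have an outbreak.
y_true = [1] * 10 + [0] * 90

# A "model" that never predicts an outbreak is still right 90% of the time.
always_negative = [0] * 100
baseline = classification_metrics(y_true, always_negative)

# A model that catches 6 of the 10 outbreaks and raises 3 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 87
model = classification_metrics(y_true, some_model)

print(baseline)
print(model)
```

In this made-up case the two accuracies are close, while precision and recall show that only the second model finds any outbreaks at all.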
How did the lessons learned from predicting COVID-19 outbreaks translate to predicting contraceptive stock levels in Cote d’Ivoire? As it turns out, many of the data structures that involved time lags in COVID-19 case reporting could be carried over to monthly contraceptive stock levels. As part of a USAID competition to build predictive models to help forecast contraceptive use in Cote d’Ivoire, Abt created another data science team to build machine learning models to address this issue. Applying algorithms similar to those the nursing home team used, this team drew on almost four years of monthly data to develop a random forest model that predicts contraceptive stock levels in 156 facilities for 11 different types of contraceptives. The team also processed publicly available geospatial data that incorporated demographic characteristics and used them in the model. The preliminary findings suggest that the model can predict contraceptive stock levels 12% better on average than a simple forecast based on the previous time period. That may understate the model’s performance because we lacked some data that traditional forecasts could include. Moreover, our model can make these predictions two months earlier than a simple forecaster could, increasing its potential utility in the field.
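The general idea of time-lagged features and a "previous period" baseline can be sketched as follows. This is an illustration under assumptions, not the team's pipeline: the stock series is invented, mean absolute error is an assumed metric (the article does not name one), and a simple average of the lags stands in for the random forest.

```python
def make_lagged_rows(series, n_lags, horizon):
    """Pair each target value with the n_lags most recent values that were
    known `horizon` periods before the target period."""
    rows = []
    for t in range(n_lags + horizon - 1, len(series)):
        last_known = t - horizon  # most recent observation available at forecast time
        features = series[last_known - n_lags + 1 : last_known + 1]
        rows.append((features, series[t]))
    return rows

def mean_absolute_error(actuals, predictions):
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)

def relative_improvement(model_error, baseline_error):
    """Fraction by which the model's error is lower than the baseline's."""
    return (baseline_error - model_error) / baseline_error

# Invented monthly stock levels for one facility and one product.
stock = [10, 12, 11, 13, 15, 14]
rows = make_lagged_rows(stock, n_lags=2, horizon=1)
actuals = [target for _, target in rows]

# Simple forecast: repeat the last known value.
naive_preds = [features[-1] for features, _ in rows]
# Stand-in for a trained model: average of the lagged values.
model_preds = [sum(features) / len(features) for features, _ in rows]

naive_mae = mean_absolute_error(actuals, naive_preds)
model_mae = mean_absolute_error(actuals, model_preds)
improvement = relative_improvement(model_mae, naive_mae)
print(naive_mae, model_mae, improvement)
```

Raising `horizon` builds rows from older data only, which is how a model can be asked to forecast further ahead; in practice the lagged rows would be fed to a learner such as a random forest rather than averaged.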
These short-turnaround data science sprints help us learn what the best practices are for predicting outcomes using machine learning. We are incorporating time series data into machine learning algorithms and determining whether the algorithms perform better or worse than traditional approaches to time series forecasting. This is the type of question that is best answered by testing the newest techniques available and implementing them on data and subject areas that Abt knows from previous work.