Student Work


Modeling and Analysis of COVID-19 using Mathematical and Data Analytic Methods


COVID-19 is a highly contagious infectious disease that has spread throughout the world. On January 19th, 2020, the first case in the United States was reported in a patient in Washington State. Since then, COVID-19 has killed over 850,000 people in the United States alone, and despite readily available vaccines and more than two years of pandemic, case counts remain high. To continue fighting the pandemic, it is important to study how COVID-19 develops and which methods mitigate its spread. To contribute to this research, we developed an epidemiological compartmental model of the pandemic as a system of differential equations, from which we derived a formula for the Basic Reproduction Number. Using the model, we conducted a case study of Massachusetts to determine the effects that vaccinations and other preventive measures have on the spread of COVID-19, with the Basic Reproduction Number as the indicator. The model requires a set of parameters that describe the behavior of COVID-19: the rate at which infected people recover, the rate at which natural immunity wanes, the birth/death rate, the rate at which vaccine-induced immunity wanes, the vaccination rate, the daily reported cases of COVID-19, and the transmission rate. Using online research and the Massachusetts Government Response Reporting Website, we found values for all parameters except the transmission rate. We then solved for the transmission rate through a statistical analysis that compared the model's expected cases to the reported cases. Once all parameters were determined, we computed the Basic Reproduction Number and explained the changes in its behavior by comparing its time series against changes in travel restrictions, government-mandated lockdowns, mask mandates, and the vaccination rate.
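As a rough illustration of the kind of compartmental model described above, the sketch below integrates a four-compartment system (susceptible, infected, recovered, vaccinated) and evaluates a reproduction number at the disease-free equilibrium. The compartment structure, the symbols `beta`, `gamma`, `omega`, `alpha`, `nu`, and `mu`, the assumption that vaccination fully protects until immunity wanes, and all numeric values are assumptions made for this example. They are not the equations or parameter values used in the report.

```python
from dataclasses import dataclass


@dataclass
class Params:
    beta: float   # transmission rate (assumed symbol)
    gamma: float  # recovery rate
    omega: float  # rate at which natural immunity wanes
    alpha: float  # rate at which vaccine immunity wanes
    nu: float     # vaccination rate
    mu: float     # birth/death rate


def derivatives(state, p):
    """Right-hand side of the assumed S-I-R-V system (population fractions)."""
    s, i, r, v = state
    ds = p.mu - p.beta * s * i - (p.nu + p.mu) * s + p.omega * r + p.alpha * v
    di = p.beta * s * i - (p.gamma + p.mu) * i
    dr = p.gamma * i - (p.omega + p.mu) * r
    dv = p.nu * s - (p.alpha + p.mu) * v
    return (ds, di, dr, dv)


def rk4_step(state, p, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shift(base, slope, h):
        return tuple(x + h * dx for x, dx in zip(base, slope))

    k1 = derivatives(state, p)
    k2 = derivatives(shift(state, k1, dt / 2), p)
    k3 = derivatives(shift(state, k2, dt / 2), p)
    k4 = derivatives(shift(state, k3, dt), p)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, p, days, dt=0.1):
    """Return the list of states from time 0 to `days`, one entry per step."""
    trajectory = [state]
    for _ in range(round(days / dt)):
        state = rk4_step(state, p, dt)
        trajectory.append(state)
    return trajectory


def reproduction_number(p):
    """Reproduction number at the disease-free equilibrium of this sketch.

    With no vaccination (nu = 0) it reduces to beta / (gamma + mu).
    """
    if p.nu == 0:
        s0 = 1.0
    else:
        s0 = (p.alpha + p.mu) / (p.alpha + p.mu + p.nu)
    return p.beta * s0 / (p.gamma + p.mu)


if __name__ == "__main__":
    # Illustrative values only; not estimates from the report.
    p = Params(beta=0.3, gamma=0.1, omega=0.01, alpha=0.005, nu=0.005, mu=0.0)
    final = simulate((0.99, 0.01, 0.0, 0.0), p, days=10)[-1]
    print("reproduction number:", reproduction_number(p))
    print("state after 10 days:", final)
```

In this sketch, a value above 1 means an initial infection grows, and a value below 1 means it dies out, which is how the number can serve as an indicator of whether a mitigation measure is working.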
The comparison sheds light on which mitigation techniques were most effective at preventing the spread of COVID-19. It also indicates whether COVID-19 will settle into an endemic state or diminish to a disease-free state. Furthermore, we developed several regression models to assess the effectiveness of COVID-19 mitigation techniques. A multiple linear regression describes the extent to which mask mandates, social event restrictions, business closings, seasonal changes, and vaccinations influence the Basic Reproduction Number that we previously solved for. A logistic regression was developed to determine which restrictions are most likely to result in an endemic or disease-free state. Together, these regressions describe the significance of various restrictions in preventing the spread of COVID-19. We hope that this information will be helpful for future research into COVID-19 and for determining a more accurate Basic Reproduction Number. Further research will improve our understanding of the behavior and spread of COVID-19, allowing for a more comprehensive response to the pandemic. In addition to differential-equation-based methods of studying COVID-19's reproduction number and modeling its spread, we also studied COVID-19 through a data-driven lens. We collected and cleaned Massachusetts COVID-19 data, acquired from the Massachusetts government website, to forecast future cases. The model selected for this task was the Autoregressive Integrated Moving Average (ARIMA) model. By fitting models with both manually selected and automatically selected parameters, we produced forecasts of COVID cases a week in advance; however, the forecasts were not as accurate as we had hoped.
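To make the forecasting step concrete, here is a minimal sketch of the simplest non-trivial member of the ARIMA family, ARIMA(1,1,0) without a drift term, fitted by conditional least squares and used to forecast seven days ahead. The model order, the fitting method, the function names `fit_arima_110` and `forecast_arima_110`, and the sample numbers are assumptions for illustration. The report's own orders were chosen manually and automatically and are not stated in this abstract.

```python
def fit_arima_110(series):
    """Estimate the AR(1) coefficient on the first-differenced series.

    Conditional least squares without a constant:
    phi = sum(d[t] * d[t-1]) / sum(d[t-1] ** 2).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, horizon=7):
    """Forecast `horizon` steps ahead by rolling the differenced AR(1) forward."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(horizon):
        diff = phi * diff   # expected next daily change
        level += diff       # undo the differencing
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical daily case counts, not data from the report.
    cases = [120, 150, 170, 185, 195, 203, 208]
    phi = fit_arima_110(cases)
    print("phi:", phi)
    print("7-day forecast:", forecast_arima_110(cases, phi))
```

A model this small cannot capture weekly reporting cycles or sudden variant-driven surges, which is one plausible reason simple ARIMA forecasts of case counts can underperform.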
The last portion of the project was a clustering analysis to determine which states or groups of states kept COVID cases low relative to their population. All daily vaccination and COVID case data was collected from the Centers for Disease Control and Prevention (CDC) and cleaned for analysis. We then separated the data into three key time periods, Pre-Delta, Delta, and Omicron, in which we would expect different results. By aggregating each state's COVID cases per month and its total vaccinations in that month, we created three scatter plots of vaccinations per population against cases per population, one for each time period. The points in each scatter plot were rescaled with min-max scaling and then clustered using K-Means. From the resulting clusters, we were able to identify states that had a high COVID positivity rate despite a high number of vaccinations. Another finding is a clear drop-off in the efficacy of vaccination as a preventive measure against COVID spread as new variants emerged.
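The scaling and clustering steps can be sketched as follows. This is a small standard-library version of min-max scaling followed by K-Means (Lloyd's algorithm), written only to show the mechanics. The deterministic farthest-point initialization, the choice of two clusters, and the sample points are assumptions for this example and are not taken from the report.

```python
def min_max_scale(points):
    """Rescale every coordinate to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        tuple((x - lo) / sp if sp else 0.0 for x, lo, sp in zip(p, lows, spans))
        for p in points
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (vaccinations per capita, cases per capita) pairs.
    states = [(0.10, 0.080), (0.12, 0.090), (0.11, 0.070),
              (0.60, 0.020), (0.62, 0.030), (0.58, 0.025)]
    labels, _ = kmeans(min_max_scale(states), k=2)
    print(labels)
```

Scaling first matters because K-Means uses Euclidean distance: without it, whichever axis has the larger numeric range would dominate the cluster assignments.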

  • This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
  • E-project-032122-094122
  • 52296
  • 2022
Date created
  • 2022-03-21