Thursday, April 2, 2020

COVID-19 Good News Day 1

What do Kentucky, Missouri, Montana, Nebraska, North Carolina, Oregon, Washington, and Vermont have in common?

Each of these states have experienced a decline in in the new number of deaths from COVID-19 reported one week earlier.

Here is a chart that shows the change in the number of deaths reported each day from the deaths reported seven days ago, relative to each day on the y-axis, and the total number of deaths reported for each state on the x-axis.  The significance of this, if my understanding is correct, is that epidemiologists can use this to help determine the strength/momentum of an epidemic when the number of deaths from the epidemic are increasing exponentially.

My goal with sharing this information is to highlight states, communities, and countries that seem to be experiencing a change in the statistics for the better.  I understand that epidemiologists normally use confirmed cases, not deaths; however, it is my opinion that COVID-19 has spread far beyond the numbers of the people that have been tested.  However, I will switch, If someone can convince me as to why the number of confirmed COVID-19 cases is a more correct data point to model from than deaths.

I've attached two charts that show these changes, one chart with all the states I mentioned, and one chart without the state of Washington for reasons that you can see in the charts.
Chart 1: All 8 States

As you can see from Chart 1, there is so much more data from the state of Washington that it dwarfs all the other states.  I will try not to speculate to much but it seems that there have been so many more cases of COVID-19 in Washington because it was one of the first and hardest hit when the SARS-CoV2 virus landed on shores of the US.  

To help see the data for the other states, here is the same chart minus the data from the state of Washington.

Chart 2: Just 7 States

OK, this is still confusing so here is the chart for each state starting with Vermont, the state, from these 7 that have reported the most deaths per person. As shown in this Chart 3

Chart 3

Vermont:

Chart 4

Washington:

Chart 5

Montana:

Chart 6

Oregon
Chart 7

Nebraska

Chart 8

NorthCarolina

Chart 9

I sincerely hope that these states really do see a sustained reductions in the number of new deaths reported every day, week over week, from COVID-19, that this information helps people commit to being part of the solution, and that there are real reasons the disease has not progressed in these states as it has in others.

If anyone is wondering if I cherry-picked these states, I have not.  I have the same data Johns Hopkins is using to report on.

Less than a month ago, COVID-19 was something really bad happening somewhere else; however, it was clear that it would touch us soon.  That day has arrived and COVID-19 is in every state, and likely in every country, I think.



Wednesday, March 25, 2020

How to import COVID-19 From Johns Hopkins University


How to Plot COVID-19 Data from Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University





Expand the pane on the bottom in the middle


Click on the downloadable database from GitHub here


Login to GitHub if you have not already done so
Navigate into the csse_covid_19_data folder


Navigate into the cse_covid_19_time_series


In the csse_covid_19_time_series are several files.  Up until a couple days ago I was using the following files
time_series_19-covid-Confirmed.csv
time_series_19-covid-Deaths.csv
time_series_19-covid-Recovered.csv

As of 3/23/2020 these files are not being updated any more.  The README.md file instructs us to use the new files named:
time_series_covid19_confirmed_global.csv
time_series_covid19_deaths_global.csv

For this example I’ll start with confirmed covid19 cases
Open the file time_series_covid19_confirmed_global.csv in your web browser.   We are going to go to the raw file, copy the URL, then use the URL for the raw file to get the data into Power BI desktop.  The steps are similar for Excel if you have the Power BI plugin for Excel.


For each file, click the “Raw” button to bring up the URL for the raw csv file.  Copy the URLs and save them for when we retrieve the data into Power BI


Copy the URL for the raw data for the confirmed cases and the deaths and recoveries.  I’ve pasted them here so you don’t have to do this through GitHub, but be warned, this data changes and they may introduce new files or change the  names so if you want to make sure you have the latest data you need to get the latest URL for the raw files from GitHub
Confirmed Cases:

Deaths:

As of this moment, there isn’t a new file for Recoveries yet so I’ll leave that to you to find when it’s published.

Next, make sure you have PowerBI desktop installed.  At present, you cannot edit queries in the online version of PowerBI.  Queries can  only be edited with the PowerBI desktop.

You can install the Power BI desktop from here

Follow these steps to import the data into PowerBI
Open PowerBI Desktop


From the “Get data” pull down list select “Web” to import the data from GitHub


Enter the URL for the confirmed cases then click OK, then click Load.  Don’t worry, I’ll show you how to transform the data next


Repeat the same steps for COVID-19 deaths that you did to import the Confirmed data.

Create new views of the imported data

On the left side of your scree you should see two queries/tables.  Right-click on the ellipses “. . .” to bring up “More Options” then select “Edit Query”

Start with time_series_covid19_confirmed_global
In properties, change the name to Confirmed
From Home tab, click “Use First Row as Headers”

Unpivot the table by doing the following
Click on the column named “1/22/20” then select all of the columns to the right
From the Transform tab, click “Unpivot Columns”
Change the column named “Attribute” to “Date” and the column name “Value” to “Confirmed”
On the column you renamed to “Date”, right click on the column and mouse over “Change Data” then click on “Date”

Repeat these same steps for the time_series_covid19_deaths_global query/table except name the “Values” column “Deaths”

Save your work

I’ll now show you how to create new queries from the queries we imported from the Web
From the list of Queries on the left side, right click on “Conformed” and select “Reference”

There is now a new query named “Confirmed (2)”, rename the query to “DayCountryConfirmed”

Again, right-click on the “Confirmed” query on the left side, then right click on “Confirmed” and select “Reference”

Again, there is a new query named “Confirmed (2)”, rename the query to “DayUSStateConfirmed”

Let’s modify the “DayCountryConfirmed” query to remove unneeded data, and group the data by date and Country with a sum of the deaths in each country on each day

Click on the query name “DayCountryConfirmed”

Drag the date column all the way to the left so it’s the first column

Remove the Lat and Long columns because we won’t use them but we can get them back from the original query.

Group and sort the data by Date First, Country Second, and sum the Confirmed by

In the Transform tab, Ctrl-click Group By
The Group By dialog box is displayed

Change from “Basic” to “Advanced” so we can add multiple group-by conditions

The first condition needs to be Date

Click “Add Grouping”

From the pull-down list select “Country/Region”

Under “New column name” type “Confirmed”, then from the pull-down list under “Operation” select “Sum”, then from the pull-down list under “Column” select Confirmed

Click OK

Now let’s sort things.  First by Date, then by Country by

Next to the column name for “Date” click the box just to the right of the column and select “Sort Ascending”

Next to the column name for “Country/Region” click the box just to the right of the column and select “Sort Ascending”

Check for any data quality problems by selecting a specific country (like US) and make sure the data is correct and matches what is being reported.

Compare data to