Wednesday, March 25, 2020

How to import COVID-19 From Johns Hopkins University


How to Plot COVID-19 Data from Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University





Expand the pane on the bottom in the middle


Click on the downloadable database from GitHub here


Login to GitHub if you have not already done so
Navigate into the csse_covid_19_data folder


Navigate into the cse_covid_19_time_series


In the csse_covid_19_time_series are several files.  Up until a couple days ago I was using the following files
time_series_19-covid-Confirmed.csv
time_series_19-covid-Deaths.csv
time_series_19-covid-Recovered.csv

As of 3/23/2020 these files are not being updated any more.  The README.md file instructs us to use the new files named:
time_series_covid19_confirmed_global.csv
time_series_covid19_deaths_global.csv

For this example I’ll start with confirmed covid19 cases
Open the file time_series_covid19_confirmed_global.csv in your web browser.   We are going to go to the raw file, copy the URL, then use the URL for the raw file to get the data into Power BI desktop.  The steps are similar for Excel if you have the Power BI plugin for Excel.


For each file, click the “Raw” button to bring up the URL for the raw csv file.  Copy the URLs and save them for when we retrieve the data into Power BI


Copy the URL for the raw data for the confirmed cases and the deaths and recoveries.  I’ve pasted them here so you don’t have to do this through GitHub, but be warned, this data changes and they may introduce new files or change the  names so if you want to make sure you have the latest data you need to get the latest URL for the raw files from GitHub
Confirmed Cases:

Deaths:

As of this moment, there isn’t a new file for Recoveries yet so I’ll leave that to you to find when it’s published.

Next, make sure you have PowerBI desktop installed.  At present, you cannot edit queries in the online version of PowerBI.  Queries can  only be edited with the PowerBI desktop.

You can install the Power BI desktop from here

Follow these steps to import the data into PowerBI
Open PowerBI Desktop


From the “Get data” pull down list select “Web” to import the data from GitHub


Enter the URL for the confirmed cases then click OK, then click Load.  Don’t worry, I’ll show you how to transform the data next


Repeat the same steps for COVID-19 deaths that you did to import the Confirmed data.

Create new views of the imported data

On the left side of your scree you should see two queries/tables.  Right-click on the ellipses “. . .” to bring up “More Options” then select “Edit Query”

Start with time_series_covid19_confirmed_global
In properties, change the name to Confirmed
From Home tab, click “Use First Row as Headers”

Unpivot the table by doing the following
Click on the column named “1/22/20” then select all of the columns to the right
From the Transform tab, click “Unpivot Columns”
Change the column named “Attribute” to “Date” and the column name “Value” to “Confirmed”
On the column you renamed to “Date”, right click on the column and mouse over “Change Data” then click on “Date”

Repeat these same steps for the time_series_covid19_deaths_global query/table except name the “Values” column “Deaths”

Save your work

I’ll now show you how to create new queries from the queries we imported from the Web
From the list of Queries on the left side, right click on “Conformed” and select “Reference”

There is now a new query named “Confirmed (2)”, rename the query to “DayCountryConfirmed”

Again, right-click on the “Confirmed” query on the left side, then right click on “Confirmed” and select “Reference”

Again, there is a new query named “Confirmed (2)”, rename the query to “DayUSStateConfirmed”

Let’s modify the “DayCountryConfirmed” query to remove unneeded data, and group the data by date and Country with a sum of the deaths in each country on each day

Click on the query name “DayCountryConfirmed”

Drag the date column all the way to the left so it’s the first column

Remove the Lat and Long columns because we won’t use them but we can get them back from the original query.

Group and sort the data by Date First, Country Second, and sum the Confirmed by

In the Transform tab, Ctrl-click Group By
The Group By dialog box is displayed

Change from “Basic” to “Advanced” so we can add multiple group-by conditions

The first condition needs to be Date

Click “Add Grouping”

From the pull-down list select “Country/Region”

Under “New column name” type “Confirmed”, then from the pull-down list under “Operation” select “Sum”, then from the pull-down list under “Column” select Confirmed

Click OK

Now let’s sort things.  First by Date, then by Country by

Next to the column name for “Date” click the box just to the right of the column and select “Sort Ascending”

Next to the column name for “Country/Region” click the box just to the right of the column and select “Sort Ascending”

Check for any data quality problems by selecting a specific country (like US) and make sure the data is correct and matches what is being reported.

Compare data to


No comments:

Post a Comment