How to Plot COVID-19 Data from Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University
Navigate to https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Expand the pane on the bottom in the middle
Click on the downloadable database from GitHub here
Login to GitHub if you have not already done so
Navigate into the csse_covid_19_data folder
Navigate into the cse_covid_19_time_series
In the csse_covid_19_time_series are several files. Up until a couple days ago I was using the
following files
time_series_19-covid-Confirmed.csv
time_series_19-covid-Deaths.csv
time_series_19-covid-Recovered.csv
time_series_19-covid-Deaths.csv
time_series_19-covid-Recovered.csv
As of 3/23/2020 these files are not being updated any
more. The README.md file instructs us to
use the new files named:
time_series_covid19_confirmed_global.csv
time_series_covid19_deaths_global.csv
For this example I’ll start with confirmed covid19 cases
Open the file time_series_covid19_confirmed_global.csv in your web
browser. We are going to go to the raw
file, copy the URL, then use the URL for the raw file to get the data into Power
BI desktop. The steps are similar for
Excel if you have the Power BI plugin for Excel.
For each file, click the “Raw” button to bring up the URL
for the raw csv file. Copy the URLs and
save them for when we retrieve the data into Power BI
Copy the URL for the raw data for the confirmed cases and
the deaths and recoveries. I’ve pasted
them here so you don’t have to do this through GitHub, but be warned, this data
changes and they may introduce new files or change the names so if you want to make sure you have
the latest data you need to get the latest URL for the raw files from GitHub
Confirmed Cases:
Deaths:
As of this moment, there isn’t a new file for Recoveries yet
so I’ll leave that to you to find when it’s published.
Next, make sure you have PowerBI desktop installed. At present, you cannot edit queries in the online version of PowerBI. Queries can only be edited with the PowerBI desktop.
You can install the Power BI desktop from here
Next, make sure you have PowerBI desktop installed. At present, you cannot edit queries in the online version of PowerBI. Queries can only be edited with the PowerBI desktop.
You can install the Power BI desktop from here
Follow these steps to import the data into PowerBI
Open PowerBI Desktop
From the “Get data” pull down list select “Web” to import
the data from GitHub
Enter the URL for the confirmed cases then click OK, then
click Load. Don’t worry, I’ll show you
how to transform the data next
Repeat the same steps for COVID-19 deaths that you did to
import the Confirmed data.
Create new views of the imported data
On the left side of your scree you should see two queries/tables. Right-click on the ellipses “. . .” to bring
up “More Options” then select “Edit Query”
Start with time_series_covid19_confirmed_global
In properties, change the name to Confirmed
From Home tab, click “Use First Row as Headers”
Unpivot the table by doing the following
Click on the column named “1/22/20” then select all of the
columns to the right
From the Transform tab, click “Unpivot Columns”
Change the column named “Attribute” to “Date” and the column
name “Value” to “Confirmed”
On the column you renamed to “Date”, right click on the
column and mouse over “Change Data” then click on “Date”
Repeat these same steps for the time_series_covid19_deaths_global
query/table except name the “Values” column “Deaths”
Save your work
I’ll now show you how to create new queries from the queries
we imported from the Web
From the list of Queries on the left side, right click on “Conformed”
and select “Reference”
There is now a new query named “Confirmed (2)”, rename the
query to “DayCountryConfirmed”
Again, right-click on the “Confirmed” query on the left
side, then right click on “Confirmed” and select “Reference”
Again, there is a new query named “Confirmed (2)”, rename
the query to “DayUSStateConfirmed”
Let’s modify the “DayCountryConfirmed” query to remove unneeded
data, and group the data by date and Country with a sum of the deaths in each
country on each day
Click on the query name “DayCountryConfirmed”
Drag the date column all the way to the left so it’s the
first column
Remove the Lat and Long columns because we won’t use them
but we can get them back from the original query.
Group and sort the data by Date First, Country Second, and sum the Confirmed by
In the Transform tab, Ctrl-click Group By
The Group By dialog box is displayed
Change from “Basic” to “Advanced” so we can add multiple group-by
conditions
The first condition needs to be Date
Click “Add Grouping”
From the pull-down list select “Country/Region”
Under “New column name” type “Confirmed”, then from the
pull-down list under “Operation” select “Sum”, then from the pull-down list
under “Column” select Confirmed
Click OK
Now let’s sort things.
First by Date, then by Country by
Next to the column name for “Date” click the box just to the
right of the column and select “Sort Ascending”
Next to the column name for “Country/Region” click the box
just to the right of the column and select “Sort Ascending”
Check for any data quality problems by selecting a specific
country (like US) and make sure the data is correct and matches what is being
reported.
Compare data to










No comments:
Post a Comment