This module will introduce you to Shiny, a framework that integrates with RStudio to construct web-based dashboards. We will work through a number of simple examples of loading data, visualizing it with R's built-in graphics operations, then integrating those visualizations into an interactive Shiny web dashboard, which can be viewed online by anyone with a web browser.
In order to use the Shiny examples in this tutorial, you will need to install the R packages they rely on (at a minimum, ggplot2 and shiny) using install.packages() at the R command line.
R provides numerous ways to generate plots or
visualizations of data stored in vectors, matrices, tables,
and data frames. The two most common methods for visualizing data are
R's base graphics, which are included as part of the standard R distribution,
and ggplot2, a package by
Hadley Wickham specifically designed to support the flexible design of
plots ranging from simple to complex.
As an example, the following code produces a bar graph of the
heights of the trees in the built-in trees
data
frame.
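A minimal sketch of the base-graphics version, assuming one bar per row of trees. The colour, axis labels, and title are illustrative choices, not necessarily those of the original listing.

```r
# Bar graph of tree heights using base graphics.
# barplot() returns the x-position of each bar's midpoint.
mids <- barplot(
  trees$Height,
  names.arg = row.names(trees),  # label bars with the row indices 1..31
  col = "lightblue",             # illustrative colour choice
  xlab = "ID",
  ylab = "Height",
  main = "Tree Height"
)
```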
The same bar graph can be produced using ggplot2
as
follows.
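The commands discussed in this section can be assembled into a short script along these lines. This is a sketch that assumes the ggplot2 package is installed; the variable name p is only a convenience for printing.

```r
library(ggplot2)

# Row indices as a character vector: "1", "2", ..., "31"
x_lbl <- row.names(trees)

# Convert to a factor whose levels keep the original 1..31 order
x_lbl <- factor(x_lbl, levels = unique(x_lbl))

# One light blue bar per tree; stat = "identity" uses Height directly
p <- ggplot(data = trees, aes(x = x_lbl, y = Height)) +
  geom_bar(fill = "lightblue", stat = "identity") +
  xlab("ID") +
  ggtitle("Tree Height")
print(p)
```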
ggplot2
is based on the
grammar of graphics, a foundation proposed by Leland Wilkinson to
generate visualizations. In this context, a chart is divided into
individual components — data, annotations, scales, and so on
— that are connected together using addition. In the example
above, each command has the following meaning.
library( "ggplot2" )
Load the ggplot2 package.

x_lbl = row.names( trees )
row.names( trees ) provides a list of row indices for the data frame as a character vector.

x_lbl = factor( x_lbl, levels=unique( x_lbl ) )
Convert x_lbl from a character vector into a factor (i.e., a list of categories). Because of the way characters are sorted, factor( x_lbl ) would produce an order of 1, 10, 11, …, 2, 20, 21, …, 3, 30, 31, 4, 5, … . To get the proper order of 1, 2, 3, … 31, we specify the levels in the factor with levels=unique( x_lbl ). This produces a properly ordered list of unique category values (in this case, the numbers from 1 to 31 in order). Converting the indices to a numeric vector would remove the need for unique() in this scenario, since the numeric vector would be properly ordered, and there are no duplicate indices.

ggplot( data=trees, aes(x=x_lbl, y=Height) ) + geom_bar( fill="lightblue", stat="identity" ) + xlab( "ID" ) + ggtitle( "Tree Height" )
Plot the Height of each tree as a light blue bar (stat="identity" uses the Height values directly rather than counting rows), label the x-axis "ID", and title the chart "Tree Height".
Throughout these notes, we will use ggplot
for our
examples.
We provide examples of the standard charts you're likely to use when you're building an R+Shiny web application. Basic bar charts have been covered above. Below we examine variations on bar charts, line charts, pie charts, scatterplots, and histograms.
In addition to basic bar charts, we often want to construct
stacked or side-by-side bar charts to compare and contrast
subcategories of our data. Consider the beaver1
dataset
included in the standard R install.
The df[ order( df$activ ), ]
command is critical, because
it guarantees activity values for individual days are grouped
together. Without this, a day's active and inactive values will be
spread throughout the day, and the stacked chart will look like a
single bar with lines through it where the activ
factor
changes its value.
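A minimal sketch of a stacked bar chart for beaver1 is given here. It assumes bar height is the number of observations per day and that the 0/1 activ values are relabelled "inactive" and "active"; the original listing may have used a different y value.

```r
library(ggplot2)

df <- beaver1
df$day <- factor(df$day)                                        # days as categories
df$activ <- factor(df$activ, labels = c("inactive", "active"))  # 0/1 -> labels
df <- df[order(df$activ), ]                                     # group rows by activity

# One stacked bar per day, split by activity; bar height is the
# number of observations (an assumed choice of y value).
p <- ggplot(df, aes(x = day, fill = activ)) +
  geom_bar() +
  xlab("Day") + ylab("Observations") +
  ggtitle("Beaver Activity by Day")
print(p)
```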
Side-by-side bar charts are generated similarly. Here, however, we need to aggregate a total (e.g., a number of occurrences of each activity) for each subcategory (e.g., for each day). Once we have done that, we can plot the subcategories as side-by-side bars.
Here, we select the day
and activ
columns
from the beaver1
dataset, use table
to
compute the frequency of activity for each day, convert the result to
a data frame, then plot it as a side-by-side bar. We
use position="dodge"
in the geom_bar
command
to get a side-by-side bar graph. The
default, position="stack"
, would give us a stacked bar
chart.
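The steps just described (select columns, tabulate, convert to a data frame, plot with position="dodge") might look like the following sketch. The column names day, activ, and Freq are what as.data.frame( table() ) produces for this input.

```r
library(ggplot2)

# Keep only the day and activity columns
df <- beaver1[, c("day", "activ")]

# Frequency of each activity value per day, as a data frame
# with columns day, activ, and Freq
df <- as.data.frame(table(df))

# position = "dodge" places the bars for each day side by side
p <- ggplot(df, aes(x = day, y = Freq, fill = activ)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Beaver Activity by Day")
print(p)
```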
[Figure: ggplot stacked bar chart (left) and the corresponding side-by-side bar chart (right)]
Building a line chart in ggplot
is very similar to
building a bar chart, except that we substitute geom_bar
with geom_line
. This shows one of the strengths of
ggplot
. Since the data, representation of the data, and
decorations on the representation are all built separately and
combined, switching the representation from bar to line involves
changing only one part of the overall ggplot
command.
Notice here that we have treated the data frame row indices as a
sequence of numeric values: x_lbl <- as.numeric( row.names(
trees ) )
, and not as a factor variable. In a line chart, by
default ggplot
uses the combination of all factor
variables to group the points. This is not what we want, so we cannot
use a factor variable for the x-axis. An alternative to this is to
manually specify the grouping. Using the aesthetic specification
aes( group=1 )
specifies we want a single line connecting
all the points.
[Figure: ggplot line chart]
Notice that we also used geom_point
to add an open circle
at each height value. The shape
argument defines how
points are displayed. Shapes
are defined numerically to provide open and filled circles,
squares, triangles, and other glyphs like plus and X symbols.
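As a sketch of the line chart described here, the following uses numeric row indices on the x-axis and shape = 1 (an open circle) for the points. The line colour and point size are illustrative assumptions.

```r
library(ggplot2)

# Numeric x values, so all points fall on a single line
x_lbl <- as.numeric(row.names(trees))

p <- ggplot(trees, aes(x = x_lbl, y = Height)) +
  geom_line(colour = "red") +          # illustrative colour
  geom_point(shape = 1, size = 3) +    # shape 1 is an open circle
  xlab("ID") +
  ggtitle("Tree Height")
print(p)
```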
We can also build multi-line charts, where each line represents a
separate factor. Consider the chickwts
dataset, which
lists chicken weight by the type of feed it was given. The following
code generates a multi-line chart, one line per feed type, showing the
weight of each chicken that received the given feed.
[Figure: ggplot multi-line chart]
The group=feed
argument in the initial
ggplot
command defines which variable to use to split the
dataset into individual lines.
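A possible multi-line chart for chickwts is sketched below. The x-axis here is an assumed per-feed chicken index (the column idx is introduced only for this example), since the dataset itself has no such column.

```r
library(ggplot2)

df <- chickwts
# Number the chickens 1, 2, 3, ... within each feed type, so every
# feed gets its own x positions (an assumed choice of x-axis).
df$idx <- ave(seq_along(df$feed), df$feed, FUN = seq_along)

# group = feed splits the data into one line per feed type
p <- ggplot(df, aes(x = idx, y = weight, group = feed, colour = feed)) +
  geom_line() +
  geom_point(shape = 1) +
  xlab("Chicken") + ylab("Weight") +
  ggtitle("Chicken Weight by Feed Type")
print(p)
```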
Pie charts are closely related to stacked bar
graphs. In ggplot
terms, you can think of a pie chart as
a stacked bar chart that's been "wrapped" to form a circle. The code
below uses the built in chickwts
dataset to build a
stacked bar chart of average chicken weight by feed type.
[Figure: ggplot stacked bar chart (left) and the corresponding pie chart (right)]
Notice the R commands aggregate( chickwts$weight, by=list(
chickwts$feed), FUN=mean )
and df[ order( -df$weight ),
]
. The first command aggregates chicken weight by feed type,
producing a data frame with a single average weight entry for each
feed type. The second command sorts the data frame descending by
average weight. We want to do this, because in a pie chart we want to
display slices in descending order from largest to smallest.
To convert the stacked bar chart into a pie chart, we simply add an
additional ggplot
command coord_polar
to
plot the data in polar coordinates. This produces the pie chart shown
above and to the right.
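The aggregate, sort, and coord_polar steps can be sketched as follows. Renaming the aggregated columns to feed and weight and re-levelling the feed factor are assumptions made so that the stacking order follows the sorted rows.

```r
library(ggplot2)

# Average weight per feed type (aggregate names the columns Group.1 and x)
df <- aggregate(chickwts$weight, by = list(chickwts$feed), FUN = mean)
names(df) <- c("feed", "weight")

# Sort descending by average weight, and order the factor levels to match
df <- df[order(-df$weight), ]
df$feed <- factor(df$feed, levels = df$feed)

# A single stacked bar ...
bar <- ggplot(df, aes(x = "", y = weight, fill = feed)) +
  geom_bar(stat = "identity", width = 1)

# ... wrapped into a circle by plotting in polar coordinates
pie <- bar + coord_polar(theta = "y")
print(pie)
```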
Here's a slightly more informative and aesthetic version of the pie
chart. You can check your R knowledge and consult the ggplot
documentation to explore the commands used to create this chart.
A scatterplot is normally used to look for relationships between two
variables. For example, suppose we wanted to visually explore whether
a relationship exists between a tree's height and its volume. This
can be done using the geom_point
command.
The figure suggests a relationship
between tree height and volume, but it would be useful to plot a
regression line through the points to see how well it fits the data,
and what its slope is. This can be done in ggplot
using
the geom_smooth
command.
Adding the regression line and confidence interval seems to further
confirm a relationship between tree height and volume. Fitting the
model directly with the lm function confirms a p-value of less
than 0.05 for the slope, the normal cutoff for rejecting the null
hypothesis of no relationship.
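A sketch of the scatterplot, regression line, and p-value check follows. It assumes Height on the x-axis and Volume on the y-axis, and a simple linear model Volume ~ Height.

```r
library(ggplot2)

# Scatterplot with a fitted regression line and confidence band
p <- ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Tree Height vs. Volume")
print(p)

# Fit the same linear model directly and read off the slope's p-value
fit <- lm(Volume ~ Height, data = trees)
p_value <- summary(fit)$coefficients["Height", "Pr(>|t|)"]
print(p_value)
```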
Histograms allow you to: (1) count the number of occurrences in a
categorical variable, or (2) discretize a continuous variable, then
count the number of occurrences of values within a predefined set of
ranges or bins. Both approaches are demonstrated below. The
first uses the built in airquality
dataset and treats
temperature as a factor (i.e., as a categorical variable) to
count the number of temperature occurrences within the dataset. The
second uses the chickwts dataset to count the number of
chicken weights that fall into equal-width bins of eight grams.
If you create a histogram from a discrete variable
(e.g., a factor variable), you use geom_bar
. This
makes sense intuitively, since counting occurrences in a categorical
variable is, in essence, equivalent to generating a bar chart of
counts of the variable's values. If you create a histogram from a
continuous variable, you use geom_histogram
.
In the discrete histogram example above, we use
scale_y_continuous
to explicitly define the tick
positions on the y-axis. In the geom_histogram
example
above, we use the alpha
argument to make each bar
semi-transparent.
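The two histogram styles might be written as in this sketch. The tick positions, colours, and alpha value are illustrative assumptions; only geom_bar, geom_histogram, scale_y_continuous, and the bin width of eight come from the discussion above.

```r
library(ggplot2)

# (1) Discrete histogram: treat temperature as a factor and count occurrences
df <- airquality
df$Temp <- factor(df$Temp)
p_discrete <- ggplot(df, aes(x = Temp)) +
  geom_bar(fill = "lightblue", colour = "white") +
  scale_y_continuous(breaks = seq(0, 12, by = 2)) +   # explicit y-axis ticks
  ggtitle("Temperature Counts")
print(p_discrete)

# (2) Continuous histogram: chicken weights in equal-width bins of 8
p_continuous <- ggplot(chickwts, aes(x = weight)) +
  geom_histogram(binwidth = 8, fill = "lightblue", colour = "white",
                 alpha = 0.7) +                        # semi-transparent bars
  ggtitle("Chicken Weight Distribution")
print(p_continuous)
```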
A final chart that is often useful in statistics is the boxplot, a visualization that identifies the median, the first and third quartile boundaries Q1 and Q3, and the lower and upper "fences", normally placed 1.5 × IQR (the inter-quartile range, Q3 - Q1) below Q1 and above Q3. Any points outside the fences are plotted as outliers.
An example of boxplots of chicken weight by feed type for the chickwts dataset can be constructed as follows.
This boxplot shows only a few outliers in the "sunflower" feed type
category. Another example uses the iris dataset to plot Sepal
Width by Species. This shows a few additional outliers, both above and
below the IQR fences. We have also used geom_dotplot
to
display all of the data points at their corresponding Sepal Width
positions, overlaid on top of the boxplot.
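Both boxplots can be sketched as below. The dot plot's binwidth and alpha are assumed values chosen for legibility, and boxplot.stats() is used only as a quick base-R cross-check of the fence rule (it works from hinges, so its quartiles can differ slightly from ggplot's).

```r
library(ggplot2)

# Chicken weight by feed type
p_feed <- ggplot(chickwts, aes(x = feed, y = weight)) +
  geom_boxplot() +
  ggtitle("Chicken Weight by Feed Type")
print(p_feed)

# Sepal width by species, with every data point overlaid as a dot
p_iris <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", stackdir = "center",
               binwidth = 0.05, alpha = 0.5) +
  ggtitle("Sepal Width by Species")
print(p_iris)

# Points beyond the 1.5 x IQR fences for the sunflower feed type
out <- boxplot.stats(chickwts$weight[chickwts$feed == "sunflower"])$out
print(out)
```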
ggplot
also has the ability to visualize data on maps
using commands like geom_map
and coord_map
.
You can easily add a map projection to the map using
coord_map
.
The R code above produces a basic map of the U.S., then warps it using
an Albers map
projection. An Albers projection requires two parallels to project
about, defined as lat0=29.5
and
lat1=49.5
. The current USGS standard is to display maps
using Albers projection, and for maps of the continental United
States, parallels of 29.5°N and 49.5°N are recommended.
To produce a map with data overlaid, you normally start by drawing a
base map, then adding a second map layer using geom_map
containing the data you want to visualize.
For example, suppose we wanted to visualize a choropleth map of state
population. The R built-in state.x77
matrix contains
various information about US states, including estimated population in
thousands as its first column. We can use this to colour individual
states darker for lower populations and lighter for higher
populations.
The key concept to understand here is how ggplot
maps
regions on the map to data values that drive the region's colour.
This is done with the map_id
aesthetic field. If you
look at the states
data frame, you'll see that the
individual polygons that make up each state are identified by the
state's name, in lowercase.
When we built the choropleth
variable, we included an
ID
column that also used lowercase state name. We then
matched the columns between the states
and
choropleth
data frames using the map_id
aesthetic field, using map_id=region
in the base map and
map_id=ID
in the choropleth layer that fills in the
individual state polygons.
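A sketch of the choropleth follows, assuming the maps and mapproj packages are installed (map_data and coord_map rely on them). The column name pop is introduced here for illustration.

```r
library(ggplot2)

# State outlines; the region column holds lowercase state names
states <- map_data("state")

# Population per state, keyed by lowercase state name
choropleth <- data.frame(
  ID  = tolower(rownames(state.x77)),
  pop = state.x77[, "Population"]
)

p <- ggplot() +
  # Base map: white states with black borders
  geom_map(data = states, map = states, aes(map_id = region),
           fill = "white", colour = "black") +
  # Choropleth layer: fill each state by its population
  geom_map(data = choropleth, map = states, aes(map_id = ID, fill = pop),
           colour = "black") +
  expand_limits(x = states$long, y = states$lat) +
  coord_map("albers", lat0 = 29.5, lat1 = 49.5)
print(p)
```

Alaska and Hawaii appear in state.x77 but have no polygons in the "state" map, so they are simply not drawn.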
Other types of maps, like dot maps, can also be generated using
ggplot
. As with the choropleth map, we begin with a base
map, then add points to it, in this case using
the geom_point
command. The example uses two CSV files,
cities-coords.csv
and
cities-data.csv. You'll need to download these files, and use
setwd()
to change RStudio's working directory to the
directory containing the files.
In this example, we've actually created a proportional dot map, where
the size of each dot represents the population of its corresponding
city. The pmax
command is used to ensure a minimum dot
size of 5.0.
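The effect of pmax can be seen in isolation with a few made-up values. The vector pop_scaled below is hypothetical and does not come from the cities CSV files.

```r
# Hypothetical scaled population values, one per city
pop_scaled <- c(1.2, 8.4, 3.9, 20.1)

# pmax() compares element-wise, so every dot is at least size 5.0
dot_size <- pmax(pop_scaled, 5.0)
print(dot_size)   # 5.0 8.4 5.0 20.1
```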
Shiny is a package developed by RStudio that allows the creation of web-based, interactive dashboards based on R graphics and jQuery user interface (UI) widgets. This provides a way to create web-based dashboards that allow users to interactively explore an underlying dataset.
A Shiny application is made up of at least two separate R files:
ui.R
that defines the layout of the dashboard and the UI
widgets it contains, and server.R
that responds when a
user interacts with the UI, reading new interface values and
generating new visualizations based on those values.
As an example, here is a simple application that allows a user to
choose a bin width, then plots the number of chickens from the
chickwts
dataset that have a weight within the range of
each bin.
ui.R
server.R
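The two files might look like the following sketch, shown together in one listing for convenience. In a real application the first part would be saved as ui.R and the second as server.R, each ending with its shinyUI or shinyServer call; the assignments to ui and server, the titlePanel, and the histogram colours are assumptions added for illustration.

```r
library(shiny)
library(ggplot2)

# --- ui.R: dashboard layout and widgets ---
ui <- shinyUI(fluidPage(
  titlePanel("Chicken Weight Distribution"),
  sidebarLayout(
    sidebarPanel(
      # Slider attached to the variable bins
      sliderInput("bins", "Bin Width:", min = 5, max = 50, value = 20)
    ),
    mainPanel(
      # Placeholder for the plot assigned to output$distPlot
      plotOutput("distPlot")
    )
  )
))

# --- server.R: react to the slider and rebuild the histogram ---
server <- shinyServer(function(input, output) {
  output$distPlot <- renderPlot({
    ggplot(data = chickwts, aes(x = weight)) +
      geom_histogram(binwidth = input$bins,
                     fill = "lightblue", colour = "white") +
      xlab("Weight") +
      ggtitle("Chicken Weight Distribution")
  })
})
```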
To run a Shiny application, place ui.R
and
server.R
in a common directory. We'll assume the
directory is called shiny
. Next, ensure RStudio's
current working directory is the parent directory that holds the
shiny
subdirectory. You can check the working directory
with the command getwd()
and set the working directory
with the command setwd()
. Once RStudio is in the proper
directory, run the Shiny app with the runApp
command.
This will create a new web browser window within RStudio and run the Shiny app in that window.
Let's look at the ui.R
and server.R
in more
detail. As discussed previously, the UI code builds the user
interface, which includes interactive widgets and output
(visualizations) displayed based on the current value of the widgets.
The server code is responsible for reacting to changes in the
UI widget values, generating updated visualizations based on those
changed values, and pushing the results back to the UI side to be
displayed.
Looking at ui.R
in more detail, we see the following
operations.
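shinyUI( fluidPage(
A fluidPage layout will be used to allow for flexible placement of widgets and output.

sidebarLayout(
The page is split into a sidebar and a main area.

sidebarPanel(
The sidebar holds the UI widgets.

sliderInput(
A slider widget is added to the sidebar.

"bins",
The slider's value is attached to the variable bins.

"Bin Width:",
The label displayed with the slider.

min = 5, max = 50,
The smallest and largest values the slider can take.

value = 20,
The slider's initial value.

mainPanel(
The main area holds the dashboard's output.

plotOutput( "distPlot" )
A plot is displayed in the main area, attached to the variable distPlot.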
The server code is shorter and simpler, since its only job is to
receive the current value of the sliderInput
widget
variable bins
, create a histogram of chicken weights
based on the values of bins
, then assign that histogram
to the output variable distPlot
.
shinyServer( function( input, output ) {
The variables input and output contain information about the input (i.e., UI) and output (i.e., visualizations in the mainPanel region of the application).

output$distPlot <- renderPlot( {
Render a plot and assign it to the output variable distPlot. Notice that since distPlot is part of the mainPanel in ui.R, whatever we assign here will appear in the main output region of our application.

ggplot( data=chickwts, ... + ggtitle( "Chicken Weight Distribution" )
Build the histogram of chicken weights (the value assigned to distPlot). Notice that we use binwidth=input$bins in the geom_histogram call. This means that, whatever value the user has chosen with the sliderInput, that value will be used to define the width of the bins in the resulting histogram.
The UI and server code communicate with one another through reactivity, Shiny's terminology for an approach similar to callbacks in other languages. To start, we'll discuss reactive sources and reactive endpoints.
In simple terms, a reactive source is normally a variable attached to
a user interface widget in the UI source code. For example, in our
ui.R
example, the variable bins
is a
reactive source, since it is attached to a slider and needs the
dashboard to "react" whenever its value changes.
On the other hand, the variable distPlot
is a reactive
endpoint, since code in server.R
assigns a histogram
to distPlot
based on the reactive
source bins
. The histogram is then displayed in the
dashboard's mainPanel
.
From this example, we can see that reactive source and endpoint variables are normally defined in the UI code, and responses to changes in reactive sources are normally managed in the server code, with a typical response being to update the value of a reactive endpoint. You can see this exact type of processing happening in our example program.
1. The user moves the slider, causing the value of bins (a reactive source) to change.
2. shinyServer reacts to this change by executing any code that uses the bins variable, in particular, by updating the value of distPlot (a reactive endpoint).
3. shinyUI reacts to the change in the value of distPlot by updating itself to display distPlot's new value in its mainPanel.
It is now clear what input
and output
represent in the function defined within shinyServer
.
input
contains (among other things) values for all of the
reactive sources in ui.R
. In particular, it contains the
value of bins
, which is accessed in typical R fashion
as input$bins
.
Similarly, output
contains values for the reactive
endpoints in ui.R
. In our example, this
includes distPlot
, which needs to be updated
whenever bins
changes. Shiny recognizes that the
assignment to output$distPlot
in
the shinyServer
function accesses input$bins
(to define the histogram's binwidth
),
so shinyServer
is automatically called by Shiny whenever
the value of the reactive source bins
changes. Similarly,
Shiny calls the shinyUI
function whenever the value of
the reactive endpoint distPlot
changes.
[Figure: the bins reactive source and its dependent, the distPlot reactive endpoint]
Shiny represents the relationships between reactive sources and
endpoints as shown in the diagram above. We would describe this as
reactive sources having one or more dependents (in our
example, bins
has one dependent distPlot
),
and reactive endpoints being dependent on one or more
reactive sources (in our example, distPlot
depends
on bins
).
A final type of component used in Shiny is a reactive conductor. The purpose of a conductor is to encapsulate a computation that depends on reactive source(s). The result returned by a reactive conductor is normally used by a reactive endpoint. Since the reactive conductor caches its return value, if the value is slow to compute, or if it is going to be used by numerous reactive endpoints, the conductor improves the overall efficiency of the Shiny program. Reactive conductors are also useful for performing longer computations that you might not want to embed directly in code used to assign a value to a reactive endpoint.
Consider the following modifications to ui.R
and
server.R
, to allow us to print some text information
about the histogram in our dashboard.
ui.R
server.R
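One way the modified files could look is sketched below, again combined into a single listing. The names g, distInfo, and bin_s follow the discussion in this section; the specific statistics printed (number of bins and largest bin count) are assumptions chosen for illustration.

```r
library(shiny)
library(ggplot2)

# --- ui.R: the main panel now holds a plot and a text area ---
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Bin Width:", min = 5, max = 50, value = 20)
    ),
    mainPanel(
      plotOutput("distPlot"),
      uiOutput("distInfo")
    )
  )
)

# --- server.R ---
server <- function(input, output) {
  # Reactive conductor: builds the histogram once per change to bins
  # and caches it for both endpoints below.
  g <- reactive({
    ggplot(data = chickwts, aes(x = weight)) +
      geom_histogram(binwidth = input$bins,
                     fill = "lightblue", colour = "white")
  })

  output$distPlot <- renderPlot({ g() })

  output$distInfo <- renderUI({
    # data[[ 1 ]] holds one row per histogram bin
    bins <- ggplot_build(g())$data[[1]]
    bin_s <- paste0(
      "<b>Number of bins:</b> ", nrow(bins), "<br>",
      "<b>Largest bin count:</b> ", max(bins$count)
    )
    HTML(bin_s)   # mark the string as HTML so uiOutput renders the tags
  })
}
```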
This code makes the following changes to the original chicken weights histogram dashboard.
A uiOutput( "distInfo" ) object is added to ui.R's mainPanel to hold text information about the histogram. Notice that the uiOutput defines a new reactive endpoint distInfo.

The histogram is now built inside a reactive conductor g(). This is done because g() is used to set both distPlot and values in distInfo. Rather than build the histogram twice, we build it once, then use the cached value for both assignments.

[Figure: g() reactive conductor creating a ggplot histogram based on the value of reactive source bins]

To obtain details about the histogram, we call ggplot_build on g(), then extract field data[[ 1 ]], which holds information about the histogram.

renderUI is used to render HTML-styled text to a uiOutput object. Notice that we first build the string bin_s with HTML code like <b> and <br>. When we pass the result to the uiOutput object in ui.R, however, we must wrap it in an HTML() call to convert the result into an HTML object that uiOutput can display properly.

[Figure: uiOutput object used to display text describing specific properties of the dashboard]
Finally, here is a Shiny dashboard that displays information from the
built in iris
dataset. It allows a user to choose which
samples from the three iris species to visualize as boxplots using
checkboxes. It allows the user to display outliers only, or all points
using radio buttons. Finally, it allows the user to set the IQR range
to define outliers using a slider.
As in the example above, a reactive function is used to build a
reactive conductor df()
that contains the subset of
samples to plot (based on which species the user chooses to
visualize). This allows the Points and Outlier IQR inputs to change,
without needing to re-subset the original dataset.
ui.R
server.R
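A compact sketch of such a dashboard is shown here as a single listing. It assumes Sepal Width is the plotted measurement, uses geom_boxplot's coef argument as the IQR multiplier, and introduces the input names species, points, and iqr and the output name boxPlot for illustration.

```r
library(shiny)
library(ggplot2)

# --- ui.R ---
ui <- fluidPage(
  titlePanel("Iris Sepal Width"),
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput("species", "Species:",
                         choices = levels(iris$Species),
                         selected = levels(iris$Species)),
      radioButtons("points", "Points:",
                   choices = c("Outliers" = "outliers", "All" = "all")),
      sliderInput("iqr", "Outlier IQR:",
                  min = 0.5, max = 3.0, value = 1.5, step = 0.1)
    ),
    mainPanel(plotOutput("boxPlot"))
  )
)

# --- server.R ---
server <- function(input, output) {
  # Reactive conductor: re-subset only when the species selection changes
  df <- reactive({
    req(input$species)   # wait until at least one species is checked
    iris[iris$Species %in% input$species, ]
  })

  output$boxPlot <- renderPlot({
    # coef sets the whisker length as a multiple of the IQR
    p <- ggplot(df(), aes(x = Species, y = Sepal.Width)) +
      geom_boxplot(coef = input$iqr)
    if (input$points == "all") {
      p <- p + geom_dotplot(binaxis = "y", stackdir = "center",
                            binwidth = 0.05)
    }
    p
  })
}

# Printing app (or calling runApp(app)) launches the dashboard
app <- shinyApp(ui = ui, server = server)
```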
There are numerous ways to publish your applications so others can use them. One simple, built in method is to deploy your application on Shiny's application cloud, located at https://www.shinyapps.io. Another is to embed your Shiny UI and server together in a single R file, then send that file to other users.
Shiny applications can be published on RStudio's cloud application server. You may have noticed a "Publish" button in the upper-right corner of your Shiny application window. If you've authorized a shinyapps.io account in RStudio, clicking this button generates a dialog that allows you to choose a directory with Shiny code to deploy, and an account to use to host the application.
To set up your shinyapps.io account, visit shinyapps.io and choose "Sign Up" to create an account to host your Shiny applications. You will be asked to enter an email address and a password for your new account, then asked to choose a name for your account. Once this is done, instructions will be provided to set up RStudio to publish applications. This involves three steps.
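1. Install the rsconnect package, if it isn't already installed, by issuing install.packages( 'rsconnect' ) at the console prompt.
2. Authorize RStudio to use your shinyapps.io account, following the instructions provided when the account is created.
3. Deploy your application by ensuring rsconnect is available by loading it with library( rsconnect ), then using deployApp( "app-directory" ), identical to how you use runApp to run the Shiny application.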
Once your Shiny app is deployed, it will be available at a specific
URL, so anyone with a web browser will be able to run it. The URL uses
your shinyapps.io account name account-name
and the name
of the application directory
app-directory
you used to upload your application,
specifically: https://account-name.shinyapps.io/app-directory
So, for example, if my account name was msa
and I
uploaded an application in a directory called shiny
,
users could access that application at the URL https://msa.shinyapps.io/shiny.
Another option is to embed Shiny UI and server code directly in a single R file, then share the file with other R users. The code below shows an example of embedding our original Shiny histogram application as a single R file.
Here, we create two variables hist_ui
and hist_server
, each containing the contents of the
original ui.R
and server.R
. We then run the
application using the
shinyApp()
command, passing ui = hist_ui
and
server = hist_server
as arguments to define the UI and
server components of the application.
In a self-contained program like this, we also need to ensure the
proper libraries ggplot2
and shiny
are
available when a given user tries to run the
application. The load()
function is built to do this. It
uses require()
to attempt to load both libraries,
returning FALSE
if either library is not available.
The load()
function monitors return codes
from require()
, returning TRUE
only if both
libraries are loaded. The mainline of the program begins by calling
load()
, and only runs the body of the program to create
and execute the Shiny app if load()
returns TRUE
(that is, if both the ggplot2
and shiny
libraries are available).
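A single-file version along the lines described might look like this sketch. The names load, hist_ui, and hist_server follow the text; the variable app and the histogram styling are assumptions. Note that defining load() in this way masks base R's own load() function for the rest of the session.

```r
# Try to attach both required packages; TRUE only if both are available
load <- function() {
  ok <- TRUE
  for (pkg in c("ggplot2", "shiny")) {
    # require() returns FALSE (with a warning) when a package is missing
    if (!require(pkg, character.only = TRUE)) {
      ok <- FALSE
    }
  }
  ok
}

if (load()) {
  # Contents of the original ui.R
  hist_ui <- fluidPage(
    sidebarLayout(
      sidebarPanel(
        sliderInput("bins", "Bin Width:", min = 5, max = 50, value = 20)
      ),
      mainPanel(plotOutput("distPlot"))
    )
  )

  # Contents of the original server.R
  hist_server <- function(input, output) {
    output$distPlot <- renderPlot({
      ggplot(data = chickwts, aes(x = weight)) +
        geom_histogram(binwidth = input$bins,
                       fill = "lightblue", colour = "white") +
        ggtitle("Chicken Weight Distribution")
    })
  }

  # Build the app object; printing it or calling runApp(app) starts it
  app <- shinyApp(ui = hist_ui, server = hist_server)
}
```

At an interactive prompt, typing app (or calling runApp( app )) after running the file opens the dashboard.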