SAS Visual Analytics: Institute for Advanced Analytics

Viya VA

Introduction

This module will introduce you to SAS Visual Analytics (VA), a SAS module designed to analyze and visualize large datasets. We will work through a simple example of loading data, and visualizing it in different ways as a dashboard.

Setup

In order to work with Viya VA, you will need access to an online Viya account, since VA runs as a web application. To do this, you must first log in to the VPN using Cisco AnyConnect Secure Mobility Client. Once you have connected to the VPN, open a web browser and navigate to https://sasviya.iaa.ncsu.edu. Viya will open in the last module you were running when you most recently logged out. To start SAS Visual Analytics (VA), click the three bars (the hamburger) left of the SAS Drive text at the top of the screen, and choose Explore and Visualize Data from the drop-down menu.

Basic Viya VA

In order to visualize, we need to start with a dataset. We'll be using a modified version of a CSV activity dataset originally provided by Peter Aldhous as part of his Tableau demo. Much of the material we use here is taken from this demo, then updated for Viya VA and extended to apply some of the visualization principles we learned during the summer visualization lectures.

Once you've downloaded the activity dataset to your computer, open it in Excel and spend a few minutes familiarizing yourself with its different fields.

Loading Data

To start, we need to load the activity dataset into Viya VA.

Launch Viya VA.
Click the "Start with Data" button near the top of the screen.
Click the "Import" option at the top of the list of available datasets. Click ">> Local Files" from the "Add data sources..." options, then choose "Microsoft Excel" from the drop-down menu. Locate the USDA_activity_dataset.xlsx that you downloaded, select it, and click "Open."
The "Add to Import" dialog will appear. Keep the default settings and click the "Add" button.
Information about the dataset will be displayed, with a variety of options to control the import process. Keep the default selections and click the "Import item" button in the upper-right corner of the screen.
A small green line near the top of the screen will inform you "The table was successfully imported." Click the "OK" button to complete the import operation.

Now, you will be taken back to the VA screen, with the data from the imported dataset displayed in the Data panel on the left-hand side of the screen.

Categories and Measures

Once you load the USDA Activity Dataset into Viya VA, it should present a window that looks similar to this.

Viya VA's initial page after importing the USDA dataset.

Based on the above example, the VA workspace is divided into the following regions, each with a different purpose.

Application bar. Used to access other SAS Viya applications.
Tab bar. Used to create new tabs and manipulate existing tabs via the triple-dot menu on the right of each tab.
Left pane. Used to work with data, report objects, and the report outline.
Canvas. Used to visualize data as a dashboard.
Right pane. Used to control details about the report and individual report objects.

Notice that Viya VA has divided the data in the Excel file into Category, Measure, and Aggregated Measure. Categories and Measures correspond to Dimensions and Measures in Tableau: categorical or qualitative data fields are considered categories, and numeric or quantitative data fields are considered measures. Aggregated Measures allow you to apply an aggregation (e.g., average or median) to a measure and save it as a new data field. An aggregation can also be set on the measure directly, but a direct aggregation then applies to all visualizations that use the measure. Unlike Tableau, where an aggregation can be chosen separately for each field placed on the rows, VA offers no way to vary the aggregation independently for a measure within a single visualization.

As with Tableau, you can convert from one data role to another: hover over the field name, click the double down-arrows that appear to the right of the name, then choose an available option from the Classification drop-down menu. As an example, we will convert State and County from generic Category to specific Geographic roles. Hover over the State name, click the down-arrows, and in the "Classification" drop-down menu, choose "Geography." In the "Edit Geography Item" dialog, select "US State Names" in the "Name or code context" drop-down menu. Click OK, and State is now defined as a geographic field representing US state names. State will now appear in a new "Geography" section in the list of data items. Follow a similar set of steps for County, but choose "County or Region Names" rather than "US State Names" in the "Name or code context" menu.

A map showing average % smokers by state

The most common reason to distinguish between categories and measures is how they act in a visualization. In general, selecting a category creates an axis of individual category headings. Selecting a measure creates an axis showing a continuous scale over the measure's range of values. You can try this:

Click on the "Objects" icon in the menu on the left pane, and drag the "Bar Chart" item into the data canvas to create a placeholder bar chart.
Click on the "Roles" icon in the menu on the right pane, and choose "State" for Category and "% smokers" for Measure.
Right-click on the "State" label for the vertical axis and choose "Sort → State: Ascending" to sort by state name rather than median % smokers.
Click on the "Options" icon in the menu on the right pane, and in the "Bar" drop-down section, choose the "Direction: Up" button (the button on the right).
Further down in the "Bar" section, click the "Data labels" checkbox.
In the "X Axis Options" section, click the "Rotate value label" checkbox.

A vertical bar graph showing average % smokers by state

Now, you should see a vertical bar chart showing the median of "% smokers" for each of the individual states. To switch to average percentage of smokers, click the "Data" icon, hover over the "% smokers" measure, click the down arrows, and under the "Aggregation:" label click the drop-down menu and choose "Average." The graph will update to show average % smokers per state.

Field Type

Each field also has a small icon to its left. This identifies its current data type: an "M"-like icon for category, a globe for geography, and a ruler rotated 45° for measure and aggregated measure. Calculated fields add a small calculator icon superimposed on top of the data type.

Dot Map

To start our example visualization, we will generate a dot map of the adult obesity rate for each state.

If the canvas has visualization(s) in it, clear them by clicking on each visualization, then pressing "Del."
Ensure "State" is a geography data type by clicking the Data icon in the left pane, hovering over the "State" Category and clicking the down arrows, changing the Classification to "Geography," then choosing "US State Names" in the "Name or code context:" field.
Click the "Objects" icon in the left pane, and drag "Geo Coordinate" onto the canvas.
Click on the "Roles" icon in the right pane and map the following data fields: Category → State – 51; Size → Adult obesity rate (do this by clicking on the Size label and choosing Adult obesity rate to replace anything it is currently assigned to.)
You should see a map with one blue dot in the center of each state throughout the United States.
By default, VA generates a Bubble map. If this did not happen, click on the "Options" icon in the right pane, open the "Coordinate" section, and choose the "Bubbles" button in the "Data layer render type:" option.
By default, VA aggregates multiple data points for a given region (in our case, for a given state) by summing. We want to average the adult obesity rate, so click on the "Data" icon in the left pane, hover over Adult obesity rate, click the down arrows, and under the "Aggregation:" drop down menu, choose "Average."
We also want to colour the bubbles in a double-ended fashion based on Adult obesity rate. Click on the "Rules" icon in the right pane, then click "New rule." Choose "Adult obesity rate" from the list that appears, choose the operator "<" from the drop-down menu, and enter a value of 28.5 (the median Adult obesity rate). Then, click on the "Style:" button and choose a shade of blue.
Create a second rule and perform the identical steps, except using an operator of ">=" and a style value that's a shade of red.
The final map shows blue and red circles sized based on the average adult obesity rate, and coloured based on whether the state's average obesity is below (blue) or at or above (red) the overall median.
To rename the report page, double-click on the "Page 1" tab title, and change it to "Map."

A map showing average obesity rate by state using a red–blue colour scale: red for values above the median, blue for values below the median, and saturation for values farther from the median.
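The two display rules amount to a simple threshold test on each state's average. The following Python sketch is only an illustration of that logic, not anything VA runs; the function name bubble_colour is hypothetical, and the threshold of 28.5 is the median quoted in the steps above.

```python
MEDIAN_OBESITY_RATE = 28.5  # threshold entered in both display rules


def bubble_colour(average_obesity_rate: float) -> str:
    """Mimic the two rules: '<' the median gives blue, '>=' the median gives red."""
    if average_obesity_rate < MEDIAN_OBESITY_RATE:
        return "blue"
    return "red"


# Hypothetical state averages, chosen only to exercise both rules.
for rate in (25.0, 28.5, 31.2):
    print(rate, bubble_colour(rate))
```

Because the second rule uses ">=", a state sitting exactly on the median is coloured red.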

Scatterplots

Next, we'll create some scatterplots to compare adult obesity rate to other, possibly related measures.

Click the "+" in the Tab bar to create a new report page.
Double-click the tab's default title and change it to "Scatterplot."
Click on the "Objects" icon in the left pane, select "Scatter Plot," and drag it to the canvas. Re-open the "Objects" list, drag a second "Scatter Plot" to the canvas, and position it to the right of the first scatterplot. Do this one more time to add a third scatterplot.
Click on the leftmost scatterplot, click on the "Roles" icon in the right pane, and add "% smokers" and "Adult obesity rate" to the "Measures" list.
Click on the middle scatterplot, click on the "Roles" icon in the right pane, and add "% eating few fruits and vegetables" and "Adult obesity rate" to the "Measures" list.
Click on the rightmost scatterplot, click on the "Roles" icon in the right pane, and add "% who do not exercise" and "Adult obesity rate" to the "Measures" list.
Using the Data panel, change the aggregation method from Sum to Average for % smokers, % eating few fruits and vegetables, and % who do not exercise.
For each scatterplot, select it, click the "Options" icon in the right pane, and change the following options:
1. In the "Style" section, click the upper-left colour square in the "Line/Marker:" section and choose a colour for the marks (green for the left scatterplot, purple for the middle scatterplot, and red for the right scatterplot).
2. In the "Scatter Markers" section, choose a "Transparency:" of 40%.
3. In the "Fit Line" section, choose a "Type:" of "Cubic."
4. In the "X Axis Options" section, select "Fixed maximum:" and set the value to 100.
5. In the "Y Axis Options" section, select "Fixed maximum:" and set the value to 45.

Three scatterplots comparing average obesity rate by county to percentage of smokers, percentage eating few fruits and vegetables, and percentage who do not exercise.

Visually, there appears to be a strong positive correlation between adult obesity and all three potential predictor variables, although "% who do not exercise" seems to have the strongest impact on obesity rate (i.e., its fit line has the steepest slope).
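Correlation strength and slope are separate quantities, and both can be checked numerically rather than by eye. The sketch below is a minimal illustration in plain Python; the four data points are invented for the example (they are not from the USDA dataset), the helper names are hypothetical, and it fits a straight line rather than the cubic fit line used in the scatterplots.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(xs, ys):
    """Slope of the least-squares straight line predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical county values: % who do not exercise vs. adult obesity rate.
no_exercise = [20.0, 25.0, 30.0, 35.0]
obesity = [24.0, 27.0, 29.0, 32.0]

r = pearson_r(no_exercise, obesity)
slope = ols_slope(no_exercise, obesity)
print(f"r = {r:.3f}, slope = {slope:.2f}")
```

The slope says how many points the obesity rate changes per percentage point of the predictor, while r says how tightly the points follow the line; a predictor can have a steep slope and still be only loosely correlated.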

Dual-Axis Line Graph

Often we want to compare variables across a sequence over time or a categorical dimension like county or state. Here, we'll build a line graph to represent Adult Obesity and % Smokers, to see if there's any additional visual correlation between the two.

Click the "+" in the Tab bar to create a new report page.
Double-click the tab's default title and change it to "Dual-Axis Graph."
Click on the "Objects" icon in the left pane, select "Dual Axis Line Chart" and drag it to the canvas.
Click on the "Roles" icon in the right pane and map the following data fields: Category → State - 51; Measure (line) → Adult obesity rate; Measure (line 2) → % smokers.
Right-click the "State" label for the horizontal axis and choose "Sort > State: Ascending."
Click on the "Options" icon in the right pane and change the following options:
1. For the first "Line/Marker:" button, change the colour to orange.
2. For the second "Line/Marker:" button, change the colour to blue.
3. In the "X Axis Options", select "Rotate value label."
4. In both the "Y Axis Options" and "Y2 Axis Options", select "Fixed minimum:" and "Fixed maximum:" and set values for both axes to 0 and 40, respectively; this ensures the lines span a common range of percentages.

Dual-axis line graphs comparing sum of obesity rate by state to sum of percentage of smokers.

At this point, we have a basic dual-axis graph, but there are a few formatting modifications we can make to improve on VA's defaults.

Since we're aggregating across all counties in a state, we want the average obesity and smoker rates, not the sum of all counties, so ensure both Adult obesity rate and % smokers are aggregating as Average.
We'd like the left and right axes to show values with a "%" sign at the end. To do this, click the "Data" icon, hover over the Adult obesity rate measure, click the down arrows, click the pencil icon under "Format:" and choose "Percent."
Do the same operation for the "% smokers" measure.
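The difference between summing and averaging county values, noted in the first step above, can be sketched outside VA. The Python snippet below is a minimal illustration that assumes a handful of invented county rows (not values from the USDA dataset); aggregate_by_state is a hypothetical helper, not a VA function.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county-level rows: (state, % smokers). Values are invented.
rows = [
    ("Alabama", 20.0),
    ("Alabama", 24.0),
    ("Alaska", 30.0),
    ("Alaska", 22.0),
    ("Alaska", 26.0),
]


def aggregate_by_state(rows, aggregate):
    """Group values by state, then reduce each group with `aggregate`."""
    groups = defaultdict(list)
    for state, value in rows:
        groups[state].append(value)
    # Sorting by state name mirrors "Sort > State: Ascending".
    return {state: aggregate(values) for state, values in sorted(groups.items())}


sum_by_state = aggregate_by_state(rows, sum)
average_by_state = aggregate_by_state(rows, mean)
print(sum_by_state)      # Alabama: 44.0, Alaska: 78.0
print(average_by_state)  # Alabama: 22.0, Alaska: 26.0
```

A sum grows with the number of counties in a state, so it is not comparable across states; the average stays on the same percentage scale as the county values.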

Dual-axis line graphs comparing sum of obesity rate by state to sum of percentage of smokers, but with incorrect percentages on the vertical axes.

Although this adds a "%" to the values on the vertical axis, it also exposes a problem: percentages run from 0% to 4,000%, because the "Adult obesity rate" and "% Smokers" columns are coded on a range 0–100, and not 0.0–1.0. You can confirm this by selecting "Manage Data" from the menu to the left of "SAS Visual Analytics" in the titlebar, choosing "USDA_ACTIVITY_DATASET" from the list of available datasets, then selecting the "Sample Data" tab. % smokers is the seventh column, and shows values like 20.87, 21.74, and 32.72. Switch back to "Explore and Visualize Data" once you're done.

To address this, we will create a new calculated field that spans the 0.0–1.0 range we want.

Choose the "Data" icon.
Select "Calculated item" from the "New data item" menu.
Choose a "Name:" of "% Adult Obesity 0.0-1.0"
Click the "Operators" tab, reveal the "Numeric (simple)" options, and drag "# x / y" to the "number" box in the canvas.
Right-click the first "Number" button, choose "Replace with" and select Adult obesity rate from the pop-up menu.
Click inside the second "number" field and enter 100.0.
Click "OK" to save the calculated item.

Notice now that there is a "% Adult Obesity 0.0-1.0" measure in the Data list, and the measure's data type ruler is covered by a calculator, indicating it is a calculated field. Execute the same set of operations to create a calculated field on the range 0–1 for "% smokers."
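The arithmetic behind the calculated item is a single division. As a rough illustration (not VA code), the Python sketch below uses Python's own percent formatting, which also multiplies by 100, to show why a value stored on a 0–100 scale displays incorrectly and how dividing by 100 corrects it. The helper name as_percent is hypothetical; 32.72 is one of the sample "% smokers" values mentioned above.

```python
def as_percent(value: float) -> str:
    """Format a number as a percentage; the format multiplies by 100."""
    return f"{value:.2%}"


raw_smokers = 32.72                    # stored on a 0-100 scale
scaled_smokers = raw_smokers / 100.0   # what the calculated item computes

print(as_percent(raw_smokers))     # 3272.00%
print(as_percent(scaled_smokers))  # 32.72%
```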

To fix the dual line graph, click the "Roles" icon, and replace the existing measures with "% Adult Obesity 0.0-1.0" and "% smokers 0.0-1.0" for "Measure (line)" and "Measure (line 2)", respectively. Unfortunately, VA resets back to its default values, so you must also re-do the following changes.

Right-click on "State" and choose "Sort > State: Ascending"
Open the "Data" panel, right-click on "% Adult Obesity 0.0-1.0" and choose "Format: → Percent" and "Aggregation: → Average"; apply the same formatting to "% smokers 0.0-1.0"
Open the "Options" panel, and change the "Fixed maximum:" for both the "Y Axis Options" and "Y2 Axis Options" to 40%, since VA has changed them to 4000%.

Once this is done, we have the same dual line graph, but with percentages correctly shown on both vertical axes.

Dual-axis line graphs comparing sum of obesity rate by state to sum of percentage of smokers with correct percentages on the vertical axis.

VA provides a number of interactive widgets to allow you to control what is displayed within your visualizations. These act in a manner similar to Action Filters in Tableau: selections within the widgets automatically apply filters to one or more visualizations. Unlike Tableau, however, the widgets can simply be dropped and configured on a report, as opposed to building a separate "visualization" and linking it as an action filter in a Tableau dashboard. VA's interactive widgets include:

Button Bar. A list of buttons, whose number and labels are defined by a category field.
Drop-Down List. A drop-down menu, whose length and entries are defined by a category field.
List. A checkbox list whose length and entries are defined by a category field.
Slider. A double-ended slider whose minimum and maximum values are defined by a measure field.
Text Input. A text field where you can enter the name of a category field to filter the visualization to show only entries tied to the named field.

Viya VA also provides a number of containers, to control how your content is positioned and organized. These include:

Precision Container. Allows you to place, align, and size objects explicitly within a container. Objects can overlap.
Prompt Container. Used to group prompt controls within a report. Only prompt controls can be placed inside a prompt container.
Scrolling Container. Displays content in a scrolling layout. Each object fills the entire container; you must scroll to see other objects.
Stacking Container. Displays objects as though they were a slide deck. Only one object is displayed at a time. A control bar allows you to move between objects (slides).
Standard Container. Positions content horizontally or vertically. When possible, content is resized so multiple objects can be displayed without scrolling.

The concept of a "dashboard" in Viya VA differs from Tableau's. Rather than having separate worksheets and dashboards, Viya VA assumes you will place a container on the canvas, build multiple visualizations directly within the container, then select and position them as you like based on the container's capabilities. As an example, let's use a standard container to create a dashboard that has the map we previously built on the top, and the three scatterplots and dual-axis line graph below that.

Unfortunately, SAS claims that as of the current version of VA, there is no way to copy content from one report page to another. You can, however, move content. You can do this with the following steps.

Click the "+" in the Tab bar to create a new report page.
Double-click the tab's default title and change it to "Dashboard."
Click the Map tab to select the map visualization.
Right-click inside the map visualization, and choose "Move to > Dashboard" to move the map visualization to the Dashboard page. It will disappear from the map page and re-appear on the Dashboard page.
Perform the same operation four times to move the three scatterplots and the dual-axis graph to the Dashboard page.
Select the Dashboard tab, and you should see all five visualizations. Rearrange them by dragging them and changing their display properties in the Options panel until you have a dashboard layout you are happy with.

Map, three scatterplots, and a dual-axis graph.

Next, we'll create a list of State names, then filter all the visualizations based on the state(s) selected by the user. To do this:

Click on the "Objects" icon.
Choose "List" in the "Controls" section, and drag it to the right of the map, scatterplots, and dual-axis graph.
Click on the "Roles" icon.
Choose "State" for the "Category" role.
Click on the "Actions" icon.
Here, we define actions based on selections in the list. We can use the selections as a filter for individual visualizations, or all the visualizations on the canvas. Since we want to filter all the visualizations, activate the checkbox for "Automatic actions on all objects," and confirm "One-way filters" in the drop-down menu. Also select "Display filter breadcrumb" so we can see the active filters on the visualization.
You can dynamically resize the width of the list by hovering over its left border, clicking and holding the left mouse button when the pointer changes to a left-right arrow, and dragging left or right.

Map and three scatterplots with a list widget controlling which state data to visualize.

Now, when you select a state in the list, the map will zoom to that state and the scatterplots will reduce to contain only samples from the selected state. You can select multiple states, or right-click and choose "Clear selection" to reset the filters.

Suppose we also wanted to add the ability to filter based on how individuals voted (D, R, or missing value). We will first convert the map to use choropleth filled regions. Then, we will add a button bar to filter by "Voting in 2008".

Click on the geomap to select it.
Click on the "Rules" icon, select each of the two colour rules, and click the trash can to delete them.
Right-click on the map, and choose "Change Geo coordinate to > Geo region."
Click the "Objects" icon on the left, choose "Button bar" in the "Controls" section, and drag it to lie below the state list widget.
Click on the "Roles" icon.
Choose "Voting in 2008" for the Category role.
Click on the "Actions" icon.
Choose "Automatic action on all objects", a drop-down value of "Two-way filters", and ensure "Display filter breadcrumbs" is selected; two-way filters allow both the voter and state filters to be active at the same time.
Click on the "Options" icon.
In the Title: drop-down, choose Custom title and enter a title of "Select Voter Type."
In the Button Bar section, under "Direction" click the vertical direction button (the second button).

Map and three scatterplots with a list widget and a button bar controlling which state data and party selected in the 2008 election to visualize (Republicans from Arizona and California).

Filters are applied in a Boolean AND fashion, so the map above shows individuals who live in California and Arizona, and who voted Democrat in the 2008 election.
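The way the two controls combine can be sketched as a Boolean AND over the rows. The Python snippet below is only an illustration of that logic under assumed data: the rows, county names, and the apply_filters helper are all hypothetical, and treating an empty selection as "no filter" is an assumption about how a cleared control behaves.

```python
# Hypothetical rows: (state, county, vote in 2008). Values are invented.
rows = [
    ("California", "County A", "D"),
    ("California", "County B", "R"),
    ("Arizona", "County C", "D"),
    ("Arizona", "County D", "R"),
    ("Nevada", "County E", "D"),
]


def apply_filters(rows, selected_states, selected_votes):
    """Keep rows that match the state selection AND the vote selection.

    An empty selection is treated as "no filter" for that control.
    """
    return [
        row
        for row in rows
        if (not selected_states or row[0] in selected_states)
        and (not selected_votes or row[2] in selected_votes)
    ]


filtered = apply_filters(rows, {"California", "Arizona"}, {"D"})
print(filtered)  # County A and County C only
```

Within a single control, multiple selections widen the result (California or Arizona); across controls, each additional filter narrows it.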

Analytics

One of Viya VA's major advantages over other visualization tools like Tableau is its built-in analytics objects. Because VA is integrated directly into Viya, it has access to all of the statistical power of Viya. VA provides a number of analytic, visual statistics, and visual data mining and machine learning objects that can be applied directly to the data in your visualization report.

As an example, suppose we wanted to use a decision tree to see how well % eating few fruits and vegetables, % smokers, and % Low-income receiving SNAP predict Adult obesity rate. To do this:

Click the "+" icon at the top of the report to create a new report page.
Click the "Objects" icon.
In the "SAS Visual Statistics" section, locate "Decision Tree" and drag it onto the new canvas.
Click the "Roles" icon.
Under "Response", click "Add" and choose % Adult Obesity 0.0-1.0.
Under "Predictors", click "Add" and choose % eating few fruits and vegetables, % Low-Income receiving SNAP, and % smokers.

Decision tree, variable importance, assessment line graph, and split frequency for a decision tree predicting Adult obesity rate from % eating few fruits and vegetables, % Low-income receiving SNAP, and % smokers

VA computes results and displays them as four visualizations, clockwise from the upper-left: a tree identifying the splitting parameter at each internal level, with rules describing each leaf and the number of samples that fall into it at the bottom of the tree; a variable importance graph; an assessment line graph showing how the predicted and observed averages correlate; and finally a treemap that shows how much of the sample data flows left and right at each split in the tree.
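To give a feel for what a "splitting parameter" is, the sketch below finds a single split on one numeric predictor by minimising the squared error of a numeric response. It is a generic textbook illustration, not the algorithm VA uses (which is not described here); the function names and the four data points are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the split x < threshold with lowest SSE."""
    best = None
    candidates = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical values: % smokers vs. adult obesity rate on a 0.0-1.0 scale.
smokers = [15.0, 18.0, 25.0, 28.0]
obesity = [0.24, 0.25, 0.31, 0.32]

threshold, error = best_split(smokers, obesity)
print(threshold)  # 21.5
```

A full tree repeats this search on each side of the split, and across all predictors, until a stopping rule is met; each resulting leaf predicts the average response of the samples that fall into it.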