This SAS Quick Start tutorial shows you how to explore metrics and metadata for the information assets in your environment. You learn how to search for and review assets.
Video Outline
0:00 – Intro
1:19 – Running a Discovery Agent
2:11 – Performing a free text search
2:41 – Performing a faceted search
3:37 – Reviewing table metrics
6:44 – Exploring column metrics
9:08 – Reviewing information privacy status
10:06 – Exploring natural language data
10:52 – Summary
Additional Resources
◉ SAS Viya Platform Learn & Support – https://support.sas.com/en/software/sas-viya-platform.html
◉ SAS Information Catalog Learn & Support – https://support.sas.com/en/software/information-catalog-support.html
◉ SAS Data Engineering Learning Subscription – https://learn.sas.com/totara/program/view.php?id=22
Let's get started with
SAS Information Catalog. If you have access to
Information Catalog, I encourage you to
follow along with me. Start by visiting our
SAS Software GitHub page. The SAS Viya Quick
Start Repository provides the files used
in our demonstrations and simple instructions to load
them into your environment. To open SAS Information Catalog,
click the Applications menu and select Discover
Information Assets. SAS Information Catalog
lets you create and maintain an inventory for your
information assets. This catalog gives
you the ability to ingest, integrate, and
enrich metadata for the assets that are distributed
across your enterprise. You can then use this
metadata to find and explore the relevant assets
that you need. It also enables
data administrators to review data usage from
a single point of access. Below the search field,
the number and type of each cataloged
asset is shown. You can see that
in my environment I have many different types
of resources, including data sets,
files, and lookup tables. Information Catalog also curates
SAS assets, such as reports, SAS Studio flows, and models. As you create objects on
the SAS Viya platform, SAS Information Catalog
collects basic metadata. Your administrator
can run a process called a discovery
agent on a SAS library or caslib to produce
additional metrics and metadata for the
tables in those libraries. If you have permission, you
can run a discovery agent. To catalog additional
assets, click New. Select the Samples
caslib and
click New discovery agent. There are several options
available to customize in the Properties and
Configuration tabs. I'll go with the default
settings then click Run now > Continue. The agent will take
several minutes to catalog all the
assets in Samples. Once the agent is
complete and successful, return to the Catalog window. Notice the asset numbers
have been updated. Now I can search for various
assets in the catalog. I'll enter the string,
Water, in the Search field and click
Start search. This is called a
free text search, which examines common metadata,
like column names, table names, and descriptions. The search returns a
report and two CAS tables, each with water in the name. What if I search for property? WATER_CLUSTER is returned
because it contains a column named Property. You can write a
more specific query with what is called
a faceted search. You can enter a specific
metadata label and the value that you want to search for. Numeric metrics such as
row count or file size must use a faceted search. I want to see assets
from the Samples caslib that I cataloged earlier. To do this, I'll click
in the search box and select Library.name:. I am then prompted with
available libraries, so I'll choose Samples. Notice the prompt includes
several other attributes on which we can search. For a full list of facets
and other information about the syntax of
searches, I invite you to review the
SAS documentation. The search returns
seven CAS tables, but keep in mind
that the search may include other files, such as
Microsoft Excel, CSV, or SAS tables. In SAS Information Catalog,
you can analyze any file type, database, or SAS table that
is accessible via a SAS library or a caslib. Let's search for the
simple keyword, Home. I want to examine the
HOME_EQUITY table more closely. This table will be the focus of
our journey through the SAS Viya applications. I'll click the HOME_EQUITY
table name to open its details. Because the HOME_EQUITY table
was created in SAS Viya, SAS Information Catalog
knows it exists, but there are no metrics
because the table hasn't been analyzed yet. Your administrator may analyze
entire libraries or caslibs on a scheduled basis,
but you can analyze individual tables as necessary. I'll click Analyze data. A warning appears,
saying that the analysis will happen in the background
but could take a while. Click Submit request. When the analysis is
complete, the text at the top of the
screen will change to A new analysis is now
available for this asset. Click View the analysis. Across the top of the display
we see key table metrics, including Completeness,
which represents the percentage of
non-missing values, the number of rows and
columns, and file size. Depending on my
permissions, I may also have the ability to assign
Status, which can be Review, Approved, Flagged or Warning. In the panel on the right, I
can assign contacts or tags to assist with searching. Let's examine the Overview tab. The left pane provides
interesting high-level information about the table,
including the information privacy, time period
covered, top areas covered, and top languages. These analyses are done with
algorithms from the SAS Quality Knowledge Base. As an example, I'll click I
next to Top Areas Covered. Top Areas Covered shows the
most frequent geographic regions covered by the data. In this example, there
are states and cities. A similar analysis is done
for Information Privacy, except that the columns are classified
by how sensitive the data is. Beneath these
attributes, there's an auto-generated
summary of the data, listing interesting information
such as important columns, the presence of outliers, and
the frequency of observations. Let's look more closely
at Semantic Type. Semantic Type identifies
the classification of data that is likely to
be contained in the column. Information Catalog utilizes
the SAS Quality Knowledge Base to assign an
appropriate semantic type, but you can modify the
type for each column. For
example, Region was
assigned a semantic type of State/province. I'll select Choose other, then
type Region and select it, then click OK. I can select Search
semantic type to find other tables
with the same value, or filter based on a
particular semantic type in a faceted search. I'll click the
Column Analysis tab. All the columns in the
table have three subtabs that display different
sets of metrics. In the Descriptive
Measures table, which is shown by
default, you can see standard numeric metrics
like mean, median, and standard deviation. The first row in
the table contains metrics for the BAD column. This column contains 2
distinct values, 0 and 1. 0 represents a loan
in good standing, and 1 represents
a defaulted loan. In the Frequency
distribution chart, I can see approximately
how many 0 and 1 values occur in the data. Our ultimate goal is to
build a predictive model to anticipate loan status
based on the other columns in the table. The blue icon next to
several column names indicates that the
column contains outliers. If I select the MORTDUE
row, helpful distribution visualizations and
metrics are provided. I'll click Metadata Measures. Here I can examine column
length, type, and label and additional helpful
information such as Primary Key Candidate and
Information Privacy. Finally, I'll click the
Data Quality Measures table. I can view other data quality
metrics, like pattern count, uniqueness, and completeness. I'll scroll down to review
the most common and least common values for CITY. It looks like the
city names might be in all lowercase, which
won't look good in a report. I'll click the CITY column's
name to investigate further. Now, I can see metrics
for only the CITY column. To the right is a
pattern frequency chart. This shows the frequency of
patterns in the column, where a capital A is any capital
letter, a lowercase a is any lowercase letter,
and a 9 is any number. Looking at the pattern
frequency values, it looks like every pattern
is in all lowercase. After identifying the
data quality issue, I can use SAS Studio later to
make any necessary corrections. I can also use the
Actions menu to analyze this table in another phase
of the analytics life cycle, such as Explore and Visualize
to open SAS Visual Analytics or Build Model to
open SAS Model Studio. I'll change the status
of the table to Flagged and add a note that the casing
of City should be corrected. Let's look at another table that
has more private information. I'll click New Search
and type Customer. I'll analyze the table and
wait for Information Catalog to generate the metadata. I'll start with a peek
at the Sample Data. Notice this table includes
columns such as Name, Gender, and BirthDate. Back on the Overview tab,
the Information Privacy flag has been set to Sensitive. The Information
Privacy field indicates whether a column contains
potentially private information that could be linked
to an individual. Possible values are Candidate,
Private, and Sensitive. Blank values indicate that the
column has not been identified as potentially private data. SAS Information Catalog also
provides helpful insights for more complex
natural language data. In the Overview tab, it
indicates English and Spanish are the most common languages. Let's examine the Sample Data
tab and the Verbatims column. This column includes
free-form customer comments. If I scroll down, I can
see that not all comments are in English. To explore these comments, I can
visit the Column Analysis tab and choose Verbatims. SAS Information Catalog
identifies Verbatims as a long-text column,
and therefore provides some unique metrics. There is a word cloud with the
most frequent values displayed in larger text. The languages for the column
are presented in a chart. There is also a
Sentiment Analysis chart that estimates the
mood of the text and is useful for things like
customer comments or product reviews. I hope that exploring
SAS Information Catalog has inspired you
to think about how it
can help you monitor
and evaluate assets in your own environment. And if you'd like
to learn more, we encourage you to visit your
SAS Learning Center and the SAS documentation.
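
A note on two of the column metrics from the walkthrough: Completeness was described as the percentage of non-missing values, and the pattern frequency chart was described as mapping any capital letter to A, any lowercase letter to a, and any number to 9. The Python sketch below illustrates those two ideas only. It is not how SAS Information Catalog computes them. The function names, the sample city values, and the choice to treat None and blank strings as missing are all assumptions made for illustration.

```python
from collections import Counter


def completeness(values):
    """Return the percentage of non-missing values.

    Assumption: None and blank strings count as missing.
    """
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and str(v).strip() != "")
    return 100.0 * present / len(values)


def to_pattern(value):
    """Map uppercase letters to 'A', lowercase letters to 'a', digits to '9'.

    Any other character (spaces, punctuation) is kept unchanged.
    """
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how often each pattern occurs, skipping missing (None) values."""
    return Counter(to_pattern(v) for v in values if v is not None)


# Hypothetical sample values standing in for a CITY column.
cities = ["raleigh", "cary", "durham", None, "Apex", "cary"]

print(f"Completeness: {completeness(cities):.1f}%")
for pattern, count in pattern_frequency(cities).most_common():
    print(pattern, count)
```

In a column where every pattern consists only of lowercase a characters, as with CITY in the demonstration, the casing issue shows up immediately in the pattern counts.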