
Discover Information Assets with SAS Information Catalog | SAS Viya Quick Start Tutorial

This SAS Quick Start tutorial shows you how to explore metrics and metadata for the information assets in your environment. You learn to search for and review assets.

Video Outline
0:00 – Intro
1:19 – Running a Discovery Agent
2:11 – Performing a free text search
2:41 – Performing a faceted search
3:37 – Reviewing table metrics
6:44 – Exploring column metrics
9:08 – Reviewing information privacy status
10:06 – Exploring natural language data
10:52 – Summary

Additional Resources
◉ SAS Viya Platform Learn & Support – https://support.sas.com/en/software/sas-viya-platform.html
◉ SAS Information Catalog Learn & Support – https://support.sas.com/en/software/information-catalog-support.html
◉ SAS Data Engineering Learning Subscription – https://learn.sas.com/totara/program/view.php?id=22

SAS Users

2 days ago

Let's get started with SAS Information Catalog. If you have access to Information Catalog, I encourage you to follow along with me. Start by visiting our SAS Software GitHub page. The SAS Viya Quick Start Repository provides the files used in our demonstrations and simple instructions to load them into your environment. To open SAS Information Catalog, click the Applications menu and select Discover Information Assets.

SAS Information Catalog lets you create and maintain an inventory for your information assets. This catalog gives you the ability to ingest, integrate, and enrich metadata for the assets that are distributed across your enterprise. You can then use this metadata to find and explore the relevant assets that you need. It also enables data administrators to review data usage from a single point of access.

Below the search field, the number of cataloged assets of each type is shown. You can see that in my environment I have many different types of resources, including data sets, files, and lookup tables. Information Catalog also curates SAS assets, such as reports, SAS Studio flows, and models.

As you create objects on the SAS Viya platform, SAS Information Catalog collects basic metadata. Your administrator can run a process called a discovery agent on a SAS library or caslib to produce additional metrics and metadata for the tables in those libraries. If you have permission, you can run a discovery agent. To catalog additional assets, click New. Select the Samples caslib and click New discovery agent. There are several options available to customize in the Properties and Configuration tabs. I'll go with the default settings, then click Run now > Continue. The agent will take several minutes to catalog all the assets in Samples. Once the agent has completed successfully, return to the Catalog window. Notice the asset numbers have been updated.

Now I can search for various assets in the catalog. I'll enter the string Water in the Search field and click Start search. This is called a free text search, which examines common metadata, like column names, table names, and descriptions. The search returns a report and two CAS tables, each with water in the name. What if I search for property? WATER_CLUSTER is returned because it contains a column named Property.

You can write a more specific query with what is called a faceted search. You can enter a specific metadata label and the value that you want to search for. Numeric metrics such as row count or file size must use a faceted search. I want to see assets from the Samples caslib that I cataloged earlier. To do this, I'll click in the search box and select Library.name:. I am then prompted with available libraries, so I'll choose Samples. Notice the prompt includes several other attributes on which we can search. For a full list of facets and other information about the syntax of searches, I invite you to review the SAS documentation. The search returns seven CAS tables, but keep in mind that the search may include other files, such as Microsoft Excel, CSV, or SAS tables. In SAS Information Catalog, you can analyze any file type, database, or SAS table that is accessible via a SAS library or a caslib.

Let's search for the simple keyword Home. I want to examine the HOME_EQUITY table more closely. This table will be the focus of our journey through the SAS Viya applications. I'll click the HOME_EQUITY table name to open its details. Because the HOME_EQUITY table was created in SAS Viya, SAS Information Catalog knows it exists, but there are no metrics because the table hasn't been analyzed yet. Your administrator may analyze entire libraries or caslibs on a scheduled basis, but you can analyze individual tables as necessary. I'll click Analyze data. A warning appears, saying that the analysis will happen in the background but could take a while. Click Submit request. When the analysis is complete, the text at the top of the screen will change to A new analysis is now available for this asset. Click View the analysis.

Across the top of the display we see key table metrics, including Completeness, which represents the percentage of non-missing values, the number of rows and columns, and file size. Depending on my permissions, I may also have the ability to assign Status, which can be Review, Approved, Flagged or Warning. In the panel on the right, I can assign contacts or tags to assist with searching.

Let's examine the Overview tab. The left pane provides interesting high-level information about the table, including the information privacy, time period covered, top areas covered, and top languages. These analyses are done with algorithms from the SAS Quality Knowledge Base. As an example, I'll click the i icon next to Top Areas Covered. Top Areas Covered shows the most frequent geographic regions covered by the data. In this example, there are states and cities. A similar analysis is done for Information Privacy, except that the columns are classified by how sensitive the data is. Beneath these attributes, there's an auto-generated summary of the data, listing interesting information such as important columns, the presence of outliers, and the frequency of observations.

Let's look more closely at Semantic Type. Semantic Type identifies the classification of data that is likely to be contained in the column. Information Catalog uses the SAS Quality Knowledge Base to assign an appropriate semantic type, but you can modify the type for each column. For example, Region was assigned a semantic type of State/province. I'll select Choose other, then type Region and select it, then click OK. I can select Search semantic type to find other tables with the same value, or filter based on a particular semantic type in a faceted search.

I'll click the Column Analysis tab. Three subtabs display different sets of metrics for all the columns in the table. In the Descriptive Measures table, which is shown by default, you can see standard numeric metrics like mean, median, and standard deviation. The first row in the table contains metrics for the BAD column. This column contains 2 distinct values, 0 and 1. 0 represents a loan in good standing, and 1 represents a defaulted loan. In the Frequency distribution chart, I can see approximately how many 0 and 1 values occur in the data. Our ultimate goal is to build a predictive model to anticipate loan status based on the other columns in the table. The blue icon next to several column names indicates that the column contains outliers. If I select the MORTDUE row, helpful distribution visualizations and metrics are provided.

I'll click Metadata Measures. Here I can examine column length, type, and label, and additional helpful information such as Primary Key Candidate and Information Privacy. Finally, I'll click the Data Quality Measures table. I can view other data quality metrics, like pattern count, uniqueness, and completeness. I'll scroll down to review the most common and least common values for CITY. It looks like the city names might be in all lowercase, which won't look good in a report. I'll click the CITY column's name to investigate further. Now, I can see metrics for only the CITY column. To the right is a pattern frequency chart. This shows the frequency of patterns in the column, where a capital A is any capital letter, a lowercase a is any lowercase letter, and a 9 is any digit. Looking at the pattern frequency values, it looks like every pattern is in all lowercase.

After identifying the data quality issue, I can use SAS Studio later to make any necessary corrections. I can also use the Actions menu to analyze this table in another phase of the analytics life cycle, such as Explore and Visualize to open SAS Visual Analytics or Build Model to open SAS Model Studio. I'll change the status of the table to Flagged and add a note that the casing of City should be corrected.

Let's look at another table that has more private information. I'll click New Search and type Customer. I'll analyze the table and wait for Information Catalog to generate the metadata. I'll start with a peek at the Sample Data. Notice this table includes columns such as Name, Gender, and BirthDate. Back on the Overview tab, the Information Privacy flag has been set to Sensitive. The Information Privacy field indicates whether a column contains potentially private information that could be linked to an individual. Possible values are Candidate, Private, and Sensitive. Blank values indicate that the column has not been identified as potentially private data.

SAS Information Catalog also provides helpful insights for more complex natural language data. The Overview tab indicates that English and Spanish are the most common languages. Let's examine the Sample Data tab and the Verbatims column. This column includes free-form customer comments. If I scroll down, I can see that not all comments are in English. To explore these comments, I can visit the Column Analysis tab and choose Verbatims. SAS Information Catalog identifies Verbatims as a long-text column, and therefore provides some unique metrics. There is a word cloud with the most frequent values displayed in larger text. The languages for the column are presented in a chart. There is also a Sentiment Analysis chart that estimates the mood of the text and is useful for things like customer comments or product reviews.

I hope that exploring SAS Information Catalog has inspired you to think about how it can help you monitor and evaluate assets in your own environment. And if you'd like to learn more, we encourage you to visit your SAS Learning Center and the SAS documentation.
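
Two of the metrics shown in the walkthrough, Completeness (the percentage of non-missing values) and the pattern frequency chart (A for a capital letter, a for a lowercase letter, 9 for a digit), can be approximated outside the product to build intuition. The following is a minimal Python sketch under stated assumptions, not SAS Information Catalog's implementation: the function names and the sample city values are hypothetical, and treating blank strings as missing is an assumption.

```python
from collections import Counter


def completeness(values):
    """Return the percentage of non-missing values.

    Assumption: None and blank strings both count as missing.
    """
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and str(v).strip() != "")
    return 100.0 * present / len(values)


def char_pattern(value):
    """Map uppercase letters to 'A', other letters to 'a', digits to '9'.

    Any other character (spaces, punctuation) is kept unchanged.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how often each character pattern occurs, skipping missing values."""
    return Counter(char_pattern(v) for v in values if v)


# Hypothetical sample resembling the all-lowercase CITY column in the video.
cities = ["austin", "cary", "raleigh", None, "miami"]

print(completeness(cities))
for pattern, count in pattern_frequency(cities).most_common():
    print(pattern, count)
```

If no pattern contains a capital A, every value in the column is lowercase, which is the casing issue flagged for CITY in the walkthrough.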
