(soft music) - Yooo!
- Yooo! - What's up? I'm Adam.
- And I'm Patrick. - We had a great day one of the Microsoft Fabric
Launch digital event. There were a lot of great sessions that we looked at yesterday. Arun and Amir helped us
understand what Microsoft Fabric was all about. We saw sessions on Copilot and OneLake bringing all that data together in one spot. We saw Data Factory
and data warehousing and also how you can get
your Microsoft 365 data in. Amazing. So much.
- It was an exciting day. But we're going to kick today off with something that we haven't done so far at Microsoft, right? And not intuitively. We're kicking the day off with Justyna. Justyna's going to show us how we can use Spark to create this beautiful Lakehouse animal. - Oh, I like Lakehouse. - Hello, my name is Justyna Lucznik and I'm a group product manager for Synapse Data Engineering and Synapse Data Science in Fabric. Today I'm really excited to
share with you our strategy and roadmap for Spark in data engineering. Let's get started with a quick recap of Microsoft Fabric, which was announced at Build today. Fabric is Microsoft's next
generation data platform that brings together all
the different experiences required to build your
end-to-end data project, starting from ingestion all
the way through to reporting. Today we'll do a deep dive
into the data engineering workload and see what Fabric has to offer. Our aim is to empower every
data engineer to be able to transform
their data
at scale using Spark, and build out their
Lakehouse architecture. To achieve this goal, we're going to be talking about
four key product experiences. Firstly, we'll talk about the Lakehouse
and show you how easily you can get started bringing together all your organizational
data and sharing it out with the business for consumption. Secondly, we'll introduce
you to the Spark Engine, which is the backbone of the
data engineering experience. We'll talk about the Spark Runtime, various
performance improvements, and the robust admin controls for managing your Spark workload. Next, we'll look at the data
developer authoring experience that lets you explore your data, write and operationalize your Spark code. We'll highlight the features
and benefits of Notebooks, our VS Code extension, as well as the Spark job definition item to run your Spark applications. Finally, we'll touch on how all
these different capabilities are integrated into the platform, meaning you always have a
seamless experience in areas like monitoring, CI/CD, governance, automation, and more. Let's get started with the Lakehouse and its role in the data engineering workflow. The Lakehouse is a new item in Fabric which combines the best of the lake and warehouse
in a single experience. With the Lakehouse, we strive to remove all the
friction from ingesting, preparing, and securing organizational data in the lake in an open format. The power of the Lakehouse is that once data lands inside, tables are automatically generated, which can be read by Spark, SQL and Power BI. In fact, every Lakehouse comes equipped with a SQL endpoint that provides data warehousing capabilities, including the ability to run T-SQL queries, create views and define functions. Every Lakehouse also comes
with a semantic dataset, enabling BI users to
build reports directly on top of the Lakehouse data. All of these different users
can therefore collaborate on top of the same data stored in the lake with no data movement necessary.
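As a rough illustration of what that looks like from the Spark side, a cell in a Fabric notebook could read one of those auto-generated Delta tables directly; the table name campaigns below is a placeholder, not something named in this session.

```python
# Minimal sketch of reading a Lakehouse table from a Fabric notebook.
# "campaigns" is a hypothetical table name; any Delta table registered
# in the attached Lakehouse can be referenced the same way. The `spark`
# session is pre-created in Fabric notebooks.
df = spark.read.table("campaigns")

# The same table is queryable with Spark SQL, with no copies involved.
top_campaigns = spark.sql("""
    SELECT campaign_id, COUNT(*) AS responses
    FROM campaigns
    GROUP BY campaign_id
    ORDER BY responses DESC
    LIMIT 10
""")
top_campaigns.show()
```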
From a data integration perspective, the Lakehouse gives users a variety of ingestion options. Users can start out with
very simple workflows like uploading files directly
from their local machine. Data engineers who work in
low-code tools can leverage dataflows with hundreds of connectors to output data into the Lakehouse. For copying petabyte-sized data, users can also leverage pipelines with the copy activity. Behind the scenes, all the data lands in OneLake, the unified data lake
that comes prewired to
every Fabric workspace. However, users don't have to
copy the data into OneLake to be able to leverage
it inside the Lakehouse. They can also use shortcuts
to point to existing data elsewhere in Fabric or even
in external storage accounts. By creating a shortcut, data shows up in the Lakehouse
either as a file or a table, despite the fact that it's physically stored elsewhere, such as an ADLS Gen2 storage account or even an S3 bucket. Coming soon is the ability
to apply file, folder, as well as table security in the Lakehouse. With One Security, once
permissions are applied, they're automatically
synchronized across all engines, meaning that permissions
will be uniform across Spark, SQL and Power BI. Finally, shortly after public preview, the Lakehouse can be shared
as a data product for consumption by the entire
business and users who want to use the Lakehouse for reporting
or data science can easily discover all the Lakehouses
they have access to inside the OneLake data hub.
This is the Fabric data
discovery portal for all data items. Now that we have a better
end-to-end view of the Lakehouse, let's take a look at how someone
can leverage the Lakehouse for their own data project. Today we're going to take a
look at how data engineers can leverage Fabric to build out
a Lakehouse architecture. In this scenario, I'd like to build a Lakehouse
for my organizational marketing data to share with the business. I'm going to start out by
creating a new Lakehouse artifact,
going to give it a name
and then immediately land in the empty Lakehouse Explorer. The Lakehouse is a new experience
that combines the power of the lake and warehouse and
is a central repository for all Fabric data. I have a variety of options to bring data into the Lakehouse. I can simply upload files and
folders from my local machine. I can use dataflows, which is a low-code tool
with hundreds of connectors, or I can leverage the
pipeline copy activity to bring in petabytes of data at scale. Once my marketing data is in the Lakehouse, Delta tables are automatically created for me. With no additional effort, I can easily explore the tables, see their schema, and even the underlying files. I would also like to add some
unstructured customer reviews to accompany my campaign data. Since data already exists in storage, I can simply point to it with
no data movement necessary. To do this, I'm going
to add a new shortcut, which allows me to create a
virtual tables and virtual files inside my Lakehouse. Shortcuts enable me to select from lots of different sources, including Lakehouses and warehouses in Fabric, but also external storage like ADLS Gen2, and even Amazon S3. Since my customer reviews
are actually in S3, all I have to do is select it as a source, specify the data's location, and populate all of my account information. On the next screen, I can
give my shortcut a name and that's it in terms of setup. Within seconds, I can see a shortcut
created in the Files section, which is the messy, unstructured data lake portion of the Lakehouse. I can now explore the data in the Lakehouse and even open up the PDFs despite the data still physically being in S3.
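A hedged sketch of how that shortcut data could then be consumed from Spark, assuming the shortcut was created under a Files folder called customer_reviews (a placeholder name):

```python
# The S3-backed shortcut surfaces under the Lakehouse "Files" section,
# so Spark can read it like any other Lakehouse path. The folder name
# and binaryFile format (for PDF-like documents) are assumptions.
reviews = (
    spark.read
    .format("binaryFile")
    .option("recursiveFileLookup", "true")
    .load("Files/customer_reviews")
)
print(reviews.count(), "review files found via the shortcut")
```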
Now that all my data is ready in the Lakehouse, there are many ways for me to use it. As a data engineer or data scientist, I can open up the Lakehouse in a Notebook and leverage Spark to
continue transforming the data or build a machine learning model. As a SQL professional, I can seamlessly navigate to the SQL endpoint of the Lakehouse, where I can write SQL queries, create views and functions, all on top of the same Delta tables. As I write a quick SQL query, I get results back
instantly without needing to move any data. Finally, as a business analyst, I can simply navigate to
the built-in modeling view and start developing my
BI data model directly in the same warehouse experience. After adding relationships
and measures to my data, I can generate a Power BI
report in a single click. As I build out my report, I get
amazing performance thanks to Power BI Direct Lake Mode. With Direct Lake mode, Power BI can natively read
the Parquet-based Delta format stored in OneLake, meaning again, no data was duplicated in the process. To conclude, in Fabric, data
engineers have a frictionless experience building out their
enterprise Data Lakehouse and can easily democratize this data for all users in the organization. Next, we'll focus on some of the key features and
performance enhancements that Fabric provides for
running your Spark workloads with little friction and in a performant manner. One of the most important aspects is the Spark Runtime. The runtime comes pre-wired to Fabric workspaces and contains an optimized distribution of Spark, its dependencies and other key libraries. In the 1.1 runtime, we're including major updates such as upgrading Spark to 3.3.1, Delta to 2.2 and Python to 3.10. A key feature of the runtime is the integration with Delta Lake, which is the open table storage format that Fabric has standardized on. Delta is what enables customers to work with a single copy of
data across all of Fabric. Since all the engines have
standardized on a single format, sharing data is completely seamless. Furthermore, all Fabric
engines write Delta with V-Order, meaning data
is automatically optimized for Power BI reporting. Finally, for those users
who don't use Delta today, you can also use the Load to Delta feature in the Lakehouse to convert common file formats and folders to Delta with just a few clicks. This will allow you to easily leverage the benefits of Delta Lake for your existing data as well.
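For teams that prefer code over the UI, roughly the same outcome can be sketched in a notebook by reading raw files and writing them back as a Delta table; the paths and table name below are assumptions for illustration, not the Load to Delta feature itself.

```python
# Hypothetical example: convert a folder of CSV files in the Lakehouse
# "Files" area into a managed Delta table, similar in spirit to what
# Load to Delta does through the UI.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/campaigns_csv")
)

# saveAsTable registers a Delta table in the Lakehouse, which then
# shows up for SQL and Power BI as well.
raw.write.format("delta").mode("overwrite").saveAsTable("campaigns_raw")
```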
To ensure customers have a performant experience, we have also built various Spark optimizations into our runtime. These optimizations are
designed to enhance your query performance by default without needing to do any configurations. One example is partition caching, which stores filtered
partition information in a session-level cache. This means fewer calls have to be made to the metastore, saving you time and cost. Another example is merging scalar subqueries into a single plan, which again reduces the computation time. An exciting upcoming
capability that also focuses on performance by default is autotune. Autotune uses machine learning
to automatically analyze previous runs of your
Spark jobs and choose the configurations to
optimize the performance. It'll configure how
your data's partitioned, joined and read by Spark, which can have a significant
impact on performance. In fact, we have seen customer
jobs running two times faster with this capability enabled. We're committed to improving
Spark session startup times, and so we're also introducing
starter pools in Fabric. Starter pools have default configurations and come pre-wired to
your Fabric Workspace, meaning you don't have to
do anything to set them up. These pools are kept live, meaning need to provide
users with a Spark session within 10 to 15 seconds
of running a Notebook resulting in instant user productiv
ity. Another common customer ask is the ability to reuse Spark sessions
across multiple Notebooks. With the new high concurrency mode coming soon after public preview, this will now be a possibility. From inside a Notebook, users
can choose an existing session to attach to resulting in
lightning fast startup times and lower costs. Coming later this year is support for high concurrency in pipelines, which will allow you to run
multiple Notebooks in a pipeline within a single session. Now we have
talked about some of the performance enhancements
customers can expect. Let's talk about what sorts of controls and configurations are available. Firstly, assuming they have been granted the relevant permissions, workspace admins will have the flexibility of creating custom pools and setting them as the workspace default pool. They'll be able to configure
things like the node size, number of nodes and executors
as well as autoscale. We're also excited to announce
that custom pools can start all the way from a single node, enabling customers to efficiently run small or test Spark jobs. This is a great, cost-effective option for lightweight workloads. Coming later in the year, workspace admins will also
have the opportunity to keep these custom clusters live, which will result in
accelerated session start times. Another important aspect of
configuring your workload is library management. Workspace admins can install
public and custom libraries to the default pool in the workspace. Admins will also be able to set the default runtime and configure their Spark properties. All Notebooks and Spark jobs will inherit these libraries, runtime and settings without needing to manage things on an artifact-by-artifact basis.
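As a small example of how those inherited settings surface to a developer, a notebook session can inspect or override standard Spark properties for its own session; the property shown is a generic Spark setting used purely for illustration.

```python
# Read a Spark property inherited from the workspace/pool configuration.
print(spark.conf.get("spark.sql.shuffle.partitions"))

# Override it for this session only; workspace defaults stay untouched.
spark.conf.set("spark.sql.shuffle.partitions", "64")
```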
An upcoming capability that will help users better govern their workloads is policy management. Admins will be able to author policies based on Spark properties
to enforce certain rules or restrictions on their workloads, which cannot be overridden, ensuring consistency and compliance. Whilst we want to streamline the process by having default experiences defined at the workspace level, we recognize customers need
the flexibility of customizing things at a more granular level too. Later this year, we are therefore introducing
a new environment item to give users the
customizability they need. In an environment, users will
be able to install libraries, choose and configure a pool, set their Spark properties and upload scripts to a file system. Environments can be attached to individual Notebooks and Spark jobs, giving users the
customizability that they need. Let's take a closer look at
some of the Spark Management experiences in the following demo. Let's take a look at how an
administrator can configure a Spark environment for
their data engineers. I'm starting out in the
capacity admin portal where I can now access
the Spark compute settings for data engineers and data scientists. Opening up the Spark compute option, I can set a default runtime
and default Spark properties. I can also turn on the
ability for workspace admins to configure their own custom Spark pools. I'm now going to navigate
to the marketing workspace, which I'm getting ready
for my data engineers. The workspace comes pre-wired
with a default Spark pool that all Notebooks and
Spark jobs inherit from. We can view and modify the pool by navigating to the workspace settings and drilling into the data
engineering and data science tab. Here I can modify things
like the default libraries that come with the workspace. For example, I can search
for the word cloud library and choose the version that I want. I can also add libraries from
Conda and from YAML files or upload custom ones directly. Navigating to the Spark compute settings, I can see that my workspace automatically comes with a default starter pool and I have full transparency
of all the pool details. Without needing to set anything up, all Notebooks and Spark jobs can leverage the starter pool to run their jobs. In
this case, I would actually like to run
some small test workloads, and so I'm going to
create a new default pool. I'm going to give the pool a name and select a small node
size and turn autoscale off. I can now set my Spark pool to always run with a single node for my test workloads. Finally, I'm going to reduce
my executor upper limit and create the pool. Our workspace admin also has
the ability to change the default runtime and
modify Spark properties. Now that everything has been set up, I can save my workspace settings. Any Notebooks I create will now automatically use a single-node Spark pool along with the selected runtime, libraries and Spark properties. We want to ensure developers
have a great authoring experience when they work in Fabric. Whether you're a data
engineer or a data scientist, you can use Notebooks, Spark jobs, or work in your IDE of choice. Let's take a look at some of the capabilities you'll be able to leverage. Our primary authoring
experience is the Notebook. Our
Notebooks natively
integrate with the Lakehouse, making it easy to browse your
data and drag and drop it into the Notebook cells. Users can easily collaborate
with others in real time, whilst the Notebook auto-saves their work, just like they're used to in Office. Notebooks can be scheduled or they can be added to a pipeline for more complex workflows. Users who want to make use of ad hoc libraries during a session will be able to install popular Python and R libraries inline, leveraging commands like pip install. This is a quick and convenient way for validating libraries during the development process.
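A cell performing that kind of inline install might look roughly like this; the wordcloud library is just an example package.

```python
# Install an ad hoc library for the current notebook session only.
%pip install wordcloud

# The library is then importable in subsequent cells.
from wordcloud import WordCloud
```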
Developers can build modularized Notebooks that reference each other, and they can also track history through snapshots for troubleshooting errors. Later this year, we're introducing a Notebook
resource folder where users can store dependencies and helper files like scripts and text. Users can easily upload local files for quick and easy use. Notebooks also provide
fully integrated Spark monitoring experiences
inside the Notebook cells. We also have a unique feature
called the Spark Advisor, which analyzes your Spark executions and provides you with
real-time advice and guidance. For example, the Spark advisor can warn you
about things like data skew and provide you with
guidance and recommendations. Already available is Data Wrangler, a UI data prep experience built on top of pandas DataFrames. Any low-code operations
carried out are automatically translated to code for
transparency and reproducibility. Coming later this year, Data Wrangler will also support Spark, scaling to bigger data volumes, and will
integrate with OpenAI for transformations powered
by natural language. We also have various improvements coming from a usability perspective. Later this year, we'll introduce a revamped
data frame display with built-in summary statistics,
for easier exploration. We're also working on Power
BI integration on top of your data frames as well as the
ability to browse and
add code snippets for common
data engineering activities. We're excited to announce
we're also working on adding native co-pilot support to
Notebooks through constructs like magic commands. Users will be able to
leverage these inside their Notebook cells to
chat about their data or get code generated in the Notebook. What's more, co-pilot
in Fabric is data aware, meaning it has full context about your Lakehouse tables and schemas. This makes it really easy
to have co-pilot assist you with your
data engineering tasks as well as helping you
better understand, document and debug your code. Finally, another exciting
area of investment is utilizing Notebooks
not just for development but also storytelling. Later this year, users will be able to
embed their Notebooks alongside Power BI reports and dashboards inside Fabric apps, which can easily be
distributed to business users. Customers will be able to
interact with widgets and visuals in the Notebook as an
alternative reporting and data exploration mechanism. Let's take a moment to look at the Notebook developer experience. In this demo, we'll dive deeper into a
data developer's authoring experience in Microsoft Fabric. In this project, I'm collaborating with my
colleagues on a predictive model built on top of the marketing
data in the Lakehouse. I can see Pira has the Notebook
open and I can view his code updates in real time inside the cell. To get started, I'm going to install an ML
library I need for my project. Thanks
to the built-in live pools, my Spark session starts
in a matter of seconds and I can immediately
start being productive. I can now drag and drop my
campaign table from my Lakehouse and a code snippet gets
generated for me immediately. As I run my cell, I can leverage the inline
monitoring to monitor my Spark job and make sure everything
is running smoothly. I can get a preview of my data and use the built-in charting capabilities to explore things and
even adjust the charts around for better insights. Next, I can use the
display summary function to get a quick overview
of the quality of my data, looking at data types, missing values and summary statistics. I can now leverage Spark to do some additional data cleansing, for example, getting rid of the missing values.
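A minimal sketch of those exploration and cleansing steps in PySpark, assuming a placeholder table named campaigns, might look like this:

```python
# Load the campaign table that was dragged into the Notebook.
campaigns = spark.read.table("campaigns")

# Quick look at the schema and per-column statistics.
campaigns.printSchema()
campaigns.describe().show()

# Simple cleansing step: drop rows with missing values.
campaigns_clean = campaigns.dropna()
print(f"Kept {campaigns_clean.count()} of {campaigns.count()} rows")
```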
Users can also use custom libraries to explore their data further, in this case using ggplot and box plots to look at data distributions of call durations, broken up by job types and campaign outcomes. The Notebook has a built-in resource folder, which makes it easy to store scripts or other code files I
might need for the project. I'm going to drag and drop
the top feature selector Python script that my colleague created and I can get a quick overview
of the functions it supports. I can now use the top feature
selector function to identify the most important features for my model. I can leave my colleagues a comment on the Notebook cell letting them know about my progress. All this time, my Notebook is getting auto
saved without any involvement needed from me. I'm now ready to train my
machine learning model. After some experimentation with a variety of different model types, I decide to use a logistic regression for this project.
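A hedged sketch of that training step using Spark MLlib is shown below; the feature and label column names are assumptions, not the actual columns used in the demo.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Hypothetical numeric feature columns and a 0/1 "outcome" label column.
df = spark.read.table("campaigns").dropna()
data = (
    VectorAssembler(inputCols=["call_duration", "previous_contacts"],
                    outputCol="features")
    .transform(df)
    .withColumnRenamed("outcome", "label")
)
train, test = data.randomSplit([0.8, 0.2], seed=42)

# Fit a logistic regression, as chosen in the demo.
model = LogisticRegression(maxIter=20).fit(train)

# Area under the ROC curve summarizes the model's performance.
auc = BinaryClassificationEvaluator(metricName="areaUnderROC") \
    .evaluate(model.transform(test))
print("AUC:", auc)
```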
Finally, I can plot an ROC curve to evaluate my model's performance, and I can easily store it as an image in my Notebook resource folder so that my colleagues can easily check it out as well. To conclude, Fabric provides me with a rich developer experience, enabling users to collaborate, easily work with their Lakehouse data and leverage the power of Spark. We know many developers are opinionated about the tools they use
and prefer working in IDEs. For this reason, we have invested in VS Code integration as the first IDE we natively integrate with. Users can easily launch the VS Code extension straight from inside the Notebook and seamlessly work with their Notebooks, Spark jobs, and Lakehouses. Users can run and debug their Notebooks either in full local mode or leveraging the Fabric Spark clusters remotely. We're also working on providing a fully remote way of working with vscode.dev, which will be available later this year. In this mode, users can get started
with a browser experience with no setup and changes
are instantly reflected back in the service. Finally, customers who
want to work completely in their own environment
can leverage Fabric purely for submitting
their Spark applications. Using the Spark job definition, they can upload their existing JAR
files, tweak their Spark configurations, and add their Lakehouse
references to submit their jobs. Just like Notebooks, Spark job definitions come
complete with inline monitoring, scheduling and pipeline integration. Spark job definitions also have
the added advantage of being able to specify retry policies, which makes it possible to run
long running streaming jobs with no issues. Combined with Power BI
and Direct Lake connectivity, customers can get an end-to-end solution for near real-time reporting. This wraps up our developer
experience announcements. Let's take a look at the
pro developer experience before moving on to platform integrations. As a data engineer, I want to work with some
of the marketing data in my Lakehouse. Since I prefer working in IDEs, I can make use of the native
Notebook VS Code integration and open the IDE in a single click. I'm instantly navigated to VS
Code and prompted about opening up the Synapse VS Code extension. My marketing Notebook is
automatically
downloaded, opened up and ready to use. I can easily browse all
the Notebooks, Spark jobs, and Lakehouses in my
workspace and interact with the Notebook I was previously
working on in the browser. I have the option of working
with my Notebook locally, or I can easily connect to the remote Spark cluster in Fabric to leverage the Spark pools I'm already using in the service. I can now resume my work in VS Code and run my Notebook cells to continue iterating on my project and can seamlessly see the output of my run. In this next step, I can add a breakpoint to
my code and leverage all the great debugging capabilities
of VS Code for my project. As I debug the cell
and hit the breakpoint, I can work with my Notebooks
just like any other regular local Java or C-Sharp script. When I hit the next breakpoint, I can inspect my data frame
object in the local call stack on the side and see all
the columns, data types, schemas and more. I can also of course keep
working in the Notebook and adding my own code cells. In this case, let's go ahead and save our cleansed data as a new table in the Lakehouse.
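In code, that save step could look roughly like the following; the table names mirror the demo narrative but are otherwise placeholders.

```python
# Minimal sketch of the "save as a new table" step: write the cleansed
# DataFrame back to the Lakehouse as a Delta table.
cleansed = spark.read.table("campaigns").dropna()
cleansed.write.format("delta").mode("overwrite").saveAsTable("cleansed_campaign")
```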
In the workspace view, we can navigate through the Lakehouses available, expand out the marketing Lakehouse and see all the tables we're able to work with. I'm going to run the code cell and, after it's done, let's refresh the workspace view. We can immediately see the new cleansed campaign table appear as a new table in the marketing Lakehouse. Now that I'm done making my changes, I can choose to publish my updated Notebook to Fabric. Navigating back to the Notebook, let's refresh the browser, and we can see the new table appear in our Lakehouse editor, whilst the Notebook has been updated with the latest code changes. In Fabric, we strive
to give data developers the flexibility to work in
any tool that meets their needs, whether it's our Notebooks, VS Code, or an entirely external IDE. In this final section, we'll talk about how data
engineering is deeply integrated into the Fabric platform. All Fabric workloads sit on
top of a shared foundational platform that creates
consistent experiences across governance, security, CI/CD, and much more. In this section, we'll deep dive into a few
of the platform integrations that are key for data engineers. Firstly, all Fabric items are integrated with enterprise information
management capabilities like lineage, sensitivity
labels and endorsements. You can discover your Lakehouses
in the OneLake data hub. You can apply sensitivity
labels to your Notebooks. You can trace lineage of your Spark jobs. Another top-of-mind area is CI/CD. Users are able to connect
their workspace to a Git repo and later this year
they'll be able to commit all their data engineering
items including Notebooks in their native file format
along with any source files. Users are also going to be able to leverage deployment pipelines to deploy their data engineering
items across dev, test and production workspaces. Either using the UI or
automating the process
through Azure pipelines. In addition to inline
monitoring experiences, Spark applications are
also going to be accessible through the monitoring hub, which is the centralized
Fabric monitoring portal. Users can get a bird's eye
view of all their items, but they can also drill down
to the details at a job level. Customers can also view related items like associated Notebooks and pipelines and also view snapshots of their Notebooks for easy troubleshooting. Finally, those who feel
at home in the
Spark UI can also navigate directly to it, as well as the Spark history server, for viewing native Spark execution metrics. Admins can also get a consolidated view of their capacity reporting, which shows them the utilization of all their workloads in Fabric. This gives admins the clarity
and visibility into how much usage all their data engineering
items are generating, enabling them to make
data-driven capacity decisions. We know how important it is
for data engineers to be able to automate their jobs and
do things programmatically. The Fabric SDK, which is
shipping later this year, enables users to create
items, execute jobs, as well as manage and
monitor their Spark compute. We'll also support the Livy endpoint for programmatic batch job submission.
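As a hedged sketch only, a standard Livy-style batch submission typically looks like the following; the endpoint URL, token and payload below are assumptions for illustration and not a documented Fabric API.

```python
import requests

# Hypothetical Livy-style batch submission. The endpoint URL, token and
# payload fields are placeholders/assumptions, not a confirmed Fabric surface.
livy_url = "https://<your-livy-endpoint>/batches"   # placeholder URL
headers = {"Authorization": "Bearer <token>"}        # placeholder token

payload = {
    "file": "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>/Files/jobs/etl.py",
    "args": ["--date", "2023-05-24"],
    "conf": {"spark.executor.memory": "8g"},
}

response = requests.post(livy_url, json=payload, headers=headers)
print(response.status_code, response.json())
```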
Before wrapping up, let's take a look at one last demo, this time doing a deeper dive into the monitoring experiences of the platform. Now we'll take a look
at how Fabric provides a unified experience for
monitoring whilst giving users
the flexibility of diving deeper into their workload specific needs. I started out by navigating
to the Monitoring hub, which is the centralized monitoring portal for all Fabric items. Users can sort by item type, filter by job status, get more details about a job, and the great part is this experience is completely consistent for every item, whether it's a data engineering Notebook, a data integration data
flow, or a Power BI dataset. If there's a job I submitted by accident or I don't want to run anymore, I can also easily cancel the runs straight from this experience. Whilst the monitoring hub
provides me with a consistent way of looking at all my jobs, I can also drill into the
details of all my runs, for example, navigating into
a specific Spark application. At this point, the experience transforms
into something that is personalized per my specific workload. In this case, a data engineer can get all
the details of the Spark jobs that are part of their Spark application. If I have a job that has failed, I can easily get the specific code cell snippet where the problem has occurred. I can also navigate through
the diagnostic panel to get more details about where
and why the error occurred, but also warnings about
potential performance issues. In this case, we can see we have some
skewness in the data that the user should look into. Data engineers can also
look through the driver logs to get more details about the error. They can also download the
logs for further analysis in their own tool of choice. Users can also see the data
inputs and outputs, for example, coming from your Lakehouse,
blob storage and other sources. Finally, data engineers can take a look at the Notebook snapshot from their run to see exactly where
potential issues occurred. For a more scoped view, users can navigate to their workspace and look at the runs associated with a specific item such as the Notebook. And of course, users can monitor their interactive jobs directly inline. Data engineers are also able to
navigate to the Spark UI, which shows native Spark execution
metrics at the job level. Users can dig into the different executors and check out the corresponding logs. To conclude, Fabric offers many unified
experiences ranging from CI/CD to monitoring
where users can benefit from a consistent interface
but can also dive into the workload specific details if needed. Thank you so much for joining
the strategy and roadmap session for Synapse Data Engineering in Microsoft Fabric. As the next step, I highly encourage all of
you to try out these new experiences by navigating
to Fabric.microsoft.com. I would also recommend you check out the data science, data warehousing, and OpenAI sessions to learn about some of the
other exciting announcements. I hope you have enjoyed the session and I look forward to
hearing your thoughts, feedback and suggestions about the new capabilities
we are releasing. - Patrick, you weren't wrong. Lakehouses in Microsoft
Fabric, it's amazing. - They're the way to go. You know why? Because they bridge that gap between just having the data warehouse. It brings the data lake and the data warehouse under one umbrella, so now I can cover all the
personas in my organization. Not only can we have
the citizen developers connecting to it, but we can have our data
scientists, our data engineers, just using all the data
in one single place, Adam. - And I like this
concept of starter pools, so just get it up and running fast, but we can also customize
that to our business needs, which is important as well, and from a developer perspective, leveraging Notebooks and Spark jobs or bringing your own IDE with VS Code. That was amazing. - That was amazing. Absolutely amazing. - All right, Patrick,
for the next session, we're going to hand it
over to Arthi and Anton where they're going to look
at security and governance inside a Microsoft Fabric. A very important topic. - Let's go. - Hello everyone. Welcome to this session. I'm Arthi Ra
masubramanian
Iyer group product manager and co-presenting With me today is Anton Fritz principle pm lead in the Azure data org in Microsoft. In this session, we will together cover security
and compliance features, but also capabilities enabling
effective administration and governance in Fabric. But first, what's Fabric? Fabric provides a unified
intelligent data foundation for all analytics workloads
and integrates Power BI, Data Factory and the next
generation of Synapse to offer customers a
price-performant and easy-to-manage modern analytics solution. Every analytics workload
works seamlessly with OneLake to minimize data
management, time and effort by eliminating data
movement and duplication. Hence, Fabric reduces
the pain of integration and facilitates better collaboration. In addition to making available
different analytical tools your team needs. And here's the value Fabric brings, Fabric provides a complete
analytics platform with best of breed capabilities across
every analytics workload. With security and governance built in, it is open at every layer. Fabric empowers business
users with deeply integrated Microsoft Office and Teams
experiences and it delivers AI co-pilots to accelerate analytics, productivity and discover
insights with your data. In this session, we will particularly
focus on the capabilities which will make Fabric
secured and governed. Securing and governing your data is a non-negotiable priority for us. With Fabric, we will deliver industry
leading capabilities that will enable you to secure and govern your data end to end. Let's start with how
administrators in Fabric can manage configurations for their tenant. In Fabric, as Fabric admins, you will be able to manage all
tenant and capacity settings in one admin portal. In the Fabric admin portal, you can centrally manage
review and apply settings for the entire tenant,
not just for Power BI but everything Fabric. You can set security configurations
for the entire tenant, so every data engineer or data scientist doesn't have to worry about it. For instance, if you would like to allow users in your tenant to apply sensitivity labels
to Fabric artifacts, you can set this up
once at the tenant level in the admin portal. Next, let's move on to capacity settings, which provide you, as a tenant admin, visibility and allow you to manage all capacities in your tenant, including the new Fabric capacities. In addition to that, tenant admins will also have visibility into all active Fabric trial capacities provisioned for users within the tenant. Clicking on the capacity
allows you to adjust settings specific to that capacity. As a capacity admin, you will be able to manage
capacities you're an admin of in a very similar fashion. Let's now take a quick
look at how an admin can control availability
of Fabric Preview workloads for users within your tenant. Users in your tenant can
create Fabric artifacts once the switch at the
tenant level is turned on. You can also choose to restrict it to certain users in your tenant. If no action is taken by July 1st, Fabric will be turned on
by default for your tenant. However, if you choose to
turn off access to Fabric, it'll remain turned off for your tenant until you choose to turn it on. Additionally, capacity admins can configure
the setting for their capacities independent
of the configuration at the tenant level. For instance, you can enable Fabric at the tenant level for specific users. However, at the capacity level, the capacity admin may
choose to follow the setting at the tenant level or override
the tenant configuration. While there are admin APIs
available for a tenant admin to be able to automate
aspects of tenant management, we are introducing a new one
which will allow you to read tenant settings and
configurations you have applied for each setting. This could be used in automation scenarios, but also for documentation purposes and sharing current Fabric configurations with other non-admin users in your tenant. We understand that as the number of settings increases, keeping track of newly added ones is not easy. To help with this problem, in addition to visual
cues for new settings, we will also notify you of
newly added tenant settings in the admin portal as shown here. Now a quick look into other features we're actively working on. We plan to make most settings delegable by a tenant admin to capacity or domain admins and workspace admins. This allows for distributed and granular management of relevant settings, enabling efficient management
as we introduce new workloads and features in Fabric. We also plan to make
usability improvements like a search experience,
which searches both setting titles and descriptions
to surface settings that match your specific criteria. While some of the admin APIs
will support Fabric artifacts at public preview, we will continue to expand that list to make sure all admin APIs do. Let's take a look at monitoring
capabilities in Fabric now.
As admins, to effectively govern, we understand that you
need insights into usage, adoption and activities
within your tenant. Hence, we introduce the
admin monitoring feature, which is an in-product
admin monitoring workspace with pre-created reports and data sets. This feature will soon extend
to include Fabric artifacts and additional governance capabilities. Here's a quick demo of this feature. - [Narrator] The new
admin monitoring workspace is a Microsoft curated workspace targeted to the
needs of tenant admins. It comes prepopulated
with reports and data sets and we're going to be focusing
on the new feature usage and adoption report for the demo. Keep in mind that as we roll
out more Fabric capabilities, this workspace will include
value added Fabric artifacts on an ongoing basis. You can't add or remove artifacts here, but you can for instance
use the included data sets to build your own reports. As a tenant admin, you can also share this
workspace with others in your organization as you would any other workspace. So let's take a look at
the new feature usage and adoption report. This report comes prepopulated
with 30 days worth of data and gives you a bird's
eye view of activities on the tenant across time. You can zero in on a
specific date range here or you can use the familiar
filter interface on the right to filter along a range of parameters. You can also filter directly
from within the report UI. We can right click to drill
through on this category or drill down in this case to break it down into
its constituent elements. We can continue the analysis here or we can move over to this
visualization on the right which correlates the most
active users with this category. Here we see this user's activities are exponentially higher than other users and perhaps we'd like to learn more about what they're doing across the tenant. To do that, we can drill
through the activity details. This is going to give us a contextual view of this user's activities
across the tenant, correlated with these parameters. We can do even more interesting things on the analysis pane. I'm going to reset this
to the default view and we see the total activities are represented by this bar here, giving us the opportunity to build out an analysis tree anyway we'd like. We have a number of choices here in which direction we'd like to take it. In this case I'm going
to choose item type. I'm going to select
report here, and again, I have any of a number of
choices where I'd like to go. I'm going to select action, and
I'll go with the top ranking one again and from here I'm
going to select activity name and let's end up at users again. So I'm going to select users. So you can see that this is a
very flexible way to build out an analysis tree to support
any of a number of scenarios. - With Fabric, as we introduce new workloads
like data engineering, data warehousing and more, it becomes even more critical
for you to have visibility into capacity utilization
usage trends
so you can plan and scale your capacity accordingly. Which leads us to our next
demo. The capacity metrics app. - [Narrator] Capacity metrics
provides administrators with all the data needed
to monitor capacities and plan for capacity scale up decisions. To get started, I'll first
choose one of the capacities hosted in my company's tenant. The first graphic shows me
a trend of capacity unit consumption by workload. I can also view trends
of operation duration, count of operations by workload
and
count of distinct users to track workload adoption. The utilization graph
on the right shows me the amount of capacity units I've used compared to the amount of
capacity I've purchased. I can also see if autoscale
is enabled or in use here. Trial capacities can run both
production Power BI workloads and Fabric preview workloads. Preview status is differentiated by color along with interactive versus
background classification, which helps me determine if the
usage is from physical users or scheduled operations that cook data. The items table shows me usage information by item and workspace. In this context, an item
can be a Power BI dataset, a data warehouse, or any
instance of a Fabric workload. To walk you through a
typical analysis scenario, I'm going to investigate what
my top workload usage was on Wednesday the 19th. As I select a date, both the utilization graph and item table update to show usage trends during the selected period. Items are sorted by the amount
of capacity units consumed, and I can see that Lakehouse, Notebook and dataset operations are
my top three contributors. In this view, I can filter by workload
type to simplify analysis or I can use time point drill to explore full fidelity telemetry. Selecting a region in time in
the usage graph lets me load time point drill via the explorer button. Unlike the aggregated views we just saw, Timepoint drill shows
operations running during a single point in time to help me analyze what contributed to capacity usage, autoscale or throttling decisions. The views here show the
amount of throughput provided by my capacity SKU and
autoscale configuration. Operations are split between
interactive and background. When analyzing individual operations, admins can see the workload
items, workspaces and duration. User information is also
provided to enable easy follow up with workload creators if
optimization is needed. - This section covers some of
our key foundational features like security and reliability
and before we drill deeper into some of these areas, I would like to first share
our vision for this area. Starting at the networking layer, we will ensure users in your tenant can securely connect to Fabric, but also users working with Fabric can securely connect to
their data outside of Fabric. Beyond that, access
control will be managed via workspace roles,
permissions and sharing, as well as additional
security at the data layer via One Security. Like many other Microsoft
products and services, we
will comply with key
industry certifications and regulations. Your data will be further protected by double encryption,
end-to-end auditability and data recovery in case of a disaster. And finally, purview will be deeply
integrated with Fabric, bringing in a whole suite
of governance capabilities. Over the next several slides, let's dive into what's available today and what's on our roadmap. At public preview, for your Fabric data stored
at rest in your home region or in one of your capacities, possibly in a remote region of your choice, we ensure that data never
leaves the region boundary and is compliant with data
residency requirements. We will also support end-to-end
auditability for Fabric, so all Fabric user and system
operations are captured in audit logs and made
available in Microsoft purview. For access control, the existing Power BI workspace
roles now extend to cover Fabric artifacts, along with additional permissions which are specific to
new Fabric artifacts. In addition to workspace roles, you can share individual Fabric artifacts or provide direct access
to them to specific users. Looking at our roadmap, we are in active development
of the first phase of a feature we are calling One Security. One Security will bring a
shared universal security model, which you'll be able to define in OneLake. More granular data security can be defined on data once in OneLake. This includes table, column and row-level security. In this example, I have defined security on a data warehouse. That security would
flow across any shortcut which references that data and be respected by any engine you choose to access this data with. There are many more fundamental features, some we are actively working on and others which are part of
our mid to long-term roadmap. We're actively working on adding support for managed identities for Fabric, which will allow you to securely
connect to your external data sources and also operationalize relevant Fabric artifacts. We are also working on ensuring
Fabric data is recoverable for business continuity. Other key features we
soon plan to focus on are securing inbound and
outbound connectivity in Fabric and data encryption using
customer managed keys. Now let me hand it over to Anton who will cover more governance features and Fabric purview integration. Thank you. - Thank you. Arthi. I'm really excited to talk with you today about enterprise information
management capabilities in Fabric. As you think about Fabric, we are thinking about enterprise customers, customers that, in order to empower thousands and tens of thousands of users to leverage Fabric, need enterprise-scale data management capabilities. These capabilities are
usually oriented toward administrators that are
responsible for data availability, data quality compliance and governance of analytics platforms. And the great thing about
all the capabilities that we are going to discuss today is that they're either built into Fabric or deeply integrated into Fabric. For example, the platform has a built-in data lineage and impact analysis
capability enabling you to get an overview of how the data flows in a complex analytics project, which may include data
lakes, multiple data lakes, warehouses, pipelines, models,
reports and dashboards. Understanding where the data is coming from and where the data is going in a complex analytics project can help you easily assess the impact of making a change in one of the components, and it can also help you do root cause analysis if there is a data quality issue in one of the reports that the business users consume. So with data lineage, you can easily identify
where the data is coming from and zoom in on the root cause. Another enterprise
requirement is the ability to discover quality assets
and this is where endorsement is another important
capability for enterprises that is built into Fabric. Endorsement helps users in the organization discover the high-quality data assets endorsed by a central team,
and today in Fabric there are two ways to endorse your data assets. First, certification. Users that are authorized
by Fabric administrators can certify data assets that meet the organization's data quality
and reliability standards. The second way to endorse
items is using promotion. This option is available to all data owners and it enables them to promote assets that they think can be valuable for other users to reuse, and those assets get higher visibility in the OneLake data hub and other data discovery experiences in Fabric. This makes it easy for you to discover the endorsed assets and leverage them in your analytics tasks. We also understand that
enterprises manage multiple platforms and some enterprises
need to export the metadata of their analytics platform
to their homegrown management tools or third party
data cataloging tools. For this purpose, you can leverage Fabric's scanner APIs, which enable you to fetch all metadata and data lineage of Fabric items to power your custom analytics, or to leverage it in the third-party cataloging tools which you may use.
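As a rough sketch of that export flow, the existing Power BI admin scanner APIs can be driven from a small script like the one below; the token and workspace ID are placeholders, and the exact Fabric-era surface may differ.

```python
import time
import requests

# Hedged sketch of the scanner (admin) API flow used to export metadata
# and lineage. Token acquisition and the workspace ID are placeholders.
base = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"
headers = {"Authorization": "Bearer <token>"}  # placeholder token

# 1. Trigger a scan for a set of workspaces.
scan = requests.post(
    f"{base}/getInfo?lineage=True&datasourceDetails=True",
    json={"workspaces": ["<workspace-id>"]},
    headers=headers,
).json()

# 2. Poll until the scan has completed.
while requests.get(f"{base}/scanStatus/{scan['id']}", headers=headers) \
        .json()["status"] != "Succeeded":
    time.sleep(5)

# 3. Fetch the scan result: items, lineage, sensitivity labels, etc.
result = requests.get(f"{base}/scanResult/{scan['id']}", headers=headers).json()
print(len(result.get("workspaces", [])), "workspaces scanned")
```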
Currently, Fabric is integrated with the Microsoft Purview data catalog, Informatica, Collibra and Alation. Please note that this integration is available for Power BI data assets only. We are in different stages of collaboration with our partners in making other Fabric assets available. Now let's take a step back
and look at large enterprises. We understand that in order to achieve enterprise-scale compliance and governance, the capabilities that we showed so far are only part of the puzzle. Enterprises also need enterprise-scale tools to manage their entire multi-cloud, multi-platform data estate with capabilities like data cataloging,
information protection, detection of sensitive
data, auditing, et cetera. And this is where Microsoft
as a whole provides you with a best-in-class, full-stack compliance and governance solution, with Microsoft Purview deeply integrated into Fabric to provide you with a single-vendor approach that gives you the benefit of built-in compliance and governance integration. But first, what is Microsoft Purview? Microsoft Purview is a suite of compliance and governance solutions, helping enterprises
govern their multi-cloud and multi-platform data estate including governance of
Microsoft 365 applications. Microsoft Purview unifies
information protection, data governance, risk management and compliance solutions to give you one place to manage the compliance and governance for your entire data estate. And when we look at how Microsoft Purview is deeply integrated with Fabric, it starts with the integration
with Microsoft Purview data catalog where you can
create a new Purview tenant that scans content from Fabric and from both first- and third-party assets. This enables admins to better govern their multi-cloud, multi-platform data estate because they have all the assets in one place, and users that search the catalog can discover data across their organization. Today, the Microsoft Purview data catalog supports scanning Fabric's Power BI assets, and it will be expanded to the full set of Fabric assets in the upcoming months. If you are using an existing Purview catalog and Data Estate Insights, they continue to work as they did before. Purview's deep integration
into Fabric continues with Power BI information protection sensitivity labels. We all know the concept of sensitivity from Office, where you can see if a document or email is confidential and you may not be authorized to export some sensitive data. This is all done through information protection sensitivity labels. These same sensitivity labels
that are available in Office are now integrated into Fabric. That means that in one
place in your organization, you manage all the audit
and security policies for all the data platforms, and it also provides users that are using Fabric with a familiar experience of how to apply sensitivity labels on sensitive data and how to know if the data is sensitive. Now let's see a demo of how information protection sensitivity labels are integrated into Fabric. As a data engineer, I can easily meet my organization's
compliance requirement to classify and label
sensitive data in Fabric using Microsoft Purview information protection sensitivity labels. I will open the source Lakehouse of my analytics project and here, with an Office-like user experience, I can apply the right sensitivity label to reflect the sensitivity of the data in this Lakehouse. In this case, I would apply the highly confidential, internal-only sensitivity label. Once this sensitivity label is applied on the Lakehouse, Fabric automatically takes care of applying the sensitivity label on all the items connected to this Lakehouse. And as you can see in this complex project, it includes various Lakehouses, pipelines, SQL endpoints, datasets, reports and dashboards. Now, when the business user consumes the Power BI report in this workspace, they will immediately see that this report has highly confidential, internal-only data, and when they export this data for further analysis to Office, Fabric will automatically apply the sensitivity label and protection settings on the exported file, thus providing end-to-end labeling and protection from lake to Office. But this does not end here. One of the most required
enterprise compliance needs, especially for enterprises in
highly regulated industries like finance or healthcare, is the ability to detect the upload of
sensitive data to the cloud.
Here is another place where you can benefit from Purview's deep integration into Fabric. Compliance admins can
define automatic DLP rules in Microsoft Purview
portal to detect upload of sensitive data to Power
BI models in Fabric, and if such an upload is detected, they can trigger an automatic policy tip or alert. This is exactly the tool that can help you automate your compliance processes in Fabric to meet enterprise-scale compliance requirements. DLP policies currently
support only
Power BI models in Fabric. Let's see a demo of how
a compliance admin can define a DLP rule to automatically detect the upload of sensitive data. - [Narrator] In Microsoft
Purview compliance portal, this is where I define my
data loss prevention policy. After giving it a name, I'm going to decide to run it on the Power BI location, alongside Exchange, SharePoint, OneDrive and other locations you can see here, and I can decide if I want it to run on the entire tenant or choose to include or exclude specific workspaces. Creating my rule, these are the conditions and
actions that I want to run. In terms of conditions, we can use sensitive information
types or sensitivity labels or any combination of these two. In this example we'll use
sensitive information types and we'll choose social security numbers, which means that we're
going to automatically scan and detect social security
numbers in the data sets in this tenant. My action is going to
be a user notification, which is a custom policy tip that's going to appear in the Power BI UI for the Power BI users to be able to interact
with this information and see the guidelines that the security admin defined for it. And here I'm going to
define my admin alert with whatever severity makes sense, and I also want it to arrive directly in my inbox. Once I've created my rule, I'm able to review the
conditions and the actions and this is also the place
that I'll be able to edit the rule if I need any
revisions in the future. In Power BI, I can see my dataset. As soon as I refresh it, it's going to trigger the
evaluation of the policy and in this case, we're going to detect
sensitive information. When I click on the dataset, I can see the details of
the rule that was matched. This is the exact same custom
policy tip that we defined a moment ago. Back in Microsoft Purview
compliance portal. In my alert tab I can see
the alert that was triggered and when I click on it I
can see all the details on which dataset it ran, at what time, what sensitive information type was found, et cetera. I will also be able to see that this was sent directly into
my email, like I defined. With data loss prevention
policies for Power BI, you're able to automatically
detect sensitive information as it being uploaded to Power BI and to take immediate remediation actions. Thank you. - Microsoft Purview deep
integration with Fabric continues with Microsoft Purview Hub. One place where you can
gain Purview insights about your Fabric data estate, built into the Fabric experience, and it also provides you with links to deeper Purview capabilities available in the Microsoft Purview portals. Let's see a demo of how
Purview hub can help you better govern your Fabric data estate. As a Fabric admin, I can find all of my Microsoft Purview insights in the Microsoft Purview hub built into Fabric. I can also find links to
documentation and deep links to Microsoft Purview
portal for data cataloging and information protection solutions. Below, in the item insights, I can see insights about my entire tenant. I can filter the insights by workspace and see in each workspace how many items of each type there are and how many of them are endorsed, whether certified or promoted. I can also filter these insights
by endorsement type. For example, I can see all of
the items that are certified, what types of items are certified and in which workspaces they are. When you switch to sensitivity insights, I can see insights about sensitivity labels
deployment in my tenant. I can filter the insights by a specific sensitivity label to see the items that have the highly confidential label applied, what types of items they are, and above, in which workspaces these items are placed. You can also leverage these
insights to identify assets that are not applied
with sensitivity labor. For example I can see
is that in this tenant, A lot of reports and
datasets are not applied to sensitivity labor. By filtering the insights
by reports or datasets, I can see in which workspaces these items are, to take proactive actions and label them. For additional insight, I can open the full Microsoft Purview hub insights. We have tabs about sensitivity, endorsement and items, and for example, in the sensitivity tab I can identify the owners of the items that have a specific sensitivity label applied, or the owners of items
that don't have a sensitivity label applied. Like in this example, I can see a list of owners that did not apply sensitivity labels, and I can proactively reach out to them so they can take action. The enterprise-grade
capabilities I covered so far, including lineage, endorsement, the scanner API, integration with Purview information protection, the Purview data catalog, Purview data loss prevention policies and the Purview hub, are just the beginning, and the team is currently working hard on adding additional
capabilities that will continue to light up in the upcoming months. Thank you very much. - Patrick. You know I like the
admin and governance area of the product. Oh, it's just amazing to see some
great things that are coming as part of this. Arthi showed us things about capacity management, the ability to read tenant settings via an API, and she also talked about a lot of great roadmap things that are coming in the future. So settings delegation,
ability to search items, these are things that
people have been asking for for a long time. And then we also got this
introduction to One Security. - That was my favorite part. Think about it: if you're working across all these different computes, connected to that OneLake, and
the security's in one place, that is absolutely amazing. - And then Anton wrapped
it up with this purview hub integration inside of
Microsoft Fabric as well, so you can see what's going
on if you're leveraging the information protection labels and other items that are part of purview. So that's great to see
on the product as well. - Yeah, it's insight to all the things. - All right,
Patrick, what's next? - Next up, Nelly's going to talk
about Synapse data science in Microsoft Fabric. - Hi everyone. My name is Nelly Gustafsson
and I'm a product manager leading the synapse
data science experiences in Microsoft Fabric. Today I'm very excited
to share an overview and a roadmap of the new synapse
data science capabilities we are releasing. First, we're going to go through a key end-to-end data science
scenario we are enabling. We will then deep dive into
a selected set of tools
and experiences we're
providing to help users as they go through the
data science process. And finally, we're going to go through the roadmap and some exciting upcoming features. You may have already heard
about Microsoft Fabric, but let's do a quick recap
about the announcement today. Fabric is Microsoft's new
data platform for all analytic workloads and it integrates
Power BI, Data Factory and the next generation of Synapse with easy to use experiences
for a variety of roles. This brings together the different data analytics experiences all the way
from ingestion to insight. And as I mentioned, this session's going to
focus on data science in Microsoft Fabric. Let's start with some key reasons why data science integrated
in the analytics platform is so valuable. Analytics is a team sport. A typical scenario involves
many different roles and many handoffs. Data science plays a key role
in the analytics workflow because it helps us to
enrich data for the purpose of making decisions and getting insights. There are many advantages
in removing the silos and inviting data science practitioners into a data and analytics
platform like Fabric. Fabric is data centric. Data is at the heart of data science. As you manage your data
assets in Microsoft Fabric, you build everything from
data pipelines to Lakehouses and your Power BI reports. You should give your data
science teams the opportunity to also work seamlessly
on top of the same secured and governed data. The open Delta Lake format gives you the reproducibility that you need for machine
learning and the native integration with a data
infrastructure like the data pipelines allows you to
embed your machine learning activities nicely into
your analytics workflows. Microsoft Fabric is
also developer friendly. Our goal is to give developers
great getting started experiences and with delightful
code authoring experiences in the Notebooks. With integration, with
popular tools like VS Code, developers can build
out solutions
wherever they feel most productive. We also offer rich and
scalable machine learning. With a built-in ML Flow model
and experiment tracking, we allow you to version
and manage machine learning artifacts using standard ML Flow APIs. With ML Flow auto logging, we make it really easy for
users to automatically track key parameters and metrics
during the model training. And we also developed
a large set of built-in scalable machine learning tools
with our SynapseML library that you can use out of the box. And thanks to sharing the
same Fabric as Power BI, we make it very easy for you
to embed machine learning insights directly into
your Power BI reports. Microsoft Fabric also promotes
collaboration by creating an easy-to-use unified platform
for all analytics roles including data scientists. Users in different roles on
your team can now collaborate on the same platform using
the same set of tools and integrations. So whether they're working
on data engineering or data science or BI, you can now make your analytics tasks easier. This also makes it easier to secure and share data. For example, you can share code, models and
experiments across the team. It simply makes your
teams more productive. Now let's dive into specific capabilities across a data science lifecycle. We're going to start with
a data science workflow. This is a workflow I'm sure
many of you have seen before in some form and we're going
to focus on a key scenario that helps accelerate
your business insights with help from data science
and machine learning tools. And when we cover the individual features and experiences later on, you'll hopefully get a good understanding of where they fit in. Any data science process typically starts with formulating a problem,
a question to answer. In Fabric, we're making it very easy
for data science users, data analysts, and business
users to collaborate over the same source of truth. This gives a shared
understanding of the data and the problem at hand. For this,
we're introducing
some exciting new capabilities we call Semantic Link and
we will talk about this a little bit later in the session. Next, your data science teams will need to further pre-process the data that data engineers have landed in Lakehouses. From Notebooks, code is written
to do this pre-processing, but a big part of pre-processing data is to first explore the data,
understand it, detect issues, and then address those issues. We are bringing in
tools like Data Wrangler to help boost the
productivity of users during the data cleansing
and preparation phase. And once the data is clean, you want to construct
machine learning features and train your machine learning models. With ML Flow, we're making it very easy for you to track and manage these machine
learning experiments and models. In fact, machine learning
items are first class citizens in Microsoft Fabric. It means that you can control
permissions on a model, you can share it, you can
label it, you can endorse it. And with our rich, scalable machine learning library on Spark, SynapseML, you can perform these steps at any scale to enrich the new data
coming into your Lakehouses. We offer you scalable batch
prediction capabilities so that you can get your insights faster. And with the new Power
BI Direct Lake mode, your enriched data is
immediately available and continuously refreshed for reporting without any extra steps. Now let's dive into some
of the specific experiences and features that we're releasing. Data Wrangler is a tool designed to help simplify data prep and cleansing while still taking advantage of the power of code and the reproducibility of Python. It features dynamic data displays and built-in statistics and chart rendering capabilities, and it gives you the ability to get started processing data frames in just a few clicks.
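For readers who want a feel for the output, the steps Data Wrangler produces land as ordinary pandas code in the Notebook. A rough illustration only; the file and column names below are made up and not taken from the session:

```python
import pandas as pd

# Hypothetical sample of taxi trip data loaded into a pandas DataFrame.
df = pd.read_csv("taxi_trips_sample.csv")

# Drop rows that are missing pickup or dropoff timestamps.
df = df.dropna(subset=["pickup_datetime", "dropoff_datetime"])

# Parse the timestamp columns into proper datetime types.
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])

# Derive trip duration in minutes and filter out obvious outliers.
df["trip_minutes"] = (
    df["dropoff_datetime"] - df["pickup_datetime"]
).dt.total_seconds() / 60
df = df[(df["trip_minutes"] > 0) & (df["trip_minutes"] < 180)]
```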
Data Wrangler is designed for a range of experience levels, from newer developers to more experienced developers. And in the future the tool
will support Spark data frames and
offer natural language processing to code functionality using Azure OpenAI. As part of the synapse
experiences on Microsoft Fabric, we are also bringing you built-in model and experiment tracking with ML Flow. This allows users to easily
track and compare different experiment runs and model versions. And with auto logging we're
also making it very seamless to capture key metrics automatically as you're authoring code to train models. The ML Flow tracking support
in Microsoft Fabric is powered by
Azure Machine Learning, which opens up exciting integrated experiences in the future. With our scalable predict function on Spark for batch scoring, we help simplify
operationalization of models. You run your scoring jobs
from the secure confines of your data platform
without moving any data. Just write the enriched data to your Lakehouse and seamlessly serve it to Power BI reports
with Direct Lake mode. We want to make it easy for
everyone to leverage our tools. That's why we've added an easy-to-use guided experience that helps you enrich your data. Simply select your source of data, map it to the inputs of your model, and choose an output destination; from there, we'll handle the rest and even generate code for you. In addition to the features mentioned so far, we also offer you support for the richest machine learning
library for distributed ML on Spark with the Synapse ML library, which is an open source
machine learning library that we maintain, you get access to a lot of machine learning tools and easy-to-use APIs for
applying machine learning and enriching your data in an easy way. There's a ton of great
features in this library and I don't have time to cover everything, but let's highlight some
of the core capabilities. SynapseML offers training of
distributed machine learning models with performant
and popular algorithms. We've also added full ML Flow
support for SynapseML models. Spark operators in Synapse
ML also help you to work with pre-trained AI models
from Azure Cognitive Services, and of course new APIs allow you to apply large language models and those types of transformations directly on your Spark data frames. You can go to aka.ms/Spark to learn more about SynapseML. Lastly, before we dive into a demo, I want to highlight that
the synapse experiences in Microsoft Fabric also come
with R language support. Built-in support for SparkR and sparklyr makes it easy for data scientists to develop machine learning models using familiar R interfaces. R can be
used both from Notebooks
and Spark job definitions. Now let's dive into a demo
showing you the end-to-end data science scenario we covered a while ago and how it can be implemented
in Microsoft Fabric. We are going to look at
one of the key end-to-end data science scenarios
in Microsoft Fabric. Here I'm in my workspace
and I've created a Notebook. Here we are building machine
learning models on Spark for predicting taxi trip durations and we will then seamlessly
serve predictions for consumption from BI reports without any data movement or extra steps. With our Notebook experiences, data scientists can quickly
get started solving problems using machine learning tools. The raw taxi trip data
that we're going to use is in our lakehouse and
contains 34 million records. This data contains
details about taxi trips and we are interested in being able to predict the duration of a trip given the set of known factors. This could help us plan
and optimize trips better. We are going to take a sample of this data and do exploratory data
analysis using Python and popular visualization libraries. For example, we can learn more about our data by looking at the distribution
of the trip durations. We can analyze this by passenger count and look at the distribution
of passenger counts per trip. In this way we can detect relationships and correlations in our dataset. We can analyze peak hours during a day and peak days of the week. This also helps us detect outliers and missing values to filter out. And once we are done cleaning up the dataset, we can save the prepared dataset back to the Lakehouse.
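In code, that save step is just a Delta write from the Notebook. A minimal sketch, assuming a Fabric notebook where the SparkSession `spark` is pre-created and attached to a Lakehouse; the table names and filter conditions are hypothetical:

```python
# Read the raw trips, apply the cleansing decided on during exploration,
# and persist the prepared data as a Delta table in the Lakehouse.
raw = spark.read.table("nyc_taxi_raw")

prepared = (
    raw.dropna(subset=["pickup_datetime", "dropoff_datetime"])
       .filter("trip_distance > 0 AND trip_minutes < 180")
)

# Managed tables are stored as Delta/Parquet in the Lakehouse, so the SQL
# endpoint and Power BI (Direct Lake) see the same data with no copies.
prepared.write.format("delta").mode("overwrite").saveAsTable("nyc_taxi_prepared")
```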
Microsoft Fabric also has a built-in MLflow endpoint. This means that you can easily track and manage your machine learning models and experiments with standard MLflow APIs. We are going to read the
cleansed and transformed dataset from a delta table in the Lakehouse. After some feature engineering and defining hyperparameters, we are ready to train a machine learning model. Here we are also starting an MLflow run, to make sure we capture and track this iteration. We are using SynapseML and a LightGBM regressor to train this model. And finally, we can log the model version.
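A minimal sketch of what such a training cell could look like, assuming `spark` is available, SynapseML is installed, and reusing the hypothetical table and column names from the earlier sketch; this is illustrative, not the session's actual code:

```python
import mlflow
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMRegressor

train_df = spark.read.table("nyc_taxi_prepared")

# Assemble the hypothetical feature columns into a single vector column.
assembler = VectorAssembler(
    inputCols=["trip_distance", "passenger_count", "pickup_hour"],
    outputCol="features",
)
train_vec = assembler.transform(train_df)

mlflow.set_experiment("taxi-trip-duration")

with mlflow.start_run():
    params = {"numLeaves": 31, "learningRate": 0.1}
    model = LightGBMRegressor(
        labelCol="trip_minutes", featuresCol="features", **params
    ).fit(train_vec)

    # Track the hyperparameters and the fitted model with standard MLflow APIs.
    mlflow.log_params(params)
    mlflow.spark.log_model(model, "model")
```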
This process will be repeated for another run with a tuned set of hyperparameters. The model and experiment now exist in our workspace as items. If we open the experiment item, we can see the runs that we just created. We can see all the associated files and details about our runs. For example, we can see the model
signature and environments. It is also possible to
save models from runs in this experience. We can also filter runs in a
list and compare different runs and customized charts. This makes it easier to
evaluate different runs. Similarly, we can take a look at the
model item in our workspace. Here we see two versions of
models logged from our Notebook. We can see how we can apply models through various experiences and even copy the code for
using a model in a Notebook. Here in the Notebook, we can run model scoring and save the predicted values in a Lakehouse table.
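Fabric provides a guided PREDICT experience for this, but conceptually the scoring step can also be expressed with standard MLflow APIs. A hedged sketch in which the model URI, table and column names are all placeholders:

```python
import mlflow
from pyspark.sql.functions import struct

new_trips = spark.read.table("nyc_taxi_new_trips")

# Wrap the logged model as a Spark UDF and apply it to the incoming rows.
predict = mlflow.pyfunc.spark_udf(spark, model_uri="runs:/<run_id>/model")

scored = new_trips.withColumn(
    "predicted_minutes",
    predict(struct("trip_distance", "passenger_count", "pickup_hour")),
)

# Persisting to a Lakehouse table is what lets Direct Lake pick the
# predictions up for reporting without any further data movement.
scored.write.format("delta").mode("overwrite").saveAsTable("nyc_taxi_predictions")
```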
With Power BI Direct Lake mode, this table with predicted values automatically becomes part of an auto-generated Power BI dataset. Thanks to this tight integration between Power BI and the Lakehouse, data science users in Fabric can now easily collaborate and continuously share results and data from the data science process with stakeholders like
analysts and business users. The Lakehouse is not only a place to store batch prediction values, as in this case, but it's also a bridge
that helps collaboration. Without any data loading,
data movement or manual steps, this enriched data is
now seamlessly served into Power BI Reports. And if we want to automate this by scheduling Spark Jobs or Notebooks to run on a regular basis, it automatically refreshes
the Power BI reports as well. The consumers of the reports can now get the latest enriched data to
analyze with zero lead time. You have now seen a
key end-to-end scenario that you will be able to enable with data science experiences
in Microsoft Fabric. But that's not all. We have a large set of experiences
and capabilities planned on our roadmap and I want
to share them with you. Semantic Link is a feature we
are introducing for a tighter integration between data science and BI. This will help collaboration
with stakeholders throughout the data science lifecycle. For data discovery and pre-processing, we're adding support for Spark
data frames in Data
Wrangler. When it comes to modeling
and experimentation, we want to expand our ML Flow support, bring you richer support
for hyper parameter tuning, and we'll bring you auto ML with FLAML. We're also adding support
for consuming pre-trained AI models from cognitive services. There's a lot we want to do when it comes to
operationalization of models, when you work inside of Fabric. CICD support and SDK will
help programmatic automation, model endpoints will
help you invoke models from other Fabric experiences, like for example data flows. And throughout this entire process, you'll be able to leverage
productivity boosting experiences powered by co-pilots and Azure OpenAI. We don't have time to cover all of these roadmap items in depth, but there are a few I wanted
to specifically highlight. Let's start with Semantic Link. One new capability in Microsoft
Fabric we're really excited about is Semantic Link, which offers a powerful tool set to bridge data science and BI. But what does this mean? In the next demo, we will look at how Semantic Link helps to ease the collaboration between a
data analyst using tools like Power BI and a data scientist using tools like Python and Spark. With Semantic Link, you will see how Notebook
users can explore Power BI datasets from Python and Spark SQL. This includes things like
tables, calculated columns, but also access to measures. Users can explore, query and validate the data in Power BI
directly from Notebooks. Now let's take a look at Semantic Link in action. A data analyst has collected sales data in Power BI and built out
some reports for the business. She realizes that she
could use better revenue forecasting data and wants to collaborate with a data scientist. Thanks to Semantic Link data
scientists can now tap into the Power BI semantic data model
and business logic and leverage that to quickly get a better
understanding of the data, in order to answer business questions and solve problems using machine learning tools. Semantic Link offers a Python library called SemPy that helps data scientists and other Python users to access, explore and validate the Power BI semantic data layer. This layer contains things like tables and applied calculations and logic like calculated columns and measures. With SemPy, all of this
can be accessed using tools that data scientists are
familiar with, like Python. In this Notebook we're using Semantic Link and SemPy to browse a given Power BI dataset. But first we can list the datasets we have access to. We can pick one from the list
and start querying the data. As we are reading and querying the data, we're working on a snapshot of the dataset and the results get saved in a semantically aware Pandas DataFrame. We can visualize the relationships between the tables right
here from a Notebook cell. We can also list available
measures in the data set. For each measure you can see
details like measure names, measure expressions and data types. This is a great way for data scientists to read these values but also to understand the logic and formulas behind these measures. SemPy also allows us to
read the content in tables and the measure values
using built-in APIs, but there is more. Semantic Link also lets you
query the Power BI dataset, including measures with Spark
SQL directly from Notebooks. Now that a data scientist can easily query and explore the semantic
layer from a data frame, it also means that it's much
easier to use Python libraries to explore the data.
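As a rough sketch of what those Notebook cells could look like (assuming the Semantic Link Python package, SemPy, is available in the Fabric runtime; the dataset, table and measure names are invented, and the exact function signatures should be checked against the Semantic Link documentation):

```python
import sempy.fabric as fabric

# List the Power BI datasets this notebook user can access.
fabric.list_datasets()

# Read a table from a dataset into a semantically aware pandas DataFrame.
sales = fabric.read_table("Sales Analytics", "Sales")

# Inspect the measures defined in the model, including their DAX expressions.
fabric.list_measures("Sales Analytics")

# Let Power BI evaluate a measure, grouped by a column from the model.
revenue_by_year = fabric.evaluate_measure(
    "Sales Analytics",
    measure="Total Revenue",
    groupby_columns=["'Date'[Year]"],
)
```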
To build a forecasting model, we're going to use Prophet, a popular time series forecasting library. This allows us to build a machine learning model to forecast future revenue, and the training data comes from the Power BI dataset. The forecasted values are written to a Delta table in the Lakehouse, and thanks to Direct Lake mode, we can now seamlessly make them available for Power BI reporting.
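A minimal sketch of that forecasting step, assuming Prophet is installed and `spark` is available in the Notebook; the revenue history is fabricated inline just to keep the example self-contained, and the output table name is hypothetical:

```python
import pandas as pd
from prophet import Prophet

# Stand-in for the revenue history pulled from the Power BI dataset via SemPy.
history = pd.DataFrame({
    "ds": pd.date_range("2021-01-01", periods=24, freq="MS"),
    "y": [100 + i * 3 for i in range(24)],
})

model = Prophet()
model.fit(history)

# Forecast the next six months of revenue.
future = model.make_future_dataframe(periods=6, freq="MS")
forecast = model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]

# Write the forecast to a Lakehouse Delta table so Direct Lake mode can
# surface it in Power BI reports without any copies or manual refreshes.
spark.createDataFrame(forecast).write.format("delta") \
    .mode("overwrite").saveAsTable("revenue_forecast")
```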
With Semantic Link in Microsoft Fabric, we are really excited to bridge data science and BI. To empower all users with AI, pre-built Cognitive Services AI models will be integrated into Microsoft Fabric. Users will be able to
access text analytics, anomaly detection, text translator, and other AI models out of the box without the need to pre-provision
any resource in Azure. But it doesn't stop there. We're also adding co-pilot
experiences for developers. This means that Microsoft Fabric will have a built-in integration with Azure OpenAI. We will bring developers
in Microsoft Fabric, a large
set of productivity
boosting experiences specifically for Notebook users. This means that we will
offer you built-in co-pilot experiences for generating
code, explaining code, troubleshooting, migrating
code, and much more. Through an integration with the best of brief foundation
models from Azure OpenAI, were contextualizing
interactions to be relevant for your data in your data frames, in your lakehouse or warehouses. But we also want to empower
Microsoft Fabric users to create your own AI plu
gins for answering
questions about your data. These items will support
security permissions, governance policies, sensitivity labeling, and allow tracking of lineage. These data-centric AI
plugins can be published and used in other chat bots
like M365 business chat and can also be consumed in experiences all across Microsoft Fabric. We can't wait to release these features to help all our users achieve more. I hope this session provided
you with a good understanding of what is possible with data
science and Microsoft Fabric and what's coming. I appreciate your time. I want to thank you for
listening to this session. Please try out the data
science experiences in Microsoft Fabric and
let us know what you think and what you would like to see and stay tuned for many future updates. - Wow, Adam, a lot of great stuff. It was kind of unexpected. I
really enjoyed that session. It blew me away. Especially, there were two things that blew me away. The Data Wrangler. The Data Wrangler was amazing. Think about it: I'm a low-code developer, but I can use this Data Wrangler to really go and get intimate with my data. I really can do that. And the second thing was
semantic link, think about it. We're going backwards now. So you've built this semantic
link and I'm a data scientist and I want to use that to train my model. I can completely do that
directly inside of Notebook with my semantic link. - Brings the data full circle
into my process. I love it. - It's exciting. All right, up next Christian and Zoe are going to look at what's coming from the Power BI side of this as well. There's some special things in there that we are really interested in. So have a look. - Hello everybody and
welcome to this session. We are so excited to share the
latest Power BI announcements here at Build. I'm Christian Wade Group product manager for professional business intelligence in Power BI and Fabric. We have Zoe Douglas with us today. Zoe, would you like to introduce yourself? - Thanks Christian.
Yes. So my name is Zoe Douglas and
I am a product manager working on professional tooling
for semantic data models. - Thank you. And we'll be
joined shortly by Rui Romano. So here's the agenda. It's packed full of
awesome demos on Fabric, developer experiences and a vision demo that you won't believe. So let's get stuck in. The cat is well and truly out of the bag. We've been working on
Fabric for some time now and we couldn't be more
excited to share the good news with our community. Fabric will truly transform
how analytics projects are delivered because customers today are often locked into
proprietary vendor formats. They have to spend inordinate
amounts of time and money integrating data across vendor products. And this complexity
causes data fragmentation, which is poisonous to organizations seeking to embrace a data culture. However, there's a silver lining. It's clear that analytics
projects have consistent patterns. They invariably require data integration, data engineering, data warehousing, business
intelligence, et cetera. Now Microsoft has had leading products in each of these areas for a long time. But with Fabric we are
providing the market with the first truly unified analytic system based on one copy of data
and a unified security model. We're taking a bold bet
on Delta Lake and Parquet as an open standard format. Think about what this means for customers. In addition to avoiding vendor lock-in, one copy of data shared across each of the Fabric analytical engi
nes means customers will
dramatically reduce data silos and data integration costs. And of course this is
done with deep integration with Microsoft Office Teams and delightful AI co-pilot experiences. You won't believe some of the
demos you're about to see. Now let's talk about
Direct Lake storage mode for Power BI data sets in Fabric. Direct Lake, remember the name. Power BI data sets have had direct query and import storage modes
for a very long time. Users interact with
visuals in Power BI reports and they submit DAX queries
to the Power BI dataset. Direct query avoids having
to copy data but typically suffers performance
degradation by having to submit federated SQL queries to
other database systems that are just not as efficient
for BI style queries. Import mode on the other hand, delivers blazing fast
performance because queries are answered from our column data store that is highly optimized for BI queries. But of course the data has
to be copied during refresh, introducing management overhead. Enter Direct Lake mode. By querying data directly from the lake, Power BI datasets enjoy blazing fast query
performance on a par with import mode without having to copy a single row of data. Now I know what you're thinking. How on earth is this physically possible? It literally sounds too good to be true. Well it just so happens
that Parquet is also a column storage format that works perfectly with our engine. So let me make this clear. Power BI is moving to
Delta Lake and Parquet as its native storage format. This changes everything. Let's run a Direct Lake demo. I'll start off in my Fabric
Lakehouse or warehouse. Here you can see the
Parquet files in the lake. I simply click on the new data set button, select the tables I'm interested in, and I immediately land
in the recently announced Power BI dataset web modeling experience. I didn't even have to leave the browser. I can create relationships and measures. And here's the kicker. I can click on the new
report button
and create a beautiful Power BI report
directly from the lake. Notice I didn't have to perform a refresh. There's almost 4 billion
rows of data in this table and I get instant response times. So to summarize, we've unlocked massive data
with blazing fast performance and we didn't have to
copy a single row of data. We didn't have to manage ETL
jobs into the data warehouse. We didn't have to manage data loads into the Power BI data set. Users can create beautiful
Power BI reports in seconds without any data duplication. Now let's switch gears and talk about developer experiences in Power BI. Developer mode enables
source control and CICD integration for Power BI
desktop author data sets and reports. We're providing native
integration with Git from the Power BI service that can be optionally integrated with Power BI deployment pipelines. And as you'll see in the demo, instead of saving to a PBIX file, you simply save to a Power BI project, which places the artifact
metadata on the file system so you can check it into source control. So without further ado, let's invite Rui to join us from Portugal for an amazing demo. Rui, over to you. - Hi, my name is Rui and
I'm a product manager on the Power BI team focusing
on developer experiences. And I'm really excited to show you the new developer experiences
we have for Power BI. Let me show you. Now with Power BI desktop, you not only have the
option to save your work as a single PBIX file, but you can also save it
as a Power BI project.
A new save option will make desktop save your development into a folder, finally unblocking source control and collaboration using Power BI desktop. Let me show you. From an open Power BI report, you can now go to File, Save As, and select the Power BI project save-as type, and desktop from now on will save all your
developments into a folder. This is a folder with a
Power BI project in it, it contains one folder for the data set and one folder for the report. So if I go back to desktop
and
I create a new measure, Desktop will save the new
measure in the model definition within the dataset folder
and I can use Tabular Editor, the open source community tool,
to open the model definition file and view the measure
created on desktop. Now let's do the opposite. Let's create something in Tabular Editor and see it reflected in desktop. Let's duplicate the
product table and save. If I go back to desktop, I don't see that new
table, because desktop is not aware of outside changes. So I need to close desktop and reopen the Power BI project file, and I'll be able to see the new table created in Tabular Editor. But there is something interesting. If I go to the data view, all my tables have data, but if I click on that new table, that new table doesn't have data. Why? Because Tabular Editor didn't refresh any data, it just created a new table definition. But notice also something. Power BI desktop, because it's working in a Power BI project, detected that I have some tables that have incomplete or no data and is asking me to refresh now. And if I click refresh, it's also smart enough to only refresh that single new table that I created from Tabular Editor. And now that Power BI
desktop can work on a folder, I can initialize a git repo
and enable version control and collaboration with other developers. And I can do that using
Visual Studio Code. From the Power BI project folder, I can open Visual Studio Code
and initialize a new Git repo. And from now on, because I
have a Git-enabled folder, I can track in version control any change I make in desktop. For example, if I change a measure, I can easily track that change in Git, and Visual Studio Code will show me that I have a file diff in my model.bim file. Now to enable collaboration,
I need to use Azure DevOps. So I can go to Azure DevOps
and I can create a new repo, let's call it demo. And I need to configure
this remote Git repo URL back in Visual Studio Code and publish my branch. And Visual Studio Code will take care of
syncing my local
development into Azure DevOps. And now I can enable
collaboration and have multiple developers working on
the same Power BI project using Power BI desktop. You only need to be connected
to the same Azure DevOps repo. But we didn't stop here. We will also enable you to sync a Git repo to a workspace in the service. And for that you need to go
to the workspace settings where you will find an
option called Git integration that will allow you to
connect the workspace to an Azure DevOps Git repo. So let's select the
Azure DevOps organization and the projects, the repo and branch we were working and click on connect and sync. And just like that we just
enabled a two-way synchronization between the workspace and
a Git repo in Azure DevOps. It'll start by synchronizing the content from Git into the workspace, creating a report and a dataset artifact. I must refresh the data
set because in Git, there is no data, only metadata and code and I can also make
changes in the workspa
ce and synchronize those changes into Git. Let's make a small edit in
the sales report and change the background color of this
cards into red and save. And also let's create
a new report and create a very simple report and
save it to the workspace. And I want you to notice two things. The first one is this
indication in the toolbar in the source control where I can click and I can see changes that are
from the workspace into Git. And I can see that they
have a modification in the sales report an
d I have a new report
called Report from Service. I can click on both changes, I can undo this changes if I want, but I'm not going to do that. I will commit and I will provide a message and let's hit Commit and the service is going to synchronize
both changes into Git. And you can also notice in the status bar I can see that my workspace is
connected to the main branch. I can see the last time it sync. And I also have a link that
will take me to the Commit. So where I can click and this
will ta
ke me into Azure DevOps and it'll tell me exactly
what have changed. Now let's go back to my local machine. If I go back to my local folder, I can only see the sales data
set and the sales report. I don't have the report that
was created in the service because this is still in Git. But I can open visual studio
code and I can do a Git pull, and Visual Studio Code will sync the content from Git into my local machine. And if I go back to the folder, I can see that I have my new report created from the service. And of course I can open this back in desktop and I'll be able to see the change that I did in the service, the switch of the background color. But I can also open the report directly by navigating to the report folder and opening the definition file, and Power BI desktop will open the service-created report for local authoring, but this time connected to the local dataset, which will also be in full editing mode where I can view and edit measures, transform data, or even go to the data view to explore the dataset's data. And this is it, the new Power BI project
save option that together with Git integration will
unblock collaboration, source control and automated deployment in your Power BI projects. Thank you. - Wow, that was amazing. Now ladies and gentlemen, we have a special treat in store for you. Zoe is going to show us
real demos of features that are coming soon. We want to give you a
sneak peek into the future. So Zoe, why don't you show us
what we've done lately for semantic model authoring? - Sure. So one of the things we've
been doing recently is we've been making some
changes to the model view. So in the model view, you have your familiar view
with the tables in the diagram, and you have them listed over here in the data pane. But now we're introducing
this new model pivot and this gives you full view of all the semantic objects in your model. So here I can see my roles, I can see all the relationships, perspectives and all the measures, even if they're
all in different tables. And Christian, I know you're going to be
excited about this one, but we can also now have calculation groups listed. And not-
- [Christian] What a relief. - [Zoe] And not only can you
see these calculation groups, but we can actually come in
and now for the first time in desktop actually see
the calculation items and if I click on it, I
can actually edit it here, write in desktop so I can
actually edit and create calc groups right in desktop. - Well I am blown away, Zoe,
because this model view is
kind of like the field list that we know for the report view, but this is like the field
list for the model view where I can see all of
the semantic model objects in one place, including even the calculation groups and calculation items because you know, calculation group authoring happens to be one of the highest voted
items on ideas.powerbi.com. And so it is just so gratifying to see that we can create them here. They're so useful for these large models with complex calculations. Now Zoe, I do have a question for you
about complex calculations. You know, the formula bar
occasionally I have to say, if you don't mind me saying
occasionally I feel a little bit constrained when I'm
authoring these really complex calculations with lots
of interdependencies between measures. Is there anything that we are
doing soon to address that? - Yeah, so I'm glad you asked
Christian because there is, you may have noticed that
there is a fourth view there available in desktop, and that is introducing the DAX query view. So here I can write any DAX query on this model and run it right here in desktop. So here I have one that is showing me the profit margin by fiscal year. I'm going to hit run and I can see that run right here. And this measure here, I can actually click on and hover, and I can see the DAX
expression right here in context in the DAX query. - [Christian] This is amazing. This is something that I've always wanted from the formula bar, because you know when
you're working on a measure and it references another measure, it's quite distracting
to have to context switch to click on the other measure
to see the definition. But this does raise another question, because this measure is referring to another measure, and the measure definition for this one is referring to two other measures. And naturally I want to see the definitions for all of them in one place. Does this address that scenario at all? - [Zoe] Yes, it does. So if I click on this one, you'll see this little light bulb show up. Here, I can click on this and I can say define this measure, or define this measure
and expand references. So now I can see the multiple
DAX expressions listed here. I can see the profit margin and I can see all the measures down to the data columns that is used to generate that measure. - [Christian] That's amazing. - [Zoe] And now Christian, you
can not only just see it here, but you can also make changes. - [Christian] You can make changes right here? - [Zoe] I can make changes. - [Christian] I don't believe it. - [Zoe] I can do two measures at once. I'll show you. So here I'm going to
multiply our costs by two and maybe we'll sell everything at triple the price of course, right? Just to make up for those costs. And now I'm going to run this
and I can see what impact that would have on our profit
margin, which goes up to 40%. Now these changes are still only limited to the single DAX query. If I go back to my visuals, we still see the old value of 11%, but this DAX query view is pretty smart and knows I have these
measures in my model. So it's giving me this inline
prompt to actually save it back so I can quickly save
the changes I've made to this measure back to my model
with just these two clicks, still in context of what I was doing. And if I go back to my
report view, I can see it actually updated with those measure changes. - That's like seamless integration. You can make edits to your DAX, and when you're comfortable with the edits and you've validated the numbers, you just save it straight
back to the model. This is such a productivity boost. But you know, it does beg
another question, Zoe, because here I can see these
four measure definitions, but you know, some of these
models they may have, you know, hundreds or even a thousand measures and why not just see all
of them so that I can do, for example, global find and replace. Is there anything that we
can do for that scenario? - Absolutely, Christian. So we can actually come over here into the data pane. I can right click and use quick queries, which will define all the
measures in the model. - [Christian] No, I don't believe it. - [Zoe] So now here I can
see all these measures that I have in my model. I can do find, I can do replace, and I can also do other
text editor things here. Like I can zoom in and zoom out. And not only did I give you
all of the measure definitions, so we can go in, as you said before, edit any of them and save
them back to the model. But
I also gave you a query with them. So you can actually even just run this and see all your measures and then tailor this to what you need. Add a group-by column, remove measures, whatever you need to do, it's already done and ready for you. - [Christian] I'm amazed. - [Zoe] So another thing
those quick queries can do is, and I can now come over to a new page and I can actually
define a single measure. So let's go ahead and take
a look at this discount, and I can say just show me this measure alone, and I can run it. And not only that, but now I have the SUMMARIZECOLUMNS here. I can go ahead and I can add a group-by column. - [Christian] It's full IntelliSense. - [Zoe] Full IntelliSense. And I can run this right
here and see that come back. Now it's not only for measures, I can also come down to
any of my data tables. Like here's my customer. I can right click. And I have a quick query here also to gimme the top 100 rows, which would be really
helpful for those of you in direct query scenarios where you don't have that data view. And then you can also get
down to an individual column. So if I wanted to see
what countries do we have in this model, I have a quick query for that and we can come in and do distinct values. So let me go ahead and run this and now you can see them
all generated for you. - I am amazed, I didn't even realize there
were so many different ways to generate these DAX queries
because a lot of the DAX queries are actually generated by
the report visuals,
right? And occasionally you might
even want to intercept the DAX queries generated
by the report visuals, for example, for debugging purposes. Is there anything that we can
do to address that scenario? - Absolutely. So as you know, our more advanced users will
go to the performance analyzer and copy the query out of there. But now we've made it just as easy as the quick queries: you can right click any visual, go to inspect visual, and now that DAX query is over in the DAX query view and has run. So you can go ahead and take whatever steps you need to take now. And because it's so tightly integrated in desktop, not only will it do the visual in its current state, but I can actually filter it and then do the inspect visual. - [Christian] And it brings the filter? - And it brings the filter. So up here, you can see that it
brings that filter with it and you see how it is
applied to the visual. And finally we can also get
down to an individual data point and inspect just that data point
to see the query behind it. - You've thought of everything, you've literally thought of
everything, I am so happy. I mean this is such a
productivity booster, Zoe. I honestly can't believe it. Next you're going to tell me that the system's going to
write the DAX for me. Wouldn't that be something? - Well it's funny you
should say that, Christian, because I actually didn't
write that first query either. I had co-pilot in Power BI- - No, I don't believe it. - Do you want me to show you? - Yes. - All right. So here we can say show
me profit margin percent by fiscal year. And it would generate the query for me. - My goodness, this is amazing. This will change everything. - So not only that, but
it's conversational. So as soon as I've written that one, it's going to actually
suggest another prompt and another query. It already did it by year; maybe it thinks I want to see it by quarter next, and it will do that for me. - It's conversational, it's kind of eager to have a conversation with you, and it's taken your instructions that were specified in English and translated them effectively to DAX. Amazing, amazing. Absolutely amazing. You know, I know some individuals who you would've thought DAX is their native language actually. And then others like us are comfortable providing instructions in English and having the system
generate the DAX for us. But some other individuals, they may not have English
as their first language. What do they have to
do? Do they need to use some
kind of an online translator to use this tool? - No, they can actually just speak to it in whatever language
they're comfortable with. So here I have a prompt in German, and as you can see it took
that prompt no problem. And wrote a query based off
of what I had in German here. And then not only Christian
will it do the first query, but now it's going to continually prompt. - I can't believe it.
- In German. And because it saw that I
was speaking to it in German, it
thought maybe I was actually interested in only the customers in Germany. So that is its next step. And it's going to continue to do that. It's going to continue to provide prompts, narrowing in now on Berlin. - Wow, so it's quite
chatty and it's quite keen to have a conversation with you and it's detected that it believes your native language is German, so it's going to have a
conversation with you in German. I mean this is just amazing. Like I never saw this coming. This will transform the
way we work right now. Something about these AI co-pilots: with regard to the Microsoft products, they are now becoming ubiquitous
across Microsoft products. So is there anywhere else in Power BI that I can use this AI
co-pilot experience? - Yeah, so let me show you a report that I had published earlier. - [Christian] It's beautiful. - [Zoe] Here we also have a co-pilot, and if I click on this, it's going to open up a copilot pane, and now first it's going to actually suggest some prompts based off of the data it already sees. But I can still just put in whatever prompt I want. So here I'm going to go ask it to tell me about the sales performance in Australia, right? So that's-
- [Christian] No way. - [Zoe] Right, and
immediately it's given me a summary of the sales performance. - This is amazing. You can have a conversation
with the report, ask it how to increase sales for example. - Sure. So we just go in here and we can ask it how it's going to increase
the sales further and- - Unbelievable. - Here we go. - And just like that it came
up with a little business plan. It's given you a customer demographic, which countries to target
in your marketing promotion. This is literally amazing Zoe. Like what else can you do with this? - So we can also... So I don't have any
slicers on this report. Well, I have some slicers
but none for country. So I can actually ask co-pilot to filter the report to Australia, right? So we can take a further look. It's going to ask my
permission to apply
the filter. And it's going to do that. So the way it did it is it
actually used the filter pane. - [Christian] Because you didn't have a slicer? - [Zoe] Yeah. Didn't have a slicer. So it actually used the filter pane and filtered the report to Australia. - This is really helpful, right? Because some users may not even know there is a filter pane. And even those that do
are going to have to go and find Australia in the list of values. This is just so smooth, so, so easy. You know, and especially
for users who may have not seen
this particular report, I mean some of these reports
are visually stunning. Someone's obviously put
their heart and soul into authoring these beautiful reports, but sometimes they can be
a little bit overwhelming because there's so much going on in them. There's so much information
packed into these reports. So sometimes I think to myself, I could really use some kind of a TLDR
summary of the reports. Do you think co-pilot
could help me with that? - Yeah, it absolutely can. But we actually have another co-pilot that may be better suited for that task. So here we have a visual co-pilot that I can keep on the report even after. So the pane will just be for me, but here I can actually now use a prompt that my report consumers can use. So let's go ahead and change
this one to let's say, give me, let's see, to give me a 20 word
summary of key takeaways, and let's use some emojis this time just to make it a little bit more fun. And just like that, we have our summary built in. - This is amazing. So this summary is going to change when new data comes through this system. What about cross filtering? Does that work too? - Absolutely.
- Amazing. So it's a dynamic summary for users who are viewing this
report for the first time. They can get a head start
on what the report is about. This is exactly it. It's such a great productivity tool, Zoe. This will really change the way that we author models and interact with reports. It just changes everything. So thank you so much. - You're welcome Christian. And also I would like
to note that this is not the only place we have
co-pilot, in Power BI Fabric. I think there's another
session that's going to get into a few more of them as well. - Oh, you mean Patrick
Baumgartner session? - Yes. Yes. So be sure to check out that one as well. - Okay. Check that one out too. Thank you so much Zoe. - Thank you. - Lastly, let's summarize the important Power BI announcements. Direct Lake datasets in Fabric are in public preview; try them out today. Power BI Desktop developer mode public preview is coming to a release near you very soon. It's so close I can almost touch it. Azure Analysis Services to
Fabric Automated migration is generally available. Not only migration to Power BI premium, but now you can take your semantic models from Azure Analysis services all the way to Fabric
with just a few clicks and align with the Microsoft
BI product roadmap. Data modeling in the Power BI
service is in public preview. Like you saw me create the Direct Lake dataset in the web modeling experience, you can do so for other datasets too. The optimized ribbon for Power BI desktop is generally available. Unlock big data with optimized report authoring experiences. Paginated report drill through is in public preview. This is a commonly used feature from SQL Server Reporting Services, so it removes a barrier when migrating from on-premises to Fabric. The MongoDB connector for Power BI, one of the most requested
connectors on ideas.PowerBI.com is now in public preview. Hybrid tables is generally available. Unlock massive data for
interactive analysis with realtime streaming capabilities. And lastly, Azure log
analytics integration for fine-grained logging and auditing of Power BI dataset engine events is now generally available. So with that, thank you all
so much for attending Build. This one was truly epic. Thank you to Zoe. Thank you to Rui, and
see you all next time. - Patrick, that was amazing. This Direct Lake concept where- - Wait, wait, wait. It was not just amazing, Adam. It was insanely amazing. It was absolutely insane.
- You're right, you're right. - Okay. Okay.
- Yes. So but this concept that I
can just leverage the data in the lake directly from Power BI. - So I'm going to be honest with you, I never thought I would see
a day where a SQL compute, you know, a Spark compute
and analysis service compute can use the exact same data that is absolutely amazing to me. - Blew my mind.
- Yeah, blew my mind. - That was amazing. - And then the developer mode. The developer mode. Come on. - Source control capabilities to actually leverage source control. We've been hearing that for a long time. - For a long time.
- From customers and they want this and it's
a reality now that we can do. - And then when Zoe gave us the vision of what's coming with
all of the model view and the DAX query and directly
inside of Power BI desktop, My mind was blown, my mind was blown. - And then some
of the
additional co-pilot items that we can do as well. Doing items directly on the visuals. - It was great. It was amazing. - All right. Patrick, do you hear that? - I hear it. - Knock, knock, knock. That's Dataverse knocking on the door to Fabric, it wants in. - Should we let it in? - Yep. Let's head over to Melinda, who's going to show
us what this is all about. - Thank you Adam and Patrick. We're thrilled to announce
the private preview of direct Dataverse integration
for Microsoft Fabric. For those unfamiliar,
Microsoft dataverse is the data foundation behind power platform that enables you to store and
manage your business data. Dataverse is also the platform on which Dynamics apps are built. So if you're a Dynamics 365 customer, your data is already in dataverse. This new direct integration
between Dataverse and Fabric eliminates the need to build and maintain data pipelines
or use third party tools. Instead, the data is available in Fabric with just a few clicks. The insights you uncover in Fabric appear as native Dataverse tables. So you can quickly go from insights to building low-code
apps and taking action. Let's see a demo. Here in Dynamics 365, you're seeing details
from the account table. Now this is Dataverse, where you see all the data from Dynamics. Makers can
launch Microsoft Fabric directly from right here, from the power apps maker experience. Simply select one or more tables and click view in Microsoft
Fabric. That's it. Here's the account table that I chose just now. And really I can choose one, two, or as many tables as I want. Notice that Dataverse has created shortcuts to the selected tables, so your data never leaves the Dataverse governance boundary. Dataverse has also created a
Synapse Lakehouse, SQL endpoint and a Power BI data set just for you. I'm going to choose the
dataset and explore the data. With help from AI, I get a great starter report. Now I can play with the data and find insights. Instead of days, it now takes minutes to create great Dynamics reports. As data gets updated in Dynamics 365, changes are reflected in Power
BI reports automatically. Data engineers can work with auto-generated Synapse
Lakehouse and the SQL endpoint. If you're familiar with SQL, launch the SQL endpoint and work on the data right here. Or you can open SQL Server Management Studio and work with the data right there. You can create SQL views and stored procedures. If you like Spark or Python, you can launch a Notebook and work with Dynamics data. Now here's the view we created earlier. We can see the view in the SQL endpoint. I've added the SQL endpoint into Dataverse and the view is available
to me in Dataverse as an external table. Now I can build low-code apps with data from Microsoft Fabric. If you are one of the millions of makers and dynamics users out there, you're likely very excited
to try out this integration. These features are
currently in private preview with public preview a few
weeks away, but why wait? You can register now and
get early access. Thank you. Back to you Patrick and Adam. - All right, Patrick, this was amazing with Dataverse because Dataverse has been there. That's the foundation of Dynamics. - That's right.
- And being able to leverage that easily inside of Microsoft Fabric is a great addition. So we can, you know, any of
the tables that are there, we can just easily pull
those in and reference them. Again, OneLake just showing us the power of what that
brings to the table. - Again, continuing the low code journey. We're just continuing that low code journey. I'm so excited about it, Adam. - All right, Patrick, what's next? - Well, Tzvia, James and Kevin are going to introduce us and tell us all about Synapse
Real-Time Analytics. Stay tuned. - Hey, thank you for joining Kevin, James and me for Sense, Analyze and Generate Insights with Real-Time Analytics at Build. Let's start with some context. In the last 25 years, there has been a revolution in the way that we consume content in our personal life. The evolution in technology
leads to new habits of interactive experiences on
demand whenever we want it, whenever we need it without any barriers. And everyone can ask questions
without any limitation, from my six-year-old son to my parents. And the technology behind this revolution is accessible for everyone, with data stores that can hold any type of data at any scale and get updates in streaming with a few seconds of latency. All the information is indexed and partitioned, which allows us, the users, to ask any question without pre-planning and get the results immediately. And everything is wrapped up with a very intuitive user experience. But in the enterprise world, companies still rely on a few
experts to generate reports, to write their queries,
and to ask their questions. All the other people in the organization have a strong dependency on those experts, with long waiting lists and outdated data. And the answer for that in the enterprise world is Fabric Real-Time Analytics. Fabric, as you already know, is a SaaS data and AI portfolio. All the experiences are fully integrated with one logical copy. One logical copy means that once you bring the data in one time, it is accessible to all the experiences to run processes and actions without additional effort. And specifically in real-time analytics, there is the streaming capability that provides the information into Fabric with a couple of seconds of latency from ingestion to query. And everything is indexed and partitioned, including structured data,
semi-structured data like JSON and arrays, and also free text like chats. And once everything is partitioned and indexed, any question can be asked by everyone in the organization and get results in subseconds. And real-time analytics is also fully integrated with the other experiences, so you can also run Notebooks and other experiences on top of this information, because it's accessible to everyone with the one logical copy. With the Fabric real-time analytics solution, organizations can consume tons of data and scale up their work without limits on storage, CPU, number of queries and number of users, on data in motion, to empower business analysts and make data available to everyone in the organization, from the citizen data scientist to the advanced data engineer. And the most common scenario
for real-time analytics is time-based experiences like IoT and log analytics, including but not only oil and gas, automotive, cyber security, smart cities, manufacturing, and many, many more. This is the most common usage pattern: get data from any source, ingested with an event stream into a KQL database. With one logical copy, the data is also available to the other experiences like the Data Warehouse and Lakehouse, and we can consume it with Power BI reports, with Notebooks, and of course with a KQL query set. Now let's move directly to see
an end-to-end demo together with Kevin and James. As the COO of a taxi company in New York City, Daphne is interested in better understanding the usage and defining opportunities for better utilization. Our understanding is that she will need to collect all the rides at the beginning and run questions on the data that she gathers. So the first thing that she will need to do is to create a KQL database. This is an analytical database that can scale up to exabytes of data and thousands of queries and users. It can support structured data, semi-structured data, and free text. Everything is indexed and partitioned, allowing Daphne to run any query and get fresh results immediately. The first thing that we need to do is to connect the rides into this database with an event stream. Kevin. - Thanks Tzvia. Event Streams provides the
ability to ingest, transform, and route millions of
incoming events in real time using a simple no-code designer. That data can be change data capture events, telemetry data, clickstream events, IoT data, and a plethora of other event sources that are constantly being generated all around us. Let's create a new event stream. We'll call it Taxi Event Stream.
After the event stream is created, you are taken to the no-code canvas. You would first start by adding an event stream source. Here you can see you have a number of source options. A custom app source creates an endpoint that allows clients using the Kafka API, for example, to send events directly to this event stream. An Azure Event Hub source will consume events in real time from an existing Azure Event Hub. A sample data source allows you to choose from various sample data sets that let you quickly leverage and test your event stream. Let's select sample data. In the sample source, you can select from either yellow taxi ride events or stock market events. Let's select the yellow taxi sample data. This will continuously ingest
yellow taxi event data. Let's call the source taxi
sample events and click create. Selecting the data preview
tab allows you to preview the incoming events into the stream where we can see the sample
data flowing into the stream. Now let's add a destination. A custom app destination
provides an endpoint where a Kafka client, for example, can consume the events
from this event stream. You can also route the
events into a Lakehouse table or you can route the
events to a KQL database. Let's take a look at the
Lakehouse destination. Sending events to a Lakehouse
will automatically convert the events into Delta Lake format. Let's give the destination a
name, select the workspace, select the lakehouse, and enter in the table name where I want to route
the events to. You can then choose to add
an event stream processor. This will create a
no-code stream processor that can filter, transform, and aggregate events before they land in your lakehouse table. This transformation and filtering can eliminate the need to store extra copies of your data in bronze format, if you're used to thinking of medallion architectures in data warehousing. I can change the type of incoming fields. I will change the trip distance from a string to a float. I can also select only the columns, and the names, that I want to use in my lakehouse. You can also add different operators to manipulate the incoming
stream, such as aggregate, which allows you to create time windows with summations, counts,
averages, min, or max; Group by, which provides grouping or windowing over time; and Manage fields, which lets you do rich transformations with built-in functions. Let's select filter. We'll only include taxi rides that were greater than five miles. We will now click done and then we'll finish
creating the destination. Now let's send the data to KQL database. The KQL database destination provides high throughput consumption
and automatic indexing of all the incoming
events for fast querying. Let's give it a name, select the workspace and the
taxi rides demo KQL database, and we'll use the taxi rides table. We'll then go through a short wizard to configure the ingestion. We can see a preview of
what the data will look like as we ingest the stream
into the
KQL database. Selecting data insights
provides monitoring insights for the health of your
incoming event stream where you can determine if
there are any bottlenecks. You can see all of the
yellow taxi data landing in a KQL database. I will now hand it off to Tzvia, who will walk you through the capabilities of the KQL database. - Thank you Kevin for
connecting the taxi rides into the table. Using the KQL database, Daphne will be able to connect her stream of rides with a couple of seconds of latency, and all the information will be immediately available for querying with fast response times. Now, she would like to upload the lookup files with information about the drivers to help her run better queries in the future. She can ingest data from different sources, with different structures and different formats, whether it's from OneLake, Azure Storage, or Amazon S3, or she can just select it from her own computer. In the background, there is an inference engine that infers the most appropriate schema for this source type. She can keep it as it is, or she can make adjustments to the database or the table structure. She can define whether she would like to use dynamic columns formed from JSON properties, which will allow her to easily change the schema of the table later without changing the table structure, meaning that she can put different formats and different structures into one table without updating the structure, or she can just change the nesting level and get it as structured data.
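For readers who want to see what that dynamic-column idea looks like in KQL, here is a minimal sketch under assumed names; the table, column, and property names (Drivers, Properties, home_base) are illustrative, not the demo's actual schema, and the management command and the query would be run separately.

```kusto
// Illustrative sketch only: a lookup table that keeps driver attributes in a
// dynamic (JSON) column, so new properties can appear later without
// changing the table structure. All names here are assumptions.
.create table Drivers (DriverId: string, Properties: dynamic)

// Query nested JSON properties straight from the dynamic column.
Drivers
| extend HomeBase = tostring(Properties.home_base)
| summarize DriverCount = count() by HomeBase
```

Because the column is dynamic, newly ingested JSON can carry extra properties without any schema change, which mirrors the flexibility described above.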
Now we are creating the table, ingesting the data, and we have a new table with new information. This is the database editor, one location to manage and control KQL databases. Everything is available with a simple user experience and fast response times. I can get into the list of tables, go to a specific table to see its size and schema. I can manage the database, I can manage the table, and I can start and run queries on top of it. I can select and run one of the pre-generated queries with the most useful syntax, or I can write my own queries to utilize and leverage the capabilities of these (indistinct) and to see how the vendors are distributed. And I can also add a visualization to make it even simpler for me. (indistinct) In addition, I can run those queries in SQL if I prefer to do it like that, and both of them are supported, the KQL side by side with the SQL, and I can save it as a KQL query set.
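As a rough illustration of the kind of exploratory query being run here, the following KQL sketch assumes a TaxiRides table with a VendorId column; both names are placeholders rather than the demo's actual schema.

```kusto
// Illustrative sketch: distribution of rides by vendor, with a quick
// built-in visualization. TaxiRides and VendorId are assumed names.
TaxiRides
| summarize Rides = count() by VendorId
| render columnchart
```

As mentioned above, the same table can also be queried with SQL through the database's SQL support, so either syntax works against the same data.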
The KQL query set is the one place to run analytics. I can run complicated analytics on signals. I can share it with my colleagues, with my team members. I can save it for myself for future work, and coming soon, we will also add visualization and co-pilot. The FHV data she has in one of the Lakehouses, so she can just create a shortcut to it in this same database and easily join between the different tables in one unified analysis. She can go here, create a shortcut, and select the relevant source to get the information. And immediately, in a couple of seconds, I have a shortcut to the OneLake table and I can join between
the different sources. I just moved to a KQL database that I created a couple of days ago with high volume from the taxi ride stream. I can see that I have already collected 420 gigabytes of information, compared to around 100 gigabytes of stored data, and it's very efficient from the cost perspective, since the KQL database stores the information at its compressed size and not the original size. At this point I will open one of the KQL queries that I have prepared and start to run the query. First, I would like to understand the size of the data set: I have 1.5 million records. I'm running a query to understand the distribution over the days; this is the distribution of the rides during the weekdays. Now let's see the time series of the 1.5 billion records.
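To make those two checks concrete, here is a minimal KQL sketch, assuming a TaxiRides table with a pickup_datetime column; the names are illustrative assumptions, not the demo's actual schema.

```kusto
// Illustrative sketch: how big is the data set?
TaxiRides
| count

// Illustrative sketch: distribution of rides across the days of the week
// (0 = Sunday through 6 = Saturday).
TaxiRides
| summarize Rides = count() by DayOfWeek = toint(dayofweek(pickup_datetime) / 1d)
| sort by DayOfWeek asc
| render columnchart
```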
I can also see that once I'm adding regression analysis on top of this time series, it's easy to see that since January 2014 there is a drop in my company's rides. Let's try to understand what the source of this issue is. So I will combine it with the FHV table that includes the Uber rides company, and now I can say that the drop comes side by side with the increase of the FHV ride companies. If I would like to go back and find some anomalies and spikes in the time series, I can run it and find them. And just think about it: I just ran anomaly detection over 1.5 billion records in a couple of seconds, got the response, and I can easily understand it.
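A minimal KQL sketch of that kind of time series anomaly detection is shown below; the table name, column name, and date range are assumptions for illustration rather than the demo's actual values.

```kusto
// Illustrative sketch: build a daily time series of ride counts and flag anomalies.
// TaxiRides, pickup_datetime, and the date range are assumed, not the demo's real values.
TaxiRides
| make-series Rides = count() default = 0
    on pickup_datetime from datetime(2013-01-01) to datetime(2015-12-31) step 1d
| extend (Anomalies, Score, Baseline) = series_decompose_anomalies(Rides, 2.5)
| render anomalychart with (anomalycolumns = Anomalies)
```

Lowering the threshold argument (2.5 in this sketch) makes the detection more sensitive, which mirrors the sensitivity adjustment described next.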
I would like to increase the sensitivity, so I can rerun it and get the results fast. Now that I would like to optimize the performance of my drivers, I will find the most important parts of the city to locate my drivers there. We live in a world where enterprise
companies rely more and more on IOT events and log analytics for cybersecurity, asset tracking, customer experience, marketing
campaigns, health, and more. As a result, tons of data are generated at high scale. The KQL database and KQL query set were built to empower enterprises in the highest-volume scenarios with unlimited scale in storage, CPU, queries, and number of users. A KQL database can store data of any format, source, or structure. The data is indexed and partitioned, so Daphne and anyone like Daphne can run queries without pre-planning; the data is available within seconds and the results can be retrieved in subseconds or seconds, which contributes to the high freshness, low latency, and high query performance. The data is also available in OneLake and the data warehouse with one logical copy. If you need one or more of those capabilities, and if you have one of those scenarios, the KQL database and KQL query set are the right choice for your enterprise. Now we will move to James to learn how we can turn those insights into action. - Thanks Tzvia. So far, Kevin has shown how
event streams can capture, transform, and route event stream data and Tzvia has shown how we
can create real-time insights from that data using KQL databases. Now I'd like to talk about
driving
actions from your data. After all, your real-time data is only
valuable if you can act on it. This means that once
you've generated insights from your data, you need to convert those
insights into jobs to be done. And if you're like many organizations, you're achieving this today through manual monitoring of dashboards. Now, continually
monitoring a bunch of charts throughout the day can be time consuming. So perhaps you've considered coding up an automated monitoring solution, but coding can be
relatively
slow and expensive and the cost involved is
often just not worth it. That's why with Fabric realtime analytics, we've envisaged a brand new solution for driving actions from data, a solution that empowers
the business analyst to detect actionable patterns in data and automatically convert those patterns into actions without the
need for writing code. We call our solution Data Activator. Here's how it works. Data Activator connects to
any data source within Fabric. It can bring in real-time streaming data from event hubs and run queries on your KQL databases. It's not limited to real-time data,
slower moving data in warehouses and Power BI data sets too. Then Data Activator gives
the business analyst a no-code tool for defining
triggers on that data. The business analyst tells Data Activator which patterns to detect. Then when Data Activator
detects those patterns, it triggers an action. And that action can be as
simple as sending an email or a Teams message to the relevant person in your organization, or it could be triggering
a power automate flow or driving an action in one
of your line of business apps. Regardless of the data source and regardless of the action system, Data Activator gives the business
analyst a dedicated place to define their triggers and a
consistent no-code experience across all of these different
data sources and systems. So without further ado, let me show you Data Activator in action. Okay, so I've opened up Data
Activator. You might remember that
Tzvia concluded her demo with a KQL query and a chart that showed the number of taxi passengers waiting for a ride per neighborhood. I've connected Data
Activator to that KQL query and Data Activator is now
bringing in those query results in real time. So what I get is an event stream generated from that KQL query that gives me a continual feed of the number of passengers currently waiting for a ride in each New York neighborhood.
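As a rough sketch, the query feeding Data Activator here might look something like the following KQL, where the table and column names (PassengerSearches, Status, Neighborhood) are assumptions for illustration rather than the demo's actual schema.

```kusto
// Illustrative sketch: current number of waiting passengers per neighborhood.
// A Data Activator trigger could then fire when this count crosses a threshold.
PassengerSearches
| where Status == "waiting"
| summarize WaitingPassengerCount = count() by Neighborhood
```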
Let's suppose that a taxi administrator wants to get an alert if there are too
many passengers waiting for a ride in any neighborhood. That way the administrator
can direct idle drivers to head towards that neighborhood. Let's have a look at how the
administrator can do that. Now, the first step is to
create a Data Activator object. We want to track the number
of waiting passengers per neighborhood. So to do that, I want to
create a neighborhood object keyed off the neighborhood name. So I do that and I flip
across to design mode where I can see my new
neighborhood object. The next step is to add a property to that neighborhood object, which is the number of waiting passengers for that neighborhood. I'll give it a name,
Waiting Passenger Count. And now the next step is to associate a value with that property. What I want to do is to
associate it with the number of waiting passengers
column in my event stream. So I picked the number of
waiting passengers column and straight away Data Activator
gives me a chart showing the number of passengers
waiting for a taxi over time per New York neighborhood. Now what I want to do
is to create an alert if the number of passengers
crosses above a threshold. I'm going to pick 10 as my threshold. And finally, what I want to do is
to tell Data Activator to send an email if that
threshold gets crossed for any neighborhood, let's give that email a
meaningful subject and a headline. Great, and now the final step
is to hit start on my trigger. As soon as I hit start, that's
going to activate my
trigger and Data Activator should start sending me an
email whenever the number of passengers waiting for a
ride in any given neighborhood exceeds my threshold of 10. Let's head over to Outlook
to see if we're getting any emails and opening up my Outlook. I can see that I've already
received an email warning that there are more than 10
people waiting for a ride in the Soho neighborhood. Terrific. So in just a couple of minutes, I've been able to convert a
real-time streaming
data feed into actionable email alerts using a simple no-code experience. As a final step, I'll show you how to build
a Data Activator trigger from a Power BI report. Here's a Power BI report
that shows search activity for our taxi company. The taxi administrator wants
an alert if there are too many unsuccessful ride searches
in any neighborhood. So I filter the report to show
unsuccessful ride searches. Next, I click trigger action on a visual to begin creating a
Data Activator trigger. Now I'll set up a trigger
that checks every hour for each neighborhood if there are more than 10
unsuccessful ride searches per neighborhood and we're done. And as we continue to
build our Data Activator, we'll be expanding it to include more types of data from Fabric. We'll also be expanding it to detect not just threshold conditions, but many types of patterns over time. All this will be accessible to that simple no-code experience
that you've just seen. Now, Data Activator is currently in preview, so if you like what you saw today, I'd encourage you to
sign up for the preview. You can do that by visiting
the link that you can see on the screen right now. We'd love to hear from you. Thanks very much. And I'll now hand back
to Tzvia to wrap up. - Thank you Kevin and James. I enjoyed the demo so much and I hope that you enjoyed it as well. Fabric Real-Time Analytics brings the technology evolution into enterprises. We will be able to have interactive experiences whenever we want, whatever we need, without the barriers, without relying on experts; we will be independent. We will have the ability to ask any type of question on our data and get immediate responses and answers. So if you would like to hear more, you are more than welcome to meet with us in person at the Experts Zone, and in the meanwhile, thank you for joining us and
enjoy the rest of the event. - Woo, that real time,
that realtime analytics. I remember when you had to
stitch so many things together to get this to work. Now
it's like point, click, shift. I'm going to drum up my
inner Christian Wade here. It's like clicky, clicky, drag it, droppy. It was amazing. Adam. - Over realtime data. - Over realtime data.
- That's the key. - And it's just integrated
with all the pieces. Even our favorite tool, Power BI. - That's amazing. All right, next up, Swetha is going to take us home
and look at Microsoft Fabric and some calls to action with a great customer story as well. - Hi everyone. I'm Swetha Mannepalli, senior product marketing manager at Microsoft. I'm here to show you how to
accelerate your data potential with Microsoft Fabric. Now at Microsoft, we not only create products
that revolutionize the industry, but we also take time to listen
to our customers' feedback time and time again to ensure we shape our products to address your top concerns and ultimately empower you to succeed. And I'm so excited to show you
how Microsoft Fabric delivers on this promise in new
and innovative ways. Let's start with
the challenges faced by data leaders across
organizations today. In the past year, we have
heard from chief data officers, enterprise data architects, and other data leaders about
their top of mind issues. Companies are dealing with siloed systems, creating data pockets. What some call the dreaded data sprawl with inconsistent datasets
and ungoverned data sharing. They're also struggling, like many of us, with how to do more with less: delivering analytics with smaller teams and limited skill sets. There is always a data delivery gap between business teams and the IT orgs supporting them. And even with these limited
teams and skill sets, IT is continually tasked
with onboarding more and more technical tools and platforms requiring even more advanced skills. With these ever-evolving technologies, it's harder than ever to keep up with the demands of the business. Plus, we cannot forget
how costly the integration of disparate systems is, as well as their ongoing maintenance, and purchasing multiple solutions costs procurement overhead. It all adds up. On top of all this, the adoption of business intelligence to streamline data sharing
has become critical. Businesses that don't get this right are at a competitive disadvantage
and can expose themselves to serious data breaches
and other security risks. So today's data leaders
must balance their data and analytic needs with all
of these factors in mind to prepare to scale across lines of business for the era of AI. So how did we get here? It's been years, even decades perhaps, of an organically evolved data estate, where your teams realized
the value of data and began spinning up data sets in every corner of the organization. Your marketing and sales teams
manage customer pipelines and automation with
custom database instances, your supply chain, e-commerce, operations, every division is running bespoke tools on top of their own separate data, whether in data marts, data warehouses, hybrid cloud, and on-premises
databases
or something else. It's all with the good intention
of getting more out of data and making data serve
growing business needs, but it's too much. Data copies get out of sync fast. Reports from one part of the organization that should line up with data from another are showing major inconsistencies. Data can't be combined
for further analysis because different teams
use different data formats and tools don't integrate well or at all. And your lead data steward
is sending you daily emails identifying one data risk after another. It's enough to keep any
data leader up at night. How does Microsoft Fabric
address all these pain points while also preparing your
data for the new era of AI? First, Microsoft Fabric unifies all that siloed data into an intelligent data foundation
for all analytics workloads, and your teams can use existing skillsets with the built-in
familiarity of Data Factory, the next generation of
Synapse and Power BI. Microsoft Fabric also eliminates the dreaded integration tax.
No more costly overhead
with a multitude of vendors and tools that barely work together. You get a unified Lake
First software as a service or SaaS environment with
a single pricing structure, making Fabric an easy-to-manage modern analytics solution. In Microsoft Fabric, your analytics workloads
work seamlessly with OneLake, an enterprise-grade data foundation, minimizing data management and delays by eliminating data
movement and duplication. Think of OneLake as
OneDrive for all your data and
with persistent governance, your collaboration headaches are over. Your data stewards can finally manage the layers of access
needed for your business. Microsoft Fabric reduces
the pain of integration and facilitates better
collaboration with a solution that makes it easy to connect and use the different
analytics tools your team needs. So to review: with Microsoft Fabric, you gain a competitive edge with
a Lake First SaaS solution to handle all your data needs with no data movement. Microsoft Fabric is open at every layer with no proprietary lock-ins. It comes with built-in enterprise
security and governance to empower your teams to responsibly share and collaborate with data. Microsoft Fabric empowers business users with deeply integrated Microsoft Office and Teams experiences. Microsoft Fabric also
delivers AI co-pilots to accelerate analytics productivity, help you discover insights
faster and better prepare your data for your custom
AI enhanced solutions. Now, without further ado,
here is a sneak peek. (gentle music) - [Narrator] It's time to empower people to activate the potential of data to bridge the gap between
data and intelligence. Introducing Microsoft Fabric, an open and governed
human-centered solution that integrates all your data and tools. With Microsoft Fabric, data engineers can visually integrate data from multiple sources. Data scientists can
model and transform data. Data analysts can bring
together more data sets and enable deeper insights. Data stewards
can govern
data and eliminate sprawl. Business users can work
directly with data, uncovering and shaping
the intelligence they need to make critical decisions
that drive innovation. And by breaking down every barrier and equipping every data professional with access to every tool, your organization creates
intelligence faster. Now your teams can all work seamlessly from a single data foundation
in Microsoft Fabric with consistency across all
your analytics workloads. AI powered features like
co-pilot help your teams connect data sources, build
machine learning models, and visualize results
at the speed of thought. Intelligence securely
flows to the applications where people work, helping
them make better decisions for transformative impact. And persistent governance
helps ensure compliance and security when users access
and collaborate with data. Empower your teams, unlock
your data potential, transform your organization,
Microsoft Fabric. - With that, we have come full circle on why every organization needs
a unified data foundation and how Microsoft Fabric can help you achieve your data goals. In addition to showcasing
this unified solution, we also want to help you
envision how Microsoft Fabric will support your organization through every phase of the
data lifecycle journey. To that end, we have identified
four fundamental steps. First, you will want to
unify your data estate. Next, you will build analytic models based on your business needs. Third, you'll employ data governance to responsibly democratize analytics across your organization. Finally, you'll scale
transformative analytics and AI powered applications
to drive innovation and competitive edge for your company. Now let's dive deep into
each of these areas. Starting with unifying your data estate. We all know that deriving value from data is top of mind for any organization. With Microsoft Fabric, you will unlock more potential
from your data sources and improve analytics
efficiency by moving away from
proprietary to open standards. The stats show that 55% of companies have a manual approach to discovering data within their own enterprise, while 81% of organizations
have increased their data and analytics investments
over the past two years. The core principles here are simple. To drive a unified data
estate for analytics, you need comprehensive and
accurate data integration with reliable data preparation. Start streamlining data integration with a Lake first approach
and improve data quality and consistency with no redundancies. By doing this, you will increase flexibility and scalability to meet changing business needs, instead of going back to the drawing board whenever a new requirement comes up. For data preparation, you will simplify the process
to reduce time and errors, empowering you to drive
more meaningful insights and decision making with
intelligent transformations. Now that we have seen the
power of unifying data, let's move on to our second pillar: build fit-for-purpose analytics models. Analytics models can be the curated layer that serves your data warehouse needs or predictive models that inform your future business strategy, all feeding off of your Lake First data foundation. To build your fit-for-purpose analytics models, establish a single source of truth by building your models on the unified data foundation. This will reduce the overhead and risk of unnecessarily moving data, helping achieve cost and performance efficiencies. Start the paradigm shift by incrementally modernizing towards a lake first pattern that serves as your foundation to
build your data estate. By doing this, you will eliminate data silos
to enable quicker access to insights by data professionals. This will enable you
to optimize your data estate for complex queries and analysis with semantic data science models. This will create opportunities
to leverage trend analysis with historical data using modern cognitive service integration. So far we have covered two pillars. One focused on
a unified
data foundation with OneLake and the second covering
sophisticated data models. Now let's explore the third pillar, which is responsibly
democratizing data and analytics with best-in-category governance and security. Data governance is the glue
that bridges the discovery of data to the derived business
value data represents. Without data governance, you cannot responsibly share data with the teams that need it. If you do not responsibly
democratize data, you cannot accelerate data value creation. Microsoft Fabric unlocks
this critical capability so you can operate with confidence when sharing data and insights. Data governance is a foundational pillar that fosters a data
culture by enabling access to the right data for the right users. When you automate data
governance across the enterprise, you can easily comply with regulations like the General Data Protection Regulation, or GDPR, HIPAA, and more. You will start gaining insights into sensitive data and analytics across your data estate. Responsible data democratization makes data easily discoverable, and with the right access controls, you will enable the right users to consume the right levels of information. It'll enable you to provide near real-time responses to business changes. Align teams with a single
unified source of truth. Enable secure democratized insights across your organization. Leverage a powerful solution
with flexibility in cost and usage by enabling better decision making for transformative impact. Integrate your services
with an open, secure, and governed foundation. With all the goodness you have seen so far in terms of how Microsoft
Fabric can support your ambitions to unify, scale
and manage your data estate in the era of AI, the time is now. In summary, by leveraging
Microsoft Fabric, your teams will be better
aligned by no longer needing to piece together various data, analytics and business intelligence solutions. Establishing a center of
enablement for every user will help you democratize your insights and achieve faster time to value. Plus, you won't have to worry about sacrificing the
integrity of your data estate because Microsoft Fabric comes equipped with built-in security, governance and compliance capabilities, giving you peace of mind
as you grow and scale. Throughout this journey to public preview, we have collaborated closely
with customers and partners taking into account
feedback and learnings. We have the pleasure
of having a few of them share their excitement
with all of us today. (gentle music) - We've been using these
Microsoft platforms for several years now. Now with Microsoft Fabric
being able to utilize all of those various services
within a single user interface that's very intuitive, it's very clean, it's got the familiarity
of Power BI in it as well, which is easy for us to adopt. So Microsoft Fabric provides
some opportunities for us to improve our governance processes. - Microsoft Fabric elevates
our process by reducing time to delivery, by removing the overhead of using multiple disparate services, by consolidating the
necessary data provisioning, transformation, modeling, and analysis services into one UI. Microsoft Fabric meaningfully
impacts Ferguson's data storage, engineering,
and analytics groups. Since all of these workflows
can now be done in the same UI for faster delivery of insights. - Our customers priorities are heavily interrelated with data, and a lot of those organizations
are really struggling with managing their data effectively. Fabric has the benefit of
reducing data integration tax and unifying hybrid and
multi-cloud data estates, and we're very excited to work
with Microsoft on combining these capabilities with
our KPMG One data platform to help our customers
accelerate their data potential. - What excites us is bringing
a single enterprise grade foundation that will underpin
customer use cases across AI, machine learning and analytical domains. Microsoft Fabric will help
Avanade elevate customer conversations
to a more strategic level. We have a far expanded toolkit to address customer challenges. - Informatica is excited
to be the design partner of Microsoft Fabric, working extremely closely
with the Microsoft team. Informatica's Intelligent Data Management Cloud, integrated with the Microsoft Fabric experience, will address critical customer challenges across
integration, data governance, data catalog, and data privacy. Real-time ELT and change data capture replication into the open and governed OneLake in the Delta Parquet format is game changing. - We at PWC lead a lot
of programs involving redefining business
processes for our customers, supporting some very critical
mergers and divestitures as well, and these programs run
under a very tight timeline. With OneLake, the implementation
becomes very simple: by avoiding multiple data hubs, we enable a data platform to be built much quicker and deliver insights to the business much faster as well. The pervasive data governance is another key aspect that reduces the risk from the overall analytics
platform build itself. Further, infusion of generative
AI into Microsoft Fabric is going to be a game changer. - I hope you are ready to
embark on this journey with us. Start by visiting the Microsoft Fabric website and signing up for your free trial, or by reaching out to a Microsoft representative. Thank you for tuning in today and have a great time
learning more at Build. - All right, Patrick. That's a wrap.
- It's
a wrap. - There were a lot of sessions, a lot of content learning
about Microsoft Fabric. - All the sessions, Adam, all the sessions will be
available on the Build website. - And they're on demand.
- And they're on demand. It's amazing. Also go over to
microsoft.com/Fabric to learn more and get started. Speaking of which, Patrick, we got to go. We got to start building
out some Lakehouses. - I cannot wait to get
my hands on this preview. - All right.