- Two things as I get started here. Number one, just a
sincere, sincere thank you for spending your time with us. We recognize you have a lot to do and we're so grateful. Number two, I want you to
pretend like it's you and me for the next 20 minutes, and we're gonna go and demo. I don't have any slides to show you, but we're gonna go through a demo. Now, I'm actually gonna demo on my data, like, my real data. Sometimes it's gonna mess up, and sometimes it might even show you stuff I'm not supposed to show you. If that happens, I'm gonna
tell you to avert your eyes over to some wall
(audience laughs) or something like that. So let's get started. I am gonna start in a
very, very basic place and take us all the way back to ChatGPT to get started. I recognize that in this
room, we have a lot of people with a lot of different
levels of experience with the technology itself, and I'm gonna start
from the very beginning. Think back to the first
time that you saw ChatGPT and you saw it do something,
perhaps a party trick. We're gonna start there.
(audience laughs) I also will go back and forth between a whole bunch of different prompts that I'm gonna be using. That's mostly so you don't
have to watch me type. So let's talk for a moment
about GPT, what's called 3.5, and I'm gonna ask a very simple question: What can you tell me about elephants? Now, the reason I do this is I just wanna explain conceptually, just to review for a moment, what are these large language models, what we today call foundational models? In essence, what they are is that they have taken the information that's been organized
into knowledge by people through the written word and they have processed that written word and created a conceptual
view of the world. So now, based on the tremendous
amount of training data that we've stuffed into these things, they can tell you about
just about any topic that an educated college university grad would be able to tell you about. So let's start with what can
you tell me about elephants? Now to be clear, we never trained this
specifically on elephants, we just gave it a lot
of different materials, and it turns out that
elephants were mentioned, and so as you can see here, it
knows a lot about elephants. It can tell me a bit
about species, about size, about lifespan, physical
features, social structure, and basically, you should think that the neural network that is an LLM has taken, again, the written word and tried to conceptually
create an understanding
of the world around us. It's based on that written word. Now, over the last 12 months, what we've learned about these LLMs is that they can not only spit out facts, they can actually reason. Sometimes, in the industry,
it's called inference. I like to think of it as simply thinking. They can not only tell you about a topic, but they can apply their
knowledge about many topics to solve problems. Let me give you an example. We'll move from a simple
prompt to this prompt: Could an elephant pull my
Tundra? How would it work? Well, I drive a Toyota Tundra, and I've always wondered if
an elephant could pull it, and here we will ask it. Now, one of the most amazing things is that first, it understands
that we're not talking about some large landmass, the tundra, but a Toyota Tundra, it understands the context
of what's happening, and then it gives me its reasoned answer: "It's highly unlikely that an elephant could
pull a Toyota Tundra or any other modern vehicle, on its own." And then it goes on: "Here are a few reasons why an elephant couldn't
effectively pull a vehicle like a Tundra," and it gives me its view, traction, weight distribution, vehicle resistance, practicality. It goes on and reasons
across what it knows about the various topics, puts those things together, and does its best to say: "Even if it were somehow possible for an elephant to move a car, it would not be practical or a safe method of transportation." Thank you.
(audience laughs) I've never thought of that before. As we've been working with these models, we have been understanding what they're good at reasoning
about and what they're not, and we'll talk a little bit
about that in just a moment. Okay, now let's ask another question. Here, we'll get to more of a
factual question for a moment. I'm gonna go to December, 2023, and I'm gonna ask a little
bit about the jobs report from the BLS, the Bureau
of Labor Statistics. Let's see how it answers. Hmm, that's interesting: "I apologize, but I don't have
access to real-time or future data." It turns out that its knowledge cutoff date, its training date, was January of 2022, so it can't tell me much about what happened in December of 2023. That was the state of the art
roughly about 12 months ago. We had the equivalent of a
college-educated thinker, but unfortunately, it didn't have access to current or recent information. Well, we addressed that with something that we call grounding, or sometimes it's called
retrieval augmented generation. To
show you that, I'm
gonna move over to Copilot, where I'm gonna issue
that same exact query: What can you tell me about
the December, 2023 jobs report from the Bureau of Labor Statistics? I'm just at copilot.microsoft.com, you can try this yourself, and we'll see something happen
here that's very interesting. It will say it's searching
for that particular report, it will find that report, and then it will give me its view of the report, and if you go look at the source here, it went to the actual bls.gov
website to find that source. You can see, for instance, that total nonfarm payroll
employment rose by 216,000, that the unemployment rate
remained unchanged at 3.7%. And it has references
which are very interesting that take us directly to some source data that we can look at. This pattern that you just saw is what is called retrieval
augmented generation. It's the idea that we can take what essentially, effectively, is a brain that can do some thinking, it's fairly well educated, we
can go out and get
the most recent data, we can pull that data together in what we now call in the
industry a context window, and we can take the question
and the context window and push it against the
LLM and ask it to reason, and that's exactly what we get here. Now, this opens up all sorts
of amazing possibilities. We can apply that basic
structure, that basic pattern to business. We certainly can apply it to your sector, to the work that you do, the missions that you pursue. It's an interesting thought exercise to think, "What would I now do with synthetic reasoning," we like to think of this as a general-purpose reasoning engine, "and my data, what problems could I solve?"
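Since this retrieval pattern comes up again and again in the rest of the demo, here is roughly what that loop looks like in code. This is a minimal sketch in Python, not Microsoft's implementation; retrieve and llm_complete are hypothetical stand-ins for a real search API and a real model client.

# A minimal sketch of retrieval augmented generation (RAG).
# retrieve() and llm_complete() are hypothetical stand-ins; a real
# system would call a search index and a hosted model instead.

def retrieve(query: str) -> list[str]:
    # Stand-in for a live search call (e.g. fetching bls.gov pages);
    # canned snippets keep the sketch runnable as-is.
    return [
        "[1] BLS, Dec 2023: total nonfarm payroll employment rose by 216,000.",
        "[2] BLS, Dec 2023: the unemployment rate held at 3.7%.",
    ]

def llm_complete(prompt: str) -> str:
    # Stand-in for the model call; a real client would send the prompt
    # to an LLM endpoint and return its text.
    return "Payrolls rose by 216,000 and unemployment held at 3.7% [1][2]."

def answer_with_rag(question: str) -> str:
    snippets = retrieve(question)   # 1. fetch data newer than the training cutoff
    context = "\n".join(snippets)   # 2. pack it into a context window
    prompt = (
        "Answer using only the sources below, with citations.\n"
        f"{context}\nQuestion: {question}"
    )
    return llm_complete(prompt)     # 3. ask the LLM to reason over question + context

print(answer_with_rag("What can you tell me about the December 2023 jobs report?"))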
All right, let's continue, then, over the course of the last few months and see what has started to become of all of this. So now I'm gonna give it a different question: Can you give me a graph of the jobs report from the Bureau of Labor Statistics over, I think I asked,
the last eight quarters. So we will ask
that question. The way it handles this is
actually changing right now, every day, it's a little bit different, but it will go out and
find that information that we're looking for, says, "I can provide you with a summary. However, I'm unable to generate
or display graphs directly. You can find detailed
graphical representations of employment data," it gives me a reference there. But you can push it, and here's how you push it. Wait a second.
(audience laughs) I thought you could code. Can't you generate the Python code that would take that data and graphically represent it? (audience member laughs) Please?
(audience laughs) Okay, see what it does here. So it goes out, RAG
pattern, grabs the data: "Certainly! To graphically represent
the jobs report data from the Bureau of Labor Statistics over the last" blah blah, blah, "you can code." And it turns out it just
remembered it could code. (audience laughs)
It can code. Sometimes you have to remind it of things. It has something that
is called code generator that you can think of as a new skill that has now been appended to this overall thinking, general-purpose reasoning engine that it can call out to, and it has generated here, and not only generated, but told you what you
need to do to run it, the Python code required to
actually create that plot. "This code assumes you have a DataFrame named data with the correct information. You'll need to replace the example data," et cetera, et cetera.
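For readers who can't see the screen, generated code in this situation typically looks something like the following. This is a reconstruction based on the on-screen description, not the verbatim output; every number is a placeholder except December's 216,000, which is quoted earlier, and you would swap in the real BLS figures.

import pandas as pd
import matplotlib.pyplot as plt

# Placeholder example data -- replace with the real BLS figures.
# Only the December value (216) comes from the demo itself.
data = pd.DataFrame({
    "month": ["Jul 2023", "Aug 2023", "Sep 2023", "Oct 2023", "Nov 2023", "Dec 2023"],
    "payrolls_thousands": [180, 190, 200, 170, 180, 216],
})

plt.plot(data["month"], data["payrolls_thousands"], marker="o")
plt.title("BLS jobs report: nonfarm payroll growth")
plt.xlabel("Month")
plt.ylabel("Jobs added (thousands)")
plt.tight_layout()
plt.show()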
Our next step is to actually automate that process. Right now, it's a little bit manual. Now, if you pause for a moment, that's a lot of progress
over the course of 12 months. We've discovered we have a
general-purpose reasoning engine, it started on kind of
essentially a fact base, we can send it, using this
RAG pattern, any information, and it can reason over
it and give us answers, and now, all of a sudden, we
can augment it with new skills. It can make a call, essentially, almost like a centralized brain, taking a question, taking an outcome and making a call to something that is more specialized to do the work.
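In industry terms, this augmentation pattern is usually called tool calling or function calling: the model emits a structured request, and an orchestrator routes it to the matching skill. Here is a simplified sketch with made-up tool names, not Copilot's actual internal interfaces.

# Simplified skill-dispatch loop. The tool names and implementations
# here are illustrative stand-ins, not Copilot's real interfaces.

def generate_code(task: str) -> str:
    return f"# Python code that would accomplish: {task}"

def create_image(description: str) -> str:
    return f"<image rendered from prompt: {description}>"

TOOLS = {"generate_code": generate_code, "create_image": create_image}

def dispatch(tool_name: str, argument: str) -> str:
    # The central model decides which skill to call and with what
    # argument; the orchestrator simply routes the request.
    return TOOLS[tool_name](argument)

print(dispatch("create_image", "the workplace of the future, open floor plan"))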
I'll show you just one last example of this. It is using a very specialized additional model that allows it to actually draw pictures. So here, I'm going to ask it
to create a picture for me: Create a picture of the
workplace of the future. I wanna show an open floor plan with individual workstations. I'll just get that started as we go. And what this is doing now is it's first parsing that information and it's
giving us a sense of, okay, I can try to create that. It will then... Don't worry, that's not the picture, that's not the picture.
(audience laughs) And everybody gets really worried
at this point in the demo. It's like, but that
doesn't look anything like the workplace of the future, what's going on? And actually creates a couple of workplaces of the future. Now, this is a little bit
dystopian for my taste, I'm not sure I love it, but it gives me a couple
of different options to choose from. Looks very futuristic, I
have to admit. Very nice. And you can continue to go back and forth in what we think of as
an iterative process to get to where you want to be. All right, that's the
basics of what we see, and if I just go through
the concepts we have, essentially, the underlying LLM model, we've learned that it can spit out facts, we learned that it can reason, we learned that with the RAG pattern, you can feed it additional data, we've learned that with
this augmentation pattern, you can
feed it additional skills. Wow, all of a sudden, you
have a little bit of a system that can do pretty amazing things as it relates to your
organization and your mission. Let's take you over, then,
into Copilot for Microsoft 365. Here, we're gonna see those
same types of concepts, but applied to your data. In this case, it'll be my data. Remember, if I say avert your eyes, half of you look that way,
half of you look that way. I'll just start very quickly
with a general prompt here: What's the latest from, let's think for a moment, Alexia, I did a forward slash so that it
knows who I'm talking about, Alexia Cambon, who's on my team, organized by email, let's do chat, and how about documents? Okay, Alexia works for me,
she works on a research team. She actually lives in the UK, and she's been hard at work looking at the quantitative
and qualitative measures of what Copilot can do. It is now going out to
use that basic pattern to grab- Well, there we go. "I'm sorry, I couldn't
find any recent emails, or I found a couple of recent documents authored by Alexia Cambon. One is an Excel file, and it looks like the other
is a PowerPoint presentation." That's interesting. Wait a second, (audience laughs) I thought you could find emails. Can you search my email again? Sometimes if you tell it to try harder and give it a little bit of encouragement, (audience laughs) and I'm not joking, it
actually does, it does, and we're still trying to figure out why that's the case, but it actually
does something different. (audience laughs) There you go. Wait a second, oh, "I found several recent
emails from Alexia," it says. "The most recent one was
sent yesterday evening." Now that's interesting, isn't it? You know, this is much more like a person than you'd think
(audience laughs) it turns out, uh-huh. All right, what you see here, however, is that it's going into your information, finding that information, giving you a summary of that information, then giving you references
that you can go to to find the actual information. This RAG-based approach, this pattern, doesn't entirely remove hallucination, but it does mean that it
hallucinates a lot less. We talk about it being
sometimes usefully wrong, which is it doesn't like
to just make up things in the context of what you send to it, but sometimes it does get things wrong, and so we tell people, continue to use your brains (chuckling). You have to continue to look
at what's coming to you. Okay, here's one of my favorites to do. I do this at my own peril: Can you summarize the last
five emails from my boss? (audience laughs) All right, to be specific, look for the last five emails from him over the last 24 hours, and stack rank them in order of most important
to least important based on your analysis of the contents. This is always a really good
view of how my boss is doing in sending me important things. All right, it should access my email here, we'll see how it does, and if and when it finds that email, if it understands that my
boss's name is Takeshi Numoto, he's the chief marketing
officer here at Microsoft. It says, "I did find five
recent emails from your boss over the past 24 hours. Here's a summary." And here you go. He sent an email with the
subject Video Gen Product. We're talking about video
generation right now. He sent an email about Copilot Pro and Copilot for Microsoft 365. That should not surprise you at all. He sent something about some deep immersion
sessions we're doing on AI for science, chemistry, and biology, and it looks like we
have some security work that we're doing. So you can get a sense
for your ability, now, to perform tasks that perhaps
you've never seen done before. For many people, this idea that you could query your information using Copilot like this is just totally foreign. I'll show you one last one here as we move on to one or two other things. Let's try this. Oh gosh, there are such good ones. Let's try this: Review my meetings this week and create five to seven
categories that describe how I spent my time. For each category, provide
a short description and give me an approximate percentage of time that I spent there. Again, people don't even think
about these types of prompts being how they would do their work. Now, this particular prompt,
we're still working on, it's not perfect. Copilot is learning how to think
about time in your calendar and how to do math, all sorts of things, but "Based on the information I have, you have over 50 meetings
scheduled for this week. After analyzing the subject
and content in your meetings, I've categorized them into the
following five categories." I do have to travel, it turns out, that's not so interesting. I do give speeches and presentations, I meet with my teams, I have customer engagements, and then let's see what the fifth one is, I'm kinda interested. Personal. Okay, personal activities,
such as Seattle Temple. There you go, I'm a religious man, I went to the Seattle Temple this morning, it turns out. But wow, pause for a second, pause for a second. General-purpose reasoning engine, any data. That's impressive if you think about its
ability to do things, very impressive. Okay, we're gonna keep going. I'm gonna now transition from just the generalized chat experience that you're seeing here over into some of the apps. My favorite app to show is Teams. I like to show this particular app. Oh, my gosh, it's gonna
blow up on me here. I like to show, let me see if I can just, okay, if that comes
back up,
we're gonna try and silence it. I like to show this particular app because we're looking now at
what we call a meeting recap. This is a fictitious meeting, so I really don't leak trade
secrets out to you here, a fictitious meeting
that my team recorded, it's 30 minutes long, where they were talking
about the launch of a product that we called Project Falcon. Now, this recap was
based on the recording, which I can access here. I can see the speakers who were involved, and get a view for
when they spoke. I can look at the topics
that were discussed. I can even get, if you're old enough to
remember the DVD chapterization, you used to be able to skip
between chapters in DVDs. Yes, I know, hard to
remember, but here it is, together with a timestamp
associated with those. It gives me meeting notes, it gives me a summary of follow-up tasks, but the best thing about this is your ability to query
Copilot about the meeting. Just think of the meetings you won't have to go to any longer.
I love it! I love this! Okay, so I have some saved queries here to show you what this is all about, and we're just gonna look
at a couple of them quickly. For instance, I can ask it simply: Can you list the actions
and owners in a table from this 30 minutes of discussion? It says, sure, you know, "Here is the table of actions and owners that I extracted from the text." I asked another interesting question: Was there any tension in the meeting? How did you detect that tension? (audience laughs) "Yes, there was some
tension in the meeting. I detected that tension
by the following clues: Jason's reaction to Marlene's suggestion to delay the launch by a month. He said, 'Are you kidding me?' (audience laughs) and 'We can't delay now,'
in a frustrated tone. And then it goes on, Callie's agreement saying it would throw
everything out of the window, you know, some really, let me see if I can, oh boy, sorry, that might come up a couple times, showing that it really understands human emotion. Now here's a really interesting one. The whole idea behind this meeting was my team talking about
when was the best time to actually announce the new product, so I can ask Copilot, based on the transcript, when would you recommend
that we announce the product, and why? And it says, "Based on the transcript, I'd recommend we announce
the product on September 3rd because it would be ahead
of the competitor's launch on September 5th. It would give us enough time to finish product testing
and certs by August 29th, it would allow us to have
a full day of rehearsals with Paul, our CMO, on September 2nd, it would avoid any potential
issues with Amy, our CEO, who's traveling to Tokyo and has a board meeting on September 4th." Wow, right? Wow! Sometimes we, and by the way, this is
stochastically statistical, it gives you different answers each time or different reasoning a little bit, but sometimes it actually will even say, "And I'm worried about Amy's jet lag." No joke,
(audience laughs) I
can't make this stuff up. Like, wow, I never thought
about my CEO's jet lag, it doesn't even occur to me. But interesting to see it reason across 30 messy minutes of
people talking to each other to get to a business outcome. All right, my time's
almost up here for demos, so now I'm gonna move
you over into my email. Yes, this is my email inbox. Here, we're gonna look at a thread that is fictitious, though. This is a thread like everybody gets, somebody sends you something, somebody replies, somebody else replies. It's only been 15 minutes, and you feel like you're
three days behind. That happens all the time. Well, my favorite thing to do is to hit this little
button, Summary by Copilot. You're gonna love this button. Summary by Copilot will
go in across this thread, it uses that same RAG pattern, retrieval augmented generation, and it will summarize what's
happening in the thread. Says Leslie's completed
the first draft of a blog, she's requested feedback from these folks. Theresa did this. A meeting was set up, Theresa did that. Leslie is now waiting for the
thumbs-up from Jon Friedman before sending to Jason for
posting on the day of the event. Not bad. Very good summary. Now when you want to reply,
you can simply hit Reply, and after reading the thread, it actually will give you a
couple of different options here on how to reply. You can approve it and praise them, you can suggest minor edits, you can request major changes, or you can do a custom prompt. I'll do a custom
prompt. Approve it, tell them great work, be warm, (audience laughs) be a little funny. (audience laughs) That's good enough for me, yep. Here we go, we'll ask it to generate, and it's gonna go away and generate. Now, here's the truth of what I'm experiencing as I use this, I'm finding that those
things I need to dash off really quickly, no reason for me to use
Copilot, I could do it quickly, but the places where I
have to engage brain, I need to be particularly sensitive, I need to think about, you know, how I would deal with an issue, hey, this is very useful. Here's a quick thing it came up with. I just looked, "I just reviewed the blog
and it's looking great. You all have done fantastic work and I give my final sign off. Keep up the amazing work and let's celebrate with
some virtual high fives." That's not as funny as I wish it were. (audience laughs) Can you make it a little longer and a little funnier, please? Try hard. (audience laughs)
Okay, I don't know what it's gonna
come up with now, we'll see. We'll see what happens. It's kind of a little bit of a game too, you know. "Wow, this blog is a masterpiece. (audience laughs) You all have done a phenomenal job of showcasing the Copilot design system," and blah, blah, blah. "This is a definite win for" blank. Hey, it knows my boss. "I'm sure Takeshi will be thrilled too. I hereby give my final sign off and a big round of applause." It still kept the high fives, I think I need to talk to it. "Some real treats. How about we order some pizza and cupcakes for everyone? You've earned it." Mm,
(audience laughs) I don't know, I don't know. What we are doing, however, is one of the newest features, something we call Sound Like Me. This allows it to read your previous emails that you have written and then to take your style and to write in your style so that it can sound like you.
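Microsoft hasn't said publicly how Sound Like Me is built, but one plausible, simplified way to get this behavior is few-shot style conditioning: put samples of the user's own sent mail into the context window and ask the model to match that voice. A hypothetical sketch, not the actual feature:

def sound_like_me_prompt(past_emails: list[str], request: str) -> str:
    # Few-shot style conditioning: show the model real examples of the
    # user's writing, then ask for a new draft in the same voice.
    examples = "\n---\n".join(past_emails)
    return (
        "Here are emails this user has written:\n"
        f"{examples}\n---\n"
        f"Draft a new email in the same voice. Task: {request}"
    )

print(sound_like_me_prompt(
    ["Thanks, team -- great progress this week. Let's ship it!"],
    "approve the blog draft and thank the team",
))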
After having done some tests with Sound Like Me, I'm not making this stuff up, we've actually run some qualitative tests with Sound Like Me, with human graders who don't know what's been written by the person and what's been written by the Copilot. It turns out that those human graders like the Sound Like Me versions of the people better than the people. (audience laughs) I don't know, that's what it looks like. All right, one very last
one, and then we'll be done. Here, I'm gonna show you very quickly, a long, frequently asked
questions document. It's some, it's longer than 34 pages. I think this is some 70-odd
pages long as it loads up here, and this literally is
something that we used in the early days of Copilot, where we were compiling
the questions people had about the product, and we were just asking them
to give us their questions and we would give them answers. It's interesting and long, but watch this. You can then use things like this, and I'm just gonna
reload this for a moment, sometimes it times out on me, to, in a blank document, ask Copilot to do some thinking for you. And so the first thing I'm going to do is find
that document, so I do a forward slash and just see if it will
pull up the document for me. There it is, Field Sales FAQ, so it gives, I give a link to it, and then, so you don't
have to watch me write, I will go down and grab this prompt, and we'll read it together: Referencing this document, create a 10-step deployment
guide for IT administrators for Microsoft 365 Copilot. For each step, provide an
overview of the work required, an explanation of why it's important, and a detailed set of tasks to complete. Please generate. Now, it is gonna take a moment. This is, like I said, a
70-plus-page document. Today, the context window can be as big as 300 typewritten pages, so it's not small, and you can take that context window and shove it into the LLM and ask it to do work for you.
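A quick back-of-envelope check shows why that matters; the per-page and per-word numbers below are my rough assumptions, not Microsoft figures.

def estimate_tokens(pages: int, words_per_page: int = 500, tokens_per_word: float = 1.3) -> int:
    # Very rough heuristic; real counts depend on the tokenizer.
    return int(pages * words_per_page * tokens_per_word)

doc_tokens = estimate_tokens(75)       # the ~75-page FAQ: ~48,750 tokens
window_tokens = estimate_tokens(300)   # a "300-page" context window: ~195,000 tokens
print(doc_tokens <= window_tokens)     # True: the whole FAQ fits in one prompt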
It will take just a moment here for it to go through the entire document, but I'm asking it to take a source document that had something to do with my request, but you actually have to apply some brain power to it and come up with something else, and here you go. Deployment Guide, a 10-step
guide for IT administrators to enable and manage Microsoft 365 Copilot in their organization. It writes me a real nice introduction. Step 1, review the early access program terms and conditions. We'll see it go down to these steps here. Step 2, I have no idea what
it'll be, but it'll pop out. Assign Microsoft 365 licenses to users. It's really thinking, right now, but essentially, what you can see it doing is that
it's taking that information, it really is processing it, and it is going to do the work. If I had someone on my team do this, there is no doubt it would
take them an afternoon to do, it's 75 pages, and if we waited, typically, this will take about two to
three minutes to be done. Wow! Wow, wow, wow! We are entering a new era. Now, as a reminder, it is not perfect, it does make mistakes. It's really interesting to me that you have to encourage it right now. We'll try and do something about that. So we're still learning about
the technology ourselves, but as you can see, this is
powerful, powerful stuff. When you put it into
practice in real places, we think that it's gonna change the world. We're excited to get it into your hands so that you can change the world with it. Thank you very much.