[MUSIC] MARCO CASALAINA:
Hello everybody, and thank you for
joining us on this, the last day of Build, the last breakout session
of the last day at Build. I guarantee you; I'm going
to make it worth your while. I'm Marco Casalaina, and I am VP of Cognitive
Services at Microsoft. Cognitive Services includes
the Azure OpenAI Service. It also includes vision,
language, decision, content safety, and
responsible AI and Speech. We have all these capabilities
and we're going to be taking a look at a whole
bunch of them today. Today's session is about the new capabilities
in Azure AI, and how you can
use them to bring new types of applications
to your customers. When we think about Azure
AI, what is Azure AI? It's a complete suite
of services that's made to be accessible
both to developers, people without a data science background, and to data scientists, people who want to
be able to build their own machine
learning models. Increasingly, it's repeatable. That means that
nowadays, you don't even necessarily need to
train a model anymore. In the past, with AI, you had to train up all your own models and stuff like that. That's not so true
anymore, and we're going to see it in a minute. Finally, it's made to be responsible right out of
the box, so that your AI behaves responsibly and
doesn't go off the rails in front of your employees
and your customers. Now, this is the layout
of the Azure AI Services. But rather than
talking to this slide, we are going to
leave slide land. Give me just a moment. My
computer locked itself. That's not awesome. There we go. We're going to leave
slide land and we're going to go straight
into the product because this is going to be
much more fun. Here we go. I'm going to start with Speech, and one of the new
capabilities of Speech. Now, I have a custom neural
voice of me in Speech. I'm just going to play
this here and you'll hear. SPEAKER 1:
"The bear said, I am so angry." Suddenly a fairy voice
appeared. "Don't be angry. I am here to help you."
MARCO CASALAINA: That definitely
sounds like me, because it is me, and even my mother
thinks it sounds like me. But that was pretty flat. It really lacked
emotional affect, there's nothing there. We have this new feature now; it's called auto predict. I can auto predict
on this stuff. Now, as you can
see, it's actually added some emotional
content here. I'm going to put it
back at the beginning and let's listen to that again. SPEAKER 1:
"The bear said, I am so angry." Suddenly a fairy voice appe
ared, "Don't be angry. I am here to help you," "Really, you can help me? That would be great," said the bear hopefully. MARCO CASALAINA: One of the
themes of this presentation is that these models, these AI capabilities,
are going multimodal. ChatGPT has kind of
conditioned everybody to believe that it's just a
text in, text out interface. But already, that's not true. Already, we have
capabilities like Speech, and some of
the vision things that I'll show you later on, that
will allow you to handle all kinds of different content,
not just text. This is just one of the capabilities that
we have in Speech. There's lots more, but I don't
have time for all of that right now, because we got
lots of Azure AI to cover.
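If you want to try that kind of expressive synthesis yourself, here is a minimal sketch using the Speech SDK for Python. The key, region, and voice are placeholders; a custom neural voice like the one in the demo would substitute its own voice name and deployment endpoint ID, and the express-as style shown here assumes a voice that supports speaking styles.

```python
# Minimal sketch: synthesize speech with an expressive style via SSML.
# Key, region, and voice name are placeholders; a custom neural voice would
# also set speech_config.endpoint_id to its own deployment ID.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="cheerful">
      Don't be angry. I am here to help you.
    </mstts:express-as>
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis finished.")
```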
The next thing we're going to take a look at is language. Before I did this session, I did a dry run of it. I took the transcript
of this dry run and I put it into this new capability, which is the summarizer API. In language, we are releasing our new document and
conversation
summarizers. Here we have the transcript. This is from the file
that I loaded in, and I set it to do the Chapter
title and the narrative. Basically, it's chapterising, it's breaking the meeting
up into different segments. Then, it writes a little
narrative about that. Here we see how it broke
it up, right on the screen. In fact, we are in
step 2 right now, summarizing a 30-minute
meeting with language, which gets me to this. Why would we do this? Why would we make a
summarization API when, as you probably all know, you can totally summarize
stuff with GPT, with the Azure OpenAI Service. Well, there's a really
good reason for that. Well, there's two, actually. One is that this summarization API does what it says. It
says what it does. You don't have to do
anything with prompts or anything like that
to make it work. You just say, I want to do a conversation
summarization and it does it; it does it in a much more inexpensive
way than the GPT models. But moreover, let's
take a closer look.
It said summarizing a 30-minute
meeting with language. Well, this was actually
about a 45-minute meeting. You'll notice that it has
31,000 characters in it. The token limit of most of the GPT models is about 16,000
characters, give or take. It's actually
measured in tokens, but it's roughly 16,000
Latin characters. That means that I
couldn't really feed this whole conversation into GPT and expect it to be
able to summarize that, or I could with GPT-4 32K, which is the top-end model. Now, that's
an expensive
way to do it. This thing has a much
larger character limit. You can feed meetings into here, or conversations in
here, that are as long as like an
hour and a half or two hours and it'll still work. There is a place here for what we call task-specific models. That's what we have here.
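As a rough sketch of what calling the conversation summarizer looks like from Python (the endpoint, key, and exact payload keys are assumptions based on the preview documentation and may differ in your API version):

```python
# Sketch: chapter titles and narrative summaries for a meeting transcript.
# Endpoint, key, and the task payload shape are assumptions from the preview docs.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    "https://<your-language-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<language-key>"),
)

task = {
    "displayName": "Meeting summarization",
    "analysisInput": {
        "conversations": [{
            "id": "dry_run",
            "language": "en",
            "modality": "text",
            "conversationItems": [
                {"id": "1", "participantId": "speaker_1", "text": "Welcome everyone..."},
                # ...the rest of the transcript turns...
            ],
        }]
    },
    "tasks": [{
        "taskName": "chapters",
        "kind": "ConversationalSummarizationTask",
        "parameters": {"summaryAspects": ["chapterTitle", "narrative"]},
    }],
}

poller = client.begin_conversation_analysis(task=task)
result = poller.result()
for item in result["tasks"]["items"]:
    for conversation in item["results"]["conversations"]:
        for summary in conversation["summaries"]:
            print(summary["aspect"], "->", summary["text"])
```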
Before I move on, because it's about to get really interesting, I would note that there is this QR code over here. I think many of you are
familiar with them now. As I go through
this presentation, there will be time at
the end for questions. If you'd like, you
can feel free to use that QR code to ask questions, and I will take them
at the end of this. Next, we're going to get
to Form Recognizer. Form Recognizer, if
you've never seen it before, Form Recognizer, well, it also does what it says and says what it does. Here is a form, and
it's recognized it. This is a form of a
bank application. Into this application,
I have hand written some of my
own information. Some of this is not real. My mother's maiden
name is not "Jones." But what Form Recognizer does, it's not just OCR, it reads the document, and then it divides it
up into key-value pairs. That's what it's done here. You'll notice when I
mouse over this that it's discovered that this is
a key called "Surname." It's found correctly my
handwritten name here, Casalaina; that is,
indeed, my last name.
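For reference, a minimal sketch of pulling those key-value pairs out with the Form Recognizer SDK for Python; the endpoint, key, and file name are placeholders:

```python
# Sketch: extract key-value pairs (e.g., "Surname" -> "Casalaina") from a form.
# Endpoint, key, and file name are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    "https://<your-form-recognizer>.cognitiveservices.azure.com/",
    AzureKeyCredential("<key>"),
)

with open("bank_application.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-document", document=f)
result = poller.result()

for pair in result.key_value_pairs:
    key = pair.key.content if pair.key else ""
    value = pair.value.content if pair.value else ""
    print(f"{key!r} -> {value!r} (confidence {pair.confidence:.2f})")
```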
Now, one of the challenges that our customers used to have with Form Recognizer
that we have just recently fixed
is, consider this,
it says "Surname." There are lots of different ways that you can say last name, you can say surname,
you can say last name, you can say family name, there's all of different
ways that you can say that. For our customers
that were using Form Recognizer on a
diversity of forms, a lot of times they said
things in different ways. That made it difficult
to map that to whatever workflow or database they were trying
to send it into. Now, you'll notice that
there's a new key here. It's called "CommonName." What we'll do here is we will attempt to map this
to a common name. Whether it says surname, or
family name, or last name, we will emit this extra piece of metadata that says LastName. You can take that
and map that in without having to do
that mapping manually. That's Form Recognizer
as it is today. It recognizes forms. What if the content
is not a form? What if it's just free text, a contract, something like that. You want to extract
information from that. Well, something's
coming for that. This here, let's see
if I can get this to scroll down a little
bit. There we go. This is one of my own
documents, actually. This is, I was buying
a storage unit. I live in San Francisco, California; I was buying
a storage unit in San Francisco last year. This is the grant
deed from that. Now, this grant deed is
clearly not a form, so it's not arranged nicely like that other form in
key value pairs. The information is
in the prose here. We have this new capability
called "query fields." Really,
what's happening here
is that in the background, Azure Form Recognizer is
orchestrated to Azure OpenAI. I have defined four
fields here that I want Azure OpenAI to pull
out from this document. The buyer, the seller, the transfer tax, and the city. When I ran this analysis, as I did just before
I came up here, you'll notice along
the side here, it did indeed pull
this information out. The buyer is myself and
my wife, Karen Bird, the city, San
Francisco, the seller, SLATS Investors Three; that really was the company I bought it from;
I didn't make that up. And the transfer
tax 60 bucks. All of that is embedded
here, in this paragraph. None of that is in
key-value form. Yet, I can use this
capability to derive that structure from
this unstructured data and drive downstream workflow.
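Query fields can be requested in a similar call. A sketch is below, but treat it as an approximation: the feature flag and keyword argument names have shifted across the preview SDK versions, so check the current Form Recognizer / Document Intelligence documentation for the exact spelling.

```python
# Sketch only: ask for ad-hoc "query fields" on an unstructured document.
# The feature flag and keyword names below reflect a preview SDK and may differ.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import AnalysisFeature, DocumentAnalysisClient

client = DocumentAnalysisClient(
    "https://<your-form-recognizer>.cognitiveservices.azure.com/",
    AzureKeyCredential("<key>"),
)

with open("grant_deed.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-document",
        document=f,
        features=[AnalysisFeature.QUERY_FIELDS_PREMIUM],  # preview name; may differ
        query_fields=["Buyer", "Seller", "TransferTax", "City"],
    )
result = poller.result()

for name in ["Buyer", "Seller", "TransferTax", "City"]:
    field = result.documents[0].fields.get(name)
    print(name, "->", field.content if field else None)
```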
Orchestration: that's the other theme of this session. The first theme is multimodality, the second theme is orchestration. From this point forward, we're going to be talking
about orchestration, because the
next generation of AI applications will
all be orchestrated, it won't just be a single model. Consider Bing, for example, I'm sure all of you
have used ChatGPT. If I press this button in Bing, it's going to do something
different than ChatGPT does, and you can see it already. Actually, Bing is nice about
this, because it tells you exactly what it's
orchestrating as it does it. It's going to go
give me a response and I'll just let
it keep doing that, but what I'm trying
to show here is this. If I
were to ask this
of just raw ChatGPT, not orchestrated to anything, the model by itself
would give a response. It would spin a tale
about orchestration and AI, but it wouldn't do this. It wouldn't do a search. Bing is orchestrated and it's
orchestrated in such a way, actually, this is a three-step orchestration. I ask this question. The first step is it goes to GPT once and it says, hey GPT, I'm getting
this question; what should I search? GPT comes back and says, you should
search
orchestration and AI. The second step
is that Bing runs this search in its
own search index. Then the third step is, now,
it's got these results from that search index from all of these
different websites, Medium and eWeek, and Databricks, and all
these other things. It takes chunks of
all these webpages, puts them all together, sends them back to GPT and says, GPT make me a response, and the response is
what we see here.
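That three-step pattern is easy to sketch in plain Python against an Azure OpenAI deployment. Everything below is illustrative: the deployment name is hypothetical, and web_search is a stand-in for whatever search index you call in step two.

```python
# Illustrative three-step orchestration: ask GPT what to search, run the search,
# then hand the retrieved chunks back to GPT to compose the answer.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)
DEPLOYMENT = "gpt-35-turbo"  # hypothetical deployment name


def web_search(query: str) -> list[str]:
    """Stand-in for the search index call (Bing, Cognitive Search, etc.)."""
    raise NotImplementedError


def answer(question: str) -> str:
    # Step 1: ask the model what to search for.
    query = client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[
            {"role": "system", "content": "Rewrite the user's question as a short search query."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Step 2: run the search and collect chunks of the results.
    chunks = web_search(query)

    # Step 3: send the chunks back and ask for a grounded response.
    sources = "\n\n".join(chunks)
    return client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[
            {"role": "system", "content": "Answer using only the provided sources."},
            {"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
        ],
    ).choices[0].message.content
```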
Now, we're going to peek behind the covers of this and how this actually works. Hopefully, by this point
in Build, you've all had a chance to have a look at the Azure AI Studio and the playgrounds
that we have in there, but if you haven't, here we are. This is the chat playground, so I'm doing a
conversational interface here, and one really important part of this is the system message. The system message,
this is where you set the tone of your
conversational AI, this is where you set
the domain restriction: I only want it to talk about certain topics. This is where you set the rules: don't make jokes
about Microsoft. This is where you set also
formatting instructions, which is why, for example, Bing just bolded
some of those things, some of those topics,
when it came up. I have done none of that here, so my system message here is the bare, raw default
system message. That means that right now, this ChatGPT model will
talk about anything. I'll say, tell me about Santorini, and in a
second it's going to spin me a tale about the
Greek island of Santorini. If you're a business or a
government, you probably don't want your
conversational AI to just talk about whatever; that's probably not
such a good idea. Let's say that I am building a conversational
AI for my healthcare plan; here, I have a different
system message. The system message
is effectively the program for
these GPT models. And so, here, my system message
says something different. It says, "You are an AI assistant that helps people find information about
their healthcare plan... Your responses will be in clear and uncomplicated
language," and so on. These are the instructions
that I'm giving it, which means if I go down
here and I say again, "Tell me about Santorini," we
will get a different result. It tells me a little bit about Santorini, but then it's like, but if you have any
questions about healthcare plans,
I'll talk about that. Unlike the last one, it didn't spin a whole big tale
about, it's like, I just want to talk
about healthcare plans.
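In code, the system message is simply the first message in the conversation. A minimal sketch against an Azure OpenAI deployment (the deployment name, endpoint, and key are placeholders):

```python
# Sketch: the system message steers tone, domain, and formatting.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

system_message = (
    "You are an AI assistant that helps people find information about their "
    "healthcare plan. Only answer questions about the plan. Your responses "
    "will be in clear and uncomplicated language."
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # hypothetical deployment name
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Tell me about Santorini"},
    ],
)
print(response.choices[0].message.content)
```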
Now, let's say that I do ask it a question
about a healthcare plan. "Does my plan cover new
glasses?" Let's say. Now, this is within the rules,
and it does, in fact, give me a pretty decent response,
but not a wonderful one. It says, "To determine
if your plan covers new glasses, we'd
need to know which plan you have," and
all this stuff. It's talking in generalities. Premera, actually, is our US healthcare
plan for Microsoft, but it doesn't know
that right now. Right now, I am talking to the raw model, and
the raw model has no idea which healthcare plan I'm talking
about from Premera. It knows I'm talking about
Premera, but Premera has thousands of healthcare plans. It's willing to talk, but
only in generalities. If I wanted to be
more specific, then I can ground it to my own data, which we will do right now. Now we have this
button up at the top here, called "Add your data," and I'm going to add
a data source here. Now, there's all kinds
of different data sources I can add, even now, and we'll
add more soon, but we have a Cognitive Search, previously set up, into
which we loaded all of our Microsoft US
health plan documents. I've set up this
Cognitive Search and now I just got to give it a little bit of
information because, what this is going to do, it's going to render
citations, just like Bing did. I need to let it know
which metadata to use to render these
citations, like so. Now, I'm going to let it
use semantic search; I would note that we are
also releasing here at Build, although
it's difficult to demonstrate, Vector search. Vector search is a different
type of search that effectively vectorizes
your query. What that means is that, you can give a question in a totally different language or using totally
different words, and the vector representations
of that is the same, and then it will allow it
to find that more reliably. That's a
behind-the-scenes thing, but that's all I need to do, so I've linked this now
to my Cognitive Search. Now, let's try that
same question again. Does my
plan cover new glasses? Now, we're getting a
different response. Yes, your plan covers
new glasses and it's telling me that
they're covered up to 100 percent with the
maximum benefit. This is grounded to my data. Now, let me take a moment to talk a little bit
about how this works, because this is a common
source of questions. First of all, there's not just one, like ChatGPT;
there are many, and this one is mine. You can make your own
instance of a GPT model, and that could be
the ChatGPT model,
it could be GPT-4; you can make your own instance
in Azure OpenAI, and the data that
goes in and out of that instance is yours. Microsoft can't see it, OpenAI can't see it. Nobody can use it to train any more models, or
anything like that. That data is yours. Furthermore, what
I just did here, I didn't do anything
to the model. This is the same
model. This is, in fact, the ChatGPT model. If I scroll down
here we'll see it, GPT Turbo, which is
the ChatGPT model. This is the same model, I didn't
do anything to it, I didn't train it, I
didn't fine-tune it. What I am doing is I am injecting this
content at run-time, when I make this query, it does a search and it injects that right into the prompt. I don't see it, but
it's happening. That's really what it's
doing, just like Bing was. That data is ephemeral. The GPT models by themselves, the Azure OpenAI Service, doesn't store this data, so if you're concerned
about HIPAA or GDPR, these various regulations
around the storage of data. This
doesn't store the data; once this conversation ends, this data is gone,
unless you as the user, unless you choose to store
it, we don't store it. This data will be gone. Now, that's one way that you can ground this to your data and that's
the super easy button for making that happen.
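Programmatically, the same "Add your data" grounding is expressed as an extra data_sources section on the chat completion request. The sketch below is an approximation: the payload keys have changed across preview API versions (earlier previews used dataSources and AzureCognitiveSearch), so check the current API reference for your version.

```python
# Approximate sketch: ground a chat completion on an Azure Cognitive Search index.
# Payload keys varied across preview API versions; endpoints, keys, and index
# names are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Does my plan cover new glasses?"}],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://<your-search>.search.windows.net",
                "index_name": "health-plan-docs",
                "authentication": {"type": "api_key", "key": "<search-key>"},
                "query_type": "semantic",
            },
        }]
    },
)
print(response.choices[0].message.content)
```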
Now, let's say that you want to be able to integrate to other data sources or even other things altogether,
like systems of action. Well, we're also
announcing here at Build the ability to use
ChatGPT plugins. This
is an API that OpenAI
introduced in March, and we are supporting the very same API now
in Azure OpenAI. I'm going to ask a question here that I know that a GPT
model by itself could not possibly answer,
because the GPT models by themselves have no access
to real-time information. It can't possibly
know if there's an Audi Q7 available for
sale in Seattle right now. But it can now, because I've
enabled the Bing search plugin. We have this Bing
plugin to ChatGPT. What the model does
is, it can actually decide when to call the plugin. I don't have to
orchestrate this, per se. I don't have to define
the orchestration; I just declare the plugin. I say I would like you to be
able to use this Bing plugin, and when it sees a
question that it knows it can't
answer by itself-- I'm anthropomorphizing
a little bit, it knows-- then it will go and
hit the plugin. That's exactly what
happened here. It went, and it went to Bing, and you can see it used
the Bing Search plugin, and it found these various
different Audi Q7s that are available from all
of these different vendors. That's how these plugins work. Now, also here at Build, we're releasing a new
means of creating your own orchestrations, because you can use the grounding
thing and as I said, it's the easy button to
ground this to your own data. You can use these plugins. In that case, the model
decides when it wants to use an external system. But in many cases, you may
want to have that control. You may want to decide how
this orchestration works. For that, we have prompt flow. Prompt flow is
actually two things. It's not just one thing. What I'm showing, this
particular prompt flow is actually the analog of the query grounded to Cognitive
Search that I did earlier. What's happening
here is, this is actually the individual steps, like what I described with Bing. We take the input, we embed the question that
you asked into a vector. This one is using
Vector search. We then go search that
in Cognitive Search, we take the output of Cognitive Search, and we put it together in such a way that we can make a
prompt out of it. We actually make the
prompt out of it, like so. Finally, the last step is
that we feed it to a model. One of the things that you
can do with prompt flow is you can build your
own orchestration, just like I did here. You can decide, and you can put, by the way, not just
search in here, you can put other AI
services altogether in here, you can put speech in here. You could put vision in here, or tran
slation, for example, or something else entirely. Something that has
nothing to do with AI. You can use this to do
your orchestration itself, but it has another and
very important capability, and that is testing
and evaluations. As I said, I am
generating a prompt. We approach these
large language models using natural language, but that means that
your prompt matters. What you actually put
in this English text here can change the output. One of the things you can
do with prompt flow is you can take each individual block, I'm taking this
prompt block here, and I'm going to show some
variants of this block. This is my variant_0,
my initial variant. This is the prompt that we have running in the
orchestration right now. But down here, I have a second one that's
written differently. This one is more
specific to healthcare. The question is, will this one work better
than my variant_0? To test that, what I can do, and I'm not going to do this
in this demo right now, but I can test that by pressing this bulk test button, I can give it a whole
bunch of sample prompts of a whole bunch of
questions that people are asking about healthcare. It will automatically try
both variant_0 and variant_1. Based on a metric that I choose, it will do an
evaluation and give me an idea of which one
is performing better. Those metrics can be
groundedness, like is the second prompt
giving me something that's better
grounded to my data? It can be relevance; is the second prompt
giving me something more relevant to what I'm asking? There's all manner of
evaluation metrics that you can use on the
backend of the bulk test, that sadly, I don't have
time to demo here today. But stay tuned, because
we'll be putting up more in-depth content on
prompt flow soon.
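To make the idea concrete, here is a plain-Python sketch of that kind of bulk test, not the prompt flow tooling itself: two prompt variants run over a batch of sample questions and scored with a toy groundedness metric (prompt flow's built-in evaluators are far more sophisticated). The deployment name and prompt templates are illustrative.

```python
# Plain-Python sketch of a bulk test over two prompt variants; not prompt flow itself.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)
DEPLOYMENT = "gpt-35-turbo"  # hypothetical deployment name

VARIANTS = {
    "variant_0": "Answer the question using the context below.\n\n"
                 "Context:\n{context}\n\nQuestion: {question}",
    "variant_1": "You are a healthcare-plan assistant. Using only the plan documents "
                 "below, answer the member's question.\n\n"
                 "Plan documents:\n{context}\n\nQuestion: {question}",
}


def run(template: str, question: str, context: str) -> str:
    prompt = template.format(context=context, question=question)
    return client.chat.completions.create(
        model=DEPLOYMENT, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content


def groundedness(answer: str, context: str) -> float:
    """Toy metric: fraction of answer words that also appear in the context."""
    context_words = set(context.lower().split())
    words = answer.lower().split()
    return sum(w in context_words for w in words) / max(len(words), 1)


def bulk_test(samples: list[dict]) -> dict:
    """samples: [{"question": ..., "context": ...}, ...]; returns average score per variant."""
    scores = {name: [] for name in VARIANTS}
    for sample in samples:
        for name, template in VARIANTS.items():
            answer = run(template, sample["question"], sample["context"])
            scores[name].append(groundedness(answer, sample["context"]))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```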
Now, it's not just the prompt though, because the other thing
that's very common here is that people want to test not just different
prompts, but different models. What if you're currently using the GPT Turbo model and you
want to try it with GPT-4? How do you know that that works? Well, the same deal applies. Basically, what I can
do here is I can click on my show
variants button here. Here's my variant_0
with text-davinci-002; that's good
old-fashioned GPT-3.5, and let's say I can try
it also with variant_1, text-davinci-003, that's
GPT-3.5.1 in this case. Once again, I press
that bulk test button. I have 1,000 different
prompts that it's going to try, and it will test these models
side-by-side to determine which one of these models
is giving
the better result. Does upgrading to the new
model give me a better result? Does it produce more latency? Does it produce
different artifacts? Does it hallucinate less? All of those kinds of things. Prompt flow is a
very important tool, both for orchestrating
your AI services, but also to test them and
evaluate them at scale. Now, once you've got this
thing up and running, the next thing you're going
to want to think about is, how do I ensure that this
is behaving responsibly? To that
end, we have our new
Azure Content Safety System. So, in our Azure
Content Safety system, we have these sliders here. I could actually choose
different levels of content that are permissible
or impermissible. Now, in this case, I have it set down the middle for all of them, for violence, self-harm,
sexual, and hate content. I have it set down the middle. I'm going to put in
something here that is not such a bad thing to say, "I need an ax to cut a tree." This is not very violent,
unless you're the tree. This is considered safe on all axes, and so I get
the green checks. That means that this will pass. Now, let's bring our bear back
from our story of earlier, "I need an ax to cut a bear," instead. Now, we're getting
a little violent. Now, it's gone up
to medium, and I get the red blocker symbol. This would be blocked,
according to my settings. Now, why would you want
to move the slider bars? Let's say that you are building a conversational AI for
a police department, for people to actually describe crimes. In that case, there
might actually be permissible violent
content in there. You want to take
that violent content, in this case, because it's
talking about crimes. You might want to
turn these sliders all the way up to accept them. If, on the other
hand, you're writing an application like
GitHub Copilot, all the way down. GitHub Copilot should be talking about none of this stuff. You don't want any of
these things in your code. In that type of a situation, and we do use this
same
system by the way. This is the system
that's behind Bing, behind our copilots,
and all those things. Different settings
for different things, but you turn them all the
way down when necessary. That's the concept there. Generally, when it comes to
using the OpenAI models, we apply this content safety
system both in and out, and that is to say, so here, this is on the way in. "I need an ax to cut a bear." Well, we're going to
block that content because it's violent coming in. Now, what if I made some innocuous prompt? For whatever reason, the large language model was
about to say something back, that would be violent. Well, we apply the content
safety system also on the way out to ensure that nothing weird comes out
of the model either. We apply content
safety in and out.
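As a sketch of that same in-and-out check from Python, with per-category severity thresholds playing the role of the sliders (endpoint and key are placeholders, and the response attribute names follow the GA SDK; earlier betas differ):

```python
# Sketch: screen text against configurable per-category severity thresholds.
from azure.core.credentials import AzureKeyCredential
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions

client = ContentSafetyClient(
    "https://<your-content-safety>.cognitiveservices.azure.com/",
    AzureKeyCredential("<key>"),
)

# Severity allowed per category -- the code equivalent of the sliders.
THRESHOLDS = {"Hate": 2, "SelfHarm": 2, "Sexual": 2, "Violence": 2}


def is_allowed(text: str) -> bool:
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    for item in result.categories_analysis:
        if item.severity is not None and item.severity > THRESHOLDS.get(item.category, 2):
            return False
    return True


print(is_allowed("I need an ax to cut a tree"))  # expected to pass
print(is_allowed("I need an ax to cut a bear"))  # expected to be blocked at these settings
```

The same check can be run on the prompt before it goes to the model and on the completion before it goes back to the user.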
Finally, I've been talking still about, except for the Speech thing, we've mostly been
looking at text here. I need to caffeinate
up for this one. We mostly have been talking about text, but let's have a look at vision. Now, this is traditional
object detection, and of course we still support traditional object detection. This is a picture of my
team in the Bay Area. I live in the Bay
Area, in California. We went on a hike one day. I ran this traditional
object detection on this picture in
which we got person, person, person, person,
person, person, person and it's right, right, right, right and right. All of these are indeed people,
and for some use cases, that's okay, that's
just fine, right? Sometimes you just need to know, is there a person in this
scene, or is there not? But if you want to
do something more advanced, some more
advanced workflow or processing or
something like that, what you need, and this is the new piece,
is Dense Captioning. Dense Captioning is powered
by our new Florence model. Florence is a foundation model, much like the GPT models, but Florence is a different
type of foundation model that understands just about
everything that's visible in the world around us. When
I run this
exact same picture through Florence's
Dense Captioning, we get something very different. We get "A group of people
posing for a photo," "A man in a black shirt," "A woman holding a drink." What you see here is that it's picking up not just objects, but more descriptive notions of those objects and
what they're doing, their actions, like
posing for a photo.
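A minimal sketch of requesting dense captions through the Image Analysis client library for Python; the endpoint, key, and file name are placeholders, and the attribute names follow that SDK:

```python
# Sketch: dense captions for an image via the Image Analysis (Florence-based) API.
from azure.core.credentials import AzureKeyCredential
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures

client = ImageAnalysisClient(
    "https://<your-computer-vision>.cognitiveservices.azure.com/",
    AzureKeyCredential("<key>"),
)

with open("team_hike.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.DENSE_CAPTIONS],
    )

if result.dense_captions is not None:
    for caption in result.dense_captions.list:
        print(f'"{caption.text}" (confidence {caption.confidence:.2f})')
```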
When I use this particular photo, there's a little fun
artifact here, and it's this, "A woman holding a
bottle of vitamin." What i
s going on with that? This is Iman, on my team, and
I'm right behind her there. Now, let's zoom in on
this a little bit. She's holding a bottle of Vitamin Water, but she has her hand over the water, and so it looks
like a bottle of vitamin. In fact, the model is correct, this is a bottle of vitamin and that serves to
show that the model can read. But now let's say that
you're a retailer and it really matters to you. Like in this case,
the model by itself doesn't really know that this is specifically Vitamin Water, it just knows that
this is a bottle of something and that it
says "vitamin" on it. If you're a retailer
or a grocer, you might need to know
whether this thing, I mean, whether this
is Vitamin Water, whether it's Coca-Cola, whether it's Pepsi
or whatever, and so, these models are fine-tunable. You can give it just a few different images of Vitamin Water in
different configurations, maybe somebody with their
hand over the water, and a couple of other
images of Coke and Pepsi,
and pretty
quickly, it will learn that this is not
just a bottle of vitamin, this is Vitamin Water. If this is something that matters to you in your workflow, you can fine-tune this to
adapt to that workflow. Now, it's not just images, we can also run this on video. Here's a video of
some folks working in a warehouse, and I can
summarize this video. Why would I summarize
this video? One reason is for captioning
for people who are visually impaired, and so I can basically summarize what's
happeni
ng in the video. But another, and maybe
more common reason is for search metadata. Let's say that I would like
to be able to search through my video archive for instances of people
carrying a ladder, climbing a ladder, and
doing those kinds of things. Well then, I can use this summarization here, and feed that right into my search, whether that's Cognitive
Search or something else, and that would allow me to snap to this
very video here. Now, with this model, I can actually search in the video itself and I'm going to make a
custom search query. I'm going to say,
"person falling" over here, and I know that somewhere
in this video there is, this guy is going down. It snaps me right to the point that this person has fallen. Now, when I do this, it often looks like I'm
doing a super fancy, like, ooh, that's a cute demo, that's really nice
and all that stuff. Just to prove that this
is not just a demo, I got this other
video over here. I took it right outside over there, with Dayana, who's
sitting right over there, and Ikenna, I
don't know if he's in here right now, in which they were throwing some stuff at each other, and I'm going to
do the same thing here. I'm going to say,
"person throwing a box," and here we have Dayana
throwing the box, right here. But what was really crazy
is that Ikenna had, for some reason, in his pocket, a banana, and so if we just scroll this back
and there he is, throwing the banana,
but watch this. He pulls it out of his pocket
and throws it at Dayana. That was right there,
like an hour ago. My point is, this stuff is real. This stuff is here today
and you can use it today. When you think about
how this might be orchestrated, to finish this up, how this might be
orchestrated to applications that you might build
in the real world. I mean, it's not just for
people throwing bananas. If we take a look at the
next generation of Bing, if those of you who were
paying attention, you might have
noticed, Bing made an announcement a
couple of weeks ago
that they're going multimodal. The next generation of Bing, I gave a picture
here of an Audi Q7. I say, "Where can I buy
one of these in Seattle?" I don't say it's a Q7. The next generation of Bing is orchestrated in such
a way that it does just this type of image analysis up front, and like I said, Bing is nice to us,
it tells us what it's doing as it does it, it analyzed the image
and it's figured out that it's an Audi Q7. Now it's searching for Audi Q7 dealers in
Seattle, and in a moment, as
you might expect,
there we go, it's going to give me all of
these results about where I can buy an Audi Q7 in Seattle. These are the types of orchestrations that you can
build for your employees, for your customers,
for your businesses, and for your governments. This stuff is here now. For the folks in the back,
if you would please switch me back to slide land. We're just going to
wrap this up and then we're going to
get to some questions. We have thousands of
customers who are now using Azure
AI and the Azure
OpenAI Service. Thermo Fisher
Scientific, for example, is using Copy.ai, and
Copy.ai, in turn, is using the Azure
OpenAI Service to generate content about Thermo Fisher
Scientific instruments, and materials for their
manuals, and stuff like that; eBay is using the
Azure OpenAI Service to generate this thing
called a magical listing. As you probably know, eBay is like this online auction site and you can list things for sale on there and now, the magical listing uses the
Azure OpenAI
Service to generate a complete listing for you in eBay, rather than you
having to type it all out. These kinds of applications, you're going to
see them more and more in our Copilots and third-party tools,
they're going to be everywhere you look and many of them are indeed based on
Azure OpenAI Service. One customer that's
using content safety, by the way, is Koo. Koo is like Twitter, but it's big in India, it's
starting to get big in Brazil as well, and Koo is using our
content safety system, the very one that you
just saw, to ensure that the content coming in and out of Koo is safe, non-violent, not including self-harm or abusive, and that kind of
stuff, and that works very well for that
company and it can work well for you also. As I mentioned, I
couldn't really demo it here because there's not
a great way to show it, Vector search is coming to Cognitive Search, and this is
going to change the game. This is a key piece of how grounded large language
models are going to
work. Across the board, what
we covered today, we looked at some stuff
from Azure Speech, we looked at summarization
in language. We looked at both
the current and the new capabilities of
Azure Form Recognizer. We grounded Azure
OpenAI on my data, in this case, with the
Azure Data button. We added a plugin
to Azure OpenAI. We looked at orchestration, evaluation, and testing
with prompt flow, content safety, and Vector
Search and Cognitive Search. These are some of
the new features, not even all
of them,
that we've added to Azure AI just in the
last few months. Stay tuned, because
there's more coming. We'll see you again at Ignite, and I'm looking forward to that. What comes next on
learn.Microsoft.com, you can start your
certification journey and we have all
kinds of content there. One thing, in particular,
that I would recommend, by the way, we just published
on learn.Microsoft.com, our new prompt
engineering guide, using all the best practices
we've learned from Bing and from the Copilot and stuff like that, how to write effective prompts. That's something you're
going to want to search on learn.Microsoft.com.
You can join the AI tech community to
stay connected, both to us and to others of you
that are working with AI. I strongly encourage you to explore all of the
capabilities across Azure AI. There is much more than
I was able to show here today, and the more you mess
around, the more you find out, so check
this stuff out, get into Speech,
get into language, get into Azure OpenAI, and Vision. Try these things for yourself. They are amazing. With that, I think we
are going to questions. Here, again, is that QR code. Now, on the big screen, we've got about
six-and-a-half minutes. So, we have a question from Gustavo, "Will all these features work in Spanish or other languages?" In general, the answer is yes. The vast majority
of these features work in a great
deal of languages. In fact, what I didn't
show here today is, there is a version of my voice that speaks Chinese. I've always wanted to speak Chinese, but I can't do it, but my voice can on there, but yeah. And another person,
John asks, "What languages are currently
supported in Form Recognizer?" There are 250 languages
supported by Form Recognizer. I forget the exact list, but there is a vast number of languages that Form
Recognizer supports. The full list is available in the Form Recognizer
documentation, and it's a long list. "How does Vector search differ from existing
Cognitive Search? How is
it leveraging innovations in Vision?" Oh,
there's a good question. Existing Cognitive Search, so Cognitive Search has
two modes right now. The first mode is, I'll call it keyword search, where it's literally looking for a word and it can do
things like stemming, e.g., if I searched the
word "running," it'll search for "running,"
and "run," and "ran," variants of that word, but it's very keyword-based. The second mode is
semantic search, where it will search for
things that are like that, but not exactly that. Vectors are a different
way of doing things. When you make a vector, you are actually mapping concepts into this
multidimensional space. One super easy example is
discussion and argument. If I take the two
words, "discussion" and "argument," in one sense
they're the same thing. An argument is a type
of a discussion. In another sense, and in another part of the
vector space, they're pointing in different
directions because an argument is angry and a discussion is not. So, vectorization is the
mapping of concepts to this multidimensional
vector space and you end up with thousands
of numbers, actually, when you've given a sentence,
you get a bunch of numbers. What that means, though, if I say the same sentence
in Spanish and in English, it maps to the same spot
in the vector space. The words don't match at all, but the concepts do, and so
I'm able to find that spot.
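A small sketch of that idea with an Azure OpenAI embeddings deployment (the deployment name is hypothetical): the same question in English and Spanish should land much closer together in the vector space than an unrelated sentence.

```python
# Sketch: cross-lingual similarity with embeddings and cosine distance.
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<aoai-key>",
    api_version="2024-02-01",
)


def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # hypothetical deployment name
        input=[text],
    )
    return np.array(response.data[0].embedding)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


english = embed("Does my plan cover new glasses?")
spanish = embed("¿Mi plan cubre lentes nuevos?")
unrelated = embed("The weather in Seattle is cloudy today.")

print(cosine(english, spanish))    # same concept, different languages: high similarity
print(cosine(english, unrelated))  # different concept: lower similarity
```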
The person also asked about Vision, well, we can also vectorize
images in the same way. So, I can take
a picture of a hat and a
description of a hat, and that maps to the same
point in vector space, and that means that I can
do a search, like I did here with an image, and it's
still able to find it. "When Bing summarizes
resulting search documents, is that summarization AI-based?" Also, Bing is using
GPT-4, actually. When it summarizes the
information that it finds, it's not using this summarization API, the one that I demoed, which is ideal for document and conversation summarization; Bing is using the full power
of GPT-4 in that case. Guido asks, "Can we use
our OpenAI plugins also with Azure AI, instead of using Blob storage or
Cognitive Search?" At the moment, the
"Azure Data" button doesn't support plugins,
although of course, yes, you can add your own plugins in there and it will use it. Whatever you have a plugin to, whatever data source
or system of action, you can actually
add that in here and use it with Azure OpenAI. The Azure Data capability, we will continue to add more methodologies of
adding data sources there. Stay tuned for that. But already, it is immensely
powerful, as it is today, and we have
a whole bunch of customers that are using it. One of which, by the
way, I've got to say this, Dynamics Copilot for
customer service is actually using that
exact same thing. In fact, like when it goes and creates an email, for those
of you who have seen it, you can log a case in Dynamics and you can press this
button that says, "Make an email with a
resolution," in which it goes and searches your
knowledge base and writes up the email.
That is exactly Azure OpenAI on your data, in fact, so some of
our Copilots are using the same underlying technology. For Form Recognizer, how do I put a human-in-the-loop when there's low
confidence from the AI? Form Recognizer does have a
human-in-the-loop mode to it, so that you can find in the documentation
for Form Recognizer, but you could do it
with Form Recognizer itself, or you could do it as
part of your own workflow. When you integrate
Form Recognizer into whatever application
you're integrating it to, you probably actually
want to have the human check and say, is this the thing that
you expected it to say? And so, they can actually
correct that if needed. That's just a best practice
whenever you are doing that type of automation,
whenever it's feasible. "Is it possible to
ground Florence with your own data for domain-specific attribution and labeling," is another question. And the answer is yes, you can. You can fine-tune Florence
with your own data. So, if you have your
own labeled data set, as with the Vitamin
Water example, if you've got a
bunch of pictures of Vitamin Water, that says
this is Vitamin Water, not a bottle of
vitamins, then it will actually be
able to learn how to vectorize the Vitamin
Water and how to work with it, in the context of all of
these other orchestrations. Ashley asks, we've
got one minute left, so if you're going to ask the
question, now's the time. "Can all these features
be used on-premise? Can some of them be used on-premise, or do they
have to be in the Cloud?" Well, many of these
features actually can be used in containers, and that
can be used on-premise. So, we have two modes
of containers, disconnected containers,
and connected containers. Connected containers
have the advantage that they phone home and
will get the latest model. So, as we add new languages,
new capabilities, better features, the
connected containers will pick those up. Disconnected containers
don't do that, but they can be
run in things like an air gap environment. Some of these things, like
Azure OpenAI, can only be run in a Cloud because they require a supercomputer to run, which is what Azure is. So, most of you
probably don't have a supercomputer at
home, but I do. All right, I'm going to take one last question. I
got just a few seconds. "How do the latest
announcements fit into Microsoft's Responsible
AI mission?" That is core to all of this. For
Speech, I had to
give my consent for it to mimic my voice. For content safety, that's used across the
board in OpenAI; Responsible AI is
absolutely core to our mission and it's part of everything we do in Azure AI. And with that, I'd like to thank you all, and have a wonderful
rest of Build.