
The session opened with Kritika, the marketing lead at OpenAI, greeting attendees of the first breakout talk of the day. During the keynote, two significant products were announced: GPTs and the Assistants API, both aimed at changing how developers build software with artificial intelligence. One goal is to let developers build unique, powerful GPTs that can be tailored to individual needs and used by millions of people around the world. The speakers also demonstrated how easily the new API features can be integrated into applications and products.

The first speaker, Thomas, lead engineer on the GPTs project, talked about the adoption of ChatGPT and its capabilities, emphasizing that it is often users and developers who put the technology to work in unexpected ways. To explain the new offering, Thomas and Nick walked through several demonstrations showing how to create custom GPTs tailored to individual preferences, a step toward realizing more of the technology's potential.

Personalizing AI solutions is a central theme. Thomas pointed out that each GPT consists of three components: instructions, actions, and additional knowledge. Together with Nick, the product management lead for ChatGPT, he explored different ways of using GPTs, including building custom GPTs through an intuitive, conversational interface, for example a GPT that answers questions in a specific style by giving it a pirate personality.

The second half of the session covered the Assistants API, presented by Olivier and Michelle. The new API stores instructions on the assistant, simplifies session management with threads and messages, and improves how applications interact with users. It also exposes built-in tools such as Code Interpreter, which is useful for data processing and analysis, Retrieval for adding outside knowledge, and improved function calling.

Video statistics indicate substantial interest in the session, with 353,820 views and 7,804 likes at the time of writing, reflecting the growing popularity of OpenAI's tools. Users and developers are already beginning to build on the features and ideas presented at the conference.

Timeline summary

  • 00:00 Introduction by Kritika, the marketing lead at OpenAI.
  • 00:27 Kritika expresses excitement about the conference.
  • 00:35 Transitioning into OpenAI's focus on a more agent-like future.
  • 00:42 Discussion on GPTs and ChatGPT, highlighting their capabilities.
  • 01:07 Introduction of Thomas and Nick to discuss the GPTs.
  • 01:24 Thomas presents on the success of ChatGPT after its launch.
  • 01:48 Emphasis on user-driven innovations and customizations in GPT usage.
  • 02:01 Explanation of GPTs as instructions, actions, and knowledge.
  • 02:17 Announcement of live demos illustrating GPT functionalities.
  • 02:56 Introduction of new user interfaces for creating GPTs.
  • 03:08 Demonstration of interactive GPT building process.
  • 04:12 Demo of a pirate-themed GPT and how it processes instructions.
  • 06:07 Discussion on modifying GPT personality through instructions.
  • 06:28 Introduction of custom actions to enhance GPT capabilities.
  • 07:07 Excitement over user-generated content within ChatGPT.
  • 08:56 Presentation on the new Assistants API and its features.
  • 20:45 Overview of the challenges faced by developers in AI product development.
  • 21:20 Introduction of the new Assistants API designed for building AI applications.
  • 23:22 Explanation of the Code Interpreter tool for executing code safely.
  • 23:35 Discussion of Retrieval tool to augment knowledge based on user input.
  • 29:06 Demonstration of how Assistants use tools for autonomous actions.
  • 39:16 Announcement of function calling improvements that enhance user interactions.
  • 42:47 Recap of all new features introduced in the session.
  • 44:12 Hints at upcoming features and encourages community engagement.
  • 44:26 Closing remarks and gratitude expressed by the speakers.

Transcription

Good morning, everyone, and welcome to the first breakout talk of the day. My name is Kritika, and I lead marketing here at OpenAI. I'm so excited to see you all here today. As Sam mentioned in the keynote, we're really moving towards more of this agent-like future, and there are two products that we announced at the keynote that we'd like to go hands-on with. First, we'd like to talk about GPTs and ChatGPT. I know as developers, you're excited to get to the Assistants API, but there's a lot of power and capabilities built into GPTs, and when you extend them with custom capabilities and actions, they can become really powerful, not just for yourself, but also for millions of users around the world. And second, we'll get into the Assistants API, which lets you build these agent-like experiences within your own apps and products. So without further ado, let me bring up Thomas and Nick to show you more about GPTs. Hey, everyone. Hello, I'm Thomas, the lead engineer on the GPTs project. Hey, I'm Nick. I lead product management for ChatGPT. We shipped ChatGPT less than a year ago as what we thought would be a low-key demo, and the response has been incredible over the last year. And as we've shipped capabilities, whether it's GPT-4 or speech or vision or Code Interpreter, one thing has been very, very clear, which is that you, our users, our builders, our developers, know how to get the most out of this technology. So today, we're really excited to show you GPTs as a way to create your own custom ChatGPT and share it with the world. GPTs are three things. They're instructions, they're actions, and they're extra knowledge. And we're going to show you three demos, one for each of those concepts, so you get a clear sense of what they're all about. Of course, at the end, we do one more thing. We're going to make one crazy demo that's going to try to combine everything into one. These are live demos. We all know the law of the demo. So there's always a 10% chance that it doesn't work, but I promise it will be pushing the limits of all the things that we've released recently. So it should be pretty exciting. With that, Nick, do you want to kick it off? Let's do it. All right. So let's share. Oh, here we go. All right. What you see right here is the new ChatGPT, and the best thing about it is that it looks almost like the old ChatGPT, so not too much has changed. The model picker's gone. This is great. But there's another new thing, which is this Explore tab, and I'm going to click into that to show you what it's like to create a new GPT. So here I see a list of the GPTs I've already created, but I'm going to click in to create a GPT, and what you see here is our new creation UI. Now, the best thing about this UI is that you can get started conversationally. So this tab on the left lets you have a chat with the GPT builder and iteratively create your GPT. The second tab is the Configuration tab, and it allows you to inspect your GPT and modify all of its internals, whether it's the instructions, the knowledge, the custom actions, or the tools it has access to, and then on the right, you can play with your GPT and see how it would respond to a real user. Now, let's kick it off with the first demo. Thomas, do you want to show us what instructions are all about? Absolutely, I do. OK. All right. Let's do it. There's another name for instructions. Well, it's just a message, but we can also call instructions a way of giving the GPT a personality.
So I'll share a little bit of a dated reference confession. The way I started programming was I made Half-Life mods back in the 90s. And all right, thank you. One of the mods I made when I was in middle school was about pirates. And so we're going to keep a very dated theme for a little bit, but I promise we'll get up to 2023 in a little bit about pirates. So let's make a pirate GPT. And I do actually believe that most great products start as toys. So this is a little bit of a toy demo, but I think that's a great place to start. So you can think of this GPT builder a little bit as like this blank slate. And so I'm going to tell it, you are a live demo in front of the world's best AI developers. Yeah. So I want you to talk like a pirate, a real salty pirate, like Sayarg. All right. So obviously, that was not a canned prompt. I've always varied it every time I've rehearsed this, so we'll see what we get back. But it's able to understand in natural language the same way chat GPT does immediately the assignment. Captain Coder is pretty good. Yeah. Captain Coder. It's fine. It's fine. All right, so we'll take Captain Coder. Do I like the name? Do I have another name in mind? I'll do it one more. OK. Let's switch it up. Make it salty. Perfect. So I can refine the GPT as I'm creating it. It will understand modifications that I need to make. Right now, it's going to give it a little bit of personality. And so it's going to do that. Or I should say identity by a profile picture. Profile picture imperative. Immediately, I'm recognizing this. Salty. OK. A salty pirate skilled in AI. I'll take it. So if I go behind the scenes here, I think Sam went into a little bit of this. But this is the Configure tab for GPT, so you can see what's actually going on behind the scenes. Magic Create is populating these fields. It's got a salty pirate skilled in AI. And then we have these instructions section. So just for that little dialogue, the instructions are actually longer than the dialogue that I put in there. It's giving me something pretty good. And another word for instructions is really the system prompt. And so it's a big part of the system prompt. You're able to customize that. Pick some conversation starters. And then as Nick hinted, we have knowledge, capabilities, and custom actions as well. But I'll spare the demo right now. Nick's going to get into that. Over here is the Testing tab. So we can try it out. Maybe I can say one of the starter ones, like, what be the secrets of the machine learning? Very exciting stuff here. Again, I promised a data demo, so here we go. Ahoy, matey, data, the treasure trove. Yeah, yeah, OK, got it. Features, the map, overfitting the kraken, underfitting the shallows. Pretty good, actually. And that's my talk. OK, so that's the Testing mode. Of course, you can publish this, which I think the most exciting part is really introducing this concept of developer, but also user-generated content inside of Chat GPT. So if I go up here to Save, you can see that I can save. If you're not in a workspace, you can actually share this publicly. But I can just share it to people at OpenAI right now. I'm going to go ahead and click Confirm. All right, we're back here. We see Salty on the side. Now, I'm going to try to boost this demo and bring it back up to 2023. So I'm just going to swap over to the mobile app now. Come up on stage. Perfect, perfect. 
OK, so you'll also notice, I don't think you've seen this yet, but the iOS app and the Android app today will be getting a bit of a facelift and a little bit of a design cleanup, which I think is really great. And you're able to interact with GPTs. So what we really want is these GPTs to be available in basically anywhere you can think of. They're really the answer to, who am I talking with? With whom do I have the pleasure of speaking? A little bit of foreshadowing there. So I swipe over to the right. You'll see that there's these GPTs. Sure enough, Salty just came in. If I go over here, I can actually, so, who am I talking with? Of course, this is the 2023 part. So let me hit the Audio tab here. Hi, you're in front of the OpenAI Developer Day audience. Please be short, but say a quick intro. Ahoy there, Developer Day crew. You can call me Salty, the craftiest GPT to ever sail the digital seas. I'm here to spin ye yarns of AI and to navigate through the vast oceans of tech. Let's set sail on this grand adventure together. OK. So yeah, we have a secret pirate voice in there. We'll see if we can ship it. Appreciate some feedback on that. Nick, what do you think about that demo? I don't know. It's pretty funny, but I can't say it's useful, Thomas. Let's make something useful. Shame. OK. All right. So we talked a little bit about instructions. Instructions are great, because you can give your GPT any personality you want, any instructions that you want. But we've got some more exciting additions to the GPT anatomy that really let you build useful things. Who's built a plugin before? Thank you, first of all. We learned a lot from you. It was our first time connecting chat GPT to the external world. And we've iterated on plugins into a new concept called Actions. And Actions are very similar to plugins in the sense that you can connect your GPT to the real world. So I made a GPT called Tasky McTaskface. Very serious. Very serious demo. And Tasky is great, because Tasky helps me keep track of my to-dos. And I just hit Edit GPT. And what I could do now is continue conversationally. So whenever you want to keep working on your GPT, you can just continue the chat. But instead, for time's sake, I'm going to go into the Configuration tab. And as you can see, Tasky's prompt, its instruction set, is a little bit more elaborate. So I actually took some time to set this up. And the other new piece here is that I added an action. And you saw this earlier in the keynote. We're using Retool here to wrap the Asana API. You can configure pretty much anything. And if I click Edit here, there's actually a new UI that just lets you import any OpenAPI spec and just paste it in here. So you no longer need to host it. We also made OAuth a lot better. We have end user confirmation for Actions. But I'll show you all that in a minute. So really, Nick is in the wrong line of work. And so what he's not talking about, there's also authentication behind the scenes. So the OAuth that you know and love for end users can actually be connected to these apps as well. And there's a lot of exciting stuff there. Sorry, not to interrupt your thunder. What Thomas said. So yeah, so super easy. If you're already in the plugin, migration takes a few minutes. And let me show you this in action. So I'll just use the preview UI for time's sake. Let's see what's on top of my to-do. OK. As you can tell, it now confirms with the user that I actually do want to send data to retool. I do. All right, I do need to finish this demo. 
I guess it's being completed as we speak. There's one thing I did want to do, which is give all of you event attendees access to GPT creation today. We're super excited what you build. So let me remind myself, remind me to give the cool peeps at Dev Day access to GPT creation. So just as it was able to read my to-dos, it's going to log a to-do by filing an actual Asana task. Pretty useful. Boom, the task actually does exist. Do tomorrow. OK, that's not right. I am going to do it today. So actions, they let you connect your GPT to the external world. But there's one other new concept in the GPT anatomy that we want to show you, and that's knowledge. So I prepared another very simple GPT, Danny Dev Day. And Danny Dev Day knows everything about Dev Day. And obviously, there could not possibly be any information about Dev Day in our pre-training set, even with the new knowledge cutoff of April 2023. So what I did is I actually gave Danny access to Sam's keynote script. So if I inspect Danny, same story, configuration tab as before. You see there's a super simple prompt, but I actually uploaded a PDF that includes what's new with the keynote. And I can now ask it simple things like summarize the keynote. And because it's a small file, and because context windows just keep growing, we're actually stuffing it into context here. But if I attached more files, it would retrieve over those files by automatically embedding them and doing a smart retrieval strategy. And you're going to hear more about that in the API in a minute. So we've seen all the stuff that's new, but I think the most powerful thing isn't just summarizing the data, but actually talking to your GPT as if it knows that information. That's really our aspiration here. We're going to iterate on that over time. So why don't we do something more creative, like can you do a rap battle between the old Completions API and the brand new Assistants API? Thomas has promised me that he's going to do one of the verses on stage. We'll see about that. We'll feel the energy. All right. Built-in retrieval and state that persists. And the evolution, a twist you can't resist. Pretty good. All right. We're not going to do the rap, but you get the idea. It knows the information. It's all in there. So again, we showed you three different demos for each of the core pieces inside of GPT. You saw instructions, you saw actions, and now you saw knowledge. But got to put it all together. Yes, we do. All right, let's do it. Switch back over here. All right. So I truly, truly went all out and tried to think of the craziest things we could possibly do that are most likely to fail. All right, so first of all, I made this little thing called Mood Tunes, a mixstrip maestro. Nick, I'm going to need you for this portion. Instead of just prompting it, I'm going to try to use one of the new things that we've launched recently that I hope everyone's familiar with and is excited about. So I'm going to keep the theme of maybe now dated 2000s reference. Truly, truly great stuff. OK. Oh, Nick. OK, so let me drag and drop this down here. And I'm going to say, mood. So behind the scenes, we're having the Vision's engines kind of kicking in here trying to understand exactly what's going on. Perfect. All right. Looking at this picture, I'm getting a laid-back camaraderie with a touch of whimsy. Let's roll with that energy and create a mixtape. I think that's pretty good. It's pretty good. Chill vibes and high fives. Actually, I'm in. Laid-back camaraderie. Three little birds. 
Actually, I don't recognize half these things. But OK, great. Now, I happen to remember one piece of information. So before this demo, I uploaded a bunch of knowledge into this GPT. And it was a list of fun facts about all of my colleagues. So I happen to remember one of my colleagues is in a band. Can you slot in a song by them into the first position? Perfect. So what it's done here is it's understood that my colleague is in a band and immediately found their name. And since it didn't have the song information, it's going to go ahead and browse with Bing. So it's saying Sally Mango's new band. I hope everyone's heard of Sally Mango. Very exciting band. But unfortunately, they haven't made it too big yet. And so they weren't in GPT's knowledge set. It had to browse to go get that information. But it did actually successfully do that. So Sally Mango's album, Pressure Slide. Indie jazz pop band known for the music that addresses serious themes. Chill vibes and high fives. Starting with a Buena Vista. Features a rolling bass line and a hopeful message about living in the present despite future uncertainties. I love it. OK, great. So I'm going to say love it. Go ahead. It suggested that it generates album art. So it's taken vision. We've done browsing. Now this is going to generate a cover art for this mixtape using Dolly. Like I said, I really do love to push the limits in these things. And it's pretty out there. But I mean, I've been in AI for probably a decade, maybe more than that. It's shocking to me that this works at all. The level of sophistication here is insane. And so there's an album art for this. Chill vibes and high fives. OK, so. Check it out. It's pretty cool. That's neat. OK, so Nick, I believe for this portion you have to stand on this X. This is a complete fake out there. Don't look at it. And so I'm getting a good vibe in here. But I don't think the mood is quite right. So let's set the mood. So I've set this up, this action in the back end. Any guesses on what that does? Look over here. Fingers crossed here. OK. All right, so we set the mood. Some questionable lighting choices, but we set the mood. That's gone through retool. And it's connected to the Hue API, which is able to light this up. I am feeling the vibe. Are you feeling the vibe? I'm feeling the vibe. So it's offering to play the first track on Spotify. Let me pull up my Spotify here, just so you can know. Got Rick Astley. So go ahead, please. Again, very, very local band. Gotten that information. And let's take a look. So this was obviously the best thing we could come up last night. Pretty contrived, but hopefully it gives you an idea of the things you can build. We are so excited to see what you build. So you're all going to get access to your songs, and you're going to have access to your songs. So let's take a look. Look at the icon. And you can also use it to sign. So I'm going to go ahead and remove the JSON. Hello. Hopefully you can all see me? OK. OK, you're on. We're going to see what I'm building. Thank you. Hi, everyone. I'm Olivier, I lead product on the OpenAI platform. Hi, I'm Michelle, and I'm an engineering lead on the API. Today, I'm super excited to demo the new Assistants API. But first, Olivier is going to tell us a little more about it. Let's do it. All right. It shouldn't be a surprise to anyone that there has been an explosion of AI Assistants in the past year. ChatGPT, of course, unexpectedly took the world by storm a year ago. 
But the developer community has been building some amazing AI assistants as well. Some of my personal favorites are Spotify's AI DJ, which personalizes my music listening experience, and the Ask Instacart feature, which helps me put together healthy meals for my two-year-old toddler. When done right, those products are amazing for the product experience. They are fun, they are useful, and they truly personalize the user experience. But they are also extremely hard to build and get right. We've had hundreds of conversations with developers in the past few months, and the same pain points keep coming up over and over again. Developers have to manage a limited context window. They have to manage prompts. They have to extend the capabilities of the model with their APIs and functions. They have to extend the knowledge of the model with retrieval. They have to compute and store embeddings to implement semantic search. The list goes on. And so that got us thinking, what are the right products? What are the right APIs? What are the right tools to help you build such cool AI products? What are the right abstractions beyond our existing models and APIs? And so today, we are thrilled to introduce a new API, the Assistants API. The Assistants API enables you to build world-class assistants directly within your own applications. At its core, an assistant is an AI that you assign instructions to. And the assistant can call models and tools on behalf of users. Behind the scenes, the Assistants API is built with the same capabilities that you just saw for ChatGPT, such as Code Interpreter and Retrieval. The API has three key primitives. Number one is the assistant. The assistant holds the instructions that you give to the model. And that's also where you're going to specify what models and what tools the assistant can access to perform its job. For instance, I could create an assistant whose task is to answer personal finance questions and can access Code Interpreter. So the first primitive is the assistant. The second primitive is threads. Threads represent a session between your users and the assistant. You can think of a thread as a conversation. A conversation has a group of participants, and it can track the message history of the conversation. And very similar to Slack or Microsoft Teams, if you want to discuss a new topic or if you want to add new attendees, new guests to a thread, it's probably best to create a new thread. So the second primitive, threads. The last primitive is messages. Messages are simply posts between the users and the assistant. On top of these API primitives, we are super excited to release a few tools. The first tool is Code Interpreter. Most of you are already familiar with Code Interpreter in ChatGPT. Code Interpreter allows the model to write and run code in a sandboxed, safe environment. It can perform math. It can run code. It can even process files and generate charts on your behalf. It's pretty magical when you think of it. Code Interpreter can write and run code on your behalf. So with the Assistants API, your applications will now be able to call Code Interpreter directly and get its outputs directly in the API. Pretty cool. The second tool is Retrieval. Knowledge Retrieval augments the assistant with knowledge from outside the model. The developer can upload their knowledge to the assistant, for instance, uploading some product information to the assistant.
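To make these primitives concrete, here is a minimal sketch using the openai Python package; the talk itself walks through the equivalent raw cURL requests a little later. The assistant name and instructions are just the demo examples, and the parameter names reflect the first beta of the API.

```python
# Minimal sketch (not from the talk): an assistant bundles a model, instructions,
# and the tools it is allowed to use. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Personal Finance Genius",           # illustrative name from the demo
    instructions="You help users with their personal finance questions.",
    model="gpt-4-1106-preview",               # the GPT-4 Turbo model announced at Dev Day
    tools=[{"type": "code_interpreter"}],     # hosted tool, enabled with one line
)
print(assistant.id)                           # store this ID for later runs
```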
And end users can as well upload, for instance, their own files, such as me uploading my personal master thesis to the Assistant. And the Assistant is going to intelligently retrieve the model, depending on the user queries. It's a completely pre-built tool. There is no need for you to compute embeddings, to store them, to figure out semantic search. The Assistant's API is going to automatically figure it out on your behalf. The last category of tools are tools that you host and execute yourself. We call those tools function calling. You define custom function to the model, and the model is going to select the most relevant function based on the user query and provide you with the arguments to call the function. As a whole, starting today, function calling is getting smarter. In particular, it's much more likely to select the right arguments, depending on the user query. And on top of that, there are two new features that we're going to go into much more details later on. But first on, let's do a cool demo of the Assistant's API. Over to you, Michelle. Thanks, Olivier. Now let's get started building our very own Assistant. So I'm building a geography tutor app to teach my users about world capitals. I can get started in just a few API calls. And since you all are developers, I'm going to start out in my terminal with some cURL requests. You can see my terminal here. Awesome. So we're going to start with the very first cURL request. So now we want to create this Assistant to power my geography tutor app. You can see we're posting to the Assistant's endpoint, and we're passing two key pieces of information. The first is the model. We're using the newest GPT-4 Turbo model, the latest heat that just dropped today. Next, we're passing the instructions to the model. You can see I'm asking the Assistant to be helpful and concise, letting know that it's an expert. When I send that request, I get an Assistant back, and I can store that Assistant ID for use later. Before the Assistant's API, if I wanted to use these instructions repeatedly, I would have to send them on every single API request to OpenAI. Now, with the new stateful API, I can create my Assistant once, and the instructions are stored there forever. Now, let's move on to the next primitive that Olivier mentioned, which is threads. Threads are kind of like one session between your user and your application. So now I've got a user on my website, and they're starting to type out a geography question, so let's create a thread. You can see here we're posting to the thread's endpoint, and there's nothing in the body because the thread is empty for now. We've got a thread ID, so just like before, let's save it for later. Cool. Now my user has finished typing, and I want to add their message to the thread, so let's do that. You can see here I'm posting to the thread's thread ID message's endpoint, and I'm passing in the data about the message. So the role is from the user because the user's typed out this message, and the content is their question. So they're curious, and they want to know what the capital of France is. Cool. So now you can see we've got a response back, we've got a message ID, and the message has been added to the thread. Now you're probably wondering, how can I get my Assistant to start acting on this thread? So we've introduced a new primitive called a run, which is how we've packaged up one invocation of your Assistant. So let's kick off a run now and get my Assistant going. 
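The same thread-and-message setup, sketched with the openai Python package rather than cURL; the question is the one from the demo, and the shapes follow the first beta of the API.

```python
# Continuing the sketch: one thread per user session, with the user's question
# appended as a message (mirrors the two cURL POSTs described above).
from openai import OpenAI

client = OpenAI()

thread = client.beta.threads.create()           # empty thread for a new session

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the capital of France?",   # the demo user's question
)
print(thread.id, message.id)
```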
You can see here I'm posting to the thread's thread ID run's endpoint, and I'm passing in the Assistant ID. This is pretty cool. You can actually have multiple different Assistants work on the same thread. But for now, let's use my geography Assistant. I'm going to kick off the thread and tell you a little bit about what's happening in the background. You can see I've got a run ID, and the run is queued. So a run is how we've packaged up all of the work of loading all the messages in the thread, truncating them to fit the model's context window, calling the model, and then saving any resulting messages back to the thread. So let's see what happened. We can fetch the messages on the thread to see how the Assistant has replied. So I'm going to issue a get request to the thread's thread ID messages endpoint to see what there is. You can see we have the first message from our user, and then the Assistant has replied saying the capital of France is Paris. So this is a super simple example, but we can talk about why this is better than the previous API. With the new Assistant's API, I don't have to store any messages in my own database. OpenAI handles truncating messages to fit the context window for me. The model output is generated even if I'm disconnected from the API. And finally, I can always get the messages that the model output later. They're always saved to the thread. I think that's pretty cool. Great. So these are the basics of the API, and now let's move to what I'm most excited about, which is how it can power your application with tools. One of the most useful parts of Assistants is their ability to leverage tools to perform actions autonomously. Code Interpreter is a tool that is hosted and executed by OpenAI. When I enable Code Interpreter in my Assistant, I can expand its capabilities to include accurate math, processing files, data analysis, and even generating images. To optimize the quality and performance of these hosted tools, we fine-tune our models to best determine when to call these tools, what inputs are most effective, and how best to integrate the outputs into the Assistant's response. So now, let's get started by using Code Interpreter. So I'm actually building a personal finance app that I want to ship to my users to let them analyze their transactions. And we've already been in the terminal, so let's move over to the OpenAI Playground. Here, you can see the Playground that you know and love. It's super useful for testing chat completions. But we've actually refreshed it, and you can see there's a dropdown in the top left to pick the Assistants tab. Here in the Playground, you can see you can create Assistants, change their instructions, and you can start threads. Super useful for testing. You can actually see the Assistant I just created in the API. All of that information is loaded here as well. So now, let's create a new Assistant to show off the power of Code Interpreter. So I'm creating a personal finance Assistant. So let's give it a name. I'm going to call it the Personal Finance Genius. And I'm going to tell it, you help users with their personal finance questions. Now, I'm going to select a model, and I'm going to select the newest GPT-4 Turbo. There it is. And I'm going to flip on Code Interpreter. With just one click or one line of code, you can enable Code Interpreter for your Assistant. So let's save that. Great. So now a user has come to my application, and they want to analyze a spreadsheet of their transactions. 
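As a rough sketch of the run lifecycle just described (continuing the snippets above, so `assistant` and `thread` come from them): create the run, poll until it leaves the queued and in-progress states, then read the assistant's reply back off the thread.

```python
# Sketch of the run lifecycle: create, poll, then read the reply from the thread.
import time
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create(
    thread_id=thread.id,           # thread from the previous snippet
    assistant_id=assistant.id,     # assistant from the earlier snippet
)

while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Replies are stored on the thread, so no message bookkeeping is needed app-side.
messages = client.beta.threads.messages.list(thread_id=thread.id)
for m in messages.data:
    # assumes text content; tool-generated images arrive as separate content parts
    print(m.role + ":", m.content[0].text.value)
```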
Let's take a look at what it looks like. You can see there's a bunch of info in here, a bunch of numbers. The dates aren't even sorted. It's pretty messy. Let's upload that to the thread. And now, my user is asking for a chart. So generate a chart showing which day of the week I spend the most money. Awesome. So now, I've done a compound action to create a thread, add a message, and kick off the run. You can see things are happening in the background. Let's talk about what's going on. When we've kicked off the run, we will get all of the messages on the thread, summarize them for the model to fit the context window, determine if it's called any tools, execute the tools, and then give that information back to the model. So let's take a look. Oh, here we are. We've got the output, actually. It's actually pretty surprising. I did not think I'd be spending the most money on Sunday. But let's look at how we got here. You can see that our personal finance genius has started by telling us what's going on. It's actually written some code. We can click into the code here. And then it kept writing some more messages and finally generated a chart, which we were able to see. So we can actually look a little more deeply to see how this happened. We can look at what we've called steps. Steps are basically the logs for your run. And they're super useful, so you can render your own rich UIs to show your users what's happening. You can see the Logs tab here. I can open it, scroll down, and find the steps for this run. This is reverse cron, so I'm scrolling to the bottom. And I can show you how we got here. So first, you can see we've created a step. And it has type message. So that corresponds with this first message we've created. Next, you can see there's a run step that is a tool call. And it's for Code Interpreter. So this is how the Playground was able to render this snippet. The inputs and the outputs are directly in the API. Next, you can see another message creation step corresponding to this message. And then there's a few more Code Interpreter and message snippets. So all of this information is sufficient for you to render the same UI. The Playground is actually built entirely off of our public APIs, so you can do the same. And speaking of, this is exactly what Chat GPT does under the hood when they're using Code Interpreter. So you can skin your application to look however you like. Great. Now let's move on to Retrieval. The Retrieval tool is incredibly useful when you want to expand the knowledge of your Assistant. Maybe you have some information about your app or business that you want to imbue your Assistant with. Rather than having to implement custom retrieval system on your end, OpenAI has built a tool that you can use with one line of code or one click in the Playground. The Retrieval tool does all of the document parsing, chunking, generates embeddings, determines when to use them, so you don't have to. Let's get started. So I'm going to hide the logs, clear this thread, and create a new Assistant. So now I'm actually building an Assistant to help my users use the OpenAI API better. So I'm going to create the OpenAI API wizard. That looks pretty good. Now I'm going to tell it that it's a helpful Assistant and then use the attached docs to answer questions about the OpenAI API. So I'm actually going to upload. I have a full dump of the OpenAI docs, just a markdown file, didn't process at all. I'm going to attach it to the Assistant, and I'm going to flip retrieval on just one click. 
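The Code Interpreter flow shown here in the Playground could look roughly like this in code. The CSV filename is hypothetical, and the message-level file_ids field reflects the first beta of the Assistants API; the run steps listed at the end are the same logs the Playground renders.

```python
# Sketch of the Code Interpreter flow from this demo (assumed file name):
# upload a CSV, give it to a code_interpreter-enabled assistant, run, and
# inspect the steps behind the run.
from openai import OpenAI

client = OpenAI()

# Files intended for assistants are uploaded with purpose="assistants".
file = client.files.create(
    file=open("transactions.csv", "rb"),     # hypothetical spreadsheet from the demo
    purpose="assistants",
)

assistant = client.beta.assistants.create(
    name="Personal Finance Genius",
    instructions="You help users with their personal finance questions.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)

# Attach the file to the user message so it is scoped to this thread.
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": "Generate a chart showing which day of the week I spend the most money.",
        "file_ids": [file.id],
    }]
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# ... poll as in the previous snippet, then inspect the steps (tool calls and
# message creations) that back the run.
steps = client.beta.threads.runs.steps.list(thread_id=thread.id, run_id=run.id)
for step in steps.data:
    print(step.type)    # "tool_calls" or "message_creation"
```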
Finally, let's pick the new GPT-4 Turbo model and save our Assistant. So while this is being saved, we're actually doing the work on the back end to make this data accessible to the Assistant. So let's see what it looks like when my user asks a question about the OpenAI API. So my user is curious, and they're wondering, how do I use the embeddings API? And so now, we've kicked off a run. Again, what's happening in the background here is we're fetching all of the messages, truncating them however needed, calling the model, determining if the model has called the retrieval tool, executing retrieval for you, and then giving that back to the model to summarize. You can see here, we've actually grabbed a snippet from the docs and we even have a citation giving a direct quote from our docs so we can render that for the user. I think that's pretty cool. Just like last time, we can take a look at the steps and see how we got here. This one's a little simpler. We've got first the tool call to the retrieval tool. So you can let your user know that's happening while it's going. And then we have a step for message creation so you could render this UI. Awesome. So now, back to Olivier to explain how you can use the retrieval tool with different levels of file scope. Thank you, Michelle. Knowledge retrieval is extremely useful to augment the knowledge of your assistant with data from outside of the model. Concretely, retrieval can work in two different ways. Number one, you can upload and pass files at the assistant level. That's useful if you want your assistant to leverage that knowledge in every single interaction, in every single thread. So for instance, in Michelle's example with the customer support and the API docs, you should likely pass that information at the assistant level. The second option is to pass files at the thread level. That's useful if you only want that specific thread to be aware of that content. So if I'm uploading my personal bank statements, I should likely do it at the thread level. Behind the scenes, retrieval takes care of embeddings. Sometimes it does not even do vector search and instead stuffs the content into the context. So you don't have to essentially handle that logic on your end. And we're excited to launch several new features over the coming months. For instance, we want to allow you to pick between different retrieval strategies to find the right balance between cost, accuracy, and latency. One final reminder, OpenAI never trains on any files or data that you pass to the API. And so that's the same for retrieval files. Let's move on to the last category of tools, function calling. So again, function calling is about custom functions that you define for the model. And the model selects those functions on your behalf. We are excited to release two new features for function calling starting today. The first one is JSON mode. With JSON mode, the model will always return valid JSON. Behind the scenes, we made model improvements and improvements to our internal inference stack in order to constrain the sampling of the model and to make sure that the output complies with JSON syntax every time. So that's pretty useful. That means that you can trust that the model generates JSON that you can directly parse on your end. That's pretty cool. And by the way, JSON mode will also work outside of function calling.
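A sketch of the two retrieval scopes just described, under the first-beta API shape: a file attached to the assistant is available in every thread, while a file attached to a message is scoped to that one conversation. The file names here are hypothetical.

```python
# Sketch of the two retrieval scopes (first-beta Assistants API shape).
from openai import OpenAI

client = OpenAI()

docs = client.files.create(file=open("openai_docs.md", "rb"), purpose="assistants")
statement = client.files.create(file=open("bank_statement.pdf", "rb"), purpose="assistants")

# Assistant-level knowledge: available in every thread this assistant handles.
wizard = client.beta.assistants.create(
    name="OpenAI API Wizard",
    instructions="You are a helpful assistant. Use the attached docs to answer "
                 "questions about the OpenAI API.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[docs.id],
)

# Thread-level knowledge: attached to a single message, so only this
# conversation can retrieve from it.
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": "How much did I spend on groceries last month?",
        "file_ids": [statement.id],
    }]
)
```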
So if you use, for instance, the Chat Completions API and you have a very simple application that just takes in data, extracts some fields, converts it to JSON, JSON mode will work as well. The second improvement to function calling is parallel function calling. With parallel function calling, the model can call multiple functions at a time. We've seen in many applications users giving multiple instructions at once to an assistant. So let's say, for instance, I'm building a car voice assistant. And I tell the assistant, raise the windows and turn on the radio. Before parallel function calling, that meant that you had to do two different model calls to OpenAI, which, of course, resulted in extra latency and cost. With parallel function calling, such a use case can be handled with one single call, which is pretty awesome. All right, let's do a demo now of function calling. Awesome. Back to the demos. So I'm building off Olivier's example, and I'm building a new type of car. And I want a voice-activated assistant to be able to use some of the features of the car. So I have a bunch of functions that I've implemented. So let's pull up an assistant I previously created. Here it is. So I previously created this car assistant, and I've told it it's a helpful in-car assistant and to please call the appropriate function based on the user's request. So with function calling, I have an assistant with all of my functions. And my assistant will tell me when to call my functions based on the user's input and what the most appropriate arguments are. This is super helpful, so I can figure out how to use my system to answer the user request. So let's take a look at some of the functions I have here. So I have a lot of the stuff you might expect to see. So honk horn, start phone call, send text message. So let's try it out. There's a user in my car, and they just said, ah, that guy just cut me off. Let's see which function the assistant determines makes the most sense to call. There it is. Our assistant's decided to honk the horn to let that guy know it wasn't cool. You can see here that we've called the honk horn function with no arguments, and so now I know how to pass it to my system. We also have a way for you to give the function output back to the assistant so it can keep going. So here, I'm going to enter the output. It was successful. Oops, I have a typo there, but that's all right. We're going to keep going. And the assistant says, I've honked the horn for you. Please stay calm. So actually, I'm realizing that there's a function here that's missing that I want to add. So let's show you the add function flow. You can see there's a button here where I can add a function. We have some helpful presets to get you started. So let's use the get stock preset and change it a little bit to be my function. So I actually want a function to be able to change the volume in the car. So I'm going to call it Set Audio Volume. And I'm going to give the description to the assistant. So set the car's audio volume. Now, we need to tell the assistant how to call this function. So we can tell it about the parameters and whether or not they're required. So the main parameter for my function is the volume. And actually, the volume isn't a string. It's a number. And so this is the volume to set, and it's 0 to 100. Finally, we'll tell the assistant that this is required when you're calling this function. Great, this function looks good, so let's save it. And now I can save my assistant.
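A sketch of what this in-car demo might look like against the API: the set_audio_volume schema mirrors the one built in the Playground, while the other car-control function is hypothetical. When a run needs your functions, it pauses in a requires_action state, possibly with several parallel tool calls, and you hand the results back with submit_tool_outputs.

```python
# Sketch (hypothetical car functions): defining function tools on an assistant
# and answering parallel tool calls when a run requires action.
import json
from openai import OpenAI

client = OpenAI()

car_assistant = client.beta.assistants.create(
    name="Car Assistant",
    instructions="You are a helpful in-car assistant. Call the appropriate "
                 "function based on the user's request.",
    model="gpt-4-1106-preview",
    tools=[
        {"type": "function", "function": {
            "name": "start_music",                        # hypothetical function
            "description": "Start playing music matching a query.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }},
        {"type": "function", "function": {
            "name": "set_audio_volume",                   # schema from the demo
            "description": "Set the car's audio volume.",
            "parameters": {
                "type": "object",
                "properties": {"volume": {"type": "number",
                                          "description": "Volume to set, 0 to 100."}},
                "required": ["volume"],
            },
        }},
    ],
)

thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "Play Wonderwall and crank it up."}]
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=car_assistant.id)

# ... poll as before; with parallel function calling the run can request
# several functions at once.
if run.status == "requires_action":
    calls = run.required_action.submit_tool_outputs.tool_calls
    outputs = []
    for call in calls:
        args = json.loads(call.function.arguments)
        print(call.function.name, args)        # e.g. start_music / set_audio_volume
        outputs.append({"tool_call_id": call.id, "output": "success"})
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id, run_id=run.id, tool_outputs=outputs
    )
```

For the JSON mode mentioned above, the Chat Completions API accepts a response_format of {"type": "json_object"} on the new models, which constrains the sampled output to valid JSON; that parameter name is from the same v1 API surface assumed in these sketches.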
You can see here the new Set Audio Volume function is listed on the left. Let's start a new thread. And now my user is asking a different query. They're saying, play Wonderwall and crank it up. In the background, my assistant is determining which of these functions makes the most sense to call. You can see here we've got the output, and there's actually two functions. This is showing you the power of parallel function calling. The first function starts the music with the query Wonderwall, which makes sense, and the second sets the audio volume to 100. Pretty cranked up. Now I can execute these in parallel and then get back to the assistant. Awesome, we're super excited about parallel function calling, and let's recap everything we launched today. We're super excited to see what you build with the new Assistants API. We've added three new stateful primitives: the Assistant for storing instructions, models, and tools; Threads for keeping track of all your conversations; and Messages, which are between the user and the assistant. We've also added the Runs primitive for every time you want to invoke an assistant, and we've added Steps, which are useful logs for rendering your UI based on what the assistant is doing. Finally, we've launched two new tools, Code Interpreter and Retrieval, and we've made two huge improvements to function calling: JSON mode and parallel function calling. We're super excited to see what you build, and now over to Olivier to tell you about what's next. Thank you, Michelle. So that was a lot, but there is a lot more coming soon. We plan to ship many new improvements to the Assistants API in the coming months. Number one, we want to make the API multimodal by default. It should be able to accept and generate images and audio files. Number two, we want to allow you to be able to bring your own code execution, so you can execute on your end the code that was generated by the assistant. A third big feature that we want to ship soon is asynchronous support with WebSockets and webhooks to make it easier for real-time applications to use the API. It's a new feature. It's a new product. It's in beta. We would love to hear what you will build. If you have any feature requests, any wish lists, please tweet at us at OpenAI and show us what you build. We would love to hear all about it. Thank you so much, and enjoy Dev Day. Thank you.