
Tim Curry, a developer advocate at Algolia, was the last speaker of the day, sharing ideas about searching within your favorite TV shows. A fan of 'Friends', he admitted he doesn't always recall which season his favorite scenes are in, even though the show is popular enough that countless clips are available on YouTube. As a counter-example, he presented 'Bref', a French show made of very short episodes that is also available on YouTube but far less widely clipped. Inspired by the unexpected announcement of a second season of 'Bref', Tim decided to build a tool called 'Bref Search' to make it easier to find specific moments within the show.

The development of 'Bref Search' relied on Algolia, a hosted search service. Tim demonstrated how quickly users can search for different phrases and click on relevant results. Because each indexed record stores a timestamp, clicking a result starts playing the episode at the exact moment the chosen dialogue is delivered.
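A minimal sketch of that deep-linking idea, assuming each record stores a YouTube video ID and a timestamp in seconds (the helper name is hypothetical, not from the talk):

```javascript
// Hypothetical helper: build the YouTube deep link that 'Bref Search'
// effectively opens when a result is clicked. YouTube's `t` URL
// parameter accepts whole seconds, e.g. `t=65s`.
function episodeDeepLink(videoId, timeInSeconds) {
  return `https://www.youtube.com/watch?v=${videoId}&t=${Math.floor(timeInSeconds)}s`;
}
```

With a record whose subtitle starts 65.4 seconds in, this would produce a link ending in `t=65s`.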

Building 'Bref Search' involved several tools to acquire the data and recordings and process them into a user-friendly form. Tim used an open-source command-line downloader to fetch the videos from YouTube, FFmpeg to extract their audio, and a speech-to-text service to transcribe the audio into subtitles, greatly assisting fans of 'Bref'. His aim was a platform that, through Algolia, responds instantly to queries about specific scenes.

A crucial part of the project was the front end, such as optimizing the website and its loading speed to ensure users enjoy browsing the site. Tim made sure the interface provided answers as quickly as possible, using CDN services to achieve this. He also implemented techniques for fast thumbnail loading, enhancing the overall impression of speed on 'Bref Search'.

After the presentation, viewers expressed their appreciation, reflected in the numbers: at the time of writing, Tim's video had reached 555 views and 16 likes. The response shows how much the audience valued Tim's work, and many would gladly use 'Bref Search' to find their favorite episodes. The tool is a nice example of how modern technologies can enhance our everyday consumption of video.

Timeline summary

  • 00:00 Introduction by Tim Curry, last speaker of the day discussing searching favorite TV shows.
  • 00:49 Tim relates to the audience by sharing enjoyment of popular TV shows, using 'Friends' as a familiar example.
  • 01:54 Tim introduces 'Bref', a French TV show with very short episodes, discusses its unique format and popularity.
  • 02:35 Announcement of a new season of Bref after a long hiatus prompts Tim to create a search tool for the show.
  • 03:27 Tim showcases the 'Bref Search' tool, a chronological list of episodes that allows searching for specific moments.
  • 04:14 The search tool features instant video previews upon hovering over search results.
  • 04:50 Tim explains the data structure used in 'Bref Search' powered by Algolia.
  • 06:06 Details on indexing subtitles and video data to enhance searchability for the show.
  • 07:30 Tim outlines the entire workflow from downloading YouTube videos to generating the website.
  • 10:15 Showcases the subtitle extraction process using third-party tools for accurate results.
  • 11:30 Describes the use of FFmpeg for processing video thumbnails and previews.
  • 14:40 Optimizations on website load speed, using Cloudinary for efficient asset management.
  • 17:22 Tim highlights the distinction between video search and traditional text search in the context of user experience.
  • 18:19 Wrap-up and invitation for questions, expressing excitement about the upcoming Disney Plus release of the show.

Transcription

So, welcome everyone. I will be the last speaker of the day. I'm going to tell you about "which episode was that?", or how you can search into your favorite TV shows. First, my name is Tim Curry. I'm a developer advocate with Chuck at Algolia. You can find me on Bluesky. I tend to use Algolia both professionally, because this is my job, but also personally: I often push data to an Algolia index and build a UI on top of it to try to solve many of my problems. If it can be solved through search, I will put it into Algolia and see if it solves my issue. But when I'm not doing stuff with Algolia, I'm doing something I assume many of you can relate to: I enjoy a good TV show. I'm going to use Friends as an example because it's a pretty universally known TV show and I really like Friends. There are actually many specific moments of Friends that live in my head. And sometimes I just want to rewatch a specific scene out of the entirety of the ten seasons. But there are so many iconic moments inside of Friends. Even if I want to watch a specific one, I don't remember in which season it was, or which specific episode. And even if I knew, I would then have to skip to the exact moment inside of the episode to watch that specific scene or that specific moment. And even if I do own all the DVDs, and it's easy to find Friends on streaming platforms, most of the time I just end up on YouTube: I search for the scene I want to watch, I watch it, I smile, I move on to something else, and that's fine. But not every TV show is as popular as Friends. You can find almost any specific scene of Friends on YouTube. But I have another TV show I really, really like. It's called Bref. You probably don't know it because it's a very French thing. 'Bref' actually means 'short' in French. And that's the whole gimmick of the show: really, really short episodes of about two minutes each.
It's a really fast-paced kind of show: the main character speaks really, really fast, and there are a lot of jokes per minute. And there was only one season, so it was really short-lived as well. It was released in 2012, and it was a phenomenon in France. Everybody in France knows what Bref is about. And today, all the episodes are available on YouTube. So if I ever want to rewatch something, I can go on YouTube and search a few episodes manually, because it's not as popular as Friends and there's not a specific cut of every specific scene. But the episodes are short enough that I could manage. Then something pretty unexpected happened at the end of February: season two of Bref was announced, 13 years after season one. That show that everybody thought was done, was finished, actually had a sequel. It's a really, really good one. And it was a big surprise. A big surprise for me. And that actually triggered me into wanting to build something for season one. Everything is available on YouTube. Maybe I could build something with Algolia to help me easily find one specific moment of the whole season one with just a few keywords. So that's what I set out to do. And I'm going to show you how it works and then how it's built. So I built something called Bref Search. This is the list. Sorry, getting my mouse back. This is the list of all the episodes in chronological order. And if you start searching for something (it's all in French, but I'm going to search for French words you might know, like "bonjour"), you find the moments when the characters are saying those specific things. I can search for "bon appétit". And if I put my mouse over a result, a looping two-second video of that specific moment starts playing. I'm searching for "au revoir", meaning goodbye.
And what is pretty good is that in addition to showing me what's happening in the episode at that exact moment, if I click on the result, it's going to play the video at the exact moment when that dialogue is being said. So let's try that out. So that's working. I built something that allows me to find anything in the 82 episodes just by typing a few keywords of what I remember being said in a dialogue. And this is all thanks to Algolia. So let's see how it actually works. How does it work? Here we are in the Algolia dashboard. I'm going to show you the structure of a record. All my records follow the same pattern. They all have two main elements, two main objects: a video element that contains everything related to the actual episode, the video on YouTube, like the YouTube ID, the title, the index (in that case it's episode 57 out of the 82), the duration, and the number of views. And then I have a second part that's everything related to the actual dialogue, the actual subtitle at that moment. What is the content? What is being said at that exact moment? And the time is the number of seconds from the beginning of the episode. That's what I'm using when I click on a search result to start the video exactly at the right moment. So that means every single episode has several records. They are really small episodes, as I mentioned, about two minutes, and that's about 50 subtitles per episode. So that's a grand total of about 4,000 records. That definitely fits in the Algolia free plan. For the record, no pun intended, the Algolia free plan can hold up to 1 million records. So what I'm using for Bref Search is just a drop in the ocean. I actually did the calculation: if I wanted to index the whole 10 seasons of Friends, it would still fit in the Algolia free plan. But we did not always have that large a free plan at Algolia. 12 years ago we had another logo, but we also had a free plan with only 10,000 records.
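A sketch of one such record, based on the fields Tim lists here; the exact attribute names and the placeholder values are assumptions:

```javascript
// One Algolia record per subtitle line: a `video` object describing the
// episode and a `subtitle` object for the line being said at that moment.
// Field names and values are illustrative, not taken from the real index.
const record = {
  video: {
    id: 'dQw4w9WgXcQ', // YouTube video ID (placeholder)
    title: 'Bref. Episode 57',
    index: 57,         // episode 57 out of 82
    duration: 120,     // seconds
    views: 1000000,
  },
  subtitle: {
    content: 'Bonjour !',
    time: 65,          // seconds from the start of the episode
  },
};

// Rough record count: ~50 subtitles per episode across 82 episodes,
// i.e. the "about 4,000 records" mentioned in the talk.
const totalRecords = 50 * 82;
```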
But that would still have fit in the old plan of 2012. And actually I already wanted to build that thing in 2012, when the DVD was released; I wanted to build a search into everything. I've wanted to build this thing for 13 years, but I couldn't, because I tried to find a way to extract the subtitles from the DVD and couldn't find one. So I gave up. But fast forward to today: we are in 2025, we live in an age of AI, and now speech-to-text just works. So I thought maybe I could use the speech-to-text capabilities of 2025 to generate the subtitles I was missing 13 years ago. I'm going to show you the full pipeline of how I managed to move from a YouTube playlist to a full website: first a bird's-eye view of the whole pipeline, and then the details of every specific step. First, I download all the videos locally on my laptop as mp4 files. From those files I extract the audio as mp3. From the audio I use an AI to get the subtitle files as VTT. And once I have the subtitles, on one hand I generate the records as I showed you, and on the other hand I generate all the media I need: the static thumbnails and the animated previews. And once I have all of that, I can build the full website. So the first step is: how do I get all the videos from YouTube? I'm using an open-source command-line tool called YouTube Downloader, which is awesome and just works as well. You just pass the ID of the playlist as an input and it downloads every single video locally on your laptop as mp4 files. That's one of the two main command-line tools I'm going to use, and I'm going to reuse it later on when I need some popularity metrics like the number of views. The second tool I'm going to use is FFmpeg, to transform a video file into an mp3 file. So I'm just going to extract the audio stream.
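The two steps described here can be sketched as command lines. The transcript calls the downloader "YouTube Downloader"; the widely used CLI for this job is yt-dlp, so that binary name is an assumption. The helpers just build the argument vectors:

```javascript
// Assumed downloader: yt-dlp. Given a playlist ID, it downloads every
// video of the playlist locally (mp4 by default for most sources).
function downloadPlaylistCmd(playlistId) {
  return ['yt-dlp', `https://www.youtube.com/playlist?list=${playlistId}`];
}

// FFmpeg extracts only the audio stream: -vn drops the video stream,
// and the .mp3 extension selects the mp3 encoder.
function extractAudioCmd(inputMp4, outputMp3) {
  return ['ffmpeg', '-i', inputMp4, '-vn', outputMp3];
}
```

In a Node pipeline these arrays could be handed to `child_process.spawn` per episode.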
I'm again going to reuse FFmpeg later on in the pipeline to get the static thumbnails and the animated previews. But those two tools already allow me to get the audio from all the videos in the playlist. Then I move on to the extraction of subtitles. I knew it was possible today, in 2025; I just didn't know which tool to use. If I remember correctly, I went on Google and searched for "third-party API convert mp3 to subtitle", and I found HappyScribe as one of the very first results. So I gave it a try. I opened a free account, uploaded one single mp3 file, and in less than two minutes it got me a subtitle file as a result, and the result was pretty good. So I opened a paid account, 30 bucks or so, and uploaded the 81 other episodes. And that thing that had me stuck for more than 10 years, I now managed to solve in less than 10 minutes and 30 bucks. This is what HappyScribe gives me. On the left-hand side you have a UI with the timestamp of everything it managed to extract from the audio. And on the right-hand side you can see what a VTT file looks like. It's really a text file, a really simple format, where you have the timestamp where a specific line should start being displayed and the timestamp where it should stop being displayed, followed by the actual content of the subtitle. And that I can easily parse to create records for Algolia. Now that I have this crucial piece of information, the link between the sentence being said and the timestamp at which it is being said, I can extract the thumbnails. Say I know a line happens at second 65. I'm asking FFmpeg to extract, from the input video, one frame at second 65 and save it as a PNG file. And I'm asking FFmpeg again to do the same thing for all the animated previews.
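A minimal parser for cues in that VTT shape, keeping just the start time in whole seconds and the subtitle text. This is a sketch: real HappyScribe files may contain headers, cue identifiers, and multi-line cues that it ignores.

```javascript
// Parse WebVTT cues of the form "HH:MM:SS.mmm --> HH:MM:SS.mmm"
// followed by a line of text, into { time, content } pairs matching
// the record shape pushed to Algolia.
function parseVtt(vtt) {
  const cues = [];
  const lines = vtt.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^(\d+):(\d+):(\d+)\.\d+ --> /);
    if (m) {
      const time = +m[1] * 3600 + +m[2] * 60 + +m[3]; // whole seconds
      const content = (lines[i + 1] || '').trim();
      cues.push({ time, content });
    }
  }
  return cues;
}

// The matching FFmpeg call to grab one PNG frame at a cue's timestamp:
//   ffmpeg -ss 65 -i episode.mp4 -frames:v 1 thumbnail.png
```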
This time the FFmpeg command line is slightly longer because I have to specify what kind of compression I need and so on, but I'm omitting that because it wouldn't fit on the slide; it's pretty much the same logic, just asking for a two-second video starting at the exact same second. And I'm doing that for my 4,000 records, my 4,000 lines of text. Okay, I have all my information. I push that into an Algolia index, I push all my assets to a server, and now I still need to configure the Algolia index a bit. The first thing I do is add the popularity ranking. When you go to the website, initially every episode is ranked chronologically, in the order they were released, but the moment you start typing I switch to a popularity ranking. I'm using the number of views as a popularity metric, to rank first the episodes that I assume are the most popular because they are the most viewed. And once again I'm using YouTube Downloader, with its dump-JSON argument. It creates a really, really large JSON file for a specific video, with a lot of metadata about the actual video, including the number of views, number of comments, and number of likes. I'm only using the number of views in my popularity ranking, but I could have used the others for tie-breaking. I also configured the index to use the distinct feature of Algolia, because what I wanted to show in the search results was one result per episode. I didn't want several results that all link to the same episode, so I used distinct and set the attribute for distinct to the video ID, meaning that if several of my results share the same video ID, I only return one, the most relevant one. And how do I pick which one is the most relevant inside of a specific episode? Well, I used another piece of data I could get from YouTube Downloader: information about the most replayed segments of a specific video.
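The two index settings described here (popularity ranking and distinct) map onto real Algolia setting names; the attribute paths below follow the record sketch earlier and are assumptions about the actual index:

```javascript
// Plausible Algolia index settings for what is described in the talk.
const indexSettings = {
  // Use YouTube view counts as the popularity tie-breaker when searching.
  customRanking: ['desc(video.views)'],
  // Collapse results to a single, most relevant subtitle per episode.
  distinct: true,
  attributeForDistinct: 'video.id',
};
// With the official JavaScript client this would be applied via
// index.setSettings(indexSettings).
```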
So in that huge JSON file I told you about, there is a heatmap array that contains 100 lines. Each line represents a segment of the video, with a start time, an end time, and a value which is kind of the heat. It's a value that goes from 0 to 1; a high value means that segment is part of the most replayed parts, a low value means it's barely replayed. So I'm using that, inside of a specific episode, to see which match is the most relevant. I'm considering that if a match is part of the most replayed parts of the video, it might be the thing you are looking for. Finally, I still needed to do some front-end optimization. The website is built with Next.js and hosted on Vercel, but because Bref is known for being really, really fast and Algolia is also known for being really, really fast, I wanted to build a website that felt fast as well. I wanted to do both those things justice. One thing I did was make sure all my assets were fetched through Cloudinary. If you don't know Cloudinary, go sign up for it right away. It's an amazing image CDN. Basically it acts like a CDN for any kind of asset, but it can also convert and compress your images and your videos on the fly. So the best version of each thumbnail and of each video is what actually gets downloaded by the end user, and it is compressed without me having to think about the compression. Then I'm using a low-quality image placeholder, or LQIP, whenever a search result is returned by Algolia. That means that inside of my record I'm actually saving a base64-encoded version of the final thumbnail, but a really tiny version of it, like 16 by 9 pixels, and I'm displaying that, stretched to the final dimensions of my image, while the real image is being downloaded. That means that the moment Algolia returns a search result, I can already display an image placeholder.
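The "most replayed" selection described at the start of this passage can be sketched as a small function. The segment field names (`start_time`, `end_time`, `value`) are an assumption based on the transcript's description of the downloader's JSON dump:

```javascript
// Pick the most relevant subtitle cue inside one episode: the cue whose
// timestamp falls in the "hottest" (most replayed) heatmap segment wins.
// `cues` are { time, content }; `heatmap` segments carry a 0..1 value.
function mostReplayedCue(cues, heatmap) {
  const heatAt = (time) => {
    const seg = heatmap.find((s) => time >= s.start_time && time < s.end_time);
    return seg ? seg.value : 0;
  };
  return cues.reduce((best, cue) =>
    heatAt(cue.time) > heatAt(best.time) ? cue : best);
}
```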
Even if it's very blurry, you can still see the main shapes, and it's only displayed for a few seconds while the final image is being downloaded; once the image is downloaded, it replaces the placeholder. I'm also adding some lazy loading, so I'm not loading anything that's below the fold. All of that is to give the impression that results are loaded instantly. And finally, I needed to add some optimization on the videos as well. I really wanted the animated previews to start playing the moment you put your mouse on top of a result, but even if I'm using Cloudinary, even if I'm compressing everything, I still need to download the file before I can play it. So what we did is add an area around the hit where the video starts playing, and a second, larger, invisible buffer around it: when your cursor enters that zone, we start downloading the file, and we only start playing it once you are on the real zone. That way, if everything goes right, the file starts downloading while your cursor is moving toward the actual hit and only starts playing at that moment; and because the file is really small, that's usually long enough for the browser to download it and play it instantly. So, all of those things: now that you know how it works, what I realized is that when people see the website for the first time, they feel like it's amazing, that you can search into videos using Algolia.
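The two-zone hover trick reduces to a bit of hit-testing geometry. This is a sketch: the real site presumably uses DOM events and elements, but the decision logic (preload in the invisible buffer, play on the hit itself) is the same:

```javascript
// Rectangles are { x, y, w, h }. The invisible buffer is the hit
// rectangle grown by `margin` pixels on every side.
function inside(pt, r) {
  return pt.x >= r.x && pt.x <= r.x + r.w && pt.y >= r.y && pt.y <= r.y + r.h;
}

function hoverState(cursor, hit, margin) {
  const buffer = {
    x: hit.x - margin, y: hit.y - margin,
    w: hit.w + 2 * margin, h: hit.h + 2 * margin,
  };
  if (inside(cursor, hit)) return 'play';       // start the looping preview
  if (inside(cursor, buffer)) return 'preload'; // start downloading the clip
  return 'idle';
}
```

Wiring this to `mouseenter` on the buffer and hit elements gives the browser a head start on the download before playback is requested.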
I thought Algolia was only text search, but it really feels like I can search into a video and play the video. Once you actually know how it's built, though, it's just text search underneath. I just saved the right data in my index and was a little bit creative in the way I display things in the end, but it's just text search underneath. And that's what I learned by building this thing: video search, in that context, is just text search with fancy graphics. But I still think this is really important: if you are building a search that searches into a specific kind of element, like video, you have to make your search results look like those elements. Even if you're just searching into text, you have to make it look like you're searching into video, because it has to look like video and not a text listing. So, thank you very much. I'm open to any questions you might have. And for my American friends: the series I'm talking about, the TV show I'm talking about, is going to be available on Disney Plus for you in April, so that way you will actually know what I'm talking about. Thank you very much. Thank you, Tim. I'm going to have to practice my speed reading so that I can keep up with the subtitles on the American release of it. Even for native French speakers it's sometimes pretty fast, so I hope the subtitles will be good for you. We will have to see how it goes. I love your comment at the end. It makes me think of something I read somewhere (I can't credit it) that television is basically just radio plays with pictures, and I think you kind of tapped into that as well; that's really kind of the dirty little secret of television.
We do have some questions here. The first one was actually about the Build plan. Ryan was saying: "I thought I recalled the Algolia docs saying that the Build plan isn't intended for production use. Of course I can't find where I saw that now. Is that still the recommendation?" And I would say, from my side, Ryan, it is a recommendation. Basically, the Build plan wasn't designed for production use cases. The limits are hard: when you hit them, you have no mechanism to extend or to purchase more if you have a spiky day or something like that. Also, there really isn't a lot of support baked into the plan. The intention is that the Build plan opens up a lot of functionality, lets you really explore, kick the tires, validate that it's good for your use case, and then, when you're ready for production, you would move to a plan like Grow. That said, I know our own Raymond Camden, who has been in the chat quite a bit today, runs his blog on a Build plan, and it's sufficient for him. I would add that, in my opinion, what we mean by "in production" is: if you're not making any money out of it, if it's a side project, I would say that's fine, as long as you keep the "Powered by Algolia" somewhere so people can see how you built the thing. That's why we have this "Algolia 3000" on top, which is a private joke inside the TV show. But if you're not making any money out of it, we're not going to charge you for it. Yeah, we just love the shout-out on your site; that really helps us out. Julien would like to know: is there documentation on how to do this? "Because we need to get that to our devs" is what Julien says. So, the only documentation so far is this presentation; everything is in a private repo so far. But I plan on writing a blog post giving more details on the actual command-line tools I'm using and so on, so that should act as documentation. Maybe in the future we will turn it into a boilerplate so you can do that for any
playlist, as long as you have the subtitles. It was a test, something fun I did, and I want to expand it to other things. But yeah, for now there will be a blog post, and maybe a boilerplate; so far you're the only people seeing it, and this is what you have as of today. You're seeing it first, folks. Very exciting. Mad scientist Gary wants to know: why don't we take the VTT files, convert the language to another language, inject that back in as the soundtrack, plus some AI to lip-sync it, and then, copyright schmopyright? In theory that might work, but I might not have picked the best TV show to do that with, because the pace of the speech is really, really fast. And there was one more step in the pipeline, actually: after I got the VTT files from HappyScribe, I still had to proofread them. Most of them were good, like 90% were good, but because of the speed of the thing and some specific nouns that were used, I still needed to proofread them. So I still would not trust a fully automated full circle of translating it into several languages. And in addition to that, because it's a show with jokes, I'm not sure they're all going to translate well. That's a good point; you'd end up with a very literal translation that is not very exciting to watch. And I saw there were a bunch of comments in chat too, Tim, I'll let you take a peek at those later. Michael was mentioning that he has, I think, a book search that this was reminding him of. We'd love to see that; hit us up on the Discord, maybe that's something we could show at a future DevBit. And with that, we've got a lot of applause icons here, but I'm going to go ahead and let you go, Tim, so you can chat with the folks in chat, and I'll go ahead and wrap this thing up.