Self-hosted, useful AI stack (video, 12 min)
One of the most exciting topics in technology today is artificial intelligence (AI). In his latest video, Tim from the Techno Tim channel explores the dilemma at the heart of AI: on one hand, we have a powerful tool that can significantly improve our lives; on the other, we often have to sacrifice some privacy to use it. Tim set out to explore how private, local AI systems work. He built a system at home that can handle AI workloads without any data ever leaving his local network, and discovered along the way how various applications, including chatbots and image generators, can run without storing data in the cloud.
Throughout the video, Tim discusses Ollama, an open-source tool that provides an API for creating, running, and managing AI models. It lets users download and run pre-trained models from major players such as Meta and Google. Ollama becomes significantly more functional when paired with a user interface like Open WebUI. Tim emphasizes that Open WebUI, which makes downloading and managing models easy, offers features that go beyond what a typical chatbot provides.
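To make the Ollama API concrete, here is a minimal sketch of a non-streaming request against its REST endpoint. It assumes a local Ollama server on the default port 11434 with a `llama3` model already pulled; the endpoint and payload shape follow Ollama's `/api/generate` API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama server and return the model's reply."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response carries the full reply in "response".
        return json.loads(resp.read())["response"]

# Usage (with a running server):
#   print(ask("llama3", "List three U.S. state capitals."))
```

The same payload works for any model Ollama has pulled, which is what makes swapping between Llama 3, Gemma 2, and others so easy.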
Other tools Tim integrated into his local AI system are Stable Diffusion and ComfyUI, which let him generate images from text prompts. Tim notes that his skills with the UI aren't top-notch, but with the right settings and prompts found online, he is able to use advanced image models to support his creativity and find inspiration.
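ComfyUI can also be driven programmatically. The sketch below assumes a local ComfyUI instance on its default port 8188 and a workflow graph exported from the UI via "Save (API Format)", which produces a dict of node IDs mapped to `class_type`/`inputs` entries; the node ID used for the prompt text is workflow-specific.

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188/prompt"  # ComfyUI's default API endpoint

def set_prompt_text(workflow: dict, node_id: str, text: str) -> dict:
    """Patch the text input of a CLIPTextEncode node in an API-format workflow."""
    workflow[node_id]["inputs"]["text"] = text
    return workflow

def queue_workflow(workflow: dict) -> dict:
    """Queue a workflow on ComfyUI; the response contains a prompt_id to poll."""
    payload = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        COMFY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (with a running ComfyUI and an exported workflow.json):
#   wf = json.load(open("workflow.json"))
#   queue_workflow(set_prompt_text(wf, "6", "a watercolor fox, autumn forest"))
```

This mirrors the point about prompts being the key lever: the whole pipeline stays fixed and only the text node changes between runs.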
Also worth mentioning is his integration of Whisper, OpenAI's automatic speech recognition system. Tim demonstrates how he uses Whisper models for audio transcription, which is essential for his YouTube work but also promises broader uses, such as transcribing meetings. He also discusses integrating his stack with Home Assistant, enhancing the capabilities of its AI-based voice assistant, a significant step towards fully self-hosted solutions.
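As a sketch of the transcription-to-subtitles flow, here are two small helpers that turn Whisper-style segments (dicts with `start`, `end`, and `text`) into SRT captions, plus a hedged usage note for the reference `openai-whisper` package; the file name is a placeholder.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Convert Whisper-style segments into a numbered SRT caption string."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Usage with openai-whisper (pip install openai-whisper; ffmpeg on PATH):
#   import whisper
#   model = whisper.load_model("base")        # smaller models for spoken word
#   result = model.transcribe("clip.mp3")     # result["segments"] has start/end/text
#   print(to_srt(result["segments"]))
```

The SRT output can be dropped straight into YouTube or a video editor, which is the workflow the video describes.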
Finally, the video's statistics, currently 184,024 views and 6,356 likes, indicate keen viewer interest in artificial intelligence and in self-hosting such systems. Tim hopes that the development of local AI models will open up new opportunities for users, granting them greater privacy and control over their data, and that companies will think about how to safeguard their users' data, ensuring a future that is secure and beneficial for all.
Timeline summary
- Introduction to the importance of AI in technology.
- Discussion on the prevalence of AI claims in products.
- Expressing mixed feelings about AI advantages and privacy concerns.
- AI's potential to improve daily life vs. privacy sacrifices.
- Exploration of private AI systems and local usage.
- Creating a home system for AI workloads.
- Building advanced AI applications like chatbots and image generation.
- Introduction to a sponsor promoting internet privacy.
- Recommendation of using VPNs for secure browsing.
- Overview of Ollama, an open-source tool for running AI models.
- Using Ollama with Open WebUI for enhanced chat experiences.
- Illustrating creative potential with AI-generated text.
- Using models for generating images and their functionalities.
- Comparison of different UIs for image generation.
- Integration of AI models into local development environments.
- Introduction to Whisper for automatic speech recognition.
- Discussing transcription capabilities of AI models.
- Using Home Assistant with Ollama for enhanced home automation.
- Vision for future AI integration with local and private systems.
- Conclusion emphasizing control and privacy in AI technology.
Transcription
One of the most exciting and played-out topics in the world of technology today is AI. It seems you can't use a product without it claiming to have some sort of AI enhancement. And I'm honestly torn on AI. On one hand, you have this really great technology that can have profound impacts on our daily lives. And on the other hand, we have to give up privacy to use it. This is amplified by the fact that the more we use it, and the more it learns from us, the better it becomes, and the more it can help. And this means giving up our privacy to use it. Or does it? I set out to learn about AI systems that are private and local. I started building a system in my own home that could run AI workloads and help me see if there are any practical uses for running AI at home, all without any data ever leaving my home network. I've managed to build and host some pretty advanced systems that help me with everything from chatbots to image generation to help with my home and more. Let's see if we can self-host an AI stack comparable to what you see from the big players. Since we're on the topic of privacy, I think it's a good time to talk about keeping your internet private with today's sponsor. You all know the scenario, right? You travel to a far off place and you finally get to your destination. When you arrive at the place you're staying, you want to connect to Wi-Fi and you search for the username and password. You're immediately presented with a choice. Stay on your carrier's data network and use up your data plan, not to mention allowing them to snoop on you or connect to the location's Wi-Fi network where you have no control over what they do, who is connected to the network, and allow their ISP to snoop on you too. Is that like a double or a triple snoop? Well, I'll choose the third option and that's using a VPN like Surfshark and surfing the web without tracking and blocking all of the snoopers involved. 
Surfshark keeps prying eyes off of my activities when browsing the internet. It encrypts my connection using WireGuard, which is a fast and modern VPN protocol that has quickly become one of the most popular and secure protocols out there. Surfshark also adds some really nice features like dynamic multi-hop that uses multiple servers to connect me to my destination online, an integrated ad blocker so that I'm not targeted for ads, cookies, or tracked, and they even have a browser extension that lets me quickly connect to a VPN using a single tap in my browser. This is all backed by a no-logging policy and over 3,200 RAM-only servers, meaning nothing's written to disk. So on your next trip, if you're like me and you bring your iPhone, tablet, Android phone, MacBook, Windows laptop, Apple TV, and even a travel router, know that Surfshark will keep all of your devices' connections secure with their unlimited device policy. So sign up today with their 30-day money-back guarantee, and if you use the code technotim, you'll get an extra four months for free on top of their already low price. So we'll start with Ollama. Ollama is an open-source tool with an API for creating, running, and managing models. It helps you download and use trained models from the big players, like Llama 3 from Meta, Gemma 2 from Google, and Phi-3 from Microsoft. These are open models that are pre-trained to help you do a variety of chat-based tasks, or even specialize in tasks like StarCoder2, which can help you with coding tasks, and even image generation inline, but we'll cover that a little bit later. Ollama is the tool that I use to load and manage the models that I want to use. Now, Ollama by itself is kind of useful, but it's exponentially more useful when you pair it with something that uses the API, like a web UI. One of the easiest ways to see the benefits of using Ollama is to use it with a chat UI, like Open WebUI.
Now, if you're familiar with ChatGPT, you'll feel right at home using Open WebUI, but it also comes with some features that even ChatGPT can't do. When using Open WebUI, you can download and manage models pretty easily in the UI. It's also nice because it allows you to create multiple accounts, set default models, and even restrict which users can access which models. They're also adding memory features too, which can remember some of your answers and provide more contextual results. It's pretty cool. After getting things set up, you can use it just like you would a chatbot. Want it to list all the U.S. states and capitals? Sure. Easy. Want it to summarize something for you? No problem. Want it to do it in the voice of John Dunbar from Dances with Wolves, as if he's creating an entry in his journal? Nice party trick. "Two Socks, like Cisco, has become a trusted friend. He still won't eat from my hand, but his keen eyes and ears never fail to alert me when something is wrong." Maybe soon it can bring in local AI text-to-speech and also mimic Kevin Costner's voice. So this is fun and all, but here's where Open WebUI can do some tricks that might make some of the public chatbots lose their digital minds. You can enable web search too. Here in my chat, I can enable web search to find and summarize my search results. You can see here in the results, it's also citing sources so you know where it's getting its information. Cool. Now, I know you're also worried about privacy. That's why I've integrated this web search within Open WebUI with SearXNG, which is a free internet metasearch engine that aggregates results from different sources without tracking or profiling you as a user. SearXNG is a really great project that I'm running here locally, and I can easily hook it into Open WebUI, bringing anonymous web searches into my chat client. This is one of the many things you can do in Open WebUI, and one of the many reasons I'm going to continue using it as an interface for Ollama.
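For readers curious what "hooking in" SearXNG looks like, here is a minimal sketch of querying a local SearXNG instance's JSON API, which is the kind of endpoint Open WebUI's web-search settings point at. The base URL and port are assumptions for a typical local install, and `format=json` must be enabled in the instance's `settings.yml`.

```python
import json
import urllib.request
from urllib.parse import urlencode

SEARXNG_BASE = "http://localhost:8080/search"  # assumed local SearXNG instance

def search_url(query: str) -> str:
    """Build a SearXNG JSON-API query URL for an anonymous meta search."""
    return f"{SEARXNG_BASE}?{urlencode({'q': query, 'format': 'json'})}"

def search(query: str) -> list:
    """Fetch aggregated results; each entry has a title, url, and snippet."""
    with urllib.request.urlopen(search_url(query)) as resp:
        return json.loads(resp.read()).get("results", [])

# Usage (with a running SearXNG instance):
#   for hit in search("self-hosted AI stack")[:5]:
#       print(hit["title"], hit["url"])
```

Because the aggregation happens on your own box, the chat client gets web results without any upstream profiling.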
We'll see some more cool tricks it can do later, but let's move on. We've all seen the incredible images that can be generated with a system like DALL·E from OpenAI. They range from creative, to scary, to realistic, to even laughable. Jokes aside, creating images from text is a powerful tool for getting ideas and inspiration, killing time, or even just for laughs. Stability AI and other companies have released models that are open to use. Each company and each model has been trained on a variety of images and artworks. These models can be found on Hugging Face, which is a community for building and sharing AI models. They can be downloaded and used, but require an engine and a UI to make it possible, similar to Ollama and Open WebUI. Models are one thing, and the engine and the UI are another. There are quite a few to pick from in this space, from the simple AUTOMATIC1111, which I started with, to ComfyUI, which I have started using more and more. Now, ComfyUI is a little bit more advanced, but what I like about it is that it supports new models, allows you to visually tweak your pipelines, and seems to have more regular updates. Now, I'm going to be honest, I'm not an expert when it comes to this UI, but I know how to adjust my settings according to prompts that I find on the internet. And that's one of the key learnings here: the better the prompt, the better the output. And this is why things like prompt generators exist and are very helpful at getting the most out of some of these models. For the most part, I run Stable Diffusion and ComfyUI for fun. It's really cool seeing some of the results, but I can see how this could be helpful for creatives looking for some inspiration. The cool thing about having this service running within your stack is that you can then hook it into existing systems. Remember Open WebUI? Yeah, I've even integrated it into chat.
Now, it's still early and it's less configurable than using the dedicated UI, but if you want an image from your chatbot in Open WebUI, it's totally doable and really cool to play with. Using Ollama with Open WebUI and various models can help you accomplish all sorts of tasks, even highly technical ones like break-fix scenarios and debugging help. Now, I personally found it really helpful when trying to fix errors with some of my scripts, converting my code to a different or newer version of a language, and even explaining how to use various components that I haven't used before. You can even integrate this into your IDE or editor. Now, I've never used Copilot for my code, but many claim that integrating Ollama with a model and an extension is a free and private way to get a Copilot alternative. Let me show you what I mean. I've been testing out a free extension called Continue that can connect to your local and private instance of Ollama. While the model that I'm using isn't the greatest for code generation, it's really good at code suggestions as you type. If you enable autocomplete, it can suggest code for tab completion as you type, which is much better than depending on suggestions from VS Code alone. I found this feature super helpful, and it works out of the box with the StarCoder2 model really well. Another tool I have in my self-hosted AI tool belt is Whisper. Whisper is an automatic speech recognition neural network created and trained by OpenAI. It handles things like accents, different languages, and even background noise with surprising accuracy. OpenAI has released the models and even the code to run them, but the interface is largely up to you. There's a great freemium desktop client called MacWhisper that I paid for, which is wonderful, but I wanted a universal way to get transcription regardless of what platform I'm using.
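As a rough illustration of wiring Continue to Ollama, the sketch below builds a config with one chat model and one tab-autocomplete model. The field names are an assumption based on Continue's `config.json` format and may drift between extension versions, so treat this as a starting point rather than a reference.

```python
import json
from pathlib import Path

def continue_config(chat_model: str = "llama3",
                    autocomplete_model: str = "starcoder2:3b") -> dict:
    """Build a Continue config pointing chat and autocomplete at local Ollama.
    Schema is an assumption based on Continue's config.json documentation."""
    return {
        "models": [
            {"title": "Local Ollama", "provider": "ollama", "model": chat_model}
        ],
        "tabAutocompleteModel": {
            "title": "Local autocomplete",
            "provider": "ollama",
            "model": autocomplete_model,
        },
    }

# Usage: write it where the extension looks for it (commonly ~/.continue):
#   Path.home().joinpath(".continue", "config.json").write_text(
#       json.dumps(continue_config(), indent=2))
```

Pairing a small, fast model for autocomplete with a larger chat model is a common split, since tab completion is latency-sensitive.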
I'm using an open-source web-based version of Whisper that not only allows you to transcribe audio into text, but also does this visually and lets you edit the captions before exporting them into various formats. I can either upload a video or choose an existing YouTube video, and then choose the Whisper model to use. I found that smaller models work great for spoken word or unscripted talks because they will remove repeats and stutters from the captions. And if you're going to do something scripted, I found that the larger models are what you want. After choosing your options, you will shortly have your video transcribed, and you can visually scrub through it and fix any captions that you don't like. You can then export these into various formats and even translate them into another language. Now, this might seem helpful only to me since I'm a YouTuber and I need to generate subtitles, but I plan on using this for home videos too, and possibly even for meetings. This project is undergoing a complete rewrite, and I'm excited to see where it goes. I would love to see multi-speaker options, find all, and other options you see in some of the other clients. But it's pretty awesome. One of the other things I've supercharged at home is Home Assistant. Home Assistant has started embracing AI, starting with an integrated chatbot and, you guessed it, you can integrate it with Ollama. After integrating Home Assistant with Ollama, your virtual assistant gets supercharged by Ollama's LLM support. In the most basic example, if you ask the default assistant a question, it doesn't know how to respond. But when switching to your supercharged Ollama assistant, it knows exactly what you're talking about and has the context of your smart home. I've also enabled Whisper to transcribe my voice and Piper to read text-to-speech. However, these are considerably slower since they aren't GPU-enabled and require their own versions of Whisper and Piper.
I do wish that I could use my existing Whisper server, but I'll take what I can get because it's usually faster than typing it all out on mobile anyway. One thing to note is that although you can integrate with Ollama, Home Assistant won't actually perform these actions. Now, this is a little bit odd since the response said it did it for you, but currently it can only perform actions if you integrate with the cloud versions of Google AI and OpenAI, which is kind of weird because you integrate this self-hosted thing with the cloud so that it can then do things on your smart home network. But Home Assistant says Ollama action support is coming soon. Now, this is just the tip of the iceberg for what's to come with AI, and I hope that other self-hosted platforms make their systems pluggable with Ollama and other services. As we see more and more systems sprinkle AI onto all of their products, I sincerely hope that there is also a future where we can plug in our own system like Ollama with our own models, giving us the control and privacy we need. After all, some of these products allow you to plug in ChatGPT, so plugging in our own should also be an option. I hope that companies see this as a way to keep our data private as well as theirs, and local to the person who's using it, because that's the future I want to live in. I'm Tim. Thanks for watching.