Reducing Docker Image from 1.2GB to 10MB - Practical Tips (video, 7m)
In a recent video on the Better Stack channel, the creator shares how they reduced a Docker image from 1.2 GB to just 10 MB. The video stresses that every megabyte counts: image size affects not only storage costs but also deployment times, scalability, and even security, all of which is amplified when using orchestration tools like Kubernetes. Throughout the presentation, the creator uses a Node React application as an example, but the tips apply to all Docker images.
The first key step discussed is the choice of the base image. The default node:latest image weighs in at over 1 GB, which is excessive. Instead, one can opt for the Alpine variant, which is only 155 MB. By appending '-alpine' to the base image tag, one saves around 80% in size while keeping just the components needed to run the application. Alpine's different system libraries can occasionally cause compatibility issues, particularly with native modules, but in general, opting for a lightweight variant is the far more efficient strategy.
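For illustration, here is a minimal sketch of the change described; the specific Node version tag is an assumption, and pinning a version is generally preferable to a bare tag:

```dockerfile
# Heavyweight default: Debian-based, over 1 GB
# FROM node:latest

# Alpine variant of the same image, roughly 155 MB
FROM node:20-alpine
```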
Next, the video covers how to manage layers during the image build. The author explains how to exploit layer caching: copy package.json (and install dependencies) before copying the rest of the application. On a rebuild, Docker can then reuse the unchanged dependency layer instead of reinstalling everything, which significantly speeds up the process. Understanding how Docker layers work, and why deleting files in separate RUN commands does not shrink the final image, is also crucial for proper optimization.
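As a sketch of the reordered Dockerfile the video describes (the npm scripts and lockfile name are assumptions based on a typical Node project):

```dockerfile
FROM node:20-alpine
WORKDIR /app

# Dependency manifests first: this layer, and the install below,
# stay cached until package.json or package-lock.json changes.
COPY package.json package-lock.json ./
RUN npm ci

# Source code last: editing a file only invalidates from here down,
# so rebuilds skip the dependency install entirely.
COPY . .
RUN npm run build
```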
What ends up in the final image also affects build time and security. Using a .dockerignore file to exclude unnecessary files such as node_modules keeps the build context small. The creator also emphasizes consolidating cleanup operations into a single RUN step, since files deleted in a separate, later layer still occupy space in the layers where they were created.
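A typical .dockerignore for a Node project might look like the following; the entries are illustrative rather than taken from the video:

```
# Locally installed dependencies: reinstalled inside the image anyway
node_modules

# Local build output, VCS history, logs
dist
.git
*.log

# Secrets must never enter the build context
.env
```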
Finally, the author shows how multistage builds save space. In the last stage, a lightweight Nginx image serves the built static files, bringing the final image down to 57 MB; running it through Slim then yields the 10 MB result from the title. To round things off, the creator recommends tools such as Dive and Slim for inspecting and optimizing images. At the time of writing, the video on the Better Stack channel has 285,265 views and 11,520 likes, indicating strong interest in the topic.
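A sketch of the two-stage Dockerfile the video arrives at, assuming a build that emits static files to /app/dist (the output path varies by bundler):

```dockerfile
# Stage 1: build the static assets with the full Node toolchain
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: serve them from a tiny Nginx image; Node, npm,
# node_modules, and the source code are all left behind in stage 1
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
```

Only the final stage ends up in the shipped image, which is why the result lands at 57 MB before any further minification.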
Timeline summary
- Introduction to reducing Docker image size from 1.2GB to 10MB.
- Sharing recently learned tips for experienced developers.
- Outline of upcoming Dockerfile best practices.
- Understanding the importance of image size and its impacts.
- Using Node React application as a case study for tips.
- Choosing a better base image, highlighting node:latest limitations.
- Benefits of using Alpine images for smaller size.
- Explanation of Alpine's efficiency by stripping essentials.
- Demonstration of proper layer caching for faster builds.
- Importance of restructuring Dockerfile for dependency caching.
- Utilizing .dockerignore to skip unnecessary files.
- Explanation of layers and importance of layer squashing.
- Introduction to multistage builds for optimal image size.
- How to effectively use Nginx for serving static files.
- Final image size achieved using multistage builds.
- Introduction of tools like Dive and Slim for optimization.
- Encouragement to subscribe and engage with feedback.
Transcription
I want to take you through the steps that I use to take my Docker image from 1.2GB to just 10MB. The last few tips I actually learned recently, despite having used Docker for years, so even if you're an experienced dev, stay tuned for those. We'll also be looking at some Dockerfile best practices along the way.

But first, why should you care? Well, it's because every megabyte counts. It doesn't just impact storage costs; it can also impact deployment times, scalability, and even security. This is all amplified if you use tools like Kubernetes. In this example, I'll be using a Node React application, but these tips apply to all Docker images.

For the first tip, let's talk about your foundation. In our React application here, we started with node:latest, which weighs in at over 1GB as an image. The same goes for most of these images that include everything, like Python, for example. It's like using a cargo ship to deliver a letter. It's way too much. So let's see how we can choose a better base image. For a lot of images, Node and Python included, I can simply append "-alpine". This image is only 155MB, and using it in our build reduces it down to just 250MB. So about 7 characters there, and we're already down 80% in size.

But you might be wondering why that works. Alpine is purpose-built for containers. Essentially, Alpine strips out everything except the bare essentials needed to run your applications. Now, it's not always the best choice. Since it uses different system libraries than the standard Linux distributions, it can sometimes cause compatibility issues, especially when you're doing something that relies on native modules. But with pretty much all popular images, you should be able to find a variant that is more minimal than the standard one. You can also use the full image for your development container, then work out what's needed and use a minimal one for production.

A cool shout out here to distroless images. These are Google's take on minimal images. Distroless images contain no operating system at all. You get no shell, no package manager, not even basic Linux commands. It's just your application and its runtime. These are a little more complex to set up, though, so I'll stick to the Alpine variant. But I'll leave the links to that project in the description down below.

So now that we've got our Alpine base image, let's talk about speed. Every time you change a single line of code, you shouldn't be waiting minutes for Docker to rebuild and reinstall all of your dependencies. So let's fix that with some proper layer caching. Here, as an example, I made a small change in my React application. I changed a line of text, but it's rebuilding everything from scratch when I run the Docker build command. This is because each instruction in your Dockerfile creates a new layer. The magic happens when Docker is able to reuse the layers that haven't changed from a previous build. So let's rewrite our Dockerfile to take advantage of layer caching. In this file, the difference is that we're copying just the package.json first. This is because our dependencies change less frequently than our code. Now on a rebuild, the dependency step can reuse the cached layer from the previous build, and only the code build step needs redoing. The same would go if you use something like requirements.txt in Python. Just so you know, three things trigger a cache invalidation.
One, changes to the files that you're copying. Two, changes to a Dockerfile instruction. And three, changes to any previous layer. This is why order matters. Put your most stable layers at the top and your most frequently changing ones at the bottom.

So we've optimized our base image and we're utilizing layer caching. Let's look at removing some unneeded files. The first tip is simply to use a .dockerignore. When I've run npm install locally during development and then, in my Dockerfile, copy everything over, I'm carrying a lot of extra folders that aren't actually needed, like node_modules, for example. You can actually see this in the build command, where it tells you how much context was transferred over. This is entirely useless for building my application because, as you can see, we're reinstalling the modules within the image. So we need to make sure they aren't included. Let's set up a .dockerignore. I'll add in the files that I know aren't needed in the image, so that's things like node_modules. And it's also good practice to make sure your secrets aren't going in there, too. Now, when we run the build command again, you can see we transferred way less build context. This can speed up your build significantly, especially on larger applications.

Next, I want to explain a little bit about layers, as I think it's important to know before we move on to multistage builds. So let's talk about layer squashing. If we look into our container, we can actually see a few files that we don't want in there. With this application, if I wasn't using the preview, I wouldn't even need that node_modules folder, as my application has already been built into static files. We'll look at that in a bit. But for now, let's focus on what would happen if we wanted to clean up those extra folders. We can clean the cache, remove temp files, remove node_modules. So we've removed some extra folders there, so we should have a smaller image, right? Oh, we don't.

This is where we need to talk about how Docker layers actually work. The separate RUN commands here don't save additional space in the final image. In Docker, each layer is immutable and contains only the changes from the previous layer. When you use separate RUN commands like this, even though you're deleting files in the later layers, the files still exist in those earlier layers. Docker can't actually remove data from previous layers because they are immutable. The deletion in a new layer only marks the files as not accessible in the final container, but they still consume space in the image. If we move all of these into one RUN command, though, all of these operations happen in a single layer. When that layer is committed, it only contains the final state with the cleaned-up files, not any of that intermediate state with the extra files. This can even have security implications if you're copying over .env files, which you shouldn't be doing, by the way. But if you did, and then removed them on another line, they would actually still be findable in the image. Someone could go in there and extract those secret files.

So that's even more space saved. But now my app personally won't actually run in this state. As I mentioned earlier, I deleted the node_modules folder, but the application relied on the preview to host those static files, which in my case are HTML files. So I now need to host them again.
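To make the layer mechanics just described concrete, here is a before-and-after sketch; the exact cleanup commands in the video may differ:

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci && npm run build

# Ineffective: each RUN commits its own immutable layer, so the
# deleted files still occupy space in the layers that created them
# RUN npm cache clean --force
# RUN rm -rf /tmp/*
# RUN rm -rf node_modules

# Effective: one RUN, one layer, and only the cleaned-up final
# state gets committed
RUN npm cache clean --force && \
    rm -rf /tmp/* && \
    rm -rf node_modules
```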
This is the number one Dockerfile tip, and that is multistage builds. When my application has been built, I actually only need the distribution files and then some way to serve what in my case is static HTML files. So I don't need anything else from Node, npm, or anything like that to actually run the final container. Instead, what I'll do is add a line that says FROM and then use Nginx to host my files. The FROM keyword here separates this out into separate stages. The magic here is that everything that happens in that builder stage, so Node.js, npm, node_modules, your source code, it all gets thrown away. The final image only contains your build assets, which we copy over from that stage, and the Nginx image itself, which is very small. So now I'm down to just 57 megabytes. Super easy, right?

Now, I can understand that Dockerfiles can get super complex, so I want to shout out some tools that can help you out with this process. First, we have Dive. This is an image explorer that will let you look at the individual layers that you've created. You can use this to help you find ways to optimize and debug your builds. We also have Slim. Slim allows developers to inspect, optimize, and debug their containers. You don't have to change anything in the container image itself, and you can minify it by up to 30 times while improving security as well. Optimizing images isn't the only thing it can do, though. It can also help you understand and author better container images through tools like X-Ray and linting. I highly recommend this tool. I actually used the slim build command on the improvements that we made here, and I got it down to the final 10 megabytes. There we go.

If this video helped you out, do go ahead and subscribe and let me know your favorite tip in the comments down below. As always, see you in the next one.
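For reference, typical command-line invocations for the two tools mentioned in the video; the image tag my-app:latest is a placeholder, and flags vary between versions:

```sh
# Dive: interactively explore each layer of an image and see
# which files it added, changed, or deleted
dive my-app:latest

# Slim: analyze the image and produce a minified copy
slim build my-app:latest
```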