This article describes the latest episode of the "Projects in Linux" course taught by Ahmed, a lecture on using Linux as a proxy caching server. Ahmed opens with an introduction to the topic, explaining what a proxy server is and how it differs from a reverse proxy. A proxy server acts as an intermediary between the user and a remote resource, which improves both security and performance. Ahmed also covers useful proxy features, including sharing internet access among users and hiding client IP addresses. This matters especially in corporate environments, where network access has to be controlled.

Ahmed then introduces Squid as the example proxy server, known for working both as a proxy and as a reverse proxy. He walks through installing Squid on Ubuntu in detail, including the commands to install it and start the service, and uses the terminal to check that Squid is running correctly. The lecture also covers installing Squid on CentOS, which shows that the tool can be used across different Linux distributions.

With Squid installed and running, Ahmed turns to configuration, going through the squid.conf file and the different types of directives. He explains single directives, Boolean directives, and multiple directives, which can take more than one value. Examples with directives such as http_access and acl show readers how to control access to resources and apply security policies. Ahmed also shows how to add rules to the configuration file, illustrating how values can be adapted to different uses.

Ahmed then moves on to more advanced configuration options, explaining how to change the ports Squid listens on and how to define specific rules for different kinds of web traffic. In particular, he uses ACLs to restrict access by IP address and by domain, which gives users fine-grained control and lets them fit the configuration to their needs. He also covers ways to block specific file types, such as MP3 or MP4, which is useful in corporate and educational environments.

Finally, Ahmed sums up how Squid can serve both for access control and for improving internet performance through caching. At the time of writing, the video had 17,639 views and 330 likes. These numbers suggest that the topic is popular and useful, and that many viewers have picked up valuable knowledge about how proxy servers work and how to configure Squid. The lecture is relevant to anyone who wants to better understand network architecture and manage access to resources efficiently and securely.

Timeline summary

  • 00:00 Introduction to the Linux course with instructor Ahmed.
  • 00:10 Overview of Section 5, focusing on using Linux as a proxy caching server.
  • 00:24 Definition of a proxy server and a detailed look at its functions.
  • 00:36 How reverse proxies work, using Nginx as the example, and their security benefits.
  • 01:06 Common uses of proxy servers in corporate environments.
  • 01:48 Introduction to caching servers and their role in improving internet performance.
  • 02:29 Summary of the benefits of using proxy servers on the client side.
  • 03:50 The security benefits gained by using proxy servers.
  • 05:07 Introduction to Squid as a proxy server and its installation on Linux.
  • 05:54 Installing Squid with the Ubuntu package manager.
  • 07:05 Confirming the Squid installation and its running status.
  • 10:38 Configuring Firefox to use Squid as a proxy server.
  • 11:49 Overview of the Squid configuration file and its directives.
  • 17:45 Detailed explanation of the ACL (Access Control List) directives used in Squid.
  • 29:38 Configuration for restricting the file types downloaded through the proxy.
  • 47:33 Caching mechanisms and their effect on web performance.
  • 51:13 Demonstration of caching effects through practical examples with Squid.

Transcription

Right. Hello, everyone. Welcome back. This is a new lecture in your course Projects in Linux by Adionix. My name is Ahmed, and this is Section 5 of our class, titled Using Linux as a Proxy Caching Server. So let's get started.
Okay, so before we get our hands dirty, let's first define what a proxy server is. In a previous section in this class, we saw the reverse proxy server, and we used Nginx for that. We mentioned that a reverse proxy is a server or a service that stands in front of your web application to receive requests from clients; it routes those requests to the backend server, receives the reply from that backend server, and routes it back to the client. We also mentioned that this provides a layer of security and a layer of performance, because your application servers are not directly exposed to the internet.
Well, a proxy server can also work on the front-end part of the communication: as a client, you can have a proxy server. And perhaps this is the more well-known usage of a proxy server. When the term proxy server is used, you immediately think of the proxy server that you might be using in your corporate LAN or in your organization. It is the server that is responsible for receiving your requests to visit a remote website or a remote web resource. It handles this request, makes it on your behalf, receives the response from the remote web server, and routes this response back to you as a client.
So why is it also called a caching server? That is because part of the proxy server's work is that it caches the response that it receives from the web server, just the same as the reverse proxy does when it caches static content. If you remember from the reverse proxy section, we said that Nginx caches the requests that it receives, and it also has a cache of the static files so that it can serve them to clients quickly without having to go back to the application server.
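
The caching idea described above can be pictured with a small Python sketch. This is only a toy model under stated assumptions: `CachingProxy` and `fake_origin` are hypothetical names invented for illustration, and a real cache such as Squid's also deals with expiry, validation, and storage limits, none of which is modelled here.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a caching proxy: fetch once, then serve from memory."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # how to reach the origin server
        self._cache: Dict[str, bytes] = {}   # URL -> stored response body
        self.origin_requests = 0             # how often we really went upstream

    def get(self, url: str) -> bytes:
        if url not in self._cache:           # cache miss: ask the origin
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]              # cache hit, or freshly stored


def fake_origin(url: str) -> bytes:
    """Hypothetical stand-in for a remote web server."""
    return f"content of {url}".encode()


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/logo.png")
second = proxy.get("http://example.com/logo.png")
print(proxy.origin_requests)  # number of times the origin was contacted
```

The second request for the same URL never reaches `fake_origin`, which is the performance gain the lecture attributes to a caching proxy.
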
The proxy server on the front-end side does the same thing, but of course in the reverse direction: it caches the requests that you frequently make, so that when you request the same resource again from the proxy server, it does not go and make the request once more. It just serves the cached content that it already has.
So in a nutshell, there are numerous reasons for using a proxy server on the client side. Let's go through them quickly. First, it provides internet connection sharing to multiple users. So if you have multiple users in the same LAN or in the same network, they can all have an internet connection without you having to provide real IPs for them; they just have to configure their browsers to point to the proxy server that you have installed, and they are all set. Second, it enhances performance by caching responses; we have just mentioned that point. Third, it controls access to the internet by denying specific URLs or domains. That is, again, one of the advantages of using a proxy server: you can block malicious or inappropriate sites so that users are denied access to them. That is most commonly used in work environments, schools, universities, and so on. If you are in a bank, for example, you ought not to be opening social media such as Facebook or Twitter, so a proxy server may be used to deny access to those social networking sites while you are at work.
The next feature is that it adds a layer of security by hiding client machines and IP addresses behind it. Think of it: when you are browsing the internet, your IP address is visible to the remote web server that you visit, and also to anybody who is intercepting the connection, so you might be vulnerable to an attack. But a proxy server places itself in front of you as a client, and it connects to the remote web server using its own IP address. So your private IP address, or your client IP address, is hidden.
It is not shown to the remote web server or to anybody who is intercepting the connection. And if an attack happens, the furthest point that the attacker can reach is the proxy server. And finally, it can circumvent regional internet access restrictions. That is, if we are talking about public proxy servers; I bet you have heard that term before. A public proxy server is a proxy server that is available on the internet. Some of them are free and some of them are paid. All you have to do is point your browser to that proxy, and using it you can pass through any regional restrictions that are in place where you are living. So, for example, you can visit a website that is banned by your government by using a proxy server.
Okay. The server that we are going to use as an example for this lab is known as Squid. Squid is one of the most well-known proxy servers, and it can actually work as a reverse proxy as well. On Linux, Squid can be installed using the native OS package manager, or it can be compiled from source. So let's have a look at our example system, our Ubuntu system. Let's open a terminal. Okay, and if we want to install Squid, all we have to do is type sudo apt-get install squid3. This is specific to Ubuntu: you have to add 3 at the end to indicate that you are installing Squid version 3, which is the latest version at the time of this recording. It is going to download the packages and install them on your system, and in a few moments you are done.
Okay. Now let's clear the screen, and all you have to do to start Squid is use sudo systemctl start squid. Okay, now Squid is running. Let's just ensure that it is. Okay, as you can see, I have a process with the name squid running, and Squid by default listens on port 3128. Let's run netstat and grep for 3128, like this. Okay, netstat is not found. Let's install it quickly. Okay, sudo apt install net-tools.
Okay, and now if I use netstat -tulnp and grep for 3128, the port is listening, and the process that is listening on it is squid. So that is how Squid can be installed on your Ubuntu machine and how it can be started. It is just like any other service: it is started by using systemctl. You can also use systemctl to check the status of the service, just like that, just like any other service running natively on Ubuntu. It's active and it is running.
Okay, on a CentOS machine, let's have a look quickly. Let's open the terminal. Here, you can run sudo yum -y install squid, just like that. Okay, now let's clear the screen and run service or systemctl, just like we did on Ubuntu: start squid. Okay, and again, netstat -tulnp. If I grep for squid, I can see that I have the process running on 3128. Okay, those are the ways by which I can install Squid on both RPM-based systems like CentOS and Debian-based systems like Ubuntu.
If you want to install Squid by compiling it from source, you can do that. Just open your browser and navigate to www.squid-cache.org/download. Here you can go to the official source code release, and here you can find the latest version of Squid. If you are going to compile from source, I strongly recommend that you use the stable version and not the beta version, because the beta version is meant for testing purposes, as mentioned here, and it is not ready for production. So you are always better off sticking to the stable versions. The latest one at the time of this recording is 3.5.27, released on the 19th of August 2017. Just click on that, and you will be on this page, where you can download the tar.gz file: copy the link location, and then inside your terminal you can use wget, or any tool that downloads files off the internet, just like that. Once you download it, you can start compiling it. However, this is an advanced option.
Compiling from source is strictly meant for people who want to fine-tune how Squid runs on their systems, for example by choosing an alternate installation directory, which allows them to install several versions of the product and, let's say, examine the changes or differences between them, or by activating or deactivating features. This is an advanced way of installing Squid. I'll just stick with the default way of installing it through the package manager.
Okay, so now that we have the Squid server running, let's start by opening our browser. We're using Firefox here. Okay, go to settings or preferences, depending on your operating system version, and inside preferences go to the network settings. In the latest version of Firefox, they can be accessed from here at the bottom. In previous versions, it was just a tab here at the top; now they have moved it so that everything is stacked on one page. So go to network proxy and use manual proxy configuration. Because I am running Firefox on localhost, I'm going to specify that the HTTP proxy is running on localhost, and I'm going to use this proxy server for all protocols: SSL, FTP. And I'm going to click OK. Now I am using my proxy server to access the internet. So let's type, for example, google.com. Okay, and I am effectively using Squid to access the internet.
Okay, let's minimize this and get back to our terminal. Let's have a look at our squid.conf file in /etc/squid/squid.conf. This is not the default Squid configuration file that gets shipped with the installation of Squid. This is a sample file that I have placed here in order to show you the minimum configuration that is required to start a Squid server. So let's have a look at the types of directives that we have here. Well, directives in Squid can be one of five types. The first is a single directive: a single directive is a directive that can be added only once.
If you have a look here at the different directives that we have, you can see that some directives are repeated three or four times or more, like acl, for example. If we go further, you're going to see something like http_access, which is also repeated; http_port appears only one time, and so on.
So we have the single directive. Let's just type a comment here. A single directive is a directive that should appear only once in the configuration file; if it is present more than once, then the last occurrence of this directive is the one that will be considered. The well-known directive of this type is the logfile_rotate directive. Let's say I type this and specify 10. By the way, this directive specifies the number of log files that are going to be kept when Squid does log file rotation. Rotation basically means purging old log files and overwriting the next most recent ones; it is just a way of preserving space by keeping only a number of log files available and purging, deleting, or archiving the older ones. So here I am specifying that logfile_rotate should be 10. If I copy this, place it at the end of the file like this, and change it so that instead of 10 it's 30, for example, then when I restart Squid, the last occurrence of this directive, which is logfile_rotate 30, is the one that will be considered. The previous one is not going to be of any value. So this is called the single directive type.
Another directive type is called a Boolean. The Boolean, as the name suggests, sets the directive to on or off. For example, let's see if connection-auth is available here. Okay, it's not available. Let's add it. So the Boolean directive is, for example, connection-auth.
If this is true, then this means that I am setting this directive to on; if this is false, basically I'm setting it to off. It is also considered a single directive, naturally, because if I set something to false and then set it elsewhere in the file to true, then of course the last occurrence is the one that will be considered, and the directive is going to be true.
Okay, now the most commonly used directive type, which you're going to see a lot in the squid.conf file, is the multiple directive type. The multiple directive type can take more than one value. If the directive is repeated, the values from each line will be aggregated. Examples are the http_access directive and the acl directive. Now let's have a look at this, because in my opinion this is the most important directive type, and one you will have to be very confident in understanding. Let's have a look at http_access, for example. This is a directive that is repeated several times in the file. Yet all those occurrences get aggregated when Squid starts: it is going to aggregate all those values and put them inside one virtual http_access directive.
So what's the point? Why do you need to repeat the http_access directive? Why don't you just put everything you want on the same line? Well, that is a perfectly valid option. For example, I can take CONNECT and the exclamation mark, or not, SSL_ports (we're going to look at that later in this section), and I can put both of those values right next to this one, Safe_ports. So I will have http_access deny !Safe_ports !SSL_ports CONNECT. But repeating the directive is better in terms of organization. Here, I'm organizing my directives so that I can add a comment: deny requests to certain unsafe ports. This is the only directive line that is going to deny access to certain unsafe ports.
So if I comment out this one, I am only disabling the rule that denies access to non-safe ports; the rest of my http_access rules are intact. So deny CONNECT and deny non-SSL ports are still in action. That is one advantage of using multiple directives this way. The other one is that, as you can see here, I have one http_access with an allow and another with a deny. So I have to change the verb, or the specific instruction, that http_access is going to use, and this must be added on a separate line. You cannot add a deny and an allow on the same line; you have to add them on two different lines. For that reason, you have the multiple directive type. The most prominent, or the most well-known, multiple directive is http_access, and of course also acl, the access control list. We are going to have a look at both of them later in this section.
The next type we have is the time type. It's not used a lot, or you won't see it a lot if you are just starting to use Squid. The time directive is for when you want to give a timed value, a numerical value in seconds or hours. For example, I can add dead_peer_timeout, and I can make this 10 seconds. As the name suggests, this is a time option that specifies when you would consider your peer, the Squid peer or the proxy cache peer, dead: if it does not respond in more than 10 seconds. You're not supposed to understand exactly what this does; just understand that this is a time directive. Squid can read this seconds string here and convert the whole directive into a time value it understands: this is 10 seconds. I can replace seconds with minutes, or even hours if I want to, like this, and Squid is going to understand them. The last type is the size.
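
Going back to the single and multiple directive types covered above, the difference can be sketched in a few lines of Python. This is a simplified model, not Squid's actual parser: the `MULTIPLE` set, the `SAMPLE_CONF` text, and `load_directives` are assumptions made up for illustration, and the sample lines only imitate the style of the file shown in the lecture.

```python
from typing import Dict, List

# Directive names treated as "multiple" in this sketch (an assumption for
# illustration; Squid's documentation defines the real behaviour per directive).
MULTIPLE = {"acl", "http_access", "http_port"}

SAMPLE_CONF = """\
logfile_rotate 10
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow localnet   (commented out, so ignored)
logfile_rotate 30
"""


def load_directives(text: str) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, _, value = line.partition(" ")
        value = value.strip()
        if name in MULTIPLE:
            result.setdefault(name, []).append(value)   # aggregate every line
        else:
            result[name] = [value]                       # last occurrence wins
    return result


directives = load_directives(SAMPLE_CONF)
print(directives["logfile_rotate"])   # only the last value survives
print(directives["http_access"])      # every uncommented line is kept, in order
```

Commenting out one `http_access` line removes just that entry from the aggregated list, which is the organizational benefit the lecture points out.
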
The size type, as the name suggests, specifies a memory or a file size. It understands units like kilobytes, megabytes, or gigabytes. For example, I can add cache_mem, the cache memory, and change this to be 256. I can add MB at the end, and Squid will understand that this is in megabytes and that the maximum size of the cache memory is 256 megabytes. Okay. So that is the Squid configuration file at a glance, and those are the different directive types that it has or that you can add.
All right, now that we are actively editing our squid.conf file, let's first ensure that we are editing it using the sudo command, so that we can make the changes that we need with the root account. So let's search for http_port. This is the first directive that we have. As you can see here, it's listening on port 3128. This is the default port that Squid listens on. This, of course, is changeable. And it is not only changeable: you can also specify where it is going to listen. For example, I can say that the HTTP port is 3128, but that Squid is only going to listen to traffic arriving at 127.0.0.1, which is localhost. Now I issue sudo systemctl restart squid, just like that. It may take a moment or two to restart; Squid is one of the services that takes time to restart. As we're going to see later in this section, if you have a change, you will be better off using the reload subcommand of systemctl rather than restart, so that it does not take that much time.
Okay, now if I issue netstat -tulnp and then grep for 3128, you're going to see that it is listening, but it is only listening on 127.0.0.1. If you remember from the previous lecture, when we issued the netstat command, this was an asterisk, which meant that it was listening on all the available interfaces. But now we have restricted access so that it comes only from localhost.
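
What "listening only on 127.0.0.1" means at the socket level can be sketched with Python's standard `socket` module. This is an illustration of the general idea, not of Squid itself: `open_listener` and `can_connect` are hypothetical helper names, and the sketch lets the operating system pick a free port instead of using 3128 so that it does not collide with a running proxy.

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Listen on one specific address; port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    return srv


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


listener = open_listener("127.0.0.1")            # loopback only
bound_host, bound_port = listener.getsockname()  # the address netstat would show
reachable = can_connect("127.0.0.1", bound_port)
listener.close()
print(bound_host, reachable)
```

A listener bound this way is reachable through the loopback address only; binding to all interfaces instead is what the asterisk in the earlier netstat output stood for.
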
This means that Squid will not respond to connections arriving on any network interface other than localhost. In other words, it's going to respond only to requests coming from Firefox or any other local browser on this machine. Okay, so let's go again to our configuration file. Actually, http_port is one of the multiple-value directives. This means that I can do something like this: add another http_port and specify another IP address. I can also say that Squid is going to listen on this IP address on a different port, let's say 8080. I'm going to save this, and I'm not going to restart yet. I'm just going to have a look at the IP address that I have on my second interface. It's 192.168.1.117. So let's use this, 192.168.1.117. Now let's issue restart squid again. Okay, and now if I issue netstat, but this time I'm not going to grep on the port, I'm going to grep on squid itself. As you can see here, I'm listening on two interfaces: on localhost at port 3128, and on the public interface on port 8080.
So let's see how this can be used effectively. Let's have a look at our CentOS machine. Okay, let's log in here, and I'm going to open Firefox on this machine. This machine is on the same network as the Ubuntu machine, and I'm going to use it as a client for my Squid server. So I'm going to start Firefox and go to preferences, the same way I did on the Ubuntu machine. However, notice here that the network settings are still in the tabbed version. Okay, go to settings, manual. Here I'm going to add the IP address of my Ubuntu machine, my proxy server; it was 192.168.1.117, if you remember, and the port is 8080. I'm going to use this proxy server for all protocols, and I'm going to press OK. Now let's open a new tab. If I go to google.com, for example, I'm going to be navigating to google.com.
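
The browser setting just configured has a rough equivalent for scripts. The sketch below uses Python's standard `urllib.request.ProxyHandler` to send requests through a proxy; the address 192.168.1.117 and port 8080 are the lab values from the lecture, while `proxy_settings` and `build_proxy_opener` are hypothetical helper names. No request is sent when the snippet runs, so it can be read without a proxy being available.

```python
import urllib.request
from typing import Dict


def proxy_settings(host: str, port: int) -> Dict[str, str]:
    """Use one HTTP proxy for both http:// and https:// URLs."""
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxy_opener(host: str, port: int) -> urllib.request.OpenerDirector:
    """Build an opener whose requests are routed through the given proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(host, port))
    return urllib.request.build_opener(handler)


# Address and port taken from the lecture's lab network.
opener = build_proxy_opener("192.168.1.117", 8080)

# Nothing is sent until a URL is opened, for example:
# with opener.open("http://example.com/", timeout=10) as resp:
#     print(resp.status)
```

Pointing the same helper at localhost and port 3128 would correspond to the local Firefox setup shown earlier.
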
This time, however, I am navigating through the proxy server that is located on the Ubuntu machine. But notice here that I'm using port 8080. This is because I'm accessing this Ubuntu machine from the network, from the LAN, not from localhost. If I were accessing it from localhost, I would have to use 3128, the same way we did just moments ago. Okay, so that is how you can change the default port that Squid listens on. I'm going to leave it like that for the rest of this lab.
Now let's have a look at the most important directive of the squid.conf file, which is the access control list, or ACL. Let's go to the start of this file. ACLs are used to control access to Squid or to web resources. So they do two things. First, they control which clients can connect to Squid. Don't confuse this with the change that we have just made, where we restricted access from the local machine to port 3128 only and specified access from the outside network on port 8080. That is not a restriction; it is just a configuration so that you can differentiate between different types of connections, those coming locally and those coming from the outside. But if you want to restrict access and give the necessary authorization to users, you must use ACLs. So ACLs can do two things: they can restrict access to Squid from outside sources, and they can also restrict access from the Squid server to web resources.
Let's see the general format of an ACL rule. It is acl, followed by a name. This is a name, not a type; you can give it any name you desire. Here it is called localnet by default. Then you have the type; the type here is src, which stands, of course, for source. I think this is very self-explanatory. This is a rule that is called localnet, and it is indicating that the source must be coming from the subnet 10.0.0.0/8.
This means that this rule is specifying that a source will be coming from this subnet. Notice here that we did not make any restrictions yet. This is just specifying a range of IP addresses, a range of hosts, a range of machines, no more than that, and it is specifying that those machines should be of type source. Okay, we are going to see what we can do with that later. But just understand for now that an ACL only indicates a range of IP addresses or a number of hosts, just a specific range; no rules yet.
And of course, as we mentioned before, acl is one of the multiple-value directives. So I can place as many acl lines as I need, and as long as I have the same name here, all of them are going to be aggregated when Squid starts, so that this is read as acl localnet followed by all of these address ranges. So I have 10.0.0.0/8, and also 172.16.0.0. These are generally the private IP address ranges, and Squid is configured by default to allow access from those ranges. That's why we didn't need any configuration when we tried to access the Ubuntu machine, the Squid server, from our CentOS machine: it had an IP address that falls in this wide range of addresses. 192.168 is of class B here, if you have some network background. So of course 192.168.1 falls inside this range. That's why it has been granted access.
But let's see what happens if I comment this out. That is, as we mentioned before, one of the reasons why you use multiple directives: so that you can selectively comment out the lines that you don't want to be in action. So if I do that, and I use sudo systemctl reload squid (notice here that I'm using reload rather than restart), it's going to take a fraction of a second, while the restart command took about five or six seconds. Now I go to CentOS, okay, and log in.
Now if I go here to Google and refresh this page, I'm going to see that I can no longer access the proxy; the proxy server is refusing connections. That is because I have commented out this line. The CentOS machine falls inside the same address range as Ubuntu, 192.168.0.0, and I have commented out this range, so the CentOS machine is no longer able to access my Squid server. Okay, let's uncomment it and reload, in order to be able to access it again.
What can you do with ACLs? Or what can you specify with an ACL? The ACL is a very powerful tool. You can use it to fine-tune exactly what you want to capture. The following are different ways you can define the target of an ACL. First, you can use just a single IP address. Let's add an ACL like this; I can say that the ACL for CentOS is 192.168.1.118. Actually, I don't know what the IP address of the CentOS machine is. Let's have a look at it quickly. Okay, it's 119. So let this be 119, and I'm going to call this CentOS. And of course, don't forget to specify the type of this range, host name, or host IP address: it is of type source. So I am going to name this CentOS. Now I can close this very wide range of addresses and restrict access: let's comment out this, this, and that; these are IPv6 addresses. So here I have my CentOS machine, which is of type src. I also want to specify my localnet, so I can add a localnet that is 127.0.0.1, naturally, and also my own IP address, 192.168.1.117. This is to allow myself to access the Squid proxy server. So that is one way of specifying ACLs.
Another way is that you can use an entire subnet. We have seen that already when we used that one, or that one. You can also add multiple subnets on the same line, or you can mix and match them. So I can define CentOS as this IP address or any IP address that falls in this range. This is just for the sake of example.
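
The src matching walked through above can be pictured with Python's standard `ipaddress` module. This is a sketch of the idea rather than of Squid's implementation: `SRC_ACLS` and `src_matches` are hypothetical names, the ACL name `centos` is spelled here only for illustration, and the /12 and /16 prefix lengths are assumptions based on the usual private address ranges, since the lecture does not read them out.

```python
import ipaddress
from typing import Dict, List

# ACL name -> list of address specs, mirroring lines such as
#   acl localnet src 10.0.0.0/8
# The "centos" entry uses the lab address from the lecture.
SRC_ACLS: Dict[str, List[str]] = {
    "localnet": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
    "centos": ["192.168.1.119"],
}


def src_matches(acl_name: str, client_ip: str) -> bool:
    """True if client_ip falls inside any address or subnet of the ACL."""
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(spec, strict=False)   # single IP becomes a /32
        for spec in SRC_ACLS[acl_name]
    )


print(src_matches("localnet", "192.168.1.119"))
print(src_matches("centos", "192.168.1.120"))
```

As in the lecture, such an ACL only names a set of clients; it restricts nothing until an access rule refers to it.
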
Okay, so an ACL can be a single IP address followed by a range, or two ranges, or two IP addresses, whatever you desire. You can also use a domain name. But this time you cannot use src alone; you have to use srcdomain, or dstdomain for destination. We haven't covered destinations yet. I can say, for example, search_engines, and I can specify that the dstdomain is .google.com and .bing.com. Now notice here that I have added a dot at the start of the domain name. That is necessary, because if I left it as just google.com, it's going to match only google.com; it's not going to match something like mail.google.com, or www.google.com, or even plus.google.com. So I need to be greedy, and I add a dot at the start of the domain name so that it will match anything that ends in google.com.
So let's save this and reload. Let's just ensure that everything is working as expected. I'll refresh this. Okay, as you can see, Google is still loading. The reason is that we have only specified an ACL here; we haven't told Squid yet what it has to do with it. That is the role of the http_access directive. http_access is another multiple-line directive, and it dictates what to do with an ACL. So I can say, for example, http_access deny search_engines, just like that. search_engines has been specified here: it is the destination domain, google.com or bing.com. I save and reload. Now if I go here, open another tab, and type, for example, bing.com, you're going to see: okay, the requested URL could not be retrieved. That is the error message that Squid is presenting to us, or to clients: access is denied. Now, if I try to type www.bing.com, it's going to give me the same result.
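
The effect of the leading dot in a dstdomain value can be sketched as a small matching function. This is a simplified model written for illustration, with `dstdomain_matches` and `SEARCH_ENGINES` as hypothetical names; it covers only the two cases described in the lecture (exact name versus name plus subdomains).

```python
from typing import List

# Values in the style of the search engines ACL from the lecture.
SEARCH_ENGINES: List[str] = [".google.com", ".bing.com"]


def dstdomain_matches(pattern: str, host: str) -> bool:
    """Match a host name against one dstdomain-style pattern."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        # Leading dot: the domain itself and anything below it.
        return host == pattern[1:] or host.endswith(pattern)
    # No leading dot: exact host name only.
    return host == pattern


def is_search_engine(host: str) -> bool:
    return any(dstdomain_matches(p, host) for p in SEARCH_ENGINES)


print(is_search_engine("www.bing.com"))
print(dstdomain_matches("google.com", "mail.google.com"))
```

Note that a name such as notgoogle.com does not match `.google.com`, because the dot has to be part of the matched suffix.
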
And even if I try to type something that does not exist, like, for example, a made-up host name under bing.com, Squid may assume that this URL exists, but before it attempts to access this URL (and that's very important), it's going to give you access denied. Even if I type gibberish like this, something that I'm sure does not exist, it's going to give you access denied.
Okay, so here I specified an ACL and gave it a name. Then I combined it with http_access and a verb; http_access can take deny or allow. You can also use the not, the exclamation mark, to fine-tune what you are going to do. Take, for example, this http_access deny !Safe_ports. So what are the safe ports? Those are the safe ports; they are specified here in ACLs of type port. This is another type that an ACL can use. So you can use an ACL to specify source, destination, or source domain. I can also specify here, for example, safe domains, and I can say srcdomain: if a request is coming from mycompany.com, it is considered to be from a safe client. And I can say here that I want to allow everybody that is from my safe domains, or I can say deny anybody who is not within my safe domains. See the difference?
Okay, so let's go back here. That is exactly the same thing that is done here using ports; it is just changing the type of criterion from source or destination to port. So here I have the ports that Squid is allowed to access. As you can see, these are the most well-known ports of the web: 443 for HTTPS, 80 for HTTP, FTP, gopher, HTTP management, and so on. And even if you have a port that is not registered or that does not have a specific name, like a random port, you can use anything from 1025 to 65535. Those are the non-system ports.
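
The port-type ACL and its negated use can be sketched as follows. This is an illustrative model, not Squid code: `SAFE_PORTS` holds only a few entries in the style of the list read out above, and `parse_port_spec`, `port_in_acl`, and `denied_by_not_safe_ports` are hypothetical helper names.

```python
from typing import List, Tuple

# A few entries in the style of the Safe_ports list read out in the lecture.
SAFE_PORTS: List[str] = ["80", "21", "443", "70", "1025-65535"]


def parse_port_spec(spec: str) -> Tuple[int, int]:
    """Turn "443" into (443, 443) and "1025-65535" into (1025, 65535)."""
    low, _, high = spec.partition("-")
    return int(low), int(high or low)


def port_in_acl(port: int, specs: List[str]) -> bool:
    """True if the port equals a listed port or falls inside a listed range."""
    return any(lo <= port <= hi for lo, hi in map(parse_port_spec, specs))


def denied_by_not_safe_ports(port: int) -> bool:
    """Models `http_access deny !Safe_ports`: deny when the ACL does NOT match."""
    return not port_in_acl(port, SAFE_PORTS)


print(denied_by_not_safe_ports(22))    # SSH port is not in the list
print(denied_by_not_safe_ports(8080))  # inside the 1025-65535 range
```

Replacing the wide range with a handful of explicit ports would make the sketch stricter in the same way the lecture suggests for restrictive environments.
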
But you cannot, for example, connect to a web server on port 22, because port 22 is reserved for SSH communication. The same holds for port 23, which is for Telnet. Trying to access such ports using the web protocol might be a security threat, and that is why Squid denies you access through those ports. Actually, you can get more restrictive: you can comment out this wide range of ports and add, for example, port 8080 and port 8008, and you can add more ports on demand. Especially in a corporate environment, if you know that a web server or a web application uses a specific port, you can explicitly allow access to that port. Allowing access to this wide range of ports might be considered a security vulnerability in some restrictive environments. So according to your policy and what your management needs, you will have to fine-tune those settings to fit your exact needs.

So as you can see, an ACL accepts the source IP address, the destination IP address, the source domain, and the destination domain. Here it is always combined with the http_access directive, because that is what actually makes the rule take effect. Now let's look at something very interesting, this directive: "And finally deny all other access to this proxy", http_access deny all. Why does the default configuration place this access rule at the end? Because the http_access directive works the same way as iptables. If you have worked with iptables before (iptables is the firewall of Linux), it works by traversing the rules from top to bottom, and the first rule that matches decides. So let's have a look. Let's ignore the ACLs for the moment and concentrate on http_access. We have this rule, deny search_engines; then deny non-safe ports; then deny CONNECT, and so on.
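The top-to-bottom, first-match evaluation can be modelled roughly as follows. This is an illustrative sketch under stated assumptions, not Squid's code: the request dictionaries, the localnet range 192.168.0.0/16, the allow rule for it, and the shortened port list are all made up for the example.

```python
import ipaddress

# Assumed values for illustration only.
LOCALNET = ipaddress.ip_network("192.168.0.0/16")
SAFE_PORTS = {21, 70, 80, 280, 443} | set(range(1025, 65536))
SEARCH_ENGINES = (".google.com", ".bing.com")


def is_search_engine(request):
    host = request["host"].lower()
    return any(host == d[1:] or host.endswith(d) for d in SEARCH_ENGINES)


def is_safe_port(request):
    return request["port"] in SAFE_PORTS


def from_localnet(request):
    return ipaddress.ip_address(request["client"]) in LOCALNET


# Ordered like the http_access lines in the configuration file.
RULES = [
    ("deny", is_search_engine),                # http_access deny search_engines
    ("deny", lambda r: not is_safe_port(r)),   # http_access deny !Safe_ports
    ("allow", from_localnet),                  # http_access allow localnet (assumed)
    ("deny", lambda r: True),                  # http_access deny all (catch-all)
]


def decide(request):
    """Walk the rules from top to bottom; the first match decides."""
    for action, matches in RULES:
        if matches(request):
            return action
    return "deny"  # not reached while the catch-all rule is present


for req in [
    {"client": "192.168.1.10", "host": "www.bing.com", "port": 443},
    {"client": "192.168.1.10", "host": "example.org", "port": 443},
    {"client": "192.168.1.10", "host": "example.org", "port": 22},
    {"client": "203.0.113.5", "host": "example.org", "port": 80},
]:
    print(req, decide(req))
```

The last request matches none of the specific rules, so it only stops at the final entry, which is why the deny-all line has to come last.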
But what if I have a connection that does not fall under any of those rules, something that I have not taken care of or considered? It falls directly into this last one. If a request is not governed by any of the rules that I have set, then I simply want it to be denied, because I know nothing about it. In other words, there is an attempt to connect to my Squid server that does not match any of those rules for some reason: maybe I forgot to add an appropriate rule, maybe it's an attack. This is a fallback rule, or a catch-all rule as it's sometimes called. Anything that is not matched by any of the rules above is denied. So as you can see, Squid is a very powerful server. It can be used to control access to the internet, and it has its own access control lists to restrict access to itself and also to outside resources.

Sometimes you are less interested in denying specific websites or domains than in denying specific types of files. This is very common in corporate environments and in organizations like schools or universities, where you are allowed to visit whatever site you want as long as you don't download an MP3 file or a video such as an MP4 file. So here I want to make a slightly more complex rule to achieve this. Let's go to our default configuration file, and I'm going to add a new ACL rule. Let's give it a comment: deny downloading audio and video files. For this specific example, I'm not going to list all, or even most, of the commonly used media extensions.
There are a lot of media extensions that could be added here: mp3, mp4, mov, wmv, avi, and many more, if we seriously want to deny access to such types of files. But for the sake of simplicity, I'm just going to add the mp3 and mp4 extensions. I use acl, and I call it media_files; this is the name of the ACL. Now I'm going to use a type that we haven't used before, called urlpath_regex. urlpath_regex uses regular expressions. Regular expressions are a vast topic of their own; they are a way of selecting strings of text using a predefined pattern. You can search for "regular expressions" and have a look at how they are constructed. However, I'm going to use a very simple one; it won't be hard even if you haven't seen regular expressions before. First I have to add -i, which specifies case-insensitive regular expression matching, so that an extension written in capitals, such as .MP3, is matched as well. Now the regular expression itself: I specify anything that ends in mp3 or, as written with the pipe sign, mp4. This is just the ACL; I'll add an http_access rule in a moment to deny access to those files.

Have a look at this regular expression. The backslash indicates that I want a literal dot. That is because the dot is itself a regular expression character: it stands for any character. But here, I'm not interested in having just any character precede mp3.
I might, for example, have something like sometextmp3 inside the URL, and I don't want that file to be denied; I want anything that has .mp3 to be denied. To do that, I need the literal meaning of the dot. As we just mentioned, the dot has a special meaning in regular expressions, but I don't need that special meaning, so I escape it with the backslash. The backslash tells the regular expression engine to treat the dot as just a dot. Then I have a group, and the group contains mp3 or mp4. As I mentioned, I can add as many extensions as I need: mov, avi, wmv, vob, whatever extensions I want to select. I say select because this is not a denial rule, just a selection rule: you select the extensions that you want in your rule, and you are not denying or allowing anything yet.

Notice that there are two ACL types that apply a regular expression to the URL. We have this one, and we have another one without the path part, just url_regex. The difference is huge. urlpath_regex searches for mp3 or mp4 (technically speaking, .mp3 or .mp4) inside the URL path. If I have, for example, example.com/something/another-thing/file.mp3, then everything starting from /something is the URL path. With url_regex, the match runs against everything from the start of the URL to the end. The reason I'm using urlpath_regex is that I want to ensure that .mp3 is present in the URL path. So let's comment this out.
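The pattern described above can be tried out in Python. This is a sketch under assumptions: the exact expression is taken to be \.(mp3|mp4)$ with case-insensitive matching, Python's re module stands in for Squid's regex engine, and urlsplit(...).path is only an approximation of what urlpath_regex inspects (query strings are left out of the example on purpose).

```python
import re
from urllib.parse import urlsplit

# Assumed form of the ACL: acl media_files urlpath_regex -i \.(mp3|mp4)$
MEDIA_FILES = re.compile(r"\.(mp3|mp4)$", re.IGNORECASE)

# Same pattern without the end anchor, used to show why the path matters.
UNANCHORED = re.compile(r"\.(mp3|mp4)", re.IGNORECASE)


def path_matches(url: str) -> bool:
    """Apply the pattern to the path part only, like urlpath_regex."""
    return MEDIA_FILES.search(urlsplit(url).path) is not None


for url in [
    "http://example.com/something/another-thing/file.mp3",
    "http://example.com/music/Track.MP3",   # matched thanks to -i
    "http://example.com/sometextmp3",       # no literal dot, not matched
    "http://www.mp3.com/",                  # path is just "/", not matched
]:
    print(url, path_matches(url))

# Matching against the whole URL would also hit the host name www.mp3.com.
print(UNANCHORED.search("http://www.mp3.com/") is not None)
```

The last line shows the practical difference between the two ACL types: a pattern applied to the whole URL can match inside the host name, while the path-only check leaves a site such as mp3.com reachable.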
We are only interested in .mp3 occurrences that lie inside the URL path, specifically those that come at the end of the URL; to be most specific, we want to match a URL like this one. I also have to add the necessary http_access rule, with the comment deny access to media files: http_access deny media_files, which denies the ACL called media_files. I have also re-enabled access to the search engines, because I'm going to use Google to search for MP3 files, just to make sure that I cannot download them. A simple trick is to search for inurl:mp3; you'll find lots of sites that offer MP3 files. If we try to download any of those, as you can see: "The following error was encountered while trying to retrieve the URL" so-and-so .mp3, access denied. However, if you go to a website like mp3.com, it loads without any problems. But if you try to download an MP3 file, from this website or from any other, it is denied, as we have just seen.

One last thing before we close this session is the cache. We have already mentioned that proxy servers can be used both for boosting performance and for protecting the internal network. We've seen how Squid can be used to control internet access. Let's see how it can make web pages load faster. That is pretty easy. In the configuration file, search for a directive called cache_dir. cache_dir is commented out by default in Squid's default configuration file. All you have to do is uncomment it and specify the directory in which you want to store your cache. By default it is this directory; you can change it however you like, and you can change any of those three numbers. First, ufs is the storage format that Squid uses to store cached web content. Then have a look at those numbers.
The first number is the maximum amount of space, in megabytes, that Squid will use for caching. Here I have 100 megabytes; let's say I want to make it 200, because I need to store more cache data. The second number, 16, is the number of first-level directories that Squid creates inside the cache directory. If your file system is short on inodes, or limited in the number of directories it can hold, you may want to keep that number low. Let's keep it at 16; 16 sounds fine. The last number is the number of subdirectories that each of those directories holds. So we have 16 directories, and underneath each of them 256 subdirectories; I can make this 512, for example. I'm also going to change the cache directory to /var/cache/squid, then save and exit.

Before we can test our configuration, we need to do a couple of things. First, we need to ensure that this directory, /var/cache/squid, exists. It does not, so let's create it with mkdir /var/cache/squid, using sudo of course. Second, we need to make the Squid user the owner of this directory. The Squid user is called proxy, so I use sudo chown with proxy as the user name and proxy as the group on /var/cache/squid, and -R to make it recursive. Then I need to stop Squid, like this, and run the command squid -z. This command creates the missing swap directories and cache directories inside the cache directory that we specified in the configuration file. I press Enter, and as you can see, it has created the missing directories in our cache directory. Now I can run sudo systemctl start squid.
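Looking back at the cache_dir line edited above, the meaning of its fields can be made concrete with a small Python sketch. The CacheDir type and parse_cache_dir function are hypothetical helpers written for this illustration; they are not part of Squid.

```python
from typing import NamedTuple


class CacheDir(NamedTuple):
    storage_type: str   # e.g. ufs
    path: str           # directory that holds the cache
    size_mb: int        # maximum cache size in megabytes
    level1_dirs: int    # first-level directories inside the cache directory
    level2_dirs: int    # subdirectories under each first-level directory


def parse_cache_dir(line: str) -> CacheDir:
    """Split a 'cache_dir <type> <path> <MB> <L1> <L2>' line into fields."""
    parts = line.split()
    if len(parts) != 6 or parts[0] != "cache_dir":
        raise ValueError(f"unexpected cache_dir line: {line!r}")
    _, storage_type, path, size, level1, level2 = parts
    return CacheDir(storage_type, path, int(size), int(level1), int(level2))


# The values chosen in the walkthrough above.
entry = parse_cache_dir("cache_dir ufs /var/cache/squid 200 16 512")
total_leaf_dirs = entry.level1_dirs * entry.level2_dirs

print(entry)
print("second-level directories in total:", total_leaf_dirs)
```

With 16 first-level directories and 512 subdirectories under each, the layout comes to 16 × 512 = 8192 second-level directories, which is the figure to keep in mind on a file system with few free inodes.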
Now I open the browser and do a little browsing: Wikipedia, for example, with a few pages inside it, and then oracle.com. Just a little browsing. If I refresh this page, you'll notice that it loads a little faster than it did the first time. Refresh this one, and this one: they load a little faster, if you notice. Refresh mp3.com as well. And if we go to our /var/cache/squid and run sudo du -sh (du is for disk usage), you can see that I already have 33 megabytes of data. These are the directories that Squid created for us, and they hold the cached files: images, CSS files, JavaScript files. Everything that we visit through the proxy server is cached here and is retrieved directly from this directory when we access it again. That is why we are using Squid; that is why we are using a caching server.