How Does Your Browser Find a Webpage?
Let’s see what happens when you type “holbertonschool.com” in your browser and press “Enter”.
When we want to access a web page from a computer or a mobile device, we first need a browser. A browser is a program such as Google Chrome, Safari, Microsoft Edge, Brave or Firefox that processes your request for a specific web page and renders it so that you can see it on your device.
The device from which we access the website is called a client; clients send requests through their web browsers to other programs or machines called servers. To show you the web page you want, your browser first needs to locate it. A location on the internet is identified by an IP (Internet Protocol) address. In the same way that we once used a phone directory to find a phone number listed by last name, we use IP addresses to find web pages on the internet. You can go to https://www.ipaddress.com/ip-lookup and type holbertonschool.com, and it will give you one or more IP addresses for that site. The good news is that you don’t have to do this yourself. Just as we stopped needing phone directories and now simply type the name of the person we want to call into our cell phones, your browser does the job for you: it takes the address that you type and finds its IP address. What you type is a URL (Uniform Resource Locator), which usually includes a protocol, a subdomain, a domain and a top level domain. For example, in the URL http://www.holbertonschool.com, we can identify four parts: the protocol, the subdomain, the domain and the top level domain.
Protocol: HTTP, Hypertext Transfer Protocol
Subdomain: www, World Wide Web
Domain: holbertonschool, the name that identifies the website
Top Level Domain: .com, which gives you information about the kind of web page you are about to visit; other examples are .edu, .org, et cetera.
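To make the four parts of the URL concrete, here is a minimal Python sketch that splits the example address with the standard library. The function name split_url is made up for this illustration, and the code assumes a simple host name whose top level domain is a single label (it would mislabel something like example.co.uk).

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into protocol, subdomain, domain and top level domain.

    Assumption: the host has at least two labels and a one-label
    top level domain such as .com or .org.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),  # empty string if there is none
        "domain": labels[-2],
        "top_level_domain": labels[-1],
    }


print(split_url("http://www.holbertonschool.com"))
# {'protocol': 'http', 'subdomain': 'www', 'domain': 'holbertonschool', 'top_level_domain': 'com'}
```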
With this URL, your browser first tries to find the IP address in its local memory (its cache). If it is not there, it has to look for it somewhere else. Typically the request goes to a DNS resolver, a server that converts domain names to IP addresses by looking in its own records or by asking other DNS servers. Which DNS resolver your browser uses is usually determined by your ISP (Internet Service Provider), the company that you pay to provide you with internet access.
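The "check the cache first, ask a resolver otherwise" idea can be sketched in a few lines of Python. This is only an illustration: CachingResolver and fake_lookup are hypothetical names, real caches also expire entries after a while, and the address 192.0.2.10 is a reserved documentation address, not the real address of the site. The system_lookup function shows how a real lookup could be done with socket.getaddrinfo, but it needs network access, so the demo swaps in a fake one.

```python
import socket


def system_lookup(hostname):
    """Ask the operating system's resolver for IP addresses (needs network)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Answer from a local cache when possible, otherwise do a lookup."""

    def __init__(self, lookup=system_lookup):
        self._lookup = lookup
        self._cache = {}

    def resolve(self, hostname):
        if hostname not in self._cache:  # cache miss: ask the resolver
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]


# Demo with a fake lookup so that the example runs offline.
calls = []


def fake_lookup(hostname):
    calls.append(hostname)
    return ["192.0.2.10"]  # made-up documentation address


resolver = CachingResolver(lookup=fake_lookup)
resolver.resolve("holbertonschool.com")  # first time: goes to the resolver
resolver.resolve("holbertonschool.com")  # second time: served from the cache
print(len(calls))  # 1
```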
Once your browser has the IP address, it has a second problem to solve: the protocol. The most common protocols for requesting web pages are HTTP and HTTPS; that last S stands for Secure. If the website requires a secure connection (HTTPS), the information that you send to and receive from the website is encrypted. You can tell that your connection is secure when you see a small lock in your web browser next to the URL. If you want to see an example of this, visit a bank’s website and you will see the small lock in the address bar (some browsers have also turned the address bar green). This means that the site has an SSL/TLS certificate; SSL stands for Secure Sockets Layer, and its modern successor is TLS (Transport Layer Security). Sometimes when you try to access a website whose certificate is not valid, your browser might show you a warning telling you, for example, that the certificate has expired, and advise you not to visit that web page. Another obstacle that your browser’s request has to deal with is the presence of a firewall. A firewall is another layer of protection, on your machine or on your network, that controls what information leaves or enters your machine. It does so by controlling which ports are allowed to send and receive information and, in some cases, by inspecting the content that goes through this wall.
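As a rough sketch of these two checks, the Python below works out which port and whether encryption would be used for a URL, and then applies a toy firewall rule. The names connection_plan and firewall_allows, and the rule that only ports 80 and 443 are open, are assumptions made for this example; real firewalls are configured very differently. Ports 80 and 443 are the standard defaults for HTTP and HTTPS.

```python
import ssl
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # standard default ports


def connection_plan(url):
    """Decide host, port and whether the connection will be encrypted."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return {
        "host": parts.hostname,
        "port": port,
        "encrypted": parts.scheme == "https",
    }


def firewall_allows(port, allowed_ports=frozenset({80, 443})):
    """Toy firewall rule (assumption): only the listed ports may be used."""
    return port in allowed_ports


plan = connection_plan("https://www.holbertonschool.com")
print(plan)  # {'host': 'www.holbertonschool.com', 'port': 443, 'encrypted': True}
print(firewall_allows(plan["port"]))  # True

# Python's default TLS settings insist on a valid certificate, which is
# the same kind of check that makes a browser show a certificate warning.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```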
Let’s return to our analogy with a traditional phone call. What happens when you dial a phone number and that number is already on a call? If it’s a traditional landline, the connection won’t be possible because the number is busy, and you will have to try again later. If you are trying to contact a call center, however, the number that you dialed can be redirected to different lines that can take your call. You will be added to a queue and wait until one of the operators is free. Something similar goes on when you try to access a web page. If you are trying to reach a simple web page that has a single IP address, it’s possible that the web page receives more traffic than it can handle and goes down. When a web page is hosted on one single server, that server is a SPOF (Single Point of Failure): if it stops working, the web page won’t be available. Because of this, a web page that has to deal with a lot of traffic needs its own version of a call center. That “call center”, which manages large numbers of requests and redirects them to redundant (identical) servers, is called a load balancer. The load balancer is a server that uses an algorithm (there are several) to choose the most appropriate server to respond to a request. Some algorithms take into account the state of the servers, the number of requests each one has to serve, their availability, etc.
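To give an idea of what such an algorithm can look like, here is a minimal Python sketch of two common ones: round robin, which hands requests to the servers in turn, and least connections, which picks the server that is currently handling the fewest requests. The class names and the server names web-01 and web-02 are invented for this example, and a real load balancer would also check whether each server is still alive.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Pick the server with the fewest requests in progress."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # On a tie, the server that was listed first wins.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call this when the server has finished a request."""
        self.active[server] -= 1


servers = ["web-01", "web-02"]  # hypothetical redundant servers

rr = RoundRobinBalancer(servers)
print([rr.pick() for _ in range(3)])  # ['web-01', 'web-02', 'web-01']

lc = LeastConnectionsBalancer(servers)
lc.pick()             # web-01 takes the first request
lc.pick()             # web-02 takes the second
lc.release("web-01")  # web-01 finishes its request
print(lc.pick())      # web-01
```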
The load balancer distributes requests to a web server, which stores static content like text, images or videos. It is possible, however, that the web page that you are requesting doesn’t just serve static content but also runs an application, for example when you log in to your bank account to make a transaction. In this case, the web server has to be connected to an application server. The application server stores and executes code, and it will probably need to be connected to a database. In the case of your online bank account, it has to access a database where your login information, your account information, etc. are stored in order to perform the transaction you requested.
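The division of work between web server, application server and database can be sketched as three small pieces of Python. Everything here is hypothetical: the paths, the accounts table and the user alice are invented for the illustration, the database is an in-memory SQLite database, and there is no login check, which a real bank application would obviously need.

```python
import sqlite3

# Static content that the web server can return directly (made-up example).
STATIC_FILES = {"/index.html": "<h1>Welcome</h1>"}


def make_database():
    """Create an in-memory database with one hypothetical account."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO accounts VALUES ('alice', 100)")
    db.commit()
    return db


def application_server(db, path):
    """Run code for dynamic pages; this one reads from the database."""
    owner = path[len("/balance/"):]
    row = db.execute(
        "SELECT balance FROM accounts WHERE owner = ?", (owner,)
    ).fetchone()
    if row is None:
        return 404, "unknown account"
    return 200, f"balance: {row[0]}"


def web_server(db, path):
    """Serve static files itself and pass dynamic requests along."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path.startswith("/balance/"):
        return application_server(db, path)
    return 404, "not found"


db = make_database()
print(web_server(db, "/index.html"))     # (200, '<h1>Welcome</h1>')
print(web_server(db, "/balance/alice"))  # (200, 'balance: 100')
```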