When you enter a URL, what exactly happens in the background?

As a software developer, you should have a complete, layered understanding of how web applications work, including the technologies they rely on: browsers, HTTP, HTML, web servers, request handlers, and so on.

This article will take a closer look at what happens in the background when you enter a URL.

1. You enter a URL into the browser

2. The browser looks up the IP address of the domain name

The first step in navigation is to find out the IP address of the domain name you are visiting. The DNS lookup process is as follows:

Browser cache – The browser caches DNS records for some time. Interestingly, the operating system does not tell the browser the time-to-live of each DNS record, so the browser caches records for a fixed duration of its own choosing (ranging from 2 to 30 minutes, depending on the browser).

System cache – If the record is not found in the browser cache, the browser makes a system call (gethostbyname on Windows), which consults the operating system's own cache.

Router cache – Next, the query is sent on to your router, which usually has its own DNS cache.

ISP DNS cache – The next place checked is the cache of your ISP's DNS server, where the record can usually be found.

Recursive search – If the record is still not found, your ISP's DNS server performs a recursive search, from the root nameservers, through the .com top-level nameservers, to Facebook's nameservers. Usually the DNS server already has the .com nameservers in its cache, so a query to the root nameservers is not necessary.

The recursive DNS lookup is illustrated in the following figure:

[Figure: an example of theoretical DNS recursion]
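The lookup order described above can be sketched as a chain of caches consulted in turn, with the recursive search as the last resort. This is only an illustration, not how any real browser or resolver is implemented: the `DnsCache` class, the `resolve` function, the TTL values, and the IP address (taken from the 192.0.2.0/24 documentation range) are all assumptions made for the example.

```python
import time


class DnsCache:
    """A tiny host -> (ip, expiry) cache; hypothetical, for illustration only."""

    def __init__(self, name, ttl_seconds):
        self.name = name
        self.ttl_seconds = ttl_seconds
        self._records = {}

    def get(self, host, now):
        record = self._records.get(host)
        if record is None:
            return None
        ip, expires_at = record
        if now >= expires_at:
            # The record is stale, so drop it and report a miss.
            del self._records[host]
            return None
        return ip

    def put(self, host, ip, now):
        self._records[host] = (ip, now + self.ttl_seconds)


def resolve(host, caches, recursive_lookup, now=None):
    """Check each cache in order; fall back to the recursive lookup on a full miss.

    Returns (ip, source), where source names the layer that answered.
    """
    now = time.time() if now is None else now
    for index, cache in enumerate(caches):
        ip = cache.get(host, now)
        if ip is not None:
            # Fill the faster caches that missed so the next lookup is shorter.
            for earlier in caches[:index]:
                earlier.put(host, ip, now)
            return ip, cache.name
    ip = recursive_lookup(host)
    for cache in caches:
        cache.put(host, ip, now)
    return ip, "recursive search"


# Cache layers in the order the article lists them (the TTLs are made up).
caches = [
    DnsCache("browser cache", ttl_seconds=120),
    DnsCache("system cache", ttl_seconds=300),
    DnsCache("router cache", ttl_seconds=600),
    DnsCache("ISP DNS cache", ttl_seconds=3600),
]

# Stand-in for the real recursive search; 192.0.2.10 is a documentation address.
fake_zone = {"facebook.com": "192.0.2.10"}

first = resolve("facebook.com", caches, fake_zone.__getitem__, now=0)
second = resolve("facebook.com", caches, fake_zone.__getitem__, now=60)
third = resolve("facebook.com", caches, fake_zone.__getitem__, now=200)
print(first, second, third)
```

The first call misses every cache and falls through to the recursive search; the second is answered by the browser cache; by the third call the browser's short-lived record has expired, so the system cache answers instead.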

One of the things that's worrying about DNS is that entire domains like wikipedia.org or facebook.com appear to correspond to just a single IP address. Fortunately, there are several ways to eliminate this bottleneck:

Round-robin DNS – The DNS lookup returns multiple IP addresses rather than just one. For example, facebook.com actually maps to four IP addresses.

Load balancer – A piece of hardware that listens on a particular IP address and forwards requests to the servers in a cluster. Major sites typically use expensive, high-performance load balancers.

Geographic DNS – Improves scalability by mapping a domain name to different IP addresses depending on the client's geographic location. This works well for serving static content, because the different servers do not have to keep shared state in sync.

Anycast – A routing technique in which a single IP address maps to multiple physical hosts. Its drawback is that anycast does not fit well with TCP, so it is rarely used in that scenario.

Most DNS servers themselves use anycast to achieve efficient, low-latency DNS lookups.
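As a rough illustration of round-robin DNS, the sketch below rotates a fixed list of addresses so that each query sees a different address first. The `RoundRobinRecord` class and the four addresses (from the 192.0.2.0/24 documentation range) are hypothetical; real DNS servers implement rotation in their own ways.

```python
from collections import deque


class RoundRobinRecord:
    """Hands out the same set of addresses, rotated by one on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        # Return the current ordering, then rotate so the next client
        # sees a different address first.
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


# Hypothetical addresses from the documentation range 192.0.2.0/24.
record = RoundRobinRecord(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])
first_choices = [record.answer()[0] for _ in range(5)]
print(first_choices)
```

Clients that simply connect to the first address in the answer end up spread across all four hosts.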

3. The browser sends an HTTP request to the web server

Dynamic pages like the Facebook homepage expire from the browser cache very quickly, or even immediately, so the browser cannot simply read them from the cache.

So, the browser will send the following request to the Facebook server:

GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

The GET request names the URL to fetch: "http://facebook.com/". The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
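To make the structure of such a request concrete, here is a small sketch that splits a raw request into its request line and headers. The `parse_request` helper is a hypothetical name, and the parsing is deliberately simplified; a real HTTP parser handles many more cases.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request head into its request line and headers."""
    lines = raw.strip().splitlines()
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # A blank line ends the header section.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw_request = """\
GET http://facebook.com/ HTTP/1.1
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
"""

method, target, version, headers = parse_request(raw_request)
print(method, target, version)
print(headers["connection"])
```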

The request also includes the cookies that the browser has stored for this domain. As you probably already know, cookies are key-value pairs that track the state of a website across different page requests. Here the cookies store the name of the logged-in user, a secret value assigned to the user by the server, some of the user's settings, and so on. Cookies are stored in a text file on the client and are sent to the server with every request.
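The key-value nature of cookies is easy to see by parsing a cookie header with Python's standard `http.cookies` module. The cookie values below are made up for the example, since the real ones are elided in the request above.

```python
from http.cookies import SimpleCookie

# Made-up values; the article's real cookie values are elided.
cookie_header = "locale=en_US; c_user=12345; theme=dark"

jar = SimpleCookie()
jar.load(cookie_header)

# Each cookie is a named "morsel"; keep just the name -> value pairs.
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```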

There are many tools for viewing raw HTTP requests and their responses. The author prefers Fiddler, but there are other tools, such as Firebug. These tools are a great help when optimizing a website.

In addition to GET requests, there are POST requests, which are often used to submit forms. A GET request passes its parameters via the URL (e.g. http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just after the headers.
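The difference between passing parameters in the URL and passing them in the request body (as a form submission, normally an HTTP POST, does) can be sketched with the standard `urllib` module. Nothing is sent over the network here; the requests are only constructed. Whether the robozzle.com page actually accepts a POST is not known, so treat the second request purely as an illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"id": 85}

# GET: the parameters travel in the URL's query string.
get_request = Request("http://robozzle.com/puzzle.aspx?" + urlencode(params))

# POST: the same parameters travel in the request body instead.
post_request = Request(
    "http://robozzle.com/puzzle.aspx",
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```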

The trailing slash in "http://facebook.com/" matters. In this case, the browser can safely add the slash itself. For an address like "http://example.com/folderOrFile", however, the browser cannot add a slash automatically, because it does not know whether folderOrFile is a folder or a file. The browser requests the address without the slash, and the server responds with a redirect, which costs an unnecessary round trip.

4. The Facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT; path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0

The server responds to the browser with a 301 Permanent Redirect response, so that the browser visits "http://www.facebook.com/" instead of "http://facebook.com/".
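How a client reacts to such a response can be sketched as a small loop that keeps following the Location header until a non-redirect status arrives. To stay runnable without a network, the sketch uses a hypothetical in-memory `FAKE_SERVER` table that mirrors the 301 response above; `follow_redirects` and the hop limit are likewise assumptions for the example, not a description of any particular browser.

```python
from urllib.parse import urljoin

# A stand-in for the network: URL -> (status, headers). Hypothetical data
# that mirrors the 301 response shown above.
FAKE_SERVER = {
    "http://facebook.com/": (301, {"Location": "http://www.facebook.com/"}),
    "http://www.facebook.com/": (200, {}),
}

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=5):
    """Return the list of URLs visited until a non-redirect response arrives."""
    visited = [url]
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status not in REDIRECT_STATUSES:
            return visited
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
        visited.append(url)
    raise RuntimeError("too many redirects")


chain = follow_redirects("http://facebook.com/", FAKE_SERVER.__getitem__)
print(chain)
```

The hop limit guards against redirect loops, which would otherwise keep the client requesting forever.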

Why does the server have to redirect instead of directly sending the web page content that the user wants to see? This question has many interesting answers.

One reason has to do with search engine rankings. If a page has two addresses, such as http://www.igoro.com/ and http://igoro.com/, search engines may treat them as two different sites, each with fewer incoming links and therefore a lower ranking. Search engines understand permanent redirects (301), so they combine the incoming links to the addresses with and without www into a single site ranking.

Another reason is that multiple addresses for the same content are less cache-friendly: when a page has several names, it may end up in the cache several times.

5. The browser follows the redirect

Now the browser knows that “http://www.facebook.com/” is the correct address to visit, so it sends another GET request:

GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com

The header information has the same meaning as in the previous request.

6. The server "processes" the request

The server receives the GET request, processes it, and returns a response.

On the surface this seems like a straightforward task, but a lot of interesting things happen along the way, even for a website as simple as the author's blog, let alone a heavily visited one like Facebook.

Web server software
Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should process it. A request handler is a program (written in ASP.NET, PHP, Ruby, ...) that reads the request and generates the HTML for the response.

In the simplest case, the request handlers are stored in a file hierarchy that mirrors the structure of the site's URLs. For example, the address http://example.com/folder1/page1.aspx maps to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that addresses are mapped to request handlers manually, so that the public address of page1.aspx could be http://example.com/folder1/page1.
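The mapping from address to handler file can be sketched in a few lines. The `handler_path` function and the `ROUTES` table are hypothetical names for this example, and real web servers such as IIS and Apache express the same idea through their own configuration rather than code like this.

```python
import posixpath
from urllib.parse import urlsplit

DOC_ROOT = "/httpdocs"

# Manually configured routes (hypothetical), checked before the file mapping.
ROUTES = {"/folder1/page1": "/folder1/page1.aspx"}


def handler_path(url):
    """Map a request URL to the file that would handle it."""
    path = urlsplit(url).path or "/"
    path = ROUTES.get(path, path)
    # Normalise the path so that ".." segments cannot escape the document root.
    safe = posixpath.normpath("/" + path.lstrip("/"))
    return DOC_ROOT + safe


print(handler_path("http://example.com/folder1/page1.aspx"))
print(handler_path("http://example.com/folder1/page1"))
```

Both addresses resolve to the same file: the first through the default file-hierarchy mapping, the second through the manually configured route.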

Request processing
The request handler reads the request, its parameters, and its cookies. It reads and possibly updates some data stored on the server. The request handler then generates an HTML response.

All dynamic websites face one interesting difficulty: how to store data. Most small websites use a single SQL database, but websites that store large amounts of data and/or receive a lot of traffic have to find a way to distribute the database across multiple machines. Solutions include sharding (splitting a table across multiple databases based on the primary key), replication, and simplified databases with weakened consistency semantics.
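Sharding by primary key can be illustrated with a stable hash that assigns each key to one of several databases. The shard names and the `shard_for` function are hypothetical, and this is only one simple scheme among many; it says nothing about how Facebook or any other site actually partitions its data.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(primary_key):
    """Pick a shard from a stable hash of the primary key."""
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the built-in hash() is avoided
    # because it is randomised per process for strings.
    bucket = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[bucket]


for user_id in (1, 2, 3):
    print(user_id, shard_for(user_id))
```

One known weakness of this simple modulo scheme is that changing the number of shards moves most keys to a different shard; consistent hashing is a common way to reduce that churn.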

Delegating work to a batch job is a cheap technique for keeping data up to date. For example, Facebook has to update its news feed promptly, but the data behind the "People You May Know" feature may only need to be updated nightly (this is the author's guess; how the feature is actually implemented is unknown). Batch updates leave some less important data stale, but they can make the update work faster and simpler.

7. The server sends back an HTML response

This is the response generated and returned by the server:

HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3Tn@[...]

The entire response is 35 kB, most of it in the byte blob at the end, which has been trimmed here.

The Content-Encoding header tells the browser that the response body has been compressed with the gzip algorithm. After decompressing the blob, you can see the expected HTML:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en" id="facebook" class="no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
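The compression round trip can be reproduced with Python's standard `gzip` module. The HTML below is a short made-up document rather than Facebook's actual page, and the header dictionary is a simplified stand-in for a real response.

```python
import gzip

html = b"<!DOCTYPE html><html><head><title>Example</title></head></html>"

# What the server does before sending: compress the body.
body_on_the_wire = gzip.compress(html)

# What the browser does on seeing "Content-Encoding: gzip": decompress it.
response_headers = {"Content-Encoding": "gzip"}
if response_headers.get("Content-Encoding") == "gzip":
    decoded = gzip.decompress(body_on_the_wire)
else:
    decoded = body_on_the_wire

print(decoded == html)
print(body_on_the_wire[:2])  # gzip streams start with the bytes 0x1f 0x8b
```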

In addition to compression, the headers specify whether and how to cache the page, which cookies to set (none in this response), privacy information, and so on.

Note that the Content-Type header is set to "text/html". This tells the browser to render the response as HTML rather than, say, download it as a file. The browser decides how to interpret the response based on this header, but it also considers other factors, such as the extension in the URL.
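The decision to render or download based on the content type can be sketched as follows. The header is parsed with the standard `email.message.Message` class; the `should_render` rule is a deliberately simplified assumption, since real browsers weigh more signals than the media type alone.

```python
from email.message import Message


def describe_content_type(header_value):
    """Return (media_type, charset) parsed from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_type(), message.get_content_charset()


def should_render(header_value):
    # A simplified rule for illustration; real browsers weigh more signals.
    media_type, _ = describe_content_type(header_value)
    if media_type in {"text/html", "text/plain"}:
        return True
    return media_type.startswith("image/")


print(describe_content_type("text/html; charset=utf-8"))
print(should_render("text/html; charset=utf-8"))
print(should_render("application/octet-stream"))
```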

8. The browser starts displaying HTML

The browser starts rendering the page even before it has finished receiving the entire HTML document.

9. The browser sends requests for objects embedded in the HTML

As the browser renders the HTML, it notices tags that require fetching content from other URLs. The browser sends a GET request for each of these files.

Here are a few of the URLs fetched during a visit to facebook.com:

Images
http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
CSS Style Sheets
http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
JavaScript files
http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js

Each of these URLs goes through a process similar to the one the HTML page went through: the browser looks up the domain name in DNS, sends a request to the URL, follows redirects, and so on.
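Finding the embedded objects amounts to scanning the HTML for tags that reference other URLs. The sketch below does this with the standard `html.parser` module. The `ResourceCollector` class is a hypothetical name, the page is made up (with placeholder static.example.com URLs), and a real browser recognises many more kinds of references than these three tags.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects the URLs a page would trigger additional GET requests for."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet":
            if attributes.get("href"):
                self.urls.append(attributes["href"])


# A made-up page; the URLs are placeholders, not Facebook's real markup.
page = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css" />
<script src="http://static.example.com/app.js"></script>
</head><body><img src="http://static.example.com/logo.gif" /></body></html>
"""

collector = ResourceCollector()
collector.feed(page)
print(collector.urls)
```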

Unlike dynamic pages, however, static files can be cached by the browser. Some files can be read straight from the cache without contacting the server at all. The server's response includes expiration information for each static file, so the browser knows how long to cache it. In addition, each response may contain an ETag header, which works like a version number: if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
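Both caching ideas can be sketched briefly. The first function checks whether a cached copy is still within its lifetime. The second shows the server side of a conditional request, in which the client sends the ETag it holds in an If-None-Match header and the server replies 304 Not Modified with no body if that version is still current. The conditional-request mechanism is standard HTTP but is not spelled out in the article, and the function names, ETag values, and lifetimes are assumptions for the example.

```python
def is_fresh(stored_at, max_age_seconds, now):
    """True while the cached copy can be used without contacting the server."""
    return now - stored_at < max_age_seconds


def revalidate(request_headers, current_etag, body):
    """Server side of a conditional GET: reply 304 if the client's copy is current."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body


# Hypothetical values for illustration.
etag = '"v42"'
css = b"body { margin: 0 }"

print(is_fresh(stored_at=0, max_age_seconds=3600, now=600))
print(revalidate({"If-None-Match": etag}, etag, css)[0])
print(revalidate({"If-None-Match": '"v41"'}, etag, css)[0])
```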

Can you guess what "fbcdn.net" in the URLs stands for? A safe bet is "Facebook content delivery network". Facebook uses a content delivery network (CDN) to distribute static files such as images, style sheets, and JavaScript files. These files are therefore replicated across many CDN data centers around the world.

Static content often accounts for the bulk of a site's bandwidth, and it is easy to replicate across a CDN. Websites usually use a third-party CDN provider rather than run their own. For example, Facebook's static files are hosted by Akamai, the largest CDN provider.

For example, when you ping static.ak.fbcdn.net, you might get a response from an akamai.net server. Interestingly, if you ping again, a different server may respond, which shows the load balancing at work behind the scenes.

10. The browser sends an asynchronous (AJAX) request

In the spirit of Web 2.0, the client remains connected to the server after the page is displayed.

Take Facebook's chat feature as an example. It keeps in contact with the server to show, in near real time, which of your friends are online (avatar lit up) and which are offline (greyed out). To update these statuses, JavaScript code running in the browser sends an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request sent to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.

This pattern is often called "AJAX", which stands for "Asynchronous JavaScript and XML", although there is no particular reason why the server has to format its response as XML. For example, in response to asynchronous requests, Facebook returns snippets of JavaScript code.

Among other things, Fiddler lets you view the asynchronous requests sent by your browser. In fact, you can not only observe these requests passively but also modify and resend them. The fact that AJAX requests are this easy to spoof causes a lot of grief for developers of online games with scoreboards. (Obviously, please do not cheat that way.)

Facebook's chat feature provides an example of an interesting problem with AJAX: pushing data from the server to the client. Because HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see whether any new messages have arrived.

Long polling is an interesting technique for reducing server load in this situation. If the server has no new messages when polled, it simply does not respond right away and holds the request open. If a message for this client arrives within the timeout period, the server finds the outstanding request and returns the message as its response.
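The core of long polling is a request handler that waits for a message, up to a timeout, instead of answering immediately. The sketch below models that with a thread-safe queue standing in for the client's pending messages. The `long_poll` function and the timings are assumptions for the example, and a real chat server would hold an HTTP connection open rather than block on a local queue.

```python
import queue
import threading


def long_poll(inbox, timeout_seconds):
    """Hold the request open until a message arrives or the timeout passes."""
    try:
        return inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # Nothing arrived; the client is expected to poll again.


inbox = queue.Queue()

# No message arrives: the call returns None once the timeout expires.
print(long_poll(inbox, timeout_seconds=0.05))

# A message arrives while the request is outstanding: it is returned
# as soon as it shows up, well before the timeout.
threading.Timer(0.05, inbox.put, args=["hello"]).start()
print(long_poll(inbox, timeout_seconds=2))
```

Compared with polling every few seconds, the client makes far fewer requests while still receiving each message shortly after it arrives.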
