An Introduction to HTTP: The Foundation of the Web
The HTTP protocol is everywhere. You used it to receive this blog post, and you use it to pay your bills on your bank’s site or to look something up. The modern web depends on it; it’s the foundation underneath it all. HTTP, at its core, is simple and runs on just two ideas: getting data and sending data. That’s it. HTTP stands for “HyperText Transfer Protocol”, which we can break down as follows. Hypertext means web pages that link to one another, transfer is the act of moving data from one computer to another, and a protocol is an agreed way for two computers to communicate. Today we’re going to be demystifying this protocol. In the future, I plan on posting more about the different aspects of HTTP, so I hope this post will create a strong foundation we can build on. Let’s get to it!
The Basics of How the Internet Works
A client is anything that requests information from a different computer, which we call a server. Your browser acted as a client when you requested this website and does so whenever you hit enter in the address bar. The server that my site is on can be identified by a name, called a domain name, and that name is the main part of a URL. For example, when you type wikipedia.org into your browser you are sending a request for data to Wikipedia’s servers. How the names you type in your browser relate to computers far away is a whole other topic, which involves DNS resolution and TCP/IP. For now, all you need to know is that you can use URLs to request information from another computer that can be close by or far away.
Let’s use an analogy so we can gain a deeper understanding of this. You can think of the Internet like the mailing system. If you were writing a letter to a friend, asking for some information, what would you do? First, you’d probably write down the address where you want your letter to go, which maps nicely onto the idea of URLs. Then, you’d drop it off at a mailbox. The mailing system figures out a way to get it to its destination; that’s the DNS resolution and TCP/IP part we just mentioned. We don’t really care that much in this post about HOW it gets to its destination, we just want to know that it does. Before you sent the letter, you also wrote down your return address, the language you speak, and what data it is you’re looking for. In the HTTP world, that would be the equivalent of the headers and body you include in a request, which we’ll discuss later. Eventually a letter makes its way back to you with the information you requested! We’d call that a response. Like sending letters between two friends, HTTP is made up of requests and responses. You send a request, you get a response. There’s a newer version of HTTP called HTTP/2 that can stretch this model (for example, a server can push data before you ask for it), but request and response remain the foundation and that’s the aspect we’ll be focusing on today.
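To make the letter-and-reply idea concrete, here is a minimal sketch in Python that sends one request and reads one response. It assumes nothing about my actual server: `HelloHandler` and `round_trip` are made-up names, and the "server" is a throwaway one started on your own machine using only the standard library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): answers every GET with a short text body."""

    def do_GET(self):
        body = b"Hello!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def round_trip(path="/"):
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)        # the request: our letter
        response = conn.getresponse()    # the response: the reply
        body = response.read().decode("utf-8")
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(round_trip())
```

The client never needs to know how the bytes travel; it only writes a request and waits for the matching response.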
Let’s go back to our mailing analogy for a second. You see, there are lots of things you can send through the mailing system. We sent letters, but you can also send big packages, small packages, verified mail, etc. Just like with the mailing system, HTTP isn’t the only method we can use to transfer data from computer to computer. Another method you’re probably familiar with is email, which uses its own protocols. To continue with our analogy, we can think about email as using the same mailing system, but sending packages instead of letters.
HTTP in Action
Before we begin breaking down HTTP and how it all works, I wanted to show you an example of what an HTTP response looks like so you know what to expect. It’s okay not to fully understand what’s going on; all we want to do is get your feet wet with HTTP.
Response:
HTTP/1.1 200 OK
date: Sun, 08 Dec 2019 10:00:00 GMT
content-type: text/html; charset=UTF-8
server: nico-computer

Hello!
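If you’d like to poke at that response yourself, the sketch below splits it into its parts: the status line, the headers, and the body. The function name `parse_response` is my own invention for illustration, and it only handles simple responses like this one, not everything the protocol allows.

```python
# The example response, with the line endings HTTP/1.1 uses on the wire.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "date: Sun, 08 Dec 2019 10:00:00 GMT\r\n"
    "content-type: text/html; charset=UTF-8\r\n"
    "server: nico-computer\r\n"
    "\r\n"
    "Hello!"
)


def parse_response(raw):
    """Split simple HTTP/1.1 response text into version, code, reason, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")      # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")       # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    print(parse_response(RAW_RESPONSE))
```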
Verbs
With a general idea of what’s going on, we can now dive into the nitty gritty of the protocol. HTTP revolves around a handful of different verbs. The main ones are GET, POST, PUT, PATCH and DELETE. These are known as HTTP verbs (the specification calls them methods). The two most commonly used verbs are GET and POST.
GET Request
A GET request means “I want to get data from the server, but don’t worry, I’m not changing anything”.
You use this when you enter a URL and load a website. A GET request is always supposed to be idempotent. This means that the effect on the server will be the same no matter how many times you run it. You can reload this page as many times as you like and the effect will be the same: you’ll get the latest version of the website. You may get different data, but you loading my website doesn’t change anything on my server.
POST Request
When you send a POST request you’re saying, “I want to put data on the server”.
You’ve made this sort of request whenever you’ve signed up for a newsletter or made an account on a website. It’s usually used for uploading new data, but that isn’t always the case. This request is not idempotent: if you request to subscribe to a newsletter again, it’s going to subscribe you twice! This is also why, when you’re doing certain things online, you may come across a page that says “Please do not press back!”. If you pressed back you would be re-running the request, which could have nasty consequences, like charging you twice for the same item.
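Here’s a small sketch of that difference. To keep it simple, there’s no real networking: `NewsletterServer` is a made-up, in-memory stand-in for a server, and the `/subscriptions` path is an assumption for illustration only.

```python
class NewsletterServer:
    """Hypothetical in-memory stand-in for a server; not a real HTTP API."""

    def __init__(self):
        self.subscriptions = []

    def handle(self, verb, path, body=None):
        if verb == "GET" and path == "/subscriptions":
            # Reading only: the server's state is left untouched.
            return 200, list(self.subscriptions)
        if verb == "POST" and path == "/subscriptions":
            # Writing: every call appends another entry.
            self.subscriptions.append(body)
            return 201, body
        return 404, None


def demo():
    server = NewsletterServer()
    server.handle("POST", "/subscriptions", "me@example.com")
    for _ in range(3):
        server.handle("GET", "/subscriptions")   # repeat freely, nothing changes
    after_gets = len(server.subscriptions)
    server.handle("POST", "/subscriptions", "me@example.com")  # repeat = duplicate
    after_second_post = len(server.subscriptions)
    return after_gets, after_second_post


if __name__ == "__main__":
    print(demo())
```

Repeating the GET leaves one subscription in place, while repeating the POST adds a second copy, which is exactly the “subscribed twice” problem.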
Other HTTP verbs
There are a couple of other ways to add/update data on the server too.
The PUT verb means “I want to update data on the server”. This is used when you want to update information that already exists, like changing your username on a website.
Finally, with the DELETE verb you’re essentially saying “I want to delete this piece of information on the server”. If you wanted to delete your account on a website it might send a DELETE request. Like with verbs in languages, HTTP verbs describe the action that’s taking place.
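As a rough illustration, this is how you could describe requests with each of these verbs using Python’s standard `urllib.request` module. Nothing is actually sent: the `/account` path and the usernames are made up, and `build` is just a helper name I picked for this sketch.

```python
import urllib.request

BASE = "http://www.example.com"  # placeholder host; nothing is sent anywhere


def build(verb, path, data=None):
    """Describe a request with the given verb without sending it."""
    return urllib.request.Request(BASE + path, data=data, method=verb)


# Hypothetical account operations, one per verb.
requests_to_make = [
    build("GET", "/account"),                         # read the account
    build("POST", "/account", b"username=old-name"),  # create it
    build("PUT", "/account", b"username=new-name"),   # update it
    build("DELETE", "/account"),                      # remove it
]

for req in requests_to_make:
    print(req.get_method(), req.full_url, req.data)
```

The URL stays the same in every case; only the verb (and sometimes the body) tells the server which action you mean.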
Interacting with HTTP
It’s not enough to simply send an HTTP verb in your request to a server. Without any more information, the server is going to be confused about what you want! Servers and clients communicate with each other from all around the world, so you need to be specific. At a minimum, your browser will usually specify which browser it is (your user agent), your language, etc. This information all goes in the “header” part of your request. Headers specify how and what data you want.
The body of an HTTP request is used to send additional information. Let’s say we have a todo website. Our HTTP request to add a todo item might look something like this:
POST www.example.com/add-todo-item
Content-type: text/plain

todo-item=clean room
Let’s break it down. First we have our HTTP verb, in this case it’s POST, which means we’re adding something to the server. Then we have the website we want to send our request to. (A real request splits this into a path on the first line and a separate Host header, as you’ll see in the longer example further down, but the idea is the same.) After that, we see our first header: “content-type”. The “content-type” header describes the format of the data in the body of the message. In this request, we’re just sending some basic text. There are tons of different ways you can format the information you’re sending or receiving. Some common formats include JSON and XML.
The blank line after the headers separates them from the body. The body is where our main data can be found. In it, we’re just sending the text “todo-item=clean room”. That’s just an example and won’t work for all servers; it only works if the server expects data in that format. Still, that’s basically the simplest body you can get, outside of not sending anything. There you go, you can now understand how to break down an HTTP request!
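To see those pieces as a program would, here’s a minimal sketch that assembles the todo request as text and then takes it apart again. It writes the request the way HTTP/1.1 does on the wire (path on the first line, a `Host` header, and a `Content-Length` header), which is slightly more than the simplified version above. `build_request` is a hypothetical helper, and the `key=value` body format is only an assumption about what our imaginary todo server expects.

```python
def build_request(verb, path, host, headers, body=""):
    """Assemble HTTP/1.1 request text: request line, headers, blank line, body."""
    lines = [f"{verb} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    if body:
        # Tell the server how many bytes of body to expect.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


raw = build_request(
    "POST",
    "/add-todo-item",
    "www.example.com",
    {"Content-Type": "text/plain"},
    "todo-item=clean room",
)

# Everything before the first blank line is headers; everything after is body.
head, _, body = raw.partition("\r\n\r\n")

# Assumed convention for this imaginary server: a single "key=value" pair.
key, _, value = body.partition("=")

if __name__ == "__main__":
    print(head)
    print("body:", key, "->", value)
```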
Let’s take a look at a more complex example. Even a basic HTTP request has loads of headers. Look at how many I needed just to load this website!
GET / HTTP/1.1
Host: www.nicowil.me
User-Agent: Mozilla
Accept: text/html,application/xhtml+xml,application/xml;
Accept-Language: en-US
Accept-Encoding: gzip
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers

<!DOCTYPE html....> (the rest of the site would be here)
Each header has a different effect and there are tons of them. HTTP, although simple at its core, has loads of features. You can specify headers in order to cache data, authenticate, send and receive cookies, plus so much more. Even as it gets more complicated, it still falls back onto its most basic features like headers, which is why it’s so important to understand them.
HTTP Responses - Status Codes
We’ve spoken in depth about the request itself, but we’ve barely mentioned the response that a server will return. Just like a request, it has headers, a body, etc. Responses have an extra special value too though, known as a status code, which is just a 3 digit number. Status codes vary greatly and they tell the client how the request was dealt with. There are entertaining status codes too. On April 1st, 1998, the IETF, which is the group that writes a ton of different protocols, made a joke about there being a status code 418 I'm a teapot. This status code would be sent if a server refused to make coffee because it’s a teapot.
Now although it’s a joke, the number does make sense given the context. Why? Well you see, each status code falls into a range depending on how the request was dealt with. Was it successful? If that’s the case, you’ll get a status code in the 200s. Did the client ask for something it shouldn’t have? Then you’ll receive a number somewhere in the 400s, which is where a teapot turning down a coffee request belongs. If the server itself ran into trouble, you’ll get something in the 500s. A status code you may have seen before is 404, which means that whatever you’ve requested doesn’t exist. You can check out all the different status codes here and I highly recommend you take a look; it’s incredible to see how many different cases are covered.
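The “range” idea is easy to sketch in code: the first digit of the status code tells you the general outcome. The helper name `status_class` is mine, and the labels are short paraphrases rather than official wording. Python’s standard `http.HTTPStatus` is used at the end only to look up a reason phrase.

```python
from http import HTTPStatus


def status_class(code):
    """Name the range a status code falls into, based on its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


if __name__ == "__main__":
    for code in (200, 404, 418, 500):
        print(code, status_class(code))

    # The standard library also knows the usual reason phrases.
    print(HTTPStatus(404).phrase)
```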
Conclusion
With that, we’re done! We now have a basic understanding of what goes on behind the scenes when you open up a website and know a bit more about how information is transferred! This is just the basics; as the bread and butter of the web, HTTP has got a lot of features. There’s so much more hiding beneath the surface, so let’s keep learning. Until next time!