25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
What is a URL (web address)? What really happens when I "go to" a website?
The internet is a very large number of computers that are all connected together. When you "go to" a website, click on a link, or download a file, what you're really doing is asking one of those computers to send you a particular file.
In order for this to work, every file on every computer in the world must have a name that is guaranteed to be unique. The URL (Uniform Resource Locator) naming system gives a unique name to every file on every internet-connected computer in the world.
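A URL or web address consists of several parts. Here is an example URL:
http://www.website.com/directory/page.html?variable=value
Its parts are the protocol (http://), the domain name (www.website.com), the path and file name (/directory/page.html), and the query string (?variable=value).
Protocol (http://)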
When two computers exchange information, they have to agree on how to interpret the electrical signals they send to each other. They also have to agree on when the first computer should send information and when and how the other should reply. This set of rules is known as a protocol. An example of a protocol translated into English might look like this:
In the 1980s on CompuServe, we had a choice of protocols for file downloads: CompuServe A (CIS A), CompuServe B (CIS B), and xmodem. The xmodem protocol is still useful for some purposes, but a different set of protocols is now more common on the internet. The most common of them is HTTP, the HyperText Transfer Protocol, which is the set of transmission rules generally used for web browsing. There are other protocols, such as FTP, the File Transfer Protocol (ftp://), which is commonly used to transfer files not intended for viewing in a browser.
Thus, the first part of a URL tells the other computer which set of rules you want it to use for communicating with you. In this case it is HTTP.
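To make "a set of rules" a little more concrete, here is a sketch of the text a browser sends when it asks for a page over HTTP. It uses Python's standard urllib.parse module to split the URL and then builds a minimal GET request. The function name build_http_request is made up for this illustration, and a real browser sends many more header lines than this.

```python
from urllib.parse import urlsplit


def build_http_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url`.

    A sketch only: real browsers send many more header lines.
    """
    parts = urlsplit(url)
    # The path (plus any query string) goes on the request line;
    # an empty path means "the default page", written as "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",      # what we want, and which rules we follow
        f"Host: {parts.hostname}",     # which website we want on that computer
        "Connection: close",           # we will not ask for anything else
        "",                            # a blank line ends the request
        "",
    ]
    # HTTP separates lines with a carriage return + line feed pair.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_http_request("http://www.website.com/directory/page.html"))
```

The other computer's reply follows the same rules: a status line such as HTTP/1.1 200 OK (or HTTP/1.1 404 Not Found), some header lines, a blank line, and then the file itself.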
Domain name (www.website.com)
The second part of a URL is the domain name, which most people think of as the name of the website they are "going to".
The domain name includes everything after the protocol (http://) up to the next forward slash (/), if any. The domain name is NOT case-sensitive: you can type it in uppercase, lowercase, or a combination, and it will not matter.
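Software that handles URLs follows the same rule. As a small sketch, Python's urlsplit already converts the domain part (its hostname attribute) to lowercase, so two URLs that differ only in the case of the domain compare equal. The helper name same_domain is hypothetical.

```python
from urllib.parse import urlsplit


def same_domain(url_a: str, url_b: str) -> bool:
    """True if two URLs name the same domain, ignoring uppercase/lowercase."""
    # urlsplit(...).hostname is already converted to lowercase,
    # which mirrors the rule that domain names are not case-sensitive.
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname


print(same_domain("http://WWW.Website.COM/page.html",
                  "http://www.website.com/page.html"))   # True
print(same_domain("http://www.website.com/",
                  "http://website.com/"))                # False
```

Note the second comparison: to a program, www.website.com and website.com are simply two different names, even though most websites are set up to respond the same way to both.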
There might be several parts to a domain name, separated by periods (.), but we will not go into those here, except for one that is sometimes confusing:
In the early days of the internet, it was common for website domain names to start with www., which stood for World Wide Web and helped people recognize a website address when they saw it in print. It still serves that purpose these days, but in practice the www prefix has no special meaning; it is just one more part of the domain name. Most websites respond the same whether you type the www or not. Some website owners prefer to use the www, and others prefer not to use it, but that is the only difference.
Additional detail (a web address is really a number)
Although you use the domain name in a URL, every computer on the internet is actually known to the rest of the internet by a unique number, called its IP address. When you issue a request from your browser's address bar, the domain name is automatically looked up on a Domain Name System (DNS) server, which translates it into a numeric IP address; this step is called a DNS lookup. Thus, your request actually goes to a numeric address, not to the textual domain name that you typed.
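Here is a minimal sketch of both ideas using Python's standard library: socket.gethostbyname asks the operating system to do the DNS lookup, and the ipaddress module shows that a dotted-quad address such as 127.0.0.1 is really a single 32-bit number. The helper names look_up and as_number are made up for this example. Looking up a real domain name needs a working internet connection; a numeric address "looks itself up" without contacting a DNS server.

```python
import ipaddress
import socket


def look_up(domain_name: str) -> str:
    """Ask the system's DNS resolver for the IPv4 address of domain_name."""
    # gethostbyname performs the DNS lookup (it needs a network connection
    # for real domain names) and returns a dotted-quad string.
    return socket.gethostbyname(domain_name)


def as_number(dotted_quad: str) -> int:
    """Show that a dotted-quad IPv4 address is really one 32-bit number."""
    return int(ipaddress.IPv4Address(dotted_quad))


if __name__ == "__main__":
    print(as_number("127.0.0.1"))   # 2130706433
    print(look_up("127.0.0.1"))     # 127.0.0.1 (no DNS server needed)
```

While you are online, try look_up with the domain name of a site you visit to see the number your browser actually sends its requests to.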
A previous article has some information about IP addresses and "dotted quad notation".
Path and File name (/directory/page.html)
The part of a URL after the first forward slash (/) following the domain name is a path to a file stored on that computer. This is the file you are requesting. The path, just like on your own computer, consists of a series of zero or more directory names (separated by forward slashes), followed by a filename and its extension.
Case usually IS significant in this part of the URL. To avoid confusion, careful web designers use only lowercase letters in their path and file names. That way, when somebody is trying to remember the name of a file such as index.html, they don't have to remember whether it is index.html, Index.html, or INDEX.HTML. On most websites, those would be the names of three different files, or, more likely, only one of the three exists and typing either of the other two returns a "File Not Found" error instead of the page you wanted.
Often the URL you type has no path or file name, such as http://www.website.com. Websites are configured to serve a default page (often named index.html) when no specific page is requested. With this type of URL, you are really asking for the default page for that website, or for a particular directory in that website (such as when you type http://www.website.com/foldername/ without a page specified).
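The path rules above can be sketched in a few lines of Python: split the path into directory names and a file name, and fall back to a default page when no file is named. The function name requested_file and the default name index.html are assumptions for illustration; each website's configuration decides its own default page. Notice that, unlike the domain name, the path keeps its uppercase and lowercase letters.

```python
from urllib.parse import urlsplit

# Hypothetical: many servers use index.html as the default page,
# but the actual name is up to each website's configuration.
DEFAULT_PAGE = "index.html"


def requested_file(url: str) -> tuple[list[str], str]:
    """Split a URL's path into (directory names, file name).

    If the URL names no file (empty path, or a path ending in "/"),
    substitute the default page for that directory.
    """
    path = urlsplit(url).path          # case is preserved here, unlike the domain
    parts = [p for p in path.split("/") if p]
    if path == "" or path.endswith("/"):
        return parts, DEFAULT_PAGE
    return parts[:-1], parts[-1]


print(requested_file("http://www.website.com"))
# ([], 'index.html')
print(requested_file("http://www.website.com/foldername/"))
# (['foldername'], 'index.html')
print(requested_file("http://www.website.com/directory/page.html"))
# (['directory'], 'page.html')
```

Asking for http://www.website.com/Index.html gives ([], 'Index.html'), a different file name from index.html, which is exactly why a mistyped capital letter can produce "File Not Found".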
Query string (?variable=value)
Sometimes you will see a question mark at the end of a URL, followed by additional text, like this: http://www.website.com/directory/page.html?variable=value
The part after the question mark is called the query string. Its purpose is to give the website additional information it needs to process the request. Each variable is a name that the program processing the request will recognize, and each value is the data being assigned to that variable; when there are several pairs, they are usually separated by ampersands (&). The variable names, their values, and even the format of the query string depend on what that website and page were designed to expect and accept as data.
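As a sketch, here is how a program on the receiving end might pull the variables out of a query string, using urlsplit and parse_qs from Python's standard urllib.parse module. This assumes the common name=value&name=value format; as noted above, a website is free to expect something else. The function name query_variables is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def query_variables(url: str) -> dict[str, list[str]]:
    """Return the query string's variables and their values.

    Values come back as lists because the same variable name may
    legitimately appear more than once in a query string.
    """
    query = urlsplit(url).query        # the text after "?", without the "?"
    return parse_qs(query)


print(query_variables("http://www.website.com/directory/page.html?variable=value"))
# {'variable': ['value']}
print(query_variables("http://www.website.com/page.html?a=1&b=2"))
# {'a': ['1'], 'b': ['2']}
```

A URL with no question mark simply produces an empty dictionary.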
In summary, when you type a web address into the address bar of your browser, what you are really doing is asking for a specific file from some other computer on the internet and telling that computer which protocol (HTTP) to use when sending it to you.
How does the other computer know where to send the file I asked for?
Whenever you are connected to the internet, you, too, have a unique numeric IP address. Your IP address is sent to the remote computer along with your request for the file, so it can send the file back to that address, to you.
Providing the remote computer with your IP address is necessary because when you visit a website, although you may feel like you have "gone there" and have some sort of connection to it, you really don't.
The two of you are not connected directly in the same way as when you talk to someone by landline telephone. You are connected to the internet and the other computer is also connected to the internet, but the two computers are not connected directly to each other.
You could think of it as a very fast postal system. A message from one computer to another is sent as a packet of data. On entry to the internet, the packet is mixed in with all the other packets, but because it is labeled with its destination IP address, it gets routed and rerouted along the way until it eventually reaches the computer it is intended for.
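Here is a toy model of the "postal system" idea, just to show why your own address has to travel with your request: the remote computer builds its reply by putting your request's source address in the destination field. The Packet class and reply_to function are made up for this illustration, and the addresses come from ranges reserved for documentation examples. Real IP packets carry much more information (port numbers, for instance), and a large file is split into many packets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A toy packet: real IP packets carry much more header information."""
    source: str        # IP address of the sender
    destination: str   # IP address the network delivers the packet to
    payload: str


def reply_to(request: Packet, my_address: str, body: str) -> Packet:
    """Build the remote computer's answer to a request packet.

    The reply is addressed to the request's *source* address; that is
    the only way the remote computer knows where to send the file.
    """
    return Packet(source=my_address, destination=request.source, payload=body)


request = Packet(source="203.0.113.7", destination="198.51.100.20",
                 payload="GET /directory/page.html")
response = reply_to(request, my_address="198.51.100.20", body="<html>...</html>")
print(response.destination)  # 203.0.113.7
```

Once the reply has been sent, nothing in this model remembers the request, which matches how each page request stands on its own.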
When you "go to" a website, you request a file from the remote computer. It sends it to you, and the communication is finished. While you think you are "on the website", you're really just viewing in your browser the page you received. Your browser is a file viewer.
If you click on a link to "go to another page" on that website, you're simply sending a new request for a different page (file). It sends it to you, and the communication is finished again. If you click on a link to "go to" a different website, your request for a page goes to a different computer somewhere else on the internet, and that computer sends its page to you.
What about when I'm "logged into" a website? What is that?
On many websites you have the option of logging in. What does this mean?
As described above, when you are browsing through a website's pages, you're really just requesting one file after another. There is no ongoing connection between the two of you, and the remote computer doesn't know that you, who requested the current page, are the same person who requested that other page a few minutes ago.
It could try to discover that you're the same person by looking at the IP address in your request, but that's a very bad idea. At a library or internet cafe, many people may share the same IP address, so two requests from it might come from the same person, or they might not. And if you're on a dial-up connection, your IP address changes every time you dial your internet service provider (ISP).
A solution: cookies
Instead, when you "log into" a website, the page it sends back can store a "cookie" on your computer. A cookie is a small piece of text that your browser saves and later sends back only to the website that set it; other websites cannot read it. The cookie typically contains a unique identifier created when you logged in and the cookie was originally created, and websites often sign or encrypt its contents so they can verify that they were the ones that placed it there. When you request subsequent pages from that website, your browser sends the cookie back with the request to tell the remote computer that you really are the same person (or at least the same computer) who requested that other page a few minutes ago.
This gives continuity to the sequence of pages you request. The remote computer can recognize that you are the same person and thus keep you "logged in". Without cookies (or one of the other alternatives), you would have to log in and verify your identity for every page.
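Here is a minimal sketch of the server's side of this arrangement, using Python's standard secrets and http.cookies modules. At login the server creates a random identifier, remembers which user it belongs to, and sends it in a Set-Cookie header; on later requests the browser returns it in a Cookie header and the server looks it up. The cookie name "session", the in-memory sessions dictionary, and the function names are all hypothetical; real sites typically keep sessions in a database and add more protections. In this sketch the cookie itself is not encrypted; the protection comes from the identifier being random and practically impossible to guess.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# The server's memory of who is logged in: session ID -> user name.
# Hypothetical in-memory store; real sites usually keep this in a database.
sessions: dict[str, str] = {}


def log_in(user: str) -> str:
    """Record a login and return the Set-Cookie header value to send back."""
    session_id = secrets.token_hex(16)      # random, unguessable identifier
    sessions[session_id] = user
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True    # keep page scripts from reading it
    return cookie["session"].OutputString()


def who_is(cookie_header: str) -> Optional[str]:
    """Given the Cookie header a browser sent, return the logged-in user."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return sessions.get(morsel.value) if morsel else None


set_cookie = log_in("alice")
# The browser stores the cookie and sends back just the name=value part.
cookie_header = set_cookie.split(";")[0]
print(who_is(cookie_header))        # alice
print(who_is("session=forged"))     # None
```

Each request is still independent; the cookie is simply the piece of information that lets the server connect this request to the earlier login.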
There is additional information about cookies at Recommended Privacy Settings for IE7 and Firefox 2, and more technical details at Wikipedia.
Comments and questions are welcome in the Forum.
Copyright ©2012 Steven Whitney. Last modified Sun 07/29/2012 10:53:21 -0700.