2. Getting Started
2.1 Main Structural Tags
You create a web page in a tree-structured fashion using text files that provide the framework for
the content. This tree structure can then be developed further into a matrix of linked pages,
linking not just to each other but also to other pages on the Internet. The first page that the
user encounters when they enter a Uniform Resource Locator (URL) such as
http://www.rhyshaden.com/ into their web browser is the top page of the tree for that particular
website. The name of this first page depends on the web server configuration; the most
commonly used names for this HTML file are index.html, index.htm, default.html
and welcome.html. By default, a web server may return a directory listing when a browser
points at a directory. This is not very secure, so the server would normally be configured with the name of the index
file. This index file, e.g. index.html, will open if the browser is directed at the directory
in which the file resides. The directory is denoted by the final / in the URL that was typed
into the client browser.
The first step is to open a text editor such as Notepad (in Windows) and create a new file called,
say, Web1.htm. HTML works by way of tag pairs which form containers. A browser looks for
these tag pairs and acts on the text contained within them. The parameters within the tags
define attributes that instruct the parser within the browser how to treat that text or content.
Some tags do not have
'container terminators' and are standalone tags. One example is the <HR> tag; others
include <BR> and <IMG>. We will come on to these later.
Many of the tags can have attributes, and some of these have values assigned to them. These
attributes determine how the tag behaves. The format of an attribute with a value is
ATTRIBUTE_NAME = "value". The value can be a number (decimal, or hexadecimal denoted by
#), a percentage, or even a word such as a colour name. HTML 4.01 recommends always putting
the value in quotation marks; they are required whenever the value contains anything other
than letters, digits, hyphens, full stops, underscores and colons.
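As a minimal sketch of the value types just described, the fragment below uses attributes that are covered later in this tutorial. The particular values are assumptions chosen purely for illustration.

```html
<HTML>
<HEAD><TITLE>Attribute examples</TITLE></HEAD>
<!-- A hexadecimal colour (denoted by #) and a colour name -->
<BODY BGCOLOR="#FFFFFF" TEXT="black">
<!-- A value in pixels, then a percentage -->
<HR SIZE="4">
<HR WIDTH="50%">
</BODY>
</HTML>
```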
It is possible to nest tags within tags provided that you close the nested tags in the
appropriate places. Unknown tags are ignored by the browser, although any text they enclose is
generally still displayed on the screen. No matter
how many spaces or tabs are entered within the HTML document itself, the parser treats each run
of them as a single space. You can add multiple spaces by typing &nbsp; the required
number of times; this is the character entity for the non-breaking space. We will come
across more character entities later. Line returns within the HTML text are not shown as line breaks; the parser treats them as white space.
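The following sketch illustrates nesting and white-space handling. The <B> (bold) tag and the sample words are assumptions introduced only for this example.

```html
<HTML>
<HEAD><TITLE>Nesting and spaces</TITLE></HEAD>
<BODY>
<!-- B is nested inside CENTER and is closed before CENTER is closed -->
<CENTER><B>Nested text</B></CENTER>
<!-- The run of spaces and the line return below are each shown as a single space -->
one      two
three
<BR>
<!-- Each non-breaking space entity is kept, so three spaces separate these two words -->
four&nbsp;&nbsp;&nbsp;five
</BODY>
</HTML>
```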
It is good practice to annotate the HTML code, particularly if it is complex. You can add comments
by typing them between <!-- and -->. It is conventional to leave a space after
the opening comment delimiter and before the closing comment delimiter. Tag and attribute names
are not case-sensitive, so capitalising them, as this tutorial does, is purely a matter of style.
It is possible to use utilities such as Mizer
to compact your HTML code
by stripping out spaces and tabs, enabling the page to be downloaded a little more speedily.
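A short sketch of the comment syntax follows; the comment wording and the visible text are placeholders.

```html
<HTML>
<HEAD><TITLE>Comments</TITLE></HEAD>
<BODY>
<!-- This comment is not shown in the browser window -->
<!--
  A comment can also span
  several lines.
-->
Visible text
</BODY>
</HTML>
```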
For a web page there are three essential tag pairs:
- The tags <HTML> and </HTML> enclose the entire web document, telling the
browser that this file is for the browser's attention.
- The <HEAD> and </HEAD> tags enclose the description of the page.
- The <BODY> and </BODY> tags contain the rest of the page.
If we type these into Notepad like this:
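A minimal sketch of the three tag pairs, assuming the file Web1.htm created earlier. The line of body text is a placeholder, and the head is left empty at this stage.

```html
<HTML>
<HEAD>
</HEAD>
<BODY>
Writing For The Web Tutorial
</BODY>
</HTML>
```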
Then, in a Web browser, this turns out like this:
What shall we try next? Let's use the tags <TITLE> and </TITLE> like this:
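A sketch of the same file with a title added. The title text is an assumption borrowed from the META example later in this section.

```html
<HTML>
<HEAD>
<TITLE>Writing For The Web Tutorial</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>
```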
And guess what this gives us:
Hmmm! OK, so the document title appears in the browser's title bar, but it is getting a little boring now!
Don't panic! What's happening is that the
<TITLE> tag pair keeps the title out of the page itself but provides the name used for a bookmark
by anyone who wishes to return to this page, and it is also picked up by web search engines.
Let's get our heading in view by using the heading tags <H1> and </H1>. We
will also throw in some centring using <CENTER> and stick in a couple of horizontal lines using <HR>.
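One way the heading, the centring and the two horizontal lines might be combined is sketched here. The heading text and the exact placement of the lines are assumptions.

```html
<HTML>
<HEAD>
<TITLE>Writing For The Web Tutorial</TITLE>
</HEAD>
<BODY>
<HR>
<CENTER><H1>Writing For The Web Tutorial</H1></CENTER>
<HR>
</BODY>
</HTML>
```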
This gives the following display:
The <HR> tag has the following attributes:
- WIDTH - this can take a value in pixels or a percentage of the width of the page.
- ALIGN - by default the line is centred on the page, however you can align the line
to the LEFT or to the RIGHT if you wish.
- NOSHADE - changes the default, bevelled line to a plain line.
- SIZE - changes the thickness of the line in pixels.
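The attributes listed above can be combined on a single tag. The values in this sketch are arbitrary and chosen only to show each attribute in use.

```html
<!-- A line half the width of the page, aligned to the left -->
<HR WIDTH="50%" ALIGN="LEFT">
<!-- A plain (unbevelled) line, 300 pixels wide and 5 pixels thick -->
<HR WIDTH="300" SIZE="5" NOSHADE>
<!-- A line 25% of the page width, aligned to the right -->
<HR WIDTH="25%" ALIGN="RIGHT">
```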
Using H1 gives the largest size letters in the heading. We can use H2, H3, etc. up to H6 to
give progressively smaller headings. The heading tag automatically creates a blank line underneath
to separate the heading from the next block of content.
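The six heading levels can be compared side by side with a sketch such as this; the heading text is a placeholder.

```html
<HTML>
<HEAD><TITLE>Heading sizes</TITLE></HEAD>
<BODY>
<H1>Heading level 1 (largest)</H1>
<H2>Heading level 2</H2>
<H3>Heading level 3</H3>
<H4>Heading level 4</H4>
<H5>Heading level 5</H5>
<H6>Heading level 6 (smallest)</H6>
Ordinary text following the headings.
</BODY>
</HTML>
```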
Using the <CENTER> tags puts everything
contained within the tags in the centre of the screen (be careful of the American spelling
of center!). As an aside, you will notice how the browser reshapes the text as you
re-size your browser window.
2.3 Meta Tags
Before we leave this 'heading' part of the HTML document, it is worth looking at the <META>
tag. The META tag allows you to decide what information is picked up by some search engines on the World
Wide Web; it is completely invisible to the user. The following example shows how I might advertise
this tutorial on the Internet:
<HEAD>
<TITLE>Writing For The Web Tutorial</TITLE>
<META NAME="description" CONTENT="A tutorial showing you
how to create Web Pages.">
<META NAME="keywords" CONTENT="WWW, HTML, tags, frames, ...">
</HEAD>
Notice how the META tags sit within the HEAD tags, and also notice that there is some text,
or CONTENT, associated with a NAME, which I called description.
Many search engines will
grab the title 'Writing For The Web Tutorial' and the content of the description that I wrote down. Both
NAME and CONTENT are attributes of the META tag.
Another META tag was also created which used the name keywords, and the CONTENT of
this is a number of words which are likely to be used within a search. Someone typing one of these words
in a search is more likely to see your site come up in the results. As a guideline, keep the keywords to
a total of no more than 1024 characters.
If you do not wish your page to be indexed by a search engine, you can use robots as the
NAME. You would give
it the CONTENT value of noindex to ask Internet spiders not to index the page, or nofollow to ask
them not to follow the links on that particular page.
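A sketch of a head section carrying the robots META tag, with both values given together separated by a comma. The title is reused from the earlier example.

```html
<HEAD>
<TITLE>Writing For The Web Tutorial</TITLE>
<!-- Ask robots not to index this page and not to follow its links -->
<META NAME="robots" CONTENT="noindex, nofollow">
</HEAD>
```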
As a general rule, you should not use any HTML formatting within your META tags.
Other meta tag examples include <META NAME="author" CONTENT="Rhys Haden">,
<META NAME="copyright" CONTENT="2003, Rhys Haden"> and a recent one
<META NAME="MSSmartTagsPreventParsing" CONTENT="TRUE"> which will prevent browsers
that have Smart Tags enabled from attaching tags to words and phrases on the website.
You can supply information equivalent to an HTTP response header by using the META
tag's HTTP-EQUIV attribute. For instance, you may wish to inform Internet caching engines,
browser caches and web robots that particular content included within a page has a specified expiry date,
i.e. once the content has reached a certain age you want clients to request updated content. Examples
include stock market, weather or travel information. You would do this with a line
such as <META HTTP-EQUIV="expires" CONTENT="Sun, 10 Aug 2003 12:31:00 GMT">, where the date is
written in the HTTP date format. Another
example is if you want to inform clients which language a certain page has been written in,
e.g. <META HTTP-EQUIV="content-language" CONTENT="en-gb">.
Also used is content-type, to inform browsers of the character set being used
on the page, e.g. <META HTTP-EQUIV="content-type" CONTENT="text/html; CHARSET=utf-8">.
The values for the HTTP-EQUIV attribute are header names used within the HTTP protocol.
One interesting header that has often been used is the refresh header. This has been used,
for instance, when a web page has moved from one URL to another. Instead of just deleting a page
from a location that users have been used to visiting, resulting in a 404 error, you can
automatically redirect the user by way of the refresh header. An example would be
<META HTTP-EQUIV="refresh" CONTENT="10; URL=http://new.html">. The value
10 means that the refresh will occur after 10 seconds. The URL has been inserted to
indicate to the client browser which address to go to. Note how one pair of quote marks surrounds both
the interval value and the URL, and that there is a space after the semi-colon. Leaving out the URL
and the semi-colon would result in just the current page being refreshed, which is useful if the content
on that page changes frequently. You can create a crude animation by cycling through a number
of pages that just contain graphics.
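The two variations mentioned above might look like the sketch below. The 30 and 2 second intervals and the file names frame1.html and frame2.html are hypothetical.

```html
<!-- Reload the current page every 30 seconds (no URL, no semi-colon) -->
<META HTTP-EQUIV="refresh" CONTENT="30">

<!-- Placed in frame1.html: move on to frame2.html after 2 seconds.
     frame2.html would carry a similar tag pointing at the next page. -->
<META HTTP-EQUIV="refresh" CONTENT="2; URL=frame2.html">
```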
If you did not use META tags then search engines would just grab the title and the first few words
to use as a summary.
2.4 Body Tags
The <BODY> tags indicate where the viewable parts of the page begin and end. You can use
attributes within the <BODY> tags to affect what happens to the whole of that web page.
The following attributes can be used:
- ALINK="..." - While you click and hold on a link, it will turn whatever colour you specify here.
- VLINK="..." - Once you have visited a particular link, it will turn whatever colour you specify here.
- LINK="..." - All links appear in the colour specified here until they are clicked on.
- BGCOLOR="..." - Specifies the background colour for the page.
- TEXT="..." - All text will appear in this colour unless overwritten by style sheets or the
- BACKGROUND="..." - Specifies an image to use as a background to the page. This image
is tiled across the screen. An additional attribute,
BGPROPERTIES="FIXED", supported by Internet Explorer, keeps the background in the same place on the screen
even when the page is scrolled.
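A sketch of a <BODY> tag using the attributes listed above. The colour values, the image name tile.gif and the <A> link tag are assumptions used only for illustration.

```html
<HTML>
<HEAD><TITLE>Body attributes</TITLE></HEAD>
<!-- White page, black text, blue links, purple visited links, red active links -->
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000"
      BACKGROUND="tile.gif" BGPROPERTIES="FIXED">
Some text with <A HREF="http://www.rhyshaden.com/">a link</A> in it.
</BODY>
</HTML>
```

When both BGCOLOR and BACKGROUND are given, the background colour shows while the image loads or if it cannot be found.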