Introduction to Web Design - 2. Mark-up Languages
Early word processors used “markups”: instructions embedded in documents that specified how parts of the text should be presented, including effects such as bold type, italics and superscript. While this was very useful, these instructions did little to identify the content or meaning of any part of the document. In the late 1960s, some researchers (including Charles Goldfarb) extended the “markup” idea. They realized that marking electronic documents with more general tags that indicated the meaning of parts of the documents could be very valuable. For example, many documents have a title, an author, a date, etc. If all of the similar portions of the documents in a collection were marked up in a standard manner, it would be possible to write programs that could locate documents by a certain author, produce outlines of all of the documents, etc. In addition, using the meaningful markup tags, one could print the documents in different styles based on preferences. For example, you could print a document in one font for one person and in another font for another person by specifying at the time of printing that titles should be printed in a certain size, the authors’ names printed in italics, etc.
2.1 SGML
This concept proved so useful that SGML (Standard Generalized Markup Language) was made into an international standard (ISO 8879) in the mid-1980s. While SGML is used by thousands of institutions, companies and organizations, it is a very large, complicated standard and does not lend itself to casual use.
2.2 HTML
HTML (HyperText Markup Language) was first developed by Tim Berners-Lee as a hypertext language for linking information among researchers. His system used a set of uniform addresses to refer to documents on different computers, a set of rules (a protocol) for transmitting the documents and a simple markup language for encoding the documents. He based his HTML on SGML, but used only a small subset of markers or tags. Documents using these tags could be interpreted by programs written for a number of different computer architectures. Because the programs and the computer platforms varied, the documents would appear differently in a program written for a very high-powered computer that could display different fonts, sizes and colors, than they would in a program written for a low-end text-based system. But the important notion was that these documents could be requested from a server, read by almost anyone with a computer and could provide links to other related information.
The World Wide Web spread more quickly than anyone expected. People wanted to include more than just text in these documents: images, icons, text styles and other media. The rapid spread of the WWW left standards organizations in the dust. Two major browsers emerged, and each gradually responded to the demand for more complex media and more style in different ways, with different tags. This divergence meant that pages developed for one browser might not be interpretable by another. This was in direct opposition to the initial concept of the WWW, which was to provide universal access to information.
HTML is still a means of formatting text for display in a browser window. It is not a procedural programming language like C, C++, Java, Pascal, or Fortran. However, one variety of HTML, DHTML (Dynamic HTML), does have programming aspects. XHTML (Extensible HyperText Markup Language) is the latest standard version of HTML. XHTML is based on XML.
2.3 XML
XML (Extensible Markup Language) is another derivative of SGML, developed by the World Wide Web Consortium (W3C) in the mid-1990s. It is called “extensible” because it does not consist of a fixed set of tags like HTML. It is really a “meta” language: a language from which to create other languages. It provides a consistent set of rules for creating these other languages and can be thought of as a lighter version of SGML.
XML was developed by the W3C in response to a need for consistency among browser markup languages (see XHTML) and to the desire for a means of using tags that were meaningful in terms of content, not just page appearance. The latter is close to the initial goals of SGML, and, in fact, XML makes it easier to share SGML-style documents over the WWW. In a way, XML can be viewed as an intermediary between HTML and SGML.
XML furnishes a common syntax for the creation of specialized markup languages for any domain or discipline. It is not a procedural programming language like C++, Java or Fortran. It is a means of describing information that will be stored, transmitted to others and processed by a program written to interpret it. There are already specialized markup languages for Mathematics (MathML), Chemistry (CML) and business data (XBRL). With XML, a group can create a markup language for books, games, sports, teams, people, animals, finance, products, services, etc. (Deitel, Deitel & Nieto, 2001, p25) For example, a book may be described in XML like this:
    <book>
      <title>Gone with the Wind</title>
      <author>
        <firstname>Margaret</firstname>
        <lastname>Mitchell</lastname>
        <flag gender = "F" />
      </author>
      <publisher>Warner Books</publisher>
      <isbn>0446365386</isbn>
      <review>
        Sometimes only remembered for the epic motion picture and "Frankly ... I don't give a damn," Gone with the Wind was initially a compelling and entertaining novel. It was the sweeping story of tangled passions and the rare courage of a group of people in Atlanta during the time of Civil War that brought those cinematic scenes to life. The reason the movie became so popular was the strength of its characters--Scarlett O'Hara, Rhett Butler, and Ashley Wilkes --all created here by the deft hand of Margaret Mitchell, in this, her first novel.
      </review>
    </book>
The XML code shows an element, book, which has the subelements title, author, publisher, isbn and review. The subelement author has subelements for the first and last names of the author. It also contains an empty element, flag. The flag element has an attribute, gender, which indicates the gender of the author as “M” or “F”. Empty elements can be written either by placing a slash at the end of the opening tag or by explicitly using a closing tag. In other words, the following two sets of tags are equivalent:
    <flag gender = "F" />
    <flag gender = "F"></flag>
In order to process an XML document, an XML parser is employed. The XML parser locates the tags and comments in an XML document. Programs written in Java, C++ or other languages can then respond to the elements found by the parser. For example, one might write a program that displays a book’s title in bold print and the author in italics. More complicated programs might also search for other books by this author and provide a link to those records.
XML documents can optionally reference a document type definition (DTD) or a schema. The DTD or schema contains a formal definition for the XML used in a document. Some parsers will read the DTD or schema and verify that the tags used in the document conform to the formal definition.
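The kind of program described above can be sketched in a few lines. The following is a minimal illustration in Python, assuming the standard-library xml.etree.ElementTree parser. The function names format_book and empty_forms_match are hypothetical, and the review element is left out to keep the sample short. It shows a program responding to the parsed elements by wrapping the title in bold and the author in italics, and it also checks that the two ways of writing the empty flag element parse to the same result.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the book example (the review is omitted for brevity).
BOOK_XML = """
<book>
  <title>Gone with the Wind</title>
  <author>
    <firstname>Margaret</firstname>
    <lastname>Mitchell</lastname>
    <flag gender = "F" />
  </author>
  <publisher>Warner Books</publisher>
  <isbn>0446365386</isbn>
</book>
""".strip()


def format_book(xml_text):
    """Return an HTML fragment: title in bold, author name in italics."""
    book = ET.fromstring(xml_text)              # the parser locates the tags
    title = book.findtext("title")              # text of <title>
    first = book.findtext("author/firstname")   # nested subelements
    last = book.findtext("author/lastname")
    return f"<b>{title}</b> by <i>{first} {last}</i>"


def empty_forms_match():
    """Check that <flag ... /> and <flag ...></flag> parse identically."""
    short_form = ET.fromstring('<flag gender = "F" />')
    long_form = ET.fromstring('<flag gender = "F"></flag>')
    return (short_form.tag == long_form.tag
            and short_form.attrib == long_form.attrib
            and short_form.text == long_form.text
            and len(short_form) == len(long_form))


if __name__ == "__main__":
    print(format_book(BOOK_XML))
    print(empty_forms_match())
```

A more complicated program could take the author's name found here and use it to look up other records, as the paragraph above suggests.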
If we wanted to place the XML code above in a complete XML document, it might look like this:
    <?xml version = "1.0"?>
    <!-- gwtw.xml -->
    <!DOCTYPE book SYSTEM "book.dtd">
    <book>
      <title>Gone with the Wind</title>
      <author>
        <firstname>Margaret</firstname>
        <lastname>Mitchell</lastname>
        <flag gender = "F" />
      </author>
      <publisher>Warner Books</publisher>
      <isbn>0446365386</isbn>
      <review>
        Sometimes only remembered for the epic motion picture and "Frankly ... I don't give a damn," Gone with the Wind was initially a compelling and entertaining novel. …
      </review>
    </book>
The first line is an optional declaration indicating that this document conforms to a particular version of XML. The next line is a comment giving the name of this particular file, which uses “.xml” as the file extension. The third line indicates that the root element of this file is “book” and that this file is based on a DTD found in “book.dtd”. The keyword SYSTEM denotes the location of the external DTD.
The DTD for the book XML document might be as follows:
    <!-- book.dtd -->
    <!ELEMENT book (title, author+, publisher, isbn, review*)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (firstname, lastname, flag)>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>
    <!ELEMENT flag EMPTY>
    <!ATTLIST flag gender (M | F) "F">
    <!ELEMENT publisher (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
    <!ELEMENT review (#PCDATA)>
The first ELEMENT declaration defines the rule for a book. It says that a book consists of a title, one or more authors, a publisher, an isbn and optional reviews, in that order. The ‘+’ after author indicates that there must be at least one author and can be more than one. Other occurrence indicators used in DTDs are the asterisk, ‘*’, indicating optional elements that can occur not at all or any number of times, and the question mark, ‘?’, indicating optional elements that can occur at most once.
The title ELEMENT declaration contains a flag which indicates that the content of a title is parsed character data (#PCDATA). Parsed character data should not contain any markup characters used to indicate tags, such as the ‘<’ and ‘>’ angle brackets. To include these characters in the content of XML elements, the entity codes discussed under the HTML section should be used (e.g., &lt;, &gt;, etc.). Another type recognized by XML is character data (CDATA), which indicates one or more characters that the parser will not process (i.e., the parser will not look for other tags in this type of content). The other types that can occur are ANY and EMPTY.
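To make the point about markup characters concrete, here is a small sketch in Python, assuming the standard-library helper xml.sax.saxutils.escape and the xml.etree.ElementTree parser. The sample title and the function name make_title_element are invented for illustration. The sketch replaces the special characters with entity codes before the text is placed inside a tag, and then shows that a CDATA section is an alternative way to carry the same characters unprocessed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical title containing characters that would otherwise look like markup.
RAW_TITLE = "Profit < Loss & Other Stories"


def make_title_element(raw_title):
    """Wrap text in <title> tags, replacing &, < and > with entity codes."""
    return f"<title>{escape(raw_title)}</title>"


def parse_title(markup):
    """Parse a <title> element and return its text content."""
    return ET.fromstring(markup).text


if __name__ == "__main__":
    escaped = make_title_element(RAW_TITLE)
    print(escaped)               # <title>Profit &lt; Loss &amp; Other Stories</title>
    print(parse_title(escaped))  # the parser turns the entities back into characters

    # A CDATA section tells the parser not to look for markup inside it.
    cdata = f"<title><![CDATA[{RAW_TITLE}]]></title>"
    print(parse_title(cdata))
```

Either form gives the program the same text once parsed. Inserting the raw title without escaping would make the document ill-formed.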
The author ELEMENT declaration specifies three child components: firstname, lastname and flag. The flag ELEMENT declaration indicates that this is an empty element, so there should be no content between the beginning and ending tag (see above). The ATTLIST for flag defines the gender attribute of the flag element. This attribute can be either M or F, and the default when the attribute is omitted is “F”. (The example above was adapted from one in Deitel, Deitel & Nieto, 2001, p. 644)
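The occurrence indicators in the book rule behave much like the same symbols in a regular expression. The sketch below, in Python, is a hand translation of that one rule and not a DTD validator: it assumes the standard-library re and xml.etree.ElementTree modules, and the names BOOK_MODEL and children_match_book_model are hypothetical. It checks only the order and number of the direct children of book.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of:
#   <!ELEMENT book (title, author+, publisher, isbn, review*)>
# Each child name is followed by one space so that names cannot run together.
BOOK_MODEL = re.compile(r"title (author )+publisher isbn (review )*")


def children_match_book_model(xml_text):
    """Return True if the children of <book> follow the content model above."""
    book = ET.fromstring(xml_text)
    if book.tag != "book":
        return False
    child_names = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(child_names) is not None


if __name__ == "__main__":
    one_author = ("<book><title>T</title><author/>"
                  "<publisher>P</publisher><isbn>1</isbn></book>")
    no_author = ("<book><title>T</title>"
                 "<publisher>P</publisher><isbn>1</isbn></book>")
    print(children_match_book_model(one_author))  # author+ is satisfied
    print(children_match_book_model(no_author))   # author+ needs at least one
```

A validating parser performs this kind of check for every element in the DTD, along with the attribute rules, which this sketch ignores.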
2.4 XHTML
XHTML is the most current version of HTML. XHTML combines aspects of HTML and XML: it keeps the tags of HTML but, unlike HTML, is extensible like XML. XHTML conforms to the stricter syntax rules of XML. The differences between HTML and XHTML include the following:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>An XHTML document</title>
      </head>
      <body>
        <p> This document uses XHTML instead of HTML. </p>
      </body>
    </html>
The basic XHTML document begins by specifying the DOCTYPE. In the section on XML above, the DOCTYPE referred to a DTD (document type definition) that was local. The official XHTML definition is hosted by the World Wide Web Consortium (W3C).
The beginning html tag must include the attribute xmlns and the value "http://www.w3.org/1999/xhtml". This attribute identifies the namespace of the document.
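The effect of the xmlns attribute can be seen by parsing a small XHTML page with an XML parser. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module. The function name page_title and the prefix "x" are arbitrary choices for the example, and the DOCTYPE line is left out of the sample to keep it short. The parser reports every element name qualified by the namespace, so a program has to use that namespace when it searches for elements.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The sample page from this section, without its DOCTYPE line.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>An XHTML document</title>
  </head>
  <body>
    <p> This document uses XHTML instead of HTML. </p>
  </body>
</html>"""


def page_title(xhtml_text):
    """Return the text of head/title, looked up in the XHTML namespace."""
    root = ET.fromstring(xhtml_text)
    # "x" is only a local shorthand for the namespace in this search path.
    return root.findtext("x:head/x:title", namespaces={"x": XHTML_NS})


if __name__ == "__main__":
    root = ET.fromstring(PAGE)
    print(root.tag)          # {http://www.w3.org/1999/xhtml}html
    print(page_title(PAGE))  # An XHTML document
```

Searching for plain "head/title" without the namespace finds nothing, which is why the namespace declaration matters to programs that process the page.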
There are some other differences between HTML and XHTML, but the transition from HTML to XHTML is relatively painless. One beneficial aspect of XHTML is that the more rigid structure identified by the DOCTYPE at the beginning of the document facilitates the use of validators, which can verify whether or not the document has correct XHTML syntax. One such validator can be found at http://validator.w3.org/check.
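A rough local check is possible before sending a page to a validator. The sketch below, in Python, assumes the standard-library xml.etree.ElementTree module, and the function name is_well_formed is hypothetical. It tests only whether a fragment obeys the XML syntax rules that XHTML inherits, such as closing every tag. It does not check the page against the XHTML DTD, which is the job of a full validator such as the one mentioned above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # HTML habit: <br> left unclosed. An XML parser rejects this.
    print(is_well_formed("<p>first line<br>second line</p>"))
    # XHTML style: the empty element is closed with a trailing slash.
    print(is_well_formed("<p>first line<br />second line</p>"))
```

The first fragment is acceptable to many HTML browsers but fails here, which illustrates the stricter syntax that makes XHTML documents easier to check mechanically.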
Cynthia J. Martincic
Saint Vincent College
Latrobe, PA 15650