XML stands for eXtensible Markup Language. XML is a markup language much like HTML, and the structure of its data can be described with a Document Type Definition (DTD) or an XML Schema. XML was designed to describe data and to focus on what the data is.


The following example is a note to Pascal from Aubrey, stored as an XML file.


<note>

<to>Pascal</to>

<from>Aubrey</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>


XML is a cross-platform, software- and hardware-independent tool for transmitting information. Because XML is stored in plain text format, data can be shared between systems regardless of the software or hardware they run on. The complete note, including its XML declaration, looks like this:



<?xml version="1.0" encoding="ISO-8859-1"?>

<note>

<to>Pascal</to>

<from>Aubrey</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>


The first line in the document, the XML declaration, 


<?xml version="1.0" encoding="ISO-8859-1"?>


defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1 / West European) character set. The next line 


<note>


describes the root element of the document. The next four lines

 

<to>Pascal</to>

<from>Aubrey</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>


describe 4 child elements of the root (to, from, heading and body).
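
The walk-through above can be reproduced with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# The note from the text. Bytes are used because the declaration names an encoding.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Pascal</to>
<from>Aubrey</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# Iterating over an element yields its direct children in document order.
child_names = [child.tag for child in root]

print(root.tag)
print(child_names)
```

Here root is the <note> element and child_names lists its four children in the order they appear.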


Unlike HTML tags, XML tags are case sensitive. In XML all elements must be properly nested, every document must have a single root element, and attribute values must be quoted. With XML, white space in the content is preserved.


Comments in XML are written as in HTML: <!-- This is a comment -->


XML elements are extensible

Let us consider the following note


<note>

<to>Pascal</to>

<from>Aubrey</from>

<body>Your file has been updated.</body>

</note>


Suppose an application extracts the <to>, <from> and <body> elements to produce an output. Imagine that the author of the XML document later adds some extra information to it


<note>

<date>20060922</date>

<to>Pascal</to>

<from>Aubrey</from>

<body>Your file has been updated.</body>

</note>


The software should still be able to produce the same output, because the elements it relies on are unchanged.
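
As a rough illustration of this idea, the sketch below reads the three elements by name with Python's xml.etree.ElementTree. The function name render_note and the output format are assumptions made for this example, not part of any real application.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<note><to>Pascal</to><from>Aubrey</from>"
    "<body>Your file has been updated.</body></note>"
)
EXTENDED = (
    "<note><date>20060922</date><to>Pascal</to><from>Aubrey</from>"
    "<body>Your file has been updated.</body></note>"
)

def render_note(xml_text):
    """Build a one-line message from the elements we know about."""
    note = ET.fromstring(xml_text)
    # findtext looks children up by name, so extra elements such as <date>
    # are simply ignored.
    return "To {}, from {}: {}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body")
    )

print(render_note(ORIGINAL))
print(render_note(EXTENDED))
```

Because the elements are looked up by name rather than by position, both documents give the same line of output.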


Vocabulary

Let us consider an example.

<?xml version="1.0" encoding="utf-8"?>

<book>

<title>My first book</title>

<prod id="234511" media="paper"></prod>

<chapter>

Introduction to XML

<para>What is HTML</para>

<para>What is XML</para>

</chapter>

</book>


"book" is the root element. "title", "prod" and "chapter" are child elements of "book", and "book" is the parent element of each of them. An element can have element content, mixed content, simple content or empty content, and an element can also have attributes. In the example above, "book" has element content, because it contains only other elements. "chapter" has mixed content, because it contains both text and other elements. "para" has simple content (or text content), because it contains only text. "prod" has empty content, because it contains neither text nor child elements. The "prod" element has attributes: the attribute named "id" has the value "234511", and the attribute named "media" has the value "paper".

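The vocabulary above can be checked against the book example with a short sketch. The helper content_kind is a hypothetical function written for this illustration, and it assumes that text consisting only of white space does not count as content.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
<title>My first book</title>
<prod id="234511" media="paper"></prod>
<chapter>
Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
</book>"""

def content_kind(element):
    """Classify an element as 'element', 'mixed', 'simple' or 'empty'."""
    has_children = len(element) > 0
    # Text may appear before the first child (element.text) or after any
    # child (child.tail). Assumption: whitespace-only text is not content.
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

book = ET.fromstring(BOOK_XML)
kinds = {element.tag: content_kind(element) for element in book.iter()}
prod_attributes = dict(book.find("prod").attrib)

print(kinds)
print(prod_attributes)
```
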

About empty elements


Rather than writing the empty "prod" element as above, we could have used the following syntax.


<prod id="234511" media="paper" />
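
A parser treats the two spellings of an empty element the same way. The sketch below, using Python's xml.etree.ElementTree, compares them; the helper describe is introduced only for this example.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<prod id="234511" media="paper"></prod>')
short_form = ET.fromstring('<prod id="234511" media="paper" />')

def describe(element):
    """Summarise an element as (name, attributes, text, number of children)."""
    return (element.tag, dict(element.attrib), element.text, len(element))

same = describe(long_form) == describe(short_form)
print(same)
```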



Naming conventions

- Names can contain letters, numbers and other characters.

- Names must not start with a number or punctuation character (an underscore is allowed as the first character).

- Names must not start with the letters xml (or XML, or Xml etc.)

- Names cannot contain spaces.
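
The conventions above can be approximated with a small check. This is a simplified sketch, not the full XML name grammar: it assumes ASCII-only names, allows an underscore as the first character, and leaves out the colon because that character is used for namespace prefixes. The function name looks_like_valid_name is hypothetical.

```python
import re

# A letter or underscore first, then letters, digits, dots, underscores
# or hyphens. Real XML names may also contain many non-ASCII characters.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def looks_like_valid_name(name):
    """Return True if name follows the simplified conventions listed above."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, XML, Xml etc. are reserved
    return _NAME_RE.fullmatch(name) is not None

for candidate in ["note", "first-name", "2nd", "xmlData", "my name"]:
    print(candidate, looks_like_valid_name(candidate))
```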


Attributes 

Attribute values must always be enclosed in quotes, but either single or double quotes can be used.
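
The following sketch shows that both quoting styles give the same attribute value, and that single quotes are convenient when the value itself contains double quotes. The title attribute in the last line is a made-up example.

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<prod media="paper"/>')
single_quoted = ET.fromstring("<prod media='paper'/>")

# A value containing double quotes can be wrapped in single quotes.
nested = ET.fromstring("""<note title='The "final" reminder'/>""")

print(double_quoted.get("media"), single_quoted.get("media"))
print(nested.get("title"))
```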



XML validation


- XML with correct syntax is well formed XML

- XML validated against a DTD is valid XML


A well formed XML document has correct XML syntax:

- XML documents must have a root element

- XML elements must have a closing tag

- XML tags are case sensitive

- XML elements must be properly nested

- XML attributes must always be quoted
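
The rules above are what a parser enforces. As a minimal sketch, the hypothetical helper is_well_formed below asks Python's xml.etree.ElementTree to parse a string and reports whether it succeeded. It checks syntax only and says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

checks = {
    "correct": is_well_formed("<note><to>Pascal</to></note>"),
    "missing closing tag": is_well_formed("<note><to>Pascal</note>"),
    "case mismatch": is_well_formed("<note><To>Pascal</to></note>"),
    "bad nesting": is_well_formed("<note><b><i>text</b></i></note>"),
    "unquoted attribute": is_well_formed("<note id=1></note>"),
    "two root elements": is_well_formed("<note></note><note></note>"),
}

for label, result in checks.items():
    print(label, result)
```

Only the first document passes; each of the others breaks one of the rules listed above.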


A valid XML document is a well formed XML document that also conforms to the rules of a Document Type Definition (DTD):


<?xml version="1.0" encoding="ISO-8859-1" ?>

<!DOCTYPE note SYSTEM "InternalNote.dtd">

<note>

<date>20060922</date>

<to>Pascal</to>

<from>Aubrey</from>

<body>Your file has been updated.</body>

</note>


A DTD defines the legal elements of an XML document. XML Schema is an alternative to DTD.


Displaying XML

It is possible to display XML with CSS, but the more sophisticated way to display XML is XSL (eXtensible Stylesheet Language).



XML namespaces

Since element names in XML are not predefined, a name conflict can occur when two different documents use the same element name for different things. The following document carries information in a table


<table>

<tr>

<td>Apples</td>

<td>Bananas</td>

</tr>

</table>


while the following carries information about a table


<table>

<name>African Coffee Table</name>

<width>80</width>

<length>120</length>

</table>


Solving name conflicts using a prefix


<h:table>

<h:tr>

<h:td>Apples</h:td>

<h:td>Bananas</h:td>

</h:tr>

</h:table>


<f:table>

<f:name>African Coffee Table</f:name>

<f:width>80</f:width>

<f:length>120</f:length>

</f:table>


Now there is no longer a conflict, because the two documents use different names for their table elements (<h:table> and <f:table>).


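In the following XML documents, an xmlns attribute has been added to each table start tag to associate the prefix with a namespace. The first document carries information in a table, the second carries information about a piece of furniture.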


<h:table xmlns:h="http://www.w3.org/TR/html4/">

<h:tr>

<h:td>Apples</h:td>

<h:td>Bananas</h:td>

</h:tr>

</h:table>



<f:table xmlns:f="http://www.w3schools.com/furniture">

<f:name>African Coffee Table</f:name>

<f:width>80</f:width>

<f:length>120</f:length>

</f:table>
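
To see how a parser tells the two tables apart, the sketch below parses both documents with Python's xml.etree.ElementTree. The prefixes "html" and "furn" in the lookup map are arbitrary choices for this example; only the namespace URIs have to match the documents.

```python
import xml.etree.ElementTree as ET

HTML_TABLE = """<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>"""

FURNITURE_TABLE = """<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>"""

html_root = ET.fromstring(HTML_TABLE)
furniture_root = ET.fromstring(FURNITURE_TABLE)

# ElementTree replaces each prefix with its namespace URI in braces.
print(html_root.tag)
print(furniture_root.tag)

# Prefix-to-URI map used only for the lookups below.
ns = {
    "html": "http://www.w3.org/TR/html4/",
    "furn": "http://www.w3schools.com/furniture",
}
cells = [td.text for td in html_root.findall("html:tr/html:td", ns)]
table_name = furniture_root.findtext("furn:name", namespaces=ns)
print(cells, table_name)
```

The two root elements share the local name "table" but have different qualified names, so they can no longer be confused.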


The XML Namespace (xmlns) Attribute

The XML namespace attribute is placed in the start tag of an element and has the following syntax:

xmlns:namespace-prefix="namespaceURI"


When a namespace is defined in the start tag of an element, all child elements with the same prefix are associated with the same namespace. Defining a default namespace for an element saves us from using prefixes in all the child elements. It has the following syntax:


xmlns="namespaceURI"
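
As a small sketch of the default namespace, the furniture table is written below without prefixes and parsed with Python's xml.etree.ElementTree. The document is a shortened variant of the earlier example, made up for this illustration.

```python
import xml.etree.ElementTree as ET

DEFAULT_NS_TABLE = """<table xmlns="http://www.w3schools.com/furniture">
<name>African Coffee Table</name>
<width>80</width>
</table>"""

root = ET.fromstring(DEFAULT_NS_TABLE)

# The unprefixed children inherit the default namespace of their parent.
tags = [child.tag for child in root]
print(root.tag)
print(tags)

# A lookup by the bare name finds nothing, because the elements are
# in the furniture namespace.
print(root.find("name"))
```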



XML CDATA


All text in an XML document will be parsed by the parser. Only text inside a CDATA section will be ignored by the parser.



Escape Characters


There are 5 predefined entity references in XML:

&lt; < less than

&gt; > greater than

&amp; & ampersand

&apos; ' apostrophe

&quot; " quotation mark
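
The five entity references can be produced programmatically. The sketch below uses escape and unescape from Python's xml.sax.saxutils; by default these handle only &, < and >, so the two quote characters are passed in an extra mapping. The sample string is made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; quotes need an explicit mapping.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

raw = 'if salary < 1000 & name == "Pascal"'
escaped = escape(raw, QUOTE_ENTITIES)

# Reverse the mapping to turn the entity references back into characters.
restored = unescape(escaped, {v: k for k, v in QUOTE_ENTITIES.items()})

print(escaped)
print(restored == raw)
```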



CDATA


Everything inside a CDATA section is ignored by the parser, which means it is treated as literal text rather than markup. A CDATA section starts with "<![CDATA[" and ends with "]]>".
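
The sketch below shows the effect with Python's xml.etree.ElementTree. The script element and its content are invented for this example; the same text without the CDATA wrapper is not well formed because of the bare < and & characters.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = """<script><![CDATA[
if (a < b && a < 0) { return 1; }
]]></script>"""

WITHOUT_CDATA = "<script>if (a < b && a < 0) { return 1; }</script>"

# Inside CDATA, < and & are passed through as ordinary characters.
script = ET.fromstring(WITH_CDATA)
code = script.text.strip()
print(code)

# Without CDATA the same characters are read as markup and parsing fails.
try:
    ET.fromstring(WITHOUT_CDATA)
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```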




XML technologies


XHTML (Extensible HTML) is a stricter and cleaner version of HTML.

XML DOM (XML Document Object Model) defines a standard way for accessing and manipulating XML documents.

XSL (Extensible Style Sheet Language) - XSL consists of three parts: XSLT - a language for transforming XML documents, XPath - a language for navigating in XML documents, and XSL-FO - a language for formatting XML documents.

XSLT (XSL Transformations) is used to transform XML documents into other XML formats, like XHTML.

XPath is a language for navigating in XML documents.

XSL-FO (Extensible Style Sheet Language Formatting Objects) is an XML based markup language describing the formatting of XML data for output to screen, paper or other media.

XLink (XML Linking Language) is a language for creating hyperlinks in XML documents.

XPointer (XML Pointer Language) allows the XLink hyperlinks to point to more specific parts in the XML document.

DTD (Document Type Definition) is used to define the legal elements in an XML document.

XSD (XML Schema) is an XML-based alternative to DTDs.

XForms (XML Forms) uses XML to define form data.

XQuery (XML Query Language) is designed to query XML data.

SOAP (Simple Object Access Protocol) is an XML-based protocol to let applications exchange information over HTTP.

WSDL (Web Services Description Language) is an XML-based language for describing web services.

RDF (Resource Description Framework) is a framework for describing web resources, commonly written in XML.

RSS (Really Simple Syndication) is a format for syndicating news and the content of news-like sites.

WAP (Wireless Application Protocol) was designed to show internet contents on wireless clients, like mobile phones.

SMIL (Synchronized Multimedia Integration Language) is a language for describing audiovisual presentations.

SVG (Scalable Vector Graphics) defines graphics in XML format.