San José State University

applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

Geography, Mapping and XML

Introduction

The purpose of this study is to examine the developing technology utilizing the Extensible Markup Language (XML) for creating maps for the Web. In particular it is concerned with what XML products are available and whether there are emerging standards in the field. This study tries to explain the nature of the XML technology and its applications to mapping. It presumes an acquaintance with Hypertext Markup Language (HTML). A brief review of HTML is provided at this link.

In ancient times there was the legend of the cornucopia, the magical horn of plenty filled with an unending supply of wonderful things. The cornucopia was just a legend, but XML is proving to be a real cornucopia in information processing and display. Although XML (Extensible Markup Language) is called a markup language, it is more properly a metalanguage, that is, a language for creating markup languages. XML was itself created as a subset of SGML (Standard Generalized Markup Language), which, despite its name, is oriented more toward content structure than toward formatting, although it can be used for formatting. HTML (Hypertext Markup Language) is an application of SGML in which formatting is the major concern. XML provides a means for specifying the structure of the content of documents as well as the formatting of their presentation on output devices such as computer monitor screens and printers.

The above diagram illustrates the relationship between SGML, the parent, and HTML and XML, the siblings. It also indicates that Dynamic HTML (DHTML) is HTML augmented by scripting and a document object model. More information about DHTML is available at this link. The above diagram also depicts the relationship of the special languages created with XML, such as XHTML and Scalable Vector Graphics (SVG). The nature of these application languages will be explained later. Files created in these languages are XML files and subject to the rules of XML as well as the special rules of the particular language. For a list of some of the XML application languages which have been created see Appendix A.

XML does not function in isolation. It makes use of style sheets, as does its parent SGML and its sibling HTML. A brief description of the nature of style sheet technicalities is available at this link.

The modification of HTML that adheres to the rules of XML is called XHTML. In XHTML, for example, all of the tags must be closed and properly nested. Since XHTML is a more consistent version of HTML it is likely that the future modifications of HTML will be in the direction of XHTML. In XHTML users can create their own tags as well as use those available in HTML.

Although the individual user could make use of the metalanguage of XML itself, it is more likely that most users will use the products of XML. To create an adequate markup language for geographic applications takes an enormous effort. For example, to create ArcXML, a sophisticated mapping language, Environmental Systems Research Institute, Inc (ESRI) expended 20 person-years of effort over a three-year period [Schutzberg, 2001]. For an individual even to learn to use ArcXML involves a great deal of time and effort.

Despite the cost of creating these markup languages there seems to be a bit of a problem of proliferation. Among the graphics-capable markup languages now available are two from Microsoft, Structured Graphics Control (SGC) and Vector Markup Language (VML), and two from Adobe Systems, Precision Graphics Markup Language (PGML) and the much more sophisticated Scalable Vector Graphics (SVG), in addition to ESRI's ArcXML.

There has also been an attempt at creating an industry standard for content structure for maps called Geography Markup Language (GML). GML is devoted to specifying the information structure of a geography document. This focus of GML on content is in keeping with a principle that emerged from the experience with SGML that content should be separated from formatting. A content file can be converted into a file in one of the graphics-capable languages for creating the map. The situation is depicted in the following diagram.

The line of standardization which may be emerging is that GML will be used for expressing the content or structure of maps and other geographic documents and SVG will become the standard for formatting the presentation of these documents.

The Nature of XML

XML is based upon what are called Document Type Definitions, DTD's. These precisely specify the meaning of elements. At this point it is convenient to note the difference between a tag and an element. A tag is an element name enclosed between angle brackets whereas an element is the opening tag and all the information included between it and its closing tag. For example, the paragraph tag in HTML is <P>. A paragraph element is <P>text...text</P>.

The Document Type Definition (DTD) for an element is given in a variation of Backus-Naur Form, which tells the attributes and subelements an element may have, whether they are mandatory or optional, and whether more than one instance of a type is allowed. An example of a DTD will be given later in this study after a visually more easily comprehended version of language structure is presented.

XML allows for external DTD's, which may be standard sources, and an internal DTD, which enables the user to create special features for a particular application field. For example, a petroleum exploration company might use standard external DTD's for standard mapping but create internal DTD's for some special features that are useful and important only in petroleum exploration.

When several sources of DTD's are permitted the danger is that a term such as Street might appear in more than one DTD and lead to confusion or error. The solution to this problem of potential collision of names is the concept of namespace. The location of each DTD used is given a unique name or prefix. All element names are given in the form nameSpace1:elementName. This tells the browser that the elementName is used in the meaning given in the DTD at the location associated with nameSpace1. It is a system that permits very precise specification of terms.
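The name-collision problem and its namespace solution can be illustrated with a minimal sketch in Python. Everything specific here is an assumption made for the illustration: the prefixes city and post, the two example.org addresses and the element contents are hypothetical. The parser used, Python's standard xml.etree.ElementTree, treats the address bound to a prefix purely as a unique name and does not fetch a DTD from it.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies both define an element called "Street".
# The prefixes and the example.org addresses are made up for illustration.
DOC = """
<map xmlns:city="http://example.org/city-dtd"
     xmlns:post="http://example.org/postal-dtd">
  <city:Street>Main Street centerline</city:Street>
  <post:Street>123 Main Street</post:Street>
</map>
"""

root = ET.fromstring(DOC)

# The parser expands each prefix to its address, so the two Street
# elements receive different fully qualified names and cannot collide.
qualified_names = [child.tag for child in root]

# Looking up an element requires saying which Street is meant.
NS = {"city": "http://example.org/city-dtd",
      "post": "http://example.org/postal-dtd"}
city_street = root.find("city:Street", NS).text
post_street = root.find("post:Street", NS).text
```

The two Street elements are kept apart because each is identified by the pair (namespace address, element name) rather than by the element name alone.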

The technology of XML is still developing. A new system for defining elements, called Schema, has been formulated. An XML Schema gives a more powerful method of specifying the nature of data. For example, a Document Type Definition may only allow the user to specify that a particular input is a string of characters, where an XML Schema might specify the number of digits and the range of the digits. Now both DTD's and Schema are permitted in defining an XML application.

Geography Markup Language (GML)

Background

Geographers, like most everyone else, made wide use of HTML when the Web became available, but HTML was not ideally suited to the needs of geography with its heavy graphics orientation. When XML was released it made possible the easy creation of specialized markup languages. Some geographically-oriented languages were created but none appeared acceptable as a standard for the field. The OpenGIS Consortium (OGC) proposed to put together a standard to be called Geography Markup Language, GML. This language was to be consistent with a model of geographic information, called the OpenGIS Abstract Specification, which OGC had created. GML development was started in 1999 under the auspices of the OpenGIS Consortium and its specification was released in February of 2001. The purpose of this section is to explain GML version 1.0 as given in OGC Document Number 00-029 (May 12, 2000). Although the GML specification calls it version 1.0 there are essentially three forms of GML now:

Although these profiles are characterized and named in terms of the types of users of GML they are, in effect, different systems likely to appeal to different categories of users. This section is concerned with the simplest form of GML, the Profile 1 of the above list.

GML is likely to change, as it has already changed, but the future changes are likely to be modifications and elaborations of a fundamental model.

General Strategy of GML

GML was intended to be a content-oriented XML application which completely ignores the matter of how to display maps and other types of geographic information on the Web. OGC intended to create an unambiguous system of coding geographic information that would make possible the storage and the sharing of such information. This meant the coded documents would not require an expert or special additional knowledge for their interpretation. It is commonplace for some maps or data to be unusable except by their caretaker because only the caretaker knows those crucial bits of information such as what the projection is for a map or the base point or units for the coordinates.

GML was created to avoid the problem of missing information for geographic documents. It does this as an application of XML by specifying what information is mandatory for a file and by requiring all documents to be validated by a parsing program. This is the basic strategy of the parent of XML, the Standard Generalized Markup Language (SGML).

Formal Structure of Map Documents in GML

The general scheme of GML is that a geographic document involves the specification of a set of features. These features are described by properties, which may be of two types: Simple Properties and Geometric Properties. Simple properties are those that may be given by basic data forms: strings of characters (names), integers, real numbers or true/false (boolean) values. The nature of the features and their properties is the essence of GML. The geometric properties come at the last stage of the specification of a feature but it is expedient to present the geometric properties first. Generally it is only feasible to explain a complex topic such as GML by breaking it down into modules and simplifying in the interest of clarity. It is not possible to explain GML in its full complexity all at once.

Geometry and Geometric Features in GML

The geometric property element of GML is created using elements called Points, LineStrings, and Polygons, with the special case of LinearRings, which are merely closed LineStrings. Polygons are more than just LinearRings because there may be an interior boundary as well as an exterior boundary. In addition to these primitive geometric entities GML makes provision for sets (collections) of these elements: i.e., MultiPoint, MultiLineString and MultiPolygon elements. These are made up, as the names imply, of Points, LineStrings and Polygons. GML has another geometric element that can include geometric elements of any type. This is called a GeometryCollection. A GeometryCollection can also include other GeometryCollections.

The geometric elements in GML may be specified by a Coordinate List, which is just a list of (x,y) coordinate pairs with some separators such as commas and spaces. The coordinate list is enclosed by the <coordinates> </coordinates> tag pair. The separators within the coordinate list can be specified by attribute values of the <coordinates> tag.
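A minimal sketch in Python of how such a coordinate list might be read is given here. The attribute names cs (separator between x and y) and ts (separator between pairs), and their defaults of a comma and a space, are assumptions made for the illustration; the specification itself should be consulted for the actual attribute names. The function parse_coordinates is a hypothetical helper, not part of any GML software.

```python
import xml.etree.ElementTree as ET

def parse_coordinates(element):
    """Return a list of (x, y) float pairs from a <coordinates> element."""
    # Assumed attribute names: "cs" separates x from y within a pair,
    # "ts" separates one pair from the next.
    cs = element.get("cs", ",")
    ts = element.get("ts", " ")
    text = (element.text or "").strip()
    pairs = []
    for chunk in text.split(ts):
        if not chunk.strip():
            continue  # skip empty pieces left by repeated separators
        x, y = chunk.strip().split(cs)
        pairs.append((float(x), float(y)))
    return pairs

# The same three points written with default and with custom separators.
default_form = ET.fromstring(
    "<coordinates>0.0,0.0 100.0,0.0 100.0,50.0</coordinates>")
custom_form = ET.fromstring(
    '<coordinates cs=" " ts=";">0.0 0.0;100.0 0.0;100.0 50.0</coordinates>')

default_pairs = parse_coordinates(default_form)
custom_pairs = parse_coordinates(custom_form)
```

Both forms yield the same list of pairs, which is the sense in which the separators are a matter of notation rather than content.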

There is another geometric element, the Box, which is specified by two coordinate points, the upper left corner and the lower right corner. The Box element is used more to encode the simple-property components of a geographic document than the strictly geometric structure of a map.

The structure of GML is more easily visualized by the following depiction of its Geometry Collection element.

In the above diagram some elements are not fully described in order to improve comprehensibility. Also in GML the official names of the features denoted Exterior Boundary and Interior Boundary are outerBoundaryIs and innerBoundaryIs. The abstruse terminology comes from Logic Programming, upon which GML is modeled.

Collections of geometric features such as MultiPoint use a subfeature selection element which in the case of MultiPoint is called pointMember. A few examples illustrating how the geometric information is encoded are given in Appendix E.

The geometric elements are used to specify geographic features. The concept of a geographic or map feature is an essential building block of GML.

Properties of Map Features

The logical structure of a Feature element is given by the subelements and attributes it may contain. The subelements that a feature in GML may have are:

As in the case of the geometry of a map document, there are the primitive elements and then elements that contain a set of elements. Thus there is the Feature element described above and a FeatureCollection element that contains a set of Features or even other FeatureCollection elements. An interesting difference between a FeatureCollection element and a Feature element is that the boundedBy element (specified by a Box element) is optional for a Feature element but it is mandatory for a FeatureCollection element.

The structure of a featureCollection element is given by the following diagram:

While the above diagram serves its purpose of showing the general structure of the featureCollection element of GML it is deficient in that it does not show which elements are optional and which may appear multiple times. The notation that is used to show such information is as follows:

Notation    Meaning
ELEMENT*    ELEMENT may occur zero or any number of times
ELEMENT?    ELEMENT may occur zero or one time
ELEMENT+    ELEMENT may occur one or more times; it must occur at least once
ELEMENT     ELEMENT must occur exactly once
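The four markers amount to a minimum and a maximum count for each element. A minimal sketch in Python is given here; the table OCCURRENCE and the function occurrence_ok are hypothetical names introduced only for this illustration.

```python
# Allowed (minimum, maximum) counts for each occurrence marker.
# None as a maximum means "no upper limit".
OCCURRENCE = {
    "*": (0, None),  # zero or any number of times
    "?": (0, 1),     # zero or one time
    "+": (1, None),  # one or more times
    "": (1, 1),      # no marker: exactly once
}

def occurrence_ok(marker, count):
    """Return True if `count` occurrences satisfy the given marker."""
    minimum, maximum = OCCURRENCE[marker]
    if count < minimum:
        return False
    return maximum is None or count <= maximum
```

For example, occurrence_ok("?", 2) is False because an element marked "?" may appear at most once, whereas occurrence_ok("*", 0) is True.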

The diagram for the featureCollection element with the information on the allowed occurrence of the elements is shown below:

The above diagram indicates that, as was noted previously, the boundedBy element is mandatory for a featureCollection element but optional for a Feature element.

The corresponding diagram for the geometry elements, i.e., with the optionality of the subelements shown, is given below.

Note the interesting item in the above diagram that the exterior boundary of a Polygon is optional but there can be only one. There may be zero or any number of interior boundaries. Making the exterior boundary optional allows the entire world of the map to be defined as a Polygon element.

In the Backus-Naur Form notation used in a DTD for XML the structure of a Polygon element is:

<!ELEMENT Polygon (outerBoundaryIs, innerBoundaryIs*)>

All the subelements need their own similar specification. In the DTD the nature of the attributes of an element must also be specified. The format of the attribute list for a Polygon element in a DTD is as follows:

<!ATTLIST Polygon
ID CDATA #IMPLIED
srsName CDATA #IMPLIED>

The above indicates that the Polygon tag can include an ID name in the form of character data, i.e., a string of characters, and an srsName which gives information on the coordinates for the polygon. The #IMPLIED keyword for an attribute indicates that it is optional.
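The practical consequence of an optional attribute can be seen in a minimal Python sketch. The attribute values lot42 and EPSG:4326 are illustrative assumptions, as is the helper describe. Note that Python's xml.etree.ElementTree does not validate against a DTD, so this sketch only shows how a program reading the file has to allow for the attribute being absent.

```python
import xml.etree.ElementTree as ET

# One Polygon tag with both optional attributes and one with neither.
# The attribute values are made up for illustration.
with_attrs = ET.fromstring('<Polygon ID="lot42" srsName="EPSG:4326"/>')
without_attrs = ET.fromstring("<Polygon/>")

def describe(polygon):
    """Return the (ID, srsName) pair of a Polygon element."""
    # .get() returns None for an attribute that was left out.
    return polygon.get("ID"), polygon.get("srsName")
```

Here describe(without_attrs) returns (None, None); both tags are acceptable because neither attribute is mandatory.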

In order to simplify the structure the geometricProperty element in the above diagram is not expanded. A geometricProperty element will include a typeName attribute which may be any of the following: pointProperty, lineStringProperty, polygonProperty and multiGeometryProperty. These in turn may have subproperties such as location, centerOf, position, centerLineOf, edgeOf, extentOf, coverage. These subproperties may have subelements which are the geometry classes of point, lineString, polygon, multiPoint, multiLineString and multiPolygon. This is shown diagrammatically below:

The above diagram uses a feature of Backus-Naur Form notation: a term of the form (..|..), in which the "|" denotes "or." Thus (a|b) would mean that either a or b must occur. There can be any number of terms from which exactly one must be selected.
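Because the content-model notation of a DTD is so close to that of regular expressions, a simple content model can be checked by translating it into one. The following is a minimal sketch in Python under stated assumptions: it handles only element names, commas, parentheses, "|" and the markers *, ? and +, and the functions content_model_to_regex and children_valid are hypothetical helpers, not the way an actual validating parser is written.

```python
import re

def content_model_to_regex(model):
    """Translate a simple DTD content model into a regular expression.

    The regex is meant to be matched against the names of the child
    elements written one after another with a ";" after every name.
    """
    # Turn every element name into a group that also consumes its ";".
    pattern = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: "(?:%s;)" % re.escape(m.group(0)),
                     model)
    # In a DTD "," means "followed by"; in a regex that is adjacency.
    pattern = pattern.replace(",", "").replace(" ", "")
    return re.compile(pattern)

def children_valid(model, child_names):
    """Return True if the sequence of child names fits the model."""
    text = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(text) is not None

# The Polygon content model as given in the DTD line quoted earlier.
POLYGON_MODEL = "(outerBoundaryIs, innerBoundaryIs*)"
```

With this sketch children_valid("(a|b)", ["a"]) is True while children_valid("(a|b)", ["a", "b"]) is False, since exactly one of the alternatives must be chosen.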

Since there is no space in the above diagram for the structure of the elements such as centerOf these are given below:

The outerBoundaryIs and innerBoundaryIs elements of course have lineString subelements which in turn have coordinates elements.

With the previous overview of the formal structure of GML it is possible to encode various features of a map. Some examples of the encoding of features using GML are given in Appendix F.

The Structure of a GML File

A GML file is an XML file and thus must include Document Type Definitions (DTD) as well as the coding for a geography document. The DTD's enable the computer to interpret the coding and determine whether it is valid for the structure given by the DTD's. The DTD's for a GML can be external as well as internal. Typically most of the structure comes from the external DTD's with only relatively minor elements defined within the GML file.

A user may want to use several external DTD's. A problem would arise if an element of the same name is defined in more than one place. The W3C formulated a method for avoiding the problem of conflicting definitions of terms used in XML files, including GML files. It is called namespace. Although this term sounds obscure the concept is relatively simple.

The Namespace Concept in XML

The XML file defines a name for a source of DTD's as given in a URL. The GML cites an element type in the notation of nmsp:element which means that the definition of element is to be found in the URL for the label nmsp. While namespace notation is an effective solution to the problem of conflicting definitions of terms in DTD's it does make the XML files much less readable.

Attributes of Elements

Generally an element has an attribute list which provides a way of including important items of information about the element. For example, the Box element can have two attributes, an identification name and the name of the Spatial Reference System (srsName) that applies for its coordinates. The srsName is usually a reference to a standard document that gives information on units and the coordinate system. The attribute list tells not only which attributes may be given for the element but also whether each one is optional or mandatory.

The geometric classes of Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection all have the same attribute list.

The technology for XML is still evolving. The W3C, as noted previously, has created Schema as an alternative to DTD's. It has also promoted a metalanguage for describing data, called Resource Description Framework. OpenGIS, the organization which created GML, has incorporated the Resource Description Framework of W3C into the structure of GML.

The W3C Resource Description Framework (RDF) Schema
(also referred to as Resource Description Format)

The RDF Schema for GML is a set of definitions of Classes for GML Geometry. For example, a Point class is defined which can reference a coordinates property. The Point class is defined by the code:

<rdfs:Class rdf:ID="Point">
<rdfs:subClassOf rdf:resource="#Geometry" />
</rdfs:Class>

As can be seen above the code uses the Resource Description Framework Schema namespace rdfs: and the Resource Description Framework namespace rdf:. Generally the RDF Schema is consistent with GML but gives the opportunity of going beyond GML. The RDF Schema is explained more fully in Appendix G.

ArcXML and the Logical Structure of a Map

Environmental Systems Research Institute, Inc. created a mapping application of XML for conveying mapping data between various components of its ArcIMS software. The purpose of this section is to examine the system by which ArcXML stores the information for a map. Ultimately the ArcXML system will be compared with that of the Geography Markup Language (GML) to see if the two systems are logically equivalent or if one of the systems contains components related to map encoding not found in the other. Note that ArcXML, ArcIMS and ESRI are trademarks of the Environmental Systems Research Institute, Inc.

Since ArcXML was created by ESRI for transferring information within its ArcIMS mapping software package, ArcXML is a much more extensive instrument than GML. In particular, ArcXML includes both content and formatting structure, in contrast to the limitation of GML to content structure.

ESRI provided for the public a Programmer's Manual for ArcXML. This document contains the information needed to examine the formal structure of the language. The problem is in separating elements related to the content of a map document from the code necessary for creating a presentation of the information; i.e., the graphic display of the map.

The root element of an ArcXML file is identified by the <ARCXML> tag. There are four possible subtags to the root tag but only one, the <CONFIG> tag, has to do with content structure. The others have to do with queries and updating the data for a document.

The logical structure of the <CONFIG> tag is shown below.

The <MAP> tag is shown with subtags but the <SCALEBAR> tag has no subtags. The information required for specifying a scalebar is encoded as attribute values within the <SCALEBAR> tag.

The <PROPERTIES> element of ArcXML generally contains metadata, information about the data, which in this case is contained in the <LAYER> elements. A <MAP> must contain one and only one <PROPERTIES> element but it can contain any number of <LAYER> elements. The <WORKSPACES> element is for giving the location, in terms of a URL, of data for the map.

The <LAYER> element is the heart of the map document. Its logical structure is shown below.

The above diagram indicates that most subtags of the <PROPERTIES> tag are optional as denoted by the "?" following the name. Only the <ENVELOPE> element is mandatory. The subtags to the <PROPERTIES> tag do not themselves have subtags. All required data for these tags are given as values of attributes within the tags.

The essential element of a map is a layer. The <MAP> element may have any number of <LAYER> elements.

As shown above a <LAYER> element can contain any number of <OBJECT> elements which are defined by POINT, LINE, POLYGON and TEXT elements. The SCALEBAR may also be included. The <EXTENSION> tag allows provision for geocoding through a <GCSTYLE> tag and any number of <GCFIELD> subtags. A <LAYER> may also have a dataset which can involve any number of partitions.

The POINT, LINE, POLYGON and TEXT elements may have subelements, but these are symbols which are not part of the content structure and are merely presentation features, and therefore they are not shown. The geometric content of the <POINT>, <LINE> and <POLYGON> tags, which is not shown, is given by attribute values, such as a list of coordinates.

A Comparison of the Formal Structures
of GML and ArcXML

The question of whether two languages are logically equivalent, i.e., isomorphic, probably merits a deep, rigorous mathematical analysis, but here the question of whether the logical structures of GML and ArcXML for map documents are equivalent will be dealt with on an informal level. Some general principles that would apply to this question are:

One of the lowest levels of GML and ArcXML is their coordinates lists. GML has the <coordinates> </coordinates> tag pair which encloses a list of xy coordinate pairs with a comma separator within the pair and whitespace separating the pairs. ArcXML has two forms of a coordinates list, one of which, the short form, is exactly the same as the GML version. The other, called the long form, gives the coordinates in the form of x= , y= and is easier to read but may make files considerably longer. The long form is equivalent to the short form and hence to the GML coordinates list. The tag pair for the ArcXML coordinates list is <COORDS> and </COORDS>.

With the equivalency of the coordinates lists it follows that the point element of GML is equivalent to the POINT element of ArcXML because the only sub-elements of these two elements are coordinate lists. Likewise the lineString of GML is equivalent to the LINE element of ArcXML.

The polygon and POLYGON elements present a different situation. GML's polygon element has outerBoundaryIs and innerBoundaryIs as sub-elements whereas ArcXML's POLYGON element has a RING sub-element which can have a HOLE sub-element. The innerBoundaryIs element is equivalent to the HOLE element. But because HOLE is a sub-element of RING rather than a co-subelement alongside it, RING is not strictly equivalent to the outerBoundaryIs element. Furthermore, an outerBoundaryIs element is an optional subelement of polygon whereas RING is a mandatory subelement of POLYGON. Thus GML can have a boundaryless world polygon but ArcXML cannot. This is not a practical difference, but it does mean that GML and ArcXML are not strictly equivalent.

A tabulation of the effective correspondences between GML and ArcXML is given below:

CORRESPONDENCES
GML                  ArcXML
coordinates          COORDS (short form)
point                POINT
lineString           LINE
polygon              POLYGON
outerBoundaryIs      RING
innerBoundaryIs      HOLE
featureCollection    MAP
feature              OBJECT
boundedBy            ENVELOPE
extentOf             ENVELOPE

Other Attempts at Formulating a Formal Structure of Maps

It is important to note at the outset of this material that there are two closely related but different worthwhile endeavors: 1. The encoding of map documents which have been prepared by cartographers with traditional methods and orientation; 2. The encoding of the geographic information about an area which may later be used in preparing a variety of maps concerning that area. The first of these endeavors is the one that is more clearly feasible. The second is obviously a desirable goal but it is not certain that a consensus can be achieved among geographers as to how it should be done.

In the late 1950's and early 1960's linguistics achieved major breakthroughs in rigorous analysis as a result of the application of formal language methods to natural languages. Researchers in other fields began to look for structures similar to grammars in their disciplines and some tried to generalize the results of linguistics to a science of signs, which was called semiotics.

Semiotics perhaps promised more than it delivered. Or perhaps scholars had unrealistic expectations of the benefits of this abstract analysis in their field. In linguistics no one expected that the discovery of hidden structures in natural language would make anyone a better user of natural languages. There was possibly some benefit of the linguistic discoveries in constructing automated translation systems, but even the feasibility of this is in doubt.

Geography had its flurry of interest in semiotics and this has been summarized by Jan Pravda in an article entitled "The Language of Maps." Hansgeorg Schlichtmann also gives a summary of the semiotic approach to maps in his article entitled "Codes in Map Communication."

Clearly maps have a language or code and there are rules of structure, but whether there are any important insights to be gained from looking at maps from a linguistic, i.e., semiotic, perspective is not certain. The semiotic approach does not seem to get beyond defining terms and making classifications. Further material on the matter of semiotics and map structure is given in Appendix B.

The previously cited studies in the geographic literature on the formal structure of maps are interesting but they are not of much help in the matter of encoding geographic information.

One study that focuses precisely on the topic of the formal structure of the representation of geographic information is the doctoral dissertation of R. Taketa at the University of Washington. Taketa emphasizes that the formal encoding should be of the geography of an area and the relationships within that area. In particular the highest order of representation would be of relationships that exist because of the geographic processes that created the features. Taketa illustrates the differences between representing the geography of an area and the representation of a map of that area by defining three levels of description:

Taketa's Formal Structure of a Map

A depiction of Taketa's Level One representation as a formal structure is given below:

The Level One description is essentially what is called a "flat file" structure. The Level Two description involves much more geographic structure and is as follows:

The structure is a bit too large to encompass in one diagram. The structure of the Contours element is shown below:

A Level Three description incorporates more sophisticated relationships between pairs of elements than the hierarchical relationship. Taketa's relationships involve the definition of functions for expressing these relationships, but this information is not easily represented in terms of XML and will not be pursued further here. There is material on Taketa's Level Three description of a map in Appendix C. It does seem that there is a need for a topological geography language involved in partitioning and classifying the points in a two dimensional region.

As noted previously there are three parts to the system for using XML for storing geographic information and creating maps for the Web: 1. The Content Language (GML) 2. Vector Graphics Languages 3. The programs for translating content language files into vector graphics language files for display by means of browsers. The programs for translation from content language files to vector graphics language files are implementations of what is called Extensible Stylesheet Language, a creation of W3C.

The Nature of
the Extensible Stylesheet Language (XSL)

The separation of content and display requires some means to translate the content-oriented file into one that can be displayed. The means for this translation for XML is called Extensible Stylesheet Language (XSL). There are three components to this system:

Shortly after XML was approved as a recommendation of W3C in early 1998 a draft for XSL was prepared. The initial draft was 140 pages long and had two editors. In April 1999 the effort was divided into two parts, one for XSLT and one for XSL-FO. The XSLT draft, 100 pages long, became a W3C recommendation in November of 1999. The draft for XSL-FO has grown to 400 pages with ten editors and contributors and is still not approved as a recommendation. The draft for XPath achieved recommendation status on the same day as XSLT. [Fitzgerald, 2001, p. 253] Clearly it is much easier to get agreement on the computational problems of XSLT and XPath than it is for presentation matters involved in XSL-FO.

There have been a dozen or more programs developed for implementing XSLT, just as there have been a number of browsers written for interpreting HTML files. Two prominent programs for XSLT are SAXON and XALAN, which are both available free. Some simple examples of how files are transformed by these programs are given in Appendix D.

The method by which an XML file of one type is translated into one of another type is to select the information needed from the source file and insert it into boilerplate code for the new type of file, with the result written to a new file. Thus the process of translation from one type of XML file to another amounts to creating a generic file of the second type and incorporating this code into a style document along with the code for picking out the necessary information from the source file.
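The select-and-insert idea can be sketched in a few lines of Python. This is not XSLT and not the way SAXON or XALAN work internally; it is only a stand-in that follows the same two steps. The content file is made up and its element names merely imitate GML, and the templates and the function content_to_svg are hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

# A made-up content file; the element names only imitate GML.
CONTENT = """
<FeatureCollection>
  <Feature name="river">
    <LineString><coordinates>0,0 40,30 90,35</coordinates></LineString>
  </Feature>
  <Feature name="road">
    <LineString><coordinates>10,80 60,60</coordinates></LineString>
  </Feature>
</FeatureCollection>
"""

# Boilerplate for the target file type (SVG), with slots for the data.
SVG_TEMPLATE = ('<svg xmlns="http://www.w3.org/2000/svg" '
                'width="100" height="100">\n{lines}\n</svg>')
LINE_TEMPLATE = ('  <polyline id="{name}" points="{points}" '
                 'fill="none" stroke="black"/>')

def content_to_svg(content_xml):
    """Translate the content file into SVG text."""
    root = ET.fromstring(content_xml)
    lines = []
    # Selection step: pick out each feature's name and coordinate list.
    for feature in root.iter("Feature"):
        name = feature.get("name")
        points = feature.find("LineString/coordinates").text.strip()
        # Insertion step: drop the selected data into the boilerplate.
        lines.append(LINE_TEMPLATE.format(name=name, points=points))
    return SVG_TEMPLATE.format(lines="\n".join(lines))

svg_text = content_to_svg(CONTENT)
```

In an actual style document the selection step would be written as XPath expressions and the boilerplate would appear as literal elements of the target language inside the templates.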

Vector Graphics Languages

This is an examination of vector graphics languages that can be used for creating maps for the Web from map content files in the Geography Markup Language. Generally such vector graphics languages can do more than create maps but that is the scope of interest of this document.

Vector graphics languages can be created using XML, but the graphic operations themselves must be programmed, for example as Java code. That is to say, the creation of a graphics language is not just a matter of creating a text file of Document Type Definitions (DTD's); the graphics operations have to be implemented in program code.

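The four graphics languages which have been created are Structured Graphics Control (SGC), Vector Markup Language (VML), Precision Graphics Markup Language (PGML) and Scalable Vector Graphics (SVG).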

All but the first, SGC, are XML-based languages. SGC is a creation of Microsoft, as is VML. SGC is included to provide a contrast with the way the other languages work. PGML and SVG are creations basically of Adobe Systems. PGML was said to be Adobe PostScript, a printer control language, with angle brackets. PGML is now obsolete because Adobe replaced it with SVG. PGML is included for historical comparison with SVG. Adobe incorporated features of VML with features of PGML to create SVG.

There are two other vector graphics languages which have been formulated but have not reached the stage of becoming W3C recommendations. These are DrawML and HGML (Hyper Graphics Markup Language). DrawML is a simplified vector drawing language for creating diagrams for the Web. DrawML can be used to create simple diagrams involving rectangles and ellipses with text and with arrows connecting them. HGML is a vector graphics language for wireless devices connecting to the Web such as cellular phones and handheld computers. HGML creates simple graphics that require a minimal time to load and display.

Before reviewing the characteristics of the available vector graphics programming languages let us consider what are the essential requirements of a vector graphics language for creating maps for the web. The absolute essential characteristics are:

Characteristics which are highly desirable for map making are:

Characteristics for vector graphics languages which are desirable though not essential are:

The major efforts in developing graphics languages for the Web have come from two software companies, Microsoft and Adobe Systems. Technically the development of these languages was carried out under the auspices of the World Wide Web Consortium (W3C), with collaboration among several companies to create a specification of the languages that would gain the status of a W3C Recommendation. Although there were committees responsible for producing the documentation, the impetus for the development of the separate languages came from Microsoft and Adobe Systems. In the late 1990's Microsoft had implemented Vector Markup Language (VML) and Adobe Systems had implemented Precision Graphics Markup Language (PGML). Adobe Systems then upgraded PGML and incorporated some features from VML to produce Scalable Vector Graphics (SVG) as shown below. For further information on vector graphics languages see Appendix H.

The expectation generally is that SVG will become the standard in vector graphics languages for the Web.

Overall Conclusions

The system that appears to be developing for the use of XML technology for creating maps for the Web is that Geography Markup Language (GML) will be used to encode the information for a map. In GML format the geographic information can be stored and shared. When a map based on the stored information is desired, Extensible Stylesheet Language Transformations (XSLT) will be used to create a file in a graphics language which can be opened with a web browser. Most likely the graphics language of choice will be Scalable Vector Graphics (SVG) but that is not certain. There are other graphics languages such as Structured Graphics Control and Vector Markup Language but these have problems, notably more limited capabilities and a lack of support by browsers other than Microsoft's Internet Explorer 5. The XSLT technology can be used to create PDF (Portable Document Format) files as well as SVG files. Portable Document Format appears to be emerging as a standard on the Web.

Bibliography

Appendix A: XML Application Languages

Some of the markup languages which have been created using XML are listed below.

Some Markup Languages Derived From XML
Acronym | Name | Function
GML | Geography Markup Language | For storing and sharing geographic documents such as maps
VML | Vector Markup Language | For creating vector graphics for the Web
PGML | Precision Graphics Markup Language | For creating vector graphics for the Web
SVG | Scalable Vector Graphics | For creating vector graphics for the Web
SMIL | Synchronized Multimedia Integration Language | For displaying movies and sound via the Web
DrawML | Draw Markup Language | For creating vector graphics of simple diagrams involving rectangles and ellipses for the Web
HGML | Hyper Graphics Markup Language | For creating vector graphics for devices such as cellular phones and handheld computers which have severely limited graphics capacities
VoiceML | Voice Markup Language | For processing inputs and outputs by voice
CFML | Cold Fusion Markup Language | For creating access to a database via the Web
FpML | Financial Products Markup Language | For processing information in the financial derivatives industry
WML | Weather Markup Language | For processing information in the weather-based financial derivatives industry
OIL | Ontology Integration Language | For representation of formal semantics and logic on the Web
EML | Election Markup Language | For the interchange of data concerning voter registration and participation in elections
VocML | Vocabulary Markup Language | For representation of thesauri in knowledge organization systems
XGMML | Extensible Graph Markup and Modeling Language | For the interchange of graphs on the Web
MathML | Mathematics Markup Language | For storage, display and transfer of mathematical equations on the Web
AML | Astronomical Markup Language | For recording and transfer of astronomical observations and information on the Web
HML | Human Markup Language | For recording and sharing descriptions of humans in terms of physical, social, cultural and psychological characteristics

The list seems endless and it is still growing rapidly.

Appendix B: The Formal Structure of Maps

Appendix B: Semiotics and Map Structure

There is still active intellectual interest in semiotics, as indicated by the number of articles which use the word semiotics in the title or anywhere in the text as tabulated by INFOTRAC, a service which indexes articles from 1500 magazines and journals. The summary statistics for articles citing the term semiotics are given below.

Number of Articles Citing Semiotics
Period | Semiotics in Title of Article | Semiotics Used Anywhere in Article
1980 through 1989 | 92 | 166
1990 through 1999 | 394 | 2196
2000 to present | 48 | 344

There is little or no content that would be significant for encoding geographic information.

The interest in semiotics in geography seems to be fading away. In INFOTRAC there were only 10 instances of articles using both semiotics and geography for the period 1980 to 2001. But some of the terminology of semiotics is still used, particularly the term deep structure. Harold Moellering uses the term deep structure in a survey of analytical cartography but it is not clear that the term any longer has any link to semiotics. Deep structure has come to refer to the fundamental theory for a field. In the case of cartography the deep structure may well be topology. Another term used by Moellering that is more relevant to the problem of the formal structure of maps is virtual map, meaning a map stored as information in a computer rather than as a two dimensional display.

It was not reasonable to expect that a semiotic analysis of maps would provide any useful advice on ways to improve map making and map use. Semiotics at best could give a new interpretation of what was being done in map language.

Some of the researchers who had an interest in semiotics are still pursuing something in the nature of a "deep structure" of geography but their work has more of a topological character than a linguistic or semiotic one. Martien Molenaar and Jose A. Martinez Casasnovas develop a formalism for the structure of vector maps and apply it to a natural drainage system in Spain. The formalism is topological and while it is a worthwhile endeavor it is not likely to be useful for cartographers generally. The choice of a drainage basin as a test case was an excellent one inasmuch as a river drainage system constitutes a geographic structure that is not easily represented formally. A drainage system has a fractal structure and does not lend itself to representation by networks.

The work by other scholars and researchers in the area of the formal representation of maps and geographic systems is interesting but it does not, at least at this stage, have much potential for providing an encoding of maps and perhaps it never will. Martien Molenaar, along with Yaser Bishr and M.M. Radwan, published a study entitled "Semantics of Parallel Object Hierarchies in a Multi-Scale Environmental Decision-Support System for Watershed Management," that deals with semantics and semantic proximity in GIS theory. In "A Spatiotemporal Framework for Environmental Information Systems," H.A. Kucera and Mark Flaherty deal with the issues of encoding geographic information in a database format for institutions dealing with environmental problems. Likewise D.E. Richardson in his article, "Automatic Processes in Database Building and Subsequent Automatic Abstractions," deals with the encoding of geographic information. Richardson constructs a language for this purpose, but his language is not in the nature of a markup language; it is instead what in computer science is called an abstract data type. John van Smaalen deals with the problem of the structure of geographic databases in his article, "Spatial Abstraction Based on Hierarchical Re-classification," where he deals with the object-oriented approach to databases and the programming language coding needed to implement it.

Appendix C: Taketa's Level Three Description of a Map

An attempt at representing the structure of Taketa's Level Three description is shown below:

Definitions of Functions

With these primitive or basic functions it is possible to define the more complex functions and sets Taketa used in his Level Three description. There are three sets defined:

Taketa envisions a situation in which the only data available is the location of various features on a map. It is not assumed that height and depth information is known for every point. Therefore it is not possible to define the set of shallow water points as being those whose depth is between the mean sea level and the depth of the isobath. The set of shallow water points must be defined in terms of the two dimensional characteristics of the map points. He does this by specifying that the shallow water points are all those between the curve for the mean sea level and the curve for the isobath. The rigorous geometric definition of in between would have to be something of the sort that any straight line through the point in question first intersects the curve for the mean sea level on one side and the curve for the isobath on the other side.

The deep water points are all water points that are not in the shallow water set. The definition of the set of Wh points, the water points at the coast line, would require some careful analysis.
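A rough sketch of such an in-between test is given here in Python. It is a simplification and not Taketa's formulation: it checks only one straight line through the point (a horizontal one) rather than every line, and it assumes both curves are given as lists of (x, y) vertices joined by straight segments. The function names are hypothetical.

```python
def crossings(curve, y):
    """Return the x-coordinates where a polyline crosses the horizontal line at height y."""
    xs = []
    for (x1, y1), (x2, y2) in zip(curve, curve[1:]):
        # The segment crosses the line only if its ends lie strictly on opposite sides.
        if (y1 - y) * (y2 - y) < 0:
            xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
    return xs

def is_between(point, sea_level_curve, isobath_curve):
    """True if the nearest curve on one side is the sea level curve and on the other the isobath."""
    px, py = point
    hits = [(x, "sea level") for x in crossings(sea_level_curve, py)]
    hits += [(x, "isobath") for x in crossings(isobath_curve, py)]
    left = [hit for hit in hits if hit[0] < px]
    right = [hit for hit in hits if hit[0] > px]
    if not left or not right:
        return False
    nearest_left = max(left)[1]    # closest crossing to the left of the point
    nearest_right = min(right)[1]  # closest crossing to the right of the point
    return {nearest_left, nearest_right} == {"sea level", "isobath"}

# Illustrative data: a straight coast along x = 0 and an isobath along x = 5.
coast = [(0, 0), (0, 10)]
isobath = [(5, 0), (5, 10)]
print(is_between((2, 5), coast, isobath))
print(is_between((7, 5), coast, isobath))
```

With these illustrative curves the point (2, 5) lies between the coast and the isobath while (7, 5) lies beyond the isobath. A full treatment would repeat the test for lines in other directions and handle curves that touch the test line at a vertex.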

Appendix D: XSLT

To see the general nature of how XSLT functions consider the very simplistic GML source file shown below:

stanford.xml source file
<?xml version="1.0" encoding="utf-8"?>
<gml>
  <feature>
    <geometricProperty>
      <point>
        <coordinates>
          37.3,-122.5
        </coordinates>
      </point>
    </geometricProperty>
    <name>
      Stanford University
    </name>
  </feature>
</gml>

stanford.xsl style file
    <stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
    <output method="text" encoding="utf-8"/>
    <template match="point">
    Coordinates: <apply-templates select="coordinates"/>
    </template>
    </stylesheet>

When the Saxon XSLT processor is run using the above as source and style file the result is:




   Coordinates: 37.3,-122.5





Stanford University

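The style file matches each <point> element and selects the <coordinates> element within it. The data that element contains is written out right after the string Coordinates:, which was included in the style file. The text Stanford University also appears in the output because no template matches the <name> element, so the built-in template rules of XSLT pass its text through unchanged. Generally the result of running the XSLT processor is a tree, called the result tree.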
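For readers more familiar with general programming languages, the effect of the single template can be imitated with Python's standard xml.etree.ElementTree module. This is only an analogy under stated assumptions, not how Saxon works: the sketch strips surrounding whitespace and does not reproduce the built-in template rules, so the name text is not emitted. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same content as the stanford.xml source file, without the XML declaration.
STANFORD_XML = """<gml>
  <feature>
    <geometricProperty>
      <point>
        <coordinates>
          37.3,-122.5
        </coordinates>
      </point>
    </geometricProperty>
    <name>
      Stanford University
    </name>
  </feature>
</gml>"""

def apply_point_template(xml_text):
    """Imitate the template: for each <point>, output its <coordinates> text."""
    root = ET.fromstring(xml_text)
    lines = []
    for point in root.iter("point"):
        for coords in point.findall("coordinates"):
            lines.append("Coordinates: " + coords.text.strip())
    return lines

output_lines = apply_point_template(STANFORD_XML)
for line in output_lines:
    print(line)
```

The loop over root.iter("point") corresponds to the match="point" attribute and the inner findall("coordinates") to the select="coordinates" attribute of the style file.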

Appendix E: Encoding Geometric Features Using GML

Examples of Geometric Property Encoding in GML

Simple Geometric Features

First consider some simple geometric features. Suppose there is a point at x,y coordinates (10,25), a line from (0,0) to (15,30) to (40,70) and a triangle polygon with corners at (30,40), (70,80) and (50,120). These features would be encoded as follows:


<Point>
<coordinates>
10,25
</coordinates>
</Point>


<LineString>
<coordinates>
0,0 15,30 40,70
</coordinates>
</LineString>


<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
30,40 70,80 50,120 30,40
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
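A program reading such encodings has to turn the text inside a coordinates element into numbers. A minimal sketch in Python follows; it assumes the layout used in the examples above (x and y separated by a comma, tuples separated by whitespace), and the function name parse_coordinates is a hypothetical helper, not part of any GML library.

```python
import xml.etree.ElementTree as ET

def parse_coordinates(text):
    """Split the text of a <coordinates> element into a list of (x, y) tuples."""
    points = []
    for pair in text.split():       # tuples are separated by whitespace
        x, y = pair.split(",")      # x and y are separated by a comma
        points.append((float(x), float(y)))
    return points

LINE_STRING = """<LineString>
<coordinates>
0,0 15,30 40,70
</coordinates>
</LineString>"""

line = ET.fromstring(LINE_STRING)
vertices = parse_coordinates(line.findtext("coordinates"))
print(vertices)
```

Because str.split() with no argument discards all leading, trailing and repeated whitespace, the line breaks around the coordinate text do not matter.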

Collections of Geometric Features
Suppose there were the locations of three cities that were to be grouped together, say (10,25), (15,40) and (20,45). This set of three points could be encoded as:

<MultiPoint>
<pointMember>
<Point>
<coordinates>
10,25
</coordinates>
</Point>
</pointMember>
<pointMember>
<Point>
<coordinates>
15,40
</coordinates>
</Point>
</pointMember>
<pointMember>
<Point>
<coordinates>
20,45
</coordinates>
</Point>
</pointMember>
</MultiPoint>

The coding for a MultiLineString and a MultiPolygon would be similar. A GeometryCollection would require additional tags for <geometryMember> </geometryMember>.
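Since the MultiPoint encoding is repetitive, it is natural to generate it by program. The following Python sketch builds the same structure for the three cities using the standard xml.etree.ElementTree module; the function name build_multipoint is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_multipoint(points):
    """Wrap each (x, y) pair in pointMember/Point/coordinates inside a MultiPoint."""
    multi = ET.Element("MultiPoint")
    for x, y in points:
        member = ET.SubElement(multi, "pointMember")
        point = ET.SubElement(member, "Point")
        coords = ET.SubElement(point, "coordinates")
        coords.text = f"{x},{y}"
    return multi

cities = [(10, 25), (15, 40), (20, 45)]
multipoint = build_multipoint(cities)
print(ET.tostring(multipoint, encoding="unicode"))
```

The output is written on a single line, which is equivalent to the multi-line form shown above.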

In the above examples each tag was put on a separate line for clarity but there is no necessity of doing this. Each encoding could have been put on a single line and it would still be valid, as would any formatting between these two extremes.
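The equivalence of the two layouts can be checked with a short Python sketch. It assumes that a consuming program trims the whitespace around the coordinate text, as the helper below does; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

MULTI_LINE = """<Point>
<coordinates>
10,25
</coordinates>
</Point>"""

SINGLE_LINE = "<Point><coordinates>10,25</coordinates></Point>"

def coordinate_text(xml_text):
    """Return the coordinate text of a Point with surrounding whitespace removed."""
    root = ET.fromstring(xml_text)
    return root.findtext("coordinates").strip()

same = coordinate_text(MULTI_LINE) == coordinate_text(SINGLE_LINE)
print(same)  # prints True
```

Both strings parse without error, and after trimming they carry the same data.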

Appendix F: Examples of Feature Encoding in GML

<Feature typeName="Region">
<name>
Silicon Valley </name>
<description>
The region of the San Francisco Bay Area noted for its high technology industry. It is largely a part of Santa Clara County, California but it also includes portions of San Mateo County.
</description>
<boundedBy>
<Box>
<coordinates>
-123.5,38.7 -121.7,37.1
</coordinates>
</Box>
</boundedBy>
</Feature>

The meaning of the coordinates has to be given elsewhere in the GML file. The code sequences given above are merely fragments of a GML file. Also most references to the attributes of elements have been purposely left out in order to simplify the presentation. The matter of element attributes will be dealt with below.
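To show how a program might read such a feature fragment, here is a minimal Python sketch that extracts the type, the name and the corners of the bounding box. The fragment is abbreviated (the description is omitted) and the function name read_feature is a hypothetical helper; the element paths are simply those of the fragment above.

```python
import xml.etree.ElementTree as ET

FEATURE = """<Feature typeName="Region">
<name>
Silicon Valley </name>
<boundedBy>
<Box>
<coordinates>
-123.5,38.7 -121.7,37.1
</coordinates>
</Box>
</boundedBy>
</Feature>"""

def read_feature(xml_text):
    """Collect the typeName attribute, the name and the Box corners of a feature."""
    root = ET.fromstring(xml_text)
    box_text = root.findtext("boundedBy/Box/coordinates")
    corners = [tuple(float(v) for v in pair.split(",")) for pair in box_text.split()]
    return {
        "type": root.get("typeName"),
        "name": root.findtext("name").strip(),
        "box": corners,
    }

info = read_feature(FEATURE)
print(info)
```

The sketch makes no assumption about which coordinate is longitude and which is latitude; as noted above, that has to be stated elsewhere in the GML file.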

For another example of a feature, one that has a more complicated structure, consider Highway 101, a freeway which traverses the Silicon Valley.

<Feature typeName="Freeway">
<name>
Highway 101 in the Silicon Valley </name>
<description>
A major highway running through Santa Clara County, California with a northwest-southeast orientation
</description>
<geometricProperty>
<LineString>
<coordinates>
-123.5,38.7 -121.7,37.1
</coordinates>
</LineString>
</geometricProperty>
</Feature>
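A feature of this kind is what would eventually be drawn as a line on a map. The Python sketch below turns the LineString into an SVG polyline element. It is an illustration only: it assumes the first coordinate runs left to right and the second runs upward, so the second is negated because the y axis of SVG points down, and it applies no scaling or projection. The fragment is abbreviated and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

FREEWAY = """<Feature typeName="Freeway">
<name>
Highway 101 in the Silicon Valley </name>
<geometricProperty>
<LineString>
<coordinates>
-123.5,38.7 -121.7,37.1
</coordinates>
</LineString>
</geometricProperty>
</Feature>"""

def linestring_to_polyline(feature_xml):
    """Build an SVG <polyline> element from the LineString of a feature."""
    feature = ET.fromstring(feature_xml)
    text = feature.findtext("geometricProperty/LineString/coordinates")
    pairs = [pair.split(",") for pair in text.split()]
    # Assumption: the second coordinate increases upward, so it is negated for SVG.
    points = " ".join(f"{float(x)},{-float(y)}" for x, y in pairs)
    return ET.Element("polyline", {"points": points, "fill": "none", "stroke": "black"})

polyline = linestring_to_polyline(FREEWAY)
print(ET.tostring(polyline, encoding="unicode"))
```

In the system described in the conclusions this step would be carried out by an XSLT style document rather than by a general-purpose program.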

Appendix G: Resource Description Framework

Appendix H: Vector Graphics Languages