First cut

From: Ralph Hartley (hartley@aic.nrl.navy.mil)
Date: Mon Mar 26 2001 - 16:38:35 CEST


Attached are two documents that contain my first cut at defining a
proposed CaveXML format. I consider semantics.html to be the more
important of the two.

There are several things I want to change, but to avoid contaminating
your opinion, I'm not going to tell you what they are. Both documents are
incomplete and contain numerous omissions, errors, and typos. The DTD
did pass a validity check, but that doesn't mean it is correct. None of
the examples in semantics.html have been tested.

Some advanced features (personnel, instruments, etc.) have been
deliberately postponed. The current version is intended to be sufficient
to allow me to do my test implementation, which will be a converter
to/from CMAP format. Many attributes and elements that CMAP would not
understand have been left out.

I also tried to include the more problematical features, because they
are expected to require the most evolution.

The documents can also be found at http://www.psc-cavers.org/cavexml/ .
Updates will be visible there as soon as they are made, and will be
announced when they are significant.

Comments are welcome.

Ralph Hartley


Semantics of CaveXML

Status of this document

This document is a draft. It is intended to form part of the specification of a proposed CaveXML format. It should be understood in the context of the other documents that make up that proposal.

This draft document is known to be incomplete. Many features required for a final version of CaveXML are not included. There may also be inconsistencies, bugs, or simply bad design decisions.

Comments should be emailed to the author, Ralph Hartley.

Purpose of this Document

This document describes the semantics of CaveXML.

Definitions

Defaults

Some attributes in a survey file often have the same value throughout the file, or throughout large portions of it. CaveXML provides overridable defaults that set attribute values for whole sections of a file.

The interpretation of files containing Default elements is determined by the DefaultTransform. Any two files for which the DefaultTransform produces identical results are defined to be BasicEquivalent. Some sections of this document describe the semantics of files that contain no Default elements; the semantics of a file that does contain Default elements are defined to be those of the result of applying the DefaultTransform to it.

The DefaultTransform is not information preserving. It results in a file whose structure is very different from the original. In fact, it is not recommended that the DefaultTransform ever be applied to any file; it is intended only to describe the semantics of defaults.

Definition of DefaultTransform

The DefaultTransform starts with the innermost nested Default element:

<Default element="<element>" <attribute>="<value>" ... >

It adds <attribute>="<value>" to the start tag of every nested element of type <element> that has no attribute named <attribute>. This is repeated for each attribute-value pair in the start tag of the Default element. The start and end tags of the Default element are then removed.

The above is repeated until no Default elements remain.

Except for the element attribute, Default elements should not have attributes that are not legal for the named element type. Such attributes, if present, may be ignored; they have no defined meaning.
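
The definition above can be sketched in Python. This is only an illustration, not part of the proposal: the function name default_transform is hypothetical, and instead of literally rewriting the innermost Default element first, it walks the tree from the top while carrying the enclosing defaults and consults the innermost matching one first, which should give the same result. Text directly inside a Default element (normally just whitespace) is dropped.

```python
import xml.etree.ElementTree as ET

def default_transform(elem, defaults=()):
    """Return the list of elements that replace `elem` after the transform.

    `defaults` is a tuple of (element type, attribute dict) pairs for the
    enclosing Default elements, outermost first.
    """
    if elem.tag == "Default":
        target = elem.get("element")
        attrs = {k: v for k, v in elem.attrib.items() if k != "element"}
        inner = defaults + ((target, attrs),)
        replaced = []
        for child in elem:
            replaced.extend(default_transform(child, inner))
        return replaced  # the Default start and end tags disappear

    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    # Innermost default first; setdefault never overrides an attribute
    # that is already present on the element itself.
    for target, attrs in reversed(defaults):
        if target == elem.tag:
            for key, value in attrs.items():
                new.attrib.setdefault(key, value)
    for child in elem:
        new.extend(default_transform(child, defaults))
    return [new]

source = ET.fromstring(
    '<CaveSurvey>'
    '<Default element="Distance" unit="feet">'
    '<Default element="Distance" unit="meters"><Distance value="5.3"/></Default>'
    '<Distance value="12.6"/>'
    '</Default>'
    '</CaveSurvey>'
)
result = default_transform(source)[0]
print(ET.tostring(result, encoding="unicode"))
```

The nested Distance picks up unit="meters" from the inner Default, and the other Distance picks up unit="feet" from the outer one.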

Examples

Original:
<Default element="Distance" unit="feet">
<Distance value="12.6"/>
</Default>
DefaultTransform:
<Distance value="12.6" unit="feet"/>

Original:
<Default element="Distance" unit="feet">
<Default element="Distance" unit="meters">
<Distance value="5.3"/>
</Default>
</Default>
DefaultTransform:
<Distance value="5.3" unit="meters"/>
Comment: The value is always supplied by the innermost nested Default element that matches.

Original:
<Default element="Distance" unit="meters">
<Distance value="12.6" unit="feet"/>
</Default>
DefaultTransform:
<Distance value="12.6" unit="feet"/>
Comment: Defaults only supply values to elements that do not have an attribute of their own.

Original:
<Default element="Inclination" unit="mils">
<Azimuth value="12.6"/>
</Default>
DefaultTransform:
<Azimuth value="12.6"/>
Comment: Defaults only supply values to elements of matching type.

Provenance

Often the origin of a data set is as important as its contents. A data set obtained directly from a known, reliable source is more trustworthy than one of unknown origin or one obtained second hand. It is also important to record which data represent raw measurements and which are the results of processing. The Provenance element is used to describe the origin and history of all or some of the data.

As a rule, whenever a substantial change is made to any data, or the data passes from one party to another, it should be wrapped in a new Provenance element. The nesting of Provenance elements should describe the data's history, with earlier stages nested within later ones.

It is also permitted to nest processed data within the data from which it was computed. In that case the "original" attribute should be used.

Attribute: Meaning

source: The name of the person or organization that supplied the data.

description: A brief text description of the data's origin.

converter: If the data was converted from another format, the value of this attribute should be the name of the conversion program.

format: If the data was converted from another format, this attribute should contain the name of that format.

program: If this data was produced by a program, except for a generic (not CaveXML specific) editor or a converter, this attribute should contain its name.

process:
  raw: (default) The element contains raw survey data.
  unclosed: The data has been processed to produce a line plot, but has not had loop adjustment performed.
  closed: The data is the result of loop adjustment.

processdetail: Other information describing the processing or conversion that produced this data. The meaning of this attribute is specific to the program.

original: If the data is the result of processing or conversion, this may contain a link to the data it was produced from. If the data was copied from another source, this may contain a link to the original. The interpretation of provenance can be complicated if the original is part of the same file and/or has Provenance elements of its own.

reliablity: A qualitative measure of the reliability of the data. This does not indicate the expected precision or accuracy of the measurements themselves; instead it is a measure of the integrity and trustworthiness of the source. The reliability of the data is no better than the worst of the nested Provenance elements.
  ok: The source is of normal reliability.
  suspect: The source is suspect, doubtful, or unknown, but is not known to be worthless.
  error: The data is known to be inaccurate or mangled. It may be kept for archival or historical purposes, but it should not be trusted or combined with other data.

date: The date of the transfer, conversion, or processing. Format is ISO 8601.

sourcedata: The actual data from which the data was derived.

This document should also contain a standard for naming converters, programs, and other formats.

A single Provenance element is permitted to have all of the above attributes. However, this does not record the order of the changes, and is discouraged. (Note: future drafts may consider using more specialized elements for each aspect of provenance.)
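
The rule that data is no more reliable than the worst nested Provenance element can be illustrated with a small Python sketch. The function name effective_reliability and the ranking table are assumptions made for this example; the attribute name reliablity is spelled as it appears in the DTD, and a missing attribute is treated as "ok", the DTD default.

```python
import xml.etree.ElementTree as ET

# Assumed ordering, from best to worst.
RANK = {"ok": 0, "suspect": 1, "error": 2}

def effective_reliability(provenance):
    """Return the worst reliablity value found on `provenance` or any
    Provenance element nested inside it."""
    worst = "ok"
    for element in provenance.iter("Provenance"):  # includes `provenance` itself
        value = element.get("reliablity", "ok")
        if RANK[value] > RANK[worst]:
            worst = value
    return worst

doc = ET.fromstring(
    '<Provenance source="archive" reliablity="ok">'
    '<Provenance source="unknown" reliablity="suspect"><Shot/></Provenance>'
    '</Provenance>'
)
print(effective_reliability(doc))
```

Here the outer element claims normal reliability, but the nested stage is suspect, so the data as a whole is treated as suspect.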

Stations

A station consists of a set of Station elements. It is defined to contain one element with an ID attribute, together with all the Station elements with a matching reference attribute.

For the purpose of this discussion, it is assumed that every Station element has either an ID or a reference attribute. If not, the file is to be interpreted as described in the next section.

No significance should be placed on which Station element in a station is the one with an ID. Which element has the ID attribute does not change the meaning of the file in any way.

Each Station element provides information about the station of which it is a member. This information may be encoded in the position of the element within other elements, or in its attributes, or in the elements it contains.

IDs

All stations are assigned unique IDs, either explicitly or implicitly. Every Station element has either an ID or a reference that identifies the station it is a member of. A Station element that has an ID or a reference attribute is a member of the station with the matching ID.

For some programs or converters, or when entering data by hand, it may be inconvenient to assign IDs to stations, as this requires ensuring that the IDs are unique and that exactly one Station element has an ID attribute for each ID. The mechanism described in this section is designed to mitigate this problem. Nothing below should be construed as implying that IDs should be assigned in any particular way. For Station elements that have an ID or reference attribute, there may be no connection between the name attribute and the ID.

Any Station element that lacks both an ID and a reference attribute has <prefix><name> (the concatenation of the values of the prefix and name attributes) as its ID. The values of the prefix and name attributes may also be supplied by a nesting Default element. For the purposes of this section, all Station elements behave as if nested within an outermost Default element providing the name "noname" and the prefix "STA".

The meaning of any file in which some Station elements lack both ID and reference attributes is the same as that of a file in which each such element has had an ID or reference attribute added. The value of the attribute is the concatenation of the values of the element's prefix and name attributes. For each such value, if the file does not already contain a Station element with an ID attribute with that value, then exactly one of the elements has an ID attribute added; all other elements have a reference attribute added. The transform also adds the attribute generatedname="true" to each Station element that has an ID or reference added. Two files that give the same result when this transformation (the IDTransform) is applied are defined to have the same meaning.

Examples
Original:
<Station name="AB1" ID="foobar"/>
Equivalent:
<Station name="AB1" ID="foobar"/>
Comment: Stations with explicit IDs don't change. Name is independent of ID.

Original:
<Default element="Station" prefix="Asurvey">
  <Station name="1"/>
  <Station name="2"/>
  <Station name="1" prefix="Bsurvey"/>
</Default>
Equivalent:
<Station name="1" ID="Asurvey1" generatedname="true"/>
<Station name="2" ID="Asurvey2" generatedname="true"/>
<Station name="1" prefix="Bsurvey" ID="Bsurvey1" generatedname="true"/>
Comment: Prefix or name may be provided by a default, which may be overridden.

Original:
<Station ID="foobar"/>
<Station prefix="foo" name="bar"/>
Equivalent:
<Station ID="foobar"/>
<Station reference="foobar" prefix="foo" name="bar" generatedname="true"/>
Comment: Only one ID per value.

Original:
<Station/>
Equivalent:
<Station ID="STAnoname" generatedname="true"/>
Comment: Global defaults.
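
The IDTransform described in this section can be sketched in Python as follows. This is an illustration only. The function name id_transform is hypothetical, the input is assumed to contain no Default elements (that is, any defaults have already been resolved), and the first element in document order is the one given the ID, which is an arbitrary choice since the text attaches no significance to which element carries it.

```python
import xml.etree.ElementTree as ET

def id_transform(root):
    """Add ID or reference attributes, in place, to every Station element
    that has neither. Assumes `root` contains no Default elements."""
    known_ids = {s.get("ID") for s in root.iter("Station") if s.get("ID")}
    for station in root.iter("Station"):
        if station.get("ID") is not None or station.get("reference") is not None:
            continue  # explicitly identified elements are left alone
        # Global defaults: prefix "STA" and name "noname".
        generated = station.get("prefix", "STA") + station.get("name", "noname")
        if generated in known_ids:
            station.set("reference", generated)
        else:
            station.set("ID", generated)
            known_ids.add(generated)
        station.set("generatedname", "true")
    return root

survey = ET.fromstring(
    '<CaveSurvey>'
    '<Station ID="foobar"/>'
    '<Station prefix="foo" name="bar"/>'
    '<Station/>'
    '</CaveSurvey>'
)
id_transform(survey)
for station in survey.iter("Station"):
    print(station.attrib)
```

The second element becomes a reference to the existing ID "foobar", and the bare element receives the ID "STAnoname".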

Locations

Location elements locate a station when they are nested within one of its Station elements, or when they refer to its ID.

<Station reference="STA25">
<Location northing="23.6" easting="15.2" elevation="-23"/>
</Station>

and

<Location station="STA25" northing="23.6" easting="15.2" elevation="-23"/>

Both are legal, but they do not have exactly the same meaning: the first locates a Station element and the second locates the whole station. The distinction is irrelevant for raw data, but it matters for unclosed coordinates.

In future versions of this proposal, Location elements will be permitted to describe the position of a station in other ways, e.g. lat/long or GPS fixes.

Equivalents

Sometimes stations that were initially believed to be distinct turn out to be the same point. They could be identified by assigning them the same ID (directly or using the mechanism described in the section on IDs) but this is dangerous. It does not permit the stations to be easily disentangled if the identification turns out to be incorrect.

The following are all BasicEquivalent:

Nested within one station, the other named by an attribute:

<Station reference="STA1">
  <Equivalent reference="STA2"/>
</Station>

Nested within one station, containing the other:

<Station reference="STA1">
  <Equivalent>
    <Station reference="STA2"/>
  </Equivalent>
</Station>

Directly between two sibling Station elements:

<Station reference="STA1"/>
<Equivalent/>
<Station reference="STA2"/>

They all state that the station with ID STA1 is equivalent to the station with ID STA2.
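
As an illustration, the three forms above could be reduced to the same pair of station IDs by a reader along the following lines. The helper names station_key and equivalences are hypothetical, and the sketch follows the examples in using a reference attribute on the Equivalent element.

```python
import xml.etree.ElementTree as ET

def station_key(station):
    """The ID of the station that a Station element belongs to."""
    return station.get("ID") or station.get("reference")

def equivalences(root):
    """Collect the unordered pairs of station IDs declared equivalent."""
    pairs = set()
    for parent in root.iter():
        children = list(parent)
        for i, eq in enumerate(children):
            if eq.tag != "Equivalent":
                continue
            if parent.tag == "Station":
                # Forms 1 and 2: nested in one station, the other named
                # by an attribute or contained as a Station element.
                inner = eq.find("Station")
                a = station_key(parent)
                b = eq.get("reference") or (
                    station_key(inner) if inner is not None else None)
            else:
                # Form 3: directly between two sibling Station elements.
                prev = children[i - 1] if i > 0 else None
                nxt = children[i + 1] if i + 1 < len(children) else None
                if (prev is None or nxt is None
                        or prev.tag != "Station" or nxt.tag != "Station"):
                    continue
                a, b = station_key(prev), station_key(nxt)
            if a and b:
                pairs.add(frozenset((a, b)))
    return pairs

doc = ET.fromstring(
    '<CaveSurvey><Station reference="STA1"/><Equivalent/>'
    '<Station reference="STA2"/></CaveSurvey>'
)
print(equivalences(doc))
```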

Shots

Shots, like stations, may consist of more than one Shot element, connected by an ID. Because it is common for a given shot to be mentioned only once, however, no mechanism for providing default IDs for shots is provided.

From and To

A Shot element connects two stations, referred to as the "from" and "to" stations.

The first source on this list that applies determines the from station.

1. The station referred to by the from attribute.
2. The first of two Station elements nested inside the shot.
3. A Station element that is a sibling of the Shot, that precedes it, and is not separated from it by any Station or Shot elements.
4. The Station element that most closely precedes the Shot element in the file.
5. The Station element that most closely follows the Shot element in the file.

The last two are for completeness; their use is discouraged.

The first source on this list that applies determines the to station.

1. The station referred to by the to attribute.
2. The second of two Station elements nested inside the shot.
3. A Station element that is a sibling of the Shot, that follows it, and is not separated from it by any Station or Shot elements.
4. The Station element that most closely follows the Shot element in the file.
5. The Station element that most closely precedes the Shot element in the file.

The last two are for completeness; their use is discouraged.
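
The first three sources in each list can be sketched in Python as below. The names station_key, nearest_station, and resolve_ends are hypothetical. The discouraged fallback sources (nearest Station anywhere in the file) are deliberately left out, so the sketch returns None where only those would apply.

```python
import xml.etree.ElementTree as ET

def station_key(station):
    return station.get("ID") or station.get("reference")

def nearest_station(siblings, start, step):
    """Scan siblings from `start` in direction `step` (-1 or +1) and return
    the first Station reached, or None if a Shot intervenes or none exists."""
    i = start + step
    while 0 <= i < len(siblings):
        if siblings[i].tag == "Station":
            return siblings[i]
        if siblings[i].tag == "Shot":
            return None
        i += step
    return None

def resolve_ends(shot, siblings):
    """Return (from, to) station IDs for `shot`, where `siblings` is the
    list of children of the shot's parent element."""
    nested = shot.findall("Station")
    index = siblings.index(shot)

    def pick(attribute, nested_index, step):
        if shot.get(attribute):                  # source 1: explicit attribute
            return shot.get(attribute)
        if len(nested) == 2:                     # source 2: nested Station pair
            return station_key(nested[nested_index])
        neighbour = nearest_station(siblings, index, step)  # source 3
        return station_key(neighbour) if neighbour is not None else None

    return pick("from", 0, -1), pick("to", 1, +1)

root = ET.fromstring(
    '<CaveSurvey><Station ID="A"/><Shot/><Station ID="B"/></CaveSurvey>'
)
children = list(root)
print(resolve_ends(children[1], children))
```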

Distance

The distance measurement of the shot.

Azimuth

There are two elements that may be used to describe azimuth, ForAzimuth and BackAzimuth. ForAzimuth is normally from the "from" station to the "to" station, while BackAzimuth is the reverse. The reason for having separate "for" and "back" elements is that defaults (personnel, instruments, etc.) can be attached to them separately.

If the reversed attribute is "true", the direction of the shot is reversed; that is, it is taken from the opposite station than normal. This is intended for shots where the surveyors temporarily trade places, or where two forward readings replace a foresight and backsight.

If the inverted attribute is "true", the reading recorded differs by 180 degrees from the normal reading. This is intended to indicate "corrected" backsights.

All of the following indicate the same direction between from and to stations:

<ForAzimuth value="23"/>
<BackAzimuth value="203"/>
<ForAzimuth reversed="true" value="203"/>
<BackAzimuth reversed="true" value="23"/>
<ForAzimuth inverted="true" value="203"/>
<BackAzimuth inverted="true" value="23"/>
<ForAzimuth inverted="true" reversed="true" value="23"/>
<BackAzimuth inverted="true" reversed="true" value="203"/>

If other units are used, the reversed value is whatever the opposite direction is in those units.
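
One way to read the eight examples above is that each of "back", reversed, and inverted flips the reading by half a circle, so only the parity of the flips matters. The following Python sketch encodes that reading; the function name forward_azimuth and its parameters are hypothetical, and the full_circle parameter is an assumption standing in for units other than degrees.

```python
def forward_azimuth(value, back=False, reversed_=False, inverted=False,
                    full_circle=360.0):
    """Convert a recorded azimuth to the from-to direction.

    Each of the three flags turns the reading by half a circle, so an
    odd number of flags means the reading must be turned around.
    """
    flips = sum((back, reversed_, inverted))
    return (value + (full_circle / 2) * (flips % 2)) % full_circle

# The eight examples from the text, as (value, back, reversed, inverted).
examples = [
    (23, False, False, False), (203, True, False, False),
    (203, False, True, False), (23, True, True, False),
    (203, False, False, True), (23, True, False, True),
    (23, False, True, True), (203, True, True, True),
]
print({forward_azimuth(v, b, r, i) for v, b, r, i in examples})
```

All eight readings reduce to a single forward direction of 23 degrees.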

Inclination

The interpretation of ForInclination and BackInclination is almost exactly the same as that of ForAzimuth and BackAzimuth (except that in default units the inverse of +5 is -5).
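
Under the assumption that "back", reversed, and inverted each flip the sense of an inclination reading, the rule can be sketched in Python as a sign change applied when the number of flips is odd. The function name forward_inclination is hypothetical.

```python
def forward_inclination(value, back=False, reversed_=False, inverted=False):
    """Convert a recorded inclination to the from-to sense.

    An odd number of flags means the reading was taken (or recorded)
    in the opposite sense, so its sign is changed.
    """
    flips = sum((back, reversed_, inverted))
    return -value if flips % 2 else value

print(forward_inclination(-5, back=True))
```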

Cross Sections

Cross section data that might be saved in a file ranges from simple "lrud" values at stations to detailed drawings of cross sections or profiles. This proposal covers only the simplest case, but the same principles could be applied to extended versions.

Cross sections can only be unambiguously defined with reference to both a position and an orientation. In CaveXML the position must come from a station and the orientation from a Shot, or from two shots combined.

The orientation for a Cross Section may come from any of the following sources. If more than one is present, the one that appears first on this list is used.

1. The direction indicated by the value of the orientation attribute.
2. The direction of the Shot element the CrossSection element is nested inside.
3. The direction of the unique Shot element nested inside the CrossSection element.
4. The average of the directions of the two Shot elements nested inside the CrossSection element.
5. The direction of the single shot referred to by the shot attribute.
6. The average of the directions of the two shots referred to by the shot attribute.
7. The average of the directions of the two Shot elements that are siblings of the CrossSection element, and are not separated from the CrossSection element by any Shot or Station elements.
8. The direction of the unique Shot element that is a sibling of the CrossSection element, and is not separated from the CrossSection element by any Shot or Station element.
9. The average of the directions of the two Shot elements that are siblings of the Station element within which the CrossSection element is nested, and are not separated from that Station element by any Shot or Station elements.
10. The direction of the unique Shot element that is a sibling of the Station element within which the CrossSection element is nested, and is not separated from that Station element by any Shot or Station elements.
11. The direction of the Shot element that most closely precedes the CrossSection element in the file.
12. The direction of the Shot element that most closely follows the CrossSection element in the file.

The last two are for completeness; their use is discouraged.

If a Shot has more than one azimuth value, they are averaged (after conversion to forward directions). If a shot has no azimuth value, it is ignored.

Two directions a and b are averaged by finding the direction x such that the sum of the squares of the angles between x and each of a and b is minimized. If the directions are exactly 180 degrees apart, the lower of the two values (in the range 0-360) is used. (Note: if cross sections are actually recorded in exact directions, it is best to specify the direction using the orientation attribute.)
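
For two directions, the minimizing direction is the midpoint of the shorter arc between them, which the following Python sketch computes. The function name average_direction is hypothetical, and degrees are assumed.

```python
def average_direction(a, b):
    """Average two directions given in degrees.

    Returns the midpoint of the shorter arc between a and b, or the
    lower of the two values when they are exactly 180 degrees apart.
    """
    a %= 360
    b %= 360
    # Signed shortest rotation from a to b, in the range [-180, 180).
    diff = ((b - a + 180) % 360) - 180
    if abs(diff) == 180:
        return min(a, b)
    return (a + diff / 2) % 360

print(average_direction(350, 10))
print(average_direction(90, 270))
```

The first call averages across north rather than returning 180, and the second falls under the tie rule.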

The station for a Cross Section may come from any of the following sources. If more than one is present, the one that appears first on this list is used.

1. The station the CrossSection element is nested inside.
2. The unique Station element nested inside the CrossSection element.
3. The station referred to by the station attribute.
4. If the position attribute is "from", the from station of the single shot used to determine the orientation of the CrossSection.
5. If the position attribute is "to", the to station of the single shot used to determine the orientation of the CrossSection.
6. The unique Station element that is a sibling of the CrossSection element, and is not separated from the CrossSection element by any Shot or Station element.
7. The Station element that most closely precedes the CrossSection element in the file.
8. The Station element that most closely follows the CrossSection element in the file.

The last two are for completeness; their use is discouraged.

Processed Data



<?xml version="1.0" encoding="UTF-8"?>

<!ENTITY % DistUnit "feetdecimal|feetinches|meters">
<!ENTITY % HUnit "feetdecimal|feetinches|meters">
<!ENTITY % VUnit "feetdecimal|feetinches|meters">
<!ENTITY % AzUnit "degrees">
<!ENTITY % IncUnit "degrees|percent">

<!ELEMENT Default ANY>
<!ATTLIST Default
  element (Station|Location|ForAzimuth|BackAzimuth|ForInclination|BackInclination|Distance|Shot|Equivalent|Section) #REQUIRED
>

<!ELEMENT Provenance ANY>
<!ATTLIST Provenance
  source CDATA #IMPLIED
  description CDATA #IMPLIED
  converter CDATA #IMPLIED
  format CDATA #IMPLIED
  program CDATA #IMPLIED
  process ( raw | unclosed | closed ) "raw"
  processdetail CDATA #IMPLIED
  reliablity (ok|suspect|error) "ok"
  date CDATA #IMPLIED
  sourcedata CDATA #IMPLIED
>

<!-- original xlink:href #IMPLIED -->

<!-- Data in another format that was derived from the element in which
  it is nested. When the other format is the original from which CaveXML
  was derived, use sourcedata in a Provenance element -->
<!ELEMENT Foreign ANY>
<!ATTLIST Foreign
  format CDATA #REQUIRED
>

<!ELEMENT Text ANY>
<!ATTLIST Text
  type (CDATA|comment) "comment"
>

<!ELEMENT Station ANY>
<!ATTLIST Station
  name CDATA "noname"
  prefix CDATA "STA"
  ID ID #IMPLIED
  reference IDREF #IMPLIED
>
<!-- should only allow prefixes that are valid prefixes of IDs -->

<!ELEMENT Equivalent ANY>
<!ATTLIST Equivalent
  station IDREF #IMPLIED
>

<!ELEMENT Location ANY>
<!ATTLIST Location
  station IDREF #IMPLIED
  easting CDATA #IMPLIED
  northing CDATA #IMPLIED
  elevation CDATA #IMPLIED
  hunit (%HUnit;) #IMPLIED
  vunit (%VUnit;) #IMPLIED
>

<!ELEMENT Shot ANY>
<!ATTLIST Shot
  from IDREF #IMPLIED
  to IDREF #IMPLIED
>

<!ELEMENT ForAzimuth ANY>
<!ATTLIST ForAzimuth
  value CDATA #REQUIRED
  reversed (true|false) #IMPLIED
  inverted (true|false) #IMPLIED
  unit (%AzUnit;) #IMPLIED
>

<!ELEMENT BackAzimuth ANY>
<!ATTLIST BackAzimuth
  value CDATA #REQUIRED
  reversed (true|false) #IMPLIED
  inverted (true|false) #IMPLIED
  unit (%AzUnit;) #IMPLIED
>

<!ELEMENT ForInclination ANY>
<!ATTLIST ForInclination
  value CDATA #REQUIRED
  reversed (true|false) #IMPLIED
  inverted (true|false) #IMPLIED
  unit (%IncUnit;) #IMPLIED
>

<!ELEMENT BackInclination ANY>
<!ATTLIST BackInclination
  value CDATA #REQUIRED
  reversed (true|false) #IMPLIED
  inverted (true|false) #IMPLIED
  unit (%IncUnit;) #IMPLIED
>

<!ELEMENT Distance ANY>
<!ATTLIST Distance
  value CDATA #REQUIRED
  unit (%DistUnit;) #IMPLIED
>

<!ELEMENT Section ANY>
<!ATTLIST Section
  orientation (CDATA | N | NNE |NE|ENE|E|ESE|SE|SSE|S|SSW|SW|WSW|W|WNW|NW|NNW) #IMPLIED
  shot IDREFS #IMPLIED
  station IDREF #IMPLIED
  position (from|to) #IMPLIED
  unit (%DistUnit;) #IMPLIED
  left CDATA #IMPLIED
  right CDATA #IMPLIED
  up CDATA #IMPLIED
  down CDATA #IMPLIED
>

<!ELEMENT CaveSurvey ANY>


