From: Ralph Hartley (hartley_at_aic.nrl.navy.mil)
Date: Fri Feb 23 2001 - 15:21:49 CET
Subject: Re: Stations are primary
John Halleck wrote:
> On Thu, 22 Feb 2001, Ralph Hartley wrote:
>
>> The more I have thought about CaveXML, the more I am drawn to the
>> conclusion that the primary element must be the station, not the shot.
>
> Note that the LS adjustment is naturally point (station) oriented.
>
> As my internal angles paper also indicated, rearranging
> data from shots (as recorded) to point oriented is needed to do some
> forms of error analysis (looking for magnetic anomalies).
Wow! Is that a coincidence, or *what*!
I didn't emphasize that point because I didn't think the people who
need to be convinced (I knew you would agree) would be that impressed by
it.
> That is one of the things that having a unique station number can
> be used for. (And having the input program handle equivalences
> by making the numbers the same means that later programs don't
> have to care.)
That's why I think unique names are essential. Making those names be
numbers is OK, but doesn't buy as much. Also, ID attributes are
required to be unique for a document to be valid XML, so existing tools
should check them.
An "Equivalence" element is still needed because there has to be a
nondestructive way to say that two stations are the same. By
nondestructive I mean that there needs to be an easy way to back out the
identification if it turns out to be wrong. I assume that ordinary data
entry tools would use the unique names to identify stations, reserving
the Equivalence element for use when the identification is uncertain, or
possibly when the stations are in other files (which cannot be assumed
to respect our unique names).
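
A minimal sketch of what "nondestructive" could mean in practice. The
element and attribute names here (Station, Equivalence, id, a, b) are
assumptions made for illustration, not an agreed CaveXML schema. The
identification is applied only when the file is read, so deleting the
Equivalence element backs it out and leaves the stations untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical CaveXML fragment; element and attribute names are assumptions.
DOC = """
<Survey>
  <Station id="s1" name="A1"/>
  <Station id="s2" name="A2"/>
  <Station id="s3" name="B7"/>
  <Equivalence a="s2" b="s3"/>
</Survey>
"""

def resolve(xml_text):
    """Map each station id to a representative id, applying Equivalence elements."""
    root = ET.fromstring(xml_text)
    parent = {st.get("id"): st.get("id") for st in root.iter("Station")}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for eq in root.iter("Equivalence"):
        ra, rb = find(eq.get("a")), find(eq.get("b"))
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller id as representative
    return {sid: find(sid) for sid in parent}

merged = resolve(DOC)
# Backing out the identification is just removing the element from the file.
unmerged = resolve(DOC.replace('<Equivalence a="s2" b="s3"/>', ""))
```

With the Equivalence element present, s3 resolves to s2. Without it,
every station resolves to itself, and nothing else in the file changes.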
>>
>> Alternatively, From and To could be attributes with type IDREF. That
>> depends on how grouping and default values for measurements are done,
>> which is an orthogonal issue.
>
>
> Although having them as IDREF's constrains the form that they may have.
Yes, but the station IDs are for internal use only (within a file), so
constraining them is OK. The central point of my proposal is that
stations need to be referred to by IDREFs, which point at IDs that are
required to be unique. We can't legislate what sort of names will be
given to stations by the surveyors, or their software, so we can't trust
those names to have any properties whatsoever. If programs using the
file want to let the users refer to stations by name (in some sort of
consistent naming system) they may do so without interfering with other
programs' understanding of the data, because when writing the CaveXML
file they have to convert those names to IDREFs, which are the only way
to refer to stations in CaveXML.
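
To illustrate the two properties this relies on, here is a rough sketch
of the checks a validating XML parser would perform on ID and IDREF
attributes, done by hand in Python. The Station and Shot elements and
their id, name, from and to attributes are hypothetical. Note that the
surveyors' names are allowed to collide; only the ids matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: Station/Shot and the id/name/from/to attributes are assumptions.
DOC = """
<Survey>
  <Station id="s1" name="Entrance"/>
  <Station id="s2" name="Entrance"/>
  <Shot from="s1" to="s2"/>
  <Shot from="s2" to="s9"/>
</Survey>
"""

def check_references(xml_text):
    """Return (duplicate ids, shot references that match no station id)."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for st in root.iter("Station"):
        sid = st.get("id")
        if sid in seen:
            duplicates.append(sid)
        seen.add(sid)
    dangling = [
        ref
        for shot in root.iter("Shot")
        for ref in (shot.get("from"), shot.get("to"))
        if ref not in seen
    ]
    return duplicates, dangling

duplicates, dangling = check_references(DOC)
```

Here the ids are unique even though both stations carry the same user
name, and the reference to s9 is reported because no station has that id.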
>>
>> The down side of this is that the CaveXML file contains elements that
>> don't come directly from the survey book. An editor, when a shot is
>> added, needs to check if the stations already exist. If so it uses the
>> old id for the station, otherwise it needs to generate a new station
>> element.
>
>
> Some program is going to have to do this some time, it may as well
> be the original editor, so that the other programs need not care.
Using IDREFs means that any program that outputs CaveXML has to do it.
If a program started with a CaveXML file (if the program is not the
original editor) it MAY simply use the same IDs as in the input file, or
it can renumber them in any way it sees fit.
There needs to be some way to know which stations referred to in the
notes are the same. It is needed now, and it will be needed with
CaveXML. The way it works now is that the programs understand the
particular naming scheme used by the user community. This will not
(cannot) change. The first program that outputs data in CaveXML format
has to know which stations are which; there is really no getting around
that.
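
The bookkeeping an editor would need when a shot is added can be
sketched roughly as follows. The StationTable class and the "s1", "s2"
id pattern are hypothetical, and the sketch assumes the simplest
possible naming scheme, in which two identical names in the notes mean
the same station. A real editor would substitute whatever its own users'
naming scheme requires.

```python
import itertools

class StationTable:
    """Hypothetical editor-side table mapping surveyors' names to internal ids."""

    def __init__(self):
        self._ids = {}                      # user name -> internal id
        self._counter = itertools.count(1)

    def station_id(self, user_name):
        """Return the existing id for user_name, or create a new one."""
        if user_name not in self._ids:
            self._ids[user_name] = f"s{next(self._counter)}"
        return self._ids[user_name]

    def add_shot(self, from_name, to_name):
        """Return a shot as a pair of internal ids, creating stations as needed."""
        return (self.station_id(from_name), self.station_id(to_name))

table = StationTable()
shots = [table.add_shot("A1", "A2"), table.add_shot("A2", "A3")]
```

The second shot reuses the id already issued for A2, so the two shots
share a station without any later program having to interpret the names.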
> What we don't have is a data processing model that spells out
> what tasks are which program's responsibility.
Nor can we. We have no idea what sort of programs will even be used. How
can we divide up responsibility between players we can't even identify?
Even if we could do that now, tasks could be divided up differently in
the future.
By spelling out some simple things that all programs producing CaveXML
files need to do (and making them easy enough that implementations will
actually do them) we can hope to allow flexibility in what is done, and
by whom.
The Prime Directive remains "Thou shalt not remove what thou dost not
understand". We also have "The user can do what he wills". I want to add
a corollary, "Don't depend on the user's names to keep stations
straight", which requires unique internal names. Honoring the Prime
Directive then also requires a way to keep track of the user's names,
even without understanding them.
>
>> The up side is that all this is very easy to generate automatically.
>
> If they are numbers (with a possible prefix), and they run 1..N when
> there are N unique points, then the algorithm to combine two files
> and preserve that property in the result is trivial.
True. But if the names are unique, even if not numbers, or not
consecutive, it is still not exactly rocket science.
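
For example, combining the id sets of two files when the ids are
arbitrary unique strings might look like the sketch below. The
merge_ids function and its suffix-based renaming are assumptions chosen
for illustration; any scheme that yields an unused name would do.

```python
def merge_ids(ids_a, ids_b):
    """Combine two id lists, renaming ids from the second file that collide."""
    used = set(ids_a)
    rename = {}
    for sid in ids_b:
        new_id, n = sid, 1
        while new_id in used:
            new_id = f"{sid}_{n}"   # hypothetical renaming scheme
            n += 1
        used.add(new_id)
        rename[sid] = new_id
    return list(ids_a) + [rename[s] for s in ids_b], rename

merged, rename = merge_ids(["s1", "s2", "entr"], ["s2", "k4"])
```

The rename map is what the merging program would apply to the IDREFs
taken from the second file.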
>
>
>
>> I don't see a real need for the station id's to follow any particular
>> pattern, sequential numbers, globally unique identifiers etc., but of
>
>
> If they do (as I mentioned above) then it can make for simpler algorithms
> for later processing programs.
If you want sequential numbers, assign them as you read the file. Many
programs don't. Requiring it in the standard means that the generating
program has to do it anyway. Also, IDs in XML are a simple way to
require uniqueness. Do existing XML tools verify things like what you propose?
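
Assigning sequential numbers while reading takes only a few lines. This
is a sketch with a hypothetical renumber function; it assumes nothing
about the ids except that equal ids mean the same station.

```python
def renumber(station_ids):
    """Assign 1..N to ids in first-seen order; the ids themselves can be anything."""
    numbers = {}
    for sid in station_ids:
        numbers.setdefault(sid, len(numbers) + 1)
    return numbers

numbers = renumber(["entr", "k4", "entr", "s2"])
```

A program that wants the 1..N property gets it this way from any file
with unique ids, and programs that don't need it pay nothing.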
Ralph Hartley