Devin Kouts wrote:
> Ralph Hartley wrote:
>What I mean is that Bad Things shouldn't automatically happen when an
>application that only understands baseCaveXML is given a file that
>contains extended data.
OK, I follow you now. Bad things shouldn't happen if the newCaveXML file
complies with the DTD or schema of a baseCaveXML file. Extending the
basic schema to add new data will have no deleterious effect on a
processing application (a strength of XML).
>> Valid data must flow from an authoritative source. If I give you the
>> Twisted Fissure data in CaveXML and you use an app that bungs it all
>> up, no big deal. Just contact me (or the authoritative source) and
>> I'll give you another copy of the original data.
>
>Can your great great granddaughter do that? People die, they lose
>interest, they lose their own data. Also, my app may add information to
>the file that I need, but isn't in the original.
The solution to preserving data, and confirming its validity, does not
lie in levying processing requirements on the handling of that
data. If I came across Twisted Fissure data 10 years from now that had
traded hands repeatedly and gone through god only knows what processing,
I wouldn't trust it. Going back to the original source of the data is a
better option, but fraught with the perils you point out. The solution
may lie in a library service devoted to the preservation of cave survey
data, such as I put together at www.psc-cavers.org/wvcs (which could
easily be improved upon).
I won't disagree with your requirement but I feel it won't be very
achievable. Data manipulation is fraught with peril, and unless we use
some kind of checksum on the original data file, there's little way of
telling whether a bit has been twiddled or not.
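As a rough sketch of the checksum idea above: one could record a digest of the original file and compare it later. This is not part of any CaveXML proposal. The function names, the sample content, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes of a survey file."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """True if the bytes still match the digest recorded for the original."""
    return file_digest(data) == recorded_digest


# Hypothetical survey content; a real file would be read with open(path, "rb").
original = b"<CaveSurvey><Shot from='A1' to='A2' length='12.5'/></CaveSurvey>"
recorded = file_digest(original)

# Simulate one twiddled value somewhere along the chain of custody.
tampered = original.replace(b"12.5", b"12.6")

print(is_unmodified(original, recorded))  # True
print(is_unmodified(tampered, recorded))  # False
```

Note that a byte-level digest like this also flags harmless changes such as re-indentation, so it only tells you the file is not the original, not whether the survey data itself changed.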
>Note, that here I'm talking about otherformat->CaveXML->otherformat.
>Number 2 covers the (trickier) CaveXML->otherformat->CaveXML round trip.
OK, I've got you now. I was thinking about that a few months back when I
looked at all the proprietary constructs that exist in the various
legacy data file formats. There may be some desire to preserve those
constructs in CaveXML and this would suggest a classic case of
"extension to the baseline". For instance, the CaveXML baseline would
be capable of storing all the data commonly found in most cave surveys,
but in order to store some proprietary tags common to a Survex data
file, a Survex extension to the CaveXML format would need to be
defined.
I would not incorporate those extensions into the baseline. But by
extending the baseline you create a data file that any other XML-reading
app can handle, and that a Survex-aware application can use to get
to those proprietary Survex things.
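A minimal sketch of that baseline-plus-extension idea, assuming the extension lives in its own XML namespace. The element names, attributes, and namespace URI below are invented for illustration and are not taken from any CaveXML draft or from Survex itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and the namespace URI are invented
# for illustration and are not taken from any CaveXML draft.
DOC = """\
<CaveSurvey xmlns:svx="http://example.org/survex-ext">
  <Shot from="A1" to="A2" length="12.5" compass="270" clino="-3"/>
  <svx:Flags from="A1" to="A2">duplicate</svx:Flags>
  <Shot from="A2" to="A3" length="8.1" compass="255" clino="4"/>
</CaveSurvey>
"""

# ElementTree spells namespaced tags as "{uri}localname".
SVX = "{http://example.org/survex-ext}"


def baseline_shots(root):
    """Read only the baseline <Shot> elements; anything else is skipped."""
    return [(s.get("from"), s.get("to"), float(s.get("length")))
            for s in root.findall("Shot")]


def survex_flags(root):
    """Read the extension elements that a Survex-aware app would care about."""
    return [(f.get("from"), f.get("to"), f.text)
            for f in root.findall(SVX + "Flags")]


root = ET.fromstring(DOC)
print(baseline_shots(root))  # [('A1', 'A2', 12.5), ('A2', 'A3', 8.1)]
print(survex_flags(root))    # [('A1', 'A2', 'duplicate')]
```

The baseline reader never looks at the extension elements, so it is unaffected by them. This uses a non-validating parser; a validating parser would accept the extended file only if the baseline DTD or schema leaves room for such elements.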
>But it should also be permitted to delete data from the file itself.
>Some people will want to share lineplots, but not raw data. This is the
>converse of some of the other requirements. They say it should be
>permitted to INCLUDE things, this says it should be permitted to EXCLUDE
>things.
OK, not sure I like your example but OK. Maybe you're hinting at data
coupling, and that the removal of some data should not make other data
unusable.
>But you left out quite a bit. For instance, all of those programs can
>store closed and unclosed lineplots in some form.
Hot dog, I was waiting for the opportunity to crack this nut!
**rant on**
This is a big hairy issue that we really have to try and solve now,
before CaveXML drowns in featuritis. To set the stage for this little
rant please refer to the context level DFD I put together at
http://www.psc-cavers.org/xml/CaveSurveyDFD.html
This diagram illustrates the high level processes we go through in cave
survey to arrive at a lineplot. You'll see three types of data store
along the way, survey book, basic electronic storage, and enhanced
electronic storage.
I maintain that the CaveXML baseline should be designed to normalize the
basic electronic storage data store. Many of the suggestions I've seen
on the CaveXML list for adding features to CaveXML are clearly
extensions to basic survey data, features which usually only occur by
some form of data processing (processing node #3). Examples include
creation of IDs or organizing shots into <Groups>. I'm not suggesting
these things are invalid constructs but they do not occur as a result of
the survey process (node 1), they don't appear in the survey book data
store, and you can't force them to occur in the data entry process (node
2) (because we have a half dozen or more survey editor developers out
there).
Your statement "all those programs store closed and unclosed lineplots"
is true, but it lacks precision. Things like lineplots are a result of
processing that occurs in node 3 of the DFD. The data necessary to
construct the lineplot, an output of that processing, is NOT stored in
the basic electronic data store. Instead, most (if not all) of these
applications create a new, unique file that holds the output of the data
processing, i.e. an Enhanced Electronic Storage file. As it should be.
This is what I'm getting at when I discuss the CaveXML baseline vs.
extensions to the CaveXML baseline. Extensions to CaveXML are
acceptable if you wish to build them, but I wouldn't recommend sticking
your post-processing back into your original data file. Instead I would
suggest creating a "ClosedLoopXML", a "LineplotXML", etc.
I really wish we could settle on the basic CaveXML, the fundamental data
that results from process nodes 1 and 2, before we move on to storing
data that results from later processes.
**rant off**
>I don't really want to argue this point (again) in this thread. Briefly
>my point is, if you already have unique names for stations, it isn't
>much extra code. If you don't, then all the king's horses and all the
>king's men can't "generate them as you need to".
>
>Perhaps, you could allow reference to stations using only names IF you
>included in the spec "Station names must be unique according to the
>following definition of uniqueness ...". This would violate Principle 11
>(see below).
See my rant above.
>I was one of them, remember? What I tried to write here are a bit higher
>level, more in the nature of guiding principles.
Yeah I remember, you were one of the best contributors too, thanks.
>I will also add:
>
>11 It should be possible to determine if a file is a valid CaveXML file
>using only commonly available tools. This means the definition of a
>valid CaveXML file should be completely specified by a DTD or schema.
Agreed!
See you underground in Germany Valley this weekend!
-- Devin Kouts Caver Systems Engineer www.psc-cavers.org
This archive was generated by hypermail 2b30 : Mon Apr 02 2001 - 18:00:01 CEST