From: devinkouts_at_earthlink.net
Date: Tue Jan 16 2001 - 16:12:38 CET
To: cavexml_at_cartography.ch
Subject: Re: CDFO: Raw Data and XML
OK Ralph,
I admit I made my proposal too narrow. I was working under the
assumption that few people would be interested in trading CaveXML files
full of raw data when all they really wanted was the survey numbers to
begin with. You've reminded me of a basic and valuable lesson: developer
needs are rarely equal to user requirements, so try to avoid limiting
users' options. I can agree that the ability to embed the raw data in the
file should be available, alongside the ability to link to a remote copy
of the raw data. There's no reason to limit the user's options.
Devin Kouts
I am afraid I have to disagree strongly with Devin on this point. If a
file can contain a pointer to other data, it MUST be possible for the
file to contain the data itself.
Devin Kouts wrote:
> I can understand the desire to create an ability to cross reference
> data stored in XML with the original survey data. But in this
> proposal there's no guarantee that the data you're cross referencing
> to, in Survex (or Compass or Onstation, etc.) format, does not
> possess transcription errors of its own. In fact it's not uncommon to
> see transcription errors occur in-cave, as the numbers are written
> into the book.
>
> In order to serve the original "need", i.e. some method of comparing
> the XML represented data to original data for quality assurance, I
> submit the following argument and proposal.
>
> First, handwritten data must be stored electronically to ease its
> "online" use and reuse. Because of the graphic nature of cave survey
> data, and the limitations of today's survey data collection
> technologies, the only real solution to making data available online
> and "complete" is the creation of a digital image of the original
> survey notes (e.g. a scan or photograph). See an example at:
> http://www.psc-cavers.org/kouts/RAS/RASNotesIndex.html
Online data has two problems, either one of which is sufficient to make
dependence on it unacceptable.
It is ephemeral. Web sites require active resources to maintain. They
can disappear for many reasons, from legal disputes to loss of interest.
Even within a site pages move around as the priorities and organization
of their owners change. Even if they stay at the same place, the data
itself can change. Imagine the confusion that can result when a
processed data file claims to be derived from raw data, but the raw data
is later "corrected". The new version of the data may be more accurate,
but the process that produced the processed file now appears to be
broken, because its output doesn't match its purported input.
It is public. Not everyone is willing to share their raw data. Even
excluding secret cavers (possibly not a bad thing), publishing raw data,
especially for works in progress, is not a universal practice, and with
good reason. Not everyone is friendly with everyone else. Releasing raw
survey notes could allow a competing group to, for example, publish
their own map based partially on someone else's work, or (far more
likely) to disparage someone's work based on (inevitable) flaws in their
normally private data.
Of course, online data can be protected by limits to access (passwords,
etc.), but that only reduces the usefulness of links in a data file.
> Second, while the potential to embed binary data into the XML cave
> survey file is certainly possible, it would also represent a very
> "mature" capability of the standard we are struggling to create. I
> would even go so far as to say this capability would be an extension
> of the standard, and should not be a fundamental part of the baseline
> standard itself. Furthermore, the inclusion of binary data (based
> upon my argument) would at the same time increase the size of the
> cave survey data file to an impractical level. However, the inclusion
> of a reference to the remote location of that binary representation
> of the original survey data would be a trivial, and practical, step.
Are there not already available, and easy to use, standard methods for
including data of relatively arbitrary type in markup file formats? I
would not be in favor of a whole new standard for describing included
data. That's what mime types are for. You might want a way to describe
how the included data is related to the data in the CaveXML file
(For instance "original notes", "data program x used to produce this
file", or even "Photo taken near station bx34" etc.), but you need that
for a link too. Existing standards should be used to determine the type
of data.
Never build what you can borrow.
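
To make the point concrete, here is a minimal sketch of how included
data and linked data could share one description. It is only an
illustration: the <attachment> element and its "type", "encoding",
"relation" and "href" attributes are hypothetical names, not part of
any agreed CaveXML format. The MIME type says what the data is, base64
carries the bytes, and the "relation" text says how the data relates to
the file, which a link needs just as much as an inclusion does.

```python
import base64
import xml.etree.ElementTree as ET


def embed_attachment(parent, payload, mime_type, relation):
    """Store raw bytes inside the file as base64 text.

    The element and attribute names are hypothetical, for illustration only.
    """
    node = ET.SubElement(parent, "attachment", {
        "type": mime_type,        # existing standard: a MIME type
        "encoding": "base64",     # existing standard: base64 text
        "relation": relation,     # how the data relates to this file
    })
    node.text = base64.b64encode(payload).decode("ascii")
    return node


def link_attachment(parent, uri, mime_type, relation):
    """Point at a remote copy; the same description is still needed."""
    return ET.SubElement(parent, "attachment", {
        "type": mime_type,
        "relation": relation,
        "href": uri,
    })


def read_attachment(node):
    """Recover the original bytes from an embedded attachment."""
    if node.get("encoding") != "base64":
        raise ValueError("attachment is a link, not embedded data")
    return base64.b64decode(node.text)


# Made-up sample bytes standing in for a raw survey data file.
raw_data = b"station-a station-b 12.5 045 -10\n"

survey = ET.Element("survey")
embedded = embed_attachment(survey, raw_data, "text/plain", "original notes")
linked = link_attachment(survey, "http://example.org/notes.txt",
                         "text/plain", "original notes")

print(ET.tostring(survey, encoding="unicode"))
```

Nothing here needs a new standard for describing the type of the data;
only the "relation" text is specific to cave surveys.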
> So here's the proposal:
>
> Make available (but not mandatory) in the CaveXML format a <link> or
> <reference> to the <original survey data>. This could contain several
> important elements, to include a <person> and their <contact
> information>, and "hopefully" a URI where the original data could be
> inspected (and downloaded from).
Maybe "hopefully" is too strong a word. I think "occasionally" might be
more accurate. As a rule, if you don't have something in your actual
possession, you are unjustified in believing that it exists.
What would this mean for someone who was trying to maintain an archive
of survey data? They could not just save the files they were given,
because they would miss the referenced data. The archivist would have to
have an automated process to retrieve and store that data as well. The
CaveXML file could not then be saved unaltered, because it contains
references to the (ephemeral) online data, and not to the archived
versions.
> This solution would create, with minimal effort, a fundamental
> capacity to locate and inspect the original cave survey data.
> Furthermore it would encourage the online and digital preservation of
> cave survey data and avoid the inclusion of "proprietary" constructs
> in the CaveXML standard that would bloat the format and limit its
> usefulness across various software platforms.
>
> Devin Kouts
There is of course a down side to including the original in processed
data. The files could grow exponentially. This can happen when each step
in the process adds its own copy of everything that came before. You can
end up with an astronomical number of copies of the same thing. Having a
big disk drive doesn't help here; a billion gigabyte disk will still end
up full. Having lots of capacity just hides the problem until it is too
late to fix it.
Some restraint in USING the ability to include files is called for. A
good place to start would be a rule like "never include the same data
more than once in the same file".
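
One possible way to apply such a rule is to key every included block by
a hash of its contents, so a second copy of the same bytes is detected
before it is written. The sketch below assumes SHA-256 as the key; the
AttachmentStore class and its methods are hypothetical names made up
for this example, not part of any existing tool.

```python
import hashlib


class AttachmentStore:
    """Keeps at most one copy of each distinct block of included data."""

    def __init__(self):
        self._by_digest = {}

    def add(self, payload):
        """Record payload; return (digest, is_new).

        When is_new is False the caller can refer to the existing copy
        by its digest instead of including the bytes again.
        """
        digest = hashlib.sha256(payload).hexdigest()
        is_new = digest not in self._by_digest
        if is_new:
            self._by_digest[digest] = payload
        return digest, is_new

    def __len__(self):
        return len(self._by_digest)

    def total_bytes(self):
        """Bytes actually stored, counting each distinct block once."""
        return sum(len(p) for p in self._by_digest.values())


store = AttachmentStore()
notes = b"made-up raw survey notes"

# Two processing steps each try to include the same original notes.
first_digest, first_is_new = store.add(notes)
second_digest, second_is_new = store.add(notes)
```

With this check each step adds only data that is actually new, so the
file grows with the amount of distinct data rather than with the number
of processing steps.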
Ralph Hartley
-- Devin Kouts Caver Systems Engineer www.psc-cavers.org
This archive was generated by hypermail 2b30 : Wed Feb 14 2001 - 00:03:52 CET