CDFO: Raw Data and XML


From: Devin Kouts (devinkouts_at_earthlink.net)
Date: Mon Jan 15 2001 - 23:52:11 CET


To: John Halleck <John.Halleck_at_utah.edu>
CC: cavexml_at_cartography.ch

I can understand the desire to create an ability to cross reference data
stored in XML with the original survey data. But in this proposal
there's no guarantee that the data you're cross referencing to, in
Survex (or Compass or Onstation, etc.) format, does not possess
transcription errors of its own. In fact it's not uncommon to see
transcription errors occur in-cave, as the numbers are written into the
book.

In order to serve the original "need", i.e. some method of comparing the
XML represented data to original data for quality assurance, I submit
the following argument and proposal.

First, handwritten data must be stored electronically to ease its
"online" use and reuse. Because of the graphic nature of cave survey
data, and the limitations of today's survey data collection
technologies, the only real solution to making data available online and
"complete" is the creation of a digital image of the original survey
notes (e.g. a scan or a photograph). See an example at:
http://www.psc-cavers.org/kouts/RAS/RASNotesIndex.html

Second, while embedding binary data in the XML cave
survey file is certainly possible, it would represent a very
"mature" capability of the standard we are struggling to create. I would
even go so far as to say this capability would be an extension of the
standard, and should not be a fundamental part of the baseline standard
itself. Furthermore, the inclusion of binary data (based upon my
argument) would at the same time increase the size of the cave survey
data file to an impractical level. However, the inclusion of a reference
to the remote location of that binary representation of the original
survey data would be a trivial, and practical, step.
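
As a rough illustration of the size argument above, the sketch below compares embedding a scan in a text-safe form against storing only a reference to it. It assumes base64 as the encoding (the thread mentions uuencode, which has similar overhead) and uses placeholder bytes in place of a real scanned page; the variable names are hypothetical.

```python
import base64

# Placeholder standing in for one scanned page of survey notes.
# A real scan would usually be larger; the content here is arbitrary.
scan_bytes = bytes(range(256)) * 1200  # 307,200 bytes

# Binary data must be converted to text before it can sit inside XML.
# Base64 is assumed here; it emits 4 characters for every 3 input bytes.
embedded = base64.b64encode(scan_bytes)

# The alternative: store only the location of the scanned notes.
reference = b"http://www.psc-cavers.org/kouts/RAS/RASNotesIndex.html"

ratio = len(embedded) / len(scan_bytes)
print("raw scan bytes:     ", len(scan_bytes))
print("embedded (base64):  ", len(embedded))
print("reference (URI):    ", len(reference))
print(f"embedded/raw ratio:  {ratio:.2f}")
```

Under these assumptions the embedded form is about a third larger than the scan itself, and that cost is paid for every page of notes, while the reference stays a few dozen bytes regardless of image size.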

So here's the proposal:

Make available (but not mandatory) in the CaveXML format a <link> or
<reference> to the <original survey data>. This could contain several
important elements, including a <person> and their <contact
information>, and "hopefully" a URI where the original data could be
inspected (and downloaded).
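
One way such a reference could look is sketched below. This is only an illustration: the element and attribute names (reference, person, contact, uri, type) are assumptions made for the example, not part of any agreed CaveXML schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_original_data_reference(person, contact, uri=None):
    """Build a hypothetical <reference> element for the original survey notes.

    All tag and attribute names are illustrative assumptions, not CaveXML.
    """
    ref = ET.Element("reference", {"type": "original-survey-data"})
    ET.SubElement(ref, "person").text = person
    ET.SubElement(ref, "contact").text = contact
    if uri is not None:
        # The URI is optional: the proposal only hopes one is available.
        ET.SubElement(ref, "uri").text = uri
    return ref


ref = build_original_data_reference(
    "Devin Kouts",
    "devinkouts_at_earthlink.net",
    "http://www.psc-cavers.org/kouts/RAS/RASNotesIndex.html",
)
xml_text = ET.tostring(ref, encoding="unicode")
print(xml_text)

# A reader of the file can recover the location of the notes, if one was given.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("uri"))
```

Because the whole element is optional and the URI inside it is optional too, a file without scanned notes is unaffected, and a parser that does not understand the element can simply skip it.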

This solution would create, with minimal effort, a fundamental capacity
to locate and inspect the original cave survey data. Furthermore it
would encourage the online and digital preservation of cave survey data
and avoid the inclusion of "proprietary" constructs in the CaveXML
standard that would bloat the format and limit its usefulness across
various software platforms.

Devin Kouts

John Halleck wrote:

> The original list somehow got left off this reply...
> But since the discussion has moved here, I'm forwarding
> the message here.
>
> ---------- Forwarded message ----------
> Date: Fri, 12 Jan 2001 14:34:53 -0700 (MST)
> From: John Halleck <nahaj_at_u.cc.utah.edu>
> To: Garry Petrie <gp_at_europa.com>
> Cc: John Halleck <John.Halleck_at_utah.edu>
> Subject: Re: Images in the data file
>
> On Fri, 12 Jan 2001, Garry Petrie wrote:
>
>> Date: Fri, 12 Jan 2001 13:10:16 -0800
>> From: Garry Petrie <gp_at_europa.com>
>> To: John Halleck <John.Halleck_at_utah.edu>
>> Subject: Re: Images in the data file
>>
>> John Halleck wrote:
>>
>>> Obviously, there is something that takes the text file and adds the text
>>> to the "real" file, marked as originals, and marked as needing processing.
>>> (Just as something has to put those images in the file, appropriately marked.)
>>>
>>>> People say XML is good because it is human
>>>> readable. That is bunk, those bits on your hard disk are not human readable.
>>>
>> What I thought was especially absurd was the suggestion to include SURVEX data files in
>> the XML, like that was some sort of golden standard.
>
>
> I don't see why there shouldn't be a way to do it.
> I won't speak to the need.
>
>> The fact is, the transcription of the
>> survey notes to even a text file, aside from the formatting issues (tab or no tab
>> characters, unix or msdos newline character(s), fixed columns or field widths), is one of
>> the biggest sources of blunders.
>
>
> True enough.
> Having the two together means that a tool can display them side by side to
> aid proofreading. That might be useful.
>
> But, having a tag for it, that I will probably not use, does me no real
> harm, and might encourage someone to actually write something that uses it.
>
>> Last summer, I spent three months cataloging Lechuguilla
>> survey notes, then comparing two independently transcribed sets of data. I found a
>> transcription error rate of around 2 to 3%: number reversals, mixing data from two
>> lines, confusion over the meaning of a backsight, etc.
>
>
> That's a lower rate than we saw on LBCC. (Counting things that were corrected
> after a proofread.) But then I typed it in, and I'm sloppy.
>
>> Go directly to XML, if that is what we are going to adopt.
>
>
> Something is still going to be typed in somewhere. I think it should
> be marked as what it is, and whatever parses it should leave it alone
> and produce whatever it does appropriately marked up. Then, at least,
> I can see what stage in the process a given error was introduced.
> (Not so much a problem with a mature project like Lechuguilla, but
> I think it important in an environment with lots of new tools.)
>
>> If CaveXML doesn't have everything needed to record your data, then we have a
>> serious problem with the developing standard.
>
>
> I think you and I see the suggestion differently. I see it as just another way
> to include documentation of what happened. "Here is what we started with,
> here is what was produced from it, here is what we had then."
> It doesn't replace anything else, it is just something added.
>
>>>> I do trust technology enough to preserve digital images.
>>>> Scan your notes and place the files on a CD not stored at your house.
>>>
>>> And, even though I don't plan to use them, it would be nice to have a way to
>>> put the images in the file directly.
>>
>> I don't think you want to embed image files as some sort of uuencoded format in CaveXML.
>
>
> I think this is a point where I actually disagree with you. If the data is kept
> together, I think that there is more chance of it ALL being kept around years later.
> If it is in separate files, people will not save the files that are not of interest
> to them.
>
>> Hard disks are so huge and cheap, I store all sorts of images now. I have over 500 images
>> from my new digital camera. I recently installed a 40GB drive in my wife's computer for
>> only $160!
>
>
> My hard disk is only 40 MEGABYTES...
>
>> Garry
>
>



This archive was generated by hypermail 2b30 : Wed Feb 14 2001 - 00:03:51 CET