Re: ID specification?

From: Peter MATTHEWS (matthews@melbpc.org.au)
Date: Thu Jul 05 2001 - 13:32:43 CEST


Hi, I have to apologise for my non-contribution so far - have had too much
on my plate this year. I've only just now managed to skim the vast amount
of solid input everyone has posted, and have a few new points to
contribute, but (blast it!) I'm going away for 2 weeks tomorrow! Very
frustrating. But anyway, here are a few rushed preliminary words following on from
Richard's posting. I'll send the rest when I return (entities, fields,
ERDs, survey types).

At 17:23 18-05-2001 -0400, Richard Knapp wrote:
>With all the different IDs floating around in the DTDs and Schemas, are
>there any ideas on helping to distinguish which is which? For all the DTD
>cares, I could specify a Survey ID as a Station IDREF.
>
>Would it be helpful to put a naming convention on the IDs? What I am
>proposing is this: adding a specific prefix to an ID to help distinguish
>its intended usage. The draft DTD on which I'm working has Station and
>Survey IDs. (Ralph's has more: Shots, Provenance elements, etc so he will
>need more prefixes). Here's what I have so far:
>
> Element   Prefix   Description
> Station   N        _N_ode
> Shot      E        _E_dge
> Survey    S        _S_urvey
>
>This also helps address a concern on generating IDs. With a preset Prefix
>for all IDs, the rest could be easily generated from an incremental
>number, the station name, or ? There is nothing to preclude this list from
>expanding to add IDs for Cave, System, or other elements. While these are
>only single character prefixes, that should not be considered a limit. The
>prefixes could also be "ST" for Station.
>
>Comments?
>
> - Richard Knapp

I guess there are really two types of IDs to consider - one is the "name"
of the instance of the entity we are dealing with, such as the name of a
particular survey, and the other is its internal "record ID" which
identifies it in a relational database. The former is like a public
common-usage name, and the latter is the internal computer name. The two
have different criteria to fulfil, so it's best not to try to use the same
ID for both.

Another example would be a cave name - the cave name is what everyone knows
the cave by, but it may not be unique, and it may change over time. In a
database the cave is identified by an internal record ID which is unique,
and should (must?) never change, because apart from identifying the
particular record in the database, it is also used to link between the
various physical tables in the database. If it ever changes you can be in
big trouble down the track, especially with linked tables and with data
which has been distributed to several sites and so is not accessible to a
common control. The point here is that you would not use the common ID,
i.e. the cave name, as the record ID in the database - it's too long, too
variable in format, and is subject to change.
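
A rough sketch of that point in Python. The cave names, ID values and the
"maps" table are made up for illustration and are not taken from any real
database:

```python
from dataclasses import dataclass


@dataclass
class Cave:
    record_id: int  # internal key: unique, never changes
    name: str       # public name: may be duplicated or changed


# Hypothetical data: two different caves sharing one public name.
caves = {
    1: Cave(1, "Bat Cave"),
    2: Cave(2, "Bat Cave"),
}

# Another table links to caves by record ID, never by name.
maps = [
    {"map_id": 10, "cave_id": 1},
    {"map_id": 11, "cave_id": 2},
]

# Renaming a cave touches only the name; every link stays valid.
caves[1].name = "Old Bat Cave"

linked = [caves[m["cave_id"]].name for m in maps]
```

Had the maps been linked by cave name instead, the duplicate name would
have made the links ambiguous and the rename would have broken one of them.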

Database "wisdom" says that record IDs (keys) should not contain any data,
but just serve to uniquely identify each record, i.e. a simple integer is
the ideal. However because we want to be able to move our data around
between separate sites without confusing/mixing/overwriting records from
different sources, and we do *not* want to have a central control for
issuing unique record IDs, UISIC has proposed a scheme for a minimal-format
record ID for use in cave-related databases where each site can issue its
own IDs without risk of duplicating anyone else's. You can see it at:

http://rubens.its.unimelb.edu.au/~pgm/uisic/exchange/exchprop.html
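
To show the general idea only, here is a minimal sketch of decentralised ID
issuing. The "<site>-<serial>" layout, the site codes and the class name
are my own assumptions for illustration; they are NOT the format defined in
the proposal at the URL above:

```python
import itertools


class SiteIdIssuer:
    """Issues IDs of the form '<site>-<serial>' (hypothetical layout)."""

    def __init__(self, site_code: str):
        self.site_code = site_code
        self._serial = itertools.count(1)  # local counter, no central control

    def next_id(self) -> str:
        return f"{self.site_code}-{next(self._serial)}"


# Two independent sites (made-up codes), each issuing its own IDs.
melb = SiteIdIssuer("MELB")
syd = SiteIdIssuer("SYD")

ids_melb = [melb.next_id() for _ in range(3)]
ids_syd = [syd.next_id() for _ in range(3)]

# Merging data from both sites produces no collisions.
merged = set(ids_melb) | set(ids_syd)
```

This only works if the site codes themselves are unique, so each site needs
a one-off agreed code, but no central authority is involved after that.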

This format does not include a component which identifies the entity-type
to which the record belongs, e.g. a survey record ID would look the same as
a shot ID. Rather than complicate the ID by including a component which
identifies the entity-type, I have found that it is not necessary, as you
can identify the entity-type from its context. This scheme has been
successfully used in the Australian national cave database which includes a
range of entities (caves, maps, refs, people, organisations,...).
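
As a sketch of what "from its context" can mean in XML, the snippet below
takes the entity-type from the enclosing element name rather than from the
ID itself. The element and attribute names are hypothetical and do not come
from any of the draft DTDs:

```python
import xml.etree.ElementTree as ET

# Hypothetical document. An XML ID-typed attribute must start with a
# letter or underscore, so every ID gets the same constant "x"; the ID
# itself carries no entity-type information.
doc = """<cavedata>
  <survey id="x101">
    <station id="x102"/>
    <station id="x103"/>
    <shot id="x104" from="x102" to="x103"/>
  </survey>
</cavedata>"""

root = ET.fromstring(doc)

# The entity-type comes from the element that carries the ID.
entity_type_by_id = {
    el.get("id"): el.tag for el in root.iter() if el.get("id") is not None
}

# Application-level check that a shot only ever refers to stations,
# which a DTD's ID/IDREF mechanism cannot enforce by itself.
bad_refs = [
    (shot.get("id"), attr)
    for shot in root.iter("shot")
    for attr in ("from", "to")
    if entity_type_by_id.get(shot.get(attr)) != "station"
]
```

Here the check on "from" and "to" catches the case Richard mentions, a
Survey ID used where a Station reference is expected, without putting a
type prefix into the IDs.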

Also, I guess I should really be calling this record ID an "instance ID",
because, particularly when using XML, the idea of a "record" becomes rather
vague. Instance ID would cover both situations - XML and a "conventional"
database.

Peter


