Re: PCDATA vs. CDATA


From: martinl_at_talk21.com
Date: Thu Jan 25 2001 - 15:56:43 CET



Devin wrote:
>"...But, it does seem almost certain that even if the value is available in memory, indexing into a specific point in the data would be impossible (e.g. the third character of a six character SurveyName), because it wasn't parsed. If this is correct then it limits the use of Xpointers by blocking access to specific pieces of data."
CDATA is not parsed, but it is accessible from the DOM as text (if a DOM is being used; it needn't be, see below), so you could index into it. XPointer need not be blocked either: it is only a specification at the moment, with no standard implementations, so you are almost bound to have to write your own code anyway.
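
As a minimal sketch of that indexing, assuming Python's standard xml.dom.minidom as the DOM implementation: the SurveyName element comes from Devin's example, while the Survey wrapper, the value "Cave01" and the helper text_of are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical fragment: a six character SurveyName held in a CDATA section.
SOURCE = "<Survey><SurveyName><![CDATA[Cave01]]></SurveyName></Survey>"


def text_of(element):
    """Join the character data of an element's text and CDATA children."""
    wanted = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    return "".join(
        child.data for child in element.childNodes if child.nodeType in wanted
    )


doc = minidom.parseString(SOURCE)
survey_name = text_of(doc.getElementsByTagName("SurveyName")[0])

# The content is an ordinary string, so it can be indexed like any other.
third_char = survey_name[2]
```

The helper treats text nodes and CDATA nodes alike, so the program need not care which of the two the document author used.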

>So here's the question: Should we not be declaring the values of Elements and Attributes to be PCDATA to the greatest degree possible, making data as completely accessible through the DOM as it can be made?
>And at the same time reserve the use of CDATA for those types of data that we would like the parser to ignore (e.g. hyperlinks or java code encapsulated in pairs of tags, etc.).
I think the answer is yes: limit the use of CDATA sections to things like scanned images, maybe photos, or possibly exotic, automatically collected data logged in some binary format.

Rather than thinking in terms of DTDs, I suggest we all start looking at how XML Schemas, maybe in combination with the Schematron (an alternative and complementary approach to validity checking), can help us define the interchange standard: which bits are required by this (debatably fascist) concept of standardisation, and which extensions are allowable for enthusiasts (debatably fanatics).

PS.

As some terms are beginning to crop up as though they are essential when they are not, I think some notes on XML Parsing may be useful. When corrected and clarified they might form a basis for what will appear on the website. As I understand things at the moment:

Parsing is the reading of input data (for convenience you may think of a file, but most parsers accept more general concepts so that the data could be coming in as a more or less intermittent stream of characters - maybe typed into a keyboard, converted from speech, input by Optical Character Recognition from scanned text, generated by another program ...) and the recognition of the units of XML: Elements, Attributes, Text, Processing Instructions, Comments, and concepts like CDATA, namespaces, entities, etc. Once the units are recognised, an XML parser can act on them in one of two standard ways:
1) it may use them to build a Document Object Model (DOM), a hierarchical tree structure in memory. When the DOM has been completely built, the program can manipulate the parts of this structure using the Application Programming Interface (API) - inserting, modifying, deleting individual units or tree 'fragments', and may ultimately write out a new XML file (although the XML specification does not insist that a parser should be able to create output files). So, a DOM Parser creates a memory resident model of the input; requires the input to be a complete, well-formed document before it can be manipulated; and can usually create an XML file for output (as well as any other output you wanted, typically html web pages, svg images, pdf documents..).

2) it may delegate processing for each unit it finds to predefined handler code as it finds it. This event-driven model is known as the Simple API for XML (SAX). I think the word 'simple' is a bit misleading as the processing is not inherently any simpler than the DOM approach; it's not inherently harder either! SAX working does not require the whole of the data to be available before any processing can be done and does not require the whole document to fit in memory; on the other hand, a SAX API by itself does not allow you to write out an XML document (a program can easily write out the units for itself though, so don't despair, all is not lost).
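
To make the two approaches concrete, here is a rough sketch that runs the same tiny document through both, assuming Python's standard xml.dom.minidom and xml.sax modules. The Cave and Survey elements, the "checked" attribute and the SurveyNameCollector class are invented for the example and are not part of any proposed standard.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical input document, used for both approaches.
SOURCE = "<Cave><Survey name='A'/><Survey name='B'/></Cave>"

# 1) DOM: build the whole tree in memory, modify it, then write it out again.
doc = minidom.parseString(SOURCE)
for survey in doc.getElementsByTagName("Survey"):
    survey.setAttribute("checked", "yes")
dom_output = doc.documentElement.toxml()


# 2) SAX: the parser calls our handler for each unit as it is recognised;
#    no tree is built, so we keep only what we choose to keep.
class SurveyNameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "Survey":
            self.names.append(attrs.getValue("name"))


handler = SurveyNameCollector()
xml.sax.parseString(SOURCE.encode("utf-8"), handler)
sax_names = handler.names
```

The DOM half ends with a modified document ready to be written out; the SAX half ends with just the list of names, having never held the whole document as a tree.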

From the Xerces API (Application Programming Interface) documentation: "XML provides the CDATA markup to allow a region of text in which most of the XML delimiter recognition does not take place. This is intended to ease the task of quoting XML fragments and other programmatic information in a document's text without needing to escape these special characters. It's primarily a convenience feature for those who are hand-editing XML."

So, CDATA will be parsed by the XML parser, but only so far as recognising it as a unit whose content is not parsed, extending from the opening <![CDATA[ to the closing ]]>; that content is accessible and treatable in the program as text.
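
A small sketch of that behaviour, again assuming Python's xml.dom.minidom; the Note element and its content are made up. The characters inside the section that look like markup are handed to the program as plain text, and no stop element appears in the tree.

```python
from xml.dom import Node, minidom

# Hypothetical document: the CDATA content contains '<', '&' and a tag-like string.
SOURCE = "<Note><![CDATA[if a < b && c > d then <stop/>]]></Note>"

doc = minidom.parseString(SOURCE)
section = doc.documentElement.firstChild

# The parser recognised the section as a single unit of type CDATA...
is_cdata = section.nodeType == Node.CDATA_SECTION_NODE

# ...and its content is available, unparsed, as ordinary text.
content = section.data

# Nothing inside the section was turned into an element.
stop_elements = doc.getElementsByTagName("stop")
```

Outside a CDATA section the same content would have to be written with &lt; and &amp; escapes to be well-formed.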

XPointers are a way of identifying a particular place in an XML document using a version of the XPath syntax, the document itself being identified by the XLink protocol. I think most of the examples so far have really been seeking to use XLink to point at documents or at units of an XML document which have an ID attribute .... IDs have their problems as Mike Lake found ...
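
As a rough illustration of "write your own code", the sketch below finds a unit by its identifier and then indexes into its text. It assumes Python's xml.etree.ElementTree, which supports only a limited subset of XPath and implements neither XPointer nor XLink; the element names, the "id" attribute and the values are hypothetical, and without a DTD the "id" attribute is just an ordinary attribute rather than a declared ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two surveys identified by an 'id' attribute.
SOURCE = (
    "<Cave>"
    "<Survey id='s1'><SurveyName>Cave01</SurveyName></Survey>"
    "<Survey id='s2'><SurveyName>Cave02</SurveyName></Survey>"
    "</Cave>"
)

root = ET.fromstring(SOURCE)

# XPath-style lookup of the unit carrying a given identifier.
survey = root.find(".//Survey[@id='s2']")

# Once the unit is found, pointing at a place inside its text is plain indexing.
survey_name = survey.findtext("SurveyName")
third_char = survey_name[2]
```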




This archive was generated by hypermail 2b30 : Wed Feb 14 2001 - 00:03:53 CET