PCDATA vs. CDATA


From: Devin Kouts (devinkouts_at_earthlink.net)
Date: Thu Jan 25 2001 - 04:58:23 CET



Here's a question. In my XML readings I haven't found a clear indication
of how best to use CDATA vs. PCDATA. I understand that labelling an
Element or Attribute's value as CDATA (Character Data) will cause the
parser to ignore that data and not parse it. This is useful when the
data contains characters that could confuse the parser (HTML code, for
example). But does this mean that particular data is not accessible in
memory? It doesn't seem like it would work that way. Still, it does seem
almost certain that even if the value is available in memory, indexing
into a specific point in the data would be impossible (e.g. the third
character of a six-character SurveyName), because it wasn't parsed. If
this is correct, then it limits the use of XPointer by blocking access
to specific pieces of data.

On the other hand, PCDATA (Parsed Character Data) will be read and
parsed into the DOM (Document Object Model) structure held in memory.
As fully parsed data, every part of an Element or Attribute's value would
be available to things like XPointer or other methods of access that
might be created by a software developer.
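
One way to check the two guesses above is a small experiment. The
following is only a minimal sketch, assuming Python's standard
xml.dom.minidom parser; the Survey and Notes elements and their values
are made up for illustration (only SurveyName comes from the example
above). It reads one ordinary text value and one CDATA section into a
DOM and then indexes into each. Other parsers may expose CDATA
differently, so treat this as a test to run, not as a definitive answer.

```python
from xml.dom import Node, minidom

# Hypothetical sample document: only the SurveyName element comes from
# the discussion above; everything else is invented for illustration.
SAMPLE = (
    "<Survey>"
    "<SurveyName>ABC123</SurveyName>"
    "<Notes><![CDATA[<b>wet</b> & muddy]]></Notes>"
    "</Survey>"
)


def element_text(doc, tag):
    """Join the text and CDATA child nodes of the first <tag> element."""
    element = doc.getElementsByTagName(tag)[0]
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


doc = minidom.parseString(SAMPLE)
survey_name = element_text(doc, "SurveyName")  # ordinary parsed text
notes = element_text(doc, "Notes")             # content of the CDATA section

print(survey_name[2])  # third character of the six-character SurveyName
print(notes)           # CDATA content, markup characters kept literally
print(notes[2])        # indexing into the CDATA content
```

If both values print and both can be indexed, then with this parser at
least the CDATA content is held in memory as a plain string like any
other text, and the difference lies only in how markup characters
inside it are treated.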

So here's the question: Should we not be declaring the values of
Elements and Attributes to be PCDATA to the greatest degree possible,
making data as completely accessible through the DOM as it can be made,
while at the same time reserving the use of CDATA for those types of
data that we would like the parser to ignore (e.g. hyperlinks or Java
code encapsulated in pairs of tags, etc.)?

Opinions please...

Devin Kouts


