[lextypes] data type thoughts

Simon St.Laurent simonstl at simonstl.com
Mon Jul 21 22:00:25 BST 2003


These are just a few of my meanderings; maybe they'll be a useful place
to start, maybe they won't.  You're all quite welcome to tear them apart
or replace them with your own thoughts.

Data typing seems to be a field with an enormous diversity of opinions.
The same sets of issues crop up in databases, XML, and programming
languages, and are solved differently by different groups even within
those categories.

There are a few axes of opinion I keep encountering (none are simple
binaries in practice):

Intrinsic vs. Applied - Some people seem to think that data types come
from the information itself, while others see data types as metadata
applied to information, something that can be changed or discarded at
will.  Some people see the types as intrinsic but want the metadata made
explicit during processing.

Validating vs. interpretative - Some people just want a yes/no answer.
Is this good data, or not?  Other people want to be able to make
statements about the data, and compose or decompose it according to
rules about the information.

Serialization issues - I can't describe this cleanly, though I bounce
into it all the time.  Some people see XML merely as a container for
information described elsewhere, and therefore don't see lexical
representation as an issue, provided that it's standardized and
understood by the elsewheres.

Ad hoc vs. structured - This axis has a lot of different dimensions, as
there can be an ad hoc foundation on which structures are built (kind of
like W3C XML Schema Part 2 with its many primitives plus rules for
extension and restriction). Other people would like to start from
strings - text being at the foundation of XML content - and build up
from there. 
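
To make the validating vs. interpretative axis above a bit more
concrete, here is a minimal sketch in Python.  Everything in it is an
assumption made up for illustration: the ISO_DATE pattern and the
validate and interpret functions are hypothetical names, not part of any
existing proposal.  Both functions look only at the lexical shape of the
text; neither checks that the date exists on a calendar.

```python
import re

# Hypothetical lexical rule: dates written as YYYY-MM-DD, e.g. 2003-07-21.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def validate(text):
    """Validating view: answer only yes or no about the lexical form."""
    return ISO_DATE.fullmatch(text.strip()) is not None


def interpret(text):
    """Interpretative view: decompose the text into named parts.

    Returns None when the text does not match the lexical rule.
    """
    match = ISO_DATE.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return {"year": year, "month": month, "day": day}
```

The validating function throws away everything except the verdict,
while the interpretative one keeps the pieces around so that later
processing can make statements about them.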

I've argued for a while that types are applied ('painted' is my favorite
description), that they can and should be interpreted, and that valuing
lexical representations is a good way to build flexibility (even human
flexibility) into otherwise brittle systems.  I'd also like to see a set
of rules for describing and processing types which is built solely on
the textual (or conceivably markup, though that's risky) content of an
element and defines a pathway to a value space or spaces.
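
As a rough illustration of what 'painting' might look like, here is a
small Python sketch that starts from nothing but the text of an element
and tries several pathways to value spaces.  The painter functions, the
PAINTERS table, and the paint function are all hypothetical names
invented for this example, and the three value spaces are an arbitrary
choice; this is a sketch of the idea, not a proposed design.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def as_decimal(text):
    """Pathway from text to a decimal value, or None if it does not fit."""
    try:
        return Decimal(text.strip())
    except InvalidOperation:
        return None


def as_boolean(text):
    """Pathway from text to a boolean, using a small assumed lexical set."""
    lexical = {"true": True, "1": True, "false": False, "0": False}
    return lexical.get(text.strip())


def as_date(text):
    """Pathway from text to a calendar date, or None if it does not fit."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


# Hypothetical table of painters; adding or removing one changes nothing
# about the text itself.
PAINTERS = {"decimal": as_decimal, "boolean": as_boolean, "date": as_date}


def paint(text):
    """Apply every painter and keep the value spaces that accept the text."""
    values = {}
    for name, painter in PAINTERS.items():
        value = painter(text)
        if value is not None:
            values[name] = value
    return values
```

Under these assumptions, the text "1" lands in two value spaces at once
(a decimal and a boolean), "2003-07-21" lands only in the date space,
and "hello" lands in none of them.  The original string is never
modified, so the paint can be stripped or reapplied at will.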

I'm aware that this is considerably less concrete than the proposals I
mentioned in the previous message, but I'd like to get a sense of what
people are looking for before mixing and pouring materials which set
permanently.



-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org


