[lextypes] data type thoughts

Tue Jul 22 11:34:03 BST 2003

bob at objfac.com (Bob Foster) writes:
>My general point of view is that XML (atomic) types should be named
>descriptions of sets of strings.

Mine too.

>Intrinsic types, from integer to furniture, but the strings "1" and
>"table" are not intrinsically anything but strings. Nor do type names
>have any intrinsic semantics; they are just names.

That seems like the right starting point for a list named lextypes. :)
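
To make "named descriptions of sets of strings" concrete, here is a minimal
sketch in Python. The type names and the regular expressions are hypothetical
and chosen only for illustration. Each name is just a label attached to a
pattern, and a string belongs to a type only in the sense that the pattern
accepts it.

```python
import re

# Hypothetical type table: each name is only a label for a set of strings,
# and the set is described here by a regular expression.
LEXICAL_TYPES = {
    "integer": re.compile(r"[+-]?\d+"),
    "furniture": re.compile(r"table|chair|desk"),
}


def types_of(text):
    """Return the names of every type whose set of strings contains text."""
    return sorted(
        name
        for name, pattern in LEXICAL_TYPES.items()
        if pattern.fullmatch(text)
    )
```

Nothing here gives "1" or "table" any meaning beyond membership in a set. An
application could swap in a completely different table without touching the
strings themselves.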

>As for metadata, do you have some specific examples in mind?

The PSVI, mostly, but I think similar things are possible with RDF
schema.

>> Validating vs. interpretative - Some people just want a yes/no
>> answer. Is this good data, or not?  Other people want to be able to
>> make statements about the data, and compose or decompose it
>> according to rules about the information.
>
>Is this actually a dichotomy or the beginning of a list of
>type-related services people want? Some want a yes/no answer but many
>more want one accompanied by a meaningful diagnostic. Some want to be
>able to interrogate the types discovered during validation. Some want
>assistance with translating XML types to and from programming language
>types. Etc. There is a list.

It's only a dichotomy if validation is all you want.  I think a lot of
people start there and eventually realize that there are other good
things to do with this machinery.

>Examples please for "statements about the data" and "compose or
>decompose it according to rules about the information"?

That was kind of messy.  Basically I'm thinking about processes where
you might test assertions against the data (does this date/time have a
time zone, no time zone, or 'Z'?) and then break the content into
smaller pieces (day/month/year/hour/minute/etc.) for consistent
comparisons, possibly against different lexical representations.
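
A minimal sketch of that kind of process in Python, assuming an ISO 8601-style
lexical form such as "2003-07-22T11:34:03+01:00". The pattern and the function
names (zone_kind, decompose) are hypothetical, not part of any specification.
The first function tests the time zone assertion and the second breaks the
content into smaller pieces.

```python
import re

# Assumed lexical form: YYYY-MM-DDThh:mm:ss with an optional zone,
# where the zone is either "Z" or a +hh:mm / -hh:mm offset.
DATETIME = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"(?P<zone>Z|[+-]\d{2}:\d{2})?"
)


def zone_kind(text):
    """Test the assertion: is the zone an offset, absent, or 'Z'?"""
    match = DATETIME.fullmatch(text)
    if match is None:
        raise ValueError(f"not in the assumed date/time form: {text!r}")
    zone = match.group("zone")
    if zone is None:
        return "none"
    return "Z" if zone == "Z" else "offset"


def decompose(text):
    """Break the string into integer parts, keeping the zone as text."""
    match = DATETIME.fullmatch(text)
    if match is None:
        raise ValueError(f"not in the assumed date/time form: {text!r}")
    parts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "zone"
    }
    parts["zone"] = match.group("zone")
    return parts
```

Once the parts are separated, comparing two values no longer depends on how
either one happened to be written.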

If you'd like me to walk through a US vs. European date (or date/time)
comparison, I can definitely do that.  Comparing "2.2lbs" to "1kg" might
be another case where you test an assertion and decompose the data into
smaller processable parts.
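
Both comparisons can be sketched briefly in Python. Everything below is an
assumption made for illustration: the accepted units, the slash-separated date
forms, the 1% tolerance, and the function names. The pound-to-kilogram factor
is the international definition, under which 2.2 lb is about 0.998 kg, so
"2.2lbs" and "1kg" are equal only within a tolerance that the application has
to choose.

```python
import math
import re
from datetime import date

# Kilograms per unit; 0.45359237 is the defined value of the pound.
KG_PER_UNIT = {"kg": 1.0, "lbs": 0.45359237}
QUANTITY = re.compile(r"(?P<number>\d+(?:\.\d+)?)(?P<unit>kg|lbs)")

# Assumed lexical forms: US month/day/year and European day/month/year.
US_DATE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")
EU_DATE = re.compile(r"(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})")


def to_kg(text):
    """Decompose '2.2lbs' into number and unit, then convert to kilograms."""
    match = QUANTITY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized weight: {text!r}")
    return float(match.group("number")) * KG_PER_UNIT[match.group("unit")]


def same_weight(a, b, rel_tol=0.01):
    """Compare two weights within a relative tolerance (assumed 1%)."""
    return math.isclose(to_kg(a), to_kg(b), rel_tol=rel_tol)


def to_date(text, pattern):
    """Decompose a date string using the pattern the caller asserts."""
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError(f"does not match the asserted form: {text!r}")
    return date(
        int(match.group("year")),
        int(match.group("month")),
        int(match.group("day")),
    )
```

With these, same_weight("2.2lbs", "1kg") holds at the assumed tolerance, and
to_date("7/22/2003", US_DATE) equals to_date("22/7/2003", EU_DATE). Note that
a string like "7/8/2003" matches both date patterns, so the choice of pattern
has to come from outside the string itself.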

>> Serialization issues - I can't describe this cleanly, though I bounce
>> into it all the time.  Some people see XML merely as a container for
>> information described elsewhere, and therefore don't see lexical
>> representation as an issue, provided that it's standardized and
>> understood by the elsewheres.
>
>Type-related translation service? (Some people don't care about XML
>representations, but they should be thankful that somebody does.)

Something like that, yes.

>Just to put a stake in the ground: Each application should be able to
>define its own type systems with no more in common than the RELAX NG
>"text" type.

I agree, but apparently there are other plausible answers.

>Structure in types (base types, subtypes) is good to the extent it is
>rational and useful to an application. A good argument for type
>hierarchies is they save time defining new types. Good arguments
>against type hierarchies are that no two people seem to be able to agree on
>one and committees are capable of producing very bad ones. I believe
>the good can be had without the bad by concentrating on tools that
>allow applications to define type hierarchies.

I have a lot of skepticism about the value of type hierarchies.  I think
we might want to look into other mechanisms for identifying
compatibility among types.

>Yes, a lot of good ideas there, but it's good to back off a little and
>try to get a sense of how people see the issues.

I suspect (and hope) that even among this small group there's bound to
be a good deal of diversity.

-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org