[lextypes] Type systems for regular models

Mon Jul 21 03:39:15 BST 2003

I just posted this to the relax-ng-comments list. It's not really a response
to John Cowan's recent note about type lattices, but shares some thoughts on
the same topic.

Bob
---------------------
Lattice, I don't know, but I have been thinking that RELAX NG types should
be treated as sets. Perhaps this is obvious, but I haven't seen it, so I'll
ramble on for a bit.

The motivation for this is to construct a type system that isn't ashamed of
non-determinism, but respects the regular model from which types must be
derived. Since union is a commutative operator, the quasi-ambiguity employed
by W3 XML Schema for its union types and, as far as I can tell, extended to
the entire type system by XQuery/XPath 2.0, isn't a satisfactory answer.
Instead of founding the type system on a choice of a single value from a
set, which will be as good as random to users when schemas are combined by
union, and requiring the validator to maintain declaration order in
inherently unordered patterns, it makes more sense to keep the set and deal
with the consequences.

In this note, I'm going to stick with "atomic" types, i.e., types of
text-only content, because they are simpler and RELAX NG doesn't have a
concept of named types for non-text content. For example, given:

element e { xsd:integer | xsd:boolean }

and the input <e>1</e>, which matches both xsd:integer and xsd:boolean, the
type of e is the set,

{xsd:integer, xsd:boolean}

which, it should go without saying, is unordered and contains no duplicates.
It is also reasonable to view these types as patterns with (only) choice
operators, as long as the set properties are honored by the implementation.
(The pattern view may be easier to extend to hedge types, but I'm not going
to go there.)

We can define the subtype (<=) relation as follows: A type is a subtype of a
base type if every valid value of the subtype is a valid value of the base.
If two types are subtypes of each other, they are equivalent. Then trivially
every member type of a set type, is a subtype of the set type. Given two set
types, one is a subtype of the other if every member of the one is a subtype
of the other. Every atomic type is a subtype of {text}.

Assuming there were an application API to receive it, the (atomic) type of
each element and attribute in an input stream (that has an atomic type)can
be constructed during validation by forming the union of the types of each
successfully matched pattern. This seems a straightforward byproduct of the
derivative with respect to text. I don't know about other implementations.

Since the type decorations of nodes would be sets, an API like that of SAX2
for attributes seems inappropriate, and not just because string values are
inappropriate. Instead, if the application is written to have different
behaviors depending on the type of the input, it would be more useful if the
API allowed the application to ask if a particular type is a subtype of the
node type. For example, given the pattern:

element color { "red"|"green"|"blue"| list { part, part, part }}
...
part = xsd:int { minValue="0" maxValue="255" }

the application could ask if the input is a token and if so do a table
lookup, otherwise if it is a list consisting of three xsd:int values,
construct a color. The more rare application that needs to know the entire
set of valid types could be provided a list of names.

Note that in this interpretation, a list type is a sequence of sets. I know
this doesn't match well the XQuery/XPath 2.0 data model, but then, it's hard
to find a sensible interpretation that does.

Bob Foster