[lextypes] Patterns in DLL

Bob Foster bob at objfac.com
Wed Jul 23 14:53:28 BST 2003


Jeni said she'd be happy to get comments on this list, so here is one. (Jeni,
you're welcome to copy any previous comments to the list.)

1. A subtype may partition a string differently than a base type. What is
the parse result? E.g.,

<datatype name="nameToken">
  <parse>
    <group name="first">
      <initialNameChar/>
    </group>
    <group name="rest">
      <zeroOrMore>
        <nameChar/>
      </zeroOrMore>
    </group>
  </parse>
</datatype>

<datatype name="numberedName">
  <supertype name="nameToken"/>
  <parse>
    <group name="word">
      <initialNameChar/>
      <zeroOrMore>
        <charGroup>
          <nameChar/>
          <except>
            <digit/>
          </except>
        </charGroup>
      </zeroOrMore>
    </group>
    <group name="num">
      <oneOrMore>
        <digit/>
      </oneOrMore>
    </group>
  </parse>
</datatype>

The value "abc123" then has two result sets, one per datatype:

<nameToken><first>a</first><rest>bc123</rest></nameToken>

<numberedName><word>abc</word><num>123</num></numberedName>

Since there is (in general and in this case) no way to merge the results,
should I assume that parse results are specific to a datatype?

(If yes, this implies to me that there is an api that can interrogate the
decorated-by-means-as-yet-unspecified node to obtain a specific datatype
result, or all of the datatype results. That is, if there is no api, what is
the point? If there is one, it must offer one or both of these services.)
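To make the two services concrete, here is a rough sketch in Java of what such an api might look like. Everything in it is an assumption for illustration only: the class and method names (DatatypeResults, resultFor, allResults) are hypothetical, the two datatypes are approximated by regular expressions with named groups, and initialNameChar and nameChar are simplified to [A-Za-z_] and [A-Za-z0-9_]. It is not how DLL defines or exposes parse results.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the names, character classes and api shape are assumptions.
final class DatatypeResults {
    // One pattern per datatype; each named group stands for a <group name="...">.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    // Group names per datatype, in document order.
    private static final Map<String, List<String>> GROUPS = new LinkedHashMap<>();

    static {
        PATTERNS.put("nameToken",
            Pattern.compile("(?<first>[A-Za-z_])(?<rest>[A-Za-z0-9_]*)"));
        GROUPS.put("nameToken", List.of("first", "rest"));
        PATTERNS.put("numberedName",
            Pattern.compile("(?<word>[A-Za-z_][A-Za-z_]*)(?<num>[0-9]+)"));
        GROUPS.put("numberedName", List.of("word", "num"));
    }

    // Service 1: the parse result for one specific datatype,
    // or null if the datatype is unknown or the value does not match it.
    static Map<String, String> resultFor(String datatype, String value) {
        Pattern p = PATTERNS.get(datatype);
        if (p == null) return null;
        Matcher m = p.matcher(value);
        if (!m.matches()) return null;
        Map<String, String> regions = new LinkedHashMap<>();
        for (String g : GROUPS.get(datatype)) {
            regions.put(g, m.group(g));
        }
        return regions;
    }

    // Service 2: the parse results for every datatype the value matches,
    // keyed by datatype name and kept separate rather than merged.
    static Map<String, Map<String, String>> allResults(String value) {
        Map<String, Map<String, String>> all = new LinkedHashMap<>();
        for (String datatype : PATTERNS.keySet()) {
            Map<String, String> r = resultFor(datatype, value);
            if (r != null) all.put(datatype, r);
        }
        return all;
    }

    public static void main(String[] args) {
        // prints {nameToken={first=a, rest=bc123}, numberedName={word=abc, num=123}}
        System.out.println(allResults("abc123"));
    }
}
```

Keying the results by datatype name sidesteps the merge problem entirely: "abc123" keeps both partitions, and a caller asks for the one it wants.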

2. Just a nit, but I don't see that it works to have whitespace in the parse
results that doesn't appear in the string. A type is not required to
entirely partition the value into named regions, and leading/trailing
whitespace may not be stripped. That's why I wrote the results above as I
did.

3. I asked previously about ambiguous partitions (resulting from <choice>).
XML Schema does not have this problem, as it cannot identify regions of a
match. XPath 2.0 has the issue, but I don't see that it is addressed in
Functions and Operators. Java regex (somewhat beside the point, but it's an
example) resolves this by reporting as the result the last matched value of
the region. Whatever the resolution, some rule is needed unless explicit
alternatives (which can expand exponentially) are to appear in the results.
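For what it's worth, the Java behavior is easy to observe with java.util.regex. The sketch below is only an illustration of that one engine, not a proposal for DLL; the class name, method names and patterns are made up for the example. The first pattern has two valid partitions of "abcd" (x=a, y=bcd, z empty, or x=ab, y=c, z=d), and Java simply reports the one its backtracking finds first. The second shows that a region inside a repetition keeps only the text of its last iteration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the class, method names and patterns are hypothetical.
final class AmbiguousPartition {
    // "abcd" can be split as x=a, y=bcd, z="" or as x=ab, y=c, z=d.
    // Java tries alternatives left to right and reports the first split that matches.
    static String[] partition(String value) {
        Matcher m = Pattern.compile("(?<x>a|ab)(?<y>c|bcd)(?<z>d*)").matcher(value);
        if (!m.matches()) return null;
        return new String[] { m.group("x"), m.group("y"), m.group("z") };
    }

    // A named region inside a repetition: earlier iterations are overwritten,
    // so only the last matched value is available afterwards.
    static String lastRepeated(String value) {
        Matcher m = Pattern.compile("(?<item>[a-z])+").matcher(value);
        return m.matches() ? m.group("item") : null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partition("abcd"))); // prints [a, bcd, ]
        System.out.println(lastRepeated("abc"));                // prints c
    }
}
```

Either rule (first alternative wins, last iteration wins) gives a single deterministic result without enumerating every possible partition.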

Bob Foster



