[lextypes] A precis of my comments to Jeni on DTL and her reply

Tue Jul 29 13:28:50 BST 2003

John wrote:
> 1) I expressed my view that XML-style regexes are just too much of a
> notation change (too verbose, too unfamiliar) from classical string
> regexes for too little gain (despite being the one who spec'd them
> for RNG 2.x, based on Olin Shivers's Scheme regex library).
> References to named sub-regexes could be achieved with Perlish
> ${foo} syntax or something novel like \R{foo}. Jeni said that the
> advantage of the XML-style regexes was the ability to annotate them
> with attributes, especially the locally-scoped named subgroups --
> she didn't see how to do those with string regexes.

Thinking about it, it would be easy enough to add something like the
following syntax:

  \r{definition}   equivalent to <ref name="definition" />
  \d{datatype}     equivalent to <data type="datatype" />
  \d{name=R}       equivalent to <group name="name">R</group>,
                     where R is a regular expression

(I'm using this format because it's similar to \p{...} in the XML
Schema regex syntax.)

For example, instead of:

<define name="Digit">
  <charGroup>
    <range from="0" to="9" />
  </charGroup>
</define>

<define name="Digits">
  <oneOrMore>
    <ref name="Digit" />
  </oneOrMore>
</define>

<datatype name="decimal">
  <parse>
    <optional>
      <choice>
        <string>+</string>
        <string>-</string>
      </choice>
    </optional>
    <ref name="Digits" />
    <optional>
      <string>.</string>
      <ref name="Digits" />
    </optional>
  </parse>
</datatype>

use:

<define name="Digit">[0-9]</define>
<define name="Digits">\r{Digit}+</define>

<datatype name="decimal">
  <parse>(\+|-)?\r{Digits}(\.\r{Digits})?</parse>
</datatype>
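A minimal sketch of how the \r{...} references might be resolved. It assumes, hypothetically, that each <define> body is held in a Python dict and that a reference is replaced by its expanded definition wrapped in a non-capturing group, so a following quantifier such as + applies to the whole definition. Python's re module stands in for an XML Schema regex engine here, and the names DEFINITIONS and expand_refs are illustrative only.

```python
import re

# Hypothetical stand-ins for the <define> elements above (not a real DTL API).
DEFINITIONS = {
    "Digit": r"[0-9]",
    "Digits": r"\r{Digit}+",
}

# Matches the proposed \r{name} reference syntax.
REF = re.compile(r"\\r\{([A-Za-z][\w.-]*)\}")


def expand_refs(pattern, definitions, depth=0):
    """Replace each \\r{name} with its recursively expanded definition,
    wrapped in (?:...) so a trailing quantifier covers the whole thing."""
    if depth > 50:
        raise ValueError("definitions appear to refer to themselves")

    def substitute(match):
        body = definitions[match.group(1)]
        return "(?:" + expand_refs(body, definitions, depth + 1) + ")"

    return REF.sub(substitute, pattern)


# The <parse> pattern for decimal, with the literal '+' and '.' escaped.
decimal = expand_refs(r"(\+|-)?\r{Digits}(\.\r{Digits})?", DEFINITIONS)

for text in ["42", "-3.14", "+0.5", "1.", "abc"]:
    print(text, re.fullmatch(decimal, text) is not None)
```

The depth limit is a crude guard against a definition that refers to itself; a real implementation would more likely detect cycles explicitly.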

Instead of:

<datatype name="decimal">
  <parse>
    <group name="sign">
      <optional>
        <choice>
          <string>+</string>
          <string>-</string>
        </choice>
      </optional>
    </group>
    <group name="whole-part">
      <ref name="Digits" />
    </group>
    <optional>
      <string>.</string>
      <group name="fraction-part">
        <ref name="Digits" />
      </group>
    </optional>
  </parse>
</datatype>

use:

<datatype name="decimal">
  <parse>\d{sign=(\+|-)?}\d{whole-part=\r{Digits}}(\.\d{fraction-part=\r{Digits}})?</parse>
</datatype>
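One possible way to realise the named subgroups is to map them onto Python's (?P<name>...) groups. In this hypothetical sketch, \r{Digits} is assumed to resolve to [0-9]+, references are expanded before groups so that no nested braces remain inside \d{...}, and hyphens in group names become underscores because Python requires group names to be identifiers. Since \d{3} already means "exactly three digits" in both XML Schema and Python regexes, the sketch only treats \d{ as the extension when a letter follows. The helper to_python_regex is illustrative, not part of any existing tool.

```python
import re

# Assumption: \r{Digits} has already been resolved to this plain regex.
REFS = {"Digits": r"[0-9]+"}

# \r{name} references, and \d{name=R} named subgroups. A letter must follow
# "\d{" so that ordinary quantified digits such as \d{3} are left alone.
# Simplification: R itself may not contain "}".
REF = re.compile(r"\\r\{([A-Za-z][\w.-]*)\}")
GROUP = re.compile(r"\\d\{([A-Za-z][\w.-]*)=([^}]*)\}")


def to_python_regex(pattern):
    # Resolve references first so no nested braces remain inside \d{...}.
    pattern = REF.sub(lambda m: "(?:" + REFS[m.group(1)] + ")", pattern)

    def named(m):
        # Python group names must be identifiers: whole-part -> whole_part.
        name = re.sub(r"[.-]", "_", m.group(1))
        return "(?P<" + name + ">" + m.group(2) + ")"

    return GROUP.sub(named, pattern)


decimal = to_python_regex(
    r"\d{sign=(\+|-)?}\d{whole-part=\r{Digits}}(\.\d{fraction-part=\r{Digits}})?"
)
m = re.fullmatch(decimal, "-12.50")
print(m.group("sign"), m.group("whole_part"), m.group("fraction_part"))
```

Note that an absent sign gives an empty string for the sign group (the group matches the empty string), whereas an absent fraction gives None (the enclosing optional group doesn't participate at all).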

And instead of:

<datatype name="date">
  <parse>
    <data type="year" />
    <string>-</string>
    <data type="month" />
    <string>-</string>
    <data type="day" />
  </parse>
</datatype>

use:

<datatype name="date">
  <parse>\d{year}-\d{month}-\d{day}</parse>
</datatype>
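\d{datatype} can be read as "inline that datatype's own parse pattern as a named group". A rough sketch under that reading follows. The patterns for year, month and day are assumptions for illustration, since their definitions aren't given here, and expand_data is a hypothetical helper.

```python
import re

# Assumed parse patterns for the component datatypes; the message does not
# define year, month or day, so these are illustrative only.
DATATYPES = {
    "year": r"[0-9]{4}",
    "month": r"0[1-9]|1[0-2]",
    "day": r"0[1-9]|[12][0-9]|3[01]",
}

# \d{type} with no "=": a letter must follow "\d{" so \d{3} is not touched.
DATA = re.compile(r"\\d\{([A-Za-z]\w*)\}")


def expand_data(pattern):
    # Each \d{type} becomes a named group holding that type's own pattern;
    # the group also scopes any top-level "|" in the type's pattern.
    def substitute(m):
        name = m.group(1)
        return "(?P<" + name + ">" + DATATYPES[name] + ")"

    return DATA.sub(substitute, pattern)


date = expand_data(r"\d{year}-\d{month}-\d{day}")
m = re.fullmatch(date, "2003-07-29")
print(m.groupdict())
```

Python rejects a pattern that defines the same group name twice, so a pattern using one datatype in two places would need some renaming scheme; the sketch doesn't handle that case.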

I could no longer use XSLT for a proof-of-concept implementation,
but that's a small price to pay for concision. It would also mean
dropping the 'ignore' attribute, but I think that's probably a good
thing anyway.

I also got comments about the need for both <ref> and <data> as
mechanisms for building up complex regular expressions. They do serve
distinct purposes, but I'm still pondering whether there's a way of
combining them...

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/