[lextypes] A precis of my comments to Jeni on DTL and her reply
Jeni Tennison
jeni at jenitennison.com
Tue Jul 29 13:28:50 BST 2003
John wrote:
> 1) I expressed my view that XML-style regexes are just too much of a
> notation change (too verbose, too unfamiliar) from classical string
> regexes for too little gain (despite being the one who spec'd them
> for RNG 2.x, based on Olin Shivers's Scheme regex library).
> References to named sub-regexes could be achieved with Perlish
> ${foo} syntax or something novel like \R{foo}. Jeni said that the
> advantage of the XML-style regexes was the ability to annotate them
> with attributes, especially the locally-scoped named subgroups --
> she didn't see how to do those with string regexes.
Thinking about it, it would be easy enough to add something like the
following syntax:
  \r{definition}   equivalent to <ref name="definition" />
  \d{datatype}     equivalent to <data type="datatype" />
  \d{datatype=R}   equivalent to <group name="datatype">R</group>
                   where R is a regular expression
(I'm using this format because it's similar to \p{...} in the XML
Schema regex syntax.)
For example, instead of:
<define name="Digit">
  <charGroup>
    <range from="0" to="9" />
  </charGroup>
</define>
<define name="Digits">
  <oneOrMore>
    <ref name="Digit" />
  </oneOrMore>
</define>
<datatype name="decimal">
  <parse>
    <optional>
      <choice>
        <string>+</string>
        <string>-</string>
      </choice>
    </optional>
    <ref name="Digits" />
    <optional>
      <string>.</string>
      <ref name="Digits" />
    </optional>
  </parse>
</datatype>
use:
<define name="Digit">[0-9]</define>
<define name="Digits">\r{Digit}+</define>
<datatype name="decimal">
  <parse>(\+|-)?\r{Digits}(\.\r{Digits})?</parse>
</datatype>
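
As a rough sketch of how the \r{...} references might be resolved, the following expands them into an ordinary regex by textual substitution. Everything here is an assumption made for illustration: the DEFINITIONS table and expand_refs function are hypothetical names, Python's re module stands in for the XML Schema regex dialect, and the literal '+' and '.' are written escaped because they are metacharacters in a string regex.

```python
import re

# Hypothetical table mirroring the <define> elements in the example.
DEFINITIONS = {
    "Digit": r"[0-9]",
    "Digits": r"\r{Digit}+",
}

# Matches \r{name}; the name may not contain braces or '='.
REF = re.compile(r"\\r\{([^{}=]+)\}")


def expand_refs(pattern, definitions, _seen=()):
    """Replace each \\r{name} with its recursively expanded definition."""
    def substitute(match):
        name = match.group(1)
        if name in _seen:
            # A definition that refers to itself cannot be inlined.
            raise ValueError("recursive reference to %r" % name)
        body = expand_refs(definitions[name], definitions, _seen + (name,))
        # Wrap in a non-capturing group so a following quantifier
        # applies to the whole definition.
        return "(?:" + body + ")"
    return REF.sub(substitute, pattern)


decimal = expand_refs(r"(\+|-)?\r{Digits}(\.\r{Digits})?", DEFINITIONS)
```

With that, re.fullmatch(decimal, "-12.5") succeeds and re.fullmatch(decimal, "12.") does not. Inlining like this only works while definitions are non-recursive, which is why the sketch rejects cycles.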
Instead of:
<datatype name="decimal">
  <parse>
    <group name="sign">
      <optional>
        <choice>
          <string>+</string>
          <string>-</string>
        </choice>
      </optional>
    </group>
    <group name="whole-part">
      <ref name="Digits" />
    </group>
    <optional>
      <string>.</string>
      <group name="fraction-part">
        <ref name="Digits" />
      </group>
    </optional>
  </parse>
</datatype>
use:
<datatype name="decimal">
  <parse>\d{sign=(\+|-)?}\d{whole-part=\r{Digits}}(\.\d{fraction-part=\r{Digits}})?</parse>
</datatype>
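
Because \d{name=R} can contain further \r{...} or \d{...} constructs, a processor has to match braces rather than rely on a single regex. The sketch below is one possible way to do that, translating the named subgroups into Python named groups. The translate function and DEFINITIONS table are hypothetical, Python's re module again stands in for the real regex dialect, hyphens in group names become underscores because Python does not allow hyphens there, and bare \d{datatype} references are left out of scope.

```python
import re

# Hypothetical stand-in for <define name="Digits">.
DEFINITIONS = {"Digits": r"[0-9]+"}


def translate(pattern, definitions):
    """Translate \\r{name} and \\d{name=R} into Python regex syntax."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith(("\\r{", "\\d{"), i):
            kind, start = pattern[i + 1], i + 3
            # Scan forward to the brace that closes this construct.
            depth, j = 1, start
            while depth:
                if j >= len(pattern):
                    raise ValueError("unbalanced braces")
                if pattern[j] == "\\":
                    j += 1          # skip the escaped character
                elif pattern[j] == "{":
                    depth += 1
                elif pattern[j] == "}":
                    depth -= 1
                j += 1
            body = pattern[start:j - 1]
            if kind == "r":
                inner = translate(definitions[body], definitions)
                out.append("(?:" + inner + ")")
            elif "=" in body:
                name, sub = body.split("=", 1)
                inner = translate(sub, definitions)
                out.append("(?P<%s>%s)" % (name.replace("-", "_"), inner))
            else:
                raise NotImplementedError("bare \\d{type} is not handled here")
            i = j
        elif pattern[i] == "\\":
            out.append(pattern[i:i + 2])   # keep escapes such as \+ intact
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


decimal = translate(
    r"\d{sign=(\+|-)?}\d{whole-part=\r{Digits}}(\.\d{fraction-part=\r{Digits}})?",
    DEFINITIONS,
)
match = re.fullmatch(decimal, "-12.5")
```

Here match.groupdict() gives the sign, whole_part and fraction_part components of the value, which is roughly what the <group name="..."> elements provide in the XML-style syntax.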
And instead of:
<datatype name="date">
  <parse>
    <data type="year" />
    <string>-</string>
    <data type="month" />
    <string>-</string>
    <data type="day" />
  </parse>
</datatype>
use:
<datatype name="date">
  <parse>\d{year}-\d{month}-\d{day}</parse>
</datatype>
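
To illustrate the bare \d{datatype} form, here is a minimal sketch that replaces each reference with a named group wrapping that datatype's lexical pattern. The DATATYPES table, its three patterns and the expand_data function are all assumptions made up for the example (they are not the real lexical rules for year, month or day), and Python's re module is used only as a convenient stand-in.

```python
import re

# Hypothetical lexical patterns for the component datatypes.
DATATYPES = {
    "year": r"[0-9]{4}",
    "month": r"[0-9]{2}",
    "day": r"[0-9]{2}",
}

# Matches \d{name} with a simple name and no '=R' part.
DATA = re.compile(r"\\d\{([A-Za-z][\w-]*)\}")


def expand_data(pattern, datatypes):
    """Replace each \\d{type} with a named group for that datatype."""
    def substitute(match):
        name = match.group(1)
        # Python group names cannot contain hyphens.
        return "(?P<%s>%s)" % (name.replace("-", "_"), datatypes[name])
    return DATA.sub(substitute, pattern)


date = expand_data(r"\d{year}-\d{month}-\d{day}", DATATYPES)
parts = re.fullmatch(date, "2003-07-29").groupdict()
```

Treating each \d{...} as a capture keyed by the datatype name is one reading of <data type="..." />; a real implementation would presumably hand the matched substring to the referenced datatype for its own parsing and validation.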
With this syntax I could no longer use XSLT for a proof-of-concept
implementation, but that's a small price to pay for concision. It
would also mean dropping the 'ignore' attribute, but I think that's
probably a good thing anyway.
I've had other comments questioning the need for both <ref> and <data>
as mechanisms for building up complex regular expressions. They do
serve distinct purposes, but I'm still pondering whether there's a way
of combining them...
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/