[lextypes] DLL path context

Wed Jul 23 17:13:35 BST 2003

Paths used in examples are relative to the root element of the parse result
of the type. E.g.,

<when test="timezone != 'Z'">

assumes there is a timezone child of the time parent.

<time>...<timezone>...</timezone>...</time>
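
As a rough illustration of how such a relative path resolves, here is a minimal Python sketch using the standard xml.etree.ElementTree module. Only the time and timezone names come from the example above; the hour element, the values, and the helper name are invented, and nothing here is prescribed by DLL.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse result for a time value; the content is made up.
time = ET.fromstring(
    "<time><hour>17</hour><timezone>+01:00</timezone></time>"
)

def timezone_is_not_utc(context):
    """Mimic test="timezone != 'Z'" with `context` as the root of the parse result."""
    # findtext only looks at direct children of the context element.
    value = context.findtext("timezone")
    # In XPath 1.0 a comparison against an empty node-set is false,
    # so a missing timezone child must not count as "not Z".
    return value is not None and value != "Z"

print(timezone_is_not_utc(time))
```

The point is that the path is evaluated against the time element itself, so "timezone" names a direct child and needs no leading step.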

A type that is a subtype of another has a different parse result. Can this
result be referred to? If so, how?

A similar question arises with respect to normalization. Any API that can
see the result due to parsing should also be able to see the result due to
normalization, but these are separate results! The application must be able
to distinguish between them (for example, the normalized result can't be
used for round-tripping).

The example tidily has the components (children) of the normalized result
mirror those of the parsed result, but this need not be the case. Likewise,
comparison refers implicitly to the normalized result, but that need not
hold either: there may be no normalized result at all, or there may be
children for which no normalization is necessary.

These comments suggest that the "timezone" referred to in the example is
actually three levels deep in a hierarchy. For the sake of an example, I'll
use the names "result", "parsed" and "normalized" for the parents and invent
a base type "znumber":

<result>
  <parsed>
    <znumber>...</znumber>
    <time>
      ...
      <timezone>...</timezone>
      ...
    </time>
  </parsed>
  <normalized>
    <znumber>...</znumber>
    <time>
      <timezone>...</timezone>
      <hour>...</hour>
      <minute>...</minute>
      <second>...</second>
    </time>
  </normalized>
</result>

Within normalize, the implicit context would be "/result/parsed" while
within compare it would be "/result/normalized". However, any other value
could be referred to with an absolute path, and it might be made possible to
explicitly specify the context.
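
One way to picture the implicit context is the Python sketch below. It is only an assumption about how a processor might resolve paths: the resolve helper, the IMPLICIT_CONTEXT table and the sample values are hypothetical, and I have assumed that a relative path is written from the context down, i.e. "time/timezone".

```python
import xml.etree.ElementTree as ET

# Hypothetical result tree following the shape of the example; values invented.
RESULT_XML = """
<result>
  <parsed>
    <time><hour>17</hour><timezone>+01:00</timezone></time>
  </parsed>
  <normalized>
    <time><hour>16</hour><timezone>Z</timezone></time>
  </normalized>
</result>
"""

# Assumed implicit context for each operation, as suggested in the text.
IMPLICIT_CONTEXT = {
    "normalize": "/result/parsed",
    "compare": "/result/normalized",
}

def resolve(root, path, operation):
    """Return the text of the first element matching `path`, or None."""
    if not path.startswith("/"):
        # Relative path: prefix it with the operation's implicit context.
        path = IMPLICIT_CONTEXT[operation] + "/" + path
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    # ElementTree elements only accept relative paths, so drop the root step.
    rest = "/".join(steps[1:])
    node = root.find(rest) if rest else root
    return None if node is None else node.text

root = ET.fromstring(RESULT_XML)
print(resolve(root, "time/timezone", "normalize"))
print(resolve(root, "time/timezone", "compare"))
print(resolve(root, "/result/parsed/time/timezone", "compare"))
```

The same relative path yields a different value depending on the operation, while the absolute path reaches the parsed value even from within compare.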

------------
Separate comment. Parsing, normalization, etc. are a lot of work to do as a
side effect. The validation provided by a base type is sometimes duplicated
by a subtype (but it can be hard for a program to know this is so). As I
commented previously, it should be possible for an application to take
advantage of DLL declarations without requiring schema validation, and it
should be possible to take advantage of a subset of the declaration, i.e.,
parsing without normalization. To keep the library from imposing unwanted
overheads on an implementation or duplicating work already performed, three
rules should be observed:

- Results should be computed lazily when requested.
- Validation should not be performed unless it is requested by the
application (or schema processor) or unavoidable in the course of producing
a result.
- When a result is computed, e.g., as a side effect of validation, it should
be cached and retained for a reasonable amount of time.
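
The three rules might be sketched in Python as follows. The TimeValue class, its simplified lexical pattern and the work_log attribute are all hypothetical, invented to show the behaviour, and caching for the lifetime of the object is a simplification of "a reasonable amount of time".

```python
import re
from functools import cached_property

# Simplified, assumed lexical form: hh:mm:ss followed by Z or +hh:mm / -hh:mm.
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})(Z|[+-]\d{2}:\d{2})")

class TimeValue:
    def __init__(self, text):
        self.text = text
        self.work_log = []  # records which results were actually computed

    @cached_property
    def parsed(self):
        # Computed on first access only, then cached on the instance.
        self.work_log.append("parse")
        match = TIME_RE.fullmatch(self.text)
        if match is None:
            raise ValueError(f"not a time: {self.text!r}")
        hour, minute, second, timezone = match.groups()
        return {"hour": int(hour), "minute": int(minute),
                "second": int(second), "timezone": timezone}

    @cached_property
    def normalized(self):
        # Builds on the parsed result; shifts the time to UTC.
        self.work_log.append("normalize")
        p = self.parsed
        offset = 0
        if p["timezone"] != "Z":
            sign = -1 if p["timezone"][0] == "-" else 1
            off_hours, off_minutes = p["timezone"][1:].split(":")
            offset = sign * (int(off_hours) * 60 + int(off_minutes))
        minutes = (p["hour"] * 60 + p["minute"] - offset) % (24 * 60)
        return {"hour": minutes // 60, "minute": minutes % 60,
                "second": p["second"], "timezone": "Z"}

    def is_valid(self):
        # Validation runs only when asked for and reuses any cached parse.
        try:
            self.parsed
        except ValueError:
            return False
        return True

value = TimeValue("17:13:35+01:00")
print(value.work_log)          # nothing has been computed yet
print(value.parsed["timezone"])
print(value.is_valid())        # reuses the cached parse result
print(value.work_log)
```

An application that only wants the parsed components never pays for normalization, and a later validation request reuses the parse that has already been done.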

The second rule may seem odd, but validation is not a primary goal in many
applications, while help with parsing strings is almost always welcome.

Bob Foster