[rdfweb-dev] XSLTs for FOAF, Spring v1.3.1 and plans for FOAFspec improvements

Thu Jul 24 18:16:27 UTC 2003

* Julian Bond <julian_bond at voidstar.com> [2003-07-24 15:11+0100]
> James Carlyle <james.carlyle at takepart.com> wrote:
> >I think that for publishers should not be constrained by the serialisation
> >format of their RDF in XML - if they are, then someone could start to argue
> >that FOAF might just as well be a plain XML language.
> 
> FOAF might just as well be a plain XML language. Oh, you want
> justification too? Sorry the margin's not big enough. ;-)

<grin/>

As Jim said, FOAF on its own could be just a plain XML tagset. As could
Dublin Core on its own, or Wordnet, or RSS, or MusicBrainz. Each on
their own. A big motivation for doing FOAF within the RDF approach was to 
allow for all  these things to be mixed together. There is so much to be said 
about people and the things they create and do... More than any single markup 
language can hope to address. So FOAF didn't try to win that game.
Instead, we adopted a generic file format that allows _any_ RDF
vocabularies to be combined. 

If you really care about music, you can mix your FOAF with MusicBrainz.
If you really care about bibliography (favourite books, or stuff
you write), mix it with Dublin Core.
If you want to describe your photo collection, mix in the 50k noun-terms 
from the RDF-wordnet vocabulary I made.
If you want to describe where any of this stuff is, you can plug in the 
basic Geo/mapping vocab we made at http://www.w3.org/2003/01/geo/
If you want to describe events, Libby and co have been working on
iCal-in-RDF... And so on. Each new RDF vocab is instantly deployable
within FOAF without us having a committee meeting to approve it.
Thankfully.

The big strength is mixing. FOAF without the ability to mix in other
data wouldn't be FOAF any more, since we'd have sacrificed our core 
pluralism.

Justification for doing this as RDF rather than XML, in one line?

  /there's no algorithm for merging 2 random XML docs (but there is for RDF/XML)/

That's the crux of it really. A big chunk of technology that supports
FOAF, including the harvesters, databases / aggregators etc. is utterly
generic, and has nothing to do with the FOAF vocab at all. That's also
why the tech site for FOAF is called RDFWeb... This was carefully built
on generic lines so we can re-apply to related problem domains without
having to re-engineer the aggregator side of things. Being a FOAF
scutter (indexing tool eg. see Matt Biddulph's writeup at 
http://www.hackdiary.com/archives/000021.html) is something that has
nothing to do with FOAF vocabulary specifics. And it's all based on 
algorithms for merging completely arbitrary collections of RDF data, 
in a way that simply can't be done for XML.

When someone comes up with such an algorithm for XML documents, we can
re-open the issue.
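
As a rough illustration of the merging point above (a minimal sketch, not
how any RDFWeb harvester or aggregator is actually implemented): if each
document is reduced to a set of (subject, predicate, object) triples,
merging is a set union, with blank nodes renamed so that labels from
different documents do not collide. The example URIs, the Dublin Core
usage and the helper names are assumptions made up for the sketch, and
literals are kept as plain strings for brevity.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"


def rename_blank_nodes(graph, tag):
    """Prefix blank node labels ("_:x") so two documents cannot clash."""
    def fix(term):
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {(fix(s), p, fix(o)) for s, p, o in graph}


def merge(*graphs):
    """Merge any number of triple sets; no vocabulary knowledge needed."""
    merged = set()
    for i, graph in enumerate(graphs):
        merged |= rename_blank_nodes(graph, f"g{i}")
    return merged


# Hypothetical document A: a FOAF description of a person.
doc_a = {
    ("_:p1", FOAF + "name", "Alice"),
    ("_:p1", FOAF + "homepage", "http://example.org/alice/"),
}
# Hypothetical document B: Dublin Core data about a book.
doc_b = {
    ("http://example.org/book", DC + "title", "A Book"),
    ("http://example.org/book", DC + "creator", "_:p1"),
}

merged = merge(doc_a, doc_b)
for triple in sorted(merged):
    print(triple)
```

Nothing in merge() mentions FOAF or Dublin Core, which is the sense in
which the aggregator side is generic. Note that the two "_:p1" nodes stay
distinct after the merge; deciding that they are the same person is a
separate step.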

In the meantime, it would be healthy to spend the time and effort on 
ways of making FOAF (and RDF generally) easier to deal with for people 
coming at it from an XML developer's perspective. Maybe this means
working to bring RDF APIs to the maturity levels people find in SAX, DOM
etc., or working to help explain the conceptual differences between the
RDF and the XML approaches. 

Either way, that's a big part of what my day job is about, so I'm hoping
to put some real time towards such efforts over the problem. Not for
FOAF specifically, although FOAF does crystallise many of the issues
quite usefully... 

> James Carlyle <james.carlyle at takepart.com> wrote:
> >Or
> >canonicalisation in the context of FOAF might mean shaping FOAF specific
> >statements (i.e. knows, Person etc, with a defined and consistent nesting
> >structure) and removing other non-FOAF statements.
> 
> I think RSS 2.0 can teach us something here. It should be possible to
> use namespaces to allow extendability and to allow intermixed RDF while
> still maintaining a consistent structure of FOAF elements that are
> parsable by plain XML parsers.
> 
> To take one example I've seen recently.
> <foaf:Person rdf:ID="1">
>   <foo:acquaitanceOf>
>     <foaf:Person rdf:ID="2">

This is perfectly fine RDF, but it doesn't tell you explicitly 
that 'acquaintanceOf' as a relationship between two people 
implies a 'knows' relationship as well. We could put that in the 
schema that defined foo:acquaintanceOf though.

It is just one triple: foo:acquaintanceOf rdfs:subPropertyOf foaf:knows
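
A small sketch of what that one triple buys a consumer, assuming triples
are held as plain tuples and that foo: maps to a made-up example.org
namespace (the original message does not say what foo: is bound to). It
applies the RDFS subPropertyOf rule until nothing new can be derived:

```python
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
# Hypothetical namespace for the foo: prefix used in the quoted example.
FOO_ACQUAINTANCE = "http://example.org/foo#acquaintanceOf"


def apply_subproperty(graph):
    """If (p subPropertyOf q) and (s p o) are present, add (s q o)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        sub_props = [(s, o) for s, p, o in inferred if p == RDFS_SUBPROP]
        for s, p, o in list(inferred):
            for child, parent in sub_props:
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred


graph = {
    # The schema statement: one triple.
    (FOO_ACQUAINTANCE, RDFS_SUBPROP, FOAF_KNOWS),
    # The instance data from the quoted markup, with blank node labels.
    ("_:p1", FOO_ACQUAINTANCE, "_:p2"),
}

result = apply_subproperty(graph)
print(("_:p1", FOAF_KNOWS, "_:p2") in result)
```

A consumer that only understands foaf:knows can still find the 1->2
relationship this way, without the publisher having to nest the markup
in any particular shape.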

> This could just as easily have been coded
> <foaf:Person rdf:ID="1">
>   <foaf:knows>
>     <foaf:Person rdf:ID="2">
>     <foo:relationship>Acquaintance</foo:relationship>
> or
> <foaf:Person rdf:ID="1">
>   <foaf:knows foo:relationship="Acquaintance">
>     <foaf:Person rdf:ID="2">

The RDF statements that this corresponds to are (I strongly suspect) not 
what you're wanting to say. From RDF's point of view, your additions 
are showing up as properties of the 2nd person, not as indications of
the kind of relationship between the two people.

> The second two extend FOAF but are still easily parsable by a FOAF
> specific XML parser. 

OK, taking that as a goal (although the merits of writing a FOAF
specific parser versus just using an RDF parser might be debated
further - see below)....

Here is some markup (note I'm using nodeID, which is a more modern RDF
form, ignore that detail for now) which I think does what you want:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:eg="http://example.org/foovocab#"
	xmlns:foaf="http://xmlns.com/foaf/0.1/">

<foaf:Person rdf:nodeID="p1">
    <foaf:name>Alice</foaf:name>
    <foaf:knows>
      <foaf:Person rdf:nodeID="p2">
        <foaf:name>Bob</foaf:name>
        <eg:secretelyStalking rdf:nodeID="p1"/>
      </foaf:Person>
    </foaf:knows>
    <eg:acquaintance rdf:nodeID="p2"/>
    <eg:archNemesis rdf:nodeID="p2"/>
</foaf:Person>

</rdf:RDF>

If you paste that into the RDF validator at
http://www.w3.org/RDF/Validator and ask for 'triples and graph', you
should see a graph with nodes standing for the two people, and a variety
of named relationships connecting them.

I think this might answer some of Marc Canter's technical questions too.
Basically, you want to express more relationships all at once than can
be easily expressed with XML hierarchical containment, so we have to do
it explicitly in RDF syntax instead.

> 		It may miss the new information but it will still
> pick up the 1->2 relationship. The first is going to hide person 2
> behind an unknown tag.

XML and extensibility have a troubled relationship, yes. 

> The only downside I can see to having an agreed hierarchy is that if a
> particular person wants to use foaf elements in an RDF file in a way
> that doesn't follow the hierarchy then they can. But they do it in the
> knowledge that foaf specific parsers might barf on the file and
> misunderstand what they were conveying.

Anything that calls itself a FOAF parser should not be prone to
misunderstandings of this kind. Just as regex-based hackparsers caused 
problems for RSS, we need to be careful to use real tools for the job.
Since this is personal data, often being published in the public
internet, people parsing it with quick-hack tools need to be
appropriately cautious. If in doubt, it is better for a tool to say that
it can't understand some/all of the data than to guess and
mis-characterise the data. So I am concerned to read 'they do it in the
knowledge that foaf specific parsers might ... misunderstand'. People
writing perfectly good RDF shouldn't be held accountable for the errors
of software that isn't capable of reading those files. 

It is perfectly proper for me to write the following in a FOAF RDF file,
expecting it to be merged with other data sources. If XML-based tools
don't understand what I've written, they should generate nothing or
throw an exception or something. Anyway, sample markup:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	 xmlns:wordnet="http://xmlns.com/wordnet/1.6/"
	 xmlns:eg="http://example.org/foovocab#"
	 xmlns:foaf="http://xmlns.com/foaf/0.1/">

<foaf:Person>
  <foaf:homepage rdf:resource="http://rdfweb.org/people/danbri/"/>
  <eg:collaborator>
    <wordnet:Programmer>
       <foaf:weblog rdf:resource="http://usefulinc.com/edd/blog"/>
    </wordnet:Programmer>
  </eg:collaborator>
</foaf:Person>

</rdf:RDF>

This says, in FOAF-ese, that "there is a Person that has a foaf:homepage URI 
http://rdfweb.org/people/danbri/ and stands in an eg:collaborator 
relationship to a thing that is a wordnet:Programmer and that has a 
foaf:weblog URI of http://usefulinc.com/edd/blog".

Now because we're using RDF, and not just SAX/DOM/XSLT, we get to make
use of the RDF vocabularies that define the terms we've used. And when
we take those into account, in particular the FOAF definitions,
something interesting happens. The translation into English comes out
subtly different. We get:

"The person with a foaf:homepage URI of http://rdfweb.org/people/danbri/
stands in an eg:collaborator relationship to the wordnet:Programmer that has a
foaf:weblog URI of http://usefulinc.com/edd/blog".

The machine-readable schema for FOAF licenses our use of 'the' rather
than 'a', because it tells us that foaf:homepage and foaf:weblog are
uniquely identifying properties. This means we can write the above
markup fragment, happy in the knowledge that RDF tools will be able to
fold it together with other information which mentions me and Edd, 
*regardless of how we are identified*.  This is a pretty typical use of
FOAF, sharing a small fragment of knowledge about two people and their
work and partially describing them through doing so. 

Fixing up our FOAF-specific XML parsers to assume a certain set of XML
structure won't help with these larger problems of wide-area data
integration. Without the RDF layer, it isn't easy to tell that 
the thing with homepage 'http://rdfweb.org/people/danbri/' is the same
thing as the thing whose mailbox is 'mailto:danbri at w3.org'. 

We can patch around with the XML syntax, but unless XML-based tools are 
able to pick up these deeper commonalities, their ability to deal with 
FOAF from the wider Web will remain needlessly fragile and limited.
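
To make the "fold it together" step a little more concrete, here is a
minimal sketch, assuming triples are plain tuples and that the set of
uniquely identifying properties is hard-coded rather than read from the
FOAF schema as a real RDF tool would do. The mailbox address and the
eg: namespace expansion are placeholders for the example, and the
function name smush is just a label chosen here.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EG = "http://example.org/foovocab#"
# Properties treated as uniquely identifying in this sketch.
IDENTIFYING = {FOAF + "homepage", FOAF + "weblog", FOAF + "mbox"}


def smush(graph):
    """Collapse nodes that share a value for an identifying property."""
    canonical = {}  # node -> node it was folded into

    def find(node):
        while canonical.get(node, node) != node:
            node = canonical[node]
        return node

    first_seen = {}  # (property, value) -> first node carrying it
    for s, p, o in sorted(graph):
        if p in IDENTIFYING:
            key = (p, o)
            if key in first_seen:
                a, b = find(first_seen[key]), find(s)
                if a != b:
                    canonical[b] = a
            else:
                first_seen[key] = s
    return {(find(s), p, find(o)) for s, p, o in graph}


# Already-merged data from two hypothetical documents.
graph = {
    ("_:a", FOAF + "homepage", "http://rdfweb.org/people/danbri/"),
    ("_:a", EG + "collaborator", "_:b"),
    ("_:b", FOAF + "weblog", "http://usefulinc.com/edd/blog"),
    ("_:x", FOAF + "homepage", "http://rdfweb.org/people/danbri/"),
    ("_:x", FOAF + "mbox", "mailto:someone@example.org"),  # placeholder
}

smushed = smush(graph)
for triple in sorted(smushed):
    print(triple)
```

After smushing, the node that has the homepage is also the node that has
the mailbox and the collaborator, even though no single document said so
and neither document gave the person a URI of their own.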

> Which is all a log winded way of saying, if FOAF can be evolved so that
> the majority of FOAF data out there can be parsed by plain-XML non-RDF
> parsers as well as RDF parsers then it will be more of a success. It'll

I think there are a few tricks we can pull to help get some basics from 
most FOAF docs out using FOAF-specific XML parsers, but that if we have
to choose between...

a) making XML-oriented developers' lives easier by hobbling FOAF's extensibility
b) making XML-oriented developers' lives easier by helping them learn
   RDF tools and techniques.

...then we should go for (b) every time.  Not that it'll be easy, but it
needs doing. 

> be easier to get widespread implementation. The end result will still be
> RDF. This seems so obvious to me that I think it's up to the purists to
> argue and explain the downside. 

I don't think there is any easy way to have it both ways without a
little technical friction. Consider a tool that uses RDF software (eg.
Mozilla's RDF engine, Jena in Java, Redland, ...) to load up a FOAF
file, amend the graph slightly (perhaps adding a friend) and then save
the FOAF file back out as a .rdf document. Most RDF toolkits offer this 
facility (serialization) 'out of the box'. It saves developers from
having to know the details of the RDF syntax. They just load, edit, save
via APIs. 

Now imagine we make up a bunch of rules for which tags must go inside
which other tags, so that some-but-not-all serializations of the RDF
graph are deemed 'illegal' syntax for FOAF. Life gets a little easier
for people XML-parsing out bits of easily recognised FOAF from RDF files. 
But it suddenly starts to suck if you were using generic RDF tools.

We've seen this before with syntactic profiles of RDF (eg. RSS1, Adobe
XMP). They force you to do things 'their way' where previously you could 
just call a generic API and get on with your job. 

I don't think this makes FOAF implementors who use RDF 'purists'; we're
all of us pragmatists here, trying to get a bunch of jobs done using the 
most appropriate tools to hand. For a generic, mixed-namespace document 
format (FOAF) we use tools designed for such things (RDF parsers,
serializers). It's worth trying to make things easy for people without 
an RDF parser handy, but also I think we need to acknowledge that 
that should be a shrinking use case as RDF tools mature... And work to 
close that gap, so that availability of mature, well documented RDF
tools is not an issue, for developers in any language.

>				If the downside is too big I'll have to
> think again.

Let's try some examples, see how things look in practice. 
Perhaps take an Ecademy FOAF file and do a 'load/save' through Redland and Jena, 
and investigate the markup that comes out the other end...

How does that sound as a practical thing to explore?

Dan