[doap-interest] DOAP + OpenID + PURL = doapurl.org

Wed Dec 12 21:33:38 GMT 2007

Stuart Yeates wrote:
> 
> I'm not sure about the overall merits of having a DOAP-specific
> PURL server. I had imagined that in the long-term most DOAP would be
> published automatically by code repositories and project websites as
> a by-product of their core activities, in much the same way that
> blogs and planets publish RSS and OPML. I had imagined that DOAP
> would be found via micro-formats, standard buttons or see-also type
> headers, also like RSS and OPML.
> 
> Projects are found using the regular web and these
> mechanisms, via project dependency metadata, or via the project
> catalogues of code repositories.
> 
> Having a master list of DOAP files sounds great, but it sounds a
> little like you're recreating DNS in RDF...
> 

That's what I imagine and am hoping for too. But how do we convince
SourceForge and developers in general to use DOAP when there aren't
many tools to use it yet? Until people can find a single DOAP record
for the exact project they are searching for, what's going to
motivate people to adopt DOAP?

I really don't intend this PURL resolver to be anything but a starting
point for some simple tools so we can start finding DOAP easily today,
until everything is 'DOAP-enabled'.
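To make "PURL resolver" concrete: a persistent URL is a stable address
that answers with an HTTP redirect to wherever the document currently
lives. Here is a minimal Python sketch, assuming a hypothetical
in-memory table that maps a short project name to the maintainer's DOAP
URL. The /<name> path scheme and the example URL are illustrations, not
how doapurl.org is actually laid out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: short project name -> URL of the maintainer's DOAP file.
# The "/<name>" path scheme and these URLs are illustrative assumptions.
PURLS: Dict[str, str] = {
    "libfoo": "http://example.com/libfoo/doap.rdf",
}

def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Map a request path such as '/libfoo' to (HTTP status, Location header)."""
    name = path.strip("/")
    target = PURLS.get(name)
    if target is None:
        return 404, None
    # A PURL only redirects; the DOAP itself stays on the project's own server.
    return 302, target
```

Wiring resolve() into an http.server BaseHTTPRequestHandler would only
take a few lines (send_response with the status, then send_header for
Location). Keeping the lookup a plain table also makes it trivial to
publish the whole list for others to spider.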

I patiently await the day all the big 'forges' use DOAP, but even then
we will still need tools that create DOAP for developers who self-host,
tools like MOAP[1], and we'll need a heck of a lot more of them.

How would a command-line tool search for DOAP without some kind
of DNS for DOAP? Let's say you have a tool that knows how to search
Trac and Bugzilla, and you want to modify it to figure out the
URL of the bug tracker using DOAP.

Here's how I figure a Gentoo Linux user, for example,
would use this tool. Users know the package as
dev-ruby/libfoo, so they use a command-line client like so:

bugcli dev-ruby/libfoo <bug # etc.>

The client looks up the package in Gentoo's database of ebuilds and
gets its homepage, then queries a service such as doapstore.org for any
DOAP that has the same homepage or old-homepage. The service returns
the DOAP, and the client extracts the bug tracker URL and shows it to
the user.
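The matching step might look like the following Python sketch. It
assumes the candidate DOAP documents have already been fetched (how
doapstore.org would be queried isn't specified) and that they use the
common RDF/XML form with a doap:Project element; a file written with
rdf:Description plus rdf:type would be missed. The function name
bug_tracker_for is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

def _resources(project: ET.Element, prop: str) -> List[str]:
    """Collect the rdf:resource URLs of every doap:<prop> child of a project."""
    urls = []
    for el in project.findall(f"{{{DOAP}}}{prop}"):
        url = el.get(f"{{{RDF}}}resource")
        if url:
            urls.append(url)
    return urls

def bug_tracker_for(homepage: str, doap_docs: Iterable[str]) -> Optional[str]:
    """Return the doap:bug-database of the first project whose homepage or
    old-homepage matches `homepage`, ignoring a trailing slash.

    In the workflow above, `homepage` would come from the ebuild and
    `doap_docs` from a DOAP store (both assumptions of this sketch)."""
    wanted = homepage.rstrip("/")
    for doc in doap_docs:
        root = ET.fromstring(doc)
        for project in root.iter(f"{{{DOAP}}}Project"):
            pages = _resources(project, "homepage") + _resources(project, "old-homepage")
            if wanted in (page.rstrip("/") for page in pages):
                bugs = _resources(project, "bug-database")
                if bugs:
                    return bugs[0]
    return None
```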

The problem I found when implementing this is that a package maintainer
may be lazy and just put http://kde.org/ as the homepage URL, which
of course isn't specific enough, so package maintainers will have
to learn to be very careful when referring to homepages. Similarly, if
a package is hosted on SourceForge, a homepage like
http://sf.net/projects/libfoo won't help us find its DOAP when the
package actually has a 'real' homepage hosted on example.com.
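A client could at least warn about those two cases. This Python sketch
is a hypothetical heuristic, not an established rule: it flags homepages
that are a bare site root and SourceForge /projects/ pages, where the
list of forge hosts is an illustrative assumption.

```python
from urllib.parse import urlparse

# Illustrative assumption: hosts whose /projects/<name> pages are forge
# listings rather than a project's own homepage.
FORGE_HOSTS = {"sf.net", "sourceforge.net", "www.sourceforge.net"}

def homepage_problem(url: str) -> str:
    """Return a short reason why `url` is a poor DOAP lookup key, or '' if it looks usable."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    if not path:
        return "bare site root; may be too generic to identify one project"
    if host in FORGE_HOSTS and path.startswith("/projects/"):
        return "forge project page; the project may have its own homepage elsewhere"
    return ""
```

A project with its own domain, such as http://libfoo.sourceforge.net/,
is also a bare site root, so this is better used as a warning to the
package maintainer than as a hard rejection.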

Another problem with this method is the tool will need a plugin
for each packaging system out there to figure out the homepage
from a well-known package name.
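One way to keep those per-system plugins separate is a small registry
keyed by packaging system. The Python sketch below is hypothetical
throughout (PLUGINS, gentoo_resolver and homepage_for are made-up
names). The Gentoo plugin relies only on ebuilds setting a HOMEPAGE
variable, and its parsing is deliberately naive.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Optional

# A resolver maps a package name, as the user knows it, to a homepage URL.
HomepageResolver = Callable[[str], Optional[str]]

# Hypothetical plugin registry, keyed by packaging-system name.
PLUGINS: Dict[str, HomepageResolver] = {}

# Naive: handles only the simple double-quoted form and does not expand
# shell variables such as ${PN}.
_HOMEPAGE_RE = re.compile(r'^HOMEPAGE="([^"]*)"', re.MULTILINE)

def homepage_from_ebuild(text: str) -> Optional[str]:
    """Return the first URL in an ebuild's HOMEPAGE, or None if there is none."""
    match = _HOMEPAGE_RE.search(text)
    if match is None:
        return None
    urls = match.group(1).split()  # HOMEPAGE may list several space-separated URLs
    return urls[0] if urls else None

def gentoo_resolver(tree: Path) -> HomepageResolver:
    """Build a resolver that reads ebuilds from a local Portage tree."""
    def resolve(package: str) -> Optional[str]:
        directory = tree / package  # e.g. <tree>/dev-ruby/libfoo
        if not directory.is_dir():
            return None
        # Picks the lexically last ebuild, not necessarily the newest version.
        ebuilds = sorted(directory.glob("*.ebuild"))
        return homepage_from_ebuild(ebuilds[-1].read_text()) if ebuilds else None
    return resolve

def homepage_for(system: str, package: str) -> Optional[str]:
    """Dispatch to the plugin registered for `system`."""
    resolver = PLUGINS.get(system)
    if resolver is None:
        raise KeyError(f"no plugin registered for {system!r}")
    return resolver(package)

# /usr/portage was the traditional tree location; adjust for the local system.
PLUGINS["gentoo"] = gentoo_resolver(Path("/usr/portage"))
```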

One more problem: if you query doapstore.org by homepage, you may get
multiple DOAP records for the same project, from different sources.
PyPI[2], for example, creates a DOAP file for each release of a package.
I don't think PyPI is doing DOAP correctly; it should just publish a
single file. But for a big DOAP triple store that spiders 'all' DOAP,
the damage is already done, unless its operators monitor it and
manually decide which DOAP to present. We need some way to find the
'one true' maintainer's DOAP. This is

> Might something like a semantic wiki[1] not be easier to enable
> projects hosted in obscure places to be reached by semantic web
> crawlers?

I'll take another look at that, but I don't see how it would let
people easily write clients that find specific DOAP using
a library or tool.

> Do you have a plan to check that projects are real open source
> projects and not spammers poisoning your index with their hateful
> projects to drive links and clicks to their sites?
> 

I've added a minimal amount of checking to make sure each DOAP file has
at least the minimum metadata needed, including some kind of 'valid'
license. doapurl.org isn't going to be a browsable website. It's
simply a PURL resolver: anyone can download the list of PURLs and
create their own index website however they want. Since it's up to
the index creators to spider and validate the DOAP, I'm not sure
spammers would get anything out of it. The day RDF is popular enough
for spammers to start injecting RDF is the day I can happily shut
down doapurl.org, because by then I'm sure there'll be a much more
Semantic Web-oriented solution ;)
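The exact checks doapurl.org runs aren't listed here, so the following
Python sketch only assumes a plausible minimum: a doap:name, a
doap:homepage, and a doap:license whose rdf:resource is on an
allowlist. Both the field list and the license URIs are illustrative
assumptions, and validation_errors is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

# Illustrative allowlist; not doapurl.org's actual list.
KNOWN_LICENSES = {
    "http://usefulinc.com/doap/licenses/gpl",
    "http://usefulinc.com/doap/licenses/lgpl",
    "http://usefulinc.com/doap/licenses/bsd",
}

def validation_errors(doc: str) -> List[str]:
    """Return the problems found in a DOAP RDF/XML document; empty means it passes."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    project = root.find(f"{{{DOAP}}}Project")
    if project is None:
        return ["no doap:Project element"]
    errors: List[str] = []
    if not project.findtext(f"{{{DOAP}}}name", default="").strip():
        errors.append("missing doap:name")
    if project.find(f"{{{DOAP}}}homepage") is None:
        errors.append("missing doap:homepage")
    licenses = [el.get(f"{{{RDF}}}resource") for el in project.findall(f"{{{DOAP}}}license")]
    if not any(lic in KNOWN_LICENSES for lic in licenses):
        errors.append("no recognized doap:license")
    return errors
```

Returning a list of reasons rather than a bare yes/no makes it easy to
tell a submitter exactly what is missing.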

Rob

[1] https://thomas.apestaart.org/moap/trac
[2] http://pypi.python.org/