[redland-dev] State of Redland 2007-02

Dave Beckett dave at dajobe.org
Sun Feb 18 17:28:58 UTC 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

State of Redland 2007-02

   Redland was born 2000-08. Happy 6.5th birthday!

   This is a review of approximately the last 15 months, since I moved
   to the USA in Oct 2005 to work for Yahoo! Media Group in Sunnyvale,
   California. It covers:
     * Review of Redland users, current state, releases
     * Redland challenges, tasks including work already underway and my
       future ideas
     * Call for participation: how I want to change the project

   (This is on the web at http://librdf.org/2007/02/18-state/)

1. Redland Users

   Redland is made available by several Linux, Unix and other open source
   projects such as:
     * Debian (sarge, etch)
     * Fedora (FC4 onwards) : just Raptor
     * FreeBSD Ports
     * Gentoo
     * Mandriva (9.1 onwards) : Raptor and Redland
     * SUSE (9.2 onwards)
     * Ubuntu (breezy, hoary, dapper, edgy)

   and the libraries are also used inside other applications and services
   such as:
     * ActiveRDF ruby RDF
     * Amaya web browser and HTML Editor
     * Ardour digital audio workstation
     * Hydrogen simple drum machine/step sequencer
     * Morla RDF graphical editor
     * My Opera
     * Nepomuk KDE semantic desktop app
     * The Venice Project client side (I think!)
     * Venus feed aggregator
     * Yahoo! Food, TV, Personal Finance ... web sites
     * ... but I am not keeping track of these very well in the
       applications list ...

2. State of the libraries

   My summary of the high-level state of the libraries is:

   Raptor syntax parsing and serializing: libraptor
          Very mature. The API is changing rarely, mostly bug fixes or
          adding new features to existing parsers/serializers or adding
          entirely new ones.

   Rasqal query parsing, executing: librasqal
          Under development. The API is changing with each release, as
          it is not yet complete and the SPARQL query engine
          implementation is not fully functional.

   Redland RDF API and triple stores: librdf
          Mature. Some API change is happening to add new features
          especially for query and storage.

   Binding languages
          A mixture of mature bindings such as Perl and Python, which are
          well tested, working and complete, and immature ones with little
          testing or known to be incomplete, such as Tcl and Java.
          I feel this is too much for one person to maintain, since it
          needs skills in all N languages, unless that person is me and I
          do nothing else!

3. Releases

   For each of the libraries, the period above has seen the following
   releases with major changes:

   Raptor 1.4.8 - 1.4.14 (7 releases)

          + A new user tutorial covering the entire API was written.
          + A new RSS tag soup parser was added
          + New Atom 1.0, RSS 1.0, Turtle and DOT serializers were added.

   Rasqal 0.9.11 - 0.9.13 (3 releases)

          + Updated the SPARQL syntax support to match the November 2005
            and April 2006 W3C Working Drafts.
          + Can now serialize query results to JSON.
          + Added APIs to manage query results serializing
          + The query engine had its ordering, distinct and limit
            support fixed.
          + Lots of internal query engine changes, in particular to split
            the query parsing ('prepare') and the query execution
            ('execute'). These were too intertwined in earlier versions.
            So now you can nearly execute the same query multiple times.
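
            To illustrate the prepare/execute split, here is a minimal
            sketch in Python. It is not Rasqal's API: the PreparedQuery
            class, its one-pattern "query language" and the sample data
            are all hypothetical. The point is only that parsing happens
            once and execution can then be repeated.

```python
class PreparedQuery:
    """Hypothetical stand-in for a parsed query; not Rasqal's API."""

    def __init__(self, pattern):
        # 'prepare': parse the query text once into a triple pattern.
        # Terms starting with '?' are variables, anything else is a constant.
        self.pattern = tuple(pattern.split())
        if len(self.pattern) != 3:
            raise ValueError("expected exactly three terms")

    def execute(self, triples):
        # 'execute': match the stored pattern against a set of triples.
        # No parsing happens here, so this can be called repeatedly.
        results = []
        for triple in triples:
            binding = {}
            for term, value in zip(self.pattern, triple):
                if term.startswith("?"):
                    # A repeated variable must bind to the same value.
                    if binding.setdefault(term, value) != value:
                        break
                elif term != value:
                    break
            else:
                results.append(binding)
        return results


query = PreparedQuery("?s knows ?o")   # parsed once
graph_a = [("alice", "knows", "bob"), ("bob", "likes", "tea")]
graph_b = [("carol", "knows", "dave")]
print(query.execute(graph_a))          # run against one graph
print(query.execute(graph_b))          # and again, without re-parsing
```

            Keeping all per-run state local to execute() is what makes
            the repeated call safe in this sketch.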

   Redland 1.0.3-1.0.5 (3 releases)

          + A new PostgreSQL storage was added
          + Many fixes for SQLite storage

   Language Bindings 1.0.3.1-1.0.5.1 (3 releases)

          + Many fixes were made across all the bindings especially to
            handle query results.
          + The Python and Ruby bindings got many fixes

   and all of them have benefited from better API documents using gtk-doc
   to replace the older kernel-doc, giving better DocBook and better HTML
   output. The entire project also switched over from CVS to Subversion
   early in 2006.

4. Challenge

   The main challenge I see is to make the project more scalable - moving
   from the current state where I do all the packaging and am the main
   developer. To help this, my goals for 2007 are to:
     * Try to make the development more of a shared task
     * Make it easier to work on just part of Redland
     * Turn the main website into a shared read/write developer resource
     * Schedule #redland IRC developer meetings if that will help give
       the project more of a regular heartbeat

5. General Tasks

   More of a wishlist than an ordered list:
     * Think about a License change to Apache2 only.
     * Make Redland turn SPARQL into underlying SQL queries when
       possible.
     * Create the redland developer's site in something like Drupal.
     * Start the redland (librdf) API tutorial.
     * Create some documentation to explain the libraries' structure and
       relationships.
     * Consider not shipping raptor and rasqal inside the redland tarball
     * Create documentation on the data flow inside the libraries
     * Figure out whether to keep writing manual pages as well as gtk-doc.
       (DRY)
     * Figure out where module/implementation documentation goes, such as
       storage options in redland, parser features in raptor etc. This is
       needed in C and in the bindings as it is not about the actual
       functions called. (DRY)
     * The demos need to be updated and the changes made put back into
       subversion.
     * A SPARQL protocol endpoint demo would be good to have

   DRY = Don't Repeat Yourself
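
   As an illustration of the SPARQL-to-SQL item above, the following is
   a minimal sketch, assuming a single hypothetical table
   triples(s, p, o). It does not reflect any actual Redland storage
   schema, and bgp_to_sql is an invented name. It handles only a list of
   plain triple patterns (no OPTIONAL, UNION or filters) and turns them
   into one SQL self-join, so the database does the matching.

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate triple patterns into one SQL self-join.

    Assumes a hypothetical table triples(s, p, o). Terms starting with
    '?' are variables; their names are assumed to be plain identifiers.
    """
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")             # constant: bound parameter
                params.append(term)
            elif term in seen:
                where.append(f"{ref} = {seen[term]}")  # shared variable: join
            else:
                seen[term] = ref                       # first use: select it
                select.append(f"{ref} AS {term[1:]}")
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select or ['1'])} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("alice", "name", "Alice"),
])

# Roughly: SELECT ?x ?y ?n WHERE { ?x knows ?y . ?y name ?n }
sql, params = bgp_to_sql([("?x", "knows", "?y"), ("?y", "name", "?n")])
rows = db.execute(sql, params).fetchall()
print(sql)
print(rows)
```

   Each triple pattern becomes one alias of the table, constants become
   bound parameters, and a variable used in two patterns becomes a join
   condition.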

5.1 Pending stuff

   There are several tasks already in progress either sitting in a patch,
   in Subversion or underway separately.
     * A new schema for the SQLite store: me (patch)
     * Redland transaction support: me (in Subversion)
     * Object-based PHP5 bindings: Yahoo! (pending)
     * SPARQL syntax extensions called LAQRS: me (in Subversion)
     * Apache2 mod_sparql: David Reid (separate project)
     * A new native Ruby binding not using SWIG: somebody on IRC
     * Complete the Raptor GRDDL support: me (in Subversion)

5.2 Raptor tasks

     * Complete the GRDDL support: nearly done
     * Bug fixes only for 2007

5.3 Rasqal tasks

     * Make Rasqal be able to execute complete SPARQL
     * Make SPARQL OPTIONALs work
     * Make SPARQL GROUP work
     * Make SPARQL UNION work
     * Make datatypes work, especially xsd:date and xsd:decimal (bignum
       library)
     * Read result sets from the SPARQL Query Results XML format
     * Write a query optimiser
     * Add a way to declare extension functions
     * Look into language extensions
     * Address query engine denial of service:
          + limit query wall clock time
          + limit triple pattern matches
          + callback to allow application to abort queries?
          + limit memory use?
          + limit sorting of results?
          + limit URI fetching (done now with the raptor changes)
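
   The first three limits in that list could look something like the
   sketch below. This is only an illustration in Python, not a Rasqal
   design: run_with_limits, QueryAborted and all the parameter names are
   hypothetical, and the "query" is just an iterator of matches.

```python
import time


class QueryAborted(Exception):
    """Raised when a configured limit stops query execution."""


def run_with_limits(matches, max_seconds=None, max_matches=None,
                    should_abort=None):
    """Collect results from an iterator of matches, enforcing limits.

    All names and parameters here are hypothetical.
    """
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    results = []
    for match in matches:
        # Limits are checked between matches only.
        if deadline is not None and time.monotonic() > deadline:
            raise QueryAborted("wall clock limit exceeded")
        if max_matches is not None and len(results) >= max_matches:
            raise QueryAborted("match limit exceeded")
        if should_abort is not None and should_abort():
            raise QueryAborted("aborted by application callback")
        results.append(match)
    return results


def endless_matches():
    n = 0
    while True:          # stands in for a query that never finishes
        yield n
        n += 1


try:
    run_with_limits(endless_matches(), max_matches=1000)
    outcome = "completed"
except QueryAborted as err:
    outcome = str(err)
print(outcome)
```

   Because the checks run only between matches, a single slow step can
   still overrun the wall clock limit in this sketch, and it does
   nothing about memory use or sorting.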

5.4 Bindings tasks

     * Split the single language bindings package to be one per-binding.
       That would be: Perl, PHP5, Python and Ruby
     * Make the Perl binding into a CPAN installable tarball - partially
       done but not entirely working
     * Deprecate or remove bindings that have no active maintainer. These
       would be C#, Java and Tcl.

6. Future Ideas

6.1 New Version control system

   This is more speculative and I am giving no firm commitment that this
   will happen soon. Subversion is stable and well supported.

   Move from Subversion to a more distributed development-friendly
   version control system.

   My requirements for a new VCS:
     * Distributed - no central repository required
     * Can operate networkless
     * Friendly to managing patches
     * Quick
     * Reliable and successful (not a research project or bleeding edge)
     * Mature

   GIT seems one possibility - I tried this conversion already and it
   worked well. I couldn't get a Mercurial conversion to work without
   losing information. SVK I'm not so sure about, as I don't like VCSs
   that are layered on others, e.g. CVS still leaks its original RCS
   basis. I didn't try DARCS. Arch / Bazaar / Bazaar-NG are too bleeding
   edge. This is a medium term goal.

6.2 Raptor Version 2

   This is a break-the-binary-API choice, not a rebuild. The main reason
   to do this would be to add a 'world' style argument to constructors,
   like redland has and similar to the curl handle, APR pool or BDB
   environment. This would mean that raptor_init() and raptor_finish()
   would be replaced by something like rw = raptor_new_world() and
   raptor_free_world(rw).
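
   The 'world' idea can be sketched in Python as follows. This is an
   illustration of the pattern only: the World class and new_parser
   function are hypothetical and are not the proposed Raptor API. State
   that would otherwise be global lives in an explicit object that every
   constructor receives.

```python
class World:
    """Hypothetical per-library context, standing in for a raptor world."""

    def __init__(self):
        self.parsers = {}      # state that would otherwise be global
        self.open = True

    def register_parser(self, name, parse_function):
        self.parsers[name] = parse_function

    def free(self):
        # Replaces a global finish call: only this world's state goes away.
        self.parsers.clear()
        self.open = False


def new_parser(world, name):
    # Constructors take the world explicitly instead of using globals.
    if not world.open:
        raise RuntimeError("world has been freed")
    return world.parsers[name]


world_a = World()
world_b = World()
world_a.register_parser("words", str.split)

parse = new_parser(world_a, "words")
print(parse("a b c"))
print("words" in world_b.parsers)   # the second world is unaffected
world_a.free()
world_b.free()
```

   Two worlds can coexist in one process without sharing state, which a
   single global init/finish pair cannot offer.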

   One other reason to do this would be to add a pull-style triple
   parser, where the model is:
     parser = new RDFXMLparser()
     parser.start_parse( { URI => uri } )
     while (not parser.done())
       triple = parser.raptor_get_next_triple()
       ...
     delete parser

   ... rather than the current one of receiving triples via a callback.

   However, this would need either a pull-based WWW library (I know of
   only libwww and I don't want to use that), or batching up the triples
   in memory by wrapping the push-based parser, or multiple threads,
   which have their own set of problems. This would also need an update
   to the raptor_iostream class to add read methods, but that's easier
   than the first problem. So this is likely not V2 stuff.
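
   To make the wrapping approach concrete, here is a minimal sketch in
   Python that combines a bounded in-memory buffer with a worker thread.
   Everything in it is hypothetical: push_parse stands in for a
   callback-based parser and pull_triples for the pull-style interface;
   neither is a Raptor function.

```python
import queue
import threading


def push_parse(text, on_triple):
    """Hypothetical push-style parser: calls on_triple for each line."""
    for line in text.splitlines():
        if line.strip():
            on_triple(tuple(line.split()))


_DONE = object()   # sentinel marking the end of the stream


def pull_triples(text, max_buffered=16):
    """Generator giving pull-style access to the push parser above."""
    buffer = queue.Queue(maxsize=max_buffered)

    def worker():
        try:
            push_parse(text, buffer.put)   # blocks when the buffer is full
        finally:
            buffer.put(_DONE)              # always signal the end

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


data = "alice knows bob\nbob knows carol\n"
for triple in pull_triples(data):
    print(triple)
```

   The sketch also shows some of the problems: if the caller stops
   reading early the worker stays blocked on a full buffer, and an error
   inside the parser thread is not passed on to the caller.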

   For V2 there would also be a bunch of other API cleanups:
     * Rename all raptor_foo functions to be raptor_parser_foo where they
       really are about parsers
     * Ditch the URI context/data and use raptor_world to hold that
     * Alter raptor_statement to have 4 components including a context /
       graph / formula so that Raptor could parse N3. Possibly rename it
       to raptor_triple.

   So in summary: this is not being done soon.

7. Call for participation

   This is your opportunity to help more directly with Redland, in
   particular with language bindings, as there is a trickle of patches
   and fixes to these that takes me some time to look at and release.

   These are the areas I've seen that can benefit from an active person:
     * OSX porter / (ObjC binding maintainer)
     * Win32 porter
     * Perl binding maintainer
     * Python binding maintainer
     * (New Ruby binding?)
     * (New PHP5 binding: Yahoo! pays me to look after this)

   I would also deprecate / remove the bindings for C#, Java and Tcl.
   They would stay in Subversion, but no longer be shipped.

   Saying "yes" to one of the roles above would mean gaining the role in
   the bug tracker for the area and gaining commit access to the Redland
   Subversion for the area, which might mean adding a new area if needed.
   If the single bindings package is split into individual language
   packages, that might also mean a Subversion change to match.

Thanks for reading.

Dave Beckett
California, USA, 2007-02-18
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF2IzXQ+ySUE9xlVoRAoboAJ4texM+WY5d1qZG7RtUciL8uTpKuACdEpZn
Xtd8367vA6Ahc0IXxOtLss8=
=sW7e
-----END PGP SIGNATURE-----

