[redland-dev] [Raptor RDF Parsing and Serializing Library 0000290]: Parsing turtle files with lots of namespaces is very slow
Mantis Bug Tracker
mantis-bug-sender at librdf.org
Mon Nov 24 19:41:57 CET 2008
The following issue has been SUBMITTED.
======================================================================
http://bugs.librdf.org/mantis/view.php?id=290
======================================================================
Reported By: anonymous
Assigned To:
======================================================================
Project: Raptor RDF Parsing and Serializing Library
Issue ID: 290
Category: api
Reproducibility: always
Severity: minor
Priority: normal
Status: new
Parsing/Serializing Syntax:
======================================================================
Date Submitted: 2008-11-24 18:41
Last Modified: 2008-11-24 18:41
======================================================================
Summary: Parsing turtle files with lots of namespaces is very slow
Description:
Turtle documents with lots of @prefix headers are very slow to parse. For
example, the first 9M triples of the 25M-triple BSBM dataset take 7m58s to
parse on a 2GHz, 16GB Linux machine.
This is largely down to the way namespaces are represented. A quick hack to
use a simple hashtable instead of a list cuts the parse time down to
1m47s.
A patch that implements the quick hack is attached. It passes as many tests
as before (as far as I can see), but may leak memory, and is a little more
memory hungry on small files.
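
To illustrate the idea (this is not the attached patch, and none of these
names exist in Raptor), here is a minimal sketch of a prefix-to-URI lookup
backed by a chained hash table. With a plain list, resolving each prefixed
name means scanning every declared prefix; with a hash table, a lookup only
walks one short bucket chain. The ns_table type, the ns_* functions, the
fixed bucket count and the djb2 hash are all assumptions made for the
example.

```c
/* Hypothetical sketch only: NOT Raptor's API and NOT raptor-ns-hash.patch. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NS_BUCKETS 1024  /* assumed fixed size; a real table might resize */

typedef struct ns_entry {
  char *prefix;            /* e.g. "foaf" */
  char *uri;               /* e.g. "http://xmlns.com/foaf/0.1/" */
  struct ns_entry *next;   /* chain of entries sharing a bucket */
} ns_entry;

typedef struct {
  ns_entry *buckets[NS_BUCKETS];  /* must be zeroed before first use */
} ns_table;

/* djb2 string hash, reduced to a bucket index. */
static unsigned long ns_hash(const char *s)
{
  unsigned long h = 5381;
  while (*s)
    h = h * 33 + (unsigned char)*s++;
  return h % NS_BUCKETS;
}

/* Portable replacement for strdup(); returns NULL on allocation failure. */
static char *ns_strdup(const char *s)
{
  size_t n = strlen(s) + 1;
  char *copy = malloc(n);
  if (copy)
    memcpy(copy, s, n);
  return copy;
}

/* Declare a prefix. A later declaration of the same prefix shadows the
 * earlier one because it is pushed at the head of the chain.
 * Returns 0 on success, non-zero on allocation failure. */
static int ns_table_add(ns_table *t, const char *prefix, const char *uri)
{
  ns_entry *e;
  unsigned long i;

  assert(t && prefix && uri);
  e = malloc(sizeof(*e));
  if (!e)
    return 1;
  e->prefix = ns_strdup(prefix);
  e->uri = ns_strdup(uri);
  if (!e->prefix || !e->uri) {
    free(e->prefix);
    free(e->uri);
    free(e);
    return 1;
  }
  i = ns_hash(prefix);
  e->next = t->buckets[i];
  t->buckets[i] = e;
  return 0;
}

/* Look up a prefix; only the entries in one bucket are compared. */
static const char *ns_table_find(const ns_table *t, const char *prefix)
{
  const ns_entry *e;
  for (e = t->buckets[ns_hash(prefix)]; e; e = e->next)
    if (strcmp(e->prefix, prefix) == 0)
      return e->uri;
  return NULL;
}

/* Free every entry so the table does not leak. */
static void ns_table_clear(ns_table *t)
{
  size_t i;
  for (i = 0; i < NS_BUCKETS; i++) {
    ns_entry *e = t->buckets[i];
    while (e) {
      ns_entry *next = e->next;
      free(e->prefix);
      free(e->uri);
      free(e);
      e = next;
    }
    t->buckets[i] = NULL;
  }
}
```

A caller would zero an ns_table (for example with memset), call
ns_table_add() for each @prefix line, and call ns_table_find() for each
prefixed name. The fixed bucket array also shows where the extra memory use
on small files could come from: the buckets are allocated whether or not
many prefixes are declared.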
======================================================================
Issue History
Date Modified Username Field Change
======================================================================
2008-11-24 18:41 anonymous New Issue
2008-11-24 18:41 anonymous File Added: raptor-ns-hash.patch
======================================================================