[redland-dev] sqlite storage performance question

Hugh Miao imoldcat at gmail.com
Wed Feb 24 08:40:03 CET 2010


Dear All

This is my first time posting a question here, so apologies in advance if I
have done anything wrong. :)

We are currently working on a vocabulary repository and engine based on an
RDF store.
The project is still in the feasibility analysis phase and requires
cross-platform support.
Based on our survey of the options, Redland was the only choice that fit.

We tried Redland with the sqlite storage and ran some performance tests
using randomly generated data like the following:


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.sec.com/str20969">
    <ns0:str912 xmlns:ns0="http://www.sec.com/">398517</ns0:str912>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.sec.com/str3648">
    <ns0:str729 xmlns:ns0="http://www.sec.com/">752983</ns0:str729>
  </rdf:Description>
   ... ...
   ... ...
   ... ...
  <rdf:Description rdf:about="http://www.sec.com/str85940">
    <ns0:str797 xmlns:ns0="http://www.sec.com/">531931</ns0:str797>
  </rdf:Description>
</rdf:RDF>


As you can see, each subject is a URI of the form "str" + rand(100000),
each predicate is "str" + rand(1000), and each object is a literal string.
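
A minimal sketch of how data in this shape could be generated, for anyone
who wants to reproduce the test. This is not our actual test code: the
function name make_rdf_xml, the seed parameter, and the range of the literal
values (rand(1000000)) are assumptions made for illustration.

```python
import random

NS = "http://www.sec.com/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def make_rdf_xml(num_triples, seed=None):
    """Return an RDF/XML document with num_triples random triples."""
    rng = random.Random(seed)
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        f'<rdf:RDF xmlns:rdf="{RDF_NS}">',
    ]
    for _ in range(num_triples):
        subject = f"{NS}str{rng.randrange(100000)}"  # "str" + rand(100000)
        predicate = f"str{rng.randrange(1000)}"      # "str" + rand(1000)
        literal = str(rng.randrange(1000000))        # assumed literal range
        lines.append(f'  <rdf:Description rdf:about="{subject}">')
        lines.append(
            f'    <ns0:{predicate} xmlns:ns0="{NS}">{literal}</ns0:{predicate}>'
        )
        lines.append("  </rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a small sample document with three triples.
    print(make_rdf_xml(3, seed=1))
```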

The sqlite store was opened with no options set. The storage performance
results are as follows:

<Triple_Num, Storage_Time>
: means that saving Triple_Num triples into the store took Storage_Time
(hh:mm:ss)
<500, 00:03:07.0295802>
<1000, 00:05:17.3689650>
<2000, 00:10:52.1397176>
<3000, 00:13:01.7167990>
<5000, 00:23:51.0318815>
<8000, 00:35:36.7551649>
<10000, 00:46:42.6747223>
<20000, 01:29:07.9316124>

Storage time scales roughly linearly with the number of triples, but each
triple takes around 300 milliseconds.
Surprisingly, simple queries against the sqlite store are very fast: the
time is roughly constant, averaging less than 1 millisecond.
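
The per-triple figure can be checked directly from the table above. The
sketch below only redoes the arithmetic on the reported numbers; the helper
names to_seconds and ms_per_triple are made up for this example.

```python
# (triple count, storage time as hh:mm:ss.fraction), copied from the table.
RESULTS = [
    (500, "00:03:07.0295802"),
    (1000, "00:05:17.3689650"),
    (2000, "00:10:52.1397176"),
    (3000, "00:13:01.7167990"),
    (5000, "00:23:51.0318815"),
    (8000, "00:35:36.7551649"),
    (10000, "00:46:42.6747223"),
    (20000, "01:29:07.9316124"),
]


def to_seconds(timestamp):
    """Convert 'hh:mm:ss.fraction' to seconds as a float."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def ms_per_triple(count, timestamp):
    """Average storage time per triple, in milliseconds."""
    return to_seconds(timestamp) / count * 1000.0


if __name__ == "__main__":
    for count, timestamp in RESULTS:
        print(f"{count:>6} triples: {ms_per_triple(count, timestamp):6.1f} ms/triple")
```

With these numbers the average stays between roughly 260 and 375
milliseconds per triple across all eight runs, which is consistent with the
"around 300 milliseconds" estimate.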

We are using the C# binding of Redland, version 1.0.3, downloaded from
http://download.librdf.org/binaries/win32/
The test machine is a high-performance PC.

To find out why it takes so long, we looked through previous emails on the
mailing list and found a similar question raised in November 2006:
http://lists.librdf.org/pipermail/redland-dev/2006-November/001461.html

Dave mentioned there that there is a 'synchronous' setting. We tried using
it when opening the sqlite store, but the performance was the same.
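
For reference, at the SQLite level a synchronous setting corresponds to
"PRAGMA synchronous". The sketch below uses Python's standard sqlite3 module
directly, not Redland, so it says nothing about what Redland 1.0.3 actually
passes through to SQLite. The table layout, the function name
insert_triples, and the idea that per-insert commits contribute to the
slowdown are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def insert_triples(path, triples, synchronous="OFF", batch=True):
    """Insert (s, p, o) rows; return (effective synchronous level, row count)."""
    if synchronous not in ("OFF", "NORMAL", "FULL"):
        raise ValueError("synchronous must be OFF, NORMAL or FULL")
    # isolation_level=None selects autocommit mode, so transactions are
    # controlled explicitly with BEGIN / COMMIT below.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute(f"PRAGMA synchronous = {synchronous}")
        conn.execute("CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")
        if batch:
            # One transaction around all inserts: a single commit at the end.
            conn.execute("BEGIN")
            conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
            conn.execute("COMMIT")
        else:
            # Autocommit: every INSERT is committed on its own.
            for triple in triples:
                conn.execute("INSERT INTO triples VALUES (?, ?, ?)", triple)
        level = conn.execute("PRAGMA synchronous").fetchone()[0]
        count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
        return level, count
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "triples.db")
    rows = [(f"s{i}", f"p{i % 10}", str(i)) for i in range(1000)]
    print(insert_triples(db_path, rows))
```

Reading the pragma back after setting it is a simple way to confirm that the
setting actually reached the database connection.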

We think the problem may be one of the following:
1. Our version is too old? [We will try 1.0.10 on Linux soon.]
2. There is no such setting? [According to the 1.0.10 documentation, the
only option for sqlite is 'new'.]

Before digging into the code, we decided to ask here.

Dave, if you have time, would you please give us a description of, and some
hints about, the current status of the sqlite storage? Specifically:
1. How can we store data into sqlite faster?
2. Is the sqlite documentation up to date? If not, where can we find a
current version, or should we dig into the code directly?

Dave, we noticed you have been calling for a Windows maintainer recently. If
possible, would you please send us some guidance on how to port Redland to
Windows, based on previous experience? We noticed there is a Visual Studio
solution file in some source versions dating back to 2006. If we do end up
using Redland and sqlite in the current project, we could probably maintain
this part for some period of time, since we definitely need to port our
solution to Windows.

Thanks in advance. : )

Best Regards,

Hugh