Monday, September 8, 2008


In a previous post I discussed the results of initial testing comparing storing the SVD versus recreating it each time. I have since found that the code had a bug: the temporary Lucene index data was not being cleared, so the amount of indexed data grew incrementally with each run.

I fixed the code and also changed the LuceneSearch granularity from document to sentence, and the results were quite different.

For example:

Document: Diagnostic 04.txt, 3 KB
Original corpus with SVD: 47 ms
Writing: 0 ms
Reading: 15 ms
Corpus 2 (with read SVD): 16 ms
SVD object: 24 KB

Just to see what it could handle, I also tried it with a large text:

Document: Sun Tzu, The Art of War
Size: plain text, 329 KB, approx. 55,000 words
Original corpus with SVD: 715,094 ms
Writing: 2,438 ms
Reading: 453 ms
Corpus 2 (with read SVD): 1,906 ms
SVD object: 36 MB

Looking at Task Manager, the process was consuming about 150 MB of memory while calculating the SVD.

So these results are obviously significantly different from the initial ones posted earlier, and they make the case that storing the SVD object may not always be the best option, especially when the document is short: for a small document, recomputing the SVD (47 ms) is in the same ballpark as reading the stored object back (15 ms), while the stored object itself takes up several times the size of the source text.
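The "Writing" and "Reading" timings above come from persisting the SVD result to disk and loading it back rather than recomputing it. My original SVD class isn't shown here, so the container below (`SvdResult`, with hypothetical `u`, `s`, `v` fields) is only an assumed stand-in, but it sketches the general approach using standard Java object serialization:

```java
import java.io.*;

// Hypothetical stand-in for the stored SVD result; the real class in the
// project may hold different fields (e.g. sparse matrices, term maps).
class SvdResult implements Serializable {
    private static final long serialVersionUID = 1L;
    double[][] u, v;   // left/right singular vectors
    double[] s;        // singular values

    SvdResult(double[][] u, double[] s, double[][] v) {
        this.u = u; this.s = s; this.v = v;
    }
}

public class SvdStoreDemo {
    // Serialize the SVD object to a file (the "Writing" step timed above).
    static void write(SvdResult r, File f) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(r);
        }
    }

    // Deserialize it back (the "Reading" step), avoiding a recompute.
    static SvdResult read(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(f))) {
            return (SvdResult) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy 2x2 decomposition just to exercise the round trip.
        SvdResult r = new SvdResult(
            new double[][] {{1, 0}, {0, 1}},
            new double[] {2.0, 1.0},
            new double[][] {{1, 0}, {0, 1}});

        File f = File.createTempFile("svd", ".ser");
        long t0 = System.currentTimeMillis();
        write(r, f);
        long writeMs = System.currentTimeMillis() - t0;

        t0 = System.currentTimeMillis();
        SvdResult back = read(f);
        long readMs = System.currentTimeMillis() - t0;

        System.out.println("write: " + writeMs + " ms, read: " + readMs
            + " ms, file: " + f.length() + " bytes");
        f.delete();
    }
}
```

As the large-document run shows, the serialized object (36 MB for a 329 KB text) can dwarf the source, so whether the read beats the recompute depends heavily on corpus size.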
