Saturday, November 15, 2008


This will be the last post on this blog. I gave my presentation last Tuesday, where I outlined my work in 9 minutes.

Basically, coming into this there were concerns over whether Glosser would scale if demand grew. LSA does not scale well with document length, and even with moderately sized documents, a growing number of users can greatly degrade response time.

The treatise looked at the idea of storing the model. In Java this was simple, as implementing the Serializable interface allows objects to be stored in binary files. I looked into compressing these files using GZIP; however, this was a poor time/space trade-off since there is little redundant information.
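As a rough illustration of the storing step, here is a minimal sketch of serializing a model with and without GZIP. The `ModelStore` and `Model` names and the `termVectors` field are hypothetical stand-ins, not Glosser's actual classes, and the sketch writes to byte arrays rather than files to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ModelStore {

    /** Hypothetical stand-in for a stored LSA model (assumption, not Glosser's class). */
    public static class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        public final double[][] termVectors;

        public Model(double[][] termVectors) {
            this.termVectors = termVectors;
        }
    }

    /** Serializes the model, optionally wrapping the stream in GZIP compression. */
    public static byte[] toBytes(Model model, boolean gzip) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        OutputStream out = gzip ? new GZIPOutputStream(buffer) : buffer;
        // Closing the ObjectOutputStream also closes (and finishes) the GZIP stream.
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
        return buffer.toByteArray();
    }

    /** Reads a model back; the gzip flag must match the one used when writing. */
    public static Model fromBytes(byte[] data, boolean gzip)
            throws IOException, ClassNotFoundException {
        InputStream in = new ByteArrayInputStream(data);
        if (gzip) {
            in = new GZIPInputStream(in);
        }
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (Model) ois.readObject();
        }
    }
}
```

Comparing the length of `toBytes(model, false)` with `toBytes(model, true)`, and timing both calls, is one simple way to measure the time/space trade-off for a given model.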

As the initial results came in from the tests, storing the model provided great savings in time over recalculation. The size of the model was also feasible: only 190KB for 3000-word documents.

Looking to add further to the project, I looked at possibly changing the SVD to use Lanczos, which is much faster than the standard method used by the Weka library. However, without a freely available Java implementation there was simply not enough time in this short project to implement any solution. Lanczos is still important for performance and will eventually be integrated into Glosser, most likely through linking to a Matlab implementation of Lanczos (like PROPACK).

With the remaining time I modified the Weka/Jama implementation of SVD so that it would truncate the model after dimensionality reduction, so as to keep only the useful data.
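The truncation idea can be sketched on plain arrays. This is not the actual change made to Weka/Jama; it assumes the singular values are already sorted in descending order, so keeping the first k values and the first k columns of each factor matrix keeps the k largest dimensions. The class and method names are made up for illustration.

```java
import java.util.Arrays;

public class SvdTruncation {

    /** Keeps only the first k columns of every row of the matrix. */
    public static double[][] firstColumns(double[][] matrix, int k) {
        double[][] out = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            if (k < 0 || k > matrix[i].length) {
                throw new IllegalArgumentException("k out of range: " + k);
            }
            out[i] = Arrays.copyOf(matrix[i], k);
        }
        return out;
    }

    /** Keeps only the first k singular values (assumed sorted, largest first). */
    public static double[] firstValues(double[] singularValues, int k) {
        if (k < 0 || k > singularValues.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        return Arrays.copyOf(singularValues, k);
    }
}
```

Applying `firstColumns` to both U and V and `firstValues` to the singular values gives the reduced factors, so only those smaller arrays need to be stored.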

So, summing up, storing the model was a good idea. It provides a good time/space trade-off. For example, with 200 concurrent users and 3000-word documents, even if only 30% of the requests were for previously calculated models, the system would still gain a 20% increase in performance by storing models. In addition, storing the model brings other advantages, such as being able to process and store models offline so the user does not have to wait. For example, a student emails a document and then gets a reply when the feedback is ready.
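To make the hit-rate reasoning concrete, here is a small sketch of a cost model for stored models. It is a simplification of my own for illustration, not the calculation from the treatise: a request that hits a stored model pays only the load time, while a miss pays the recalculation time plus the time to store the result. All names and parameters are hypothetical.

```java
public class ModelCacheCost {

    /**
     * Average time per request when a fraction hitRate of requests
     * can load a stored model instead of recalculating it.
     */
    public static double expectedTime(double hitRate, double recompute,
                                      double load, double store) {
        if (hitRate < 0.0 || hitRate > 1.0) {
            throw new IllegalArgumentException("hitRate must be in [0, 1]");
        }
        // Hits only load; misses recalculate and then store the new model.
        return hitRate * load + (1.0 - hitRate) * (recompute + store);
    }

    /** Fraction of time saved compared with always recalculating. */
    public static double relativeSaving(double hitRate, double recompute,
                                        double load, double store) {
        return 1.0 - expectedTime(hitRate, recompute, load, store) / recompute;
    }
}
```

Under this model, storing pays off whenever `hitRate * (recompute - load)` is larger than `(1 - hitRate) * store`, which is why even a modest share of repeat requests can give a worthwhile gain when loading is much cheaper than recalculating.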

Overall, this was an interesting project to be part of. Best of luck to the Glosser project, and thanks to my supervisors, Dr. Rafael Calvo and Jorge Villalon.


Monday, November 3, 2008


Now with the written treatise over, my focus is on the presentation and upcoming final exams.

I have come to realise how little 9 minutes is for a presentation, especially when you have a semester of stuff to talk about. I met with Jorge last week, who gave me a general template of which topics to talk about and what had to be cut out.

To help meet the time constraint I won't be discussing the GZIP compression or the Lanczos algorithm.

I have my draft presentation ready though, and as per Rafael's request his thesis students, including me, will be giving a rehearsal presentation tomorrow. This will give me a chance to get some feedback to refine the presentation.

This semester (which should be my last) is closing, and now there are only a couple of weeks until it's all over.