Monday, October 27, 2008

Treatise Printed!

I went to Officeworks today and got the treatise printed. It is now complete and ready to go (submission is on Wednesday). To give a bit of an idea of what I did, here is the abstract. I still have the presentation to work on, and I'll post some charts of the results later.

With the emergence of Web 2.0 and the increasing ubiquity of the internet, there has been a rise in the number of rich internet applications, and particularly collaborative web applications. One particular application, relevant for Collaborative Work and Collaborative Learning, is Collaborative Writing (CW), in which a document is written synchronously by more than one author. CW as a learning activity is especially relevant for the teaching and learning of academic writing in higher education. Google Docs is a CW application that simulates a word processor within a web page; it is used by students in the School of EIE at the University of Sydney to write collaboratively.

Feedback on students' writing is an important source of students' learning of academic writing; however, providing more feedback is too costly because it requires a lot of human time. Automated tools to provide feedback on writing have been proposed and tested. One of these, implemented at the School of EIE, is called Glosser and uses Machine Learning (particularly Text Mining) techniques to provide automatic feedback on essays. Glosser works as a web application integrated with Google Docs.

Machine learning programs are able to extract useful information from data. The Glosser tool uses a text mining technique known as Latent Semantic Analysis (LSA), which has a high computational complexity. This poses a problem: given the high cost of creating the model and the amount of data produced by collaborative applications, it is particularly difficult to achieve a good response time for tools such as Glosser. Particularly in web-based applications, where response time is vital, there need to be ways to minimise the impact of ML model creation on the users' response time by managing these models in an intelligent way.

This treatise proposes a method for managing Machine Learning models in a collaborative web application. The proposed method is essentially to cache the Machine Learning model. By making the model persistent, further calls on the same document do not require recalculation: the model can simply be restored from storage, which reduces the response time. Experimental analysis showed that the proposed method was effective and greatly reduced recalculation time. File compression of the stored model was shown to be a bad time-space trade-off, whilst truncating the model to remove redundant data provided a significant reduction in the model size. The data showed that, in a situation with 200 concurrent users and documents of ~3000 words, the new method would provide a 20% reduction in time even with a cache hit ratio of 30%, with each model requiring 190KB of space.
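
To make the caching idea a bit more concrete, here is a minimal sketch in Java. It is not the code from the treatise or from Glosser/TML: the class name `ModelCache`, the use of a plain `double[][]` as the "model", the file naming scheme and the use of Java serialization are all assumptions made for illustration. It only shows the general pattern: look for a stored model first, and build and store one only on a miss.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical on-disk cache for an expensive model (here just a matrix). */
public class ModelCache {
    private final Path dir;

    public ModelCache(Path dir) {
        try {
            // Create the storage directory if it does not exist yet.
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the stored model for the given key, or builds it with the
     * supplied builder, stores it, and returns it when nothing is stored.
     * The key is assumed to be safe to use as a file name.
     */
    public double[][] getOrBuild(String key, Supplier<double[][]> builder) {
        Path file = dir.resolve(key + ".model");
        try {
            if (Files.exists(file)) {
                // Cache hit: restore the model from storage, no recalculation.
                try (ObjectInputStream in =
                         new ObjectInputStream(Files.newInputStream(file))) {
                    return (double[][]) in.readObject();
                }
            }
            // Cache miss: run the expensive build, then persist the result.
            double[][] model = builder.get();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(model);
            }
            return model;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Stored model could not be read", e);
        }
    }
}
```

A real version would also need a key that changes when the document changes (for example a document id combined with a revision number), otherwise a stale model would be restored after an edit.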

Sunday, October 19, 2008

Quick Update

Just a quick update to say that thesis writing continues. I hope to have most of it done by Tuesday to show Jorge so that I can get feedback, since I will most likely be finishing it and getting it printed by the end of this week.

Back to work ...

Monday, October 13, 2008

Countdown to the End

So there is only a little over two weeks left to get the written treatise finished. I have finally begun writing more of the core of the thesis.

Throughout the semester I was constantly refining exactly what I was testing and how. I still have small issues to deal with before I have a full set of results, but the results I have are good enough to begin writing, as they show all the logical relationships between the different variables.

I will leave it to a later post, once I have actually written the conclusion, to sum up what I have done and what I found with regard to Singular Value Decomposition (SVD).

Although I know what the results mean and what conclusions can be drawn from them with regard to my initial aim, it is still by no means a trivial task to express this in the form of a written treatise. So it's looking like a busy two weeks to finish it all off.

Monday, October 6, 2008

Lanczos and Java

So the deadline is nearing. After a suggestion from Rafael I decided to look into replacing the current SVD method with the more efficient Lanczos method.

I quickly figured out that implementing this from scratch would be impossible for me within a two-week timeframe, especially since I would need to understand the algorithm in detail.

I started searching for things that would help:
  • Java matrix packages: there seem to be a lot of them out there. A popular one is Jama, which is in fact the matrix package used by Weka. Others include Matrix Toolkits for Java, Colt and JLAPack. None of these had an implementation of SVD based on the Lanczos method.
  • However, there was code out there that did implement Lanczos, unfortunately not in Java. This includes SVDPack and PROpack. It seems that the scientific community still sticks to Fortran, C and Matlab for these matrix-based solutions.
  • I did look into calling a C or Matlab program from within Java; however, it seemed more difficult than I thought.
  • The Matlab program would require a Matlab runtime environment on the host machine; in addition, it would make the entire system dependent on Matlab.
  • As for the C program, the code was difficult to read and wasn't structured very well.
  • For the time being it doesn't look like this Lanczos implementation can be done. It's definitely doable, but with only a few weeks left I need to wrap things up quickly so I can write the actual thesis.
With the remaining time I looked into implementing my proposed changes in the actual TML code. One thing I noticed was that operations other than the SVD also cost time, and I had not been measuring them. These added a significant amount of time, so that generating the SVD only really became slower than reading a stored SVD at about the 4000-5000 word mark.
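
For what it's worth, one way to catch this kind of unmeasured overhead is to time every phase separately rather than only the SVD. The sketch below is just an illustration of that idea, not the measurement code I actually used: the class `PhaseTimer` and the phase names in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper that accumulates wall-clock time per named phase. */
public class PhaseTimer {
    // Total nanoseconds per phase, in the order the phases were first seen.
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the phase, returns its result. */
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Recorded even if the work throws, so failed runs are still counted.
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated nanoseconds for each phase. */
    public Map<String, Long> totals() {
        return nanos;
    }
}
```

Usage would look something like `timer.time("build-matrix", () -> buildMatrix(doc))` followed by `timer.time("svd", () -> computeSvd(matrix))`, where `buildMatrix` and `computeSvd` stand in for whatever the real steps are. Comparing the totals then shows how much of the response time is actually the SVD.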

I have a meeting tomorrow as usual and will discuss these issues.