Monday, October 27, 2008

Treatise Printed!

I went to Officeworks today and got the treatise printed. It is now complete and ready to go (submission is on Wednesday). To give a bit of an idea of what I did, here is the abstract. I still have the presentation to work on, and I'll post some charts of the results later.


With the emergence of Web 2.0 and the increasing ubiquity of the internet, there has been a rise in the number of rich internet applications, particularly collaborative web applications. One application relevant to Collaborative Work and Collaborative Learning is Collaborative Writing (CW), in which more than one author writes a document synchronously. CW as a learning activity is especially relevant to the teaching and learning of academic writing in higher education. Google Docs is a CW application that simulates a word processor within a web page; it is used by students in the School of EIE at the University of Sydney to write collaboratively.

Feedback on students' writing is an important source of their learning of academic writing; however, providing more feedback is too costly because it requires a lot of human time. Automated tools that provide feedback on writing have been proposed and tested. One of these, called Glosser, was implemented at the School of EIE; it uses Machine Learning (particularly Text Mining) techniques to provide automatic feedback on essays. Glosser works as a web application integrated with Google Docs.

Machine learning programs are able to extract useful information from data. The Glosser tool uses a text mining technique known as Latent Semantic Analysis (LSA), which has a high computational complexity. This poses a problem: given the high cost of creating the model and the amount of data produced by collaborative applications, it is particularly difficult to achieve a good response time for tools such as Glosser. In web-based applications in particular, where response time is vital, there need to be ways to minimise the impact of ML model creation on users' response time by managing these models intelligently.
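For readers unfamiliar with LSA, this is the standard textbook formulation. It is a general summary, not necessarily the exact variant Glosser uses. A term-by-passage matrix A (m terms by n passages, for example the sentences or paragraphs of an essay; the unit Glosser uses is not stated here) is approximated by a truncated singular value decomposition that keeps only the k largest singular values:

```latex
A \approx A_k = U_k \, \Sigma_k \, V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{n \times k}

% Cost of a full dense SVD of an m x n matrix:
O\!\left(m \, n \, \min(m, n)\right)
```

Because this decomposition grows quickly with document size, rebuilding it on every request is what makes response time hard to control.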

This treatise proposes an approach for managing Machine Learning models in a collaborative web application. The proposed method is essentially to cache the Machine Learning model. By making the model persistent, further calls for the same document do not require recalculation; the model can simply be restored from storage, which reduces response time for recalculations. Experimental analysis showed that the proposed method was effective and greatly reduced recalculation time. Compressing the stored model file was shown to be a poor time-space trade-off, whilst truncating the model to remove redundant data provided a significant reduction in model size. The data showed that, in a situation with 200 concurrent users working on documents of ~3000 words, the new method would provide a 20% reduction in time even with a cache hit ratio of only 30%, with each model requiring 190 KB of space.
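The caching idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the treatise's implementation. `ModelCache`, `build_word_counts` and `expected_time` are hypothetical names. The model is keyed by a SHA-256 hash of the document text, so an edited document simply misses the cache. `pickle` handles persistence, and a word-count dictionary stands in for the expensive LSA model.

```python
import hashlib
import os
import pickle
import tempfile


class ModelCache:
    """Hypothetical on-disk cache of per-document models (illustrative only)."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _path(self, text):
        # Assumption: key on a hash of the document text. Any edit produces
        # a new key, so a stale model is never returned for a changed document.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".pkl")

    def get_model(self, text, build):
        path = self._path(text)
        if os.path.exists(path):
            # Cache hit: restore the persisted model instead of rebuilding it.
            self.hits += 1
            with open(path, "rb") as f:
                return pickle.load(f)
        # Cache miss: build the model (the expensive step) and persist it.
        self.misses += 1
        model = build(text)
        with open(path, "wb") as f:
            pickle.dump(model, f)
        return model


def build_word_counts(text):
    # Hypothetical cheap stand-in for an expensive model build such as LSA.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def expected_time(hit_ratio, t_load, t_build, t_store):
    # Mean time per request: hits pay only the load cost,
    # misses pay the build cost plus the cost of storing the result.
    return hit_ratio * t_load + (1 - hit_ratio) * (t_build + t_store)


with tempfile.TemporaryDirectory() as cache_dir:
    cache = ModelCache(cache_dir)
    doc = "the cat sat on the mat"
    first = cache.get_model(doc, build_word_counts)   # miss: built and stored
    second = cache.get_model(doc, build_word_counts)  # hit: loaded from disk
    print(cache.hits, cache.misses)  # 1 1
    print(first == second)           # True
    print(first["the"])              # 2

# Illustrative timings only, not measurements from the treatise.
print(expected_time(0.5, 1.0, 10.0, 1.0))  # 6.0
```

Caching pays off whenever `expected_time` comes out below `t_build` on its own. In other words, the time saved on hits must outweigh the extra storage cost on misses. This is why a fast restore can give a net saving even at a modest hit ratio.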
