This work addresses the problem of document clustering. Our approach follows a three-step pipeline. First, we quantify the topics in each document. Then, the number of clusters is set automatically. Finally, a multi-criteria distance is defined to cluster the documents. The advantage of this approach is that it yields a number of multi-criteria clusters based on a structural analysis of each document. We applied our method to Twitter data and showed the accuracy of our results compared with a randomly chosen number of clusters.
@inproceedings{soict15,
Author = {Quang Vu Bui and Karim Sayadi and Marc Bui},
Booktitle = {Proceedings of the Sixth International Symposium on Information and Communication Technology, Hue City, Vietnam, December 3--4, 2015},
Pages = {38--45},
Title = {A multi-criteria document clustering method based on topic modeling and pseudoclosure function},
Year = {2015}}