A multi-criteria document clustering method based on topic modeling and pseudoclosure function

Quang Vu Bui, Karim Sayadi, Marc Bui

Abstract

In this work, we address the problem of document clustering. Our approach follows a three-step pipeline. First, we quantify the topics in each document. Then, the number of clusters is set automatically. Finally, we define a multi-criteria distance to cluster the documents. The advantage of this approach is that it produces multi-criteria clusters grounded in a structural analysis of each document. We applied our method to Twitter data and showed that our results are more accurate than those obtained with a randomly chosen number of clusters.
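The abstract names the three pipeline stages but not their details. The sketch below illustrates one plausible reading in plain Python. It is not the authors' implementation. It assumes the following: documents are already reduced to topic distributions; the two criteria are the Hellinger distance within a threshold `eps` and a shared dominant topic; the weight `w` is an illustrative setting; and every function name is hypothetical. The pseudoclosure a(A) adds to A every document that has a neighbour in A. Iterating it yields closed subsets, and the number of minimal closed subsets fixes the number of clusters.

```python
import math
from typing import FrozenSet, List, Sequence, Set

# Assumption: each document is already a topic distribution (one probability
# per topic), e.g. the output of a topic model.
Dist = Sequence[float]


def hellinger(p: Dist, q: Dist) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def dominant_topic(p: Dist) -> int:
    """Index of the most probable topic (first one on ties)."""
    return max(range(len(p)), key=lambda t: p[t])


def multi_criteria_distance(p: Dist, q: Dist, w: float = 0.5) -> float:
    """Hypothetical two-criterion distance: topic divergence plus a penalty
    when the documents do not share the same dominant topic."""
    penalty = 0.0 if dominant_topic(p) == dominant_topic(q) else 1.0
    return w * hellinger(p, q) + (1.0 - w) * penalty


def are_neighbors(p: Dist, q: Dist, eps: float) -> bool:
    # Both criteria must hold (intersection of the two neighbourhoods).
    return hellinger(p, q) <= eps and dominant_topic(p) == dominant_topic(q)


def pseudoclosure(A: Set[int], docs: List[Dist], eps: float) -> Set[int]:
    # a(A): A plus every document having at least one neighbour in A.
    return A | {x for x in range(len(docs))
                if any(are_neighbors(docs[x], docs[y], eps) for y in A)}


def closure(A: Set[int], docs: List[Dist], eps: float) -> FrozenSet[int]:
    # Iterate the pseudoclosure until it stops growing (a closed subset).
    current = set(A)
    while True:
        nxt = pseudoclosure(current, docs, eps)
        if nxt == current:
            return frozenset(current)
        current = nxt


def minimal_closed_subsets(docs: List[Dist], eps: float) -> List[FrozenSet[int]]:
    elementary = {closure({x}, docs, eps) for x in range(len(docs))}
    # Keep the elementary closed subsets that strictly contain no other one.
    minimal = [F for F in elementary if not any(G < F for G in elementary)]
    return sorted(minimal, key=min)


def cluster(docs: List[Dist], eps: float, w: float = 0.5) -> List[int]:
    groups = minimal_closed_subsets(docs, eps)  # k = len(groups), set automatically
    # One seed per group: its medoid under the multi-criteria distance.
    seeds = [min(sorted(F),
                 key=lambda i: sum(multi_criteria_distance(docs[i], docs[j], w)
                                   for j in F))
             for F in groups]
    # Assign every document to its nearest seed.
    return [min(range(len(seeds)),
                key=lambda c: multi_criteria_distance(d, docs[seeds[c]], w))
            for d in docs]
```

For example, `cluster([[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]], eps=0.2)` returns `[0, 0, 1, 1]`, so the number of clusters (two) is found without being supplied. Because the neighbour relation here is symmetric, the closed subsets reduce to connected components. With an asymmetric neighbourhood, such as k-nearest neighbours, the minimal-closed-subset step would matter more.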

Citation

BibTeX

@inproceedings{soict15,
	Author = {Quang Vu Bui and Karim Sayadi and Marc Bui},
	Booktitle = {Proceedings of the Sixth International Symposium on Information and Communication Technology, Hue City, Vietnam, December 3-4, 2015},
	Pages = {38--45},
	Title = {A multi-criteria document clustering method based on topic modeling and pseudoclosure function},
	Year = {2015}}