Extracting and rendering representative sequences
Title | Extracting and rendering representative sequences |
Publication Type | Book Chapter |
Year of Publication | 2011 |
Authors | Gabadinho, A, Ritschard, G, Studer, M, Müller, NS |
Editor | Fred, A, Dietz, JLG, Liu, K, Filipe, J |
Book Title | Knowledge Discovery, Knowledge Engineering and Knowledge Management |
Series Title | Communications in Computer and Information Science |
Number | Vol. 128 |
Pagination | 94-106 |
Publisher | Springer |
Place Published | Berlin |
ISBN Number | 978-3-642-19031-5 |
Keywords | categorical sequences, pairwise dissimilarities, discrepancy of sequences, representatives, summarizing sets of sequences, visualization |
Abstract | This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighbourhood. The proposed heuristic for extracting the representative subset requires as main arguments a pairwise distance matrix, a representativeness criterion and a distance threshold under which two sequences are considered redundant or, equivalently, in the neighbourhood of each other. It first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in our TraMineR R package for mining and visualizing sequence data, and we demonstrate their efficiency on a real-world example from the social sciences. The methods are nonetheless by no means limited to social science data and should prove useful in many other domains. |
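
The heuristic outlined in the abstract can be illustrated with a minimal Python sketch. This is not the TraMineR implementation: the function name `extract_representatives`, the use of neighbourhood density as the representativeness score, the tie-breaking rule and the toy data are all assumptions made for illustration. The sketch ranks candidates by score, skips any candidate that lies within the distance threshold of an already selected representative, and stops once the requested coverage is reached.

```python
def extract_representatives(dist, radius, coverage=0.25):
    """Greedy sketch: rank candidates by score, then drop redundant ones.

    dist     -- square pairwise distance matrix (list of lists)
    radius   -- distance threshold defining the neighbourhood (assumed name)
    coverage -- target share of sequences lying within `radius` of a representative
    """
    n = len(dist)
    # Assumed representativeness score: neighbourhood density, i.e. the
    # number of sequences within `radius` of each candidate (itself included).
    density = [sum(1 for j in range(n) if dist[i][j] <= radius) for i in range(n)]
    # Candidate list sorted by decreasing score; ties broken by index.
    candidates = sorted(range(n), key=lambda i: (-density[i], i))

    reps, covered = [], set()
    for c in candidates:
        if len(covered) / n >= coverage:
            break  # requested coverage reached
        if any(dist[c][r] <= radius for r in reps):
            continue  # redundant: in the neighbourhood of a selected representative
        reps.append(c)
        covered.update(j for j in range(n) if dist[c][j] <= radius)
    return reps, len(covered) / n


# Toy data (hypothetical): six "sequences" placed on a line, with the
# absolute difference of positions standing in for a sequence dissimilarity.
positions = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in positions] for a in positions]

reps, cov = extract_representatives(dist, radius=2, coverage=0.8)
reps_full, cov_full = extract_representatives(dist, radius=2, coverage=1.0)
```

In practice the distance matrix would come from a sequence dissimilarity measure rather than from positions on a line, and a different representativeness criterion could replace the density score without changing the rest of the loop.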