T1 - Analyzing state sequences with probabilistic suffix trees: the PST R package
AB - This article presents the PST R package for categorical sequence analysis with probabilistic suffix trees (PSTs), i.e., structures that store variable-length Markov chains (VLMCs). VLMCs make it possible to model high-order dependencies in categorical sequences with parsimonious models based on simple estimation procedures. The package is specifically adapted to the field of social sciences, as it allows for VLMC models to be learned from sets of individual sequences possibly containing missing values; in addition, the package is extended to account for case weights. This article describes how a VLMC model is learned from one or more categorical sequences and stored in a PST. The PST can then be used for sequence prediction, i.e., to assign a probability to whole observed or artificial sequences. This feature supports data mining applications such as the extraction of typical patterns and outliers. This article also introduces original visualization tools for both the model and the outcomes of sequence prediction. Other features such as functions for pattern mining and artificial sequence generation are described as well. The PST package also allows for the computation of probabilistic divergence between two models and the fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST.
T1 - What matters in differences between life trajectories: A comparative review of sequence dissimilarity measures
KW - timing
AB - This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers, the important differences between the sequences essentially concern the sequencing (the order in which successive states appear), the timing, and the duration of the spells in the successive states. Even if some distance measures underperform, it has been shown that there is no universally optimal distance index and that the choice of a measure depends on which aspect we want to focus on. This study also introduces novel ways of measuring dissimilarities that overcome the flaws in existing measures.
T1 - A comparative review of sequence dissimilarity measures
AB - This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers, the important differences between the sequences essentially concern the sequencing (the order in which successive states appear), the timing, and the duration of the spells in the successive states. Even if some distance measures underperform, it has been shown that there is no universally optimal distance index and that the choice of a measure depends on which aspect we want to focus on. This study also introduces novel ways of measuring dissimilarities that overcome the flaws in existing measures.
T1 - A decorated parallel coordinate plot for categorical longitudinal data
AB - This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the temporal ordering of events, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with two real applications.
T1 - Rendering the order of life events
AB - This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the temporal ordering of events, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with two real applications.
T1 - Analyzing and visualizing state sequences in R with TraMineR
AB - This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR’s outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.
