TY - JOUR
T1 - Analyzing state sequences with probabilistic suffix trees: the PST R package
JF - Journal of Statistical Software
Y1 - 2016
A1 - Gabadinho, Alexis
A1 - Ritschard, Gilbert
KW - categorical sequences
KW - probabilistic suffix trees
KW - R
KW - sequence data mining
KW - sequence visualization
KW - state sequences
KW - variable-length Markov chains
AB - This article presents the PST R package for categorical sequence analysis with probabilistic suffix trees (PSTs), i.e., structures that store variable-length Markov chains (VLMCs). VLMCs make it possible to model high-order dependencies in categorical sequences with parsimonious models based on simple estimation procedures. The package is specifically adapted to the field of social sciences, as it allows for VLMC models to be learned from sets of individual sequences possibly containing missing values; in addition, the package is extended to account for case weights. This article describes how a VLMC model is learned from one or more categorical sequences and stored in a PST. The PST can then be used for sequence prediction, i.e., to assign a probability to whole observed or artificial sequences. This feature supports data mining applications such as the extraction of typical patterns and outliers. This article also introduces original visualization tools for both the model and the outcomes of sequence prediction. Other features such as functions for pattern mining and artificial sequence generation are described as well. The PST package also allows for the computation of probabilistic divergence between two models and the fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST.
VL - 72
Y1 - aug
CP - 3
PY - 10.18637/jss.v072.i03
ER -
TY - JOUR
T1 - What matters in differences between life trajectories: A comparative review of sequence dissimilarity measures
JF - Journal of the Royal Statistical Society: Series A (Statistics in Society)
Y1 - 2016
A1 - Studer, Matthias
A1 - Ritschard, Gilbert
KW - dissimilarity
KW - distance
KW - duration
KW - optimal matching
KW - sequencing
KW - spells
KW - state sequences
KW - timing
AB - This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers, the important differences between the sequences essentially concern the sequencing (the order in which successive states appear), the timing, and the duration of the spells in the successive states. Even if some distance measures underperform, it has been shown that there is no universally optimal distance index and that the choice of a measure depends on which aspect we want to focus on. This study also introduces novel ways of measuring dissimilarities that overcome the flaws in existing measures.
VL - 179
CP - 2
PY - 10.1111/rssa.12125
ER -
TY - JOUR
T1 - A comparative review of sequence dissimilarity measures
JF - LIVES Working Papers
Y1 - 2014
A1 - Studer, Matthias
A1 - Ritschard, Gilbert
KW - dissimilarity
KW - distance
KW - duration
KW - optimal matching
KW - sequencing
KW - spells
KW - state sequences
KW - timing
AB - This is a comparative study of the multiple ways of measuring dissimilarities between state sequences. For sequences describing life courses, such as family life trajectories or professional careers, the important differences between the sequences essentially concern the sequencing (the order in which successive states appear), the timing, and the duration of the spells in the successive states. Even if some distance measures underperform, it has been shown that there is no universally optimal distance index and that the choice of a measure depends on which aspect we want to focus on. This study also introduces novel ways of measuring dissimilarities that overcome the flaws in existing measures.
PB - NCCR LIVES
CY - Lausanne
VL - 2014
CP - 33
PY - 10.12682/lives.2296-1658.2014.33
ER -
TY - JOUR
T1 - A decorated parallel coordinate plot for categorical longitudinal data
JF - The American Statistician
Y1 - 2014
A1 - Bürgin, Reto
A1 - Ritschard, Gilbert
KW - event sequences
KW - exploratory data analysis
KW - graphical statistics
KW - longitudinal data
KW - multiple time series plot
KW - sequence analysis
KW - state sequences
KW - visualization
AB - This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the temporal ordering of events, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with two real applications.
VL - 68
CP - 2
PY - 10.1080/00031305.2014.887591
ER -
TY - JOUR
T1 - Rendering the order of life events
JF - LIVES Working Papers
Y1 - 2013
A1 - Bürgin, Reto
A1 - Ritschard, Gilbert
KW - event sequences
KW - exploratory data analysis
KW - graphical statistics
KW - longitudinal categorical data
KW - multiple time series plot
KW - sequence analysis
KW - state sequences
KW - visualization
AB - This article proposes a decorated parallel coordinate plot for longitudinal categorical data, featuring a jitter mechanism revealing the diversity of observed longitudinal patterns and allowing the tracking of each individual pattern, variable point and line widths reflecting weighted pattern frequencies, the rendering of simultaneous events, and different filter options for highlighting typical patterns. The proposed visual display has been developed for describing and exploring the temporal ordering of events, but it can be equally applied to other types of longitudinal categorical data. Alongside the description of the principle of the plot, we demonstrate the scope of the plot with two real applications.
PB - NCCR LIVES
CY - Lausanne
VL - 2013
CP - 29
PY - 10.12682/lives.2296-1658.2013.29
ER -
TY - JOUR
T1 - Analyzing and visualizing state sequences in R with TraMineR
JF - Journal of Statistical Software
Y1 - 2011
A1 - Gabadinho, Alexis
A1 - Ritschard, Gilbert
A1 - Müller, Nicolas S
A1 - Studer, Matthias
KW - categorical sequences
KW - dissimilarities
KW - optimal matching
KW - R
KW - representative sequences
KW - sequence complexity
KW - sequence visualization
KW - state sequences
AB - This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR’s outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.
PB - The American Statistical Association
CY - Alexandria, VA
VL - 40
UR - http://www.jstatsoft.org/v40/i04
CP - 4
ER -