|Title||The link between previous life trajectories and a later life outcome: A feature selection approach|
|Publication Type||Journal Article|
|Year of Publication||2020|
|Secondary Authors||Studer, M|
|Journal||LIVES Working paper|
|Keywords||life course methodology, machine learning, sequence analysis, variable selection|
Several studies have investigated the link between a previous trajectory and a given later-life outcome. Trajectories are complex objects, and identifying which of their aspects are relevant is of primary interest both for prediction and for testing specific theories. In this work, we propose an innovative approach based on data mining feature selection algorithms. The approach proceeds in two steps. First, we automatically extract several properties of the sequences. Following a life course approach, we focus on features related to three key aspects of the life course: the sequencing, timing, and duration of life events. Second, we use feature selection algorithms to identify the most relevant properties associated with the outcome. We discuss the use of two feature selection approaches: a random forest approach (Boruta) and a LASSO method (Stability Selection). We also discuss the inclusion of control variables, such as socio-demographic characteristics of the respondent, in this selection process. The proposed approach is illustrated through a study of the effects of family and work trajectories between ages 20 and 40 on health and income conditions in midlife.
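
As an illustration of the first step only, the following minimal sketch shows one way duration, timing, and sequencing features could be derived from a single yearly state sequence. It is not the authors' implementation: the function name `extract_features`, the feature names, the state codes, and the example trajectory are all hypothetical, and the sketch only covers states actually observed in the sequence.

```python
from itertools import permutations


def extract_features(sequence, start_age=20):
    """Return duration, timing, and sequencing features for one trajectory.

    `sequence` holds one state per year, starting at `start_age`.
    All feature names are illustrative assumptions, not the paper's own.
    """
    features = {}
    states = sorted(set(sequence))
    first_age = {}
    for state in states:
        # Duration: total number of years spent in the state.
        features[f"duration_{state}"] = sequence.count(state)
        # Timing: age at which the state is first observed.
        first_age[state] = start_age + sequence.index(state)
        features[f"first_age_{state}"] = first_age[state]
    # Sequencing: 1 if state a is first observed before state b, else 0.
    for a, b in permutations(states, 2):
        features[f"{a}_before_{b}"] = int(first_age[a] < first_age[b])
    return features


# Hypothetical family trajectory: S = single, U = union, P = parent.
trajectory = ["S", "S", "S", "U", "U", "P", "P", "P"]
features = extract_features(trajectory)
```

Applied to every respondent, a table of such features would form the input to the second step, where a selection algorithm such as Boruta or Stability Selection would retain the properties most associated with the outcome.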