Design and evaluation of a machine learning pipeline for automated gait classification
Zanotto, Luca; Anchisi, Beatrice
2024/2025
Abstract
Quantitative gait analysis is a fundamental tool for the clinical evaluation of motor disorders and for monitoring the effectiveness of rehabilitation treatments. However, interpreting data from motion capture systems requires highly specialized expertise and is influenced by subjective factors such as operator experience. In this context, machine learning techniques can provide objective, automated support for the analysis of human movement. The aim of this study is to develop a machine learning model capable of automatically classifying the gait patterns of healthy and pathological subjects, in particular patients affected by Cerebral Palsy and Down Syndrome. The data were acquired in a laboratory using an optoelectronic system and then processed to extract spatiotemporal, kinematic, and dynamic parameters, which were aggregated into a representative feature set. The resulting dataset was used to train and validate five classification models: Logistic Regression, Support Vector Machine, Random Forest, Multi-Layer Perceptron, and XGBoost. The classification task was addressed both as a binary problem, in which all pathological classes were aggregated, and as a multi-class problem, which distinguished among the pathological conditions. In both cases, multiple configurations of the control group were tested to assess their effect on model performance. Overall, the results can be considered satisfactory and consistent with previous findings in the literature; however, performance was strongly affected by the definition of the control group. In the scenarios unaffected by domain bias, the best performance was achieved by the Logistic Regression model, with macro-F1 scores of 0.76 (multi-class) and 0.72 (binary), indicating that linear relationships among gait features were sufficient for effective classification.
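As an illustration of the macro-F1 metric reported above, the following minimal sketch computes it for a multi-class labelling and for the binary labelling obtained by aggregating the pathological classes. The label names (`healthy`, `CP`, `DS`), the helper functions, and the toy predictions are assumptions made for this example only; they are not the study's data, code, or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def to_binary(labels, healthy="healthy"):
    """Aggregate every non-healthy class into a single 'pathological' class."""
    return [healthy if label == healthy else "pathological" for label in labels]


# Hypothetical toy labels (not taken from the thesis).
y_true = ["healthy", "healthy", "CP", "CP", "DS", "DS"]
y_pred = ["healthy", "CP", "CP", "CP", "DS", "healthy"]

multi = macro_f1(y_true, y_pred)
binary = macro_f1(to_binary(y_true), to_binary(y_pred))

print(f"multi-class macro-F1: {multi:.3f}")  # 0.656
print(f"binary macro-F1:      {binary:.3f}")  # 0.625
```

Because macro-F1 averages the per-class scores without weighting by class size, a small class counts as much as a large one, which is presumably why it suits a small, imbalanced clinical dataset.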
Despite the encouraging performance, the study has some limitations related to the small sample size and the quality of the available data. Future developments may involve expanding the dataset, incorporating higher-quality data, applying domain adaptation techniques, and implementing automated systems applicable in real clinical settings.

| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_12_Anchisi_Zanotto_Tesi.pdf | Thesis | 5.37 MB | Adobe PDF | Accessible online only to authorized users |
| 2025_15_Anchisi_Zanotto_Executive Summary.pdf | Executive Summary | 562.88 kB | Adobe PDF | Accessible online only to authorized users |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/247259