Predictions on bike sharing use based on the weather conditions

This Master's thesis presents the development of machine learning algorithms aimed at forecasting the use of a bike-sharing service on the basis of weather conditions. These services, which are increasingly important in urban mobility, require a variety of supporting activities to be provided reliably to citizens: better forecasts make it possible to improve the system's overall efficiency. The machine learning algorithms investigate the complex relations that exist between weather conditions, the calendar, and the number of bikes rented. The algorithms developed in this thesis belong to the family of supervised learning techniques: they learn to predict the desired quantity by analysing a large number of historical examples, all contained in the available database, each consisting of the actual value to be predicted and the set of attributes on which the estimate must be based. In this case, the database consists of two years of records of the number of bikes rented in London, together with the weather conditions that characterized the city in that period. The algorithms were developed and run in the Python programming language, whose libraries provide useful tools both in the early phases of data analysis and in the building of the predictive models. Among the various algorithms analysed, those that provided the best results were the random forest and adaptive boosting, both based on regression trees, together with the neural network.

This Master's thesis presents the development of machine learning algorithms intended to forecast the demand for a bike-sharing service on the basis of the day's weather conditions. These services, increasingly important in urban mobility, require various supporting activities in order to be delivered reliably to citizens: better forecasts make it possible to increase the overall efficiency of the system. Machine learning algorithms investigate the complex relations that exist between weather conditions, the calendar, and the number of bikes rented. Those developed in this thesis belong to the group of supervised learning techniques: the algorithms learn to predict the desired quantity by analysing a very large number of past examples, contained in the database used, each made up of the actual value of the quantity to be predicted and the set of attributes on which the estimate must be based. In this case, the database comprises two years of data on the number of bikes rented in London, as well as the weather conditions that characterized the city in that period. The algorithms were developed using the Python programming language and its many libraries, which provided useful tools both in the early phases of data analysis and in the construction of the forecasting models. Among the various algorithms, those that provided the best results were the random forest and adaptive boosting, both based on regression trees, as well as the neural network.
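
The supervised setup described above (historical examples made of attributes and a known target, used to fit tree-based regressors) can be illustrated with a minimal sketch. It is not the thesis's implementation: the data are synthetic, the attributes (`temp`, `weekend`) and all function names are hypothetical, and the model is a plain bagging of small regression trees written with the Python standard library only. A full random forest would also subsample the attributes considered at each split, and in practice a library implementation would normally be used instead.

```python
import random

def make_records(n, rng):
    """Synthetic (attributes, target) examples; NOT the London dataset."""
    records = []
    for _ in range(n):
        temp = rng.uniform(0.0, 30.0)        # hypothetical temperature in Celsius
        weekend = rng.choice([0, 1])         # hypothetical calendar attribute
        rentals = 100 + 20 * temp - 150 * weekend + rng.gauss(0, 30)
        records.append(([temp, weekend], rentals))
    return records

def avg(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(records, depth, min_size=5):
    """Grow a regression tree with greedy splits that minimise the SSE."""
    targets = [y for _, y in records]
    if depth == 0 or len(records) < 2 * min_size:
        return avg(targets)                  # leaf: mean target of the node
    best = None
    for feature in range(len(records[0][0])):
        for threshold in sorted({x[feature] for x, _ in records}):
            left = [y for x, y in records if x[feature] <= threshold]
            right = [y for x, y in records if x[feature] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:
        return avg(targets)                  # no admissible split found
    _, feature, threshold = best
    left = [r for r in records if r[0][feature] <= threshold]
    right = [r for r in records if r[0][feature] > threshold]
    return (feature, threshold,
            build_tree(left, depth - 1, min_size),
            build_tree(right, depth - 1, min_size))

def predict_tree(tree, x):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def build_forest(records, n_trees, depth, rng):
    """Bagging: fit each tree on a bootstrap resample of the training set."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(records) for _ in records]
        forest.append(build_tree(sample, depth))
    return forest

def predict_forest(forest, x):
    """Average the predictions of the individual trees."""
    return avg([predict_tree(tree, x) for tree in forest])

rng = random.Random(0)
data = make_records(200, rng)
train, test = data[:150], data[150:]

forest = build_forest(train, n_trees=10, depth=3, rng=rng)

# Compare against a baseline that always predicts the training mean.
train_mean = avg([y for _, y in train])
forest_mae = avg([abs(predict_forest(forest, x) - y) for x, y in test])
baseline_mae = avg([abs(train_mean - y) for _, y in test])
print(f"forest MAE: {forest_mae:.1f}  baseline MAE: {baseline_mae:.1f}")
```

Comparing the ensemble with a constant baseline on held-out examples is one simple way to check that the attributes carry predictive information; the error metric and the split used here are illustrative choices, not those reported in the thesis.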