Predicting residential house values using a machine learning approach

Real estate is the world’s largest asset class, and it is constantly growing. In this context, the ever-increasing amount of available data in . Countless hours of work go into traditional property valuations, because they are essential for more than just valuing a property. Yeldo is a startup that provides professional investors with direct access to real estate investments. The investment process undergoes rigorous quality control. The idea of developing a machine learning algorithm to automatically estimate the value of real estate assets led to a collaboration with Politecnico di Milano that produced this master's thesis project. In the first part of this work, we describe and analyze the scraping algorithm used to collect as much information as possible about a generic house on the market, ranging from structural data about the house itself, such as its size and number of rooms, to qualitative information about the surrounding area, such as the presence and quality of points of interest. This information is then assembled into the final dataset. Next, we perform several statistical analyses on the resulting dataset and apply techniques to clean it, derive new features from the original ones and, in some cases, remove features that are not useful. Finally, we implement and train several models that predict the price of the houses in our dataset. As expected, we find that data on the presence of points of interest around the house carry great weight in estimating the value of the asset. The best models achieve predictions with a median percentage error of around 9%.
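Model quality is summarized here as a median percentage error. The following is a minimal sketch of how such a metric can be computed, assuming it is the median of the absolute (unsigned) percentage errors between predicted and true prices; the abstract does not say whether the error is signed, and the function name `median_absolute_percentage_error` is hypothetical, not taken from the thesis code.

```python
from statistics import median


def median_absolute_percentage_error(y_true, y_pred):
    """Median of |y_pred - y_true| / |y_true|, expressed as a percentage.

    Assumption: the metric is the absolute (unsigned) percentage error,
    and no true price is zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Multiply before dividing so round figures stay exact in floating point.
    errors = [abs(p - t) * 100 / abs(t) for t, p in zip(y_true, y_pred)]
    return median(errors)


if __name__ == "__main__":
    # Three hypothetical houses; the third prediction is badly off.
    true_prices = [100, 100, 100]
    predicted = [110, 95, 300]
    print(median_absolute_percentage_error(true_prices, predicted))  # 10.0
```

With these illustrative numbers the per-house errors are 10%, 5% and 200%, so the median is 10% while the mean would be about 71.7%. Using the median keeps a handful of badly mispriced listings from dominating the summary figure.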
