Building a data-driven culture in SMEs: development of a Python library for database automation and reporting
Mammani, Alberto
2024/2025
Abstract
Small and medium-sized enterprises (SMEs) often struggle to become truly data-driven: limited technical skills, tight budgets, and a heavy reliance on Excel make operations slow and hard to scale. This thesis presents a modular Python toolkit of 24 reusable functions that standardizes the construction of analysis-ready databases and the production of outputs for Excel/Power BI. The work stems from a real post-M&A integration context, where the need to consolidate heterogeneous KPIs highlighted hours wasted on repetitive, error-prone tasks. The approach is pragmatic and iterative: parametric scripts and hands-on checks support adoption by non-technical users. End-to-end examples show that, once the pipeline is set up, periodic refreshes of reports and databases become fast and standardized: fewer manual steps, fewer errors, greater consistency, and above all scalability, with the ability to reliably handle very large datasets (even millions of rows), going beyond the limits of purely spreadsheet-based tools. The project also clarifies the proper scope: Python is particularly effective for building durable, reusable databases, while Excel remains quicker for small, ad-hoc reports. Beyond operational efficiency, the goal is to promote a more data-driven culture in SMEs: transparent code, reusable components, and automated validations increase trust in results and free up time for analysis and decision-making. Challenges remain around upstream data quality and user adoption; future work includes direct ERP/database connectors, governance rules, and targeted training.
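To illustrate the kind of parametric, reusable function the abstract describes, here is a minimal sketch of a consolidation helper that merges heterogeneous exports into one analysis-ready table and runs automated validations before producing any output. The function name, column names, and mapping rules are illustrative assumptions for this page, not the thesis's actual API.

```python
import csv
import io

# Columns every consolidated record must carry (an illustrative assumption).
REQUIRED_COLUMNS = {"date", "entity", "kpi", "value"}

def consolidate(sources, column_map=None):
    """Merge several CSV exports (file-like objects) into one list of
    normalized row dicts. Columns are renamed via the parametric
    `column_map`, then each row is validated: all required columns must
    be present and every KPI value must be numeric (fail fast)."""
    column_map = column_map or {}
    rows = []
    for src in sources:
        for raw in csv.DictReader(src):
            # Normalize headers: apply the mapping, trim, lowercase.
            row = {column_map.get(k, k).strip().lower(): v
                   for k, v in raw.items()}
            missing = REQUIRED_COLUMNS - row.keys()
            if missing:
                raise ValueError(f"missing columns: {sorted(missing)}")
            row["value"] = float(row["value"])  # reject non-numeric KPIs
            rows.append(row)
    return rows

# Two "heterogeneous" exports: the second uses a legacy column name.
a = io.StringIO("date,entity,kpi,value\n2024-01,NewCo,revenue,120.5\n")
b = io.StringIO("date,entity,indicator,value\n2024-01,OldCo,revenue,80.0\n")
db = consolidate([a, b], column_map={"indicator": "kpi"})
total = sum(r["value"] for r in db)
```

The same consolidated `db` could then be written out for Excel or Power BI; the point of the sketch is the pattern the thesis advocates: one parametric function replaces a repetitive manual merge, and validation happens automatically on every refresh.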
| File | Size | Format | |
|---|---|---|---|
| 2025_10_Mammani.pdf (not accessible) | 2.86 MB | Adobe PDF | View/Open |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243333