This thesis investigates efficient compression of per-cell Key Performance Indicator (KPI) time series in 4G/5G networks while preserving the information relevant to monitoring and analytics. Using a real hourly per-cell dataset spanning seven days, it evaluates lossy and lossless pipelines composed of standard building blocks: scalar quantisation (Uniform and Lloyd-Max) after per-cell max normalisation; Differential Pulse-Code Modulation (DPCM) on INTRA-DAY and INTER-DAY residuals (open- and closed-loop) with residual quantisation; entropy coding of indices (Huffman, and RLE/Zero-Run-Length (ZRL) with Golomb-Rice); and general-purpose lossless codecs (gzip/DEFLATE, bzip2/BWT, xz/LZMA). Rate-distortion and entropic metrics (MSE, RMSE, Pearson correlation, SNR, entropy 𝐻, average Huffman code length 𝐿, and coding efficiency 𝐻/𝐿) are reported under consistent macro/micro aggregation. Results follow canonical scalar-quantisation behaviour: SNR increases by ≈6 dB per additional bit and MSE decreases nearly linearly on semilog axes. At low bit budgets (2-4 bits), Lloyd-Max achieves lower distortion than Uniform at the same rate; at 8-16 bits the gap becomes negligible (quasi-lossless). Entropically, differences between Uniform and Lloyd-Max are noticeable mainly at 2 bits (Uniform indices are slightly more compressible with Huffman); from 4 bits upward, 𝐻 and 𝐿 converge. DPCM further reduces residual entropy and increases sparsity, which makes ZRL+Golomb-Rice coding effective in selected highly sparse, low-rate configurations. Among lossless codecs, xz attains the highest compression ratios, gzip is the fastest, and bzip2 offers an intermediate trade-off. The thesis concludes with practical guidance on where lossless preservation is mandatory and where controlled error is acceptable, together with a reproducible methodology adaptable to different KPI sets and sampling regimes.
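The ≈6 dB-per-bit behaviour mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the synthetic uniformly distributed data, the mid-rise quantiser, the power-based SNR definition, and the names `uniform_quantise`, `snr_db` and `gain_per_bit` are all assumptions made for illustration.

```python
import math
import random

def uniform_quantise(x, bits):
    """Mid-rise uniform quantiser on [0, 1]; returns (indices, reconstruction)."""
    levels = 2 ** bits
    step = 1.0 / levels
    # Clamp so that the value 1.0 falls into the last cell.
    idx = [min(int(v / step), levels - 1) for v in x]
    rec = [(i + 0.5) * step for i in idx]
    return idx, rec

def snr_db(x, rec):
    """SNR in dB as signal power over quantisation-error power (assumed definition)."""
    signal = sum(v * v for v in x) / len(x)
    noise = sum((a - b) ** 2 for a, b in zip(x, rec)) / len(x)
    return 10.0 * math.log10(signal / noise)

# Synthetic stand-in for one cell's KPI series (hypothetical values, not real data).
random.seed(0)
raw = [random.uniform(0.0, 500.0) for _ in range(20000)]

# Per-cell max normalisation maps the series into [0, 1].
peak = max(raw)
x = [v / peak for v in raw]

snr = {b: snr_db(x, uniform_quantise(x, b)[1]) for b in (2, 4, 8)}
gain_per_bit = (snr[8] - snr[2]) / 6  # average SNR gain per added bit

for b, value in snr.items():
    print(f"{b} bits: SNR = {value:.2f} dB")
print(f"average gain per bit: {gain_per_bit:.2f} dB")
```

For uniformly distributed input the quantisation-error power shrinks by a factor of four per added bit, which is where the roughly 6.02 dB slope comes from; real KPI series with other distributions may deviate from it at low bit budgets.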
This thesis analyses the efficient compression of per-cell KPI time series in 4G/5G networks while preserving the information useful for monitoring and analytics. On a real seven-day dataset at hourly resolution, it evaluates lossy and lossless pipelines based on: scalar quantisation (Uniform and Lloyd-Max) after per-cell max normalisation; DPCM on INTRA-DAY and INTER-DAY residuals (open- and closed-loop) with residual quantisation; entropy coding of the indices (Huffman, RLE/Zero-Run-Length with Golomb-Rice); and general-purpose lossless codecs (gzip/DEFLATE, bzip2/BWT, xz/LZMA). Rate-distortion and entropy metrics (MSE, RMSE, Pearson correlation, SNR, entropy 𝐻, average Huffman length 𝐿, and efficiency 𝐻/𝐿) are reported with consistent macro/micro aggregation. The results follow the canonical behaviour of scalar quantisation: SNR grows by ≈6 dB per added bit and MSE decreases almost linearly on a semilog scale. At few bits (2-4), Lloyd-Max reduces distortion relative to Uniform at equal bitrate; at 8-16 bits the gap becomes negligible (quasi-lossless). In entropy terms, differences between Uniform and Lloyd-Max emerge mainly at 2 bits (Uniform indices are slightly more compressible with Huffman); from 4 bits upward, 𝐻 and 𝐿 converge. DPCM further reduces the entropy of the residuals and increases their sparsity, making ZRL+Golomb-Rice coding effective only in specific 2-bit scenarios and for very sparse streams. Among the lossless codecs, xz reaches the highest compression ratios, gzip the best times, and bzip2 an intermediate trade-off. The thesis concludes with practical guidance on where to keep compression lossless and where a controlled error is acceptable, and with a reproducible methodology adaptable to different KPI sets and sampling regimes.
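The entropic metrics named in the abstract (entropy 𝐻, average Huffman length 𝐿, efficiency 𝐻/𝐿) can be computed as in the following minimal sketch. The toy index stream and the helper names `entropy_bits`, `huffman_lengths` and `average_length` are hypothetical and chosen for illustration; the thesis may compute and aggregate these quantities differently.

```python
import heapq
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy H in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def huffman_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single-symbol alphabet still needs one bit per symbol.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries are never compared.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def average_length(symbols):
    """Average Huffman code length L in bits per symbol."""
    counts = Counter(symbols)
    lengths = huffman_lengths(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)

# Hypothetical stream of 2-bit quantiser indices (not data from the thesis).
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2

H = entropy_bits(indices)
L = average_length(indices)
efficiency = H / L
print(f"H = {H:.3f} bits, L = {L:.3f} bits, H/L = {efficiency:.3f}")
```

In this toy stream all symbol probabilities are powers of 1/2, so the Huffman code meets the entropy bound exactly and 𝐻/𝐿 equals 1; for general index distributions 𝐿 exceeds 𝐻 and the efficiency drops below 1.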
Compression of 4G mobile network KPIs: a comparative study on a real-world dataset
Forte, Carlo
2024/2025
| File | Access | Size | Format |
|---|---|---|---|
| Carlo_Forte-10492575-Tesi_Magistrale_2025-11-12.pdf | Openly accessible online | 5.41 MB | Adobe PDF |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/246365