Benchmarking and modeling of multi-threaded multi-core processors

This master's thesis investigates a specific multithreading technique, Simultaneous Multithreading (SMT). All experiments are conducted with Intel's proprietary implementation, officially called Intel Hyper-Threading Technology (HTT), which is a two-way SMT. SMT aims to improve the utilization of processor resources by exploiting thread-level parallelism at the core level, which increases overall system throughput. Intel uses this technology widely in its latest series of processors. To understand the performance impact of HTT, we performed an extensive set of benchmark runs, totaling around 90 hours, under different configurations of the system under test (SUT). Two benchmark suites with different features and measurement approaches were used: the SPECpower_ssj2008 benchmark and a synthetic benchmark that stresses both the ALUs and the memory hierarchy. The experiments show that SMT, and HTT in particular, can substantially improve processor utilization, increase overall system throughput (by up to 33%), and decrease system response time. Modeling SMT in queueing networks, however, is still an open problem. We therefore propose two queueing network models that adequately predict the performance impact of the technology. The first is based on a birth-death Markov chain; the second is based on a queueing network with a Finite Capacity Region (FCR). We validated both models on the datasets obtained from the benchmark runs and observed good accuracy, with estimation errors between 3% and 10%. Finally, we compared our models extensively with the state of the art.

This thesis considers a particular multithreading technique, Simultaneous Multithreading (SMT). The experiments were carried out on Intel's proprietary implementation, officially called Intel Hyper-Threading Technology (HTT), which is essentially a two-way SMT. Simultaneous Multithreading aims to improve the utilization of processor resources by exploiting thread-level parallelism, which in turn increases system throughput. Intel uses this technology in its most recent processor series. A large number of benchmark runs, about 90 hours in total, were performed to assess the impact of HTT under different configurations of the System Under Test (SUT). Two benchmark suites with different characteristics and measurement approaches were used for this purpose: SPECpower_ssj2008 and a synthetic benchmark that stresses both the ALUs and the memory hierarchy. We showed experimentally that SMT, and HTT in particular, can improve processor utilization, dramatically increase system throughput (by up to 33%), and lower system response time. Modeling SMT in queueing networks, however, is still an open problem. We therefore propose two queueing models able to predict the performance impact of the technology, and we evaluate their accuracy on the datasets obtained from the benchmark runs. The first model is based on a birth-death Markov chain, and the second is based on queueing networks with a Finite Capacity Region (FCR). Both models achieve good accuracy, with an estimation error between 3% and 10%. Finally, we carried out an exhaustive comparison with the state of the art.
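
To make the idea of a birth-death Markov chain model more concrete, the following Python sketch computes the steady-state distribution and throughput of a single two-way SMT core. It is only an illustration under stated assumptions and is not the model proposed in the thesis: the function names, the Poisson arrival rate, the finite capacity, and the `smt_speedup` parameter (the ratio between the aggregate service rate with both hardware threads busy and the single-thread service rate) are all hypothetical.

```python
def birth_death_steady_state(birth_rates, death_rates):
    """Steady-state probabilities of a finite birth-death chain.

    birth_rates[n] is the rate of the transition n -> n+1 and
    death_rates[n] is the rate of the transition n+1 -> n, for n = 0..K-1.
    """
    if len(birth_rates) != len(death_rates):
        raise ValueError("birth_rates and death_rates must have equal length")
    # Unnormalized weights from the balance equations:
    # pi[n+1] * death[n] = pi[n] * birth[n]
    weights = [1.0]
    for birth, death in zip(birth_rates, death_rates):
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    return [w / total for w in weights]


def smt_core_throughput(arrival_rate, single_thread_rate, smt_speedup, capacity):
    """Throughput of one two-way SMT core (illustrative assumptions only).

    With one job in the system the core serves at single_thread_rate; with
    two or more jobs both hardware threads are busy and the aggregate rate
    is assumed to be smt_speedup * single_thread_rate.
    """
    births = [arrival_rate] * capacity
    deaths = [
        single_thread_rate if n == 1 else smt_speedup * single_thread_rate
        for n in range(1, capacity + 1)
    ]
    pi = birth_death_steady_state(births, deaths)
    # Throughput is the state-dependent service rate averaged over busy states.
    throughput = sum(p * d for p, d in zip(pi[1:], deaths))
    return pi, throughput


if __name__ == "__main__":
    # smt_speedup = 1.0 reduces the model to a plain M/M/1/K queue.
    _, x_off = smt_core_throughput(1.0, 1.0, 1.0, capacity=4)
    _, x_on = smt_core_throughput(1.0, 1.0, 1.33, capacity=4)
    print(f"throughput without SMT gain: {x_off:.4f}")
    print(f"throughput with assumed SMT gain: {x_on:.4f}")
```

The value 1.33 in the usage example is chosen only to echo the maximum throughput gain reported above; it is not a fitted parameter. In a sketch like this, the state-dependent death rate is the single place where the effect of SMT enters the chain.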