Knowledge graphs in question difficulty estimation

Question Difficulty Estimation (QDE), also known as question calibration, holds significant importance in the field of education. It enables the estimation of students' knowledge level, commonly referred to as skill, based on the accuracy of their responses to exam questions and the difficulty level of those questions. Accurately assessing the difficulty of questions can further aid in providing students with exercises that align with their skill level. There are two traditional methods for question calibration: manual calibration and pretesting. In manual calibration, one or more experts in the subject assign a numerical value to each question, representing its difficulty, but this process is inherently subjective. Pretesting involves administering questions to students in an actual test scenario and estimating their difficulty based on the correctness of the students' answers. This process introduces a significant delay between the creation of a question and its use for student scoring. Recent research has addressed this issue by leveraging Natural Language Processing (NLP) techniques to estimate question difficulty from the question text alone. The underlying idea is to reduce or eliminate the reliance on manual calibration and pretesting by estimating question difficulty directly from the text, which is readily available at the time of question creation. However, these models rely solely on textual information and do not leverage the potential of Knowledge Graphs (KGs), which can model relationships between various knowledge entities. In the context of QDE, KGs can be used to model the topics of the questions themselves, providing valuable additional information that can enhance the performance of existing transformer-based models. In this work, we explore various methods to embed information coming from Knowledge Graphs and incorporate it into transformer-based text models, and we propose two models that outperform previous approaches.
Specifically, we incorporate topic information from the leaf node of the KG in two ways: by using Node2Vec (a graph embedding algorithm) or by averaging the difficulty of the training questions that belong to the leaf-node topic. We then pass this information to a model built on fine-tuned BERT through Embedding Concatenation (Node2Vec) or Stacking (AVG). Both KG-based models outperform previous text-only models, reducing MAE by up to 8% and confirming our intuition that KGs are effective in addition to text for QDE. Finally, we leverage the information from the KG to conduct an in-depth study of how well models generalize to unseen topics, along with experiments on the optimal concept of difficulty (one for the entire dataset vs. one for each topic).
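
As a rough illustration of the AVG topic feature, the following minimal sketch computes the mean difficulty of the training questions for each leaf topic and evaluates predictions with MAE. The function names, the `(leaf_topic, difficulty)` input layout, and the fallback to the global training mean for topics absent from the training set are assumptions made for this example, not details taken from this work.

```python
from collections import defaultdict
from statistics import mean


def topic_average_difficulty(train_items):
    """Return (per-topic mean difficulty, global mean difficulty).

    train_items: sequence of (leaf_topic, difficulty) pairs drawn from the
    training split only, so that no test label leaks into the feature.
    """
    train_items = list(train_items)
    by_topic = defaultdict(list)
    for topic, difficulty in train_items:
        by_topic[topic].append(difficulty)
    topic_means = {topic: mean(values) for topic, values in by_topic.items()}
    global_mean = mean(difficulty for _, difficulty in train_items)
    return topic_means, global_mean


def avg_feature(topic, topic_means, global_mean):
    """AVG feature for one question; unseen topics fall back to the
    global training mean (an assumption of this sketch)."""
    return topic_means.get(topic, global_mean)


def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute gap between true and predicted difficulty."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    train = [("algebra", 1.0), ("algebra", 3.0), ("geometry", 4.0)]
    topic_means, global_mean = topic_average_difficulty(train)
    print(avg_feature("algebra", topic_means, global_mean))
    print(avg_feature("calculus", topic_means, global_mean))
```

In a stacking setup, a value like this would be one input to a second-level regressor alongside the prediction of the text model.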

Question Difficulty Estimation (QDE), or question calibration, is fundamental in education because it allows students' knowledge level (skill) to be assessed from the accuracy of their answers to exam questions and from the difficulty of those questions. An accurate assessment of question difficulty helps in offering exercises suited to students' skill level. There are two traditional methods for question calibration: manual calibration and pretesting. In manual calibration, subject experts assign each question a value indicating its difficulty, which introduces subjectivity. Pretesting administers questions to students and determines their difficulty from the correctness of the answers. However, this entails a considerable delay between the creation of the questions and their use. Recent research has used Natural Language Processing (NLP) techniques to estimate question difficulty from textual information alone, thereby eliminating the need for manual calibration and pretesting. However, these models rely exclusively on textual information and do not exploit Knowledge Graphs (KGs), which can model relationships between different entities. The intuition for QDE is that KGs can be used to model the topics of the questions, providing valuable additional information that could improve the performance of existing models. In this work, we explored various embedding methods for KG information and how to integrate them into text models. In particular, we incorporated the KG information in two ways: using Node2Vec (a graph embedding algorithm) and using the average difficulty of the training questions of the leaf node.
We then combined this output with a text-based model that uses BERT, through Embedding Concatenation (Node2Vec) or Stacking (AVG). Both models outperformed previous text-only models, achieving a decrease in MAE of up to 8% and confirming the intuition about the effectiveness of KGs for QDE. Finally, using the information from the KG, we conducted an in-depth study of the models' ability to generalize to new topics, along with experiments on the optimal concept of difficulty (one for the entire dataset vs. one for each topic).
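
To make Embedding Concatenation concrete, here is a minimal sketch under stated assumptions: a text vector (for instance, one produced by a BERT encoder) and a topic-node vector (for instance, one produced by Node2Vec) are joined into a single feature vector and scored by a linear regression head. The helper names, the toy dimensions, and the choice of a linear head are hypothetical; this does not reproduce the architecture used in this work.

```python
def concatenate_embeddings(text_vec, node_vec):
    """Join the text embedding and the KG node embedding into one vector."""
    return list(text_vec) + list(node_vec)


def linear_head(features, weights, bias):
    """Map the concatenated features to a scalar difficulty estimate."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


if __name__ == "__main__":
    # Hypothetical toy vectors; real embeddings have far more dimensions.
    text_vec = [0.1, 0.2]   # stands in for a text embedding
    node_vec = [0.3]        # stands in for a topic-node embedding
    features = concatenate_embeddings(text_vec, node_vec)
    print(linear_head(features, weights=[1.0, 1.0, 1.0], bias=0.5))
```

The point of the concatenation is that the regression head sees the question text and its topic together, so it can learn topic-specific shifts in difficulty that the text alone may not reveal.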