BERTrans: An Implementation of a Neural Machine Translation Model Combining BERT and Transformer

Pre-training followed by fine-tuning is currently one of the most popular approaches in natural language processing (NLP), largely because BERT, introduced in 2018, achieved great success in natural language understanding. Since the introduction of BERT, most NLP tasks, classification tasks in particular, have seen better results than previous work. Inspired by this success, I implement a model that uses BERT to solve machine translation (MT) tasks. Within MT, neural machine translation (NMT) is now more widely used than statistical machine translation (SMT), because NMT outperforms SMT in almost all respects despite its comparatively short history. The dominant NMT model at present is the Transformer. As a sequence-to-sequence model, the Transformer is based on the encoder-decoder structure: the encoder produces a context vector from the input sequence, and the decoder produces the output sequence from that context vector. The biggest advantage of the Transformer over other models is its use of the attention mechanism, which makes it better at extracting features and supports batched parallel computation. BERT is itself a development of the Transformer, and it can produce a more powerful context vector than the Transformer encoder. Based on this idea, I implement an encoder-decoder model, called BERTrans, in which the encoder is a pre-trained BERT model and the decoder follows the Transformer decoder, and fine-tune it on MT tasks. In low-resource MT experiments, I trained BERTrans on part of the parallel corpus provided by the Workshop on Machine Translation (WMT) and obtained good BLEU scores compared with some previous works.
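
To make the encoder-decoder idea concrete, the following minimal sketch shows how a decoder position can build a context vector from encoder outputs using scaled dot-product attention. It is an illustration under stated assumptions, not the BERTrans implementation: toy vectors stand in for the hidden states that a pre-trained BERT encoder would produce, the names `cross_attention`, `memory`, and `queries` are hypothetical, and the learned projection matrices and multiple heads of a real Transformer decoder are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Scaled dot-product attention of decoder queries over encoder states.

    queries: list of decoder vectors, one per target position.
    memory:  list of encoder vectors, one per source token. In a BERT-based
             encoder these would be BERT's final hidden states (assumption).
    Returns (contexts, weights). Learned projections are omitted for brevity.
    """
    dim = len(memory[0])
    contexts, weights = [], []
    for q in queries:
        # Similarity of this target position to every source token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in memory
        ]
        w = softmax(scores)
        # Context vector: attention-weighted average of the encoder states.
        ctx = [sum(wj * k[i] for wj, k in zip(w, memory)) for i in range(dim)]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


# Toy values standing in for encoder outputs (3 source tokens, dimension 4).
memory = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Two decoder positions; the first is most similar to source token 0,
# the second to source token 2.
queries = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]

contexts, weights = cross_attention(queries, memory)
for w in weights:
    print([round(x, 3) for x in w])
```

Each row of `weights` sums to one, and each target position puts most of its weight on the source token it resembles most. Because every target position attends to all source tokens independently, the computation can be batched and run in parallel, which is the property the abstract attributes to the attention mechanism.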
