Shifting from machine learning in genomic prediction to phenomic forecasting for the study of bovine daily milk yield with genotype data
VERGANI, ANDREA MARIO
2021/2022
Abstract
This thesis studies bovine daily milk yield through the application of machine learning, with the aim of enhancing the potential of genomics and herd management in the dairy sector. Specifically, our main goal is to assess the applicability of machine learning to genomic prediction (i.e., estimation of the genetic merit for a phenotype) and to phenomic forecasting (i.e., prediction of the actual trait value) using genotypic information (on 100k DNA markers) in a Holstein cattle herd of about 500 individuals. We also aim to show the potential of phenomic forecasting in this context by proposing a model based entirely on variables that are available to farms with milking robots (i.e., automatic milking systems) and that are at least partially controllable by breeders: in such a scenario, breeders' decisions could be supported to improve the sustainability of the sector without introducing additional costs for data collection. The results show that, when multiple phenotype records are available per individual, machine learning models cannot achieve significant performance improvements over state-of-the-art genomic prediction methods (i.e., linear models for animal breeding); indeed, the preprocessing applied in the literature, which computes the machine learning targets by correcting the traits for environmental effects, intrinsically biases the problem towards the linear model solutions. In the phenomic forecasting task, instead, we propose a problem formulation that explicitly takes advantage of genotypic information by including the milk yield genetic merit as a feature; our model can also consider additional variables, such as the number of milkings and the concentrate consumption inside the milking robot, which prove to have a strong impact on the prediction.
To the best of our knowledge, thanks to this novel formulation, our solution achieves state-of-the-art performance for individual daily milk yield forecasting. In conclusion, our work shows the potential of phenomic forecasting with genotype data, especially in scenarios with multiple phenotype records per individual, in which the application of machine learning to genomic prediction is particularly disadvantageous.

File | Size | Format | |
---|---|---|---|
Classical_Format_Thesis___Scuola_di_Ingegneria_Industriale_e_dell_Informazione___Politecnico_di_Milano.pdf (description: Thesis; not accessible) | 2.28 MB | Adobe PDF | |
Executive_Summary___Scuola_di_Ingegneria_Industriale_e_dell_Informazione___Politecnico_di_Milano.pdf (description: Executive summary; not accessible) | 459.72 kB | Adobe PDF | |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/195716