Introduction
Artificial Intelligence (AI) has transformed various sectors, including livestock farming (Parrado-Alvarez et al., 2019). This field depends largely on reproductive efficiency and on the sustainability of its production and reproductive systems. Over the last two decades, advanced algorithms, precision sensors, and data analysis platforms have made it possible to address long-standing challenges in livestock reproductive management through AI-based solutions (Hinestroza, 2018; Souza and de Oliveira, 2022). The most relevant processes include early estrus detection, genetic improvement, and pregnancy monitoring.
Cuba is promoting a process of digital transformation that includes the incorporation of AI into the country's priority sectors (Caballero et al., 2024). One example is livestock farming, where special attention is being paid to reproduction with the aim of increasing livestock productivity. The “El Guayabal” University Farm, belonging to the Agrarian University of Havana (UNAH), is a key site for several research projects, some of which incorporate AI technologies as part of this digital transformation. One of the farm's most important activities is artificial insemination of cattle, which is why estrus detection plays an important role in reproductive efficiency (Bekara and Bareille, 2019). Currently, estrus identification on the farm is a challenge because of staff shortages, so alternatives are being sought to facilitate the work of specialists.
Therefore, the objective of this research is to determine the most appropriate machine learning algorithm for estrus prediction in the cattle of the “El Guayabal” University Farm.
Development of the Topic
Artificial insemination in bovines and estrus detection
Artificial insemination emerged with the aim of improving animal reproduction, controlling diseases, and preserving genetic diversity (Hafez and Hafez, 2000; Foote, 2002; Thibier, 2005). Since then, it has become a key tool for increasing milk and meat production in cattle, in response to the growing global demand for food (Hoyos et al., 2023). There is now a need to introduce new technologies that optimize this process and make it more efficient.
The reproductive development of female cattle passes through the calf, heifer, and cow stages, during which the changes needed to reach sexual maturity take place. The first estrus usually appears in the heifer stage, at an age that varies between about 9 and 15 months (Montes de Oca, 2016). The estrous cycle lasts an average of 21 days and is divided into two phases, luteal and follicular, each comprising two stages (metestrus and diestrus in the luteal phase; proestrus and estrus in the follicular phase). Estrus, considered the beginning of the cycle and the period around which ovulation occurs, lasts only about 12 to 18 hours, and this brevity makes it difficult to detect (Carvajal et al., 2020).
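As a simple illustration of the cycle arithmetic above, the following Python sketch estimates when staff should watch for the next estrus after one has been observed. It assumes the 21-day average cycle cited in the text; the ±3-day tolerance window and the function name `expected_next_estrus` are hypothetical choices made for illustration, since cycle length varies between animals.

```python
from datetime import datetime, timedelta

# Average estrous cycle length cited in the text.
CYCLE_DAYS = 21
# Hypothetical tolerance (days) around the expected date; not taken from the source.
TOLERANCE_DAYS = 3


def expected_next_estrus(last_estrus: datetime,
                         cycle_days: int = CYCLE_DAYS,
                         tolerance_days: int = TOLERANCE_DAYS):
    """Return (earliest, expected, latest) datetimes for the next estrus."""
    expected = last_estrus + timedelta(days=cycle_days)
    window = timedelta(days=tolerance_days)
    return expected - window, expected, expected + window


if __name__ == "__main__":
    earliest, expected, latest = expected_next_estrus(datetime(2024, 3, 1, 8, 0))
    print(expected)  # 2024-03-22 08:00:00
    print(earliest, "to", latest)
```

Because estrus itself lasts only about 12 to 18 hours, a window like this only indicates when to intensify observation; it does not replace detection.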
During estrus, cows exhibit characteristic behaviors such as receptivity to mounting, restlessness, decreased milk production, genital licking, and reduced feed intake, together with physical changes such as vulvar edema or mucus discharge (Hernández and Ortega, 2009; Strappini et al., 2015; Ávila, 2024). However, factors such as observer inexperience, the environment, or stress can make identification difficult. Correct estrus detection brings significant benefits: higher calving rates, increased milk production, and lower artificial insemination costs (Strappini et al., 2015).
Methods and errors in estrus detection
Methods for detecting estrus in cattle are classified as visual, non-visual, electronic, and chemical (Ortiz and Avila, 2020). Visual methods include direct observation, mount-detection patches, rump-attached ampoules, marker crayons, and the use of bulls as detectors. Non-visual methods rely on physiological changes such as body temperature and hormonal activity, while electronic methods use pedometers, microchips, and surveillance cameras, often complemented by software that records and reports activity. Chemical methods include androgenization and hormone implants. Observation remains the most viable method because of its low cost and effectiveness, provided that the observer is trained and carries out frequent inspections (Hernández and Ortega, 2009).
Despite these technological advances, estrus detection still fails. According to Hernández and Ortega (2009) and Jiménez (2010), the main causes of error are anestrus (caused by poor nutrition, stress, reproductive diseases, genetics, or ovarian cysts), observer inexperience, silent estrus (in which some females show no visible signs), and the lack of monitoring of the estrous cycle and of the post-insemination period. To reduce these problems, technologies capable of identifying animal behavior have been incorporated, and the application of AI in bovine reproduction is being explored (Strappini et al., 2015), although its implementation still faces challenges.
Challenges, applications, and impact of Artificial Intelligence in bovine reproduction
Bovine reproduction is still a developing field for the incorporation of new technologies, and AI faces several challenges in this sector: resistance to change among livestock farmers (Álvarez, 2024), high implementation costs (Patel and Prajapati, 2018), the need for staff training, and the requirement for large volumes of data along with advanced storage and processing equipment. Despite these limitations, AI offers significant benefits when applied to processes such as genetic improvement, disease prediction and prevention, monitoring and pattern analysis for estrus detection, integration with electronic devices for real-time tracking, and optimization of artificial insemination (Chávez et al., 2024), contributing to greater accuracy and efficiency in livestock reproduction and to better decision-making.
AI applied to bovine reproduction offers multiple benefits to the livestock sector. It allows the optimal time for insemination to be identified more accurately, optimizes the selection of high-quality embryos, and enables continuous monitoring of herd health to detect problems early (González et al., 2018; Perdigón and González, 2021). In addition, process automation reduces human error and improves operational efficiency, while predictive analytics and assisted genetic selection increase herd productivity and sustainability. Together, these applications strengthen strategic decision-making, enhance animal welfare, and contribute to more profitable and efficient management of cattle production (Horrach et al., 2020).
Currently, bovine reproduction is focused on milk production, which makes it essential to promote AI techniques in this area to raise production levels (Perdigón and González, 2021). Incorporating AI into digital transformation and cattle reproduction seeks to meet the needs for progress and development by proposing strategies that exploit the advantages of these technologies to improve production and efficiency in livestock farming (Bekara and Bareille, 2019).
Among the AI techniques applicable to bovine reproduction, the following stand out: machine learning, which allows data to be processed and analyzed using different types of algorithms (Hinestroza, 2018); Bayesian networks, useful for decision-making under uncertainty (Rodríguez and Dolado, 2007); support vector machines, which optimize data classification (Resendiz, 2006); and decision trees, used in classification and regression tasks (Martí et al., 2022). According to Souza and de Oliveira (2022), these techniques can be applied to accurate estrus detection, animal health monitoring, genetic selection of embryos, optimization of artificial insemination, analysis of large volumes of genetic data, and improvement of reproductive efficiency.
Decision trees are a simple solution that offers robust results. Their advantages include easy interpretation of results, rapid translation into principles applicable to production, the ability to handle both categorical and numerical data, and the absence of prior assumptions about the distribution of the data or the form of the model (Taha and Mohsin, 2021). Furthermore, they require few computational resources, making them a quick and efficient option for moderately sized datasets (Bouza and Santiago, 2012).
Decision trees for regression
Decision trees for regression are non-parametric tools that make predictions by dividing the data into smaller segments based on specific characteristics. They are composed of decision nodes, which test a variable, and leaf nodes, which hold a category (in classification) or a value (in regression) (Ghiasi et al., 2020; Taha and Mohsin, 2021). These tools are notable for their accuracy in data analysis and process optimization (Barrientos et al., 2009; Martí et al., 2022). They can also be combined with other models to improve their accuracy (Kotsiantis, 2013), and they are built by grouping homogeneous data, which allows relationships between dependent and independent variables to be modeled (Kocarık and Deveci, 2020). However, they have disadvantages, such as a tendency to overfit when the tree is too deep and a computational cost that grows as the training sample increases (Taha and Mohsin, 2021).
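The node structure just described can be sketched in Python as follows. This is a minimal illustration, not the model used in this work: the feature names (`activity_ratio`, `temperature_c`), the thresholds, and the leaf values are hypothetical and chosen only to show how a prediction follows decision nodes down to a leaf.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    value: float  # value returned by the tree when this leaf is reached


@dataclass
class DecisionNode:
    feature: str      # variable tested at this node
    threshold: float  # records with x[feature] <= threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, DecisionNode]


def predict(node: Node, x: Dict[str, float]) -> float:
    """Follow decision nodes from the root until a leaf is reached."""
    while isinstance(node, DecisionNode):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Hypothetical hand-built tree; thresholds and leaf values are illustrative only.
tree = DecisionNode(
    feature="activity_ratio", threshold=1.5,
    left=Leaf(0.1),
    right=DecisionNode(feature="temperature_c", threshold=38.8,
                       left=Leaf(0.4), right=Leaf(0.9)),
)

print(predict(tree, {"activity_ratio": 2.0, "temperature_c": 39.1}))  # 0.9
```

In a trained CART model, the thresholds and leaf values are learned from the data rather than set by hand.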
Besides decision trees, other regression models include support vector regression, artificial neural networks, and logistic regression, each with different resource requirements and levels of accuracy (Perdigón and González, 2021; Shafiee et al., 2021; Olascoaga-Del Angel et al., 2022). Among decision-tree-based algorithms, CART, Random Forest, and XGBoost stand out. CART is valued for its simplicity and its ability to handle moderate amounts of data with high accuracy, while Random Forest and XGBoost, which combine many trees into an ensemble, are less efficient with moderate-sized datasets, although they offer greater robustness in more complex scenarios (Ejea, 2017). Based on this analysis, CART was selected because its characteristics make it well suited to the scenario described above.
CART method
CART (Classification and Regression Trees) is a supervised machine learning method used for both classification and regression. It is characterized by its flexibility: it learns from training sets and can reuse the same variable in different parts of the tree, which allows it to identify complex interdependencies between variables (Ghiasi et al., 2020). Its construction is based on splitting criteria that seek to minimize prediction error and produce homogeneous nodes, which facilitates data analysis.
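For regression, the splitting criterion mentioned above is usually written as a least-squares problem. The formulation below is the standard CART criterion, given as general background rather than as the exact configuration used in this study. At a node $t$, each candidate split on variable $j$ at threshold $c$ sends records with $x_{ij} \le c$ to the left child $t_L$ and the rest to the right child $t_R$; the chosen split minimizes the total squared error of the two children, and each leaf predicts the mean response of its records:

$$
(j^{*}, c^{*}) = \arg\min_{j,\,c} \left[ \sum_{i \in t_L(j,c)} \left(y_i - \bar{y}_{t_L}\right)^2 + \sum_{i \in t_R(j,c)} \left(y_i - \bar{y}_{t_R}\right)^2 \right]
$$

$$
t_L(j,c) = \{\, i \in t : x_{ij} \le c \,\}, \qquad t_R(j,c) = \{\, i \in t : x_{ij} > c \,\}, \qquad \bar{y}_{t} = \frac{1}{|t|} \sum_{i \in t} y_i
$$

For classification trees, CART commonly replaces the squared error with the Gini index $G(t) = 1 - \sum_k p_k^2$, where $p_k$ is the proportion of class $k$ in node $t$.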
Its main advantages include easy interpretation of results, the ability to handle categorical variables without prior encoding, and the capacity to identify nonlinear relationships and model complex patterns (Pérez, 2024). In addition, CART does not require large volumes of training data, making it an efficient, low-cost tool in terms of technological resources. Developing a CART-based model involves three fundamental processes: training, evaluation, and model adjustment.
Training a decision tree model begins with collecting and preparing well-structured data, which are then divided into two subsets: training and test (García, 2023). The training set, which should contain most of the records, is used to teach the model to identify relationships between variables, apply splitting criteria, and detect patterns that enable it to make accurate predictions. The test set, although smaller, must be representative of the whole dataset, since its function is to evaluate the model's ability to generalize and to verify the reliability of its results. The split can be random or deliberate, but it must always preserve the balance between the classes (Trujillano et al., 2008). A common practice is to assign approximately 80% of the data to training and 20% to testing, which gives the model enough records to learn from while retaining enough to validate it. This process is essential to avoid bias, reduce errors, and ensure that the model can be applied successfully to new scenarios as a reliable tool for prediction and data analysis.
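The 80/20 division with class balance described above corresponds to a stratified split. The sketch below implements it with the Python standard library only; the function name `stratified_split`, the fixed seed, and the `estrus`/`no_estrus` labels are assumptions for illustration. Libraries such as scikit-learn provide equivalent functionality (`train_test_split` with its `stratify` argument).

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split record indices into training and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random order within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])             # about 20% of each class to testing
        train.extend(indices[n_test:])            # remainder to training
    return sorted(train), sorted(test)


# Hypothetical herd records: 20 estrus observations and 80 non-estrus observations.
labels = ["estrus"] * 20 + ["no_estrus"] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

Splitting each class separately is what keeps the minority class (here, estrus records) represented in the test set in the same proportion as in the full dataset.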
During model evaluation, it is essential to monitor the errors in its performance and learning. A high training error indicates learning difficulties, which may be due to insufficient or noisy data, and reflects high bias. In contrast, a low training error suggests that the model has correctly captured the relationships in the data, although the test set must still be analyzed to detect possible overfitting (Benitez et al., 2018). Overfitting occurs when the model learns not only the patterns but also the noise in the training data, producing high variance and poor performance on new data. To avoid it, techniques such as cross-validation and regularization are used, which help build more robust models capable of generalizing adequately (Hernández, 2022).
Evaluating a regression model requires a thorough analysis of both the training and test sets, comparing performance metrics to measure its generalization ability (Pérez, 2024). Methods such as cross-validation divide the data into multiple folds and yield more accurate and reliable estimates. In addition, acceptance criteria based on error thresholds can be established to determine model accuracy and to distinguish correct from incorrect predictions (Vivaracho-Pascual et al., 2016). This evaluation stage guides hyperparameter tuning, which improves the quality of predictions and optimizes model performance.
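Comparing metrics and applying an error threshold can be sketched as below. The choice of mean absolute error (MAE) and root mean squared error (RMSE), the `tolerance` parameter, and the example values (observed versus predicted days until the next estrus) are hypothetical; the source does not specify which metrics or thresholds were used.

```python
import math
from typing import Dict, Sequence


def regression_report(y_true: Sequence[float], y_pred: Sequence[float],
                      tolerance: float) -> Dict[str, float]:
    """Compute MAE, RMSE and the share of predictions within an error tolerance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # a prediction counts as correct when its absolute error is within the tolerance
        "hit_rate": sum(abs(e) <= tolerance for e in errors) / n,
    }


# Hypothetical observed vs. predicted days until the next estrus.
report = regression_report([21, 20, 22, 19], [20, 20, 25, 19], tolerance=1.0)
print(report)  # mae 1.0, hit_rate 0.75
```

Running the same report on the training and test sets and comparing the results is a simple way to spot overfitting: a training error much lower than the test error suggests the tree has memorized noise.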
Decision trees can suffer from problems such as overfitting, especially when the classes in the data are unbalanced (Ghiasi et al., 2020). To avoid this, the model can be adjusted through strategies such as hyperparameter tuning, pruning, and cross-validation. The most common hyperparameters include the maximum tree depth, the splitting criterion, and the minimum number of samples in a node, all of which can be modified during model development to improve performance and generalization ability (Hernández, 2022).
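To show how these hyperparameters constrain tree growth, the following is a minimal, unoptimized CART-style regression tree in pure Python. It is an illustrative sketch, not the implementation used in this work: it uses the least-squares splitting criterion, tries observed values as thresholds, and exposes `max_depth`, `min_samples_split`, and `min_samples_leaf`, names that mirror the parameters of scikit-learn's `DecisionTreeRegressor`, although this code is not that library. The toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# A tree is either a leaf value (float) or a tuple (feature, threshold, left, right).


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean (node impurity for regression)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float],
               min_samples_leaf: int) -> Optional[Tuple[float, int, float]]:
    """Return (children_sse, feature, threshold) of the best allowed split, or None."""
    best = None
    for j in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold.
        for c in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue  # split would create a leaf that is too small
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, c)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_samples_leaf=1):
    """Grow a regression tree recursively, stopping according to the hyperparameters."""
    leaf_value = sum(y) / len(y)  # a leaf predicts the mean response of its records
    if depth >= max_depth or len(y) < min_samples_split:
        return leaf_value
    split = best_split(X, y, min_samples_leaf)
    if split is None or split[0] >= sse(y):
        return leaf_value  # no admissible split reduces the error
    _, j, c = split
    go_left = [row[j] <= c for row in X]
    params = dict(depth=depth + 1, max_depth=max_depth,
                  min_samples_split=min_samples_split, min_samples_leaf=min_samples_leaf)
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [v for v, g in zip(y, go_left) if g], **params)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [v for v, g in zip(y, go_left) if not g], **params)
    return (j, c, left, right)


def predict(tree, x: List[float]) -> float:
    """Descend from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, c, left, right = tree
        tree = left if x[j] <= c else right
    return tree


# Hypothetical one-feature data.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(build_tree(X, y, max_depth=2))          # (0, 3.0, 0.0, 1.0)
print(build_tree(X, y, min_samples_leaf=4))   # 0.5: no split leaves 4+ records per side
```

Raising `min_samples_leaf` or lowering `max_depth` yields smaller trees that fit the training data less closely, which is the trade-off these hyperparameters are tuned to balance.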
Pérez (2024) considers pruning a key technique for reducing tree complexity and preventing the model from capturing unnecessary noise. Its variants include cost-complexity pruning, which seeks to balance simplicity and accuracy (Ejea, 2017); depth-based pruning, which limits the maximum depth of the tree (McTavish et al., 2022); pruning by minimum number of samples per leaf, which ensures more reliable predictions; and pruning by minimum number of samples required to split a node, which avoids splits based on insufficient data (Tong et al., 2022). All these techniques contribute to more robust, interpretable, and efficient models.
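Of the pruning variants listed, cost-complexity (weakest-link) pruning has a compact standard formulation, reproduced here as general background rather than as a setting reported in this study. For a subtree $T$ with $|\tilde{T}|$ terminal nodes and training error $R(T)$ (for regression, the sum of squared errors over its leaves), the penalized cost for a complexity parameter $\alpha \ge 0$ is:

$$
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|
$$

For an internal node $t$ with branch $T_t$, collapsing the branch into a single leaf lowers the penalized cost once $\alpha$ exceeds

$$
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
$$

so the node with the smallest $g(t)$ is pruned first, and the value of $\alpha$ (and hence the final tree) is usually chosen by cross-validation.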
Cross-validation is an essential method for evaluating the performance of decision trees. It consists of dividing the data into multiple subsets, or folds, and repeatedly training the model on all folds but one and testing it on the remaining fold. This yields a more accurate and reliable estimate of the model's generalization ability and helps ensure that the final model is robust and performs well in different application scenarios (Ochoa, 2019).
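The k-fold procedure can be sketched with the Python standard library as follows. The helper names, the use of mean absolute error as the fold score, and the `fit`/`predict` callables are hypothetical; any regression tree implementation could be plugged in. A trivial mean-value model is used in the example only to keep the block self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle record indices once and return k (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


def cross_validate(X: Sequence, y: Sequence[float], fit: Callable, predict: Callable,
                   k: int = 5, seed: int = 0) -> List[float]:
    """Train on k-1 folds, test on the held-out fold, and return each fold's MAE."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors = [abs(predict(model, X[i]) - y[i]) for i in test]
        scores.append(sum(errors) / len(errors))
    return scores


def fit_mean(X, y):
    """Hypothetical baseline model: remember the mean training response."""
    return sum(y) / len(y)


def predict_mean(model, x):
    """The baseline predicts the same mean for every record."""
    return model


X = [[float(i)] for i in range(10)]
y = [float(i % 3) for i in range(10)]
scores = cross_validate(X, y, fit_mean, predict_mean, k=5)
print(len(scores), sum(scores) / len(scores))  # number of folds and average MAE
```

Every record is used for testing exactly once, so the average of the fold scores estimates generalization error more reliably than a single train/test split on a small dataset.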
Conclusions
The analysis of the CART, Random Forest, and XGBoost techniques identified CART as the most suitable alternative for predicting estrus under the conditions of the “El Guayabal” University Farm, as it offers interpretability, ease of implementation, and low computational requirements. Random Forest is an alternative option with better performance, although it is more complex. XGBoost, despite its high accuracy, requires resources and technical knowledge that exceed the institution's current capabilities.