Data-driven and expert-knowledge approach for climate zoning
Abstract
Increasing climate variability demands robust methods for characterizing rainfall regimes, which are critical to food security and water management. In Cuba, advanced data mining techniques have rarely been applied for this purpose. This study proposes a framework for clustering precipitation time series that combines automated feature extraction (tsfresh) with expert-defined hydroclimatic indicators and then applies a hierarchical clustering algorithm (HCA). Validated with data from 17 stations in Ciego de Ávila (1960-2013), the methodology identified six homogeneous rainfall regimes with significantly different seasonal patterns and extreme-event intensities. This contribution enables more precise climate zoning than traditional statistical approaches. The work applies directly to agriculture: it facilitates differentiated cultivation strategies, optimized planting calendars, and improved climate risk management, thereby increasing the resilience of the Cuban agricultural sector.
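The pipeline summarized in the abstract (feature extraction, expert indicators, hierarchical clustering) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the study's implementation: the station data are synthetic, the four hand-written features stand in for the tsfresh feature set and the expert-defined indicators, the May-October wet season is an assumed definition, and the plain average-linkage routine is only one possible form of hierarchical clustering.

```python
import math
import statistics


def extract_features(monthly_mm):
    """Map one station's monthly rainfall series (mm) to a feature vector.

    Illustrative stand-ins only: mean and standard deviation play the role
    of automated statistical features; wet-season share and peak monthly
    total play the role of expert-defined hydroclimatic indicators.
    """
    total = sum(monthly_mm)
    # Assumed wet season: May-October (zero-based month index 4..9)
    wet = sum(v for i, v in enumerate(monthly_mm) if 4 <= i % 12 <= 9)
    return [
        statistics.fmean(monthly_mm),
        statistics.pstdev(monthly_mm),
        wet / total if total else 0.0,
        max(monthly_mm),
    ]


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs
        return statistics.fmean(
            [math.dist(points[i], points[j]) for i in a for j in b]
        )

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Synthetic monthly totals (mm) for four hypothetical stations, two years each
seasonal = [20, 25, 30, 60, 180, 220, 150, 170, 230, 190, 50, 25]
flat = [60, 55, 65, 70, 90, 95, 85, 90, 100, 95, 70, 60]
stations = {
    "A": seasonal * 2,
    "B": [v * 1.1 for v in seasonal] * 2,
    "C": flat * 2,
    "D": [v * 0.9 for v in flat] * 2,
}

features = standardize([extract_features(s) for s in stations.values()])
labels = agglomerative(features, n_clusters=2)
regimes = dict(zip(stations, labels))
print(regimes)
```

In this toy setup the two strongly seasonal stations and the two flatter stations end up in separate groups. With real data, the number of clusters would have to be chosen and validated rather than fixed in advance as it is here.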
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Those authors that have publications with this journal accept the following terms:
1. They retain their copyright and grant the journal the right of first publication of their work, which is simultaneously licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license allows third parties to share the work provided that its authorship and its first publication in this journal are indicated. Under this license, you are free to:
• Share — copy and redistribute the material in any medium or format
• Adapt — remix, transform, and build upon the material
• The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
• Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
• NonCommercial — You may not use the material for commercial purposes.
• No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
2. The authors may enter into other non-exclusive license agreements to distribute the published version of the work (e.g., deposit it in an institutional electronic archive or publish it in a monographic volume), provided that its initial publication in this journal is indicated.
3. The authors are permitted and encouraged to disseminate their work online (e.g., in institutional electronic archives or on their website) before and during the submission process, which can lead to productive exchanges and increase citations of the published work (see The Effect of Open Access).