Watersheds act as systems of processes and responses where surface runoff is the most obvious consequence, being responsible for catastrophes and natural disasters, and numerous damages mainly related with floods (Triviño & Ortiz, 2004). The comprehension of the dynamics of this system requires the development of models, through which it is possible to have an approximation to its physical reality, as well as a prediction and a forecasting of the process.
Hydrological models, understood as an approximation to the real functioning of water cycle in a watershed, can be classified as: physical and abstract. On the other hand, abstract models, according to the randomness of the variables used, can be stochastic or deterministic (Triviño & Ortiz, 2004).
Currently, stochastic models are widely used in various applications (Liang y Zhuang, 2014; Niezgoda et al., 2014; Sun et al., 2014; Dell' Amico et al., 2015). Among them, linear stochastic modeling or time series is widely used, especially in several studies of river basins (Fry et al., 2013; Sang, 2013; Wu & Chau, 2013).
In Cuba, there is a significant tendency of using deterministic models in basin-scale processes (Rodríguez et al., 2010; Estrada & Pacheco, 2012; Rodríguez & Marrero, 2015). As a consequence, the stochastic dimension in modeling has been less explored. Therefore, the objective of this article is to forecast in advance the runoff in the subwatershed ¨V Aniversario¨ belonging to Pinar del Río Province, at annual and monthly scales, including the randomness of the process. To accomplish this task, linear stochastic modeling will be used, implementing the White Noise (WN) and moving average autoregressive family (ARMA) models in the R software.
The study focuses on the “V Aniversario” subwatershed, which is one of the natural hydrographic closures of Cuyaguateje River basin, located in Pinar del Río Province. This subwatershed extends over part of Viñales and Minas de Matahambre Municipalities, with an extension of 157 km2 and is located at 22º24’6”N - 22º35’40”N and 83º47’55”O - 83º56’9”O. Figure 1 shows the geography of the study area. Its latitudinal position favors the development of a volume of runoff, which is higher than other karstic regions in the country. In particular, given its geomorphological drainage (soil) and coverage characteristics, it presents a rapid response in form of floods (Consejo Territorial de Cuencas Hidrográficas (CTCH) de Pinar del Río, 2000).
Discharge series was used to calibrate and validate the physically-based model. These variables are monitored at the subwatershed’s outlet (“V Aniversario” hydrometric station) by the traditional methods, using a current meter and the water level is also monitored continuously with a limnimeter. The data available for this study were daily mean discharge rates in the period from 1971 to 1990 (Alonso, 2016). It is interesting to notice that, in this period, the hydrometric station of “V Aniversario” subwatershed has recorded an annual runoff average of 3,72 m3/s, an observed maximum of 453 m3/s and the monthly mean varies between 0,7 and 8,9 m3/s (Consejo Territorial de Cuencas Hidrográficas (CTCH) de Pinar del Río, 2000). In this study, the daily mean values were approximately converted into annual and monthly mean values. In the case of the annual mean values, the conversion was made finding the mean of all the daily mean values in each year. Figure 2a shows the average daily runoff, as well as the empirical cumulative density function (Figure 2b).
Linear stochastic modeling of runoff in rivers results in a hydrological response that is predicted from the previous steps. White Noise and ARMA models (Metcalfe y Cowpertwait, 2009): autoregressive (AR(p)), moving average (MA(q)) and its variants allow obtaining additional information about time series, such as seasonality. Due to the importance of these models to arrive to the results in this article, they will be briefly discussed below.
White Noise (WN). A residual error is the difference between the observed value and the model predicted value at time t. If we suppose the model is defined for the variable y t and ŷ t is the value predicted by the model, the residual error x t is
As the residual errors occur in time, they form a time series: x 1 ,x 2 ,…,x n .
Autoregressive Model AR(p). The series {x t } is an autoregressive process of order p, abbreviated to AR(p), if:
Eq. 2 can be expressed as a polynomial of order p in terms of the backward shift operator (B):
The current value of the series xt is a linear combination of the most recent past p values of itself plus an “innovation” term wt that incorporates everything new in the series at time t, that is not explained by the past values (Cryer y Chan, 2010).
Moving Average Model MA(q). A moving average (MA) process of order q is a linear combination of the current White Noise term and the most recent past q terms of White Noise. This is defined by
Eq. 4 can be rewritten in terms of the backward shift operator B and ϕq polynomial of order q as:
Because MA processes consist of a finite sum of stationary White Noise terms, they are stationary and hence have a time-invariant mean and autocovariance.
Autoregressive Moving Angle Expression. A time series {xt} follows an autoregressive moving average (ARMA) process of order (p, q), denoted ARMA(p, q), when
Eq. 6 may be represented in terms of the backward shift operator B and rearranged in the more concise polynomial form:
For arriving to favorable results, it is necessary to make choices based on statistical criteria and tests. Then, Akaike´s Information Criterion (AIC) of Shumway and Stoffer (2011)and the Mann-Kendall trend test according to Wilks (2011), are described. They are fundamental in this article.
Akaike's Information Criterion (AIC). This criterion is given by the following definition:
σ̂ k 2 is given by:
SSE k denotes the residual sum of squares under the model with k regression coefficients.
The value of k that yields the minimum AIC species is the best model. The idea is roughly minimizing σ̂ k 2, which would be a reasonable objective, except that it decreases monotonically as k increases. Therefore, it is necessary to penalize the error variance by a term proportional to the number of parameters. The choice for the penalty term, in this case, is given by Eq. 8.
Mann-Kendall Trend Test. This test is a popular nonparametric alternative for testing for the presence of a trend, or for a non-stationary central tendency of a time series. In the context of examining the possibility of trend underlying a time series x i , where i is the time index and it takes values i=1,…,n, the trend is, by definition, monotonically increasing, which simplifies the calculations.
The test statistic for the Mann-Kendall trend is
where
For moderate (n about 10) or larger series lengths, the sampling distribution of the test statistic in Eq. 10 is approximately Gaussian, and if the null hypothesis (no trend) is true, this Gaussian null distribution will have zero mean. The variance of this distribution depends on whether all the x’s are distinct, or if some are repeated values.
The representation of the annual runoff series of the "V Anniversary" subwatershed is shown in Figure 3a. In this series also appears the non-parametric Mann-Kendall test results for significance of the slope (pval). In Figure 3b, the normality graph is observed.
In the plotted time series, it can be seen that there is not a significant trend. In the Mann-Kendall test, the null hypothesis (H0) corresponds to no trend, which is corroborated by the result of the pval (much higher than 0.05). In addition, the distribution of the series, assumed with random variables, is very close to normal, with small deviations in the tail. In general, the series can be considered as stationary.
Table 1 summarizes the basic descriptive statistics for the analyzed series at annual scale. According to the skewness (1.41) that is greater than zero, it is determined that the series is right skewed, typical pattern of the hydrological series.
In Figure 4, the autocorrelation (ACF) and partial autocorrelation (PACF) functions of the series are shown. It can be concluded from these graphs that this series behaves like a pure White Noise, because there is not a significant temporal correlation between years, that is, none of the orders (lag) has a correlation value that exceeds the dashed lines (significance test), with values of approximately ±0,4. Only the zero order, in Figure 4a, has a correlation value of one, but this order means null phase shift, so it is concluded that there is not temporal memory in the system that can be exploited. Therefore, the process, at this scale, is explained by the random component.
Statistics | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Annual | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Var | Std | Cv | Skew |
"V Aniversario“ | 1,58 | 2,33 | 3,06 | 3,26 | 3,89 | 6,98 | 1,55 | 1,24 | 0,38 | 1,41 |
The result shown by the previous graphs (Figure 4), can be corroborated by Figure 5, where the Akaike's Information Criterion (AIC) is represented for an AR(p) y MA(q), respectively. In the case of the AR(p) (Figure 5a), the only significant value is obtained for the zero order, so it is meaningless to use this modeling structure. However, for the MA(q) in order three (Figure 5b), it seems to be an indication of a possible explanation of the variability of the series. This variability can be associated to the interannual variability of the precipitations and, therefore, of the runoff as a response of them. Specifically, the effects of ENOS (El Niño South Oscillation), an atmospheric phenomenon that influences the climatic conditions of the Caribbean, provoke cycles between three and four years, as average. The results in the AIC maybe reflect this context. However, there are not enough elements to conclude that, the amount of values analyzed seems to be still insufficient for detecting a correlation in this phenomenon that is cyclical, but erratic. On the other hand, Akaike penalizes the models by the number of parameters, which in this case is three, and, as it can be observed in Figure 5b, the MA(3) model does not have better AIC in comparison with the MA(0) or White Noise. In addition, the results of this criterion for the order three are corroborated by the ACF and PACF graphics, where it is demonstrated, for this order, that there is not practically correlation. Therefore, it is considered that the best model that represents the behavior of this series is the White Noise, as it had already been expressed previously. However, this is not in correspondence with what Aviles et al. (2016) obtained, where several models were tested at annual scale and according to the AIC, the best model that was adjusted to their data series was the ARMA (1,1). On the other hand, according to Díaz and Guevara (2016), the appropriate model for describing the annual average flows in watersheds of Santa river, Peru is the AR (1).
Table 2 shows the observed runoff statistic and White Noise generations. In this case, the model reproduces the statistical measurements very well. Figure 6, known as “spaghetti” graph, represents the 100 series that were generated by the White Noise, where the observed data series appears in red. Therefore, the synthetic and observed realizations, which emerge from the same process, can be visually compared.
Mean | Std | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Mean | Var | Std | Cv | Skew | Mean | Var | Std | Cv | Skew | |
Data | 3,26 | 1,55 | 1,24 | 0,38 | 1,41 | - | - | - | - | - |
WN | 3,27 | 1,50 | 1,21 | 2,77 | -0,04 | 0,28 | 0,45 | 0,18 | 0,55 | 0,46 |
Due to the limited temporal memory that was found in the annual runoff series, it was decided to explore the monthly scale. Figure 7 shows monthly runoff series of “V Aniversario” watershed. In addition, observed values and the decomposed elements of the time series: trend, seasonality and randomness, are represented in Figure 8a. The main component to take into account in this series is seasonality. From Figure 8b, it is evident that data are distributed approximately normal, only with small deviations in the tails, as it happened at annual scale.
Another similarity between this series and the series at annual scale, is that both have a p value higher than 0.05, then it can be proved that there is not trend in the series, in the period studied, according to the Mann-Kendall Trend Test (Figure 9).
Table 3 summarizes the fundamental descriptive statistic for analyzing the time series at monthly scale, being this series right skewed, in accordance with the annual. On the other hand, Figure 10 shows the autocorrelation (ACF) and partial autocorrelation (PACF) functions. As it is expected, memory or persistence is much higher in shorter periods in runoff observations, which is in correspondence with the study carried out by Valipour et al. (2013), at this same scale. Therefore, the significant linear correlation, with different steps back in time, is very superior in the monthly measurements compared with the annual ones, despite the influence of the sample size on the confidence limits. The seasonality of the monthly series can be easily recognized in these graphs. The linear correlation coefficient, either positive or negative, alternates every 6 months.
Statistics | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Seasonal | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Var | Std | Cv | Skew |
V Aniversario | 0,05 | 0,60 | 1,81 | 3,29 | 4,61 | 27,78 | 16,10 | 4,01 | 1,22 | 2,40 |
From the previous graphs, the seasonality of the monthly runoff series is corroborated. That is why an appropriate model for the series would be a seasonal autoregressive integrated moving average (SARIMA). SARIMA has six parameters: p, d, q, P, D y Q, then it is more complex than ARMA models.
For the mean annual runoff series of the “V Aniversario” subwatershed, it was corroborated, through the results of the ACF and PACF graphs, that there is not a significant temporal correlation between the years. Therefore, this series, at this scale, does not present good temporal memory and is explained, mostly, by a high randomness. The Akaike Information Criterion (AIC) showed that the MA(3) could explain part of the variability in runoff, giving indications of possible cycles every 3 or 4 years in the series. However, after having demonstrated, from the graphs of ACF and PACF, that in order 3 there is practically no correlation and, on the other hand, that the value of AIC is not inferior compared to the White Noise, it was concluded that the latest model was better suited to the series. This conclusion was corroborated from the similarities found between the statistic of the observed series and from the generated series with the White Noise model, highlighting the coincidence in mean, variance and standard deviation.
Finally, for the monthly runoff series, based on the ACF and PACF graphs, better memory or persistence and higher linear correlation were found in comparison with the annual series. Similarly, the seasonal component was recognized, because the linear correlation coefficient is both positive and negative and alternates every 6 months. Therefore, this series must be modeled by a SARIMA