A method to characterize climate, Earth or environmental vector random processes

We propose a methodology to characterize a multivariate non-stationary vector random process 18 that can be used for simulating random realizations that keep the probabilistic behavior of the 19 original time series. The marginal probability distribution of each component process is as- 20 sumed to be a piecewise function deﬁned by several weighted parametric probability models. 21 The weights are obtained analytically by ensuring that the probability density function is well 22 deﬁned and that it is continuous at the common endpoints. The probability model is assumed to 23 vary periodically in time over a predeﬁned time period by deﬁning the model parameters and the 24 common endpoints as truncated generalized Fourier series. The coeﬃcients of the expansions 25 are obtained with the maximum likelihood method. Three diﬀerent types of sets of orthogonal 26 functions are tested. The method is applied to three time series with diﬀerent particularities. 27 Firstly, it is shown its good behavior to capture the highly variable freshwater discharges at a 28 dam located in a semiarid zone in Andalucía (Spain) which is inﬂuenced not only by the climate 29 variability but also by management decisions. Secondly, for the Wolf sunspot number time se- ries, the Schwabe cycle and time variations close to the 7.5 and 17 years are analyzed along a 31 22-year cycle. Finally, the method is applied to a bivariate (velocity and direction) wind time 32 series observed at a location of the Atlantic Ocean. For this case, the analysis, that was combined 33 with a vectorial autoregresive model, focus on the assessment of the goodness of the method- 34 ology to replicate the statistical features of the original series. In particular, it is found that it 35 reproducesthemarginalandjointdistributions, thewindrose, andthedurationofsojournsabove 36 given thresholds. 37


39
The long-term analysis of a natural phenomenon is usually done from observations of multivariate time series 40 whose statistical properties are representative of the conditions during regular time intervals known as states. For 41 meteorological and wave climate, the duration of a state usually ranges from several minutes to a few hours. Those   Baquerizo) In environmental sciences, there are many proposals for the simulation of time series that focus on the generation 59 of the values above a given threshold (known as storm conditions for climate variables) or the full time series. Some of 60 them treat the series as stationary while more recent approaches consider their non-stationarity. The earliest attempts 61 to reproduce stormy conditions in sea state wave climate analysis treated the occurrence of storms as Poisson events 62 with exponential interarrival times. Their persistence was usually obtained by means of the joint distribution of peaks 63 and durations and they used idealized storm shapes (e.g. ( wave energy in the storms using empirical orthogonal functions. 66 In the field of Geostatistics, a full theoretical framework for spatiotemporal processes has been developed (Chris- well as a method to validate the ability of models to capture statistical features like marginal distributions, covariance 80 functions and sojourns durations above/below a threshold level for analyze stormy/calm conditions for a given dataset.

81
In this work we propose a general procedure that is based on the research line initiated by Solari and Losada  The article is organized as follows. Section 2 presents the theoretical foundations of the methodology. Section 3 91 illustrates its application to three environmental time series with different particularities. In Section 3.1 is analyzed 92 the freshwater river discharge from a dam which watershed shows a strong seasonal variability and where rain events   99 We consider a vector random process, ⃗ = 1 ( ), ..., ( ), ..., ( ) , that can be multivariate or univariate (for

Fit of data to marginal NS distributions
We assume that each variable ( = 1, ..., ), from now on denoted by , is a continuous random variable whose probability density function ( ) can be expressed as a piecewise function where a finite number, , of weighted probability models (PMs) fit within a partition of the real axis into intervals: { ∶ = 1, ..., } where = −1 , for = 2, ..., − 1, 1 = (−∞, 1 ] and = ( −1 , +∞). That is: where denotes the probability density function of the model selected for . The function defined in eq. (1) is 109 required to be continuous at the common matching points of the intervals by imposing the following conditions: Also, in order to guarantee that eq. (1) is well defined, the parameters are required to fulfil the following condition: where denotes the corresponding probability distribution function.

112
The solution to eqs. Fourier series over the interval [0, ] which expression, truncated to terms, is: where are the coefficients of the best approach in the subspace spanned by a set of orthogonal functions, ( ) =1 . This set may be, among others, the set of eigenfunctions of a periodic or regular Sturm Liouville problem (SLP) with ordinary differential equation:

118
The orthogonality is interpreted in regards to the inner product < ( ), ( ) > = ∫ ( ) ( ) ( ]. It also includes the nomenclature used for the presentation of results in section 3. Table 1 Sets of basis expansion solutions (first column) that solves the differential eq. (6) with the functions (second column) and conditions (third column) given. The negative log-likelihood function (NLLF) is used as the objective function in the optimization algorithm. It 122 reads:

Orthogonal set of eigenfunctions Differential equation and domain Conditions imposed
where ⃗ is a vector of dimension that contains the Fourier coefficients of the expansion of the parameters and the 124 percentiles of the common matching points, and ( ) for = 1, ..., are the observations.

125
The optimization problem is defined as the search for values of ⃗ that minimize the NLLF. When necessary, the 126 optimization problem will be subject to conditions imposed on the sign of certain parameters of the distributions 127 involved. An approximation of the solution is found by means of the Sequential Least SQuares Programming (SLSQP)

128
(Von Stryk, 1993), and by using as initial solution a first guess of the values of the coefficients obtained from stationary 129 conditions and also a guess of the percentiles of the common endpoints of the intervals.

130
The resulting distributions where the parameters are those obtained from the optimization problem, are NS and, 131 therefore, hereinafter denoted by ( ( ); ) for each .   options that the differences can only be spotted in figure 1 for the upper tail (percentiles > 75%) which is also rather 160 well reproduced along the whole year. As it is observed, it is hard to visually select the best fit between these models, 161 however, it may be accounted for that the expansion with a sinusoidal basis needs a considerable smaller number of 162 parameters (51 vs 63 for a) and c) and 75 for d)). 163 The performance of the most representative options considered is analyzed in terms of the dimension of the opti- to the Gibbs phenomenon associated to the limitation of working with discrete, and therefore, non-smooth data.

237
The temporal description of the parameters (eq. 6) has been done in terms of SLPs. However, the expansion may This is the case of river discharges at dams that regulate rivers in semiarid zones where most of the time the flow is the 298 minimum ecological discharge. Those low values are exceptionally exceeded when intense and persistent precipitation 299 events occur and the dam releases for safety purposes. Those differences are also not evenly distributed along time 300 due to, for example, to strong seasonal and yearly climate variation. Under such circumstances, it is convenient to 301 transform the data into Gaussian distributed values using a -parameter Box-Cox transformation (Box and Cox, 1964).

302
Other power transformations can also be applied (Yeo and Johnson, 2000).

303
A.2. Temporal dependence 304 The Vector Auto-regressive, VAR( ) model is applied to the normalized series obtained from the observations as: where Φ −1 is the inverse of the Gaussian cumulative distribution function with zero mean and unit standard deviation