Validation of the CHIRPS gauge-satellite rainfall dataset over Nicaragua, 2011-2021

Rainfall is a key input to many numerical weather and climate models, so a sufficiently dense monitoring network for this variable is essential. In recent decades, satellite-based rainfall products have emerged as an alternative to more expensive gauge stations. However, such products must be properly validated against gauge data before use. This study validates the CHIRPS dataset against gauge data from 17 stations across Nicaragua. The performance of the product was assessed at different temporal scales (daily, pentadal, monthly and annual) using six quantitative error metrics: Percent Bias (PBIAS), Mean Error (ME), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson's r and Nash-Sutcliffe Efficiency (NSE). Three categorical indices were also assessed at the daily time scale: Probability of Detection (POD), False Alarm Ratio (FAR) and Critical Success Index (CSI). The results show that CHIRPS performs better at monthly and annual time scales, while it does not adequately represent daily variability.


INTRODUCTION
Rainfall is a key component of the water cycle and one of the most important variables related to atmospheric circulation (Bras, 1996; Kidd and Huffman, 2011). It is therefore a primary input to climate and weather numerical models. Sufficiently long records of this variable are needed for a wide range of studies in water-related disciplines, such as water resources management, extreme event analysis and climate change (Simpson et al., 2017; Tabari, 2020).
The conventional way to obtain such records is to install rain gauge stations that measure at different time scales. Although gauges are considered the most accurate source of rainfall records (Paredes-Trejo et al., 2021; Urrea et al., 2016), the reliability of gauge-based data is limited by the spatial coverage that the number and location of the stations provide (Sun et al., 2018). In developing countries in particular, budget limitations make it even harder to maintain adequate weather monitoring services, which in turn affects these countries' capacity to manage natural resources and related risks (Strigaro et al., 2019).
In recent decades, satellite-based monitoring missions have emerged as an alternative, offering much better spatiotemporal resolution at a lower cost. Some of them are purely satellite-derived products, e.g. the Integrated Multi-satellite Retrievals for GPM dataset, commonly known as IMERG (NASA, n.d.). Others are gauge-satellite products, which combine ground station data with satellite-derived estimates. The latter category includes the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) dataset, which provides information from 1981 to the present (Funk et al., 2015).
While these satellite-based products usually offer better spatiotemporal resolution than gauge stations, they are prone to biases and systematic errors (Paredes-Trejo et al., 2021). They must therefore be validated against gauge data before being used in applications. A first attempt to validate the CHIRPS dataset for Nicaragua was made by Castaño et al. (2022), who compared data from around 100 gauge stations against CHIRPS for the period 1981-2010. However, their study did not evaluate the product's performance at pentadal and seasonal time scales. This study validates the CHIRPS dataset against gauge stations for the period 2011-2021 at daily, pentadal, seasonal, monthly and annual time scales, using six quantitative error metrics and three categorical indices.

Study area
Nicaragua is located between 11° and 15° N latitude and 83° and 88° W longitude. The main meteorological systems affecting the country include the Intertropical Convergence Zone (ITCZ), tropical cyclones, the El Niño-Southern Oscillation (ENSO), tropical waves, convective cells, troughs, sea breezes and mountain waves (INETER, n.d.). Average annual rainfall ranges from 800 mm in parts of the Pacific region to 5000 mm in the Atlantic region. The country has two well-defined seasons: the wet season, from May to October, and the dry season, from November to April. According to the Köppen classification, the country has 11 climatic subtypes (zones) (INETER, n.d.).
The rain gauge stations selected for the study are distributed across the country, as depicted in Figure 1 (see also Table 1). Gauge density is much lower in the Caribbean region than in the Pacific region. Although many other stations exist in the country, only 17 met the criterion of having a continuous record over the study period.

Gauge data preparation
The rain gauge dataset is managed by the Instituto Nacional de Estudios Territoriales (INETER). As a first step, the dataset was manually quality-checked. This quality control included missing-data imputation as well as normality and homogeneity testing. Since the only variable available at all stations was 24-hour accumulated rainfall, pentadal, monthly and annual totals were obtained by summing daily rainfall over the corresponding period.
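As an illustration of this aggregation step, the sketch below sums daily totals into pentadal, monthly and annual totals. It assumes the CHIRPS pentad convention (six pentads per month, the sixth running from day 26 to the end of the month, 72 per year) so that gauge pentads line up with CHIRPS pentads; the function names are hypothetical and the daily input is assumed to be already gap-filled.

```python
from collections import defaultdict
from datetime import date


def pentad_of(d: date) -> int:
    """Pentad number (1-72), assuming six pentads per month with the
    sixth pentad running from day 26 to the last day of the month."""
    within_month = min((d.day - 1) // 5, 5)  # 0..5
    return (d.month - 1) * 6 + within_month + 1


def aggregate(daily: dict[date, float]) -> tuple[dict, dict, dict]:
    """Sum daily 24-h rainfall (mm) into pentadal, monthly and annual totals."""
    pentadal = defaultdict(float)  # key: (year, pentad)
    monthly = defaultdict(float)   # key: (year, month)
    annual = defaultdict(float)    # key: year
    for d, rain in daily.items():
        pentadal[(d.year, pentad_of(d))] += rain
        monthly[(d.year, d.month)] += rain
        annual[d.year] += rain
    return dict(pentadal), dict(monthly), dict(annual)
```

With this convention, 31 January falls in pentad 6 and 31 December in pentad 72.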
Following World Meteorological Organization (WMO) recommendations, missing data were imputed only when a month had fewer than 10 missing non-consecutive days or fewer than 5 consecutive missing days (World Meteorological Organization, 2017). The imputed values were used to calculate pentadal, monthly and annual totals, but were excluded from the daily time series. Deletion is a valid strategy for handling missing values: it shortens the sample but avoids the additional uncertainty associated with using estimated values (Longman et al., 2020).
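The monthly gap criterion can be expressed as a simple check. This is a hypothetical sketch: it adopts one conservative reading of the criterion, requiring both fewer than 10 missing days in total and no run of 5 or more consecutive missing days, and the function names are illustrative only.

```python
def longest_missing_run(missing: list[bool]) -> int:
    """Length of the longest run of consecutive missing days."""
    run = longest = 0
    for is_missing in missing:
        run = run + 1 if is_missing else 0
        longest = max(longest, run)
    return longest


def month_can_be_imputed(missing: list[bool],
                         max_total: int = 10,
                         max_consecutive: int = 5) -> bool:
    """Hypothetical monthly gap check (conservative reading, both conditions).

    missing[i] is True when day i + 1 of the month has no gauge value.
    Imputation is allowed only if fewer than max_total days are missing
    and no run of max_consecutive or more consecutive days is missing.
    """
    total_missing = sum(missing)
    return (total_missing < max_total
            and longest_missing_run(missing) < max_consecutive)
```

A month with six isolated gaps passes, while a month with a single five-day gap does not.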
After gap filling, normality was examined with the Shapiro-Wilk test. The nonparametric Mann-Kendall trend test, given in (1) and (2), was used to verify the homogeneity of the sample. The null hypothesis (H0) of the Mann-Kendall test is that there is no monotonic trend in the series; the alternative hypothesis (H1) is that the data follow a monotonic trend over time.
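For reference, the Mann-Kendall statistic is written below in its standard textbook form, which we assume corresponds to equations (1) to (3). The tie correction in Var(S), where g is the number of tied groups and t_p the number of values in group p, and the standardized statistic Z are common additions that may differ from the authors' exact formulation.

```latex
S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k)

\operatorname{sgn}(x_j - x_k) =
\begin{cases}
  +1 & \text{if } x_j - x_k > 0 \\
   0 & \text{if } x_j - x_k = 0 \\
  -1 & \text{if } x_j - x_k < 0
\end{cases}

\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{p=1}^{g} t_p (t_p - 1)(2t_p + 5)}{18}

Z =
\begin{cases}
  \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S > 0 \\
  0 & \text{if } S = 0 \\
  \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S < 0
\end{cases}
```

At the 5% significance level, H0 is rejected when |Z| > 1.96.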
Here, n is the sample size, j > k, k = 1, 2, ..., n - 1, and j = 2, 3, ..., n. For n > 8, the distribution of S is approximately normal, and the variance of S, Var(S), is calculated as in (3).
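A minimal, standard-library sketch of the Mann-Kendall test is given below, assuming the textbook tie-corrected variance and continuity-corrected Z statistic; the function name is hypothetical and the code is not the authors' implementation.

```python
import math
from collections import Counter


def mann_kendall(x: list[float]) -> tuple[int, float, float, float]:
    """Return (S, Var(S), Z, two-sided p-value) for the Mann-Kendall test."""
    n = len(x)
    # S: sum of the signs of all differences x_j - x_k with j > k
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)
    # Variance of S, corrected for groups of tied values
    tie_sizes = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    # Continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, var_s, z, p
```

For a strictly increasing series of 10 values, S = 45 and Var(S) = 125, so H0 (no trend) is rejected at the 5% level.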
The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS V2.0) dataset was downloaded from https://app.climateengine.com/climateEngine. This gauge-satellite dataset provides data from 1981 to the present at a spatial resolution of 0.05° × 0.05° (about 5.55 km × 5.55 km), with quasi-global coverage (50° S to 50° N) and daily, pentadal and monthly temporal resolution (Funk et al., 2015). For each gauge station, the station coordinates were used to extract the CHIRPS time series of the pixel containing it. The temporal scales examined were daily, pentadal (including a seasonal segmentation), monthly and annual.
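The point-to-pixel matching can be sketched as locating the 0.05° grid cell that contains each station. The grid layout below (longitudes -180 to 180, latitudes -50 to 50, indices counted from the south-west corner) is an assumption for illustration; actual files may store rows north to south, and the function names are hypothetical.

```python
import math

# Assumed quasi-global CHIRPS grid: 0.05° cells, indices from the SW corner.
RES = 0.05
LON0, LAT0 = -180.0, -50.0
N_COLS, N_ROWS = 7200, 2000


def pixel_index(lat: float, lon: float) -> tuple[int, int]:
    """Return (row, col) of the grid cell containing a station."""
    row = math.floor((lat - LAT0) / RES)
    col = math.floor((lon - LON0) / RES)
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError("station outside the assumed CHIRPS domain")
    return row, col


def pixel_center(row: int, col: int) -> tuple[float, float]:
    """Return (lat, lon) of the centre of a grid cell."""
    return LAT0 + (row + 0.5) * RES, LON0 + (col + 0.5) * RES
```

For a hypothetical station near Managua at 12.13° N, 86.27° W, this returns the cell centred at 12.125° N, 86.275° W, i.e. within half a cell of the gauge.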
The validation was performed point-to-pixel, computing statistical metrics between each gauge series and the corresponding CHIRPS pixel series at each of the temporal scales studied. The six quantitative error metrics used are described next.
Percent Bias (PBIAS): with an optimum value of 0, it measures the average tendency of the simulated values (Si) to be larger (PBIAS > 0) or smaller (PBIAS < 0) than the corresponding observed values (Oi). N is the total number of observations available.

$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{N} (S_i - O_i)}{\sum_{i=1}^{N} O_i}$$
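A minimal sketch of PBIAS, together with the mean error (ME) defined next, is shown below. Here `sim` holds the CHIRPS pixel values (Si) and `obs` the gauge values (Oi) for the same periods; the function names are illustrative.

```python
def pbias(sim: list[float], obs: list[float]) -> float:
    """Percent bias: 100 * sum(S_i - O_i) / sum(O_i); > 0 means overestimation."""
    if len(sim) != len(obs):
        raise ValueError("series must have the same length")
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def mean_error(sim: list[float], obs: list[float]) -> float:
    """Mean error: average of S_i - O_i, in the units of the data (mm)."""
    if len(sim) != len(obs):
        raise ValueError("series must have the same length")
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)
```

For example, with simulated totals (12, 0, 30) mm and observed totals (10, 2, 24) mm, the differences sum to 6 mm, giving PBIAS = 100 × 6/36 ≈ 16.7% and ME = 2 mm.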
Mean Error (ME): with an optimum value of 0, it is the average difference between the simulated values and the corresponding observed (true) values, $\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}(S_i - O_i)$.

In addition, the detection ability of the CHIRPS dataset was evaluated using three categorical indices (Table 2): the Probability of Detection (POD), the False Alarm Ratio (FAR) and the Critical Success Index (CSI).
The POD is the fraction of observed rainfall events that were correctly detected by the satellite product (here, CHIRPS).
The FAR is the fraction of events detected by the satellite product that were not observed at the gauge, and the CSI is an overall measure of the satellite product's ability to detect real rainfall events. In Table 2, A is the number of rainfall events detected by both the satellite product and the gauge station (hits), B the number detected only by the satellite product (false alarms) and C the number detected by the gauge station but missed by the satellite product (misses), so that POD = A/(A + C), FAR = B/(A + B) and CSI = A/(A + B + C). These categorical indices were calculated at the daily time scale, using the 3.5 mm/d rain/no-rain threshold suggested by Funk et al. (2015).
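The contingency counts and the three indices can be sketched as follows. The sketch assumes that a day counts as rainy when its total is greater than or equal to the 3.5 mm threshold, and returns NaN when an index is undefined; both choices, and the function names, are our assumptions.

```python
def contingency(sat: list[float], gauge: list[float],
                threshold: float = 3.5) -> tuple[int, int, int]:
    """Count hits (A), false alarms (B) and misses (C) for daily rainfall (mm)."""
    hits = false_alarms = misses = 0
    for s, g in zip(sat, gauge):
        sat_rain = s >= threshold    # assumed: threshold itself counts as rain
        gauge_rain = g >= threshold
        if sat_rain and gauge_rain:
            hits += 1
        elif sat_rain:
            false_alarms += 1
        elif gauge_rain:
            misses += 1
    return hits, false_alarms, misses


def categorical_scores(sat: list[float], gauge: list[float],
                       threshold: float = 3.5) -> tuple[float, float, float]:
    """Return (POD, FAR, CSI); NaN when a denominator is zero."""
    a, b, c = contingency(sat, gauge, threshold)
    pod = a / (a + c) if a + c else float("nan")
    far = b / (a + b) if a + b else float("nan")
    csi = a / (a + b + c) if a + b + c else float("nan")
    return pod, far, csi
```

Days that are dry in both series (correct negatives) do not enter any of the three indices.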

Pentadal and Seasonal validation
The pentadal time scale is presented first because it is obtained directly from the statistical blending procedure, while the remaining time scales are derived from the pentadal rainfall (Funk et al., 2015). Figure 2 compares pentadal rainfall from CHIRPS and from the gauges. The boxes and whiskers look very similar for both datasets, but differences are evident in the outliers (possibly associated with pentads of extreme events). A few stations, namely Stations 4, 6, 9 and 10, show notably different box sizes, i.e., CHIRPS fails to adequately represent their variability. These stations are all located in the northern part of the Pacific coast, at low elevations (< 70 m).

Boxplot of pentadal rainfall by station.
Figure 3 shows the spatial distribution of the quantitative error metrics at the pentadal scale. For PBIAS and ME, a diverging palette was used: the light color in the middle corresponds to the desired value of 0, and positive/negative extremes are emphasized with dark cold/warm colors. At the pentadal time scale, CHIRPS tends to overestimate rainfall in mountainous areas and to underestimate it in low-lying coastal areas.
For the remaining quantitative metrics, a sequential palette was used, so that light colors correspond to the lowest values and the darkest cold colors to the highest. Errors tend to be higher at the two gauge stations on the Caribbean coast (Figure 3; Table 1, IDs 2 and 14). This area, with a tropical monsoon climate (Köppen Am), receives the highest annual rainfall in the country (above 4000 mm per year) (INETER, 2005). Pearson's r ranges from 0.54 to 0.73, indicating a moderate positive linear correlation between the CHIRPS dataset and the gauge data. Regarding the NSE, Duc and Sawada (2023) suggest using NSE = 0 as the threshold between acceptable and unacceptable simulated values. The NSE ranges from 0.19 to 0.53, so all stations fall in the acceptable performance category.

Seasonal validation
The pentadal data were divided into two subsets per station, one for each of the country's two well-defined seasons. The wet season was defined as May to October, and the remaining months as the dry season. The same criterion was applied to all gauge stations, even though some parts of the country typically have a longer wet season. CHIRPS performed better during the dry season (Figure 4) than during the wet season (Figure 5) and than in the overall pentadal results (Figure 3).
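The seasonal split can be sketched as below, assuming pentads are keyed as (year, pentad) with 72 pentads per year and six per month, so that pentads 25 to 60 fall between May and October. The function names are hypothetical.

```python
def month_of_pentad(pentad: int) -> int:
    """Month (1-12) of a pentad numbered 1-72, six pentads per month."""
    return (pentad - 1) // 6 + 1


def split_by_season(series: dict[tuple[int, int], float]) -> tuple[dict, dict]:
    """Split a {(year, pentad): rainfall} series into (wet, dry) subsets,
    using a fixed May-October wet season for every station."""
    wet, dry = {}, {}
    for (year, pentad), value in series.items():
        target = wet if 5 <= month_of_pentad(pentad) <= 10 else dry
        target[(year, pentad)] = value
    return wet, dry
```

Applying the same calendar split to both the gauge and the CHIRPS series keeps the paired samples aligned.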

Daily validation
Table 3 shows the resulting error metrics per station at the daily time scale. Correlation drops sharply compared with the pentadal scale: the CHIRPS dataset has only a weak positive linear correlation with the gauge data, with Pearson's r ranging from 0.25 to 0.42. Again, the largest errors occur at Stations 2 and 14.
According to the categorical indices, the best POD was obtained at Station 2 (POD = 0.64), followed by Station 14 (POD = 0.59). Overall, both POD and CSI are relatively low, with median values of 0.46 and 0.35, respectively. The FAR had a median value of 0.41, with Stations 7 and 14 performing worst on this metric.

The remaining quantitative error metrics are defined as follows.

Mean Absolute Error (MAE): with an optimum value of 0, it ranges from 0 to ∞ and measures the average magnitude of the errors in a set of simulated values without considering their direction, $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert S_i - O_i \rvert$.

Root Mean Square Error (RMSE): with an optimum value of 0, it ranges from 0 to ∞ and is the square root of the mean of the squared prediction errors (Si - Oi), $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(S_i - O_i)^2}$.

Pearson correlation coefficient (Pearson's r): the most common measure of linear correlation, with values ranging from -1 to 1. It measures both the strength and the direction of the relationship between the simulated time series and the corresponding observed one; r = 1 indicates a perfect positive correlation.

Nash-Sutcliffe Efficiency (NSE): indicates how well the plot of observed versus simulated data fits the 1:1 (45°) line, $\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}(O_i - S_i)^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2}$, where $\bar{O}$ is the mean of the observed values. Its values range from -∞ to 1, with a value of 1 indicating a perfect fit.
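A standard-library sketch of MAE, RMSE, Pearson's r and NSE follows, with `sim` as the CHIRPS series (Si) and `obs` as the gauge series (Oi). The function names are illustrative, and the sketch assumes equal-length, non-constant series.

```python
import math


def mae(sim: list[float], obs: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim: list[float], obs: list[float]) -> float:
    """Root mean square error: sqrt of the mean squared error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson_r(sim: list[float], obs: list[float]) -> float:
    """Pearson correlation coefficient between simulated and observed series."""
    ms, mo = sum(sim) / len(sim), sum(obs) / len(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (ss * so)


def nse(sim: list[float], obs: list[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance-sum of the observations."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)
```

For simulated totals (12, 0, 30) mm and observed totals (10, 2, 24) mm, MAE = 10/3 mm and NSE = 1 - 44/248 ≈ 0.82; a series compared with itself gives r = 1 and NSE = 1.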

Figure 4. Spatial distribution of the quantitative error metrics at pentadal time scale for the dry season.a) PBIAS, b) ME, c) MAE, d) RMSE, e) Pearson's r, f) NSE.

Figure 5. Spatial distribution of the quantitative error metrics at pentadal time scale for the wet season.a) PBIAS, b) ME, c) MAE, d) RMSE, e) Pearson's r, f) NSE.

Table 2. List of categorical metrics used in this study.

Table 3. Error metrics per station at the daily time scale.

Table 4. Error metrics per station at the monthly time scale.

Table 5. Error metrics per station at the annual time scale.