Can you help preserve "blue gold" using data to predict water availability?
Acea Smart Water Analytics
https://www.kaggle.com/c/acea-water-prediction/data
This competition uses nine different datasets, completely independent and not linked to each other. Each dataset can represent a different kind of waterbody. As each waterbody is different from the others, the related features also differ. For instance, the features of a water spring are different from those of a lake. This is correct and reflects the behavior and characteristics of each waterbody. The Acea Group deals with four different types of waterbodies: water springs (for which three datasets are provided), lakes (one dataset), rivers (one dataset) and aquifers (four datasets).
Let’s see how these nine waterbodies differ from each other.
Each waterbody has its own target features to be predicted. The table below shows the expected features to forecast for each waterbody.
It is of the utmost importance to notice that some features, like rainfall and temperature, which are present in each dataset, do not act on the same date they are recorded. Both rainfall and temperature affect features like level, flow, depth to groundwater and hydrometry with a delay. This means, for instance, that rain that fell on 1 January does not affect these features on the same day but some time later. As we do not know how many days, weeks or months later rainfall affects these features, this is another aspect to take into account when analyzing the dataset.
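Because the delay between rainfall and the response of the target features is unknown, one simple exploratory approach is to correlate the target with rainfall shifted by increasing lags and keep the lag with the highest correlation. The sketch below is a hypothetical illustration on synthetic data (the function name `best_rain_lag` and the 10-day delay are made up for the example). Real responses are spread over many days, so cumulative or smoothed rainfall may be more informative than a single lag.

```python
import numpy as np
import pandas as pd


def best_rain_lag(rain: pd.Series, target: pd.Series, max_lag: int) -> tuple[int, float]:
    """Return the lag (in time steps) at which lagged rainfall correlates best with the target.

    Pearson correlation is computed between target[t] and rain[t - lag]
    for lag = 0..max_lag; pairs containing NaN are ignored by Series.corr.
    """
    correlations = {lag: rain.shift(lag).corr(target) for lag in range(max_lag + 1)}
    lag = max(correlations, key=correlations.get)
    return lag, correlations[lag]


# Synthetic example: the "flow" responds to rainfall exactly 10 days later.
dates = pd.date_range("2010-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
rain = pd.Series(rng.gamma(shape=0.5, scale=6.0, size=len(dates)), index=dates)
flow = 50 + 2 * rain.shift(10)

lag, r = best_rain_lag(rain, flow, max_lag=30)
print(lag, round(r, 3))
```

On this synthetic series the search should report a lag of 10 days with a correlation of essentially 1.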
A short, tabular description of the waterbodies is also available when downloading all datasets.
More information about the behavior of each kind of waterbody can be found at the following links:
In recent years, long and frequent droughts have affected many countries in the world. These events require an ever more careful and rational management of water resources. Most of the globe’s unfrozen freshwater reserves are stored in aquifers. Groundwater is generally a renewable resource that shows good quality and resilience to fluctuations. Thus, if properly managed, groundwater could ensure long-term supply in order to meet increasing water demand.
For this purpose, it is of crucial importance to be able to predict the flow rates provided by springs. Springs represent the transition from groundwater to surface water and reflect the dynamics of the aquifer and of the whole flow system behind it. Moreover, springs influence the water bodies into which they discharge. The importance of springs in groundwater research is highlighted in some significant contributions. In-depth studies on springs started only after the concept of sustainability was introduced in the management of water resources.
A spring hydrograph is the consequence of several processes governing the transformation of precipitation in the spring recharge area into the single output discharge at the spring. A water balance states that the rate of change of the water stored in the feeding aquifer equals the difference between the rates at which water flows into and out of the aquifer. A quantitative water balance generally has to take the following terms into account: precipitation, infiltration, surface runoff, evapotranspiration, groundwater recharge, soil moisture deficit, spring discharge, lateral inflow to the aquifer, leakage between the aquifer and the underlying aquitard, well pumpage from the aquifer, and change of the storage in the aquifer.
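The aquifer part of this balance can be written as a bookkeeping step: the change in storage equals the inflows (recharge, lateral inflow) minus the outflows (spring discharge, leakage to the aquitard, well pumpage). The sketch below is a minimal illustration under that assumption, with all terms in the same units (for example m³ per day). The class and function names are hypothetical, and the soil-zone terms (precipitation, runoff, evapotranspiration, soil moisture deficit) are assumed to be already lumped into the recharge term.

```python
from dataclasses import dataclass


@dataclass
class AquiferFluxes:
    """Fluxes for one time step, all in the same units (e.g. m3/day)."""
    recharge: float          # groundwater recharge from infiltration
    lateral_inflow: float    # lateral inflow to the aquifer
    spring_discharge: float  # outflow at the spring
    leakage: float           # leakage to the underlying aquitard
    pumpage: float           # well pumpage from the aquifer


def storage_change(f: AquiferFluxes) -> float:
    """Water balance: change in storage = inflows - outflows."""
    return f.recharge + f.lateral_inflow - f.spring_discharge - f.leakage - f.pumpage


def simulate_storage(initial: float, steps: list[AquiferFluxes]) -> list[float]:
    """Accumulate the storage change step by step, starting from `initial`."""
    storage = [initial]
    for f in steps:
        storage.append(storage[-1] + storage_change(f))
    return storage


# Illustrative numbers only: a wet step followed by a dry step.
history = simulate_storage(100.0, [
    AquiferFluxes(recharge=10, lateral_inflow=2, spring_discharge=8, leakage=1, pumpage=1),
    AquiferFluxes(recharge=0, lateral_inflow=2, spring_discharge=8, leakage=1, pumpage=1),
])
print(history)  # [100.0, 102.0, 94.0]
```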
In many cases, the evaluation of the terms of the water balance is very complicated. The complexity of the problem arises from many factors: hydrologic, hydrographic, and hydrogeological features, geologic and geomorphologic characteristics, land use, land cover, water withdrawals, and climatic conditions.
Estimating future spring discharges with a model based on the balance equations would be even more complicated. Therefore, simplified approaches are frequently pursued for practical purposes.
Many authors have addressed the problem of correlating the spring discharges to the rainfall through different approaches...
The methodology to use depends on the properties of the water source, the local geology, and the unknown parameters. There is no clear-cut path.
An example of a methodology is presented in the next illustration. It shows how the topics I handled relate to each other, and how unknown information and accurate data matter for obtaining a reasonable result.
A flow rate of 120 l/s (the historical Lupa flow rate) corresponds to 432000 l/h, or 10368000 l/d.
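The figures 432000 l/h and 10368000 l/d follow from a flow rate of 120 l/s multiplied by the number of seconds per hour (3,600) and per day (86,400). A minimal sketch of the conversion (the function name is illustrative):

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def convert_flow(litres_per_second: float) -> dict[str, float]:
    """Convert a flow rate in l/s to l/h, l/d and m3/d."""
    return {
        "l/h": litres_per_second * SECONDS_PER_HOUR,
        "l/d": litres_per_second * SECONDS_PER_DAY,
        "m3/d": litres_per_second * SECONDS_PER_DAY / 1_000,
    }


print(convert_flow(120))  # {'l/h': 432000, 'l/d': 10368000, 'm3/d': 10368.0}
```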
In the mountain ridge occupying the eastern part of the region there are two hydrogeological systems, separated by the tectonic line called the “Valnerina line” (“linea della Valnerina”), along which a permeability boundary can be traced at elevations varying between 350 and 700 m a.s.l.: to the south the “Valnerina System” (“Sistema della Valnerina”) and to the north the “North-eastern Umbria System” (“Sistema dell’Umbria nord-orientale”).
The “Valnerina System” is the imposing hydrogeological structure at the south-eastern margin of the regional territory. It extends from the course of the Nera River in the west to the Ancona-Anzio tectonic line; its surface area within Umbria is about 1,100 km². The system as a whole is characterized by a series of aquifers formed mainly by the Scaglia s.l., Maiolica and Corniola-Calcare Massiccio formations. These nevertheless show hydraulic continuity through both lateral and vertical contacts. The Scaglia s.l. formation hosts the shallowest aquifer, which gives rise to point springs, mostly of modest discharge, and contributes to the baseflow of the watercourses or to the recharge of the deeper aquifers.
Piezometric levels reach elevations above 800 m a.s.l. and decrease from east to west, reaching their minimum at the bed of the Nera, which is the main base level of the system. Along this dominant SW-NE drainage line there are important linear springs, responsible for notable increases in the discharge of the Nera River. Previous studies have estimated that, along the Umbrian stretch of the Nera River, in-stream outflows have a combined mean discharge of more than 15 m³ per second. Besides the in-stream outflows, there are numerous localized springs, which release a much smaller fraction of the structure's water, estimated at a few hundred litres per second. Total spring discharge, both linear and point, is estimated at about 700 Mm³ per year.
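To compare the annual spring volume with the discharges quoted in m³ per second, the volume can be divided by the number of seconds in a year. The sketch below assumes a 365-day year. It shows that about 700 Mm³ per year corresponds to a mean outflow of roughly 22 m³/s, consistent with more than 15 m³/s of in-stream outflow plus the contribution of the localized springs.

```python
SECONDS_PER_YEAR = 365 * 86_400  # assumes a non-leap year


def annual_volume_to_mean_flow(volume_m3: float) -> float:
    """Mean discharge in m3/s for a volume released evenly over one year."""
    return volume_m3 / SECONDS_PER_YEAR


def mean_flow_to_annual_volume(flow_m3_s: float) -> float:
    """Volume in m3 released in one year at a constant discharge."""
    return flow_m3_s * SECONDS_PER_YEAR


print(round(annual_volume_to_mean_flow(700e6), 1))    # 22.2 (m3/s)
print(round(mean_flow_to_annual_volume(15.0) / 1e6))  # 473 (Mm3)
```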
The Lupa water spring is located in the central Apennines, on the left side of the river Nera near Arenno, and has a historical flow rate of about 120 l/s. The aquifer of this karst system includes the deposits known as Scaglia. The net recharge of the "Scaglia Calcarea" complex was found to be 170-425 mm/year.
The hydrological unit is called "Monte Coscerno"; its infiltration efficiency is estimated at 475 mm per year, based on data from 1997-2007.
Monte Coscerno feeds several waterbodies: the river Nera, the stream F. di Castellone, and 3 notable continuous water springs. The infiltration efficiency increases from north to south.
Spring | Elevation (m a.s.l.) | Outflow (l/s) |
---|---|---|
Scheggino | 300 | 200 |
Lupa | 365 | 125 |
Pacce | 475 | 80 |
Castellone | 450-325 | 115 |
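The combined outflow of all the waterbodies of the Monte Coscerno system is approximately 3615 l/s, of which Lupa contributes 3.46 % (estimated drainage area: 8.3333).
Of the 520 l/s discharged by the four springs in the table, Lupa contributes 24 %.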
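The percentages are simple ratios: the four springs in the table add up to 200 + 125 + 80 + 115 = 520 l/s, so Lupa's 125 l/s is about 24 % of that, and about 3.46 % of the 3615 (l/s) combined outflow. A rough drainage (recharge) area can also be estimated from a steady-state balance, assuming that the mean spring discharge equals the infiltration depth per year times the area. The sketch below uses the 475 mm/year infiltration estimate and a 365-day year; this is an illustrative assumption that gives about 8.3 km², close to the 8.3333 figure above, although the exact inputs behind that figure are not shown.

```python
SECONDS_PER_YEAR = 365 * 86_400  # assumes a non-leap year

# Mean outflows (l/s) from the Monte Coscerno spring table.
springs_l_s = {"Scheggino": 200, "Lupa": 125, "Pacce": 80, "Castellone": 115}


def share(part: float, total: float) -> float:
    """Percentage of `total` represented by `part`."""
    return 100 * part / total


def recharge_area_km2(discharge_l_s: float, infiltration_mm_per_year: float) -> float:
    """Steady-state estimate: area = annual discharge volume / infiltration depth."""
    annual_volume_m3 = discharge_l_s / 1_000 * SECONDS_PER_YEAR
    infiltration_m = infiltration_mm_per_year / 1_000
    return annual_volume_m3 / infiltration_m / 1e6  # m2 -> km2


total_springs = sum(springs_l_s.values())
print(total_springs)                                          # 520
print(round(share(springs_l_s["Lupa"], total_springs), 2))    # 24.04
print(round(share(springs_l_s["Lupa"], 3615), 2))             # 3.46
print(round(recharge_area_km2(springs_l_s["Lupa"], 475), 2))  # 8.3
```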
Update on estimated drainage area:
Location
Soils of Italy
Fries, A., Silva, K., Pucha-Cofrep, F., Oñate-Valdivieso, F., & Ochoa-Cueva, P. (2020). Water Balance and Soil Moisture Deficit of Different Vegetation Units under Semiarid Conditions in the Andes of Southern Ecuador. Climate, 8(2), 30. https://doi.org/10.3390/cli8020030
The "APEX" (Agricultural Policy Environmental eXtender) Model theoretical documentation, Version 0604. BREC Report # 2008-17.
| | Date | Rainfall_Terni | Flow_Rate_Lupa |
---|---|---|---|
0 | 01/01/2009 | 2.8 | NaN |
1 | 02/01/2009 | 2.8 | NaN |
2 | 03/01/2009 | 2.8 | NaN |
3 | 04/01/2009 | 2.8 | NaN |
4 | 05/01/2009 | 2.8 | NaN |
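In the first table above, Rainfall_Terni repeats the same value (2.8) day after day, which looks like a monthly rainfall total spread evenly over the days of the month; that interpretation is an assumption. A minimal pandas sketch of such a disaggregation, with a hypothetical function name and made-up monthly totals:

```python
import pandas as pd


def monthly_to_daily(monthly_mm: pd.Series) -> pd.Series:
    """Spread monthly rainfall totals evenly over the days of each month.

    `monthly_mm` must be indexed by month-start dates (e.g. 2009-01-01).
    """
    pieces = []
    for month_start, total in monthly_mm.items():
        days = pd.date_range(month_start, periods=month_start.days_in_month, freq="D")
        pieces.append(pd.Series(total / len(days), index=days))
    return pd.concat(pieces)


# Illustrative monthly totals (mm), not real Terni data.
monthly = pd.Series([93.0, 28.0], index=pd.to_datetime(["2009-01-01", "2009-02-01"]))
daily = monthly_to_daily(monthly)
print(daily.loc[pd.Timestamp("2009-01-15")], daily.loc[pd.Timestamp("2009-02-10")], len(daily))  # 3.0 1.0 59
```

Spreading totals evenly preserves the monthly sums but removes the timing of individual storms, which is one reason daily rainfall data is preferable.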
I started with ACEA's original dataset, which only contained error-prone flow rate data covering three incomplete years. I therefore looked for better data, which I found on some Italian websites, and later merged the original dataset with the missing flow rate data. Once that was done, however, the monthly rainfall data was no longer good enough. Moreover, these were rainfall data from the Terni rain gauge, which is 11 km away from Lupa and located on the wrong side of the waterbody.
In short, I collected new, more meaningful data and gradually built a new dataset.
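Merging the original ACEA flow series with the flow rates from the Italian source can be done by keeping one series as the primary source and filling only its missing days from the other. A minimal sketch with pandas `Series.combine_first`, using tiny made-up series; which source took precedence where both have a value is not stated, so letting the ACEA values win is an assumption, and `merge_flow` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpts (l/s): ACEA flow rates with gaps, and a second source.
dates = pd.date_range("2009-01-01", periods=5, freq="D")
acea = pd.Series([np.nan, np.nan, 102.0, np.nan, 104.0], index=dates, name="Flow_Rate_Lupa")
portata = pd.Series([100.0, 101.0, 102.5, 103.0, 104.5], index=dates, name="Portata")


def merge_flow(primary: pd.Series, secondary: pd.Series) -> pd.Series:
    """Keep `primary` where available and fill its NaN values from `secondary`."""
    merged = primary.combine_first(secondary)
    merged.name = primary.name
    return merged


merged = merge_flow(acea, portata)
print(merged.tolist())  # [100.0, 101.0, 102.0, 103.0, 104.0]
```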
| | Date | Rainfall_Terni | Flow_Rate_Lupa | doy | Month | Year | ET01 | Infilt_ | Infiltsum | Rainfall_Ter | Flow_Rate_Lup | Infilt_m3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2009-01-01 | 2.8 | 135.47 | 1.0 | 1.0 | 2009.0 | NaN | NaN | NaN | 352422.0 | 11704.61 | NaN |
1 | 2009-01-02 | 2.8 | 135.24 | 2.0 | 1.0 | 2009.0 | NaN | NaN | NaN | 352422.0 | 11684.74 | NaN |
2 | 2009-01-03 | 2.8 | 135.17 | 3.0 | 1.0 | 2009.0 | NaN | NaN | NaN | 352422.0 | 11678.69 | NaN |
3 | 2009-01-04 | 2.8 | 134.87 | 4.0 | 1.0 | 2009.0 | NaN | NaN | NaN | 352422.0 | 11652.77 | NaN |
4 | 2009-01-05 | 2.8 | 134.80 | 5.0 | 1.0 | 2009.0 | NaN | NaN | NaN | 352422.0 | 11646.72 | NaN |
| | Date | Rainfall_Terni | Flow_Rate_Lupa | doy | Month | Year | ET01 | Infilt_ | Infiltsum | Rainfall_Ter | Flow_Rate_Lup | Infilt_m3 | Week |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2010-01-01 | 3.27 | 82.24 | 1.0 | 1.0 | 2010.0 | 1.34 | 1.93 | 1.93 | 412398.0 | 7105.54 | 143639.37 | 53 |
1 | 2010-01-02 | 3.27 | 88.90 | 2.0 | 1.0 | 2010.0 | 1.70 | 1.57 | 3.51 | 412398.0 | 7680.96 | 130966.87 | 53 |
2 | 2010-01-03 | 3.27 | 93.56 | 3.0 | 1.0 | 2010.0 | 0.94 | 2.33 | 5.84 | 412398.0 | 8083.58 | 157582.00 | 53 |
3 | 2010-01-04 | 3.27 | 96.63 | 4.0 | 1.0 | 2010.0 | 1.00 | 2.28 | 8.12 | 412398.0 | 8348.83 | 155554.40 | 1 |
4 | 2010-01-05 | 3.27 | 98.65 | 5.0 | 1.0 | 2010.0 | 1.28 | 1.99 | 10.11 | 412398.0 | 8523.36 | 145736.74 | 1 |
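The extra columns in the tables above appear to be derived from the daily data: doy, Month and Year from the date; Week as the ISO week number (2010-01-01 to 2010-01-03 fall in ISO week 53 of 2009); Infilt_ as Rainfall_Terni minus ET01 (3.27 - 1.34 = 1.93); Infiltsum as its running total; and Flow_Rate_Lup as the flow rate converted from l/s to m³/day (135.47 × 86.4 = 11704.61). These formulas are inferred from the printed values, not taken from the original code, and Rainfall_Ter and Infilt_m3 are left out because their conversion factors are not shown. A sketch under those assumptions, using the rows printed above (small differences such as 2.27 vs 2.28 come from the rounding of the displayed inputs):

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add calendar and water-balance helper columns to a daily DataFrame.

    Expects a DatetimeIndex and the columns Rainfall_Terni (mm/day),
    Flow_Rate_Lupa (l/s) and ET01 (mm/day). Formulas are inferred, not original.
    """
    out = df.copy()
    out["doy"] = out.index.dayofyear
    out["Month"] = out.index.month
    out["Year"] = out.index.year
    out["Week"] = out.index.isocalendar()["week"].astype("int64")  # ISO week number
    out["Infilt_"] = out["Rainfall_Terni"] - out["ET01"]            # mm/day
    out["Infiltsum"] = out["Infilt_"].cumsum()                       # running total, mm
    out["Flow_Rate_Lup"] = out["Flow_Rate_Lupa"] * 86_400 / 1_000    # l/s -> m3/day
    return out


# The first five rows shown above (2010-01-01 to 2010-01-05).
sample = pd.DataFrame(
    {
        "Rainfall_Terni": [3.27] * 5,
        "Flow_Rate_Lupa": [82.24, 88.90, 93.56, 96.63, 98.65],
        "ET01": [1.34, 1.70, 0.94, 1.00, 1.28],
    },
    index=pd.date_range("2010-01-01", periods=5, freq="D", name="Date"),
)
print(add_derived_columns(sample)[["doy", "Week", "Infilt_", "Flow_Rate_Lup"]].round(2))
```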
Date | Rainfall_Terni | Flow_Rate_Lupa | doy | Month | Year | ET01 |
---|---|---|---|---|---|---|
2010-01-01 | 3.27 | 82.24 | 1 | 1 | 2010 | 1.34 |
2010-01-02 | 3.27 | 88.90 | 2 | 1 | 2010 | 1.70 |
2010-01-03 | 3.27 | 93.56 | 3 | 1 | 2010 | 0.94 |
2010-01-04 | 3.27 | 96.63 | 4 | 1 | 2010 | 1.00 |
2010-01-05 | 3.27 | 98.65 | 5 | 1 | 2010 | 1.28 |
2010-01-06 | 3.27 | 102.15 | 6 | 1 | 2010 | 1.21 |
Date | Rainfall_Terni | Flow_Rate_Lupa | doy | Month | Year | ET01 |
---|---|---|---|---|---|---|
2020-06-25 | 0.0 | 74.29 | 177 | 6 | 2020 | 4.03 |
2020-06-26 | 0.0 | 73.93 | 178 | 6 | 2020 | 4.17 |
2020-06-27 | 0.0 | 73.60 | 179 | 6 | 2020 | 4.45 |
2020-06-28 | 0.0 | 73.14 | 180 | 6 | 2020 | 4.51 |
2020-06-29 | 0.0 | 72.88 | 181 | 6 | 2020 | 4.51 |
2020-06-30 | 0.0 | 72.53 | 182 | 6 | 2020 | 4.88 |
`<class 'pandas.core.frame.DataFrame'>` DatetimeIndex: 4199 entries, 2009-01-01 to 2020-06-30, Freq: D; data columns (total 6 columns); dtypes: float64(3), int64(3); memory usage: 229.6 KB:
No. | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | Rainfall_Terni | 4199 non-null | float64 |
1 | Flow_Rate_Lupa | 4199 non-null | float64 |
2 | doy | 4199 non-null | int64 |
3 | Month | 4199 non-null | int64 |
4 | Year | 4199 non-null | int64 |
5 | ET01 | 3834 non-null | float64 |
These are data I retrieved from an Italian website with historical (pre-2010) data on Lupa. Portata is Italian for flow rate (in l/s) and Data for date.
Data | Portata |
---|---|
2009-01-01 | 135.47 |
2009-01-02 | 135.24 |
2009-01-03 | 135.17 |
2009-01-04 | 134.87 |
2009-01-05 | 134.80 |
Data | Portata |
---|---|
2010-12-18 | 189.60 |
2010-12-19 | NaN |
2010-12-20 | 191.03 |
`<class 'pandas.core.frame.DataFrame'>` DatetimeIndex: 4480 entries, 2009-01-01 to 2021-04-07, Freq: D; data columns (total 1 column); dtypes: float64(1); memory usage: 199.0 KB:
No. | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | Portata | 4362 non-null | float64 |
We need interpolation to fill some gaps in the data (4362 of the 4480 daily values are non-null), for example around the values 228.70 (23/05/2018) and 179.22 (13/07/2018), etc.
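The Italian source writes values with a decimal comma (228,70) and day-first dates, and some days are missing. A minimal sketch that parses such text, reindexes it to a daily frequency so that missing days appear as NaN, and fills the gaps with time-weighted linear interpolation via `Series.interpolate(method="time")`. The semicolon-separated layout and the function names are assumptions; the values are the 18-20/12/2010 rows shown above. Linear interpolation is a pragmatic choice: over long gaps it ignores recession curves and rainfall events, so the optional `limit` argument can cap the number of consecutive days filled.

```python
import io
from typing import Optional

import pandas as pd

# Assumed layout: "Data;Portata" with day-first dates and decimal commas.
RAW = """Data;Portata
18/12/2010;189,60
19/12/2010;
20/12/2010;191,03
"""


def load_portata(text: str) -> pd.Series:
    """Parse the raw text into a daily Series of flow rates (missing days -> NaN)."""
    df = pd.read_csv(io.StringIO(text), sep=";", decimal=",")
    df["Data"] = pd.to_datetime(df["Data"], format="%d/%m/%Y")
    return df.set_index("Data")["Portata"].asfreq("D")


def fill_gaps(series: pd.Series, limit: Optional[int] = None) -> pd.Series:
    """Fill NaN values by linear interpolation weighted by the time between samples."""
    return series.interpolate(method="time", limit=limit)


portata = fill_gaps(load_portata(RAW))
print(portata)
```

Here the missing 19/12/2010 value becomes the midpoint of its two neighbours, about 190.3.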