Hydrology caught my interest as a practice domain because it offers very few clean or directly usable datasets.
You will almost always have to clean these datasets thoroughly, and you need solid domain knowledge to use them correctly. The complexity of geohydrological systems is one reason why this kind of data is rarely used as example data by the wider data community.
I tried to use the hydrology module Pastas, but I soon got stuck because minimum and maximum temperature data and evapotranspiration data were lacking. The same lack of information hampers the use of commercial models. In addition, waterbody characteristics change after a major event such as a big flood, an earthquake or a prolonged drought. The extremely dry seasons of 2012 and 2017 seem to have exhausted the storage reservoir of many springs.
After this, I started to experiment with one of the remaining waterbody datasets: the data for the Lupa water spring. This led me to assemble a completely new dataset by handpicking data from several sources. The new data had to be daily, with almost no missing values, and go back to at least 2010.
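The "daily, almost no missing data" requirement can be checked with a few lines of plain Python. This is only a sketch: the function names `missing_days` and `coverage` and the ten-day example are made up for illustration and are not part of the actual Lupa dataset.

```python
from datetime import date, timedelta


def missing_days(observed_dates, start, end):
    """Return the calendar days in [start, end] that have no observation."""
    observed = set(observed_dates)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in observed]


def coverage(observed_dates, start, end):
    """Fraction of days in [start, end] that have an observation."""
    n_days = (end - start).days + 1
    return 1 - len(missing_days(observed_dates, start, end)) / n_days


# Hypothetical example: ten days in January 2010 with one gap (the 4th).
start, end = date(2010, 1, 1), date(2010, 1, 10)
dates = [start + timedelta(days=i) for i in range(10) if i != 3]

gaps = missing_days(dates, start, end)
share = coverage(dates, start, end)
print(gaps, share)
```

A candidate source can then be accepted or rejected based on its coverage over the period of interest before any gap filling is attempted.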
Here I used a fairly simple linear regression model built with PyTorch built-ins. It is partly based on more recent data, which I collected from raster, NetCDF (.nc) and satellite data files. These only became available some 1.5 to 2 years ago. I adjusted the data for only a few missing data points.
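For readers unfamiliar with the approach, a linear regression with PyTorch built-ins can look roughly like the sketch below. It is not the model from this project: the data is synthetic, and the three predictors, the learning rate and the number of epochs are assumptions chosen only to make the example run.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 200 "days" with 3 hypothetical predictors.
X = torch.randn(200, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.3 + 0.01 * torch.randn(200, 1)

# Built-in layer, loss and optimizer; no hand-written gradients.
model = nn.Linear(in_features=3, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Full-batch gradient descent.
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

final_loss = loss_fn(model(X), y).item()
print(final_loss, model.weight.detach(), model.bias.detach())
```

With real data, the predictors would normally be scaled first, and a held-out period would be kept aside for evaluation.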
The point has been made: good predictions can be obtained with decent, trustworthy data and a system that is really understood. So a combination of domain knowledge and statistics/ML/DL know-how is the key.
We'll preprocess the Chilean basins dataset files, train a model on a selection of basins that contain a fraction of carbonate/limestone rocks, and later compare the predictions to the observations over a span of 365 days.
Some background info about the Ahr river and tributary discharge data
My dataset is based on public data and on privately collected data. These are now partly or completely unavailable (for the public data, at hourly or 15-minute frequency).
Background info about Ahr drainage basin
These are tree cuttings that occurred before the scheduled harvest time of the tree, because they were forced by agents such as insects, storms, ...
A Streamlit web app can be viewed at render.com, but bear in mind that this service must first start up, so it can take more than a minute before the app appears.
Another way to see the results is a not-so-recent video of this Streamlit web app, which is viewable below.
Mamba is a reimplementation of the conda package manager in C++.
Its features include parallel downloading of repository data and package files using multi-threading, libsolv for much faster dependency solving, and core parts implemented in C++ for maximum efficiency.
At the same time, mamba utilizes the same command line parser, package installation and deinstallation code and transaction verification routines as conda to stay as compatible as possible. Mamba is part of a bigger ecosystem to make scientific packaging more sustainable.
Such a promising introduction was an invitation to try out conda. For some time I was happy using it. However, after yet another broken conda environment that needed repairing, the joy was gone. So I searched for information about conda and found mamba. It turned out that, with the aid of Mamba, I could repair most of my conda envs.
After that, I created new mamba environments for specific goals, based on custom YAML files for environment and config options. (Some packages do not yet support Python 3.11, for example, and some cannot be installed with conda...)
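To illustrate the idea of a goal-specific environment file, the sketch below builds the text of one from Python. Everything specific in it is an assumption made for illustration: the environment name `hydro-env`, the package list, the `conda-forge` channel and the pip-only package name are hypothetical, and the helper `environment_yaml` is not part of mamba or conda.

```python
def environment_yaml(name, python_version, conda_packages, pip_packages=()):
    """Build the text of a conda/mamba environment file."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        # Pin Python below 3.11 for packages that do not support it yet.
        f"  - python={python_version}",
    ]
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        # Packages without a conda build go into a nested pip section.
        lines += ["  - pip", "  - pip:"]
        lines += [f"      - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


text = environment_yaml(
    "hydro-env",                      # hypothetical environment name
    "3.10",
    ["pandas", "pytorch"],            # hypothetical conda packages
    pip_packages=["some-pip-only-package"],  # hypothetical pip-only package
)
print(text)
```

Saved as `environment.yml`, such a file can be passed to `mamba env create -f environment.yml`.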
An example of a config yaml-file can be this file:
Conclusion: Mamba can be a powerful tool for managing and repairing virtual environments, but it takes some experience with config options and flags to unlock more potential.