API Reference

This section documents the public API of GaugePredict.

For detailed tutorials and examples, see the Examples section.

Core Package

GaugePredict

GaugePredict forecasts downstream gauge conditions using hybrid neural network models.

A tool for predicting gauge conditions at U.S. Geological Survey (USGS) monitoring stations, or at user-specified locations supplied via CSV, using deep learning models trained on historical USGS data.

Modules:

  • downloader: Download and process USGS NWIS data

  • predict: Neural network models and prediction utilities

  • routines: Core utility and data processing functions

  • plotting: Visualization tools for model outputs and analysis

Downloader Module

GaugePredict/downloader.py

Utilities for retrieving USGS NWIS daily-values (DV) time series and assembling a screened gauge catalog by HUC.

GaugePredict.downloader.GaugebyHUC(start_date, end_date, huc_codes, parameter_code, percent_threshold, data_dir, json_path, siteType=None, *, tz='UTC')[source]

Build a HUC-grouped gauge catalog from NWIS, screened by data completeness.

For each requested HUC code, this function:
  1. Queries NWIS site metadata for the given parameter code.

  2. Downloads daily values (DV) for each site for the requested date range.

  3. Computes data completeness as (non-NaN days / expected days) * 100.

  4. Keeps only sites above percent_threshold.

  5. Writes a JSON cache keyed by HUC then site number.
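The completeness screen in steps 3 to 5 can be sketched with pandas as below. This is a minimal illustration, not GaugebyHUC's actual code: the helpers `completeness_percent` and `screen_sites` are hypothetical, each site's DV series is assumed to be a float Series on a naive daily index, and "above" is read as a strict `>` comparison.

```python
import json

import numpy as np
import pandas as pd


def completeness_percent(series: pd.Series, start_date: str, end_date: str) -> float:
    """Percent of expected daily values that are present (non-NaN)."""
    expected = pd.date_range(start_date, end_date, freq="D")
    # Reindex onto the full daily range so that missing days count as NaN.
    aligned = series.reindex(expected)
    return float(100.0 * aligned.notna().sum() / len(expected))


def screen_sites(site_series: dict, start_date: str, end_date: str,
                 percent_threshold: float) -> dict:
    """Return {site_no: completeness} for sites above percent_threshold."""
    kept = {}
    for site_no, series in site_series.items():
        pct = completeness_percent(series, start_date, end_date)
        if pct > percent_threshold:  # assumption: strict "above"
            kept[site_no] = pct
    return kept


# Toy data: one complete site and one site missing half of its days.
idx = pd.date_range("2020-01-01", "2020-01-10", freq="D")
full = pd.Series(np.arange(10, dtype=float), index=idx)
half = full.copy()
half.iloc[:5] = np.nan

# Cache keyed by HUC, then site number (both keys are placeholders).
catalog = {"02": screen_sites({"site_a": full, "site_b": half},
                              "2020-01-01", "2020-01-10", percent_threshold=80)}
print(json.dumps(catalog))  # {"02": {"site_a": 100.0}}
```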

GaugePredict.downloader.load_target(target_site, full_index, start_date, end_date, parameter_code, *, to_units='metric', tz='UTC', parameter_kind=None)[source]

Retrieve NWIS daily-values for a site and align to a provided daily index.

Downloads DV from USGS NWIS for a single site and parameter, converts the series to a timezone-aware UTC index, and reindexes to full_index (daily). Missing days are filled by interpolation and edge filling (ffill/bfill). Optionally converts units to metric for a small set of parameter codes.
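The alignment and gap-filling step can be sketched as follows. The function name `align_to_index` is hypothetical; the sketch assumes naive source timestamps are UTC, fills interior gaps by time-weighted linear interpolation, and then pads the edges with ffill/bfill. Downloading and unit conversion are omitted.

```python
import pandas as pd


def align_to_index(series: pd.Series, full_index: pd.DatetimeIndex) -> pd.Series:
    """Align a daily series to full_index and fill gaps (illustrative sketch)."""
    idx = pd.DatetimeIndex(series.index)
    # Make the index timezone-aware UTC; naive timestamps are assumed to be UTC.
    idx = idx.tz_localize("UTC") if idx.tz is None else idx.tz_convert("UTC")
    s = pd.Series(series.to_numpy(dtype=float), index=idx).sort_index()
    # Days absent from the source become NaN after reindexing.
    s = s.reindex(full_index)
    # Interpolate interior gaps only, then pad leading/trailing gaps.
    s = s.interpolate(method="time", limit_area="inside")
    return s.ffill().bfill()


full_index = pd.date_range("2020-01-01", "2020-01-05", freq="D", tz="UTC")
raw = pd.Series([1.0, 3.0], index=pd.to_datetime(["2020-01-02", "2020-01-04"]))
print(align_to_index(raw, full_index).tolist())  # [1.0, 1.0, 2.0, 3.0, 3.0]
```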

The downloader module provides utilities for retrieving USGS NWIS time series data and assembling gauge catalogs organized by hydrological unit code (HUC).

Predict Module

Provides neural network models and prediction functions for forecasting downstream gauge conditions. Includes support for training, inference, and SHAP explainability.

Features:
  • Hybrid neural network models for discharge prediction

  • Model training and validation utilities

  • Batch prediction capabilities

  • SHAP value computation for model interpretability

  • PyTorch-based deep learning infrastructure

class GaugePredict.predict.CNN_LSTM(input_channels, seq_len)[source]

Bases: Module

CNN-LSTM sequence encoder for regression on daily predictor sequences.

forward(x)[source]

Run the forward pass on a batch of input sequences.

Note

Call the module instance (model(x)) rather than model.forward(x) directly: calling the instance runs any registered hooks, whereas calling forward() directly silently skips them.

set_channel_mask(mask_1d=None)[source]

class GaugePredict.predict.GaugeDataModel(data_files, target_site, start_date, end_date, tz, sequence_length, forecast_horizon, cutoff_date, parameter_code, *, batch_size=64, allowed_site_ids_norm=None, target_csv_path=None, target_csv_date_col=None, target_csv_value_col=None, target_units='metric', target_parameter_kind=None)[source]

Bases: object

Prepare arrays, splits, scalers, and PyTorch DataLoaders for forecasting. Future updates are planned to allow the CNN and LSTM components, and the number of layers, to be configured separately from notebooks.

Framework:

  • Builds a full daily time index for the modeling window.

  • Loads predictor channels (multiple gauge sites) and a single target series.

  • Converts predictors + target to aligned arrays, then constructs sliding sequences for supervised learning.

  • Splits sequences into train/test using a cutoff date.

  • Standardizes predictors per channel using training statistics.

  • Standardizes the target using sklearn.preprocessing.StandardScaler.

Key shapes:

  • processed_X_data: [C, T]

  • X_seq: [N, L, C] (L = sequence_length)

  • y_seq: [N, 1] or [N]

__init__(data_files, target_site, start_date, end_date, tz, sequence_length, forecast_horizon, cutoff_date, parameter_code, *, batch_size=64, allowed_site_ids_norm=None, target_csv_path=None, target_csv_date_col=None, target_csv_value_col=None, target_units='metric', target_parameter_kind=None)[source]

Configure the data pipeline without loading any data; call setup() to load data and build the DataLoaders.

create_datasets()[source]

Wrap normalized arrays in SequenceDataset objects and attach site ids.

Site ids are stored in both raw and normalized forms:

  • site_no_raw: original strings (may include leading zeros)

  • site_no_norm: leading zeros stripped

normalize()[source]

Normalize predictors and targets using training set statistics.

Predictors (X) are standardized per channel using the mean and standard deviation computed over the training samples and time dimension. Targets (y) are standardized with sklearn.preprocessing.StandardScaler.
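Per-channel standardization of [N, T, C] predictor arrays can be sketched in NumPy as below: statistics pool the sample and time axes, and the test split reuses the training statistics. The function name and the guard for constant channels are assumptions for illustration; the target scaling with StandardScaler is not shown.

```python
import numpy as np


def standardize_per_channel(X_train, X_test, eps=1e-8):
    """Standardize [N, T, C] arrays with per-channel training statistics."""
    # Pool the sample (N) and time (T) axes, leaving one statistic per channel.
    mean = X_train.mean(axis=(0, 1), keepdims=True)  # shape [1, 1, C]
    std = X_train.std(axis=(0, 1), keepdims=True)    # shape [1, 1, C]
    # Guard against constant channels (an assumption, not documented behavior).
    std = np.where(std < eps, 1.0, std)
    return (X_train - mean) / std, (X_test - mean) / std


rng = np.random.default_rng(0)
loc, scale = [5.0, -2.0, 100.0], [1.0, 3.0, 10.0]
X_train = rng.normal(loc=loc, scale=scale, size=(32, 7, 3))
X_test = rng.normal(loc=loc, scale=scale, size=(8, 7, 3))
Xtr, Xte = standardize_per_channel(X_train, X_test)
print(Xtr.mean(axis=(0, 1)).round(6), Xtr.std(axis=(0, 1)).round(6))
```

Because the test split is scaled with training statistics, its per-channel mean is close to, but not exactly, zero; this avoids leaking test information into the scaler.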

prepare_data()[source]

Load predictors and target and build sequences.

Framework:

  • Loads raw predictor arrays and per-channel metadata via routines.load_data()

  • Loads the target series from CSV or NWIS

  • Runs routines.process_data() to align and transform predictors

  • Creates supervised sequences X_seq and y_seq via routines.generate_sequences()

  • Builds a metadata table for predictor channels (site id, lat/lon, channel index)

Inputs :

None (uses instance configuration)

Outputs :

None (populates instance attributes: processed_X_data, X_seq, y_seq, y, site_meta_df)

setup()[source]

Run the full data preparation pipeline and create DataLoaders.

Inputs :

None

Outputs :

None (populates dataloader attributes)

split_train_test()[source]

Create boolean masks and split sequences into train and test arrays.

Uses routines.generate_train_test_masks() to create per-sequence masks derived from the full daily index, sequence length, forecast horizon, and cutoff date. The masks are then applied to X_seq and y_seq.

Inputs :

None

Outputs :

None (populates train_mask, test_mask, X_train, y_train, X_test, y_test)

class GaugePredict.predict.SequenceDataset(X, y)[source]

Bases: object

Minimal torch.utils.data.Dataset for sequence-to-one regression.

Expects:
  • X: [N, T, C] float-like array

  • y: [N, 1] or [N] float-like array

Stores X and y as float32 tensors and exposes input_channels for model consumption.

__init__(X, y)[source]

Inputs :

X‘array-like’

Feature tensor with shape [N, T, C].

y‘array-like’

Target array with shape [N] or [N, 1].

Outputs :

dataset‘SequenceDataset’

Dataset containing float32 tensors X and y.

class GaugePredict.predict.Trainer(model, datamodule, scaler_y, criterion, optimizer, *, device=None, evaluations=None, max_grad_norm=1.0)[source]

Bases: object

Lightweight training and evaluation loop for a PyTorch regression model.

Features:

  • Device selection (cpu/cuda)

  • Gradient clipping

  • Optional warmup learning-rate scaling

  • Metric evaluation on inverse-transformed targets

The trainer assumes the target scaler is used to map model outputs and dataset targets back to native units for metric computation.

__init__(model, datamodule, scaler_y, criterion, optimizer, *, device=None, evaluations=None, max_grad_norm=1.0)[source]

Inputs :

model‘torch.nn.Module’

Model to train.

datamodule‘GaugeDataModel’

Prepared data model containing train/test dataloaders.

scaler_y‘sklearn.preprocessing.StandardScaler’

Scaler fit on training targets; used to invert predictions and targets.

criterion‘callable’

Loss function accepting (yhat, y) tensors and returning a scalar loss.

optimizer‘torch.optim.Optimizer’

Optimizer instance.

device‘str or None’

Device to run on (“cpu” or “cuda”). If None, auto-select.

evaluations‘dict or None’

Mapping of metric name -> function(y_true, y_pred). Defaults to r2, nse, and willmott.

max_grad_norm‘float’

Maximum gradient norm for clipping.

Outputs :

Trainer‘Trainer’

Configured trainer instance.

evaluate(dataloader=None)[source]

Run inference on a dataloader and return inverse-transformed y and yhat.

Inputs :

dataloader‘torch.utils.data.DataLoader or None’

DataLoader to evaluate. Defaults to the test DataLoader.

Outputs :

y_true‘numpy.ndarray’

Target values in original (inverse-transformed) units.

y_pred‘numpy.ndarray’

Predicted values in original (inverse-transformed) units.

fit(num_epochs, *, evaluate=True, warmup_epochs=15, warmup_scale_mode='linear', warmup_min_scale=0.1)[source]

Train for a fixed number of epochs, with optional learning-rate warmup and per-epoch metric evaluation.

Framework:

  • If warmup_epochs > 0, learning rates are scaled during the first warmup_epochs epochs.

  • warmup_scale_mode=“linear” scales from 1/warmup_epochs up to 1.

  • Any other mode uses the constant factor warmup_min_scale.
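The warmup rule can be sketched as a pure function returning the learning-rate multiplier for an epoch. The name `warmup_scale` is hypothetical, and the sketch assumes a 0-based epoch index with one warmup step per epoch; the trainer's actual bookkeeping may differ.

```python
def warmup_scale(epoch: int, warmup_epochs: int, mode: str = "linear",
                 min_scale: float = 0.1) -> float:
    """Learning-rate multiplier for a 0-based epoch index (illustrative sketch)."""
    if warmup_epochs < 0:
        raise ValueError("warmup_epochs must be non-negative")
    if epoch >= warmup_epochs:
        return 1.0  # warmup finished, or disabled when warmup_epochs == 0
    if mode == "linear":
        # Ramp 1/W, 2/W, ..., W/W over the first W epochs.
        return (epoch + 1) / warmup_epochs
    return min_scale  # any other mode: constant factor


print([warmup_scale(e, 4) for e in range(6)])
# [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

In a PyTorch loop, such a factor would typically be applied by setting `group["lr"] = base_lr * scale` for each entry of `optimizer.param_groups` at the start of an epoch.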

Inputs :

num_epochs‘int’

Number of training epochs.

evaluate‘bool’

If True, compute metrics each epoch using the test set.

warmup_epochs‘int’

Number of epochs to apply learning-rate warmup.

warmup_scale_mode‘str’

Warmup scaling mode (“linear” or constant fallback).

warmup_min_scale‘float’

Constant scale factor used when warmup_scale_mode is not “linear”.

Outputs :

history‘dict’

Training history with keys: “train_loss” and metric names (if enabled).

Raises :

ValueError

If warmup_epochs is negative.

train_epoch()[source]

Train the model for one epoch over the training DataLoader.

Inputs :

None

Outputs :

mean_loss‘float’

Mean training loss over the epoch, weighted by batch size.
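Weighting by batch size matters when the last batch is partial. A small sketch, assuming the criterion returns a per-batch mean loss (the helper name is hypothetical):

```python
def weighted_epoch_loss(batch_losses, batch_sizes):
    """Mean of per-batch mean losses, weighted by the number of samples per batch."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# A full batch of 64 (mean loss 1.0) and a final partial batch of 16 (mean loss 4.0).
print(weighted_epoch_loss([1.0, 4.0], [64, 16]))  # 1.6
```

An unweighted average of the two batch losses would give 2.5, overstating the contribution of the 16-sample batch.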

GaugePredict.predict.count_parameters(model)[source]

Count the number of trainable parameters in a PyTorch model.

Inputs :

model‘torch.nn.Module’

Model instance.

Outputs :

n_params‘int’

Total number of parameters with requires_grad=True.

GaugePredict.predict.evaluate_on_test(trainer, gdm)[source]

Evaluate a trained model on the GaugeDataModel test split.

Inputs :

trainer‘Trainer’

Trained Trainer instance.

gdm‘GaugeDataModel’

Data model containing test mask and indexing metadata.

Outputs :

dates‘pandas.DatetimeIndex or numpy.ndarray’

Target dates corresponding to each test prediction.

y_true‘numpy.ndarray’

True values in original units.

y_pred‘numpy.ndarray’

Predicted values in original units.

metrics‘dict’

Dictionary with keys: “r2”, “nse”, “willmott”.

GaugePredict.predict.export_shap_sites(shap_df, out_dir, *, horizon, min_norm_for_list=0.0)[source]

Export per-site SHAP importance products to CSV and plain-text site list.

Files written:

  • shap_sites.csv

  • site_list.txt

GaugePredict.predict.get_allowed_sites_for_horizon(h, *, shap_root=None, site_selection_mode='all', n_shap_by_h=None, default_n_shap=20)[source]

Return the list of allowed predictor site IDs for a given forecast horizon.

If site_selection_mode is “from_shap”, this reads the per-horizon shap_sites.csv file (under shap_root/Hxx/) and returns the top sites by SHAP importance. Otherwise, returns None to indicate that all sites are allowed.

Parameters:
  • h (int or str) – Forecast horizon (days ahead).

  • shap_root (pathlib.Path or str, optional) – Root directory containing per-horizon SHAP outputs.

  • site_selection_mode (str, optional) – Either “from_shap” to limit to top sites or anything else to allow all.

  • n_shap_by_h (dict, optional) – Mapping horizon -> number of sites to keep. Falls back to default_n_shap.

  • default_n_shap (int, optional) – Default number of sites to keep when n_shap_by_h is not provided.

Returns:

List of site_no_norm values (leading zeros stripped) or None.

Return type:

list[str] or None
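The selection logic can be sketched with pandas as below. This is a hypothetical reimplementation, not the library's code: it assumes shap_sites.csv has columns `site_no` and `importance_norm`, and that horizon folders follow the zero-padded "H03" naming.

```python
from pathlib import Path

import pandas as pd


def allowed_sites_sketch(h, shap_root, site_selection_mode="all",
                         n_shap_by_h=None, default_n_shap=20):
    """Top site_no_norm values for horizon h, or None when all sites are allowed."""
    if site_selection_mode != "from_shap":
        return None  # None means "all sites allowed"
    n_keep = (n_shap_by_h or {}).get(h, default_n_shap)
    csv_path = Path(shap_root) / f"H{int(h):02d}" / "shap_sites.csv"
    # Read site ids as strings so leading zeros survive parsing.
    df = pd.read_csv(csv_path, dtype={"site_no": str})
    df["site_no_norm"] = df["site_no"].str.lstrip("0")
    top = df.sort_values("importance_norm", ascending=False).head(n_keep)
    return top["site_no_norm"].tolist()
```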

GaugePredict.predict.get_hardware_info(device_str)[source]

Collect a hardware and software snapshot for reporting and reproducibility.

The returned dictionary is intended for logging. If CUDA is requested and available, GPU properties are also included.

Inputs :

device_str‘str’

Requested device identifier (commonly “cpu” or “cuda”).

Outputs :

info‘dict’

Dictionary containing OS, Python, Torch, CPU, RAM, and optional GPU info.

GaugePredict.predict.horizon_dir(run_root, h)[source]

Construct a standardized subdirectory path for a forecast horizon.

Example:

horizon_dir("runs", 3) -> Path("runs") / "H03"
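A one-line sketch consistent with that example, assuming two-digit zero padding (horizons of 100 or more would produce three digits). The name is hypothetical; `int(h)` also accepts string horizons such as "3".

```python
from pathlib import Path


def horizon_dir_sketch(root, h):
    # Two-digit zero padding, matching the "H03" example.
    return Path(root) / f"H{int(h):02d}"


print(horizon_dir_sketch("runs", 3))  # runs/H03 on POSIX systems
```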

GaugePredict.predict.load_previous_shap_df(full_shap_root, h)[source]

Load a previously exported shap_sites.csv for a given horizon if it exists.

If the CSV exists and contains “site_no” but not “site_no_norm”, a normalized id column is added by stripping leading zeros.
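Leading zeros only survive if the ids are read as strings; a default numeric parse would turn "00123" into 123. A minimal pandas sketch of the normalization step, using placeholder ids:

```python
import io

import pandas as pd

csv_text = "site_no,importance\n00123,0.8\n04567,0.3\n"

# Read ids as strings; a numeric parse would silently drop the leading zeros.
df = pd.read_csv(io.StringIO(csv_text), dtype={"site_no": str})
if "site_no" in df.columns and "site_no_norm" not in df.columns:
    df["site_no_norm"] = df["site_no"].str.lstrip("0")
print(df[["site_no", "site_no_norm"]].to_dict("list"))
# {'site_no': ['00123', '04567'], 'site_no_norm': ['123', '4567']}
```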

GaugePredict.predict.nse_score(y_true, y_pred)[source]

Nash-Sutcliffe Efficiency (NSE) for regression performance. Returns NaN if the denominator is zero (constant observations).

Inputs :

y_true‘array-like’

Observations.

y_pred‘array-like’

Predictions.

Outputs :

nse‘float’

Nash-Sutcliffe efficiency (NaN if undefined).
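NSE is 1 minus the ratio of the squared prediction error to the variance of the observations about their mean: NSE = 1 - Σ(O - P)² / Σ(O - Ō)². A NumPy sketch (the function name is hypothetical) showing the NaN case for constant observations:

```python
import numpy as np


def nse_sketch(y_true, y_pred):
    o = np.asarray(y_true, dtype=float).ravel()
    p = np.asarray(y_pred, dtype=float).ravel()
    denom = np.sum((o - o.mean()) ** 2)
    if denom == 0:
        return float("nan")  # constant observations: NSE is undefined
    return float(1.0 - np.sum((o - p) ** 2) / denom)


print(nse_sketch([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect fit)
print(nse_sketch([1, 2, 3], [2, 2, 2]))  # 0.0 (no better than the observed mean)
```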

GaugePredict.predict.prepare_shap_sites_used(shap_df_used, *, allowed_sites=None, n_sites=None)[source]

Normalize and optionally filter a SHAP site table for reporting and saving.

Framework:

  • Adds site_no_norm if missing

  • Filters to allowed_sites if provided (by site_no_norm)

  • Adds importance_norm if missing (max-normalized importance)

  • Optionally truncates to top n_sites

  • Returns a compact table of commonly used columns

GaugePredict.predict.run_horizon(forecast_horizon, *, data_files, use_csv_target, target_site, target_parameter_code, start_date, end_date, tz, hp, device, allowed_sites=None, csv_path=None, csv_date_col=None, csv_value_col=None, shap_mode='none', site_selection_mode='all', full_shap_root=None, n_sites=None, out_dir=None, shap_random_state=42, target_units='metric', target_parameter_kind=None)[source]

Train and evaluate a model for a single forecast horizon, optionally with SHAP.

Framework:

  • Builds a GaugeDataModel and prepares DataLoaders

  • Instantiates the CNN_LSTM model and optimizer

  • Trains for hp[“epochs”] epochs

  • Evaluates on the test split and returns metrics and predictions

  • Optionally computes SHAP site importance (“run”) or loads previous SHAP results

  • Optionally writes run artifacts to out_dir

GaugePredict.predict.save_compute_summary(results_root, compute_summary)[source]

Write a compute summary dictionary to compute_summary.json.

GaugePredict.predict.save_run_artifacts(out_dir, *, dates_test, y_true, y_pred, metrics, history, model_state_dict, scaler_y)[source]

Save standard run outputs to a directory.

Files written:

  • predictions.csv : date, y_true, y_pred

  • metrics.json : scalar evaluation metrics

  • history.json : training history time series

  • model.pt : torch state dict

  • scaler_y.pkl : pickled sklearn scaler for inverse transforms

GaugePredict.predict.shap_sites_importance(trainer, gdm, *, background_size=32, nsamples=512, use_test=True, random_state=42)[source]

Compute SHAP attributions and aggregate importance to gauge (site) level.

Framework:

  • Moves the trained model to CPU and sets eval() for SHAP computation.

  • Selects a random subset of training samples for SHAP background.

  • Selects a random subset of samples for evaluation (test set by default).

  • Uses shap.GradientExplainer to compute per-sample attributions.

  • Aggregates absolute SHAP values across samples and timesteps to obtain a per-channel importance score.

  • Maps channel importance to site metadata (site_no, lat, lon), then aggregates across channels that belong to the same site.

Expected SHAP shape after conversion: [S, T, C].
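The aggregation from an [S, T, C] attribution array to site-level importance can be sketched as below. The helper is hypothetical and the SHAP computation itself is not shown; taking the mean over samples and timesteps, summing channels that share a site, and max-normalizing into importance_norm are all assumptions, since the text above does not fix these reductions.

```python
import numpy as np
import pandas as pd


def site_importance(shap_values, channel_sites):
    """shap_values: [S, T, C] array; channel_sites: length-C list of site ids."""
    # Per-channel score: mean absolute attribution over samples and timesteps.
    per_channel = np.abs(shap_values).mean(axis=(0, 1))  # shape [C]
    df = pd.DataFrame({"site_no": channel_sites, "importance": per_channel})
    # Assumption: channels belonging to the same site are combined by summation.
    out = df.groupby("site_no", as_index=False)["importance"].sum()
    out["importance_norm"] = out["importance"] / out["importance"].max()
    return out.sort_values("importance", ascending=False, ignore_index=True)


shap_values = np.zeros((2, 3, 3))
shap_values[..., 0] = 1.0
shap_values[..., 1] = -2.0
shap_values[..., 2] = 0.5
out = site_importance(shap_values, ["site_a", "site_b", "site_a"])
print(out.to_string(index=False))
```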

GaugePredict.predict.update_compute_summary(compute_summary, *, forecast_horizon=None, fh=None, n_params=None, train_time=None, eval_time=None, metrics=None, hp=None)[source]

Write or update per-horizon results in a compute summary dictionary.

Stores a record under:

compute_summary["runs"][forecast_horizon]

GaugePredict.predict.willmott_score(y_true, y_pred)[source]

Willmott’s index of agreement for regression performance. This implementation uses the squared-error form. Returns NaN if the denominator is zero.

Inputs :

y_true‘array-like’

Observations.

y_pred‘array-like’

Predictions.

Outputs :

d‘float’

Willmott index of agreement (NaN if undefined).
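The squared-error form of Willmott's index is d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where Ō is the mean of the observations. A NumPy sketch (hypothetical name) with the NaN guard described above:

```python
import numpy as np


def willmott_sketch(y_true, y_pred):
    o = np.asarray(y_true, dtype=float).ravel()
    p = np.asarray(y_pred, dtype=float).ravel()
    o_bar = o.mean()
    denom = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    if denom == 0:
        return float("nan")  # undefined when predictions and observations equal o_bar
    return float(1.0 - np.sum((p - o) ** 2) / denom)


print(willmott_sketch([1, 2, 3], [1, 2, 3]))  # 1.0
print(willmott_sketch([1, 2, 3], [2, 2, 2]))  # 0.0
```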

The predict module contains neural network architectures, training utilities, and inference functions for discharge prediction and model interpretation via SHAP values.

Routines Module

Core utility functions and data processing routines used across GaugePredict.

Features:
  • Data loading and preprocessing

  • File I/O operations

  • Configuration management

  • Common computational tasks

Used by downloader.py, predict.py, and plotting.py.

GaugePredict.routines.build_target_dates(full_index, sequence_length, forecast_horizon, n_samples)[source]

Build the target date vector aligned to generated supervised sequences.

For a sequence length T and forecast horizon H, the first target corresponds to full_index[T + H - 1].

Inputs :

full_index‘pandas.DatetimeIndex’

Full daily index covering the modeling period.

sequence_length‘int’

Number of past days used per input sample (T).

forecast_horizon‘int’

Lead time in days between last input day and target day (H).

n_samples‘int’

Number of supervised samples (typically len(y_seq)).

Outputs :

dates‘numpy.ndarray’

Array of datetime-like values corresponding to each target.
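With the first target at full_index[T + H - 1], target dates are a contiguous slice of the daily index. A sketch under that rule (the function name is hypothetical); with 10 days, T = 3 and H = 2, there are 10 - 3 - 2 + 1 = 6 samples.

```python
import numpy as np
import pandas as pd


def target_dates_sketch(full_index, sequence_length, forecast_horizon, n_samples):
    start = sequence_length + forecast_horizon - 1  # first target index, T + H - 1
    dates = full_index[start:start + n_samples]
    if len(dates) != n_samples:
        raise ValueError("full_index is too short for the requested samples")
    return np.asarray(dates)


idx = pd.date_range("2020-01-01", periods=10, freq="D")
dates = target_dates_sketch(idx, 3, 2, 6)
print(pd.Timestamp(dates[0]).date(), pd.Timestamp(dates[-1]).date())
# 2020-01-05 2020-01-10
```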

GaugePredict.routines.generate_full_index(start_date, end_date, *, localize=True, tz='UTC')[source]

Generate a daily date range.

If localize=True, returns a timezone-aware DatetimeIndex in timezone tz. If localize=False, returns a naive DatetimeIndex.

Inputs :

start_date, end_date‘str or datetime-like’

Inclusive bounds accepted by pandas.date_range().

localize‘bool’

If True, localize/convert the index to timezone tz.

tz‘str’

Timezone name used for localization or conversion.

Outputs :

idx‘pandas.DatetimeIndex’

Daily index from start_date through end_date.

GaugePredict.routines.generate_sequences(sequence_length, forecast_horizon, x_raw, y)[source]

Convert continuous predictor/target arrays into supervised learning sequences.

Outputs :

X_seq: [N, T, C]

y_seq: [N, 1]
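A sliding-window sketch consistent with the target-date rule in build_target_dates (window i covers days i through i + T - 1, and its target is day i + T + H - 1). The function name is hypothetical, and x_raw is assumed to be a [C, T_total] array like processed_X_data.

```python
import numpy as np


def generate_sequences_sketch(sequence_length, forecast_horizon, x_raw, y):
    """Sliding windows: X_seq [N, T, C] and y_seq [N, 1] from x_raw [C, T_total]."""
    x = np.asarray(x_raw, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    T, H = sequence_length, forecast_horizon
    n = x.shape[1] - T - H + 1
    if n <= 0:
        raise ValueError("series too short for this sequence_length and forecast_horizon")
    # Window i covers days i .. i+T-1; its target is day i+T+H-1.
    X_seq = np.stack([x[:, i:i + T].T for i in range(n)])
    y_seq = y[T + H - 1:T + H - 1 + n][:, None]
    return X_seq, y_seq


x_raw = np.arange(20).reshape(2, 10)  # 2 channels, 10 days
y = np.arange(10) * 10.0
X_seq, y_seq = generate_sequences_sketch(3, 2, x_raw, y)
print(X_seq.shape, y_seq.shape)  # (6, 3, 2) (6, 1)
print(X_seq[0, :, 0], y_seq[0, 0])  # [0. 1. 2.] 40.0
```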

GaugePredict.routines.generate_train_test_masks(full_index, sequence_length, y_seq, forecast_horizon, cutoff_date, tz='UTC')[source]

Generate boolean masks for train/test split based on a cutoff target date.

Split is on target dates:

  • train if target_date < cutoff

  • test otherwise

Inputs :

full_index‘pandas.DatetimeIndex’

Full daily index.

sequence_length‘int’

Input sequence length (T).

y_seq‘array-like’

Target sequence array used only for its length.

forecast_horizon‘int’

Forecast horizon (H).

cutoff_date‘str or datetime-like’

Cutoff date for splitting.

tz‘str’

Timezone assumption if full_index and cutoff_date are naive.

Outputs :

train_mask‘numpy.ndarray (bool)’

Boolean array of length len(y_seq), True for training samples.

test_mask‘numpy.ndarray (bool)’

Boolean array of length len(y_seq), True for test samples.
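A sketch of the cutoff split, reusing the T + H - 1 target-date offset. The function name is hypothetical; naive timestamps are localized to tz before comparison, as described for the tz parameter.

```python
import numpy as np
import pandas as pd


def train_test_masks_sketch(full_index, sequence_length, y_seq, forecast_horizon,
                            cutoff_date, tz="UTC"):
    """Split samples by target date: train if target_date < cutoff, else test."""
    n = len(y_seq)  # y_seq is used only for its length
    start = sequence_length + forecast_horizon - 1
    target_dates = pd.DatetimeIndex(full_index[start:start + n])
    cutoff = pd.Timestamp(cutoff_date)
    # Treat naive timestamps as belonging to tz so the comparison is well defined.
    if target_dates.tz is None:
        target_dates = target_dates.tz_localize(tz)
    if cutoff.tzinfo is None:
        cutoff = cutoff.tz_localize(tz)
    train_mask = np.asarray(target_dates < cutoff)
    return train_mask, ~train_mask


full_index = pd.date_range("2020-01-01", periods=10, freq="D", tz="UTC")
train, test = train_test_masks_sketch(full_index, 3, np.zeros((6, 1)), 2, "2020-01-08")
print(train.tolist())  # [True, True, True, False, False, False]
```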

GaugePredict.routines.get_project_root(file_path, levels_up=1)[source]

Resolve a project root directory by walking up parent folders.

Inputs :

file_path‘str or pathlib.Path’

Path within the project (often __file__ from a module).

levels_up‘int’

Number of parent levels to traverse. levels_up=1 returns the parent directory of file_path.

Outputs :

project_root‘pathlib.Path’

Resolved project root directory.

GaugePredict.routines.load_data(data_files, full_index, *, allow_site_ids_norm=None, tz='UTC', fill=True)[source]

Load predictor time series from one or more cached JSON files.

Expected JSON layout (Layout B only):

{site_no: payload, …}

Each payload must contain a date-keyed mapping under data_key, typically “parameter”:

payload[data_key] = {“YYYY-MM-DD”: value, …}
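A sketch of reading this layout into a [C, T] predictor array. The helper is hypothetical: it assumes data_key is "parameter", applies the optional allow-list to ids with leading zeros stripped, and leaves missing days as NaN (the fill step is omitted). Because JSON object keys are strings, leading zeros in site_no are preserved.

```python
import json

import numpy as np
import pandas as pd


def load_layout_b(json_text, full_index, data_key="parameter", allowed_site_ids_norm=None):
    """Stack {site_no: {data_key: {"YYYY-MM-DD": value}}} into a [C, T] array."""
    payloads = json.loads(json_text)
    rows, site_ids = [], []
    for site_no, payload in payloads.items():
        if allowed_site_ids_norm is not None and site_no.lstrip("0") not in allowed_site_ids_norm:
            continue
        s = pd.Series(payload[data_key], dtype=float)
        s.index = pd.to_datetime(s.index)
        # Dates missing from the payload become NaN (no filling in this sketch).
        rows.append(s.reindex(full_index).to_numpy())
        site_ids.append(site_no)
    return np.vstack(rows), site_ids


text = json.dumps({
    "00123": {"parameter": {"2020-01-01": 1.0, "2020-01-03": 3.0}},
    "04567": {"parameter": {"2020-01-02": 5.0}},
})
full_index = pd.date_range("2020-01-01", "2020-01-03", freq="D")
X, sites = load_layout_b(text, full_index)
print(X.shape, sites)  # (2, 3) ['00123', '04567']
```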

GaugePredict.routines.load_hucs_3857(base_dir)[source]

Load HUC2 watershed boundary shapefiles and project to EPSG:3857.

Expected layout under base_dir:

HUC??/WBDHU2.shp

Inputs :

base_dir‘str or pathlib.Path’

Root directory containing HUC?? folders.

Outputs :

gdf‘geopandas.GeoDataFrame’

Concatenated watershed boundaries in EPSG:3857.

Raises :

FileNotFoundError

If no matching shapefiles are found, or none could be loaded successfully.

GaugePredict.routines.load_run_config(run_root)[source]

Load a run configuration JSON from a run directory.

Expects a file named “compute_summary.json” inside run_root.

Inputs :

run_root‘str or pathlib.Path’

Directory containing compute_summary.json.

Outputs :

config‘dict’

Parsed JSON configuration dictionary.

Raises :

FileNotFoundError

If compute_summary.json does not exist under run_root.

GaugePredict.routines.load_target_csv(csv_path, full_index, *, date_col='date', value_col='value', tz='UTC', fill=True)[source]

Load a daily target time series from a CSV file and align to full_index.

GaugePredict.routines.process_data(raw_X_data, target_series, *, smooth_window_days=1)[source]

Construct a final predictor stack by appending target-derived channels.

Appends:

  • smoothed target (rolling mean)

  • first difference of smoothed target
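The two appended channels can be sketched as below. The helper name is hypothetical, and three details are assumptions not stated above: the rolling window is trailing (not centered) with min_periods=1, the undefined first difference is set to 0, and raw_X_data is a [C, T] array.

```python
import numpy as np
import pandas as pd


def append_target_channels(raw_X_data, target_series, smooth_window_days=1):
    """Return raw_X_data [C, T] with two target-derived rows appended: [C + 2, T]."""
    y = pd.Series(np.asarray(target_series, dtype=float).ravel())
    # Trailing rolling mean; min_periods=1 avoids NaNs at the start.
    smoothed = y.rolling(window=smooth_window_days, min_periods=1).mean()
    # First difference of the smoothed target; the undefined first value is set to 0.
    diff = smoothed.diff().fillna(0.0)
    return np.vstack([np.asarray(raw_X_data, dtype=float),
                      smoothed.to_numpy()[None, :],
                      diff.to_numpy()[None, :]])


out = append_target_channels(np.zeros((2, 4)), [1.0, 3.0, 5.0, 7.0], smooth_window_days=2)
print(out.shape)  # (4, 4)
print(out[2], out[3])  # [1. 2. 4. 6.] [0. 1. 2. 2.]
```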

GaugePredict.routines.resolve_under_project(project_root, rel_path)[source]

Resolve a relative path under a known project root.

Inputs :

project_root‘str or pathlib.Path’

Project root directory.

rel_path‘str or pathlib.Path’

Relative path under the project root.

Outputs :

path‘pathlib.Path’

Absolute, resolved path under the project root.

The routines module provides core utility functions used across GaugePredict, including data processing, file I/O, and configuration management.

Plotting Module

GaugePredict/plotting.py

Plotting utilities for model outputs, SHAP summaries, and geospatial context.

GaugePredict.plotting.build_aligned_test_series(results, horizons)[source]

Build an aligned observed series and a long-form prediction table across horizons.

This function intersects the test date indices of all included horizons so that predictions and observations share a common set of dates. Observations are taken from the run with the largest horizon, which provides a single y_true vector aligned to the intersection index.

Inputs :

results‘dict’

Mapping horizon -> loaded run dict (see load_saved_horizon_run()).

horizons‘iterable’

Horizons to include (only those present in results are used).

Outputs :

date_index‘pandas.DatetimeIndex’

Intersection of all test date indices across included horizons.

y_true‘numpy.ndarray’

Observed values aligned to date_index.

pred_df‘pandas.DataFrame’

Long-form table with columns [“date”, “horizon”, “y_pred”] aligned to date_index.
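The alignment can be sketched with pandas index intersection. The function name is hypothetical; it relies on the run-dict keys dates_test, y_true_test, and y_pred_test listed under load_saved_horizon_run(), and the toy values below are illustrative only.

```python
import numpy as np
import pandas as pd


def align_across_horizons(results, horizons):
    """Intersect test dates across horizons; return (date_index, y_true, pred_df)."""
    hs = sorted(h for h in horizons if h in results)
    date_index = None
    for h in hs:
        idx = pd.DatetimeIndex(results[h]["dates_test"])
        date_index = idx if date_index is None else date_index.intersection(idx)
    date_index = date_index.sort_values()

    # Observations come from the run with the largest horizon.
    ref = results[max(hs)]
    y_ref = pd.Series(ref["y_true_test"], index=pd.DatetimeIndex(ref["dates_test"]))
    y_true = y_ref.reindex(date_index).to_numpy()

    frames = []
    for h in hs:
        run = results[h]
        pred = pd.Series(run["y_pred_test"], index=pd.DatetimeIndex(run["dates_test"]))
        frames.append(pd.DataFrame({"date": date_index, "horizon": h,
                                    "y_pred": pred.reindex(date_index).to_numpy()}))
    return date_index, y_true, pd.concat(frames, ignore_index=True)


results = {
    1: {"dates_test": pd.date_range("2020-01-01", periods=4, freq="D"),
        "y_true_test": np.array([1.0, 2.0, 3.0, 4.0]),
        "y_pred_test": np.array([1.1, 2.1, 3.1, 4.1])},
    3: {"dates_test": pd.date_range("2020-01-03", periods=4, freq="D"),
        "y_true_test": np.array([3.0, 4.0, 5.0, 6.0]),
        "y_pred_test": np.array([2.9, 3.9, 4.9, 5.9])},
}
date_index, y_true, pred_df = align_across_horizons(results, [1, 3])
print(list(date_index.strftime("%Y-%m-%d")), y_true.tolist())
# ['2020-01-03', '2020-01-04'] [3.0, 4.0]
```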

GaugePredict.plotting.build_scores_table(results, horizons)[source]

Build a summary table of evaluation metrics by horizon.

Inputs :

results‘dict’

Mapping horizon -> loaded run dict.

horizons‘iterable’

Horizons to include (only those present in results are used).

Outputs :

df‘pandas.DataFrame’

DataFrame indexed by horizon with columns [“r2”, “nse”, “willmott”].

GaugePredict.plotting.get_examples_results_dir(project_root)[source]

Return the default examples/results directory under a project root.

Inputs :

project_root‘str or pathlib.Path’

Project root directory.

Outputs :

results_dir‘pathlib.Path’

Path to “<project_root>/examples/results”.

GaugePredict.plotting.get_horizon_styles(horizons, cmap=None, min_color=0.15, max_color=0.9)[source]

Assign a distinct color and linestyle for each horizon.

Colors are sampled from a continuous colormap over [min_color, max_color]. Linestyles cycle through a predefined set.

Inputs :

horizons‘iterable’

Horizons to style.

cmap‘matplotlib colormap or None’

Colormap used for horizon colors. Defaults to cmocean.cm.haline.

min_color, max_color‘float’

Fractions in [0, 1] used for colormap sampling range.

Outputs :

colors_h‘dict’

Mapping horizon -> RGBA color.

linestyles_h‘dict’

Mapping horizon -> matplotlib linestyle spec.

GaugePredict.plotting.horizon_dir(results_root, h)[source]

Construct a standardized subdirectory path for a forecast horizon.

Example:

horizon_dir("results", 3) -> Path("results") / "H03"

Inputs :

results_root‘str or pathlib.Path’

Root directory containing horizon folders.

h‘int’

Forecast horizon.

Outputs :

path‘pathlib.Path’

Horizon directory path.

GaugePredict.plotting.load_saved_horizon_run(results_root, h, *, verbose=True)[source]

Load saved model outputs for a single forecast horizon.

Expects the horizon directory to contain:

  • predictions.csv (date, y_true, y_pred)

  • metrics.json

  • history.json

  • model.pt

  • scaler_y.pkl

Dates in predictions.csv are parsed as UTC and then converted to naive timestamps for plotting convenience.

Inputs :

results_root‘str or pathlib.Path’

Root directory containing horizon subfolders.

h‘int’

Forecast horizon to load.

verbose‘bool’

If True, prints a message when required files are missing.

Outputs :

run‘dict or None’

Dictionary containing:

  • dates_test : numpy array of datetime-like (tz-naive)

  • y_true_test, y_pred_test : numpy arrays

  • metr : dict of metrics

  • history : dict of training curves

  • scaler_y : loaded scaler object

  • model_path : pathlib.Path to model.pt

Returns None if required files are missing.

GaugePredict.plotting.load_saved_runs(results_root, horizons, *, verbose=True, require_any=True)[source]

Load saved runs for multiple horizons.

Inputs :

results_root‘str or pathlib.Path’

Root directory containing per-horizon subfolders.

horizons‘iterable’

Horizons to load.

verbose‘bool’

If True, prints a message per loaded horizon and for missing horizons.

require_any‘bool’

If True, raises if no horizons are found.

Outputs :

results‘dict’

Mapping horizon (int) -> run dict from load_saved_horizon_run().

Raises :

RuntimeError

If require_any is True and no runs are found.

GaugePredict.plotting.load_shap_sites_csv(shap_root, h)[source]

Load and normalize a horizon-specific SHAP site-importance table.

Expects:

<shap_root>/H??/shap_sites.csv

Required columns: “lat”, “lon”. Normalized importance is returned in “importance_norm”.

Inputs :

shap_root‘str or pathlib.Path’

Root directory containing SHAP outputs.

h‘int’

Horizon to load.

Outputs :

df‘pandas.DataFrame’

SHAP table with importance normalized to [0, 1].

Raises :

FileNotFoundError

If shap_sites.csv does not exist.

ValueError

If required columns are missing.

GaugePredict.plotting.load_shap_tables_by_horizon(shap_root, horizons, *, filename='shap_sites.csv', verbose=True)[source]

Load SHAP site-importance CSV files across multiple horizons and concatenate.

Each file is expected at:

<shap_root>/H##/<filename>

The output includes an added “horizon” column.

Inputs :

shap_root‘str or pathlib.Path’

Root directory containing horizon subfolders.

horizons‘iterable’

Horizons to attempt to load.

filename‘str’

CSV filename to load from each horizon folder.

verbose‘bool’

If True, prints a message when a horizon file is missing.

Outputs :

df‘pandas.DataFrame’

Concatenated SHAP table with “horizon” column.

Raises :

RuntimeError

If no SHAP files are found for the requested horizons.

GaugePredict.plotting.load_states(states_fp)[source]

Load a states boundary file and standardize to EPSG:4326.

If the input file has no CRS, EPSG:4269 is assumed (common for some US datasets). If available, AK, HI, and US territories are removed to focus on CONUS.

Inputs :

states_fp‘str or pathlib.Path’

Path to a states shapefile/GeoPackage/GeoJSON supported by geopandas.

Outputs :

states‘geopandas.GeoDataFrame’

States boundaries projected to EPSG:4326, filtered to CONUS when possible.

GaugePredict.plotting.parameter_label_from_target(target_variable)[source]

Create a standard y-axis label from a target-variable name.

Currently supports a discharge label with 10^4 scaling and a default water-level label; support for other parameters is planned.

Inputs :

target_variable‘str’

Target variable identifier (e.g., “discharge”, “water_level”).

Outputs :

label‘str’

Matplotlib-ready label string.

GaugePredict.plotting.plot_hucs(base_dir, states_fp, *, include_ak=False, label_hucs=True, basemap=True, zoom=4)[source]

Plot basin polygons with optional state boundaries and basemap.

By default, this function produces a CONUS-focused plot:

  • Alaska, Hawaii, and territories are excluded

  • HUC2 code “19” (Alaska) is excluded from the basins

If include_ak=True, the plot is produced in EPSG:3857 and Alaska is included.

Inputs :

base_dir‘str or pathlib.Path’

Root directory containing HUC??/WBDHU2.shp shapefiles.

states_fp‘str or pathlib.Path’

States boundary dataset path.

include_ak‘bool’

If True, include Alaska and use EPSG:3857. If False, use EPSG:4326.

label_hucs‘bool’

If True, annotate each HUC2 polygon group with its HUC code.

basemap‘bool’

If True, add a contextily basemap (requires internet tile access).

zoom‘int’

Contextily basemap zoom level.

xlim, ylim‘tuple or None’

Optional axis limits in the current CRS units. If None, defaults are used.

Outputs :

fig‘matplotlib.figure.Figure’

Figure object.

ax‘matplotlib.axes.Axes’

Axes object.

GaugePredict.plotting.plot_shap_geoplot_grid(*, shap_root, horizons, n_shap_by_h, states_fp=None, xlim=None, ylim=None, fig_w=8.0, fig_h=6.0, nrows=None, ncols=None, s_all=6.0, s_used=18.0, wspace=0.03, hspace=-0.0125, cbar_rect=(0.125, 0.07, 0.775, 0.03), save_path=None, show=True, dpi=300, save_dpi=400, font_size=8)[source]

Create a grid of SHAP site-importance maps, one panel per horizon.

Each panel shows:

  • All available sites in light gray

  • The top-N sites (per horizon) colored by normalized SHAP importance

State boundaries are optional. If states_fp is provided, boundaries are drawn for geographic context; if states_fp is None, boundaries are skipped.

This function was written for a specific project; a generalized version is planned.

Inputs :

shap_root‘str or pathlib.Path’

Root directory containing per-horizon SHAP outputs.

horizons‘iterable’

Horizons to plot.

n_shap_by_h‘dict’

Mapping horizon -> number of top sites to highlight.

states_fp‘str or pathlib.Path or None’

States boundary dataset path (read by geopandas).

xlim, ylim‘tuple (float, float)’ or None

Plot bounds in degrees (lon/lat) for EPSG:4326 output.

fig_w, fig_h‘float’

Figure width/height in inches.

nrows, ncols‘int’ or None

Grid arrangement for panels.

s_all‘float’

Marker size for all sites.

s_used‘float’

Marker size for highlighted sites.

wspace, hspace‘float’

Grid spacing.

cbar_rect‘tuple’

Rectangle (left, bottom, width, height) for colorbar axes in figure fraction coordinates.

save_path‘str or pathlib.Path or None’

If provided, figure is saved to this path.

show‘bool’

If True, calls plt.show().

dpi‘int’

Figure display dpi.

save_dpi‘int’

Save dpi when writing to disk.

font_size‘int’

Base matplotlib font size.

Outputs :

fig‘matplotlib.figure.Figure’

The created figure.

Raises :

KeyError

If n_shap_by_h is missing an entry for one of the requested horizons.

GaugePredict.plotting.plot_statistics(target, *, critical_threshold=None, figsize=(6.0, 5.4), dpi=400, ylim=None, hist_bins=40, show_trend=True, trend_label=None)[source]

Plot summary statistics for a daily time series at a target site as a three-panel figure.

Panels:
  1. time series and optional linear trend

  2. histogram with quantile markers and optional critical threshold

  3. monthly violin plots and optional critical threshold

Inputs :

target‘pandas.Series’

Daily series indexed by datetimes. Timezone-aware (UTC) is recommended.

critical_threshold‘float or None’

Threshold value to annotate on histogram and violin plot.

figsize‘tuple (float, float)’

Figure size in inches.

dpi‘int’

Figure DPI.

ylim‘tuple (float, float) or None’

(ymin, ymax) limits for panels (a) and (c). If None, auto-scaled.

hist_bins‘int’

Number of bins for histogram.

show_trend‘bool’

If True, fit and plot a linear trend on panel (a).

trend_label‘str or None’

If provided, overrides the trend legend label.

Outputs :

fig‘matplotlib.figure.Figure’

Figure object.

axes‘numpy.ndarray of matplotlib.axes.Axes’

Array of axes for panels (a), (b), and (c).

Raises :

ValueError

If target is None or contains fewer than 2 finite values.

TypeError

If target is not a pandas Series-like object.

GaugePredict.plotting.plot_training_and_timeseries(results, horizons, *, date_index, y_true, pred_df, colors_h, linestyles_h, parameter_label=None, roll_window_days=1, fig_w=6.9, fig_h=3.85, dpi=600, site=None)[source]

Plot training curves and aligned observed/predicted test time series.

  • Top row: per-epoch curves for selected metrics (train_loss, r2, willmott) for all horizons.

  • Bottom row: observed vs predicted time series on the common test date intersection.

Discharge scaling:

  • If parameter_label indicates a 10^4 scaling, observed/predicted values are scaled by 1e-4 before plotting to match the label.

Smoothing:

  • roll_window_days defines a centered rolling window (in days) for display smoothing of both observations and predictions.

Inputs :

results‘dict’

Mapping horizon -> loaded run dict (must include “history” and test series).

horizons‘iterable’

Horizons to plot.

date_index‘pandas.DatetimeIndex’

Common test dates (typically from build_aligned_test_series()).

y_true‘array-like’

Observations aligned to date_index.

pred_df‘pandas.DataFrame’

Long-form predictions with columns [“date”, “horizon”, “y_pred”].

colors_h‘dict’

Mapping horizon -> color.

linestyles_h‘dict’

Mapping horizon -> linestyle.

parameter_label‘str or None’

Y-axis label for the time-series panel.

roll_window_days‘int’

Centered rolling-window (days) used for figure smoothing.

fig_w, fig_h‘float’

Figure size in inches.

dpi‘int’

Figure DPI.

site‘str or None’

Optional label printed on the figure (e.g., site id).

Outputs :

fig‘matplotlib.figure.Figure’

Created figure.

GaugePredict.plotting.top_n_shap_sites(df, n_keep)[source]

Select the top-N sites by normalized SHAP importance.

Inputs :

df‘pandas.DataFrame’

SHAP site table containing “importance_norm”.

n_keep‘int’

Number of rows to keep.

Outputs :

df_top‘pandas.DataFrame’

Copy of top-N sites sorted by descending importance.

The plotting module contains visualization utilities for model predictions, SHAP summaries, and geospatial analysis.