BYOV (Bring Your Own Variable)
EWB ships several derived variables out of the box:
AtmosphericRiverVariables (IVT, AR mask, land intersection),
CravenBrooksSignificantSevere (mixed-layer CAPE × shear), and
TropicalCycloneTrackVariables (TempestExtremes-style track detection).
When you need a derived quantity that isn't covered — a custom composite
index, a variable computed from pressure levels, etc. — you can subclass
DerivedVariable and use it anywhere a variable name string is accepted.
See Usage for how variables feed into EvaluationObject.
The DerivedVariable interface
Two things are required of every subclass:
- A class-level
variableslist — the names of the raw inputs your derivation needs (EWB uses this to subset the dataset before callingderive_variable). - A
derive_variable(self, data, *args, **kwargs)method that accepts anxr.Datasetand returns anxr.DataArray(orxr.Datasetfor multi-output variables).
class DerivedVariable(abc.ABC):
variables: list[str] # raw inputs required
def derive_variable(
self, data: xr.Dataset, *args, **kwargs
) -> xr.DataArray:
...
Detailed Explanation: EWB calls
derive_variableafter subsetting the forecast (or target) dataset to the current case. Thedataargument therefore contains only the spatial and temporal extent of the active case, already renamed viavariable_mapping. Thekwargsmay includecase_metadata(anIndividualCaseobject) and, for derived variables that setrequires_target_dataset = True, a_target_datasetkeyword holding the subset target data.
Example 1 — Surface dewpoint depression
This computes 2 m dewpoint depression (temperature minus dewpoint) as a single derived variable:
import xarray as xr
import extremeweatherbench as ewb
from extremeweatherbench.derived import DerivedVariable
class DewpointDepression(DerivedVariable):
"""2 m dewpoint depression (T2m - Td2m) in Kelvin."""
variables = [
"surface_air_temperature",
"surface_dewpoint_temperature",
]
def __init__(self, name: str = "dewpoint_depression"):
super().__init__(name=name)
def derive_variable(
self, data: xr.Dataset, *args, **kwargs
) -> xr.DataArray:
depression = (
data["surface_air_temperature"]
- data["surface_dewpoint_temperature"]
)
depression.name = self.name
return depression
Using a derived variable in an evaluation
Pass a DerivedVariable instance anywhere a variable name string is
accepted — inside variables on a forecast or target, or as
forecast_variable / target_variable on a metric:
import extremeweatherbench as ewb
dd = DewpointDepression()
my_metric = ewb.metrics.MeanAbsoluteError(
forecast_variable=dd,
target_variable=dd,
)
eval_objects = [
ewb.EvaluationObject(
event_type="heat_wave",
metric_list=[my_metric],
target=ewb.ERA5(),
forecast=my_forecast,
),
]
cases = ewb.load_cases()
runner = ewb.evaluation(case_metadata=cases, evaluation_objects=eval_objects)
outputs = runner.run_evaluation()
Detailed Explanation: When EWB encounters a
DerivedVariablein a metric'sforecast_variableortarget_variable, it uses thevariablesclass attribute to pull the required raw inputs from the dataset, then callsderive_variableto produce the derived DataArray before computing the metric. Only oneDerivedVariableperEvaluationObjectis supported; if you need multiple derived variables, create separateEvaluationObjectinstances for each.
Example 2 — Multi-level wind speed
For a variable that requires pressure-level data, include the 3-D
dimension variable names in variables:
import numpy as np
import xarray as xr
from extremeweatherbench.derived import DerivedVariable
class WindSpeed500hPa(DerivedVariable):
"""Wind speed at 500 hPa from u and v components."""
variables = ["eastward_wind", "northward_wind"]
def __init__(self, name: str = "wind_speed_500hPa"):
super().__init__(name=name)
def derive_variable(
self, data: xr.Dataset, *args, **kwargs
) -> xr.DataArray:
u500 = data["eastward_wind"].sel(level=500)
v500 = data["northward_wind"].sel(level=500)
speed = np.sqrt(u500 ** 2 + v500 ** 2)
speed.name = self.name
return speed
Example 3 — Multi-output derived variable
When your derivation produces several outputs (like AtmosphericRiverVariables),
return an xr.Dataset and specify which variables downstream code
should see via output_variables:
import xarray as xr
from extremeweatherbench.derived import DerivedVariable
class HeatStressIndex(DerivedVariable):
"""Wet-bulb globe temperature approximation and heat index."""
variables = [
"surface_air_temperature",
"surface_relative_humidity",
]
def __init__(
self,
name: str = "heat_stress",
output_variables=None,
):
if output_variables is None:
output_variables = ["heat_index", "wbgt_approx"]
super().__init__(name=name, output_variables=output_variables)
def derive_variable(
self, data: xr.Dataset, *args, **kwargs
) -> xr.Dataset:
t = data["surface_air_temperature"] - 273.15 # to °C
rh = data["surface_relative_humidity"]
# Simplified heat index (Rothfusz regression, °C)
hi = (
-8.784695
+ 1.61139411 * t
+ 2.338549 * rh
- 0.14611605 * t * rh
- 0.01230809 * t ** 2
- 0.01642482 * rh ** 2
)
# Very rough WBGT approximation
wbgt = 0.7 * (t - (100 - rh) / 5) + 0.3 * t
return xr.Dataset(
{"heat_index": hi, "wbgt_approx": wbgt}
)
Specify a single output variable when creating the metric:
import extremeweatherbench as ewb
hi_variable = HeatStressIndex(output_variables=["heat_index"])
hi_metric = ewb.metrics.MaximumMeanAbsoluteError(
forecast_variable=hi_variable,
target_variable=hi_variable,
)
Complete Example
The DewpointDepression derived variable from Example 1, wired into a full
heat wave evaluation.
import datetime
import xarray as xr
import extremeweatherbench as ewb
from extremeweatherbench.derived import DerivedVariable
from extremeweatherbench.cases import IndividualCase
from extremeweatherbench.regions import BoundingBoxRegion
demo_case = IndividualCase(
case_id_number=9005,
title="2017 Lucifer European Heat Wave (demo)",
start_date=datetime.datetime(2017, 8, 2),
end_date=datetime.datetime(2017, 8, 5),
location=BoundingBoxRegion.create_region(
latitude_min=39.0,
latitude_max=45.0,
longitude_min=12.0,
longitude_max=18.0,
),
event_type="heat_wave",
)
cases = [demo_case]
class DewpointDepression(DerivedVariable):
"""2 m dewpoint depression (T2m - Td2m) in Kelvin."""
variables = [
"surface_air_temperature",
"surface_dewpoint_temperature",
]
def __init__(self, name: str = "dewpoint_depression"):
super().__init__(name=name)
def derive_variable(
self, data: xr.Dataset, *args, **kwargs
) -> xr.DataArray:
depression = (
data["surface_air_temperature"]
- data["surface_dewpoint_temperature"]
)
depression.name = self.name
return depression
dd = DewpointDepression()
forecast = ewb.ZarrForecast(
source="gs://weatherbench2/datasets/hres/2016-2022-0012-1440x721.zarr",
name="HRES",
variable_mapping=ewb.HRES_metadata_variable_mapping,
storage_options={"remote_options": {"anon": True}},
)
target = ewb.ERA5(
variables=[
"surface_air_temperature",
"surface_dewpoint_temperature",
]
)
eval_objects = [
ewb.EvaluationObject(
event_type="heat_wave",
metric_list=[
ewb.metrics.MeanAbsoluteError(
forecast_variable=dd,
target_variable=dd,
),
],
target=target,
forecast=forecast,
),
]
runner = ewb.evaluation(
case_metadata=cases,
evaluation_objects=eval_objects,
)
outputs = runner.run_evaluation()
print(outputs)