Data generation
sample_curves(dataset_specification, f=None, w0=None, random_state=2024, measurement_scale=None, callback=None)
Samples synthetic curves given a dataset specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_specification | dict | A dataset specification which contains | required |
| f | Callable | The function to fit the curves. Use this parameter if no function is specified | None |
| w0 | ndarray | The initial guess for the optimization problem used to synthesize curves. | None |
| random_state | int | The random state for reproducibility. | 2024 |
| measurement_scale | float | The scale for the noise applied on the evaluated curves. If not | None |
Returns:
| Type | Description |
|---|---|
| ndarray | The coefficients for each sampled curve. |
| list[LatentInformation] | The latent information for each sampled curve. |
| ndarray | The evaluated sampled curves. |
Source code in driftbench/data_generation/sample.py
Drift
Represents a drift for 1d or 2d input.
Source code in driftbench/data_generation/drifts.py
__init__(start, end, feature=None, dimension=0)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start | int | The start index. | required |
| end | int | The end index. | required |
| feature | str | The feature the drift should be applied on. | None |
| dimension | int | The dimension the drift should be applied on. | 0 |
Source code in driftbench/data_generation/drifts.py
transform(X)
abstractmethod
Applies the transformation specified by the drift object to the given input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | numpy.ndarray | The 1d- or 2d-input data to be drifted. | required |
DriftSequence
Represents a sequence of drifts, which will be applied on a latent information object.
Source code in driftbench/data_generation/drifts.py
__init__(drifts)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drifts | list[Drift] | A list of drifts which are being used for the transformation. | required |
apply(X)
Applies the transformation by the given drifts on the latent information input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | list[LatentInformation] | The list of latent information the drifts are applied on. | required |
Returns:
| Type | Description |
|---|---|
| list[LatentInformation] | A list of drifted latent information according to the drift sequence. |
Source code in driftbench/data_generation/drifts.py
get_aggregated_drift_bounds()
Returns the aggregated drift bounds, i.e. the maximum range where drifts are applied.
Returns:
| Type | Description |
|---|---|
| tuple[int, int] | A tuple of (int, int), where the first value denotes the start index and the second value the end index of the aggregated drift bounds. |
Source code in driftbench/data_generation/drifts.py
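To make "aggregated drift bounds" concrete, here is a minimal sketch that reduces individual `(start, end)` bounds to the smallest range covering all of them. The helper `aggregated_bounds` is hypothetical and not part of driftbench; it assumes the aggregate is simply the earliest start and the latest end.

```python
def aggregated_bounds(individual_bounds):
    """Return the smallest (start, end) range covering every drift.

    Hypothetical helper for illustration, not the driftbench implementation.
    """
    if not individual_bounds:
        raise ValueError("at least one drift is required")
    starts = [start for start, _ in individual_bounds]
    ends = [end for _, end in individual_bounds]
    # Assumption: the aggregate spans from the earliest start to the latest end.
    return min(starts), max(ends)


# Three drifts, two of which overlap.
bounds = [(100, 200), (150, 300), (500, 550)]
aggregated = aggregated_bounds(bounds)
```

Note that under this assumption the aggregated range also covers gaps between drifts, such as the indices between 300 and 500 in the example.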
get_drift_intensities()
Returns the intensities for each range in the drift sequence. Each drift has a base intensity of 1; where multiple drifts overlap, the intensity is the number of drifts present in that range.
Returns:
| Type | Description |
|---|---|
| dict[tuple[int, int], int] | A dictionary with tuples as keys and ints as values. The keys indicate the range of the drift intensity, and the values indicate the intensity. |
Source code in driftbench/data_generation/drifts.py
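The overlap counting described for `get_drift_intensities()` can be sketched as follows. This is an illustration only: the function `drift_intensities` is hypothetical, and it assumes that the range is split at every drift boundary and that a segment counts as covered by a drift when it lies entirely within that drift's bounds.

```python
def drift_intensities(bounds):
    """Map each segment between drift boundaries to the number of drifts covering it.

    Hypothetical helper for illustration, not the driftbench implementation.
    """
    # Collect every boundary once, in ascending order.
    points = sorted({p for start, end in bounds for p in (start, end)})
    intensities = {}
    for lo, hi in zip(points, points[1:]):
        # Count the drifts whose bounds fully contain the segment (lo, hi).
        count = sum(1 for start, end in bounds if start <= lo and hi <= end)
        if count > 0:
            intensities[(lo, hi)] = count
    return intensities


# Two drifts that overlap between index 30 and index 50.
intensities = drift_intensities([(10, 50), (30, 70)])
```

In this example the overlapping segment `(30, 50)` receives intensity 2, while the two non-overlapping segments receive the base intensity of 1.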
get_individual_drift_bounds()
Returns the drift bounds for each individual drift in the drift sequence.
Returns:
| Type | Description |
|---|---|
| list[tuple[int, int]] | A list of tuples of (int, int), where the first value denotes the start of the drift, and the second value the end of the drift. |
Source code in driftbench/data_generation/drifts.py
LinearDrift
Bases: Drift
Represents a linear drift for a 1d or 2d-input, i.e. a drift where the input data is drifted in a linear fashion.
Source code in driftbench/data_generation/drifts.py
__init__(start, end, m, feature=None, dimension=0)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start | int | The start index. | required |
| end | int | The end index. | required |
| m | float | The slope of the linear drift. Usually in the range (-1, 1). | required |
| feature | str | The feature the drift should be applied on. | None |
| dimension | int | The dimension the drift should be applied on. | 0 |
Source code in driftbench/data_generation/drifts.py
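As a rough illustration of drifting data "in a linear fashion", the sketch below adds an offset that grows with slope `m` between `start` and `end`. The function `apply_linear_drift` is hypothetical. Two points are assumptions rather than documented behaviour: the offset at index `i` is `m * (i - start)`, and values outside the half-open range `[start, end)` are left unchanged.

```python
def apply_linear_drift(values, start, end, m):
    """Add a linearly growing offset to values[start:end].

    Hypothetical helper for illustration, not the driftbench implementation.
    """
    drifted = list(values)
    for i in range(start, min(end, len(drifted))):
        # Assumption: the offset starts at 0 and grows by m per index.
        drifted[i] = values[i] + m * (i - start)
    return drifted


baseline = [1.0] * 6
drifted = apply_linear_drift(baseline, start=2, end=5, m=0.5)
```

A negative `m` produces a downward drift in the same way.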
JaxCurveGenerationSolver
Bases: Solver
Fits latent information according to a given function.
Source code in driftbench/data_generation/solvers.py
__init__(f, w0, max_fit_attemps)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| f | Callable | The function. | required |
| w0 | list-like | The initial guess for the solution. | required |
| max_fit_attemps | int | The maximum number of attempts to refit a curve if optimization didn't succeed. | required |
| random_seed | int | The random seed for the random number generator. | required |
Source code in driftbench/data_generation/solvers.py
Solver
Represents a backend for solving an optimization problem.
Source code in driftbench/data_generation/solvers.py
solve(X)
abstractmethod
Solves an optimization problem defined by the solver.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | list-like | Input to optimize according to solver instance. | required |
Returns:
| Type | Description |
|---|---|
| ndarray \| ndarray | The parameters obtained by solving the optimization problem. |
Source code in driftbench/data_generation/solvers.py
LatentInformation
dataclass
Represents the local latent information for a high-dimensional object, which is used to generate such high-dimensional data. Currently, this structure is designed for creating curves that meet the conditions provided by the attributes defined in this class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y0 | list-like | The y-values of a function. | required |
| x0 | list-like | The x-values of a function. Hence, no duplicates are allowed. | required |
| y1 | list-like | The y-values of the derivative of a function. | required |
| x1 | list-like | The x-values of the derivative of a function. | required |
| y2 | list-like | The y-values of the second derivative of a function. | required |
| x2 | list-like | The x-values of the second derivative of a function. | required |
Source code in driftbench/data_generation/latent_information.py
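The fields above can be pictured with a small stand-in dataclass. `LatentInformationSketch` is hypothetical and only mirrors the documented attribute names. The validation in `__post_init__` is an assumption made for illustration: the documentation states that duplicates in `x0` are not allowed, but it does not say whether or how the real class enforces this.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LatentInformationSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    y0: Sequence[float]  # function values
    x0: Sequence[float]  # positions of the function values
    y1: Sequence[float]  # first-derivative values
    x1: Sequence[float]  # positions of the first-derivative values
    y2: Sequence[float]  # second-derivative values
    x2: Sequence[float]  # positions of the second-derivative values

    def __post_init__(self):
        # A function maps each x to exactly one y, so x0 must be unique.
        if len(set(self.x0)) != len(self.x0):
            raise ValueError("x0 must not contain duplicates")
        # Assumption: every position needs exactly one value.
        if len(self.x0) != len(self.y0):
            raise ValueError("x0 and y0 must have the same length")


# Conditions for a curve through (0, 0) and (4, 2) with zero slope at x = 2.
latent = LatentInformationSketch(
    y0=[0.0, 2.0], x0=[0.0, 4.0], y1=[0.0], x1=[2.0], y2=[], x2=[]
)
```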
Drift detection
AggregateFeatureAlgorithm
Bases: Detector
Detector that aggregates features over temporal axis.
Source code in driftbench/drift_detection/detectors.py
AlwaysGuessDriftDetector
AutoencoderDetector
Bases: Detector, Module
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_layers | list | The number of neurons in each encoder layer after the input layer. | required |
| retrain_always | bool | If true, the model is always retrained when predict is called. | False |
Source code in driftbench/drift_detection/detectors.py
ClusterDetector
Bases: Detector
Cluster based drift detector.
Source code in driftbench/drift_detection/detectors.py
Detector
Detector base class.
Source code in driftbench/drift_detection/detectors.py
MMDDetector
Bases: Detector
Drift detector based on the Maximum Mean Discrepancy (MMD), as defined in: Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012. This implementation is based on the blog post by Onur Tunali at https://www.onurtunali.com/ml/2019/03/08/maximum-mean-discrepancy-in-machine-learning.html
Source code in driftbench/drift_detection/detectors.py
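For readers unfamiliar with the statistic, the following is a minimal sketch of the biased squared-MMD estimate from the cited paper, using a Gaussian (RBF) kernel. It is not the driftbench implementation: the function names, the kernel choice, and the `gamma` parameter are assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equally long vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


reference = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0]]
shifted = [[2.0, 2.1], [2.1, 2.0], [2.0, 2.0]]

same_score = mmd_squared(reference, reference)  # identical samples
drift_score = mmd_squared(reference, shifted)   # clearly separated samples
```

The estimate is zero for identical samples and grows as the two samples move apart, which is what makes it usable as a drift score.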
RandomGuessDetector
RollingMeanDifferenceDetector
Bases: Detector
Calculates the maximum value over a rolling mean across time and returns the absolute difference between subsequent steps.
Source code in driftbench/drift_detection/detectors.py
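One possible reading of this description is sketched below. It is not the driftbench implementation, and several details are assumptions: each row is one curve, rows are ordered in time, the rolling mean is a trailing window over consecutive curves, and `window_size` is a hypothetical parameter name.

```python
def rolling_mean_difference(curves, window_size):
    """Score drift as the change in the maximum of a rolling-mean curve.

    Hypothetical helper for illustration, not the driftbench implementation.
    """
    width = len(curves[0])
    maxima = []
    for t in range(window_size - 1, len(curves)):
        # Trailing window of the most recent `window_size` curves.
        window = curves[t - window_size + 1 : t + 1]
        # Point-wise mean over the curves in the window.
        mean_curve = [sum(c[j] for c in window) / window_size for j in range(width)]
        maxima.append(max(mean_curve))
    # Absolute difference between subsequent steps.
    return [abs(b - a) for a, b in zip(maxima, maxima[1:])]


# The peak of the curves rises from 1 to 3 after the third curve.
curves = [[0, 1], [0, 1], [0, 1], [0, 3], [0, 3]]
scores = rolling_mean_difference(curves, window_size=2)
```

Scores stay at zero while the curves are stable and become positive once the window starts to include the changed curves.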
RollingMeanStandardDeviationDetector
Bases: Detector
Detector that applies a rolling mean followed by a rolling standard deviation and returns the result as the drift score.
Source code in driftbench/drift_detection/detectors.py
SlidingKSWINDetector
Bases: Detector
Detector based on KS-test.
Source code in driftbench/drift_detection/detectors.py
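The description does not say how the KS test is applied, so the following is only a sketch of the general idea: compute the two-sample Kolmogorov-Smirnov statistic between a reference window and a window that slides over the data. The function names and the use of a fixed reference window at the start of the series are assumptions, not the driftbench implementation.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # The largest gap is always attained at one of the sample values.
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def sliding_ks_scores(series, window_size):
    """Compare the first window against every later window of equal size."""
    reference = series[:window_size]
    return [
        ks_statistic(reference, series[t : t + window_size])
        for t in range(window_size, len(series) - window_size + 1)
    ]


series = [0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5, 10.0, 11.0, 12.0, 13.0]
scores = sliding_ks_scores(series, window_size=4)
```

The statistic lies between 0 and 1, and it reaches 1 when the two windows do not overlap at all, as for the last window in the example.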
The metrics module.
AUC
SoftTAUC
Bases: Metric
A softened version of the TAUC.
Source code in driftbench/drift_detection/metrics.py
TemporalAUC
Bases: Metric
The temporal area under the curve.
Source code in driftbench/drift_detection/metrics.py
Benchmarks
Dataset
Represents a container class for a dataset specification for benchmarking purposes.
Source code in driftbench/benchmarks/data.py
__init__(name, spec, f=None, w0=None, n_variations=5)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the dataset specification. | required |
| spec | dict | The yaml-specification of the dataset. | required |
| f | Callable | The function to fit the curves. | None |
| w0 | ndarray | The initial value for the internal parameters. | None |
| n_variations | int | The number of variations sampled for each dataset. | 5 |