The anomaly detection pipeline in ADBox can be easily customized by creating a use-case. In this context, a use-case describes a sequence of actions to be performed and the characteristics of the desired outcome.
Examples of “informal” use-cases are:
These informal use-cases can be translated into real action by using a provided YAML template, as explained in the following section.
The YAML file for detector training and prediction includes parameters to configure the training and prediction processes. Below is a guide explaining the purpose of each parameter, its default value, and format.
Attention: Default values are specified in the default configuration file. If that file is changed, they may no longer coincide with those listed below.
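Training parameters (under `training`):

- `index_date`: Represents the data source index where the training data should be fetched from.
  - Default: `default`, which will be processed as `{current_year}-{current_month}-*` (fetches all data for the current month).
  - Format: `YYYY-MM-DD` or with wildcards (`*`).
- `categorical_features`: Specifies if the given input features include categorical features.
  - Default: `False`.
  - Format: Boolean (`True` or `False`).
- `columns`: List of columns used as features to train the detector.
  - Default: `rule.firedtimes`, `rule.level`.
- `aggregation`: Specifies if the column values should be aggregated.
  - Default: `True`.
  - Format: Boolean (`True` or `False`).
- `aggregation_config`: Used when `aggregation` is `True`.
  - `fill_na_method`:
    - Default: `Zero`.
    - Format: `"Linear"`, `"Previous"`, `"Subsequent"`, `"Zero"` or `"Fixed"`.
  - Fill value, used when `fill_na_method` is `Fixed`:
    - Default: `0`.
    - Format: only relevant if `fill_na_method` is `Fixed`.
  - `granularity`:
    - Default: `1min`.
    - Format: time interval string (`"1min"`, `"5min"`, `"1s"`, `"1hour"`, etc.).
  - `features`:
    - Default: `rule.firedtimes`: `["average", "max"]`, `rule.level`: `["average", "max"]`.
    - Format: for each column, a list of `"average"`, `"max"`, `"min"`, `"count"`, or `"sum"`.
- `train_config`:
  - `window_size`: Default: `10`.
  - `epochs`: Default: `8`.
- `display_name`:
  - Default: `default`, which is converted to `detector_<current timestamp>`.

Prediction parameters (under `prediction`):

- `run_mode`:
  - Default: `default` (`"historical"` run mode).
  - Format: one of `"historical"`, `"batch"`, `"realtime"`.
- `detector_id`:
  - Default: `default` (most recently trained detector).
- `start_time`:
  - Default: `default` (start of the current day at midnight).
  - Format: `YYYY-MM-DDTHH:MM:SSZ`.
- `end_time`:
  - Default: `default` (current timestamp).
  - Format: `YYYY-MM-DDTHH:MM:SSZ`.
- `batch_size`: Used in `batch` run mode.
  - Default: `10`.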
Example use-case file:

```yaml
training:
  index_date: "default"
  categorical_features: false
  columns:
    - "rule.firedtimes"
    - "rule.level"
  aggregation: true
  aggregation_config:
    fill_na_method: "Zero"
    granularity: "1min"
    features:
      rule.firedtimes:
        - "average"
        - "max"
      rule.level:
        - "average"
        - "max"
  train_config:
    window_size: 10
    epochs: 8
  display_name: "default"
prediction:
  run_mode: HISTORICAL
  detector_id: "default"
  start_time: "default"
  end_time: "default"
```
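To make the aggregation settings more concrete, the following minimal sketch shows what a `granularity` of one minute, the `average` and `max` aggregations, and the `Zero` fill method could mean for a single column. It is not ADBox code: the `aggregate` helper, its signature, and the sample events are hypothetical and only illustrate the idea of bucketing values by time and padding empty buckets.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical helper, not part of ADBox. Maps each aggregation name
# listed in the guide to a plain Python function.
AGGREGATORS = {
    "average": lambda xs: sum(xs) / len(xs),
    "max": max,
    "min": min,
    "count": len,
    "sum": sum,
}


def aggregate(events, granularity, methods, start, end):
    """Return one row per bucket in [start, end), one column per method."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Index of the bucket this event falls into.
        buckets[(ts - start) // granularity].append(value)

    rows = []
    for index in range((end - start) // granularity):
        values = buckets.get(index)
        row = {"time": start + index * granularity}
        for method in methods:
            # Empty bucket: pad with 0, mirroring fill_na_method "Zero".
            row[method] = AGGREGATORS[method](values) if values else 0
        rows.append(row)
    return rows


# Illustrative data: three values of one column over three minutes.
start = datetime(2024, 1, 1, 0, 0)
events = [
    (datetime(2024, 1, 1, 0, 0, 10), 3),
    (datetime(2024, 1, 1, 0, 0, 40), 5),
    (datetime(2024, 1, 1, 0, 2, 5), 7),
]
rows = aggregate(events, timedelta(minutes=1), ["average", "max"],
                 start, start + timedelta(minutes=3))
```

Here the second minute contains no events, so both of its aggregated values are padded with zero. The other fill methods (`Linear`, `Previous`, `Subsequent`, `Fixed`) would replace that padding step with a different rule.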
The engine parses use-case input and transforms it into data consumable by the detection pipelines by calling the `DetectorsConfigManager`. The manager reads the default values from the default configuration file.
Notice that there are special keys whose default value depends on the function call time (i.e., dynamic default values); see the key name mapping `config_keys`.
If the value of such a key is `default`, special methods are called to determine the present value.
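As an illustration of dynamic default values, the sketch below resolves the literal value `default` at call time for the keys described in this guide. The function names are hypothetical and do not belong to `DetectorsConfigManager`. The zero-padded month and the timestamp layout in the detector name are assumptions, since the guide only gives the patterns `{current_year}-{current_month}-*` and `detector_<current timestamp>`.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDTHH:MM:SSZ


def resolve_dynamic_defaults(now):
    """Compute call-time values for keys whose configured value is "default"."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        # Assumption: the month is zero-padded in the index pattern.
        "index_date": f"{now.year}-{now.month:02d}-*",
        # Assumption: compact timestamp layout for the detector name.
        "display_name": f"detector_{now.strftime('%Y%m%d%H%M%S')}",
        "start_time": midnight.strftime(TIME_FORMAT),
        "end_time": now.strftime(TIME_FORMAT),
    }


def apply_dynamic_defaults(config, now):
    """Replace "default" by the resolved value, leaving other values untouched."""
    resolved = resolve_dynamic_defaults(now)
    return {
        key: resolved[key] if value == "default" and key in resolved else value
        for key, value in config.items()
    }


now = datetime(2024, 8, 15, 13, 45, 30, tzinfo=timezone.utc)
print(apply_dynamic_defaults({"index_date": "default", "end_time": "default"}, now))
```

A `detector_id` of `default` is left as-is in this sketch, because resolving it to the most recently trained detector requires a lookup that is outside the scope of the example.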
### For training: get_train_config_from_uc_yaml
If the key `config_keys.UC_TRAINING` (i.e., `training`) is missing in the input use-case YAML file, `get_train_config_from_uc_yaml` returns `None`.
Otherwise, when either no use-case number or no key values are provided, it completes the training configuration dictionary using the default configuration values.
Dynamic default values for training:
- detector name: based on the request's timestamp
- input index: current month
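The completion behaviour described for training can be sketched as follows. This is not the implementation of `get_train_config_from_uc_yaml`: the names `DEFAULT_TRAINING`, `deep_merge`, and `complete_training_config` are hypothetical, the defaults are copied from the values listed in this guide rather than read from the default configuration file, and merging nested sections key by key is an assumption.

```python
import copy

# Defaults as listed in this guide; the real values live in the
# default configuration file and may differ.
DEFAULT_TRAINING = {
    "index_date": "default",
    "categorical_features": False,
    "columns": ["rule.firedtimes", "rule.level"],
    "aggregation": True,
    "aggregation_config": {
        "fill_na_method": "Zero",
        "granularity": "1min",
        "features": {
            "rule.firedtimes": ["average", "max"],
            "rule.level": ["average", "max"],
        },
    },
    "train_config": {"window_size": 10, "epochs": 8},
    "display_name": "default",
}


def deep_merge(defaults, overrides):
    """Return defaults updated with overrides, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def complete_training_config(use_case):
    """Return None without a "training" key, else the completed config."""
    if "training" not in use_case:
        return None
    # An empty "training:" entry parses to None in YAML, so treat it as {}.
    return deep_merge(DEFAULT_TRAINING, use_case["training"] or {})


config = complete_training_config({"training": {"train_config": {"epochs": 20}}})
print(config["train_config"])
```

In this sketch, overriding only `epochs` keeps the default `window_size`. Dynamic values such as `index_date` and `display_name` would still be `default` at this point and need to be resolved separately.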
### For prediction: get_predict_config_from_uc_yaml
If either the key `config_keys.UC_PREDICTION` (i.e., `prediction`) is missing in the input use-case or no use-case number is specified, `get_predict_config_from_uc_yaml` returns `None`. Otherwise, it uses the default configuration file to fill in default values wherever key values are missing. The list of keys depends on the run mode:
- Realtime: `detector_id`
- Batch: `detector_id`, `batch_size`
- Historical: `detector_id`, `start_time`, `end_time`
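The dependence of the key list on the run mode can be sketched as below. This is a hypothetical illustration, not the code of `get_predict_config_from_uc_yaml`: the names `KEYS_BY_RUN_MODE`, `DEFAULT_PREDICTION`, and `complete_prediction_config` are invented for the example, the assignment of keys to the realtime and batch modes is an assumption, the use-case number check is omitted, and lower-casing `run_mode` is assumed because the example file writes `HISTORICAL` while the format lists lowercase names.

```python
# Assumed mapping from run mode to the keys it needs.
KEYS_BY_RUN_MODE = {
    "realtime": ["detector_id"],
    "batch": ["detector_id", "batch_size"],
    "historical": ["detector_id", "start_time", "end_time"],
}

# Defaults as listed in this guide; the real values come from the
# default configuration file.
DEFAULT_PREDICTION = {
    "run_mode": "historical",
    "detector_id": "default",
    "start_time": "default",
    "end_time": "default",
    "batch_size": 10,
}


def complete_prediction_config(use_case):
    """Return None without a "prediction" key, else the keys for the run mode."""
    if "prediction" not in use_case:
        return None
    given = use_case["prediction"] or {}
    run_mode = str(given.get("run_mode", DEFAULT_PREDICTION["run_mode"])).lower()
    if run_mode == "default":
        run_mode = DEFAULT_PREDICTION["run_mode"]
    if run_mode not in KEYS_BY_RUN_MODE:
        raise ValueError(f"unknown run mode: {run_mode!r}")
    config = {"run_mode": run_mode}
    for key in KEYS_BY_RUN_MODE[run_mode]:
        # Fall back to the default whenever the use-case omits the key.
        config[key] = given.get(key, DEFAULT_PREDICTION[key])
    return config


print(complete_prediction_config({"prediction": {"run_mode": "batch"}}))
```

Keys that do not apply to the selected run mode are dropped in this sketch, so a `batch_size` given together with `run_mode: HISTORICAL` would simply be ignored.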