idps-escape

Use-case guide

The ADBox anomaly detection pipeline can be easily customized by creating a use-case. In this context, a use-case specifies a sequence of actions to be performed and the characteristics of the desired outcome.

Examples of “informal” use-cases are:

  1. Create and train a detector on Linux resource usage data from March.
  2. Create and train a detector on Linux resource usage data from March, then apply it to check whether there were anomalies on May 3rd.
  3. Use detector X for real-time detection.
  4. Use detector X for batch detection on batches of 10 windows at a time.

These informal use-cases can be translated into concrete actions by filling in the provided YAML template, as explained in the following section.

Writing a use-case as a YAML file

The YAML file for detector training and prediction includes parameters to configure the training and prediction processes. Below is a guide explaining the purpose of each parameter, its default value, and format.

Attention: Default values are specified in the default configuration file. If that file is changed, the defaults may no longer match those listed below.

Training Input Parameters

1. index_date

Specifies the data source index from which the training data is fetched.

2. categorical_features

Specifies whether the given input features include categorical features.

3. columns

List of columns used as features to train the detector.

4. aggregation

Specifies whether the column values should be aggregated.

5. aggregation_config

6. train_config

7. display_name

Prediction/Detection Input Parameters

1. run_mode

3. detector_id

4. start_time

5. end_time

6. batch_size

Example use-case YAML file:

training: 
  index_date: "default"
  categorical_features: false
  columns: 
    - "rule.firedtimes"  
    - "rule.level"  
  aggregation: true
  aggregation_config: 
    fill_na_method: "Zero"
    granularity: "1min"
    features: 
      rule.firedtimes:
        - "average"
        - "max"
      rule.level:
        - "average"
        - "max"
  train_config: 
    window_size: 10
    epochs: 8
  display_name: "default"

prediction: 
  run_mode: HISTORICAL
  detector_id: "default"
  start_time: "default"
  end_time: "default"
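
As a rough illustration of how such a file might be sanity-checked before it is handed to the engine, the sketch below writes out the training block of the example as the plain dictionary a YAML loader would produce, and checks that every feature named under aggregation_config.features also appears under columns. The helper unknown_aggregation_features is hypothetical, and the consistency rule itself is an assumption drawn from the example, not a documented ADBox check.

```python
# The training block of the example use-case, written as the plain dict a
# YAML loader would return, so that the sketch has no dependencies.
training = {
    "index_date": "default",
    "categorical_features": False,
    "columns": ["rule.firedtimes", "rule.level"],
    "aggregation": True,
    "aggregation_config": {
        "fill_na_method": "Zero",
        "granularity": "1min",
        "features": {
            "rule.firedtimes": ["average", "max"],
            "rule.level": ["average", "max"],
        },
    },
    "train_config": {"window_size": 10, "epochs": 8},
    "display_name": "default",
}


def unknown_aggregation_features(training_cfg):
    """Return aggregated feature names that are not listed under `columns`.

    Hypothetical helper: the rule it enforces is an assumption, not an
    ADBox requirement.
    """
    if not training_cfg.get("aggregation"):
        return []  # nothing is aggregated, so there is nothing to check
    columns = set(training_cfg.get("columns", []))
    features = training_cfg.get("aggregation_config", {}).get("features", {})
    return sorted(name for name in features if name not in columns)


print(unknown_aggregation_features(training))  # prints []
```

A non-empty result would point at a likely typo in either columns or aggregation_config.features.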

Default behavior and use-case parsing

The engine parses the use-case input and transforms it into data consumable by the detection pipelines by calling the DetectorsConfigManager. The manager reads the default values from the default configuration file.

Note that there are special keys whose default value depends on the time of the function call (i.e., dynamic default values). See the key name mapping config_keys.

For these keys, if the value is set to default, special methods are called to determine the value at call time.
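
The idea of a dynamic default can be sketched as follows. This is only an illustration under stated assumptions: the names DEFAULT_MARKER, DYNAMIC_DEFAULTS and resolve_dynamic_defaults are hypothetical, and what each key resolves to here (a date string, a timestamp) is a guess made for the example, not the behavior of the ADBox methods.

```python
from datetime import datetime, timezone

DEFAULT_MARKER = "default"

# Hypothetical resolvers. The key names come from the example use-case, but
# the values they resolve to are illustrative guesses, not ADBox behavior.
DYNAMIC_DEFAULTS = {
    "index_date": lambda now: now.strftime("%Y-%m-%d"),
    "end_time": lambda now: now.isoformat(),
}


def resolve_dynamic_defaults(config, now=None):
    """Return a copy of `config` with "default" placeholders resolved at call time."""
    now = now or datetime.now(timezone.utc)
    resolved = dict(config)
    for key, resolver in DYNAMIC_DEFAULTS.items():
        # Only keys that are explicitly set to the marker are replaced.
        if resolved.get(key) == DEFAULT_MARKER:
            resolved[key] = resolver(now)
    return resolved


fixed_now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
print(resolve_dynamic_defaults({"index_date": "default", "display_name": "default"}, fixed_now))
# prints {'index_date': '2024-05-03', 'display_name': 'default'}
```

Passing the clock in as a parameter keeps the resolution step testable; keys without a resolver, such as display_name in this sketch, keep their literal value.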

### For training: get_train_config_from_uc_yaml

If the key config_keys.UC_TRAINING (i.e., training) is missing from the input use-case YAML file, get_train_config_from_uc_yaml returns None.

Otherwise, when either no use-case number or no key values are provided, it completes the training configuration dictionary using the default configuration values.
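
The completion step described above can be pictured with a small sketch. It is not the ADBox implementation: training_config_from_use_case and fill_missing are hypothetical stand-ins, DEFAULT_TRAINING only mirrors the example use-case rather than the real default configuration file, and the handling of the use-case number is left out.

```python
import copy

# Illustrative defaults mirroring the example use-case; the real values live
# in the default configuration file and may differ.
DEFAULT_TRAINING = {
    "index_date": "default",
    "categorical_features": False,
    "columns": ["rule.firedtimes", "rule.level"],
    "aggregation": True,
    "train_config": {"window_size": 10, "epochs": 8},
    "display_name": "default",
}


def fill_missing(user_cfg, defaults):
    """Recursively add keys from `defaults` that `user_cfg` does not set."""
    merged = copy.deepcopy(defaults)
    for key, value in user_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = fill_missing(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def training_config_from_use_case(use_case):
    """Hypothetical stand-in for get_train_config_from_uc_yaml."""
    if "training" not in use_case:
        return None  # no training section in the use-case
    # An empty `training:` section parses as None, so fall back to {}.
    return fill_missing(use_case["training"] or {}, DEFAULT_TRAINING)


cfg = training_config_from_use_case({"training": {"train_config": {"epochs": 20}}})
print(cfg["train_config"])  # prints {'window_size': 10, 'epochs': 20}
```

The merge is recursive so that overriding a single nested value, such as epochs, does not discard its sibling defaults.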

### For prediction: get_predict_config_from_uc_yaml

If either the key config_keys.UC_PREDICTION (i.e., prediction) is missing from the input use-case or no use-case number is specified, get_predict_config_from_uc_yaml returns None. Otherwise, it takes the default values from the default configuration file whenever key values are missing. The list of keys depends on the run mode (run_mode).
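
A minimal sketch of this behavior, under stated assumptions, is given below. The function prediction_config_from_use_case, the mapping KEYS_BY_RUN_MODE and the defaults in DEFAULT_PREDICTION are all hypothetical. Only the HISTORICAL mode is listed because it is the only one shown in the example use-case, and treating it as the default run mode is an assumption made for the illustration.

```python
# Hypothetical key lists per run mode. Only HISTORICAL appears in the example
# use-case; other modes are left out rather than guessed.
KEYS_BY_RUN_MODE = {
    "HISTORICAL": ["detector_id", "start_time", "end_time"],
}

# Illustrative defaults; the real ones come from the default configuration file.
DEFAULT_PREDICTION = {
    "run_mode": "HISTORICAL",
    "detector_id": "default",
    "start_time": "default",
    "end_time": "default",
}


def prediction_config_from_use_case(use_case, uc_number=None):
    """Hypothetical stand-in for get_predict_config_from_uc_yaml."""
    # Both a prediction section and a use-case number are required.
    if "prediction" not in use_case or uc_number is None:
        return None
    user_cfg = use_case["prediction"] or {}
    run_mode = user_cfg.get("run_mode", DEFAULT_PREDICTION["run_mode"])
    config = {"run_mode": run_mode}
    # Only the keys relevant to the selected run mode are filled in.
    for key in KEYS_BY_RUN_MODE.get(run_mode, []):
        config[key] = user_cfg.get(key, DEFAULT_PREDICTION[key])
    return config


print(prediction_config_from_use_case({"prediction": {"detector_id": "abc"}}))
# prints None, because no use-case number was given
```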