Here we provide a complete tutorial, ranging from use case definition to building a custom Wazuh dashboard that offers a nicely formatted, dynamically generated visualization of the ADBox prediction results for easier and more efficient analysis.
Our (informal) objectives are as follows:
(1) train a detector on aggregated memory usage, CPU usage, and alert counts;
(2) run a historical prediction with the trained detector over a chosen time range;
(3) run a real-time prediction with the trained detector.
Note that the chosen setting is similar to the one in the example using notebooks.
Outline
To achieve (1), we make the following selections: the features data.memory_usage_% (avg), data.cpu_usage_% (avg), and rule.firedtimes (count); the display_name detector_test_mem_cpu_alert_count; a granularity of 30s; and a window_size of 10.
For (2), we have to choose a HISTORICAL prediction with start_time 2024-11-01T00:00:00Z. Indeed, if not specified, the end time is determined using the request’s timestamp.
We define a use case:
training:
  aggregation: true
  aggregation_config:
    features:
      data.cpu_usage_%:
        - average
      data.memory_usage_%:
        - average
      rule.firedtimes:
        - count
    fill_na_method: Zero
    granularity: 30s
    padding_value: 0
  categorical_features: false
  columns:
    - data.memory_usage_%
    - data.cpu_usage_%
    - rule.firedtimes
  display_name: detector_test_mem_cpu_alert_count
  index_date: '2024-10-*'
  train_config:
    epochs: 30
    window_size: 10
prediction:
  run_mode: HISTORICAL
  start_time: "2024-11-01T00:00:00Z"
Let’s assume we have cloned the repository into /home/user/SIEM-MTAD-GAT and that there are already 12 use cases defined; we must then save our newly created one at /home/user/SIEM-MTAD-GAT/siem_mtad_gat/assets/drivers/uc_13.yaml.
We have chosen to unify the training and the historical prediction into a single use case. However, by creating two separate use cases (one for training and one for historical prediction) and running them in sequence, we would obtain the same result, as sketched below.
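The split variant could look as follows; the file names uc_13a.yaml and uc_13b.yaml are hypothetical, and, since a prediction-only use case falls back to the last trained detector (see below), uc_13b must be run right after uc_13a:

# uc_13a.yaml (hypothetical name) – training only
training:
  aggregation: true
  aggregation_config:
    features:
      data.cpu_usage_%:
        - average
      data.memory_usage_%:
        - average
      rule.firedtimes:
        - count
    fill_na_method: Zero
    granularity: 30s
    padding_value: 0
  categorical_features: false
  columns:
    - data.memory_usage_%
    - data.cpu_usage_%
    - rule.firedtimes
  display_name: detector_test_mem_cpu_alert_count
  index_date: '2024-10-*'
  train_config:
    epochs: 30
    window_size: 10

# uc_13b.yaml (hypothetical name) – historical prediction only;
# without a detector_id, the engine uses the last trained detector
prediction:
  run_mode: HISTORICAL
  start_time: "2024-11-01T00:00:00Z"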
(Re)build the ADBox container:
$ ./build-adbox.sh
Run ADBox on use case 13 with the shipping flag (-s):
$ ./adbox.sh -u 13 -s
The effects of this command are:
the creation of a detector folder
../siem_mtad_gat/assets/detector_models/baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
├── input
│ ├── detector_input_parameters.json
│ └── training_config.json
├── prediction
│ ├── uc-13_predicted_anomalies_data-1_2024-11-08T09:43:41Z.json
│ └── uc-13_predicted_data-1_2024-11-08T09:43:41Z.json
└── training
├── losses_train_data.json
├── model.pt
├── scaler.pkl
├── spot
│ ├── spot_feature-0.pkl
│ ├── spot_feature-1.pkl
│ ├── spot_feature-2.pkl
│ └── spot_feature-global.pkl
├── test_output.pkl
├── train_losses.png
├── train_output.pkl
└── validation_losses.png
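To take a quick look at the generated predictions from the host, we can pretty-print one of the JSON files above (a minimal sketch; it assumes we run it from the repository root and that python3 is available):

$ python3 -m json.tool \
    "siem_mtad_gat/assets/detector_models/baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15/prediction/uc-13_predicted_anomalies_data-1_2024-11-08T09:43:41Z.json" \
  | head -n 40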
the creation of a detector data stream adbox_detector_mtad_gat_baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
in the Wazuh indexer and its corresponding template and template component.
This can be verified by accessing the Wazuh dashboard and navigating to Indexer Management > Index Management (see figures below).
the addition of the historical prediction results. This can be verified either by following the steps explained in dashboard integration or by using the Wazuh indexer API (OpenSearch API) at Indexer Management > Dev Tools; a query sketch follows.
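For instance, the following sketch retrieves the most recently indexed predictions; the indexer address https://localhost:9200, the admin:admin credentials, and the timestamp field name are assumptions to adapt to your deployment:

$ curl -sk -u admin:admin \
    "https://localhost:9200/adbox_detector_mtad_gat_baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15/_search?size=3&sort=timestamp:desc"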
To achieve (3), we need a different use case choosing the realtime run mode.
If not specified, the engine runs the prediction pipeline with the last trained detector.
To specify a detector explicitly, save a new use case ../siem_mtad_gat/assets/drivers/uc_14.yaml:
prediction:
  run_mode: "realtime"
  detector_id: "baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15"
Otherwise, we can use the already defined use case ../siem_mtad_gat/assets/drivers/uc_7.yaml:
prediction:
  run_mode: "realtime"
Tip 2: If you plan to close the terminal, use GNU screen to be able to detach while keeping the detector running; a minimal workflow is sketched below.
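A minimal screen workflow (the session name adbox is arbitrary):

$ screen -S adbox       # start a named session
$ ./adbox.sh -u 7 -s    # launch the realtime detector inside it
# detach with Ctrl-a d; the detector keeps running in the background
$ screen -r adbox       # reattach to the session later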
Run ./adbox.sh -u 7 -s
The effect of this command is the addition of the uc-7 prediction files (and a prediction-side spot folder) to the detector folder:
baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
├── input
│ ├── detector_input_parameters.json
│ └── training_config.json
├── prediction
│ ├── spot
│ │ ├── spot_feature-0.pkl
│ │ ├── spot_feature-1.pkl
│ │ ├── spot_feature-2.pkl
│ │ └── spot_feature-global.pkl
│ ├── uc-13_predicted_anomalies_data-1_2024-11-08T09:43:41Z.json
│ ├── uc-13_predicted_data-1_2024-11-08T09:43:41Z.json
│ ├── uc-7_predicted_anomalies_data-1_2024-11-08T09:45:38Z.json
│ └── uc-7_predicted_data-1_2024-11-08T09:45:38Z.json
└── training
├── losses_train_data.json
├── model.pt
├── scaler.pkl
├── spot
│ ├── spot_feature-0.pkl
│ ├── spot_feature-1.pkl
│ ├── spot_feature-2.pkl
│ └── spot_feature-global.pkl
├── test_output.pkl
├── train_losses.png
├── train_output.pkl
└── validation_losses.png
Add the detector data stream pattern to the Wazuh dashboard as explained in dashboard integration.
In the Wazuh dashboard (ref. version 4.8.1):
A dedicated dashboard can include different visualizations that provide a complete overview. We provide some examples regarding both global and feature-wise statistics. Indeed, even though feature-wise scores and thresholds do not contribute to determining whether a window is anomalous, they can help us understand trends in the time series. See also Example.
Since the detector data stream adbox_detector_mtad_gat_baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
was defined using a dedicated template,
we can manipulate all the numerical fields.
Eventually, the dashboard should look as shown below:
The MTAD-GAT algorithm marks as anomalous the windows whose anomaly score is higher than the local threshold. Therefore, as a base graphic, we add visualizations showing both these parameters in our time series.
Remarks:
We used VisBuilder, but other options are also viable.
Remember to select the correct pattern. Moreover, for aggregating score and threshold we suggest using max, to be sure to visualize anomalies.
Using the same visualization type for every feature, we add a visualization for feature-wise score and threshold to the dashboard.
Among the feature-wise fields of the adbox_detector_mtad_gat_baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15
document,
we find, for every feature, the (preprocessed) true value and its forecasted and reconstructed univariate time series.
Therefore, we can build a visualization comparing them.
We use a table to display numerical and boolean values explicitly.
Gauges provide a fairly simple visualization.
By simply adding a filter on the boolean field is_anomaly combined with the count metric, we get an anomaly counter; the equivalent indexer query is sketched below.
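For reference, the counter corresponds to the following count query (a sketch: the indexer address and the admin:admin credentials are assumptions to adapt; is_anomaly is the boolean field mentioned above):

$ curl -sk -u admin:admin \
    -H 'Content-Type: application/json' \
    "https://localhost:9200/adbox_detector_mtad_gat_baa9b7bf-e05d-4ce9-a1c1-3e82ff4c9f15/_count" \
    -d '{"query": {"term": {"is_anomaly": true}}}'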
To complement the data produced by the detector, we add to the dashboard some visualizations using the index pattern wazuh-alerts-*, namely information from the original data (an example query is sketched after the list below):
A table displaying the values of the original data.
A time series line plot displaying the original memory and CPU usage in percentage.
A time series line plot displaying the alert counts.
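For reference, such plots correspond to aggregations like the following (a sketch: the indexer address, the admin:admin credentials, and the 30s bucket interval are assumptions; timestamp is the standard time field of wazuh-alerts-* indices):

$ curl -sk -u admin:admin \
    -H 'Content-Type: application/json' \
    "https://localhost:9200/wazuh-alerts-*/_search?size=0" \
    -d '{
      "aggs": {
        "per_bucket": {
          "date_histogram": { "field": "timestamp", "fixed_interval": "30s" },
          "aggs": {
            "avg_mem": { "avg": { "field": "data.memory_usage_%" } },
            "avg_cpu": { "avg": { "field": "data.cpu_usage_%" } }
          }
        }
      }
    }'

Here the per-bucket doc_count backs the alert-count plot, while avg_mem and avg_cpu back the memory and CPU usage plot.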
By combining Discover and our detector dashboard, we can investigate anomalies.