idps-escape

ADBox detectors can be used to find anomalies in Wazuh data, both historical and online. Unless otherwise specified, the outcomes of the prediction pipeline are stored in the corresponding detector folder.

Since IDPS-ESCAPE version 0.1.4, it is possible to ship prediction outcomes to the Wazuh indexer and consult them directly via the Wazuh Dashboard.

Outline

Enable data shipping to the Wazuh Indexer
Detector data streams
Integration into the Wazuh dashboard

Enable data shipping to the Wazuh Indexer

Install

The shipping feature is installed as follows:

$ ./build-adbox.sh
$ ./adbox.sh -s

Detector with shipping

The training and prediction pipelines are instructed via use cases. To enable data shipping to the Wazuh indexer, it is sufficient to run ADBox with the flag -s:

$ ./build-adbox.sh
$ ./adbox.sh -u {number} -s

This way, ADBox:

at training time, creates the template of the detector data stream associated with the detector.
at prediction time, adds the outcomes to the corresponding detector data stream.

Attention: running a prediction-only use case with shipping enabled fails with an exception if the detector ID has no associated detector data stream.

Detector data streams

A detector stream is a data stream index in the Wazuh Indexer (i.e., its underlying OpenSearch distribution).

When ADBox is run with the -s flag, after training, the data shipper creates a template for the detector stream in the Wazuh indexer. Later, if the stream is not removed, every time a prediction is run with the same detector (using the shipping flag), the data is shipped to such a data stream index, i.e. the detector stream.

Depending on the detector’s characteristics and usage (e.g., continuous real-time prediction), it is crucial to perform regular rollovers. A basic rollover policy is defined and installed together with the shipping feature, but we suggest tailoring it to the expected usage.

While MTAD-GAT is currently the only AD algorithm implemented, the Data Shipper supports possible extensions of ADBox by building detector stream templates according to a hierarchical convention. The major advantages of using templates are:

inherent typing for the documents’ fields, which in practice simplifies the search and visualization of predicted data.
resource efficiency; indeed, while the detector’s template is created at training time, the data stream is only created when the first documents are actually shipped.

Templates and template components

Detector templates are OpenSearch composable index templates. We refer to the OpenSearch official documentation for details.

The base template is the ADBox stream template. Apart from the timestamp, it has only one field: the boolean field is_anomaly.

For every algorithm used for detection, there is a dedicated template. This template is obtained by taking the base template and adding a specific template component that encodes the fields generated by the algorithm. For example, the component associated with MTAD-GAT includes the numeric fields anomaly_score and threshold.

The prediction outcomes of MTAD-GAT also include, in addition to the two fields mentioned above, fields associated with the features, e.g., feature-wise thresholds and scores. Since these fields depend on the individual detector, we use dedicated template components.

All these templates are automatically created and installed behind the scenes when the shipping feature is enabled. In fact, when ADBox is run with the -s flag for the first time, the base template and the components and templates of the existing algorithms are installed.

Then, if a detector is trained with the shipping option on, the data shipper creates:

detector data stream components,
the detector data stream template, and
the detector data stream.

Subsequent predictions will be appended to the existing detector data stream.
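
As an illustration of the template hierarchy described above, the following is a minimal sketch of what the request bodies might look like, assuming the standard OpenSearch composable template format (bodies of this shape are sent to the `_component_template/<name>` and `_index_template/<name>` endpoints). The `float` type, the per-feature field names, and the helper names are assumptions made for this example; they are not taken from the ADBox code.

```python
import copy
import json

TIMESTAMP_FIELD = "timestamp"  # time field of the detector data streams

# Base index template: the timestamp plus the boolean is_anomaly field.
BASE_TEMPLATE = {
    "index_patterns": ["adbox_detector_*"],
    "data_stream": {"timestamp_field": {"name": TIMESTAMP_FIELD}},
    "priority": 1,
    "template": {
        "mappings": {
            "properties": {
                TIMESTAMP_FIELD: {"type": "date"},
                "is_anomaly": {"type": "boolean"},
            }
        }
    },
}

# Component template with the fields produced by MTAD-GAT.
# "float" is an assumption: the text only says the fields are numeric.
MTAD_GAT_COMPONENT = {
    "template": {
        "mappings": {
            "properties": {
                "anomaly_score": {"type": "float"},
                "threshold": {"type": "float"},
            }
        }
    }
}


def detector_component(features):
    """Build a detector-specific component (field names are hypothetical)."""
    properties = {}
    for feature in features:
        properties[f"{feature}_score"] = {"type": "float"}
        properties[f"{feature}_threshold"] = {"type": "float"}
    return {"template": {"mappings": {"properties": properties}}}


def detector_template(detector_id, algorithm="mtad_gat"):
    """Build a detector template from the base template and two components."""
    body = copy.deepcopy(BASE_TEMPLATE)
    body["index_patterns"] = [f"adbox_detector_{algorithm}_{detector_id}"]
    body["priority"] = 3
    body["composed_of"] = [
        f"component_template_{algorithm}",
        f"component_template_{detector_id}",
    ]
    return body


print(json.dumps(detector_template("xyz"), indent=2))
```

The detector template copies the base mappings because only one index template is applied to a given index; the algorithm and detector fields are then pulled in through `composed_of`.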

Naming convention

Templates

With xyz and 123 being ID examples:

Template name	Components	Priority	Index Patterns
`adbox_stream_template`	-	1	`adbox_detector_*`
`adbox_stream_template_mtad_gat`	`component_template_mtad_gat`	2	`adbox_detector_mtad_gat_*`
`adbox_detector_mtad_gat_xyz`	`component_template_mtad_gat`, `component_template_xyz`	3	`adbox_detector_mtad_gat_xyz`
`adbox_detector_mtad_gat_123`	`component_template_mtad_gat`, `component_template_123`	3	`adbox_detector_mtad_gat_123`

Note that all detector templates have the same priority. Since each one matches a single, exact stream name, they never compete for the same stream.

Data streams

Stream name	Template name	Timefield
`adbox_detector_mtad_gat_xyz`	`adbox_detector_mtad_gat_xyz`	`timestamp`
`adbox_detector_mtad_gat_123`	`adbox_detector_mtad_gat_123`	`timestamp`
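
The role of the priorities in the tables above can be sketched in a few lines. When several index templates match a new stream name, OpenSearch applies the one with the highest priority. The helper names below are hypothetical, and the wildcard matching is only approximated with Python's `fnmatch`.

```python
from fnmatch import fnmatchcase

# (template name, priority, index pattern), as listed in the table above.
TEMPLATES = [
    ("adbox_stream_template", 1, "adbox_detector_*"),
    ("adbox_stream_template_mtad_gat", 2, "adbox_detector_mtad_gat_*"),
    ("adbox_detector_mtad_gat_xyz", 3, "adbox_detector_mtad_gat_xyz"),
    ("adbox_detector_mtad_gat_123", 3, "adbox_detector_mtad_gat_123"),
]


def stream_name(algorithm, detector_id):
    """Name of a detector data stream under the naming convention."""
    return f"adbox_detector_{algorithm}_{detector_id}"


def winning_template(name, templates=TEMPLATES):
    """Return the matching template with the highest priority, or None."""
    matches = [
        (priority, template)
        for template, priority, pattern in templates
        if fnmatchcase(name, pattern)
    ]
    if not matches:
        return None
    return max(matches)[1]


print(winning_template(stream_name("mtad_gat", "xyz")))
print(winning_template(stream_name("mtad_gat", "new")))
print(winning_template("wazuh-alerts-4.x"))
```

A stream for a trained detector resolves to its own template, a stream for an MTAD-GAT detector without a dedicated template falls back to the algorithm template, and a name outside the convention matches nothing.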

Suppose a new detection algorithm is included in ADBox. To ship its predictions to Wazuh, it is necessary to update the engine and extend the data shipper, the request-response handler, etc. In particular, for the shipper one must:

Add an algorithm component template in siem_mtad_gat/shipper/wazuh_base_template.py
Add the algorithm’s keyword to ML_ALGORITHMS_LIST in settings.
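
The two shipper steps above could look roughly as follows. This is only an illustration: the actual contents of wazuh_base_template.py and of the settings are not shown here, so the registry `ALGORITHM_COMPONENTS`, the helper functions, the keyword `my_algo`, and the field `reconstruction_error` are all hypothetical.

```python
# Hypothetical stand-in for ML_ALGORITHMS_LIST in the ADBox settings.
ML_ALGORITHMS_LIST = ["mtad_gat"]

# Hypothetical registry of algorithm fields, keyed by algorithm keyword.
ALGORITHM_COMPONENTS = {
    "mtad_gat": {"anomaly_score": "float", "threshold": "float"},
}


def component_body(fields):
    """Turn a {field: type} mapping into a component template body."""
    properties = {name: {"type": kind} for name, kind in fields.items()}
    return {"template": {"mappings": {"properties": properties}}}


def register_algorithm(keyword, fields):
    """Record a new algorithm and return its component name and body."""
    if keyword in ML_ALGORITHMS_LIST:
        raise ValueError(f"algorithm {keyword!r} is already registered")
    ALGORITHM_COMPONENTS[keyword] = dict(fields)
    ML_ALGORITHMS_LIST.append(keyword)
    return f"component_template_{keyword}", component_body(fields)


# Hypothetical algorithm producing a single numeric field.
name, body = register_algorithm("my_algo", {"reconstruction_error": "float"})
print(name)
print(body)
```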

Integration into the Wazuh dashboard

To visualize the predictions via the Wazuh Dashboard, it is necessary to add the index pattern to the dashboard patterns (the screenshots refer to Wazuh 4.8.1).

Open Dashboard Management.
Select Dashboard Management>Index patterns and create a new index pattern.
Add the data stream pattern using the naming convention. For example, the pattern for a detector using the MTAD-GAT algorithm with ID xyz is adbox_detector_mtad_gat_xyz. The detector data stream, if it exists, should appear in the list.
Select timestamp as time field.

Finally, the pattern is created. The field names correspond to the prediction outcome fields. Additionally, their types are automatically generated by ADBox according to the detector’s template.

The data can be navigated using the Discover dashboard.

Figure: Wazuh Dashboard Discover view of an ADBox detector.

For an improved visualization, we explain here how to construct a dedicated Detector Dashboard in the Wazuh Dashboard.

Tip: add the pattern adbox_detector_*. It matches every stream that follows the naming convention, so you can visualize the data from every detector on the same page.

Rollover policy and ISM

A rollover rolls an alias over to a new write index (see the official documentation). In practice, rollovers break the data stream down into smaller indices:

When you perform a rollover operation on a data stream, the API generates a fresh write index for that stream. Simultaneously, the stream’s preceding write index transforms into a regular backing index. Additionally, the rollover process increments the generation count of the data stream.

Rollover can be performed

manually,
- using the dashboard
- via the API
automatically, when the managed index meets one of the rollover conditions via policies. A policy can be defined
- via the ISM API
- using the dashboard in Index Management>State Management policies.

When ADBox is installed via ./adbox.sh -s, a base rollover policy, adbox_detectors_rollover, is installed too. The policy performs a rollover when at least 4000 documents are stored. Once the policy is in place, all new detector data streams are managed accordingly.

The body of the rollover policy is at /siem_mtad_gat/assets/shipper/wazuh_adbox_rollover_policy.json.
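
For orientation, a policy of this kind might look like the sketch below, assuming the standard ISM policy schema (a body of this shape is sent to `_plugins/_ism/policies/<policy_id>`). The description text, the state name, and the `ism_template` index pattern are assumptions for illustration; the authoritative body is the JSON file mentioned above.

```python
import json


def rollover_policy(min_doc_count=4000, index_pattern="adbox_detector_*"):
    """Build an ISM policy body that rolls over on a document count."""
    return {
        "policy": {
            "description": "Roll over ADBox detector data streams.",
            "default_state": "rollover",
            "states": [
                {
                    "name": "rollover",
                    # Roll over once the write index holds enough documents.
                    "actions": [{"rollover": {"min_doc_count": min_doc_count}}],
                    "transitions": [],
                }
            ],
            # Attach the policy automatically to matching new indices
            # (the pattern is an assumption based on the naming convention).
            "ism_template": [
                {"index_patterns": [index_pattern], "priority": 1}
            ],
        }
    }


print(json.dumps(rollover_policy(), indent=2))
```

Tailoring the policy to a detector's expected usage then amounts to changing `min_doc_count` or adding further rollover conditions supported by ISM.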

At the moment, updating a policy is not supported directly from ADBox, but it can be done using the dashboard. New policies can be added using the method add_policy from the WazuhDataShipper class.

Before either changing or defining new policies, we recommend consulting the OpenSearch documentation as the policy application may not be straightforward. For example:

ISM checks the conditions for operations on every execution of the policy based on the set interval, not continuously. The rollover will be performed if the value has reached or has exceeded the configured limit when the check is performed.

Example

To display the effects of changing the base policy, we modified the adbox_detectors_rollover condition to at least 5 documents.

We then trained a new detector with a granularity of 30s and ran it in real time. The picture below shows that the newly created indices contain a non-uniform number of documents. The reason is that the policy application interval and the shipping are not perfectly aligned.
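
The same effect can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: one document is shipped every 30 seconds, the policy check runs at a fixed interval with no jitter, and the 100-second interval is an arbitrary illustrative value, not the actual ISM setting.

```python
def simulate(ship_every, check_every, min_docs, duration):
    """Return the document count of each index closed by a rollover.

    Time advances in whole seconds. A document is shipped every
    `ship_every` seconds; the rollover condition is evaluated only every
    `check_every` seconds, never in between.
    """
    closed = []
    current = 0  # documents in the current write index
    for t in range(1, duration + 1):
        if t % ship_every == 0:
            current += 1
        if t % check_every == 0 and current >= min_docs:
            closed.append(current)  # the write index becomes a backing index
            current = 0             # a fresh write index starts empty
    return closed


counts = simulate(ship_every=30, check_every=100, min_docs=5, duration=1200)
print(counts)  # [6, 7, 7, 6, 7, 7]
```

No index holds exactly 5 documents: the condition is only evaluated at check time, so each index closes with however many documents have accumulated by then, and the count varies from one index to the next.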
