ADBox detectors can find anomalies in Wazuh data, both historical and online. Unless specified otherwise, the outcomes of the prediction pipeline are stored in the corresponding detector folder.
Since IDPS-ESCAPE version 0.1.4, it is possible to ship prediction outcomes to the Wazuh indexer and consult them directly via the Wazuh Dashboard.
The shipping feature must first be installed as follows:
$ ./build-adbox.sh
$ ./adbox.sh -s
The training and prediction pipelines are configured via use cases. To enable data shipping to the Wazuh indexer, it is sufficient to run ADBox with the -s flag:
$ ./build-adbox.sh
$ ./adbox.sh -u {number} -s
This way, ADBox:
Attention: running a prediction-only use case with shipping enabled, using a detector ID that has no associated detector data stream, will fail and raise an exception.
A detector stream is a data stream index in the Wazuh Indexer (i.e., its underlying OpenSearch distribution).
When ADBox is run with the -s flag, the data shipper creates a template for the detector stream in the Wazuh indexer once training has completed.
Afterwards, as long as the stream is not removed, every prediction run with the same detector and the shipping flag sends its outcomes to that data stream index, i.e., the detector stream.
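As an illustration of what shipping to a detector stream involves, the following minimal sketch serializes prediction outcomes into an NDJSON body for the OpenSearch bulk API. It is not the ADBox shipper implementation: the helper name build_bulk_body and the sample values are hypothetical, and only the field names (timestamp, is_anomaly, anomaly_score, threshold) are taken from this page. Data streams are append-only, so each action uses the create operation.

```python
import json


def build_bulk_body(stream_name, predictions):
    """Serialize predictions as an NDJSON body for the OpenSearch _bulk API.

    Hypothetical helper for illustration; not part of ADBox.
    """
    lines = []
    for doc in predictions:
        # Data streams only accept "create" actions (append-only writes).
        lines.append(json.dumps({"create": {"_index": stream_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Sample outcomes with made-up values, for the example detector ID "xyz".
predictions = [
    {"timestamp": "2024-08-01T12:00:00Z", "is_anomaly": False,
     "anomaly_score": 0.12, "threshold": 0.50},
    {"timestamp": "2024-08-01T12:00:30Z", "is_anomaly": True,
     "anomaly_score": 0.81, "threshold": 0.50},
]

body = build_bulk_body("adbox_detector_mtad_gat_xyz", predictions)
print(body)
```

Such a body could then be posted to the indexer's _bulk endpoint with any HTTP client. Authentication and error handling are left out of the sketch.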
Depending on the detector’s characteristics and usage (e.g., continuous real-time detection), it is crucial to perform regular rollovers. A basic rollover policy is defined and installed together with the shipping feature, but we suggest tailoring it to the expected usage.
While MTAD-GAT is currently the only AD algorithm implemented, the Data Shipper supports possible extensions of ADBox by constructing detector streams from templates that follow a hierarchical convention. Major advantages of using templates are:
Detector templates are OpenSearch composable index templates. We refer to the OpenSearch official documentation for details.
The base template is the ADBox stream template, which, apart from the timestamp, has only one field: the boolean field is_anomaly.
Every algorithm used for detection then has a dedicated template. This template is obtained by taking the base template and adding a specific template component that encodes the fields generated by the algorithm. For example, the component associated with MTAD-GAT includes the numeric fields anomaly_score and threshold.
Besides the two fields mentioned above, the prediction outcomes of MTAD-GAT also include fields associated with the features, e.g., feature-wise thresholds and scores. Since these fields depend on the individual detector, we use dedicated template components for them.
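The three layers described above can be sketched as plain mapping bodies. This is a hedged illustration, not the content of the ADBox templates: the concrete field types (date, float) and the per-feature field names built by detector_component are assumptions, since this page only states that is_anomaly is boolean and that anomaly_score and threshold are numeric.

```python
# Base layer: timestamp plus the boolean is_anomaly field.
BASE_TEMPLATE = {
    "template": {"mappings": {"properties": {
        "timestamp": {"type": "date"},        # type assumed
        "is_anomaly": {"type": "boolean"},
    }}}
}

# Algorithm layer: fields produced by MTAD-GAT for every detector.
MTAD_GAT_COMPONENT = {
    "template": {"mappings": {"properties": {
        "anomaly_score": {"type": "float"},   # "numeric" in the docs; float assumed
        "threshold": {"type": "float"},
    }}}
}


def detector_component(feature_names):
    """Build a per-detector component with feature-wise fields.

    The "<feature>_score" / "<feature>_threshold" naming is hypothetical.
    """
    props = {}
    for name in feature_names:
        props[f"{name}_score"] = {"type": "float"}
        props[f"{name}_threshold"] = {"type": "float"}
    return {"template": {"mappings": {"properties": props}}}


def effective_fields(*bodies):
    """Merge the mapping properties of several layers, later layers winning."""
    fields = {}
    for body in bodies:
        fields.update(body["template"]["mappings"]["properties"])
    return fields


# Example: a detector trained on two (made-up) features.
fields = effective_fields(
    BASE_TEMPLATE,
    MTAD_GAT_COMPONENT,
    detector_component(["cpu_usage", "alert_count"]),
)
print(sorted(fields))
```

Keeping the detector-specific fields in their own component means the base and algorithm layers stay identical across detectors, which is the point of the hierarchical convention.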
All these templates are automatically created and installed behind the scenes when the shipping feature is enabled. In fact, when ADBox is run with the -s flag for the first time, the base template and the (existing) algorithms’ components and templates are installed.
Then, if a detector is trained with the shipping option on, the data shipper creates:
Subsequent predictions will be appended to the existing detector data stream.
With xyz and 123 being ID examples:
Template name | Components | Priority | Index Patterns |
---|---|---|---|
adbox_stream_template | - | 1 | adbox_detector_* |
adbox_stream_template_mtad_gat | component_template_mtad_gat | 2 | adbox_detector_mtad_gat_* |
adbox_detector_mtad_gat_xyz | component_template_mtad_gat, component_template_xyz | 3 | adbox_detector_mtad_gat_xyz |
adbox_detector_mtad_gat_123 | component_template_mtad_gat, component_template_123 | 3 | adbox_detector_mtad_gat_123 |
Notice that all detector templates have the same priority.
Stream name | Template name | Timefield |
---|---|---|
adbox_detector_mtad_gat_xyz | adbox_detector_mtad_gat_xyz | timestamp |
adbox_detector_mtad_gat_123 | adbox_detector_mtad_gat_123 | timestamp |
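The naming convention in the two tables above can be expressed as a small helper. The function template_spec is hypothetical and only reproduces the names, components, priorities, and index patterns listed in the tables. It does not talk to the indexer and is not taken from the ADBox code base.

```python
def template_spec(algorithm=None, detector_id=None):
    """Return the template description for one level of the hierarchy.

    Hypothetical helper mirroring the tables:
    - no arguments: the base ADBox stream template (priority 1);
    - algorithm only: the algorithm template (priority 2);
    - algorithm and detector_id: the detector template (priority 3).
    """
    if algorithm is None:
        return {
            "name": "adbox_stream_template",
            "composed_of": [],
            "priority": 1,
            "index_patterns": ["adbox_detector_*"],
        }
    if detector_id is None:
        return {
            "name": f"adbox_stream_template_{algorithm}",
            "composed_of": [f"component_template_{algorithm}"],
            "priority": 2,
            "index_patterns": [f"adbox_detector_{algorithm}_*"],
        }
    # Detector level: template, index pattern, and stream share one name.
    name = f"adbox_detector_{algorithm}_{detector_id}"
    return {
        "name": name,
        "composed_of": [
            f"component_template_{algorithm}",
            f"component_template_{detector_id}",
        ],
        "priority": 3,
        "index_patterns": [name],
    }


spec = template_spec("mtad_gat", "xyz")
print(spec["name"], spec["priority"], spec["composed_of"])
```

Because a more specific template has a higher priority, a stream named adbox_detector_mtad_gat_xyz is matched by all three index patterns but resolved by the detector template.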
Suppose a new detection algorithm is included in ADBox. In order to be able to ship the prediction to Wazuh, it is necessary to update the engine, extend the data shipper, the request-response handler, etc. In particular, concerning the shipper one must:
siem_mtad_gat/shipper/wazuh_base_template.py
ML_ALGORITHMS_LIST in settings.

To visualize the predictions via the Wazuh Dashboard, it is necessary to add the index pattern to the dashboard patterns (pictures refer to Wazuh 4.8.1).
Open Dashboard Management.
Select Dashboard Management>Index patterns and create a new index pattern.
The index pattern for the detector with ID xyz should be named adbox_detector_mtad_gat_xyz. The detector data stream, if it exists, must appear in the list.
Select timestamp as time field.
Finally, the pattern is created. The field names correspond to the prediction outcome fields, and their types are automatically generated by ADBox according to the detector’s template.
The data can be navigated using the Discover dashboard.
For an improved visualization, we explain here how to construct a dedicated Detector Dashboard in the Wazuh Dashboard.
Tip: add the pattern adbox-*. Using this pattern, you can visualize the data from every detector on the same page.
A rollover rolls an alias over to a new write index (see the official documentation). In practice, rollovers break the data stream down into smaller indices:
When you perform a rollover operation on a data stream, the API generates a fresh write index for that stream. Simultaneously, the stream’s preceding write index transforms into a regular backing index. Additionally, the rollover process increments the generation count of the data stream.
Rollover can be performed
When ADBox is installed via ./adbox.sh -s, a base rollover policy, adbox_detectors_rollover, is installed too.
The policy performs a rollover when at least 4000 documents are stored. Once the policy is in place, all the new detector data streams will be managed accordingly.
The body of the rollover policy is at /siem_mtad_gat/assets/shipper/wazuh_adbox_rollover_policy.json.
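For orientation, an OpenSearch ISM policy that rolls over at a document-count threshold typically has the shape sketched below. This is not a copy of wazuh_adbox_rollover_policy.json: the description text, the state name, and the ism_template index pattern are assumptions, and only the 4000-document condition comes from this page.

```python
import json


def rollover_policy(min_doc_count=4000, index_patterns=("adbox_detector_*",)):
    """Build an ISM policy body with a single rollover state.

    Illustrative sketch; the shipped policy file may differ.
    """
    return {
        "policy": {
            "description": "Roll over ADBox detector streams by document count",
            "default_state": "rollover",
            "states": [
                {
                    "name": "rollover",
                    # Roll over once the write index holds at least
                    # min_doc_count documents at the time of the check.
                    "actions": [{"rollover": {"min_doc_count": min_doc_count}}],
                    "transitions": [],
                }
            ],
            # Attach the policy automatically to newly created matching
            # indices (pattern assumed here).
            "ism_template": [
                {"index_patterns": list(index_patterns), "priority": 1}
            ],
        }
    }


policy = rollover_policy()
print(json.dumps(policy, indent=2))
```

Calling rollover_policy(min_doc_count=5) would give a body with a 5-document condition instead of 4000.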
At the moment, updating a policy is not supported directly from ADBox, but it can be done using the dashboard. New policies can be added using the add_policy method of the WazuhDataShipper class.
Before either changing or defining new policies, we recommend consulting the OpenSearch documentation as the policy application may not be straightforward. For example:
ISM checks the conditions for operations on every execution of the policy based on the set interval, not continuously. The rollover will be performed if the value has reached or has exceeded the configured limit when the check is performed.
To display the effects of changing the base policy, we modified the adbox_detectors_rollover condition to at least 5 documents.
We then trained a new detector with granularity 30s and ran it in real time. The picture below shows that the newly created indices contain a non-uniform number of documents. The reason is that the policy application interval and the shipping interval are not perfectly aligned.
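The misalignment can be reproduced with a toy simulation. The sketch below assumes one document shipped every 30 seconds (the detector granularity above) and a policy check every 100 seconds. The 100-second interval is purely hypothetical, chosen to make the effect visible, and does not reflect the ISM job interval of any particular deployment.

```python
def simulate_rollovers(doc_interval=30, check_interval=100,
                       min_doc_count=5, duration=1000):
    """Return the document count of each index closed by a rollover.

    Time advances in 1-second steps. A document arriving in the same
    second as a policy check is counted before the check runs.
    """
    sizes = []
    current = 0  # documents in the current write index
    for t in range(1, duration + 1):
        if t % doc_interval == 0:
            current += 1
        # The condition is only evaluated at check time, not continuously.
        if t % check_interval == 0 and current >= min_doc_count:
            sizes.append(current)
            current = 0  # a fresh write index starts empty
    return sizes


sizes = simulate_rollovers()
print(sizes)  # [6, 7, 7, 6, 7]
```

With these numbers, no index is closed at exactly 5 documents: the threshold is only noticed at the next check, so the closed indices hold 6 or 7 documents depending on how the two intervals line up.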