1.0 ADBox training pipeline SWD-001

The diagram depicts the sequence of operations of the training pipeline, orchestrated by the ADBox Engine.

ADBox training pipeline sequence diagram

Parent links: LARC-001 ADBox training pipeline flow

2.0 ADBox prediction pipeline SWD-002

The diagram depicts the sequence of operations in the prediction pipeline, orchestrated by the ADBox Engine.

ADBox predict pipeline sequence diagram

Parent links: LARC-002 ADBox historical data prediction pipeline flow, LARC-008 ADBox batch and real-time prediction flow

3.0 MTAD-GAT training SWD-003

The diagram depicts the sequence of operations run by the train_MTAD_GAT function of the MTAD_GAT ML subpackage of ADBox.

ADBox train_MTAD_GAT sequence diagram

Parent links: LARC-009 ADBox machine learning package

4.0 MTAD-GAT prediction SWD-004

MTAD GAT prediction sequence diagram

The diagram depicts the sequence of operations run by the predict_MTAD_GAT function of the MTAD_GAT ML subpackage of ADBox.

ADBox predict_MTAD_GAT sequence diagram

Parent links: LARC-009 ADBox machine learning package

5.0 Peak-over-threshold (POT) SWD-005

POT evaluation sequence diagram

The diagram depicts the sequence of operations run by the pot_eval function of the MTAD_GAT subpackage of ADBox.

This function runs the dynamic POT (i.e., peak-over-threshold) evaluation.

ADBox pot_eval sequence diagram
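
The pot_eval internals are not reproduced here; as a rough illustration of dynamic POT thresholding in general, the sketch below fits a generalized Pareto distribution to the score excesses over an initial quantile and converts a risk level into a threshold. Function and parameter names are illustrative, not the ADBox code.

# Illustrative dynamic POT thresholding sketch; not the ADBox pot_eval implementation.
import numpy as np
from scipy.stats import genpareto

def pot_threshold(scores: np.ndarray, init_quantile: float = 0.98, risk: float = 1e-3) -> float:
    """Estimate an anomaly threshold with the peak-over-threshold method.

    1. Pick an initial threshold t (a high empirical quantile of the scores).
    2. Fit a generalized Pareto distribution (GPD) to the excesses over t.
    3. Convert the desired risk level into a final threshold via the GPD tail formula.
    """
    t = np.quantile(scores, init_quantile)             # initial threshold
    excesses = scores[scores > t] - t                  # peaks over the initial threshold
    gamma, _, sigma = genpareto.fit(excesses, floc=0)  # tail shape and scale
    n, n_peaks = scores.size, excesses.size
    if abs(gamma) < 1e-6:                              # exponential-tail limit case
        return t - sigma * np.log(risk * n / n_peaks)
    return t + (sigma / gamma) * ((risk * n / n_peaks) ** (-gamma) - 1.0)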

Parent links: LARC-009 ADBox machine learning package

6.0 ADBox Predictor score computation SWD-006

Predictor score computation sequence diagram

The diagram depicts the sequence of operations run by the get_score method of the Predictor class in the MTAD GAT PyTorch subpackage of ADBox.

ADBox Predictor.get_score sequence diagram

Parent links: LARC-009 ADBox machine learning package

7.0 ADBox MTAD-GAT anomaly prediction SWD-007

ADBox MTAD GAT anomaly prediction sequence diagram

The diagram depicts the sequence of operations run by the predict_anomalies method of the Predictor class in the MTAD GAT PyTorch subpackage of ADBox.

ADBox Predictor.predict_anomalies sequence diagram

Parent links: LARC-009 ADBox machine learning package

8.0 ADBox MTAD-GAT Predictor SWD-008

ADBox MTAD GAT Predictor class diagram

The diagram below depicts the Predictor class of the MTAD GAT PyTorch subpackage of ADBox.

ADBox Predictor class diagram

Parent links: LARC-009 ADBox machine learning package

9.0 ADBox data managers SWD-009

ADBox data manager class diagrams

The diagram below depicts the Data manager classes of ADBox, all designed and implemented as Singleton classes.

ADBox Managers class diagram
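
As a minimal sketch of the Singleton approach mentioned above (the class name and attributes are illustrative, not the actual ADBox data manager code):

# Minimal Singleton sketch; illustrative only, not the ADBox data manager implementation.
class DataManager:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single shared instance on first use and reuse it afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Guard so repeated construction does not reset state.
        if not hasattr(self, "_initialized"):
            self._datasets = {}
            self._initialized = True

Every call to DataManager() then returns the same object, so all pipeline components share one view of the loaded data.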

Parent links: LARC-010 ADBox data manager

10 ADBox data transformer SWD-010

ADBox data transformer class diagram

The diagram below depicts the Data Transformer class of ADBox.

ADBox Transformer class diagram

Parent links: LARC-003 ADBox preprocessing flow

11 ADBox preprocessing SWD-011

ADBox preprocessing sequence diagram

The diagram summarizes the sequence of actions of the Preprocessor.preprocessing method in the ADBox DataTransformer.

ADBox Preprocessor.preprocessing sequence diagram

Parent links: LARC-003 ADBox preprocessing flow

12 ADBox TimeManager SWD-012

ADBox time manager class diagrams

The diagram below depicts the Time manager classes of ADBox.

ADBox TimeManager class diagram

Parent links: LARC-011 ADBox TimeManager

13 ADBox Prediction pipeline's inner body SWD-013

Prediction pipeline sequence diagram

The diagram depicts the sequence of operations in the prediction pipeline's inner body (a private method) called by the prediction pipeline.

ADBox predict pipeline sequence diagram

Parent links: LARC-002 ADBox historical data prediction pipeline flow, LARC-008 ADBox batch and real-time prediction flow

14 ADBox config managers SWD-014

ADBox config manager class diagrams

The diagram below depicts the Config manager classes of ADBox.

ADBox Managers class diagram

Parent links: LARC-012 ADBox ConfigManager

15 ADBox Shipper and Template Handler SWD-015

ADBox Shipper and Template Handler class diagrams

The diagram below depicts the DataShipper, WazuhDataShipper, and TemplateHandler classes of ADBox.

ADBox Shipper classes diagram

Parent links: LARC-014 ADBox Shipper

16 ADBox shipping of prediction data SWD-016

Sequence diagram of ADBox shipping of prediction data

The diagram below depicts the sequence of actions orchestrated by the ADBox Engine when shipping is enabled within the prediction pipeline.

ADBox ship prediction sequence diagram

Parent links: LARC-014 ADBox Shipper

17 ADBox creation of a detector stream SWD-017

Sequence diagram of ADBox creation of a detector stream

The diagram below depicts the sequence of actions orchestrated by the ADBox Engine when shipping is enabled within the training pipeline. Specifically, it shows the __ship_to_wazuh_training_pipeline method, which creates a detector data stream and the corresponding templates, and can also ship the test and training data predictions.

ADBox ship prediction sequence diagram

Parent links: LARC-014 ADBox Shipper

18 RATF: ingestion phase SWD-018

Sequence diagram of data ingestion and user setup in RATF

The diagram below depicts the sequence of actions orchestrated by the RADAR Automated Test Framework in the ingestion phase, which bulk ingests the corresponding dataset into an OpenSearch index and, for the Suspicious Login scenario, creates users in the Single Sign-On system.

RADAR Automated Test Framework ingestion phase sequence diagram
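
As a rough, hypothetical sketch of the bulk ingestion step (the index name, dataset path, and client setup are assumptions, not taken from RATF):

# Hypothetical bulk ingestion sketch using the opensearch-py client; not the RATF code.
import json
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}],
                    http_auth=("admin", "admin"), use_ssl=True, verify_certs=False)

def bulk_ingest(dataset_path: str, index: str) -> None:
    # Stream one JSON document per line from the dataset file into the target index.
    with open(dataset_path, encoding="utf-8") as handle:
        actions = ({"_index": index, "_source": json.loads(line)}
                   for line in handle if line.strip())
        helpers.bulk(client, actions)

bulk_ingest("suspicious_login_dataset.ndjson", "ratf-ingested-alerts")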

Parent links: LARC-015 RADAR scenario setup flow

19 RATF: setup phase SWD-019

Sequence diagram of environment setup in RATF

The diagram below depicts the sequence of actions orchestrated by the RADAR Automated Test Framework in the setup phase, which sets up the Docker environments by copying rules and active responses and setting the required permissions.

RADAR Automated Test Framework setup phase sequence diagram

Parent links: LARC-015 RADAR scenario setup flow, LARC-016 RADAR active response flow

20 RATF: simulation phase SWD-020

Sequence diagram of threat simulation in RATF

The diagram below depicts the sequence of actions orchestrated by the RADAR Automated Test Framework in the simulation phase, which simulates the threat scenario on the corresponding agent and feeds the resulting logs into the corresponding OpenSearch index.

RADAR Automated Test Framework simulation phase sequence diagram

Parent links: LARC-016 RADAR active response flow

21 RATF: evaluation phase SWD-021

Sequence diagram of metrics evaluation in RATF

The diagram below depicts the sequence of actions orchestrated by the RADAR Automated Test Framework in the evaluation phase, which retrieves the events from the index corresponding to the scenario and calculates evaluation metrics by comparing them with the dataset and simulation results.

RADAR Automated Test Framework evaluation phase sequence diagram
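
A hedged sketch of how such a comparison could compute precision and recall, assuming each retrieved event and ground-truth entry carries an identifier; the field choice and metric set are assumptions, not the RATF implementation.

# Hypothetical metric calculation sketch; identifiers and metrics are assumptions.
from typing import Iterable

def evaluate(detected_ids: Iterable[str], expected_ids: Iterable[str]) -> dict:
    detected, expected = set(detected_ids), set(expected_ids)
    true_pos = len(detected & expected)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}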

Parent links: LARC-016 RADAR active response flow

22 SONAR class structure and relationships SWD-022

Class diagram overview

The SONAR subsystem employs a modular class structure organized around configuration management, data access, and processing pipelines.

Core configuration classes

UML Diagram

Scenario management classes

UML Diagram

Design patterns

Pattern        | Application                  | Benefit
Dataclasses    | Configuration objects        | Type safety, immutability, validation
Factory Method | UseCase.from_yaml()          | Encapsulates YAML parsing logic
Strategy       | Data providers (Wazuh/Local) | Interchangeable data sources
Facade         | Engine wrapper               | Simplifies MVAD library interface
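
To make the first two patterns concrete, here is a minimal sketch of a configuration dataclass with a from_yaml factory method; the field names are illustrative and do not mirror the actual SONAR classes.

# Illustrative dataclass + factory-method sketch; field names are not the real SONAR schema.
from dataclasses import dataclass
from pathlib import Path
import yaml

@dataclass(frozen=True)
class UseCase:
    name: str
    bucket_minutes: int
    features: list

    @classmethod
    def from_yaml(cls, path: str) -> "UseCase":
        # Encapsulate YAML parsing so callers never handle raw dictionaries.
        raw = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
        return cls(name=raw["name"],
                   bucket_minutes=raw["bucket_minutes"],
                   features=list(raw.get("features", [])))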

Related documentation

  • Complete class diagram: docs/manual/sonar_docs/uml-diagrams.md#class-diagram
  • Architecture guide: docs/manual/sonar_docs/architecture.md#module-structure

Parent links: LARC-019 SONAR training pipeline sequence, LARC-020 SONAR detection pipeline sequence

23 SONAR feature engineering design SWD-023

Feature extraction module (features.py)

The feature engineering module transforms raw Wazuh alerts into time-series data suitable for multivariate anomaly detection.

Time-series bucketing algorithm

def bucket_by_time(alerts: List[dict], bucket_minutes: int) -> pd.DataFrame:
    """
    Groups alerts into fixed-size time buckets.

    Algorithm:
    1. Parse timestamps from alerts
    2. Round timestamps to bucket boundaries
    3. Group by bucket timestamp
    4. Aggregate numeric features within each bucket
    5. Return time-indexed DataFrame
    """

Feature types

Feature Type | Extraction Method       | Example Fields
Numeric      | Direct field extraction | rule.level, data.win.eventdata.processId
Categorical  | Top-K encoding          | agent.name, rule.id, data.srcip
Aggregated   | Count/sum per bucket    | Alert count, unique IPs
Derived      | Computed features       | Time-of-day, day-of-week

Bucketing strategy

Raw Alerts (variable frequency)
    ↓
Time Buckets (fixed intervals: 1, 5, or 10 minutes)
    ↓
Aggregated Features (one row per bucket)
    ↓
Time-Series DataFrame (input to MVAD)

Example bucketing:

  • Bucket size: 5 minutes
  • Input: 1000 alerts over 1 hour
  • Output: 12 rows (one per 5-minute bucket)

Missing data handling

Scenario      | Strategy
Empty bucket  | Fill with zeros or forward-fill previous value
Missing field | Use default value or skip feature
Sparse data   | Interpolate or flag as anomalous

Categorical encoding

Top-K frequency encoding:

  1. Count occurrences of each categorical value
  2. Keep the top K most frequent values
  3. Map others to an "other" category
  4. One-hot encode or label encode
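
A minimal sketch of this encoding with pandas; the column name and K value are illustrative.

# Illustrative Top-K frequency encoding; column name and K are assumptions.
import pandas as pd

def top_k_encode(series: pd.Series, k: int = 10) -> pd.DataFrame:
    top_values = series.value_counts().nlargest(k).index             # keep the K most frequent values
    reduced = series.where(series.isin(top_values), other="other")   # map the rest to "other"
    return pd.get_dummies(reduced, prefix=series.name)               # one-hot encode the reduced categories

# Example: one-hot columns for the ten most frequent agent names.
# encoded = top_k_encode(alerts_frame["agent.name"], k=10)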

Related documentation

  • Feature configuration: docs/manual/sonar_docs/scenario-guide.md
  • Architecture: docs/manual/sonar_docs/architecture.md#feature-engineering

Parent links: LARC-019 SONAR training pipeline sequence, LARC-020 SONAR detection pipeline sequence

24 SONAR data shipping design SWD-024

Shipper module architecture

The shipper module (sonar/shipper/) manages the indexing of anomaly detection results to Wazuh data streams.

Component structure

sonar/shipper/
├── __init__.py
├── opensearch_shipper.py       # Main shipper class
├── template_manager.py         # Index template installation
└── bulk_processor.py           # Bulk API operations

Data stream strategy

SONAR uses OpenSearch data streams for time-series storage:

Data Stream                  | Purpose                    | Retention
logs-sonar.anomalies-default | Anomaly events             | 30 days
logs-sonar.scores-default    | Raw anomaly scores         | 7 days
logs-sonar.metrics-default   | Training/detection metrics | 90 days

Bulk indexing design

class OpenSearchShipper:
    def ship_anomalies(self, documents: List[dict]) -> BulkResult:
        """
        Ships anomaly documents using bulk API.

        Process:
        1. Validate document structure
        2. Add @timestamp and data_stream fields
        3. Batch into chunks (500 documents/batch)
        4. Execute bulk API request
        5. Handle partial failures
        6. Return success/failure counts
        """

Document format

Each anomaly document includes:

{
  "@timestamp": "2026-02-04T12:00:00Z",
  "event": {
    "kind": "alert",
    "category": ["intrusion_detection"],
    "type": ["info"]
  },
  "sonar": {
    "scenario_id": "baseline_scenario",
    "model_name": "baseline_model_20260204",
    "anomaly_score": 0.92,
    "threshold": 0.85,
    "window_start": "2026-02-04T11:55:00Z",
    "window_end": "2026-02-04T12:00:00Z"
  },
  "tags": ["sonar", "anomaly"]
}

Error handling

Error Type           | Strategy                       | Recovery
Connection failure   | Retry with exponential backoff | Queue for later
Document rejection   | Log invalid docs               | Continue with valid docs
Bulk partial failure | Retry failed documents         | Track success rate
Template missing     | Auto-install templates         | Retry operation
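
For the connection-failure row, a generic retry-with-exponential-backoff wrapper could look like the sketch below; the delays and exception types are illustrative.

# Generic exponential-backoff retry sketch; delays and exception types are illustrative.
import time

def with_retries(operation, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise                                        # give up; caller can queue the batch
            time.sleep(base_delay * 2 ** (attempt - 1))      # 1s, 2s, 4s, 8s, ...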

Integration with pipeline

The shipper is invoked from pipeline.py after post-processing:

Anomaly Detection → Post-Processing → Shipper → Data Streams

Related documentation

  • Data shipping guide: docs/manual/sonar_docs/data-shipping-guide.md

Parent links: LARC-019 SONAR training pipeline sequence, LARC-020 SONAR detection pipeline sequence

25 SONAR debug mode design SWD-025

Local data provider (local_data_provider.py)

The debug mode enables offline testing and development without requiring a live Wazuh Indexer instance.

Interface compatibility

The LocalDataProvider implements the same interface as WazuhIndexerClient:

class DataProviderInterface:
    def fetch_alerts(
        self,
        start_time: datetime,
        end_time: datetime,
        filters: Optional[Dict] = None
    ) -> List[dict]:
        """Retrieve alerts for the specified time range."""
        pass

Data source configuration

Debug mode is configured in scenario YAML:

debug:
  enabled: true
  data_dir: "./sonar/test_data/synthetic_alerts"
  training_data_file: "normal_baseline.json"
  detection_data_file: "with_anomalies.json"

JSON data formats supported

The provider handles multiple JSON formats:

  1. JSON array (preferred): [{"@timestamp": "...", "rule": {...}}, ...]

  2. Single object: {"@timestamp": "...", "rule": {...}}

  3. OpenSearch API response: {"hits": {"hits": [{"_source": {...}}, ...]}}
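
A small sketch of how the three shapes could be normalised into a flat list of alert dictionaries; the function name is illustrative.

# Illustrative normalisation of the three supported JSON shapes into a flat alert list.
import json

def load_alerts(path: str) -> list:
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, list):                   # 1. JSON array of alerts
        return data
    if "hits" in data:                           # 3. OpenSearch API response
        return [hit["_source"] for hit in data["hits"]["hits"]]
    return [data]                                # 2. single alert object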

Time filtering

Local data provider applies the same time filtering as Wazuh client:

  1. Parse @timestamp from each alert
  2. Filter alerts within [start_time, end_time]
  3. Return filtered list
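
The filtering steps above could be sketched as follows, assuming ISO-8601 @timestamp values and timezone-aware start and end times.

# Illustrative time filtering sketch; assumes ISO-8601 timestamps and timezone-aware bounds.
from datetime import datetime
from typing import List

def filter_by_time(alerts: List[dict], start_time: datetime, end_time: datetime) -> List[dict]:
    selected = []
    for alert in alerts:
        ts = datetime.fromisoformat(alert["@timestamp"].replace("Z", "+00:00"))  # parse @timestamp
        if start_time <= ts <= end_time:          # keep alerts inside [start_time, end_time]
            selected.append(alert)
    return selected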

Test data structure

Test data files in sonar/test_data/synthetic_alerts/:

File                 | Alerts | Purpose           | Anomalies
normal_baseline.json | 12,000 | Training data     | None
with_anomalies.json  | 6,000  | Detection testing | Yes (injected)

Dependency injection pattern

The CLI selects the appropriate provider based on debug configuration:

if config.debug.enabled:
    provider = LocalDataProvider(config.debug)
else:
    provider = WazuhIndexerClient(config.wazuh)

# Both providers have identical interfaces
alerts = provider.fetch_alerts(start_time, end_time)

Benefits

  • No infrastructure: Test without Wazuh deployment
  • Reproducibility: Consistent test data across runs
  • Speed: No network latency
  • Isolation: Test feature changes independently

Related documentation

  • Debug mode guide: docs/manual/sonar_docs/setup-guide.md
  • Data injection: docs/manual/sonar_docs/data-injection-guide.md

Parent links: LARC-019 SONAR training pipeline sequence, LARC-020 SONAR detection pipeline sequence