The increasing complexity and scale of modern cyber threats necessitate advanced, automated security solutions that not only detect anomalies but also respond in real time. In this context, the SOAR-RADAR framework (Security Orchestration, Automation, and Response – Risk-Aware Detection-based Active Response) was developed as a modular and extensible testbed for simulating and evaluating anomaly detection capabilities in Security Information and Event Management (SIEM) systems.
This thesis presents the design and implementation of the RADAR Test Framework, a structured and automated evaluation environment built around Wazuh’s anomaly detection capabilities. The framework is designed to support multiple threat scenarios, simulating adversarial behavior and measuring detection effectiveness in realistic settings. Specifically, the system evaluates the performance of Wazuh’s anomaly detection in identifying insider threats, DDoS attacks, malware command-and-control communication, and suspicious logins.
Each of these scenarios is implemented as an isolated module following a consistent operational pipeline consisting of four key phases: ingest, setup, simulate, and evaluate.
A central configuration file (`config.yaml`) provides modular control over the framework, allowing users to override parameters such as dataset location, OpenSearch indices, hostnames, thresholds, and attack intensity. This makes the framework adaptable to both research and production-oriented security setups.

The entire pipeline is orchestrated through a unified Python script (`mainctl.py`), which sequentially executes all phases while maintaining isolation between scenarios. The system architecture is containerized using `docker-compose`, supporting easy deployment and teardown, but it can be extended to work with remote hosts or cloud environments by adapting the Ansible configurations.
Note: Please remember to populate `wazuh_indexer_ssl_certs` with the required SSL certificates, which can be generated using the built-in Wazuh certificates deployment script.
The RADAR Test Framework is designed as a modular, containerized, and scenario-driven environment that facilitates the end-to-end lifecycle of anomaly detection research — from data ingestion and system setup, to attack simulation and evaluation. This architecture emphasizes automation, reproducibility, and extensibility, making it suitable for both experimental research and prototyping in real-world security environments.
The framework operates across four main components, each running inside its own Docker container and orchestrated using `docker-compose`. Communication between containers occurs over a shared virtual bridge network defined in the `docker-compose.yml`.

The `docker/docker-compose.yml` file defines the multi-container environment. Key services include:

- `wazuh.manager`: Hosts the Wazuh Manager, exposing ports for the REST API, agent communication, and the Kibana UI.
- `wazuh.indexer`: Runs OpenSearch to persist log and detection data.
- `wazuh.dashboard`: Provides a Kibana-based UI to visualize logs and detection outputs.
- `wazuh.agent`: Simulated endpoint agent for log generation and attack simulation.
- `keycloak`: Identity provider (used in the Suspicious Login scenario) with a custom realm and users created during ingest.
- `webhook-server`: Python Flask service listening for Wazuh alerts, used for active-response triggering and metric validation.

The containers mount relevant configuration directories (`ossec.conf`, rules, and anomaly detector configs) and allow persistent volumes for OpenSearch indices.
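As a rough illustration, a trimmed `docker-compose.yml` for two of these services might look like the following. This is a hypothetical sketch: the image tags, port mappings, and volume paths are assumptions, not copied from the repository.

```yaml
# Hypothetical excerpt -- image tags, ports, and volume paths are illustrative.
services:
  wazuh.manager:
    image: wazuh/wazuh-manager:4.7.2
    ports:
      - "55000:55000"   # Wazuh REST API
      - "1514:1514"     # agent communication
    networks:
      - radar-net
  wazuh.indexer:
    image: wazuh/wazuh-indexer:4.7.2
    volumes:
      - indexer-data:/var/lib/wazuh-indexer
    networks:
      - radar-net
networks:
  radar-net:
    driver: bridge
volumes:
  indexer-data:
```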
- **Ingest**: loads scenario datasets into dedicated OpenSearch indices (e.g., `wazuh-ad-insider-threat-*`).
- **Setup**: deploys configuration artifacts (`ossec.conf`, ruleset additions, etc.) into Wazuh manager and agent containers. The Ansible inventory (`ansible/hosts`) can be modified to target remote systems via SSH.
- **Simulate**: scenario scripts (e.g., `simulate/insider_threat_simulator.py`) trigger predefined user behaviors or network actions inside the agent container to emulate malicious or suspicious activity.

Note: The attack simulation phase for DDoS Detection, Malware Communication, and Suspicious Login is currently not working due to ongoing development. This feature will be restored in a future update.
mainctl.py
The entire lifecycle (ingest → setup → simulate → evaluate) is orchestrated via a central control script, `mainctl.py`. This script reads `config.yaml` to determine scenario parameters, index names, host IPs, and run toggles, then executes the requested phases in order.

The architecture supports both fully containerized local deployments and hybrid setups that target remote hosts via Ansible. Furthermore, the modular design ensures that scenarios remain isolated from one another and can be developed, run, and evaluated independently.
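A minimal sketch of how such a controller can dispatch phases is shown below. This is not the repository's `mainctl.py`; the phase list and function names are assumptions based on the pipeline described above.

```python
import argparse

# Phase order taken from the pipeline described above.
PHASES = ["ingest", "setup", "simulate", "evaluate"]

def resolve_phases(phase: str) -> list:
    """Expand the --phase argument into the ordered list of phases to run."""
    if phase == "all":
        return list(PHASES)
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return [phase]

def run_pipeline(scenario: str, phase: str) -> list:
    """Dispatch each requested phase; the real controller would invoke the
    scenario's ingest/setup/simulate/evaluate modules here."""
    executed = []
    for p in resolve_phases(phase):
        print(f"[{scenario}] running phase: {p}")
        executed.append(p)
    return executed

def parse_args(argv):
    parser = argparse.ArgumentParser(description="RADAR pipeline controller (sketch)")
    parser.add_argument("--scenario", default="insider_threat")
    parser.add_argument("--phase", default="all")
    return parser.parse_args(argv)
```

Invoked as `run_pipeline("insider_threat", "all")`, this would execute all four phases in sequence for the selected scenario.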
A key design objective of the RADAR Test Framework is configurability and adaptability. The framework must accommodate different use cases, datasets, deployment environments, and threat models. This is achieved through the central configuration file `config.yaml`, along with a flexible orchestration and automation layer that allows users to easily adapt the framework to their infrastructure and experimentation needs.
config.yaml
The `config.yaml` file defines the core configuration logic of the framework and acts as the runtime blueprint for scenario-specific detection settings. It is structured to allow the orchestration logic (`mainctl.py`) to dynamically load scenario metadata, anomaly detection parameters, and feature extraction settings for any selected anomaly type.
At the top level, the file includes:
- `default_scenario`: The scenario that will be executed by default if no override is provided at runtime, e.g. `default_scenario: insider_threat`.
- `scenarios`: A hierarchical mapping of scenario names to their respective configuration blocks. Each block provides the full details necessary for dataset loading, detection configuration, simulation, and evaluation. The four currently supported scenarios are:
- `insider_threat`
- `ddos_detection`
- `malware_communication`
- `suspicious_login`
Each scenario block includes the following fields:
- `container_name`: The name of the container that will run the simulation for the scenario.
- `index_prefix`: OpenSearch index pattern used to store ingested data.
- `result_index`: OpenSearch index where the anomaly detection plugin will store results.
- `log_index_pattern` (optional): Pattern for log indices, used during query and evaluation (present in some scenarios).
- `dataset_dir` or `label_csv_path`: Directory or CSV file path for the labeled dataset used in evaluation.
- `time_field`: The field in the document that represents the event timestamp (e.g., `"@timestamp"`).
- `categorical_field`: The primary dimension used to segment the data (e.g., `user.keyword`, `Destination_IP.keyword`).
- `detector_interval`: How often the anomaly detector runs (in minutes).
- `delay_minutes`: Optional delay between data ingestion and anomaly detection to allow for buffering.
- `monitor_name`: Name of the monitor created in OpenSearch.
- `trigger_name`: Name of the trigger that fires when an anomaly is detected.
- `anomaly_grade_threshold`: Minimum anomaly grade (0.0 to 1.0) for triggering alerts.
- `confidence_threshold`: Minimum confidence score required to consider the result valid.

Each scenario defines a list of `features`, where each feature block contains:
- `feature_name`: A semantic label used for reference.
- `feature_enabled`: Boolean flag to include/exclude the feature in the detection pipeline.
- `aggregation_query`: Defines how the feature is computed using OpenSearch aggregations (e.g., sum, average, cardinality).

Each scenario has between three and five carefully selected features, tailored to the specific anomaly behavior being modeled.
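Putting these fields together, a scenario block might look like the following sketch. The field names come from this section; the feature definition and several of the values are illustrative assumptions.

```yaml
# Illustrative scenario block -- field names from this section, values hypothetical.
scenarios:
  insider_threat:
    container_name: agent.insider
    index_prefix: "wazuh-ad-insider-threat-*"
    result_index: "opensearch-ad-plugin-result-insider_threat"
    label_csv_path: datasets/insider_threat/labels.csv
    time_field: "@timestamp"
    categorical_field: user.keyword
    detector_interval: 10        # minutes
    delay_minutes: 1
    monitor_name: InsiderThreat-Monitor
    trigger_name: Insider-Threat-Detected
    anomaly_grade_threshold: 0.6
    confidence_threshold: 0.7
    features:
      - feature_name: total_content_bytes   # hypothetical feature
        feature_enabled: true
        aggregation_query:
          sum_bytes:
            sum:
              field: content_bytes
```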
One of the most important capabilities is support for pre-ingested datasets. If users have already indexed their data into OpenSearch (e.g., for longitudinal testing or A/B comparisons), they can bypass the `ingest` phase by updating `index_prefix` to point to their existing dataset.

This flexibility decouples the evaluation logic from the dataset ingestion pipeline and encourages reuse of long-running or curated datasets.
While the default deployment uses Docker and Docker Compose for simplicity and isolation, the framework is designed to support external or hybrid environments.
This is made possible through:

- Ansible automation: host-level configuration tasks (`ossec.conf` deployment, agent registration, rule updates) are abstracted via roles and playbooks in the `ansible` folder.
- An editable inventory: `ansible/hosts` can be updated to reflect real server IPs, credentials, and SSH configuration. No code changes are needed to target a live cluster.

Note: In live environments, certain volumes or ports (e.g., `/var/ossec`, `tcp/55000`) must be exposed or routed accordingly.
The configuration system is scenario-agnostic. To add a new scenario:
1. Add a new scenario block to `config.yaml` (e.g., `ransomware_outbreak:`).
2. Create the corresponding scenario modules under `/simulate`, `/evaluate`, and optionally `/ingest`.
3. Implement the scenario class interface (exposing a `run()` method).

This structure allows modular plug-and-play development for new use cases without modifying the orchestration logic.
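Under these conventions, a new scenario module could be sketched as below. The class name and constructor signature are hypothetical; only the `run()` entry point is taken from this section.

```python
# Hypothetical scenario module illustrating the plug-and-play interface;
# the constructor signature is an assumption, only the run() entry point
# comes from the framework's documented convention.

class RansomwareOutbreakSimulator:
    """Sketch of a simulator for a new 'ransomware_outbreak' scenario."""

    def __init__(self, scenario_cfg: dict):
        # The scenario block from config.yaml, loaded by the orchestrator.
        self.cfg = scenario_cfg

    def run(self) -> bool:
        """Entry point invoked by mainctl.py. Returns True on success."""
        index = self.cfg.get("index_prefix", "wazuh-ad-ransomware-outbreak-*")
        # A real implementation would emit simulated encryption/exfiltration
        # events inside the agent container; here we only log the intent.
        print(f"simulating ransomware outbreak events for index {index}")
        return True
```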
While `config.yaml` controls the default flow, the `mainctl.py` script supports runtime overrides for:

- Scenario selection (`--scenario insider_threat`)
- Running a single phase (`--phase simulate`)
- Running the full pipeline (`--phase all`)

The Ingest Phase is the first operational step in the RADAR Test Framework pipeline. Its primary goal is to prepare the system for anomaly detection by injecting synthetic or pre-collected datasets into the OpenSearch backend, and by optionally provisioning external components (e.g., user accounts in Keycloak for authentication-based scenarios).
This phase is executed using Python classes tailored to each scenario, and it is automatically invoked by the `mainctl.py` controller. Alternatively, it can be skipped when using pre-ingested datasets, as discussed in Section 3.
The ingest modules handle the following core responsibilities: creating and populating scenario-specific OpenSearch indices (e.g., `wazuh-ad-insider-threat-*`) and provisioning any external components a scenario requires (such as Keycloak user accounts).

Each scenario has its own implementation under the scenario folder with a script `wazuh_ingest.py`. The ingest logic is scenario-aware and utilizes both OpenSearch and third-party APIs.
**Insider Threat**: Loads CSV files (`file*.csv`) containing simulated user activity: file access, content volume, and number of distinct files accessed. Fields (`user`, `filename`, `content_bytes`, etc.) are normalized and the documents are indexed into `wazuh-ad-insider-threat-*`.

**DDoS Detection**: Loads a network flow dataset (`Syn.csv`) representing flow records: source/destination IPs, port numbers, flags, and traffic rates. Features include `Flow_Packets_s`, `Flow_Bytes_s`, `SYN_Flag_Count`, and others. Documents are indexed into `wazuh-ad-ddos-detection-*`.

**Malware Communication**: Reads connection logs from the configured dataset directory (`dataset_dir`), with fields such as `id.orig_h`, `id.resp_h`, `orig_bytes`, `resp_bytes`, etc. Documents are indexed into the `wazuh-ad-malware-c2-*` index, which is used later by both the detector and the evaluation module.

**Suspicious Login**: Ingests login events with fields such as `User ID`, `Country`, `event_hour`, etc. into the `wazuh-ad-suspicious-login-*` index.

In all scenarios, ingestion is implemented via OpenSearch’s `_bulk` API, batching documents per request to ensure efficient indexing of large datasets. This approach ensures the ingest phase is efficient, repeatable, and compatible with OpenSearch’s anomaly detection plugin.
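The bulk request body can be sketched as follows: a helper (hypothetical, not from the codebase) that assembles the NDJSON payload expected by the `_bulk` endpoint.

```python
import json

def build_bulk_payload(index: str, docs: list) -> str:
    """Assemble an NDJSON body for OpenSearch's _bulk API.

    Each document is preceded by an 'index' action line, and the body
    must end with a newline, as the bulk endpoint requires.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

In practice this body would be sent to `POST /_bulk` with `Content-Type: application/x-ndjson` (or built via a client library such as opensearch-py's bulk helpers), typically a few thousand documents per request.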
The Setup Phase is responsible for configuring the detection environment, including Wazuh agents and managers, log collection rules, and the anomaly detection infrastructure in OpenSearch. This phase ensures that all necessary components are initialized and synchronized before the attack simulations begin. It is designed to be fully automated using Ansible, with built-in support for both containerized and non-containerized deployments.
The setup logic is modularized and scenario-aware. It reads configuration values from `config.yaml` and applies them consistently across Wazuh, OpenSearch, and related components.
The Setup Phase performs the following main tasks:

- Deploying configuration files (`ossec.conf`, rules, decoders) to Wazuh manager and agent containers or remote servers.
- Creating scenario-specific anomaly detectors in OpenSearch.
- Registering alerting monitors and triggers.
- Wiring Wazuh’s active response to the webhook server.
- Restarting the affected services so the new configuration takes effect.

The framework uses Ansible to automate the configuration and deployment process. The `ansible/` directory contains the following components:
- `site.yml`: The master playbook executed for setup.
- `hosts`: Inventory file that defines the target machines or containers.
- `roles/`: Ansible roles for `wazuh_manager` and `wazuh_agent`.

Each role performs scenario-specific setup, including:

- Deploying the scenario’s `ossec.conf` file.
- Running `wazuh-control restart` inside the target container to reload the configuration.

⚙️ Docker vs SSH:
By default, the inventory targets containers such as `agent.insider` or `wazuh-manager`. For remote deployments, users can modify the `hosts` file to specify IP addresses and SSH credentials of external servers. This allows a seamless transition from testing to production environments.
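An illustrative `ansible/hosts` inventory in this spirit might read as follows; the group names mirror the roles above, while all connection variables and the remote addresses are placeholder assumptions.

```ini
# Illustrative inventory -- group names follow the roles in ansible/roles;
# all connection variables and remote addresses are placeholders.
[wazuh_manager]
wazuh-manager ansible_connection=docker

[wazuh_agent]
agent.insider ansible_connection=docker

# Remote deployment variant (commented out):
# [wazuh_manager]
# 203.0.113.10 ansible_user=admin ansible_ssh_private_key_file=~/.ssh/id_rsa
```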
Each scenario’s setup includes creating a custom anomaly detector in OpenSearch using the REST API. This detector is configured based on the parameters defined in `config.yaml`:

- Source indices: the scenario’s `index_prefix` (e.g., `wazuh-ad-insider-threat-*`).
- Time field: typically `@timestamp`, defining the time axis for aggregation.
- Feature attributes: the `features` defined in the config file.

The setup module constructs and submits a detector creation request using this structure and stores the `detector_id` for later reference during evaluation.
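A sketch of how such a request body could be assembled from a scenario block is shown below. The endpoint (`POST _plugins/_anomaly_detection/detectors`) is the OpenSearch anomaly detection plugin's create-detector API; the exact field mapping used by the framework is an assumption.

```python
def build_detector_body(scenario_cfg: dict) -> dict:
    """Build a create-detector request body for OpenSearch's anomaly
    detection plugin (POST _plugins/_anomaly_detection/detectors) from a
    config.yaml scenario block. The field mapping is a sketch."""
    return {
        # Detector naming is an assumption; the framework may use its own scheme.
        "name": f"{scenario_cfg['monitor_name']}-detector",
        "time_field": scenario_cfg.get("time_field", "@timestamp"),
        "indices": [scenario_cfg["index_prefix"]],
        # Only enabled features are submitted.
        "feature_attributes": [
            f for f in scenario_cfg.get("features", [])
            if f.get("feature_enabled")
        ],
        "detection_interval": {
            "period": {"interval": scenario_cfg["detector_interval"],
                       "unit": "Minutes"}
        },
        "window_delay": {
            "period": {"interval": scenario_cfg.get("delay_minutes", 0),
                       "unit": "Minutes"}
        },
    }
```

The `detector_id` returned by the plugin on creation would then be stored for the evaluation phase.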
Once the detector is created, the system automatically registers an alerting monitor and its associated trigger in OpenSearch.

Example configuration (for `insider_threat`):

- Monitor: `InsiderThreat-Monitor`
- Trigger: `Insider-Threat-Detected`
- Trigger condition: `if (anomaly_grade >= 0.6) AND (confidence >= 0.7)`

These monitors and triggers ensure continuous evaluation of detection logic and form the basis for active-response escalation.
The Setup Phase also ensures that Wazuh’s active response mechanism can communicate with the webhook server running in a container, by registering the webhook endpoint as the destination for triggered alerts.

Webhook alerts are later consumed during the Evaluate Phase to determine whether detections occurred and whether they met the defined thresholds.
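The threshold check such a webhook might apply can be sketched as a pure function. The actual service is a Flask app; the alert field names used here are assumptions about the payload shape.

```python
def alert_passes_thresholds(alert: dict,
                            grade_threshold: float = 0.6,
                            confidence_threshold: float = 0.7) -> bool:
    """Return True if an incoming alert meets the scenario thresholds.

    Mirrors the trigger condition described above:
    anomaly_grade >= threshold AND confidence >= threshold.
    The payload field names are assumptions.
    """
    return (alert.get("anomaly_grade", 0.0) >= grade_threshold
            and alert.get("confidence", 0.0) >= confidence_threshold)
```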
Finally, the setup logic performs a restart of the relevant services to apply the new configurations, including the Wazuh manager and agent (via `wazuh-control restart`).

The setup phase is fully extensible:

- New Wazuh rules or decoders can be added under the `roles/wazuh_manager` files.
- New detector or monitor logic can be added to the scenario’s `setup/` Python module.
Python module.The Simulate Phase plays a critical role in the RADAR Test Framework by generating realistic, scenario-specific behavioral data. It emulates threat behavior within the Wazuh agent container, enabling Wazuh and the anomaly detection pipeline to detect deviations from normal activity. This phase ensures that synthetic but realistic threat signals are produced under controlled conditions.
Each simulation script is scenario-specific and is designed to generate the behavioral signals its scenario’s detector expects. The simulation logic is structured as independent Python classes, each with a `run()` method invoked by the `mainctl.py` orchestration controller.
This script emulates abnormal file access behavior by a user, such as accessing an unusually large number of distinct files or transferring abnormal content volumes.
Implementation Highlights:
- Uses a `repeat_count` parameter to simulate multiple iterations of the same behavior, mimicking persistence or repeated internal data exfiltration attempts.

This script simulates a SYN flood-style DDoS attack using repeated packet flow events. Each record imitates a single TCP flow targeting the server.
Implementation Highlights:
- Generates flow records with fields such as `Source_IP`, `Destination_IP`, `Flow_Packets_s`, `SYN_Flag_Count`, and `Total_Backward_Packets`.
- Uses `random` generation to simulate traffic bursts across multiple source IPs.

This module simulates beaconing behavior, a common trait of malware communicating with external command-and-control (C2) servers.
Implementation Highlights:
- Emits connection records with fields such as `uid`, `id.orig_h`, `id.resp_h`, `orig_bytes`, and `resp_bytes`.

This script mimics anomalous login behavior in a Keycloak-backed environment.
Implementation Highlights:
- Varies attributes such as `User ID`, `Country`, and `event_hour`.

The Evaluate Phase is the final and most critical stage of the RADAR Test Framework. Its primary goal is to assess the effectiveness of anomaly detection by comparing the outputs of the detection pipeline (from OpenSearch) with the ground-truth labels defined in scenario datasets.

This phase systematically computes the standard classification counts (True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN)) and generates structured CSV reports summarizing detection accuracy.
Each scenario has a dedicated evaluator script implemented as a class-based Python module. These scripts query the detection result index created during the setup phase and align results with labeled events to quantify detection performance.
Each evaluator follows this general workflow:
Load Ground Truth Labels:
Load the scenario’s CSV file containing timestamps and labels (the path is given by `label_csv_path` in `config.yaml`). Labels typically indicate whether an event was malicious (`1`) or benign (`0`).
Query Anomaly Detection Results:
Connect to OpenSearch and fetch all anomaly results from the detector’s `result_index` (e.g., `opensearch-ad-plugin-result-insider_threat`).
Match Events with Anomalies:
For each labeled event, identify the corresponding anomaly detection result (same user/IP and closest timestamp).
Assign Detection Outcome:
Based on the label and detection result, classify each event as a true positive, false positive, true negative, or false negative.
Write Evaluation Report: Save a CSV file containing the per-event record.
This output allows downstream computation of standard metrics such as precision, recall, and F1-score.
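The outcome assignment and the downstream metrics can be sketched as follows; this is an illustrative reimplementation, not the framework's evaluator code.

```python
def classify_outcome(label: int, detected: bool) -> str:
    """Map a ground-truth label (1=malicious, 0=benign) and a detection
    decision to one of the four outcome codes TP/FP/TN/FN."""
    if label == 1:
        return "TP" if detected else "FN"
    return "FP" if detected else "TN"

def summary_metrics(outcomes: list) -> dict:
    """Compute precision, recall, and F1 from a list of outcome codes."""
    tp = outcomes.count("TP")
    fp = outcomes.count("FP")
    fn = outcomes.count("FN")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Running `summary_metrics` over the per-event outcome column of the CSV report yields the aggregate detection scores for a scenario.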
Each evaluator writes its results to the `evaluation_results/` directory in CSV format. These outputs can be consumed by downstream analysis and visualization tools.
This section provides practical instructions for setting up, configuring, and executing the RADAR Test Framework in both containerized and hybrid environments. The framework is built for flexibility and reproducibility, allowing users to deploy it locally for experimentation or adapt it to enterprise environments for real-world security evaluations.
Before deploying the framework, ensure the following components are installed:

- Docker and Docker Compose
- Python 3 and `pip`

(The `suspicious_login` scenario additionally relies on the bundled Keycloak container.)

Set up a Python environment and install the dependencies:

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Then run:

```
docker compose up -d
```
This brings up the full environment:
- `wazuh.manager`
- `wazuh.agent` (with logs injected via simulator)
- `wazuh.indexer` (OpenSearch backend)
- `wazuh.dashboard` (UI)
- `keycloak`
- `webhook-server`

Wait until all containers are healthy and accessible.
Edit the `config.yaml` file to set `index_prefix`, thresholds, and log paths for your environment, and copy `env_var.env` to `.env`.

You may also edit `ansible/hosts` to match your infrastructure (e.g., if using IPs or SSH access to bare-metal Wazuh instances).
To run all four phases (ingest, setup, simulate, evaluate):
```
python mainctl.py --scenario insider_threat --phase all
```
To execute only selected phases:
```
python mainctl.py --scenario ddos_detection --phase setup
```
Supported arguments include:
- `--scenario <name>`
- `--phase <phase>`
To shut down and remove all containers:
```
docker compose down
```
While the RADAR Test Framework provides a modular and reproducible environment for evaluating anomaly detection systems, it has inherent limitations that arise from its architectural assumptions, scope of simulation, and reliance on synthetic data. This section outlines key limitations and proposes directions for future research and system improvements.
Although the simulation modules attempt to mimic realistic attack behavior, they inevitably simplify system interactions. This could result in detection performance that does not fully transfer to real production environments.
The framework currently supports four scenarios: insider threat, DDoS detection, malware communication, and suspicious login.

While these represent common threat vectors, the framework does not yet include more diverse or multi-stage (chained) attack types.
Adding more diverse and chained attack simulations would improve coverage.
Each scenario defines a static list of features in `config.yaml`. While these features are sufficient for basic detection, the fixed feature set limits what the detectors can capture without manual reconfiguration.
The evaluation pipeline assumes that each labeled event can be matched to a detection result by entity and timestamp. This makes it difficult to evaluate detections that do not align cleanly with the labeled ground truth.
The framework is tightly coupled with Wazuh, OpenSearch, and the bundled Keycloak identity provider.
Adapting the framework to other SIEMs, cloud-based platforms (e.g., Elastic Cloud), or alternate identity providers would require significant reconfiguration.
To address the limitations above and enhance the research utility of the framework, we propose the following extensions: