idps-escape

RADAR test framework

1. Introduction

The increasing complexity and scale of modern cyber threats necessitate advanced, automated security solutions that not only detect anomalies but also respond in real time. In this context, the SOAR-RADAR framework (Security Orchestration, Automation, and Response – Risk-Aware Detection-based Active Response) was developed as a modular and extensible testbed for simulating and evaluating anomaly detection capabilities in Security Information and Event Management (SIEM) systems.

This document presents the design and implementation of the RADAR Test Framework, a structured and automated evaluation environment built around Wazuh’s anomaly detection capabilities. The framework is designed to support multiple threat scenarios, simulating adversarial behavior and measuring detection effectiveness in realistic settings. Specifically, the system evaluates the performance of Wazuh’s anomaly detection in identifying:

  • Insider Threat (abnormal file access behavior)
  • DDoS Detection (SYN flood-style traffic)
  • Malware Communication (beaconing to command-and-control servers)
  • Suspicious Login (anomalous authentication activity against Keycloak)

Each of these scenarios is implemented as an isolated module following a consistent operational pipeline consisting of four key phases:

  1. Ingest: Preparation of the dataset and injection into Wazuh/OpenSearch. In certain cases, this includes provisioning external components such as user identities in Keycloak (e.g., for login anomaly detection).
  2. Setup: Automated deployment of system configurations including log collection rules, detection pipelines, and webhook integration. This is accomplished using Ansible scripts applied to a containerized environment or an external infrastructure.
  3. Simulate: Execution of scenario-specific attack behaviors through scripted interactions with the system, mimicking real-world threat vectors and user actions.
  4. Evaluate: Analysis of detection outputs by comparing actual events against labeled ground truth. Metrics such as True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN) are computed to quantify system performance.
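
Each of these phases is implemented per scenario as a small class exposing a run() method that the controller invokes in order; the sketch below only illustrates that shared shape, and the class and attribute names are illustrative rather than the actual implementation.

# Illustrative sketch only: each scenario implements its phases as classes with a
# run() method, which mainctl.py invokes in order (ingest -> setup -> simulate -> evaluate).
class IngestPhase:
    """Prepares the scenario dataset and indexes it into OpenSearch."""
    def __init__(self, scenario_config: dict):
        self.config = scenario_config      # the scenario block from config.yaml

    def run(self) -> None:
        # dataset preparation and _bulk indexing happen here (see Section 4)
        raise NotImplementedError

class EvaluatePhase:
    """Compares detector output against the labeled ground truth."""
    def __init__(self, scenario_config: dict):
        self.config = scenario_config

    def run(self) -> None:
        # TP/FP/FN/TN computation and CSV reporting happen here (see Section 7)
        raise NotImplementedError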

A central configuration file (config.yaml) provides modular control over the framework, allowing users to override parameters such as dataset location, OpenSearch indices, hostnames, thresholds, and attack intensity. This makes the framework adaptable to both research and production-oriented security setups.
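
How these overrides are resolved is an implementation detail of the orchestration layer; as a rough sketch, and assuming that the scenario blocks are keyed by name at the top level of config.yaml, the effective settings might be loaded like this:

# Sketch: load config.yaml and honour a runtime override of the scenario name.
# The exact key layout of config.yaml is an assumption made for illustration.
from typing import Optional
import yaml

def load_scenario_config(path: str = "config.yaml", scenario: Optional[str] = None) -> dict:
    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    name = scenario or cfg["default_scenario"]   # a CLI override wins over the default
    return cfg[name]                             # indices, thresholds, features, ...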

The entire pipeline is orchestrated through a unified Python script (mainctl.py), which sequentially executes all phases while maintaining isolation between scenarios. The system architecture is containerized using docker-compose, supporting easy deployment and teardown, but it can be extended to work with remote hosts or cloud environments by adapting the Ansible configurations.

Dependencies

Note: Please remember to populate the wazuh_indexer_ssl_certs directory with the required SSL certificates, which can be generated using the built-in Wazuh certificates deployment script.

2. Project Architecture

The RADAR Test Framework is designed as a modular, containerized, and scenario-driven environment that facilitates the end-to-end lifecycle of anomaly detection research — from data ingestion and system setup, to attack simulation and evaluation. This architecture emphasizes automation, reproducibility, and extensibility, making it suitable for both experimental research and prototyping in real-world security environments.

2.1 High-Level Architecture Overview

The framework operates across four main components:

  1. Wazuh Manager: Collects and analyzes security logs, hosts the anomaly detection module, and triggers alerts or responses.
  2. Wazuh Agent: Deployed on endpoints to collect logs and forward them to the manager.
  3. OpenSearch: Stores ingested data and anomaly detection results for querying and evaluation.
  4. Supporting Components:
    • Keycloak (for the Suspicious Login scenario): Acts as the identity provider, simulating real-world SSO login activity.
    • Webhook Server: Captures active response alerts from Wazuh.

Each component runs inside its own Docker container and is orchestrated using docker-compose. Communication between containers occurs over a shared virtual bridge network defined in the docker-compose.yml.

2.2 Containerized Environment via Docker Compose

The docker/docker-compose.yml file defines the multi-container environment. Key services include the Wazuh manager, the Wazuh agent, OpenSearch, and the supporting Keycloak and webhook containers described in Section 2.1.

The containers mount relevant configuration directories (ossec.conf, rules, and anomaly detector configs) and allow persistent volumes for OpenSearch indices.


2.3 Logical Architecture by Phase

Ingest Phase

Setup Phase

Simulate Phase

Note: The attack simulation phase for DDoS Detection, Malware Communication and Suspicious Login is currently not working due to ongoing development. This feature will be restored in a future update.

Evaluate Phase


2.4 Orchestration via mainctl.py

The entire lifecycle (ingest → setup → simulate → evaluate) is orchestrated via a central control script, mainctl.py. This script:

  • parses the command-line arguments (most notably --scenario and --phase, see Section 9.4);
  • loads the selected scenario’s settings from config.yaml;
  • invokes the run() method of each phase module in sequence; and
  • keeps the scenarios isolated from one another.
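
A condensed sketch of that control flow is shown below; the --scenario and --phase arguments are taken from the usage examples in Section 9.4, while everything else is illustrative rather than the actual implementation.

# Sketch of the orchestration loop in mainctl.py (illustrative, not verbatim).
import argparse

PHASES = ["ingest", "setup", "simulate", "evaluate"]

def main() -> None:
    parser = argparse.ArgumentParser(description="RADAR Test Framework controller")
    parser.add_argument("--scenario", required=True, help="scenario block in config.yaml")
    parser.add_argument("--phase", default="all", help="all or one of: " + ", ".join(PHASES))
    args = parser.parse_args()

    selected = PHASES if args.phase == "all" else [args.phase]
    for phase in selected:
        # Each phase is a scenario-specific class with a run() method; the controller
        # instantiates it with the scenario configuration and executes it here.
        print(f"[{args.scenario}] running phase: {phase}")

if __name__ == "__main__":
    main()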


2.5 Flexibility and Adaptability

The architecture supports both fully containerized deployments via Docker Compose and external or hybrid infrastructures reached through the Ansible inventory over SSH.

Furthermore, the modular design ensures that new scenarios can be added without modifying the orchestration logic and that each scenario runs in isolation from the others.

3. Configuration & Flexibility

A key design objective of the RADAR Test Framework is configurability and adaptability. The framework must accommodate different use cases, datasets, deployment environments, and threat models. This is achieved through the central configuration file config.yaml, along with a flexible orchestration and automation layer that allows users to easily adapt the framework to their infrastructure and experimentation needs.


3.1 Central Role of config.yaml

The config.yaml file defines the core configuration logic of the framework and acts as the runtime blueprint for scenario-specific detection settings. It is structured to allow the orchestration logic (mainctl.py) to dynamically load scenario metadata, anomaly detection parameters, and feature extraction settings for any selected anomaly type.

At the top level, the file includes:

default_scenario: insider_threat

Each scenario block includes the following fields:

General Metadata

Detection Engine Parameters

Alerting Configuration

Feature Engineering

Each scenario defines a list of features, where each feature block contains:

Each scenario defines between three and five carefully selected features, tailored to the specific anomaly behavior being modeled.
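
The exact schema is defined by config.yaml itself; purely for illustration, a loaded insider_threat block might resemble the Python dictionary below. Field names that do not appear elsewhere in this document are hypothetical.

# Illustration only: a scenario block as it might look after yaml.safe_load().
scenario_cfg = {
    "index_prefix": "insider-threat-logs",         # where the ingested events live
    "result_index": "opensearch-ad-plugin-result-insider_threat",
    "label_csv_path": "datasets/insider_threat_labels.csv",
    "threshold": 0.7,                              # anomaly grade above which an alert counts
    "features": [                                  # 3-5 features per scenario
        {"name": "file_access_count", "field": "event.count", "aggregation": "sum"},
        {"name": "distinct_hosts", "field": "host.name", "aggregation": "cardinality"},
    ],
}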


3.2 Handling Pre-Ingested Data

One of the most important capabilities is the support for pre-ingested datasets. If users have already indexed their data into OpenSearch (e.g., for longitudinal testing or A/B comparisons), they can bypass the ingest phase by:

  1. Defining the correct index_prefix to point to their existing dataset.

This flexibility decouples the evaluation logic from the dataset ingestion pipeline and encourages reuse of long-running or curated datasets.
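
In practice this means starting the pipeline at a later phase, for example by invoking the controller described in Section 9.4 once per remaining phase:

python mainctl.py --scenario insider_threat --phase setup
python mainctl.py --scenario insider_threat --phase simulate
python mainctl.py --scenario insider_threat --phase evaluate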


3.3 Environment Agnosticism

While the default deployment uses Docker and Docker Compose for simplicity and isolation, the framework is designed to support external or hybrid environments.

This is made possible through the Ansible-based setup layer and its host inventory (ansible/hosts), which can target either the local containers or remote machines over SSH.

Note: In live environments, certain volumes or ports (e.g., /var/ossec, tcp/55000) must be exposed or routed accordingly.


3.4 Extensibility for New Scenarios

The configuration system is scenario-agnostic. To add a new scenario:

  1. Add a new scenario section in config.yaml (e.g., ransomware_outbreak:).
  2. Create a folder structure under /simulate, /evaluate, /ingest (optional).
  3. Implement the corresponding logic using the same interface (e.g., class-based script with run() method).
  4. Optionally define detection index and thresholds.

This structure allows modular plug-and-play development for new use cases without modifying the orchestration logic.


3.5 Runtime Control via CLI

While config.yaml controls the default flow, the mainctl.py script supports runtime overrides, most notably the scenario to run (--scenario) and the phase or phases to execute (--phase).

4. Ingest Phase

The Ingest Phase is the first operational step in the RADAR Test Framework pipeline. Its primary goal is to prepare the system for anomaly detection by injecting synthetic or pre-collected datasets into the OpenSearch backend, and by optionally provisioning external components (e.g., user accounts in Keycloak for authentication-based scenarios).

This phase is executed using Python classes tailored to each scenario, and it is automatically invoked by the mainctl.py controller. Alternatively, it can be skipped when using pre-ingested datasets, as discussed in Section 3.


4.1 General Responsibilities

The ingest modules handle the following core responsibilities: preparing the scenario dataset, bulk-indexing it into OpenSearch, and provisioning any external components required by the scenario (for example, user identities in Keycloak for the Suspicious Login scenario).


4.2 Scenario-Specific Ingest Logic

Each scenario has its own implementation under the scenario folder, in a script named wazuh_ingest.py. The ingest logic is scenario-aware and uses both OpenSearch and third-party APIs.

Insider Threat

DDoS Detection

Malware Communication

Suspicious Login


4.4 OpenSearch Ingestion Method

In all scenarios, ingestion is implemented via OpenSearch’s _bulk API to ensure efficient indexing of large datasets.
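
Purely as an illustration, a minimal bulk-indexing step using the opensearch-py client might look like the sketch below; hosts, credentials, the index name, and the example document are placeholders, and the actual wazuh_ingest.py scripts additionally handle scenario-specific field mapping.

# Sketch: bulk-index a list of event dicts into OpenSearch via the _bulk API.
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),      # replace with the credentials of your deployment
    use_ssl=True,
    verify_certs=False,
)

def bulk_ingest(index: str, events: list) -> None:
    actions = ({"_index": index, "_source": event} for event in events)
    ok, _ = helpers.bulk(client, actions)
    print(f"indexed {ok} documents into {index}")

bulk_ingest(
    "insider-threat-logs",
    [{"user": "alice", "event": "file_read", "timestamp": "2024-01-01T00:00:00Z"}],
)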

This approach ensures the ingest phase is efficient, repeatable, and compatible with OpenSearch’s anomaly detection plugin.

5. Setup Phase

The Setup Phase is responsible for configuring the detection environment, including Wazuh agents and managers, log collection rules, and the anomaly detection infrastructure in OpenSearch. This phase ensures that all necessary components are initialized and synchronized before the attack simulations begin. It is designed to be fully automated using Ansible, with built-in support for both containerized and non-containerized deployments.

The setup logic is modularized and scenario-aware. It reads configuration values from config.yaml and applies them consistently across Wazuh, OpenSearch, and related components.


5.1 Core Responsibilities

The Setup Phase performs the following main tasks:

  1. Copying configuration files (ossec.conf, rules, decoders) to Wazuh manager and agent containers or remote servers.
  2. Restarting services to apply the updated configuration.
  3. Registering agents with the Wazuh manager (if not already registered).
  4. Creating anomaly detectors in OpenSearch for each scenario.
  5. Defining monitors and triggers that continuously evaluate incoming data for anomalies.
  6. Registering webhook receivers to enable active response on detection.

5.2 Configuration Deployment via Ansible

The framework uses Ansible to automate the configuration and deployment process. The ansible/ directory contains the inventory (hosts) file together with the playbooks and scenario-specific roles.

Each role performs scenario-specific setup, including:

⚙️ Docker vs SSH:

By default, the inventory targets containers such as agent.insider or wazuh-manager. For remote deployments, users can modify the hosts file to specify IP addresses and SSH credentials of external servers. This allows seamless transition from testing to production environments.


5.3 Detector Creation in OpenSearch

Each scenario’s setup includes creating a custom anomaly detector in OpenSearch using the REST API. The detector is configured from the parameters defined in config.yaml (target indices, features, and thresholds); the setup module constructs and submits the detector creation request and stores the returned detector_id for later reference during evaluation.
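
As a rough sketch of this step, assuming the OpenSearch Anomaly Detection plugin’s REST API (POST _plugins/_anomaly_detection/detectors) and placeholder values in place of the real config.yaml parameters:

# Sketch: create an anomaly detector through the OpenSearch Anomaly Detection plugin.
# All names, fields, and intervals below are illustrative.
import requests

detector_body = {
    "name": "insider_threat-detector",
    "description": "RADAR test framework detector (sketch)",
    "time_field": "timestamp",
    "indices": ["insider-threat-logs*"],
    "feature_attributes": [
        {
            "feature_name": "file_access_count",
            "feature_enabled": True,
            "aggregation_query": {"file_access_count": {"sum": {"field": "event.count"}}},
        }
    ],
    "detection_interval": {"period": {"interval": 1, "unit": "Minutes"}},
    "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
}

resp = requests.post(
    "https://localhost:9200/_plugins/_anomaly_detection/detectors",
    json=detector_body,
    auth=("admin", "admin"),
    verify=False,
)
resp.raise_for_status()
detector_id = resp.json()["_id"]   # stored for use during the Evaluate phase
print("created detector:", detector_id)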


5.4 Monitor and Trigger Registration

Once the detector is created, the system automatically registers a monitor that periodically queries the detector’s results and one or more triggers that fire when the configured anomaly condition is met, invoking the webhook action.

An illustrative monitor configuration for the insider_threat scenario is sketched below.
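
Assuming the OpenSearch Alerting plugin’s monitor API (POST _plugins/_alerting/monitors), such a monitor might resemble the following; the result index, anomaly-grade threshold, schedule, and trigger condition are placeholders rather than the framework’s actual values.

# Sketch: register a monitor that fires when anomaly results exceed a grade threshold.
import requests

monitor_body = {
    "type": "monitor",
    "name": "insider_threat-anomaly-monitor",
    "enabled": True,
    "schedule": {"period": {"interval": 1, "unit": "MINUTES"}},
    "inputs": [{
        "search": {
            "indices": ["opensearch-ad-plugin-result-insider_threat"],
            "query": {"size": 0, "query": {"range": {"anomaly_grade": {"gt": 0.7}}}},
        }
    }],
    "triggers": [{
        "name": "anomaly-grade-above-threshold",
        "severity": "1",
        "condition": {"script": {"source": "ctx.results[0].hits.total.value > 0", "lang": "painless"}},
        "actions": [],   # the webhook action from Section 5.5 would be attached here
    }],
}

resp = requests.post(
    "https://localhost:9200/_plugins/_alerting/monitors",
    json=monitor_body,
    auth=("admin", "admin"),
    verify=False,
)
resp.raise_for_status()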

These monitors and triggers ensure continuous evaluation of detection logic and form the basis for active response escalation.


5.5 Webhook Receiver Setup

The Setup Phase also ensures that Wazuh’s active response mechanism can communicate with the webhook server running in a container. The process involves deploying the webhook container and registering its endpoint with Wazuh so that active response alerts are forwarded to it.

Webhook alerts are later consumed during the Evaluate Phase to determine whether detections occurred and whether they met the defined thresholds.
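
For illustration, a minimal webhook receiver of this kind could be a small Flask application; Flask, the /webhook route, and the port are assumptions, not necessarily what the framework’s webhook container uses.

# Sketch of a minimal webhook receiver that stores incoming active response alerts.
from flask import Flask, request, jsonify

app = Flask(__name__)
received_alerts = []          # kept in memory; the Evaluate phase would read these later

@app.route("/webhook", methods=["POST"])
def webhook():
    alert = request.get_json(force=True, silent=True) or {}
    received_alerts.append(alert)
    print("active response alert received:", alert.get("rule", {}).get("description", "<none>"))
    return jsonify({"status": "ok"}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)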


5.6 Restart and Validation

Finally, the setup logic restarts the relevant services so that the new configurations take effect, and verifies that the Wazuh manager and agents come back up correctly.


5.7 Extensibility Considerations

The setup phase is fully extensible: new scenarios can contribute their own configuration files, detector definitions, and Ansible roles without modifying the existing orchestration logic.

6. Simulate Phase

The Simulate Phase plays a critical role in the RADAR Test Framework by generating realistic, scenario-specific behavioral data. It emulates threat behavior within the Wazuh agent container, enabling Wazuh and the anomaly detection pipeline to detect deviations from normal activity. This phase ensures that synthetic but realistic threat signals are produced under controlled conditions.

Each simulation script is scenario-specific and is designed to:

The simulation logic is structured as independent Python classes, each with a run() method invoked by the mainctl.py orchestration controller.


6.1 Insider Threat Simulation

This script emulates abnormal file access behavior by a user, such as:

Implementation Highlights:
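
The concrete implementation is not reproduced here; as a hedged sketch, a burst of abnormal file accesses inside the agent container could be generated along these lines (directory, file names, and burst parameters are placeholders, and the real values come from config.yaml):

# Sketch: emulate bursty file access by a single user inside the agent container.
import os
import random
import time

TARGET_DIR = "/tmp/radar_sim/sensitive"
os.makedirs(TARGET_DIR, exist_ok=True)

def simulate_file_access(burst_size: int = 50, pause_s: float = 0.1) -> None:
    for i in range(burst_size):
        path = os.path.join(TARGET_DIR, f"document_{random.randint(1, 10)}.txt")
        with open(path, "a") as fh:          # repeated accesses in a short window
            fh.write(f"access {i}\n")
        time.sleep(pause_s)

simulate_file_access()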


6.2 DDoS Detection Simulation

This script simulates a SYN flood-style DDoS attack using repeated packet flow events. Each record imitates a single TCP flow targeting the server.

Implementation Highlights:
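
As a hedged illustration of what such flow records could look like (all field names, addresses, and volumes are placeholders, not the framework’s actual schema):

# Sketch: generate SYN-flood-like flow records, one dict per simulated TCP flow.
import random
from datetime import datetime, timezone

def make_flow(target_ip: str = "10.0.0.10", target_port: int = 80) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "src_ip": f"198.51.100.{random.randint(1, 254)}",   # many distinct sources
        "dst_ip": target_ip,
        "dst_port": target_port,
        "tcp_flags": "S",          # SYN only, no completed handshake
        "packets": 1,
        "bytes": 60,
    }

flood = [make_flow() for _ in range(5000)]   # sheer volume drives the anomaly signal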


6.3 Malware Communication Simulation

This module simulates beaconing behavior, a common trait of malware communicating with external command-and-control (C2) servers.

Implementation Highlights:
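
As a hedged sketch, near-periodic beacon events toward a fixed destination could be produced as follows (the destination, interval, and field names are placeholders):

# Sketch: emit beacon-like connection events at a fixed interval with small jitter,
# mimicking periodic C2 check-ins.
import random
import time
from datetime import datetime, timezone

def beacon(c2_host: str = "203.0.113.7", interval_s: int = 60, count: int = 30) -> list:
    events = []
    for _ in range(count):
        events.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "dst_ip": c2_host,
            "dst_port": 443,
            "bytes_out": random.randint(200, 400),     # small, similarly sized payloads
        })
        time.sleep(interval_s + random.uniform(-2, 2))  # near-constant period = beaconing
    return events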


6.4 Suspicious Login Simulation

This script mimics anomalous login behavior in a Keycloak-backed environment.

Implementation Highlights:
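
As a hedged sketch, repeated failed logins for a single identity could be driven against Keycloak’s password-grant token endpoint; the realm, client, URL, and port below are placeholders, and older Keycloak versions prefix the path with /auth.

# Sketch: generate a burst of failed logins against Keycloak to create the anomaly signal.
import requests

KEYCLOAK_URL = "http://localhost:8081"
TOKEN_ENDPOINT = f"{KEYCLOAK_URL}/realms/radar/protocol/openid-connect/token"

def attempt_login(username: str, password: str) -> int:
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={
            "grant_type": "password",
            "client_id": "radar-test-client",
            "username": username,
            "password": password,
        },
    )
    return resp.status_code          # non-2xx for the deliberately failing attempts

for _ in range(20):
    attempt_login("victim_user", "wrong-password")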

7. Evaluate Phase

The Evaluate Phase is the final and most critical stage of the RADAR Test Framework. Its primary goal is to assess the effectiveness of anomaly detection by comparing the outputs of the detection pipeline (from OpenSearch) with the ground-truth labels defined in scenario datasets.

This phase systematically computes the standard classification outcomes, True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN), and generates structured CSV reports summarizing detection accuracy.

Each scenario has a dedicated evaluator script implemented as a class-based Python module. These scripts query the detection result index created during the setup phase and align results with labeled events to quantify detection performance.


7.1 Evaluation Workflow

Each evaluator follows this general workflow:

  1. Load Ground Truth Labels:

    Load the scenario’s CSV file containing timestamps and labels (e.g., label_csv_path in config.yaml). Labels typically indicate whether an event was malicious (1) or benign (0).

  2. Query Anomaly Detection Results:

    Connect to OpenSearch and fetch all anomaly results from the detector’s result_index (e.g., opensearch-ad-plugin-result-insider_threat).

  3. Match Events with Anomalies:

    For each labeled event, identify the corresponding anomaly detection result (same user/IP and closest timestamp).

  4. Assign Detection Outcome:

    Based on the label and detection result:

    • TP: Label = 1, detected = Yes
    • FP: Label = 0, detected = Yes
    • FN: Label = 1, detected = No
    • TN: Label = 0, detected = No
  5. Write Evaluation Report: Save a CSV file containing one record per evaluated event.

This output allows downstream computation of aggregate metrics such as precision, recall, F1-score, and overall accuracy.
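
For example, a small post-processing helper could read the per-event report and derive these aggregates; the "outcome" column name and the file path are assumptions about the CSV layout, not the evaluator’s actual format.

# Sketch: derive aggregate metrics from the per-event evaluation CSV.
import csv
from collections import Counter

def summarize(report_path: str) -> dict:
    with open(report_path, newline="") as fh:
        counts = Counter(row["outcome"] for row in csv.DictReader(fh))
    tp, fp, fn, tn = (counts.get(k, 0) for k in ("TP", "FP", "FN", "TN"))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

print(summarize("evaluation_results/insider_threat.csv"))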


7.2 Output Structure and Post-Processing

Each evaluator writes its results to the evaluation_results/ directory in CSV format. These outputs can be consumed by:

9. Deployment & Usage Guide

This section provides practical instructions for setting up, configuring, and executing the RADAR Test Framework in both containerized and hybrid environments. The framework is built for flexibility and reproducibility, allowing users to deploy it locally for experimentation or adapt it to enterprise environments for real-world security evaluations.


9.1 Prerequisites

Before deploying the framework, ensure the following components are installed: Docker and Docker Compose, Python 3 (with pip and venv), and Ansible.


9.2 Environment Setup

Step 1: Install Dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Step 2: Start the Containerized Environment

docker compose up -d

This brings up the full environment: the Wazuh manager, the Wazuh agent, OpenSearch, and the supporting Keycloak and webhook containers.

Wait until all containers are healthy and accessible.


9.3 Configuration

Step 1. Edit the config.yaml file to select the default scenario and adjust dataset locations, OpenSearch indices, hostnames, thresholds, and attack intensity for your environment.

Step 2. Rename env_var.env to .env.

You may also edit ansible/hosts to match your infrastructure (e.g., if using IPs or SSH access to bare-metal Wazuh instances).


9.4 Running the Full Pipeline

To run all four phases (ingest, setup, simulate, evaluate):

python mainctl.py --scenario insider_threat --phase all

To execute only selected phases:

python mainctl.py --scenario ddos_detection --phase setup

Supported arguments include --scenario (the scenario block from config.yaml to use) and --phase (all, ingest, setup, simulate, or evaluate).


9.5 Stopping the Environment

To shut down and remove all containers:

docker compose down

10. Limitations & Future Work

While the RADAR Test Framework provides a modular and reproducible environment for evaluating anomaly detection systems, it has inherent limitations that arise from its architectural assumptions, scope of simulation, and reliance on synthetic data. This section outlines key limitations and proposes directions for future research and system improvements.


10.1 Limitations

1. Synthetic Data vs. Real-World Complexity

Although the simulation modules attempt to mimic realistic attack behavior, they inevitably simplify system interactions:

This could result in:

2. Limited Scenario Coverage

The framework currently supports four scenarios: Insider Threat, DDoS Detection, Malware Communication, and Suspicious Login.

While these represent common threat vectors, the framework does not yet include:

Adding more diverse and chained attack simulations would improve coverage.

3. Fixed Feature Space

Each scenario defines a static list of features in config.yaml. While these features are sufficient for basic detection:

4. Evaluation Simplification

The evaluation pipeline assumes:

This makes it difficult to:

5. Dependency on Specific Stack

The framework is tightly coupled with Wazuh as the SIEM, OpenSearch (including its anomaly detection and alerting plugins) as the storage and detection backend, Docker and Ansible for deployment, and Keycloak as the identity provider.

Adapting the framework to other SIEMs, cloud-based platforms (e.g., Elastic Cloud), or alternate identity providers would require significant reconfiguration.


10.2 Future Work

To address the limitations above and enhance the research utility of the framework, we propose the following extensions:

1. Integration of Real Logs

2. Expand Scenario Library

3. Dynamic Feature Selection

4. Enhanced Evaluation Framework

5. Platform Agnosticism