Complete guide for installing, configuring, and using SONAR (SIEM-Oriented Neural Anomaly Recognition).
# 1. Install dependencies
cd /home/alab/soar
poetry install --with sonar
# 2. Verify installation
poetry run sonar check
# 3. Run a scenario (from project root)
poetry run sonar scenario --use-case sonar/scenarios/brute_force_detection.yaml
# Install with all dependencies
poetry install --with sonar,test
# Or production only (faster)
poetry install --with sonar
# Check Wazuh connection
poetry run sonar check
# Or with custom config
poetry run sonar check --config my_config.yaml
SONAR uses default_config.yaml when no --config flag is provided.
# Wazuh connection settings
wazuh:
  base_url: "https://localhost:9200"
  username: "admin"
  password: "admin"
  verify_ssl: false
  alerts_index_pattern: "wazuh-alerts-*"
  anomalies_index: "wazuh-anomalies-mvad"
# MVAD model settings
mvad:
  sliding_window: 200
  device: "cpu"
  extra_params: {}
# Feature extraction defaults
features:
  numeric_fields: ["rule.level"]
  bucket_minutes: 5
  categorical_fields: []
  categorical_top_k: 10
  derived_features: true
# Debug mode settings
debug:
  enabled: false
  data_dir: "./test_data/synthetic_alerts"
  training_data_file: "normal_baseline.json"
  detection_data_file: "with_anomalies.json"
# Model storage
model_path: "./model/mvad_model.pkl"
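As a rough illustration of what `bucket_minutes` controls, the sketch below groups alerts into fixed-width time buckets and averages `rule.level` per bucket. The helper names (`bucket_start`, `mean_level_per_bucket`) and the choice of the mean as the aggregate are assumptions made for illustration; SONAR's actual feature extraction may differ.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_start(ts: str, bucket_minutes: int) -> datetime:
    """Floor an ISO-8601 UTC timestamp to the start of its bucket."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    epoch = int(dt.timestamp())
    size = bucket_minutes * 60
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def mean_level_per_bucket(alerts, bucket_minutes=5):
    """Average rule.level over all alerts that fall into the same bucket."""
    levels = defaultdict(list)
    for alert in alerts:
        key = bucket_start(alert["timestamp"], bucket_minutes)
        levels[key].append(alert["rule"]["level"])
    return {k: sum(v) / len(v) for k, v in sorted(levels.items())}


alerts = [
    {"timestamp": "2025-12-29T10:01:00.000Z", "rule": {"level": 3}},
    {"timestamp": "2025-12-29T10:04:59.000Z", "rule": {"level": 5}},
    {"timestamp": "2025-12-29T10:05:00.000Z", "rule": {"level": 10}},
]
features = mean_level_per_bucket(alerts, bucket_minutes=5)
for start, mean_level in features.items():
    print(start.isoformat(), mean_level)
```

With a 5-minute bucket, the first two alerts share one bucket and the third starts the next one. A smaller `bucket_minutes` gives finer resolution at the cost of more, sparser rows.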
Create custom configs for different environments:
# Development
poetry run sonar scenario --use-case scenario.yaml --config dev_config.yaml
# Production
poetry run sonar scenario --use-case scenario.yaml --config prod_config.yaml
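This guide does not say whether a custom config must repeat every key or only the overrides. If you maintain several environment files, one way to keep them consistent is to generate each from a shared base. The sketch below shows a recursive merge over plain dictionaries; `deep_merge`, the example host name, and the override values are hypothetical and not part of SONAR.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "wazuh": {"base_url": "https://localhost:9200", "verify_ssl": False},
    "mvad": {"sliding_window": 200, "device": "cpu"},
}
# Hypothetical production overrides: only the keys that differ.
prod_override = {
    "wazuh": {"base_url": "https://wazuh.example.internal:9200", "verify_ssl": True}
}
prod_config = deep_merge(base, prod_override)
print(prod_config)
```

The merged dictionary can then be written out as `prod_config.yaml` with whatever YAML library the project already uses.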
Verify Wazuh connectivity:
poetry run sonar check
poetry run sonar check --config custom.yaml
Train MVAD model on historical data:
# Train on last 24 hours
poetry run sonar train --lookback-hours 24
# With custom config
poetry run sonar train --lookback-hours 168 --config prod.yaml
# With custom model name
poetry run sonar train --model-name "baseline_v2" --lookback-hours 168
# Enable data shipping (creates data stream)
poetry run sonar train --lookback-hours 168 --ship
# Debug mode (local data)
poetry run sonar train --lookback-hours 24 --debug
All train options:
| Option | Description | Default |
|---|---|---|
| --config PATH | Path to config YAML | default_config.yaml |
| --scenario PATH | Path to scenario YAML (alternative to --config) | - |
| --model-name NAME | Custom model name for saving | Auto-generated |
| --lookback-hours N | Hours of historical data for training | 24 |
| --fill-with-synthetic | Generate synthetic alerts if data insufficient | False |
| --synthetic-count N | Number of synthetic alerts to create | Auto |
| --synthetic-mode MODE | Synthetic content mode: constant, random, copy | constant |
| --synthetic-level N | Rule level for constant mode | 5 |
| --synthetic-srcip IP | Source IP for constant mode | 192.0.2.0 |
| --print-payloads | Print JSON payloads before sending | False |
| --dry-run | Simulate without sending to Wazuh | False |
| --payload-dir DIR | Directory for saved payloads | ./payloads |
| --debug | Use local test data instead of Wazuh | False |
| --ship | Enable data shipping (create data stream) | False |
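To make the `constant` synthetic mode concrete, here is a sketch of what such filler alerts could look like: every alert shares one rule level and one source IP, and timestamps are evenly spaced. The function `make_constant_alerts`, the one-minute spacing, and the exact alert shape are assumptions; only the default level (5) and source IP (192.0.2.0) come from the options table.

```python
from datetime import datetime, timedelta, timezone


def make_constant_alerts(count, start, step_seconds=60, level=5, srcip="192.0.2.0"):
    """Build `count` identical alerts, spaced `step_seconds` apart."""
    alerts = []
    for i in range(count):
        ts = start + timedelta(seconds=i * step_seconds)
        stamp = ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
        alerts.append({
            "timestamp": stamp,
            "rule": {"level": level},
            "data": {"srcip": srcip},
        })
    return alerts


start = datetime(2025, 12, 29, 10, 0, tzinfo=timezone.utc)
synthetic = make_constant_alerts(3, start)
for alert in synthetic:
    print(alert)
```

Constant filler keeps the numeric features flat, so it pads a short baseline without introducing variation of its own.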
Detect anomalies in recent data:
# Detect in last 10 minutes
poetry run sonar detect --lookback-minutes 10
# Detect on longer lookback period
poetry run sonar detect --lookback-minutes 30
# Enable data shipping (ship to data stream)
poetry run sonar detect --lookback-minutes 10 --ship
# Debug mode
poetry run sonar detect --lookback-minutes 10 --debug
Note: Detection threshold and min_consecutive parameters are configured in scenario YAML files, not as CLI flags.
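How SONAR combines these two parameters is not spelled out here. A common interpretation, sketched below, is that a bucket is flagged only when its score exceeds `threshold` and it belongs to a run of at least `min_consecutive` such buckets. `flag_anomalies` is a hypothetical helper, and the strict `>` comparison is an assumption.

```python
def flag_anomalies(scores, threshold=0.7, min_consecutive=2):
    """Flag scores above threshold that occur in runs of min_consecutive or more."""
    flags = [False] * len(scores)
    run_start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_consecutive:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


scores = [0.2, 0.9, 0.3, 0.8, 0.95, 0.75, 0.1]
print(flag_anomalies(scores, threshold=0.7, min_consecutive=2))
```

In this example the isolated spike at index 1 is ignored, while the three-bucket run at indices 3 to 5 is flagged. Raising `min_consecutive` trades detection latency for fewer one-off false positives.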
All detect options:
| Option | Description | Default |
|---|---|---|
| --config PATH | Path to config YAML | default_config.yaml |
| --scenario PATH | Path to scenario YAML (alternative to --config) | - |
| --lookback-minutes N | Minutes of recent data to detect on | 10 |
| --fill-with-synthetic | Generate synthetic alerts if data insufficient | False |
| --synthetic-count N | Number of synthetic alerts to create | Auto |
| --synthetic-mode MODE | Synthetic content mode: constant, random, copy | constant |
| --synthetic-level N | Rule level for constant mode | 5 |
| --synthetic-srcip IP | Source IP for constant mode | 192.0.2.0 |
| --print-payloads | Print JSON payloads before sending | False |
| --dry-run | Simulate without sending to Wazuh | False |
| --payload-dir DIR | Directory for saved payloads | ./payloads |
| --debug | Use local test data instead of Wazuh | False |
| --ship | Enable data shipping (ship to data stream) | False |
Execute complete workflows defined in YAML:
# Run scenario with training and detection (from project root)
poetry run sonar scenario --use-case sonar/scenarios/brute_force_detection.yaml
# Training only (build baseline)
poetry run sonar scenario --use-case sonar/scenarios/training_only.yaml
# Detection only (use existing model)
poetry run sonar scenario --use-case sonar/scenarios/detection_only.yaml
# Debug mode
poetry run sonar scenario --use-case sonar/scenarios/example_scenario.yaml --debug
Note: All paths are relative to project root (/home/alab/soar/).
Run SONAR without Wazuh infrastructure using local JSON test data.
Option 1: Command-line flag
poetry run sonar train --debug
poetry run sonar detect --debug
poetry run sonar scenario --use-case scenario.yaml --debug
Option 2: Configuration file
debug:
  enabled: true
  data_dir: "./test_data/resource_monitoring"  # Relative to sonar/ directory
  training_data_file: "resource_monitoring_training.json"
  detection_data_file: "resource_monitoring_detection.json"
Each test data file should contain a JSON array of Wazuh alerts:
[
  {
    "timestamp": "2025-12-29T10:00:00.000Z",
    "rule": {
      "id": "5406",
      "level": 3,
      "groups": ["authentication", "sudo"]
    },
    "agent": {
      "id": "001",
      "name": "web-server-01"
    },
    "data": {
      "cpu_usage_%": 45.2,
      "memory_usage_%": 62.1
    }
  }
]
Path resolution: All paths in debug configuration are relative to the sonar/ module directory.
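Field names such as `rule.level` or `data.cpu_usage_%` address nested keys in these alerts. Before pointing debug mode at a new file, it can help to check that every alert actually carries the fields you plan to use. The sketch below does that with a dotted-path lookup; `get_field` and `missing_fields` are illustrative helpers, not SONAR functions.

```python
import json


def get_field(alert, dotted_path):
    """Walk a nested dict along 'a.b.c'; return None if any step is missing."""
    current = alert
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_fields(alerts, fields):
    """Map each field to the number of alerts that lack it."""
    return {f: sum(get_field(a, f) is None for a in alerts) for f in fields}


raw = """[{"timestamp": "2025-12-29T10:00:00.000Z",
  "rule": {"id": "5406", "level": 3},
  "data": {"cpu_usage_%": 45.2}}]"""
alerts = json.loads(raw)
report = missing_fields(alerts, ["rule.level", "data.cpu_usage_%", "data.memory_usage_%"])
print(report)
```

A non-zero count for a field you list under `numeric_fields` is worth resolving before training, since gaps in the input tend to surface later as feature mismatch errors.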
| Dataset | Location | Description |
|---|---|---|
| Normal baseline | test_data/synthetic_alerts/normal_baseline.json | 12,000 normal alerts |
| With anomalies | test_data/synthetic_alerts/with_anomalies.json | 6,000 alerts with anomalies |
| Normal training | test_data/generated_scenarios/normal_training.json | 12,000 alerts with realistic patterns |
| Attack scenarios | test_data/generated_scenarios/attack_scenarios.json | 2,000 alerts with known attacks |
| Resource monitoring (train) | test_data/resource_monitoring/resource_monitoring_training.json | 24h normal system activity |
| Resource monitoring (detect) | test_data/resource_monitoring/resource_monitoring_detection.json | 6h with CPU/memory attacks |
Use the provided generators:
# Resource monitoring data
poetry run python sonar/test_data/generate_resource_data.py
# Attack scenarios
poetry run python sonar/test_data/generate_attack_data.py
SONAR reads from Wazuh alert indices:
- wazuh-alerts-* (configurable)
- timestamp field

Detected anomalies are written to:
- wazuh-anomalies-mvad (configurable)
- anomaly_score: Confidence score (0-1)
- detection_timestamp: When anomaly was detected
- model_version: Model identifier

SONAR anomalies can trigger RADAR responses:
- wazuh-anomalies-mvad index

# 1. Initial baseline (weekly) - from project root
poetry run sonar scenario --use-case sonar/scenarios/brute_force_detection.yaml
# 2. Continuous monitoring (cron/systemd); cron does not start in the project root, so cd first
*/10 * * * * cd /home/alab/soar && poetry run sonar detect --lookback-minutes 10
# 3. Weekly retraining (cron)
0 2 * * 0 cd /home/alab/soar && poetry run sonar train --lookback-hours 168
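For anything that consumes the anomalies index downstream of this workflow, a document can be pictured as in the sketch below. Only the three field names (`anomaly_score`, `detection_timestamp`, `model_version`) come from this guide; the function name, the ISO timestamp format, and the range check are assumptions.

```python
from datetime import datetime, timezone


def build_anomaly_doc(score, model_version, detected_at=None):
    """Assemble an anomaly document with the three fields named in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"anomaly_score must be in [0, 1], got {score}")
    detected_at = detected_at or datetime.now(timezone.utc)
    return {
        "anomaly_score": score,
        "detection_timestamp": detected_at.isoformat(),
        "model_version": model_version,
    }


doc = build_anomaly_doc(
    0.91, "baseline_v2", datetime(2025, 12, 29, 10, 5, tzinfo=timezone.utc)
)
print(doc)
```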
SONAR is included in the main IDPS-ESCAPE stack:
# Build container
docker build -f sonar.Dockerfile -t sonar:latest .
# Run in docker-compose stack
docker-compose up -d sonar
SONAR uses Python logging:
import logging
logging.basicConfig(level=logging.INFO)
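For more detail during troubleshooting, the standard library lets you add timestamps and raise verbosity for a single logger. The logger name `sonar` below is an assumption about how the package names its loggers; check the module source before relying on it.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumed logger name; adjust to the package's real logger hierarchy.
sonar_logger = logging.getLogger("sonar")
sonar_logger.setLevel(logging.DEBUG)
sonar_logger.debug("debug logging enabled for %s", sonar_logger.name)
```

Other libraries stay at INFO, which keeps HTTP client chatter out of the output while the assumed `sonar` logger reports at DEBUG.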
Logs include:
categorical_top_k parameter

name: "Brute Force Detection"
training:
  lookback_hours: 168  # 1 week baseline
  numeric_fields: ["rule.level"]
  categorical_fields: ["agent.id", "rule.groups"]
  bucket_minutes: 5
detection:
  mode: "historical"
  lookback_minutes: 60
  threshold: 0.7

name: "Linux Resource Monitoring"
training:
  lookback_hours: 24
  numeric_fields: ["data.cpu_usage_%", "data.memory_usage_%", "rule.level"]
  categorical_fields: ["agent.name", "rule.groups"]
  bucket_minutes: 1
detection:
  mode: "batch"
  lookback_minutes: 10
  threshold: 0.85

name: "Lateral Movement Detection"
training:
  lookback_hours: 336  # 2 weeks
  numeric_fields: ["rule.level"]
  categorical_fields: ["agent.id", "data.srcip", "data.dstip"]
  bucket_minutes: 10
detection:
  mode: "realtime"
  polling_interval_seconds: 60
  threshold: 0.8
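The three example scenarios share one shape: a `name`, a `training` block, and a `detection` block. A small pre-flight check like the sketch below can catch typos before a long training run. The rules it enforces (required keys, a threshold between 0 and 1, the three mode values that appear in the examples) are inferred from these examples only and are not SONAR's actual schema; `validate_scenario` is a hypothetical helper.

```python
KNOWN_MODES = {"historical", "batch", "realtime"}  # only the modes seen in the examples


def validate_scenario(scenario):
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    for key in ("name", "training", "detection"):
        if key not in scenario:
            errors.append(f"missing top-level key: {key}")
    training = scenario.get("training", {})
    detection = scenario.get("detection", {})
    if training.get("lookback_hours", 0) <= 0:
        errors.append("training.lookback_hours must be positive")
    if not training.get("numeric_fields"):
        errors.append("training.numeric_fields must list at least one field")
    if detection.get("mode") not in KNOWN_MODES:
        errors.append(f"detection.mode not recognised: {detection.get('mode')!r}")
    threshold = detection.get("threshold")
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        errors.append("detection.threshold must be a number in [0, 1]")
    return errors


scenario = {
    "name": "Brute Force Detection",
    "training": {
        "lookback_hours": 168,
        "numeric_fields": ["rule.level"],
        "categorical_fields": ["agent.id", "rule.groups"],
        "bucket_minutes": 5,
    },
    "detection": {"mode": "historical", "lookback_minutes": 60, "threshold": 0.7},
}
problems = validate_scenario(scenario)
print(problems or "scenario looks consistent")
```

Training-only or detection-only scenarios would need a looser check than this one, since it requires both blocks.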
# All tests
poetry run pytest tests/
# Specific test file
poetry run pytest tests/engine_test.py
# With coverage
poetry run pytest --cov=sonar --cov-report=html
# Test with debug mode
poetry run sonar scenario --use-case sonar/scenarios/example_scenario.yaml --debug
# Verify outputs
ls model/mvad_model.pkl
See troubleshooting.md for detailed error diagnosis and solutions.
Connection errors:
# Check Wazuh is running (-k skips certificate verification, as with verify_ssl: false)
curl -k -u admin:admin https://localhost:9200/_cluster/health
# Use debug mode for development
poetry run sonar train --debug
Feature mismatch errors:
Model not found:
# Train before detection
poetry run sonar train --lookback-hours 24
# Or specify model path
poetry run sonar detect --model-path ./custom_model.pkl
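In scripts that call detection on a schedule, a guard like the following avoids running against a missing model. The default path is the `model_path` value from the configuration shown earlier; `model_is_ready` is a hypothetical helper, and it only checks that a non-empty file exists, not that the model loads.

```python
from pathlib import Path


def model_is_ready(model_path="./model/mvad_model.pkl"):
    """True if the model file exists and is not empty."""
    path = Path(model_path)
    return path.is_file() and path.stat().st_size > 0


if not model_is_ready():
    print("No trained model found; run `poetry run sonar train` first.")
```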