This directory contains sample Wazuh alert data files for use with SONAR debug mode.
test_data/
├── generated_scenarios/ # Generated attack and training data
│ ├── normal_training.json # 14 days normal baseline (~12,000 alerts)
│ └── attack_scenarios.json # 2 days with attack patterns (~2,000 alerts)
├── synthetic_alerts/ # Legacy synthetic data
│ ├── normal_baseline.json # 30 days simple training data (~12,000 alerts)
│ └── with_anomalies.json # 30 days with anomalies (~6,000 alerts)
├── generate_attack_data.py # Data generation script
├── inject_wazuh_data.py # Inject synthetic data into Wazuh Indexer
└── README.md # This file
| Subdirectory | File | Purpose | Alerts | Time span |
|---|---|---|---|---|
| generated_scenarios/ | normal_training.json | Normal baseline for training | ~12,000 | 14 days |
| generated_scenarios/ | attack_scenarios.json | Detection testing with attacks | ~2,000 | 2 days |
| synthetic_alerts/ | normal_baseline.json | Legacy simple training data | ~12,000 | 30 days |
| synthetic_alerts/ | with_anomalies.json | Legacy detection test data | ~6,000 | 30 days |
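
To confirm that a data file matches the alert counts and time spans listed above, a short script can load it and summarize it. This is a minimal sketch, not part of SONAR: it assumes each file is either a JSON array or newline-delimited JSON, and that every alert carries a Wazuh-style `timestamp` field such as `2025-12-30T10:00:00.000+0000`. The function names are hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path


def load_alerts(path):
    """Load alerts from a JSON array file or a newline-delimited JSON file."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)
    # Assumed fallback: one JSON object per line
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_timestamp(value):
    """Parse a timestamp such as 2025-12-30T10:00:00.000+0000 (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def summarize(alerts):
    """Return the alert count and the covered time span in days."""
    stamps = [parse_timestamp(a["timestamp"]) for a in alerts if "timestamp" in a]
    span_days = 0.0
    if stamps:
        span_days = (max(stamps) - min(stamps)).total_seconds() / 86400
    return {"alerts": len(alerts), "span_days": round(span_days, 1)}
```

Calling `summarize(load_alerts("generated_scenarios/normal_training.json"))` from the test_data directory should then report values close to those in the table, provided the assumptions about the file format hold.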
The generated_scenarios/attack_scenarios.json file contains realistic attack patterns:
cd /home/alab/soar/sonar/test_data
python generate_attack_data.py
This creates:
- generated_scenarios/normal_training.json - 14 days of normal activity
- generated_scenarios/attack_scenarios.json - 2 days with injected attacks

Train with normal baseline (using local JSON files):
cd /home/alab/soar/sonar
poetry run sonar train --debug --lookback-hours 168
Note: By default, --debug mode uses data from test_data/synthetic_alerts/. To use generated scenarios instead, configure data_dir in your config YAML or use the LocalDataProvider with a custom path.
For testing against a real Wazuh instance without waiting for data accumulation:
Step 1: Inject synthetic data into Wazuh Indexer
cd /home/alab/soar/sonar/test_data
# Inject last 24 hours of normal baseline
python inject_wazuh_data.py --config ../default_config.yaml --hours 24
# Or inject 14 days for comprehensive training
python inject_wazuh_data.py --config ../default_config.yaml --days 14 --source generated_scenarios/normal_training.json
# Or inject attack scenarios for detection testing
python inject_wazuh_data.py --config ../default_config.yaml --hours 48 --source generated_scenarios/attack_scenarios.json
Step 2: Train SONAR (production mode)
cd /home/alab/soar/sonar
poetry run sonar train --scenario scenarios/brute_force_detection.yaml
Step 3: Run detection (production mode)
poetry run sonar detect --scenario scenarios/brute_force_detection.yaml
To detect attacks using the generated scenarios, point the data_dir to test_data/generated_scenarios/:
cd /home/alab/soar/sonar
# Modify config or use custom LocalDataProvider setup
cd /home/alab/soar/sonar
poetry run sonar scenario --use-case scenarios/brute_force_detection.yaml --debug
The inject_wazuh_data.py tool populates a Wazuh installation with synthetic alerts for:
# Preview what would be injected (safe, no changes)
python inject_wazuh_data.py --config ../default_config.yaml --hours 24 --dry-run
# Inject last 24 hours of data
python inject_wazuh_data.py --config ../default_config.yaml --hours 24
# Inject 14 days of baseline for training
python inject_wazuh_data.py --config ../default_config.yaml --days 14 --source generated_scenarios/normal_training.json
# Inject attack scenarios for detection testing
python inject_wazuh_data.py --config ../default_config.yaml --hours 48 --source generated_scenarios/attack_scenarios.json
# Custom time range
python inject_wazuh_data.py --config ../default_config.yaml \
--range "2025-12-29 00:00:00" "2025-12-30 23:59:59"
# Override Wazuh connection details
python inject_wazuh_data.py --host wazuh.example.com --port 9200 \
--username admin --password secret --hours 24
# Adjust bulk size for performance tuning
python inject_wazuh_data.py --config ../default_config.yaml --hours 24 --bulk-size 5000
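
The internals of inject_wazuh_data.py are not shown here, but the `--bulk-size` option suggests that alerts are sent in batches. The sketch below illustrates one way such batching could work, assuming the indexer's standard `_bulk` endpoint (newline-delimited JSON with an action line before each document) and daily index names of the form used in this README (`wazuh-alerts-4.x-2025.12.30`). All function names are hypothetical and nothing here is taken from the script itself.

```python
import json
from datetime import datetime, timezone


def index_name_for(ts, prefix="wazuh-alerts-4.x-"):
    """Build a daily index name such as wazuh-alerts-4.x-2025.12.30."""
    return prefix + ts.strftime("%Y.%m.%d")


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(batch):
    """Serialize (timestamp, alert) pairs as an NDJSON _bulk payload."""
    lines = []
    for ts, alert in batch:
        # Action line naming the target index, then the document itself
        lines.append(json.dumps({"index": {"_index": index_name_for(ts)}}))
        lines.append(json.dumps(alert))
    # The _bulk API expects the payload to end with a newline
    return "\n".join(lines) + "\n"


# Hypothetical alerts, batched two at a time
ts = datetime(2025, 12, 30, 10, 0, tzinfo=timezone.utc)
alerts = [(ts, {"rule": {"id": "5715"}}) for _ in range(5)]
payloads = [bulk_body(batch) for batch in chunked(alerts, 2)]
```

Under these assumptions, a larger bulk size means fewer, larger requests, which is the trade-off the `--bulk-size` flag would tune.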
- Index names look like wazuh-alerts-4.x-2025.12.30
- SSL verification can be disabled with --no-verify-ssl
- Requires the requests and pyyaml Python packages (already in Poetry dependencies)

Connection refused:
# Check if Wazuh Indexer is running
curl -k -u admin:admin https://localhost:9200
# Verify config file has correct host/port
grep -A5 "wazuh_indexer:" ../default_config.yaml
SSL certificate errors:
# Disable SSL verification (dev/test only)
python inject_wazuh_data.py --config ../default_config.yaml --hours 24 --no-verify-ssl
Not enough data after injection:
curl -k -u admin:admin "https://localhost:9200/wazuh-alerts-*/_count"
- Check the lookback_minutes setting
- Ensure lookback_minutes / bucket_minutes ≥ 200 for MVAD

# 1. Generate fresh attack scenarios
cd /home/alab/soar/sonar/test_data
python generate_attack_data.py
# 2. Inject 14 days of normal baseline for training
python inject_wazuh_data.py --config ../default_config.yaml --days 14 \
--source generated_scenarios/normal_training.json
# 3. Wait a few seconds for indexing
sleep 5
# 4. Train SONAR (production mode)
cd /home/alab/soar/sonar
poetry run sonar train --scenario scenarios/brute_force_detection.yaml
# 5. Inject 2 days with attacks
cd test_data
python inject_wazuh_data.py --config ../default_config.yaml --hours 48 \
--source generated_scenarios/attack_scenarios.json
# 6. Run detection
cd /home/alab/soar/sonar
poetry run sonar detect --scenario scenarios/brute_force_detection.yaml
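
Before training in a workflow like the one above, it can help to check whether the lookback window yields enough buckets for MVAD. The sketch below encodes the guideline from the troubleshooting notes (lookback_minutes / bucket_minutes ≥ 200). The function name and the bucket sizes in the examples are hypothetical, not SONAR defaults.

```python
def enough_buckets_for_mvad(lookback_minutes, bucket_minutes, minimum=200):
    """Return True if the lookback window spans at least `minimum` buckets."""
    if bucket_minutes <= 0:
        raise ValueError("bucket_minutes must be positive")
    return lookback_minutes / bucket_minutes >= minimum


# 14 days of data with hypothetical 60-minute buckets: 20160 / 60 = 336 buckets
print(enough_buckets_for_mvad(14 * 24 * 60, 60))  # True

# 24 hours of data with hypothetical 10-minute buckets: 1440 / 10 = 144 buckets
print(enough_buckets_for_mvad(24 * 60, 10))  # False
```

If the check fails, either a longer lookback or a smaller bucket size brings the ratio back above the threshold.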
The generated data uses realistic Wazuh rule IDs:
| Rule ID | Level | Description | Groups |
|---|---|---|---|
| 5715 | 3 | SSH authentication success | authentication_success, sshd, ssh |
| 5710 | 5 | Non-existent user login attempt | authentication_failed, sshd, ssh |
| 5711 | 5 | Denied user login attempt | authentication_failed, sshd, ssh, invalid_login |
| 5712 | 10 | Brute force attempt | authentication_failed, sshd, ssh, brute_force |
| 5402 | 3 | Successful sudo to ROOT | pam, syslog, sudo |
| 5401 | 5 | Failed sudo attempt | pam, syslog, sudo, authentication_failed |
| 5901 | 8 | New user added | syslog, account_changed, user_management |
| 5902 | 8 | Password changed | syslog, account_changed |
| 5403 | 14 | Privilege escalation attempt | pam, privilege_escalation, attack |
| 5501 | 3 | PAM session opened | pam, syslog, session_opened, authentication_success |
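
The rule table can be used to sanity-check a data file, for example by counting alerts per rule and listing the high-severity ones. This is a minimal sketch that assumes the usual Wazuh alert layout, with a nested `rule` object holding `id` (as a string) and `level`. The sample alerts are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

# Descriptions copied from a few rows of the rule table above
RULE_DESCRIPTIONS = {
    "5715": "SSH authentication success",
    "5712": "Brute force attempt",
    "5403": "Privilege escalation attempt",
}


def count_by_rule(alerts):
    """Count alerts per rule ID."""
    return Counter(alert["rule"]["id"] for alert in alerts)


def high_severity(alerts, min_level=10):
    """Return alerts whose rule level is at or above `min_level`."""
    return [a for a in alerts if a["rule"]["level"] >= min_level]


# Hypothetical sample alerts using rule IDs and levels from the table
sample_alerts = [
    {"rule": {"id": "5715", "level": 3}},
    {"rule": {"id": "5715", "level": 3}},
    {"rule": {"id": "5712", "level": 10}},
    {"rule": {"id": "5403", "level": 14}},
]

for rule_id, count in count_by_rule(sample_alerts).items():
    print(rule_id, RULE_DESCRIPTIONS.get(rule_id, "unknown"), count)
```

With the threshold of 10 used here, only the brute force (5712) and privilege escalation (5403) rules from the table qualify as high severity.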
The following categorical fields can be used in scenario YAML files:
- rule.groups: Event type classification (authentication_success, failed_login, brute_force, sudo, etc.)
- agent.id: Target host identifier (001-010)
- agent.name: Human-readable host name (web-server-01, db-server-01, etc.)
- decoder.name: Log source (sshd, pam)

Edit generate_attack_data.py to customize: