The inject_wazuh_data.py tool solves a critical problem for SONAR testing and development: a fresh Wazuh installation does not have enough historical data for the MVAD model, which requires 200+ time points.
Instead of waiting 24+ hours for real alerts to accumulate, inject synthetic data and start testing immediately.
All commands assume the project root is /home/alab/soar/; each step changes into the directory it needs:
# Navigate to test data directory
cd /home/alab/soar/sonar/test_data
# 1. Preview what would be injected (safe, no changes)
poetry run python inject_wazuh_data.py --hours 24 --dry-run
# 2. Inject 24 hours of normal baseline
poetry run python inject_wazuh_data.py --hours 24
# 3. Train SONAR (from project root)
cd /home/alab/soar
poetry run sonar train --scenario sonar/scenarios/brute_force_detection.yaml
# 4. Inject attack data
cd sonar/test_data
poetry run python inject_wazuh_data.py --hours 48 --source generated_scenarios/attack_scenarios.json
# 5. Run detection (from project root)
cd /home/alab/soar
poetry run sonar detect --scenario sonar/scenarios/brute_force_detection.yaml
Problem: Just deployed Wazuh, need to test SONAR but have < 1 hour of alerts
Solution:
# Inject 14 days of baseline for comprehensive training
poetry run python inject_wazuh_data.py --days 14 --source generated_scenarios/normal_training.json
Problem: Have production Wazuh data, want to replicate in test environment
Solution:
# Step 1: Export from production Wazuh using Dev Tools or curl
curl -k -u admin:admin "https://prod-wazuh:9200/wazuh-alerts-*/_search" \
-H "Content-Type: application/json" \
-d '{"query": {"range": {"@timestamp": {"gte": "now-7d"}}}, "size": 10000}' \
> exported_alerts.json
# Step 2: Inject to test environment (OpenSearch format auto-detected)
poetry run python inject_wazuh_data.py --days 7 --source exported_alerts.json
# Step 3: Use for training (also supports OpenSearch format)
cd /home/alab/soar/sonar
poetry run sonar train --debug # If using local file in debug mode
Note: The inject tool automatically detects OpenSearch API response format {hits: {hits: [{_source: {...}}]}} and extracts alert documents.
Path context: When running from sonar/test_data/, paths like generated_scenarios/attack_scenarios.json are relative to that directory.
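The format auto-detection described in the note above can be illustrated with a short sketch. This is not the tool's actual code: the function name `extract_alerts` is hypothetical, and the sketch assumes that a file which is not an OpenSearch response is a plain JSON array of alert objects.

```python
import json
from typing import Any


def extract_alerts(payload: Any) -> list:
    """Return alert documents from either supported layout (hypothetical helper)."""
    # OpenSearch _search response: {"hits": {"hits": [{"_source": {...}}, ...]}}
    if isinstance(payload, dict) and isinstance(payload.get("hits"), dict):
        hits = payload["hits"].get("hits", [])
        return [hit["_source"] for hit in hits if "_source" in hit]
    # Assumed plain layout: a JSON array of alert objects
    if isinstance(payload, list):
        return [doc for doc in payload if isinstance(doc, dict)]
    raise ValueError("Unrecognized alert file format")


def load_alerts(path: str) -> list:
    """Read a JSON file and extract its alerts."""
    with open(path, encoding="utf-8") as fh:
        return extract_alerts(json.load(fh))
```

For example, `load_alerts("exported_alerts.json")` would return only the `_source` documents from the curl export shown earlier.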
Problem: Need to demonstrate anomaly detection with known attack patterns
Solution:
# First: Train on normal baseline
poetry run python inject_wazuh_data.py --days 14 --source generated_scenarios/normal_training.json
# Train model...
# Second: Inject attacks
poetry run python inject_wazuh_data.py --hours 48 --source generated_scenarios/attack_scenarios.json
# Run detection...
Problem: Need to validate full production pipeline without real infrastructure under attack
Solution:
# Inject various scenarios in sequence
poetry run python inject_wazuh_data.py --hours 24 --source synthetic_alerts/normal_baseline.json
# Test detection...
poetry run python inject_wazuh_data.py --hours 24 --source synthetic_alerts/with_anomalies.json
# Validate anomaly detection...
Problem: Need specific historical data for backfilling or testing time-based queries
Solution:
# Inject data for specific date range
poetry run python inject_wazuh_data.py \
--range "2025-12-01 00:00:00" "2025-12-15 23:59:59" \
--source generated_scenarios/normal_training.json
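To make the `--range` argument format concrete, here is a minimal sketch of how two such strings could be parsed and validated. The helper name `parse_range` and the validation rule (end must be after start) are assumptions for illustration, not a description of how the tool handles its arguments.

```python
from datetime import datetime

# Timestamp layout used by the --range examples in this document
RANGE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_range(start: str, end: str):
    """Parse START and END strings into datetimes (hypothetical helper)."""
    start_dt = datetime.strptime(start, RANGE_FORMAT)
    end_dt = datetime.strptime(end, RANGE_FORMAT)
    if end_dt <= start_dt:
        raise ValueError("END must be later than START")
    return start_dt, end_dt


start_dt, end_dt = parse_range("2025-12-01 00:00:00", "2025-12-15 23:59:59")
print(end_dt - start_dt)  # length of the target window
```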
| Option | Description | Example |
|---|---|---|
| `--config` | Path to config YAML | --config ../default_config.yaml |
| `--source` | JSON file with alerts | --source generated_scenarios/attack_scenarios.json |
| `--dry-run` | Preview without indexing | --dry-run |
| `--verbose` | Detailed logging | --verbose |
| Option | Description | Example |
|---|---|---|
| `--hours N` | Last N hours | --hours 24 |
| `--days N` | Last N days | --days 14 |
| `--range START END` | Custom range | --range "2025-12-29 00:00:00" "2025-12-30 23:59:59" |
| Option | Description | Default |
|---|---|---|
| `--host` | Wazuh Indexer host | From config or localhost |
| `--port` | Wazuh Indexer port | From config or 9200 |
| `--username` | Auth username | From config or admin |
| `--password` | Auth password | From config or admin |
| `--no-verify-ssl` | Disable SSL verification | SSL enabled by default |
| `--bulk-size` | Docs per bulk request | 1000 |
All paths relative to sonar/test_data/:
| File | Alerts | Time span | Use case |
|---|---|---|---|
| `generated_scenarios/normal_training.json` | ~12,000 | 14 days | Training baseline with realistic patterns |
| `generated_scenarios/attack_scenarios.json` | ~2,000 | 2 days | Detection testing with known attacks |
| `synthetic_alerts/normal_baseline.json` | ~12,000 | 30 days | Legacy simple training data |
| `synthetic_alerts/with_anomalies.json` | ~6,000 | 30 days | Legacy detection test data |
wazuh-alerts-4.x-2025.12.30)
# Check if Wazuh Indexer is running
curl -k -u admin:admin https://localhost:9200
# Verify config has correct settings
grep -A10 "wazuh:" /home/alab/soar/sonar/default_config.yaml
# Disable SSL verification (dev/test environments only)
poetry run python inject_wazuh_data.py --hours 24 --no-verify-ssl
# Check if alerts were indexed
curl -k -u admin:admin "https://localhost:9200/wazuh-alerts-*/_count"
# Verify time range matches scenario requirements
# For 200+ time points with 5-minute buckets:
poetry run python inject_wazuh_data.py --days 1 # 288 time points
# Override index pattern in config
# Edit default_config.yaml:
wazuh:
  base_url: "https://localhost:9200"
  alerts_index_pattern: "wazuh-alerts-*" # Must match your Wazuh version
poetry run python inject_wazuh_data.py --hours 24 --dry-run
Preview shows:
For SONAR scenarios with 5-minute buckets:
# Good: 24 hours = 288 time points
poetry run python inject_wazuh_data.py --hours 24
# Better: 7 days = 2016 time points
poetry run python inject_wazuh_data.py --days 7
# Best: 14 days = 4032 time points
poetry run python inject_wazuh_data.py --days 14
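The time-point figures quoted above follow directly from dividing the injected span by the bucket width. The sketch below reproduces that arithmetic; the function name `time_points` is illustrative only, and the 200-point minimum is the MVAD requirement stated in this document.

```python
from datetime import timedelta

MIN_POINTS = 200  # MVAD minimum stated in this document


def time_points(span: timedelta, bucket: timedelta = timedelta(minutes=5)) -> int:
    """Number of whole buckets that fit into the injected span."""
    return span // bucket  # timedelta // timedelta yields an int


for label, span in [
    ("24 hours", timedelta(hours=24)),
    ("7 days", timedelta(days=7)),
    ("14 days", timedelta(days=14)),
]:
    n = time_points(span)
    status = "enough" if n >= MIN_POINTS else "too few"
    print(f"{label}: {n} time points ({status})")
```

With 5-minute buckets, the shortest span that reaches 200 points is 200 × 5 minutes, or 16 hours 40 minutes, which is why 24 hours is the smallest round figure recommended here.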
# Phase 1: Train on normal baseline (inject from sonar/test_data/, train from sonar/)
poetry run python inject_wazuh_data.py --days 14 --source generated_scenarios/normal_training.json
poetry run sonar train --scenario scenarios/brute_force_detection.yaml
# Phase 2: Inject attack data (inject from sonar/test_data/, detect from sonar/)
poetry run python inject_wazuh_data.py --hours 48 --source generated_scenarios/attack_scenarios.json
poetry run sonar detect --scenario scenarios/brute_force_detection.yaml
# Delete old test indices (CAUTION: Deletes data!)
curl -k -u admin:admin -X DELETE "https://localhost:9200/wazuh-alerts-4.x-2025.12.*"
# Or use specific date
curl -k -u admin:admin -X DELETE "https://localhost:9200/wazuh-alerts-4.x-2025.12.30"
# Small Wazuh instances (< 4GB RAM)
poetry run python inject_wazuh_data.py --hours 24 --bulk-size 500
# Large instances (8GB+ RAM)
poetry run python inject_wazuh_data.py --hours 24 --bulk-size 5000
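To show what `--bulk-size` controls, the following sketch splits a list of documents into batches, each of which would become one bulk request. It is an illustration under assumptions: the helper name `batches` is hypothetical and the tool's real batching code may differ.

```python
from typing import Iterator, List, Sequence


def batches(docs: Sequence[dict], bulk_size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most bulk_size documents."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    for start in range(0, len(docs), bulk_size):
        yield list(docs[start:start + bulk_size])


# 2,500 placeholder documents split with the default size of 1000
docs = [{"n": i} for i in range(2500)]
print([len(batch) for batch in batches(docs)])
```

A smaller batch size means more requests with less memory pressure per request on the indexer, which is the trade-off behind the two recommendations above.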
# 1. Generate fresh attack scenarios
cd /home/alab/soar/sonar/test_data
poetry run python generate_attack_data.py
# 2. Preview injection (dry run)
poetry run python inject_wazuh_data.py --days 14 \
--source generated_scenarios/normal_training.json \
--dry-run
# 3. Inject training data
poetry run python inject_wazuh_data.py --days 14 \
--source generated_scenarios/normal_training.json
# 4. Wait for indexing to complete
sleep 10
# 5. Verify data was indexed
curl -k -u admin:admin "https://localhost:9200/wazuh-alerts-*/_count"
# 6. Train SONAR (production mode)
cd /home/alab/soar/sonar
poetry run sonar train --scenario scenarios/brute_force_detection.yaml
# 7. Inject attack data
cd test_data
poetry run python inject_wazuh_data.py --hours 48 \
--source generated_scenarios/attack_scenarios.json
# 8. Run detection
cd /home/alab/soar/sonar
poetry run sonar detect --scenario scenarios/brute_force_detection.yaml
# 9. Check for anomalies
curl -k -u admin:admin "https://localhost:9200/wazuh-anomalies-mvad/_search?size=10&pretty"
The tool maintains the relative temporal distribution of alerts:
orig_duration = max_ts - min_ts
target_duration = end - start
relative_pos = (alert_ts - orig_start) / orig_duration
new_ts = target_start + (relative_pos * target_duration)
This preserves:
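The remapping formula above can be written as a small runnable sketch. The variable names follow the formula; the function name `remap_timestamps` is hypothetical, and the handling of the edge case where all alerts share one timestamp (everything is placed at the start of the target window) is an assumption, not documented tool behavior.

```python
from datetime import datetime, timedelta
from typing import List


def remap_timestamps(
    alert_times: List[datetime], target_start: datetime, target_end: datetime
) -> List[datetime]:
    """Linearly rescale alert timestamps into the target window."""
    if not alert_times:
        return []
    orig_start = min(alert_times)
    orig_duration = max(alert_times) - orig_start
    target_duration = target_end - target_start
    if orig_duration == timedelta(0):
        # Assumption: with no spread to preserve, place everything at the start
        return [target_start for _ in alert_times]
    remapped = []
    for alert_ts in alert_times:
        relative_pos = (alert_ts - orig_start) / orig_duration  # float in [0, 1]
        remapped.append(target_start + relative_pos * target_duration)
    return remapped


# Alerts spread over 12 hours, stretched onto a 24-hour target window
original = [datetime(2025, 1, 1, h) for h in (0, 6, 12)]
for ts in remap_timestamps(original, datetime(2025, 12, 30), datetime(2025, 12, 31)):
    print(ts.isoformat())
```

Because the mapping is linear, an alert that sat halfway through the original data lands halfway through the target window, so the ordering and relative spacing of alerts are unchanged.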
Wazuh uses daily indices: wazuh-alerts-4.x-YYYY.MM.DD
The tool automatically:
{"index": {"_index": "wazuh-alerts-4.x-2025.12.30"}}
{"timestamp": "2025-12-30T10:30:00.000Z", "rule": {...}, ...}
{"index": {"_index": "wazuh-alerts-4.x-2025.12.30"}}
{"timestamp": "2025-12-30T10:31:00.000Z", "rule": {...}, ...}
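The bulk payload shown above pairs an action line with each alert, and the index name is derived from the alert's own date. A minimal sketch of building such a payload follows. The helper names are hypothetical, and the sketch assumes every alert carries a `timestamp` field in exactly the format shown (`YYYY-MM-DDTHH:MM:SS.mmmZ`).

```python
import json
from datetime import datetime
from typing import List

INDEX_PREFIX = "wazuh-alerts-4.x-"  # daily index naming described in this document
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed timestamp layout


def index_for(timestamp: str) -> str:
    """Derive the daily index name from an alert timestamp."""
    ts = datetime.strptime(timestamp, TS_FORMAT)
    return f"{INDEX_PREFIX}{ts:%Y.%m.%d}"


def build_bulk_body(alerts: List[dict]) -> str:
    """Build newline-delimited JSON: one action line, then one document line."""
    lines = []
    for alert in alerts:
        lines.append(json.dumps({"index": {"_index": index_for(alert["timestamp"])}}))
        lines.append(json.dumps(alert))
    # The bulk API expects the payload to end with a newline
    return "\n".join(lines) + "\n" if lines else ""


print(build_bulk_body([{"timestamp": "2025-12-30T10:30:00.000Z", "rule": {"level": 5}}]))
```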
| Aspect | Debug mode (`--debug`) | Production + injection |
|---|---|---|
| Wazuh required | No | Yes |
| OpenSearch API | Not tested | Fully tested |
| Indexing pipeline | Skipped | Validated |
| Realistic testing | Partial | Complete |
| Setup time | Instant | < 1 minute |
| Best for | Quick dev/test | Integration testing |