Model Naming and Identification in SONAR

Overview

SONAR now supports unique model naming and identification for trained models. This feature helps you:

  1. Track model versions as they improve over time
  2. Keep experiment runs side by side instead of overwriting them
  3. Deploy production models under stable, predictable names
  4. Compare alternative models (A/B testing)

Model Naming

Three Ways to Name Models

1. Via CLI --model-name (Highest Priority)

poetry run sonar train --scenario scenarios/my_scenario.yaml --model-name custom_model_v1
# Saves to: ./sonar/models/custom_model_v1.pkl

2. Via Scenario YAML model_name Field

# scenarios/my_scenario.yaml
name: "My Scenario"
model_name: "my_scenario_v1"  # Model will be saved with this name

training:
  lookback_hours: 24
  # ...

poetry run sonar train --scenario scenarios/my_scenario.yaml
# Saves to: ./sonar/models/my_scenario_v1.pkl

3. Auto-Generated (Default)

If no model name is provided, SONAR generates a unique name:

poetry run sonar train --scenario scenarios/brute_force_detection.yaml
# Saves to: ./sonar/models/brute_force_detection_20260112_143022.pkl
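
The generated name combines a slug of the scenario name with a run timestamp, so repeated runs never collide. A minimal Python sketch of that scheme (the slug rules here are an assumption, not SONAR's exact code):

import re
from datetime import datetime

def auto_model_name(scenario_name: str) -> str:
    # Lowercase and collapse non-alphanumeric runs to underscores,
    # e.g. "Quick Test Scenario" -> "quick_test_scenario".
    slug = re.sub(r"[^a-z0-9]+", "_", scenario_name.lower()).strip("_")
    # Append a timestamp so every training run gets a unique name.
    return f"{slug}_{datetime.now():%Y%m%d_%H%M%S}"

print(auto_model_name("Brute Force Detection"))
# e.g. brute_force_detection_20260112_143022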

Model Name Priority

When multiple sources provide a model name:

  1. CLI --model-name (highest priority)
  2. Scenario YAML model_name
  3. Auto-generated from scenario name + timestamp (lowest priority)

Example with multiple sources:

# Scenario YAML has: model_name: "yaml_name"
# CLI provides: --model-name cli_name
poetry run sonar train --scenario my_scenario.yaml --model-name cli_name
# Result: ./sonar/models/cli_name.pkl (CLI wins)
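
The precedence boils down to a three-step fallback. A small Python sketch of the resolution order (function and argument names are illustrative, not SONAR's internals):

from datetime import datetime

def resolve_model_name(cli_name: str | None,
                       yaml_name: str | None,
                       scenario_name: str) -> str:
    # 1. --model-name on the CLI wins outright.
    if cli_name:
        return cli_name
    # 2. Otherwise use the scenario YAML's model_name field.
    if yaml_name:
        return yaml_name
    # 3. Last resort: derive a unique name from scenario name + timestamp.
    slug = scenario_name.lower().replace(" ", "_")
    return f"{slug}_{datetime.now():%Y%m%d_%H%M%S}"

assert resolve_model_name("cli_name", "yaml_name", "My Scenario") == "cli_name"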

Commands Supporting Model Naming

sonar train Command

Train with explicit model name:

# Name via CLI
poetry run sonar train --scenario scenarios/linux_resource_monitoring.yaml --model-name linux_monitor_v1

# Name from scenario YAML
poetry run sonar train --scenario scenarios/linux_resource_monitoring.yaml

# Standard mode (without scenario)
poetry run sonar train --config my_config.yaml --model-name my_model

sonar scenario Command

The scenario command automatically uses model_name from the YAML file:

poetry run sonar scenario --use-case scenarios/brute_force_detection.yaml
# Uses model_name from YAML, or auto-generates if not specified

sonar detect Command

Detection automatically loads the model specified during training:

# If you trained with a specific model name, detection uses the same path
poetry run sonar detect --scenario scenarios/brute_force_detection.yaml
# Loads model from path determined by scenario's model_name
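
This round trip only works because training and detection derive the same file path from the same model name. A sketch of that mapping (illustrative, not SONAR's actual loader):

from pathlib import Path

MODELS_DIR = Path("./sonar/models")  # documented storage location

def model_path(model_name: str) -> Path:
    # Train and detect must agree on this name -> path mapping,
    # which is why detection should use the same scenario file.
    path = MODELS_DIR / f"{model_name}.pkl"
    if not path.exists():
        raise FileNotFoundError(f"{path} not found -- train it first")
    return path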

Scenario YAML Structure

Complete Example with Model Naming

name: "Brute Force Detection"
description: "Detect SSH/RDP brute force attempts"

# Optional: Specify model name (recommended for production)
model_name: "brute_force_v2"

wazuh:
  base_url: "https://localhost:9200"
  username: "admin"
  password: "admin"

training:
  lookback_hours: 168  # 7 days
  numeric_fields:
    - "rule.level"
    - "data.srcip"
  categorical_fields:
    - "agent.name"
  bucket_minutes: 5
  sliding_window: 200

detection:
  mode: "historical"
  lookback_minutes: 60
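
If you script against scenario files, model_name is ordinary YAML and easy to read back. A quick sketch using PyYAML (the file path is just an example):

import yaml  # PyYAML

with open("scenarios/brute_force_detection.yaml") as f:
    scenario = yaml.safe_load(f)

# model_name is optional; when absent, SONAR auto-generates a name.
print(scenario.get("model_name") or "no model_name -> auto-generated")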

Without Model Name (Auto-Generated)

name: "Quick Test Scenario"
# No model_name specified

training:
  lookback_hours: 24
  numeric_fields: ["rule.level"]
  
detection:
  lookback_minutes: 10

# Model will be: ./sonar/models/quick_test_scenario_20260112_150322.pkl

Model Storage

All models are stored in: ./sonar/models/

Directory structure:

sonar/
├── models/
│   ├── brute_force_v1.pkl
│   ├── brute_force_v2.pkl
│   ├── linux_monitor_20260112_143022.pkl
│   ├── insider_threat_v1.pkl
│   └── .gitkeep
├── scenarios/
│   ├── brute_force_detection.yaml
│   └── linux_resource_monitoring.yaml
└── ...
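
Because every model is a flat .pkl file in a single directory, taking inventory is trivial. A short Python sketch that lists models newest-first:

from pathlib import Path

models = sorted(Path("./sonar/models").glob("*.pkl"),
                key=lambda p: p.stat().st_mtime, reverse=True)
for model in models:
    print(f"{model.name:50} {model.stat().st_size:>10} bytes")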

Use Cases

Version Control of Models

Track model improvements over time:

# scenarios/brute_force_v1.yaml
name: "Brute Force Detection"
model_name: "brute_force_v1"
training:
  lookback_hours: 24
  numeric_fields: ["rule.level"]
# scenarios/brute_force_v2.yaml
name: "Brute Force Detection"
model_name: "brute_force_v2"  # New version
training:
  lookback_hours: 168  # Improved: 7 days instead of 24h
  numeric_fields: ["rule.level", "data.srcip"]  # More features

Experiment Tracking

Use timestamps for experiments:

# Let SONAR auto-generate with timestamp
poetry run sonar train --scenario scenarios/experiment.yaml
# Creates: experiment_20260112_100000.pkl

# Run again with different parameters
poetry run sonar train --scenario scenarios/experiment.yaml
# Creates: experiment_20260112_110000.pkl

# Both models preserved for comparison
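
Because each run embeds its timestamp in the filename, you can recover the run order without relying on filesystem mtimes. A Python sketch that parses the suffix (assumes the _YYYYMMDD_HHMMSS pattern shown above):

from datetime import datetime
from pathlib import Path

def run_timestamp(path: Path) -> datetime:
    # Auto-generated names end in _YYYYMMDD_HHMMSS; parse the last two
    # underscore-separated fields back into a datetime.
    date_part, time_part = path.stem.split("_")[-2:]
    return datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")

runs = sorted(Path("./sonar/models").glob("experiment_*.pkl"), key=run_timestamp)
print("latest run:", runs[-1].name if runs else "none yet")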

Production Deployment

Use stable names in production:

# scenarios/production_monitor.yaml
name: "Production Resource Monitor"
model_name: "prod_resource_monitor"  # Stable name

training:
  lookback_hours: 168
  # ... production config

Update model by retraining:

# Overwrites existing model with same name
poetry run sonar train --scenario scenarios/production_monitor.yaml

A/B Testing

Compare different models:

# Model A: Basic features
poetry run sonar train --scenario scenarios/basic.yaml --model-name comparison_basic

# Model B: Advanced features
poetry run sonar train --scenario scenarios/advanced.yaml --model-name comparison_advanced

# Test both
poetry run sonar detect --scenario scenarios/basic.yaml     # Uses comparison_basic
poetry run sonar detect --scenario scenarios/advanced.yaml  # Uses comparison_advanced
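
The two runs are easy to automate. A minimal Python sketch that drives the same documented CLI commands (scenario paths and model names as above):

import subprocess

for scenario, name in [("scenarios/basic.yaml", "comparison_basic"),
                       ("scenarios/advanced.yaml", "comparison_advanced")]:
    # check=True aborts the script if a training run fails.
    subprocess.run(["poetry", "run", "sonar", "train",
                    "--scenario", scenario, "--model-name", name],
                   check=True)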

Best Practices

1. Use Semantic Names in Production

# Good: Clear, versioned
model_name: "brute_force_v2"

# Avoid: Generic or unclear
model_name: "model1"

2. Include Version Numbers

# Good: Enables tracking improvements
model_name: "linux_monitor_v1"
model_name: "linux_monitor_v2"

# Avoid: No versioning
model_name: "linux_monitor"  # Hard to track changes

3. Use Descriptive Names

# Good: Describes purpose
model_name: "ssh_brute_force_detector"

# Avoid: Vague
model_name: "detector"

4. Let Auto-Generation Work for Experiments

# For experiments, auto-generation is fine (includes timestamp)
poetry run sonar train --scenario scenarios/test_experiment.yaml
# Creates: test_experiment_20260112_143022.pkl

5. Document Model Configurations

Keep a log of what each model version includes:

brute_force_v1: 24h lookback, rule.level only
brute_force_v2: 7d lookback, rule.level + srcip
brute_force_v3: 7d lookback, rule.level + srcip + categorical agent.name
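
A plain-text log works, but a machine-readable registry costs nothing extra. A Python sketch that appends one JSON line per training run (the registry file and its schema are a suggested convention, not something SONAR maintains):

import json
from datetime import datetime, timezone

def log_model(name: str, **config) -> None:
    # One JSON object per line keeps the registry append-only and grep-able.
    entry = {"model": name,
             "trained_at": datetime.now(timezone.utc).isoformat(),
             **config}
    with open("sonar/models/registry.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")

log_model("brute_force_v2", lookback_hours=168,
          numeric_fields=["rule.level", "data.srcip"])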

Migration Guide

Existing Workflows

Old approach (single default model):

poetry run sonar train --config my_config.yaml
# Always overwrites: ./mvad_model.pkl

New approach (unique identified models):

# With explicit name
poetry run sonar train --config my_config.yaml --model-name my_model_v1
# Saves to: ./sonar/models/my_model_v1.pkl

# With scenario
poetry run sonar train --scenario scenarios/my_scenario.yaml
# Saves to: ./sonar/models/my_scenario_20260112_143022.pkl
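
If a model still sits at the old default path, you can move it into the new layout by hand. A sketch (the target name my_model_v1 is just an example):

import shutil
from pathlib import Path

old = Path("./mvad_model.pkl")                 # legacy default location
new = Path("./sonar/models/my_model_v1.pkl")   # pick a meaningful name

if old.exists():
    new.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(old, new)
    print(f"moved {old} -> {new}")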

Updating Scenarios

Add model_name field to existing scenarios:

Before:

name: "My Scenario"
training:
  lookback_hours: 24

After:

name: "My Scenario"
model_name: "my_scenario_v1"  # Add this line
training:
  lookback_hours: 24

Troubleshooting

Model Not Found

Error: FileNotFoundError: model.pkl not found

Solution: Check model path

# List available models
ls -la ./sonar/models/

# Verify scenario uses correct model_name
grep model_name scenarios/my_scenario.yaml

# Train model if missing
poetry run sonar train --scenario scenarios/my_scenario.yaml

Model Overwrite

Issue: Training overwrites existing model

Cause: Same model name used twice

Solutions:

  1. Use versioned names: model_v1, model_v2
  2. Let auto-generation create unique names (includes timestamp)
  3. Rename or back up the old model before retraining (see the sketch below)
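
For option 3, a timestamped copy before retraining keeps the old model recoverable. A minimal Python sketch (paths follow the documented ./sonar/models layout):

import shutil
from datetime import datetime
from pathlib import Path

def backup_model(name: str) -> Path:
    # Copy the current model aside before a retrain overwrites it.
    src = Path(f"./sonar/models/{name}.pkl")
    dst = src.with_name(f"{name}_backup_{datetime.now():%Y%m%d_%H%M%S}.pkl")
    shutil.copy2(src, dst)
    return dst

backup_model("prod_resource_monitor")  # then retrain safely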

Wrong Model Loaded

Issue: Detection loads wrong model

Cause: Model name mismatch between training and detection

Solution: Ensure detection uses same scenario as training

# Train
poetry run sonar train --scenario scenarios/my_scenario.yaml

# Detect with SAME scenario
poetry run sonar detect --scenario scenarios/my_scenario.yaml

CLI Reference

Train Command

sonar train [--scenario YAML] [--model-name NAME] [options]

--scenario PATH       Use scenario YAML file
--model-name NAME     Explicit model name (overrides scenario)
--config PATH         Base configuration file
--lookback-hours N    Hours of historical data

Scenario Command

sonar scenario --use-case YAML [options]

# Model name determined by:
# 1. model_name field in YAML
# 2. Auto-generated from scenario name + timestamp

Detect Command

sonar detect [--scenario YAML] [--model-name NAME] [options]

--scenario PATH       Use scenario YAML file (loads associated model)
--model-name NAME     Explicit model to load (overrides the scenario, as in Example 2)

Examples

Example 1: Production Model with Stable Name

# scenarios/prod_brute_force.yaml
name: "Production Brute Force Detection"
model_name: "prod_brute_force"  # Stable name for production

training:
  lookback_hours: 168
  numeric_fields: ["rule.level", "data.srcip"]
  
detection:
  mode: "realtime"
  lookback_minutes: 10

# Train (creates/updates ./sonar/models/prod_brute_force.pkl)
poetry run sonar train --scenario scenarios/prod_brute_force.yaml

# Deploy detection
poetry run sonar detect --scenario scenarios/prod_brute_force.yaml

Example 2: Versioned Development Models

# Development iteration 1
poetry run sonar train --scenario scenarios/dev.yaml --model-name dev_v1

# Development iteration 2
poetry run sonar train --scenario scenarios/dev.yaml --model-name dev_v2

# Compare results
poetry run sonar detect --scenario scenarios/dev.yaml --model-name dev_v1
poetry run sonar detect --scenario scenarios/dev.yaml --model-name dev_v2

Example 3: Auto-Generated Names for Experiments

# Experiment 1 (auto-generates with timestamp)
poetry run sonar train --scenario scenarios/experiment.yaml
# Creates: experiment_20260112_100000.pkl

# Experiment 2 (different timestamp)
poetry run sonar train --scenario scenarios/experiment.yaml
# Creates: experiment_20260112_113000.pkl

# Both preserved for comparison

Summary

Model naming in SONAR follows a simple precedence: --model-name on the CLI beats model_name in the scenario YAML, which beats the auto-generated scenario-name-plus-timestamp default. All models land in ./sonar/models/ as .pkl files. Use stable, versioned names (e.g. brute_force_v2) in production, let auto-generation timestamp your experiments, and keep training and detection pointed at the same scenario file so both resolve the same model.