crypto_trader/docs/architecture/ml_improvements.md
kfox 7bd6be64a4
feat: Add core trading modules for risk management, backtesting, and execution algorithms, alongside a new ML transparency widget and related frontend dependencies.
2025-12-31 21:25:06 -05:00

Machine Learning Improvements

This document describes the ML enhancements added to the intelligent autopilot system.

Overview

The ML improvements focus on making the strategy selection model more robust, interpretable, and adaptive to changing market conditions.

Components

1. Online Learning Pipeline

Location: src/autopilot/online_learning.py

Features:

  • Incremental model updates from live trading data
  • Concept drift detection using performance windows
  • Buffered training samples for efficient batch updates
  • Automatic full retraining on drift detection

Usage:

from src.autopilot.online_learning import get_online_learning_pipeline

pipeline = get_online_learning_pipeline(model)

# Add a training sample after each trade (await must run inside an async function)
await pipeline.add_training_sample(
    market_conditions=conditions,
    strategy_name="selected_strategy",
    performance=trade_return
)

# Check for drift and retrain if needed
retrain_result = await pipeline.trigger_full_retrain_if_needed()
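
The drift check based on performance windows can be sketched as follows. This is a minimal illustration, not the code in online_learning.py: the class name WindowDriftDetector is hypothetical, and the way drift_window and drift_threshold are used here (compare the mean of the newest window against the mean of the window before it) is an assumption about what those config keys mean.

```python
from collections import deque
from statistics import mean


class WindowDriftDetector:
    """Hypothetical helper: flags drift when recent performance drops below an older window."""

    def __init__(self, drift_window: int = 100, drift_threshold: float = 0.1):
        self.drift_window = drift_window
        self.drift_threshold = drift_threshold
        self.reference = deque(maxlen=drift_window)  # older outcomes
        self.recent = deque(maxlen=drift_window)     # newest outcomes

    def add(self, outcome: float) -> None:
        """Record one outcome, e.g. 1.0 if the selected strategy was profitable."""
        if len(self.recent) == self.recent.maxlen:
            # The value about to be evicted from "recent" ages into "reference".
            self.reference.append(self.recent[0])
        self.recent.append(outcome)

    def drift_detected(self) -> bool:
        """True once both windows are full and the mean fell by more than the threshold."""
        if len(self.reference) < self.drift_window:
            return False
        if len(self.recent) < self.drift_window:
            return False
        return mean(self.reference) - mean(self.recent) > self.drift_threshold


# Usage: a stable period followed by a run of losing trades.
detector = WindowDriftDetector(drift_window=5, drift_threshold=0.1)
for _ in range(10):
    detector.add(1.0)
stable = detector.drift_detected()
for _ in range(5):
    detector.add(0.0)
drifted = detector.drift_detected()
```

A one-sided comparison is used because only a drop in performance would justify a full retrain; a two-sided test is an equally reasonable choice.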

2. Confidence Calibration

Location: src/autopilot/confidence_calibration.py

Features:

  • Platt scaling (logistic regression calibration)
  • Isotonic regression calibration
  • Probability distribution calibration
  • Validation data integration

Methods:

  • Platt Scaling: Fast, parametric calibration using logistic regression
  • Isotonic Regression: Non-parametric, more flexible but requires more data

Usage:

from src.autopilot.confidence_calibration import get_confidence_calibration_manager

calibrator = get_confidence_calibration_manager()

# Fit from validation data
calibrator.fit_from_validation_data(
    predicted_probs=[...],
    true_labels=[...]
)

# Calibrate predictions
strategy, calibrated_conf, calibrated_preds = calibrator.calibrate_prediction(
    strategy_name="strategy",
    confidence=0.85,
    all_predictions={...}
)
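
To make the difference between the two methods concrete, here is a dependency-free sketch of both. It is not the code in confidence_calibration.py, and all function names are hypothetical. The Platt variant fits a plain logistic regression by gradient descent and omits the target smoothing from Platt's original formulation; the isotonic variant uses the pool-adjacent-violators algorithm.

```python
import math


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(a, b, score):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: returns (sorted scores, non-decreasing fitted values)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while neighbouring block means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


# Usage on a small, non-separable validation set (made-up numbers).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
sorted_scores, fitted = fit_isotonic(scores, labels)
```

Platt scaling learns only two parameters, so it works with little validation data. The isotonic fit is a step function with as many levels as the data supports, which is why it needs more samples to avoid overfitting.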

3. Model Explainability

Location: src/autopilot/explainability.py

Features:

  • SHAP (SHapley Additive exPlanations) value integration
  • Feature importance analysis (global and local)
  • Prediction explanations with top contributing features
  • Support for tree-based models and kernel-based (model-agnostic) explanations

Usage:

from src.autopilot.explainability import get_model_explainer

explainer = get_model_explainer(model)

# Initialize with background data
explainer.initialize_explainer(background_data_df)

# Explain a prediction
explanation = explainer.explain_prediction(features)
# Returns: feature_importance, top_positive_features, top_negative_features, etc.

# Get global feature importance
global_importance = explainer.get_global_feature_importance()
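
The quantity behind these explanations is the Shapley value of each feature. The sketch below computes it exactly by enumerating feature coalitions, which is only feasible for a handful of features; the shap library uses much faster model-specific or sampling-based estimators. The function name and the convention of replacing absent features with a single baseline row are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)
    values = [0.0] * n

    def coalition_value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(members | {i}) - coalition_value(members)
                values[i] += weight * gain
    return values


# Usage: for a linear model the attribution equals coefficient * (x - baseline).
def linear_model(x):
    return 2.0 * x[0] + 3.0 * x[1] - 1.0 * x[2]


attributions = shapley_values(linear_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is a convenient sanity check for any explainer output.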

4. Advanced Regime Detection

Location: src/autopilot/regime_detection.py

Features:

  • Hidden Markov Models (HMM) for regime detection
  • Gaussian Mixture Models (GMM) for regime detection
  • Hybrid detection combining multiple methods
  • Probabilistic regime predictions

Methods:

  • HMM: Models regime transitions as a Markov process
  • GMM: Clusters market states using Gaussian mixtures
  • Hybrid: Combines both methods for robust detection

Usage:

from src.autopilot.regime_detection import AdvancedRegimeDetector

detector = AdvancedRegimeDetector(method="hmm")
detector.fit_from_dataframe(ohlcv_df)

regime = detector.detect_regime(returns=0.01, volatility=0.02)
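
The probabilistic regime predictions of an HMM come from the forward (filtering) recursion, sketched below for Gaussian emissions over returns. This is not the code in regime_detection.py: the function names are hypothetical and the regime parameters are fixed by hand, whereas a library such as hmmlearn would estimate them from data.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def filter_regimes(returns, means, stds, transition, initial):
    """HMM forward recursion: P(regime at t | returns up to t) for every t."""
    k = len(means)
    belief = list(initial)
    history = []
    for t, r in enumerate(returns):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(k))
                      for j in range(k)]
        # Update: weight by the emission likelihood, then normalise.
        belief = [belief[j] * gaussian_pdf(r, means[j], stds[j]) for j in range(k)]
        total = sum(belief)
        belief = [b / total for b in belief]
        history.append(belief)
    return history


# Usage: regime 0 is calm (1% std), regime 1 is volatile (5% std). Values are made up.
history = filter_regimes(
    returns=[0.001, -0.002, 0.08, -0.07],
    means=[0.0, 0.0],
    stds=[0.01, 0.05],
    transition=[[0.95, 0.05], [0.05, 0.95]],
    initial=[0.5, 0.5],
)
```

Each entry of history is a probability vector over regimes. Extreme observations can underflow the likelihoods to zero, so a production version would work in log space.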

5. Enhanced Feature Engineering

Location: src/autopilot/feature_engineering.py

Enhancements:

  • Multi-timeframe feature aggregation
  • Order book feature extraction
  • Feature interactions (products, ratios)
  • Regime-specific feature engineering
  • Lag features for temporal patterns
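
Two of these enhancements, lag features and feature interactions, can be illustrated without any dependencies. The helper names, the column names, and the row-of-dicts layout below are assumptions for the sketch; feature_engineering.py may structure its data differently.

```python
def add_lag_features(rows, column, lags):
    """Add `<column>_lag<k>` to every row; None where the history is too short."""
    out = []
    for t, row in enumerate(rows):
        new_row = dict(row)
        for k in lags:
            new_row[f"{column}_lag{k}"] = rows[t - k][column] if t >= k else None
        out.append(new_row)
    return out


def add_interactions(rows, a, b, eps=1e-12):
    """Add the product and the ratio of two columns; the ratio is None near zero."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{a}_x_{b}"] = row[a] * row[b]
        new_row[f"{a}_over_{b}"] = row[a] / row[b] if abs(row[b]) > eps else None
        out.append(new_row)
    return out


# Usage with made-up per-candle returns and volatilities, oldest first.
rows = [
    {"ret": 0.01, "vol": 0.02},
    {"ret": -0.02, "vol": 0.04},
    {"ret": 0.03, "vol": 0.0},
]
lagged = add_lag_features(rows, "ret", lags=[1, 2])
interacted = add_interactions(rows, "ret", "vol")
```

Lag features only ever look backwards in time, which keeps future information out of the training rows.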

Integration

These components integrate with the existing IntelligentAutopilot and StrategySelector classes:

  1. Online Learning: Integrated via the _record_trade_for_learning method
  2. Confidence Calibration: Applied in the select_best_strategy method
  3. Explainability: Available via API endpoints for UI visualization
  4. Regime Detection: Used in MarketAnalyzer for enhanced regime classification

Configuration

Configuration options in config/config.yaml:

autopilot:
  intelligent:
    online_learning:
      drift_window: 100
      drift_threshold: 0.1
      buffer_size: 50
      update_frequency: 100
    confidence_calibration:
      method: "isotonic"  # or "platt"
    regime_detection:
      method: "hmm"  # or "gmm" or "hybrid"
      n_regimes: 4
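
One way to consume this section safely is to map it onto a validated settings object. The sketch below assumes the YAML has already been parsed into a dict; the class and function names are hypothetical, the defaults simply mirror the example values above, and the n_regimes lower bound is an assumption.

```python
from dataclasses import dataclass

CALIBRATION_METHODS = {"isotonic", "platt"}
REGIME_METHODS = {"hmm", "gmm", "hybrid"}


@dataclass(frozen=True)
class IntelligentSettings:
    drift_window: int = 100
    drift_threshold: float = 0.1
    buffer_size: int = 50
    update_frequency: int = 100
    calibration_method: str = "isotonic"
    regime_method: str = "hmm"
    n_regimes: int = 4

    def __post_init__(self):
        if self.calibration_method not in CALIBRATION_METHODS:
            raise ValueError(f"unknown calibration method: {self.calibration_method!r}")
        if self.regime_method not in REGIME_METHODS:
            raise ValueError(f"unknown regime method: {self.regime_method!r}")
        if self.n_regimes < 2:  # assumed lower bound
            raise ValueError("n_regimes must be at least 2")


def settings_from_config(config: dict) -> IntelligentSettings:
    """Read autopilot.intelligent from a parsed config, falling back to defaults."""
    section = config.get("autopilot", {}).get("intelligent", {})
    online = section.get("online_learning", {})
    calibration = section.get("confidence_calibration", {})
    regime = section.get("regime_detection", {})
    d = IntelligentSettings()
    return IntelligentSettings(
        drift_window=online.get("drift_window", d.drift_window),
        drift_threshold=online.get("drift_threshold", d.drift_threshold),
        buffer_size=online.get("buffer_size", d.buffer_size),
        update_frequency=online.get("update_frequency", d.update_frequency),
        calibration_method=calibration.get("method", d.calibration_method),
        regime_method=regime.get("method", d.regime_method),
        n_regimes=regime.get("n_regimes", d.n_regimes),
    )


# Usage: override only the regime detector.
settings = settings_from_config(
    {"autopilot": {"intelligent": {"regime_detection": {"method": "gmm", "n_regimes": 3}}}}
)
```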

Dependencies

Optional dependencies (with fallbacks):

  • hmmlearn: For HMM regime detection
  • shap: For model explainability
  • scipy: For calibration methods (isotonic regression)

Performance Considerations

  • Online Learning: Batches updates for efficiency (configurable buffer size)
  • SHAP Values: Can be slow for large models; consider caching or background computation
  • HMM/GMM: Training is fast, prediction is very fast
  • Calibration: Fitting is fast; prediction is O(1) per sample for Platt scaling and a lookup over the fitted breakpoints for isotonic regression
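
The caching suggestion for SHAP values could look like the sketch below. It wraps an arbitrary explanation function with functools.lru_cache, keyed on the rounded feature vector. The wrapper name and the rounding precision are assumptions, and the explanation is computed on the rounded features, so inputs that differ only beyond that precision share one result.

```python
from functools import lru_cache


def make_cached_explainer(explain_fn, decimals=4, maxsize=1024):
    """Wrap explain_fn so near-identical feature vectors reuse a cached result."""

    @lru_cache(maxsize=maxsize)
    def _cached(key):
        return explain_fn(list(key))

    def explain(features):
        # Lists are unhashable, so the cache key is a tuple of rounded floats.
        key = tuple(round(float(v), decimals) for v in features)
        return _cached(key)

    explain.cache_info = _cached.cache_info
    return explain


# Usage with a stand-in for an expensive explanation call.
calls = []


def slow_explain(features):
    calls.append(tuple(features))
    return sum(features)


cached_explain = make_cached_explainer(slow_explain)
first = cached_explain([1.0, 2.0])
second = cached_explain([1.0, 2.00000001])  # rounds to the same key
```

Cached results are shared between callers, so the wrapped function should return immutable values or callers should copy before modifying them.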

Testing

Recommended testing approach:

  1. Use synthetic data for the online learning pipeline
  2. Test calibration with known probability distributions
  3. Validate SHAP values against known feature importance
  4. Compare HMM/GMM regimes against rule-based classification
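
As an example of item 2, the sketch below builds synthetic data whose true probabilities are known and scores reported confidences with the expected calibration error (ECE). The function names and the choice of ECE as the metric are assumptions; the project's test suite may measure calibration differently.

```python
import random


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean |observed frequency - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(p for p, _ in bucket) / len(bucket)
        avg_freq = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / total * abs(avg_freq - avg_conf)
    return ece


def make_synthetic(n, seed=0, distort=lambda p: p):
    """Draw true probabilities, sample labels from them, and report distorted probabilities."""
    rng = random.Random(seed)
    true_probs = [rng.random() for _ in range(n)]
    labels = [1 if rng.random() < p else 0 for p in true_probs]
    reported = [distort(p) for p in true_probs]
    return reported, labels


# A perfectly calibrated source versus one that systematically under-reports.
good_probs, good_labels = make_synthetic(20000, seed=1)
bad_probs, bad_labels = make_synthetic(20000, seed=1, distort=lambda p: p ** 2)
ece_good = expected_calibration_error(good_probs, good_labels)
ece_bad = expected_calibration_error(bad_probs, bad_labels)
```

A calibration test can then assert that fitting the calibrator on the distorted probabilities brings the ECE back toward the value of the undistorted source.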