# Machine Learning Improvements

This document describes the ML enhancements added to the intelligent autopilot system.

## Overview

The ML improvements focus on making the strategy selection model more robust, interpretable, and adaptive to changing market conditions.

## Components

### 1. Online Learning Pipeline

**Location**: `src/autopilot/online_learning.py`

**Features**:
- Incremental model updates from live trading data
- Concept drift detection over rolling performance windows
- Buffered training samples for efficient batch updates
- Automatic full retraining when drift is detected

**Usage**:
```python
from src.autopilot.online_learning import get_online_learning_pipeline


async def on_trade_closed(model, conditions, trade_return):
    # `await` is only valid inside a coroutine, so the calls live in one.
    pipeline = get_online_learning_pipeline(model)

    # Add a training sample once the trade's outcome is known
    await pipeline.add_training_sample(
        market_conditions=conditions,
        strategy_name="selected_strategy",
        performance=trade_return,
    )

    # Check for drift and run a full retrain if needed
    retrain_result = await pipeline.trigger_full_retrain_if_needed()
    return retrain_result
```

### 2. Confidence Calibration

**Location**: `src/autopilot/confidence_calibration.py`

**Features**:
- Platt scaling (logistic regression calibration)
- Isotonic regression calibration
- Calibration of the full per-strategy probability distribution, not only the top confidence
- Fitting from held-out validation data

**Methods**:
- `Platt Scaling`: Fast, parametric calibration using logistic regression
- `Isotonic Regression`: Non-parametric and more flexible, but requires more data

**Usage**:
```python
from src.autopilot.confidence_calibration import get_confidence_calibration_manager

calibrator = get_confidence_calibration_manager()

# Fit from held-out validation data (placeholders: predicted probabilities and true labels)
calibrator.fit_from_validation_data(
    predicted_probs=[...],
    true_labels=[...],
)

# Calibrate a prediction together with the predictions for all strategies
strategy, calibrated_conf, calibrated_preds = calibrator.calibrate_prediction(
    strategy_name="strategy",
    confidence=0.85,
    all_predictions={...},  # placeholder
)
```

### 3. Model Explainability

**Location**: `src/autopilot/explainability.py`

**Features**:
- SHAP (SHapley Additive exPlanations) value integration
- Feature importance analysis (global and local)
- Prediction explanations with top contributing features
- Support for tree-based models (via a tree explainer) and other models (via a model-agnostic kernel explainer)

**Usage**:
```python
from src.autopilot.explainability import get_model_explainer

explainer = get_model_explainer(model)

# Initialize with background data (a representative sample of training features)
explainer.initialize_explainer(background_data_df)

# Explain a single prediction
explanation = explainer.explain_prediction(features)
# Returns: feature_importance, top_positive_features, top_negative_features, etc.

# Get global feature importance
global_importance = explainer.get_global_feature_importance()
```

### 4. Advanced Regime Detection

**Location**: `src/autopilot/regime_detection.py`

**Features**:
- Hidden Markov Models (HMM) for regime detection
- Gaussian Mixture Models (GMM) for regime detection
- Hybrid detection combining multiple methods
- Probabilistic regime predictions

**Methods**:
- `HMM`: Models regime transitions as a Markov process
- `GMM`: Clusters market states using a mixture of Gaussians
- `Hybrid`: Combines both methods for more robust detection

**Usage**:
```python
from src.autopilot.regime_detection import AdvancedRegimeDetector

detector = AdvancedRegimeDetector(method="hmm")

# Fit on historical OHLCV data
detector.fit_from_dataframe(ohlcv_df)

# Classify the current market state from its latest return and volatility
regime = detector.detect_regime(returns=0.01, volatility=0.02)
```

### 5. Enhanced Feature Engineering

**Location**: `src/autopilot/feature_engineering.py`

**Enhancements**:
- Multi-timeframe feature aggregation
- Order book feature extraction
- Feature interactions (products, ratios)
- Regime-specific feature engineering
- Lag features for temporal patterns

## Integration

These components integrate with the existing `IntelligentAutopilot` and `StrategySelector` classes:

1. **Online Learning**: Integrated via the `_record_trade_for_learning` method
2. **Confidence Calibration**: Applied in the `select_best_strategy` method
3. **Explainability**: Exposed through API endpoints for UI visualization
4. **Regime Detection**: Used in `MarketAnalyzer` for enhanced regime classification

## Configuration

Configuration options in `config/config.yaml`:

```yaml
autopilot:
  intelligent:
    online_learning:
      drift_window: 100
      drift_threshold: 0.1
      buffer_size: 50
      update_frequency: 100
    confidence_calibration:
      method: "isotonic"  # or "platt"
    regime_detection:
      method: "hmm"  # or "gmm" or "hybrid"
      n_regimes: 4
```

## Dependencies

Optional dependencies (the system falls back gracefully when they are not installed):
- `hmmlearn`: For HMM regime detection
- `shap`: For model explainability
- `scipy`: For calibration methods (isotonic regression)

## Performance Considerations

- **Online Learning**: Batches updates for efficiency (batch size set by `buffer_size`)
- **SHAP Values**: Can be slow for large models, especially with model-agnostic kernel explanations; consider caching or background computation
- **HMM/GMM**: Training (EM over historical data) is typically fast for a small number of regimes; per-observation prediction is very fast
- **Calibration**: Fitting is fast; prediction is O(1) for Platt scaling and a binary-search lookup for isotonic regression

## Testing

Recommended testing approach:

1. Use synthetic data for the online learning pipeline
2. Test calibration with known probability distributions
3. Validate SHAP values against known feature importance
4. Compare HMM/GMM regimes against rule-based classification
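
Step 2 of the testing approach can be exercised without the project's own classes. The following is a minimal, self-contained sketch under stated assumptions: synthetic scores whose true win probability is `s**2` (so raw scores are overconfident), and isotonic regression implemented directly with the pool-adjacent-violators (PAV) algorithm in plain Python. `fit_isotonic`, `calibrate`, and `true_prob` are hypothetical helpers for illustration, not the API of `src/autopilot/confidence_calibration.py`.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float],
                 labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: isotonic regression via pool-adjacent-violators.

    Returns (thresholds, values) describing a non-decreasing step function:
    a score s maps to values[i], where thresholds[i] is the largest
    threshold <= s.
    """
    blocks: List[List[float]] = []  # each block: [mean_label, weight, first_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1.0, s])
        # Pool adjacent blocks while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_hi, w_hi, _ = blocks.pop()
            mean_lo, w_lo, start = blocks[-1]
            blocks[-1] = [(mean_lo * w_lo + mean_hi * w_hi) / (w_lo + w_hi),
                          w_lo + w_hi, start]
    return [b[2] for b in blocks], [b[0] for b in blocks]


def calibrate(score: float, thresholds: List[float], values: List[float]) -> float:
    """Hypothetical helper: look up the calibrated probability (binary search)."""
    i = bisect.bisect_right(thresholds, score) - 1
    return values[max(i, 0)]


def true_prob(s: float) -> float:
    # Assumed synthetic ground truth: raw scores overstate the win probability.
    return s * s


rng = random.Random(42)
scores = [rng.random() for _ in range(20_000)]
labels = [1 if rng.random() < true_prob(s) else 0 for s in scores]
thresholds, values = fit_isotonic(scores, labels)

# Compare raw and calibrated scores against the known distribution on a grid.
grid = [i / 100 for i in range(1, 100)]
raw_error = sum(abs(s - true_prob(s)) for s in grid) / len(grid)
cal_error = sum(abs(calibrate(s, thresholds, values) - true_prob(s))
                for s in grid) / len(grid)
print(f"mean |error| raw={raw_error:.3f} calibrated={cal_error:.3f}")
```

Because the ground truth is known, a test can assert that the fitted step function is non-decreasing and that it tracks `s**2` much more closely than the raw scores do. A test of the real component would instead feed the same synthetic data to the calibration manager's `fit_from_validation_data` and check its outputs the same way.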