feat: Add core trading modules for risk management, backtesting, and execution algorithms, alongside a new ML transparency widget and related frontend dependencies.
docs/architecture/ml_improvements.md (new file, 183 lines)

# Machine Learning Improvements

This document describes the ML enhancements added to the intelligent autopilot system.

## Overview

The ML improvements focus on making the strategy selection model more robust, interpretable, and adaptive to changing market conditions.

## Components

### 1. Online Learning Pipeline

**Location**: `src/autopilot/online_learning.py`

**Features**:
- Incremental model updates from live trading data
- Concept drift detection using performance windows
- Buffered training samples for efficient batch updates
- Automatic full retraining on drift detection

**Usage**:
```python
from src.autopilot.online_learning import get_online_learning_pipeline

pipeline = get_online_learning_pipeline(model)

# Add training sample after trade
await pipeline.add_training_sample(
    market_conditions=conditions,
    strategy_name="selected_strategy",
    performance=trade_return
)

# Check for drift and retrain if needed
retrain_result = await pipeline.trigger_full_retrain_if_needed()
```
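
For intuition, drift detection over performance windows can be as simple as comparing a recent window's mean return against a longer baseline. The sketch below illustrates that idea only; it is not the pipeline's actual implementation, and the parameter names simply mirror the `drift_window` / `drift_threshold` configuration keys shown later.

```python
# Illustrative sketch of window-based drift detection (not the pipeline's actual code).
# A drop in recent mean performance versus the longer baseline beyond `drift_threshold`
# would trigger a full retrain.
from collections import deque


class DriftDetector:
    def __init__(self, drift_window: int = 100, drift_threshold: float = 0.1):
        self.window = deque(maxlen=drift_window)         # recent per-trade returns
        self.baseline = deque(maxlen=drift_window * 5)   # longer history as the baseline
        self.drift_threshold = drift_threshold

    def add_sample(self, performance: float) -> None:
        self.window.append(performance)
        self.baseline.append(performance)

    def drift_detected(self) -> bool:
        if len(self.window) < self.window.maxlen:
            return False  # not enough recent data yet
        recent = sum(self.window) / len(self.window)
        base = sum(self.baseline) / len(self.baseline)
        return (base - recent) > self.drift_threshold
```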

### 2. Confidence Calibration

**Location**: `src/autopilot/confidence_calibration.py`

**Features**:
- Platt scaling (logistic regression calibration)
- Isotonic regression calibration
- Probability distribution calibration
- Validation data integration

**Methods**:
- `Platt Scaling`: Fast, parametric calibration using logistic regression
- `Isotonic Regression`: Non-parametric and more flexible, but requires more data

**Usage**:
```python
from src.autopilot.confidence_calibration import get_confidence_calibration_manager

calibrator = get_confidence_calibration_manager()

# Fit from validation data
calibrator.fit_from_validation_data(
    predicted_probs=[...],
    true_labels=[...]
)

# Calibrate predictions
strategy, calibrated_conf, calibrated_preds = calibrator.calibrate_prediction(
    strategy_name="strategy",
    confidence=0.85,
    all_predictions={...}
)
```
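
For reference, the two calibration methods reduce to standard estimators. The sketch below expresses them directly with scikit-learn (the module itself lists `scipy` as the optional dependency for isotonic regression); the confidence and outcome arrays are made-up examples, not project data.

```python
# Minimal sketch of the two calibration methods described above, using scikit-learn
# directly. This illustrates the math only; it is not the calibration manager's API.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

raw_conf = np.array([0.55, 0.70, 0.80, 0.90, 0.95])  # model confidences (hypothetical)
outcomes = np.array([0, 1, 0, 1, 1])                  # 1 = strategy actually outperformed

# Platt scaling: fit a logistic regression on the raw confidence scores
platt = LogisticRegression()
platt.fit(raw_conf.reshape(-1, 1), outcomes)
platt_calibrated = platt.predict_proba(raw_conf.reshape(-1, 1))[:, 1]

# Isotonic regression: non-parametric monotone mapping from raw to calibrated confidence
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw_conf, outcomes)
iso_calibrated = iso.predict(raw_conf)
```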

### 3. Model Explainability

**Location**: `src/autopilot/explainability.py`

**Features**:
- SHAP (SHapley Additive exPlanations) value integration
- Feature importance analysis (global and local)
- Prediction explanations with top contributing features
- Support for tree-based and kernel-based models

**Usage**:
```python
from src.autopilot.explainability import get_model_explainer

explainer = get_model_explainer(model)

# Initialize with background data
explainer.initialize_explainer(background_data_df)

# Explain a prediction
explanation = explainer.explain_prediction(features)
# Returns: feature_importance, top_positive_features, top_negative_features, etc.

# Get global feature importance
global_importance = explainer.get_global_feature_importance()
```
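
Support for tree-based and kernel-based models usually means choosing between SHAP's fast `TreeExplainer` and the slower, model-agnostic `KernelExplainer`. The following is a hedged sketch of that fallback pattern using the `shap` package directly; `model` and `background_df` are placeholders, and this is not the explainability module's internal code.

```python
# Illustrative tree/kernel fallback using the shap library directly.
# `model` and `background_df` are placeholders for a fitted model and a feature DataFrame.
import shap


def build_explainer(model, background_df):
    """Prefer the fast TreeExplainer; fall back to the model-agnostic KernelExplainer."""
    try:
        return shap.TreeExplainer(model)
    except Exception:
        # KernelExplainer needs a predict function and a (sampled) background dataset
        return shap.KernelExplainer(model.predict, shap.sample(background_df, 100))


# usage (given a fitted model and a background DataFrame):
explainer = build_explainer(model, background_df)
shap_values = explainer.shap_values(background_df.iloc[:10])
```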

### 4. Advanced Regime Detection

**Location**: `src/autopilot/regime_detection.py`

**Features**:
- Hidden Markov Models (HMM) for regime detection
- Gaussian Mixture Models (GMM) for regime detection
- Hybrid detection combining multiple methods
- Probabilistic regime predictions

**Methods**:
- `HMM`: Models regime transitions as a Markov process
- `GMM`: Clusters market states using Gaussian mixtures
- `Hybrid`: Combines both methods for robust detection

**Usage**:
```python
from src.autopilot.regime_detection import AdvancedRegimeDetector

detector = AdvancedRegimeDetector(method="hmm")
detector.fit_from_dataframe(ohlcv_df)

regime = detector.detect_regime(returns=0.01, volatility=0.02)
```
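
For orientation, the `hmm` and `gmm` methods map naturally onto `hmmlearn` and scikit-learn. The sketch below shows the general shape with placeholder feature arrays (`returns_series`, `volatility_series`) and `n_regimes=4` as in the configuration example; it is not the detector's actual code.

```python
# Sketch of HMM vs GMM regime detection on (returns, volatility) features.
# `returns_series` and `volatility_series` are placeholder 1-D arrays.
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.mixture import GaussianMixture

X = np.column_stack([returns_series, volatility_series])  # shape (n_samples, 2)

# HMM: regimes are hidden states with Markov transition dynamics
hmm = GaussianHMM(n_components=4, covariance_type="full", n_iter=100)
hmm.fit(X)
hmm_regimes = hmm.predict(X)        # most likely state sequence
hmm_probs = hmm.predict_proba(X)    # per-sample regime probabilities

# GMM: regimes are clusters in feature space, with no transition structure
gmm = GaussianMixture(n_components=4, covariance_type="full")
gmm.fit(X)
gmm_regimes = gmm.predict(X)
gmm_probs = gmm.predict_proba(X)
```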

### 5. Enhanced Feature Engineering

**Location**: `src/autopilot/feature_engineering.py`

**Enhancements** (see the sketch after this list):
- Multi-timeframe feature aggregation
- Order book feature extraction
- Feature interactions (products, ratios)
- Regime-specific feature engineering
- Lag features for temporal patterns
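
As a concrete illustration of interactions, ratios, lags, and multi-timeframe aggregation, the snippet below builds a handful of such features from an OHLCV DataFrame. The column names (`close`, `volume`) and the DatetimeIndex are assumptions for the example; the real `feature_engineering` module covers much more.

```python
# Hypothetical sketch of interaction, ratio, lag, and multi-timeframe features.
# Assumes a DataFrame with a DatetimeIndex and "close"/"volume" columns.
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["return_1"] = out["close"].pct_change()
    out["volatility_20"] = out["return_1"].rolling(20).std()

    # Feature interactions: products and ratios
    out["ret_x_vol"] = out["return_1"] * out["volatility_20"]
    out["volume_ratio_20"] = out["volume"] / out["volume"].rolling(20).mean()

    # Lag features for temporal patterns
    for lag in (1, 2, 3):
        out[f"return_lag_{lag}"] = out["return_1"].shift(lag)

    # Multi-timeframe aggregation: compute an hourly return and align it back
    hourly_ret = out["close"].resample("1h").last().pct_change()
    out["return_1h"] = hourly_ret.reindex(out.index, method="ffill")
    return out
```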

## Integration

These components integrate with the existing `IntelligentAutopilot` and `StrategySelector` classes:

1. **Online Learning**: Integrated via the `_record_trade_for_learning` method
2. **Confidence Calibration**: Applied in the `select_best_strategy` method
3. **Explainability**: Available via API endpoints for UI visualization
4. **Regime Detection**: Used in `MarketAnalyzer` for enhanced regime classification

## Configuration

Configuration options in `config/config.yaml`:

```yaml
autopilot:
  intelligent:
    online_learning:
      drift_window: 100
      drift_threshold: 0.1
      buffer_size: 50
      update_frequency: 100
    confidence_calibration:
      method: "isotonic" # or "platt"
    regime_detection:
      method: "hmm" # or "gmm" or "hybrid"
      n_regimes: 4
```
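
These keys would typically be read at startup. A minimal sketch, assuming the project loads `config/config.yaml` with PyYAML (the loading code here is illustrative, not the project's actual config plumbing):

```python
# Hypothetical loader sketch; the project's actual config handling may differ.
import yaml

with open("config/config.yaml") as fh:
    config = yaml.safe_load(fh)

ml_cfg = config["autopilot"]["intelligent"]
drift_window = ml_cfg["online_learning"]["drift_window"]         # 100
calibration_method = ml_cfg["confidence_calibration"]["method"]  # "isotonic" or "platt"
n_regimes = ml_cfg["regime_detection"]["n_regimes"]              # 4
```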

## Dependencies

Optional dependencies (with fallbacks; see the import sketch after the list):
- `hmmlearn`: For HMM regime detection
- `shap`: For model explainability
- `scipy`: For calibration methods (isotonic regression)
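
The usual shape of "optional with fallbacks" is a guarded import plus a capability flag. The sketch below illustrates the pattern for `shap`; the flag name and fallback payload are illustrative, not the modules' actual symbols.

```python
# Illustrative guarded-import pattern for optional dependencies.
try:
    import shap
    SHAP_AVAILABLE = True
except ImportError:
    shap = None
    SHAP_AVAILABLE = False


def explain_or_skip(explainer, features):
    """Return a SHAP-based explanation if shap is installed, otherwise a neutral fallback."""
    if not SHAP_AVAILABLE:
        return {"feature_importance": {}, "note": "shap not installed"}
    return explainer.explain_prediction(features)
```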

## Performance Considerations

- **Online Learning**: Batches updates for efficiency (configurable buffer size)
- **SHAP Values**: Can be slow for large models; consider caching or background computation (see the sketch below)
- **HMM/GMM**: Training is fast and prediction is very fast
- **Calibration**: Fitting is fast and prediction is O(1)
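
One way to keep SHAP off the hot path is to memoize explanations by a hash of the feature vector and compute them on a background worker. A sketch of that idea, assuming the `explain_prediction` API shown earlier; this is not the project's implementation.

```python
# Sketch: cache SHAP explanations by feature hash and compute them off the hot path.
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)
_cache: dict[str, dict] = {}


def _key(features: dict) -> str:
    return hashlib.sha256(json.dumps(features, sort_keys=True).encode()).hexdigest()


def explanation_async(explainer, features: dict):
    """Return a cached explanation if present, else schedule it in the background."""
    key = _key(features)
    if key in _cache:
        return _cache[key]

    def compute():
        _cache[key] = explainer.explain_prediction(features)

    _executor.submit(compute)
    return None  # caller proceeds without blocking; the explanation appears later
```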

## Testing

Recommended testing approach (a calibration test sketch follows the list):
1. Use synthetic data for the online learning pipeline
2. Test calibration with known probability distributions
3. Validate SHAP values against known feature importance
4. Compare HMM/GMM regimes against rule-based classification
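
For item 2, one workable test feeds the calibrator deliberately over-confident predictions whose true hit rate is known and asserts that the calibrated confidence moves toward it. A sketch assuming the `fit_from_validation_data` / `calibrate_prediction` API shown above; the shape of `all_predictions` is a guess.

```python
# Sketch of a calibration test with a known distribution: over-confident predictions
# that are only right about half the time should be pulled down after calibration.
import random

from src.autopilot.confidence_calibration import get_confidence_calibration_manager


def test_overconfident_predictions_are_shrunk():
    random.seed(0)
    predicted = [random.uniform(0.7, 0.95) for _ in range(200)]
    labels = [1 if random.random() < 0.5 else 0 for _ in predicted]  # true hit rate ~0.5

    calibrator = get_confidence_calibration_manager()
    calibrator.fit_from_validation_data(predicted_probs=predicted, true_labels=labels)

    _, calibrated_conf, _ = calibrator.calibrate_prediction(
        strategy_name="strategy",
        confidence=0.9,
        all_predictions={"strategy": 0.9},  # assumed shape
    )
    assert calibrated_conf < 0.9  # calibrated confidence should move toward ~0.5
```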