# Machine Learning Improvements

This document describes the ML enhancements added to the intelligent autopilot system.

## Overview

The ML improvements focus on making the strategy selection model more robust, interpretable, and adaptive to changing market conditions.

## Components

### 1. Online Learning Pipeline

**Location**: `src/autopilot/online_learning.py`

**Features**:
- Incremental model updates from live trading data
- Concept drift detection using performance windows
- Buffered training samples for efficient batch updates
- Automatic full retraining on drift detection

**Usage**:
```python
from src.autopilot.online_learning import get_online_learning_pipeline

pipeline = get_online_learning_pipeline(model)

# Add training sample after trade
await pipeline.add_training_sample(
    market_conditions=conditions,
    strategy_name="selected_strategy",
    performance=trade_return
)

# Check for drift and retrain if needed
retrain_result = await pipeline.trigger_full_retrain_if_needed()
```

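To make "concept drift detection using performance windows" concrete, here is a minimal sketch of one way such a check could work. It is not taken from `online_learning.py`: the class `WindowDriftDetector` and its methods are hypothetical, and the rule (flag drift when the mean of the most recent window falls below the mean of the preceding window by more than a threshold) is an assumption. The parameter names mirror the `drift_window` and `drift_threshold` configuration keys.

```python
from collections import deque
from statistics import mean


class WindowDriftDetector:
    """Hypothetical drift check: compare a recent window to the window before it."""

    def __init__(self, drift_window: int = 100, drift_threshold: float = 0.1):
        self.drift_window = drift_window
        self.drift_threshold = drift_threshold
        # Keeps only the two most recent windows of per-trade performance.
        self._history = deque(maxlen=2 * drift_window)

    def add(self, performance: float) -> None:
        self._history.append(performance)

    def drift_detected(self) -> bool:
        # Without two full windows there is nothing to compare yet.
        if len(self._history) < 2 * self.drift_window:
            return False
        values = list(self._history)
        reference = mean(values[: self.drift_window])
        recent = mean(values[self.drift_window:])
        # Drift means recent performance dropped by more than the threshold.
        return (reference - recent) > self.drift_threshold


detector = WindowDriftDetector(drift_window=5, drift_threshold=0.1)
for value in [0.5] * 5 + [0.2] * 5:
    detector.add(value)
print(detector.drift_detected())
```

A check like this only reacts to a drop in average performance; a real pipeline might also want to look at variance or at a statistical test before triggering a full retrain.
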
### 2. Confidence Calibration

**Location**: `src/autopilot/confidence_calibration.py`

**Features**:
- Platt scaling (logistic regression calibration)
- Isotonic regression calibration
- Probability distribution calibration
- Validation data integration

**Methods**:
- `Platt Scaling`: Fast, parametric calibration using logistic regression
- `Isotonic Regression`: Non-parametric, more flexible but requires more data

**Usage**:
```python
from src.autopilot.confidence_calibration import get_confidence_calibration_manager

calibrator = get_confidence_calibration_manager()

# Fit from validation data
calibrator.fit_from_validation_data(
    predicted_probs=[...],
    true_labels=[...]
)

# Calibrate predictions
strategy, calibrated_conf, calibrated_preds = calibrator.calibrate_prediction(
    strategy_name="strategy",
    confidence=0.85,
    all_predictions={...}
)
```

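For readers unfamiliar with isotonic calibration, the sketch below shows the idea with the pool-adjacent-violators algorithm in plain Python. It is an illustration only, not the code in `confidence_calibration.py`; the function names `fit_isotonic` and `calibrate` and the toy data are made up for this example, and tied scores are not pooled as a production implementation would do.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (thresholds, values) of a step function."""
    pairs = sorted(zip(scores, labels))
    # Each block stores [sum of labels, count, largest score in the block].
    blocks = []
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    thresholds = [block[2] for block in blocks]
    values = [block[0] / block[1] for block in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Map a raw confidence to the fitted value of the block it falls in."""
    index = bisect_left(thresholds, score)
    return values[min(index, len(values) - 1)]


# Toy validation data: raw confidences and whether the prediction was correct.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
thresholds, values = fit_isotonic(raw_scores, outcomes)
print(calibrate(0.35, thresholds, values))
```

The fitted values are non-decreasing in the raw score, which is the property that makes isotonic regression more flexible than Platt scaling while needing more validation data to avoid overfitting.
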
### 3. Model Explainability

**Location**: `src/autopilot/explainability.py`

**Features**:
- SHAP (SHapley Additive exPlanations) value integration
- Feature importance analysis (global and local)
- Prediction explanations with top contributing features
- Support for tree-based and kernel-based models

**Usage**:
```python
from src.autopilot.explainability import get_model_explainer

explainer = get_model_explainer(model)

# Initialize with background data
explainer.initialize_explainer(background_data_df)

# Explain a prediction
explanation = explainer.explain_prediction(features)
# Returns: feature_importance, top_positive_features, top_negative_features, etc.

# Get global feature importance
global_importance = explainer.get_global_feature_importance()
```

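The quantity SHAP approximates is the Shapley value of each feature. As a rough illustration of what a local explanation contains, the sketch below computes exact Shapley values by brute force for a tiny model. It does not use the `shap` library or the explainer above; `shapley_values`, `toy_model`, and the feature names are hypothetical, and "missing" features are simply replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        contributions[name] = total
    return contributions


# Hypothetical toy model: a linear score over two made-up features.
def toy_model(x):
    return 2.0 * x["momentum"] - 1.0 * x["volatility"]


instance = {"momentum": 1.0, "volatility": 0.5}
baseline = {"momentum": 0.0, "volatility": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline. The brute-force loop is exponential in the number of features, which is why approximations are used in practice.
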
### 4. Advanced Regime Detection

**Location**: `src/autopilot/regime_detection.py`

**Features**:
- Hidden Markov Models (HMM) for regime detection
- Gaussian Mixture Models (GMM) for regime detection
- Hybrid detection combining multiple methods
- Probabilistic regime predictions

**Methods**:
- `HMM`: Models regime transitions as a Markov process
- `GMM`: Clusters market states using Gaussian mixtures
- `Hybrid`: Combines both methods for robust detection

**Usage**:
```python
from src.autopilot.regime_detection import AdvancedRegimeDetector

detector = AdvancedRegimeDetector(method="hmm")
detector.fit_from_dataframe(ohlcv_df)

regime = detector.detect_regime(returns=0.01, volatility=0.02)
```

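To show what "probabilistic regime predictions" from an HMM look like, here is a small sketch of the forward (filtering) recursion with Gaussian emissions, written in plain Python. It is not how `AdvancedRegimeDetector` is implemented; the function names, the two regimes, and every parameter value below are invented for illustration, and a fitted model would estimate them from data.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, std):
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2.0 * pi))


def forward_filter(observations, initial, transition, emission_params):
    """Return P(regime at t | observations up to t) for every time step."""
    n_states = len(initial)
    belief = list(initial)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict step: push the belief through the transition matrix.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update step: weight by the emission likelihood, then normalise.
        weighted = [
            belief[j] * gaussian_pdf(obs, *emission_params[j])
            for j in range(n_states)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


# Hypothetical parameters: (mean return, std of return) for each regime.
emission_params = [(0.001, 0.005), (-0.002, 0.03)]  # index 0 calm, 1 turbulent
transition = [[0.95, 0.05], [0.10, 0.90]]
initial = [0.5, 0.5]

returns = [0.002, -0.001, 0.003, -0.06, 0.05, -0.04]
posteriors = forward_filter(returns, initial, transition, emission_params)
for step, belief in enumerate(posteriors):
    print(step, [round(p, 3) for p in belief])
```

With these made-up numbers, the small early returns favour the calm regime and the large swings at the end push nearly all probability onto the turbulent one.
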
### 5. Enhanced Feature Engineering

**Location**: `src/autopilot/feature_engineering.py`

**Enhancements**:
- Multi-timeframe feature aggregation
- Order book feature extraction
- Feature interactions (products, ratios)
- Regime-specific feature engineering
- Lag features for temporal patterns

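Two of the enhancements listed above, lag features and feature interactions, can be sketched in a few lines. The helpers `add_lag_features` and `add_interactions`, the column names, and the naming scheme for the generated columns are all hypothetical and not taken from `feature_engineering.py`.

```python
def add_lag_features(rows, column, lags):
    """Add `<column>_lag<k>` to each row; None where history is too short."""
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for k in lags:
            new_row[f"{column}_lag{k}"] = rows[i - k][column] if i >= k else None
        out.append(new_row)
    return out


def add_interactions(rows, left, right):
    """Add the product and ratio of two columns; ratio is None on a zero divisor."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{left}_x_{right}"] = row[left] * row[right]
        new_row[f"{left}_over_{right}"] = (
            row[left] / row[right] if row[right] != 0 else None
        )
        out.append(new_row)
    return out


# Hypothetical bars; "ret" and "vol" are illustrative column names only.
bars = [
    {"ret": 0.01, "vol": 0.02},
    {"ret": -0.02, "vol": 0.04},
    {"ret": 0.03, "vol": 0.0},
]
features = add_interactions(add_lag_features(bars, "ret", lags=[1, 2]), "ret", "vol")
for row in features:
    print(row)
```

Leaving the first rows' lag values as `None` rather than filling them with zeros keeps missing history distinguishable from a genuine zero return.
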
## Integration

These components integrate with the existing `IntelligentAutopilot` and `StrategySelector` classes:

1. **Online Learning**: Integrated via the `_record_trade_for_learning` method
2. **Confidence Calibration**: Applied in the `select_best_strategy` method
3. **Explainability**: Available via API endpoints for UI visualization
4. **Regime Detection**: Used in `MarketAnalyzer` for enhanced regime classification

## Configuration

Configuration options in `config/config.yaml`:

```yaml
autopilot:
  intelligent:
    online_learning:
      drift_window: 100
      drift_threshold: 0.1
      buffer_size: 50
      update_frequency: 100
    confidence_calibration:
      method: "isotonic"  # or "platt"
    regime_detection:
      method: "hmm"  # or "gmm" or "hybrid"
      n_regimes: 4
```

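Once the YAML is parsed into a nested dictionary (for example with PyYAML's `yaml.safe_load`), the values can be validated before they reach the components. The sketch below is one possible approach and not the project's loader: `IntelligentConfig` and `parse_intelligent_config` are hypothetical names, and it assumes the keys nest as `autopilot` → `intelligent` → `online_learning` / `confidence_calibration` / `regime_detection`.

```python
from dataclasses import dataclass

CALIBRATION_METHODS = {"isotonic", "platt"}
REGIME_METHODS = {"hmm", "gmm", "hybrid"}


@dataclass(frozen=True)
class IntelligentConfig:
    drift_window: int
    drift_threshold: float
    buffer_size: int
    update_frequency: int
    calibration_method: str
    regime_method: str
    n_regimes: int


def parse_intelligent_config(raw: dict) -> IntelligentConfig:
    """Validate the parsed YAML dictionary and flatten it into one object."""
    section = raw["autopilot"]["intelligent"]
    online = section["online_learning"]
    calibration_method = section["confidence_calibration"]["method"]
    regime = section["regime_detection"]
    if calibration_method not in CALIBRATION_METHODS:
        raise ValueError(f"unknown calibration method: {calibration_method!r}")
    if regime["method"] not in REGIME_METHODS:
        raise ValueError(f"unknown regime method: {regime['method']!r}")
    if regime["n_regimes"] < 2:
        raise ValueError("n_regimes must be at least 2")
    return IntelligentConfig(
        drift_window=int(online["drift_window"]),
        drift_threshold=float(online["drift_threshold"]),
        buffer_size=int(online["buffer_size"]),
        update_frequency=int(online["update_frequency"]),
        calibration_method=calibration_method,
        regime_method=regime["method"],
        n_regimes=int(regime["n_regimes"]),
    )


# The dictionary a YAML parser would produce for the configuration shown above.
raw = {
    "autopilot": {
        "intelligent": {
            "online_learning": {
                "drift_window": 100,
                "drift_threshold": 0.1,
                "buffer_size": 50,
                "update_frequency": 100,
            },
            "confidence_calibration": {"method": "isotonic"},
            "regime_detection": {"method": "hmm", "n_regimes": 4},
        }
    }
}
config = parse_intelligent_config(raw)
print(config)
```

Rejecting an unknown `method` at load time gives a clearer error than a failure deep inside the calibration or regime detection code.
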
## Dependencies

Optional dependencies (with fallbacks):
- `hmmlearn`: For HMM regime detection
- `shap`: For model explainability
- `scipy`: For calibration methods (isotonic regression)

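A common way to support optional dependencies with fallbacks is to probe for the package before choosing a code path. The sketch below illustrates that pattern only; the helper names and the specific fallback rule (using `"gmm"` when `hmmlearn` is missing) are assumptions, not documented behaviour of this project.

```python
from importlib.util import find_spec


def optional_dependency_available(module_name: str) -> bool:
    """Return True if the top-level module can be imported in this environment."""
    return find_spec(module_name) is not None


def choose_regime_method(requested: str) -> str:
    # Hypothetical fallback rule: without hmmlearn, downgrade "hmm" to "gmm".
    if requested == "hmm" and not optional_dependency_available("hmmlearn"):
        return "gmm"
    return requested


print(choose_regime_method("hmm"))
```

Probing with `find_spec` avoids importing a heavy package just to learn whether it is installed.
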
## Performance Considerations

- **Online Learning**: Batches updates for efficiency (configurable buffer size)
- **SHAP Values**: Can be slow for large models; consider caching or background computation
- **HMM/GMM**: Training is fast, prediction is very fast
- **Calibration**: Fitting is fast, prediction is O(1)

## Testing

Recommended testing approach:
1. Use synthetic data for online learning pipeline
2. Test calibration with known probability distributions
3. Validate SHAP values against known feature importance
4. Compare HMM/GMM regimes against rule-based classification
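
As an illustration of testing calibration against a known probability distribution, the sketch below generates synthetic labels whose true event rate is known and measures the expected calibration error (ECE). The function name, bin count, sample size, and the choice of ECE as the metric are assumptions made for this example, not part of the project's test suite.

```python
import random


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between average confidence and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        confidence = sum(p for p, _ in bucket) / len(bucket)
        frequency = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / total * abs(confidence - frequency)
    return ece


rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
# Calibrated labels: the event occurs with exactly the stated probability.
calibrated = [1 if rng.random() < p else 0 for p in probs]
# Overconfident labels: the true event rate is p**2, lower than the stated p.
overconfident = [1 if rng.random() < p * p else 0 for p in probs]

ece_good = expected_calibration_error(probs, calibrated)
ece_bad = expected_calibration_error(probs, overconfident)
print(ece_good, ece_bad)
```

A calibrator under test should bring the ECE of the overconfident data down toward the level seen for the calibrated data.
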