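1. Getting Started with Bgolearn
Note
This guide helps you install Bgolearn and run your first optimization in a few minutes.
1.1. Installation
1.1.1. Prerequisites
Bgolearn requires Python 3.7 or later. We recommend using a virtual environment: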
# Create virtual environment
python -m venv bgolearn_env
# Activate virtual environment
# On Windows:
bgolearn_env\Scripts\activate
# On macOS/Linux:
source bgolearn_env/bin/activate
1.1.2. Installing Bgolearn
Install the main package from PyPI:
pip install Bgolearn
For multi-objective optimization, also install MultiBgolearn:
pip install MultiBgolearn
Or install both at once:
pip install Bgolearn MultiBgolearn
1.1.3. Verifying the Installation
Test your installation:
# Test single-objective Bgolearn
from Bgolearn import BGOsampling
print("Bgolearn imported successfully!")
# Test multi-objective MultiBgolearn
try:
    from MultiBgolearn import bgo
    print("MultiBgolearn imported successfully!")
except ImportError:
    print("MultiBgolearn not installed. Install with: pip install MultiBgolearn")
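Beyond checking that the import succeeds, it can help to record which versions are installed. The following is a minimal sketch using the standard library's importlib.metadata, which assumes Python 3.8 or later; the helper name installed_version is hypothetical and not part of Bgolearn.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the status of both distributions named in this guide.
for name in ("Bgolearn", "MultiBgolearn"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

Including the reported versions in a bug report makes an installation problem easier to reproduce.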
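1.2. Basic Concepts
1.2.1. What Is Bayesian Optimization?
Bayesian optimization is a technique for optimizing functions that are expensive to evaluate. It is particularly useful when:
Experiments are costly (time, money, resources)
Function evaluations are noisy
Gradients are unavailable
You want to minimize the number of experiments
1.2.2. Core Components
Surrogate model: approximates the unknown function (typically a Gaussian process or a similar model)
Acquisition function: decides where to sample next
Optimization loop: iteratively refines the surrogate model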
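To make the three components concrete, here is a self-contained sketch of a Bayesian optimization loop on a one-dimensional toy problem. It is not Bgolearn's implementation: the zero-mean Gaussian process with a fixed RBF length scale, the toy objective function, and all helper names are assumptions chosen for illustration, and only NumPy is required.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Surrogate model: posterior mean and std of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Acquisition function: expected improvement for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def objective(x):
    """Toy stand-in for an expensive experiment (hypothetical)."""
    return np.sin(6.0 * x) * x


candidates = np.linspace(0.0, 1.0, 201)
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = objective(x_obs)

# Optimization loop: refit the surrogate, pick the best candidate, measure it.
for step in range(8):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mean, std, y_obs.max())
    x_next = candidates[np.argmax(ei)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(f"Best x found: {best_x:.3f}, value: {y_obs.max():.3f}")
```

Each pass through the loop spends exactly one "experiment", which is the budget Bayesian optimization tries to keep small.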
1.3. Your First Optimization
1.3.1. Step 1: Generate Sample Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
# Generate synthetic materials data
def create_materials_dataset(n_samples=50, n_features=4, noise=0.1):
    """Create synthetic materials property data."""
    np.random.seed(42)
    # Generate base features
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                           noise=noise, random_state=42)
    # Create realistic feature names
    feature_names = ['Temperature', 'Pressure', 'Composition_A', 'Composition_B']
    # Normalize features to realistic ranges
    X[:, 0] = (X[:, 0] - X[:, 0].min()) / (X[:, 0].max() - X[:, 0].min()) * 500 + 300  # Temperature: 300-800 K
    X[:, 1] = (X[:, 1] - X[:, 1].min()) / (X[:, 1].max() - X[:, 1].min()) * 10 + 1  # Pressure: 1-11 GPa
    X[:, 2] = np.abs(X[:, 2]) / np.abs(X[:, 2]).max() * 0.8 + 0.1  # Composition: 0.1-0.9
    X[:, 3] = 1 - X[:, 2]  # Ensure compositions sum to 1
    # Create DataFrame
    df = pd.DataFrame(X, columns=feature_names)
    df['Strength'] = y  # Target property (e.g., material strength)
    return df
# Create training data
train_data = create_materials_dataset(n_samples=30)
print("Training data shape:", train_data.shape)
print("\nFirst 5 rows:")
print(train_data.head())
1.3.2. Step 2: Prepare the Optimization Data
from Bgolearn import BGOsampling
# Separate features and target
X_train = train_data.drop('Strength', axis=1)
y_train = train_data['Strength']
# Create virtual candidates for optimization
def create_candidate_materials(n_candidates=200):
    """Create candidate materials for optimization."""
    np.random.seed(123)
    # Generate candidate space
    candidates = []
    for _ in range(n_candidates):
        temp = np.random.uniform(300, 800)  # Temperature
        pressure = np.random.uniform(1, 11)  # Pressure
        comp_a = np.random.uniform(0.1, 0.9)  # Composition A
        comp_b = 1 - comp_a  # Composition B
        candidates.append([temp, pressure, comp_a, comp_b])
    return pd.DataFrame(candidates, columns=X_train.columns)
X_candidates = create_candidate_materials()
print(f"Created {len(X_candidates)} candidate materials")
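Random sampling is one way to build the candidate space; a regular grid is another. The sketch below builds a full-factorial grid over the same ranges using only the standard library. The helper names linspace and grid_candidates and the chosen grid resolution are illustrative assumptions, not part of Bgolearn.

```python
from itertools import product


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (requires num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def grid_candidates(temps, pressures, comp_a_values):
    """Full-factorial grid; Composition_B is derived so each pair sums to 1."""
    return [
        [t, p, a, 1.0 - a]
        for t, p, a in product(temps, pressures, comp_a_values)
    ]


rows = grid_candidates(
    temps=linspace(300.0, 800.0, 6),       # Temperature levels
    pressures=linspace(1.0, 11.0, 5),      # Pressure levels
    comp_a_values=linspace(0.1, 0.9, 9),   # Composition A levels
)
print(f"Grid contains {len(rows)} candidates")  # 6 * 5 * 9 = 270
print("First row:", rows[0])
```

The rows can be wrapped with pd.DataFrame(rows, columns=X_train.columns) to obtain the same layout as X_candidates. A grid gives even coverage, but its size grows multiplicatively with every added feature, so random sampling tends to scale better in higher dimensions.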
1.3.3. Step 3: Initialize and Call Bgolearn
# Initialize Bgolearn optimizer
optimizer = BGOsampling.Bgolearn()
# Fit the model
print("Fitting Bgolearn model...")
model = optimizer.fit(
    data_matrix=X_train,
    Measured_response=y_train,
    virtual_samples=X_candidates,
    Mission='Regression',
    min_search=False,  # We want to maximize strength
    CV_test=5,  # 5-fold cross-validation
    Normalize=True
)
print("Model fitted successfully!")
1.3.4. Step 4: Single-Point Optimization
# Expected Improvement
print("\n=== Expected Improvement ===")
ei_values, next_point_ei = model.EI()
print(f"Next experiment (EI): {next_point_ei}")
# Upper Confidence Bound
print("\n=== Upper Confidence Bound ===")
ucb_values, next_point_ucb = model.UCB(alpha=2.0)
print(f"Next experiment (UCB): {next_point_ucb}")
# Probability of Improvement
print("\n=== Probability of Improvement ===")
poi_values, next_point_poi = model.PoI(tao=0.01)
print(f"Next experiment (PoI): {next_point_poi}")
1.3.5. Step 5: Basic Visualization
import matplotlib.pyplot as plt
# Get EI values for visualization
ei_values, recommended_points = model.EI()
# Plot Expected Improvement values
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(ei_values)
plt.title("Expected Improvement Values")
plt.xlabel("Candidate Index")
plt.ylabel("EI Value")
plt.grid(True)
# Plot the surrogate model's predicted mean for every candidate
plt.subplot(1, 2, 2)
plt.scatter(range(len(model.virtual_samples_mean)), model.virtual_samples_mean, alpha=0.6)
plt.title("Predicted Values for All Candidates")
plt.xlabel("Candidate Index")
plt.ylabel("Predicted Value")
plt.grid(True)
plt.tight_layout()
plt.show()
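1.4. Understanding the Results
1.4.1. Acquisition Functions
Expected Improvement (EI):
Balances exploration and exploitation
Higher values indicate more promising regions
A good general-purpose choice
Upper Confidence Bound (UCB):
Controlled by the parameter alpha
Higher alpha = more exploration
Well suited to noisy functions
Probability of Improvement (PoI):
Simple and intuitive
Controlled by the parameter tao
May over-exploit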
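The three acquisition functions can be written down in a few lines each. The sketch below uses the common textbook forms for a maximization problem, given a predicted mean and standard deviation per candidate. These are assumptions for illustration: Bgolearn's internal definitions may differ in detail, and the margin argument here only plays a role similar to tao rather than being guaranteed identical to it.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """EI: expected amount by which a candidate exceeds `best`."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def upper_confidence_bound(mean, std, alpha=2.0):
    """UCB: predicted mean plus alpha standard deviations."""
    return mean + alpha * std


def probability_of_improvement(mean, std, best, margin=0.01):
    """PoI: probability that a candidate exceeds `best` by at least `margin`."""
    if std <= 0.0:
        return 1.0 if mean - best > margin else 0.0
    return normal_cdf((mean - best - margin) / std)


# Two hypothetical candidates given as (predicted mean, predicted std).
best_so_far = 1.0
candidates = {
    "A (confident, small gain)": (1.05, 0.05),
    "B (uncertain, lower mean)": (0.90, 0.40),
}
for name, (mean, std) in candidates.items():
    ei = expected_improvement(mean, std, best_so_far)
    ucb = upper_confidence_bound(mean, std)
    poi = probability_of_improvement(mean, std, best_so_far)
    print(f"{name}: EI={ei:.3f}  UCB={ucb:.3f}  PoI={poi:.3f}")
```

In this toy case EI and UCB both rank the uncertain candidate B higher, whereas PoI ranks the safe candidate A higher, which illustrates why PoI can over-exploit.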
1.4.2. Model Validation
# Check cross-validation results
print("\n=== Model Validation ===")
print("Cross-validation results saved in ./Bgolearn/ directory")
# List generated files
import os
bgo_files = [f for f in os.listdir('./Bgolearn') if f.endswith('.csv')]
print("Generated files:")
for file in bgo_files:
    print(f"  - {file}")
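The listing above raises FileNotFoundError if the output directory has not been created yet, for example when fitting was skipped. A slightly more defensive variant, sketched here with pathlib, returns an empty list instead. The helper name list_csv_outputs is hypothetical; the ./Bgolearn path is the one used above.

```python
from pathlib import Path


def list_csv_outputs(directory="./Bgolearn"):
    """Return sorted CSV file names in `directory`, or [] if it does not exist."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.csv"))


files = list_csv_outputs()
if files:
    print("Generated files:")
    for name in files:
        print(f"  - {name}")
else:
    print("No CSV output found; run the fitting step first.")
```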
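1.5. Next Steps
Now that you have completed your first optimization, you can explore the more advanced topics covered elsewhere in the Bgolearn documentation.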
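1.6. Tips for Success
1.6.1. Data Preparation
Ensure features are on similar scales (use Normalize=True)
Remove highly correlated features
Handle missing values appropriately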
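One way to spot highly correlated features is to scan the absolute Pearson correlation matrix for pairs above a threshold. The sketch below uses pandas; the helper name highly_correlated_pairs, the 0.95 threshold, and the demo data are illustrative assumptions. The demo mirrors the dataset in this guide, where Composition_B is defined as 1 - Composition_A and is therefore perfectly anti-correlated with it.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(features, threshold=0.95):
    """Return (col_a, col_b, |r|) for every feature pair with |r| >= threshold."""
    corr = features.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs


# Demo data: Composition_B is fully determined by Composition_A.
rng = np.random.default_rng(0)
comp_a = rng.uniform(0.1, 0.9, size=30)
demo = pd.DataFrame({
    "Temperature": rng.uniform(300, 800, size=30),
    "Composition_A": comp_a,
    "Composition_B": 1.0 - comp_a,
})
for col_a, col_b, r in highly_correlated_pairs(demo):
    print(f"{col_a} and {col_b}: |r| = {r:.3f}")
```

When a pair like this shows up, dropping one of the two columns removes redundant information without losing anything the model could use.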
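1.6.2. Model Selection
Start with the default Gaussian process
Use cross-validation to assess model quality
Consider the noise level in your experiments
1.6.3. Choosing an Acquisition Function
EI: a good general-purpose choice
UCB: better suited to noisy functions
Batch methods: for when you can run experiments in parallel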
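To illustrate the idea behind batch selection, the sketch below greedily picks the highest-scoring candidates while skipping any that lie too close to one already chosen, so that parallel experiments do not cluster in one spot. This is a simple heuristic for illustration only, not Bgolearn's batch method; select_batch, min_distance, and the demo values are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math


def select_batch(points, scores, batch_size, min_distance):
    """Greedy batch: best scores first, rejecting points near earlier picks."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for idx in order:
        far_enough = all(
            math.dist(points[idx], points[c]) >= min_distance for c in chosen
        )
        if far_enough:
            chosen.append(idx)
        if len(chosen) == batch_size:
            break
    return chosen


# Four 1-D candidates with hypothetical acquisition scores.
points = [(0.10,), (0.12,), (0.50,), (0.90,)]
scores = [0.9, 0.8, 0.5, 0.7]
batch = select_batch(points, scores, batch_size=2, min_distance=0.1)
print("Selected candidate indices:", batch)
```

The second-best candidate is skipped because it sits almost on top of the best one. Distances are only meaningful if the features have been scaled to comparable ranges first.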
1.6.4. Iteration Strategy
Start with a space-filling design
Refine using acquisition functions
Monitor convergence carefully
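One simple way to monitor convergence is to track the best value found so far and stop once it has barely improved over the last few iterations. The sketch below assumes a maximization problem; has_converged, patience, tolerance, and the observed values are illustrative assumptions rather than Bgolearn features.

```python
def has_converged(best_history, patience=3, tolerance=1e-3):
    """True if the running best improved by less than `tolerance`
    over the last `patience` iterations."""
    if len(best_history) <= patience:
        return False
    return best_history[-1] - best_history[-1 - patience] < tolerance


# Hypothetical measured values from successive experiments.
observed = [1.20, 1.55, 1.50, 1.90, 1.90, 1.88, 1.9004]

# Running maximum: the best value seen up to each iteration.
best_history = []
for value in observed:
    previous = best_history[-1] if best_history else value
    best_history.append(max(previous, value))

print("Best-so-far trace:", best_history)
print("Converged:", has_converged(best_history))
```

A larger patience makes the rule more conservative, which matters because an acquisition function can spend several iterations exploring before it finds the next improvement.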
1.7. Common Issues
1.7.1. Memory Issues
# For large candidate sets, use batching
if len(X_candidates) > 100000:
    print("Large candidate set detected. Consider using smaller batches.")
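Splitting a large candidate set into smaller batches can be done with a small index generator. The sketch below is one possible approach; iter_chunks and the sizes used are hypothetical, and how each batch is then scored is left to the caller.

```python
def iter_chunks(n_rows, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


# Example: 250,000 candidates processed 100,000 at a time.
for start, stop in iter_chunks(250_000, 100_000):
    print(f"Process candidates {start} to {stop - 1}")
```

With a DataFrame, each pair can be used as X_candidates.iloc[start:stop] so that only one slice of candidates is evaluated at a time.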
1.7.2. Convergence Issues
# Check for proper normalization
print("Feature ranges:")
print(X_train.describe())
# Ensure sufficient training data
if len(X_train) < 10:
    print("Warning: Very small training set. Consider collecting more data.")
1.7.3. Numerical Stability
# Check for extreme values
print("Target variable statistics:")
print(y_train.describe())
# Look for outliers
Q1 = y_train.quantile(0.25)
Q3 = y_train.quantile(0.75)
IQR = Q3 - Q1
outliers = y_train[(y_train < Q1 - 1.5*IQR) | (y_train > Q3 + 1.5*IQR)]
print(f"Potential outliers: {len(outliers)}")