1. Getting Started with Bgolearn#

Note

This guide will help you install Bgolearn and run your first optimization in minutes.

1.1. Installation#

1.1.1. Prerequisites#

Bgolearn requires Python 3.7 or later. We recommend using a virtual environment:

# Create virtual environment
python -m venv bgolearn_env

# Activate virtual environment
# On Windows:
bgolearn_env\Scripts\activate
# On macOS/Linux:
source bgolearn_env/bin/activate

1.1.2. Installing Bgolearn#

Install the main package from PyPI:

pip install Bgolearn

For multi-objective optimization, also install MultiBgolearn:

pip install MultiBgolearn

Or install both at once:

pip install Bgolearn MultiBgolearn

1.1.3. Verifying the Installation#

Test your installation:

# Test single-objective Bgolearn
from Bgolearn import BGOsampling
print("Bgolearn imported successfully!")

# Test multi-objective MultiBgolearn
try:
    from MultiBgolearn import bgo
    print("MultiBgolearn imported successfully!")
except ImportError:
    print("MultiBgolearn not installed. Install with: pip install MultiBgolearn")

1.2. Basic Concepts#

1.2.1. What is Bayesian Optimization?#

Bayesian optimization is a powerful technique for optimizing functions that are expensive to evaluate. It is particularly useful when:

  • Experiments are costly (time, money, resources)

  • Function evaluations are noisy

  • Gradients are unavailable

  • You want to minimize the number of experiments

1.2.2. Core Components#

  1. Surrogate model: approximates the unknown function (typically a Gaussian process)

  2. Acquisition function: decides where to sample next

  3. Optimization loop: iteratively refines the surrogate model (see the sketch below)
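
These three components interact in a simple loop. Below is a minimal, library-agnostic sketch of that loop, using scikit-learn's GaussianProcessRegressor as the surrogate and Expected Improvement as the acquisition function. The run_experiment function is a hypothetical stand-in for your expensive experiment; Bgolearn wraps this whole cycle for you.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayesian_optimization_loop(X_init, y_init, X_candidates, n_iters, run_experiment):
    """Minimal BO loop: surrogate -> acquisition -> evaluate -> update."""
    X, y = X_init.copy(), y_init.copy()
    for _ in range(n_iters):
        # 1. Surrogate model: fit a GP to all data observed so far
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        mu, sigma = gp.predict(X_candidates, return_std=True)

        # 2. Acquisition function: Expected Improvement (maximization form)
        best = y.max()
        z = (mu - best) / np.maximum(sigma, 1e-9)
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

        # 3. Optimization loop: evaluate the most promising candidate
        # (a fuller version would also remove x_next from the candidate pool)
        x_next = X_candidates[np.argmax(ei)]
        y_next = run_experiment(x_next)  # hypothetical expensive experiment
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    return X, y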

1.3. Your First Optimization#

1.3.1. Step 1: Generate Sample Data#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Generate synthetic materials data
def create_materials_dataset(n_samples=50, n_features=4, noise=0.1):
    """Create synthetic materials property data."""
    np.random.seed(42)

    # Generate base features
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                          noise=noise, random_state=42)

    # Create realistic feature names
    feature_names = ['Temperature', 'Pressure', 'Composition_A', 'Composition_B']

    # Normalize features to realistic ranges
    X[:, 0] = (X[:, 0] - X[:, 0].min()) / (X[:, 0].max() - X[:, 0].min()) * 500 + 300  # Temperature: 300-800K
    X[:, 1] = (X[:, 1] - X[:, 1].min()) / (X[:, 1].max() - X[:, 1].min()) * 10 + 1     # Pressure: 1-11 GPa
    X[:, 2] = np.abs(X[:, 2]) / np.abs(X[:, 2]).max() * 0.8 + 0.1  # Composition: 0.1-0.9
    X[:, 3] = 1 - X[:, 2]  # Ensure compositions sum to 1

    # Create DataFrame
    df = pd.DataFrame(X, columns=feature_names)
    df['Strength'] = y  # Target property (e.g., material strength)

    return df

# Create training data
train_data = create_materials_dataset(n_samples=30)
print("Training data shape:", train_data.shape)
print("\nFirst 5 rows:")
print(train_data.head())

1.3.2. Step 2: Prepare the Optimization Data#

from Bgolearn import BGOsampling

# Separate features and target
X_train = train_data.drop('Strength', axis=1)
y_train = train_data['Strength']

# Create virtual candidates for optimization
def create_candidate_materials(n_candidates=200):
    """Create candidate materials for optimization."""
    np.random.seed(123)

    # Generate candidate space
    candidates = []
    for _ in range(n_candidates):
        temp = np.random.uniform(300, 800)  # Temperature
        pressure = np.random.uniform(1, 11)  # Pressure
        comp_a = np.random.uniform(0.1, 0.9)  # Composition A
        comp_b = 1 - comp_a  # Composition B

        candidates.append([temp, pressure, comp_a, comp_b])

    return pd.DataFrame(candidates, columns=X_train.columns)

X_candidates = create_candidate_materials()
print(f"Created {len(X_candidates)} candidate materials")

1.3.3. Step 3: Initialize and Call Bgolearn#

# Initialize Bgolearn optimizer
optimizer = BGOsampling.Bgolearn()

# Fit the model
print("Fitting Bgolearn model...")
model = optimizer.fit(
    data_matrix=X_train,
    Measured_response=y_train,
    virtual_samples=X_candidates,
    Mission='Regression',
    min_search=False,  # We want to maximize strength
    CV_test=5,  # 5-fold cross-validation
    Normalize=True
)

print("Model fitted successfully!")

1.3.4. Step 4: Single-Point Optimization#

# Expected Improvement
print("\n=== Expected Improvement ===")
ei_values, next_point_ei = model.EI()
print(f"Next experiment (EI): {next_point_ei}")

# Upper Confidence Bound
print("\n=== Upper Confidence Bound ===")
ucb_values, next_point_ucb = model.UCB(alpha=2.0)
print(f"Next experiment (UCB): {next_point_ucb}")

# Probability of Improvement
print("\n=== Probability of Improvement ===")
poi_values, next_point_poi = model.PoI(tao=0.01)
print(f"Next experiment (PoI): {next_point_poi}")

1.3.5. Step 5: Basic Visualization#

import matplotlib.pyplot as plt

# Get EI values for visualization
ei_values, recommended_points = model.EI()

# Plot Expected Improvement values
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(ei_values)
plt.title("Expected Improvement Values")
plt.xlabel("Candidate Index")
plt.ylabel("EI Value")
plt.grid(True)

# Plot predicted vs actual (if you have true function)
plt.subplot(1, 2, 2)
plt.scatter(range(len(model.virtual_samples_mean)), model.virtual_samples_mean, alpha=0.6)
plt.title("Predicted Values for All Candidates")
plt.xlabel("Candidate Index")
plt.ylabel("Predicted Value")
plt.grid(True)

plt.tight_layout()
plt.show()

1.4. Understanding the Results#

1.4.1. Acquisition Functions#

  1. Expected Improvement (EI)

    • Balances exploration and exploitation

    • Higher values indicate more promising regions

    • A good general-purpose choice

  2. Upper Confidence Bound (UCB)

    • Controlled by the alpha parameter

    • Higher alpha = more exploration

    • Works well for noisy functions

  3. Probability of Improvement (PoI)

    • Simple and intuitive

    • Controlled by the tao parameter

    • May over-exploit (the standard formulas for all three criteria are sketched after this list)
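
As referenced above, here is a sketch of the standard textbook forms of these three criteria, written for maximization. Here mu and sigma are the surrogate's predicted mean and standard deviation for each candidate, and best is the best observed value so far; Bgolearn's internal implementations may differ in detail.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    # EI(x) = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, alpha=2.0):
    # Higher alpha weights the uncertainty term more -> more exploration
    return mu + alpha * sigma

def probability_of_improvement(mu, sigma, best, tao=0.01):
    # Probability that a candidate beats the current best by at least tao
    return norm.cdf((mu - best - tao) / np.maximum(sigma, 1e-9))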

1.4.2. Model Validation#

# Check cross-validation results
print("\n=== Model Validation ===")
print("Cross-validation results saved in ./Bgolearn/ directory")

# List generated files (guard against the directory not existing yet)
import os
if os.path.isdir('./Bgolearn'):
    bgo_files = [f for f in os.listdir('./Bgolearn') if f.endswith('.csv')]
    print("Generated files:")
    for file in bgo_files:
        print(f"  - {file}")
else:
    print("No ./Bgolearn directory found; run the fit step first.")

1.5. Next Steps#

Now that you have completed your first optimization, explore these advanced topics:

  1. Acquisition Functions - a deeper look at different acquisition strategies

  2. Batch Optimization - parallel experimental design

  3. Visualization - advanced plots and dashboards

  4. Materials Discovery - specialized workflows

1.6. Tips for Success#

1.6.1. Data Preparation#

  • Keep features on similar scales (use Normalize=True)

  • Remove highly correlated features

  • Handle missing values appropriately (a preprocessing sketch follows this list)
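
A minimal pandas sketch of the last two checks, assuming a feature DataFrame like X_train. The 0.95 correlation threshold and median imputation are illustrative choices, not Bgolearn requirements.

import numpy as np
import pandas as pd

def basic_cleanup(X: pd.DataFrame, corr_threshold=0.95) -> pd.DataFrame:
    # Impute missing values with the column median (one simple strategy)
    X = X.fillna(X.median(numeric_only=True))

    # Drop one feature from each highly correlated pair:
    # keep only the upper triangle of the |correlation| matrix,
    # then drop any column that exceeds the threshold against an earlier one
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return X.drop(columns=to_drop)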

1.6.2. Model Selection#

  • Start with the default Gaussian process

  • Use cross-validation to assess model quality

  • Account for the noise level in your experiments

1.6.3. Choosing an Acquisition Function#

  • EI: a good general-purpose choice

  • UCB: better suited to noisy functions

  • Batch methods: when you can run experiments in parallel

1.6.4. Iteration Strategy#

  • Start with a space-filling design (see the sketch after this list)

  • Refine with acquisition functions

  • Monitor convergence carefully
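
For the initial space-filling design mentioned above, here is a short sketch using scipy.stats.qmc. The bounds mirror the candidate ranges from Step 2, and the sample size of 15 is an arbitrary illustrative choice.

import pandas as pd
from scipy.stats import qmc

# Latin hypercube sample over the same design space as the candidates
sampler = qmc.LatinHypercube(d=3, seed=0)  # temperature, pressure, composition A
unit = sampler.random(n=15)
scaled = qmc.scale(unit, l_bounds=[300, 1, 0.1], u_bounds=[800, 11, 0.9])

initial_design = pd.DataFrame(scaled, columns=['Temperature', 'Pressure', 'Composition_A'])
initial_design['Composition_B'] = 1 - initial_design['Composition_A']  # compositions sum to 1
print(initial_design.head())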

1.7. Common Issues#

1.7.1. Memory Issues#

# For large candidate sets, use batching
if len(X_candidates) > 100000:
    print("Large candidate set detected. Consider using smaller batches.")

1.7.2. Convergence Issues#

# Check for proper normalization
print("Feature ranges:")
print(X_train.describe())

# Ensure sufficient training data
if len(X_train) < 10:
    print("Warning: Very small training set. Consider collecting more data.")

1.7.3. Numerical Stability#

# Check for extreme values
print("Target variable statistics:")
print(y_train.describe())

# Look for outliers
Q1 = y_train.quantile(0.25)
Q3 = y_train.quantile(0.75)
IQR = Q3 - Q1
outliers = y_train[(y_train < Q1 - 1.5*IQR) | (y_train > Q3 + 1.5*IQR)]
print(f"Potential outliers: {len(outliers)}")