Poster
Towards Imitation Learning to Branch for MIP: A Hybrid Reinforcement Learning based Sample Augmentation Approach
Changwen Zhang · wenli ouyang · Hao Yuan · Liming Gong · Yong Sun · Ziao Guo · Zhichen Dong · Junchi Yan
Halle B
Branch-and-bound (B\&B) has long been favored for tackling complex Mixed Integer Programming (MIP) problems, where the choice of branching strategy plays a pivotal role. Recently, Imitation Learning (IL)-based policies have emerged as potent alternatives to traditional rule-based approaches. However, it is nontrivial of acquiring high-quality training samples, and IL often converges to suboptimal variable choices for branching, restricting the overall performance. In response to these challenges, we propose a novel hybrid online and offline reinforcement learning (RL) approach to enhance the branching policy by cost-effective training sample augmentation. In online phase, we train an online RL agent to dynamically decide the sample generation processes, drawing from either the learning-based policy or the expert policy. The objective here is to strike an optimal balance between the exploration and exploitation of the sample generation process. In offline phase, a value function is trained to fit the cumulative reward for each decision and to filter the samples with high cumulative returns. This dual-purpose function not only reduces training complexity but also enhances the quality of the samples. To assess the efficacy of our proposed data augmentation mechanism, we conduct comprehensive evaluations across a range of MIP problems. The results consistently show that our method excels in making superior branching decisions compared to state-of-the-art learning-based models and the open-source solver SCIP. Notably, it even often outperforms the commercial solver Gurobi.