Microsoft Research Releases OptiMind: A 20B-Parameter Model That Transforms Natural Language Into Solver-Ready Optimization Models


Microsoft Research has released OptiMind, an AI-based system that converts natural language descriptions of complex decision problems into mathematical formulations that optimization solvers can execute. It targets a long-standing bottleneck in operations research, where translating business intent into mixed integer linear programs (MILPs) typically requires expert modelers and days of work.

What Is OptiMind and What Does It Output?

OptiMind-SFT is a 20B parameter mixture-of-experts model in the gpt-oss transformer family. Approximately 3.6B parameters are active per token, so inference cost is close to that of a mid-sized dense model while total capacity remains high. The context length is 128,000 tokens, allowing long specifications and multi-step reasoning traces inside a single request.

The model takes a natural language description of an optimization problem as input. The output is a mathematical formulation together with executable Python code that uses GurobiPy. The generated script defines the decision variables, constraints, and objective, calls the Gurobi solver, and prints the optimal objective value and decisions.
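
To make the output format concrete, here is a minimal sketch of the kind of solver-ready script described above, written for a toy project-selection MILP. The problem, data, and variable names are illustrative assumptions, not actual OptiMind output.

```python
# Illustrative example of a solver-ready GurobiPy script of the kind
# OptiMind is described as generating. The toy problem and all names
# are assumptions, not output from the model itself.
import gurobipy as gp
from gurobipy import GRB

values = [10, 13, 7]  # value of each candidate project
costs = [4, 6, 3]     # cost of each candidate project
budget = 9

m = gp.Model("toy_selection")

# Decision variables: x[i] = 1 if project i is selected
x = m.addVars(len(values), vtype=GRB.BINARY, name="x")

# Constraint: total cost must stay within the budget
m.addConstr(gp.quicksum(costs[i] * x[i] for i in range(len(costs))) <= budget,
            name="budget")

# Objective: maximize total value of the selected projects
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))),
               GRB.MAXIMIZE)

m.optimize()

if m.status == GRB.OPTIMAL:
    print("Optimal objective:", m.ObjVal)
    for i in range(len(values)):
        print(f"x[{i}] =", x[i].X)
```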

OptiMind acts as a formulation layer between domain experts and standard MILP solvers. It does not replace the solver; it generates the MILP that the solver then optimizes.

Architecture, Training Setup, and Dataset

The base model is openai/gpt-oss-20b, fine-tuned into microsoft/OptiMind-SFT on cleaned optimization datasets. The architecture is a mixture-of-experts transformer, with routing that activates a subset of experts per token. The model is released under the MIT license.

Training uses 8 NVIDIA B200 GPUs, and inference and evaluation use 8 NVIDIA H100 GPUs in the reference setup. Reported fine-tuning time is approximately 8 hours. For regular use, the team recommends at least 32 GB of GPU memory on hardware such as the A100, H100, or B200.

For supervised fine-tuning, the research team built cleaned versions of OR-Instruct and OptMATH-Train. For testing, they use expert-validated, re-cleaned versions of IndustryOR, Mamo Complex, and OptMATH. These benchmarks cover difficult formulation tasks on which existing models often reach only 20 to 50 percent accuracy on the original, noisy versions.

Class-Based Error Analysis and Data Cleaning

A key technical idea in OptiMind is to combine optimization expertise with LLM training. The research team classifies OR-Instruct and OptMATH problems into 53 seed classes, for example set cover, flow shop scheduling, or the traveling salesman problem (TSP).

For each class, they run the gpt-oss-20b base model on a sample of problems and select instances where the model output disagrees with the ground truth. Optimization experts inspect these items, identify frequently occurring formulation mistakes, and write brief error descriptions and preventive hints. These hints describe correct constraints, variable bounds, or modeling tricks, such as the Miller-Tucker-Zemlin (MTZ) subtour elimination constraints appropriate for the TSP.
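
For context, the MTZ trick is a standard textbook device rather than something specific to OptiMind: it adds an auxiliary ordering variable $u_i$ for each city $i \in \{2, \dots, n\}$ and links it to the binary arc variables $x_{ij}$ via

$$u_i - u_j + n \, x_{ij} \le n - 1 \quad \text{for all } i \ne j,\; i, j \in \{2, \dots, n\}, \qquad 1 \le u_i \le n - 1,$$

where $n$ is the number of cities and city 1 is the fixed starting point. Any subtour that excludes city 1 would force the $u_i$ values to increase around a cycle, which is infeasible, while a full tour remains valid.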

The research team then uses a semi-automated pipeline. They regenerate solutions with a larger model that is prompted with class-specific hints, apply majority voting across samples to improve solution quality, and remove instances that remain inconsistent. They also detect missing parameters and ambiguous statements and rewrite the problem statement when needed. The result is a cleaner training corpus that better aligns with the correct mathematical formulations.

Inference Pipeline, Prompting, and Test-Time Scaling

During inference, OptiMind behaves as a multi-stage system, not a single prompt-and-response call. The default pipeline first classifies each test problem into one of the 53 optimization classes used during error analysis. It then augments the prompt with the error summaries and hints associated with that class.
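
A minimal sketch of what such class-conditioned prompt augmentation could look like is below; the class names, hint text, and function are hypothetical illustrations of the idea, not the released pipeline.

```python
# Hypothetical sketch of class-conditioned prompt augmentation:
# classify the problem, then prepend the expert hints recorded
# for that class. Not the released OptiMind code.
CLASS_HINTS = {
    "tsp": "Use Miller-Tucker-Zemlin constraints to eliminate subtours.",
    "set_cover": "Ensure every element is covered by at least one selected set.",
}

def build_prompt(problem_text: str, predicted_class: str) -> str:
    hint = CLASS_HINTS.get(predicted_class, "")
    return (
        f"Known pitfalls for this problem class: {hint}\n\n"
        f"Problem:\n{problem_text}\n\n"
        "Write the mathematical formulation and GurobiPy code."
    )
```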

The model then generates a reasoning trace, a mathematical formulation, and GurobiPy code. When more computation is available, the system can apply self-consistency with majority voting: it generates several candidate scripts, executes them, and selects the solution that appears most frequently within a specified numerical tolerance.
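
The voting step can be pictured with a short sketch like the following, which groups executed objective values within a relative tolerance and returns the most frequent one. The helper and tolerance scheme are assumptions, not the paper's exact procedure.

```python
# Sketch of self-consistency voting over executed candidate scripts:
# cluster objective values within a numerical tolerance and return
# the most frequent one. Hypothetical helper; not the released code.
def majority_vote(objective_values: list[float], tol: float = 1e-6) -> float:
    buckets: list[tuple[float, int]] = []  # (representative value, count)
    for v in objective_values:
        for i, (rep, count) in enumerate(buckets):
            if abs(v - rep) <= tol * max(1.0, abs(rep)):
                buckets[i] = (rep, count + 1)  # same solution, within tolerance
                break
        else:
            buckets.append((v, 1))  # a new distinct solution
    return max(buckets, key=lambda b: b[1])[0]

# e.g. majority_vote([42.0, 42.0, 41.5, 42.0]) -> 42.0
```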

A multi-turn correction mode can also be enabled. The system runs the generated code, captures solver logs or execution errors, feeds this feedback back to the model, and lets the model revise the formulation and code for a few rounds. This recovers from some modeling and coding errors at the cost of higher latency.
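
A rough sketch of such a correction loop, with hypothetical placeholders for the model calls, might look like this:

```python
# Sketch of the multi-turn correction loop described above: run the
# generated script, and on failure feed the error log back to the
# model for another attempt. The two model-call functions are
# hypothetical placeholders, not the released OptiMind API.
import subprocess
import sys

def generate_script(problem: str) -> str:
    """Placeholder for the initial model call (hypothetical)."""
    raise NotImplementedError

def ask_model_to_fix(problem: str, script: str, error_log: str) -> str:
    """Placeholder for the feedback model call (hypothetical)."""
    raise NotImplementedError

def solve_with_feedback(problem_text: str, max_rounds: int = 3) -> str | None:
    script = generate_script(problem_text)
    for _ in range(max_rounds):
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=300,
        )
        if result.returncode == 0:
            return result.stdout  # solver output, including the objective
        # Feed stderr (solver logs / tracebacks) back to the model
        script = ask_model_to_fix(problem_text, script, result.stderr)
    return None
```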

Quantitative Advantage on Optimization Benchmarks

On the cleaned versions of IndustryOR, Mamo Complex, and OptMATH, the OptiMind framework significantly improves solution accuracy. The fine-tuned model improves formulation accuracy by up to 20.7 percent across multiple optimization benchmarks, with further gains when test-time scaling techniques such as self-consistency and multi-turn feedback are applied.

In these benchmarks, OptiMind improves absolute accuracy over the gpt-oss-20b base model and outperforms other open source models of similar or larger size. It reaches performance that is competitive with proprietary frontier models such as o4-mini and GPT-5 under the same evaluation settings.

These results depend on careful cleaning of both training and test data. The research team reports that many apparent model errors on the original benchmarks actually came from missing data, unclear wording, or incorrect reference solutions, and that re-cleaning can raise the apparent accuracy of a given model from about 40 to 60 percent to a range of 70 to 90 percent on the corrected set.

Key Takeaways

  1. OptiMind is a 20B parameter mixture-of-experts transformer in the gpt-oss family that takes natural language optimization problems as input and outputs both a mathematical formulation and executable GurobiPy code, with approximately 3.6B parameters active per token and a 128,000 token context length.
  2. The model is fine-tuned from openai/gpt-oss-20b on cleaned optimization datasets such as OR-Instruct and OptMATH, evaluated on expert-validated benchmarks including IndustryOR and Mamo Complex, and focused on mixed integer linear programming formulations.
  3. OptiMind uses class-based error analysis and expert-written hints for 53 optimization classes, then applies these hints at both data cleaning and inference time, systematically reducing common modeling mistakes in the generated MILPs.
  4. The framework improves formulation accuracy by up to 20.7 percent on several optimization benchmarks compared to the base model, and with test-time scaling methods such as self-consistency and multi-turn feedback it reaches performance that is competitive with larger proprietary systems.
  5. OptiMind-SFT is released as microsoft/OptiMind-SFT on Hugging Face and as microsoft-optimind-sft in Azure AI Foundry, where it can be served as an OpenAI-compatible endpoint via SGLang (a minimal client sketch follows this list), enabling practical integration into decision support pipelines for supply chain, manufacturing, logistics, and scheduling.
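
Since the deployment exposes an OpenAI-compatible endpoint, a client call can be sketched as follows; the base URL, API key, and prompt are placeholders for a specific SGLang deployment, not values from the release.

```python
# Minimal client sketch against an OpenAI-compatible SGLang endpoint.
# The base_url, api_key, and prompt are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/OptiMind-SFT",
    messages=[{
        "role": "user",
        "content": "A factory makes two products... formulate and solve as a MILP.",
    }],
)
print(response.choices[0].message.content)
```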



