Acme: Next Generation Agent with MCP, Skills and Custom Data for Drug Discovery

Multi-agent systems accelerate interdisciplinary research

Imagine multi-agent AI systems collaborating like a team of cross-disciplinary experts, autonomously sifting through massive datasets to uncover new patterns and hypotheses. This can now be easily achieved with the Model Context Protocol (MCP), a new standard for easily integrating diverse data sources and tools. The growing MCP server ecosystem—from knowledge bases to report generators—offers endless capabilities.

what does achemy do

meet achemy, A multi-agent assistant that connects external MCP servers such as OpenTargets, PubChem and PubMed with your own chemical libraries on Databricks to better analyze and simultaneously interpret combined knowledge bases. this is also Skill which can be optionally loaded to provide detailed instructions for producing task-specific reports, consistently formatted for research, regulatory or business needs.

Figure 1. achemy There is a multi-agent observer that includes the external MCP servers PubChem, PubMed, and OpenTargets, and the Databricks-managed MCP servers Jini Space (text-to-SQL for DrugBank structured data) and Vector Search (ZINC for unstructured data such as molecular embeddings). Skills can also be loaded to specify task sequences and report formatting and style to ensure consistent output.

Its key capabilities include identifying disease targets and drug candidates, retrieving their detailed chemical, kinetics properties and providing safety and toxicity assessment. Importantly, AiChemy supports its findings with supporting evidence derived from verifiable data sources, making it ideal for research.

Use Case 1: Understand disease mechanisms, find druggable targets, and lead generation

The Guided Tasks panel provides the prompts and agent skills needed to execute key steps in the drug discovery workflow of disease -> target -> drug -> literature validation.

Identify therapeutic goals: Starting with a specific disease subtype, e.g. Estrogen receptor-positive (ER+)/HER2-negative (HER2-) breast cancer (where ER and HER2 are the major protein biomarkers), discover related therapeutic targets (for example, ESR1).
Search Related Medicines: Use identified targets (for example, ESR1) to find potential drug candidates.
Confirm with literature: For a given drug candidate (for example, chemzystrant), check the scientific literature for supporting evidence.

Use Case 2: Lead production by chemical similarity

To identify follow-on oral selective estrogen receptor modulators (SERMs) approved in 2023, allestrantWe can take advantage of chemical similarity. we search big ZINC15 The chemical library for drug-like molecules is structurally similar to that of allemestrants, as quantitative structure–activity relationship (QSAR) principles suggest that they will share similar properties. This is achieved by querying the Databricks vector search, which uses 1024-bit Extended-Connectivity Fingerprint (ECFP) Molecular embeddings of Alasestrant (as query vector) to find the most similar embeddings within the 250,000-molecule index of ZINC.

Figure 2. AiChemy includes vector search of the ZINC database of 250,000 commercially available molecules. This enables us to generate lead compounds by chemical similarity. In this screenshot, we asked AiChemy to find compounds similar to Alasestrant in the ZINC vector search based on ECFP4 molecular embeddings.

Build your own research multi-agent supervisor

We will customize a multi-agent observer on Databricks by integrating public MCP servers with proprietary data on Databricks. To achieve this, you have the option to use no-code agent bricks Or coding options like notebooks. Databricks Playground Allows quick prototyping and iteration of your agents.

Step 1: Prepare the Necessary Components for the Multi-Agent Supervisor

The multi-agent system has 5 employees:

Open goals: External MCP server of disease-target-drug knowledge graph
PubMed: External MCP server of biomedical literature
PubChem: External MCP Server of Chemical Compounds
Drug Library (Genie): A chemical library containing structured drug properties has been created genie place Providing text-to-SQL capabilities.
Chemical Library (Vector Search): A proprietary library of unstructured chemical data with molecular fingerprint embeddings, modeled as a vector index to facilitate similarity search by embeddings.

Step 1A: Connect securely to public MCP servers via Unity Catalog (UC) Connection In ui or in a Databricks notebook (e.g. 4_connect_ext_mcp_opentarget.py).

Step 1B: Make sure your structured table (like DrugBank) is transformed into a genie place With text-to-SQL functionality using ui. Look 1_load_drugbank and descriptors.py

Step 1c: Make sure your unstructured chemical library is created as a vector index In ui Or in the notebook to enable similarity search. Look 2_create VS zinc15.py

Step 2 (Easy Option): Create Multi-Agent Supervisor Using No-Code supervisory agent in 2 minutes

To assemble them, try no-code agent bricks Which creates a supervisor agent with the above components through the UI and deploys it to a REST API endpoint in a few minutes.

Step 2 (Advanced Options): Create Multi-Agent Supervisor Using Databricks Notebook

For more advanced abilities like agentic memory and skills, develop a langgraph supervisor To integrate with Databricks Notebook LakebaseDatabricks Serverless Postgres Database. check it out code repository Where you can easily define multi-agent components (see step 1). config.yml.

Once config.yml is defined, you can deploy the multi-agent supervisor as a mlflow agent server (FastAPI wrapper) with a React web user interface (UI). deploy to both of them Databricks Apps Through ui Or Databricks CLI. set appropriate Permission For users to use the Databricks app and have access to the underlying resources for the service principal of the app (e.g. usage for logging traces, secret scope if any).

Step 3: Evaluate and Monitor Your Agent

Each invitation to the agent is automatically logged and Figured out For Databricks MLflow experiment using OpenTelemetry standards. it makes it easier Evaluation of offline or online feedback to improve the agent over time. Additionally, your deployed multi-agent uses LLM behind an AI gateway so you can take advantage of centralized governance, built-in security measures, and full visibility to production readiness.

Figure 3. All multiagent invocations will be logged whether through the React UI or REST API mlflow tracesCompliant with open telemetry standards for end-to-end observability.

Figure 4. MLflow trace captures the full execution graph, including logic steps, tool calls, retrieved documents, latency, and token usage, for easy debugging and optimization.

next steps

We invite you to explore achemy web app and github repository. Start building your custom multi-agent system in an intuitive, no-code manner agent bricks Framework on Databricks so you can stop sorting and start exploring!