Retrieval-augmented generation (RAG) pipelines are easy to build; building one that doesn’t hallucinate during a 10-K audit is nearly impossible. For developers in the financial sector, the ‘standard’ vector-based RAG approach (chunk the text and hope for the best) often produces a ‘text soup’ that loses the structural context of tables and balance sheets.
VectifyAI is attempting to bridge this gap with the launch of Mafin 2.5, a multimodal financial agent, and PageIndex, an open-source framework that moves the industry towards ‘vectorless RAG’.
The problem: why vector RAG fails in finance
Traditional RAG relies on semantic similarity. If you ask about ‘net income’, a vector database looks for chunks of text that sound like net income. Financial documents, however, depend on layout: a number in a cell is meaningless without its headers, and those headers are often stripped during conventional PDF-to-text conversion.
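To make the header problem concrete, here is a toy sketch. It uses bag-of-words cosine similarity as a crude stand-in for an embedding model, and the table, its values, and the 4-word chunk size are all invented for illustration. The naive pipeline flattens the table and cuts fixed-size chunks; the alternative keeps each cell paired with its row label and column header.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercased alphanumeric tokens as a count vector: a toy stand-in for an embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    # Cosine similarity between the two bag-of-words vectors.
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[tok] for tok, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def top_match(query: str, passages: list[str]) -> str:
    # Top-1 retrieval: the passage most similar to the query.
    return max(passages, key=lambda p: cosine(query, p))


# Hypothetical income-statement table; the column headers are fiscal years.
columns = ["FY2023", "FY2022"]
rows = [("Revenue", ["20,110", "18,950"]), ("Net income", ["4,512", "3,987"])]

# Naive pipeline: flatten the table to text, then cut fixed-size word chunks.
flat_words = " ".join(columns + [w for label, vals in rows for w in [label, *vals]]).split()
CHUNK_SIZE = 4
chunks = [" ".join(flat_words[i:i + CHUNK_SIZE]) for i in range(0, len(flat_words), CHUNK_SIZE)]

# Structure-aware alternative: serialize each cell together with its row and column headers.
facts = [f"{label}, {col}: {val}" for label, vals in rows for col, val in zip(columns, vals)]

query = "net income FY2023"
best_chunk = top_match(query, chunks)
best_fact = top_match(query, facts)
print("naive chunk:", best_chunk)
print("header-preserving:", best_fact)
```

The top-ranked naive chunk mentions net income but no fiscal year, and it even carries a stray revenue figure, so a model reading it cannot tell which number is FY2023 net income. The header-preserving version retrieves an unambiguous fact. Whatever the retriever, a chunk that never contained the year header cannot supply it.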
This is the ‘garbage in, garbage out’ trap: even the smartest LLM cannot reason correctly if the input data has lost its hierarchical structure.
Mafin 2.5: accuracy at scale
Mafin 2.5 is not just a streamlined model; it is a reasoning engine that achieves 98.7% accuracy on FinanceBench, significantly outperforming GPT-4o and Perplexity on financial retrieval tasks.
Its native integration with high-fidelity data sources sets it apart for developers:
- Broad SEC Reach: Direct indexing of 10-K, 10-Q, and 8-K filings.
- Earnings Intel: Real-time and historical earnings call transcripts.
- Market Data: Live tickers on the Russell 3000 and Nasdaq.

PageIndex: a step towards ‘vectorless’ RAG
The ‘secret sauce’ behind Mafin 2.5’s accuracy is PageIndex, which replaces traditional flat embeddings with a hierarchical tree index.
Instead of searching through arbitrary chunks, PageIndex lets the LLM ‘reason’ its way through the document’s structure. It builds a semantic tree (essentially an intelligent map of the document) that enables the agent to identify the exact section, page and line item needed.
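The navigation idea can be sketched as a loop that starts at the root and, at each level, asks a chooser which child section to open, stopping at a leaf. This is a minimal illustration under stated assumptions, not PageIndex’s implementation: the `Node` fields and the sample 10-K outline with its page numbers are hypothetical, and `keyword_chooser` is a simple word-overlap heuristic standing in for the LLM call a reasoning-based system would make at each step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    # Hypothetical node schema (not PageIndex's): section title, page range, child sections.
    title: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)


# A chooser takes the query and the candidate children and returns the child to open next.
# In a reasoning-based system this would be an LLM call; here it is a simple stand-in.
Chooser = Callable[[str, list[Node]], Node]


def keyword_chooser(query: str, children: list[Node]) -> Node:
    # Stand-in heuristic: pick the child whose title shares the most words with the query.
    q = set(query.lower().split())
    return max(children, key=lambda n: len(q & set(n.title.lower().split())))


def navigate(root: Node, query: str, choose: Chooser) -> list[Node]:
    # Descend one level per decision until reaching a leaf; keep every visited node.
    path = [root]
    node = root
    while node.children:
        node = choose(query, node.children)
        path.append(node)
    return path


# Hypothetical 10-K outline with made-up page numbers.
root = Node("Form 10-K", 1, 120, [
    Node("Part I Business", 3, 30),
    Node("Part II Item 8 Financial Statements", 55, 95, [
        Node("Consolidated Balance Sheets", 58, 59),
        Node("Consolidated Statements of Operations", 60, 60),
        Node("Consolidated Statements of Cash Flows", 62, 63),
    ]),
    Node("Part III Executive Compensation", 100, 110),
])

path = navigate(root, "net income in the consolidated statements of operations", keyword_chooser)
leaf = path[-1]
print(" > ".join(n.title for n in path), f"(pages {leaf.start_page}-{leaf.end_page})")
```

Because the traversal records every node it visits, the returned path doubles as a citation: the answer can point to the exact section and page range it was drawn from.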
Key technical features include:
- Vision-Native Support: PageIndex supports vision-based RAG, allowing models to ‘see’ the global layout of a page (charts, complex grids) rather than relying solely on OCR text.
- Hierarchical Navigation: PageIndex transforms the PDF into a navigable tree structure, preserving the relationship between headers and data.
- Traceability: Unlike the ‘black box’ of vector similarity, each answer has a clear path through the document tree, providing a much-needed audit trail for regulated financial environments.
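The hierarchical-navigation and traceability points can be illustrated together. The sketch below converts a flat outline of `(level, title, page)` headings into a nested tree using a stack, then recovers the root-to-leaf path for a line item as its audit trail. It assumes heading levels have already been extracted (for example from PDF bookmarks or a layout parser), which is the hard part in practice; the `Section` class, `build_tree`, `trace` and the sample outline are hypothetical and not part of PageIndex.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    # Hypothetical node: heading level (1 = top), title, first page, nested children.
    level: int
    title: str
    page: int
    children: list["Section"] = field(default_factory=list)


def build_tree(outline: list[tuple[int, str, int]]) -> Section:
    # Build a nested tree from a flat (level, title, page) outline; levels must be >= 1.
    root = Section(0, "Document", 1)
    stack = [root]  # chain of currently open sections, deepest last
    for level, title, page in outline:
        node = Section(level, title, page)
        # Close open sections at the same or a deeper level; the parent is the nearest shallower one.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def trace(node: Section, title: str) -> Optional[list[Section]]:
    # Depth-first search for a title; returns the root-to-target path (the audit trail) or None.
    if node.title == title:
        return [node]
    for child in node.children:
        sub = trace(child, title)
        if sub is not None:
            return [node] + sub
    return None


# Hypothetical outline of a 10-K, including line items nested under their statements.
outline = [
    (1, "Part I", 3),
    (2, "Item 1. Business", 3),
    (2, "Item 1A. Risk Factors", 12),
    (1, "Part II", 40),
    (2, "Item 7. MD&A", 41),
    (2, "Item 8. Financial Statements", 55),
    (3, "Consolidated Balance Sheets", 58),
    (4, "Total assets", 58),
    (3, "Consolidated Statements of Operations", 60),
    (4, "Net income", 60),
]

root = build_tree(outline)
path = trace(root, "Net income")
print(" > ".join(f"{s.title} (p. {s.page})" for s in path[1:]))
```

Note how ‘Net income’ ends up nested under the statement it belongs to rather than floating free, which is the header-to-data relationship that flat chunking loses.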
Key takeaways
- Unprecedented Financial Accuracy (98.7%): Mafin 2.5 sets a new state-of-the-art record on the FinanceBench benchmark, achieving 98.7% accuracy. It outperforms general-purpose models like GPT-4o (~31%) and Perplexity (~45%) by focusing on specialized financial logic rather than general retrieval.
- The shift to ‘vectorless RAG’: Moving away from the ‘vibe-based’ search of traditional vector databases, PageIndex introduces reasoning-based RAG. It uses an LLM to ‘reason’ its way through the document’s structure, mimicking how a human analyst navigates a report to find specific data points.
- Hierarchical ‘tree’ indexing vs chunking: Instead of cutting documents into arbitrary, contextless text segments, PageIndex organizes PDFs into a semantic tree structure (an intelligent table of contents). This preserves important relationships between headers, nested tables, and footnotes that traditional RAG often destroys.
- Vision-native, OCR-free workflows: The framework supports vision-based vectorless RAG, allowing AI to ‘see’ and retrieve information directly from page images. This is a game-changer for financial documents, where the visual layout of a balance sheet or complex grid is as important as the numbers.
- Enterprise-grade traceability: In contrast to the ‘black box’ of vector similarity, PageIndex provides a fully auditable reasoning path. Each response is linked to specific nodes, pages and sections, providing the transparency needed for high-stakes financial audits and compliance.

Michael Sutter is a data science professional and holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michael excels in transforming complex datasets into actionable insights.

