AI VISIONS

Let LLMs Be LLMs

Author: Tom Vatland – CTO – AI VISIONS AS
Date: March 29, 2025

As organizations race to adopt Large Language Models (LLMs), many are stumbling into a trap: treating the LLM as a knowledge database. This leads to inefficiencies, hallucinated facts, and brittle systems that can’t keep up with real-world demands. LLMs shine at understanding and generating language—not at storing or maintaining facts. That’s a job for structured external sources like databases, APIs, vector stores, and knowledge graphs.

Take Grab, the Southeast Asian super-app, as an example. By using Retrieval-Augmented Generation (RAG), they've automated report generation, saving 3-4 hours per report. Their secret? A modular setup: a knowledge layer for facts, a retrieval layer for context, and an LLM to craft natural language responses. This approach keeps their system accurate, adaptable, and efficient, unlike baking every fact into the model's weights, which sacrifices updatability and transparency.
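
To make the separation concrete, here is a minimal sketch of that three-layer split in Python. It illustrates the pattern only, not Grab's actual stack: the names (KnowledgeLayer, build_prompt, llm_generate) are placeholders, and the substring search stands in for a real vector store, SQL query, or knowledge-graph lookup.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str       # the retrieved passage or record
    source: str     # where it came from, kept for attribution


class KnowledgeLayer:
    """Owns the facts. Updating data here never requires retraining the model."""

    def __init__(self, records: list[Fact]):
        self.records = records

    def search(self, query: str, top_k: int = 3) -> list[Fact]:
        # Placeholder ranking: substring match. A real system would use
        # embeddings, BM25, or a knowledge-graph query instead.
        hits = [r for r in self.records if query.lower() in r.text.lower()]
        return hits[:top_k]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Retrieval layer: scope the context and keep attribution visible."""
    context = "\n".join(f"[{f.source}] {f.text}" for f in facts)
    return (
        "Answer using only the context below and cite sources.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, kb: KnowledgeLayer, llm_generate) -> str:
    facts = kb.search(question)
    prompt = build_prompt(question, facts)
    return llm_generate(prompt)   # the LLM only does language, not fact storage
```

The point of the split is that facts live behind the knowledge layer and can be updated independently, while the model only ever sees scoped, attributed context.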

Recent innovations like Microsoft’s KBLaM (March 2025) show progress in encoding knowledge into LLMs, reducing hallucinations. But for real-time or domain-specific needs, external knowledge sources remain critical. The catch? Current RAG implementations are often ad hoc and fragile, lacking a standardized way to connect LLMs with knowledge systems.

A proposed standardized machine-to-machine (M2M) protocol for LLMs may offer a more practical and efficient way to connect models with external knowledge systems. Taking inspiration from ideas like gRPC, it explores binary formats and token-based communication aligned with a shared tokenizer. The goal is to let the LLM request only what it needs and receive scoped, attributed context that fits within its available window. Such a setup could improve speed, enable multilevel caching, and help separate language understanding from knowledge access. It points toward more scalable, real-time RAG systems that are easier to manage and evolve.
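
To give a feel for what token-level, binary communication could look like, here is a rough Python sketch. The frame layout (the RGM2 marker, version fields, uint32 token IDs, CRC trailer) is an assumption made up for this example, not part of any published specification.

```python
import struct
import zlib

MAGIC = b"RGM2"          # hypothetical frame marker
PROTOCOL_VERSION = 1


def pack_context_frame(tokenizer_version: int, source_id: int,
                       token_ids: list[int]) -> bytes:
    """Knowledge layer -> LLM host: attributed context as raw token IDs."""
    header = struct.pack("<4sHHII", MAGIC, PROTOCOL_VERSION,
                         tokenizer_version, source_id, len(token_ids))
    body = struct.pack(f"<{len(token_ids)}I", *token_ids)
    checksum = struct.pack("<I", zlib.crc32(header + body))
    return header + body + checksum


def unpack_context_frame(frame: bytes) -> tuple[int, int, list[int]]:
    """LLM host side: validate the frame, recover tokenizer version, source, tokens."""
    header_size = struct.calcsize("<4sHHII")
    magic, proto, tok_ver, source_id, count = struct.unpack(
        "<4sHHII", frame[:header_size])
    assert magic == MAGIC and proto == PROTOCOL_VERSION
    body = frame[header_size:header_size + 4 * count]
    expected = struct.unpack("<I", frame[header_size + 4 * count:])[0]
    assert zlib.crc32(frame[:header_size] + body) == expected
    return tok_ver, source_id, list(struct.unpack(f"<{count}I", body))
```

Because the payload is already token IDs under an agreed tokenizer version, the receiving host can splice it straight into the context window without re-tokenizing, and identical frames can be cached at any layer by their checksum.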

What's needed is a universal M2M interface: a lightweight, expressive schema designed for query formulation by the LLM; structured, scoped responses from the knowledge layer; clear context boundaries and attribution; and language-agnostic integration. Think HTTP for web requests or SQL for data querying. A shared protocol would make RAG pipelines interoperable, auditable, and scalable across industries, with no need for custom connectors for every use case.
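
As a thought experiment, that schema might look something like the following Python dataclasses. The field names and defaults are assumptions for illustration; a real standard would define them in a language-agnostic IDL such as Protocol Buffers so any stack could implement the same contract.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeQuery:
    """Formulated by the LLM (or its host) when it needs external facts."""
    query_text: str                      # what the model wants to know
    semantic_filter: list[str] = field(default_factory=list)  # e.g. domains, doc types
    max_context_tokens: int = 1024       # hard boundary the response must respect
    tokenizer_version: str = "v1"        # guarantees token counts line up


@dataclass
class Attribution:
    source_id: str                       # stable identifier of the origin system
    reference: str                       # document, row, or node the fact came from


@dataclass
class ContextChunk:
    token_ids: list[int]                 # pre-tokenized content, ready to splice in
    attribution: Attribution


@dataclass
class ScopedResponse:
    """Returned by the knowledge layer: bounded, attributed, nothing more."""
    chunks: list[ContextChunk]
    total_tokens: int                    # must be <= KnowledgeQuery.max_context_tokens
    truncated: bool                      # signals that more context exists than fits
```

Keeping the query's token budget and the response's attribution explicit is what makes such a pipeline auditable: every piece of context the model sees can be traced back to a source and shown to fit the declared boundary.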

Building on this idea, we've started drafting a standardized LLM RAG M2M protocol that uses binary serialization and token ID-based communication, tied to a shared tokenizer version. The aim is to create a highly efficient and predictable exchange between the LLM and the knowledge layer, with support for dynamic context window sizing, semantic filtering, and attribution. This could further enhance the scalability and precision of real-time RAG systems. We'd love to share more details with you soon and hear your thoughts as it develops.

It's time to let LLMs do what they do best: language. Let's build a standardized ecosystem that handles knowledge access cleanly and transparently. How do you see standardized interfaces shaping the future of AI systems? Let's discuss.

Tags: LLMs, RAG, AIStandardization, AI, NLP, TechInnovation, LLM, MachineLearning, DataScience, M2M, TechTrends, AIVISIONS, BinarySerialization, SemanticFiltering