RAG text chunking strategies: optimize LLM knowledge access


Author(s): Abinaya Subramaniam

Originally published on Towards AI.

If retrieval is the search engine of your RAG system, chunking is the foundation on which the search engine stands. Even the strongest LLM fails when chunks are too long, too short, noisy, or cut in the wrong place. That is why practitioners often say:
“Chunking determines 70% of RAG quality.”

Good chunking helps the retriever find information that is complete, relevant, and in context, while bad chunking creates fragmented, out-of-context passages that force the LLM to hallucinate.

Image by author

If you’re just joining the series, check out my previous post: Introducing RAG: Why modern AI needs retrieval – it explains the basics of retrieval-augmented generation.

What is chunking?

The first step in RAG is document collection and ingestion, where all source material (documents, articles, or knowledge base entries) is gathered. Before retrieval, these documents go through text chunking, which divides them into smaller, meaningful sections called chunks.

Each chunk is designed to be coherent and self-contained, allowing the retriever to efficiently find, rank, and use the most relevant pieces of information when answering a question.

Image by author

Chunking is the process of dividing large text into smaller, meaningful sections before generating embeddings. These sections are called chunks, and they are what the retriever actually searches over when answering a question.

Imagine you’re asking someone about a chapter from a textbook but you’ve already broken that chapter into disorganized, uneven pieces. If the pieces do not align with the logical structure of the content, the answer will be confusing or incomplete. RAG systems behave the same way.

A well-chunked document captures ideas cleanly, maintains context, and allows LLMs to reason meaningfully. Poor chunking fragments ideas and introduces retrieval noise. Everything else (vector stores, embeddings, rerankers) comes after this foundational step.

Why is chunking more important than we think?

Chunking doesn’t just mean dividing text into pieces. It controls how your system retrieves information and how much context the LLM receives.

If chunks are too large, they may contain irrelevant or extraneous information, which can confuse the model and dilute focus on the query. The LLM may struggle to reason effectively, giving answers that are vague, contradictory, or partially incorrect.

Conversely, if chunks are too small, they may lack enough context for the model to understand the full meaning, leaving it short on information and prone to incomplete or fragmented responses.

Good chunking strikes a balance: self-contained ideas that are neither too short nor too long, which aligns with the way humans naturally organize information.

Now let’s look at some chunking strategies.

Fixed-size chunking

Fixed-size chunking is the simplest form. The text is divided by a predefined number of characters or tokens, such as 500 tokens per section, regardless of sentence or paragraph boundaries.

It is predictable, fast to generate, and effective for very large, disorganized, or mixed datasets. But it has one obvious weakness: meaning is often split down the middle. For example, a sentence may begin in one chunk and end in the next, reducing the semantic strength of the embedding.

Image by author

A small overlap between chunks is usually used to maintain continuity:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size counts characters by default; chunk_overlap keeps a small
# shared region between consecutive chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

chunks = splitter.split_text(long_text)

Understanding Chunk Overlap

When dividing text into chunks, a small overlap is often added between consecutive chunks to maintain context and continuity. Overlap means that the last few sentences of one chunk are repeated at the beginning of the next.

Image by author

This ensures that important information spanning the boundary between two chunks is not lost. Without overlap, the retriever may return only part of an idea, causing the LLM to miss key context and produce incomplete or confusing answers. A typical overlap ranges from 10% to 20% of the chunk length, balancing redundancy with efficiency.

Fixed-size chunking is a practical option for logs, emails, code repositories, and large corpora where structure is inconsistent.
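To make the 10–20% guideline concrete, here is a small illustrative helper (my own, not from the original post) that derives the overlap from the chunk size instead of hard-coding it:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# illustrative helper: keep the overlap at a fixed fraction of the chunk size
def make_splitter(chunk_size=500, overlap_ratio=0.1):
    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=int(chunk_size * overlap_ratio)  # 50 characters here
    )

splitter = make_splitter()
chunks = splitter.split_text(long_text)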

Sentence-based chunking

Sentence-based chunking is a method where text is divided into chunks based on complete sentences rather than arbitrary lengths. This approach ensures that each chunk contains coherent ideas while preserving grammatical and semantic integrity.

Image by author

This is especially useful for maintaining clarity and context, because each chunk represents a meaningful unit of thought. By grouping sentences logically, retrieval can return more accurate and understandable information to the LLM, reducing the risk of fragmented or confused responses. Sentence-based chunking is often combined with small overlaps to maintain continuity across chunks.
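As a minimal sketch of the idea (NLTK’s sentence tokenizer is my choice here; the post does not prescribe a library), sentences can be grouped up to a size limit, carrying the last sentence forward as a small overlap:

from nltk.tokenize import sent_tokenize  # requires the NLTK "punkt" model to be downloaded

def sentence_chunks(text, max_chars=500, overlap_sentences=1):
    sentences = sent_tokenize(text)
    chunks, current = [], []
    for sentence in sentences:
        if current and sum(len(s) for s in current) + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # keep a small overlap for continuity
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = sentence_chunks(long_text)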

Paragraph-Based Chunking

Paragraph-based chunking divides text into chunks based on complete paragraphs rather than individual sentences or fixed token counts. This method preserves the natural structure and flow of the content, making it easier for the retriever to capture coherent ideas and context.

Each chunk typically represents a distinct topic or subtopic, which helps the LLM generate more accurate and meaningful responses. Paragraph-based chunking is especially effective for long-form documents, research papers, or articles where maintaining a logical flow of information is important. Like sentence-based chunking, it may also include small overlaps to ensure continuity across adjacent chunks.
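A minimal sketch of paragraph-based chunking (the helper below is illustrative): split on blank lines and merge short paragraphs so each chunk stays near a target size:

def paragraph_chunks(text, max_chars=800):
    # paragraphs are assumed to be separated by blank lines
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

chunks = paragraph_chunks(long_doc)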

Semantic chunking

Semantic chunking looks at meaning rather than length. Instead of splitting text arbitrarily, it identifies natural breaks (topic changes, context shifts, or section boundaries) using embeddings or similarity scores.

This produces coherent chunks with strong semantic clarity. Since chunk boundaries follow meaning, retrieval quality improves significantly, especially on structured content such as knowledge bases, documentation, or articles. The trade-off is computational cost: semantic chunking is heavier and produces inconsistent chunk lengths.

from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings

# SemanticChunker expects a LangChain embeddings object rather than a raw
# SentenceTransformer; it places breakpoints where sentence similarity drops
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunker = SemanticChunker(embeddings, breakpoint_threshold_type="percentile")

chunks = chunker.split_text(long_text)

For high quality documents where topic flow matters, semantic chunking is often the most accurate choice.

Recursive chunking

Recursive chunking sits between the fixed-size and semantic approaches. It respects document structure first and splits text further only when necessary.

A common strategy is to first try splitting by headings; if a section is still too long, split by paragraph, then by sentence, and finally by characters. This creates chunks that are both meaningful and size-controlled.

from langchain.text_splitter import RecursiveCharacterTextSplitter

# try headings first, then line breaks, then sentences, then raw characters
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=["\n## ", "\n### ", "\n", ". ", ""],
    chunk_size=600,
    chunk_overlap=80
)

chunks = recursive_splitter.split_text(long_doc)

This method excels in structured content such as developer documentation, technical manuals, reports, and scholarly content where hierarchy matters.

Sliding window chunking

Some content spans meaning across multiple sentences, such as legal contracts, scientific documents, or lengthy explanations. For such documents, a sliding window approach ensures continuity.

Instead of creating disjoint chunks, this method creates overlapping windows; for example, a 400-token window slides 200 tokens at a time. Each chunk shares context with the next, preventing meaning from getting lost across boundaries.

This method preserves context beautifully but increases the number of chunks, which impacts cost and performance.

Sliding windows are particularly valuable in legal RAG, finance, medical research, and compliance systems.
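A minimal sketch of the sliding window (whitespace tokens stand in for real model tokens; the function is illustrative, not from the original post):

def sliding_window_chunks(text, window=400, stride=200):
    tokens = text.split()  # crude whitespace tokenization for illustration
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(" ".join(tokens[start:start + window]))
        if start + window >= len(tokens):
            break
    return chunks

chunks = sliding_window_chunks(long_text)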

Hierarchical chunking

Hierarchical chunking creates a multi-level structure, with small chunks for fine-grained retrieval, medium chunks for balanced reasoning, and large chunks for maintaining global context.

At retrieval time, the system may first fetch a small chunk for precision, then combine it with the corresponding larger chunk to restore full context. This reduces hallucinations and improves depth of reasoning.

This technique powers enterprise-level RAG systems and multi-granularity frameworks like LlamaIndex.
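As an illustrative sketch of the parent/child idea (a hand-rolled structure, not the LlamaIndex API): small child chunks are embedded and searched, while the larger parent chunk is what gets handed to the LLM:

def hierarchical_chunks(text, parent_size=2000, child_size=400):
    parents = [text[i:i + parent_size] for i in range(0, len(text), parent_size)]
    records = []
    for parent_id, parent in enumerate(parents):
        for j in range(0, len(parent), child_size):
            records.append({
                "parent_id": parent_id,
                "parent_text": parent,                   # returned to the LLM at answer time
                "child_text": parent[j:j + child_size],  # embedded and searched
            })
    return records

records = hierarchical_chunks(long_doc)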

Real World Chunking Mistakes

Most RAG projects fail due to subtle chunking problems. Oversized chunks overload the model with irrelevant details. Undersized chunks lose meaning. Chunks that cut sentences in half or mix unrelated sections produce weak embeddings. Lack of overlap creates discontinuity. Missing metadata confuses the retriever. Using a single universal chunking method for all document types also gives poor results.

Chunking should never be one-size-fits-all. Policies behave differently from textbooks; call transcripts behave differently from research papers. Your strategy should evolve with the document type and the retrieval task.

Image by author

Final thoughts

Chunking is not just a preprocessing step; it is the backbone of your RAG pipeline. A good chunk is a meaningful, self-contained unit of knowledge. A bad one is an orphaned fragment that leads the LLM astray.

If retrieval is the engine, chunking is the fuel. High-quality chunking creates clean, relevant, reliable RAG systems. Bad chunking produces noise and hallucinations, no matter how good the LLM is.

Published via Towards AI
