Intum Help

Vector Databases

Vector databases enable semantic search: instead of matching exact words, the system compares the meaning of texts and finds similar content.

How Does It Work?

  1. Embedding — each text is converted to a numerical vector by an AI model (e.g., 1536 dimensions for OpenAI text-embedding-3-small)
  2. Storage — vectors are stored in the database along with the original text and metadata
  3. Search — the user’s question is converted to a vector and compared with stored vectors (cosine distance)
  4. Results — the system returns the best matching entries

Use Cases

  • RAG (Retrieval-Augmented Generation) — enriching AI responses with context from the knowledge base
  • Semantic search — finding similar documents, articles, FAQs
  • Chat with knowledge base — ask a question, AI answers based on your documents
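
As a rough illustration of the RAG use case, the chunks returned by a search can be pasted into the prompt sent to the AI model. The function name, the prompt wording, and the `max_chunks` parameter below are assumptions made for this sketch, not the product's actual prompt.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str], max_chunks: int = 3) -> str:
    """Combine the best-matching chunks and the user's question into one prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "When is payment due?",
    ["Payment is due within 14 days.", "Invoices are sent monthly."],
)
print(prompt)
```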

Requirements

A vector database requires an AI connector (OpenAI, Gemini, or Claude) with embedding support for generating vectors.

Entries

Each entry in the vector database contains:

  • Text content
  • Embedding vector
  • Metadata (e.g., source URL)
  • Source association (type + ID, e.g., Kb::Entry #35)
  • Chunk number (when text was split)
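
One way to picture an entry is as a small record. The field names below are hypothetical and chosen only to mirror the list above; the actual database schema is not documented here.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VectorEntry:
    content: str                          # text content
    embedding: list[float]                # embedding vector
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. source URL
    source_type: Optional[str] = None     # e.g. "Kb::Entry"
    source_id: Optional[int] = None       # e.g. 35
    chunk_number: Optional[int] = None    # set when the text was split

def chunks_of(entries: list[VectorEntry], source_type: str, source_id: int) -> list[VectorEntry]:
    """Return all chunks that belong to one source, in chunk order."""
    matching = [e for e in entries if (e.source_type, e.source_id) == (source_type, source_id)]
    return sorted(matching, key=lambda e: e.chunk_number or 0)

entries = [
    VectorEntry("Second part", [0.2, 0.1], {"source_url": "https://example.com/kb/35"}, "Kb::Entry", 35, 2),
    VectorEntry("First part", [0.1, 0.3], {"source_url": "https://example.com/kb/35"}, "Kb::Entry", 35, 1),
    VectorEntry("Other article", [0.9, 0.4], {}, "Kb::Entry", 36, None),
]
print([e.content for e in chunks_of(entries, "Kb::Entry", 35)])
```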

Text Chunking

Long texts are automatically split into smaller fragments (chunks) before generating embeddings. Each chunk is a separate entry in the vector database — but all chunks from one source (e.g., a KB entry) are linked together.

“Chunking enabled” Option

In the vector database settings, you can enable the chunking option. This changes how text is split:

Setting | Chunk size | Effect
Chunking off | model’s max tokens (e.g., 8191 for OpenAI) | Text split only when exceeding model limit. Larger fragments, fewer entries
Chunking on | ~500 tokens (~1-2 paragraphs) | Text always split into small fragments. More precise search

When to Enable Chunking

  • Enable when the source has long documents (articles, regulations, documentation) and you need search precision — a small chunk matches a specific question better
  • Leave off when entries are short (FAQ, single questions/answers) — splitting short texts doesn’t make sense

How Splitting Works

  1. The system recognizes text structure — Markdown headings (## Section), paragraphs, HTML lists
  2. A new section (heading) is a natural chunk boundary
  3. Each chunk gets a prefix with the section heading it belongs to — so it doesn’t lose context
  4. If the text is HTML, the system converts it to structured text, preserving headings and paragraphs
  5. Tokens are counted exactly with tiktoken (the OpenAI tokenizer), not estimated from character counts
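
The splitting rules above can be approximated with a minimal sketch. It makes several assumptions: the input is already Markdown, `count_tokens` counts whitespace-separated words as a self-contained stand-in for tiktoken, and a single paragraph longer than the limit is kept whole. The function names are illustrative, not the product's API.

```python
import re

HEADING = re.compile(r"#{1,6}\s")

def count_tokens(text: str) -> int:
    """Stand-in token counter (words); the real system counts tokens with tiktoken."""
    return len(text.split())

def split_into_chunks(markdown: str, max_tokens: int = 500) -> list[str]:
    chunks: list[str] = []
    heading = ""               # heading of the section currently being read
    buffer: list[str] = []     # paragraphs collected for the current chunk

    def flush() -> None:
        # Emit the buffered paragraphs, prefixed with their section heading.
        if buffer:
            body = "\n\n".join(buffer)
            chunks.append(f"{heading}\n\n{body}" if heading else body)
            buffer.clear()

    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        first_line, _, rest = block.partition("\n")
        if HEADING.match(first_line):
            flush()            # a new heading is a natural chunk boundary
            heading = first_line.strip()
            block = rest.strip()
        if not block:
            continue
        candidate = "\n\n".join(buffer + [block])
        if buffer and count_tokens(heading) + count_tokens(candidate) > max_tokens:
            flush()            # adding this paragraph would exceed the limit
        buffer.append(block)
    flush()
    return chunks

doc = (
    "## Billing\n\nInvoices are sent monthly.\n\nPayment is due in 14 days.\n\n"
    "## Support\n\nContact us by email."
)
for chunk in split_into_chunks(doc, max_tokens=6):
    print(repr(chunk))
```

With the small limit used here, the "Billing" section is split into two chunks and both carry the `## Billing` prefix, so neither loses its context.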

Per-model Limits

Each embedding model has a different token limit per call. The system automatically retrieves the limit from the connector:

Model | Max tokens | Effect with chunking OFF | Effect with chunking ON
OpenAI text-embedding-3-small | 8,191 | chunks up to ~7,800 tokens | chunks up to 500 tokens
Cohere embed-v4 (Bedrock) | 128,000 | practically no splitting | chunks up to 500 tokens
Gemini embedding | 2,048 | chunks up to ~1,900 tokens | chunks up to 500 tokens

When switching connectors (e.g., from OpenAI to Cohere), limits adjust automatically — no need to change anything in the database settings.
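
The relationship between the model limit and the chunking option can be sketched as a small function. The 5% headroom below is a guess chosen so the results land near the approximate figures in the table; the exact margin the system uses is not stated, and all names are hypothetical.

```python
CHUNKED_SIZE = 500        # target chunk size when chunking is on (from the table)
SAFETY_MARGIN = 0.95      # assumption: keep about 5% headroom below the model limit

def effective_chunk_size(model_max_tokens: int, chunking_enabled: bool) -> int:
    """Return the largest chunk, in tokens, produced for a given model and setting."""
    if chunking_enabled:
        return min(CHUNKED_SIZE, model_max_tokens)
    return int(model_max_tokens * SAFETY_MARGIN)

for name, limit in [("OpenAI text-embedding-3-small", 8191), ("Gemini embedding", 2048)]:
    print(name, effective_chunk_size(limit, False), effective_chunk_size(limit, True))
```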
