{"id":436,"date":"2026-02-19T08:38:00","date_gmt":"2026-02-19T00:38:00","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=436"},"modified":"2026-02-19T08:38:00","modified_gmt":"2026-02-19T00:38:00","slug":"tutorial-building-a-visual-document-retrieval-pipeline-with-colpali-and-late-interaction-scoring","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=436","title":{"rendered":"[Tutorial] Building a Visual Document Retrieval Pipeline with ColPali and Late Interaction Scoring"},"content":{"rendered":"<p>In this tutorial, we build an end-to-end visual document retrieval pipeline using <a href=\"https:\/\/github.com\/illuin-tech\/colpali\"><strong>ColPali<\/strong><\/a>. We focus on making the setup robust by resolving common dependency conflicts and ensuring the environment stays stable. We render PDF pages as images, embed them using ColPali\u2019s multi-vector representations, and rely on late-interaction scoring to retrieve the most relevant pages for a natural-language query. 
By treating each page visually rather than as plain text, we preserve layout, tables, and figures that are often lost in traditional text-only retrieval.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import subprocess, sys, os, json, hashlib\n\n\ndef pip(cmd):\n   subprocess.check_call([sys.executable, \"-m\", \"pip\"] + cmd)\n\n\npip([\"uninstall\", \"-y\", \"pillow\", \"PIL\", \"torchaudio\", \"colpali-engine\"])\npip([\"install\", \"-q\", \"--upgrade\", \"pip\"])\npip([\"install\", \"-q\", \"pillow&lt;12\", \"torchaudio==2.8.0\"])\npip([\"install\", \"-q\", \"colpali-engine\", \"pypdfium2\", \"matplotlib\", \"tqdm\", \"requests\"])<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We prepare a clean and stable execution environment by uninstalling conflicting packages and upgrading pip. We explicitly pin compatible versions of Pillow and torchaudio to avoid runtime import errors. 
We then install ColPali and its required dependencies so the rest of the tutorial runs without interruptions.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import torch\nimport requests\nimport pypdfium2 as pdfium\nfrom PIL import Image\nfrom tqdm import tqdm\nimport matplotlib.pyplot as plt\nfrom transformers.utils.import_utils import is_flash_attn_2_available\nfrom colpali_engine.models import ColPali, ColPaliProcessor\n\n\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\ndtype = torch.float16 if device == \"cuda\" else torch.float32\n\n\nMODEL_NAME = \"vidore\/colpali-v1.3\"\n\n\nmodel = ColPali.from_pretrained(\n   MODEL_NAME,\n   torch_dtype=dtype,\n   device_map=device,\n   attn_implementation=\"flash_attention_2\" if device == \"cuda\" and is_flash_attn_2_available() else None,\n).eval()\n\n\nprocessor = ColPaliProcessor.from_pretrained(MODEL_NAME)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We import all required libraries and detect whether a GPU is available for acceleration. We load the ColPali model and processor with the appropriate precision and attention implementation based on the runtime. 
We ensure the model is ready for inference by switching it to evaluation mode.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">PDF_URL = \"https:\/\/arxiv.org\/pdf\/2407.01449.pdf\"\n# Fail fast on network or HTTP errors instead of handing an error page to pdfium\nresp = requests.get(PDF_URL, timeout=60)\nresp.raise_for_status()\npdf_bytes = resp.content\n\n\npdf = pdfium.PdfDocument(pdf_bytes)\npages = []\nMAX_PAGES = 15\n\n\n# Render at most MAX_PAGES pages at 2x scale and keep them as RGB PIL images\nfor i in range(min(len(pdf), MAX_PAGES)):\n   page = pdf[i]\n   img = page.render(scale=2).to_pil().convert(\"RGB\")\n   pages.append(img)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We download a sample PDF, stop early if the request fails, and render its pages as RGB images at 2x scale. We limit rendering to the first 15 pages to keep the tutorial lightweight and fast on Colab. 
We store the rendered pages in memory for direct visual embedding.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">page_embeddings = []\nbatch_size = 2 if device == \"cuda\" else 1\n\n\nfor i in tqdm(range(0, len(pages), batch_size)):\n   batch_imgs = pages[i:i+batch_size]\n   batch = processor.process_images(batch_imgs)\n   batch = {k: v.to(model.device) for k, v in batch.items()}\n   with torch.no_grad():\n       emb = model(**batch)\n   page_embeddings.extend(list(emb.cpu()))\n\n\npage_embeddings = torch.stack(page_embeddings)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We generate multi-vector embeddings for each rendered page using ColPali\u2019s image encoder. We process pages in small batches to stay within GPU memory limits. 
We then stack all page embeddings into a single tensor that supports efficient late-interaction scoring.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def retrieve(query, top_k=3):\n   # Embed the query, then score it against every page embedding\n   q = processor.process_queries([query])\n   q = {k: v.to(model.device) for k, v in q.items()}\n   with torch.no_grad():\n       q_emb = model(**q).cpu()\n   scores = processor.score_multi_vector(q_emb, page_embeddings)[0]\n   # Clamp top_k so short documents do not make torch.topk raise\n   vals, idxs = torch.topk(scores, min(top_k, scores.numel()))\n   return [(int(i), float(v)) for i, v in zip(idxs, vals)]\n\n\ndef show(img, title):\n   plt.figure(figsize=(6,6))\n   plt.imshow(img)\n   plt.axis(\"off\")\n   plt.title(title)\n   plt.show()\n\n\nquery = \"What is ColPali and what problem does it solve?\"\nresults = retrieve(query, top_k=3)\n\n\nfor rank, (idx, score) in enumerate(results, 1):\n   show(pages[idx], f\"Rank {rank} | Page {idx+1} | score {score:.2f}\")\n\n\ndef search(query, k=5):\n   return [{\"page\": i+1, \"score\": s} for i, s in retrieve(query, k)]\n\n\nprint(json.dumps(search(\"late interaction retrieval\"), indent=2))<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the retrieval logic that scores queries against page embeddings using late interaction: for each query token, we take its maximum similarity over the patch vectors of a page and sum these maxima into the page score. We visualize the top-ranked pages, with their scores in the title, to qualitatively inspect retrieval quality. 
We also expose a small search helper that returns structured results, making the pipeline easy to extend or integrate further.<\/p>\n<p>In conclusion, we have a compact visual search system that demonstrates how ColPali enables layout-aware document retrieval in practice. We embed pages once, reuse those embeddings for every query, and retrieve results ranked by late-interaction relevance scores. 
This workflow gives us a strong foundation for scaling to larger document collections, adding indexing for speed, or layering generation on top of retrieved pages, while keeping the core pipeline simple, reproducible, and Colab-friendly.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Computer%20Vision\/colpali_visual_retrieval_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">full code here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/18\/tutorial-building-a-visual-document-retrieval-pipeline-with-colpali-and-late-interaction-scoring\/\">[Tutorial] Building a Visual Document Retrieval Pipeline with ColPali and Late Interaction Scoring<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-436","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/436","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=436"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/436\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=436"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdn
s.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=436"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}