{"id":693,"date":"2026-04-10T10:21:18","date_gmt":"2026-04-10T02:21:18","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=693"},"modified":"2026-04-10T10:21:18","modified_gmt":"2026-04-10T02:21:18","slug":"an-end-to-end-coding-guide-to-nvidia-kvpress-for-long-context-llm-inference-kv-cache-compression-and-memory-efficient-generation","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=693","title":{"rendered":"An End-to-End Coding Guide to NVIDIA KVPress for Long-Context LLM Inference, KV Cache Compression, and Memory-Efficient Generation"},"content":{"rendered":"<p>In this tutorial, we take a detailed, practical approach to exploring <a href=\"http:\/\/github.com\/NVIDIA\/kvpress\"><strong>NVIDIA\u2019s KVPress<\/strong><\/a> and understanding how it can make long-context language model inference more efficient. We begin by setting up the full environment, installing the required libraries, loading a compact Instruct model, and preparing a simple workflow that runs in Colab while still demonstrating the real value of KV cache compression. As we move through implementation, we create a synthetic long-context corpus, define targeted extraction questions, and run multiple inference experiments to directly compare standard generation with different KVPress strategies. 
At the end of the tutorial, we will have built a stronger intuition for how long-context optimization works in practice, how different press methods affect performance, and how this kind of workflow can be adapted for real-world retrieval, document analysis, and memory-sensitive LLM applications.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import os, sys, subprocess, textwrap, time, gc, json, math, random, warnings, inspect\nwarnings.filterwarnings(\"ignore\")\n\n\ndef run(cmd):\n   print(\"\\n[RUN]\", \" \".join(cmd))\n   subprocess.check_call(cmd)\n\n\nrun([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"--upgrade\", \"pip\"])\nrun([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"torch\", \"transformers\", \"accelerate\", \"bitsandbytes\", \"sentencepiece\", \"kvpress==0.4.0\"])\n\n\ntry:\n   from google.colab import userdata\n   hf_token = userdata.get(\"HF_TOKEN\")\nexcept Exception:\n   hf_token = os.environ.get(\"HF_TOKEN\", \"\")\n\n\nif not hf_token:\n   try:\n       import getpass\n       hf_token = getpass.getpass(\"Enter your Hugging Face token (leave empty if model is public and accessible): \").strip()\n   except Exception:\n       hf_token = \"\"\n\n\nif hf_token:\n   os.environ[\"HF_TOKEN\"] = hf_token\n   os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = hf_token\n\n\nimport torch\nimport transformers\nimport kvpress\n\n\nfrom transformers import 
pipeline, BitsAndBytesConfig\nfrom kvpress import ExpectedAttentionPress, KnormPress\n\n\nprint(\"Python:\", sys.version.split()[0])\nprint(\"Torch:\", torch.__version__)\nprint(\"Transformers:\", transformers.__version__)\nprint(\"CUDA available:\", torch.cuda.is_available())\nif torch.cuda.is_available():\n   print(\"GPU:\", torch.cuda.get_device_name(0))\n\n\nMODEL_ID = \"Qwen\/Qwen2.5-1.5B-Instruct\"\nMAX_NEW_TOKENS = 96\nSEED = 42\nrandom.seed(SEED)\ntorch.manual_seed(SEED)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We set up the Colab environment and install all required libraries to run the KVPress workflow successfully. We securely collect the Hugging Face token, set environment variables, and import the core modules needed for model loading, pipeline execution, and compression experiments. We also print the runtime and hardware details so we clearly understand the setup in which we perform the tutorial.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">if torch.cuda.is_available():\n   torch.cuda.empty_cache()\n   quantization_config = BitsAndBytesConfig(\n       load_in_4bit=True,\n       bnb_4bit_compute_dtype=torch.float16,\n       bnb_4bit_quant_type=\"nf4\",\n       bnb_4bit_use_double_quant=True,\n   )\n   pipe = pipeline(\n       \"kv-press-text-generation\",\n       model=MODEL_ID,\n       device_map=\"auto\",\n       token=hf_token if hf_token else None,\n       
model_kwargs={\n           \"quantization_config\": quantization_config,\n           \"attn_implementation\": \"sdpa\",\n       },\n   )\nelse:\n   pipe = pipeline(\n       \"kv-press-text-generation\",\n       model=MODEL_ID,\n       device_map=\"auto\",\n       torch_dtype=torch.float32,\n       token=hf_token if hf_token else None,\n       model_kwargs={\n           \"attn_implementation\": \"sdpa\",\n       },\n   )\n\n\ndef cuda_mem():\n   if not torch.cuda.is_available():\n       return {\"allocated_gb\": None, \"reserved_gb\": None, \"peak_gb\": None}\n   return {\n       \"allocated_gb\": round(torch.cuda.memory_allocated() \/ 1024**3, 3),\n       \"reserved_gb\": round(torch.cuda.memory_reserved() \/ 1024**3, 3),\n       \"peak_gb\": round(torch.cuda.max_memory_allocated() \/ 1024**3, 3),\n   }\n\n\ndef reset_peak():\n   if torch.cuda.is_available():\n       torch.cuda.reset_peak_memory_stats()\n\n\ndef extract_answer(x):\n   if isinstance(x, list) and len(x) &gt; 0:\n       x = x[0]\n   if isinstance(x, dict):\n       for k in [\"answer\", \"generated_text\", \"text\", \"output_text\"]:\n           if k in x:\n               return x[k]\n       return json.dumps(x, indent=2, ensure_ascii=False)\n   return str(x)\n\n\ndef generate_once(context, question, press=None, label=\"run\"):\n   gc.collect()\n   if torch.cuda.is_available():\n       torch.cuda.empty_cache()\n   reset_peak()\n   start = time.time()\n   out = pipe(\n       context,\n       question=question,\n       press=press,\n       max_new_tokens=MAX_NEW_TOKENS,\n       do_sample=False,\n       temperature=None,\n       return_full_text=False,\n   )\n   elapsed = time.time() - start\n   answer = extract_answer(out)\n   stats = cuda_mem()\n   result = {\n       \"label\": label,\n       \"elapsed_sec\": round(elapsed, 2),\n       \"allocated_gb\": stats[\"allocated_gb\"],\n       \"reserved_gb\": stats[\"reserved_gb\"],\n       \"peak_gb\": stats[\"peak_gb\"],\n       \"answer\": answer.strip(),\n 
  }\n   return result<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We initialize the kv-press-text-generation pipeline and configure it differently depending on whether GPU support is available. We define the helper functions that measure CUDA memory usage, reset peak memory, extract answers from model outputs, and run a single generation pass cleanly. This part provides the reusable execution logic that powers the rest of the tutorial and enables us to compare baseline inference with KV cache compression.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">company_records = [\n   {\"company\": \"Arcturus Dynamics\", \"hq\": \"Bengaluru\", \"founded\": 2017, \"focus\": \"warehouse robotics\"},\n   {\"company\": \"BlueMesa Energy\", \"hq\": \"Muscat\", \"founded\": 2014, \"focus\": \"grid analytics\"},\n   {\"company\": \"CinderPeak Health\", \"hq\": \"Pune\", \"founded\": 2019, \"focus\": \"clinical imaging AI\"},\n   {\"company\": \"DeltaForge Marine\", \"hq\": \"Kochi\", \"founded\": 2012, \"focus\": \"autonomous vessel telemetry\"},\n   {\"company\": \"EonCircuit Labs\", \"hq\": \"Hyderabad\", \"founded\": 2020, \"focus\": \"edge silicon tooling\"},\n   {\"company\": \"Frostline Aero\", \"hq\": \"Jaipur\", \"founded\": 2016, \"focus\": \"drone inspection\"},\n]\n\n\nneedle_facts = [\n   \"PROJECT NEEDLE 1: The internal codename for the confidential pilot program is SAFFRON-17.\",\n   
\"PROJECT NEEDLE 2: The audit escalation owner is Meera Vashisht.\",\n   \"PROJECT NEEDLE 3: The approved deployment region for the first production rollout is Oman North.\",\n   \"PROJECT NEEDLE 4: The emergency rollback phrase is amber lantern.\",\n   \"PROJECT NEEDLE 5: The signed commercial start date is 17 September 2026.\",\n]\n\n\nbackground_block = \"\"\"\nLong-context systems often contain repeated operational notes, historical records, policy sections, and noisy retrieval artifacts.\nThe goal of this demo is to create a realistically long prompt where only a few details matter for downstream answering.\nKV cache compression reduces memory usage by pruning cached key-value pairs while preserving answer quality.\n\"\"\"\n\n\npolicy_block = \"\"\"\nOperational policy summary:\n1. Safety overrides throughput when sensor confidence falls below threshold.\n2. Logs should preserve region, timestamp, device class, and operator approval state.\n3. Field trials may contain duplicated annexes, OCR-style artifacts, and repeated compliance summaries.\n4. A good long-context model must ignore irrelevant repetition and retrieve the specific details that matter.\n\"\"\"\n\n\nrecords_text = []\nfor i in range(120):\n   rec = company_records[i % len(company_records)]\n   records_text.append(\n       f\"Record {i+1}: {rec['company']} is headquartered in {rec['hq']}, founded in {rec['founded']}, and focuses on {rec['focus']}. 
\"\n       f\"Quarterly memo {i+1}: retention remained stable, operator training progressed, and the compliance appendix was reattached for review.\"\n   )\n\n\nneedle_insert_positions = {18, 41, 73, 96, 111}\nfull_corpus = []\nfor i, para in enumerate(records_text):\n   full_corpus.append(background_block.strip())\n   full_corpus.append(policy_block.strip())\n   full_corpus.append(para)\n   if i in needle_insert_positions:\n       full_corpus.append(needle_facts[len([x for x in needle_insert_positions if x &lt;= i]) - 1])<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We create a synthetic long-context dataset to test the KVPress system in a controlled yet realistic way. We define company records, insert important hidden facts at different positions, and mix them with repeated background and policy blocks, making the prompt long and noisy. This helps us simulate the context in which memory-efficient inference matters and the model must retrieve only the truly relevant details.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">context = \"nn\".join(full_corpus)\n\n\nquestion = textwrap.dedent(\"\"\"\nAnswer using only the provided context.\nGive a compact JSON object with exactly these keys:\ncommercial_start_date\ndeployment_region\naudit_owner\nrollback_phrase\npilot_codename\n\"\"\").strip()\n\n\nprint(\"nContext characters:\", len(context))\nprint(\"Approx words:\", 
len(context.split()))\n\n\nexperiments = []\n\n\nbaseline = generate_once(context, question, press=None, label=\"baseline_no_compression\")\nexperiments.append(baseline)\n\n\npresses = [\n   (\"expected_attention_0.7\", ExpectedAttentionPress(compression_ratio=0.7)),\n   (\"expected_attention_0.5\", ExpectedAttentionPress(compression_ratio=0.5)),\n   (\"knorm_0.5\", KnormPress(compression_ratio=0.5)),\n]\n\n\nfor label, press in presses:\n   try:\n       result = generate_once(context, question, press=press, label=label)\n       experiments.append(result)\n   except Exception as e:\n       experiments.append({\n           \"label\": label,\n           \"elapsed_sec\": None,\n           \"allocated_gb\": None,\n           \"reserved_gb\": None,\n           \"peak_gb\": None,\n           \"answer\": f\"FAILED: {type(e).__name__}: {e}\"\n       })\n\n\ntry:\n   from kvpress import DecodingPress\n   sig = inspect.signature(DecodingPress)\n   kwargs = {\"base_press\": KnormPress()}\n   if \"compression_interval\" in sig.parameters:\n       kwargs[\"compression_interval\"] = 10\n   elif \"compression_steps\" in sig.parameters:\n       kwargs[\"compression_steps\"] = 10\n   if \"target_size\" in sig.parameters:\n       kwargs[\"target_size\"] = 512\n   elif \"token_buffer_size\" in sig.parameters:\n       kwargs[\"token_buffer_size\"] = 512\n   if \"hidden_states_buffer_size\" in sig.parameters:\n       kwargs[\"hidden_states_buffer_size\"] = 0\n   decoding_press = DecodingPress(**kwargs)\n   decoding_result = generate_once(context, question, press=decoding_press, label=\"decoding_knorm\")\n   experiments.append(decoding_result)\nexcept Exception as e:\n   experiments.append({\n       \"label\": \"decoding_knorm\",\n       \"elapsed_sec\": None,\n       \"allocated_gb\": None,\n       \"reserved_gb\": None,\n       \"peak_gb\": None,\n       \"answer\": f\"SKIPPED_OR_FAILED: {type(e).__name__}: {e}\"\n   })<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We assemble the final 
context, define the structured extraction question, and launch the core set of inference experiments. We first run the baseline without compression, then apply multiple press strategies to observe how different compression ratios affect the results. We also conduct a decoding-oriented compression experiment, which extends the tutorial beyond prefilling and provides a broader view of the KVPress framework.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">print(\"\\n\" + \"=\" * 120)\nprint(\"RESULTS\")\nprint(\"=\" * 120)\n\n\nfor r in experiments:\n   print(f\"\\n[{r['label']}]\")\n   print(\"elapsed_sec:\", r[\"elapsed_sec\"])\n   print(\"allocated_gb:\", r[\"allocated_gb\"])\n   print(\"reserved_gb:\", r[\"reserved_gb\"])\n   print(\"peak_gb:\", r[\"peak_gb\"])\n   print(\"answer:\")\n   print(r[\"answer\"])\n\n\nprint(\"\\n\" + \"=\" * 120)\nprint(\"SIMPLE SUMMARY\")\nprint(\"=\" * 120)\n\n\ndef safe_float(x):\n   try:\n       return float(x)\n   except Exception:\n       return None\n\n\nbase_peak = safe_float(baseline[\"peak_gb\"]) if baseline.get(\"peak_gb\") is not None else None\nbase_time = safe_float(baseline[\"elapsed_sec\"]) if baseline.get(\"elapsed_sec\") is not None else None\n\n\nfor r in experiments[1:]:\n   peak = safe_float(r[\"peak_gb\"])\n   t = safe_float(r[\"elapsed_sec\"])\n   peak_delta = None if base_peak is None or peak is None else round(base_peak - peak, 
3)\n   time_delta = None if base_time is None or t is None else round(base_time - t, 2)\n   print({\n       \"label\": r[\"label\"],\n       \"peak_gb_saved_vs_baseline\": peak_delta,\n       \"time_sec_saved_vs_baseline\": time_delta,\n       \"answer_preview\": r[\"answer\"][:180].replace(\"\\n\", \" \")\n   })\n\n\nprint(\"\\n\" + \"=\" * 120)\nprint(\"OPTIONAL NEXT STEPS\")\nprint(\"=\" * 120)\nprint(\"1. Swap MODEL_ID to a stronger long-context instruct model that fits your GPU.\")\nprint(\"2. Increase context length by duplicating records_text more times.\")\nprint(\"3. Try other presses from kvpress, such as SnapKVPress, StreamingLLMPress, QFilterPress, or ChunkKVPress.\")\nprint(\"4. Replace the synthetic corpus with your own long PDF\/text chunks and keep the same evaluation loop.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We print all experiment outputs in a readable format and summarize the runtime and memory differences relative to the baseline. We calculate simple comparison metrics to quickly see how much memory or time each compression strategy saves. We then conclude with suggested next steps to extend the tutorial to stronger models, longer contexts, additional press methods, and real-world document workloads.<\/p>\n<p>In conclusion, we developed a strong practical understanding of how NVIDIA\u2019s KVPress can be used to optimize long-context inference in a realistic Colab-based setting. We did more than simply run a model: we built an end-to-end workflow that installs the framework, loads the pipeline correctly, constructs a meaningful long-context input, applies multiple compression presses, and evaluates the results in terms of answer quality, runtime, and memory behavior. By comparing baseline generation with compressed KV-cache generation, we clearly saw the trade-offs involved. We gained useful intuition about when these methods can help reduce resource pressure without severely harming output fidelity. 
We also explored the framework\u2019s flexibility by testing different press configurations and including an optional decoding-oriented compression path, providing a broader view of how KVPress can be used beyond a single static example.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Data%20Science\/nvidia_kvpress_long_context_kv_cache_compression_tutorial_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">Codes and Notebook here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/an-end-to-end-coding-guide-to-nvidia-kvpress-for-long-context-llm-inference-kv-cache-compression-and-memory-efficient-generation\/\">An End-to-End Coding Guide to NVIDIA KVPress for Long-Context LLM Inference, KV Cache Compression, and Memory-Efficient Generation<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we take a de&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-693","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=693"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/6
93\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}