{"id":982,"date":"2026-05-27T13:24:33","date_gmt":"2026-05-27T05:24:33","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=982"},"modified":"2026-05-27T13:24:33","modified_gmt":"2026-05-27T05:24:33","slug":"memo-a-modular-framework-for-training-a-dedicated-memory-model-on-new-knowledge-without-modifying-llm-parameters","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=982","title":{"rendered":"MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Large language models become static after pretraining. Their knowledge does not update as the world changes. Retraining a full LLM is too expensive at modern scales. Fine-tuning risks degrading previously learned knowledge. Retrieval-augmented generation (RAG) struggles when answers require reasoning across many documents.<\/p>\n<p class=\"wp-block-paragraph\">A team of researchers from the National University of Singapore, MIT CSAIL, A*STAR, and the Singapore-MIT Alliance for Research and Technology (SMART) proposes a new approach called <strong>MEMO (Memory as a Model)<\/strong>.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What Problem Does MEMO Solve?<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Existing methods for integrating new knowledge into LLMs fall into three categories. Non-parametric methods like RAG retrieve documents at inference time. They are sensitive to retrieval noise and struggle with cross-document reasoning. Parametric methods such as continual pretraining or supervised fine-tuning internalize knowledge into model weights. They are computationally expensive and cause <strong>catastrophic forgetting<\/strong>, where new training degrades previously acquired knowledge. Latent memory methods compress knowledge into soft tokens. 
These representations are tightly bound to the model that produced them, a limitation the research team calls <strong>representation coupling<\/strong>, which limits transferability across LLMs.<\/p>\n<h2 class=\"wp-block-heading\"><strong>MEMORY as a Separate Model<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">MEMO separates memory from reasoning. The <strong>MEMORY model<\/strong> is a small, dedicated language model trained to internalize knowledge from a target corpus. The <strong>EXECUTIVE model<\/strong> is the main LLM \u2014 frozen and queried only through its standard input-output interface.<\/p>\n<p class=\"wp-block-paragraph\">In experiments, the MEMORY model is Qwen2.5-14B-Instruct. The EXECUTIVE model is either Qwen2.5-32B-Instruct or Gemini-3-Flash, a proprietary closed-source model. Because MEMO treats the EXECUTIVE model as a black box, it does not require weight access or output logits.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1428\" height=\"718\" data-attachment-id=\"80130\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/26\/memo-a-modular-framework-for-training-a-dedicated-memory-model-on-new-knowledge-without-modifying-llm-parameters\/screenshot-2026-05-26-at-10-24-07-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-26-at-10.24.07-PM-1.png\" data-orig-size=\"1428,718\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;,&quot;alt&quot;:&quot;&quot;}\" 
data-image-title=\"Screenshot 2026-05-26 at 10.24.07\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-26-at-10.24.07-PM-1-1024x515.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-26-at-10.24.07-PM-1.png\" alt=\"\" class=\"wp-image-80130\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2605.15156<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>How the MEMORY Model is Trained<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Training begins with a <strong>five-step data synthesis pipeline<\/strong> guided by a <strong>GENERATOR model<\/strong> \u2014 Qwen2.5-32B-Instruct in experiments. The pipeline converts a raw document corpus into a <strong>reflection QA dataset<\/strong>: question-answer pairs that represent corpus knowledge under diverse query variations.<\/p>\n<p class=\"wp-block-paragraph\"><strong>The five steps are:<\/strong><\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Fact extraction<\/strong> \u2014 direct extraction of explicitly stated facts, and indirect extraction of inferred information, run in parallel per document chunk.<\/li>\n<li><strong>Consolidation<\/strong> \u2014 QA pairs sharing a common context (entity, time period, relationship) are merged into multi-fact pairs.<\/li>\n<li><strong>Verification and rewriting<\/strong> \u2014 each QA pair is checked for self-containment. Pairs with unresolved pronouns or implicit references are rewritten using the source chunk or discarded.<\/li>\n<li><strong>Entity surfacing<\/strong> \u2014 QA pairs are generated where questions encode entity attributes and relationships, and answers reveal entity identities. 
This targets the <strong>reversal curse<\/strong>, where models trained on \u201cA is B\u201d fail to infer \u201cB is A.\u201d<\/li>\n<li><strong>Cross-document synthesis<\/strong> \u2014 the GENERATOR model constructs QA pairs spanning multiple documents. It identifies two types of cross-document connections: converging clues (multiple documents about the same entity) and parallel properties (different entities sharing a common attribute or role).<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">Step 5, cross-document synthesis, is the most critical component. A leave-one-out ablation shows that removing it drops accuracy from 24.00% to 6.37% on NarrativeQA. It is also the dominant source of training pairs in the final dataset.<\/p>\n<p class=\"wp-block-paragraph\">The MEMORY model is then trained via <strong>supervised fine-tuning (SFT)<\/strong>. The loss is computed over answer tokens only. Source documents are never provided at inference. The model must answer from internalized parametric knowledge.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Inference: The Structured Multi-Turn Protocol<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">At inference, the EXECUTIVE model queries the MEMORY model through a <strong>structured multi-turn protocol<\/strong> with three sequential stages.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Stage 1: Grounding.<\/strong> The EXECUTIVE model decomposes the query into atomic sub-questions. Each targets a single identifying constraint. The MEMORY model answers each independently.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Stage 2: Entity identification.<\/strong> Using the grounding responses, the EXECUTIVE model issues targeted follow-up sub-queries. It iteratively narrows down candidate entities until one is confirmed or the stage budget runs out.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Stage 3: Answer seeking and synthesis.<\/strong> Conditioned on the identified entity, the EXECUTIVE model queries the MEMORY model for supporting facts. 
It then synthesizes all retrieved responses into a final answer.<\/p>\n<p class=\"wp-block-paragraph\">The MEMORY model\u2019s responses are compact natural-language snippets. Their length is independent of corpus size, so retrieval cost does not scale with the number of documents. This contrasts with RAG, where inference cost grows with the corpus.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Experimental Results<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">MEMO is evaluated on three benchmarks: <strong>BrowseComp-Plus<\/strong> (multi-hop deep-research), <strong>NarrativeQA<\/strong> (discourse understanding over books and movie scripts), and <strong>MuSiQue<\/strong> (2\u20134 hop reasoning over Wikipedia paragraphs). Baselines include BM25, NV-Embed-V2, HippoRAG2, and Cartridges. Cartridges requires white-box access to the EXECUTIVE model and scored 0.00% on BrowseComp-Plus and 3.75% on NarrativeQA.<\/p>\n<p class=\"wp-block-paragraph\">On NarrativeQA with Gemini-3-Flash, MEMO achieves <strong>53.58%<\/strong>. HippoRAG2 reaches 23.21% on the same setup. On MuSiQue, MEMO achieves <strong>60.20%<\/strong> against HippoRAG2\u2019s 57.00%. On BrowseComp-Plus, MEMO achieves <strong>66.67%<\/strong> against HippoRAG2\u2019s 66.33%.<\/p>\n<p class=\"wp-block-paragraph\">With Qwen2.5-32B-Instruct as the EXECUTIVE model, MEMO achieves 54.22% on BrowseComp-Plus, 26.85% on NarrativeQA, and 48.30% on MuSiQue. Switching to Gemini-3-Flash yields gains of 12.45, 26.73, and 11.90 percentage points on BrowseComp-Plus, NarrativeQA, and MuSiQue, respectively. The MEMORY model is not retrained when the EXECUTIVE model changes.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Robustness to retrieval noise<\/strong>: The research team evaluates performance when distractor documents are added to the corpus. NV-Embed-V2 and HippoRAG2 drop by up to 6.22% on BrowseComp-Plus when one negative document is added per evidence document. 
MEMO\u2019s accuracy on the same benchmark changes by +0.55% \u2014 within one standard deviation.<\/p>\n<p class=\"wp-block-paragraph\"><strong>MEMORY model architecture robustness<\/strong>: The research team also tests three MEMORY model families at similar parameter scale: Qwen2.5-1.5B-Instruct, Gemma3-1B-IT, and LFM2.5-1.2B-Instruct (a hybrid state-space and transformer architecture). Performance is largely consistent across all three, indicating the framework is not sensitive to the specific pretraining lineage of the MEMORY model.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Continual Knowledge Integration via Model Merging<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">MEMO supports incremental knowledge updates through <strong>model merging<\/strong>. When a new corpus arrives, a separate MEMORY model is trained on it independently. Its task vector \u2014 the parameter difference from the base model \u2014 is then merged with the existing MEMORY model in parameter space.<\/p>\n<p class=\"wp-block-paragraph\">The research team tests this on NarrativeQA using <strong>TIES merging<\/strong> (\u03c1=0.3). For K=2 corpora, merging accumulates 48 GPU-hours versus 72 GPU-hours for full retraining \u2014 a 33% reduction. Cumulative merging cost scales as \u0398(K) while full retraining scales as \u0398(K\u00b2); at K=10 this yields a 5.5\u00d7 saving (240 vs. 1,320 GPU-hours).<\/p>\n<p class=\"wp-block-paragraph\">The merged MEMORY model trails full retraining by 11.04 percentage points under Qwen2.5-32B-Instruct (15.81% vs. 26.85%). It trails by 19.11 percentage points under Gemini-3-Flash (34.47% vs. 53.58%). 
Despite this gap, it outperforms all retrieval baselines on NarrativeQA.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<div class=\"mx-bar\">\n    <span class=\"mx-bar-label\">Marktechpost \u2014 Research Explainer<\/span><br \/>\n    <span class=\"mx-bar-title\">MEMO: Memory as a Model<\/span>\n  <\/div>\n<div class=\"mx-overflow\">\n<div class=\"mx-track\">\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">01 \/ 06 \u2014 The Problem<\/span><br \/>\n        <span class=\"mx-h\">LLMs Freeze After Pretraining<\/span><br \/>\n        <span class=\"mx-sub\">Their knowledge becomes outdated as the world evolves.<\/span>\n<p class=\"mx-p\">Large language models are static once pretraining ends. For applications requiring up-to-date or domain-specific knowledge, three approaches exist \u2014 and each has a critical flaw.<\/p>\n<div class=\"mx-hr\"><\/div>\n<div class=\"mx-grid mx-g3\">\n<div class=\"mx-card\"><span class=\"mx-card-icon\">\ud83d\udd0d<\/span><span class=\"mx-card-title\">RAG<\/span><span class=\"mx-card-body\">Sensitive to retrieval noise. Struggles when answers span multiple documents.<\/span><\/div>\n<div class=\"mx-card\"><span class=\"mx-card-icon\">\u2699<\/span><span class=\"mx-card-title\">Fine-Tuning<\/span><span class=\"mx-card-body\">Causes catastrophic forgetting. Expensive. 
Cannot be used on proprietary LLMs.<\/span><\/div>\n<div class=\"mx-card\"><span class=\"mx-card-icon\">\ud83d\udcbe<\/span><span class=\"mx-card-title\">Latent Memory<\/span><span class=\"mx-card-body\">Representations are tightly coupled to one specific model architecture only.<\/span><\/div>\n<\/div>\n<p class=\"mx-p\">MEMO \u2014 Memory as a Model \u2014 from researchers at NUS, MIT CSAIL, and A*STAR addresses all three limitations simultaneously.<\/p>\n<\/div>\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">02 \/ 06 \u2014 The Concept<\/span><br \/>\n        <span class=\"mx-h\">Memory Separated From Reasoning<\/span><br \/>\n        <span class=\"mx-sub\">Two models. One frozen. One trained on new knowledge.<\/span>\n<p class=\"mx-p\">MEMO introduces two distinct model roles that operate together.<\/p>\n<div class=\"mx-grid mx-g2\">\n<div class=\"mx-card\"><span class=\"mx-card-title\">\u25c6 MEMORY Model<\/span><span class=\"mx-card-body\">A small, dedicated language model trained to internalize knowledge from a target corpus. It stores facts and cross-document relationships in its parameters. It never sees source documents at inference \u2014 it answers only from what it has learned.<\/span><\/div>\n<div class=\"mx-card\"><span class=\"mx-card-title\">\u25c7 EXECUTIVE Model<\/span><span class=\"mx-card-body\">The main LLM \u2014 frozen and unchanged throughout. It queries the MEMORY model through targeted sub-questions, reasons over retrieved responses, and produces the final answer. Works with any LLM, including closed-source APIs.<\/span><\/div>\n<\/div>\n<div class=\"mx-hr\"><\/div>\n<p class=\"mx-p\">In experiments: <strong>Qwen2.5-14B-Instruct<\/strong> as MEMORY model. <strong>Qwen2.5-32B-Instruct<\/strong> or <strong>Gemini-3-Flash<\/strong> as EXECUTIVE model. 
Only black-box API access required \u2014 no weights, no logits.<\/p>\n<\/div>\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">03 \/ 06 \u2014 Training<\/span><br \/>\n        <span class=\"mx-h\">How the MEMORY Model Is Built<\/span><br \/>\n        <span class=\"mx-sub\">A five-step pipeline converts raw documents into a reflection QA dataset.<\/span>\n<div class=\"mx-flow\">\n          <span class=\"mx-pill\">Fact Extraction<\/span><span class=\"mx-arr\">\u2192<\/span><br \/>\n          <span class=\"mx-pill\">Consolidation<\/span><span class=\"mx-arr\">\u2192<\/span><br \/>\n          <span class=\"mx-pill\">Verification<\/span><span class=\"mx-arr\">\u2192<\/span><br \/>\n          <span class=\"mx-pill\">Entity Surfacing<\/span><span class=\"mx-arr\">\u2192<\/span><br \/>\n          <span class=\"mx-pill\">Cross-Doc Synthesis<\/span>\n        <\/div>\n<div class=\"mx-steps\">\n<div class=\"mx-step\">\n<div class=\"mx-sn\">01<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Fact Extraction<\/span><span class=\"mx-sd\">Direct extraction of stated facts and indirect extraction of inferred information run in parallel per document chunk.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">02<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Consolidation<\/span><span class=\"mx-sd\">QA pairs sharing a common entity, time period, or relationship are merged into multi-fact pairs.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">03<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Verification &amp; Rewriting<\/span><span class=\"mx-sd\">Each pair is checked for self-containment. 
Pairs with unresolved pronouns or implicit references are rewritten or discarded.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">04<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Entity Surfacing<\/span><span class=\"mx-sd\">QA pairs are generated where questions encode entity attributes and answers reveal identities, targeting the reversal curse.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">05<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Cross-Document Synthesis<\/span><span class=\"mx-sd\">The most critical step. Removing it drops NarrativeQA accuracy from 24.00% to 6.37%. Constructs QA pairs spanning multiple documents via converging clues and parallel properties.<\/span><\/div>\n<\/div><\/div>\n<p class=\"mx-p\">MEMORY model trained via <strong>supervised fine-tuning (SFT)<\/strong> \u2014 loss over answer tokens only. Source documents never provided at inference.<\/p>\n<\/div>\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">04 \/ 06 \u2014 Inference<\/span><br \/>\n        <span class=\"mx-h\">Three-Stage Query Protocol<\/span><br \/>\n        <span class=\"mx-sub\">The EXECUTIVE model queries the MEMORY model through structured sub-questions.<\/span>\n<p class=\"mx-p\">Complex user queries are decomposed across three sequential stages. No documents are retrieved \u2014 all answers come from internalized parametric knowledge.<\/p>\n<div class=\"mx-steps\">\n<div class=\"mx-step\">\n<div class=\"mx-sn\">S1<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Grounding \u2014 Budget: 1 interaction<\/span><span class=\"mx-sd\">The user query is decomposed into atomic sub-questions, each targeting one identifying constraint. 
MEMORY model answers each independently.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">S2<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Entity Identification \u2014 Budget: 7 interactions<\/span><span class=\"mx-sd\">Using grounding responses, the EXECUTIVE model issues follow-up sub-queries to iteratively narrow candidate entities until one is confirmed.<\/span><\/div>\n<\/div>\n<div class=\"mx-step\">\n<div class=\"mx-sn\">S3<\/div>\n<div class=\"mx-sb\"><span class=\"mx-st\">Answer Seeking &amp; Synthesis \u2014 Budget: 8 interactions<\/span><span class=\"mx-sd\">Conditioned on the confirmed entity, the EXECUTIVE model gathers supporting facts then synthesizes all retrieved responses into a final answer.<\/span><\/div>\n<\/div><\/div>\n<div class=\"mx-hr\"><\/div>\n<p class=\"mx-p\">MEMORY model responses are compact natural-language snippets. <strong>Retrieval cost is fixed and does not scale with corpus size<\/strong> \u2014 unlike RAG.<\/p>\n<\/div>\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">05 \/ 06 \u2014 Advantages<\/span><br \/>\n        <span class=\"mx-h\">What MEMO Does Differently<\/span><br \/>\n        <span class=\"mx-sub\">Compared to RAG, fine-tuning, and latent memory methods.<\/span>\n<div class=\"mx-cmp\">\n<div class=\"mx-cc\">\n            <span class=\"mx-ch\">Other Methods<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Retrieval noise significantly degrades RAG accuracy<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Fine-tuning causes catastrophic forgetting in the LLM<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Latent memory tied to one specific model architecture<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Retrieval cost grows with corpus size at inference<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Cannot be used with proprietary closed-source LLMs<\/span><br \/>\n            <span class=\"mx-cr mx-cr-x\">Adding new knowledge 
requires full retraining<\/span>\n          <\/div>\n<div class=\"mx-cc yes\">\n            <span class=\"mx-ch\">MEMO<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">Accuracy changes \u00b11.77% under added distractor documents<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">Main LLM stays frozen; no catastrophic forgetting possible<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">Works across Qwen, Gemma, and LFM2.5 architectures<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">Fixed-size responses; cost independent of corpus size<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">Black-box compatible \u2014 works with any LLM including APIs<\/span><br \/>\n            <span class=\"mx-cr mx-cr-ok\">New corpora merged via model merging without full retraining<\/span>\n          <\/div>\n<\/div>\n<p class=\"mx-p\">TIES merging (\u03c1=0.3) cuts compute by <strong>33% at K=2 corpora<\/strong> and <strong>5.5\u00d7 at K=10 corpora<\/strong> vs full retraining.<\/p>\n<\/div>\n<div class=\"mx-slide\">\n        <span class=\"mx-eyebrow\">06 \/ 06 \u2014 Results<\/span><br \/>\n        <span class=\"mx-h\">Benchmark Performance<\/span><br \/>\n        <span class=\"mx-sub\">Qwen2.5-14B-Instruct as MEMORY model. 
Gemini-3-Flash as EXECUTIVE model.<\/span>\n<div class=\"mx-stats\">\n<div class=\"mx-stat\"><span class=\"mx-sv\">53.58%<\/span><span class=\"mx-sn2\">NarrativeQA<\/span><span class=\"mx-sc\">vs HippoRAG2: 23.21%<\/span><\/div>\n<div class=\"mx-stat\"><span class=\"mx-sv\">60.20%<\/span><span class=\"mx-sn2\">MuSiQue<\/span><span class=\"mx-sc\">vs HippoRAG2: 57.00%<\/span><\/div>\n<div class=\"mx-stat\"><span class=\"mx-sv\">66.67%<\/span><span class=\"mx-sn2\">BrowseComp-Plus<\/span><span class=\"mx-sc\">vs HippoRAG2: 66.33%<\/span><\/div>\n<\/div>\n<div class=\"mx-hr\"><\/div>\n<p class=\"mx-p\">Switching EXECUTIVE model from Qwen2.5-32B-Instruct to Gemini-3-Flash yields gains of <strong>+12.45%<\/strong>, <strong>+26.73%<\/strong>, and <strong>+11.90%<\/strong> across the three benchmarks \u2014 without retraining the MEMORY model.<\/p>\n<p class=\"mx-p\">Under retrieval noise, HippoRAG2 drops 6.22% on BrowseComp-Plus. MEMO changes by <strong>+0.55%<\/strong> on the same benchmark \u2014 within one standard deviation.<\/p>\n<div class=\"mx-hr\"><\/div>\n<p class=\"mx-p\">Source: arXiv 2605.15156 \u2014 Quek, Lee, Leong, Verma et al., NUS \/ MIT CSAIL \/ A*STAR \/ SMART, May 2026.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"mx-foot\">\n    <span class=\"mx-fb\">Marktechpost <span>\u2014 AI Research, Simplified for Engineers<\/span><\/span><br \/>\n    <span class=\"mx-fr\">arXiv: 2605.15156<\/span>\n  <\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>MEMO trains a dedicated MEMORY model on new knowledge, keeping the main LLM frozen and unchanged.<\/li>\n<li>A five-step data synthesis pipeline converts raw 
documents into a reflection QA dataset capturing cross-document relationships.<\/li>\n<li>At inference, a structured multi-turn protocol decomposes complex queries into targeted sub-queries to the MEMORY model.<\/li>\n<li>Retrieval cost is fixed at inference time \u2014 it does not scale with corpus size, unlike RAG.<\/li>\n<li>Model merging cuts cumulative training compute by 33% at K=2 corpora and 5.5\u00d7 at K=10, with a measurable accuracy trade-off.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">\n<\/p><p class=\"wp-block-paragraph\">Check out the <strong><a href=\"https:\/\/arxiv.org\/pdf\/2605.15156\" target=\"_blank\" rel=\"noreferrer noopener\">Research Paper<\/a><\/strong>.<\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/26\/memo-a-modular-framework-for-training-a-dedicated-memory-model-on-new-knowledge-without-modifying-llm-parameters\/\">MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Large language models become s&hellip;<\/p>\n","protected":false},"author":1,"featured_media":983,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-982","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/982","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=982"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/982\
/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/983"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=982"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=982"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=982"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}