{"id":804,"date":"2026-04-28T12:59:49","date_gmt":"2026-04-28T04:59:49","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=804"},"modified":"2026-04-28T12:59:49","modified_gmt":"2026-04-28T04:59:49","slug":"how-to-build-a-lightweight-vision-language-action-inspired-embodied-agent-with-latent-world-modeling-and-model-predictive-control","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=804","title":{"rendered":"How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control"},"content":{"rendered":"<p>In this tutorial, we build an embodied simulation vision agent that learns to perceive, plan, predict, and replan directly from pixel observations. We create a fully NumPy-rendered grid world in which the agent observes RGB frames rather than symbolic state variables, enabling us to simulate a simplified Vision-Language-Action-style pipeline. We train a lightweight world model that encodes visual input into a latent representation, predicts future states conditioned on actions and goals, and reconstructs the next frame. 
Using model predictive control in latent space, we let the agent sample candidate action sequences, score their predicted outcomes, and execute the first action of the best-scoring sequence in a closed loop.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import random, numpy as np, torch, torch.nn as nn, torch.nn.functional as F\nimport matplotlib.pyplot as plt\nfrom dataclasses import dataclass\nfrom typing import Tuple, Dict, List\nfrom torch.utils.data import Dataset, DataLoader\n\n\ntry:\n   from tqdm.auto import tqdm\nexcept Exception:\n   def tqdm(x, **kwargs): return x\n\n\nSEED = 7\nrandom.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)\n# Use the GPU when available; the model and all tensors below are moved to this device.\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n\nif device.type == \"cuda\":\n   torch.backends.cudnn.benchmark = True\n\n\n@dataclass\nclass WorldConfig:\n   grid_size: int = 8\n   cell_px: int = 14\n   max_steps: int = 45\n   n_obstacles: int = 8\n   spawn_margin: int = 1\n\n\nclass GridWorldRGBNoPIL:\n   ACTIONS = {0:(0,-1),1:(0,1),2:(-1,0),3:(1,0),4:(0,0)}\n   ACTION_NAMES = {0:\"UP\",1:\"DOWN\",2:\"LEFT\",3:\"RIGHT\",4:\"STAY\"}\n\n\n   def __init__(self, cfg: WorldConfig):\n       self.cfg = cfg\n       self.reset()\n\n\n   def reset(self) -&gt; Dict:\n       g = self.cfg.grid_size\n       self.steps = 0\n       def sample_empty(exclude=frozenset()):\n           while True:\n               x = random.randint(self.cfg.spawn_margin, g-1-self.cfg.spawn_margin)\n               y = 
random.randint(self.cfg.spawn_margin, g-1-self.cfg.spawn_margin)\n               if (x,y) not in exclude: return (x,y)\n       self.obstacles = set()\n       ax, ay = sample_empty()\n       gx, gy = sample_empty(exclude={(ax,ay)})\n       used = {(ax,ay),(gx,gy)}\n       for _ in range(self.cfg.n_obstacles):\n           ox, oy = sample_empty(exclude=used)\n           self.obstacles.add((ox,oy))\n           used.add((ox,oy))\n       self.agent = (ax,ay)\n       self.goal = (gx,gy)\n       return {\"image\": self._render_u8()}\n\n\n   def _in_bounds(self, x, y):\n       return 0 &lt;= x &lt; self.cfg.grid_size and 0 &lt;= y &lt; self.cfg.grid_size\n\n\n   def _dist_to_goal(self, pos: Tuple[int,int]) -&gt; float:\n       x,y = pos; gx,gy = self.goal\n       return abs(x-gx)+abs(y-gy)\n\n\n   def _state_vector(self) -&gt; np.ndarray:\n       g = self.cfg.grid_size - 1\n       ax,ay = self.agent; gx,gy = self.goal\n       return np.array([ax\/g, ay\/g, gx\/g, gy\/g], dtype=np.float32)\n\n\n   def step(self, action: int):\n       self.steps += 1\n       dx, dy = self.ACTIONS[int(action)]\n       x,y = self.agent\n       nx, ny = x+dx, y+dy\n       if self._in_bounds(nx,ny) and (nx,ny) not in self.obstacles:\n           self.agent = (nx,ny)\n       done = (self.agent == self.goal) or (self.steps &gt;= self.cfg.max_steps)\n       d_prev = self._dist_to_goal((x,y))\n       d_now = self._dist_to_goal(self.agent)\n       reward = 0.1*(d_prev - d_now) + (1.0 if self.agent == self.goal else 0.0)\n       obs = {\"image\": self._render_u8()}\n       info = {\"state\": self._state_vector()}\n       return obs, float(reward), bool(done), info\n\n\n   def _render_u8(self) -&gt; np.ndarray:\n       g, s = self.cfg.grid_size, self.cfg.cell_px\n       H = W = g*s\n       bg = np.array([245,245,245], np.uint8)\n       gridline = np.array([220,220,220], np.uint8)\n       obstacle_c = np.array([220,70,70], np.uint8)\n       goal_c = np.array([60,180,75], np.uint8)\n       agent_c = 
np.array([65,105,225], np.uint8)\n       img = np.empty((H,W,3), np.uint8); img[...] = bg\n       img[::s,:,:] = gridline\n       img[:,::s,:] = gridline\n       def paint_cell(x,y,color):\n           y0,y1 = y*s,(y+1)*s\n           x0,x1 = x*s,(x+1)*s\n           img[y0+1:y1-1, x0+1:x1-1] = color\n       for (ox,oy) in self.obstacles: paint_cell(ox,oy, obstacle_c)\n       gx,gy = self.goal; paint_cell(gx,gy, goal_c)\n       ax,ay = self.agent; paint_cell(ax,ay, agent_c)\n       return img\n\n\ncfg = WorldConfig()\nenv = GridWorldRGBNoPIL(cfg)\nplt.figure(figsize=(3,3))\nplt.imshow(env.reset()[\"image\"]); plt.axis(\"off\"); plt.title(\"No-Pillow observation\"); plt.show()\n\n\ndef to_tensor_img_u8(img_u8: np.ndarray) -&gt; torch.Tensor:\n   return torch.from_numpy(img_u8).permute(2,0,1).float() \/ 255.0<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We fix the random seeds for reproducibility, define the lightweight grid-world configuration, and initialize the environment. We implement a fully NumPy-based RGB renderer so that the agent perceives raw pixel observations without relying on Pillow or another rendering library. 
We also define the state transition dynamics and prepare image-to-tensor conversion for model training.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class TransitionDataset(Dataset):\n   def __init__(self, items): self.items = items\n   def __len__(self): return len(self.items)\n   def __getitem__(self, i): return self.items[i]\n\n\ndef collect_transitions(n_episodes=120):\n   items = []\n   e = GridWorldRGBNoPIL(cfg)\n   for _ in tqdm(range(n_episodes), desc=\"Collect\"):\n       obs = e.reset()\n       img_t = to_tensor_img_u8(obs[\"image\"])\n       for _ in range(cfg.max_steps):\n           a = random.randint(0,4)\n           obs2, r, done, info = e.step(a)\n           img_tp1 = to_tensor_img_u8(obs2[\"image\"])\n           st = torch.from_numpy(info[\"state\"]).float()\n           goal = st[2:4].clone()\n           items.append({\n               \"img_t\": img_t,\n               \"action\": torch.tensor(a, dtype=torch.long),\n               \"img_tp1\": img_tp1,\n               \"state_tp1\": st,\n               \"goal\": goal\n           })\n           img_t = img_tp1\n           if done: break\n   return items\n\n\nitems = collect_transitions(n_episodes=120)\nprint(\"Transitions:\", len(items))\nH, W = items[0][\"img_t\"].shape[1], items[0][\"img_t\"].shape[2]\ndl = DataLoader(TransitionDataset(items), batch_size=64, shuffle=True, num_workers=0, 
drop_last=True)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We collect rollout data by allowing the agent to interact randomly with the environment. We construct transitions that map the current image and action to the next image and state representation. We then wrap this data into a PyTorch Dataset and DataLoader to enable efficient mini-batch training.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class Encoder(nn.Module):\n   def __init__(self, H, W, zdim=64):\n       super().__init__()\n       self.net = nn.Sequential(\n           nn.Conv2d(3, 24, 5, stride=2, padding=2), nn.ReLU(),\n           nn.Conv2d(24, 48, 5, stride=2, padding=2), nn.ReLU(),\n           nn.Conv2d(48, 64, 3, stride=2, padding=1), nn.ReLU(),\n       )\n       with torch.no_grad():\n           f = self.net(torch.zeros(1,3,H,W))\n       self.feat_shape = f.shape[1:]\n       self.fc = nn.Linear(int(np.prod(self.feat_shape)), zdim)\n   def forward(self, x):\n       return self.fc(self.net(x).flatten(1))\n\n\nclass Decoder(nn.Module):\n   def __init__(self, feat_shape, zdim=64):\n       super().__init__()\n       C,h,w = feat_shape\n       self.C,self.h,self.w = C,h,w\n       self.fc = nn.Linear(zdim, C*h*w)\n       self.net = nn.Sequential(\n           nn.ConvTranspose2d(C, 48, 4, stride=2, padding=1), nn.ReLU(),\n           nn.ConvTranspose2d(48, 24, 4, stride=2, padding=1), nn.ReLU(),\n           
nn.ConvTranspose2d(24, 16, 4, stride=2, padding=1), nn.ReLU(),\n           nn.Conv2d(16, 3, 3, padding=1),\n           nn.Sigmoid()\n       )\n   def forward(self, z):\n       x = self.fc(z).view(z.size(0), self.C, self.h, self.w)\n       return self.net(x)\n\n\nclass VLASimLite(nn.Module):\n   def __init__(self, H, W, zdim=64, adim=5):\n       super().__init__()\n       self.enc = Encoder(H,W,zdim)\n       self.dec = Decoder(self.enc.feat_shape, zdim)\n       self.aemb = nn.Embedding(adim, 16)\n       self.gnet = nn.Sequential(nn.Linear(2,16), nn.ReLU(), nn.Linear(16,16))\n       self.dyn = nn.Sequential(\n           nn.Linear(zdim+16+16, 128), nn.ReLU(),\n           nn.Linear(128, zdim)\n       )\n       self.state = nn.Sequential(\n           nn.Linear(zdim, 64), nn.ReLU(),\n           nn.Linear(64, 4),\n           nn.Sigmoid()\n       )\n   def encode(self, img): return self.enc(img)\n   def predict_next_latent(self, z, a, goal):\n       return self.dyn(torch.cat([z, self.aemb(a), self.gnet(goal)], dim=-1))\n   def decode(self, z): return self.dec(z)\n   def forward(self, img_t, a, goal):\n       z = self.encode(img_t)\n       z_next = self.predict_next_latent(z, a, goal)\n       return z_next, self.decode(z_next), self.state(z_next)\n\n\nmodel = VLASimLite(H,W,zdim=64,adim=5).to(device)\nopt = torch.optim.Adam(model.parameters(), lr=2e-3)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the compact Vision-Language-Action-inspired world model. We build a CNN encoder to compress visual input into a latent space and condition latent dynamics on actions and goals. 
We also add a decoder and a state-prediction head so the model can reconstruct the next frame and predict the structured state, namely the normalized agent and goal coordinates.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def train(epochs=4):\n   model.train()\n   for ep in range(1, epochs+1):\n       losses = []\n       for b in tqdm(dl, desc=f\"Train {ep}\/{epochs}\"):\n           img_t = b[\"img_t\"].to(device)\n           a = b[\"action\"].to(device)\n           img_tp1 = b[\"img_tp1\"].to(device)\n           st_tp1 = b[\"state_tp1\"].to(device)\n           goal = b[\"goal\"].to(device)\n           z_next, img_pred, st_pred = model(img_t, a, goal)\n           loss = F.l1_loss(img_pred, img_tp1) + 3.0*F.mse_loss(st_pred, st_tp1) + 1e-4*z_next.pow(2).mean()\n           opt.zero_grad(set_to_none=True)\n           loss.backward()\n           nn.utils.clip_grad_norm_(model.parameters(), 2.0)\n           opt.step()\n           losses.append(loss.item())\n       print(\"Epoch\", ep, \"loss\", float(np.mean(losses)))\n\n\ntrain(epochs=4)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We train the world model with three loss terms: an L1 reconstruction loss on the next frame, an MSE loss on the predicted state weighted by 3.0, and a small L2 penalty (1e-4) on the predicted latent. Because every term is computed from the predicted next latent, the gradients train the encoder and the latent dynamics jointly to make one-step forward predictions from pixels. 
We keep the architecture lightweight and training stable to ensure smooth execution in constrained runtimes.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">@torch.no_grad()\ndef mpc_action(img_t, horizon=6, n_candidates=120, action_space=5):\n   model.eval()\n   z = model.encode(img_t)\n   st_now = model.state(z)\n   goal = st_now[:,2:4].clamp(0,1)\n   cand = torch.randint(0, action_space, (n_candidates, horizon), device=device)\n   z_roll = z.repeat(n_candidates, 1)\n   goal_k = goal.repeat(n_candidates, 1)\n   for t in range(horizon):\n       z_roll = model.predict_next_latent(z_roll, cand[:,t], goal_k)\n   stT = model.state(z_roll)\n   dist = torch.abs(stT[:,0:2] - stT[:,2:4]).sum(dim=-1)\n   changes = (cand[:,1:] != cand[:,:-1]).float().mean(dim=1)\n   score = dist + 0.12*changes\n   best = torch.argmin(score)\n   return int(cand[best,0].item())\n\n\n@torch.no_grad()\ndef predict_next_frame(img_u8, action):\n   model.eval()\n   img_t = to_tensor_img_u8(img_u8).unsqueeze(0).to(device)\n   z = model.encode(img_t)\n   goal = model.state(z)[:,2:4].clamp(0,1)\n   a = torch.tensor([action], dtype=torch.long, device=device)\n   z_next = model.predict_next_latent(z, a, goal)\n   pred = model.decode(z_next)[0].detach().cpu().permute(1,2,0).numpy()\n   return (pred*255.0).clip(0,255).astype(np.uint8)\n\n\ndef run_episode(max_steps=45):\n   e = GridWorldRGBNoPIL(cfg)\n   obs = 
e.reset()\n   real, pred, acts, rews = [], [], [], []\n   for _ in range(max_steps):\n       img = obs[\"image\"]\n       real.append(img)\n       a = mpc_action(to_tensor_img_u8(img).unsqueeze(0).to(device), horizon=6, n_candidates=120)\n       pred.append(predict_next_frame(img, a))\n       obs, r, done, info = e.step(a)\n       acts.append(a); rews.append(r)\n       if done:\n           real.append(obs[\"image\"])\n           pred.append(pred[-1])\n           break\n   return real, pred, acts, rews\n\n\nreal, pred, acts, rews = run_episode()\nprint(\"Steps:\", len(acts), \"Return:\", round(sum(rews), 3))\n\n\ndef show(real, pred, acts, every=2, panels=8):\n   idxs = list(range(0, min(len(acts), every*panels), every))\n   n = len(idxs)\n   plt.figure(figsize=(2.4*n, 4.8))\n   for j,i in enumerate(idxs):\n       plt.subplot(2,n,j+1); plt.imshow(real[i]); plt.axis(\"off\"); plt.title(f\"Real t={i}\")\n       plt.subplot(2,n,n+j+1); plt.imshow(pred[i]); plt.axis(\"off\"); plt.title(f\"Pred | {GridWorldRGBNoPIL.ACTION_NAMES[acts[i]]}\")\n   plt.tight_layout(); plt.show()\n\n\nshow(real, pred, acts, every=2, panels=8)\nprint(\"Pipeline OK\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We implement model predictive control directly in latent space. We sample 120 random action sequences of horizon 6, roll them forward through the learned dynamics, and score each by the predicted Manhattan distance between agent and goal at the final step plus a small penalty for switching actions. We execute only the first action of the best-scoring sequence and replan at the next step. We then run the full perception\u2013plan\u2013predict\u2013replan loop and visualize the model\u2019s predicted next frames alongside the real observations from the environment.<\/p>\n<p>In conclusion, we implemented a complete perception\u2013planning\u2013prediction loop without relying on external rendering libraries. We trained a compact vision-based world model, used its latent dynamics for forward simulation, and replanned at every step using MPC. 
By keeping the architecture lightweight and stable for constrained runtimes, we demonstrated how embodied agents can reason about future outcomes directly from visual inputs. This approach captures the core idea behind modern Vision-Language-Action systems, where perception and decision-making are tightly integrated within a predictive model of the environment.<\/p>\n<hr class=\"wp-block-separator aligncenter has-alpha-channel-opacity is-style-wide\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Agents-Projects-Tutorials\/blob\/main\/Computer%20Vision\/embodied_vla_latent_mpc_agent_marktechpost.py\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/27\/how-to-build-a-lightweight-vision-language-action-inspired-embodied-agent-with-latent-world-modeling-and-model-predictive-control\/\">How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-804","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/804","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=804"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/8
04\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=804"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=804"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=804"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}