{"id":452,"date":"2026-02-23T14:33:53","date_gmt":"2026-02-23T06:33:53","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=452"},"modified":"2026-02-23T14:33:53","modified_gmt":"2026-02-23T06:33:53","slug":"taalas-is-replacing-programmable-gpus-with-hardwired-ai-chips-to-achieve-17000-tokens-per-second-for-ubiquitous-inference","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=452","title":{"rendered":"Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference"},"content":{"rendered":"<p>In the high-stakes world of AI infrastructure, the industry has operated under a singular assumption: flexibility is king. We build general-purpose GPUs because AI models change every week, and we need programmable silicon that can adapt to the next research breakthrough.<\/p>\n<p>But <strong>Taalas<\/strong>, a Toronto-based startup, thinks that flexibility is exactly what\u2019s holding AI back. According to the Taalas team, if we want AI to be as common and cheap as plastic, we have to stop \u2018simulating\u2019 intelligence on general-purpose computers and start \u2018casting\u2019 it directly into silicon.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Problem: The \u2018Memory Wall\u2019 and the GPU Tax<\/strong><\/h3>\n<p>The current cost of running a Large Language Model (LLM) is driven by a physical bottleneck: the <strong>Memory Wall<\/strong>.<\/p>\n<p>Traditional processors (GPUs) are built around an \u2018Instruction Set Architecture\u2019 (ISA): they separate compute from memory. When you run an inference pass on a model like Llama-3, the chip spends the vast majority of its time and energy shuttling weights from High Bandwidth Memory (HBM) to the processing cores. 
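<\/p>
<p>A back-of-envelope sketch makes the bottleneck concrete. The numbers below are rough public ballpark figures for an H100 and a 16-bit 8B-parameter model (illustrative assumptions, not Taalas\u2019s data): each generated token must stream every weight through the cores once, so memory bandwidth, not arithmetic, caps single-user decode speed.<\/p>

```python
# Rough, illustrative figures (public ballpark numbers, not Taalas data):
WEIGHT_BYTES = 8e9 * 2      # 8B parameters at 16-bit precision ~ 16 GB
HBM_BW = 3.35e12            # H100 SXM HBM3 bandwidth, ~3.35 TB/s
PEAK_FLOPS = 1e15           # order of magnitude for H100 dense 16-bit math

# Decoding one token streams every weight from HBM once:
bandwidth_ceiling = HBM_BW / WEIGHT_BYTES      # tokens/s, single user
# ...but needs only ~2 FLOPs per weight per token:
compute_ceiling = PEAK_FLOPS / (2 * 8e9)       # tokens/s

print(f"bandwidth-bound ceiling: ~{bandwidth_ceiling:.0f} tokens/s")
print(f"compute-bound ceiling:   ~{compute_ceiling:.0f} tokens/s")
```

<p>The bandwidth ceiling lands in the same range as the ~150 tokens per second quoted below for a single user, hundreds of times below what the arithmetic units could sustain. Hardwiring the weights removes that fetch entirely.<\/p>
<p>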
This \u2018data movement tax\u2019 accounts for nearly 90% of the power consumption in modern AI data centers.<\/p>\n<p>Taalas\u2019s solution is radical: <strong>eliminate the memory-fetch cycle.<\/strong> By using a proprietary automated design flow, Taalas translates the computational graph of a specific model directly into the physical layout of a chip. In their <strong>HC1<\/strong> (Hardcore 1) chip, the model\u2019s weights and architecture are literally etched into the wiring of the silicon.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"651\" data-attachment-id=\"78048\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/22\/taalas-is-replacing-programmable-gpus-with-hardwired-ai-chips-to-achieve-17000-tokens-per-second-for-ubiquitous-inference\/screenshot-2026-02-22-at-10-32-46-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.32.46-PM.png\" data-orig-size=\"1594,1014\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-22 at 10.32.46\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.32.46-PM-300x191.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.32.46-PM-1024x651.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.32.46-PM-1024x651.png\" alt=\"\" class=\"wp-image-78048\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/taalas.com\/the-path-to-ubiquitous-ai\/<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Hardcore Models: 17,000 Tokens Per Second<\/strong><\/h3>\n<p>The results of this \u2018direct-to-silicon\u2019 approach redefine the performance ceiling for inference. At their latest unveiling, Taalas demonstrated the <strong>HC1<\/strong> running a Llama 3.1 8B model. While a top-tier NVIDIA H100 might serve a single user at ~150 tokens per second, the HC1 serves a staggering <strong>16,000 to 17,000 tokens per second<\/strong>.<\/p>\n<p><strong>This changes the \u2018unit economics\u2019 of AI:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Performance:<\/strong> A single HC1 chip can outperform a small GPU data center in terms of raw throughput for a specific model.<\/li>\n<li><strong>Efficiency:<\/strong> Taalas claims a <strong>1000x improvement<\/strong> in efficiency (performance-per-watt and performance-per-dollar) compared to conventional chips.<\/li>\n<li><strong>Infrastructure:<\/strong> Because the weights are hardwired, there is no need for external HBM or complex liquid cooling systems. A standard air-cooled rack can house ten of these 250W cards, delivering the power of an entire GPU cluster in a single server box.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Breaking the 60-Day Barrier: The Automated Foundry<\/strong><\/h3>\n<p>The obvious \u2018catch\u2019 for an AI developer is flexibility. If you hardwire a model into a chip today, what happens when a better model comes out tomorrow? Historically, designing an ASIC (Application-Specific Integrated Circuit) took two years and tens of millions of dollars.<\/p>\n<p>Taalas has solved this through <strong>automation<\/strong>. They have built a compiler-like foundry system that takes model weights and generates a chip design in roughly a week. 
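<\/p>
<p>Conceptually (this is an analogy, not Taalas\u2019s actual toolchain), the foundry step resembles partial evaluation: once the weights are frozen, they can be compiled into the artifact itself instead of being fetched as data at run time. A toy sketch:<\/p>

```python
# Illustrative analogy ONLY (not Taalas's real flow): "hardwiring" a model
# is akin to partial evaluation -- specializing code to fixed weights so
# they become constants instead of data fetched from memory.

# Generic kernel: weights arrive as run-time data (the GPU/ISA model).
def generic_neuron(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

# "Foundry" step: given frozen weights, emit a specialized function with
# the weights baked into the code itself.
def harden(weights):
    body = " + ".join(f"{w} * x[{i}]" for i, w in enumerate(weights))
    code = f"def hardwired(x):\n    return {body}\n"
    namespace = {}
    exec(code, namespace)    # stand-in for generating a chip layout
    return namespace["hardwired"]

frozen = [2.0, -1.0, 0.5]
neuron = harden(frozen)
print(neuron([1.0, 2.0, 4.0]))   # 2*1 - 1*2 + 0.5*4 = 2.0
```

<p>In Taalas\u2019s flow the \u2018specialized artifact\u2019 is a chip layout rather than generated code, but the trade is the same: flexibility is given up in exchange for weights that cost nothing to fetch.<\/p>
<p>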
By focusing on a streamlined manufacturing workflow\u2014where they only change the top metal masks of the silicon\u2014they have collapsed the turnaround time from \u2018weights-to-silicon\u2019 to just <strong>two months<\/strong>.<\/p>\n<p>This allows for a \u2018seasonal\u2019 hardware cycle. A company could fine-tune a frontier model in the spring and have thousands of specialized, hyper-efficient inference chips deployed by summer.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1930\" height=\"1078\" data-attachment-id=\"78050\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/22\/taalas-is-replacing-programmable-gpus-with-hardwired-ai-chips-to-achieve-17000-tokens-per-second-for-ubiquitous-inference\/screenshot-2026-02-22-at-10-33-16-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.33.16-PM-1.png\" data-orig-size=\"1930,1078\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-22 at 10.33.16\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.33.16-PM-1-300x168.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.33.16-PM-1-1024x572.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-22-at-10.33.16-PM-1.png\" alt=\"\" class=\"wp-image-78050\" \/><figcaption class=\"wp-element-caption\">https:\/\/taalas.com\/the-path-to-ubiquitous-ai\/<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Market Shift: From 
Shovels to Stamps<\/strong><\/h3>\n<p>This transition marks a pivotal moment in the AI hype cycle. We are moving from the \u2018Research &amp; Training\u2019 phase\u2014where GPUs are essential for their flexibility\u2014to the \u2018Deployment &amp; Inference\u2019 phase, where cost-per-token is the only metric that matters.<\/p>\n<p><strong>If Taalas succeeds, the AI market will split into two distinct tiers:<\/strong><\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>General-Purpose Training:<\/strong> Led by NVIDIA and AMD, providing the massive, flexible clusters needed to discover and train new architectures.<\/li>\n<li><strong>Specialized Inference:<\/strong> Led by \u2018foundries\u2019 like Taalas, which take those proven architectures and \u2018print\u2019 them into cheap, ubiquitous silicon for everything from smartphones to industrial sensors.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>The \u2018Hardwired\u2019 Paradigm Shift:<\/strong> Taalas is moving from <strong>software-defined AI<\/strong> (running models on general-purpose GPUs) to <strong>hardware-defined AI<\/strong>. By \u2018baking\u2019 a specific model\u2019s weights and architecture directly into the silicon, they eliminate the need for traditional instruction-set overhead, effectively making the model the processor itself.<\/li>\n<li><strong>Death of the Memory Wall:<\/strong> Traditional AI hardware wastes ~90% of its energy moving data between memory and compute. 
Taalas\u2019s <strong>HC1 (Hardcore 1)<\/strong> chip eliminates the \u201cMemory Wall\u201d by physically wiring the model parameters into the chip\u2019s metal layers, removing the need for expensive High Bandwidth Memory (HBM).<\/li>\n<li><strong>1000x Efficiency Leap:<\/strong> By stripping away the \u2018programmability tax\u2019, Taalas claims a <strong>1,000x improvement<\/strong> in performance-per-watt and performance-per-dollar. In practice, this means an HC1 can hit <strong>17,000 tokens per second<\/strong> on a Llama 3.1 8B model\u2014massively outperforming a standard GPU rack while using far less power.<\/li>\n<li><strong>Automated \u2018Direct-to-Silicon\u2019 Foundry:<\/strong> To solve the problem of model obsolescence, Taalas uses a proprietary <strong>automated design flow<\/strong>. This reduces the time to create a custom AI chip from years to just <strong>weeks<\/strong>, allowing companies to \u2018print\u2019 their fine-tuned models into silicon on a seasonal basis.<\/li>\n<li><strong>The Commodity AI Future:<\/strong> This technology signals a shift from \u2018Cloud-First\u2019 to <strong>\u2018Device-Native\u2019 AI<\/strong>. 
As inference becomes a cheap, hardwired commodity, AI will move off centralized servers and into local, low-power hardware\u2014ranging from smartphones to industrial sensors\u2014with no network latency and no subscription costs.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/taalas.com\/the-path-to-ubiquitous-ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\">Twitter<\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/22\/taalas-is-replacing-programmable-gpus-with-hardwired-ai-chips-to-achieve-17000-tokens-per-second-for-ubiquitous-inference\/\">Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the high-stakes world of AI&hellip;<\/p>\n","protected":false},"author":1,"featured_media":453,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-452","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/452","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=452"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/452\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/453"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=452"}],"wp:term":[{"taxonomy":"category","
embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=452"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=452"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}