{"id":667,"date":"2026-04-04T11:06:56","date_gmt":"2026-04-04T03:06:56","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=667"},"modified":"2026-04-04T11:06:56","modified_gmt":"2026-04-04T03:06:56","slug":"how-to-build-production-ready-agentic-systems-with-z-ai-glm-5-using-thinking-mode-tool-calling-streaming-and-multi-turn-workflows","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=667","title":{"rendered":"How to Build Production-Ready Agentic Systems with Z.AI GLM-5 Using Thinking Mode, Tool Calling, Streaming, and Multi-Turn Workflows"},"content":{"rendered":"<p>In this tutorial, we explore the full capabilities of <a href=\"http:\/\/z.ai\/\"><strong>Z.AI\u2019s GLM-5<\/strong><\/a><strong> <\/strong>model and build a complete understanding of how to use it for real-world, agentic applications. We start from the fundamentals by setting up the environment using the Z.AI SDK and its OpenAI-compatible interface, and then progressively move on to advanced features such as streaming responses, thinking mode for deeper reasoning, and multi-turn conversations. As we continue, we integrate function calling, structured outputs, and eventually construct a fully functional multi-tool agent powered by GLM-5. 
We also examine each capability in isolation and see how Z.AI\u2019s ecosystem enables us to build scalable, production-ready AI systems.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">!pip install -q zai-sdk openai rich\n\n\nimport os\nimport json\nimport time\nfrom datetime import datetime\nfrom typing import Optional\nimport getpass\n\n\nAPI_KEY = os.environ.get(\"ZAI_API_KEY\")\n\n\nif not API_KEY:\n   API_KEY = getpass.getpass(\"\ud83d\udd11 Enter your Z.AI API key (hidden input): \").strip()\n\n\nif not API_KEY:\n   raise ValueError(\n       \"\u274c No API key provided! 
Get one free at: https:\/\/z.ai\/manage-apikey\/apikey-list\"\n   )\n\n\nos.environ[\"ZAI_API_KEY\"] = API_KEY\nprint(f\"\u2705 API key configured (ends with ...{API_KEY[-4:]})\")\n\n\nfrom zai import ZaiClient\n\n\nclient = ZaiClient(api_key=API_KEY)\nprint(\"\u2705 ZaiClient initialized \u2014 ready to use GLM-5!\")\n\n\n\n\nprint(\"\\n\" + \"=\" * 70)\nprint(\"\ud83d\udcdd SECTION 2: Basic Chat Completion\")\nprint(\"=\" * 70)\n\n\nresponse = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\"role\": \"system\", \"content\": \"You are a concise, expert software architect.\"},\n       {\"role\": \"user\", \"content\": \"Explain the Mixture-of-Experts architecture in 3 sentences.\"},\n   ],\n   max_tokens=256,\n   temperature=0.7,\n)\n\n\nprint(\"\\n\ud83e\udd16 GLM-5 Response:\")\nprint(response.choices[0].message.content)\nprint(f\"\\n\ud83d\udcca Usage: {response.usage.prompt_tokens} prompt + {response.usage.completion_tokens} completion tokens\")\n\n\n\n\nprint(\"\\n\" + \"=\" * 70)\nprint(\"\ud83c\udf0a SECTION 3: Streaming Responses\")\nprint(\"=\" * 70)\n\n\nprint(\"\\n\ud83e\udd16 GLM-5 (streaming): \", end=\"\", flush=True)\n\n\nstream = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\"role\": \"user\", \"content\": \"Write a Python one-liner that checks if a number is prime.\"},\n   ],\n   stream=True,\n   max_tokens=512,\n   temperature=0.6,\n)\n\n\nfull_response = \"\"\nfor chunk in stream:\n   delta = chunk.choices[0].delta\n   if delta.content:\n       print(delta.content, end=\"\", flush=True)\n       full_response += delta.content\n\n\nprint(f\"\\n\\n\ud83d\udcca Streamed {len(full_response)} characters\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We begin by installing the Z.AI and OpenAI SDKs, then securely capture our API key through hidden terminal input using getpass. We initialize the ZaiClient and fire off our first basic chat completion to GLM-5, asking it to explain the Mixture-of-Experts architecture. 
We then explore streaming responses, watching tokens arrive in real time as GLM-5 generates a Python one-liner for prime checking.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">print(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9e0.png\" alt=\"\ud83e\udde0\" class=\"wp-smiley\" \/> SECTION 4: Thinking Mode (Chain-of-Thought)\")\nprint(\"=\" * 70)\nprint(\"GLM-5 can expose its internal reasoning before giving a final answer.\")\nprint(\"This is especially powerful for math, logic, and complex coding tasks.n\")\n\n\nprint(\"\u2500\u2500\u2500 Thinking Mode + Streaming \u2500\u2500\u2500n\")\n\n\nstream = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\n           \"role\": \"user\",\n           \"content\": (\n               \"A farmer has 17 sheep. All but 9 run away. \"\n               \"How many sheep does the farmer have left? 
\"\n               \"Think carefully before answering.\"\n           ),\n       },\n   ],\n   thinking={\"type\": \"enabled\"},\n   stream=True,\n   max_tokens=2048,\n   temperature=0.6,\n)\n\n\nreasoning_text = \"\"\nanswer_text = \"\"\n\n\nfor chunk in stream:\n   delta = chunk.choices[0].delta\n   if hasattr(delta, \"reasoning_content\") and delta.reasoning_content:\n       if not reasoning_text:\n           print(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ad.png\" alt=\"\ud83d\udcad\" class=\"wp-smiley\" \/> Reasoning:\")\n       print(delta.reasoning_content, end=\"\", flush=True)\n       reasoning_text += delta.reasoning_content\n   if delta.content:\n       if not answer_text and reasoning_text:\n           print(\"nn<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Final Answer:\")\n       print(delta.content, end=\"\", flush=True)\n       answer_text += delta.content\n\n\nprint(f\"nn<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Reasoning: {len(reasoning_text)} chars | Answer: {len(answer_text)} chars\")\n\n\n\n\nprint(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ac.png\" alt=\"\ud83d\udcac\" class=\"wp-smiley\" \/> SECTION 5: Multi-Turn Conversation\")\nprint(\"=\" * 70)\n\n\nmessages = [\n   {\"role\": \"system\", \"content\": \"You are a senior Python developer. 
Be concise.\"},\n   {\"role\": \"user\", \"content\": \"What's the difference between a list and a tuple in Python?\"},\n]\n\n\nr1 = client.chat.completions.create(model=\"glm-5\", messages=messages, max_tokens=512, temperature=0.7)\nassistant_reply_1 = r1.choices[0].message.content\nmessages.append({\"role\": \"assistant\", \"content\": assistant_reply_1})\nprint(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9d1.png\" alt=\"\ud83e\uddd1\" class=\"wp-smiley\" \/> User: {messages[1]['content']}\")\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> GLM-5: {assistant_reply_1[:200]}...\")\n\n\nmessages.append({\"role\": \"user\", \"content\": \"When should I use a NamedTuple instead?\"})\nr2 = client.chat.completions.create(model=\"glm-5\", messages=messages, max_tokens=512, temperature=0.7)\nassistant_reply_2 = r2.choices[0].message.content\nprint(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9d1.png\" alt=\"\ud83e\uddd1\" class=\"wp-smiley\" \/> User: {messages[-1]['content']}\")\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> GLM-5: {assistant_reply_2[:200]}...\")\n\n\nmessages.append({\"role\": \"assistant\", \"content\": assistant_reply_2})\nmessages.append({\"role\": \"user\", \"content\": \"Show me a practical example with type hints.\"})\nr3 = client.chat.completions.create(model=\"glm-5\", messages=messages, max_tokens=1024, temperature=0.7)\nassistant_reply_3 = r3.choices[0].message.content\nprint(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9d1.png\" alt=\"\ud83e\uddd1\" class=\"wp-smiley\" \/> User: {messages[-1]['content']}\")\nprint(f\"<img decoding=\"async\" 
src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> GLM-5: {assistant_reply_3[:300]}...\")\n\n\nprint(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Conversation: {len(messages)+1} messages, {r3.usage.total_tokens} total tokens in last call\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We activate GLM-5\u2019s thinking mode to observe its internal chain-of-thought reasoning streamed live through the reasoning_content field before the final answer appears. We then build a multi-turn conversation where we ask about Python lists vs tuples, follow up on NamedTuples, and request a practical example with type hints, all while GLM-5 maintains full context across turns. We track how the conversation grows in message count and token usage with each successive exchange.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">print(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f527.png\" alt=\"\ud83d\udd27\" class=\"wp-smiley\" \/> SECTION 6: Function Calling (Tool Use)\")\nprint(\"=\" * 70)\nprint(\"GLM-5 can decide WHEN and HOW to call external functions you define.n\")\n\n\ntools = [\n   {\n       \"type\": \"function\",\n       \"function\": {\n           \"parameters\": 
{\n               \"type\": \"object\",\n               \"properties\": {\n                   \"city\": {\n                       \"type\": \"string\",\n                       \"description\": \"City name, e.g. 'San Francisco', 'Tokyo'\",\n                   },\n                   \"unit\": {\n                       \"type\": \"string\",\n                       \"enum\": [\"celsius\", \"fahrenheit\"],\n                       \"description\": \"Temperature unit (default: celsius)\",\n                   },\n               },\n               \"required\": [\"city\"],\n           },\n       },\n   },\n   {\n       \"type\": \"function\",\n       \"function\": {\n           \"name\": \"calculate\",\n           \"description\": \"Evaluate a mathematical expression safely\",\n           \"parameters\": {\n               \"type\": \"object\",\n               \"properties\": {\n                   \"expression\": {\n                       \"type\": \"string\",\n                       \"description\": \"Math expression, e.g. 
'2**10 + 3*7'\",\n                   }\n               },\n               \"required\": [\"expression\"],\n           },\n       },\n   },\n]\n\n\n\n\ndef get_weather(city: str, unit: str = \"celsius\") -&gt; dict:\n   weather_db = {\n       \"san francisco\": {\"temp\": 18, \"condition\": \"Foggy\", \"humidity\": 78},\n       \"tokyo\": {\"temp\": 28, \"condition\": \"Sunny\", \"humidity\": 55},\n       \"london\": {\"temp\": 14, \"condition\": \"Rainy\", \"humidity\": 85},\n       \"new york\": {\"temp\": 22, \"condition\": \"Partly Cloudy\", \"humidity\": 60},\n   }\n   data = weather_db.get(city.lower(), {\"temp\": 20, \"condition\": \"Clear\", \"humidity\": 50})\n   if unit == \"fahrenheit\":\n       data[\"temp\"] = round(data[\"temp\"] * 9 \/ 5 + 32)\n   return {\"city\": city, \"unit\": unit or \"celsius\", **data}\n\n\n\n\ndef calculate(expression: str) -&gt; dict:\n   allowed = set(\"0123456789+-*\/.()% \")\n   if not all(c in allowed for c in expression):\n       return {\"error\": \"Invalid characters in expression\"}\n   try:\n       result = eval(expression)\n       return {\"expression\": expression, \"result\": result}\n   except Exception as e:\n       return {\"error\": str(e)}\n\n\n\n\nTOOL_REGISTRY = {\"get_weather\": get_weather, \"calculate\": calculate}\n\n\n\n\ndef run_tool_call(user_message: str):\n   print(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9d1.png\" alt=\"\ud83e\uddd1\" class=\"wp-smiley\" \/> User: {user_message}\")\n   messages = [{\"role\": \"user\", \"content\": user_message}]\n\n\n   response = client.chat.completions.create(\n       model=\"glm-5\",\n       messages=messages,\n       tools=tools,\n       tool_choice=\"auto\",\n       max_tokens=1024,\n   )\n\n\n   assistant_msg = response.choices[0].message\n   messages.append(assistant_msg.model_dump())\n\n\n   if assistant_msg.tool_calls:\n       for tc in assistant_msg.tool_calls:\n           fn_name = tc.function.name\n     
      fn_args = json.loads(tc.function.arguments)\n           print(f\"   <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f527.png\" alt=\"\ud83d\udd27\" class=\"wp-smiley\" \/> Tool call: {fn_name}({fn_args})\")\n\n\n           result = TOOL_REGISTRY[fn_name](**fn_args)\n           print(f\"   <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4e6.png\" alt=\"\ud83d\udce6\" class=\"wp-smiley\" \/> Result: {result}\")\n\n\n           messages.append({\n               \"role\": \"tool\",\n               \"content\": json.dumps(result, ensure_ascii=False),\n               \"tool_call_id\": tc.id,\n           })\n\n\n       final = client.chat.completions.create(\n           model=\"glm-5\",\n           messages=messages,\n           tools=tools,\n           max_tokens=1024,\n       )\n       print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> GLM-5: {final.choices[0].message.content}\")\n   else:\n       print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> GLM-5: {assistant_msg.content}\")\n\n\n\n\nrun_tool_call(\"What's the weather like in Tokyo right now?\")\nrun_tool_call(\"What is 2^20 + 3^10 - 1024?\")\nrun_tool_call(\"Compare the weather in San Francisco and London, and calculate the temperature difference.\")\n\n\n\n\nprint(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4cb.png\" alt=\"\ud83d\udccb\" class=\"wp-smiley\" \/> SECTION 7: Structured JSON Output\")\nprint(\"=\" * 70)\nprint(\"Force GLM-5 to return well-structured JSON for downstream processing.n\")\n\n\nresponse = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\n           \"role\": \"system\",\n           \"content\": (\n              
 \"You are a data extraction assistant. \"\n               \"Always respond with valid JSON only \u2014 no markdown, no explanation.\"\n           ),\n       },\n       {\n           \"role\": \"user\",\n           \"content\": (\n               \"Extract structured data from this text:nn\"\n               '\"Acme Corp reported Q3 2025 revenue of $4.2B, up 18% YoY. '\n               \"Net income was $890M. The company announced 3 new products \"\n               \"and plans to expand into 5 new markets by 2026. CEO Jane Smith \"\n               'said she expects 25% growth next year.\"nn'\n               \"Return JSON with keys: company, quarter, revenue, revenue_growth, \"\n               \"net_income, new_products, new_markets, ceo, growth_forecast\"\n           ),\n       },\n   ],\n   max_tokens=512,\n   temperature=0.1,\n)\n\n\nraw_output = response.choices[0].message.content\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c4.png\" alt=\"\ud83d\udcc4\" class=\"wp-smiley\" \/> Raw output:\")\nprint(raw_output)\n\n\ntry:\n   clean = raw_output.strip()\n   if clean.startswith(\"```\"):\n       clean = clean.split(\"n\", 1)[1].rsplit(\"```\", 1)[0]\n   parsed = json.loads(clean)\n   print(\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Parsed JSON:\")\n   print(json.dumps(parsed, indent=2))\nexcept json.JSONDecodeError as e:\n   print(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/26a0.png\" alt=\"\u26a0\" class=\"wp-smiley\" \/> JSON parsing failed: {e}\")\n   print(\"Tip: You can add response_format={'type': 'json_object'} for stricter enforcement\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define two tools, a weather lookup and a math calculator, then let GLM-5 autonomously decide when to invoke them based on the user\u2019s natural language query. 
We run a complete tool-calling round-trip: the model selects the function, we execute it locally, feed the result back, and GLM-5 synthesizes a final human-readable answer. We then switch to structured output, prompting GLM-5 to extract financial data from raw text into clean, parseable JSON.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">print(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> SECTION 8: Multi-Tool Agentic Loop\")\nprint(\"=\" * 70)\nprint(\"Build a complete agent that can use multiple tools across turns.n\")\n\n\n\n\nclass GLM5Agent:\n\n\n   def __init__(self, system_prompt: str, tools: list, tool_registry: dict):\n       self.client = ZaiClient(api_key=API_KEY)\n       self.messages = [{\"role\": \"system\", \"content\": system_prompt}]\n       self.tools = tools\n       self.registry = tool_registry\n       self.max_iterations = 5\n\n\n   def chat(self, user_input: str) -&gt; str:\n       self.messages.append({\"role\": \"user\", \"content\": user_input})\n\n\n       for iteration in range(self.max_iterations):\n           response = self.client.chat.completions.create(\n               model=\"glm-5\",\n               messages=self.messages,\n               tools=self.tools,\n               tool_choice=\"auto\",\n               
max_tokens=2048,\n               temperature=0.6,\n           )\n\n\n           msg = response.choices[0].message\n           self.messages.append(msg.model_dump())\n\n\n           if not msg.tool_calls:\n               return msg.content\n\n\n           for tc in msg.tool_calls:\n               fn_name = tc.function.name\n               fn_args = json.loads(tc.function.arguments)\n               print(f\"   <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f527.png\" alt=\"\ud83d\udd27\" class=\"wp-smiley\" \/> [{iteration+1}] {fn_name}({fn_args})\")\n\n\n               if fn_name in self.registry:\n                   result = self.registry[fn_name](**fn_args)\n               else:\n                   result = {\"error\": f\"Unknown function: {fn_name}\"}\n\n\n               self.messages.append({\n                   \"role\": \"tool\",\n                   \"content\": json.dumps(result, ensure_ascii=False),\n                   \"tool_call_id\": tc.id,\n               })\n\n\n       return \"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/26a0.png\" alt=\"\u26a0\" class=\"wp-smiley\" \/> Agent reached maximum iterations without a final answer.\"\n\n\n\n\nextended_tools = tools + [\n   {\n       \"type\": \"function\",\n       \"function\": {\n           \"name\": \"get_current_time\",\n           \"description\": \"Get the current date and time in ISO format\",\n           \"parameters\": {\n               \"type\": \"object\",\n               \"properties\": {},\n               \"required\": [],\n           },\n       },\n   },\n   {\n       \"type\": \"function\",\n       \"function\": {\n           \"name\": \"unit_converter\",\n           \"description\": \"Convert between units (length, weight, temperature)\",\n           \"parameters\": {\n               \"type\": \"object\",\n               \"properties\": {\n                   \"value\": {\"type\": \"number\", \"description\": 
\"Numeric value to convert\"},\n                   \"from_unit\": {\"type\": \"string\", \"description\": \"Source unit (e.g., 'km', 'miles', 'kg', 'lbs', 'celsius', 'fahrenheit')\"},\n                   \"to_unit\": {\"type\": \"string\", \"description\": \"Target unit\"},\n               },\n               \"required\": [\"value\", \"from_unit\", \"to_unit\"],\n           },\n       },\n   },\n]\n\n\n\n\ndef get_current_time() -&gt; dict:\n   return {\"datetime\": datetime.now().isoformat(), \"timezone\": \"UTC\"}\n\n\n\n\ndef unit_converter(value: float, from_unit: str, to_unit: str) -&gt; dict:\n   conversions = {\n       (\"km\", \"miles\"): lambda v: v * 0.621371,\n       (\"miles\", \"km\"): lambda v: v * 1.60934,\n       (\"kg\", \"lbs\"): lambda v: v * 2.20462,\n       (\"lbs\", \"kg\"): lambda v: v * 0.453592,\n       (\"celsius\", \"fahrenheit\"): lambda v: v * 9 \/ 5 + 32,\n       (\"fahrenheit\", \"celsius\"): lambda v: (v - 32) * 5 \/ 9,\n       (\"meters\", \"feet\"): lambda v: v * 3.28084,\n       (\"feet\", \"meters\"): lambda v: v * 0.3048,\n   }\n   key = (from_unit.lower(), to_unit.lower())\n   if key in conversions:\n       result = round(conversions[key](value), 4)\n       return {\"value\": value, \"from\": from_unit, \"to\": to_unit, \"result\": result}\n   return {\"error\": f\"Conversion {from_unit} \u2192 {to_unit} not supported\"}\n\n\n\n\nextended_registry = {\n   **TOOL_REGISTRY,\n   \"get_current_time\": get_current_time,\n   \"unit_converter\": unit_converter,\n}\n\n\nagent = GLM5Agent(\n   system_prompt=(\n       \"You are a helpful assistant with access to weather, math, time, and \"\n       \"unit conversion tools. Use them whenever they can help answer the user's \"\n       \"question accurately. 
Always show your work.\"\n   ),\n   tools=extended_tools,\n   tool_registry=extended_registry,\n)\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f9d1.png\" alt=\"\ud83e\uddd1\" class=\"wp-smiley\" \/> User: What time is it? Also, if it's 28\u00b0C in Tokyo, what's that in Fahrenheit?\")\nprint(\"   And what's 2^16?\")\nresult = agent.chat(\n   \"What time is it? Also, if it's 28\u00b0C in Tokyo, what's that in Fahrenheit? \"\n   \"And what's 2^16?\"\n)\nprint(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> Agent: {result}\")\n\n\n\n\nprint(\"n\" + \"=\" * 70)\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2696.png\" alt=\"\u2696\" class=\"wp-smiley\" \/>  SECTION 9: Thinking Mode ON vs OFF Comparison\")\nprint(\"=\" * 70)\nprint(\"See how thinking mode improves accuracy on a tricky logic problem.n\")\n\n\ntricky_question = (\n   \"I have 12 coins. One of them is counterfeit and weighs differently than the rest. 
Using a balance scale at most 3 times, how can I find the counterfeit coin and determine whether it is heavier or lighter? 
\"\n)\n\n\nprint(\"\u2500\u2500\u2500 WITHOUT Thinking Mode \u2500\u2500\u2500\")\nt0 = time.time()\nr_no_think = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[{\"role\": \"user\", \"content\": tricky_question}],\n   thinking={\"type\": \"disabled\"},\n   max_tokens=2048,\n   temperature=0.6,\n)\nt1 = time.time()\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/23f1.png\" alt=\"\u23f1\" class=\"wp-smiley\" \/>  Time: {t1-t0:.1f}s | Tokens: {r_no_think.usage.completion_tokens}\")\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4dd.png\" alt=\"\ud83d\udcdd\" class=\"wp-smiley\" \/> Answer (first 300 chars): {r_no_think.choices[0].message.content[:300]}...\")\n\n\nprint(\"n\u2500\u2500\u2500 WITH Thinking Mode \u2500\u2500\u2500\")\nt0 = time.time()\nr_think = client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[{\"role\": \"user\", \"content\": tricky_question}],\n   thinking={\"type\": \"enabled\"},\n   max_tokens=4096,\n   temperature=0.6,\n)\nt1 = time.time()\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/23f1.png\" alt=\"\u23f1\" class=\"wp-smiley\" \/>  Time: {t1-t0:.1f}s | Tokens: {r_think.usage.completion_tokens}\")\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4dd.png\" alt=\"\ud83d\udcdd\" class=\"wp-smiley\" \/> Answer (first 300 chars): {r_think.choices[0].message.content[:300]}...\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We build a reusable GLM5Agent class that runs a full agentic loop, automatically dispatching to weather, math, time, and unit conversion tools across multiple iterations until it reaches a final answer. We test it with a complex multi-part query that requires calling three different tools in a single turn. 
We then run a side-by-side comparison of the same tricky 12-coin logic puzzle with thinking mode disabled versus enabled, measuring both response time and answer quality.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">print(\"\\n\" + \"=\" * 70)\nprint(\"\ud83d\udd04 SECTION 10: OpenAI SDK Compatibility\")\nprint(\"=\" * 70)\nprint(\"GLM-5 is fully compatible with the OpenAI Python SDK.\")\nprint(\"Just change the base_url \u2014 your existing OpenAI code works as-is!\\n\")\n\n\nfrom openai import OpenAI\n\n\nopenai_client = OpenAI(\n   api_key=API_KEY,\n   base_url=\"https:\/\/api.z.ai\/api\/paas\/v4\/\",\n)\n\n\ncompletion = openai_client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\"role\": \"system\", \"content\": \"You are a writing assistant.\"},\n       {\n           \"role\": \"user\",\n           \"content\": \"Write a 4-line poem about artificial intelligence discovering nature.\",\n       },\n   ],\n   max_tokens=256,\n   temperature=0.9,\n)\n\n\nprint(\"\ud83e\udd16 GLM-5 (via OpenAI SDK):\")\nprint(completion.choices[0].message.content)\n\n\nprint(\"\\n\ud83c\udf0a Streaming (via OpenAI SDK):\")\nstream = openai_client.chat.completions.create(\n   model=\"glm-5\",\n   messages=[\n       {\n           \"role\": \"user\",\n           \"content\": \"List 3 creative use cases for a 744B parameter MoE model. Be brief.\",\n       }\n   ],\n   stream=True,\n   max_tokens=512,\n)\n\n\nfor chunk in stream:\n   if chunk.choices[0].delta.content:\n       print(chunk.choices[0].delta.content, end=\"\", flush=True)\nprint()\n\n\n\n\nprint(\"\\n\" + \"=\" * 70)\nprint(\"\ud83c\udf89 Tutorial Complete!\")\nprint(\"=\" * 70)\nprint(\"\"\"\nYou've learned how to use GLM-5 for:\n\n\n \u2705 Basic chat completions\n \u2705 Real-time streaming responses\n \u2705 Thinking mode (chain-of-thought reasoning)\n \u2705 Multi-turn conversations with context\n \u2705 Function calling \/ tool use\n \u2705 Structured JSON output extraction\n \u2705 Building a multi-tool agentic loop\n \u2705 Comparing thinking mode ON vs OFF\n \u2705 Drop-in OpenAI SDK compatibility\n\n\n\ud83d\udcda Next steps:\n \u2022 GLM-5 Docs:       https:\/\/docs.z.ai\/guides\/llm\/glm-5\n \u2022 Function Calling:  https:\/\/docs.z.ai\/guides\/capabilities\/function-calling\n \u2022 Structured Output: https:\/\/docs.z.ai\/guides\/capabilities\/struct-output\n \u2022 Context Caching:   https:\/\/docs.z.ai\/guides\/capabilities\/cache\n \u2022 Web Search Tool:   https:\/\/docs.z.ai\/guides\/tools\/web-search\n \u2022 GitHub:            https:\/\/github.com\/zai-org\/GLM-5\n \u2022 API Keys:          https:\/\/z.ai\/manage-apikey\/apikey-list\n\n\n\ud83d\udca1 Pro tip: GLM-5 also supports web search and context caching\n  via the API for even more powerful applications!\n\"\"\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We demonstrate that GLM-5 works as a drop-in replacement with the standard OpenAI Python SDK: we simply point base_url at Z.AI's endpoint, and existing OpenAI code works identically. We test both a standard completion for creative writing and a streaming call that lists use cases for a 744B MoE model. 
We wrap up with a full summary of all ten capabilities covered and links to the official docs for deeper exploration.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/glm5_agentic_systems_tutorial_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes Notebook here<\/a>. \u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">You can now join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/03\/how-to-build-production-ready-agentic-systems-with-z-ai-glm-5-using-thinking-mode-tool-calling-streaming-and-multi-turn-workflows\/\">How to Build Production-Ready Agentic Systems with Z.AI GLM-5 Using Thinking Mode, Tool Calling, Streaming, and Multi-Turn Workflows<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we explore t&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-667","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/667","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=667"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/667\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=667"}],"wp:term":[{"taxono
my":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=667"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=667"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}