{"id":719,"date":"2026-04-14T11:23:29","date_gmt":"2026-04-14T03:23:29","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=719"},"modified":"2026-04-14T11:23:29","modified_gmt":"2026-04-14T03:23:29","slug":"google-adk-multi-agent-pipeline-tutorial-data-loading-statistical-testing-visualization-and-report-generation-in-python","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=719","title":{"rendered":"Google ADK Multi-Agent Pipeline Tutorial: Data Loading, Statistical Testing, Visualization, and Report Generation in Python"},"content":{"rendered":"<p>In this tutorial, we build an advanced data analysis pipeline using <a href=\"https:\/\/github.com\/google\/adk-python\"><strong>Google ADK<\/strong><\/a> and organize it as a practical multi-agent system for real analytical work. We set up the environment, configure secure API access, create a centralized data store, and define specialized tools for loading data, exploring datasets, running statistical tests, transforming tables, generating visualizations, and producing reports. 
As we move through the workflow, we connect these capabilities through a master analyst agent that coordinates specialists, allowing us to see how a production-style analysis system can handle end-to-end tasks in a structured, scalable way.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">!pip install google-adk -q\n!pip install litellm -q\n!pip install pandas numpy scipy matplotlib seaborn -q\n!pip install openpyxl -q\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> All packages installed!\")\n\n\nimport os\nimport io\nimport json\nimport getpass\nimport asyncio\nfrom datetime import datetime\nfrom typing import Optional, Dict, Any, List\n\n\nimport pandas as pd\nimport numpy as np\nfrom scipy import stats\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n\nfrom google.adk.agents import Agent\nfrom google.adk.models.lite_llm import LiteLlm\nfrom google.adk.sessions import InMemorySessionService\nfrom google.adk.runners import Runner\nfrom google.adk.tools.tool_context import ToolContext\nfrom google.genai import types\n\n\nimport warnings\nwarnings.filterwarnings(\"ignore\")\n\n\nplt.style.use('seaborn-v0_8-whitegrid')\nsns.set_palette(\"husl\")\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" 
class=\"wp-smiley\" \/> Libraries loaded!\")\n\n\ndef make_serializable(obj):\n   if isinstance(obj, dict):\n       return {k: make_serializable(v) for k, v in obj.items()}\n   elif isinstance(obj, list):\n       return [make_serializable(item) for item in obj]\n   elif isinstance(obj, (np.integer, np.int64, np.int32)):\n       return int(obj)\n   elif isinstance(obj, (np.floating, np.float64, np.float32)):\n       return float(obj)\n   elif isinstance(obj, np.ndarray):\n       return obj.tolist()\n   elif isinstance(obj, (np.bool_,)):\n       return bool(obj)\n   elif isinstance(obj, pd.Timestamp):\n       return obj.isoformat()\n   elif pd.isna(obj):\n       return None\n   else:\n       return obj\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Serialization helper ready!\")\n\n\nprint(\"=\" * 60)\nprint(\"  <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f510.png\" alt=\"\ud83d\udd10\" class=\"wp-smiley\" \/> API KEY CONFIGURATION\")\nprint(\"=\" * 60)\n\n\ntry:\n   from google.colab import userdata\n   api_key = userdata.get('OPENAI_API_KEY')\n   print(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> API key loaded from Colab Secrets!\")\nexcept:\n   print(\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4dd.png\" alt=\"\ud83d\udcdd\" class=\"wp-smiley\" \/> Enter your OpenAI API key (hidden input):\")\n   api_key = getpass.getpass(\"OpenAI API Key: \")\n\n\nos.environ['OPENAI_API_KEY'] = api_key\n\n\nif api_key and len(api_key) &gt; 20:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> API Key configured: {api_key[:8]}...{api_key[-4:]}\")\nelse:\n   print(\"<img decoding=\"async\" 
src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/274c.png\" alt=\"\u274c\" class=\"wp-smiley\" \/> Invalid API key!\")\n\n\nMODEL = \"openai\/gpt-4o-mini\"\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Using model: {MODEL}\")\n\n\nclass DataStore:\n   _instance = None\n  \n   def __new__(cls):\n       if cls._instance is None:\n           cls._instance = super().__new__(cls)\n           cls._instance.datasets = {}\n           cls._instance.analysis_history = []\n       return cls._instance\n  \n   def add_dataset(self, name: str, df: pd.DataFrame, source: str = \"unknown\"):\n       self.datasets[name] = {\n           \"data\": df,\n           \"loaded_at\": datetime.now().isoformat(),\n           \"source\": source,\n           \"shape\": (int(df.shape[0]), int(df.shape[1])),\n           \"columns\": list(df.columns)\n       }\n       return f\"Dataset '{name}' stored: {df.shape[0]} rows \u00d7 {df.shape[1]} columns\"\n  \n   def get_dataset(self, name: str) -&gt; Optional[pd.DataFrame]:\n       if name in self.datasets:\n           return self.datasets[name][\"data\"]\n       return None\n  \n   def list_datasets(self) -&gt; List[str]:\n       return list(self.datasets.keys())\n  \n   def log_analysis(self, analysis_type: str, dataset: str, result_summary: str):\n       self.analysis_history.append({\n           \"timestamp\": datetime.now().isoformat(),\n           \"type\": analysis_type,\n           \"dataset\": dataset,\n           \"summary\": result_summary\n       })\n\n\nDATA_STORE = DataStore()\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> DataStore initialized!\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We install the required libraries and import all the modules needed to build the pipeline. 
We set up the visualization style, configure the API key securely, and define the model that powers the agents. We also create the shared DataStore and the serialization helper so we can manage datasets and return clean JSON-safe outputs throughout the workflow.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def load_csv(file_path: str, dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c2.png\" alt=\"\ud83d\udcc2\" class=\"wp-smiley\" \/> Loading CSV: {file_path} as '{dataset_name}'\")\n  \n   try:\n       df = pd.read_csv(file_path)\n       result = DATA_STORE.add_dataset(dataset_name, df, source=file_path)\n      \n       datasets = tool_context.state.get(\"loaded_datasets\", [])\n       if dataset_name not in datasets:\n           datasets.append(dataset_name)\n       tool_context.state[\"loaded_datasets\"] = datasets\n       tool_context.state[\"active_dataset\"] = dataset_name\n      \n       summary = {\n           \"status\": \"success\",\n           \"message\": result,\n           \"preview\": {\n               \"columns\": list(df.columns),\n               \"shape\": [int(df.shape[0]), int(df.shape[1])],\n               \"dtypes\": {k: str(v) for k, v in df.dtypes.items()},\n               \"sample\": make_serializable(df.head(3).to_dict(orient=\"records\"))\n   
        }\n       }\n       return make_serializable(summary)\n      \n   except Exception as e:\n       return {\"status\": \"error\", \"message\": f\"Failed to load CSV: {str(e)}\"}\n\n\n\n\ndef create_sample_dataset(dataset_type: str, dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c2.png\" alt=\"\ud83d\udcc2\" class=\"wp-smiley\" \/> Creating sample dataset: {dataset_type} as '{dataset_name}'\")\n  \n   np.random.seed(42)\n  \n   if dataset_type == \"sales\":\n       n = 500\n       dates = pd.date_range(\"2023-01-01\", periods=n, freq=\"D\")\n       df = pd.DataFrame({\n           \"order_id\": range(1000, 1000 + n),\n           \"date\": dates[:n].astype(str),\n           \"product\": np.random.choice([\"Laptop\", \"Phone\", \"Tablet\", \"Watch\", \"Headphones\"], n),\n           \"category\": np.random.choice([\"Electronics\", \"Accessories\"], n, p=[0.6, 0.4]),\n           \"region\": np.random.choice([\"North\", \"South\", \"East\", \"West\"], n),\n           \"quantity\": np.random.randint(1, 10, n),\n           \"unit_price\": np.random.uniform(50, 1500, n).round(2),\n           \"discount\": np.random.choice([0.0, 0.05, 0.10, 0.15, 0.20], n),\n           \"customer_type\": np.random.choice([\"New\", \"Returning\", \"VIP\"], n, p=[0.3, 0.5, 0.2])\n       })\n       df[\"revenue\"] = (df[\"quantity\"] * df[\"unit_price\"] * (1 - df[\"discount\"])).round(2)\n       df[\"profit_margin\"] = np.random.uniform(0.15, 0.45, n).round(3)\n       df[\"profit\"] = (df[\"revenue\"] * df[\"profit_margin\"]).round(2)\n      \n   elif dataset_type == \"customers\":\n       n = 300\n       df = pd.DataFrame({\n           \"customer_id\": range(5000, 5000 + n),\n           \"age\": np.random.randint(18, 75, n),\n           \"gender\": np.random.choice([\"M\", \"F\", \"Other\"], n, p=[0.48, 0.48, 0.04]),\n           \"income\": np.random.lognormal(10.5, 0.5, 
n).round(0),\n           \"education\": np.random.choice([\"High School\", \"Bachelor\", \"Master\", \"PhD\"], n, p=[0.25, 0.45, 0.22, 0.08]),\n           \"membership_years\": np.random.exponential(3, n).round(1),\n           \"total_purchases\": np.random.randint(1, 100, n),\n           \"avg_order_value\": np.random.uniform(25, 500, n).round(2),\n           \"satisfaction_score\": np.clip(np.random.normal(7.5, 1.5, n), 1, 10).round(1),\n           \"churn_risk\": np.random.choice([\"Low\", \"Medium\", \"High\"], n, p=[0.6, 0.3, 0.1])\n       })\n       df[\"lifetime_value\"] = (df[\"total_purchases\"] * df[\"avg_order_value\"]).round(2)\n      \n   elif dataset_type == \"timeseries\":\n       dates = pd.date_range(\"2022-01-01\", \"2024-01-01\", freq=\"D\")\n       n = len(dates)\n       trend = np.linspace(100, 200, n)\n       seasonal = 30 * np.sin(np.linspace(0, 6 * np.pi, n))\n       noise = np.random.normal(0, 10, n)\n      \n       df = pd.DataFrame({\n           \"date\": dates.astype(str),\n           \"value\": (trend + seasonal + noise).round(2),\n           \"volume\": np.random.randint(1000, 10000, n),\n           \"category\": np.random.choice([\"A\", \"B\", \"C\"], n)\n       })\n      \n   elif dataset_type == \"survey\":\n       n = 200\n       df = pd.DataFrame({\n           \"respondent_id\": range(1, n + 1),\n           \"age_group\": np.random.choice([\"18-24\", \"25-34\", \"35-44\", \"45-54\", \"55+\"], n),\n           \"q1_satisfaction\": np.random.randint(1, 6, n),\n           \"q2_likelihood_recommend\": np.random.randint(0, 11, n),\n           \"q3_ease_of_use\": np.random.randint(1, 6, n),\n           \"q4_value_for_money\": np.random.randint(1, 6, n),\n           \"q5_support_quality\": np.random.randint(1, 6, n),\n           \"response_time_mins\": np.random.exponential(10, n).round(1)\n       })\n   else:\n       return {\"status\": \"error\", \"message\": f\"Unknown dataset type: {dataset_type}. 
Use: sales, customers, timeseries, survey\"}\n  \n   result = DATA_STORE.add_dataset(dataset_name, df, source=f\"sample_{dataset_type}\")\n  \n   datasets = tool_context.state.get(\"loaded_datasets\", [])\n   if dataset_name not in datasets:\n       datasets.append(dataset_name)\n   tool_context.state[\"loaded_datasets\"] = datasets\n   tool_context.state[\"active_dataset\"] = dataset_name\n  \n   return make_serializable({\n       \"status\": \"success\",\n       \"message\": result,\n       \"description\": f\"Created sample {dataset_type} dataset\",\n       \"columns\": list(df.columns),\n       \"shape\": [int(df.shape[0]), int(df.shape[1])],\n       \"sample\": df.head(3).to_dict(orient=\"records\")\n   })\n\n\n\n\ndef list_available_datasets(tool_context: ToolContext) -&gt; dict:\n   print(\"\ud83d\udccb Listing datasets\")\n  \n   datasets = DATA_STORE.list_datasets()\n  \n   if not datasets:\n       return {\"status\": \"info\", \"message\": \"No datasets loaded. Use create_sample_dataset or load_csv.\"}\n  \n   info = {}\n   for name in datasets:\n       ds = DATA_STORE.datasets[name]\n       info[name] = {\n           \"rows\": int(ds[\"shape\"][0]),\n           \"columns\": int(ds[\"shape\"][1]),\n           \"column_names\": ds[\"columns\"]\n       }\n  \n   return make_serializable({\n       \"status\": \"success\",\n       \"datasets\": info,\n       \"active_dataset\": tool_context.state.get(\"active_dataset\")\n   })\n\n\n\n\nprint(\"\u2705 Data loading tools defined!\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the core data loading capabilities of the system. 
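The sample generators all follow the same pattern: draw random base columns under a fixed seed, then derive business metrics from them. The revenue formula from the sales generator, for instance, can be sketched in isolation (a small illustrative frame, not the tutorial's full 500-row dataset):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed, mirroring np.random.seed(42) above
n = 5
df = pd.DataFrame({
    "quantity":   rng.integers(1, 10, n),
    "unit_price": rng.uniform(50, 1500, n).round(2),
    "discount":   rng.choice([0.0, 0.05, 0.10, 0.15, 0.20], n),
})
# Derived column, same formula as the sales generator:
# net revenue = units sold x unit price, reduced by the discount.
df["revenue"] = (df["quantity"] * df["unit_price"] * (1 - df["discount"])).round(2)
print(df)
```

Deriving metrics (revenue, profit, lifetime_value) from the random base columns keeps the synthetic data internally consistent, which matters later when the agents probe correlations between them.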
We create functions to load CSV files, generate sample datasets for different use cases, and list the datasets currently available in memory. We use this part to make sure our pipeline always has structured data ready for downstream analysis and agent interaction.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def describe_dataset(dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Describing dataset: {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()\n   categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()\n  \n   result = {\n       \"status\": \"success\",\n       \"dataset\": dataset_name,\n       \"overview\": {\n           \"total_rows\": int(len(df)),\n           \"total_columns\": int(len(df.columns)),\n           \"numeric_columns\": numeric_cols,\n           \"categorical_columns\": categorical_cols,\n           \"memory_mb\": round(float(df.memory_usage(deep=True).sum() \/ 1024 \/ 1024), 2),\n           \"duplicate_rows\": int(df.duplicated().sum()),\n           
\"missing_total\": int(df.isnull().sum().sum())\n       }\n   }\n  \n   if numeric_cols:\n       stats_dict = {}\n       for col in numeric_cols:\n           col_data = df[col].dropna()\n           if len(col_data) &gt; 0:\n               stats_dict[col] = {\n                   \"count\": int(len(col_data)),\n                   \"mean\": round(float(col_data.mean()), 3),\n                   \"std\": round(float(col_data.std()), 3),\n                   \"min\": round(float(col_data.min()), 3),\n                   \"25%\": round(float(col_data.quantile(0.25)), 3),\n                   \"50%\": round(float(col_data.median()), 3),\n                   \"75%\": round(float(col_data.quantile(0.75)), 3),\n                   \"max\": round(float(col_data.max()), 3),\n                   \"skewness\": round(float(col_data.skew()), 3),\n                   \"missing\": int(df[col].isnull().sum())\n               }\n       result[\"numeric_summary\"] = stats_dict\n  \n   if categorical_cols:\n       cat_dict = {}\n       for col in categorical_cols[:10]:\n           vc = df[col].value_counts()\n           cat_dict[col] = {\n               \"unique_values\": int(df[col].nunique()),\n               \"top_values\": {str(k): int(v) for k, v in vc.head(5).items()},\n               \"missing\": int(df[col].isnull().sum())\n           }\n       result[\"categorical_summary\"] = cat_dict\n  \n   DATA_STORE.log_analysis(\"describe\", dataset_name, \"Statistics generated\")\n   return make_serializable(result)\n\n\n\n\ndef correlation_analysis(dataset_name: str, method: str = \"pearson\", tool_context: ToolContext = None) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Correlation analysis: {dataset_name} ({method})\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not 
found\"}\n  \n   numeric_df = df.select_dtypes(include=[np.number])\n  \n   if numeric_df.shape[1] &lt; 2:\n       return {\"status\": \"error\", \"message\": \"Need at least 2 numeric columns\"}\n  \n   corr_matrix = numeric_df.corr(method=method)\n  \n   strong_corrs = []\n   for i in range(len(corr_matrix.columns)):\n       for j in range(i + 1, len(corr_matrix.columns)):\n           col1, col2 = corr_matrix.columns[i], corr_matrix.columns[j]\n           val = corr_matrix.iloc[i, j]\n           if abs(val) &gt; 0.5:\n               strong_corrs.append({\n                   \"var1\": col1,\n                   \"var2\": col2,\n                   \"correlation\": round(float(val), 3),\n                   \"strength\": \"strong\" if abs(val) &gt; 0.7 else \"moderate\"\n               })\n  \n   strong_corrs.sort(key=lambda x: abs(x[\"correlation\"]), reverse=True)\n  \n   corr_dict = {}\n   for col in corr_matrix.columns:\n       corr_dict[col] = {k: round(float(v), 3) for k, v in corr_matrix[col].items()}\n  \n   DATA_STORE.log_analysis(\"correlation\", dataset_name, f\"{method} correlation\")\n  \n   return make_serializable({\n       \"status\": \"success\",\n       \"method\": method,\n       \"correlation_matrix\": corr_dict,\n       \"strong_correlations\": strong_corrs[:10],\n       \"insight\": f\"Found {len(strong_corrs)} pairs with |correlation| &gt; 0.5\"\n   })\n\n\n\n\ndef hypothesis_test(dataset_name: str, test_type: str, column1: str,\n                  column2: str = None, group_column: str = None,\n                  tool_context: ToolContext = None) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Hypothesis test: {test_type} on {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   if column1 not in 
df.columns:\n       return {\"status\": \"error\", \"message\": f\"Column '{column1}' not found\"}\n  \n   try:\n       if test_type == \"normality\":\n           data = df[column1].dropna()\n           if len(data) &gt; 5000:\n               data = data.sample(5000)\n           stat, p = stats.shapiro(data)\n          \n           return make_serializable({\n               \"status\": \"success\",\n               \"test\": \"Shapiro-Wilk Normality Test\",\n               \"column\": column1,\n               \"statistic\": round(float(stat), 4),\n               \"p_value\": round(float(p), 6),\n               \"is_normal\": bool(p &gt; 0.05),\n               \"interpretation\": \"Data appears normally distributed\" if p &gt; 0.05 else \"Data is NOT normally distributed\"\n           })\n          \n       elif test_type == \"ttest\":\n           if group_column is None:\n               return {\"status\": \"error\", \"message\": \"group_column required for t-test\"}\n          \n           groups = df[group_column].dropna().unique()\n           if len(groups) != 2:\n               return {\"status\": \"error\", \"message\": f\"T-test needs exactly 2 groups, found {len(groups)}: {list(groups)}\"}\n          \n           g1 = df[df[group_column] == groups[0]][column1].dropna()\n           g2 = df[df[group_column] == groups[1]][column1].dropna()\n          \n           stat, p = stats.ttest_ind(g1, g2)\n          \n           return make_serializable({\n               \"status\": \"success\",\n               \"test\": \"Independent Samples T-Test\",\n               \"comparing\": column1,\n               \"group1\": {\"name\": str(groups[0]), \"mean\": round(float(g1.mean()), 3), \"n\": int(len(g1))},\n               \"group2\": {\"name\": str(groups[1]), \"mean\": round(float(g2.mean()), 3), \"n\": int(len(g2))},\n               \"t_statistic\": round(float(stat), 4),\n               \"p_value\": round(float(p), 6),\n               \"significant\": bool(p &lt; 
0.05),\n               \"interpretation\": \"Significant difference\" if p &lt; 0.05 else \"No significant difference\"\n           })\n          \n       elif test_type == \"anova\":\n           if group_column is None:\n               return {\"status\": \"error\", \"message\": \"group_column required for ANOVA\"}\n          \n           groups_data = [grp[column1].dropna().values for _, grp in df.groupby(group_column)]\n           group_names = list(df[group_column].unique())\n          \n           stat, p = stats.f_oneway(*groups_data)\n          \n           group_stats = []\n           for name in group_names:\n               grp_data = df[df[group_column] == name][column1].dropna()\n               group_stats.append({\n                   \"group\": str(name),\n                   \"mean\": round(float(grp_data.mean()), 3),\n                   \"std\": round(float(grp_data.std()), 3),\n                   \"n\": int(len(grp_data))\n               })\n          \n           return make_serializable({\n               \"status\": \"success\",\n               \"test\": \"One-Way ANOVA\",\n               \"comparing\": column1,\n               \"across\": group_column,\n               \"n_groups\": int(len(group_names)),\n               \"group_statistics\": group_stats,\n               \"f_statistic\": round(float(stat), 4),\n               \"p_value\": round(float(p), 6),\n               \"significant\": bool(p &lt; 0.05),\n               \"interpretation\": \"Significant differences among groups\" if p &lt; 0.05 else \"No significant differences\"\n           })\n          \n       elif test_type == \"chi2\":\n           if column2 is None:\n               return {\"status\": \"error\", \"message\": \"column2 required for chi-square test\"}\n          \n           contingency = pd.crosstab(df[column1], df[column2])\n           chi2, p, dof, _ = stats.chi2_contingency(contingency)\n          \n           return make_serializable({\n               \"status\": 
\"success\",\n               \"test\": \"Chi-Square Test of Independence\",\n               \"variables\": [column1, column2],\n               \"chi2_statistic\": round(float(chi2), 4),\n               \"p_value\": round(float(p), 6),\n               \"degrees_of_freedom\": int(dof),\n               \"significant\": bool(p &lt; 0.05),\n               \"interpretation\": \"Variables are dependent\" if p &lt; 0.05 else \"Variables are independent\"\n           })\n          \n       else:\n           return {\"status\": \"error\", \"message\": f\"Unknown test: {test_type}. Use: normality, ttest, anova, chi2\"}\n          \n   except Exception as e:\n       return {\"status\": \"error\", \"message\": f\"Test failed: {str(e)}\"}\n\n\n\n\ndef outlier_detection(dataset_name: str, column: str, method: str = \"iqr\",\n                     tool_context: ToolContext = None) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4ca.png\" alt=\"\ud83d\udcca\" class=\"wp-smiley\" \/> Outlier detection: {column} in {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   if column not in df.columns:\n       return {\"status\": \"error\", \"message\": f\"Column '{column}' not found\"}\n  \n   data = df[column].dropna()\n  \n   if method == \"iqr\":\n       Q1 = float(data.quantile(0.25))\n       Q3 = float(data.quantile(0.75))\n       IQR = Q3 - Q1\n       lower = Q1 - 1.5 * IQR\n       upper = Q3 + 1.5 * IQR\n       outliers = data[(data &lt; lower) | (data &gt; upper)]\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"method\": \"IQR (Interquartile Range)\",\n           \"column\": column,\n           \"bounds\": {\"lower\": round(lower, 3), \"upper\": round(upper, 3)},\n           \"iqr\": round(IQR, 3),\n           \"total_values\": 
int(len(data)),\n           \"outlier_count\": int(len(outliers)),\n           \"outlier_pct\": round(float(len(outliers) \/ len(data) * 100), 2),\n           \"outlier_examples\": [round(float(x), 2) for x in outliers.head(10).tolist()]\n       })\n      \n   elif method == \"zscore\":\n       z = np.abs(stats.zscore(data))\n       outliers = data[z &gt; 3]\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"method\": \"Z-Score (threshold: 3)\",\n           \"column\": column,\n           \"total_values\": int(len(data)),\n           \"outlier_count\": int(len(outliers)),\n           \"outlier_pct\": round(float(len(outliers) \/ len(data) * 100), 2),\n           \"outlier_examples\": [round(float(x), 2) for x in outliers.head(10).tolist()]\n       })\n  \n   return {\"status\": \"error\", \"message\": f\"Unknown method: {method}. Use: iqr, zscore\"}\n\n\n\n\nprint(\"\u2705 Statistical analysis tools defined!\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We build the statistical analysis layer of the tutorial. We create functions to describe datasets, calculate correlations, run hypothesis tests, and detect outliers using standard analytical methods. 
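The IQR rule used by `outlier_detection` is easy to verify by hand: any value below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is flagged. A self-contained sketch on a series with one planted outlier:

```python
import pandas as pd

data = pd.Series([10, 11, 12, 11, 10, 13, 12, 11, 95])  # 95 planted as an outlier
q1, q3 = data.quantile(0.25), data.quantile(0.75)        # 11.0 and 12.0 here
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr            # bounds: 9.5 and 13.5
outliers = data[(data < lower) | (data > upper)]
print(outliers.tolist())  # -> [95]
```

The z-score branch makes the same decision with a different yardstick (|z| > 3), which implicitly assumes roughly normal data; the IQR rule does not, which is why it is the function's default method.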
We use these tools to turn raw tabular data into meaningful statistical insights that the agents can interpret and explain.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def create_visualization(dataset_name: str, chart_type: str, x_column: str,\n                        y_column: str = None, color_column: str = None,\n                        title: str = None, tool_context: ToolContext = None) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c8.png\" alt=\"\ud83d\udcc8\" class=\"wp-smiley\" \/> Creating {chart_type}: {x_column}\" + (f\" vs {y_column}\" if y_column else \"\"))\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   if x_column not in df.columns:\n       return {\"status\": \"error\", \"message\": f\"Column '{x_column}' not found\"}\n  \n   try:\n       fig, ax = plt.subplots(figsize=(10, 6))\n       chart_title = title or f\"{chart_type.title()}: {x_column}\" + (f\" vs {y_column}\" if y_column else \"\")\n      \n       if chart_type == \"histogram\":\n           if color_column and color_column in df.columns:\n               for grp in df[color_column].unique():\n                   subset = df[df[color_column] == grp][x_column].dropna()\n                   ax.hist(subset, alpha=0.6, 
label=str(grp), bins=30)\n               ax.legend()\n           else:\n               ax.hist(df[x_column].dropna(), bins=30, edgecolor='black', alpha=0.7, color='steelblue')\n           ax.set_xlabel(x_column)\n           ax.set_ylabel(\"Frequency\")\n          \n       elif chart_type == \"scatter\":\n           if not y_column:\n               return {\"status\": \"error\", \"message\": \"y_column required for scatter\"}\n           if color_column and color_column in df.columns:\n               for grp in df[color_column].unique():\n                   subset = df[df[color_column] == grp]\n                   ax.scatter(subset[x_column], subset[y_column], alpha=0.6, label=str(grp), s=50)\n               ax.legend()\n           else:\n               ax.scatter(df[x_column], df[y_column], alpha=0.6, s=50, color='steelblue')\n           ax.set_xlabel(x_column)\n           ax.set_ylabel(y_column)\n          \n       elif chart_type == \"bar\":\n           if y_column:\n               data = df.groupby(x_column)[y_column].sum().sort_values(ascending=False)\n           else:\n               data = df[x_column].value_counts()\n          \n           colors = plt.cm.Blues(np.linspace(0.4, 0.8, len(data)))\n           bars = ax.bar(range(len(data)), data.values, color=colors)\n           ax.set_xticks(range(len(data)))\n           ax.set_xticklabels([str(x) for x in data.index], rotation=45, ha='right')\n           ax.set_xlabel(x_column)\n           ax.set_ylabel(y_column if y_column else \"Count\")\n          \n           for bar, val in zip(bars, data.values):\n               ax.text(bar.get_x() + bar.get_width()\/2, bar.get_height() + 0.01 * max(data.values),\n                      f'{val:,.0f}', ha='center', va='bottom', fontsize=9)\n          \n       elif chart_type == \"line\":\n           if not y_column:\n               return {\"status\": \"error\", \"message\": \"y_column required for line\"}\n           df_sorted = df.sort_values(x_column)\n           if 
color_column and color_column in df.columns:\n               for grp in df_sorted[color_column].unique():\n                   subset = df_sorted[df_sorted[color_column] == grp]\n                   ax.plot(subset[x_column], subset[y_column], label=str(grp), marker='o', markersize=3)\n               ax.legend()\n           else:\n               ax.plot(df_sorted[x_column], df_sorted[y_column], marker='o', markersize=3, color='steelblue')\n           ax.set_xlabel(x_column)\n           ax.set_ylabel(y_column)\n           plt.xticks(rotation=45)\n          \n       elif chart_type == \"box\":\n           if color_column and color_column in df.columns:\n               groups = df[color_column].unique()\n               data_groups = [df[df[color_column] == g][x_column].dropna() for g in groups]\n               bp = ax.boxplot(data_groups, labels=[str(g) for g in groups], patch_artist=True)\n               colors = plt.cm.Blues(np.linspace(0.4, 0.8, len(groups)))\n               for patch, color in zip(bp['boxes'], colors):\n                   patch.set_facecolor(color)\n               ax.set_xlabel(color_column)\n           else:\n               bp = ax.boxplot(df[x_column].dropna(), patch_artist=True)\n               bp['boxes'][0].set_facecolor('steelblue')\n           ax.set_ylabel(x_column)\n          \n       elif chart_type == \"heatmap\":\n           numeric_df = df.select_dtypes(include=[np.number])\n           corr = numeric_df.corr()\n           im = ax.imshow(corr, cmap='RdBu_r', aspect='auto', vmin=-1, vmax=1)\n           ax.set_xticks(range(len(corr.columns)))\n           ax.set_yticks(range(len(corr.columns)))\n           ax.set_xticklabels(corr.columns, rotation=45, ha='right')\n           ax.set_yticklabels(corr.columns)\n          \n           for i in range(len(corr)):\n               for j in range(len(corr)):\n                   val = corr.iloc[i, j]\n                   color = 'white' if abs(val) &gt; 0.5 else 'black'\n                   ax.text(j, 
i, f'{val:.2f}', ha='center', va='center', color=color, fontsize=8)\n          \n           plt.colorbar(im, ax=ax, label='Correlation')\n          \n       elif chart_type == \"pie\":\n           data = df[x_column].value_counts()\n           colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(data)))\n           wedges, texts, autotexts = ax.pie(data.values, labels=data.index, autopct='%1.1f%%',\n                                              colors=colors, startangle=90)\n           ax.axis('equal')\n          \n       else:\n           return {\"status\": \"error\", \"message\": f\"Unknown chart: {chart_type}. Use: histogram, scatter, bar, line, box, heatmap, pie\"}\n      \n       ax.set_title(chart_title, fontsize=12, fontweight='bold')\n       plt.tight_layout()\n       plt.show()\n       plt.close()\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"chart_type\": chart_type,\n           \"title\": chart_title,\n           \"message\": \"Chart displayed successfully\"\n       })\n      \n   except Exception as e:\n       plt.close()\n       return {\"status\": \"error\", \"message\": f\"Visualization failed: {str(e)}\"}\n\n\n\n\ndef create_distribution_report(dataset_name: str, column: str, tool_context: ToolContext = None) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c8.png\" alt=\"\ud83d\udcc8\" class=\"wp-smiley\" \/> Distribution report: {column} in {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   if column not in df.columns:\n       return {\"status\": \"error\", \"message\": f\"Column '{column}' not found\"}\n  \n   data = df[column].dropna()\n  \n   fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n   fig.suptitle(f'Distribution Analysis: {column}', fontsize=14, fontweight='bold')\n  \n   axes[0, 0].hist(data, 
bins=30, density=True, alpha=0.7, color='steelblue', edgecolor='black')\n   data.plot.kde(ax=axes[0, 0], color='red', linewidth=2)\n   axes[0, 0].set_title('Histogram with KDE')\n   axes[0, 0].set_xlabel(column)\n   axes[0, 0].set_ylabel('Density')\n  \n   bp = axes[0, 1].boxplot(data, vert=True, patch_artist=True)\n   bp['boxes'][0].set_facecolor('steelblue')\n   axes[0, 1].set_title('Box Plot')\n   axes[0, 1].set_ylabel(column)\n  \n   stats.probplot(data, dist=\"norm\", plot=axes[1, 0])\n   axes[1, 0].set_title('Q-Q Plot (Normality)')\n  \n   vp = axes[1, 1].violinplot(data, vert=True, showmeans=True, showmedians=True)\n   vp['bodies'][0].set_facecolor('steelblue')\n   axes[1, 1].set_title('Violin Plot')\n   axes[1, 1].set_ylabel(column)\n  \n   plt.tight_layout()\n   plt.show()\n   plt.close()\n  \n   skew_val = float(data.skew())\n   shape = \"approximately symmetric\" if abs(skew_val) &lt; 0.5 else (\"right-skewed\" if skew_val &gt; 0 else \"left-skewed\")\n  \n   return make_serializable({\n       \"status\": \"success\",\n       \"column\": column,\n       \"statistics\": {\n           \"count\": int(len(data)),\n           \"mean\": round(float(data.mean()), 3),\n           \"median\": round(float(data.median()), 3),\n           \"std\": round(float(data.std()), 3),\n           \"skewness\": round(skew_val, 3),\n           \"kurtosis\": round(float(data.kurtosis()), 3),\n           \"min\": round(float(data.min()), 3),\n           \"max\": round(float(data.max()), 3)\n       },\n       \"distribution_shape\": shape,\n       \"message\": \"4 distribution plots displayed\"\n   })\n\n\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Visualization tools defined!\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We create the visualization capabilities of the pipeline. 
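The skewness-based shape rule at the end of `create_distribution_report` can be checked in isolation. Below is a minimal sketch (the helper name `classify_shape` and the sample data are ours, not part of the tutorial code; the ±0.5 cutoff mirrors the tool above):

```python
import numpy as np
import pandas as pd

def classify_shape(values, cutoff=0.5):
    # Same rule as in the report: near-zero skewness counts as symmetric,
    # otherwise the sign of the skewness picks the direction.
    skew = float(pd.Series(values).skew())
    if abs(skew) < cutoff:
        return "approximately symmetric"
    return "right-skewed" if skew > 0 else "left-skewed"

rng = np.random.default_rng(0)
print(classify_shape(rng.normal(size=1_000)))       # symmetric bell curve
print(classify_shape(rng.exponential(size=1_000)))  # long right tail
```

Running this on a normal and an exponential sample shows why the cutoff is useful: sampling noise keeps the skewness of a symmetric sample small but rarely exactly zero.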
We define flexible charting functions for common plot types and also build a richer distribution report that shows multiple views of a variable at once. We use this part to visually explore patterns, relationships, spread, and potential issues in the data.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">def filter_data(dataset_name: str, condition: str, new_dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f504.png\" alt=\"\ud83d\udd04\" class=\"wp-smiley\" \/> Filtering {dataset_name}: {condition}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   try:\n       filtered = df.query(condition)\n       DATA_STORE.add_dataset(new_dataset_name, filtered, source=f\"filtered:{dataset_name}\")\n      \n       datasets = tool_context.state.get(\"loaded_datasets\", [])\n       if new_dataset_name not in datasets:\n           datasets.append(new_dataset_name)\n       tool_context.state[\"loaded_datasets\"] = datasets\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"original_rows\": int(len(df)),\n           \"filtered_rows\": int(len(filtered)),\n           \"rows_removed\": int(len(df) - len(filtered)),\n           
\"new_dataset\": new_dataset_name\n       })\n   except Exception as e:\n       return {\"status\": \"error\", \"message\": f\"Filter failed: {str(e)}\"}\n\n\n\n\ndef aggregate_data(dataset_name: str, group_by: str, aggregations: str,\n                  new_dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f504.png\" alt=\"\ud83d\udd04\" class=\"wp-smiley\" \/> Aggregating {dataset_name} by {group_by}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   try:\n       group_cols = [c.strip() for c in group_by.split(\",\")]\n      \n       agg_dict = {}\n       for agg in aggregations.split(\",\"):\n           col, func = agg.strip().split(\":\")\n           agg_dict[col.strip()] = func.strip()\n      \n       result_df = df.groupby(group_cols).agg(agg_dict).reset_index()\n      \n       new_cols = list(group_cols) + [f\"{col}_{func}\" for col, func in agg_dict.items()]\n       result_df.columns = new_cols\n      \n       DATA_STORE.add_dataset(new_dataset_name, result_df, source=f\"aggregated:{dataset_name}\")\n      \n       datasets = tool_context.state.get(\"loaded_datasets\", [])\n       if new_dataset_name not in datasets:\n           datasets.append(new_dataset_name)\n       tool_context.state[\"loaded_datasets\"] = datasets\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"grouped_by\": group_cols,\n           \"aggregations\": agg_dict,\n           \"result_rows\": int(len(result_df)),\n           \"columns\": list(result_df.columns),\n           \"new_dataset\": new_dataset_name,\n           \"preview\": result_df.head(5).to_dict(orient=\"records\")\n       })\n   except Exception as e:\n       return {\"status\": \"error\", \"message\": f\"Aggregation failed: {str(e)}\"}\n\n\n\n\ndef 
add_calculated_column(dataset_name: str, new_column: str, expression: str,\n                         tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f504.png\" alt=\"\ud83d\udd04\" class=\"wp-smiley\" \/> Adding column '{new_column}' to {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   try:\n       df_copy = df.copy()\n       df_copy[new_column] = df_copy.eval(expression)\n      \n       DATA_STORE.datasets[dataset_name][\"data\"] = df_copy\n       DATA_STORE.datasets[dataset_name][\"columns\"] = list(df_copy.columns)\n      \n       sample_vals = df_copy[new_column].head(5)\n      \n       return make_serializable({\n           \"status\": \"success\",\n           \"new_column\": new_column,\n           \"expression\": expression,\n           \"sample_values\": [round(float(x), 3) if pd.notna(x) else None for x in sample_vals]\n       })\n   except Exception as e:\n       return {\"status\": \"error\", \"message\": f\"Calculation failed: {str(e)}\"}\n\n\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Transformation tools defined!\")\n\n\ndef generate_summary_report(dataset_name: str, tool_context: ToolContext) -&gt; dict:\n   print(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f4c4.png\" alt=\"\ud83d\udcc4\" class=\"wp-smiley\" \/> Generating report: {dataset_name}\")\n  \n   df = DATA_STORE.get_dataset(dataset_name)\n   if df is None:\n       return {\"status\": \"error\", \"message\": f\"Dataset '{dataset_name}' not found\"}\n  \n   numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()\n   cat_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()\n  \n   report = 
{\n       \"dataset\": dataset_name,\n       \"generated_at\": datetime.now().isoformat(),\n       \"overview\": {\n           \"rows\": int(len(df)),\n           \"columns\": int(len(df.columns)),\n           \"numeric_cols\": int(len(numeric_cols)),\n           \"categorical_cols\": int(len(cat_cols)),\n           \"memory_mb\": round(float(df.memory_usage(deep=True).sum() \/ 1024 \/ 1024), 2),\n           \"duplicates\": int(df.duplicated().sum()),\n           \"complete_rows_pct\": round(float((len(df) - df.isnull().any(axis=1).sum()) \/ len(df) * 100), 1)\n       },\n       \"data_quality\": {\n           \"total_missing\": int(df.isnull().sum().sum()),\n           \"missing_pct\": round(float(df.isnull().sum().sum() \/ (len(df) * len(df.columns)) * 100), 2),\n           \"columns_with_missing\": [col for col in df.columns if df[col].isnull().sum() &gt; 0]\n       }\n   }\n  \n   if numeric_cols:\n       insights = {}\n       for col in numeric_cols[:8]:\n           data = df[col].dropna()\n           if len(data) &gt; 0:\n               insights[col] = {\n                   \"mean\": round(float(data.mean()), 2),\n                   \"median\": round(float(data.median()), 2),\n                   \"std\": round(float(data.std()), 2),\n                   \"range\": [round(float(data.min()), 2), round(float(data.max()), 2)]\n               }\n       report[\"numeric_insights\"] = insights\n  \n   if cat_cols:\n       cat_insights = {}\n       for col in cat_cols[:5]:\n           cat_insights[col] = {\n               \"unique\": int(df[col].nunique()),\n               \"top_3\": {str(k): int(v) for k, v in df[col].value_counts().head(3).items()}\n           }\n       report[\"categorical_insights\"] = cat_insights\n  \n   findings = []\n   if report[\"data_quality\"][\"missing_pct\"] &gt; 5:\n       findings.append(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/26a0.png\" alt=\"\u26a0\" class=\"wp-smiley\" \/> 
{report['data_quality']['missing_pct']}% missing data\")\n   if report[\"overview\"][\"duplicates\"] &gt; 0:\n       findings.append(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/26a0.png\" alt=\"\u26a0\" class=\"wp-smiley\" \/> {report['overview']['duplicates']} duplicate rows\")\n   if not findings:\n       findings.append(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Data quality looks good\")\n   report[\"key_findings\"] = findings\n  \n   return make_serializable({\"status\": \"success\", \"report\": report})\n\n\n\n\ndef get_analysis_history(tool_context: ToolContext) -&gt; dict:\n   history = DATA_STORE.analysis_history\n   if not history:\n       return {\"status\": \"info\", \"message\": \"No analyses performed yet\"}\n   return make_serializable({\"status\": \"success\", \"count\": int(len(history)), \"history\": history[-15:]})\n\n\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Reporting tools defined!\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We focus on transforming data and generating structured reports. We create functions to filter rows, aggregate values, add calculated columns, and summarize the dataset into a clear report with quality indicators and key findings. 
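The compact `aggregations` string that `aggregate_data` accepts (`"col:func,col:func"`) can be exercised on its own. A minimal sketch follows (the `parse_agg_spec` helper and the toy DataFrame are introduced here for illustration; the parsing mirrors the loop in the tool above):

```python
import pandas as pd

def parse_agg_spec(spec: str) -> dict:
    # "revenue:sum,profit:mean" -> {"revenue": "sum", "profit": "mean"}
    agg = {}
    for part in spec.split(","):
        col, func = part.strip().split(":")
        agg[col.strip()] = func.strip()
    return agg

df = pd.DataFrame({
    "region": ["North", "North", "South"],
    "revenue": [100, 250, 80],
    "profit": [10, 30, 8],
})
result = df.groupby("region").agg(parse_agg_spec("revenue:sum,profit:mean")).reset_index()
print(result)  # North: revenue 350, profit 20.0; South: revenue 80, profit 8.0
```

Keeping the spec as a plain string matters here: the agent's LLM only has to emit `'revenue:sum,profit:mean'` rather than a nested structure, which keeps the tool signature friendly to function calling.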
We use these steps to reshape the data and prepare concise outputs that support deeper analysis and decision-making.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">data_loader_agent = Agent(\n   name=\"data_loader\",\n   model=LiteLlm(model=MODEL),\n   description=\"Loads CSV files, creates sample datasets (sales, customers, timeseries, survey)\",\n   instruction=\"\"\"You load data into the analysis pipeline.\n  \nTOOLS:\n- create_sample_dataset: Create test data. 
Types: 'sales', 'customers', 'timeseries', 'survey'\n- load_csv: Load from file path or URL\n- list_available_datasets: Show what's loaded\n\n\nAlways use clear dataset names like 'sales_data', 'customer_analysis'.\"\"\",\n   tools=[load_csv, create_sample_dataset, list_available_datasets]\n)\n\n\nstats_agent = Agent(\n   name=\"statistician\",\n   model=LiteLlm(model=MODEL),\n   description=\"Statistical analysis: descriptive stats, correlations, hypothesis tests, outliers\",\n   instruction=\"\"\"You perform statistical analysis.\n\n\nTOOLS:\n- describe_dataset: Full descriptive statistics\n- correlation_analysis: Correlation matrix (pearson\/spearman)\n- hypothesis_test: Tests (normality, ttest, anova, chi2)\n- outlier_detection: Find outliers (iqr\/zscore)\n\n\nExplain results in plain language alongside statistics.\"\"\",\n   tools=[describe_dataset, correlation_analysis, hypothesis_test, outlier_detection]\n)\n\n\nviz_agent = Agent(\n   name=\"visualizer\",\n   model=LiteLlm(model=MODEL),\n   description=\"Creates charts: histogram, scatter, bar, line, box, heatmap, pie\",\n   instruction=\"\"\"You create visualizations.\n\n\nTOOLS:\n- create_visualization: Charts (histogram, scatter, bar, line, box, heatmap, pie)\n- create_distribution_report: 4-plot distribution analysis\n\n\nGUIDE:\n- Single variable distribution \u2192 histogram or box\n- Two numeric variables \u2192 scatter\n- Category comparison \u2192 bar\n- Time trends \u2192 line\n- Correlations overview \u2192 heatmap\"\"\",\n   tools=[create_visualization, create_distribution_report]\n)\n\n\ntransform_agent = Agent(\n   name=\"transformer\",\n   model=LiteLlm(model=MODEL),\n   description=\"Data transformation: filter, aggregate, calculate columns\",\n   instruction=\"\"\"You transform data.\n\n\nTOOLS:\n- filter_data: Filter rows (e.g., condition='age &gt; 30')\n- aggregate_data: Group &amp; aggregate (e.g., group_by='region', aggregations='revenue:sum,profit:mean')\n- add_calculated_column: New 
columns (e.g., expression='revenue * 0.1')\n\n\nAlways create new dataset names - don't overwrite originals.\"\"\",\n   tools=[filter_data, aggregate_data, add_calculated_column]\n)\n\n\nreport_agent = Agent(\n   name=\"reporter\",\n   model=LiteLlm(model=MODEL),\n   description=\"Generates summary reports and tracks analysis history\",\n   instruction=\"\"\"You create reports.\n\n\nTOOLS:\n- generate_summary_report: Comprehensive dataset summary\n- get_analysis_history: View all analyses performed\"\"\",\n   tools=[generate_summary_report, get_analysis_history]\n)\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Specialist agents created!\")\n\n\nmaster_analyst = Agent(\n   name=\"data_analyst\",\n   model=LiteLlm(model=MODEL),\n   description=\"Master Data Analyst orchestrating end-to-end data analysis\",\n   instruction=\"\"\"You are an expert Data Analyst with a team of specialists.\n\n\nYOUR TEAM:\n1. data_loader - Load\/create datasets\n2. statistician - Statistical analysis\n3. visualizer - Charts and plots\n4. transformer - Data transformations\n5. reporter - Reports and summaries\n\n\nWORKFLOW:\n1. Load data \u2192 2. Describe \u2192 3. Visualize \u2192 4. Analyze \u2192 5. Transform if needed \u2192 6. 
Report\n\n\nBe helpful, explain insights clearly, suggest next steps.\"\"\",\n   sub_agents=[data_loader_agent, stats_agent, viz_agent, transform_agent, report_agent]\n)\n\n\nprint(f\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Master Analyst ready with {len(master_analyst.sub_agents)} specialists!\")\n\n\nsession_service = InMemorySessionService()\nAPP_NAME = \"data_pipeline\"\nUSER_ID = \"analyst\"\nSESSION_ID = \"session_001\"\n\n\nasync def init():\n   return await session_service.create_session(\n       app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID,\n       state={\"loaded_datasets\": [], \"active_dataset\": None}\n   )\n\n\nsession = await init()\n\n\nrunner = Runner(agent=master_analyst, app_name=APP_NAME, session_service=session_service)\n\n\nasync def analyze(query: str):\n   print(f\"n{'='*70}n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f464.png\" alt=\"\ud83d\udc64\" class=\"wp-smiley\" \/> You: {query}n{'='*70}\")\n  \n   content = types.Content(role='user', parts=[types.Part(text=query)])\n   response = \"\"\n  \n   try:\n       async for event in runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content):\n           if event.is_final_response() and event.content and event.content.parts:\n               response = event.content.parts[0].text\n               break\n   except Exception as e:\n       response = f\"Error: {str(e)}\"\n  \n   print(f\"n<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f916.png\" alt=\"\ud83e\udd16\" class=\"wp-smiley\" \/> Analyst: {response}n{'='*70}n\")\n\n\nprint(\"<img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" \/> Ready! 
Use: await analyze('your question')\")\n\n\nprint(\"=\"*70 + \"\\n  <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f680.png\" alt=\"\ud83d\ude80\" class=\"wp-smiley\" \/> DATA ANALYSIS DEMO\\n\" + \"=\"*70)\n\n\nawait analyze(\"Create a sales dataset for analysis\")\n\n\nawait analyze(\"Describe the sales_data - what columns and statistics do we have?\")\n\n\nawait analyze(\"Create a histogram of revenue\")\n\n\nawait analyze(\"Show a bar chart of total revenue by region\")\n\n\nawait analyze(\"What's the correlation between quantity, unit_price, and revenue?\")\n\n\nawait analyze(\"Is there a significant difference in revenue between customer types? Run ANOVA.\")\n\n\nawait analyze(\"Check for outliers in the revenue column\")\n\n\nawait analyze(\"Create a heatmap of correlations\")\n\n\nawait analyze(\"Generate a summary report\")\n\n\nprint(\"\"\"\n\u2554\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2557\n\u2551  <img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/17.0.2\/72x72\/1f3af.png\" alt=\"\ud83c\udfaf\" class=\"wp-smiley\" \/> INTERACTIVE DATA ANALYSIS                                       \u2551\n\u2560\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2563\n\u2551  Try these:                 
                                        \u2551\n\u2551                                                                     \u2551\n\u2551  await analyze(\"Create a customers dataset\")                        \u2551\n\u2551  await analyze(\"Show scatter plot of age vs income\")                \u2551\n\u2551  await analyze(\"Is income normally distributed?\")                   \u2551\n\u2551  await analyze(\"Compare income between education levels\")           \u2551\n\u2551  await analyze(\"Filter customers where age &gt; 40\")                   \u2551\n\u2551  await analyze(\"Calculate average lifetime_value by churn_risk\")    \u2551\n\u255a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u255d\n\"\"\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We assemble the full multi-agent system by creating specialist agents and connecting them under one master analyst. We initialize the session, define the async analysis function, and run a full demo that shows how the pipeline behaves in practice. We use this final section to orchestrate the entire workflow end-to-end, from dataset creation to analysis, visualization, and reporting.<\/p>\n<p>In conclusion, we created a complete and interactive data analysis framework that goes far beyond a basic notebook workflow. We combined data ingestion, descriptive analytics, hypothesis testing, outlier detection, chart generation, transformation operations, and reporting into one unified agent-driven pipeline. 
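The event-draining pattern inside `analyze` (consume the runner's async stream, keep only the final response) can be sketched without ADK installed. Here `fake_events` is a stand-in we invent for `runner.run_async`, and the `FINAL:` prefix stands in for `event.is_final_response()`:

```python
import asyncio

async def fake_events():
    # Stand-in for runner.run_async(...): a stream of intermediate events
    # followed by exactly one final response.
    for text in ["tool_call: create_sample_dataset",
                 "tool_result: 500 rows",
                 "FINAL: sales_data ready"]:
        yield text

async def get_final_response() -> str:
    response = ""
    async for event in fake_events():
        if event.startswith("FINAL:"):   # analogous to event.is_final_response()
            response = event[len("FINAL: "):]
            break                        # stop draining once the answer arrives
    return response

print(asyncio.run(get_final_response()))  # -> sales_data ready
```

Breaking out of the loop on the first final response is the same design choice `analyze` makes: intermediate tool events are allowed to flow past, and only the closing message reaches the user.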
As we tested the system with sample queries and analysis requests, we demonstrated how Google ADK can help us design a modular, extensible, and production-ready analytics assistant that supports clearer insights, faster iteration, and a more intelligent approach to data exploration.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the<strong><a href=\"https:\/\/arxiv.org\/pdf\/2604.06425\" target=\"_blank\" rel=\"noreferrer noopener\">\u00a0<\/a><a href=\"https:\/\/github.com\/Marktechpost\/AI-Agents-Projects-Tutorials\/blob\/main\/AI%20Agents%20Codes\/google_adk_multi_agent_data_analysis_pipeline_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Full Codes for Implementation here<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">130k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?\u00a0<strong><a href=\"https:\/\/forms.gle\/MTNLpmJtsFA3VRVd9\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Connect with us<\/mark><\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/13\/google-adk-multi-agent-pipeline-tutorial-data-loading-statistical-testing-visualization-and-report-generation-in-python\/\">Google ADK Multi-Agent Pipeline Tutorial: Data Loading, Statistical Testing, Visualization, and Report Generation in Python<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-719","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/719","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=719"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/719\/revisions"}]
,"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}