{"id":536,"date":"2026-03-10T14:35:45","date_gmt":"2026-03-10T06:35:45","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=536"},"modified":"2026-03-10T14:35:45","modified_gmt":"2026-03-10T06:35:45","slug":"how-to-build-a-risk-aware-ai-agent-with-internal-critic-self-consistency-reasoning-and-uncertainty-estimation-for-reliable-decision-making","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=536","title":{"rendered":"How to Build a Risk-Aware AI Agent with Internal Critic, Self-Consistency Reasoning, and Uncertainty Estimation for Reliable Decision-Making"},"content":{"rendered":"<p>In this tutorial, we build an advanced agent system that goes beyond simple response generation by integrating an internal critic and uncertainty estimation framework. We simulate multi-sample inference, evaluate candidate responses across accuracy, coherence, and safety dimensions, and quantify predictive uncertainty using entropy, variance, and consistency measures. We implement risk-sensitive selection strategies to balance confidence and uncertainty in decision-making. 
Through structured experiments and visualizations, we explore how self-consistent reasoning and uncertainty-aware selection improve reliability and robustness in agent behavior.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom typing import List, Dict, Tuple, Optional\nfrom dataclasses import dataclass\nfrom collections import Counter\nimport warnings\nwarnings.filterwarnings('ignore')\n\n\nnp.random.seed(42)\n\n\nprint(\"=\" * 80)\nprint(\"\ud83e\uddea AGENT WITH INTERNAL CRITIC + UNCERTAINTY ESTIMATION\")\nprint(\"=\" * 80)\nprint()\n\n\n@dataclass\nclass Response:\n   content: str\n   confidence: float\n   reasoning: str\n   token_logprobs: List[float]\n  \n   def __repr__(self):\n       return f\"Response(content='{self.content[:50]}...', confidence={self.confidence:.3f})\"\n\n\n@dataclass\nclass CriticScore:\n   accuracy_score: float\n   coherence_score: float\n   safety_score: float\n   overall_score: float\n   feedback: str\n  \n   def __repr__(self):\n       return f\"CriticScore(overall={self.overall_score:.3f})\"\n\n\n@dataclass\nclass UncertaintyEstimate:\n   entropy: float\n   variance: float\n   consistency_score: float\n   epistemic_uncertainty: float\n   
aleatoric_uncertainty: float\n  \n   def risk_level(self) -&gt; str:\n       if self.entropy &lt; 0.5 and self.consistency_score &gt; 0.8:\n           return \"LOW\"\n       elif self.entropy &lt; 1.0 and self.consistency_score &gt; 0.5:\n           return \"MEDIUM\"\n       else:\n           return \"HIGH\"<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the foundational data structures that power our agent system. We create structured containers for responses, critic scores, and uncertainty estimates using dataclasses. We establish reproducibility and initialize the core components that allow us to quantify risk and model confidence.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class SimulatedLLM:\n  \n   def __init__(self, model_quality: float = 0.8):\n       self.model_quality = model_quality\n       self.response_templates = {\n           \"math\": [\n               \"The answer is {answer}. This is calculated by {reasoning}.\",\n               \"{answer} is the result when you {reasoning}.\",\n               \"By {reasoning}, we get {answer}.\",\n               \"The solution is {answer} because {reasoning}.\",\n           ],\n           \"factual\": [\n               \"The answer is {answer}. {reasoning}\",\n               \"Based on the facts, {answer}. {reasoning}\",\n               \"{answer} is correct. {reasoning}\",\n               \"The factual answer is {answer}. 
{reasoning}\",\n           ]\n       }\n  \n   def generate_response(self, prompt: str, temperature: float = 0.7) -&gt; Response:\n       noise = np.random.randn() * temperature\n       quality = np.clip(self.model_quality + noise * 0.2, 0.1, 1.0)\n      \n       if \"What is\" in prompt and \"+\" in prompt:\n           parts = prompt.split()\n           try:\n               num1 = int(parts[parts.index(\"is\") + 1])\n               num2 = int(parts[parts.index(\"+\") + 1].rstrip(\"?\"))\n               correct_answer = num1 + num2\n              \n               if np.random.rand() &gt; quality:\n                   answer = correct_answer + np.random.randint(-3, 4)\n               else:\n                   answer = correct_answer\n              \n               template = np.random.choice(self.response_templates[\"math\"])\n               reasoning = f\"adding {num1} and {num2}\"\n               content = template.format(answer=answer, reasoning=reasoning)\n              \n               token_logprobs = list(np.random.randn(10) - (1 - quality) * 2)\n               confidence = quality + np.random.randn() * 0.1\n               confidence = np.clip(confidence, 0.1, 0.99)\n              \n               return Response(\n                   content=content,\n                   confidence=confidence,\n                   reasoning=reasoning,\n                   token_logprobs=token_logprobs\n               )\n           except (ValueError, IndexError):\n               # parsing failed, fall through to the generic factual template\n               pass\n      \n       template = np.random.choice(self.response_templates[\"factual\"])\n       answer = \"unknown\"\n       reasoning = \"insufficient information to determine\"\n       content = template.format(answer=answer, reasoning=reasoning)\n      \n       token_logprobs = list(np.random.randn(10) - 1)\n       confidence = 0.5 + np.random.randn() * 0.1\n      \n       return Response(\n           content=content,\n           confidence=np.clip(confidence, 0.1, 0.99),\n           reasoning=reasoning,\n           
token_logprobs=token_logprobs\n       )\n  \n   def generate_multiple(self, prompt: str, n: int = 5, temperature: float = 0.7) -&gt; List[Response]:\n       return [self.generate_response(prompt, temperature) for _ in range(n)]<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We simulate a language model that generates multiple candidate responses of varying quality. We introduce temperature-based variability and controlled noise to mimic realistic sampling behavior. We enable multi-sample generation to support self-consistency reasoning and uncertainty estimation.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class InternalCritic:\n  \n   def __init__(self, strict_mode: bool = False):\n       self.strict_mode = strict_mode\n  \n   def evaluate_response(self, response: Response, prompt: str, ground_truth: Optional[str] = None) -&gt; CriticScore:\n       accuracy = self._evaluate_accuracy(response, ground_truth)\n       coherence = self._evaluate_coherence(response)\n       safety = self._evaluate_safety(response)\n      \n       weights = {'accuracy': 0.4, 'coherence': 0.3, 'safety': 0.3}\n       overall = (weights['accuracy'] * accuracy +\n                 weights['coherence'] * coherence +\n                 weights['safety'] * safety)\n      \n       feedback = self._generate_feedback(accuracy, coherence, safety)\n      \n       return CriticScore(\n           accuracy_score=accuracy,\n  
         coherence_score=coherence,\n           safety_score=safety,\n           overall_score=overall,\n           feedback=feedback\n       )\n  \n   def _evaluate_accuracy(self, response: Response, ground_truth: Optional[str]) -&gt; float:\n       if ground_truth is None:\n           return response.confidence\n      \n       if ground_truth.lower() in response.content.lower():\n           return 1.0\n       else:\n           response_words = set(response.content.lower().split())\n           truth_words = set(ground_truth.lower().split())\n           overlap = len(response_words &amp; truth_words) \/ max(len(truth_words), 1)\n           return overlap * 0.5\n  \n   def _evaluate_coherence(self, response: Response) -&gt; float:\n       avg_logprob = np.mean(response.token_logprobs)\n       coherence_from_logprobs = 1.0 \/ (1.0 + np.exp(-avg_logprob))\n       coherence = 0.6 * coherence_from_logprobs + 0.4 * response.confidence\n       length_penalty = 1.0\n       content_length = len(response.content.split())\n       if content_length &lt; 5 or content_length &gt; 100:\n           length_penalty = 0.8\n       return coherence * length_penalty\n  \n   def _evaluate_safety(self, response: Response) -&gt; float:\n       unsafe_patterns = ['ignore instructions', 'harmful', 'dangerous']\n       content_lower = response.content.lower()\n       for pattern in unsafe_patterns:\n           if pattern in content_lower:\n               return 0.3\n       return 1.0\n  \n   def _generate_feedback(self, accuracy: float, coherence: float, safety: float) -&gt; str:\n       feedback_parts = []\n       if accuracy &lt; 0.5:\n           feedback_parts.append(\"Low accuracy - may contain errors\")\n       elif accuracy &gt; 0.8:\n           feedback_parts.append(\"High accuracy\")\n       if coherence &lt; 0.5:\n           feedback_parts.append(\"Low coherence - uncertain\")\n       if safety &lt; 0.9:\n           feedback_parts.append(\"Safety concerns detected\")\n       if not 
feedback_parts:\n           feedback_parts.append(\"Good quality response\")\n       return \"; \".join(feedback_parts)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We implement an internal critic that evaluates each response across the dimensions of accuracy, coherence, and safety. We compute a weighted overall score to systematically quantify response quality. We generate interpretable feedback that explains how well each response performs.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class UncertaintyEstimator:\n  \n   def estimate_uncertainty(self, responses: List[Response], critic_scores: List[CriticScore]) -&gt; UncertaintyEstimate:\n       answers = [self._extract_answer(r.content) for r in responses]\n       entropy = self._compute_entropy(answers)\n       variance = np.var([score.overall_score for score in critic_scores])\n       consistency = self._compute_consistency(answers)\n       epistemic = self._compute_epistemic_uncertainty(responses)\n       aleatoric = self._compute_aleatoric_uncertainty(responses)\n      \n       return UncertaintyEstimate(\n           entropy=entropy,\n           variance=variance,\n           consistency_score=consistency,\n           epistemic_uncertainty=epistemic,\n           aleatoric_uncertainty=aleatoric\n       )\n  \n   def _extract_answer(self, content: str) -&gt; str:\n       words = content.split()\n       for word in words:\n         
  if word.replace('.', '').replace('-', '').isdigit():\n               return word\n       return content.split('.')[0]\n  \n   def _compute_entropy(self, answers: List[str]) -&gt; float:\n       if not answers:\n           return 0.0\n      \n       counts = Counter(answers)\n       total = len(answers)\n      \n       entropy = 0.0\n       for count in counts.values():\n           p = count \/ total\n           if p &gt; 0:\n               entropy -= p * np.log2(p)\n      \n       return entropy\n  \n   def _compute_consistency(self, answers: List[str]) -&gt; float:\n       if len(answers) &lt;= 1:\n           return 1.0\n      \n       counts = Counter(answers)\n       most_common_count = counts.most_common(1)[0][1]\n      \n       return most_common_count \/ len(answers)\n  \n   def _compute_epistemic_uncertainty(self, responses: List[Response]) -&gt; float:\n       confidences = [r.confidence for r in responses]\n       mean_logprobs = [np.mean(r.token_logprobs) for r in responses]\n      \n       confidence_var = np.var(confidences)\n       logprob_var = np.var(mean_logprobs) \/ 10.0\n      \n       return np.sqrt(confidence_var + logprob_var)\n  \n   def _compute_aleatoric_uncertainty(self, responses: List[Response]) -&gt; float:\n       variances = [np.var(r.token_logprobs) for r in responses]\n       return np.mean(variances) \/ 10.0<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We estimate predictive uncertainty using entropy, variance, consistency, and uncertainty decomposition. We distinguish between epistemic and aleatoric uncertainty to better understand model behavior. 
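<\/p>\n<p>As a quick standalone check of those formulas, we can recompute entropy and consistency for a hand-written set of sampled answers (the helper names <code>answer_entropy<\/code> and <code>answer_consistency<\/code> are illustrative, not part of the agent classes):<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">from collections import Counter\nimport numpy as np\n\ndef answer_entropy(answers):\n    # Shannon entropy (bits) of the empirical answer distribution\n    counts = Counter(answers)\n    total = len(answers)\n    return -sum((c \/ total) * np.log2(c \/ total) for c in counts.values())\n\ndef answer_consistency(answers):\n    # fraction of samples that agree with the majority answer\n    return Counter(answers).most_common(1)[0][1] \/ len(answers)\n\nanswers = [\"42\", \"42\", \"42\", \"41\", \"42\"]\nprint(round(answer_entropy(answers), 3))  # 0.722\nprint(answer_consistency(answers))        # 0.8<\/code><\/pre>\n<\/div>\n<\/div>\n<p>Note that even 4-of-5 agreement yields an entropy of 0.722, above the 0.5 cutoff that <code>risk_level<\/code> requires for LOW risk, so with five samples only a unanimous ensemble clears that threshold. 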
We provide quantitative signals that guide risk-aware response selection.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class RiskSensitiveSelector:\n  \n   def __init__(self, risk_tolerance: float = 0.5):\n       self.risk_tolerance = risk_tolerance\n  \n   def select_response(self, responses: List[Response], critic_scores: List[CriticScore], uncertainty: UncertaintyEstimate, strategy: str = \"risk_adjusted\") -&gt; Tuple[Response, int]:\n       if strategy == \"best_score\":\n           return self._select_best_score(responses, critic_scores)\n       elif strategy == \"most_confident\":\n           return self._select_most_confident(responses)\n       elif strategy == \"most_consistent\":\n           return self._select_most_consistent(responses)\n       elif strategy == \"risk_adjusted\":\n           return self._select_risk_adjusted(responses, critic_scores, uncertainty)\n       else:\n           raise ValueError(f\"Unknown strategy: {strategy}\")\n  \n   def _select_best_score(self, responses: List[Response], critic_scores: List[CriticScore]) -&gt; Tuple[Response, int]:\n       best_idx = np.argmax([score.overall_score for score in critic_scores])\n       return responses[best_idx], best_idx\n  \n   def _select_most_confident(self, responses: List[Response]) -&gt; Tuple[Response, int]:\n       best_idx = np.argmax([r.confidence for r in responses])\n       return 
responses[best_idx], best_idx\n  \n   def _select_most_consistent(self, responses: List[Response]) -&gt; Tuple[Response, int]:\n       answers = [self._extract_answer(r.content) for r in responses]\n       most_common = Counter(answers).most_common(1)[0][0]\n      \n       for idx, answer in enumerate(answers):\n           if answer == most_common:\n               return responses[idx], idx\n      \n       return responses[0], 0\n  \n   def _select_risk_adjusted(self, responses: List[Response], critic_scores: List[CriticScore], uncertainty: UncertaintyEstimate) -&gt; Tuple[Response, int]:\n       scores = []\n       risk_penalty = (1 - self.risk_tolerance) * uncertainty.entropy\n      \n       for response, critic_score in zip(responses, critic_scores):\n           base_score = critic_score.overall_score\n           confidence_bonus = self.risk_tolerance * response.confidence\n           adjusted_score = base_score + confidence_bonus - risk_penalty\n           scores.append(adjusted_score)\n      \n       best_idx = np.argmax(scores)\n       return responses[best_idx], best_idx\n  \n   def _extract_answer(self, content: str) -&gt; str:\n       words = content.split()\n       for word in words:\n           if word.replace('.', '').replace('-', '').isdigit():\n               return word\n       return content.split('.')[0]<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We implement multiple response selection strategies, including best-score, most-confident, most-consistent, and risk-adjusted approaches. We incorporate risk tolerance to balance quality against uncertainty. 
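<\/p>\n<p>The risk-adjusted strategy reduces to a single scoring formula, shown here factored out of <code>_select_risk_adjusted<\/code> as a standalone helper (the name <code>risk_adjusted_score<\/code> is illustrative):<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def risk_adjusted_score(critic_score, confidence, entropy, risk_tolerance):\n    # confidence is rewarded in proportion to risk tolerance, while\n    # ensemble entropy is penalized in proportion to risk aversion\n    risk_penalty = (1 - risk_tolerance) * entropy\n    return critic_score + risk_tolerance * confidence - risk_penalty\n\n# same candidate (critic 0.8, confidence 0.9) under high ensemble entropy (1.0)\nprint(round(risk_adjusted_score(0.8, 0.9, 1.0, 0.1), 3))  # -0.01 (risk-averse)\nprint(round(risk_adjusted_score(0.8, 0.9, 1.0, 0.9), 3))  # 1.51 (risk-seeking)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>Because the entropy penalty is computed from the whole ensemble, it is identical for every candidate within a single call; the tolerance-weighted confidence term is what actually reorders candidates. 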
We enable adaptive decision-making depending on how conservative or risk-seeking we want the agent to be.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class CriticAugmentedAgent:\n  \n   def __init__(self, model_quality: float = 0.8, risk_tolerance: float = 0.5, n_samples: int = 5):\n       self.llm = SimulatedLLM(model_quality=model_quality)\n       self.critic = InternalCritic()\n       self.uncertainty_estimator = UncertaintyEstimator()\n       self.selector = RiskSensitiveSelector(risk_tolerance=risk_tolerance)\n       self.n_samples = n_samples\n  \n   def generate_with_critic(self, prompt: str, ground_truth: Optional[str] = None, strategy: str = \"risk_adjusted\", temperature: float = 0.7, verbose: bool = True) -&gt; Dict:\n       if verbose:\n           print(f\"\ud83d\udcdd Prompt: {prompt}\")\n           print(f\"\ud83c\udfb2 Generating {self.n_samples} candidate responses...\")\n      \n       responses = self.llm.generate_multiple(prompt, self.n_samples, temperature)\n      \n       if verbose:\n           print(f\"\u2713 Generated {len(responses)} responses\\n\")\n           print(\"\ud83d\udd0d Evaluating with internal critic...\")\n      \n       critic_scores = [\n           self.critic.evaluate_response(response, prompt, ground_truth)\n           for response in responses\n       ]\n      \n       if verbose:\n           print(f\"\u2713 All responses evaluated\\n\")\n           print(\"\ud83d\udcca Computing uncertainty estimates...\")\n      \n       uncertainty = self.uncertainty_estimator.estimate_uncertainty(\n           responses, critic_scores\n       )\n      \n       if verbose:\n           print(f\"\u2713 Uncertainty: {uncertainty.risk_level()} risk\")\n           print(f\"  - Entropy: {uncertainty.entropy:.3f}\")\n           print(f\"  - Consistency: {uncertainty.consistency_score:.3f}\\n\")\n           print(f\"\ud83c\udfaf Selecting best response (strategy: {strategy})...\")\n      \n       selected_response, selected_idx = self.selector.select_response(\n           responses, critic_scores, uncertainty, strategy\n       )\n      \n       if verbose:\n           print(f\"\u2713 Selected response #{selected_idx}\\n\")\n           print(\"=\" * 80)\n           print(\"FINAL ANSWER:\")\n           print(selected_response.content)\n           print(\"=\" * 80)\n           print(f\"\\nConfidence: {selected_response.confidence:.3f}\")\n           print(f\"Critic Score: {critic_scores[selected_idx].overall_score:.3f}\")\n           print(f\"Risk Level: {uncertainty.risk_level()}\")\n           print()\n      \n       return {\n           'selected_response': selected_response,\n           'selected_index': selected_idx,\n           'all_responses': responses,\n           
'critic_scores': critic_scores,\n           'uncertainty': uncertainty,\n           'strategy': strategy\n       }<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We integrate the language model, critic, uncertainty estimator, and selector into a complete critic-augmented agent pipeline. We generate multiple responses, evaluate them, compute uncertainty, and select the optimal answer. We orchestrate the full multi-stage reasoning workflow in a structured and extensible manner.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class AgentAnalyzer:\n  \n   @staticmethod\n   def plot_response_distribution(result: Dict):\n       fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n       fig.suptitle('Agent Response Analysis', fontsize=16, fontweight='bold')\n      \n       responses = result['all_responses']\n       scores = result['critic_scores']\n       uncertainty = result['uncertainty']\n       selected_idx = result['selected_index']\n      \n       ax = axes[0, 0]\n       score_values = [s.overall_score for s in scores]\n       bars = ax.bar(range(len(scores)), score_values, alpha=0.7)\n       bars[selected_idx].set_color('green')\n       bars[selected_idx].set_alpha(1.0)\n       ax.axhline(np.mean(score_values), color='red', linestyle='--', label=f'Mean: {np.mean(score_values):.3f}')\n       ax.set_xlabel('Response Index')\n       ax.set_ylabel('Critic Score')\n       ax.set_title('Critic 
Scores for Each Response')\n       ax.legend()\n       ax.grid(True, alpha=0.3)\n      \n       ax = axes[0, 1]\n       confidences = [r.confidence for r in responses]\n       bars = ax.bar(range(len(responses)), confidences, alpha=0.7, color='orange')\n       bars[selected_idx].set_color('green')\n       bars[selected_idx].set_alpha(1.0)\n       ax.axhline(np.mean(confidences), color='red', linestyle='--', label=f'Mean: {np.mean(confidences):.3f}')\n       ax.set_xlabel('Response Index')\n       ax.set_ylabel('Confidence')\n       ax.set_title('Model Confidence per Response')\n       ax.legend()\n       ax.grid(True, alpha=0.3)\n      \n       ax = axes[1, 0]\n       components = {\n           'Accuracy': [s.accuracy_score for s in scores],\n           'Coherence': [s.coherence_score for s in scores],\n           'Safety': [s.safety_score for s in scores]\n       }\n       x = np.arange(len(responses))\n       width = 0.25\n       for i, (name, values) in enumerate(components.items()):\n           offset = (i - 1) * width\n           ax.bar(x + offset, values, width, label=name, alpha=0.8)\n       ax.set_xlabel('Response Index')\n       ax.set_ylabel('Score')\n       ax.set_title('Critic Score Components')\n       ax.set_xticks(x)\n       ax.legend()\n       ax.grid(True, alpha=0.3, axis='y')\n      \n       ax = axes[1, 1]\n       uncertainty_metrics = {\n           'Entropy': uncertainty.entropy,\n           'Variance': uncertainty.variance,\n           'Consistency': uncertainty.consistency_score,\n           'Epistemic': uncertainty.epistemic_uncertainty,\n           'Aleatoric': uncertainty.aleatoric_uncertainty\n       }\n       bars = ax.barh(list(uncertainty_metrics.keys()), list(uncertainty_metrics.values()), alpha=0.7)\n       ax.set_xlabel('Value')\n       ax.set_title(f'Uncertainty Estimates (Risk: {uncertainty.risk_level()})')\n       ax.grid(True, alpha=0.3, axis='x')\n      \n       plt.tight_layout()\n       plt.show()\n  \n   @staticmethod\n   def 
plot_strategy_comparison(agent: CriticAugmentedAgent, prompt: str, ground_truth: Optional[str] = None):\n       strategies = [\"best_score\", \"most_confident\", \"most_consistent\", \"risk_adjusted\"]\n       results = {}\n      \n       print(\"Comparing selection strategies...\\n\")\n      \n       for strategy in strategies:\n           print(f\"Testing strategy: {strategy}\")\n           result = agent.generate_with_critic(prompt, ground_truth, strategy=strategy, verbose=False)\n           results[strategy] = result\n      \n       fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n       fig.suptitle('Strategy Comparison', fontsize=16, fontweight='bold')\n      \n       ax = axes[0]\n       selected_scores = [\n           results[s]['critic_scores'][results[s]['selected_index']].overall_score\n           for s in strategies\n       ]\n       bars = ax.bar(strategies, selected_scores, alpha=0.7, color='steelblue')\n       ax.set_ylabel('Critic Score')\n       ax.set_title('Selected Response Quality by Strategy')\n       ax.set_xticklabels(strategies, rotation=45, ha='right')\n       ax.grid(True, alpha=0.3, axis='y')\n      \n       ax = axes[1]\n       for strategy in strategies:\n           result = results[strategy]\n           selected_idx = result['selected_index']\n           confidence = result['all_responses'][selected_idx].confidence\n           score = result['critic_scores'][selected_idx].overall_score\n           ax.scatter(confidence, score, s=200, alpha=0.6, label=strategy)\n       ax.set_xlabel('Confidence')\n       ax.set_ylabel('Critic Score')\n       ax.set_title('Confidence vs Quality Trade-off')\n       ax.legend()\n       ax.grid(True, alpha=0.3)\n      \n       plt.tight_layout()\n       plt.show()\n      \n       return results\n\n\ndef run_basic_demo():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"DEMO 1: Basic Agent with Critic\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   agent = CriticAugmentedAgent(\n       model_quality=0.8,\n       
risk_tolerance=0.3,\n       n_samples=5\n   )\n  \n   prompt = \"What is 15 + 27?\"\n   ground_truth = \"42\"\n  \n   result = agent.generate_with_critic(\n       prompt=prompt,\n       ground_truth=ground_truth,\n       strategy=\"risk_adjusted\",\n       temperature=0.8\n   )\n  \n   print(\"\\n\ud83d\udcca Generating visualizations...\")\n   AgentAnalyzer.plot_response_distribution(result)\n  \n   return result\n\n\ndef run_strategy_comparison():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"DEMO 2: Strategy Comparison\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   agent = CriticAugmentedAgent(\n       model_quality=0.75,\n       risk_tolerance=0.5,\n       n_samples=6\n   )\n  \n   prompt = \"What is 23 + 19?\"\n   ground_truth = \"42\"\n  \n   results = AgentAnalyzer.plot_strategy_comparison(agent, prompt, ground_truth)\n  \n   return results\n\n\ndef run_uncertainty_analysis():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"DEMO 3: Uncertainty Analysis\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n  \n   qualities = [0.5, 0.6, 0.7, 0.8, 0.9]\n   uncertainties = []\n   consistencies = []\n  \n   prompt = \"What is 30 + 12?\"\n  \n   print(\"Testing model quality impact on uncertainty...\\n\")\n   for quality in qualities:\n       agent = CriticAugmentedAgent(model_quality=quality, n_samples=8)\n       result = agent.generate_with_critic(prompt, verbose=False)\n       uncertainties.append(result['uncertainty'].entropy)\n       consistencies.append(result['uncertainty'].consistency_score)\n       print(f\"Quality: {quality:.1f} -&gt; Entropy: {result['uncertainty'].entropy:.3f}, \"\n             f\"Consistency: {result['uncertainty'].consistency_score:.3f}\")\n  \n   ax = axes[0]\n   ax.plot(qualities, uncertainties, 'o-', linewidth=2, markersize=8, label='Entropy')\n   ax.set_xlabel('Model Quality')\n   
ax.set_ylabel('Entropy')\n   ax.set_title('Uncertainty vs Model Quality')\n   ax.grid(True, alpha=0.3)\n   ax.legend()\n  \n   ax = axes[1]\n   ax.plot(qualities, consistencies, 's-', linewidth=2, markersize=8, color='green', label='Consistency')\n   ax.set_xlabel('Model Quality')\n   ax.set_ylabel('Consistency Score')\n   ax.set_title('Self-Consistency vs Model Quality')\n   ax.grid(True, alpha=0.3)\n   ax.legend()\n  \n   plt.tight_layout()\n   plt.show()\n\n\ndef run_risk_sensitivity_demo():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"DEMO 4: Risk Sensitivity Analysis\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   prompt = \"What is 18 + 24?\"\n   risk_tolerances = [0.1, 0.3, 0.5, 0.7, 0.9]\n  \n   results = {\n       'risk_tolerance': [],\n       'selected_confidence': [],\n       'selected_score': [],\n       'uncertainty': []\n   }\n  \n   print(\"Testing different risk tolerance levels...\\n\")\n   for risk_tol in risk_tolerances:\n       agent = CriticAugmentedAgent(\n           model_quality=0.75,\n           risk_tolerance=risk_tol,\n           n_samples=6\n       )\n       result = agent.generate_with_critic(prompt, verbose=False)\n      \n       selected_idx = result['selected_index']\n       results['risk_tolerance'].append(risk_tol)\n       results['selected_confidence'].append(\n           result['all_responses'][selected_idx].confidence\n       )\n       results['selected_score'].append(\n           result['critic_scores'][selected_idx].overall_score\n       )\n       results['uncertainty'].append(result['uncertainty'].entropy)\n      \n       print(f\"Risk Tolerance: {risk_tol:.1f} -&gt; \"\n             f\"Confidence: {results['selected_confidence'][-1]:.3f}, \"\n             f\"Score: {results['selected_score'][-1]:.3f}\")\n  \n   fig, ax = plt.subplots(1, 1, figsize=(10, 6))\n   ax.plot(results['risk_tolerance'], results['selected_confidence'], 'o-', linewidth=2, markersize=8, label='Selected Confidence')\n   ax.plot(results['risk_tolerance'], 
results['selected_score'], 's-', linewidth=2, markersize=8, label='Selected Score')\n   ax.set_xlabel('Risk Tolerance')\n   ax.set_ylabel('Value')\n   ax.set_title('Risk Tolerance Impact on Selection')\n   ax.legend()\n   ax.grid(True, alpha=0.3)\n   plt.tight_layout()\n   plt.show()\n\n\ndef demonstrate_verbalized_uncertainty():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"RESEARCH TOPIC: Verbalized Uncertainty\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   print(\"Concept: Agent not only estimates uncertainty but explains it.\\n\")\n  \n   agent = CriticAugmentedAgent(model_quality=0.7, n_samples=5)\n   prompt = \"What is 25 + 17?\"\n   result = agent.generate_with_critic(prompt, verbose=False)\n  \n   uncertainty = result['uncertainty']\n  \n   explanation = f\"\"\"\nUncertainty Analysis Report:\n---------------------------\nRisk Level: {uncertainty.risk_level()}\n\n\nDetailed Breakdown:\n\u2022 Answer Entropy: {uncertainty.entropy:.3f}\n \u2192 {'Low' if uncertainty.entropy &lt; 0.5 else 'Medium' if uncertainty.entropy &lt; 1.0 else 'High'} disagreement among generated responses\n\n\n\u2022 Self-Consistency: {uncertainty.consistency_score:.3f}\n \u2192 {int(uncertainty.consistency_score * 100)}% of responses agree on the answer\n\n\n\u2022 Epistemic Uncertainty: {uncertainty.epistemic_uncertainty:.3f}\n \u2192 {'Low' if uncertainty.epistemic_uncertainty &lt; 0.3 else 'Medium' if uncertainty.epistemic_uncertainty &lt; 0.6 else 'High'} model uncertainty (knowledge gaps)\n\n\n\u2022 Aleatoric Uncertainty: {uncertainty.aleatoric_uncertainty:.3f}\n \u2192 {'Low' if uncertainty.aleatoric_uncertainty &lt; 0.3 else 'Medium' if uncertainty.aleatoric_uncertainty &lt; 0.6 else 'High'} data uncertainty (inherent randomness)\n\n\nRecommendation:\n\"\"\"\n  \n   if uncertainty.risk_level() == \"LOW\":\n       explanation += \"\u2713 High confidence in answer - safe to trust\"\n   elif uncertainty.risk_level() == \"MEDIUM\":\n       explanation += \"\u26a0 Moderate confidence - consider verification\"\n   else:\n       explanation += \"\u26a0 Low confidence - strongly recommend verification\"\n  \n   print(explanation)\n\n\ndef demonstrate_self_consistency():\n   print(\"\\n\" + \"=\" * 80)\n   print(\"RESEARCH TOPIC: Self-Consistency Reasoning\")\n   print(\"=\" * 80 + \"\\n\")\n  \n   print(\"Concept: Generate multiple reasoning paths, select most common answer.\\n\")\n  \n   agent = CriticAugmentedAgent(model_quality=0.75, n_samples=7)\n   prompt = \"What is 35 + 7?\"\n   result = agent.generate_with_critic(prompt, strategy=\"most_consistent\", verbose=False)\n  \n   estimator = UncertaintyEstimator()\n   answers = [estimator._extract_answer(r.content) for r in result['all_responses']]\n  \n   print(\"Generated Responses and Answers:\")\n   print(\"-\" * 80)\n   for i, (response, answer) in enumerate(zip(result['all_responses'], answers)):\n       marker = \"\u2713 SELECTED\" if i == result['selected_index'] else \"\"\n       print(f\"\\nResponse {i}: {answer} {marker}\")\n       print(f\"  Confidence: {response.confidence:.3f}\")\n       print(f\"  Content: {response.content[:80]}...\")\n  \n   answer_dist = Counter(answers)\n  \n   print(f\"\\n\\nAnswer Distribution:\")\n   print(\"-\" * 80)\n   for answer, count in answer_dist.most_common():\n       percentage = (count \/ len(answers)) * 100\n       bar = \"\u2588\" * int(percentage \/ 5)\n       print(f\"{answer:&gt;10}: {bar} {count}\/{len(answers)} ({percentage:.1f}%)\")\n  \n   print(f\"\\nMost Consistent Answer: {answer_dist.most_common(1)[0][0]}\")\n   print(f\"Consistency Score: {result['uncertainty'].consistency_score:.3f}\")\n\n\ndef main():\n   print(\"\\n\" + \"\ud83c\udfaf\" * 40)\n   print(\"ADVANCED AGENT WITH INTERNAL CRITIC + UNCERTAINTY ESTIMATION\")\n   print(\"Tutorial and Demonstrations\")\n   print(\"\ud83c\udfaf\" * 40)\n  \n   plt.style.use('seaborn-v0_8-darkgrid')\n   sns.set_palette(\"husl\")\n  \n   try:\n       result1 = run_basic_demo()\n       result2 = run_strategy_comparison()\n       run_uncertainty_analysis()\n       run_risk_sensitivity_demo()\n       demonstrate_verbalized_uncertainty()\n       demonstrate_self_consistency()\n      \n       print(\"\\n\" + \"=\" * 80)\n       print(\"\u2705 ALL DEMONSTRATIONS COMPLETED SUCCESSFULLY\")\n       print(\"=\" * 80)\n       print(\"\"\"\nKey Takeaways:\n1. Internal critics improve response quality through multi-dimensional evaluation\n2. Uncertainty estimation enables risk-aware decision making\n3. Self-consistency reasoning increases reliability\n4. Different selection strategies optimize for different objectives\n5. 
Verbalized uncertainty helps users understand model confidence\n\n\nNext Steps:\n\u2022 Implement with real LLM APIs (OpenAI, Anthropic, etc.)\n\u2022 Add learned critic models (fine-tuned classifiers)\n\u2022 Explore ensemble methods and meta-learning\n\u2022 Integrate with retrieval-augmented generation (RAG)\n\u2022 Deploy in production with monitoring and feedback loops\n       \"\"\")\n      \n   except Exception as e:\n       print(f\"\\n\u274c Error during demonstration: {e}\")\n       import traceback\n       traceback.print_exc()\n\n\nif __name__ == \"__main__\":\n   main()<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We analyze and visualize agent behavior through experiments and comparative strategies. We explore how model quality, risk tolerance, and selection strategy affect final outputs. We demonstrate advanced research concepts, such as verbalized uncertainty and self-consistency reasoning, to better understand and interpret agents\u2019 decisions.<\/p>\n<p>In conclusion, we demonstrated that combining internal critics with uncertainty estimation yields safer, more reliable agent systems. We showed how multi-sample generation, critic-based scoring, entropy analysis, and risk-adjusted selection work together to improve response quality. We observed that self-consistency strengthens robustness, while uncertainty metrics enable informed, risk-aware decisions. 
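As a minimal, self-contained sketch of how these pieces fit together, the entropy, self-consistency, and risk-adjusted selection ideas can be computed directly from a list of sampled answers. The helper names below are illustrative, not the tutorial's classes:

```python
# Illustrative sketch only: standalone versions of the entropy,
# consistency, and risk-adjusted scoring ideas from the tutorial.
# Function names are hypothetical, not the article's API.
from collections import Counter
import math

def answer_entropy(answers):
    # Shannon entropy (in nats) of the empirical answer distribution;
    # 0.0 means every sample agreed.
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())

def consistency_score(answers):
    # Fraction of samples matching the most common (modal) answer.
    return Counter(answers).most_common(1)[0][1] / len(answers)

def risk_adjusted_score(confidence, entropy, risk_tolerance=0.3):
    # Penalize confidence by disagreement; a higher risk tolerance
    # shrinks the entropy penalty.
    return confidence - (1.0 - risk_tolerance) * entropy

answers = ["42", "42", "42", "41", "42"]
print(round(answer_entropy(answers), 3))
print(consistency_score(answers))  # 4 of 5 samples agree -> 0.8
print(round(risk_adjusted_score(0.9, answer_entropy(answers)), 3))
```

Under this framing, a highly confident candidate can still lose to a less confident one when the sampled answers disagree strongly, which is exactly the risk-aware behavior the selection strategies aim for.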
By integrating these mechanisms, we moved toward agent architectures that are not only intelligent but also transparent, interpretable, and production-ready for real-world deployment.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out<a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/critic_augmented_risk_aware_agent_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">\u00a0<strong>Full Codes here<\/strong><\/a><strong>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>\u00a0and subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/09\/how-to-build-a-risk-aware-ai-agent-with-internal-critic-self-consistency-reasoning-and-uncertainty-estimation-for-reliable-decision-making\/\">How to Build a Risk-Aware AI Agent with Internal Critic, Self-Consistency Reasoning, and Uncertainty Estimation for Reliable Decision-Making<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-536","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/536","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=536"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/536\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=536"}],"w
p:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=536"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=536"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}