{"id":187,"date":"2025-12-24T12:10:46","date_gmt":"2025-12-24T04:10:46","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=187"},"modified":"2025-12-24T12:10:46","modified_gmt":"2025-12-24T04:10:46","slug":"google-health-ai-releases-medasr-a-conformer-based-medical-speech-to-text-model-for-clinical-dictation","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=187","title":{"rendered":"Google Health AI Releases MedASR: a Conformer Based Medical Speech to Text Model for Clinical Dictation"},"content":{"rendered":"<p>The Google Health AI team has released MedASR, an open-weights medical speech-to-text model that targets clinical dictation and physician-patient conversations and is designed to plug directly into modern AI workflows.<\/p>\n<h3 class=\"wp-block-heading\"><strong>What MedASR is and where it fits<\/strong><\/h3>\n<p>MedASR is a speech-to-text model based on the Conformer architecture, pretrained for medical dictation and transcription. It is positioned as a starting point for developers who want to build healthcare voice applications such as radiology dictation tools or visit note capture systems.<\/p>\n<p>The model has 105 million parameters and accepts mono-channel audio sampled at 16 kHz (16,000 Hz) as 16-bit integer waveforms. It produces text-only output, so its transcripts can be passed directly to downstream natural language processing pipelines or generative models such as MedGemma.<\/p>\n<p>MedASR is part of the Health AI Developer Foundations portfolio, alongside MedGemma, MedSigLIP and other domain-specific medical models that share common terms of use and a consistent governance framework.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Training data and domain specialization<\/strong><\/h3>\n<p>MedASR is trained on a diverse corpus of de-identified medical speech. 
The dataset includes about 5,000 hours of physician dictations and clinical conversations across radiology, internal medicine and family medicine.<\/p>\n<p>The training data pairs audio segments with transcripts and metadata. Subsets of the conversational data are annotated with medical named entities, including symptoms, medications and conditions. This gives the model strong coverage of the clinical vocabulary and phrasing patterns that appear in routine documentation.<\/p>\n<p>The model is English-only, and most training audio comes from first-language English speakers raised in the United States. The documentation notes that performance may be lower for other speaker profiles or with noisy microphones, and recommends fine-tuning for such settings.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architecture and decoding<\/strong><\/h3>\n<p>MedASR follows the Conformer encoder design. Conformer combines convolution blocks with self-attention layers, so a single stack captures both local acoustic patterns and longer-range temporal dependencies.<\/p>\n<p>The model is exposed as an automatic speech recognition (ASR) model with a CTC (Connectionist Temporal Classification) interface. In the reference implementation, developers use <code>AutoProcessor<\/code> to create input features from waveform audio and <code>AutoModelForCTC<\/code> to produce token sequences. Decoding is greedy by default. The model can also be paired with an external 6-gram language model and beam search with a beam size of 8 to lower the word error rate (WER).<\/p>\n<p>MedASR training uses JAX and ML Pathways on TPUv4p, TPUv5p and TPUv5e hardware. 
These systems provide the scale needed for large speech models and align with Google\u2019s broader foundation model training stack.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Performance on medical speech tasks<\/strong><\/h3>\n<p><strong>Key results as word error rate (WER, lower is better), with greedy decoding and with a 6-gram language model:<\/strong><\/p>\n<figure class=\"wp-block-table\"><table><thead><tr><th>Dataset<\/th><th>MedASR greedy<\/th><th>MedASR + 6-gram LM<\/th><th>Gemini 2.5 Pro<\/th><th>Gemini 2.5 Flash<\/th><th>Whisper v3 Large<\/th><\/tr><\/thead><tbody><tr><td>RAD DICT (radiologist dictation)<\/td><td>6.6%<\/td><td>4.6%<\/td><td>10.0%<\/td><td>24.4%<\/td><td>25.3%<\/td><\/tr><tr><td>GENERAL DICT (general and internal medicine)<\/td><td>9.3%<\/td><td>6.9%<\/td><td>16.4%<\/td><td>27.1%<\/td><td>33.1%<\/td><\/tr><tr><td>FM DICT (family medicine)<\/td><td>8.1%<\/td><td>5.8%<\/td><td>14.6%<\/td><td>19.9%<\/td><td>32.5%<\/td><\/tr><tr><td>Eye Gaze (dictation on 998 MIMIC chest X-ray cases)<\/td><td>6.6%<\/td><td>5.2%<\/td><td>5.9%<\/td><td>9.3%<\/td><td>12.5%<\/td><\/tr><\/tbody><\/table><\/figure>\n<h3 class=\"wp-block-heading\"><strong>Developer workflow and deployment options<\/strong><\/h3>\n<p><strong>A minimal example using the Transformers <code>pipeline<\/code> API:<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<pre class=\" no-line-numbers\"><code class=\" no-wrap 
language-python\">from transformers import pipeline\nimport huggingface_hub\n\n# Download the sample clip that ships with the model repository\naudio = huggingface_hub.hf_hub_download(\"google\/medasr\", \"test_audio.wav\")\n# Build an automatic speech recognition pipeline backed by the MedASR checkpoint\npipe = pipeline(\"automatic-speech-recognition\", model=\"google\/medasr\")\n# Transcribe in 20 s chunks; 2 s strides on each side add context and are trimmed when chunks are merged\nresult = pipe(audio, chunk_length_s=20, stride_length_s=2)\n# result is a dict; the transcript is stored under the \"text\" key\nprint(result)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>For more control, developers load <code>AutoProcessor<\/code> and <code>AutoModelForCTC<\/code>, resample audio to 16 kHz with <code>librosa<\/code>, move the model and input tensors to CUDA when a GPU is available, and call <code>model.generate<\/code> followed by <code>processor.batch_decode<\/code>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol class=\"wp-block-list\">\n<li><strong>MedASR is a lightweight, open-weights, Conformer-based medical ASR model<\/strong>: It has 105M parameters, is trained specifically for medical dictation and transcription, and is released under the Health AI Developer Foundations program as an English-only model for healthcare developers. <\/li>\n<li><strong>Domain-specific training on about 5,000 hours of de-identified medical audio<\/strong>: MedASR is pretrained on physician dictations and clinical conversations across specialties such as radiology, internal medicine and family medicine, which gives it stronger coverage of clinical terminology than general-purpose ASR systems. 
<\/li>\n<li><strong>Competitive or better word error rates on medical dictation benchmarks<\/strong>: On the internal radiology, general medicine and family medicine datasets and on Eye Gaze, MedASR with language model decoding has the lowest word error rate of all compared systems, including Gemini 2.5 Pro, Gemini 2.5 Flash and Whisper v3 Large. With greedy decoding it beats every baseline except Gemini 2.5 Pro on Eye Gaze (6.6 versus 5.9 percent).<\/li>\n<\/ol>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/google-health\/medasr\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a>, <a href=\"https:\/\/huggingface.co\/google\/medasr\" target=\"_blank\" rel=\"noreferrer noopener\">Model on HF<\/a> and <a href=\"https:\/\/developers.google.com\/health-ai-developer-foundations\/medasr\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.<\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/12\/23\/google-health-ai-releases-medasr-a-conformer-based-medical-speech-to-text-model-for-clinical-dictation\/\">Google Health AI Releases MedASR: a Conformer Based Medical Speech to Text Model for Clinical Dictation<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Google Health AI team has rele&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-187","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/187","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=187"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/187\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=187"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/conne
ctword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=187"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=187"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}