Healthcare provider slashes document processing costs with GenAI

Myriad Genetics, a Salt Lake City-based provider of genetic testing and precision medicine solutions, partnered with AWS to transform its healthcare document processing pipeline using Amazon Bedrock foundation models. The implementation reduced document classification costs by 77 percent while improving accuracy from 94 to 98 percent.

The company processes thousands of healthcare documents daily across its Women's Health, Oncology and Mental Health divisions. Documents must be classified into categories including test request forms, lab results, clinical notes and insurance records to automate prior authorisation workflows.

Myriad's existing solution combined Amazon Textract for optical character recognition with Amazon Comprehend for document classification. The system routed classified documents to appropriate external vendors for processing based on their identified document class.

Despite achieving 94 percent classification accuracy, the system created significant operational challenges. Processing costs reached 3 cents per page, resulting in monthly expenses of $US15,000 per business unit. Classification latency averaged 8.5 minutes per document, creating bottlenecks in downstream workflows and delaying prior authorisation submissions.

The existing system's limitations extended beyond classification. Key information extraction remained entirely manual due to the complexity of medical documents. Staff needed contextual understanding to differentiate critical clinical distinctions such as 'is metastatic' versus 'is not metastatic' and to locate information including insurance numbers and patient details across varying document formats.

This manual processing burden was substantial. The Women's Health business unit alone required up to 10 full-time employees contributing 78 hours daily to extract key information from documents. The workload created scalability constraints and operational bottlenecks as document volumes increased.

Myriad needed a solution to reduce document classification costs while maintaining or improving accuracy, accelerate document processing to eliminate workflow bottlenecks, automate information extraction for medical documents, and scale across multiple business units and document types.

The new solution uses AWS's open-source GenAI Intelligent Document Processing Accelerator with Amazon Nova foundation models. Amazon Nova Pro handles document classification, while Amazon Nova Premier manages complex information extraction requiring advanced reasoning capabilities.

Processing speed improved by 80 percent, reducing classification time from 8.5 minutes to 1.5 minutes per document. The automated key information extraction achieved 90 percent accuracy, matching human evaluator baseline performance, while processing documents in approximately 1.3 minutes each.

Implementation challenges centred on handling complex medical documents with visual ambiguity. Checkbox fields required distinguishing between different marking styles including checkmarks, crosses and handwritten marks. Documents contained overlapping marks and content spanning multiple fields.

AWS Data Scientist Priyashree Roy and her team addressed these challenges through several optimization techniques. They enabled Amazon Textract's specialized tables and forms features to improve optical character recognition discrimination between selected and unselected checkbox elements.

The team implemented a multimodal approach that sent both document images and extracted text to the foundation model, enabling simultaneous analysis of visual layout and textual content. Few-shot learning provided example document images paired with expected extraction outputs to guide the model's understanding of various form layouts.

For particularly complex extraction scenarios, the team used Amazon Nova Premier with chain of thought reasoning, having the model work through extraction decisions step-by-step before making final determinations.

Prompt engineering proved critical to achieving high accuracy. The team used document samples from each class with Anthropic Claude Sonnet 3.7 on Amazon Bedrock with model reasoning enabled. The model identified distinguishing features between similar document classes, which Myriad's subject matter experts refined.

Format-based classification strategies used document structure and formatting as key differentiators. Lab reports contain numerical results organized in tables with reference ranges and units, while test results present findings in paragraph format with clinical interpretations.

Negative prompting techniques resolved confusion between similar documents by explicitly instructing the model what classifications to avoid. Test request forms were frequently misclassified as test results due to confusion between patient medical history and lab measurements. Adding exclusionary language to classification prompts improved classification accuracy by 4 percent.

Myriad plans a phased rollout beginning with document classification in the Women's Health business unit, followed by Oncology and Mental Health divisions. The company projects annual savings of $US132,000 in document classification costs.

The solution reduces each prior authorisation submission time by 2 minutes. Specialists now complete orders in four minutes instead of six minutes due to faster access to tagged documents. This improvement saves 300 hours monthly across 9,000 prior authorisations in Women's Health alone.

Martyna Shallenberg, Senior Director of Software Engineering at Myriad Genetics, said the partnership with AWS GenAI Innovation Centre delivered measurable business impact. She cited improved performance and accuracy alongside projected savings of more than $US10,000 per month.

The GenAI Intelligent Document Processing Accelerator provides a serverless architecture that converts unstructured documents into structured data. The accelerator processes multiple documents in parallel through configurable concurrency limits. Its built-in evaluation framework lets users provide expected output through the user interface and evaluate generated results to customize configuration.

The accelerator offers one-click deployment with three pre-built patterns optimized for different workloads. Pattern 1 uses Amazon Bedrock Data Automation with out-of-the-box features and straightforward per-page pricing. Pattern 2 uses Amazon Textract and Amazon Bedrock with Amazon Nova or Anthropic Claude models, ideal for complex documents requiring custom logic. Pattern 3 uses Amazon Textract, Amazon SageMaker with fine-tuned models for classification, and Amazon Bedrock for extraction.

Myriad selected Pattern 2 to meet requirements for low cost while offering flexibility to optimize accuracy through prompt engineering and model selection. The pattern offers no-code configuration, allowing customization of document types, extraction fields and processing logic through configuration editable in the web interface.

The open-source GenAI IDP Accelerator is available for organizations to deploy and test in their environments.

AWS provides detailed documentation on accelerating intelligent document processing with generative AI at https://aws.amazon.com/blogs/machine-learning/accelerate-intelligent-doc...

The GenAI IDP Accelerator is available at http://www.amazon.com/genai-idp-accelerator

This article is based on a case study published by AWS: https://aws.amazon.com/blogs/machine-learning/how-myriad-genetics-achiev...