Build a serverless audio summarization solution with Amazon Bedrock and Whisper


Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With advances in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.

Protecting personally identifiable information (PII) is a vital aspect of data security, driven by both ethical responsibilities and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper foundation model (FM) Whisper Large V3 Turbo, available in Amazon Bedrock Marketplace, which offers access to over 140 models through a dedicated offering, to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.

In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.

Solution overview

The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.

AWS serverless architecture for audio processing: CloudFront to S3, EventBridge trigger, Lambda and Bedrock for transcription and summarization

The workflow consists of the following steps:

  1. The React application is hosted in an S3 bucket and served to users through CloudFront for fast, global access. API Gateway handles interactions between the frontend and backend services.
  2. Users upload audio or video files directly from the app. These recordings are stored in a designated S3 bucket for processing.
  3. An Amazon EventBridge rule detects the S3 upload event and triggers the Step Functions state machine, initiating the AI-powered processing pipeline.
  4. The state machine performs audio transcription, summarization, and redaction by orchestrating multiple Amazon Bedrock models in sequence. It uses Whisper for transcription, Claude for summarization, and Guardrails to redact sensitive information.
  5. The redacted summary is returned to the frontend application and displayed to the user.
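To make step 3 concrete, an EventBridge rule that launches the state machine can match S3 Object Created events on the upload bucket. The following is a minimal sketch of such an event pattern; the bucket name and key prefix are illustrative placeholders, not values from this solution:

```python
import json

# EventBridge event pattern matching uploads to the recordings bucket.
# Bucket name and key prefix are illustrative placeholders.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["my-recordings-bucket"]},
        "object": {"key": [{"prefix": "uploads/"}]},
    },
}

# The pattern is passed to the events:PutRule API as a JSON string
event_pattern_json = json.dumps(event_pattern)
```

Note that S3 must have EventBridge notifications enabled on the bucket for these events to be emitted.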

The following diagram illustrates the state machine workflow.

AWS Step Functions state machine for audio processing: Whisper transcription, speaker identification, and Bedrock summary tasks

The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio/video recordings:

  1. A Lambda function is triggered to gather input details (for example, the Amazon S3 object path and metadata) and prepare the payload for transcription.
  2. The payload is sent to the OpenAI Whisper Large V3 Turbo model through the Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
  3. The raw transcript is passed to Anthropic's Claude 3.5 Sonnet through Amazon Bedrock, which produces a concise and coherent summary of the conversation or content.
  4. A second Lambda function validates and forwards the summary to the redaction step.
  5. The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive information.
  6. The redacted summary is stored or returned to the frontend application through an API, where it's displayed to the user.
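Step 1 above can be sketched as a small handler that pulls the S3 location out of the incoming EventBridge event and builds the payload for the transcription step. The function and field names here are illustrative assumptions, not taken from the solution code:

```python
def prepare_transcription_input(event):
    """Extract the S3 location from an EventBridge S3 event and
    build the payload passed to the Whisper transcription step."""
    detail = event["detail"]
    return {
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
    }

# Example EventBridge event, abridged to the fields used above
sample_event = {
    "detail": {
        "bucket": {"name": "my-recordings-bucket"},
        "object": {"key": "uploads/meeting.mp3"},
    }
}
payload = prepare_transcription_input(sample_event)
# payload == {"bucket": "my-recordings-bucket", "key": "uploads/meeting.mp3"}
```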

Prerequisites

Before you get started, make sure you have the following prerequisites in place:

Create a guardrail on the Amazon Bedrock console

For instructions on creating guardrails in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:

  • Enable PII detection and handling
  • Set PII action to Redact
  • Add the relevant PII types, such as:
    • Names and identities
    • Phone numbers
    • Email addresses
    • Physical addresses
    • Financial information
    • Other sensitive personal information
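The same settings can also be applied programmatically. The sketch below builds the sensitive-information policy that a boto3 create_guardrail call accepts; treat the entity type names as examples and check the supported values against the Bedrock API reference:

```python
# PII entity types to redact; the ANONYMIZE action replaces matches with
# type placeholders such as {NAME} or {EMAIL} in the guardrail output.
PII_TYPES = ["NAME", "PHONE", "EMAIL", "ADDRESS", "CREDIT_DEBIT_CARD_NUMBER"]

def build_pii_policy(pii_types):
    """Build the sensitiveInformationPolicyConfig argument
    for a bedrock.create_guardrail(...) request."""
    return {
        "piiEntitiesConfig": [
            {"type": pii_type, "action": "ANONYMIZE"} for pii_type in pii_types
        ]
    }

policy = build_pii_policy(PII_TYPES)
```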

After you deploy the guardrail, note the Amazon Resource Name (ARN); you will use this when deploying the model.

Deploy the Whisper model

Complete the following steps to deploy the Whisper Large V3 Turbo model:

  1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
  2. Search for and choose Whisper Large V3 Turbo.
  3. On the options menu (three dots), choose Deploy.

Amazon Bedrock console displaying filtered model catalog with Whisper Large V3 Turbo speech recognition model and deployment option

  4. Modify the endpoint name, number of instances, and instance type to suit your specific use case. For this post, we use the default settings.
  5. Modify the Advanced settings section to suit your use case. For this post, we use the default settings.
  6. Choose Deploy.

This creates a new AWS Identity and Access Management (IAM) role and deploys the model.

You can choose Marketplace deployments in the navigation pane, and in the Managed deployments section, you can see the endpoint status as Creating. Wait for the endpoint to finish deployment and the status to change to In Service, then copy the endpoint name; you will use this when deploying the solution.

Amazon Bedrock console: "How it works" overview, managed deployments table with Whisper model endpoint in service

Deploy the solution infrastructure

In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.

We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:

  • React frontend application
  • Backend infrastructure
  • S3 buckets for storing uploads and processed results
  • Step Functions state machine with Lambda functions for audio processing and PII redaction
  • API Gateway endpoints for handling requests
  • IAM roles and policies for secure access
  • CloudFront distribution for hosting the frontend

Implementation deep dive

The backend is composed of a series of Lambda functions, each handling a specific stage of the audio processing pipeline:

  • Upload handler – Receives audio files and stores them in Amazon S3
  • Transcription with Whisper – Converts speech to text using the Whisper model
  • Speaker detection – Differentiates and labels individual speakers within the audio
  • Summarization using Amazon Bedrock – Extracts and summarizes key points from the transcript
  • PII redaction – Uses Amazon Bedrock Guardrails to remove sensitive information for privacy compliance

Let's examine some of the key components:

The transcription Lambda function uses the Whisper model to convert audio files to text:

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio to hex string format
    hex_audio = audio_chunk.hex()

    # Create payload for Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }

    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )

    # Parse the transcription response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    transcription_text = response_body['text']

    return transcription_text
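Whisper endpoints accept a bounded payload, so longer recordings must be processed in pieces. The following is a simplified sketch of how chunked transcription could be stitched together; the chunk size and the injected transcriber are assumptions for illustration, not the solution's actual code:

```python
def transcribe_in_chunks(audio_bytes, transcribe_chunk, chunk_size=5 * 1024 * 1024):
    """Split raw audio bytes into fixed-size chunks, transcribe each,
    and join the partial transcripts in order."""
    parts = []
    for offset in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[offset:offset + chunk_size]
        parts.append(transcribe_chunk(chunk))
    return " ".join(parts)

# Example with a stand-in transcriber that reports chunk lengths
fake_transcriber = lambda chunk: f"<{len(chunk)} bytes>"
result = transcribe_in_chunks(b"x" * 10, fake_transcriber, chunk_size=4)
# result == "<4 bytes> <4 bytes> <2 bytes>"
```

In practice, compressed audio should be split on media boundaries (for example, with ffmpeg) rather than raw byte offsets, or frames will be cut mid-stream.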

We use Amazon Bedrock to generate concise summaries from the transcriptions:

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

    # Call Bedrock for summarization using the Anthropic Messages API format
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}],
        })
    )

    # Extract and return the summary
    result = json.loads(response.get('body').read())
    return result['content'][0]['text']

A critical component of our solution is the automatic redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:

def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to API requirements
    formatted_content = [{"text": {"text": content}}]

    # Call the guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Using the OUTPUT source for the correct flow
        content=formatted_content
    )

    # Extract redacted text from the response
    if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
        if len(response['outputs']) > 0:
            output = response['outputs'][0]
            if 'text' in output and isinstance(output['text'], str):
                return output['text']

    # Return original content if no redaction was applied
    return content

When PII is detected, it's replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive information.
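As a lightweight safety net, a downstream step can verify that a redacted summary no longer contains obvious raw PII patterns. The regex check below is illustrative only and is not part of Guardrails or of this solution's code:

```python
import re

# Simple illustrative patterns for emails and US-style phone numbers
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def looks_redacted(summary):
    """Return True if no obvious email or phone patterns remain."""
    return not (EMAIL_RE.search(summary) or PHONE_RE.search(summary))
```

A placeholder-only summary such as "Call {PHONE} or write to {EMAIL}" passes the check, while one containing a raw address like "john@example.com" fails it; real validation would need broader patterns than these two.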

To manage the complex processing pipeline, we use Step Functions to orchestrate the Lambda functions:

{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}

This workflow makes sure each step completes successfully before proceeding to the next, with automatic error handling and retry logic built in.
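Retry behavior is configured per Task state in the Amazon States Language. The sketch below shows one way to attach a standard Retry policy to every Task in a definition dict before serializing it; the error names and backoff values are typical defaults, not taken from the deployed definition:

```python
import json

# A common retry policy for Lambda-backed Task states
RETRY_POLICY = [{
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]

def add_retries(definition):
    """Attach the retry policy to every Task state in the definition."""
    for state in definition["States"].values():
        if state.get("Type") == "Task":
            state["Retry"] = RETRY_POLICY
    return definition

# Minimal single-state definition for illustration
definition = {
    "StartAt": "TranscribeAudio",
    "States": {
        "TranscribeAudio": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "End": True,
        }
    },
}
asl = json.dumps(add_retries(definition))
```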

Test the solution

After you have successfully completed the deployment, you can use the CloudFront URL to test the solution functionality.

Audio/video upload and summary interface with completed file upload for team meeting recording analysis

Security considerations

Security is a critical aspect of this solution, and we've implemented several best practices to support data protection and compliance:

  • Sensitive data redaction – Automatically redact PII to protect user privacy.
  • Fine-grained IAM permissions – Apply the principle of least privilege across AWS services and resources.
  • Amazon S3 access controls – Use strict bucket policies to limit access to authorized users and roles.
  • API security – Secure API endpoints using Amazon Cognito for user authentication (optional but recommended).
  • CloudFront protection – Enforce HTTPS and apply modern TLS protocols to facilitate secure content delivery.
  • Amazon Bedrock data security – Amazon Bedrock (including Amazon Bedrock Marketplace) protects customer data and does not send data to providers or train using customer data. This makes sure your proprietary data remains secure when using AI capabilities.

Clean up

To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you're done:

  1. Delete the Amazon Bedrock guardrail:
    1. On the Amazon Bedrock console, in the navigation menu, choose Guardrails.
    2. Choose your guardrail, then choose Delete.
  2. Delete the Whisper Large V3 Turbo model deployed through the Amazon Bedrock Marketplace:
    1. On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane.
    2. In the Managed deployments section, select the deployed endpoint and choose Delete.
  3. Delete the AWS CDK stack by running the command cdk destroy, which deletes the AWS infrastructure.

Conclusion

This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we've built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.

The automatic PII redaction feature supports compliance with privacy regulations, making this solution well-suited for regulated industries such as healthcare, finance, and legal services where data security is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.


About the Authors

Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives GenAI adoption to cloud. Previously, Kaiyin worked in the Smart Home domain, assisting customers in integrating voice and IoT technologies.

Sid Vantair is a Solutions Architect with AWS covering Strategic accounts. He thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.


