Generative AI is transforming how we create text, images, code, and other forms of digital content. Amazon Web Services (AWS) has emerged as a significant player in the generative AI space with services and tools that enable developers and businesses to build, train, and deploy AI models with ease.
This article is a continuation of the first article in the series:
Comparison of Generative AI Solutions: AWS vs Azure vs Google Cloud (2024 Guide).
If you have not read the previous article, I strongly recommend that you read it before continuing with this article.
Generative AI on AWS
Amazon Web Services (AWS) has played a pivotal role in the evolution of Generative AI, continuously expanding its services to support the growing demands of AI-driven innovation. AWS provides different services, tools and models to implement and use Generative AI on its platform. Most use cases around Generative AI involve end-user experiences. AWS makes it easy to build, deploy and scale Generative AI applications and services with security and privacy built-in.
There are three main components for implementing generative AI applications in AWS:
- Amazon Bedrock provides a unified platform for accessing the features provided by different models in a seamless manner.
- Foundation Models are large, pre-trained machine learning models that learn from broad datasets and serve as the base for generating content, making predictions, or automating decision-making processes.
- Amazon SageMaker provides a platform to host, train and fine-tune models to improve their speed and accuracy and to adapt them to the required use cases.
Reference Architecture
Here’s a high-level reference architecture of what a typical generative AI application on AWS might look like:
Amazon Bedrock
Amazon Bedrock, introduced in 2023, is a fully managed service that provides a unified platform to access multiple Foundation Models (FMs) from several leading AI companies like AI21 Labs, Anthropic, Stability AI, and also Amazon’s own proprietary models like Titan. This flexibility allows users to choose from different models depending on their specific needs. It offers an ecosystem to build generative AI applications and services with security, privacy and responsible AI practices.
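For example, you can discover which foundation models are available to your account with a few lines of boto3 (the AWS SDK for Python). This is a minimal sketch; it assumes your AWS credentials and a Bedrock-supported region are already configured:

```python
import boto3

# The "bedrock" client exposes control-plane operations such as model discovery.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models available to the account in this region.
response = bedrock.list_foundation_models()
for model in response["modelSummaries"]:
    print(model["modelId"], "-", model.get("providerName", ""))
```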
Knowledge Bases
Knowledge Bases allow Foundation Models and Bedrock Agents to access contextual information from the organization’s private data sources to implement Retrieval Augmented Generation (RAG). This enables Bedrock to provide accurate, relevant and customized responses.
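As an illustration, the sketch below queries a Knowledge Base through Bedrock’s RetrieveAndGenerate API using boto3. The Knowledge Base ID and model ARN are placeholders you would replace with your own:

```python
import boto3

# The "bedrock-agent-runtime" client exposes RetrieveAndGenerate, which
# queries a Knowledge Base and grounds the model's answer in the results.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # hypothetical Knowledge Base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])
```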
Agents
Bedrock Agent is a service within Amazon Bedrock designed to facilitate the development of generative AI-powered applications by integrating foundation models with external data sources and APIs. Agents allow developers to enhance the capabilities of pre-trained foundation models, such as LLMs (Large Language Models), by enabling them to perform specific tasks using real-time data, applications, and custom logic.
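A minimal boto3 sketch of invoking an agent might look like the following; the agent ID, alias ID, and session ID are hypothetical:

```python
import boto3

# Agents are invoked through the "bedrock-agent-runtime" client; the
# response arrives as an event stream of byte chunks.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT123456",       # hypothetical agent ID
    agentAliasId="ALIAS123456",  # hypothetical alias ID
    sessionId="session-001",     # any stable ID to keep conversation state
    inputText="Book a meeting room for 2pm tomorrow.",
)

# Stream and print the completion as it arrives.
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
```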
Guardrails
Bedrock Guardrails are built-in safety mechanisms within Amazon Bedrock that ensure the responsible use of generative AI models by preventing them from generating harmful, inappropriate, or unintended content. These guardrails help developers manage the risks associated with deploying large-scale foundation models (like LLMs and Generative AI models) by enforcing ethical guidelines, security policies, and content quality checks.
Guardrails are especially useful in use cases where the responses from generative AI models are shared with end users directly and must be checked for profanity, inappropriateness, offensive contents, etc.
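For instance, a guardrail can be attached to a model invocation by passing its identifier and version to InvokeModel. In this sketch the guardrail ID and model ID are placeholders:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Invoke a Claude model with a guardrail applied to the request and response.
response = runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # model ID may vary
    guardrailIdentifier="gr-1234567890",  # hypothetical guardrail ID
    guardrailVersion="1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 300,
        "messages": [{"role": "user", "content": "Summarize our support policy."}],
    }),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```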
Bedrock Studio
Amazon Bedrock Studio is a web interface designed to help developers experiment with and build generative AI applications. It provides a rapid prototyping environment and streamlines access to multiple foundation models (FMs) and developer tools.
Amazon Bedrock Studio lets developers collaborate on projects within their organization, experiment with different LLMs and Foundation Models without setting up a development environment, and create prototype apps using Amazon Bedrock models and features such as Knowledge Bases or Guardrails, all without writing any code. It’s designed to help developers get started with generative AI applications quickly and efficiently.
Amazon Bedrock Features
Model Variety
Provides access to multiple Foundation Models (FMs) from different providers, giving users flexibility based on their use cases. AWS also provides a model evaluation tool to evaluate the accuracy of models for specific use cases.
Integration with AWS Services
Bedrock integrates seamlessly with AWS services like Lambda and SageMaker, allowing users to build, deploy, manage and access AI models quickly and efficiently.
Model Customization
Bedrock allows you to customize and fine-tune models with domain information to improve the accuracy of models for specific use cases.
Retrieval Augmented Generation (RAG)
RAG is an AI technique that enables you to augment and enrich responses generated by Foundation Models with custom data from Knowledge Bases, documents, databases and other backend systems. This process helps you to enhance the accuracy and relevance of generated content.
Foundation Models
Foundation Models are large, general-purpose models pre-trained on broad datasets that can be adapted to a wide range of tasks. Because they already encode general knowledge, they can be customized and fine-tuned for specific use cases with relatively little additional data. Large Language Models (LLMs) are a type of foundation model specialized in language tasks. AWS provides different foundation models for you to choose from, and each of these models is suited to multiple use cases.
Titan
Amazon Titan is a family of models built by AWS and pre-trained on large datasets, which makes them powerful, general-purpose models. The Titan models are optimized for tasks such as summarization, text generation, search queries, and chatbot applications. These models are integrated into various AWS services, enabling seamless deployment across the AWS ecosystem.
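As a quick illustration, here is a minimal boto3 sketch of invoking a Titan text model; the model ID and request fields follow Titan’s text-generation schema but may vary by region and model version:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan text models take an "inputText" prompt plus generation settings.
body = json.dumps({
    "inputText": "Summarize the benefits of serverless architectures.",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5},
})

response = runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",  # model ID may vary by region/version
    body=body,
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```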
Claude
Claude by Anthropic is designed with a focus on safety, producing outputs that are less likely to be harmful or unsafe. This makes it ideal for businesses requiring a higher standard of compliance and responsibility. The model is also highly suitable for building customer-facing conversational interfaces. It has a 200,000-token context window, which allows you to pass large amounts of data to the model (a token roughly equals 0.75 words).
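The sketch below calls a Claude model through Bedrock’s Converse API, which offers a model-agnostic request shape; the exact model ID is an assumption and may differ in your region:

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# The Converse API uses the same request/response shape across models.
response = runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # model ID may differ
    messages=[{
        "role": "user",
        "content": [{"text": "Draft a polite reply to a delayed-shipment complaint."}],
    }],
    inferenceConfig={"maxTokens": 400, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])
```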
Jurassic-2
Jurassic-2 by AI21 Labs is a versatile LLM with strong capabilities in text generation, content creation, summarization, and translation, along with extensive multi-language support. Jurassic-2 models are known for high performance and fine-grained control over generated outputs. One of its key features is its ability to generate long-form, coherent text, making it suitable for applications in content creation, virtual assistants, and customer support. Jurassic-2 emphasizes user control, providing parameters that let developers adjust the tone, style, and precision of generated content, which makes it well suited to use cases like creative writing, technical documentation, and conversational AI.
Jamba
The Jamba 1.5 Model Family by AI21 Labs includes several models, such as Large, Mid, Mini, and Instruct. Some models have a 256K-token effective context window, one of the largest on the market. Jamba 1.5 models focus on speed and efficiency, delivering up to 2.5x faster inference than leading models of comparable size. They support multiple languages but do not support fine-tuning.
Llama
LLaMA (Large Language Model Meta AI) is a family of large language models developed by Meta (formerly Facebook) that excels in natural language understanding and generation. LLaMA is designed to be efficient, using fewer parameters compared to some of its peers, while still maintaining high performance in various NLP tasks like text completion, translation, and summarization. With a focus on research and academic use, LLaMA is optimized for use in environments where computational resources might be limited, yet high-quality results are needed. Its lightweight architecture makes it accessible for developers to fine-tune and deploy in specialized applications, and it’s recognized for advancing open-access research in AI language models.
Llama 2 is a high-performance, auto-regressive language model designed for developers. It uses an optimized transformer architecture and pre-trained models are trained on 2 trillion tokens with a 4k context length. Llama 3 is an accessible, open model designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas.
Stable Diffusion
Stable Diffusion by Stability AI is a cutting-edge generative AI model designed to create high-quality images from text descriptions. The model uses advanced diffusion processes to generate detailed, realistic images, making it one of the most powerful tools for AI-driven image creation. It is particularly useful for tasks like graphic design, product visualization, and creative content generation.
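A minimal sketch of text-to-image generation with Stable Diffusion XL on Bedrock might look like this; the model ID and request fields are assumptions based on the SDXL schema:

```python
import base64
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "text_prompts": [{"text": "A cat wearing a space helmet, digital art"}],
    "cfg_scale": 7,   # how strictly the image follows the prompt
    "steps": 30,      # number of diffusion steps
})

response = runtime.invoke_model(
    modelId="stability.stable-diffusion-xl-v1",  # model ID may vary
    body=body,
)
result = json.loads(response["body"].read())

# The image is returned as a base64-encoded string.
image_bytes = base64.b64decode(result["artifacts"][0]["base64"])
with open("cat_in_space.png", "wb") as f:
    f.write(image_bytes)
```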
Cohere
Cohere’s Command and Embed models are particularly suited for tasks like semantic search, text classification, and language translation, offering high-quality natural language processing capabilities. With Cohere, developers can leverage embeddings to represent textual data in vector form, facilitating tasks such as recommendation systems or document clustering.
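For example, the following sketch generates embeddings with a Cohere Embed model on Bedrock; the model ID is an assumption and may vary by region:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "texts": ["How do I reset my password?", "Billing and invoices FAQ"],
    "input_type": "search_document",  # use "search_query" when embedding queries
})

response = runtime.invoke_model(
    modelId="cohere.embed-english-v3",  # model ID may vary
    body=body,
)
result = json.loads(response["body"].read())

# One embedding vector per input text, ready for a vector store.
print(len(result["embeddings"]), "vectors of length", len(result["embeddings"][0]))
```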
Mistral
Mistral AI’s models are known for their efficiency and flexibility in natural language understanding and generation tasks. Designed for high-performance applications, they excel at tasks such as content creation, question answering, summarization, and conversational AI. The models currently offered by Mistral do not support fine-tuning.
Amazon SageMaker
Amazon SageMaker is AWS’s comprehensive machine learning platform, offering capabilities for building, training, and deploying custom LLMs. SageMaker provides full flexibility to deploy pre-built models or fine-tune them with your own data. SageMaker supports custom models trained using frameworks like PyTorch, TensorFlow, and Hugging Face, which can also integrate with LLMs from other sources. There are two primary ways to build a model in SageMaker: Canvas and Studio.
SageMaker Canvas
SageMaker Canvas is designed for business analysts and non-technical users who want to build machine learning models without needing to write code. Canvas provides a no-code, drag-and-drop interface, making it easy for users with limited ML knowledge to generate predictions and insights.
SageMaker Studio
SageMaker Studio is geared towards data scientists, ML engineers, and developers who need an advanced, integrated development environment (IDE) for building, training, and deploying machine learning models. It provides purpose-built tools for every stage of the ML lifecycle, from data preparation to model building, training, experimentation, debugging, deployment, and management, all in one place. You can upload data and create models using your preferred IDE, collaborate within ML teams, code more efficiently with an AI-powered assistant, simplify model tuning and debugging, deploy and manage models in production, and automate workflows, all through a single, unified web interface.
While both Canvas and Studio enable you to build ML models, there are a few key differences:
| Feature | SageMaker Canvas | SageMaker Studio |
|---|---|---|
| Target Audience | Business analysts, non-technical users | Data scientists, ML engineers, developers |
| Code Requirement | No-code | Code-first |
| Use Case | Quick predictions, business insights | Advanced ML workflows, custom models |
| Ease of Use | Very easy, drag-and-drop interface | Requires coding and ML expertise |
| Customization | Limited customization, automated models | Full control, customizable workflows |
| Integration | Accesses pre-trained models from Studio | Fully integrated with SageMaker services |
| Collaboration | Collaborate with data scientists | Full collaboration within ML teams |
SageMaker Pipelines
A SageMaker Pipeline is a sequence of connected steps organized in a directed acyclic graph (DAG), which can be created using a drag-and-drop interface or the Pipelines SDK. You can also define your pipeline using a JSON format called the pipeline definition JSON schema, which outlines the requirements and relationships between each step in the pipeline. The structure of the DAG is shaped by data dependencies between steps, where the output of one step is used as the input for another.
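A minimal sketch with the SageMaker Python SDK’s Pipelines module is shown below; the role ARN, script name, and pipeline name are placeholders:

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role ARN

# A processing step that runs a preprocessing script in a managed container.
processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

prep_step = ProcessingStep(
    name="PrepareData",
    processor=processor,
    code="preprocess.py",  # hypothetical preprocessing script
)

# A one-step DAG; additional steps would be wired via data dependencies.
pipeline = Pipeline(name="genai-finetune-pipeline", steps=[prep_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # run the DAG
```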
SageMaker JumpStart
SageMaker JumpStart is a ML hub that allows users to quickly deploy and fine-tune pre-built LLMs for tasks like text generation, sentiment analysis, and more.
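For example, a pre-built model can be deployed from JumpStart with the SageMaker Python SDK; the model ID below is illustrative, and the payload shape depends on the chosen model:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy a pre-built LLM from the JumpStart hub to a real-time endpoint.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")  # example model ID
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

response = predictor.predict({
    "inputs": "Explain retrieval augmented generation in one sentence.",
})
print(response)

# Delete the endpoint when finished to avoid ongoing charges.
predictor.delete_endpoint()
```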
Amazon SageMaker Features
Integration with Bedrock
Enables the fine-tuning of large language models available via Bedrock.
Customization and Fine-Tuning
Amazon SageMaker can be used in conjunction with Bedrock for training and fine-tuning models based on proprietary data.
Model Evaluation and Selection
Amazon Bedrock allows you to evaluate, compare and select the best Foundation Model for your use case. Amazon Bedrock offers a choice of automatic evaluation and human evaluation.
Automatic Evaluation
You can use automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity.
Human Evaluation
For subjective or custom metrics, such as friendliness, style, and alignment to brand voice, you can set up human evaluation workflows with just a few clicks.
For more information, check out this blog post.
What’s in It for Developers?
For developers, Generative AI on AWS offers a wide range of tools, services, and resources that make it easier to build, deploy, and scale AI-powered applications. Here’s what’s in store for developers leveraging AWS for generative AI:
- API access to the Bedrock platform using the boto3 SDK, which exposes the different Foundation Models through a unified interface that improves developer experience and speeds up application development.
- API access to the SageMaker platform using the boto3 SDK to train and fine-tune models.
- Easy integration with AWS services like Lambda, S3, API Gateway and hosting solutions like Fargate, ECS and EC2.
More on this in a separate article later.
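To give a flavor of the integration pattern above, here is a hedged sketch of a Lambda handler that fronts Bedrock behind API Gateway; the model ID and event shape are assumptions:

```python
import json
import boto3

# Create the client outside the handler so it is reused across invocations.
runtime = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # Assumes an API Gateway proxy event with a JSON body like {"prompt": "..."}.
    prompt = json.loads(event["body"])["prompt"]

    response = runtime.converse(
        modelId="amazon.titan-text-express-v1",  # model ID may vary
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]

    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```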
Use Case Matrix
This table lists the recommended models for specific use cases. You can evaluate multiple models for your use case and determine the right one using the model evaluation option in Bedrock.
| Use Case | Description | Models |
|---|---|---|
| Text Generation | Generate coherent text, articles, or stories. | Jurassic-2, Amazon Titan |
| Text Summarization | Condense long articles, documents, or reports into shorter summaries. | Jurassic-2, Amazon Titan |
| Content Creation | Generate blog posts, marketing copy, or social media content. | Jurassic-2, Amazon Titan |
| Translation | Translate text between languages. | Jurassic-2, Amazon Titan |
| Document Summarization | Summarize long-form documents, such as research papers. | Claude, Amazon Titan |
| Creative Writing | Generate stories, poetry, or other forms of creative literature. | Jurassic-2, Amazon Titan |
| Chatbots & Conversational AI | Build virtual assistants for customer support or general inquiries. | Claude, Amazon Titan |
| Sentiment Analysis | Analyze the sentiment (positive, neutral, or negative) of a given text. | BERT, T5, RoBERTa |
| Code Generation | Write, complete, and explain programming code based on user input. | Amazon SageMaker |
| Code Debugging | Assist developers in finding and fixing bugs in code. | Amazon SageMaker |
| Question Answering | Provide concise answers to questions from a knowledge base or document. | Jurassic-2, Amazon Titan |
| Educational Tutoring | Provide educational assistance or tutoring on various subjects. | - |
| Text-to-Image Generation | Create images from textual descriptions (e.g., "A cat wearing a space helmet"). | Stable Diffusion |
| Image Generation | Create realistic images from text. | Stable Diffusion |
| Personalized Recommendations | Generate personalized recommendations for products, content, etc. | Amazon Titan |
| Speech-to-Text & Voice Bots | Transcribe spoken language to text or power voice-based assistants. | Amazon Transcribe, Amazon Polly |
| Product Design | Assist in the design and prototyping of products via image generation. | Stable Diffusion |
| Enterprise Search | Enhance internal search engines to provide intelligent, context-aware results. | Claude, Amazon Titan |
| Legal Document Analysis | Analyze and summarize complex legal documents or contracts. | Claude, Jurassic-2, Amazon Titan |
| Regulatory Compliance | Analyze documents for compliance with legal and industry regulations. | Claude, Amazon Titan, BERT |
Best Practices
There are several best practices when deploying generative AI models on AWS to ensure efficiency, scalability, and cost-effectiveness:
1. Choose the Right Instance Type
For training large models, AWS offers GPU-based instances like P3 or G5, which provide significant acceleration for training deep learning models. For inference, you can opt for less expensive CPU-based instances like M5, or purpose-built accelerated instances like Inf1, which are powered by AWS Inferentia chips.
2. Use SageMaker Pipelines
SageMaker Pipelines allows you to automate end-to-end ML workflows. This is particularly useful when working with complex generative AI workflows that require frequent retraining and model updates.
3. Leverage Spot Instances
For non-critical workloads, using Spot instances can reduce training costs significantly. Spot instances allow you to use AWS’s unused EC2 capacity at a fraction of the cost.
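Here is a hedged sketch of managed Spot training with the SageMaker Python SDK; the container image, role ARN, and S3 paths are placeholders:

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder training container image
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.g5.2xlarge",
    use_spot_instances=True,  # request Spot capacity for training
    max_run=3600,             # max training time in seconds
    max_wait=7200,            # max time to wait for Spot capacity (>= max_run)
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after interruptions
)
estimator.fit({"training": "s3://my-bucket/train/"})
```

Checkpointing matters here: because Spot capacity can be reclaimed, writing checkpoints to S3 lets the job resume rather than restart from scratch.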
4. Optimize Model Inference with Elastic Inference
Elastic Inference allows you to attach just the right amount of inference acceleration to any EC2 or SageMaker instance, reducing costs for serving generative models at scale.
5. Monitor and Fine-tune Your Models
Once your models are in production, it’s crucial to monitor performance and fine-tune them regularly to keep them up to date with changes in user behavior or data patterns.
Conclusion
Generative AI on AWS is rapidly changing the landscape of artificial intelligence by making powerful models more accessible. With Bedrock and SageMaker, businesses can build, customize, and deploy generative models with ease, whether they’re working with text, images, or other content types. By leveraging foundation models and following best practices, AWS enables the creation of intelligent and scalable AI applications for a wide range of industries.