
Multi-Cloud in Practice: AWS + Azure + GCP in the Same Company

abemon | 18 min read | Written by practitioners

The wrong question and the right one

“Which cloud is best?” is a question every CTO asks, and one that has no useful answer. The right question is: “Which combination of cloud services solves my specific problems at the lowest operational cost?” And the answer, almost always, involves more than one provider.

Multi-cloud is not a strategy you choose. It is a reality you manage. Most mid-sized European companies already operate multi-cloud without realizing it: the CRM runs in one vendor’s cloud, the ERP in another, the website in a third, email on Google Workspace, and files on OneDrive. The shift from accidental multi-cloud to deliberate multi-cloud is where the value lives.

This whitepaper documents real multi-cloud implementation patterns, with numbers, specific tools, and lessons we have learned deploying workloads across AWS, Azure, and GCP for clients in logistics, legal, and retail.

When to use each cloud (and why)

The decision of where to run each workload should be based on three factors: the provider’s differential capability, total cost, and affinity with the existing ecosystem.

AWS: the Swiss army knife

AWS has the broadest catalog: over 200 services. It is the default choice when there is no strong reason to go elsewhere, and the best option for:

  • General compute: EC2 offers the widest variety of instance types. For irregular workloads, Spot Instances cut costs 60-90% versus On-Demand.
  • Mature serverless: Lambda, Step Functions, API Gateway, and DynamoDB form a serverless stack with years of production-scale track record. The ecosystem of tooling and documentation is unmatched.
  • Data lakes: S3 + Glue + Athena + Lake Formation is the most battle-tested data lake stack. S3 storage at $0.023/GB/month in us-east-1 remains the benchmark.
  • Advanced networking: Transit Gateway, PrivateLink, and Global Accelerator solve complex topologies that require workarounds on other providers.

Where AWS falls short: developer experience on high-level services (compare SageMaker with Vertex AI), integration with the Microsoft stack, and data egress pricing (one of the most painful hidden costs of multi-cloud).

Azure: the enterprise bridge

Azure is the natural choice when the organization already lives in the Microsoft ecosystem. But it has standalone merits:

  • Identity: Azure Active Directory (now Entra ID) is the most comprehensive identity directory available. SSO, MFA, Conditional Access, and device management in one platform. And it works as an identity provider for AWS and GCP.
  • Hybrid cloud: Azure Arc manages on-premise, edge, and multi-cloud servers from a single control plane. For companies with legacy infrastructure that cannot migrate, this is a real differentiator.
  • Enterprise AI/ML: Azure OpenAI Service provides access to GPT-4 models with the compliance guarantees, data residency, and SLAs that large enterprises require. Cognitive Services integration covers vision, language, and speech.
  • Managed databases: Cosmos DB offers multi-model and global distribution with configurable consistency. Azure SQL Database is the most mature managed SQL Server available.

Where Azure falls short: the admin console (Azure Portal is notoriously slow and confusing), documentation (fragmented across versions and constant rebranding), and pricing on certain premium services.

GCP: the quiet engineer

GCP has the smallest market share of the three (11% versus 31% for AWS and 25% for Azure, per Synergy Research Q4 2024), but its differentiating services are genuinely superior:

  • Data and analytics: BigQuery is, in our experience, the most powerful and easiest-to-operate data warehouse. No clusters to manage, automatic scaling, and a per-query pricing model that aligns cost with actual usage. One of our clients processes 40 TB/month for under EUR 800.
  • Kubernetes: GKE is the most mature managed Kubernetes service. Given that Google created Kubernetes, this is unsurprising. Autopilot mode eliminates node management entirely. For organizations standardizing on Kubernetes, GKE significantly reduces operational overhead.
  • Networking: Google’s global network (one of the largest private backbones in the world) delivers consistently lower latencies for inter-region traffic. Premium Tier routing is included by default.
  • AI/ML: Vertex AI unifies the complete ML lifecycle. TPUs offer superior price-performance over GPUs for large model training.

Where GCP falls short: the partner and consultant ecosystem is smaller, enterprise support has improved but still lags behind AWS and Azure, and the perception (justified or not) that Google abandons products creates hesitation among conservative decision-makers.

The multi-cloud pattern that works

Not all multi-cloud patterns are equal. We have seen three:

Multi-cloud by differential service (recommended): use the best service from each provider. BigQuery for analytics, Azure AD for identity, AWS for general compute. Each workload lives on a single provider. Complexity sits in connectivity and identity, not portability.

Multi-cloud for redundancy (expensive, rarely justified): run the same workload on two providers to avoid lock-in. Sounds good in a strategy presentation. In practice, it doubles costs, multiplies operational complexity, and rarely works when needed (cross-cloud failover is harder than it looks). Only justified for absolutely critical services where a single provider’s SLA is insufficient.

Multi-cloud by policy (unavoidable in regulated sectors): banking, healthcare, or public administration may require multi-cloud by regulation or vendor risk management policy. No choice here. Just good implementation.

We recommend the first pattern for 90% of organizations. It captures the differential benefits of each cloud without the complexity explosion of the other two.

Cross-cloud networking

Connectivity between cloud providers is the first serious technical problem of multi-cloud. Options, from simplest to most complex:

Site-to-site VPN

The simplest option. An IPsec tunnel between the AWS VPC and the Azure VNet (or GCP VPC). Each pair of providers requires its own configuration.

  • AWS: Virtual Private Gateway or Transit Gateway
  • Azure: VPN Gateway
  • GCP: Cloud VPN

Typical bandwidth: 1.25 Gbps per tunnel. Sufficient for most inter-cloud workloads. Cost: from $35/month per gateway plus data transfer. The catch: latency adds 5-15 ms depending on regions, and managing multiple tunnels becomes tedious.

Dedicated interconnection

For high volume and low latency, physical interconnection is the right answer:

  • AWS Direct Connect: dedicated connection from 50 Mbps to 100 Gbps
  • Azure ExpressRoute: dedicated connection with global reach
  • GCP Cloud Interconnect: Dedicated (10-100 Gbps) or Partner (50 Mbps - 50 Gbps)

The multi-cloud trick: use a common interconnection point. Providers like Equinix, Megaport, or PacketFabric operate interconnection fabrics where you can connect AWS, Azure, and GCP in the same colocation datacenter, with sub-millisecond latencies between providers. Megaport Cloud Router, for example, creates a mesh topology between all three providers under a single contract.

Cost: from EUR 300-500/month for 1 Gbps connections. Justified when inter-cloud traffic exceeds 5 TB/month (the point where internet egress costs exceed dedicated interconnection costs).
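The 5 TB/month break-even above can be sanity-checked with quick arithmetic. A minimal sketch, assuming the AWS rates quoted in this document ($0.09/GB over the internet, $0.02/GB over Direct Connect) and a $400/month circuit as a midpoint of the range above; these are illustrative figures, not price quotes:

```python
# Back-of-envelope break-even between internet egress and a dedicated
# interconnect. All rates below are assumptions taken from this section.

INTERNET_EGRESS_PER_GB = 0.09      # internet egress, USD/GB (assumed)
INTERCONNECT_EGRESS_PER_GB = 0.02  # egress over the circuit, USD/GB (assumed)
CIRCUIT_MONTHLY = 400.0            # 1 Gbps circuit, USD/month (assumed midpoint)

def monthly_cost_internet(gb: float) -> float:
    return gb * INTERNET_EGRESS_PER_GB

def monthly_cost_interconnect(gb: float) -> float:
    return CIRCUIT_MONTHLY + gb * INTERCONNECT_EGRESS_PER_GB

def break_even_gb() -> float:
    # Volume at which both options cost the same:
    # circuit fee / (per-GB saving on every GB moved)
    return CIRCUIT_MONTHLY / (INTERNET_EGRESS_PER_GB - INTERCONNECT_EGRESS_PER_GB)

gb = break_even_gb()
print(f"Break-even: {gb:.0f} GB/month (~{gb / 1024:.1f} TB)")
```

With these assumptions the break-even lands around 5.6 TB/month, consistent with the 5 TB/month rule of thumb above; a cheaper circuit or a higher egress rate pulls it lower.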

Network mesh with service mesh

For organizations running Kubernetes across multiple clouds, a service mesh like Istio with multi-cluster or Cilium Cluster Mesh provides service-level connectivity. Services in GKE can communicate directly with services in EKS as if they were in the same cluster.

The most elegant option, but also the most complex to operate. We only recommend it for teams with solid Kubernetes and service mesh experience.

Data egress: the invisible tax

The most underestimated cost of multi-cloud is data egress. All three providers charge for data leaving their network:

Provider   Internet egress       Cross-cloud egress
AWS        $0.09/GB              $0.09/GB (internet) / $0.02/GB (Direct Connect)
Azure      $0.087/GB             $0.087/GB (internet) / $0.02/GB (ExpressRoute)
GCP        $0.12/GB (Premium)    $0.12/GB (internet) / $0.02/GB (Interconnect)

For a 10 TB/month data flow between AWS and GCP over the internet, egress cost is approximately $900/month in transfer alone. With dedicated interconnection, it drops to $200/month, but you add the circuit cost.

The practical lesson: architect to minimize data movement between clouds. Data should live close to the compute that processes it. If BigQuery is your data warehouse, analytics data should flow to GCP once (nightly ETL batch, for example), not be queried in real-time from AWS.

Identity federation

Managing separate identities in each provider is an operational nightmare and a security risk. The solution is federation: a single identity provider (IdP) that authenticates users across all clouds.

Azure AD as central IdP

The most common configuration (and the one that works best in our experience) uses Azure AD (Entra ID) as the source of truth:

AWS: Configure SAML 2.0 federation between Azure AD and AWS IAM Identity Center (formerly SSO). Each Azure AD group maps to a Permission Set in AWS. Users access the AWS console and obtain temporary credentials via AssumeRoleWithSAML.

GCP: Configure Workforce Identity Federation. GCP accepts Azure AD tokens directly for console and API access. For service accounts (automated workloads), Workload Identity Federation allows a pod on AKS to obtain GCP credentials without static secrets.

Tangible benefits: a single user management point, immediate deprovisioning (deactivate the user in Azure AD and they lose access to all three clouds), unified MFA, and centralized access logs.
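The group-to-Permission-Set mapping deserves to live as explicit, version-controlled data rather than ad-hoc console clicks (a lesson that reappears in the mistakes section below). A minimal sketch of such a mapping; the group names and ARNs are hypothetical placeholders, and the actual credential exchange happens via AssumeRoleWithSAML as described above:

```python
# Explicit mapping from Azure AD groups to AWS IAM Identity Center
# permission sets. All names and ARNs below are hypothetical examples.

GROUP_TO_PERMISSION_SET = {
    "cloud-admins":   "arn:aws:sso:::permissionSet/ssoins-example/ps-admin",
    "data-engineers": "arn:aws:sso:::permissionSet/ssoins-example/ps-data",
    "readonly-audit": "arn:aws:sso:::permissionSet/ssoins-example/ps-readonly",
}

def permission_sets_for(groups: list[str]) -> list[str]:
    """Resolve a user's Azure AD groups to AWS permission sets.

    Unknown groups are ignored rather than granted any default access,
    so a typo in a group name fails closed.
    """
    return [GROUP_TO_PERMISSION_SET[g] for g in groups
            if g in GROUP_TO_PERMISSION_SET]
```

Reviewing changes to this file in a pull request gives you an audit trail of who gained access to what, and when.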

Workload Identity Federation

For service-to-service communication (machine-to-machine), workload identity federation eliminates static secrets:

  • A service in AWS needing BigQuery access obtains a GCP token using its AWS credentials (via STS and Workload Identity Federation)
  • A pod in GKE needing S3 access obtains temporary AWS credentials via Web Identity Federation
  • The result: zero long-lived secrets, automatic rotation, complete audit trail

Setting this up requires understanding the trust relationships between providers, but once established, it eliminates an entire class of vulnerabilities.

Cost governance

Multi-cloud multiplies FinOps complexity. Three invoices, three pricing models, three cost consoles. Without a governance strategy, costs spiral within weeks.

Visibility tools

Native: AWS Cost Explorer, Azure Cost Management, GCP Billing. Each excellent for its own provider, useless for consolidated visibility.

Multi-cloud: Apptio Cloudability, Flexera One, or the open-source option Opencost (for Kubernetes environments). They aggregate costs from all three providers, normalize categories, and enable unified alerts.

In our experience, the tool matters less than the process. What is critical:

  1. Consistent tagging: define a tag schema (environment, team, service, cost-center) and enforce it across all three providers. Without tags, cost attribution is impossible.
  2. Per-team budgets: each team gets a monthly budget. Alerts fire at 80% and 100%. The team owns its spend.
  3. Monthly review: 30 minutes reviewing the top 10 cost lines from each provider. 80% of optimization comes from fixing the 3-4 most expensive resources.
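The tagging schema in point 1 is easiest to enforce as code, for example in a CI check on Terraform plans or a periodic audit script. A minimal sketch assuming the four tags listed above; the allowed environment values are illustrative:

```python
# Validate a resource's tags against the schema from point 1.
# The allowed environment values are illustrative assumptions.

REQUIRED_TAGS = {"environment", "team", "service", "cost-center"}
ALLOWED_ENVIRONMENTS = {"staging", "production"}

def validate_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of violations; an empty list means the tags pass."""
    errors = [f"missing tag: {t}" for t in sorted(REQUIRED_TAGS - tags.keys())]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        errors.append(f"invalid environment: {env}")
    return errors
```

The same function works against AWS, Azure, and GCP because it only sees the tag dictionary, which is exactly the point of a provider-neutral schema.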

Committed Use Discounts

Each provider offers commitment discounts:

  • AWS: Reserved Instances (1 or 3 years) and Savings Plans (flexible compute)
  • Azure: Reserved Instances and Azure Savings Plan
  • GCP: Committed Use Discounts (1 or 3 years) and Sustained Use Discounts (automatic)

The multi-cloud strategy: commit in each provider only for workloads you know will stay there. Stable base compute in each cloud gets reserved; peaks are covered with on-demand or spot.

A common mistake: reserving capacity before optimizing. Right-size first, then reserve. We have seen companies paying for reserved m5.4xlarge instances that should have been m5.xlarge.
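The m5 example can be quantified. A rough sketch using approximate us-east-1 on-demand rates and an assumed 40% one-year commitment discount; both are illustrative round figures, not quoted prices:

```python
# "Right-size first, then reserve" in numbers. Hourly rates are
# approximate us-east-1 on-demand prices; the 40% discount is an
# assumed round figure for a 1-year commitment, not a quoted rate.

HOURS_PER_MONTH = 730
ON_DEMAND = {"m5.4xlarge": 0.768, "m5.xlarge": 0.192}  # USD/hour, approximate
RI_DISCOUNT = 0.40

def monthly(instance: str, reserved: bool = False) -> float:
    rate = ON_DEMAND[instance] * ((1 - RI_DISCOUNT) if reserved else 1.0)
    return rate * HOURS_PER_MONTH

wrong_order = monthly("m5.4xlarge", reserved=True)  # reserved the oversized box
right_order = monthly("m5.xlarge", reserved=True)   # right-sized, then reserved
print(f"oversized + reserved: ${wrong_order:.0f}/mo")
print(f"right-sized + reserved: ${right_order:.0f}/mo")
```

Under these assumptions the wrong order locks in roughly four times the monthly cost of the right one, for the full term of the commitment.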

The cost of complexity

Multi-cloud has an operational cost that is rarely accounted for: the team needs expertise in three providers (how many engineers truly master AWS and Azure and GCP?), IaC tools must support multiple providers, testing multiplies, and debugging cross-cloud issues is significantly more complex.

Our advice: budget 1.5-2x the engineering cost when planning multi-cloud. If the team is 5 engineers managing a single cloud, you will need 7-8 to manage three at the same quality level. Or, more realistically, accept that depth in each cloud will be shallower.

Cross-cloud CI/CD

The CI/CD pipeline is where multi-cloud complexity materializes daily. Every deployment potentially touches multiple providers, and the chain of permissions, artifacts, and validations multiplies.

Pipeline architecture

The structure that works best is centralized CI with distributed CD:

Centralized CI: GitHub Actions or GitLab CI as the single source. Builds, tests, and artifact generation happen in one place. Artifacts (container images, binaries, IaC packages) are published to a common registry.

Distributed CD: each provider has its own deployment mechanism. ArgoCD for Kubernetes (works identically on EKS, AKS, and GKE). AWS CodeDeploy for Lambda and ECS. Azure Pipelines for Azure-specifics. The key: the artifact is the same, only the destination changes.

Artifact registry: a single container registry (ECR, Artifact Registry, or a neutral registry like Harbor) that all clusters across all providers can access. Avoid duplicating images in provider-specific registries.

Terraform as lingua franca

Multi-cloud infrastructure only stays manageable when everything is defined as code. Terraform is the IaC tool with the best multi-cloud support and, in practice, where most organizations converge. Its provider model allows defining AWS, Azure, and GCP resources in the same language.

Our recommended structure:

infrastructure/
├── modules/
│   ├── aws-vpc/
│   ├── azure-vnet/
│   ├── gcp-vpc/
│   └── shared/
│       ├── dns/
│       └── monitoring/
├── environments/
│   ├── staging/
│   │   ├── aws.tf
│   │   ├── azure.tf
│   │   └── gcp.tf
│   └── production/
│       ├── aws.tf
│       ├── azure.tf
│       └── gcp.tf
└── terragrunt.hcl

Terragrunt on top of Terraform manages DRY configuration across environments and providers. Terraform state is stored in a remote backend (S3, GCS, or Terraform Cloud) with locking.

A practical tip: do not try to create abstractions that unify all three providers. A “generic-compute” module that works across AWS, Azure, and GCP is a chimera that reduces quality in all three. Accept that each provider has its own API and idioms. Unification happens in the pipeline and repository structure, not in the infrastructure code.

Secrets management

Secrets (API keys, certificates, passwords) need a single management point:

HashiCorp Vault is the most complete option: stores secrets, generates dynamic credentials for each provider, rotates automatically, and audits all access. The operational complexity is justified in serious multi-cloud environments.

Pragmatic alternative: use one provider’s secret manager as the primary source (AWS Secrets Manager or GCP Secret Manager) and synchronize to others via External Secrets Operator in Kubernetes. Less elegant, but easier to operate.
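The core of that synchronization is deciding which secrets are missing or stale in the replica. A minimal, provider-agnostic sketch of that decision; the actual reads and writes would go through the AWS and GCP SDKs or External Secrets Operator, and the secret names here are hypothetical:

```python
# One-way sync decision: the primary secret manager is the source of
# truth, the other providers hold replicas. Values here stand in for
# version identifiers or hashes of the secret material, never the
# secret itself.

def secrets_to_sync(source: dict[str, str],
                    replica: dict[str, str]) -> dict[str, str]:
    """Return the entries whose replica copy is missing or stale.

    Deliberately one-way: anything present only in the replica is left
    alone, so a sync bug cannot delete secrets at the source.
    """
    return {name: version for name, version in source.items()
            if replica.get(name) != version}
```

Keeping this logic one-directional is what makes the pragmatic approach safe: the replica can drift, but the source of truth never changes as a side effect of a sync run.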

Multi-cloud observability

Monitoring three clouds with three native tools is a recipe for blind spots. Multi-cloud observability needs a unified plane.

Metrics: Grafana Cloud with remote Prometheus or Grafana Mimir as backend. OpenTelemetry or Prometheus agents in each provider export metrics to a single Mimir. Unified dashboards in Grafana show service health regardless of where they run.

Logs: Grafana Loki as centralized store. Each provider has an agent (Promtail, Fluentd, or the OTel collector) sending logs to Loki. Alternative: Datadog or Elastic Cloud if the budget allows.

Traces: OpenTelemetry Collector in each environment exporting to centralized Grafana Tempo or Jaeger. Distributed traces are especially critical in multi-cloud because a request can cross two or three providers.

Alerting: Grafana Alerting with routes to PagerDuty or OpsGenie. A single alerting policy for all infrastructure.

Cost as an observability metric

A pattern that works well: treat cost as another observability metric. Export billing data from each provider to Prometheus (exporters exist for all three) and create dashboards showing cost per service, per team, and per environment alongside technical metrics. When the team sees that a service costs $400/day on a panel that also shows it processes 200 requests/minute, the cost per request becomes visible and actionable.
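The panel arithmetic from that example is trivial, but making it explicit is what turns cost into an actionable metric:

```python
# Cost per request from the two numbers on the dashboard panel:
# daily spend and request rate. Figures match the example above.

def cost_per_request(usd_per_day: float, requests_per_minute: float) -> float:
    requests_per_day = requests_per_minute * 60 * 24
    return usd_per_day / requests_per_day

print(f"${cost_per_request(400, 200):.5f} per request")
```

At $400/day and 200 requests/minute that is roughly $0.0014 per request, a number a product owner can weigh against the revenue each request generates.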

Mistakes we have made

Full transparency: not everything has gone smoothly in our multi-cloud implementations.

Underestimating data egress. On a logistics project, we designed a pipeline where tracking data (AWS) fed a prediction model (GCP Vertex AI) in real time. Egress cost was 3x the estimate. The fix was switching to nightly batch processing: data exports once per day to GCS, and the model retrains on the last 24 hours. Prediction latency increased from minutes to hours, but cost dropped 70%.

Ignoring IAM differences. AWS IAM, Azure RBAC, and GCP IAM use different models. An “administrator” in AWS does not have the same permissions as an “Owner” in GCP. The first time we set up identity federation, permissions were incorrectly mapped for two weeks. Now we document role mappings explicitly before implementation.

Too much abstraction too soon. We tried building an internal framework that abstracted provider differences. Six months later, the framework was the biggest bug generator in the project. Now we accept differences and manage them with clear conventions rather than magic abstractions.

Action plan: from theory to implementation

For the organization considering deliberate multi-cloud, this is the order we recommend:

Months 1-2: Inventory and decision

  • Map all existing workloads (including SaaS)
  • Identify candidates for each provider based on differential capabilities
  • Estimate networking and egress costs
  • Define the tagging schema

Months 3-4: Identity and networking

  • Configure Azure AD as central IdP (or your existing IdP)
  • Establish SAML/OIDC federation with AWS and GCP
  • Deploy site-to-site VPN between required providers
  • Implement Workload Identity Federation for machine-to-machine

Months 5-6: CI/CD and IaC

  • Standardize Terraform with Terragrunt
  • Configure centralized CI pipeline
  • Establish artifact registries accessible from all providers
  • Deploy centralized secrets management

Months 7-8: Observability and FinOps

  • Deploy unified observability stack (Grafana + Prometheus + Loki + Tempo)
  • Configure consolidated alerting
  • Implement cost dashboards
  • Establish monthly cost review process

Months 9-12: Migration and optimization

  • Migrate workloads by priority
  • Right-sizing based on real data
  • Committed use discounts for stable workloads
  • Iterate on architecture based on actual metrics

Multi-cloud is not a destination

Multi-cloud is not a destination you arrive at. It is a state you continuously manage. Providers launch new services, prices change, business needs evolve. The multi-cloud architecture that is optimal today will not be in 18 months.

What remains constant are the fundamentals: federated identity, predictable networking, unified observability, visible costs. With those pillars in place, adding or changing providers is a tactical decision, not a strategic crisis.

And if after evaluating all of this you decide that a single cloud is sufficient for your case, that is a perfectly valid decision too. The worst multi-cloud is the one implemented because of fashion rather than necessity.

To evaluate whether multi-cloud makes sense for your organization, we can perform a technology audit that maps your current workloads and recommends the optimal cloud architecture. We also offer cloud and DevOps services to implement and manage multi-cloud environments in production. If your primary challenge is data engineering across clouds, we have specific experience with multi-cloud pipelines using BigQuery, Redshift, and Synapse.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.