Top APM Tools for Reducing Observability Costs in 2026

 
 

Where Engineering Teams Are Overpaying and What to Do About It

Observability costs are growing faster than infrastructure costs at most engineering organizations. A recurring pattern emerges in post-mortems on monitoring spend: teams signed up for an APM platform expecting a predictable monthly bill, then discovered hidden cost dimensions - custom metrics surcharges, per-host fees on auto-scaling clusters, log indexing cliffs, retention premiums, and cloud egress charges - that pushed the actual bill 30-60% beyond forecast.

The problem is structural, not operational. Most APM pricing models were designed for static, monolithic infrastructure, and they compound in unpredictable ways as modern workloads scale. Understanding where costs actually accumulate - and which pricing architectures avoid the compounding effect - is the first step toward bringing observability spend under control. This is not about monitoring less. It is about monitoring the same workloads on pricing models that do not punish growth.

This guide evaluates seven APM platforms through the lens of total cost of ownership at scale: CubeAPM, Grafana Cloud, Elastic APM, New Relic, Datadog, Dynatrace, and Splunk. Each tool is assessed on pricing transparency, hidden cost dimensions, and what engineering teams actually pay at 30TB/month ingestion.

What to Look For When Evaluating APM Costs

Not all pricing pages tell the full story. When evaluating the true cost of an observability platform, these are the dimensions that matter most:

1. Pricing Predictability

The most important cost metric is not the rate card - it is whether you can predict next month's bill from this month's usage. Single-dimension pricing (per-GB) is inherently more predictable than multi-axis models that charge on hosts, users, custom metrics, and data volume simultaneously. When four billing dimensions scale independently, forecasting becomes guesswork.

2. Hidden Cost Dimensions

Custom metrics surcharges are the most common hidden cost. In Kubernetes environments, labels automatically generate high-cardinality metrics, which vendors often bill as custom metrics - representing 30-52% of the total bill at scale. OTel metrics are frequently classified as custom metrics, penalizing teams that standardize on open instrumentation.
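
To see why labels inflate custom-metric counts, it helps to treat the number of time series for one metric name as, at most, the product of the distinct values of each label. The sketch below is illustrative only: the label names and counts are hypothetical, and real clusters produce fewer series than this upper bound because not every label combination occurs.

```python
from math import prod


def series_upper_bound(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for a single metric name:
    the product of the number of distinct values of each label."""
    return prod(label_cardinalities.values())


# Hypothetical label counts for one request-latency metric.
labels = {
    "namespace": 10,
    "deployment": 40,
    "pod": 5,
    "status_code": 6,
}

print(series_upper_bound(labels))  # 10 * 40 * 5 * 6 = 12000
```

Adding one more label with 20 distinct values would multiply that bound by 20, which is why per-series billing reacts so sharply to labeling choices.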

3. Cloud Egress Costs

Sending telemetry to any external SaaS platform incurs cloud provider egress charges of approximately $0.09-$0.12/GB. At 30TB/month, that is roughly $3,000/month ($36,000/year) in costs that do not appear on the APM invoice. Self-hosted platforms running inside your VPC eliminate this entirely.
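
The egress figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB), a flat per-GB rate, and that all 30TB leaves the cloud provider uncompressed; the function name is illustrative.

```python
def monthly_egress_cost(ingest_tb: float, rate_per_gb: float) -> float:
    """Monthly egress cost in dollars, using decimal units (1 TB = 1,000 GB)."""
    return ingest_tb * 1000 * rate_per_gb


low = monthly_egress_cost(30, 0.09)
high = monthly_egress_cost(30, 0.12)
print(f"${low:,.0f} to ${high:,.0f} per month")  # $2,700 to $3,600 per month
```

The roughly $3,000/month figure sits near the middle of that range.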

4. Retention Economics

Default retention at most SaaS vendors is 8-15 days. Extending to 30, 90, or 365 days adds 50-200% to data costs. Platforms that include unlimited retention in their base pricing remove this scaling penalty.

5. Per-User Seat Fees

Per-user pricing creates a perverse incentive: limit who can observe production systems to control costs. When a $99-$349/user/month fee stands between an engineer and a production dashboard, organizations either overpay or under-observe. Unlimited-user models eliminate this tension entirely.
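
As a rough illustration of what seat fees add, the sketch below applies the $99-$349 range to the 20-user team used in this guide's comparison. It assumes every user is billed at the same tier, which is a simplification.

```python
def monthly_seat_cost(users: int, fee_per_user: int) -> int:
    """Monthly seat cost in dollars when every user pays the same fee."""
    return users * fee_per_user


for fee in (99, 349):
    monthly = monthly_seat_cost(20, fee)
    print(f"${fee}/user: ${monthly:,}/month, ${monthly * 12:,}/year")
# $99/user: $1,980/month, $23,760/year
# $349/user: $6,980/month, $83,760/year
```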

6. Data Sovereignty as a Cost Factor

For most SaaS vendors, data residency is a paid add-on or not available at all. Self-hosted platforms guarantee it by architecture, which also reduces the compliance overhead of demonstrating data sovereignty to regulators - a real cost for audit and legal teams.

Cost Comparison at 30TB/Month Ingestion

Estimates based on 30TB/month ingestion, 100 hosts, 20 users, 30-day retention. Full methodology at the end of the article.

Tool | Est. Cost @ 30TB/mo | Pricing Model | OTel Native | Data Residency | Self-Hosted
CubeAPM | ~$5,100/mo all-in | $0.15/GB ingestion-based | Native | Always (in-VPC) | Yes (vendor-managed)
Grafana Cloud | ~$15K-$20K+ | Usage-based | Native | If self-hosted | Yes (OSS)
Elastic APM | ~$8K-$15K | Deployment-based | Partial | If self-hosted | Yes
New Relic | ~$20K-$25K+ | Ingest + per-user | Partial | SaaS only | No
Datadog | ~$30K-$45K+ | Host + feature-based | Partial* | SaaS only | No
Dynatrace | ~$20K-$35K+ | GiB-hour + commit | Partial | Managed option | Managed
Splunk | ~$35K-$60K+ | Host + enterprise contract | Partial | On-prem option | Yes (legacy)

* Datadog OTel metrics are often billed as custom metrics. All estimates are directional, based on public rate cards. Vendor discounts and EDP commitments can significantly reduce SaaS costs.

1. CubeAPM

Best for: DevOps and platform teams that want full-stack observability inside their own cloud without SaaS data egress, pricing sprawl, or DIY self-hosting overhead.

Overview

CubeAPM is a self-hosted, OpenTelemetry-native, full-stack observability platform that runs inside your own AWS, GCP, or Azure VPC. Traces, logs, and metrics never leave your infrastructure boundary - which means zero cloud egress cost and full data sovereignty by architecture, not by policy. CubeAPM handles upgrades, patches, and platform operations; you provide the infrastructure.

Cost Profile

CubeAPM eliminates five of the six cost dimensions outlined above by architecture. Self-hosted deployment eliminates egress. Ingestion-based pricing of $0.15/GB eliminates custom metrics surcharges, per-host fees, and retention premiums. Unlimited users eliminate seat fees. The only billing dimension is data volume - a single number that engineering teams can forecast and control.

  • Pricing: $0.15/GB of data ingested. No per-host, per-seat, per-core, or custom metrics fees.

  • Egress: $0 (self-hosted in VPC).

  • Retention: Unlimited at no additional cost.

  • Users: Unlimited included.

  • At 30TB/month: ~$5,100/month all-in ($4,500 license + ~$600 infra).
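
The all-in figure above follows directly from the per-GB rate. The sketch below assumes decimal units (1 TB = 1,000 GB) and treats the ~$600 infrastructure estimate as a fixed parameter, since how infrastructure cost scales with volume is not stated here.

```python
GB_PER_TB = 1000  # decimal units, matching the arithmetic in this guide


def cubeapm_monthly_cost(ingest_tb: float, rate_per_gb: float = 0.15,
                         infra: float = 600.0) -> tuple[float, float]:
    """Return (license cost, all-in cost) in dollars per month.

    `infra` is the ~$600 estimate quoted for 30TB/month; it is an input
    here, not something this function derives.
    """
    license_cost = ingest_tb * GB_PER_TB * rate_per_gb
    return license_cost, license_cost + infra


license_cost, all_in = cubeapm_monthly_cost(30)
print(f"License ${license_cost:,.0f}, all-in ${all_in:,.0f}")  # License $4,500, all-in $5,100
```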

Reported customer results: Delhivery saw 75% savings after replacing three separate monitoring tools. Mamaearth saw ~70% savings and migrated in under an hour. redBus, the world's largest bus aggregator (part of MakeMyTrip Limited, NASDAQ: MMYT; 8+ countries), reported 4x faster dashboards and 50% faster MTTR.

Key Features

  • Full MELT observability: APM, logs, infrastructure, Kubernetes, Kafka monitoring, RUM, synthetic monitoring, and error tracking

  • OpenTelemetry-native: Compatible with OpenTelemetry, Datadog, New Relic, Elastic, and Prometheus agents for incremental migration

  • Self-hosted, vendor-managed: Deploys in your VPC. Your monitoring stays up even if the internet doesn't

  • MCP server: CubeAPM provides an MCP server that customers can use to query CubeAPM in natural language.

  • 800+ integrations: Kubernetes, synthetic monitoring, RUM, and error tracking included

  • AI-based Smart sampling: Automatically prioritizes error and high-latency traces while reducing redundant data, which keeps storage costs low without sacrificing diagnostic depth 

  • Ratings: Capterra 5/5, G2 5/5. High Performer, G2 Spring 2026 APM Grid Report. Ranked #4 among the easiest-to-use APM tools on G2

Pros

  • Eliminates 5 of 6 major observability cost traps by architecture

  • 70-75% cheaper than enterprise SaaS incumbents at scale

  • Single billing dimension - no surprises from metrics, hosts, or users

  • Complete data ownership - telemetry never leaves your VPC

  • Direct engineering support via WhatsApp and Slack - minute-level response during incidents.

Cons

  • Requires self-hosted deployment in the cloud or on-prem; may not suit teams looking for a SaaS-only model

  • AI/ML anomaly detection is improving but is not yet as mature as Dynatrace Davis AI.

  • SSO/RBAC less mature than enterprise SaaS incumbents

2. Grafana Cloud (LGTM Stack)

Best for: Teams already running Prometheus that want usage-based observability pricing with OTel-native pipelines and cost-reduction tooling.

Overview

Grafana Labs assembled the LGTM stack - Loki (logs), Grafana (dashboards), Tempo (traces), Mimir (metrics) - into a unified observability platform. Paired with Grafana Alloy (an OTel Collector distribution), it provides dedicated OTLP endpoints that auto-route signals to the right backend. The platform is fully OTel-native with no custom metrics penalty, which removes one of the most common cost traps in the category.

Cost Profile

Grafana Cloud's pricing is usage-based across telemetry types: logs at ~$0.55/GB effective (30-day retention), traces at $0.50/GB, metrics at $8 per 1,000 active series, and a $19/month platform fee. Adaptive Metrics and Adaptive Logs features actively reduce ingestion volume, which can materially lower costs. The self-hosted path (free, open-source components) eliminates SaaS fees entirely - but operating LGTM at scale requires dedicated SRE capacity.

  • Managed cloud at 30TB/month: ~$15,000-$20,000+/month.

  • Enterprise minimum: $25K/year.

  • Self-hosted: Free software; you cover infrastructure and operational overhead.
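
A rough way to sanity-check the managed-cloud range is to apply the quoted rates to the volume split from the methodology (~20TB logs, 7TB traces, 500K metric series). This is a sketch under those assumptions, not a reproduction of how the estimate was produced; it ignores Adaptive Metrics/Logs reductions and any included usage.

```python
def grafana_cloud_estimate(logs_gb: float, traces_gb: float, active_series: int,
                           logs_rate: float = 0.55, traces_rate: float = 0.50,
                           per_1k_series: float = 8.0,
                           platform_fee: float = 19.0) -> dict[str, float]:
    """Per-signal monthly cost in dollars, using the rates quoted above."""
    return {
        "logs": logs_gb * logs_rate,
        "traces": traces_gb * traces_rate,
        "metrics": active_series / 1000 * per_1k_series,
        "platform": platform_fee,
    }


parts = grafana_cloud_estimate(logs_gb=20_000, traces_gb=7_000, active_series=500_000)
total = sum(parts.values())
for name, cost in parts.items():
    print(f"{name:>8}: ${cost:,.0f}")
print(f"   total: ${total:,.0f}")  # total: $18,519
```

Under these assumptions logs account for well over half of the bill, which is where volume-reduction features would have the most effect.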

Key Features

  • LGTM stack: Mimir for metrics, Loki for logs, Tempo for traces - all cost-aware

  • Adaptive Metrics/Logs: Automatically reduces low-value ingestion to cut costs

  • Strongest dashboarding and visualization in the category

  • Cost attribution features for tracking telemetry spend by team or namespace

  • k6 performance testing integrated for load testing

Pros

  • No custom metrics penalty - OTel metrics priced the same as native metrics

  • Adaptive features actively help reduce billing

  • Self-hosted path available for cost-driven teams with SRE capacity

  • Transparent, usage-based pricing with no hidden dimensions

Cons

  • Self-hosting LGTM at scale requires dedicated SRE expertise.

  • Managed cloud costs still grow with volume - at 30TB+, the bill is substantial.

  • No built-in AI/ML anomaly detection.

  • APM experience less polished than purpose-built APM platforms; teams looking for lower operational overhead may find self-hosted alternatives with vendor-managed operations more practical

3. Elastic APM

Best for: Teams already on the ELK stack that want to add APM without introducing another vendor or another bill.

Overview

Elastic APM is the distributed tracing and application monitoring component of the Elastic Stack. For teams already indexing logs in Elasticsearch and visualizing in Kibana, adding APM extends their existing investment at zero incremental licensing cost for the self-hosted path. It provides distributed tracing, service maps, error tracking, and MELT correlation.

Cost Profile

Self-hosted Elastic APM is free under the SSPL license - you cover infrastructure. Elastic Cloud (managed) uses deployment-based pricing. Serverless Observability starts from $0.07/GB ingested for log essentials. The cost advantage is strongest for teams that already run Elasticsearch, where the marginal cost of adding APM is low.

  • Elastic Cloud at 30TB/month: ~$8,000-$15,000/month.

  • Self-hosted: Free software; significant operational overhead at scale.

Key Features

  • Native Elasticsearch integration: APM data correlates directly with log indices

  • OpenTelemetry compatible across deployment modes

  • Machine learning-based anomaly detection included

  • Available self-hosted (SSPL license) or via Elastic Cloud

  • RUM via JavaScript agent for frontend services

Pros

  • Zero incremental licensing cost if already running Elastic for logs

  • Strong log + trace correlation in the same query interface

  • Self-hosted option keeps all telemetry on your infrastructure

  • ML-based anomaly detection included

Cons

  • Significant operational overhead to self-host at scale with high telemetry volume.

  • KQL (Kibana Query Language) is less developer-friendly than SQL or PromQL.

  • 2021 SSPL licensing change - review for open-source compliance.

  • APM experience is less polished than purpose-built APM tools; self-hosted alternatives with vendor-managed operations may carry less operational burden.

4. New Relic

Best for: Teams that want a unified telemetry store with a generous free tier and are comfortable managing dual-axis costs at scale.

Overview

New Relic stores all telemetry in NRDB, a unified data store queried via NRQL. The free tier - 100GB/month plus one full platform user - is genuinely useful for small teams. At scale, however, costs compound across two independent axes: data ingest and per-user seat fees, which makes the total bill harder to predict as both dimensions grow.

Cost Profile

New Relic charges $0.40/GB for data ingest on the standard plan, plus $99 to $349 per user per month for full platform access. Data Plus (90-day retention) raises the ingest rate to $0.60/GB - a 50% premium. The newer Compute Capacity Unit (CCU) model introduces additional billing dimensions that scale with query and incident activity.

  • At 30TB/month: ~$20,000-$25,000+/month.

  • Free tier: 100GB/month + 1 full platform user.
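
The dual-axis effect can be sketched by combining the quoted ingest rates with the seat-fee range. The assumptions here are loud ones: the 100GB free allowance is subtracted from paid usage, all 20 users are billed as full platform users at a single tier, and CCU charges are ignored. The output is not meant to reproduce the estimate above, which does not state which plan it uses.

```python
def new_relic_estimate(ingest_gb: float, users: int, rate_per_gb: float,
                       seat_fee: float, free_gb: float = 100) -> float:
    """Monthly cost in dollars: billable ingest plus per-user seat fees."""
    billable_gb = max(ingest_gb - free_gb, 0)
    return billable_gb * rate_per_gb + users * seat_fee


plans = {"standard": 0.40, "data_plus": 0.60}
for name, rate in plans.items():
    low = new_relic_estimate(30_000, 20, rate, seat_fee=99)
    high = new_relic_estimate(30_000, 20, rate, seat_fee=349)
    print(f"{name}: ${low:,.0f} to ${high:,.0f} per month")
# standard: $13,940 to $18,940 per month
# data_plus: $19,920 to $24,920 per month
```

Even in this simplified form, the seat tier alone moves the monthly total by $5,000, independent of data volume.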

Key Features

  • NRDB unified telemetry store with NRQL query language

  • Free tier: 100GB/month, useful for small teams or staging environments

  • Broad coverage: APM, infrastructure, browser, synthetics, K8s, incident workflows

  • OpenTelemetry support via OTLP 

  • AI-assisted observability and alert coverage analysis

Pros

  • Generous free tier for small teams and proof-of-concept workloads

  • Unified query language (NRQL) across all telemetry types

  • Broad full-stack coverage in one SaaS platform

  • OpenTelemetry support for vendor-neutral instrumentation

Cons

  • Dual-axis billing (data + users) makes cost forecasting difficult at scale.

  • 8-day default retention - extending to 30 or 90 days adds significant cost.

  • SaaS-only - telemetry leaves your infrastructure; for teams where data residency is a hard requirement, self-hosted platforms are worth evaluating before committing.

  • NRQL lock-in: every dashboard and alert is non-portable to other platforms.

5. Datadog

Best for: Teams that want the broadest SaaS integration ecosystem and have the budget to manage multi-dimensional billing complexity.

Overview

Datadog is the largest commercial observability platform with 1000+ integrations and feature breadth that covers APM, logs, security, RUM, synthetics, and network monitoring. The trade-off is pricing complexity: Datadog bills on hosts, custom metrics, log ingestion, log indexing, APM spans, and RUM sessions simultaneously. At scale, the interaction between these billing dimensions creates significant cost unpredictability.

Cost Profile

Multi-dimensional billing: hosts (~$24/host/month APM) + custom metrics (up to 30-52% of total bill at scale) + log ingestion ($0.10/GB) + log indexing (~$2.50/million events at 30 days) + APM spans + RUM sessions. OTel metrics are often billed as custom metrics, which penalizes teams standardizing on open instrumentation.

  • At 30TB/month: ~$30,000-$45,000+/month.

  • Breakdown: 100 hosts ~$2,400 + log ingest 20TB ~$2,000 + log indexing ~$30,000 + APM spans ~$3,000-5,000 + custom metrics ~$5,000+. Log indexing is the dominant cost driver.
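
Summing the breakdown makes the dominance of log indexing concrete. The sketch below takes the line items above as given; the only assumption is that "~$5,000+" for custom metrics is treated as a floor, so the real total may be higher.

```python
def total_range(breakdown: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum (low, high) monthly dollar ranges across billing dimensions."""
    low = sum(lo for lo, _ in breakdown.values())
    high = sum(hi for _, hi in breakdown.values())
    return low, high


datadog = {
    "hosts": (100 * 24, 100 * 24),      # 100 hosts at ~$24/host/month
    "log_ingest": (2_000, 2_000),       # 20,000 GB at $0.10/GB
    "log_indexing": (30_000, 30_000),
    "apm_spans": (3_000, 5_000),
    "custom_metrics": (5_000, 5_000),   # "~$5,000+" treated as a floor
}

low, high = total_range(datadog)
share_low = datadog["log_indexing"][0] / high
share_high = datadog["log_indexing"][1] / low
print(f"Total: ${low:,} to ${high:,} per month")                   # Total: $42,400 to $44,400 per month
print(f"Log indexing share: {share_low:.0%} to {share_high:.0%}")  # Log indexing share: 68% to 71%
```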

Key Features

  • 1000+ integrations - largest ecosystem in the category

  • Watchdog AI for proactive anomaly detection

  • Kubernetes Explorer with pod, deployment, and resource-level visibility

  • Network Performance Monitoring for container-to-container traffic

  • Deep CI/CD and deployment tracking

Pros

  • Broadest integration ecosystem and feature coverage

  • Strong Kubernetes monitoring with dedicated cluster views

  • Mature CI/CD, security, and network monitoring add-ons

  • Watchdog AI surfaces anomalies proactively

Cons

  • Multi-dimensional billing makes cost forecasting difficult - custom metrics alone can represent 30-52% of the bill.

  • OTel metrics billed as custom metrics - adds cost for teams adopting open standards.

  • Mostly SaaS-based deployment - on-prem/CloudPrem was added recently but is still in preview; for teams where self-hosting is a hard requirement, fully self-hosted platforms are worth evaluating before committing.

  • Log indexing is the dominant cost driver at scale and is easy to underestimate.

6. Dynatrace

Best for: Large enterprises that need AI-automated root cause analysis and are willing to pay a premium plus annual commitment.

Overview

Dynatrace differentiates with its Davis AI engine, which automatically maps service dependencies and performs causal root-cause analysis. Gartner ranks Dynatrace highest in "Ability to Execute" among observability vendors. The Dynatrace Managed option provides genuine data residency. The trade-off is a mandatory annual commitment and consumption-based pricing with multiple rate-card units, which makes cost forecasting complex.

Cost Profile

Usage-based with separate rate-card units. Full-Stack Monitoring at $0.08/hr per 8 GiB host, log ingestion at $0.20/GiB, retention at $0.0007/GiB-day. Mandatory annual minimum commitment (~$2K/month). The 4 GiB minimum billing for small hosts creates a penalty for lightweight services and sidecars.

  • At 30TB/month: ~$20,000-$35,000+/month.

  • Breakdown: 100 hosts ~$4,700 + log ingest 20TB ~$4,100 + retention ~$430 + traces/metrics/APM + DPS commitment overhead.
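
The log ingest and retention line items can be approximated from the quoted rates. The sketch below assumes 20TB of logs is counted as 20 x 1,024 GiB and that a full month of logs is retained for 30 days; host, trace, and commitment costs are left out because they depend on details not given here.

```python
GIB_PER_TIB = 1024


def dynatrace_log_costs(log_tib: float, ingest_rate: float = 0.20,
                        retention_rate: float = 0.0007,
                        days: int = 30) -> tuple[float, float]:
    """Return (ingest cost, retention cost) in dollars per month."""
    gib = log_tib * GIB_PER_TIB
    ingest = gib * ingest_rate
    retention = gib * retention_rate * days
    return ingest, retention


ingest, retention = dynatrace_log_costs(20)
print(f"Ingest ${ingest:,.0f}, retention ${retention:,.0f}")  # Ingest $4,096, retention $430
```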

Key Features

  • Davis AI: Automatic baselining, anomaly detection, and causal root-cause analysis

  • Dynatrace Managed for genuine data residency

  • Automatic pod injection - no manual instrumentation per service

  • OpenTelemetry support via OTLP API and Dynatrace Collector

  • Strong compliance and enterprise security features

Pros

  • Best automated root cause analysis in the category (Davis AI)

  • Managed deployment option for data residency

  • Automatic instrumentation reduces setup overhead

  • Strong enterprise compliance and security posture

Cons

  • Mandatory annual commitment - no month-to-month flexibility.

  • GiB-hour pricing is harder to forecast than simple per-GB models.

  • 4 GiB minimum billing for small hosts penalizes lightweight services.

  • Proprietary OneAgent creates vendor lock-in - harder to move to OTel-native backends later.

7. Splunk Observability Cloud

Best for: Organizations with deep existing Splunk SIEM investment that want to extend into observability without introducing a new vendor.

Overview

Splunk Observability Cloud provides full-fidelity distributed tracing (no default sampling) and deep integration with Splunk's SIEM platform. For organizations already paying for Splunk Enterprise or Splunk Cloud, adding observability extends the existing investment. For new deployments, however, the cost structure is among the highest in the category, and the deployment effort is significant.

Cost Profile

Splunk pricing combines per-host fees ($15/host/month base) with separate enterprise contracts for APM and log management. The full-fidelity tracing model means no sampling-related data loss, but it also means higher storage costs. The total cost at scale is the highest among the tools in this guide.

  • At 30TB/month: ~$35,000-$60,000+/month.

  • Pricing model: $15/host/month base + APM + log management under enterprise contract.

Key Features

  • Full-fidelity distributed tracing - no default sampling

  • Deep Splunk SIEM integration for security + observability correlation

  • Tag-based analytics for exploring traces without pre-defined queries

  • On-premises deployment option for data residency

  • SignalFx heritage for real-time streaming analytics

Pros

  • Full-fidelity tracing captures every transaction without sampling

  • Deep SIEM integration for combined security and observability workflows

  • On-premises option available for regulated environments

  • Strong real-time streaming analytics for high-volume environments

Cons

  • Highest cost structure among the tools evaluated - hard to justify without existing Splunk investment.

  • Significant deployment effort and operational complexity.

  • Value proposition depends heavily on existing Splunk SIEM commitment.

  • Vendor lock-in - migrating away from Splunk's data format and query language is a substantial effort.

How to Choose the Right APM Tool for Cost Reduction

  • Choose CubeAPM if cost predictability and data sovereignty are priorities. Ingestion-based pricing of $0.15/GB with no per-host, per-seat, or custom metrics fees - a single billing dimension that eliminates the cost traps most vendors depend on.

  • Choose Grafana Cloud if your team already runs Prometheus and wants usage-based pricing with active cost-reduction tooling (Adaptive Metrics/Logs). Consider the self-hosted path if you have SRE capacity.

  • Choose Elastic APM if you already run the ELK stack and want to add APM at minimal incremental cost. Self-hosted is free; operational overhead at scale is the main trade-off.

  • Choose New Relic if the 100GB free tier covers your needs or you want unified NRQL querying. Model dual-axis costs (data + users) carefully before scaling up.

  • Choose Datadog if you need the broadest integration ecosystem and can manage multi-dimensional billing. Run third-party cost calculators to model custom metrics and log indexing before committing.

  • Choose Dynatrace if enterprise AI automation (Davis AI) justifies the cost premium and annual commitment. Factor in the 4 GiB minimum host billing for small services.

  • Choose Splunk if you already have a deep Splunk SIEM investment and want full-fidelity tracing. For net-new observability without existing Splunk contracts, the cost is hard to justify.

Final Thoughts

Observability costs are not fixed - they are a function of pricing architecture. Multi-dimensional billing models that charge on hosts, users, custom metrics, and data volume simultaneously create cost unpredictability that grows with infrastructure scale. Single-dimension models based on data ingestion are inherently more forecastable, but the total cost of ownership must also account for cloud egress, retention premiums, and the operational overhead of self-hosting.

The practical approach is straightforward: model your actual telemetry volume across all billing dimensions before committing. Run the numbers at your current scale and at 2-3x your current scale, because the gap between pricing architectures widens as usage grows. Teams that have already moved to newer self-hosted platforms report 60-75% savings. For teams where data residency matters, self-hosted alternatives eliminate egress costs and reduce compliance overhead in a single architectural decision.
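
One way to run that comparison is a small model that evaluates two pricing shapes at 1x, 2x, and 3x of current volume. The sketch below uses rates quoted earlier in this guide ($0.15/GB for a single-dimension model; $0.40/GB plus $99/user for an ingest-plus-seats model) and assumes, hypothetically, that headcount grows in step with volume. It leaves out infrastructure, egress, and retention, so treat it as a template to fill in with your own numbers.

```python
def single_dimension(ingest_gb: float, rate_per_gb: float = 0.15) -> float:
    """Monthly cost when data volume is the only billing dimension."""
    return ingest_gb * rate_per_gb


def ingest_plus_seats(ingest_gb: float, users: int,
                      rate_per_gb: float = 0.40, seat_fee: float = 99.0) -> float:
    """Monthly cost when both data volume and user seats are billed."""
    return ingest_gb * rate_per_gb + users * seat_fee


base_gb, base_users = 30_000, 20
for scale in (1, 2, 3):
    gb = base_gb * scale
    users = base_users * scale  # assumption: users grow with volume
    a = single_dimension(gb)
    b = ingest_plus_seats(gb, users)
    print(f"{scale}x: ${a:,.0f} vs ${b:,.0f} (gap ${b - a:,.0f})")
```

Because both models in this sketch are linear, the ratio between them stays constant; it is the absolute dollar gap that grows with scale. Models with tiers, minimums, or per-metric charges would need their own functions.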

The numbers at your scale will make the decision clearer than any feature matrix.

Methodology

Estimates assume 30TB/month of ingestion (~20TB logs, 7TB traces, 3TB metrics), 100 hosts, 20 users, 30-day retention, 30% log indexing, 500K metric series, and core observability features only. Figures are based on public rate cards as of early 2026. Vendor discounts and EDP commitments can significantly reduce SaaS costs.

Frequently Asked Questions

1. What is the biggest hidden cost in APM tools?

Custom metrics surcharges. In Kubernetes environments, labels generate high-cardinality metrics automatically, which many vendors bill as custom metrics at premium rates. This can represent 30-52% of the total APM bill at scale. OTel metrics are often classified as custom metrics, adding further cost for teams standardizing on open instrumentation.

2. How much do cloud egress charges add to observability costs?

At 30TB/month, sending telemetry to any external SaaS platform costs approximately $3,000/month ($36,000/year) in cloud provider egress fees alone. This cost scales linearly with telemetry volume and does not appear on the APM vendor's invoice. Self-hosted platforms running inside your VPC eliminate this entirely.

3. Is self-hosted APM actually cheaper than SaaS?

At scale, yes - typically 60-75% cheaper. The total cost comparison must include infrastructure, operational overhead, and egress. Self-hosted platforms eliminate egress costs, retention premiums, and per-user fees. The operational overhead depends on whether the platform is vendor-managed (minimal effort) or fully self-managed (significant SRE investment).

4. Can I reduce costs without switching APM platforms?

Partially. Strategies include reducing log indexing ratios, implementing sampling, dropping low-value metrics, and negotiating volume discounts. However, if the pricing architecture itself creates cost compounding (per-host + per-metric + per-user), these optimizations have diminishing returns. Structural cost reduction requires a pricing model change.
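
Of these strategies, sampling is the easiest to illustrate. The sketch below shows one common shape of a sampling decision: keep every trace that contains an error or exceeds a latency threshold, and keep a small deterministic fraction of the rest. The `TraceSummary` type, the 500 ms threshold, and the 5% baseline rate are all hypothetical, and this is not any particular vendor's algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TraceSummary:
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace: TraceSummary, latency_threshold_ms: float = 500.0,
               baseline_rate: float = 0.05) -> bool:
    """Keep all error or slow traces; keep a fixed fraction of the rest."""
    if trace.has_error or trace.duration_ms >= latency_threshold_ms:
        return True
    # Hash the trace ID to a 64-bit integer so the same trace always gets
    # the same decision, regardless of which process evaluates it.
    digest = hashlib.sha256(trace.trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < baseline_rate * 2**64


print(keep_trace(TraceSummary("t-1", 35.0, has_error=True)))    # True
print(keep_trace(TraceSummary("t-2", 1200.0, has_error=False))) # True
```

Hashing the trace ID rather than drawing a random number keeps the decision consistent across collectors, so a trace is not half-kept when its spans arrive at different nodes.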

5. How does OpenTelemetry affect APM costs?

OpenTelemetry reduces vendor lock-in, which gives you leverage to switch platforms when costs escalate. However, some vendors classify OTel metrics as custom metrics, which can increase rather than decrease costs. Choose backends that treat OTel-native metrics the same as proprietary metrics to avoid this penalty.

