The state of observability in 2025: a deep dive on our third annual Observability Survey

And while those two items were almost universally ranked as the top two across every demographic, the exact ranking varied by a number of factors. Smaller organizations, those using SaaS, and those with fewer observability technologies tend to favor training-based alerts by a fairly large margin, while larger organizations, those using self-managed setups, and those with more technologies tend to favor faster root cause analysis.

Cost controls remain a priority, but the focus is shifting to value

Cost is always top of mind for businesses, especially these days, so it isn’t a surprise that three-quarters of companies say cost is an important criterion when selecting observability technologies. There’s also a strong relationship between cost concerns and cost as a selection criterion: 88% of those concerned that observability costs too much are prioritizing cost in their selection, as are 85% of those who say costs are too difficult to predict and budget for.

There’s also the question of how much you should spend on observability relative to the rest of your infrastructure. There’s no industry standard for what that percentage should be, and that was clear from the survey responses. The average across all organizations was 17%, though some respondents said it’s 0% (presumably because they’re using OSS tooling, though that doesn’t factor in overhead costs), while others said it’s upwards of 50%. However, the median and mode both came in at 10%, so perhaps that’s a benchmark to keep in mind going forward.
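To see how an average of 17% can sit alongside a median and mode of 10%, here is a minimal sketch. The values are invented for illustration and are not survey responses; `hypothetical_shares` and `summarize` are hypothetical names.

```python
from statistics import mean, median, mode

# Hypothetical share of infrastructure spend going to observability (%).
# These values are made up for illustration; they are NOT survey data.
hypothetical_shares = [0, 5, 10, 10, 10, 10, 15, 20, 30, 60]


def summarize(shares):
    """Return the mean, median, and mode of a list of percentages."""
    return {
        "mean": mean(shares),
        "median": median(shares),
        "mode": mode(shares),
    }


summary = summarize(hypothetical_shares)
print(summary)
```

With these made-up values, the mean is 17 while the median and mode are both 10: a handful of high responses pull the average up without moving the typical answer.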

In terms of biggest concerns, there wasn’t a clear frontrunner. Only a minority of respondents explicitly cited cost—either too high (37%) or too unpredictable (29%)—but there are other expenses to consider beyond your monthly observability bill. In fact, you could argue that the most commonly cited concerns (complexity/overhead and signal-to-noise ratio) carry their own hidden costs.

Managing a complex system on your own can translate to lots of engineering hours at scale, which quickly gets expensive. And the signal-to-noise problem can be a byproduct of collecting too much data, which can be especially problematic if you use a vendor that charges based on telemetry ingestion. But the underlying fear associated with both of these concerns is the potential for outages and an inability to respond promptly—a scenario that could have much larger financial consequences than any observability bill.

Moreover, cost savings was the least important outcome for organizations using or interested in service level objectives (SLOs), with teams instead focusing on MTTR, accountability, and reduced alert noise. Taken collectively, these stats indicate that organizations are focused on getting value from their tools and techniques rather than just hunting for the cheapest option.

In fact, the percentage of respondents who cited “convincing management of the value of observability” as a top concern fell year over year (from 28% to 23%). That makes sense when you consider that three-quarters of all companies say observability is business-critical at the CTO, VP, or director level, with CTO being the most common response (33%).

+ This was an optional, open-ended question. Inconsistent or inaccurate responses were removed from the dataset, leaving a base of 294 responses.

SLOs and other emerging tools and techniques are starting to take hold

One of the best ways to combat cost and complexity concerns is through SLOs, which establish measurable goals related to the quality of service provided to users. Though not entirely new, they haven’t quite cemented themselves in the observability ethos, in part because it’s as much about cultural change as it is about the actual technology.
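As a rough illustration of what a "measurable goal" can look like, the sketch below models a request-based availability SLO and its error budget. This is one common way to frame an SLO, not a method described in the survey; the `Slo` class, the function names, the 99.9% target, and the request counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """A hypothetical request-based SLO."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should be good


def sli(good: int, total: int) -> float:
    """Measured fraction of good requests (treated as 1.0 when idle)."""
    return good / total if total else 1.0


def error_budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget left; negative means it is overspent."""
    allowed_bad = (1 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # No failures are allowed (or there was no traffic at all).
        return 1.0 if actual_bad == 0 else 0.0
    return 1 - actual_bad / allowed_bad


# Assumed numbers for illustration only.
slo = Slo(name="checkout-availability", target=0.999)
good_requests, total_requests = 999_500, 1_000_000

measured = sli(good_requests, total_requests)
remaining = error_budget_remaining(slo, good_requests, total_requests)
print(f"SLI: {measured:.4%}, budget remaining: {remaining:.0%}")
```

Framing the goal as a budget gives teams a shared number to point at when deciding whether to ship features or work on reliability, which is part of the cultural shift mentioned above.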

Still, nearly three-quarters (73%) of all organizations are actively investigating or using SLOs today, and adoption rates are higher among those using more mature tools and techniques, including traces, profiles, and centralized observability, as well as those juggling more observability technologies and data sources.

Adoption varies by role, with SREs (29%) much more likely than developers (18%) to say their organization is using them in production in some capacity (in production, using extensively, using exclusively). There are also varying degrees of interest at the managerial level, with 32% of engineering directors saying their organization uses them, compared to just 14% of CTOs.

In terms of what organizations hope to get out of SLOs, the most common response was reduced MTTR (33%), followed by better accountability (25%), reduced alert noise (16%), and cost savings (14%).

Full-stack observability, FinOps, and LLM observability

Another emerging area that’s getting even more attention today is unified application and infrastructure observability, with 85% of all organizations either using or looking into it to get visibility into their entire software stack. It’s especially popular with companies moving beyond just logs and metrics, with 45% of those using profiles also using full-stack observability in production in some capacity (in production, using extensively, using exclusively), as well as 42% of those using traces, compared to just 34% across all organizations.

We also asked about two other emerging areas: LLM observability and FinOps. More than half of all organizations are either looking into or using both, but neither is seeing a ton of use in production in any capacity: 7% for LLM observability and 15% for FinOps.

Methodology

A total of 1,255 observability practitioners and leaders around the world participated in our third annual Observability Survey between Sept. 18, 2024, and Jan. 2, 2025. We developed the questions internally and promoted the survey online through our blog, website, social media channels, and newsletters, and with the help of our Grafana Champions. Our Events and Community teams also collected responses in person at ObservabilityCON 2024 and ObservabilityCON on the Road, as well as third-party events like AWS re:Invent, KubeCon North America, KubeCon India, and local Meetups.

The data analysis was conducted with Censuswide. Censuswide abides by and employs members of the Market Research Society and follows the MRS code of conduct and ESOMAR principles. Censuswide is also a member of the British Polling Council.

Grafana Labs