
Optimizing Prometheus Costs — How a Minor Change Triggered a Major Cost Surge

Guy Erez
Published in Level Up Coding
5 min read · Feb 12, 2024

Prometheus is widely regarded as the industry standard for monitoring applications in the cloud, thanks to its ability to collect metrics at scale and its robust query language, PromQL, which makes it possible to build complex, actionable alerts.

However, when data collection, analysis, and storage reach substantial volumes, we inevitably encounter a corresponding surge in expenses.
That is to be expected. But what happens when we experience a surge in expenses that does not match the company’s scale?

How One Minor Change Triggered a Cost Surge

As a growing startup, we saw our metrics usage increase steadily as we acquired more customers and our operations grew more complex. That naturally drove up our expenses. Initially, the increase didn't raise any red flags, since it was roughly in line with what we expected. However, as metrics usage continued to surge, we realized we might have a real issue on our hands, so we began a thorough examination of our operations to identify the key areas in need of optimization.

AI-generated image, powered by DALL-E 3

We started by looking at our most memory-intensive metrics, that is, our high-cardinality metrics. We decided to investigate just the top
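For readers new to the term, a metric's cardinality is the number of distinct time series it produces, meaning one series per unique combination of label values. The following is a minimal Python sketch of ranking metrics by that count. It is not the tooling used in this story: the `top_cardinality` function, the `SAMPLE_SERIES` data, and the metric names are all hypothetical and exist only for illustration. Against a live Prometheus server, a PromQL query along the lines of `topk(10, count by (__name__)({__name__=~".+"}))` is a common way to get a similar ranking.

```python
from collections import Counter


def top_cardinality(series, n=3):
    """Return the n metric names with the most distinct series.

    `series` is a list of label dictionaries, each containing a
    "__name__" key holding the metric name (hypothetical input shape).
    """
    # A series is identified by its metric name plus its full label set,
    # so deduplicate on that pair before counting.
    unique_series = {
        (labels["__name__"], frozenset(labels.items())) for labels in series
    }
    counts = Counter(name for name, _ in unique_series)
    # Sort by count (descending), then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample data: one metric with three distinct label sets
# (plus one duplicate) and one metric with a single series.
SAMPLE_SERIES = [
    {"__name__": "http_requests_total", "path": "/a"},
    {"__name__": "http_requests_total", "path": "/b"},
    {"__name__": "http_requests_total", "path": "/c"},
    {"__name__": "http_requests_total", "path": "/c"},  # duplicate series
    {"__name__": "queue_depth", "queue": "emails"},
]

print(top_cardinality(SAMPLE_SERIES, n=2))
# prints [('http_requests_total', 3), ('queue_depth', 1)]
```

Each extra label value multiplies the number of series a metric can produce, which is why a small change to a label can have an outsized effect on storage and memory.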


