Optimizing Prometheus Costs — How a Minor Change Triggered a Major Cost Surge
Prometheus is the industry standard for monitoring applications in the cloud, thanks to its ability to collect metrics at scale and a robust query language that makes it practical to build complex, actionable alerts.
However, as the volume of data we collect, analyze, and store grows, expenses inevitably grow with it.
That is to be expected. But what happens when we experience a surge in expenses that does not match the company’s scale?
How One Minor Change Triggered a Cost Surge
As a growing startup, we saw our metrics usage increase steadily as we acquired more customers and handled more complex operations. That naturally drove our expenses up significantly. At first the increase didn't raise any red flags, since it was broadly in line with our expectations. But as metrics usage kept surging, we realized we might have a real problem on our hands, so we began a thorough examination of our operations to identify the key areas in need of optimization.

We started by looking at our most memory-intensive metrics, that is, our high-cardinality metrics. We decided to investigate just the top…