Cortex: Prometheus as a Service, One Year On - Part III

Tom Wilkie
on Oct 24, 2017

This blog post is part 3 of a series on the talk I recently gave at PromCon 2017 in Munich about what we’ve learnt from running Cortex, our open source, horizontally scalable Prometheus implementation, for over a year. Part 1 and Part 2 can be found under those links.

Problem #5: Cost

We looked at the cost of running Cortex, and it was huge - IIRC ~90% of our AWS bill was going on S3 write operations. We considered using bigger chunks, but compression gets worse. We briefly flirted with the idea of “super chunks” - combining multiple chunks into one S3 object - but decided this would be too complicated.

We went back to the drawing board and compared the cost of various operations. DynamoDB and S3 billing information is published in non-comparable units, but with a bit of munging you can force them into units for comparison:

	S3	DynamoDB
IOP Cost ($/IOP)	5x10^-6	2x10^-7
Storage Cost ($/GB/Month)	0.023	0.250

As you can see, S3 is an order of magnitude more expensive to write to than DynamoDB (roughly 25x per operation), but an order of magnitude cheaper to store data in (roughly 11x per GB per month). This can’t be a coincidence… Given that information, we switched to storing the 1KB chunks in DynamoDB, saving ~50% of our AWS bill.
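To make the trade-off concrete, here is a rough sketch of the per-chunk arithmetic using only the four figures from the table. It assumes a chunk is exactly 1KB, that 1 GB is 10^9 bytes, that each chunk costs exactly one write operation, and it ignores read costs and any per-item overhead; the function names are made up for illustration.

```python
# Figures taken from the table above; everything else is an assumption.
S3_WRITE_COST = 5e-6        # $ per write operation
DDB_WRITE_COST = 2e-7       # $ per write operation
S3_STORAGE_COST = 0.023     # $ per GB per month
DDB_STORAGE_COST = 0.250    # $ per GB per month

CHUNK_GB = 1e-6             # assumed: a 1KB chunk, with 1 GB = 10^9 bytes


def chunk_cost(write_cost, storage_cost, months, chunk_gb=CHUNK_GB):
    """Cost in dollars to write one chunk once and keep it for `months`."""
    return write_cost + storage_cost * chunk_gb * months


def break_even_months(chunk_gb=CHUNK_GB):
    """Retention (in months) at which both stores cost the same per chunk."""
    write_gap = S3_WRITE_COST - DDB_WRITE_COST
    storage_gap = (DDB_STORAGE_COST - S3_STORAGE_COST) * chunk_gb
    return write_gap / storage_gap


if __name__ == "__main__":
    for months in (1, 12, 24):
        s3 = chunk_cost(S3_WRITE_COST, S3_STORAGE_COST, months)
        ddb = chunk_cost(DDB_WRITE_COST, DDB_STORAGE_COST, months)
        print(f"{months:>2} months: S3 ${s3:.2e}  DynamoDB ${ddb:.2e}")
    print(f"break-even: {break_even_months():.1f} months")
```

Under these assumptions the two stores cost the same per chunk at roughly 21 months of retention, so for small chunks kept for less than that, the cheaper write dominates the more expensive storage.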

Problem #6: DynamoDB, again

After moving the chunks to DynamoDB it became clear that almost all of the cost of running Cortex was paying for DynamoDB. On top of this, DynamoDB has been a pain to use and the source of most of our issues - the client code is complicated because of the various retries, backoffs, batching etc. we need to make it fast and reliable. Finally, I’ve never been particularly satisfied with DynamoDB’s range query performance.
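To give a flavour of the retry, backoff and batching logic mentioned above, here is a minimal sketch. It is not Cortex's client code: `write_all` and the `write_batch` callable are hypothetical, standing in for a call like DynamoDB's BatchWriteItem, which takes at most 25 items and hands back whatever it could not process.

```python
import random
import time

MAX_BATCH = 25  # DynamoDB's BatchWriteItem accepts at most 25 items per call


def write_all(items, write_batch, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Write every item, batching and retrying whatever the backend rejects.

    `write_batch` is a hypothetical callable: it takes a list of items and
    returns the sub-list it could not process (for example, when throttled).
    """
    pending = list(items)
    attempt = 0
    while pending:
        batch, rest = pending[:MAX_BATCH], pending[MAX_BATCH:]
        unprocessed = write_batch(batch)
        # Rejected items go back to the front of the queue.
        pending = list(unprocessed) + rest
        if unprocessed:
            attempt += 1
            if attempt > max_retries:
                raise RuntimeError(f"gave up with {len(pending)} items unwritten")
            # Exponential backoff with full jitter.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
        else:
            attempt = 0  # the whole batch went through; reset the backoff
```

Even this toy version has to track partial failures, a retry budget and a delay schedule, and a real client also needs to handle errors, timeouts and concurrency on top.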

So I bit the bullet and ported Cortex to Google’s Bigtable. The result has been about 1/10th of the code, better performance and about 1/3rd of the cost. Also, Google gave me a bunch of free credits on Google Cloud Platform (GCP).

Closing thoughts

Most of the problems we’ve had in the past year came from running Cortex on DynamoDB - the move to Bigtable has resolved all of them. There is a lot more work to do on Cortex before I could consider it “finished”, and some of my plans for the next year include:

Separate ingester index for better load balancing - currently time series are distributed by metric name, which can lead to uneven load amongst the ingesters.
Use prometheus/tsdb for the ingesters - this will give us a write-ahead log for better durability, and increase the performance and scalability of the ingesters.
Etcd & gossip for ring storage - to make it simpler and easier for people to adopt Cortex.
Chunks in Google Cloud Storage - the GCP backend currently writes chunks to Bigtable, like the AWS backend does for DynamoDB. The economics are different on GCP, so we should write them to GCS.
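The first plan above is easier to see with a toy example of why distributing by metric name can be lopsided. This is only an illustration, not Cortex's ring: the hashing scheme, the function names and the workload are all made up, and hashing the full label set is just one assumed alternative to compare against.

```python
import hashlib
from collections import Counter


def bucket(key, n_ingesters):
    """Map a string key to an ingester index with a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_ingesters


def load_by_metric_name(series, n_ingesters):
    """Series count per ingester when only the metric name is hashed."""
    return Counter(bucket(name, n_ingesters) for name, _ in series)


def load_by_full_series(series, n_ingesters):
    """Series count per ingester when metric name plus labels are hashed."""
    return Counter(bucket(f"{name}{{{labels}}}", n_ingesters) for name, labels in series)


# Hypothetical, skewed workload: one metric with many series, ten small ones.
series = [("http_requests_total", f'instance="{i}"') for i in range(1000)]
series += [(f"small_metric_{m}", f'instance="{i}"') for m in range(10) for i in range(10)]

if __name__ == "__main__":
    print("by metric name:", dict(load_by_metric_name(series, 4)))
    print("by full series:", dict(load_by_full_series(series, 4)))
```

When only the name is hashed, all 1000 series of the large metric land on the same ingester regardless of how many ingesters there are, whereas hashing the whole series spreads them out.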

Cortex is completely open source, so if there is anything else you want to see supported, we welcome contributions! And if you are looking for help getting started or operating a Cortex cluster, contact us.

Kausal's mission is to enable software developers to better understand the behaviour of their code. We combine Prometheus monitoring, log aggregation and OpenTracing-compatible distributed tracing into a hosted observability service for developers.