
Watch out for Kafka costs

Paweł Mantur, Solutions Architect · 3 min read

Non-obvious Confluent Cloud costs

When using AWS or any other cloud provider, we need to be aware of network traffic charges, especially cross-zone and cross-region data transfer fees.

Example deployment:

  • Cluster in Confluent Cloud, hosted on AWS (a 1-CKU cluster is limited to a single AZ; from 2 CKUs we get a multi-AZ setup)
  • AWS PrivateLink for private connectivity with AWS
  • Kafka clients running in AWS EKS (multi-AZ)

Cross-AZ data transfer costs

Be aware that if a Kafka broker node happens to be running in a different AZ than the Kafka clients, additional data transfer charges will apply for the cross-AZ traffic.

Kafka has the concept of racks, which allows co-locating Kafka clients and broker nodes. More details about this setting in the context of AWS and Confluent can be found here: https://docs.confluent.io/cloud/current/networking/fetch-from-follower.html
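On the client side, fetch-from-follower is driven by the `client.rack` consumer property, which should match the AZ ID the client runs in. A minimal sketch (the AZ ID below is an illustrative assumption, not a value from this post):

```properties
# Kafka consumer configuration (Java client properties)
# Set client.rack to the AWS availability-zone ID of the client,
# so the consumer can fetch from a replica in the same AZ.
client.rack=use1-az1
```

With this set, and rack-aware replica selection enabled on the cluster, consumers read from the closest replica instead of always crossing AZs to reach the partition leader.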

Data transfer costs within AZ

But even if we manage to keep connections within the same AZ, is consuming data from Kafka free?

Imagine an architecture in which a single topic contains data dedicated to multiple consumers. Every consumer reads only the relevant data and filters out (ignores) the other messages. Sounds straightforward, but we need to be aware that in order to filter data, each consumer first needs to read the message. So even irrelevant data creates traffic from the broker to the clients.

Kafka does not support filtering on the broker side. There is an open feature request for that.

If we have a lot of consumers, we will have a lot of outbound traffic (topic throughput × number of consumers). Having additional infrastructure like AWS PrivateLink in the path of such traffic will generate extra costs.
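The fan-out effect can be sketched with simple arithmetic (the volume and consumer count below are illustrative assumptions, not figures from this post):

```python
# Each consumer reads the full topic, even if it filters most messages out,
# so egress scales with the number of consumers, not with the data they keep.
topic_gb_per_day = 100          # assumed produced volume
consumers = 10                  # assumed number of consumer applications
egress_gb_per_day = topic_gb_per_day * consumers
print(egress_gb_per_day)        # 1000
```

One topic producing 100 GB/day thus turns into 1 TB/day of broker-to-client traffic, all of it potentially metered by PrivateLink and egress pricing.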

Extreme scenario - generating costs for nothing

Another interesting scenario is implementing a retry policy for failed message processing. For example, every message needs to be delivered to an endpoint which is down. If the Kafka consumer retries delivery very aggressively (for example every second, or even worse, in an infinite loop), and every retry is a new read from the topic, then we can easily generate a lot of reads.
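A back-of-envelope calculation shows how much traffic such a retry loop can waste (message size and outage duration are assumptions for illustration):

```python
# A consumer re-reads the same 1 MB message once per second while the
# downstream endpoint is down, producing egress without delivering anything.
message_mb = 1
seconds_down = 24 * 60 * 60       # endpoint down for one full day
wasted_gb = message_mb * seconds_down / 1024
print(wasted_gb)                  # 84.375
```

Roughly 84 GB of egress in a day for a single stuck 1 MB message, before any useful work is done. An exponential backoff, or pausing the partition instead of re-fetching, avoids this.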

We may be fooled by most of the documentation, which states that reading from Kafka is very efficient because it is basically a sequential read from a log. From the broker cost perspective, multiple consumers are not a significant cost factor compared to things like written data volume, but we still need to be mindful of the data transfer costs that may apply to reads. Confluent charges $0.05/GB for egress traffic. Total costs may grow quickly in a busy cluster with active producers and multiple reads of every message.
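Putting the $0.05/GB egress rate from above together with read amplification gives a quick monthly estimate (the volume and read count are assumed values for illustration):

```python
# Monthly egress bill when every produced byte is read several times.
egress_rate_usd_per_gb = 0.05     # Confluent egress rate quoted in the post
produced_gb_per_month = 3000      # assumed producer volume
reads_per_message = 10            # assumed: 10 consumers each read everything
monthly_cost = produced_gb_per_month * reads_per_message * egress_rate_usd_per_gb
print(monthly_cost)               # 1500.0
```

Reading each message ten times turns a 3 TB/month topic into a $1,500/month egress line item, which is easy to overlook when only producer throughput is budgeted for.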