Librdkafka Changelog¶
This page is a copy of the librdkafka release notes.
2.12.1 (2025-10-21)¶
librdkafka v2.12.1 is a maintenance release:
- Restored macOS binaries compatibility with macOS 13 and 14 (#5219).
Fixes¶
General fixes¶
- Fix to restore macOS 13 and 14 compatibility in prebuilt binaries present in
librdkafka.redist. Happening since 2.12.0 (#5219).
Checksums¶
Release asset checksums:
* v2.12.1.zip SHA256 da7571a0c1dc374aabb18af6ca01411d4bc597d321977980c8d3211ec5adf696
* v2.12.1.tar.gz SHA256 ec103fa05cb0f251e375f6ea0b6112cfc9d0acd977dc5b69fdc54242ba38a16f
2.12.0 (2025-10-08)¶
librdkafka v2.12.0 is a feature release:
KIP-848 – General Availability¶
Starting with librdkafka 2.12.0, the next generation consumer group rebalance protocol defined in KIP-848 is production-ready. Please refer to the migration guide for moving from the classic to the consumer protocol.
Note: The new consumer group protocol defined in KIP-848 is not enabled by default. There are a few contract changes associated with the new protocol that might be breaking. The `group.protocol` configuration property dictates whether to use the new consumer protocol or the older classic protocol. It defaults to classic if not set.
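As a quick illustration, opting in to the new protocol is a configuration change. A sketch (property names as listed in librdkafka's CONFIGURATION.md; the `group.remote.assignor` line is an optional assumption of a typical setup, not required):

```ini
# Opt in to the KIP-848 consumer group protocol (defaults to "classic").
group.protocol=consumer
# Optional: pick a broker-side assignor for the new protocol.
group.remote.assignor=uniform
```

With `group.protocol=consumer`, properties tied to the classic protocol (such as `session.timeout.ms`) are rejected, so they should be removed from existing configurations.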
Enhancements and Fixes¶
- Support for OAUTHBEARER metadata-based authentication types, starting with Azure IMDS; an introduction is available (#5155).
- Fix compression types read issue in GetTelemetrySubscriptions response for big-endian architectures (#5183, @paravoid).
- Fix for KIP-1102 time based re-bootstrap condition (#5177).
- Fix for discarding the member epoch in a consumer group heartbeat response when leaving with an inflight HB (#4672).
- Fix for an error being raised after a commit due to an existing error in the topic partition (#4672).
- Fix double free of headers in the `rd_kafka_produceva` method (@blindspotbounty, #4628).
- Fix to ensure `rd_kafka_query_watermark_offsets` enforces the specified timeout and does not continue beyond timeout expiry (#5201).
- New walkthrough in the Wiki about configuring Kafka cross-realm authentication between Windows SSPI and MIT Kerberos.
Fixes¶
General fixes¶
- Issues: #5178. Fix for the KIP-1102 time-based re-bootstrap condition. Re-bootstrap is now triggered only after `metadata.recovery.rebootstrap.trigger.ms` has passed since the first metadata refresh request after the last successful metadata response. The calculation was previously based on the last successful metadata response, so it could overlap with the periodic `topic.metadata.refresh.interval.ms` and cause a re-bootstrap even when not needed. Happening since 2.11.0 (#5177).
- Issues: #4878. Fix to ensure `rd_kafka_query_watermark_offsets` enforces the specified timeout and does not continue beyond timeout expiry. Happening since 2.3.0 (#5201).
Telemetry fixes¶
- Issues: #5179. Fix issue in GetTelemetrySubscriptions on big-endian architectures, where wrong values were read as accepted compression types, causing the metrics to be sent uncompressed. Happening since 2.5.0. Since 2.10.1, unit tests fail when run on big-endian architectures (#5183, @paravoid).
Consumer fixes¶
- Issues: #5199. Fixed an issue where topic partition errors were not cleared after a successful commit. Previously, a partition could retain a stale error state even though the most recent commit succeeded, causing misleading error reporting. Now, successful commits correctly clear the error state for the affected partitions. Happening since 2.4.0 (#4672).
Producer fixes¶
- Issues: #4627. Fix double free of headers in the `rd_kafka_produceva` method in cases where the partition doesn't exist. Happening since 1.x (@blindspotbounty, #4628).
Checksums¶
Release asset checksums:
* v2.12.0.zip SHA256 9b2f373e03f3d5d87c2075b3ce07ee9ea3802eea00cea41b99d8351a68d8a062
* v2.12.0.tar.gz SHA256 1355d81091d13643aed140ba0fe62437c02d9434b44e90975aaefab84c2bf237
2.11.1 (2025-08-18)¶
librdkafka v2.11.1 is a maintenance release:
- Made the conditions for enabling the features future-proof (#5130).
- Avoid returning an all brokers down error on planned disconnections (#5126).
- An "all brokers down" error isn't returned when we haven't tried to connect to all brokers since last successful connection (#5126).
Fixes¶
General fixes¶
- Issues: #4948, #4956. Made the conditions for enabling the features future-proof, allowing removal of RPC versions in a subsequent Apache Kafka version without disabling features. The existing checks matched a single version instead of a range and failed if the older version was removed. Happening since 1.x (#5130).
- Issues: #5142. Avoid returning an "all brokers down" error on planned disconnections. This is done by not counting planned disconnections, such as idle disconnections, broker host changes and similar, as events that can cause the client to reach the "all brokers down" state, returning an error and, since 2.10.0, possibly starting a re-bootstrap sequence. Happening since 1.x (#5126).
- Issues: #5142. An "all brokers down" error isn't returned when we haven't tried to connect to all brokers since the last successful connection. It happened because the down state is cached and can be stale when a connection isn't needed to that particular broker. Solved by resetting the cached broker down state when any broker successfully connects, so that broker needs to be tried again. Happening since 1.x (#5126).
Checksums¶
Release asset checksums:
* v2.11.1.zip SHA256 4a63e4422e5f5bbbb47f0ac1200e2ebd1f91b7b23f0de1bc625810c943fb870e
* v2.11.1.tar.gz SHA256 a2c87186b081e2705bb7d5338d5a01bc88d43273619b372ccb7bb0d264d0ca9f
2.11.0 (2025-07-03)¶
librdkafka v2.11.0 is a feature release:
- KIP-1102 Enable clients to rebootstrap based on timeout or error code (#4981).
- KIP-1139 Add support for OAuth jwt-bearer grant type (#4978).
- Fix for poll ratio calculation in case the queues are forwarded (#5017).
- Fix data race when buffer queues are being reset instead of being initialized (#4718).
- Features BROKER_BALANCED_CONSUMER and SASL_GSSAPI don't depend on JoinGroup v0 anymore, which is missing in AK 4.0 and CP 8.0 (#5131).
- Improve HTTPS CA certificates configuration by probing several paths when OpenSSL is statically linked and providing a way to customize their location or value (#5133).
Fixes¶
General fixes¶
- Issues: #4522. A data race happened when emptying buffers of a failing broker, in its thread, while the statistics callback in the main thread gathered the buffer counts. Solved by resetting the atomic counters instead of initializing them. Happening since 1.x (#4718).
- Issues: #4948. Features BROKER_BALANCED_CONSUMER and SASL_GSSAPI don't depend on JoinGroup v0 anymore, which is missing in AK 4.0 and CP 8.0. This PR partially fixes the linked issue; a complete fix for all features will follow. The rest of the fixes are necessary only for a subsequent Apache Kafka major version (e.g. AK 5.x). Happening since 1.x (#5131).
Telemetry fixes¶
- Issues: #5109. Fix for poll ratio calculation in case the queues are forwarded. Poll ratio is now calculated per-queue instead of per-instance, which avoids calculation problems linked to using the same field. Happens since 2.6.0 (#5017).
Checksums¶
Release asset checksums:
* v2.11.0.zip SHA256 9e76a408f0ed346f21be5e2df58b672d07ff9c561a5027f16780d1b26ef24683
* v2.11.0.tar.gz SHA256 592a823dc7c09ad4ded1bc8f700da6d4e0c88ffaf267815c6f25e7450b9395ca
2.10.1 (2025-06-11)¶
librdkafka v2.10.1 is a maintenance release:
- Fix to add locks when updating the metadata cache for the consumer after no broker connection is available (@marcin-krystianc, #5066).
- Fix to the re-bootstrap case when `bootstrap.servers` is `NULL` and brokers were added manually through `rd_kafka_brokers_add` (#5067).
- Fix an issue where the first message to any topic produced via `producev` or `produceva` was delivered late (by up to 1 second) (#5032).
- Fix for a loop of re-bootstrap sequences in case the client reaches the "all brokers down" state (#5086).
- Fix for frequent disconnections on push telemetry requests with particular metric configurations (#4912).
- Avoid copying outside buffer boundaries when reading metric names in the telemetry subscription (#5105).
- Metrics aren't duplicated when multiple prefixes match them (#5104).
Fixes¶
General fixes¶
- Issues: #5088. Fix for a loop of re-bootstrap sequences in case the client reaches the "all brokers down" state. The client kept selecting the bootstrap brokers, given they had no connection attempt, and didn't re-connect to the learned ones. When this happens, a broker restart can break the loop for clients using the affected version. Fixed by giving a higher chance to connect to the learned brokers even if there are new ones that were never tried. Happens since 2.10.0 (#5086).
- Issues: #5057. Fix to the re-bootstrap case when `bootstrap.servers` is `NULL` and brokers were added manually through `rd_kafka_brokers_add`. Avoids a segmentation fault in this case. Happens since 2.10.0 (#5067).
Producer fixes¶
- In case of `producev` or `produceva`, the producer did not enqueue a leader query metadata request immediately and instead waited for the 1 second timer to kick in. This could delay the sending of the first message by up to 1 second. Happens since 1.x (#5032).
Consumer fixes¶
- Issues: #5051. Fix to add locks when updating the metadata cache for the consumer. The missing locks could cause memory corruption or use-after-free when there's no broker connection and the consumer group metadata needs to be updated. Happens since 2.10.0 (#5066).
Telemetry fixes¶
- Issues: #5106. Fix for frequent disconnections on push telemetry requests with particular metric configurations. A `NULL` payload was sent in a push telemetry request when an empty one was needed. This caused disconnections every time the push was sent, but only when metrics were requested and some matched the producer but none the consumer, or the other way around. Happens since 2.5.0 (#4912).
- Issues: #5102. Avoid copying outside buffer boundaries when reading metric names in the telemetry subscription, which could cause some metrics not to be matched. Happens since 2.5.0 (#5105).
- Issues: #5103. Telemetry metrics aren't duplicated when multiple prefixes match them. Fixed by keeping track of the metrics that already matched. Happens since 2.5.0 (#5104).
Checksums¶
Release asset checksums:
* v2.10.1.zip SHA256 7cb72c4f3d162f50d30d81fd7f7ba0f3d9e8ecd09d9b4c5af7933314e24dd0ba
* v2.10.1.tar.gz SHA256 75f59a2d948276504afb25bcb5713a943785a413b84f9099d324d26b2021f758
2.10.0 (2025-04-17)¶
librdkafka v2.10.0 is a feature release:
KIP-848 – Now in Preview¶
- KIP-848 has transitioned from Early Access to Preview.
- Added support for regex-based subscriptions.
- Implemented client-side member ID generation as per KIP-1082.
- `rd_kafka_DescribeConsumerGroups()` now supports KIP-848-style `consumer` groups. Two new fields have been added:
  - Group type – Indicates whether the group is `classic` or `consumer`.
  - Target assignment – Applicable only to `consumer` protocol groups (defaults to `NULL`).
- Group configuration is now supported in `AlterConfigs`, `IncrementalAlterConfigs`, and `DescribeConfigs` (#4939).
- Added Topic Authorization Error support in the `ConsumerGroupHeartbeat` response.
- Removed usage of the `partition.assignment.strategy` property for the `consumer` group protocol. An error will be raised if this is set with `group.protocol=consumer`.
- Deprecated and disallowed the following properties for the `consumer` group protocol: `session.timeout.ms`, `heartbeat.interval.ms`, `group.protocol.type`. Attempting to set any of these will result in an error.
- Enhanced handling for `subscribe()` and `unsubscribe()` edge cases.
[!Note] The KIP-848 consumer is currently in Preview and should not be used in production environments. The implementation is feature complete, but the contract could see minor changes before General Availability.
Enhancements and Fixes¶
- Identify brokers only by broker id (#4557, @mfleming)
- Remove unavailable brokers and their thread (#4557, @mfleming)
- Commits during a cooperative incremental rebalance aren't causing an assignment lost if the generation id was bumped in between (#4908).
- Fix for librdkafka yielding before timeouts had been reached (#4970)
- Removed a 500ms latency when a consumer partition switches to a different leader (#4970)
- The mock cluster implementation removes brokers from Metadata response when they're not available, this simulates better the actual behavior of a cluster that is using KRaft (#4970).
- Doesn't remove topics from cache on temporary Metadata errors but only on metadata cache expiry (#4970).
- Doesn't mark the topic as unknown if it had been marked as existent earlier and `topic.metadata.propagation.max.ms` hasn't passed yet (@marcin-krystianc, #4970).
- Doesn't update partition leaders if the topic in metadata response has errors (#4970).
- Only topic authorization errors in a metadata response are considered permanent and are returned to the user (#4970).
- The function `rd_kafka_offsets_for_times` refreshes leader information if the error requires it, allowing it to succeed on subsequent manual retries (#4970).
- Deprecated `api.version.request`, `api.version.fallback.ms` and `broker.version.fallback` configuration properties (#4970).
- When the consumer is closed before destroying the client, the operations queue isn't purged anymore, as it contains operations unrelated to the consumer group (#4970).
- When making multiple changes to the consumer subscription in a short time, no unknown topic error is returned for topics that are in the new subscription but weren't in previous one (#4970).
- Prevent metadata cache corruption when topic id changes (@kwdubuc, @marcin-krystianc, @GerKr, #4970).
- Fix for the case where a metadata refresh enqueued on an unreachable broker prevents refreshing the controller or the coordinator until that broker becomes reachable again (#4970).
- Remove a one second wait after a partition fetch is restarted following a leader change and offset validation (#4970).
- Fix so that the Nagle algorithm (TCP_NODELAY) on broker sockets is disabled by default (#4986).
Fixes¶
General fixes¶
- Issues: #4212. Identify brokers only by broker id, as the Java client does, avoiding finding a broker with the same hostname and reusing the same thread and connection. Happens since 1.x (#4557, @mfleming).
- Issues: #4557. Remove brokers not reported in a metadata call, along with their thread. This avoids selecting unavailable brokers for a new connection when none is available. We cannot tell whether a broker was removed temporarily or permanently, so it's always removed and will be added back when it becomes available again. Happens since 1.x (#4557, @mfleming).
- Issues: #4970. librdkafka code using `cnd_timedwait` was yielding before a timeout occurred, without the condition being fulfilled, because of spurious wake-ups. Solved by verifying with a monotonic clock that the expected point in time was reached and calling the function again if needed. Happens since 1.x (#4970).
- Issues: #4970. Doesn't remove topics from cache on temporary Metadata errors but only on metadata cache expiry. This allows the client to keep working in case of temporary problems with the Kafka metadata plane. Happens since 1.x (#4970).
- Issues: #4970. Doesn't mark the topic as unknown if it had been marked as existent earlier and `topic.metadata.propagation.max.ms` hasn't passed yet. This achieves the property's expected effect even if a different broker had previously reported the topic as existent. Happens since 1.x (@marcin-krystianc, #4970).
- Issues: #4907. Doesn't update partition leaders if the topic in a metadata response has errors. This is in line with the Java client and avoids segmentation faults for unknown partitions. Happens since 1.x (#4970).
- Issues: #4970. Only topic authorization errors in a metadata response are considered permanent and are returned to the user. This is in line with the Java client and avoids returning to the user an error that wasn't meant to be permanent. Happens since 1.x (#4970).
- Issues: #4964, #4778 Prevent metadata cache corruption when topic id for the same topic name changes. Solved by correctly removing the entry with the old topic id from metadata cache to prevent subsequent use-after-free. Happens since 2.4.0 (@kwdubuc, @marcin-krystianc, @GerKr, #4970).
- Issues: #4970 Fix for the case where a metadata refresh enqueued on an unreachable broker prevents refreshing the controller or the coordinator until that broker becomes reachable again. Given the request continues to be retried on that broker, the counter for refreshing complete broker metadata doesn't reach zero and prevents the client from obtaining the new controller or group or transactional coordinator. It causes a series of debug messages like: "Skipping metadata request: ... full request already in-transit", until the broker the request is enqueued on is up again. Solved by not retrying these kinds of metadata requests. Happens since 1.x (#4970).
- The Nagle algorithm (TCP_NODELAY) is now disabled by default. It caused a large increase in latency for some use cases, for example when using an SSL connection. For efficient batching, the application should use `linger.ms`, `batch.size` etc. Happens since: 0.x (#4986).
Consumer fixes¶
- Issues: #4059. Commits during a cooperative incremental rebalance could cause an assignment to be lost if the generation id was bumped by a second join group request. Solved by not rejoining the group in case an illegal generation error happens during a rebalance. Happening since v1.6.0 (#4908).
- Issues: #4970. When switching to a different leader, a consumer could wait 500 ms (`fetch.error.backoff.ms`) before starting to fetch again. The fetch backoff wasn't reset when joining the new broker. Solved by resetting it, given there's no need to back off the first fetch on a different node. This makes faster leader switches possible. Happens since 1.x (#4970).
- Issues: #4970. The function `rd_kafka_offsets_for_times` refreshes leader information if the error requires it, allowing it to succeed on subsequent manual retries. Similar to the fix done in 2.3.0 in `rd_kafka_query_watermark_offsets`. Additionally, the partition's current leader epoch is taken from the metadata cache instead of from the passed partitions. Happens since 1.x (#4970).
- Issues: #4970. When the consumer is closed before destroying the client, the operations queue isn't purged anymore, as it contains operations unrelated to the consumer group. Happens since 1.x (#4970).
- Issues: #4970. When making multiple changes to the consumer subscription in a short time, no unknown topic error is returned for topics that are in the new subscription but weren't in the previous one. This was due to the metadata request relative to the previous subscription. Happens since 1.x (#4970).
- Issues: #4970. Remove a one second wait after a partition fetch is restarted following a leader change and offset validation. This is done by resetting the fetch error backoff and waking up the delegated broker if present. Happens since 2.1.0 (#4970).
Note: there was no v2.9.0 librdkafka release, it was a dependent clients release only
Checksums¶
Release asset checksums:
* v2.10.0.zip SHA256 e30944f39b353ee06e70861348011abfc32d9ab6ac850225b0666e9d97b9090d
* v2.10.0.tar.gz SHA256 004b1cc2685d1d6d416b90b426a0a9d27327a214c6b807df6f9ea5887346ba3a
2.8.0 (2025-01-07)¶
librdkafka v2.8.0 is a maintenance release:
- Socket options are now all set before connection (#4893).
- Client certificate chain is now sent when using `ssl.certificate.pem` or `ssl_certificate` or `ssl.keystore.location` (#4894).
- Avoid sending client certificates whose chain doesn't match the broker's trusted root certificates (#4900).
- Fixes to allow migrating partitions to leaders with the same leader epoch, or a NULL leader epoch (#4901).
- Support versions of OpenSSL without the ENGINE component (Chris Novakovic, #3535 and @remicollet, #4911).
Fixes¶
General fixes¶
- Socket options are now all set before connection, as the documentation says this is needed for socket buffers to take effect, even if in some cases they could take effect after connection too. Happening since v0.9.0 (#4893).
- Issues: #3225. Client certificate chain is now sent when using `ssl.certificate.pem` or `ssl_certificate` or `ssl.keystore.location`. Without that, the broker must explicitly add any intermediate certification authority certificate to its truststore to be able to accept the client certificate. Happens since: 1.x (#4894).
Consumer fixes¶
- Issues: #4796. Fix to allow migrating partitions to leaders with a NULL leader epoch. A NULL leader epoch can happen during a cluster roll with an upgrade to a version supporting KIP-320. Happening since v2.1.0 (#4901).
- Issues: #4804. Fix to allow migrating partitions to leaders with the same leader epoch. The same leader epoch can happen when a partition is temporarily migrated to the internal broker (#4804), or if the broker implementation never bumps it, as it's not needed to validate the offsets. Happening since v2.4.0 (#4901).
Note: there was no v2.7.0 librdkafka release
Checksums¶
Release asset checksums:
* v2.8.0.zip SHA256 5525efaad154e277e6ce30ab78bb00dbd882b5eeda6c69c9eeee69b7abee11a4
* v2.8.0.tar.gz SHA256 5bd1c46f63265f31c6bfcedcde78703f77d28238eadf23821c2b43fc30be3e25
2.2.1 (2025-01-13)¶
Note: given this patch version contains only a single fix, it's suggested to upgrade to latest backward compatible release instead, as it contains all the issued fixes. Following semver 2.0, all our patch and minor releases are backward compatible and our minor releases may also contain fixes. Please note that 2.x versions of librdkafka are also backward compatible with 1.x as the major version release was only for the upgrade to OpenSSL 3.x.
librdkafka v2.2.1 is a maintenance release backporting:
- Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438).
- Update bundled lz4 (used when `./configure --disable-lz4-ext`) to v1.9.4, which contains bugfixes and performance improvements (#4726).
- Upgrade OpenSSL to v3.0.13 (while building from source) with various security fixes; check the release notes (@janjwerner-confluent, #4690).
- Upgrade zstd to v1.5.6, zlib to v1.3.1, and curl to v8.8.0 (@janjwerner-confluent, #4690).
- Upgrade Linux dependencies: OpenSSL 3.0.15, CURL 8.10.1 (#4875).
Checksums¶
Release asset checksums:
* v2.2.1.zip SHA256 2d7fdb54b17be8442b61649916b94eda1744c21d2325795d92f9ad6dec4e5621
* v2.2.1.tar.gz SHA256 c6f0ccea730ce8f67333e75cc785cce28a8941d5abf041d7a9b8fef91d4778e8
2.6.1 (2024-11-18)¶
librdkafka v2.6.1 is a maintenance release:
- Fix for a Fetch regression when connecting to Apache Kafka < 2.7 (#4871).
- Fix for an infinite loop happening with cooperative-sticky assignor under some particular conditions (#4800).
- Fix for retrieving offset commit metadata when it contains zeros and librdkafka is configured with `strndup` (#4876).
- Fix for a loop of ListOffsets requests, happening in a Fetch From Follower scenario, if such a request is made to the follower (#4616, #4754, @kphelps).
- Fix to remove fetch queue messages that blocked the destroy of rdkafka instances (#4724).
- Upgrade Linux dependencies: OpenSSL 3.0.15, CURL 8.10.1 (#4875).
- Upgrade Windows dependencies: MSVC runtime to 14.40.338160.0, zstd 1.5.6, zlib 1.3.1, OpenSSL 3.3.2, CURL 8.10.1 (#4872).
- SASL/SCRAM authentication fix: avoid concatenating client side nonce once more, as it's already prepended in server sent nonce (#4895).
- Allow retrying for status code 429 ('Too Many Requests') in HTTP requests for OAUTHBEARER OIDC (#4902).
Fixes¶
General fixes¶
- SASL/SCRAM authentication fix: avoid concatenating the client-side nonce once more, as it's already prepended to the server-sent nonce. librdkafka was incorrectly concatenating the client-side nonce again, leading to a fix on the AK side, released with 3.8.1, using `endsWith` instead of `equals`. Happening since v0.0.99 (#4895).
Consumer fixes¶
- Issues: #4870. Fix for a Fetch regression when connecting to Apache Kafka < 2.7, causing fetches to fail. Happening since v2.6.0 (#4871).
- Issues: #4783. A consumer configured with the `cooperative-sticky` partition assignment strategy could get stuck in an infinite loop, with a corresponding spike of main thread CPU usage. That happened with some particular orders of members and potentially assignable partitions. Solved by removing the cause of the infinite loop. Happening since: 1.6.0 (#4800).
- Issues: #4649. When retrieving offset metadata, if the binary value contained zeros and librdkafka was configured with `strndup`, the part of the buffer after the first zero contained uninitialized data instead of the rest of the metadata. Solved by avoiding `strndup` for copying metadata. Happening since: 0.9.0 (#4876).
- Issues: #4616. When an out-of-range error on a follower caused an offset reset, the corresponding ListOffsets request was made to the follower, causing a repeated "Not leader for partition" error. Fixed by always sending the request to the leader. Happening since 1.5.0 (tested version) or earlier (#4616, #4754, @kphelps).
- Issues: Fix to remove fetch queue messages that blocked the destroy of rdkafka instances. Circular dependencies from a partition fetch queue message to the same partition blocked the destroy of an instance, which happened in case the partition was removed from the cluster while it was being consumed. Solved by purging the internal partition queue, after being stopped and removed, to allow the reference count to reach zero and trigger a destroy. Happening since 2.0.2 (#4724).
Checksums¶
Release asset checksums:
* v2.6.1.zip SHA256 b575811865d9c0439040ccb2972ae6af963bc58ca39d433243900dddfdda79cf
* v2.6.1.tar.gz SHA256 0ddf205ad8d36af0bc72a2fec20639ea02e1d583e353163bf7f4683d949e901b
2.6.0 (2024-10-10)¶
librdkafka v2.6.0 is a feature release:
- KIP-460 Admin Leader Election RPC (#4845)
- [KIP-714] Complete consumer metrics support (#4808).
- [KIP-714] Produce latency average and maximum metrics support for parity with Java client (#4847).
- [KIP-848] ListConsumerGroups Admin API now has an optional filter to return only groups of given types.
- Added Transactional id resource type for ACL operations (@JohnPreston, #4856).
- Fix for permanent fetch errors when using a newer Fetch RPC version with an older inter broker protocol (#4806).
Fixes¶
Consumer fixes¶
- Issues: #4806. Fix for permanent fetch errors when brokers support a Fetch RPC version greater than 12 but the cluster is configured to use an inter-broker protocol lower than 2.8. In this case returned topic ids are zero-valued and Fetch has to fall back to version 12, using topic names. Happening since v2.5.0 (#4806).
Checksums¶
Release asset checksums:
* v2.6.0.zip SHA256 e9eb7faedb24da3a19d5f056e08630fc2dae112d958f9b714ec6e35cd87c032e
* v2.6.0.tar.gz SHA256 abe0212ecd3e7ed3c4818a4f2baf7bf916e845e902bb15ae48834ca2d36ac745
2.5.3 (2024-09-02)¶
librdkafka v2.5.3 is a maintenance release.
- Fix an assert being triggered during the push telemetry call when no metrics matched on the client side (#4826).
Fixes¶
Telemetry fixes¶
- Issue: #4833 Fix a regression introduced with KIP-714 support in which an assert is triggered during PushTelemetry call. This happens when no metric is matched on the client side among those requested by broker subscription. Happening since 2.5.0 (#4826).
Checksums¶
Release asset checksums:
* v2.5.3.zip SHA256 5b058006fcd403bc23fc1fcc14fe985641203f342c5715794af51023bcd047f9
* v2.5.3.tar.gz SHA256 eaa1213fdddf9c43e28834d9a832d9dd732377d35121e42f875966305f52b8ff
Note: there were no v2.5.1 and v2.5.2 librdkafka releases
2.5.0 (2024-07-10)¶
[!WARNING] This version has introduced a regression in which an assert is triggered during the PushTelemetry call. This happens when no metric is matched on the client side among those requested by the broker subscription.
You won't face any problem if:
- The broker doesn't support KIP-714.
- The KIP-714 feature is disabled on the broker side.
- The KIP-714 feature is disabled on the client side (it is enabled by default; set the configuration `enable.metrics.push` to `false` to disable it).
- KIP-714 is enabled on the broker side but no subscription is configured there.
- KIP-714 is enabled on the broker side with subscriptions that match the KIP-714 metrics defined on the client.
Having said this, we strongly recommend using v2.5.3 and above to not face this regression at all.
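For reference, the client-side opt-out mentioned above is a single property (a sketch; `enable.metrics.push` defaults to `true`):

```ini
# Disable KIP-714 client metrics push to avoid the 2.5.0 assert regression.
enable.metrics.push=false
```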
librdkafka v2.5.0 is a feature release.
- KIP-951 Leader discovery optimisations for the client (#4756, #4767).
- Fix segfault when using a long client id, because of an erased segment when using flexver (#4689).
- Fix for an idempotent producer error, with a message batch not reconstructed identically when retried (#4750)
- Removed support for CentOS 6 and CentOS 7 (#4775).
- KIP-714 Client metrics and observability (#4721).
Upgrade considerations¶
- CentOS 6 and CentOS 7 support was removed as they reached EOL and security patches aren't publicly available anymore. ABI compatibility from CentOS 8 on is maintained through pypa/manylinux, AlmaLinux based. See also Confluent supported OSs page (#4775).
Enhancements¶
- Update bundled lz4 (used when `./configure --disable-lz4-ext`) to v1.9.4, which contains bugfixes and performance improvements (#4726).
- KIP-951: With this KIP, leader updates are received through Produce and Fetch responses in case of errors corresponding to leader changes, and a partition migration happens before refreshing the metadata cache (#4756, #4767).
Fixes¶
General fixes¶
- Issues: confluentinc/confluent-kafka-dotnet#2084 Fix segfault when a segment is erased and more data is written to the buffer. Happens since 1.x when a portion of the buffer (segment) is erased for flexver or compression. More likely to happen since 2.1.0, because of the upgrades to flexver, with certain string sizes like a long client id (#4689).
Idempotent producer fixes¶
- Issues: #4736 Fix for an idempotent producer error, with a message batch not reconstructed identically when retried. Caused the error message "Local: Inconsistent state: Unable to reconstruct MessageSet". Happening on large batches. Solved by using the same backoff baseline for all messages in the batch. Happens since 2.2.0 (#4750).
Checksums¶
Release asset checksums:
* v2.5.0.zip SHA256 644c1b7425e2241ee056cf8a469c84d69c7f6a88559491c0813a6cdeb5563206
* v2.5.0.tar.gz SHA256 3dc62de731fd516dfb1032861d9a580d4d0b5b0856beb0f185d06df8e6c26259
2.4.0 (2024-05-07)¶
librdkafka v2.4.0 is a feature release:
- KIP-848: The Next Generation of the Consumer Rebalance Protocol. Early Access: This should be used only for evaluation and must not be used in production. Features and contract of this KIP might change in the future (#4610).
- KIP-467: Augment ProduceResponse error messaging for specific culprit records (#4583).
- KIP-516: Continue partial implementation by adding a metadata cache by topic id and updating the topic id corresponding to the partition name (#4676).
- Upgrade OpenSSL to v3.0.12 (while building from source) with various security fixes, check the release notes.
- Integration tests can be started in KRaft mode and run against any GitHub Kafka branch other than the released versions.
- Fix pipeline inclusion of static binaries (#4666).
- Fix to main loop timeout calculation leading to a tight loop for a max period of 1 ms (#4671).
- Fixed a bug causing duplicate message consumption from a stale fetch start offset in some particular cases (#4636).
- Fix to metadata cache expiration on full metadata refresh (#4677).
- Fix for a wrong error returned on full metadata refresh before joining a consumer group (#4678).
- Fix to metadata refresh interruption (#4679).
- Fix for an undesired partition migration with stale leader epoch (#4680).
- Fix hang in cooperative consumer mode if an assignment is processed while closing the consumer (#4528).
Upgrade considerations¶
- With KIP-467, `INVALID_MSG` (Java: CorruptRecordException) will be retried automatically. `INVALID_RECORD` (Java: InvalidRecordException) instead is not retriable and will be set only on the records that caused the error. The rest of the records in the batch will fail with the new error code `_INVALID_DIFFERENT_RECORD` (Java: KafkaException) and can be retried manually, depending on the application logic (#4583).
Early Access¶
KIP-848: The Next Generation of the Consumer Rebalance Protocol¶
- With this new protocol the role of the Group Leader (a member) is removed and the assignment is calculated by the Group Coordinator (a broker) and sent to each member through heartbeats.
The feature is not yet production-ready, but it can be tried in a non-production environment.
A guide is available with considerations and steps to follow to test it (#4610).
Fixes¶
General fixes¶
- Issues: confluentinc/confluent-kafka-go#981. In the librdkafka release pipeline, a static build containing libsasl2 could be chosen instead of the alternative one without it. That caused the libsasl2 dependency to be required in confluent-kafka-go v2.1.0-linux-musl-arm64 and v2.3.0-linux-musl-arm64. Solved by correctly excluding the binary configured with that library when targeting a static build. Happening since v2.0.2, with the specified platforms, when using static binaries (#4666).
- Issues: #4684. When the main thread loop was awakened less than 1 ms before a timeout expired, it kept serving with a zero timeout, leading to increased CPU usage until the timeout was reached. Happening since 1.x.
- Issues: #4685. The metadata cache was cleared on full metadata refresh, leading to unnecessary refreshes and occasional `UNKNOWN_TOPIC_OR_PART` errors. Solved by updating the cache for existing or hinted entries instead of clearing them. Happening since 2.1.0 (#4677).
- Issues: #4589. A metadata call before the member joined the consumer group could lead to an `UNKNOWN_TOPIC_OR_PART` error. Solved by updating the consumer group following a metadata refresh only in safe states. Happening since 2.1.0 (#4678).
- Issues: #4577. Metadata refreshes without partition leader changes could lead to a loop of metadata calls at fixed intervals. Solved by stopping metadata refresh when all existing metadata is non-stale. Happening since 2.3.0 (#4679).
- Issues: #4687. A partition migration could happen, using stale metadata, when the partition was undergoing a validation and being retried because of an error. Solved by doing a partition migration only with a non-stale leader epoch. Happening since 2.1.0 (#4680).
Consumer fixes¶
- Issues: #4686. In case of a subscription change with a consumer using the cooperative assignor, it could resume fetching from a previous position. That could also happen when resuming a partition that wasn't paused. Fixed by ensuring that a resume operation is a complete no-op when the partition isn't paused. Happening since 1.x (#4636).
- Issues: #4527. While using the cooperative assignor, if an assignment is received while closing the consumer it's possible that the consumer gets stuck in state `WAIT_ASSIGN_CALL` while the call is converted to a full unassign. Solved by changing state from `WAIT_ASSIGN_CALL` to `WAIT_UNASSIGN_CALL` while doing this conversion. Happening since 1.x (#4528).
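The fix above can be sketched as a tiny state-transition model. This is an illustrative model only; the enum values mirror the state names mentioned in the changelog, not librdkafka's actual internal types, and `on_assignment_during_close` is a hypothetical helper:

```c
#include <assert.h>

/* Illustrative model: when an assignment arrives while the consumer
 * is closing, the assignment is converted to a full unassign, so the
 * wait state must be converted with it; otherwise the consumer would
 * wait forever for an assign call that never happens. */
typedef enum { WAIT_ASSIGN_CALL, WAIT_UNASSIGN_CALL } cgrp_wait_state_t;

static cgrp_wait_state_t on_assignment_during_close(cgrp_wait_state_t state,
                                                    int closing) {
    if (closing && state == WAIT_ASSIGN_CALL)
        return WAIT_UNASSIGN_CALL; /* converted to a full unassign */
    return state;
}
```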
Checksums¶
Release asset checksums:
* v2.4.0.zip SHA256 24b30d394fc6ce5535eaa3c356ed9ed9ae4a6c9b4fc9159c322a776786d5dd15
* v2.4.0.tar.gz SHA256 d645e47d961db47f1ead29652606a502bdd2a880c85c1e060e94eea040f1a19a
2.3.0 (2023-10-25)¶
librdkafka v2.3.0 is a feature release:
- KIP-516: Partial support of topic identifiers. Topic identifiers in the metadata response are available through the new `rd_kafka_DescribeTopics` function (#4300, #4451).
- KIP-117: Add support for AdminAPI `DescribeCluster()` and `DescribeTopics()` (#4240, @jainruchir).
- KIP-430: Return authorized operations in Describe Responses (#4240, @jainruchir).
- KIP-580: Added exponential backoff mechanism for retriable requests with `retry.backoff.ms` as the minimum backoff and `retry.backoff.max.ms` as the maximum backoff, with 20% jitter (#4422).
- KIP-396: Completed the implementation with the addition of ListOffsets (#4225).
- Fixed ListConsumerGroupOffsets not fetching offsets for all the topics in a group with Apache Kafka version below 2.4.0.
- Add a missing destroy that led to leaked partition structure memory when there are partition leader changes and a stale leader epoch is received (#4429).
- Fix a segmentation fault when closing a consumer using the cooperative-sticky assignor before the first assignment (#4381).
- Fix for insufficient buffer allocation when allocating rack information (@wolfchimneyrock, #4449).
- Fix for infinite loop of OffsetForLeaderEpoch requests on quick leader changes. (#4433).
- Fix to add leader epoch to control messages, to make sure they're stored for committing even without a subsequent fetch message (#4434).
- Fix for stored offsets not being committed if they lacked the leader epoch (#4442).
- Upgrade OpenSSL to v3.0.11 (while building from source) with various security fixes, check the release notes (#4454, started by @migarc1).
- Fix to ensure permanent errors during offset validation continue being retried and don't cause an offset reset (#4447).
- Fix to ensure max.poll.interval.ms is reset when rd_kafka_poll is called with consume_cb (#4431).
- Fix for idempotent producer fatal errors, triggered after a possibly persisted message state (#4438).
- Fix `rd_kafka_query_watermark_offsets` continuing beyond timeout expiry (#4460).
- Fix `rd_kafka_query_watermark_offsets` not refreshing the partition leader after a leader change and a subsequent `NOT_LEADER_OR_FOLLOWER` error (#4225).
Upgrade considerations¶
- `retry.backoff.ms`: if set greater than `retry.backoff.max.ms` (which has a default value of 1000 ms), it assumes the value of `retry.backoff.max.ms`. To change this behaviour make sure that `retry.backoff.ms` is always less than `retry.backoff.max.ms`. If they are equal, the backoff will be linear instead of exponential.
- `topic.metadata.refresh.fast.interval.ms`: if set greater than `retry.backoff.max.ms` (which has a default value of 1000 ms), it assumes the value of `retry.backoff.max.ms`. To change this behaviour make sure that `topic.metadata.refresh.fast.interval.ms` is always less than `retry.backoff.max.ms`. If they are equal, the backoff will be linear instead of exponential.
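The capping rule above can be sketched as follows. This is a simplified model of the documented behaviour, not librdkafka's implementation (the real KIP-580 backoff also applies 20% jitter, omitted here), and `effective_backoff_ms` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the capping rule: the backoff doubles per attempt up to
 * retry.backoff.max.ms, and a base larger than the cap is clamped to
 * the cap (making the backoff constant rather than exponential). */
static int64_t effective_backoff_ms(int64_t backoff_ms,
                                    int64_t backoff_max_ms,
                                    int attempt /* 0-based */) {
    if (backoff_ms > backoff_max_ms)
        backoff_ms = backoff_max_ms; /* clamp, as described above */
    int64_t b = backoff_ms;
    for (int i = 0; i < attempt; i++) {
        b *= 2;
        if (b >= backoff_max_ms)
            return backoff_max_ms;
    }
    return b;
}
```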
Fixes¶
General fixes¶
- An assertion failed with an insufficient buffer size when allocating rack information on 32-bit architectures. Solved by aligning all allocations to the maximum allowed word size (#4449).
- The timeout for `rd_kafka_query_watermark_offsets` was not enforced after making the necessary ListOffsets requests, and thus it never timed out in case of broker/network issues. Fixed by setting an absolute timeout (#4460).
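The absolute-timeout pattern behind this fix can be sketched like this (illustrative helpers under assumed names, not librdkafka internals): compute a deadline once up front, then derive the remaining budget for each internal request, so retries can never extend past the caller's timeout:

```c
#include <assert.h>
#include <stdint.h>

/* Convert the caller's relative timeout into an absolute deadline once. */
static int64_t deadline_abs(int64_t now_ms, int timeout_ms) {
    return now_ms + timeout_ms;
}

/* Remaining budget for the next internal request; 0 means the overall
 * call must fail with a timeout error instead of retrying further. */
static int remaining_ms(int64_t now_ms, int64_t deadline) {
    int64_t left = deadline - now_ms;
    return left > 0 ? (int)left : 0;
}
```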
Idempotent producer fixes¶
- After a possibly persisted error, such as a disconnection or a timeout, the next expected sequence used to increase, leading to a fatal error if the message wasn't persisted and the second one in the queue failed with an `OUT_OF_ORDER_SEQUENCE_NUMBER`. The error could contain the message "sequence desynchronization" with just one possibly persisted error, or "rewound sequence number" in case of multiple errored messages. Solved by treating the possibly persisted message as not persisted, and expecting a `DUPLICATE_SEQUENCE_NUMBER` error in case it was persisted or `NO_ERROR` in case it wasn't; in both cases the message will be considered delivered (#4438).
Consumer fixes¶
- Stored offsets were excluded from the commit if the leader epoch was less than the committed epoch, which is possible when the leader epoch is the default -1. This didn't happen in the Python, Go and .NET bindings when the stored position was taken from the message. Solved by checking only that the stored offset is greater than the committed one if either the stored or the committed leader epoch is -1 (#4442).
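A minimal model of the corrected comparison, assuming the semantics described above (`should_commit` is a hypothetical helper, not the actual librdkafka code):

```c
#include <assert.h>
#include <stdint.h>

/* If either leader epoch is unknown (-1), fall back to comparing
 * offsets only; otherwise require the stored epoch to be at least the
 * committed epoch before comparing offsets. */
static int should_commit(int64_t stored_off, int32_t stored_epoch,
                         int64_t committed_off, int32_t committed_epoch) {
    if (stored_epoch == -1 || committed_epoch == -1)
        return stored_off > committed_off;
    return stored_epoch > committed_epoch ||
           (stored_epoch == committed_epoch && stored_off > committed_off);
}
```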
- If an OffsetForLeaderEpoch request was being retried, and the leader changed while the retry was in-flight, an infinite loop of requests was triggered, because we weren't updating the leader epoch correctly. Fixed by updating the leader epoch before sending the request (#4433).
- During offset validation a permanent error like host resolution failure would cause an offset reset. This isn't what's expected or what the Java implementation does. Solved by retrying even in case of permanent errors (#4447).
- Using `rd_kafka_poll_set_consumer` along with a consume callback, and then calling `rd_kafka_poll` to service the callbacks, would not reset `max.poll.interval.ms`. This was because we were only checking `rk_rep` for consumer messages, while the method that services the queue internally also services the queue that `rk_rep` is forwarded to, which is `rkcg_q`. Solved by moving the `max.poll.interval.ms` check into `rd_kafka_q_serve` (#4431).
- After a leader change, a `rd_kafka_query_watermark_offsets` call would continue trying to call ListOffsets on the old leader if the topic wasn't included in the subscription set, so it started querying the new leader only after `topic.metadata.refresh.interval.ms` (#4225).
Checksums¶
Release asset checksums:
* v2.3.0.zip SHA256 15e77455811b3e5d869d6f97ce765b634c7583da188792e2930a2098728e932b
* v2.3.0.tar.gz SHA256 2d49c35c77eeb3d42fa61c43757fcbb6a206daa560247154e60642bcdcc14d12
2.2.0 (2023-07-12)¶
librdkafka v2.2.0 is a feature release:
- Fix a segmentation fault when subscribing to non-existent topics and using the consume batch functions (#4273).
- Store offset commit metadata in `rd_kafka_offsets_store` (@mathispesch, #4084).
- Fix a bug that happens when skipping tags, causing buffer underflow in MetadataResponse (#4278).
- Fix a bug where topic leader is not refreshed in the same metadata call even if the leader is present.
- KIP-881: Add support for rack-aware partition assignment for consumers (#4184, #4291, #4252).
- Fix several bugs with sticky assignor in case of partition ownership changing between members of the consumer group (#4252).
- KIP-368: Allow SASL Connections to Periodically Re-Authenticate (#4301, started by @vctoriawu).
- Avoid treating an OpenSSL error as a permanent error and treat unclean SSL closes as normal ones (#4294).
- Added `fetch.queue.backoff.ms` to the consumer to control how long the consumer backs off the next fetch attempt (@bitemyapp, @edenhill, #2879).
- KIP-235: Add DNS alias support for secured connections (#4292).
- KIP-339: IncrementalAlterConfigs API (started by @PrasanthV454, #4110).
- KIP-554: Add Broker-side SCRAM Config API (#4241).
Enhancements¶
- Added `fetch.queue.backoff.ms` to the consumer to control how long the consumer backs off the next fetch attempt. When the pre-fetch queue has exceeded its queuing thresholds, `queued.min.messages` and `queued.max.messages.kbytes`, it backs off for 1 second. If those parameters have to be set too high to hold 1 s of data, this new parameter allows backing off the fetch earlier, reducing memory requirements.
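The described behaviour can be sketched as a small decision function. This is an illustrative model only; the parameter names mirror the configuration properties above, and `next_fetch_backoff_ms` is a hypothetical helper:

```c
#include <assert.h>

/* Once either pre-fetch queue threshold is exceeded, the next fetch is
 * delayed by fetch.queue.backoff.ms (previously a fixed 1000 ms);
 * otherwise fetching continues immediately. */
static int next_fetch_backoff_ms(int queued_msgs, int queued_kbytes,
                                 int queued_min_messages,
                                 int queued_max_kbytes,
                                 int fetch_queue_backoff_ms) {
    if (queued_msgs >= queued_min_messages ||
        queued_kbytes >= queued_max_kbytes)
        return fetch_queue_backoff_ms; /* back off */
    return 0; /* fetch immediately */
}
```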
Fixes¶
General fixes¶
- Fix a bug that happens when skipping tags, causing buffer underflow in MetadataResponse. This is triggered since RPC version 9 (v2.1.0), when using Confluent Platform, only when racks are set, observers are activated and there is more than one partition. Fixed by skipping the correct amount of bytes when tags are received.
- Avoid treating an OpenSSL error as a permanent error and treat unclean SSL closes as normal ones. When SSL connections are closed without `close_notify`, OpenSSL 3.x sets a new type of error that was interpreted as permanent in librdkafka. It can cause different issues depending on the RPC: if received while waiting for an OffsetForLeaderEpoch response, it triggers an offset reset following the configured policy. Solved by treating SSL errors as transport errors and by setting an OpenSSL flag that allows treating unclean SSL closes as normal ones. These types of errors can happen if the other side doesn't support `close_notify` or if there's a TCP connection reset.
Consumer fixes¶
- In case of multiple owners of a partition with different generations, the sticky assignor would pick the earliest (lowest generation) member as the current owner, which would lead to stickiness violations. Fixed by choosing the latest (highest generation) member.
- In case where the same partition is owned by two members with the same generation, it indicates an issue. The sticky assignor had some code to handle this, but it was non-functional, and did not have parity with the Java assignor. Fixed by invalidating any such partition from the current assignment completely.
Checksums¶
Release asset checksums:
* v2.2.0.zip SHA256 e9a99476dd326089ce986afd3a5b069ef8b93dbb845bc5157b3d94894de53567
* v2.2.0.tar.gz SHA256 af9a820cbecbc64115629471df7c7cecd40403b6c34bfdbb9223152677a47226
2.1.1 (2023-05-02)¶
librdkafka v2.1.1 is a maintenance release:
- Avoid duplicate messages when a fetch response is received in the middle of an offset validation request (#4261).
- Fix segmentation fault when subscribing to a non-existent topic and calling `rd_kafka_message_leader_epoch()` on the polled `rkmessage` (#4245).
- Fix a segmentation fault when fetching from follower and the partition lease expires while waiting for the result of a list offsets operation (#4254).
- Fix documentation for the admin request timeout, incorrectly stating -1 for infinite timeout. That timeout can't be infinite.
- Fix CMake pkg-config cURL require and use the pkg-config `Requires.private` field (@FantasqueX, @stertingen, #4180).
- Fixes certain cases where polling would not keep the consumer in the group or make it rejoin it (#4256).
- Fix to the C++ set_leader_epoch method of TopicPartitionImpl, that wasn't storing the passed value (@pavel-pimenov, #4267).
Fixes¶
Consumer fixes¶
- Duplicate messages can be emitted when a fetch response is received in the middle of an offset validation request. Solved by avoiding a restart from last application offset when offset validation succeeds.
- When fetching from a follower, if the partition lease expired after 5 minutes and a list offsets operation was requested to retrieve the earliest or latest offset, it resulted in a segmentation fault. This was fixed by allowing threads other than the main one to call the `rd_kafka_toppar_set_fetch_state` function, given they hold the lock on the `rktp`.
- In v2.1.0, a bug was fixed which caused polling any queue to reset `max.poll.interval.ms`. Only certain functions were made to reset the timer, but it is possible for the user to obtain the queue with messages from the broker, skipping these functions. This was fixed by encoding in the queue itself whether polling it should reset the timer.
Checksums¶
Release asset checksums:
* v2.1.1.zip SHA256 3b8a59f71e22a8070e0ae7a6b7ad7e90d39da8fddc41ce6c5d596ee7f5a4be4b
* v2.1.1.tar.gz SHA256 7be1fc37ab10ebdc037d5c5a9b35b48931edafffae054b488faaff99e60e0108
2.1.0 (2023-04-06)¶
librdkafka v2.1.0 is a feature release:
- KIP-320 Allow fetchers to detect and handle log truncation (#4122).
- Fix a reference count issue blocking the consumer from closing (#4187).
- Fix a protocol issue with ListGroups API, where an extra field was appended for API Versions greater than or equal to 3 (#4207).
- Fix an issue with `max.poll.interval.ms`, where polling any queue would cause the timeout to be reset (#4176).
- Fix seek partition timeout: it was one thousand times lower than the passed value (#4230).
- Fix multiple inconsistent behaviour in batch APIs during pause or resume operations (#4208). See Consumer fixes section below for more information.
- Update lz4.c from upstream. Fixes CVE-2021-3520 (by @filimonov, #4232).
- Upgrade OpenSSL to v3.0.8 with various security fixes, check the release notes (#4215).
Enhancements¶
- Added `rd_kafka_topic_partition_get_leader_epoch()` (and `set..()`).
- Added partition leader epoch APIs:
  - `rd_kafka_topic_partition_get_leader_epoch()` (and `set..()`)
  - `rd_kafka_message_leader_epoch()`
  - `rd_kafka_*assign()` and `rd_kafka_seek_partitions()` now support partitions with a leader epoch set.
  - `rd_kafka_offsets_for_times()` will return per-partition leader epochs.
  - `leader_epoch`, `stored_leader_epoch`, and `committed_leader_epoch` added to per-partition statistics.
Fixes¶
OpenSSL fixes¶
- Fixed OpenSSL static build not able to use external modules like FIPS provider module.
Consumer fixes¶
- A reference count issue was blocking the consumer from closing. The problem would happen when a partition was lost, because it was forcibly unassigned from the consumer or the corresponding topic was deleted.
- When using `rd_kafka_seek_partitions`, the remaining timeout was converted from microseconds to milliseconds, but the expected unit for that parameter is microseconds.
- Fixed known issues related to the Batch Consume APIs mentioned in the v2.0.0 release notes.
- Fixed `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` intermittently updating `app_offset` and `store_offset` incorrectly when pause or resume was being used for a partition.
- Fixed `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` intermittently skipping offsets when pause or resume was being used for a partition.
Known Issues¶
Consume Batch API¶
- When the `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` APIs are used with any of the seek, pause, resume or rebalancing operations, `on_consume` interceptors might be called incorrectly (possibly multiple times) for messages that were not consumed.
Consume API¶
- Duplicate messages can be emitted when a fetch response is received in the middle of an offset validation request.
- Segmentation fault when subscribing to a non-existent topic and calling `rd_kafka_message_leader_epoch()` on the polled `rkmessage`.
Checksums¶
Release asset checksums:
* v2.1.0.zip SHA256 2fe898f9f5e2b287d26c5f929c600e2772403a594a691e0560a2a1f2706edf57
* v2.1.0.tar.gz SHA256 d8e76c4b1cde99e283a19868feaaff5778aa5c6f35790036c5ef44bc5b5187aa
2.0.2 (2023-01-20)¶
librdkafka v2.0.2 is a bugfix release:
- Fix OpenSSL version in Win32 nuget package (#4152).
Checksums¶
Release asset checksums:
* v2.0.2.zip SHA256 87010c722111539dc3c258a6be0c03b2d6d4a607168b65992eb0076c647e4e9d
* v2.0.2.tar.gz SHA256 f321bcb1e015a34114c83cf1aa7b99ee260236aab096b85c003170c90a47ca9d
2.0.1 (2023-01-19)¶
librdkafka v2.0.1 is a bugfix release:
- Fixed nuget package for Linux ARM64 release (#4150).
Checksums¶
Release asset checksums:
* v2.0.1.zip SHA256 7121df3fad1f72ea1c42dcc4e5367337207a75966216c63e58222c6433c528e0
* v2.0.1.tar.gz SHA256 3670f8d522e77f79f9d09a22387297ab58d1156b22de12ef96e58b7d57fca139
2.0.0 (2023-01-18)¶
librdkafka v2.0.0 is a feature release:
- KIP-88 OffsetFetch Protocol Update (#3995).
- KIP-222 Add Consumer Group operations to Admin API (started by @lesterfan, #3995).
- KIP-518 Allow listing consumer groups per state (#3995).
- KIP-396 Partially implemented: support for AlterConsumerGroupOffsets (started by @lesterfan, #3995).
- OpenSSL 3.0.x support - the maximum bundled OpenSSL version is now 3.0.7 (previously 1.1.1q).
- Fixes to the transactional and idempotent producer.
Upgrade considerations¶
OpenSSL 3.0.x¶
OpenSSL default ciphers¶
The introduction of OpenSSL 3.0.x in the self-contained librdkafka bundles changes the default set of available ciphers, in particular all obsolete or insecure ciphers and algorithms as listed in the OpenSSL legacy manual page are now disabled by default.
WARNING: These ciphers are disabled for security reasons and it is highly recommended NOT to use them.
Should you need to use any of these old ciphers you'll need to explicitly
enable the legacy provider by configuring ssl.providers=default,legacy
on the librdkafka client.
OpenSSL engines and providers¶
OpenSSL 3.0.x deprecates the use of engines, which is being replaced by
providers. As such librdkafka will emit a deprecation warning if
ssl.engine.location is configured.
OpenSSL providers may be configured with the new ssl.providers
configuration property.
Broker TLS certificate hostname verification¶
The default value for ssl.endpoint.identification.algorithm has been
changed from none (no hostname verification) to https, which enables
broker hostname verification (to counter man-in-the-middle
impersonation attacks) by default.
To restore the previous behaviour, set ssl.endpoint.identification.algorithm to none.
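For example, the previous behaviour is restored with this client configuration property (value taken from the note above):

```
# Disable broker TLS hostname verification (pre-2.0 behaviour)
ssl.endpoint.identification.algorithm=none
```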
Known Issues¶
Poor Consumer batch API messaging guarantees¶
The Consumer Batch APIs `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()`
are not thread safe if `rkmessages_size` is greater than 1 and any of the seek,
pause, resume or rebalancing operations is performed in parallel with any of
the above APIs. Some of the messages might be lost, or erroneously returned to the
application, in the above scenario.
It is strongly recommended to use the Consumer Batch APIs and the mentioned operations in sequential order to get consistent results.
For the rebalancing operation to work in a sequential manner, please set the `rebalance_cb`
configuration property (refer to `examples/rdkafka_complex_consumer_example.c` for help with its usage) for the consumer.
Enhancements¶
- Self-contained static libraries can now be built on Linux arm64 (#4005).
- Updated to zlib 1.2.13, zstd 1.5.2, and curl 7.86.0 in self-contained librdkafka bundles.
- Added `on_broker_state_change()` interceptor.
- The C++ API no longer returns strings by const value, which enables better move optimization in callers.
- Added `rd_kafka_sasl_set_credentials()` API to update SASL credentials.
- Setting `allow.auto.create.topics` will no longer give a warning if used by a producer, since that is an expected use case. Improved documentation for this property.
- Added a `resolve_cb` configuration setting that permits using custom DNS resolution logic.
- Added `rd_kafka_mock_broker_error_stack_cnt()`.
- The librdkafka.redist NuGet package has been updated to have fewer external dependencies for its bundled librdkafka builds, as everything but cyrus-sasl is now built-in. There are bundled builds with and without linking to cyrus-sasl for maximum compatibility.
- Admin API DescribeGroups() now provides the group instance id for static members KIP-345 (#3995).
Fixes¶
General fixes¶
- Windows: couldn't read a PKCS#12 keystore correctly because binary mode wasn't explicitly set and Windows defaults to text mode.
- Fixed memory leak when loading SSL certificates (@Mekk, #3930)
- Load all CA certificates from `ssl.ca.pem`, not just the first one.
- Each HTTP request made when using OAUTHBEARER OIDC would leak a small amount of memory.
Transactional producer fixes¶
- When a PID epoch bump is requested and the producer is waiting to reconnect to the transaction coordinator, a failure in a find coordinator request could cause an assert to fail. This is fixed by retrying when the coordinator is known (#4020).
- Transactional APIs (except `send_offsets_for_transaction()`) that time out due to a low `timeout_ms` may now be resumed by calling the same API again, as the operation continues in the background.
- For fatal idempotent producer errors that may be recovered by bumping the epoch, the current transaction must first be aborted prior to the epoch bump. This is now handled correctly, which fixes issues seen with fenced transactional producers on fatal idempotency errors.
- Timeouts for EndTxn requests (transaction commits and aborts) are now automatically retried and the error raised to the application is also a retriable error.
- TxnOffsetCommitRequests were retried immediately upon temporary errors in `send_offsets_to_transactions()`, causing excessive network requests. These retries are now delayed 500 ms.
- If `init_transactions()` is called with an infinite timeout (-1), the timeout will be limited to 2 * `transaction.timeout.ms`. The application may retry and resume the call if a retriable error is returned.
Consumer fixes¶
- Back-off and retry JoinGroup request if coordinator load is in progress.
- Fix `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` skipping other partitions' offsets intermittently when seek, pause, resume or rebalancing is used for a partition.
- Fix `rd_kafka_consume_batch()` and `rd_kafka_consume_batch_queue()` intermittently returning incorrect partitions' messages if rebalancing happens during these operations.
Checksums¶
Release asset checksums:
* v2.0.0.zip SHA256 9d8a8be30ed09daf6c560f402e91db22fcaea11cac18a0d3c0afdbf884df1d4e
* v2.0.0.tar.gz SHA256 f75de3545b3c6cc027306e2df0371aefe1bb8f86d4ec612ed4ebf7bfb2f817cd
1.9.2 (2022-08-01)¶
librdkafka v1.9.2 is a maintenance release:
- The SASL OAUTHBEARER OIDC POST field was sometimes truncated by one byte (#3192).
- The bundled version of OpenSSL has been upgraded to version 1.1.1q for non-Windows builds. Windows builds remain on OpenSSL 1.1.1n for the time being.
- The bundled version of Curl has been upgraded to version 7.84.0.
Checksums¶
Release asset checksums:
* v1.9.2.zip SHA256 4ecb0a3103022a7cab308e9fecd88237150901fa29980c99344218a84f497b86
* v1.9.2.tar.gz SHA256 3fba157a9f80a0889c982acdd44608be8a46142270a389008b22d921be1198ad
1.9.1 (2022-07-06)¶
librdkafka v1.9.1¶
librdkafka v1.9.1 is a maintenance release:
- The librdkafka.redist NuGet package now contains OSX M1/arm64 builds.
- Self-contained static libraries can now be built on OSX M1 too, thanks to disabling curl's configure runtime check.
Checksums¶
Release asset checksums:
* v1.9.1.zip SHA256 d3fc2e0bc00c3df2c37c5389c206912842cca3f97dd91a7a97bc0f4fc69f94ce
* v1.9.1.tar.gz SHA256 3a54cf375218977b7af4716ed9738378e37fe400a6c5ddb9d622354ca31fdc79
1.9.0 (2022-06-16)¶
librdkafka v1.9.0¶
librdkafka v1.9.0 is a feature release:
- Added KIP-768 OAUTHBEARER OIDC support (by @jliunyu, #3560)
- Added KIP-140 Admin API ACL support (by @emasab, #2676)
Upgrade considerations¶
- Consumer: `rd_kafka_offsets_store()` (et al.) will now return an error for any partition that is not currently assigned (through `rd_kafka_*assign()`). This prevents a race condition where an application would store offsets after the assigned partitions had been revoked (which resets the stored offset), which could cause these old stored offsets to be committed later when the same partitions were assigned to this consumer again, effectively overwriting any offsets committed by consumers that were assigned the same partitions previously. This would typically result in the offsets rewinding and messages being reprocessed. As an extra effort to avoid this situation, the stored offset is now also reset when partitions are assigned (through `rd_kafka_*assign()`). Applications that explicitly call `..offset*_store()` will now need to handle the case where `RD_KAFKA_RESP_ERR__STATE` is returned in the per-partition `.err` field, meaning the partition is no longer assigned to this consumer and the offset could not be stored for commit.
Enhancements¶
- Improved producer queue scheduling. Fixes the performance regression introduced in v1.7.0 for some produce patterns. (#3538, #2912)
- Windows: Added native Win32 IO/Queue scheduling. This removes the internal TCP loopback connections that were previously used for timely queue wakeups.
- Added `socket.connection.setup.timeout.ms` (default 30s). The maximum time allowed for broker connection setups (TCP connection as well as SSL and SASL handshakes) is now limited to this value. This fixes the issue with stalled broker connections in the case of network or load balancer problems. The Java clients have an exponential backoff to this timeout which is limited by `socket.connection.setup.timeout.max.ms`; this was not implemented in librdkafka due to differences in connection handling and `ERR__ALL_BROKERS_DOWN` error reporting. Having a lower initial connection setup timeout and then increasing the timeout for the next attempt could yield a possibly false-positive `ERR__ALL_BROKERS_DOWN` too early.
- SASL OAUTHBEARER refresh callbacks can now be scheduled for execution on librdkafka's background thread. This solves the problem where an application has a custom SASL OAUTHBEARER refresh callback and thus needs to call `rd_kafka_poll()` (et al.) at least once to trigger the refresh callback before being able to connect to brokers. With the new `rd_kafka_conf_enable_sasl_queue()` configuration API and `rd_kafka_sasl_background_callbacks_enable()` the refresh callbacks can now be triggered automatically on the librdkafka background thread.
- `rd_kafka_queue_get_background()` now creates the background thread if not already created.
- Added `rd_kafka_consumer_close_queue()` and `rd_kafka_consumer_closed()`. This allows applications and language bindings to implement asynchronous consumer close.
- Bundled zlib upgraded to version 1.2.12.
- Bundled OpenSSL upgraded to 1.1.1n.
- Added `test.mock.broker.rtt` to simulate RTT/latency for mock brokers.
Fixes¶
General fixes¶
- Fix various 1 second delays due to internal broker threads blocking on IO even though there are events to handle. These delays could be seen randomly in any of the non produce/consume request APIs, such as `commit_transaction()`, `list_groups()`, etc.
- Windows: some applications would crash with an error message like `no OPENSSL_Applink()` written to the console if `ssl.keystore.location` was configured. This regression was introduced in v1.8.0 due to use of vcpkgs and how the keystore file was read (#3554).
- Windows 32-bit only: 64-bit atomic reads were in fact not atomic and could in rare circumstances yield incorrect values. One manifestation of this issue was the `max.poll.interval.ms` consumer timer expiring even though the application was polling according to profile. Fixed by @WhiteWind (#3815).
- `rd_kafka_clusterid()` would previously fail with a timeout if called on a cluster with no visible topics (#3620). The clusterid is now returned as soon as metadata has been retrieved.
- Fix hang in `rd_kafka_list_groups()` if there are no available brokers to connect to (#3705).
- Millisecond timeouts (`timeout_ms`) in various APIs, such as `rd_kafka_poll()`, were limited to roughly 36 hours before wrapping (#3034).
- If a metadata request triggered by `rd_kafka_metadata()` or consumer group rebalancing encountered a non-retriable error, it would not be propagated to the caller and thus caused a stall or timeout; this has now been fixed (@aiquestion, #3625).
- AdminAPI `DeleteGroups()` and `DeleteConsumerGroupOffsets()`: if the given coordinator connection was not up by the time these calls were initiated and the first connection attempt failed, then no further connection attempts were performed, ultimately leading to the calls timing out. This is now fixed by retrying to connect to the group coordinator until the connection is successful or the call times out. Additionally, the coordinator will now be re-queried once per second until the coordinator comes up or the call times out, to detect a change in coordinators.
- Mock cluster `rd_kafka_mock_broker_set_down()` would previously accept and then disconnect new connections; it now refuses new connections.
Consumer fixes¶
- `rd_kafka_offsets_store()` (et al.) will now return an error for any partition that is not currently assigned (through `rd_kafka_*assign()`). See Upgrade considerations above for more information.
- `rd_kafka_*assign()` will now reset/clear the stored offset. See Upgrade considerations above for more information.
- `seek()` followed by `pause()` would overwrite the seeked offset when later calling `resume()`. This is now fixed (#3471). Note: avoid storing offsets (`offsets_store()`) after calling `seek()` as this may later interfere with resuming a paused partition; instead store offsets prior to calling seek.
- An `ERR_MSG_SIZE_TOO_LARGE` consumer error would previously be raised if the consumer received a maximum sized FetchResponse only containing (transaction) aborted messages with no control messages. The fetching did not stop, but some applications would terminate upon receiving this error. No error is now raised in this case (#2993). Thanks to @jacobmikesell for providing an application to reproduce the issue.
- The consumer no longer backs off the next fetch request (default 500 ms) when the parsed fetch response is truncated (which is a valid case). This should speed up the message fetch rate in case of maximum sized fetch responses.
- Fix consumer crash (`assert: rkbuf->rkbuf_rkb`) when parsing malformed JoinGroupResponse consumer group metadata state.
- Fix crash (`cant handle op type`) when using `consume_batch_queue()` (et al.) and an OAUTHBEARER refresh callback was set. The callback is now triggered by the consume call (#3263).
- Fix `partition.assignment.strategy` ordering when multiple strategies are configured. If there is more than one eligible strategy, preference is determined by the configured order of strategies. The partitions are now assigned to group members according to the strategy order preference (#3818).
- Any form of `unassign*()` (absolute or incremental) is now allowed during consumer close rebalancing and they're all treated as absolute unassigns (@kevinconaway).
Transactional producer fixes¶
- Fix message loss in the idempotent/transactional producer. A corner case was identified that could cause idempotent/transactional messages to be lost despite being reported as successfully delivered: during cluster instability, a restarting broker may report existing topics as non-existent for some time before it is able to acquire up-to-date cluster and topic metadata. If an idempotent/transactional producer updates its topic metadata cache from such a broker, the producer will consider the topic to be removed from the cluster and thus remove its local partition objects for the given topic. This also removes the internal message sequence number counters for the given partitions. If the producer later receives proper topic metadata for the cluster, the previously "removed" topics will be rediscovered and new partition objects will be created in the producer. These new partition objects, with no knowledge of previous incarnations, would start counting partition messages at zero again. If new messages were produced for these partitions by the same producer instance, the same message sequence numbers would be sent to the broker. If the broker still maintains state for the producer's PID and Epoch, it could deem that these messages with reused sequence numbers had already been written to the log and treat them as duplicates. To the producer it would appear that these new messages were successfully written to the partition log, when they were in fact discarded as duplicates, leading to silent message loss. The fix included in this release is to save the per-partition idempotency state when a partition is removed, and then recover and use that saved state if the partition comes back at a later time.
- The transactional producer would retry (re)initializing its PID if a `PRODUCER_FENCED` error was returned from the broker (added in Apache Kafka 2.8), which could cause the producer to seemingly hang. This error code is now correctly handled by raising a fatal error.
- If the given group coordinator connection was not up by the time `send_offsets_to_transaction()` was called, and the first connection attempt failed, then no further connection attempts were performed, ultimately leading to `send_offsets_to_transaction()` timing out, and possibly also the transaction timing out on the transaction coordinator. This is now fixed by retrying the group coordinator connection until it succeeds or the call times out. Additionally, the coordinator is now re-queried once per second until the coordinator comes up or the call times out, to detect coordinator changes.
Producer fixes¶
- Improved producer queue wakeup scheduling. This should significantly decrease the number of wakeups and thus syscalls for high message rate producers. (#3538, #2912)
- The logic for enforcing that `message.timeout.ms` is greater than an explicitly configured `linger.ms` was incorrect: instead of erroring out early, the linger time was automatically adjusted to the message timeout, ignoring the configured `linger.ms`. This has now been fixed so that an error is returned when instantiating the producer. Thanks to @larry-cdn77 for analysis and test cases. (#3709)
Checksums¶
Release asset checksums:
* v1.9.0.zip SHA256 a2d124cfb2937ec5efc8f85123dbcfeba177fb778762da506bfc5a9665ed9e57
* v1.9.0.tar.gz SHA256 59b6088b69ca6cf278c3f9de5cd6b7f3fd604212cd1c59870bc531c54147e889
1.6.2 (2021-11-25)¶
librdkafka v1.6.2¶
librdkafka v1.6.2 is a maintenance release with the following backported fixes:
- Upon quick repeated leader changes the transactional producer could receive an `OUT_OF_ORDER_SEQUENCE` error from the broker, which triggered an Epoch bump on the producer resulting in an InitProducerIdRequest being sent to the transaction coordinator in the middle of a transaction. This request would start a new transaction on the coordinator, but the producer would still erroneously think it was in the current transaction. Any messages produced in the current transaction prior to this event would be silently lost when the application committed the transaction, leading to message loss. To avoid message loss a fatal error is now raised. This fix is specific to v1.6.x; librdkafka v1.8.x implements a recoverable error state instead. (#3575)
- The transactional producer could stall during a transaction if the transaction coordinator changed while adding offsets to the transaction (`send_offsets_to_transaction()`). This stall lasted until the coordinator connection went down, the transaction timed out, the transaction was aborted, or messages were produced to a new partition, whichever came first. (#3571)
- librdkafka's internal timers would not start if the timeout was set to 0, which would result in some timeout operations not being enforced correctly, e.g., the transactional producer API timeouts. These timers are now started with a timeout of 1 microsecond.
- Force address resolution if the broker epoch changes (#3238).
Checksums¶
Release asset checksums:
* v1.6.2.zip SHA256 1d389a98bda374483a7b08ff5ff39708f5a923e5add88b80b71b078cb2d0c92e
* v1.6.2.tar.gz SHA256 b9be26c632265a7db2fdd5ab439f2583d14be08ab44dc2e33138323af60c39db
1.8.2 (2021-10-18)¶
librdkafka v1.8.2¶
librdkafka v1.8.2 is a maintenance release.
Enhancements¶
- Added `ssl.ca.pem` to add a CA certificate by PEM string. (#2380)
- Prebuilt binaries for Mac OSX now contain statically linked OpenSSL v1.1.1l. Previously the OpenSSL version was either v1.1.1 or v1.0.2 depending on build type.
Fixes¶
- The `librdkafka.redist` 1.8.0 package had two flaws:
  - the linux-arm64 .so build was a linux-x64 build.
  - the included Windows MSVC 140 runtimes for x64 were in fact x86. The release script has been updated to verify the architectures of provided artifacts to avoid this happening in the future.
- Prebuilt binaries for Mac OSX Sierra (10.12) and older are no longer provided. This affects confluent-kafka-go.
- Some of the prebuilt binaries for Linux were built on Ubuntu 14.04, these builds are now performed on Ubuntu 16.04 instead. This may affect users on ancient Linux distributions.
- It was not possible to configure `ssl.ca.location` on OSX; the property would automatically revert back to `probe` (the default value). This regression was introduced in v1.8.0. (#3566)
- librdkafka's internal timers would not start if the timeout was set to 0, which would result in some timeout operations not being enforced correctly, e.g., the transactional producer API timeouts. These timers are now started with a timeout of 1 microsecond.
Transactional producer fixes¶
- Upon quick repeated leader changes the transactional producer could receive an `OUT_OF_ORDER_SEQUENCE` error from the broker, which triggered an Epoch bump on the producer resulting in an InitProducerIdRequest being sent to the transaction coordinator in the middle of a transaction. This request would start a new transaction on the coordinator, but the producer would still erroneously think it was in the current transaction. Any messages produced in the current transaction prior to this event would be silently lost when the application committed the transaction, leading to message loss. This has been fixed by setting the Abortable transaction error state in the producer. (#3575)
- The transactional producer could stall during a transaction if the transaction coordinator changed while adding offsets to the transaction (`send_offsets_to_transaction()`). This stall lasted until the coordinator connection went down, the transaction timed out, the transaction was aborted, or messages were produced to a new partition, whichever came first. (#3571)
Checksums¶
Release asset checksums:
* v1.8.2.zip SHA256 8b03d8b650f102f3a6a6cff6eedc29b9e2f68df9ba7e3c0f3fb00838cce794b8
* v1.8.2.tar.gz SHA256 6a747d293a7a4613bd2897e28e8791476fbe1ae7361f2530a876e0fd483482a6
Note: there was no v1.8.1 librdkafka release
1.8.0 (2021-09-16)¶
librdkafka v1.8.0¶
librdkafka v1.8.0 is a security release:
- Upgrade bundled zlib version from 1.2.8 to 1.2.11 in the `librdkafka.redist` NuGet package. The updated zlib version fixes CVEs: CVE-2016-9840, CVE-2016-9841, CVE-2016-9842, CVE-2016-9843. See https://github.com/edenhill/librdkafka/issues/2934 for more information.
- librdkafka now uses vcpkg for up-to-date Windows dependencies in the `librdkafka.redist` NuGet package: OpenSSL 1.1.1l, zlib 1.2.11, zstd 1.5.0.
- The upstream dependency (OpenSSL, zstd, zlib) source archive checksums are now verified when building with `./configure --install-deps`. These builds are used by the librdkafka builds bundled with confluent-kafka-go, confluent-kafka-python and confluent-kafka-dotnet.
Enhancements¶
- Producer `flush()` now overrides the `linger.ms` setting for the duration of the `flush()` call, effectively triggering immediate transmission of queued messages. (#3489)
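The `flush()` behaviour above can be sketched as follows. This is a minimal, hedged illustration assuming an existing producer handle `rk`; the 10-second timeout is an arbitrary choice for the example:

```c
/* Sketch: rd_kafka_flush() (from v1.8.0) temporarily overrides
 * linger.ms, so queued messages are transmitted immediately rather
 * than waiting out the linger time. `rk` is a producer handle. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

static void flush_before_shutdown(rd_kafka_t *rk) {
        /* Wait up to 10s for queued and in-flight messages to be
         * delivered; linger.ms is ignored for the duration of the call. */
        rd_kafka_resp_err_t err = rd_kafka_flush(rk, 10 * 1000);
        if (err == RD_KAFKA_RESP_ERR__TIMED_OUT)
                fprintf(stderr, "%d message(s) were not delivered\n",
                        rd_kafka_outq_len(rk));
}
```

Before this release, a short `flush()` timeout combined with a long `linger.ms` could return before anything was transmitted.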
Fixes¶
General fixes¶
- Correctly detect presence of zlib via compilation check. (Chris Novakovic)
- `ERR__ALL_BROKERS_DOWN` is no longer emitted when the coordinator connection goes down, only when all standard named brokers have been tried. This fixes the issue with `ERR__ALL_BROKERS_DOWN` being triggered on `consumer_close()`. It is also now only emitted if the connection was fully up (past handshake), and not just connected.
- `rd_kafka_query_watermark_offsets()`, `rd_kafka_offsets_for_times()`, the `consumer_lag` metric, and `auto.offset.reset` now honour `isolation.level` and will return the Last Stable Offset (LSO) when `isolation.level` is set to `read_committed` (default), rather than the uncommitted high-watermark when it is set to `read_uncommitted`. (#3423)
- SASL GSSAPI is now usable when `sasl.kerberos.min.time.before.relogin` is set to 0, which disables ticket refreshes (by @mpekalski, #3431).
- Rename internal `crc32c()` symbol to `rd_crc32c()` to avoid conflict with other static libraries (#3421).
- `txidle` and `rxidle` in the statistics object were emitted as 18446744073709551615 when no idle was known. -1 is now emitted instead. (#3519)
Consumer fixes¶
- Automatically retry offset commits on `ERR_REQUEST_TIMED_OUT`, `ERR_COORDINATOR_NOT_AVAILABLE`, and `ERR_NOT_COORDINATOR` (#3398). Offset commits will be retried twice.
- Timed auto commits did not work when only using `assign()` and not `subscribe()`. This regression was introduced in v1.7.0.
- If the topics matching the current subscription changed (or the application updated the subscription) while there was an outstanding JoinGroup or SyncGroup request, an additional request would sometimes be sent before handling the response of the first. This in turn led to internal state issues that could cause a crash or misbehaviour. The consumer will now wait for any outstanding JoinGroup or SyncGroup responses before re-joining the group.
- `auto.offset.reset` could previously be triggered by temporary errors, such as disconnects and timeouts (after the two retries were exhausted). This is now fixed so that the auto offset reset policy is only triggered for permanent errors.
- The error that triggers `auto.offset.reset` is now logged to help the application owner identify the reason for the reset.
- If a rebalance takes longer than a consumer's `session.timeout.ms`, the consumer will remain in the group as long as it receives heartbeat responses from the broker.
Admin fixes¶
- `DeleteRecords()` could crash if one of the underlying requests (for a given partition leader) failed at the transport level (e.g., timeout). (#3476)
Checksums¶
Release asset checksums:
* v1.8.0.zip SHA256 4b173f759ea5fdbc849fdad00d3a836b973f76cbd3aa8333290f0398fd07a1c4
* v1.8.0.tar.gz SHA256 93b12f554fa1c8393ce49ab52812a5f63e264d9af6a50fd6e6c318c481838b7f
1.7.0 (2021-05-10)¶
librdkafka v1.7.0¶
librdkafka v1.7.0 is a feature release:
- KIP-360 - Improve reliability of transactional producer. Requires Apache Kafka 2.5 or later.
- OpenSSL Engine support (`ssl.engine.location`) by @adinigam and @ajbarb.
Enhancements¶
- Added `connections.max.idle.ms` to automatically close idle broker connections. This feature is disabled by default unless `bootstrap.servers` contains the string `azure`, in which case the default is set to <4 minutes to improve connection reliability and circumvent limitations with the Azure load balancers (see #3109 for more information).
- Bumped to OpenSSL 1.1.1k in binary librdkafka artifacts.
- The binary librdkafka artifacts for Alpine are now using Alpine 3.12.
- Improved static librdkafka Windows builds using MinGW (@neptoess, #3130).
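Setting the new `connections.max.idle.ms` property programmatically might look like the following sketch. The 4-minute value is an illustrative choice (mirroring the Azure-related default mentioned above), not a recommendation:

```c
/* Sketch: enabling idle connection reaping (connections.max.idle.ms,
 * added in v1.7.0). The value 240000 ms (4 min) is an example only. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

static rd_kafka_conf_t *make_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Close broker connections that have been idle for 4 minutes. */
        if (rd_kafka_conf_set(conf, "connections.max.idle.ms", "240000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "config error: %s\n", errstr);

        return conf;  /* pass to rd_kafka_new() as usual */
}
```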
Upgrade considerations¶
- The C++ `oauthbearer_token_refresh_cb()` was missing a `Handle *` argument that has now been added. This is a breaking change, but the original function signature is considered a bug. This change only affects C++ OAuth developers.
- KIP-735: The consumer `session.timeout.ms` default was changed from 10 to 45 seconds to make consumer groups more robust and less sensitive to temporary network and cluster issues.
- Statistics: `consumer_lag` is now using the `committed_offset`, while the new `consumer_lag_stored` is using `stored_offset` (offset to be committed). This is more correct than the previous `consumer_lag`, which was using either `committed_offset` or `app_offset` (last message passed to application).
Fixes¶
General fixes¶
- Fix accesses to freed metadata cache mutexes on client termination (#3279)
- There was a race condition on receiving updated metadata where a broker id update (such as bootstrap to proper broker transformation) could finish after the topic metadata cache was updated, leading to existing brokers seemingly being unavailable. One occurrence of this issue was `query_watermark_offsets()`, which could return `ERR__UNKNOWN_PARTITION` for existing partitions shortly after the client instance was created.
- The OpenSSL context is now initialized with `TLS_client_method()` (on OpenSSL >= 1.1.0) instead of the deprecated and outdated `SSLv23_client_method()`.
- The initial cluster connection on client instance creation could sometimes be delayed up to 1 second if a `group.id` or `transactional.id` was configured (#3305).
- Speed up triggering of new broker connections in certain cases by exiting the broker thread io/op poll loop when a wakeup op is received.
- SASL GSSAPI: The Kerberos kinit refresh command was triggered from `rd_kafka_new()`, which made this call blocking if the refresh command was taking long. The refresh is now performed by the background rdkafka main thread.
- Fix busy-loop (100% CPU on the broker threads) during the handshake phase of an SSL connection.
- Disconnects during SSL handshake are now propagated as transport errors rather than SSL errors, since these disconnects are at the transport level (e.g., incorrect listener, flaky load balancer, etc) and not due to SSL issues.
- Increment metadata fast refresh interval backoff exponentially (@ajbarb, #3237).
- Unthrottled requests are no longer counted in the `brokers[].throttle` statistics object.
- Log a CONFWARN warning when global topic configuration properties are overwritten by explicitly setting a `default_topic_conf`.
Consumer fixes¶
- If a rebalance happened during a `consume_batch..()` call, the already accumulated messages for revoked partitions were not purged, which would pass messages to the application for partitions that were no longer owned by the consumer. Fixed by @jliunyu. (#3340)
- Fix balancing and reassignment issues with the cooperative-sticky assignor. (#3306)
- Fix incorrect detection of first rebalance in sticky assignor (@hallfox).
- Aborted transactions with no messages produced to a partition could cause further successfully committed messages in the same Fetch response to be ignored, resulting in consumer-side message loss. A log message along the lines of `Abort txn ctrl msg bad order at offset 7501: expected before or at 7702: messages in aborted transactions may be delivered to the application` would be seen. This is a rare occurrence where a transactional producer would register with the partition but not produce any messages before aborting the transaction.
- The consumer group deemed cached metadata up to date by checking `topic.metadata.refresh.interval.ms`: if this property was set too low, it would cause cached metadata to be unusable and new metadata to be fetched, which could delay the time it took for a rebalance to settle. It now correctly uses `metadata.max.age.ms` instead.
- The consumer group timed auto commit would attempt commits during rebalances, which could result in "Illegal generation" errors. This is now fixed; the timed auto committer is only employed in the steady state when no rebalances are taking place. Offsets are still auto committed when partitions are revoked.
- Retriable FindCoordinatorRequest errors are no longer propagated to the application as they are retried automatically.
- Fix rare crash (assert `rktp_started`) on consumer termination (introduced in v1.6.0).
- Fix unaligned access and possibly corrupted snappy decompression when building with MSVC. (@azat)
- A consumer configured with the `cooperative-sticky` assignor did not actively Leave the group on `unsubscribe()`. This delayed the rebalance for the remaining group members by up to `session.timeout.ms`.
- The current subscription list was sometimes leaked when unsubscribing.
Producer fixes¶
- The timeout value of `flush()` was not respected when delivery reports were scheduled as events (such as for confluent-kafka-go) rather than callbacks.
- There was a race condition in `purge()` which could cause newly created partition objects, or partitions that were changing leaders, to not have their message queues purged. This could cause `abort_transaction()` to time out. This issue is now fixed.
- In certain high-throughput produce rate patterns, producing could stall for 1 second, regardless of `linger.ms`, due to rate-limiting of internal queue wakeups. This is now fixed by not rate-limiting queue wakeups but instead limiting them to one wakeup per queue reader poll. (#2912)
Transactional Producer fixes¶
- KIP-360: Fatal idempotent producer errors are now recoverable by the transactional producer and will raise a `txn_requires_abort()` error.
- If the cluster went down between `produce()` and `commit_transaction()` and before any partitions had been registered with the coordinator, the messages would time out but the commit would succeed because nothing had been sent to the coordinator. This is now fixed.
- If the current transaction failed while `commit_transaction()` was checking the current transaction state, an invalid state transition could occur which in turn would trigger an assertion crash. This issue showed up as "Invalid txn state transition: .." crashes, and is now fixed by properly synchronizing both checking and transition of state.
1.6.1 (2021-02-24)¶
librdkafka v1.6.1¶
librdkafka v1.6.1 is a maintenance release.
Upgrade considerations¶
- Fatal idempotent producer errors are now also fatal to the transactional producer. This is a necessary step to maintain data integrity prior to librdkafka supporting KIP-360. Applications should check any transactional API errors for the is_fatal flag and decommission the transactional producer if the flag is set.
- The consumer error raised by `auto.offset.reset=error` now has its error code set to `ERR__AUTO_OFFSET_RESET`, to allow an application to differentiate between auto offset resets and other consumer errors.
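With this change, an application's poll loop can tell a reset apart from other errors. The sketch below assumes an existing consumer handle `rk` configured with `auto.offset.reset=error`:

```c
/* Sketch: distinguishing an auto offset reset (from v1.6.1 its error
 * code is RD_KAFKA_RESP_ERR__AUTO_OFFSET_RESET) from other consumer
 * errors. `rk` is an existing consumer handle. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

static void poll_once(rd_kafka_t *rk) {
        rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, 1000);
        if (!msg)
                return;  /* timeout, nothing to do */

        if (msg->err == RD_KAFKA_RESP_ERR__AUTO_OFFSET_RESET)
                fprintf(stderr, "offset reset: %s\n",
                        rd_kafka_message_errstr(msg));
        else if (msg->err)
                fprintf(stderr, "consumer error: %s\n",
                        rd_kafka_message_errstr(msg));
        /* else: process msg->payload as usual */

        rd_kafka_message_destroy(msg);
}
```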
Fixes¶
General fixes¶
- Admin API and transactional `send_offsets_to_transaction()` coordinator requests, such as TxnOffsetCommitRequest, could in rare cases be sent multiple times, which could cause a crash.
- `ssl.ca.location=probe` is now enabled by default on Mac OSX since the librdkafka-bundled OpenSSL might not have the same default CA search paths as the system or brew-installed OpenSSL. Probing scans all known locations.
Transactional Producer fixes¶
- Fatal idempotent producer errors are now also fatal to the transactional producer.
- The transactional producer could crash if the transaction failed while `send_offsets_to_transaction()` was called.
- Group coordinator requests for transactional `send_offsets_to_transaction()` calls would leak memory if the underlying request was attempted to be sent after the transaction had failed.
- When gradually producing to multiple partitions (resulting in multiple underlying AddPartitionsToTxnRequests), subsequent partitions could get stuck in pending state under certain conditions. These pending partitions would not send queued messages to the broker and would eventually trigger message timeouts, failing the current transaction. This is now fixed.
- Committing an empty transaction (no messages were produced and no offsets were sent) would previously raise a fatal error due to invalid state on the transaction coordinator. We now allow empty/no-op transactions to be committed.
Consumer fixes¶
- The consumer will now retry indefinitely (or until the assignment is changed) to retrieve committed offsets. This fixes the issue where only two retries were attempted when outstanding transactions were blocking OffsetFetch requests with `ERR_UNSTABLE_OFFSET_COMMIT`. (#3265)
1.6.0 (2021-01-26)¶
librdkafka v1.6.0¶
librdkafka v1.6.0 is a feature release:
- KIP-429 Incremental rebalancing with sticky consumer group partition assignor (KIP-54) (by @mhowlett).
- KIP-480: Sticky producer partitioning (`sticky.partitioning.linger.ms`) achieves higher throughput and lower latency through sticky selection of a random partition (by @abbycriswell).
- AdminAPI: Add support for `DeleteRecords()`, `DeleteGroups()` and `DeleteConsumerGroupOffsets()` (by @gridaphobe).
- KIP-447: Producer scalability for exactly-once semantics allows a single transactional producer to be used for multiple input partitions. Requires Apache Kafka 2.5 or later.
- Transactional producer fixes and improvements, see Transactional Producer fixes below.
- The librdkafka.redist NuGet package now supports Linux ARM64/Aarch64.
Upgrade considerations¶
- Sticky producer partitioning (`sticky.partitioning.linger.ms`) is enabled by default (10 milliseconds), which affects the distribution of randomly partitioned messages. Where previously these messages would be evenly distributed over the available partitions, they are now produced to a single partition for the duration of the sticky time (10 milliseconds by default) before a new random sticky partition is selected.
- The new KIP-447 transactional producer scalability guarantees are only supported on Apache Kafka 2.5 or later; on earlier releases you will need to use one producer per input partition for EOS. This limitation is not enforced by the producer or broker.
- Error handling for the transactional producer has been improved, see the Transactional Producer fixes below for more information.
Known issues¶
- The Transactional Producer's API timeout handling is inconsistent with the underlying protocol requests; it is therefore strongly recommended that applications call `rd_kafka_commit_transaction()` and `rd_kafka_abort_transaction()` with the `timeout_ms` parameter set to `-1`, which will use the remaining transaction timeout.
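The recommended pattern can be sketched as follows. This is a hedged illustration, assuming a transactional producer `rk` that has already begun a transaction and produced its messages:

```c
/* Sketch: commit with timeout_ms = -1 so the call uses the remaining
 * transaction timeout, as recommended above. `rk` is a transactional
 * producer with an open transaction. */
#include <librdkafka/rdkafka.h>
#include <stdio.h>

static int commit_or_abort(rd_kafka_t *rk) {
        rd_kafka_error_t *error = rd_kafka_commit_transaction(rk, -1);
        if (!error)
                return 0;  /* committed */

        if (rd_kafka_error_txn_requires_abort(error)) {
                /* Abortable error: abort and let the application retry
                 * the transaction from the start. */
                rd_kafka_error_destroy(error);
                error = rd_kafka_abort_transaction(rk, -1);
        }
        if (error) {
                fprintf(stderr, "txn error: %s\n",
                        rd_kafka_error_string(error));
                rd_kafka_error_destroy(error);
                return -1;
        }
        return 0;
}
```

Fatal errors (e.g., the producer being fenced) should instead lead to decommissioning the producer instance.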
Enhancements¶
- KIP-107, KIP-204: AdminAPI: Added `DeleteRecords()` (by @gridaphobe).
- KIP-229: AdminAPI: Added `DeleteGroups()` (by @gridaphobe).
- KIP-496: AdminAPI: Added `DeleteConsumerGroupOffsets()`.
- KIP-464: AdminAPI: Added support for broker-side default partition count and replication factor for `CreateTopics()`.
- Windows: Added `ssl.ca.certificate.stores` to specify a list of Windows Certificate Stores to read CA certificates from, e.g., `CA,Root`. `Root` remains the default store.
- Use reentrant `rand_r()` on supporting platforms, which decreases lock contention (@azat).
- Added `assignor` debug context for troubleshooting consumer partition assignments.
- Updated to OpenSSL v1.1.1i when building dependencies.
- Updated bundled lz4 (used when building with `./configure --disable-lz4-ext`) to v1.9.3, which has vast performance improvements.
- Added `rd_kafka_conf_get_default_topic_conf()` to retrieve the default topic configuration object from a global configuration object.
- Added `conf` debugging context to `debug` - shows set configuration properties on client and topic instantiation. Sensitive properties are redacted.
- Added `rd_kafka_queue_yield()` to cancel a blocking queue call.
- Will now log a warning when multiple ClusterIds are seen, which is an indication that the client might be erroneously configured to connect to multiple clusters, which is not supported.
- Added `rd_kafka_seek_partitions()` to seek multiple partitions to per-partition specific offsets.
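Usage of the new `rd_kafka_seek_partitions()` might look like this sketch. The topic name, partition numbers, and offsets are placeholders; `rk` is assumed to be an existing consumer with these partitions assigned:

```c
/* Sketch: seek several partitions to specific offsets in one call
 * (rd_kafka_seek_partitions(), added in v1.6.0). Names and offsets
 * are placeholders. */
#include <librdkafka/rdkafka.h>

static void seek_two_partitions(rd_kafka_t *rk) {
        rd_kafka_topic_partition_list_t *offs =
                rd_kafka_topic_partition_list_new(2);

        rd_kafka_topic_partition_list_add(offs, "mytopic", 0)->offset = 1234;
        rd_kafka_topic_partition_list_add(offs, "mytopic", 1)->offset = 5678;

        /* On return, each partition's .err field carries its
         * per-partition result. */
        rd_kafka_error_t *error =
                rd_kafka_seek_partitions(rk, offs, 5000 /* timeout_ms */);
        if (error)
                rd_kafka_error_destroy(error);

        rd_kafka_topic_partition_list_destroy(offs);
}
```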
Fixes¶
General fixes¶
- Fix a use-after-free crash when certain coordinator requests were retried.
- The C++ `oauthbearer_set_token()` function would call `free()` on a `new`-created pointer, possibly leading to crashes or heap corruption. (#3194)
Consumer fixes¶
- The consumer assignment and consumer group implementations have been decoupled, simplified and made more strict and robust. This will sort out a number of edge cases for the consumer where the behaviour was previously undefined.
- Partition fetch state was not set to STOPPED if OffsetCommit failed.
- The session timeout is now enforced locally also when the coordinator connection is down, which was not previously the case.
Transactional Producer fixes¶
- Transaction commit or abort failures on the broker, such as when the producer was fenced by a newer instance, were not propagated to the application, resulting in failed commits seeming successful. This was a critical race condition for applications that had a delay after producing messages (or sending offsets) before committing or aborting the transaction. This issue has now been fixed and test coverage improved.
- The transactional producer API would return `RD_KAFKA_RESP_ERR__STATE` when API calls were attempted after the transaction had failed; we now try to return the error that caused the transaction to fail in the first place, such as `RD_KAFKA_RESP_ERR__FENCED` when the producer has been fenced, or `RD_KAFKA_RESP_ERR__TIMED_OUT` when the transaction has timed out.
- The transactional producer retry count for transactional control protocol requests has been increased from 3 to infinite; retriable errors are now automatically retried by the producer until success or until the transaction timeout is exceeded. This fixes the case where `rd_kafka_send_offsets_to_transaction()` would fail the current transaction into an abortable state when `CONCURRENT_TRANSACTIONS` was returned by the broker (which is a transient error) and the 3 retries were exhausted.
Producer fixes¶
- Calling `rd_kafka_topic_new()` with a topic config object with `message.timeout.ms` set could sometimes adjust the global `linger.ms` property (if not explicitly configured), which was not desired. This is now fixed and the auto adjustment is only done based on the `default_topic_conf` at producer creation.
- `rd_kafka_flush()` could previously return `RD_KAFKA_RESP_ERR__TIMED_OUT` just as the timeout was reached if the messages had been flushed but there were no more messages. This has been fixed.
Checksums¶
Release asset checksums:
* v1.6.0.zip SHA256 af6f301a1c35abb8ad2bb0bab0e8919957be26c03a9a10f833c8f97d6c405aa8
* v1.6.0.tar.gz SHA256 3130cbd391ef683dc9acf9f83fe82ff93b8730a1a34d0518e93c250929be9f6b
1.5.3 (2020-12-09)¶
librdkafka v1.5.3¶
librdkafka v1.5.3 is a maintenance release.
Upgrade considerations¶
- CentOS 6 is now EOL and is no longer included in binary librdkafka packages, such as NuGet.
Fixes¶
General fixes¶
- Fix a use-after-free crash when certain coordinator requests were retried.
Consumer fixes¶
- Consumer would not filter out messages for aborted transactions if the messages were compressed (#3020).
- Consumer destroy without prior `close()` could hang in certain cgrp states (@gridaphobe, #3127).
- Fix possible null dereference in `Message::errstr()` (#3140).
- The `roundrobin` partition assignment strategy could get stuck in an endless loop or generate uneven assignments in case the group members had asymmetric subscriptions (e.g., c1 subscribes to t1,t2 while c2 subscribes to t2,t3). (#3159)
Checksums¶
Release asset checksums:
* v1.5.3.zip SHA256 3f24271232a42f2d5ac8aab3ab1a5ddbf305f9a1ae223c840d17c221d12fe4c1
* v1.5.3.tar.gz SHA256 2105ca01fef5beca10c9f010bc50342b15d5ce6b73b2489b012e6d09a008b7bf
Last modified: 2025-10-22 10:06:37