Understanding WaterDrop's error handling mechanisms is crucial for developing robust and reliable Kafka-based applications.

!!! notice ""

    This document focuses on error handling for the Standard (non-transactional) Producer. For the error behavior of the Transactional Producer, refer to the [Transactional Producer](https://karafka.io/docs/WaterDrop-Transactions.md) documentation.

WaterDrop operates with a fully asynchronous architecture and maintains an in-memory buffer to handle messages efficiently. This buffer holds messages that are waiting to be dispatched as well as those already in the delivery process. Once a message is delivered, or once an error occurs after the maximum number of retries has been reached, WaterDrop places a delivery event in an internal event queue. This event carries the delivery outcome, so the status of every message is tracked accurately.

## Operational Modes

WaterDrop provides three distinct APIs, each with its own error-handling behavior:

1. **Single Message Dispatch (`#produce_sync` and `#produce_async`)**: These methods send a single message to Kafka. Any error is tied directly to that one dispatch attempt, which makes it straightforward to identify and handle problems with producing or delivering the message.
2. **Batch Dispatch (`#produce_many_sync` and `#produce_many_async`)**: These methods send multiple messages to Kafka in a single batch. Errors in this mode are more complex because they can relate to any message within the batch. Your error handling needs to identify which message(s) caused the error and account for partial failures, where some messages are dispatched successfully and others are not.
3. **Transactional Dispatch**: This mode runs operations within a Kafka transaction. It suits scenarios that require exactly-once delivery semantics or atomicity across multiple messages and partitions. Errors in this mode can be transaction-wide, affecting every message sent within the transaction's scope. The transactional producer follows its own rules, so refer to the [dedicated documentation](https://karafka.io/docs/WaterDrop-Transactions.md) for guidance on handling its errors.

## Error Types

In WaterDrop, errors encountered while handling messages fall into six distinct types. Each type corresponds to a specific stage or aspect of the message delivery lifecycle.

- **Pre-Handle Inline Errors**: These errors occur at the very start of message production and prevent a delivery handle from being created. An inline error means the message was never enqueued for delivery. A typical example is `:queue_full`, raised when the message cannot be enqueued because the buffer has no free space. These errors are immediate and always indicate a dispatch failure.
- **Wait Timeout Errors**: This error is raised by `#wait` on a delivery handle when the maximum wait time elapses before a delivery report arrives. This can happen either when calling `#wait` directly after `#produce_async` or when producing messages synchronously. A wait timeout does not necessarily mean the message will not be delivered; it only indicates that the allotted wait time was exceeded. Note that `#wait` can also raise other errors, and those indicate a final delivery failure.
  With the default configuration, where `max_wait_timeout` is longer than the other message delivery timeouts, an error raised by `#wait` should always be final.
- **Intermediate Errors**: These errors can occur at any time, are not necessarily linked to producing a specific message, and are not raised inline. Instead, they are published via the `error.occurred` notifications channel. They usually signal operational problems, such as network interruptions or temporary malfunctions, and are often transient. They reflect the overall health of the messaging system rather than the fate of individual messages.
- **Delivery Failures**: This type of error relates specifically to the non-delivery of a message. A delivery failure occurs when a message, identifiable by its label, is retried several times but ultimately cannot be delivered, and after a certain period WaterDrop concludes that further attempts are no longer feasible. This error is definitive: it ends the message's lifecycle with a non-delivery status.
- **ProduceMany Errors**: During non-transactional batch dispatches, some messages may be enqueued successfully while others are not. In that case, this error is raised. It exposes a `#dispatched` method that returns the delivery handles of the messages that were successfully enqueued. Those messages may still be delivered, as their delivery reports will show, but messages without a matching delivery handle were definitely rejected and never enqueued for delivery.
- **Transactional ProduceMany Errors**: In a transactional batch dispatch, the messages within the transaction are either all enqueued and delivered together or not at all. If a failure occurs during the transaction, no messages are dispatched and a rollback is performed.
  Therefore, `#dispatched` is always empty in this error: the transaction either succeeds as a whole, in which case no error is raised, or fails as a whole. Because the dispatch is atomic, partial success is not possible, and no delivery handles are available for any message after a rollback.

Each error type plays a distinct role in understanding and managing message handling in WaterDrop, and this categorization helps with troubleshooting and system optimization.

## Errors' Impact on the Delivery
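For non-transactional batches, the impact on delivery has to be worked out per message. The following is a minimal sketch, not WaterDrop's own API: `FakeHandle`, `FakeProduceManyError`, and `rejected_messages` are hypothetical names, and it assumes every message carries a unique `label` that its delivery handle exposes through `#label`. It applies the rule from the ProduceMany Errors description: messages without a matching handle in `#dispatched` were never enqueued and can be sent again. Matching on labels rather than array positions avoids relying on the order of `#dispatched`, which this page does not specify.

```ruby
# Hypothetical stand-ins so the sketch runs without Kafka or the waterdrop gem.
# FakeHandle mimics a delivery handle that exposes the label of its message.
FakeHandle = Struct.new(:label)

# FakeProduceManyError mimics the part of the batch error used here: the
# delivery handles of the messages that made it into the buffer.
class FakeProduceManyError < StandardError
  attr_reader :dispatched

  def initialize(dispatched, message = 'not all messages were enqueued')
    super(message)
    @dispatched = dispatched
  end
end

# Returns the messages that have no matching delivery handle. They were never
# enqueued, so they are safe to send again. Messages that do have a handle may
# still be delivered and should be tracked through that handle instead.
def rejected_messages(messages, error)
  dispatched_labels = error.dispatched.map(&:label)
  messages.reject { |message| dispatched_labels.include?(message[:label]) }
end

batch = [
  { topic: 'events', payload: 'a', label: 'msg-1' },
  { topic: 'events', payload: 'b', label: 'msg-2' },
  { topic: 'events', payload: 'c', label: 'msg-3' }
]

# Simulate a batch in which only the first message was enqueued.
error = FakeProduceManyError.new([FakeHandle.new('msg-1')])
p rejected_messages(batch, error).map { |m| m[:label] } # => ["msg-2", "msg-3"]

# Possible wiring in an application (not executed here; assumes each message
# was produced with a unique label):
#   begin
#     producer.produce_many_async(batch)
#   rescue WaterDrop::Errors::ProduceManyError => e
#     rejected = rejected_messages(batch, e)
#   end
```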
Error Type | Delivery Failed | Details |
---|---|---|
Pre-Handle Inline Errors | Yes | These errors occur before a delivery handle is created, which means the message was not enqueued. For non-transactional batches, partial delivery may occur: a `WaterDrop::Errors::ProduceManyError` is raised, `#dispatched` lists the handles of the messages that were enqueued, and `#cause` reveals the original error. |
Wait Timeout Errors | No | `Rdkafka::Producer::WaitTimeoutError` is raised when `#wait` exceeds its limit without receiving a delivery report. It means the wait took too long, not necessarily that the message was not delivered. |
Intermediate Errors on `error.occurred` | No | Intermediate errors published on `error.occurred` without a `delivery_report` key are temporary. They are identified by the `librdkafka.error` type and indicate ongoing processes or transient issues. |
Wait Inline Errors (excluding `WaitTimeoutError`) | Yes | Errors raised by `#wait` other than `WaitTimeoutError` mean that a delivery report is available and contains an error. In `ProduceManyError` cases, delivery may be partial; check `#dispatched` for the enqueued messages and `#cause` for the original error. |
ProduceMany Errors | Partially | `WaterDrop::Errors::ProduceManyError` is raised when a non-transactional batch dispatch fails partway, for example because the queue is full. Some messages may have been enqueued (see `#dispatched` for details), while the others failed. `#cause` provides the specific error. |
Transactional ProduceMany Errors | Yes | `WaterDrop::Errors::ProduceManyError` raised during a transactional batch dispatch. If a failure occurs, no messages are sent and a rollback is performed. `#dispatched` is always empty because the batch is enqueued either completely or not at all. `#cause` provides the specific error. |
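The table can be read as a small decision procedure. The following is a minimal sketch of that procedure, not part of WaterDrop: `inline_delivery_impact` and `event_delivery_impact` are hypothetical helpers, and the top-level `WaitTimeoutError` and `ProduceManyError` classes are stand-ins for the real classes named in the table so the code runs without Kafka. Treating `error.occurred` payloads that carry a `:delivery_report` as final failures is an inference from the table, not a documented guarantee. The sketch also deliberately ignores `#cause`.

```ruby
# Hypothetical stand-ins for the real exception classes so the sketch runs on
# its own; in an application these would be the classes named in the table.
class WaitTimeoutError < StandardError; end

class ProduceManyError < StandardError
  attr_reader :dispatched

  def initialize(dispatched = [])
    super('batch dispatch failed')
    @dispatched = dispatched
  end
end

# Maps an error raised inline (by a produce_* call or by #wait) to its impact
# on delivery, following the "Delivery Failed" column of the table.
def inline_delivery_impact(error, transactional: false)
  case error
  when WaitTimeoutError
    # Waiting ran out; the message may still be delivered later.
    :unknown
  when ProduceManyError
    # Transactional batches are all-or-nothing. Otherwise, an empty
    # #dispatched means nothing was enqueued; anything else is partial.
    if transactional || error.dispatched.empty?
      :failed
    else
      :partial
    end
  else
    # Pre-handle errors and non-timeout #wait errors are final.
    :failed
  end
end

# Maps an error.occurred payload (a plain Hash here) to its impact.
def event_delivery_impact(payload)
  return :failed if payload.key?(:delivery_report)
  return :transient if payload[:type] == 'librdkafka.error'

  :other
end

p inline_delivery_impact(WaitTimeoutError.new)                      # => :unknown
p inline_delivery_impact(ProduceManyError.new([:handle]))           # => :partial
p inline_delivery_impact(ProduceManyError.new, transactional: true) # => :failed
p inline_delivery_impact(RuntimeError.new('queue full'))            # => :failed
p event_delivery_impact({ type: 'librdkafka.error' })               # => :transient

# Possible wiring (not executed; assumes the event exposes its payload Hash
# via #payload):
#   producer.monitor.subscribe('error.occurred') do |event|
#     impact = event_delivery_impact(event.payload)
#   end
```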