Karafka's Routing Patterns is a powerful feature that offers flexibility in routing messages from various topics, including topics created during the runtime of Karafka processes. This feature allows you to use regular expression (regexp) patterns in routes. When a matching Kafka topic is detected, Karafka will automatically expand the routing and start consumption without additional configuration. This feature greatly simplifies the management of dynamically created Kafka topics.
How it works
Routing Patterns is not just about using regex patterns. It's about the marriage of regexp and Karafka's topic routing system.
When you define a route using a regexp pattern, Karafka monitors the topics in the Kafka cluster. When a topic matching the pattern is detected, Karafka acts on it: without waiting for manual intervention or a service restart, it dynamically adds the topic to the routing tree, starts a consumer for it, and begins processing data.
Below, you can find a conceptual diagram of how the discovery process works:
Figure: This diagram illustrates how the detection process works. When Karafka detects new topics in Kafka, it tries to match them, expands the routes, and processes incoming data.
Once a new topic is detected, Karafka handles it exactly like the pre-existing ones. Notably, patterns are also matched during application initialization, so topics that already existed before the process started are picked up as well, provided they have not been explicitly defined in the routes.
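To make these matching rules concrete, here is a minimal pure-Ruby sketch of the decision made for each topic found in the cluster metadata. It is not Karafka's implementation: the `discoverable_topics` helper and its arguments are hypothetical, and it models only the two rules described above, namely that a topic must match the pattern and must not already be defined explicitly in the routes.

```ruby
# Hypothetical helper (not Karafka code): which topics would a pattern pick up?
# cluster_topics - topic names currently present in the cluster metadata
# pattern        - the Ruby Regexp used in the route
# routed_topics  - topic names already declared explicitly with #topic
def discoverable_topics(cluster_topics, pattern, routed_topics)
  cluster_topics
    .select { |name| pattern.match?(name) }         # must match the pattern
    .reject { |name| routed_topics.include?(name) } # skip explicitly routed topics
end

cluster = %w[orders_dlq payments_dlq orders users]
routed = %w[orders payments_dlq]

# Topics that existed before boot are matched too, except explicitly routed ones
p discoverable_topics(cluster, /.*_dlq/, routed) # => ["orders_dlq"]
```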
Representation of Routing Patterns
Defining a pattern within Karafka automatically translates it into an underlying topic representation that carries a regular expression (regexp) matching the appropriate Kafka topics.
To support this concept, Karafka distinguishes three types of topics from the Routing Patterns perspective:
regular: Represents standard Kafka topics. These are matched directly by name, without any patterns. This type is used when there is a straightforward, one-to-one correspondence with an existing topic and no patterns are involved.
matcher: Represents a regular expression used by `librdkafka`. It is the gateway for Karafka's dynamic topic discovery, laying the foundation for the discovered topics.
discovered: Comes into play when there is a real, tangible topic that Karafka begins to listen to after it was matched with a regular expression.
The matcher topic holds a paramount position in the dynamic topic discovery mechanism. It embodies a regular expression subscription, acting as the initial point of discovery. When a new topic aligns with the matcher topic's criteria (whether during boot-up or at runtime), Karafka uses the matcher topic's configuration as the blueprint. This new topic inherits the settings and becomes part of the same consumer group and subscription group as its originating matcher topic.
Subsequently, this newly registered topic is created with the `:discovered` type. To make it easy to identify, especially in environments with many topics, it is labeled as `:discovered` in the Web UI.
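The relationship between a matcher topic and the topics it discovers can be modelled in a few lines of plain Ruby. This is an illustrative sketch only: the `RoutedTopic` struct and the `discover` method are hypothetical and do not mirror Karafka's internal classes. It shows the property described above: a discovered topic takes the matcher's settings and groups and differs only in its name and type.

```ruby
# Hypothetical, simplified model of routing topics (not Karafka's classes)
RoutedTopic = Struct.new(:name, :type, :consumer_group, :subscription_group, :settings)

# Builds a :discovered topic that uses the matcher topic as its blueprint
def discover(matcher, topic_name)
  raise ArgumentError, 'blueprint must be a :matcher topic' unless matcher.type == :matcher

  RoutedTopic.new(
    topic_name,
    :discovered,
    matcher.consumer_group,     # same consumer group as the matcher
    matcher.subscription_group, # same subscription group as the matcher
    matcher.settings.dup        # same settings, copied from the matcher
  )
end

# 'karafka-pattern-example' is a made-up matcher name for this sketch
matcher = RoutedTopic.new(
  'karafka-pattern-example', :matcher, 'app', 'default',
  { consumer: 'DlqConsumer', manual_offset_management: true }
)

discovered = discover(matcher, 'orders_dlq')
p discovered.type                         # => :discovered
p discovered.settings == matcher.settings # => true
```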
The diagram below represents the relationship between topics of various types and how they operate within Karafka routing:
Figure: This example illustrates how a single consumer group and subscription group can incorporate multiple topics of various types. All `:discovered` topics always use the same settings as their base `:matcher` topic.
Enabling Routing Patterns
Creating routing patterns in Karafka is done using the `#pattern` routing method. Think of it as the twin of the `#topic` method, with the added flair of pattern matching:
```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    # ...
  end

  routes.draw do
    pattern(/.*_dlq/) do
      consumer DlqConsumer
    end

    # patterns accept the same settings as `#topic`
    # so they can use all the Karafka features
    pattern(/.*_customers/) do
      consumer CustomersDataConsumer
      long_running_job true
      manual_offset_management true
      delay_by(60_000)
    end
  end
end
```
Since this method is a twin of `#topic`, the same usage contexts apply. You can define your patterns in the following places:
Routing root level: Place it directly in your routes.
Consumer Group: Use it within a consumer group block.
Subscription Group: Insert it inside a subscription group block.
Regardless of where you use it, it works like the `#topic` method, but matches topics based on patterns.
Patterns crafted in such a way are called "anonymous patterns". This terminology highlights that these patterns don't have a predefined name. Instead, Karafka generates a name prefixed with "karafka-pattern-" based on the regular expression content. This approach ensures unique and distinguishable matcher topics but, at the same time, makes it much harder to exclude pattern routes from the CLI. Anonymous patterns are easy to start with and great for development. However, we do recommend assigning them names in the later stages before shipping to production.
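The practical consequence of deriving the name from the regular expression is that the same regexp always yields the same name, while any change to it yields a different one. The sketch below illustrates this with an MD5 digest of the regexp source; the hashing scheme is an assumption made purely for illustration, so check the names Karafka actually generates in the Web UI rather than inferring them from this code.

```ruby
require 'digest'

# Hypothetical naming scheme for anonymous patterns. Only the
# "karafka-pattern-" prefix comes from the docs; the digest is an assumption.
def anonymous_pattern_name(regexp)
  "karafka-pattern-#{Digest::MD5.hexdigest(regexp.source)}"
end

a = anonymous_pattern_name(/.*_dlq/)
b = anonymous_pattern_name(/.*_dlq/)
c = anonymous_pattern_name(/.*_dlqs/)

p a == b # => true  (same regexp, same name)
p a == c # => false (edited regexp, new name)
```

This is why renaming is a side effect of editing an anonymous pattern, and why named patterns are easier to reference reliably from the CLI.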
Named and anonymous patterns in Karafka work the same way when setting up routing. The key difference is that named patterns have a specific name you choose, while anonymous patterns don't. This name is handy when picking or skipping certain routes in Karafka using the CLI. It's good to start with anonymous patterns when testing things out. But, as you finalize how you use them, switching to named patterns can make things more transparent and consistent.
You define named patterns the same way as anonymous ones, using the `#pattern` method, but instead of providing only the regular expression, you provide both a name and the regexp:
```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    # ...
  end

  routes.draw do
    pattern(:dlqs_pattern, /.*_dlq/) do
      consumer DlqConsumer
    end

    pattern(:customers_pattern, /.*_customers/) do
      consumer CustomersDataConsumer
      long_running_job true
      manual_offset_management true
      delay_by(60_000)
    end
  end
end
```
ActiveJob Routing Patterns
For ActiveJob, a dedicated `#active_job_pattern` method lets you define pattern matching for ActiveJob jobs. Its API is similar to `#pattern`, and it works the same way:
```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    # ...
  end

  routes.draw do
    # Anonymous ActiveJob pattern
    active_job_pattern(/.*_fast_jobs/)

    # or a named one with some extra options
    active_job_pattern(:active_jobs, /.*_late_jobs/) do
      delay_by(60_000)
    end
  end
end
```
Limiting Patterns used per process
Karafka's named and anonymous patterns are associated with a specific matcher topic name. This uniform naming system allows more flexibility when using the Karafka Command Line Interface (CLI). As with standard topics, you can include or exclude matcher topics when running Karafka processes by referencing their name.
Given the routing setup below:
```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    # ...
  end

  routes.draw do
    pattern(:dlqs_pattern, /.*_dlq/) do
      consumer DlqConsumer
    end

    pattern(:customers_pattern, /.*_customers/) do
      consumer CustomersDataConsumer
    end
  end
end
```
You can limit the process operations to only specific patterns:
```shell
# Runs only `dlqs_pattern`; all other topics and patterns are skipped
#
# Separate multiple names with spaces to include more
bundle exec karafka server --include-topics dlqs_pattern
```
You can also exclude patterns from the process operations:
```shell
# Runs all topics and patterns except `dlqs_pattern`
#
# Separate multiple names with spaces to exclude more
bundle exec karafka server --exclude-topics dlqs_pattern
```
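Conceptually, the two flags act as an allow-list and a deny-list over route names, where a pattern's name is its matcher topic name. The plain-Ruby sketch below models that filtering for the routing above. `select_routes` is a hypothetical helper rather than Karafka's CLI code, and it assumes that a route must be on the include list (when one is given) and must not be on the exclude list.

```ruby
# Hypothetical model of --include-topics / --exclude-topics filtering.
# route_names: names of topics and named patterns defined in the routes
def select_routes(route_names, include_list: [], exclude_list: [])
  route_names.select do |name|
    # An empty include list means "everything is included"
    included = include_list.empty? || include_list.include?(name)
    included && !exclude_list.include?(name)
  end
end

routes = %w[dlqs_pattern customers_pattern]

p select_routes(routes, include_list: %w[dlqs_pattern]) # => ["dlqs_pattern"]
p select_routes(routes, exclude_list: %w[dlqs_pattern]) # => ["customers_pattern"]
```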
There are key aspects to consider to ensure efficient and consistent behavior:
Changing Regexp in Anonymous Patterns: If the regular expression for an anonymous pattern is changed, its name will change too.
Avoid Overlapping Regexps: It's crucial to avoid defining multiple regular expressions within the same consumer group that might match the same topics. This can lead to unexpected behavior because of possible reassignments and rebalances.
Thoroughly Test Patterns: Always ensure your patterns don't overlap within a single consumer group. Regular testing can prevent unwanted behavior.
Potential Regexp Differences: While unlikely, regular expressions in Ruby and in `librdkafka` (written in C) might not behave identically. Always test the matching behavior before deploying to production.
Runtime Topic Detection Isn't Immediate: When a new topic emerges, its detection isn't real-time. It depends on the metadata cache TTL, governed by the `topic.metadata.refresh.interval.ms` setting. The default is 5 seconds in development and 5 minutes in production. For most production scenarios, sticking to the 5-minute default is advised, as it strikes a good balance between operational responsiveness and system load.
Internal Regular Expression Requirements of `librdkafka`: The library requires regular expression strings to start with `^`. Karafka's Routing Patterns adapt Ruby's regular expressions to fit this format internally. Keep this transformation in mind and test your patterns thoroughly before deploying. If you wish to review the adjusted regular expression, you can find it in the Web UI, in the topic details view of the routing page.
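The `^` requirement can be illustrated with a small sketch. The helper below is hypothetical and deliberately naive: it prefixes a Ruby regexp's source with `^` when it does not already start with one, so the result can be tried against sample topic names in Ruby. The transformation Karafka actually applies may differ, so treat the Web UI as the authoritative view of the adjusted expression.

```ruby
# Hypothetical, naive adaptation of a Ruby Regexp to a "^"-prefixed string.
# Note: Regexp#source drops flags such as /i, so they are not carried over.
def to_librdkafka_pattern(regexp)
  source = regexp.source
  source.start_with?('^') ? source : "^#{source}"
end

pattern = to_librdkafka_pattern(/.*_dlq/)
p pattern # => "^.*_dlq"

# Sanity-check the adapted expression against sample names in Ruby.
# This does not prove librdkafka's C regex engine behaves identically.
samples = %w[orders_dlq dlq_orders orders]
p samples.select { |name| Regexp.new(pattern).match?(name) } # => ["orders_dlq"]
```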
Please ensure you're familiar with these considerations to harness the full power of Routing Patterns without encountering unexpected issues.
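One way to act on the advice about overlapping regular expressions is to run every pattern defined within a consumer group against a list of realistic topic names and flag any name matched by more than one of them. The sketch below does this in plain Ruby; `overlapping_topics` and the sample names are hypothetical, and a check like this can only catch overlaps for the names you feed it.

```ruby
# Hypothetical check: which sample topic names match more than one pattern?
# patterns: { pattern_name => Regexp } defined within a single consumer group
def overlapping_topics(patterns, topic_names)
  topic_names.each_with_object({}) do |name, overlaps|
    matching = patterns.select { |_, regexp| regexp.match?(name) }.keys
    overlaps[name] = matching if matching.size > 1
  end
end

patterns = {
  dlqs_pattern: /.*_dlq/,
  customers_pattern: /.*_customers/
}

# Flags "eu_customers_dlq", which both patterns match
p overlapping_topics(patterns, %w[eu_customers eu_customers_dlq orders_dlq])
```

With the two patterns from the routing example earlier on this page, a topic such as `eu_customers_dlq` would be claimed by both, which is exactly the kind of overlap to resolve before it reaches a shared consumer group.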
Tenant-specific Topics: Modern SaaS applications often serve multiple tenants, each requiring its own data isolation. You can ensure data segregation by having a Kafka topic for each tenant, such as `tenantB_logs`. Routing Patterns can simplify consumption from these dynamically created topics.
Environment-based Topics: Environments such as staging, production, or QA may each generate events. If these events are split into per-environment topics, Routing Patterns can streamline their consumption.
Versioned Topics: As systems evolve, data formats and structures change. If you have chosen to version your topics, regexp patterns can handle these variations smoothly.
Date-based Topics: For systems that rotate topics based on timeframes, such as `logs_202302`, the feature becomes invaluable, ensuring no topic goes unnoticed.
Special-event Topics: Seasonal events, promotions, or sales often have dedicated topics, such as `holiday_discounts`. This feature ensures that such transient topics are consumed without extra configuration.
Automated Testing Topics: In CI/CD pipelines, where automated tests might create topics on the fly, such as `test_run_002`, regexp routing can prove to be a boon.
Backup and Archive Topics: Systems that create backup topics, such as `backup_2023Q2`, can benefit from routing them dynamically for monitoring or analysis.
Error and Debug Topics: Special topics created for debugging or error tracking, such as `debug_minor`, can be consumed automatically using patterns.
Dedicated DLQ (Dead Letter Queue) Topics: Handling erroneous or unprocessable messages becomes crucial as applications scale. Systems can isolate and address these problematic messages by employing dedicated Kafka topics for DLQ, such as `dlq_notifications`. Karafka's Routing Patterns can automatically detect these DLQ topics and consume from them, enabling efficient monitoring and subsequent troubleshooting.
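To check how candidate patterns for some of these use cases behave, you can evaluate them in plain Ruby against the example topic names from the list above. The patterns in this sketch are hypothetical choices made for illustration, not recommendations from Karafka; they stick to simple character classes (for example `[0-9]` rather than `\d`) to reduce the chance of Ruby and `librdkafka` interpreting them differently.

```ruby
# Hypothetical patterns for a few of the use cases above
USE_CASE_PATTERNS = {
  tenant_logs: /tenant[A-Za-z]+_logs/,
  monthly_logs: /logs_[0-9]{6}/,
  test_runs: /test_run_[0-9]+/,
  backups: /backup_[0-9]{4}Q[1-4]/,
  dlqs: /dlq_.*/
}.freeze

# Returns the names of all patterns that match the given topic name
def matching_patterns(topic_name)
  USE_CASE_PATTERNS.select { |_, regexp| regexp.match?(topic_name) }.keys
end

%w[tenantB_logs logs_202302 test_run_002 backup_2023Q2 dlq_notifications holiday_discounts].each do |name|
  puts "#{name}: #{matching_patterns(name).inspect}"
end
```

Each sample name matches exactly one pattern here, except `holiday_discounts`, which matches none and would need a pattern of its own.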
Karafka's Routing Patterns offers a dynamic solution for message routing, utilizing the power of regular expressions. By defining regexp patterns within routes, this feature allows automatic detection and consumption of Kafka topics that match the specified patterns. This functionality ensures agile integration of new and pre-existing topics that have yet to be defined in routes, simplifying the management process and eliminating the need for manual configuration. Whether handling tenant-specific or dedicated Dead Letter Queue topics, Karafka's Routing Patterns enhance flexibility and efficiency in data flow management.