RabbitMQ - Can I Set Limits by Custom Attribute

I'm evaluating using RabbitMQ as a message queue broker/framework to replace an internally built message queue (using C# if that matters).
I will have a service with N threads, each thread being a consumer for a specific queue. More than one thread may share the same queue. I believe I would set a prefetch count of 1 for each consumer so that a consumer thread receives one message at a time.
Let's use an example where I have 4 consumer threads that all read from a queue named "reports". These reports can be run for different "customers". I want to avoid letting a customer monopolize the queue, so let's say I don't want any single customer to use more than 2 consumers at any given time (if there are other queued messages waiting). If no other customers have messages waiting, then I'd like all consumers to be eligible.
With my limited RabbitMQ understanding so far, I believe that I could either use a topic pattern to indicate the customer or I could use a custom header.
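For reference, here is roughly what I mean by each option, sketched in Python with pika just to keep it short (the exchange name, routing keys, and "customer-id" header are placeholders, not a proposed design):
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="reports", exchange_type="topic", durable=True)

# Option 1: encode the customer in the routing key of a topic exchange.
channel.basic_publish(exchange="reports", routing_key="reports.customer42",
                      body=b"run report X")

# Option 2: keep a single routing key and put the customer in a custom header.
props = pika.BasicProperties(headers={"customer-id": "customer42"})
channel.basic_publish(exchange="reports", routing_key="reports.all",
                      body=b"run report X", properties=props)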
I'd like to know if there's a design pattern to support defining a limit per unique header/customer value.
I'm not asking for anyone to write the code for me; I just want to know if anyone can tell me "no, that won't work" in advance before I waste a bunch of time getting ramped up.
If my question doesn't make sense please let me know and I'll update it with more information. Thanks in advance.

Related

Is there a standard pattern for scanning a job table and executing actions?

(I realize that my title is poor. If after reading the question you have an improvement in mind, please either edit it or tell me and I'll change it.)
I have the relatively common scenario of a job table which has one row for each thing that needs to be done. For example, it could be a list of emails to be sent. The table looks something like this:
ID   Completed   TimeCompleted   anything else...
---- ----------- --------------- ----------------
1    No                          blabla
2    No                          blabla
3    Yes         01:04:22
...
I'm looking for either a standard practice/pattern or code (C#/SQL Server preferred) for periodically "scanning" (I use the term "scanning" very loosely) this table, finding the not-completed items, performing the action, and then marking them completed once done successfully.
In addition to the basic process for accomplishing the above, I'm considering the following requirements:
I'd like some means of "scaling linearly", e.g. running multiple "worker processes" simultaneously or threading or whatever. (Just a specific technical thought - I'm assuming that as a result of this requirement, I need some method of marking an item as "in progress" to avoid attempting the action multiple times.)
Each item in the table should only be executed once.
Some other thoughts:
I'm not particularly concerned with the implementation being done in the database (e.g. in T-SQL or PL/SQL code) vs. some external program code (e.g. a standalone executable or some action triggered by a web page) which is executed against the database.
Whether the "doing the action" part is done synchronously or asynchronously is not something I'm considering as part of this question.
If you're willing to consider non-database technologies, the best (though not the only) solution is message queuing (often in conjunction with a database that contains each job's details). Message queues provide a lot of functionality, but the basic workflow is simple:
1) One process puts a 'job message' (perhaps just an id) on a queue.
2) Another process keeps an eye on the queue. It polls the queue for work, and pulls jobs it finds off the queue, one at a time, in the order they were received. Items you've pulled off the queue are effectively marked as 'in progress' - they are no longer available to other processes.
3) For critical workflows, you can perform a transactional read - in the event of a system failure, the transaction rolls back and the message is still on the queue. If there's some other kind of exception (like a timeout during a database read), you might just forward the message to a special error queue.
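As a rough illustration only (using RabbitMQ and its Python client pika rather than MSMQ, and with a made-up handle_job function), the three steps might look something like this:
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="jobs", durable=True)
ch.queue_declare(queue="jobs.errors", durable=True)

# Step 1: one process puts a job message (here just an id) on the queue.
ch.basic_publish(exchange="", routing_key="jobs", body=b"42",
                 properties=pika.BasicProperties(delivery_mode=2))  # persistent

# Steps 2 and 3: another process pulls jobs off in order. An unacknowledged
# message is effectively "in progress" and not delivered to other consumers;
# on failure we forward it to an error queue instead of requeueing forever.
def on_message(ch_, method, properties, body):
    try:
        handle_job(body)  # hypothetical job handler
        ch_.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        ch_.basic_publish(exchange="", routing_key="jobs.errors", body=body)
        ch_.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_consume(queue="jobs", on_message_callback=on_message)
ch.start_consuming()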
The simplest way to scale this is to have your reader process dispatch multiple threads to handle jobs it pulls off the queue. Alternately, you can scale out using multiple reader processes, which may be on separate servers.
.NET support includes Microsoft Message Queuing (MSMQ), used either through Windows Communication Foundation or the classes in the System.Messaging namespace. It requires some setup and configuration (you have to create the queues and configure permissions), but it's worth it.
In order to scale, you might want to consider scanning for jobs that are ready and then adding them to a message queue, so that multiple consumers can read ready jobs off the queue. Marking jobs as "in progress" could be as simple as putting that value in the Completed column, or you could add a TimeStarted column with a pre-determined timeout after which a job is reset and becomes eligible for another worker thread. (The latter approach assumes the processing failed if the time elapses without the job completing; failing after some number of attempts should call for manual inspection of that job.) The same daemon process that scans the database for ready jobs to add to the queue can also look for jobs that have timed out.
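A rough sketch of such a daemon, assuming Python with pyodbc and pika, a SQL Server table Jobs(ID, Completed, TimeStarted, ...) like the one in the question, a RabbitMQ queue named "jobs", and a 30-minute timeout; the UPDLOCK/READPAST hints are just one common way to let several scanners claim rows without dispatching the same job twice, not something this answer prescribes:
import pyodbc
import pika

db = pyodbc.connect("DSN=jobs_db", autocommit=False)  # hypothetical DSN
mq = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
mq.queue_declare(queue="jobs", durable=True)

CLAIM_SQL = """
UPDATE TOP (10) Jobs WITH (UPDLOCK, READPAST)
SET TimeStarted = SYSUTCDATETIME()
OUTPUT inserted.ID
WHERE Completed = 'No'
  AND (TimeStarted IS NULL
       OR TimeStarted < DATEADD(MINUTE, -30, SYSUTCDATETIME()))  -- timed out
"""

cursor = db.cursor()
for (job_id,) in cursor.execute(CLAIM_SQL).fetchall():
    # Hand the claimed job id to the queue; a consumer marks Completed = 'Yes'
    # once it finishes the work successfully.
    mq.basic_publish(exchange="", routing_key="jobs", body=str(job_id).encode(),
                     properties=pika.BasicProperties(delivery_mode=2))
db.commit()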
If you're using SQL 2005+, you may want to investigate Service Broker. It's pretty much designed for this.

RabbitMQ Consumer Design for Multiple Exchange-Queue Model

I have a RabbitMQ setup with the following configuration:
Each exchange is a fanout exchange.
Multiple queues are attached to each exchange.
The consumer uses a BlockingConnection.
A single consumer handles all callbacks.
Problem -
Some payloads take longer to process than others, which ties up the single consumer while payloads sit waiting in the other queues.
Question -
How should I implement the consumer to avoid long waits? Should I run a separate consumer for each module? Any experience to share?
Can I configure RabbitMQ to handle these situations? If so, how?
First, it would be nice to know why you have more than one fanout exchange. Do you really need them? A fanout exchange sends messages to every queue bound to it...
Just have more consumers. Check the examples in the RabbitMQ tutorials.
You don't really need to configure RabbitMQ explicitly; everything can be done from the clients (publishers and subscribers). You just need to figure out how many exchanges you need, which type they should be, and so on.
First, what programming language are you using? Most common languages, such as Python, Java, and C#, support creating additional threads for parallel processing.
Let's say you consume the queue like below (pseudocode):
def callback(ch, method, properties, body):
    ...  # start a thread that runs threaded_function for this message

def threaded_function(ch, method, properties, body):
    ...  # do the long-running work here

channel.basic_qos(prefetch_count=3)
channel.basic_consume(callback, queue='task_queue')
channel.start_consuming()
First, setting prefetch_count=3 allows your consumer to have at most 3 unacknowledged messages outstanding at a time.
In the callback method, you should start a thread that executes each message with threaded_function. At the end of the threaded_function body, do:
ch.basic_ack(delivery_tag=method.delivery_tag)
That way, at most 3 messages are processed concurrently; even if one or two of the threads take a long time to run, the others can still pick up the next messages.
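Putting that together, a self-contained version of the sketch might look like the following (assuming pika 1.x and a local broker; the queue name task_queue and the time.sleep stand-in for real work are placeholders). One caveat: pika's BlockingConnection and its channels are not thread-safe, so rather than calling basic_ack directly from the worker thread, the ack is handed back to the connection's own thread via add_callback_threadsafe.
import functools
import threading
import time
import pika

def do_work(connection, ch, delivery_tag, body):
    time.sleep(5)  # stand-in for the long-running processing
    # Ack from the connection's thread, because channels are not thread-safe.
    connection.add_callback_threadsafe(
        functools.partial(ch.basic_ack, delivery_tag=delivery_tag))

def on_message(ch, method, properties, body, connection):
    worker = threading.Thread(
        target=do_work, args=(connection, ch, method.delivery_tag, body))
    worker.daemon = True
    worker.start()

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)
channel.basic_qos(prefetch_count=3)  # at most 3 unacked messages at a time
channel.basic_consume(
    queue="task_queue",
    on_message_callback=functools.partial(on_message, connection=connection))
channel.start_consuming()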

How to solve message disorder in RabbitMQ?

I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis' comment, in which he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in rabbitmq docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue
parameter (basic.recover, basic.reject and basic.nack), or due to a channel
closing while holding unacknowledged messages...With release 2.7.0 and later
it is still possible for individual consumers to observe messages out of
order if the queue has multiple subscribers. This is due to the actions of
other subscribers who may requeue messages. From the perspective of the queue
the messages are always held in the publication order."
If I need to handle messages in order, can I only use RabbitMQ with an exclusive queue per consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0. Messages can be returned to the queue using AMQP methods that feature a requeue parameter (basic.recover, basic.reject and basic.nack), or due to a channel closing while holding unacknowledged messages. Any of these scenarios caused messages to be requeued at the back of the queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
I hope this helps to illustrate the problem you are asking about. It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data needs both global ordering and partitioning, this does make things difficult. However, if you just need to make sure that all of the events for the same user, etc., end up in the same partition so that they are properly ordered, you may do so. The producer controls which partition it writes to, so if you are able to logically partition your data, this may be preferable.
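For illustration, here is a minimal sketch with kafka-python (the broker address, topic name, and key are assumptions): because the default partitioner hashes the message key, everything published with the same key lands in the same partition and therefore stays in order relative to itself.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for event in (b"signup", b"purchase", b"logout"):
    # Same key => same partition => these three events keep their order.
    producer.send("user-events", key=b"user-123", value=event)
producer.flush()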
I think there are two things in this question which are not similar, consumption order and processing order.
Message queues can, to a degree, guarantee that messages will be consumed in order; they cannot, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned, a consumer can fail while processing. Here the message's consumption order was correct, but the consumer failed to process it correctly, which sends it back to the queue. Up to this point the consumption order is still intact, but we no longer know what the processing order is.
If by "processing" we mean that the message is discarded and finished completely, then consider the case where your processing time is not uniform; in other words, processing one message takes longer than another. If message 3 takes longer to process than anticipated, messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which, by the way, violates the consumption order), you still cannot guarantee that all earlier messages have finished processing before the next one.
If you want to ensure the processing order then:
Have only 1 consumer instance at all times
Or don't use a message queue and do the processing in a synchronous, blocking method, which might sound bad, but in many cases and for many business requirements it is completely valid and sometimes even critical.
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the messages using a shared ExecutorService; see also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
If you use one Consumer with a single queue, you can bind this queue using multiple binding keys (which may contain wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example, say you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages, alternating between two different routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html

RabbitMQ - subscribe to message type as it gets created

I'm new to RabbitMQ and I'm wondering how to implement the following: a producer creates tasks for multiple sites, and a bunch of consumers should process these tasks one by one, but each talking to one site with a concurrency of 1, without starting a new task for that site before the previous one has ended. This way a slow site would be processed slowly, and the fast ones fast (as opposed to slow sites taking up all the worker capacity).
Ideally a site would be processed by only one worker at a time, with another worker taking over if that worker dies. This seems like a task for exclusive queues, but apparently there's no easy way to list and subscribe to new queues. What is the proper way to achieve this with RabbitMQ?
I think you may have things the wrong way round. For workers you have 1 or more producers sending to 1 exchange. The exchange has 1 queue (you can send directly to the queue, but all that is really doing is going via the default exchange; I prefer to be explicit). All consumers connect to the single queue and read off tasks in turn. You should set the queue to require messages to be ACKed before removing them. That way, if a process dies, its message is returned to the queue and picked up by the next consumer/worker.
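Sketched in Python with pika (names such as "tasks", "task_queue" and the process_task function are placeholders, not anything this answer prescribes), the layout looks roughly like this:
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# One explicit exchange, one shared queue, many competing consumers.
ch.exchange_declare(exchange="tasks", exchange_type="direct", durable=True)
ch.queue_declare(queue="task_queue", durable=True)
ch.queue_bind(queue="task_queue", exchange="tasks", routing_key="task")

def handle(ch_, method, properties, body):
    process_task(body)                               # hypothetical worker function
    ch_.basic_ack(delivery_tag=method.delivery_tag)  # only now is the message removed

# Manual acks: if this worker dies mid-task, its unacked message goes back
# to the queue and gets picked up by another consumer.
ch.basic_qos(prefetch_count=1)
ch.basic_consume(queue="task_queue", on_message_callback=handle, auto_ack=False)
ch.start_consuming()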

RabbitMQ queue order management

I'm currently implementing RabbitMQ for a tracking system with multiple front-end producers writing to the same queue.
Basically I have two types of messages that are sent to the queue, as the tracking workflow has two steps: impression/click => lead/sale.
It is very simple: the user clicks a banner, then performs an action on the website he was redirected to. This action can take from a few seconds to a few days to be done.
I need to consume the lead or sale AFTER the corresponding impression or click.
The problem is that I need to consume messages in chronological order. While everything should be fine if all producers write messages to the queue at the same speed (i.e. the messages would order properly in a FIFO way), I will have issues when one of the producers writes to the queue more slowly for some reason.
For example, if my lead action occurs one second after the click action, and the click producer stalls for a couple of seconds, I'll consume the lead before the click and my tracking system won't work.
I'd like to know how to set an order for a queue according to a header that's attached to the message.
All my servers are synchronized and their clocks have <1ns difference, so I'd like to order my queue according to this information, but I can't find anywhere in the docs a way to set up the queue order or consumption order.
Thanks for your help.
AMQP queues are FIFO queues. Under a high number of simultaneous publishes there may be some ambiguity about which message came first, so message one and message two might not end up in the queue in the same order in which they happened in the real world. It's a price you pay for HA and speed. If you want to know more about it, you can ask in the RabbitMQ IRC channel.
I think the queue is just that, a queue: first in, first out. Maybe you can sort the messages during consumption? I mean you take, for example, 10 messages from the queue, parse them, and put them into your own queue or list in the proper order.
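A minimal sketch of that idea with pika, assuming the producers attach a timestamp header (here called "ts") and that out-of-order arrivals only ever span a small window; the queue name and handle_event function are placeholders:
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

batch = []
for _ in range(10):
    method, properties, body = ch.basic_get(queue="tracking", auto_ack=False)
    if method is None:  # queue drained
        break
    ts = (properties.headers or {}).get("ts", 0)
    batch.append((ts, method.delivery_tag, body))

for ts, tag, body in sorted(batch):  # oldest event first
    handle_event(body)               # hypothetical processing function
    ch.basic_ack(delivery_tag=tag)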
