Asynchronous Tasks with Celery & Django

10th June 2016

What’s Celery?

 

Celery is an asynchronous task queue written in Python and based on distributed message passing. It lets you run time-consuming Python functions in the background. In other words, it lets you do the heavy lifting later or at a specific time.

Celery uses “brokers” to pass messages between your project and the Celery task queue. A task is a piece of code that will be run asynchronously.

I will try to explain how it works with a few code examples.
First, we connect to the broker and register a task:

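import requests  # third-party HTTP library
from celery import Celery

# Connect to a RabbitMQ broker on localhost using the default guest user.
app = Celery('tasks', broker='amqp://guest@localhost//')

# Hypothetical placeholder: replace with your real email-sending code.
def send_some_emails():
    pass

@app.task
def time_expensive_task(x, y):
    # x and y are example arguments; they travel inside the task message.
    send_some_emails()

@app.task
def collect_information_from_other_sites():
    # A slow network call that should not block a web request.
    requests.get("http://google.com")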

Now, from another module, we can invoke the task using one of several calling methods. Each one serializes the arguments passed to it and puts a message on the selected broker that contains them, along with a reference to the function that must be called.

from tasks import time_expensive_task

time_expensive_task.delay(4, 4)

# will execute 10 seconds from now
time_expensive_task.apply_async(args=[4, 4], countdown=10)

“delay” works like calling a normal function, but it doesn’t offer any special options. “apply_async” takes the task arguments as “args” and “kwargs” and accepts several execution options as keyword arguments, such as:

  • countdown – Number of seconds into the future that the task should execute. Defaults to immediate execution.
  • eta – A datetime object describing the absolute time and date of when the task should be executed. May not be specified if countdown is also supplied.
  • expires – Either an int, describing the number of seconds, or a datetime object that describes the absolute date and time when the task should expire. The task will not be executed after the expiration time.
  • retry – If enabled, sending of the task message will be retried in the event of connection loss or failure.
  • serializer – A string identifying the default serialization method to use. Can be pickle, json, yaml, msgpack or any custom serialization method that has been registered with kombu.serialization.registry. Defaults to the serializer attribute.
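As a rough illustration of how “eta” and “expires” fit together, here is a minimal sketch that builds both options with the standard library only. The helper name build_schedule_options and its parameters are made up for this example, and the commented call at the end assumes the time_expensive_task defined earlier and a running broker.

```python
from datetime import datetime, timedelta, timezone


def build_schedule_options(delay_seconds, ttl_seconds):
    """Build apply_async keyword options: start at `eta`, give up at `expires`."""
    # Timezone-aware UTC timestamps avoid ambiguity between machines.
    eta = datetime.now(timezone.utc) + timedelta(seconds=delay_seconds)
    expires = eta + timedelta(seconds=ttl_seconds)
    # countdown is deliberately absent: it may not be combined with eta.
    return {"eta": eta, "expires": expires}


options = build_schedule_options(delay_seconds=60, ttl_seconds=300)

# Assumed usage with the task defined earlier (needs a running broker):
# time_expensive_task.apply_async(args=[4, 4], **options)
```

With these values the task would become eligible to run one minute from now and would be discarded if no worker has picked it up five minutes after that.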

Then we need to run the workers. Each worker pulls messages from the broker and executes the tasks that were queued with “delay” or “apply_async”.

celery -A tasks worker --loglevel=info

Some use cases for Celery inside a Django app

 

Inside Django we can split a request into several asynchronous jobs: do only what is required inside the request/response flow and let Celery handle the non-essential tasks later. A common use case is sending emails, because email providers take around 500 ms to answer a request. Other uses are making concurrent calls to services or business logic, and updating the cache or search indexes after saving an object. Another common case is verifying information provided by the user when the check takes too long to run inside the request/response cycle. In this case we isolate the code in a task and return a code to the user, who can use it to poll for the status. Alternatively, we can implement some kind of push notification to let the user know that the process has finished.
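The “return a code and let the user poll” flow can be sketched in plain Python. This is only a stand-in and not Celery's API: a thread pool plays the role of the workers, and start_verification, poll_status and verify_user_data are hypothetical names invented for the example. In a real project the token would typically be the task id that “delay” returns, and the status would come from a configured result backend.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the Celery workers; NOT Celery's API.
executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # token -> Future; a real project would rely on a result backend


def verify_user_data(data):
    """Hypothetical slow verification step."""
    time.sleep(0.1)
    return bool(data.get("email"))


def start_verification(data):
    """Called inside the request: queue the work and return a token at once."""
    token = uuid.uuid4().hex
    jobs[token] = executor.submit(verify_user_data, data)
    return token


def poll_status(token):
    """Called by later requests: report progress without blocking."""
    future = jobs[token]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}


token = start_verification({"email": "user@example.com"})
first_answer = poll_status(token)  # most likely still pending
jobs[token].result()               # wait here only to keep the demo deterministic
final_answer = poll_status(token)
```

The request that starts the work returns immediately with the token, and every later poll is a cheap lookup, so no request ever waits for the slow verification.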

Some words on the Brokers

 

Let’s take a look at the brokers. First I will describe the Broker Pattern in a few words. It is an architectural pattern that can be used to structure distributed software systems with decoupled components that interact by remote service invocations. The broker component is responsible for coordinating communication, such as forwarding requests, as well as for transmitting results and exceptions.
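To make the pattern more concrete, here is a toy in-process sketch. It assumes nothing about how Celery or RabbitMQ work internally: a queue.Queue stands in for the broker, a thread stands in for a worker, and the names task, send, worker and registry are invented for the example. It shows a request being forwarded as a serialized message, and results and exceptions being transmitted back.

```python
import json
import queue
import threading

broker = queue.Queue()   # stand-in for RabbitMQ or Redis
results = queue.Queue()  # channel used to send outcomes back
registry = {}            # task name -> function, known to the worker


def task(func):
    """Register a function under its name so messages can refer to it."""
    registry[func.__name__] = func
    return func


@task
def add(x, y):
    return x + y


def send(name, *args):
    """Client side: serialize the call and hand it to the broker."""
    broker.put(json.dumps({"task": name, "args": args}))


def worker():
    """Worker side: pull messages, run the named task, report the outcome."""
    while True:
        raw = broker.get()
        if raw is None:  # sentinel used to stop this demo worker
            break
        message = json.loads(raw)
        try:
            value = registry[message["task"]](*message["args"])
            results.put(("ok", value))
        except Exception as exc:  # exceptions travel back as data
            results.put(("error", repr(exc)))


thread = threading.Thread(target=worker)
thread.start()
send("add", 4, 4)
send("add", 4, "four")  # int + str raises TypeError inside the worker
broker.put(None)
thread.join()

outcomes = [results.get(), results.get()]
```

The client only knows the task's name and the broker, never the worker itself, which is the decoupling the pattern is about.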


Here is a list of brokers that Celery can use, shown as a comparison table. Celery recommends RabbitMQ, but you can choose any of them.

 

 

Name         Status        Monitoring  Remote Control
RabbitMQ     Stable        Yes         Yes
Redis        Stable        Yes         Yes
MongoDB      Experimental  Yes         Yes
Beanstalk    Experimental  No          No
Amazon SQS   Experimental  No          No
CouchDB      Experimental  No          No
Zookeeper    Experimental  No          No
Django DB    Experimental  No          No
SQLAlchemy   Experimental  No          No
IronMQ       3rd party     No          No

 

Missing monitor support means that the transport does not implement events, and as such Flower, celery events, celerymon and other event-based monitoring tools will not work.

Remote control means the ability to inspect and manage workers at runtime using the celery inspect and celery control commands (and other tools using the remote control API).

Although there is no big difference between the two stable brokers, in my experience RabbitMQ performed better under high load.