Retry after errors, with exponential backoff (in Ruby)

Some errors are only temporary. Let's say you connect to a remote service, like a database or an API over HTTP. An error raised by your client is not always permanent. It might be a network glitch or something else that goes away if you try again a bit later.

Here is an attempt (in Ruby) to retry on error, with a longer sleep time between attempts.

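class WhateverException < StandardError; end
debug_counter = 0

sleep_times = [0.1, 0.2, 0.5, 1]
begin
    fail WhateverException, "counter=#{debug_counter += 1}"
rescue WhateverException
    # pick the sleep time for the current attempt (nil when none is left)
    if time = sleep_times[(nb_retries ||= 0)]
        sleep time
        puts "retry #{nb_retries} after #{time}s"
        nb_retries += 1
        retry
    else
        raise # no attempt left: re-raise the original exception
    end
end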

The first two lines are just context: an exception class and a counter for debugging purposes.

sleep_times = [0.1, 0.2, 0.5, 1] is an array of times in seconds that I want to wait at each attempt.
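
The values above are hand-picked. For a strictly exponential backoff, the array can also be computed. Here is a minimal sketch, assuming a base delay of 0.1s that doubles at each attempt (backoff_times and its defaults are hypothetical names and values, not part of the script above):

```ruby
# Hypothetical helper: build a list of sleep times that grows by `factor`
# at each attempt. base, factor and attempts are illustrative defaults.
def backoff_times(base: 0.1, factor: 2, attempts: 4)
  Array.new(attempts) { |i| base * factor**i }
end

sleep_times = backoff_times
p sleep_times
```

Keeping the result as a plain array means the rest of the script does not change.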

The begin/rescue block lets us rescue the exception when it occurs, and it is also what makes the retry possible (see below).

When an expected exception occurs, Ruby executes the body of the rescue part. It takes the sleep time for the current attempt, waits that long, prints a debug line (that you'll want to remove or change to an audit log message), increments the number of attempts and executes the retry statement.

A retry statement rolls back to the previous begin block and executes it again, without any condition. That's why we have to deal with a maximum number of attempts or it will loop forever.

If we reach the end of the sleep_times array of times, Ruby will return nil and the if condition will fail. The original exception is raised again, as is.
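
If you need this in several places, the same pattern can be wrapped in a method. This is only a sketch: with_retries and its errors: option are hypothetical names, and the sleep times are set to 0 so that the usage example runs instantly.

```ruby
# Hypothetical helper wrapping the begin/rescue/retry pattern.
# `errors` lists the exception classes considered transient.
def with_retries(sleep_times, errors: [StandardError])
  attempt = 0
  begin
    yield attempt
  rescue *errors
    time = sleep_times[attempt]
    raise unless time # no sleep time left: re-raise the original exception
    sleep time
    attempt += 1
    retry
  end
end

# Usage: the block fails twice, then succeeds on the third call.
calls = 0
result = with_retries([0, 0, 0], errors: [IOError]) do
  calls += 1
  fail IOError, "transient" if calls < 3
  :ok
end
```

Listing the exception classes explicitly keeps unexpected errors from being retried by accident.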

Here is the output of this script:

ruby ~/tmp/retry.rb
retry 0 after 0.1s
retry 1 after 0.2s
retry 2 after 0.5s
retry 3 after 1s
/Users/jlecour/tmp/retry.rb:6:in `<main>': counter=5 (WhateverException)

Remember that in Ruby raise and fail are exactly the same method, but as Jim Weirich said:

Because I use exceptions to indicate failures, I almost always use the "fail" keyword rather than the "raise" keyword in Ruby. Fail and raise are synonyms so there is no difference except that "fail" more clearly communicates that the method has failed. The only time I use "raise" is when I am catching an exception and re-raising it, because here I'm not failing, but explicitly and purposefully raising an exception.


Rsync to just delete files on destination when missing from source

I have a huge number of images (about 50 million, with 3-4 versions of each one), organized in a nested tree of directories, like images/103/045/475/example-{format}.jpg.

This immense catalog of images is replicated from our internal "master" to a CDN-like box. Sometimes the replication gets out of sync and some images are destroyed on the master but not on the slave.

It's not a surprise that Rsync has the right set of options to deal with this:

rsync --recursive --delete --ignore-existing --existing --prune-empty-dirs --verbose src/ dst/

Let me explain each option.

--recursive will explore the whole directory tree, not just the first level.

--delete will remove files in dst that are not in src.

--ignore-existing will not update any file in dst.

--existing will not create any file in dst.

--prune-empty-dirs will also remove empty directories in dst, instead of only deleting files.

--verbose will log what it does.

By not trying to compare the files, it’s much faster, but of course it’s only cleanup, not a real synchronization.

You can also run this first with --dry-run, which prints each action instead of executing it, to verify that Rsync does what you want.
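
If the cleanup is launched from a Ruby script, the dry run can be made the default. A minimal sketch, assuming rsync is in the PATH (rsync_cleanup_command is a hypothetical helper; the options are exactly the ones listed above):

```ruby
# Hypothetical helper: build the rsync cleanup command as an argument array,
# with --dry-run enabled unless explicitly turned off.
def rsync_cleanup_command(src, dst, dry_run: true)
  command = %w[rsync --recursive --delete --ignore-existing --existing
               --prune-empty-dirs --verbose]
  command << "--dry-run" if dry_run
  command + [src, dst]
end

# Review the dry run first, then run it for real:
#   system(*rsync_cleanup_command("src/", "dst/"))
#   system(*rsync_cleanup_command("src/", "dst/", dry_run: false))
```

Passing the command to system as separate arguments avoids any shell quoting issue with the paths.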

The complete list of options is available in the man page.


How to use a different ActiveRecord connection pool between Unicorn and Sidekiq?

I work on a Rails (4.1) application, sitting behind Unicorn and backed by a couple of Sidekiq processes.

A quick reminder: Unicorn is an application server (for Rack-compatible Ruby applications) based on the multi-process master-workers model, and Sidekiq is a background job processor based on the multi-threaded model.

The jobs we put in the Sidekiq queue are also multi-threaded and need to access the database behind ActiveRecord in parallel. This forces us to have a bigger connection pool than usual: 20 instead of the default 5.

But the main part of the application, which runs solely inside Unicorn, doesn't need that many connections. We have 16 Unicorn workers, each with its own pool, so the bigger the pool is, the more connections can be opened.

Even if they are mainly idle, it's still a waste of resources and it makes monitoring resource consumption more difficult.
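
To put numbers on it, here is the arithmetic as a tiny sketch (max_connections is a hypothetical helper; it computes an upper bound, since the pool size is a maximum and not a reservation):

```ruby
# Upper bound on the connections opened by Unicorn: every worker is a
# separate process with its own pool.
def max_connections(workers, pool_size)
  workers * pool_size
end

puts max_connections(16, 5)   # 80
puts max_connections(16, 20)  # 320
```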

Here is how I managed to have a separate pool size for Sidekiq and Unicorn.


A typical Unicorn configuration contains something like this:

before_fork do |server, worker|
  # the following is recommended for Rails + "preload_app true"
  # as there's no need for the master process to hold a connection
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
end

Before forking, the master process closes all its connections, and after forking, every worker reopens its own.

ActiveRecord::Base.establish_connection can be called with a connection "name" if there is an entry by that name in the config/database.yml file.

Let's create such a configuration, by overriding only what's necessary:

production: &production
  adapter: mysql2
  host: localhost
  port: 3306
  database: database_name
  username: my_app
  password: password
  encoding: utf8
  reconnect: true
  pool: 5

production_unicorn:
  <<: *production
  username: my_app_unicorn
  pool: 5

You can see that I've kept the same pool value as the default. Declaring it anyway makes it easier to give Unicorn a setting that differs from the default one, which is still used when I run Rake tasks, for example.

I've also changed the username because I want to tell them apart when looking at the open database connections. It's completely optional.

Then we need to configure Unicorn to use that new configuration:

after_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    spec = "#{Rails.env}_unicorn"
    if Rails.application.config.database_configuration.key?(spec)
      ActiveRecord::Base.establish_connection(spec.to_sym)
    end
  end
end


For Sidekiq it's quite similar. We begin by adding another override in the config/database.yml file:

production_sidekiq:
  <<: *production
  username: my_app_sidekiq
  pool: 20

NB: I set Sidekiq up in the config/initializers/sidekiq.rb file, but you can also put this in many other places. Consult Sidekiq's documentation for more details.

Let's use our custom connection inside Sidekiq:

Sidekiq.configure_server do |config|
  if defined?(ActiveRecord::Base)
    spec = "#{Rails.env}_sidekiq"
    if Rails.application.config.database_configuration.key?(spec)
      ActiveRecord::Base.establish_connection(spec.to_sym)
    end
  end
end
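
The Unicorn and Sidekiq hooks share the same lookup, so it could be extracted. A minimal sketch in plain Ruby, where connection_spec is a hypothetical helper and the hash stands in for Rails.application.config.database_configuration:

```ruby
# Hypothetical helper: pick "<env>_<role>" when config/database.yml defines
# it, otherwise fall back to the plain environment name.
def connection_spec(env, role, configurations)
  candidate = "#{env}_#{role}"
  configurations.key?(candidate) ? candidate : env
end

# Stand-in for Rails.application.config.database_configuration
configurations = {
  "production"         => { "pool" => 5 },
  "production_unicorn" => { "pool" => 5 },
  "production_sidekiq" => { "pool" => 20 }
}

puts connection_spec("production", "sidekiq", configurations) # production_sidekiq
puts connection_spec("staging", "sidekiq", configurations)    # staging
```

In the hooks, the result would be passed to ActiveRecord::Base.establish_connection as a symbol.
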

Monit, Unicorn and UTF-8

I’ve spent an entire afternoon trying to debug an issue we’ve been having with one of our Rails apps.

It is deployed on a Linux server, where Monit is in charge of supervising the Unicorn processes.

A quick reminder:

  • Unicorn is a Rack-compatible application server (comparable to Passenger, Puma, Thin, …);
  • Monit is a daemon configured to look after resources, able to execute commands if a resource is not in the correct state. For example you can start/stop another daemon with Monit;
  • Rails is a framework written in Ruby.

With one (and only one) of our many Rails apps, but on every front end server, we couldn’t start or restart Unicorn with Monit.


Usually when something doesn’t start (or crashes) you look at the log files, or you try to pipe the standard error output to something you can read. But with Monit it’s another world of debugging pain that you’re about to enter.

Probably for many good reasons, Monit does not execute your commands as is. It can execute them as another user, but the environment is almost completely blank (except for a few MONIT_XXX variables). The PATH is unset, like most of what you're used to relying on.

A common trick is to start the command with a shell invocation, followed by your command. For example: /bin/bash -c 'my_start_command', or even /bin/bash -c 'PATH=./bin:$PATH KEY=val my_start_command'.

Our issue

So our app wasn't starting when launched by Monit, but it started fine when I ran the exact start command from the Monit configuration by hand. A typo wasn't the issue.

The complete start command looks like this:

/bin/sh -c 'PATH=/home/user/.rbenv/bin:/home/user/.rbenv/shims:$PATH /home/user/app/current/bin/unicorn -E staging -c /home/user/app/current/config/unicorn/staging.rb -D'

I needed a way to see the output of this command. I found a good solution: a wrapper script. I put the wrapper call between my ENV setup and the execution of Unicorn's binary.

My command was then:

/bin/sh -c 'PATH=/home/user/.rbenv/bin:/home/user/.rbenv/shims:$PATH /usr/local/bin/ /home/user/app/current/bin/unicorn -E staging -c /home/user/app/current/config/unicorn/staging.rb -D'
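
The wrapper itself is not reproduced here. To give a rough idea of what such a wrapper does, here is a minimal Ruby sketch (run_and_log and the log path in the usage comment are hypothetical, this is not the script used above): it runs a command and appends a bit of the environment and the combined output to a log file.

```ruby
require "open3"

# Hypothetical wrapper: run a command, then append LANG, PATH and the
# combined stdout/stderr of the command to a log file.
def run_and_log(command, log_path)
  output, status = Open3.capture2e(*command)
  File.open(log_path, "a") do |log|
    log.puts "LANG=#{ENV['LANG'].inspect} PATH=#{ENV['PATH'].inspect}"
    log.puts "$ #{command.join(' ')}"
    log.puts output
    log.puts "exit status: #{status.exitstatus}"
  end
  status
end

# Possible use as a script, with the real command passed as arguments:
#   run_and_log(ARGV, "/tmp/monit_wrapper.log")
```

Logging the environment next to the output is what makes a missing variable visible.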

In the log file I’ve found this :

/home/user/app/shared/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/configurator.rb:664:in `parse_rackup_file':
invalid byte sequence in US-ASCII (ArgumentError)

A rackup_file is usually a file (written in Ruby) to configure the Rack part of the application.

But our rackup file began with # encoding: UTF-8, which is the Ruby way to say that the content of the file is UTF-8 encoded.
For a reason beyond my knowledge, Ruby was not using this information when Unicorn was (simply) reading its content.

The solution

Looking closely at the Monit wrapper's output, I noticed that the LANG environment variable was not set (removed by Monit), so Ruby was falling back to US-ASCII for its external encoding.

We've been using Monit + Rails + Unicorn for years across a dozen projects, without any issue. Why now?

On this app, it was blowing up because of UTF-8 characters in the rackup file. If we removed them, it would start normally, but we needed them.
It turns out that Ruby or Rails takes care of setting the correct encoding during the normal execution of the process. But when Unicorn was parsing this rackup file very early on, it only used the information it had from its environment.
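
The error itself is easy to reproduce outside of Unicorn. In this small sketch, the same UTF-8 bytes are tagged once as UTF-8 and once as US-ASCII, which is what happens to file contents when the external encoding falls back to US-ASCII (the sample string is made up):

```ruby
# The same bytes, tagged with two different encodings.
utf8  = "# caf\u00e9 menu"
ascii = utf8.dup.force_encoding(Encoding::US_ASCII)

puts utf8.valid_encoding?   # true
puts ascii.valid_encoding?  # false

message = nil
begin
  ascii =~ /menu/           # a simple regexp match is enough to fail
rescue ArgumentError => e
  message = e.message
end
puts message                # invalid byte sequence in US-ASCII
```

Nothing is wrong with the bytes; only the encoding Ruby assumes for them differs.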

Adding an explicit LANG=en_US.UTF-8 in the start command solved the issue.

/bin/sh -c 'PATH=/home/user/.rbenv/bin:/home/user/.rbenv/shims:$PATH LANG=en_US.UTF-8 /home/user/app/current/bin/unicorn -E staging -c /home/user/app/current/config/unicorn/staging.rb -D'

The good thing is that it’s modifying the environment only when Monit starts the app, without changing anything in the app itself.

I hope this explanation will help someone else and save them from losing a handful of hours.


How to rebuild native Ruby gems after a lib/system upgrade?

When you install Ruby gems that have native extensions (usually developed in C and compiled), the extension sometimes binds itself to a system library, for example Nokogiri with LibXML.

After a library upgrade (security minor update, or a full system upgrade), your gem has to be rebuilt to use the new library's interface.

Most of the time, the gem will continue to work just fine, but will print a message like this:

WARNING: Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.8.0

So how can we rebuild the gem?

You could upgrade the gem’s version in your application’s dependency specifications (usually Bundler’s Gemfile), and redeploy your app. It’s ok, but what if you don’t want to change the version?

The best way would be to rebuild the existing gem and restart your app. There is a gem pristine --all command available. I tried it but it didn't solve my issue. Maybe the rebuild process doesn't change the library paths, or something else. If you know about this, let me know too ;)

You could uninstall the gem and redeploy your app. If you’re using a deployment mechanism that checks for dependencies, it will reinstall the gem and build it against the latest library. But in the meantime, the gem is missing and your app might break.

Finally, you could uninstall the gem and reinstall it right away. On a modern server, it's a matter of tens of seconds.

The usual steps:

$ gem list | grep nokogiri
nokogiri (1.5.11)
$ gem uninstall --executables --ignore-dependencies nokogiri
Removing nokogiri
Successfully uninstalled nokogiri-1.5.11
$ gem install nokogiri -v 1.5.11
Fetching: nokogiri-1.5.11.gem (100%)
Building native extensions. This could take a while...
Successfully installed nokogiri-1.5.11
1 gem installed

The --executables --ignore-dependencies flags on uninstall disable confirmation for removing the gem executable(s) and dependency warnings.

[Edit] The --ignore-dependencies flag is better than --force. It is also compatible with Rubygems 1.8.

If you're using something like Bundler and Capistrano, there is an additional issue: the gems are not installed in your traditional Ruby paths, they are in your application's shared directory.

Here is the gem environment of one of my users :

$ gem env
RubyGems Environment:
  - RUBY VERSION: 2.1.1 (2014-02-24 patchlevel 76) [x86_64-linux]
  - INSTALLATION DIRECTORY: /home/jlecour/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0
  - RUBY EXECUTABLE: /home/jlecour/.rbenv/versions/2.1.1/bin/ruby
  - EXECUTABLE DIRECTORY: /home/jlecour/.rbenv/versions/2.1.1/bin
  - SPEC CACHE DIRECTORY: /home/jlecour/.gem/specs
  - RUBYGEMS PLATFORMS:
    - ruby
    - x86_64-linux
  - GEM PATHS:
    - /home/jlecour/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0
    - /home/jlecour/.gem/ruby/2.1.0
  - GEM CONFIGURATION:
    - :update_sources => true
    - :verbose => true
    - :backtrace => false
    - :bulk_threshold => 1000
    - "gem" => "--no-rdoc --no-ri"
    - :sources => [""]
  - SHELL PATH:
    - /home/jlecour/.rbenv/versions/2.1.1/bin
    - /home/jlecour/.rbenv/libexec
    - /home/jlecour/.rbenv/plugins/rbenv-vars/bin
    - /home/jlecour/.rbenv/plugins/ruby-build/bin
    - /home/jlecour/.rbenv/shims
    - /home/jlecour/.rbenv/bin
    - /home/jlecour/bin
    - /usr/local/bin
    - /usr/bin
    - /bin
    - /usr/local/games
    - /usr/games

But my application's gems are in /home/jlecour/apps/example/shared/bundle/ruby/2.1.0/.

To tell the gem command to use this location, you have to set the GEM_HOME environment variable for each call (or export it at the beginning, but remember to unset it afterwards).

The sequence becomes this:

$ export GEM_HOME=/home/jlecour/apps/example/shared/bundle/ruby/2.1.0/
$ gem list | grep nokogiri
nokogiri (1.5.11)
$ gem uninstall --executables --ignore-dependencies nokogiri
Removing nokogiri
Successfully uninstalled nokogiri-1.5.11
$ gem install nokogiri -v 1.5.11
Fetching: nokogiri-1.5.11.gem (100%)
Building native extensions. This could take a while...
Successfully installed nokogiri-1.5.11
1 gem installed
$ unset GEM_HOME

Then you just have to restart your application and voilà.
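
If you'd rather not export anything in your shell, the same sequence can be driven from Ruby with GEM_HOME set for each call only. This is a sketch: rebuild_commands and rebuild_gem are hypothetical helpers that run the same two gem commands as above.

```ruby
# Hypothetical helper: the two gem commands, as argument arrays.
def rebuild_commands(name, version)
  [
    ["gem", "uninstall", "--executables", "--ignore-dependencies", name],
    ["gem", "install", name, "-v", version]
  ]
end

# Passing a Hash as the first argument of Kernel#system sets GEM_HOME for
# the child process only, so there is nothing to unset afterwards.
# Stops at the first command that fails and returns false in that case.
def rebuild_gem(name, version, gem_home)
  rebuild_commands(name, version).all? do |command|
    system({ "GEM_HOME" => gem_home }, *command)
  end
end

# Example call, with the path from this post:
#   rebuild_gem("nokogiri", "1.5.11",
#               "/home/jlecour/apps/example/shared/bundle/ruby/2.1.0/")
```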


Elasticsearch: stored scripts for bulk updates

I’ve been trying to improve my game with Elasticsearch and found myself in a situation where I needed to update thousands of records in an index. Some of those records, depending on existing field values, wouldn’t need to be updated, but it couldn’t be determined without getting those records first.

Given the number of records and the fact that a lot of similar operations would take place concurrently, the chance of race conditions was high.

Then I heard about scripts that are available in bulk update requests. Here is a very simple example:

{"script":"ctx._source.counter += value","params":{"value":10}}
{"script":"ctx._source.counter += value","params":{"value":4}}

Scripts are really useful. You can use MVEL (the basic/default embedded language), or JavaScript, native Java and even Python. Some have even managed to use JRuby scripts.
Using a script is slower than not, but you can save some network roundtrips and let Elasticsearch decide if and how the record must be updated.
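
In a bulk request, each of those script lines follows an action line that names the document to update. Here is a minimal Ruby sketch that builds such a body (bulk_update_body is a hypothetical helper, and the index name, type name and ids are made up for the example):

```ruby
require "json"

# Hypothetical helper: build the newline-delimited body of a bulk request
# where every document gets the same script with its own parameters.
def bulk_update_body(index, type, script, params_by_id)
  lines = params_by_id.flat_map do |id, params|
    [
      { "update" => { "_index" => index, "_type" => type, "_id" => id } },
      { "script" => script, "params" => params }
    ]
  end
  # One JSON document per line, with a final newline.
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

body = bulk_update_body(
  "my_index", "my_type", "ctx._source.counter += value",
  { "1" => { "value" => 10 }, "2" => { "value" => 4 } }
)
puts body
```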

If the script is the same for a lot of updates, you can also choose to store it on the node and just reference it in the update action.

You can store it in config/scripts (you might have to create this). The base directory depends on your installation. The .deb package puts it in /etc/elasticsearch/. The homebrew package puts it in /usr/local/Cellar/elasticsearch/_version_/config/scripts.

You can create a file config/scripts/myscript.mvel that must be accessible to the user who runs the elasticsearch process.

Your update action can be changed to:

{"script":"myscript","params":{"value":4}}

Be careful with underscores in script names since Elasticsearch uses them to map to a nested directory structure. For example {"script":"my_perfect_script","params":{"counter":4}} will look for a script config/scripts/my/perfect/script.mvel.
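
The mapping described above can be written down in a couple of lines. This only mirrors the behavior as I described it, it is not taken from Elasticsearch's code (script_path is a hypothetical helper):

```ruby
# Hypothetical helper: where a stored script would be looked up, assuming
# each underscore in its name becomes a directory separator.
def script_path(name, extension = "mvel")
  File.join("config", "scripts", "#{name.tr('_', '/')}.#{extension}")
end

puts script_path("my_perfect_script") # config/scripts/my/perfect/script.mvel
puts script_path("myscript")          # config/scripts/myscript.mvel
```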

I’ve not verified this (yet) but it seems that the script must be copied on every node and the server might automatically reload scripts regularly. Check the documentation for details.

According to the documentation and other sources, MVEL is really convenient and easy to write (mine was done in a matter of minutes, as a first-time experience) but can be a little slow. When speed really matters, you can write native Java code. There is a lot more boilerplate code to write (it's Java, right?) and the script must implement a predefined interface. I haven't done this yet and will definitely post a follow-up if I do.

At first I had issues with the stored script. It was not named properly, or it contained bugs, but the error message was less than informative. I found the solution in Elasticsearch's log file: at startup it complains if the script can't be compiled (at least with MVEL scripts).


Taking part in a coderetreat

This year I am organizing a coderetreat in Marseille for the Global Day of CodeRetreat, on December 14, 2013.

Since I announced the event, I have been asked many questions about what a coderetreat is from the participants' point of view.

I was lucky enough to take part in one coderetreat and then to facilitate two others, so I can talk about it from both sides.

What a coderetreat is

A coderetreat is a community event (in the sense of bringing people together, not of excluding anyone) where developers meet around the fundamentals of their craft.

It lasts a whole day, usually a Saturday from 8:30 am to 5:30 or 6 pm (which is the case this time), and sometimes two days.

It is free for participants (sponsors cover the logistics costs) so that it is as open as possible.

Willingness and motivation are essential to get the most out of a day that is as rewarding as it is demanding.

It immerses participants in an atmosphere of experimentation, of questioning habits, of honing practices, … away from any production or delivery pressure.

If we were musicians or actors, it would look more like a practice session than a performance.

It is a time to experiment with and get better at Test Driven Development, the SOLID principles and the XP rules of Simple Design.

It is a full day of pair programming, during which you change partners (and, if possible, tools) at every session, and the code is systematically deleted from one session to the next.

What a coderetreat is not

A coderetreat is not a training course. You don't come to listen to an instructor talk and read slides.

Nor is it a guided workshop, with a precise agenda and a goal to reach.

No technology is imposed, or even studied for its own sake.

It is not a casual activity that you drop into just to watch or to have a go. Taking part is a commitment and requires strong motivation: not because of the difficulty (every level will get something out of it), but because the individual and collective benefit is proportional to that motivation.

Who a coderetreat is for

So it is an event for developers who like to question the way they write code, look for different approaches to common problems, broaden their technical horizons, … in short, for those who pursue quality in their practice of development.
