How to rebuild native Ruby gems after a lib/system upgrade?

When you install Ruby gems that have native extensions (usually written in C and compiled at install time), the extension is sometimes linked against a system library, for example Nokogiri with LibXML.

After a library upgrade (a minor security update or a full system upgrade), your gem has to be rebuilt to use the new library’s interface.

Most of the time, the gem will continue to work just fine, but it will print a message like this:

WARNING: Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.8.0

So how can we rebuild the gem?

You could upgrade the gem’s version in your application’s dependency specifications (usually Bundler’s Gemfile), and redeploy your app. It’s ok, but what if you don’t want to change the version?

The best way would be to rebuild the existing gem and restart your app. There is a gem pristine --all command available. I’ve tried it, but it didn’t solve my issue. Maybe the rebuild process doesn’t change the library paths, or something else is going on. If you know more about this, let me know too ;)

You could uninstall the gem and redeploy your app. If you’re using a deployment mechanism that checks for dependencies, it will reinstall the gem and build it against the latest library. But in the meantime, the gem is missing and your app might break.

Finally, you could briefly uninstall the gem and reinstall it. On a modern server, it’s a matter of tens of seconds.

The usual steps:

$ gem list | grep nokogiri
nokogiri (1.5.11)
$ gem uninstall --executables --ignore-dependencies nokogiri
Removing nokogiri
Successfully uninstalled nokogiri-1.5.11
$ gem install nokogiri -v 1.5.11
Fetching: nokogiri-1.5.11.gem (100%)
Building native extensions. This could take a while...
Successfully installed nokogiri-1.5.11
1 gem installed

The --executables and --ignore-dependencies flags on uninstall skip the confirmation prompts for removing the gem’s executable(s) and for breaking dependencies.
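The two commands can be wrapped in a small shell function. This is only a sketch: reinstall_gem is a hypothetical helper, not part of RubyGems, and it adds --version to both commands so that, when several versions of the gem are installed, only the one you name is removed and rebuilt.

```shell
# reinstall_gem NAME VERSION (hypothetical helper)
# Uninstalls one exact version of a gem, then installs that same version
# again, which recompiles its native extension against the system
# libraries currently installed.
reinstall_gem() {
  name="$1"
  version="$2"
  # --executables / --ignore-dependencies: no confirmation prompts
  # --version: only touch this version if several are installed
  gem uninstall --executables --ignore-dependencies --version "$version" "$name" &&
    gem install "$name" --version "$version"
}
```

For example, reinstall_gem nokogiri 1.5.11. The && means nothing is installed if the uninstall step fails.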

[Edit] The --ignore-dependencies flag is better than --force. It is also compatible with RubyGems 1.8.

If you’re using something like Bundler and Capistrano, there is an additional issue: the gems are not installed in your usual Ruby paths, but in your application’s shared directory.

Here is the gem environment of one of my users :

$ gem env
RubyGems Environment:
  - RUBYGEMS VERSION: 2.2.2
  - RUBY VERSION: 2.1.1 (2014-02-24 patchlevel 76) [x86_64-linux]
  - INSTALLATION DIRECTORY: /home/jlecour/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0
  - RUBY EXECUTABLE: /home/jlecour/.rbenv/versions/2.1.1/bin/ruby
  - EXECUTABLE DIRECTORY: /home/jlecour/.rbenv/versions/2.1.1/bin
  - SPEC CACHE DIRECTORY: /home/jlecour/.gem/specs
  - RUBYGEMS PLATFORMS:
    - ruby
    - x86_64-linux
  - GEM PATHS:
     - /home/jlecour/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0
     - /home/jlecour/.gem/ruby/2.1.0
  - GEM CONFIGURATION:
     - :update_sources => true
     - :verbose => true
     - :backtrace => false
     - :bulk_threshold => 1000
     - "gem" => "--no-rdoc --no-ri"
     - :sources => ["http://rubygems.org"]
  - REMOTE SOURCES:
     - http://rubygems.org
     - http://gems.wishbed.net
  - SHELL PATH:
     - /home/jlecour/.rbenv/versions/2.1.1/bin
     - /home/jlecour/.rbenv/libexec
     - /home/jlecour/.rbenv/plugins/rbenv-vars/bin
     - /home/jlecour/.rbenv/plugins/ruby-build/bin
     - /home/jlecour/.rbenv/shims
     - /home/jlecour/.rbenv/bin
     - /home/jlecour/bin
     - /usr/local/bin
     - /usr/bin
     - /bin
     - /usr/local/games
     - /usr/games

But my application’s gems are in /home/jlecour/apps/example/shared/bundle/ruby/2.1.0/.

To tell the gem command to use this location, you have to set the GEM_HOME environment variable for each call (or export it once at the beginning, but remember to unset it afterwards).

The sequence becomes:

$ export GEM_HOME=/home/jlecour/apps/example/shared/bundle/ruby/2.1.0/
$ gem list | grep nokogiri
nokogiri (1.5.11)
$ gem uninstall --executables --ignore-dependencies nokogiri
Removing nokogiri
Successfully uninstalled nokogiri-1.5.11
$ gem install nokogiri -v 1.5.11
Fetching: nokogiri-1.5.11.gem (100%)
Building native extensions. This could take a while...
Successfully installed nokogiri-1.5.11
1 gem installed
$ unset GEM_HOME
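To avoid forgetting the final unset, the variable can live in a subshell instead. A minimal sketch, where with_gem_home is a hypothetical helper name:

```shell
# with_gem_home DIR COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND with GEM_HOME set to DIR, inside a subshell, so the
# variable never leaks into (or clobbers) the calling shell.
with_gem_home() {
  (
    GEM_HOME="$1"
    export GEM_HOME
    shift
    "$@"
  )
}
```

For example, with_gem_home /home/jlecour/apps/example/shared/bundle/ruby/2.1.0 gem list nokogiri lists the gem in the application’s bundle, and the uninstall and install commands above can be prefixed the same way. Since the subshell exits with the command, there is nothing to unset afterwards.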

Then you just have to restart your application and voilà.

Posted in Computing

Elasticsearch: stored scripts for bulk updates

I’ve been trying to improve my game with Elasticsearch and found myself in a situation where I needed to update thousands of records in an index. Some of those records, depending on existing field values, wouldn’t need to be updated, but that couldn’t be determined without fetching those records first.

Given the number of records and the fact that many similar operations would run concurrently, the risk of race conditions was high.

Then I heard about scripts, which are available in bulk update requests. Here is a very simple example:

POST http://127.0.0.1:9200/_bulk
{"update":{"_index":"my_index","_type":"my_type","_id":"id1"}}
{"script":"ctx._source.counter += counter","params":{"counter":10}}
{"update":{"_index":"my_index","_type":"my_type","_id":"id2"}}
{"script":"ctx._source.counter += counter","params":{"counter":4}}
…
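Writing such a body by hand quickly becomes tedious. Here is a minimal sketch that generates the two lines per document (action, then script) from "id count" pairs read on standard input. build_bulk_updates is a hypothetical helper, and it assumes the script and IDs contain no double quotes or backslashes, since nothing is JSON-escaped.

```shell
# build_bulk_updates INDEX TYPE SCRIPT (hypothetical helper)
# Reads "ID COUNT" pairs on stdin and prints a _bulk body: one action line
# and one script line per document, each terminated by a newline.
# Assumption: values need no JSON escaping (no quotes or backslashes).
build_bulk_updates() {
  index="$1"
  type="$2"
  script="$3"
  while read -r id count; do
    printf '{"update":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$index" "$type" "$id"
    printf '{"script":"%s","params":{"counter":%s}}\n' "$script" "$count"
  done
}
```

For instance, printf 'id1 10\nid2 4\n' | build_bulk_updates my_index my_type 'ctx._source.counter += counter' > bulk.json, then curl -XPOST 'http://127.0.0.1:9200/_bulk' --data-binary @bulk.json. Use --data-binary rather than -d: when reading from a file, curl’s -d strips the newlines the bulk format relies on.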

Scripts are really useful. You can use MVEL (the basic, default embedded language), JavaScript, native Java and even Python. Some people have even managed to use JRuby scripts.
Using a script is slower than a plain update, but it saves network round trips and lets Elasticsearch decide whether and how the record must be updated.

If the script is the same for a lot of updates, you can also choose to store it on the node and just reference it in the update action.

You can store it in config/scripts (you might have to create this directory). The base directory depends on your installation: the .deb package uses /etc/elasticsearch/ (so scripts go in /etc/elasticsearch/scripts), and the Homebrew package uses /usr/local/Cellar/elasticsearch/_version_/config/scripts.

Create a file config/scripts/myscript.mvel, readable by the user who runs the Elasticsearch process.
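As a sketch, assuming a base directory such as /etc/elasticsearch (adjust it for your installation), the file could be created like this. install_mvel_script is a hypothetical helper, and the script body reads the counter parameter passed by the requests that follow.

```shell
# install_mvel_script CONF_DIR NAME BODY (hypothetical helper)
# Writes BODY to CONF_DIR/scripts/NAME.mvel and makes it world-readable,
# so that the user running Elasticsearch can load it.
install_mvel_script() {
  conf_dir="$1"
  name="$2"
  body="$3"
  mkdir -p "$conf_dir/scripts" || return 1
  printf '%s\n' "$body" > "$conf_dir/scripts/$name.mvel" || return 1
  chmod a+r "$conf_dir/scripts/$name.mvel"
}
```

For example, install_mvel_script /etc/elasticsearch myscript 'ctx._source.counter += counter', run as a user allowed to write to that directory.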

Your update actions can then be changed to:

POST http://127.0.0.1:9200/_bulk
{"update":{"_index":"my_index","_type":"my_type","_id":"id1"}}
{"script":"myscript","params":{"counter":10}}
{"update":{"_index":"my_index","_type":"my_type","_id":"id2"}}
{"script":"myscript","params":{"counter":4}}

Be careful with underscores in script names, since Elasticsearch uses them to map to a nested directory structure. For example, {"script":"my_perfect_script","params":{"counter":4}} will look for the script config/scripts/my/perfect/script.mvel.
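To check in advance where a given name will be looked up, the same underscore-to-slash mapping can be applied by hand. This sketch only mirrors the rule described above (assuming the .mvel extension); mvel_script_path is a hypothetical helper and does not ask Elasticsearch anything.

```shell
# mvel_script_path SCRIPTS_DIR NAME (hypothetical helper)
# Prints the file a stored script NAME maps to, each underscore
# standing for a directory separator.
mvel_script_path() {
  printf '%s/%s.mvel\n' "$1" "$(printf '%s' "$2" | tr '_' '/')"
}
```

mvel_script_path config/scripts my_perfect_script prints config/scripts/my/perfect/script.mvel.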

I haven’t verified this (yet), but it seems that the script must be copied to every node, and that the server may reload scripts periodically. Check the documentation for details.

According to the documentation and other sources, MVEL is convenient and easy to write (mine was done in a matter of minutes, as a first-time experience) but can be a little slow. When speed really matters, you can write native Java code. There is a lot more boilerplate code to write (it’s Java, right?) and the script must implement a predefined interface. I haven’t done this yet and will definitely post a follow-up if I do.

At first I had issues with the stored script: it was not named properly, or it contained bugs, but the error message was less than informative. I found the solution in Elasticsearch’s log file: at startup, it complains if a script can’t be compiled (at least with MVEL scripts).

Posted in Computing

Taking part in a coderetreat

This year, I’m organizing a "coderetreat" in Marseille for the "Global Day of CodeRetreat", on December 14, 2013.

Since I announced this event, I’ve been asked a lot of questions about what a coderetreat is like from the participants’ point of view.

I was lucky enough to take part in one coderetreat and then to facilitate two others, so I can talk about it from every angle.

What a coderetreat is

A coderetreat is a community event (community in the sense of bringing people together, not shutting them out) where developers get together around the fundamentals of their craft.

It lasts a whole day, usually a Saturday from 8:30 am to 5:30 or 6 pm (which is the case this time), and sometimes two days.

It is free for participants (sponsors cover the logistics costs), so that it is as open as possible.

Coming of your own free will, and motivated, is essential to get the most out of a day that is as rewarding as it is exhausting.

It immerses participants in an atmosphere of experimentation, of questioning habits, of refining practices, … free from any production or delivery pressure.

If we were musicians or actors, it would be more like a rehearsal than a performance.

It is a time to experiment with and refine Test Driven Development, the SOLID principles and the XP rules of Simple Design.

It is a whole day of pair programming, during which you change partners (and, if possible, tools) at every session, and the code is systematically deleted between sessions.

What a coderetreat is not

A coderetreat is not a training course. You don’t come to listen to a trainer talk and read their slides.

Nor is it a guided workshop, with a precise agenda and a goal to reach.

No technology is imposed, or even studied for its own sake.

It is not a casual activity you drop by to watch or try out. Taking part is a commitment and requires strong motivation: not because of the difficulty (every level will get something out of it), but because the individual and collective benefit is proportional to that motivation.

Who a coderetreat is for

So it is an event for developers who like to question the way they write code, look for different approaches to common problems, broaden their technical horizons, … in short, for those who seek quality in their development practice.

Posted in Computing

Delete empty directories to reclaim inodes

I was facing a situation on one of our servers: the inode usage of one of our partitions was dangerously high (93%) and growing steadily.

This partition is quite big (~1 TB) and mainly contains a lot of pictures and their thumbnails. Here, "a lot" means about 50 million pictures (originals and thumbnails).

Those pictures are stored by ID, partitioned so that there are no more than 999 sub-directories at each level. If a picture has the ID 1234567, it is stored (with its thumbnails) at
/srv/images/001/234/567/picture.jpg.

The problem is that when my app deletes a picture, it deletes the files but doesn’t clean up after itself, leaving a lot of empty directories behind. That’s why we were using only 73% of the available disk space, but 93% of the available inodes.

It occurred to us that every directory in the filesystem (ext3 for us), even an empty one, consumes an inode. Thanks to Evolix (our wonderful hosting partner), we ran a quite simple command to delete those useless directories:

find /srv/images -mindepth 1 -type d -empty -delete
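On a partition this size, it can be reassuring to count the candidates before deleting anything. A sketch, where prune_empty_dirs is a hypothetical helper wrapping the same find command:

```shell
# prune_empty_dirs ROOT (hypothetical helper)
# Deletes every empty directory below ROOT; ROOT itself is never removed.
prune_empty_dirs() {
  root="$1"
  # Dry run first: directories that are empty right now (parents that will
  # become empty once their children are gone are not counted yet)
  count=$(find "$root" -mindepth 1 -type d -empty | wc -l)
  echo "empty directories found: $count" >&2
  # -delete implies -depth: children are handled before their parent, so a
  # parent emptied by the deletion is removed in the same pass
  find "$root" -mindepth 1 -type d -empty -delete
}
```

Running prune_empty_dirs /srv/images prints the count on standard error, then removes the directories; -mindepth 1 is what protects /srv/images itself if the whole tree is empty.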

With watch 'df -i | grep /srv' running, we quickly saw the number of available inodes growing again.

Posted in Autrement, Computing

HotelHotel is looking for an experienced web developer

I’ve been working on the HotelHotel website (and everything around it) for 4 years now, with as much pleasure and passion as ever. We recently passed our break-even point. I even gave a technical experience report at the RuLu conference in June 2013.

However, the three of us are no longer enough to handle all the technical development, so we’re looking for good, skilled people who are passionate about web development to join us and be part of the adventure.

You’ll find different versions of the job ad on our blog, in a slightly geekier Gist, and on HumanCoders. We even made a (short and fun) compatibility test to find out whether we’re made for each other.

If you want to know more about the company, about us, our values and the way we work, there are also our Twitter accounts (@MrMoins, @colinux, @jlecour, @hotelhotel), GitHub (MrMoins, colinux, jlecour and Autrement), and the rest of my blog. Above all, we’re really open to discussions (in person or not).

So don’t hesitate to get in touch, even if you’re not sure you’re a perfect fit (but do keep these tips in mind). We’ll see.

Update (September 3, 2013): Fabien Catteau and Philippe Creux interviewed me for the Parlons Ruby podcast, where we talked about HotelHotel’s technical context and how it has evolved. You’ll find plenty of juicy details about the way we work.

Posted in Autrement, Computing

Filtered Redis log of executed commands

If you deal with some kind of database, you should know by now that Redis is TEH AWESOME.

As part of the provided tools, the redis-cli binary is an invaluable tool to connect to a Redis database, via a TCP port or a UNIX socket (my preferred way).

Once connected to a Redis server with redis-cli, you can send the MONITOR command. The client then stops accepting regular input and starts logging every command received by the server, with a timestamp (seconds and microseconds).

Here is a sample output:

1353711173.069255 "MONITOR"
1353711204.631496 "SET" "users:1:login" "alfred"
1353711224.119123 "SET" "users:2:login" "tom"
1353711281.926336 "SADD" "users" "1" "2"
1353711297.878012 "SMEMBERS" "users"

The problem is that there doesn’t seem to be a built-in way to filter or save the command logs.

Being a good UNIX citizen, redis-cli behaves like any other command with regard to input, output and pipes.

Let’s assume that the server is reachable on the default TCP port (6379):

To simply print a command log
echo "MONITOR" | redis-cli
To print a filtered command log (only "add to Sets")
echo "MONITOR" | redis-cli | grep -i 'SADD'
To save a command log to a file
echo "MONITOR" | redis-cli > redis.log
To save a filtered command log to a file
echo "MONITOR" | redis-cli | grep -i 'SADD' > redis.log

NB: Redis is case insensitive for the commands it accepts (a client may send SADD or sadd), hence grep -i.
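grep -i 'SADD' also matches any line where a key or value happens to contain "sadd". A slightly stricter filter, sketched below, anchors the match on the command position; filter_monitor is a hypothetical helper, and it assumes the line format of the sample above (timestamp, then the quoted command) and a plain command name. It also uses --line-buffered, because grep buffers its output when writing to a file or a pipe.

```shell
# filter_monitor COMMAND (hypothetical helper)
# Reads MONITOR output on stdin and keeps only the lines whose command
# (the first quoted word after the timestamp) is COMMAND, ignoring case.
filter_monitor() {
  # --line-buffered: flush each matching line at once, so a log file
  # being written is up to date even while MONITOR keeps running
  grep -i --line-buffered "^[0-9.]* \"$1\""
}
```

Usage: echo "MONITOR" | redis-cli | filter_monitor sadd > redis.log.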

Posted in Computing

Multiple Redis instances on Mac OS X with Homebrew

I’ve already written (in French) about the benefits of having multiple Redis instances, and we use this technique on our servers.

For our development setup we use Mac OS X or Linux, with Redis installed by a package manager. I’ll focus here on the Mac OS X + Homebrew part, but everything is easy to adapt to a different OS or package manager.

Why do we need multiple Redis instances?

Every so often, we need or want to replace our local databases with the data from our production environment. We use a handful of different datastores: MySQL, Redis, MongoDB and Solr.

For MySQL, MongoDB and Solr, it’s quite trivial: we download dumps from our backup server and load them into our development environment.
For Redis, it has never been that easy; here is why, and what we’ve done to eliminate the pain.

How Redis stores its data

For each Redis server instance there is a dump file, usually called dump.rdb, that contains all the data needed by Redis. If you want to back up your Redis data, you just have to copy this file to another location. You don’t even need to stop the server, since this file is always consistent. It may lack some data not yet written to disk (how much depends on the save settings in the configuration), but it is safe.

Ever since we started using Redis, we’ve wanted to keep different parts of our data separate from each other, with something more robust than a simple namespace in the keys. We’ve been using different “databases” inside a single Redis instance (there are 16 available by default), but everything is stored in a single dump.rdb file. As of Redis 2.4, I haven’t found a way to dump or restore only a specific database.

As I’ve said, we have 4 different dump files from our servers, but only 1 (with 4 internal DBs) in our development environment, which makes it impossible to import our production data locally.

The solution is quite simple: let’s run as many Redis instances locally as on our production server.

Run multiple Redis instances locally

On Mac OS X, you can use Homebrew to manage third-party packages, including Redis. When you install it, you get a binary (redis-server), a config file and a Launchd file. By default, Homebrew doesn’t set Redis to start automatically, but it tells you what to do if you want this behavior.

$ brew info redis
redis 2.4.14

http://redis.io/

/usr/local/Cellar/redis/2.4.13 (9 files, 436K) *

https://github.com/mxcl/homebrew/commits/master/Library/Formula/redis.rb

==> Caveats
If this is your first install, automatically load on login with:
    mkdir -p ~/Library/LaunchAgents
    cp /usr/local/Cellar/redis/2.4.14/homebrew.mxcl.redis.plist ~/Library/LaunchAgents/
    launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.redis.plist

If this is an upgrade and you already have the homebrew.mxcl.redis.plist loaded:
    launchctl unload -w ~/Library/LaunchAgents/homebrew.mxcl.redis.plist
    cp /usr/local/Cellar/redis/2.4.14/homebrew.mxcl.redis.plist ~/Library/LaunchAgents/
    launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.redis.plist

To start redis manually:
    redis-server /usr/local/etc/redis.conf

To access the server:
    redis-cli

To have many instances of Redis running, you just need 1 binary, but you need to start it with different configurations. That’s what we’ll do.

All our instances share the same basic configuration, except for a few settings: the paths of the dump.rdb and pid files, and which TCP port or unix socket to use.

Let’s copy the default config file:

$ cp /usr/local/etc/redis{,-common}.conf

In our case we use unix sockets, so in this common config file we set port 0 to disable TCP.

Then we can create /usr/local/etc/redis-1.conf with only these values:

include /usr/local/etc/redis-common.conf
pidfile /usr/local/var/run/redis-1.pid
unixsocket /tmp/redis-1.sock
dbfilename dump-1.rdb
vm-swap-file /tmp/redis-1.swap

We can copy this file for each instance we want and just change the values; all the common configuration is shared. It’s up to you to decide which settings are shared and which are not. As far as I know, you can override any setting in an instance’s configuration file, since the common config file is included at the top.
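With several instances, generating those small files avoids copy-and-paste mistakes. A sketch under the same paths as redis-1.conf above; write_redis_instance_conf is a hypothetical helper.

```shell
# write_redis_instance_conf ETC_DIR N (hypothetical helper)
# Writes ETC_DIR/redis-N.conf: include the shared settings, then override
# only what must differ for instance N (same paths as redis-1.conf).
write_redis_instance_conf() {
  etc="$1"
  n="$2"
  cat > "$etc/redis-$n.conf" <<EOF
include $etc/redis-common.conf
pidfile /usr/local/var/run/redis-$n.pid
unixsocket /tmp/redis-$n.sock
dbfilename dump-$n.rdb
vm-swap-file /tmp/redis-$n.swap
EOF
}
```

For our 4 instances: for n in 1 2 3 4; do write_redis_instance_conf /usr/local/etc "$n"; done.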

Now we need to launch those instances. Homebrew gives us a default file (homebrew.mxcl.redis.plist):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>KeepAlive</key>
    <true/>
    <key>Label</key>
    <string>homebrew.mxcl.redis</string>
    <key>ProgramArguments</key>
    <array>
      <string>/usr/local/bin/redis-server</string>
      <string>/usr/local/etc/redis.conf</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>UserName</key>
    <string>jlecour</string>
    <key>WorkingDirectory</key>
    <string>/usr/local/var</string>
    <key>StandardErrorPath</key>
    <string>/usr/local/var/log/redis.log</string>
    <key>StandardOutPath</key>
    <string>/usr/local/var/log/redis.log</string>
  </dict>
</plist>

The basic principle of Launchd is that once you load such a file, Launchd takes care of the job. In this case it starts Redis as soon as the file is loaded, makes sure that it is restarted if the process dies, and runs it as a specific user with some arguments (the config file, for example).

We will copy this file for each instance we want and change the values in it. Here is ~/Library/LaunchAgents/custom1.redis.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>KeepAlive</key>
    <false/>
    <key>Label</key>
    <string>custom1.redis</string>
    <key>ProgramArguments</key>
    <array>
      <string>/usr/local/bin/redis-server</string>
      <string>/usr/local/etc/redis-1.conf</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>UserName</key>
    <string>jlecour</string>
    <key>WorkingDirectory</key>
    <string>/usr/local/var</string>
    <key>StandardErrorPath</key>
    <string>/usr/local/var/log/redis-1.log</string>
    <key>StandardOutPath</key>
    <string>/usr/local/var/log/redis-1.log</string>
  </dict>
</plist>

You see, we’ve just changed a couple of values.
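Those substitutions can be scripted from Homebrew’s file. A minimal sketch using sed; make_instance_plist is a hypothetical helper, and it leaves KeepAlive untouched, so set that value by hand if you want it to differ from Homebrew’s.

```shell
# make_instance_plist N (hypothetical helper)
# Reads Homebrew's homebrew.mxcl.redis.plist on stdin and prints a copy
# for instance N: new Label, redis-N.conf as config file, redis-N.log as
# log file. KeepAlive is left as it is in the template.
make_instance_plist() {
  n="$1"
  sed -e "s/homebrew\.mxcl\.redis/custom$n.redis/" \
      -e "s#/usr/local/etc/redis\.conf#/usr/local/etc/redis-$n.conf#" \
      -e "s#/usr/local/var/log/redis\.log#/usr/local/var/log/redis-$n.log#"
}
```

For example: make_instance_plist 2 < /usr/local/Cellar/redis/2.4.14/homebrew.mxcl.redis.plist > ~/Library/LaunchAgents/custom2.redis.plist.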

Then we can load all these Launchd files:

$ launchctl load -w ~/Library/LaunchAgents/custom*.redis.plist

If you want to start or stop all your Redis instances at once, you can add these aliases to your shell configuration:

alias redis_start='launchctl load -w ~/Library/LaunchAgents/custom*.redis.plist'
alias redis_stop='launchctl unload -w ~/Library/LaunchAgents/custom*.redis.plist'
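After redis_start, a quick way to see which instances actually answer is to ping each socket. A sketch assuming the sockets live in /tmp as configured above; redis_ping_all is a hypothetical helper, and the REDIS_CLI override exists only so the client command can be swapped.

```shell
# redis_ping_all [SOCKET_DIR] (hypothetical helper)
# Sends PING to every redis-*.sock in SOCKET_DIR (default /tmp, where the
# configs above put the sockets) and prints "socket: reply" per instance.
# The client command can be overridden through REDIS_CLI.
redis_ping_all() {
  dir="${1:-/tmp}"
  cli="${REDIS_CLI:-redis-cli}"
  for sock in "$dir"/redis-*.sock; do
    # When nothing matches, the pattern is left unexpanded: skip it
    [ -e "$sock" ] || continue
    printf '%s: %s\n' "$sock" "$("$cli" -s "$sock" ping)"
  done
}
```

Each running instance should reply PONG.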

Restore production/backup data

Now that we have separate instances, just like on our production environment, we can restore the data for any of them, at any time. We just have to stop our local instance, replace the dump file with the one downloaded from our backup or production server and restart the local instance.

Since the dump file is just a persistence of the data Redis keeps in memory, it’s really important to stop the instance before you replace the dump file.
If you replace the file while the instance is running and don’t restart it, it will never pick up the “new” data. If you restart it afterwards, Redis will first write its in-memory data to the file while shutting down, overwriting the “new” data.
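The stop, replace, start sequence can be sketched as a function. restore_redis_dump is a hypothetical helper, and it makes two assumptions: the dump files live in /usr/local/var (the WorkingDirectory of the Launchd files; set REDIS_DATA_DIR if the dir setting of your redis-common.conf says otherwise), and the redis-server command line contains the config file name, as in the Launchd files above.

```shell
# restore_redis_dump N BACKUP_RDB (hypothetical helper)
# Stops instance N through launchd, waits for its process to exit,
# replaces its dump file with BACKUP_RDB, then starts the instance again.
# Assumptions: dumps live in REDIS_DATA_DIR (default /usr/local/var) and
# the process command line contains "redis-N.conf".
restore_redis_dump() {
  n="$1"
  backup="$2"
  plist="$HOME/Library/LaunchAgents/custom$n.redis.plist"
  data_dir="${REDIS_DATA_DIR:-/usr/local/var}"

  launchctl unload -w "$plist" || return 1
  # Redis may save its dataset while shutting down: wait until the
  # process is gone, so it cannot overwrite the restored file
  while pgrep -f "redis-$n\.conf" >/dev/null; do sleep 1; done
  cp "$backup" "$data_dir/dump-$n.rdb" || return 1
  launchctl load -w "$plist"
}
```

For example, restore_redis_dump 1 ~/Downloads/dump-1.rdb, with the backup file name being whatever you downloaded.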

What have we done?

  • disabled the Launchd management of Redis as provided by Homebrew;
  • made separate config files for each instance (with shared settings to keep maintenance simple);
  • made custom Launchd files;
  • imported a backup locally.

Posted in Autrement, Computing, Mac