
Using Concurrency and Parallelism in Ruby

By Oleg K. May 13th, 2016

At every Ruby meetup you are almost guaranteed to hear two things: Ruby is slow, and it has a GIL. Every Ruby developer knows this, so it felt strange to run into the topic at a conference yet again. However, after talking to other developers I was surprised to learn that only a few of them know when it actually makes sense to write parallel algorithms in Ruby.

The majority said there was no point in using parallelism, since all threads are executed one by one anyway (thanks to the GIL).

I will try to show you the cases when concurrency and parallelism make sense. 

 

Data Recovery from Memory Card

Once I needed to recover data from my action camera. As there were no suitable programs for Ubuntu, I decided to write one myself.

First, I made a dump of the memory card with dd; this image is what I worked with from then on.

dd if=/dev/mmcblk0p1 of=/image.img

The search algorithm is quite simple: the files I need start with a fixed identifier, "\x00\x00\x00 ftypavc1", and to find the end of a file it is enough to find the beginning of the next one. There is no need to invent anything more complicated: since the card has no fragmentation, each file is stored contiguously.

The only difficulty is that the image is 30 GB, so it has to be read piece by piece.
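As a rough illustration (a minimal sketch, not the actual script; the chunk size and file name are assumptions), the scan boils down to reading the dump in fixed-size pieces and remembering every offset where the start marker appears:

# Minimal sketch: collect the byte offsets of every "\x00\x00\x00 ftypavc1"
# marker in the dump; each recovered file runs from one marker to the next.
# (For brevity this ignores markers that straddle a chunk boundary.)
MARKER     = "\x00\x00\x00 ftypavc1".b
CHUNK_SIZE = 5 * 1024 * 1024

offsets  = []
position = 0
File.open('image.img', 'rb') do |image|
  while (chunk = image.read(CHUNK_SIZE))
    index = 0
    while (found = chunk.index(MARKER, index))
      offsets << position + found
      index = found + 1
    end
    position += chunk.bytesize
  end
end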

Soon I had a working script.

To make the script easier to run, I used optparse, which makes it simple to handle all incoming parameters.
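A minimal sketch of what that can look like (the flag names here are my assumptions, not necessarily the script's real options):

require 'optparse'
require 'ostruct'

# Hypothetical flags for illustration; the real script's options may differ.
options = OpenStruct.new(threads: 4)
OptionParser.new do |opts|
  opts.banner = 'Usage: recover.rb [options]'
  opts.on('-i', '--input FILE', 'Path to the memory card image') { |v| options.input = v }
  opts.on('-o', '--output DIR', 'Directory for recovered files') { |v| options.output = v }
  opts.on('-t', '--threads N', Integer, 'Number of workers')     { |v| options.threads = v }
end.parse!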

And of course, everything had to be tested before running it on the real image.

I generated the original file and 7 file parts. 

ls spec/support/
1_test_100b.mp4  2_test_111b.mp4  3_test_2x100b.mp4  4_test_70b_80b.mp4  5_test_273b.mp4  6_test_ori.mp4  7_test_with_r.mp4  8_test_44b_100b.mp4

Having tested everything, I managed to restore most of the data I needed.

But after listening to Thijs Cadier's talk at RubyConfBy, where he showed the difference between building a chat on forks, threads, and callbacks, I decided to parallelize this script.

We can do it in two ways in Ruby:

1. Native OS threads, still subject to the GIL, where switching happens on IO events

2. Forks, which create a separate child process

Both approaches share the same structure: the input file is divided by size among several workers, and each worker gets an offset from which it runs its own search.
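For illustration, the split can be set up like this (a sketch that reuses the hypothetical `options` from above; `part_size`, `max_offset`, and `last` are the variables the snippets below rely on):

# Illustrative setup for the snippets below: divide the image into
# options.threads byte ranges of roughly equal size.
max_offset = File.size(options.input) - 1
part_size  = (File.size(options.input) / options.threads.to_f).ceil
last       = -1 # end of the previously assigned range; -1 so the first range starts at 0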

Threads

This code is enough to start the threads:

threads = []
options.threads.times do |i|
  # Compute each thread's byte range up front so the block does not race on `last`
  start_offset = (last += 1)
  end_offset   = [last += (part_size - 1), max_offset].min
  threads << Thread.new { process_thread(options, i, start_offset, end_offset) }
end
threads.each(&:join)

Threads are convenient: everything runs inside one process and the memory overhead is low. Even with the GIL in the picture, it is easy to synchronize threads.

Forks

Everything looks much the same; I only needed to wrap the incoming parameters in a lambda.

# Capture the chunk boundaries in a lambda and run it in a child process
l = ->(_start, _end) { process_thread(options, i, _start, _end) }
forks << fork { l[start, end_part] }
Process.waitall
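Put together, the fork-based loop might look roughly like this (an illustrative sketch mirroring the thread version; the surrounding variable names are assumptions):

forks = []
options.threads.times do |i|
  start    = (last += 1)
  end_part = [last += (part_size - 1), max_offset].min
  # Each child scans its own range; the parent simply waits for all of them
  l = ->(_start, _end) { process_thread(options, i, _start, _end) }
  forks << fork { l[start, end_part] }
end
Process.waitall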

Fork creates a child process, so synchronization becomes much harder.

For testing, I generated a 1 GB file filled with random data.

dd if=/dev/urandom of=image.random.iso bs=64M count=32

I also wrote 10 start identifiers into it:

irb(main):001:0> start = "\x00\x00\x00 ftypavc1"
irb(main):002:0> input = 'image.random.iso'
irb(main):005:0> p = File.size(input)/10
=> 107374179
irb(main):006:0> 10.times {|i| IO.write('image.random.iso', start, i * p)}

 

Configuration:

The image is stored on an HDD

Recovered data is written to an SSD

Processor: Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz, 2 cores + Hyper-Threading

Read chunk size: 5 MB

 

Test results:

[benchmark results chart]

None of the variants saved any time. Why?

The explanation is simple: the HDD-to-SSD transfer is our bottleneck. The files were processed so quickly that the job amounted to plain copying, and no amount of parallelism could make the copy itself faster.

 

Fork

So where would child processes actually help?

We would see real gains if the processor spent more time on each chunk, so let's add some code to load it.

The simplest option is the bcrypt gem.

Now we reduce the size of each readable chunk to 5 MB and add this calculation to the processing of every part.

require 'bcrypt'

10.times { BCrypt::Password.create('secret') }
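For clarity, this is roughly where the extra load sits (an illustrative sketch; scan_chunk is a hypothetical stand-in for the real per-chunk work):

# Hypothetical placement inside the per-part loop: burn some CPU after each
# read so the workload is no longer purely IO-bound.
while (chunk = image.read(CHUNK_SIZE))
  10.times { BCrypt::Password.create('secret') } # artificial CPU load
  scan_chunk(chunk)                              # hypothetical per-chunk scan
end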

This gives the following ratio of per-part processing time to calculation time: 0.169 s / 0.0883 s.

The results are predictable:

[benchmark results chart]

Now forks show a clear advantage: this time it is the processor that is loaded, and with 4 separate processes driving the CPU we get noticeably better results.

Threads give us nothing here (thanks to the GIL): only one thread runs at a time, and execution only switches while each part is being read.

The difference is also visible in the per-core load figures when we check processor utilization.
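If you want to see this for yourself, a quick sanity check along these lines should show that on MRI a threaded CPU-bound run is no faster than a sequential one (a rough sketch, assuming the bcrypt calculation holds the GIL as described above):

require 'benchmark'
require 'bcrypt'

# CPU-bound work: on MRI with the GIL, four threads doing this take roughly
# as long as doing it sequentially, while four forked processes would not.
work = -> { 10.times { BCrypt::Password.create('secret') } }

sequential = Benchmark.realtime { 4.times { work.call } }
threaded   = Benchmark.realtime { Array.new(4) { Thread.new { work.call } }.each(&:join) }

puts format('sequential: %.2fs  threaded: %.2fs', sequential, threaded)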

 

Threads

Is it reasonable to use threads at all, given the GIL?

We have to accept that threads are unlikely to outrun forks because of the constant context switching; the best we can do is narrow the gap.

The GIL switches threads on IO events, so to win anything we need to make IO the bottleneck. That way the bcrypt work can be hidden behind IO waits (the ideal option would be to work with several input/output sources at once, but that would require substantial changes to the code).

The easiest way I could think of was to write directly to the memory card, using it as the destination for the recovered files.

Since the memory card is very slow, we reduce the number of threads to 2 and call fsync after writing each file, which forces the data onto the card immediately instead of leaving it in the OS cache.
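The write itself can be as simple as this (a sketch; output_path and recovered_bytes are placeholder names):

# Flush each recovered file straight to the memory card so the write is not
# hidden by the OS page cache.
File.open(output_path, 'wb') do |out|
  out.write(recovered_bytes)
  out.fsync # force the data onto the card before moving on
end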

[benchmark results chart]

Forks and threads now show similar results: the bcrypt time is partially hidden behind input/output.

 

Conclusion

If our algorithm is bound by CPU and memory, only forks will help. The price is memory overhead and the extra work of synchronizing results between processes.

If the bottleneck is incoming IO, threads let us partially hide processing time behind IO events, and data synchronization stays simple. The GIL also lets us stop worrying about thread safety, because only one thread executes at a time (unless you decide to run it all on JRuby).

And if IO itself is the bottleneck, neither forks nor threads will help.

All script variants are available on GitHub (in separate branches). 


Oleg K.

Ruby on Rails Developer at iKantam
