{ Josh Rendek }

<3 Ruby & Go

Sidekiq vs Resque, with MRI and JRuby

Nov 3, 2012 - 10 minutes

Before we dive into the benchmarks of Resque vs Sidekiq, it will help to have a better understanding of how forking and threading work in Ruby.

Threading vs Forking

Forking

When you fork a process you are creating an entire copy of that process: the address space and all open file descriptors. The child gets its own copy of the parent’s address space, which isolates any work done in that fork. If the forked child does a lot of work and uses a lot of memory, that memory is freed back to the operating system when the child exits. If your Ruby implementation (MRI) can’t run Ruby code on several kernel-level threads in parallel, then forking is the only way to spread work across multiple cores, since each process can be scheduled on a different core. You also gain some stability, since the parent can simply respawn a new fork if a child crashes. There is a caveat, however: if a child exits and the parent never waits on it, the child lingers as a zombie, and if the parent dies while children are still running, those children are orphaned.
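A minimal sketch of the isolation described above, assuming a Unix-like system where `Process.fork` is available (MRI on Linux or macOS, not JRuby or Windows). The variable names are only illustrative.

```ruby
# A forked child works on its own copy of the parent's memory.
counter = 0

reader, writer = IO.pipe

pid = Process.fork do
  reader.close
  counter += 100        # only changes the child's copy
  writer.puts counter   # report the child's value back through the pipe
  writer.close
end

writer.close
child_value = reader.gets.to_i
reader.close

# Waiting on the child reaps it, so it does not linger as a zombie.
Process.waitpid(pid)

puts "child saw #{child_value}, parent still has #{counter}"
```

The pipe is needed precisely because the two processes no longer share memory: the parent cannot see the child's `counter` any other way.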

Forking and Ruby

One important note about forking with Ruby is that memory usage stays down when forking. Child forks share the parent’s memory copy-on-write, so pages are only duplicated once one side modifies them.

require 'benchmark'

fork_pids = []

# Let's fill up some memory
objs = {}
objs['test'] = []
1_000_000.times do
  objs['test'] << Object.new
end

# Fork 50 children that only sleep, so they never write to the shared pages
50.times do
  fork_pids << Process.fork do
    sleep 0.1
  end
end
fork_pids.map { |p| Process.waitpid(p) }

We can see this in action here:

However when we start modifying memory inside the child forks, memory quickly grows.

50.times do
  fork_pids << Process.fork do
    1_000_000.times do
      # objs is a Hash, so append to the array stored under 'test'
      objs['test'] << Object.new
    end
  end
end
fork_pids.map { |p| Process.waitpid(p) }

We’re now creating a million new objects in each forked child:

Threading

Threads, on the other hand, have considerably less overhead since they share an address space and memory, and they allow easier communication (versus inter-process communication with forks). Context switching between threads inside the same process is also generally cheaper than switching between processes. Depending on the runtime being used, issues that might occur with threads (for instance, needing lots of memory for a task) can mostly be handled by the garbage collector. Another benefit of threading is that you do not have to worry about zombie processes, since all threads die when the process dies.
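As a small illustration of the shared address space (a sketch, not tied to any particular Ruby implementation), a thread can read and write the very same object the main thread holds, with no pipe or other IPC involved:

```ruby
# Both the main thread and the new thread see the same Hash object.
shared = { greeting: 'hello' }

worker = Thread.new do
  shared[:from_thread] = "#{shared[:greeting]} from a thread"
end
worker.join

puts shared[:from_thread]
```

Compare this with a fork, where the same write would only be visible inside the child process.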

Threading with Ruby

As of 1.9 the GIL (Global Interpreter Lock) is gone! But only in name: it has been renamed to the GVL (Global VM Lock). The GVL in MRI Ruby uses a lock called rb_thread_lock_t, which is a mutex that controls when Ruby code can run. When no Ruby objects are being touched (e.g. during a system-level blocking call or blocking IO outside of Ruby), threads can actually run in parallel until the GVL kicks in again. After these blocking calls each thread checks for interrupts with RUBY_VM_CHECK_INTS.
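A quick way to observe the lock being released during a blocking call is to let several threads sleep at once. This is only a sketch: `sleep` stands in for any blocking system call, and the timing will vary by machine.

```ruby
require 'benchmark'

# Five threads each block for 0.2 seconds. Because a sleeping thread
# gives up the GVL, the waits overlap instead of adding up to 1 second.
elapsed = Benchmark.realtime do
  threads = 5.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts format('5 threads sleeping 0.2s each took %.2fs', elapsed)
```

Swapping the `sleep` for a CPU-bound loop on MRI should push the total towards the sum of the individual run times, since only one thread can execute Ruby code at a time.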

With MRI ruby threads are pre-emptively scheduled using a function called rb_thread_schedule which schedules an “interrupt” that lets each thread get a fair amount of execution time (every 10 microseconds). [source: thread.c:1018]

We can see an example of the GIL/GVL in action here:

threads = []

# A plain array shared by every thread
objs = []
1_000_000.times do
  objs << Object.new
end

50.times do |num|
  threads << Thread.new do
    1_000_000.times do
      objs << Object.new
    end
  end
end

threads.map(&:join)

Normally this would be an unsafe operation, but since the GIL/GVL exists we don’t have to worry about two threads adding to the same ruby object at once since only one thread can run on the VM at once and it ends up being an atomic operation (although don’t rely on this quirk for thread safety, it definitely doesn’t apply to any other VMs).

Another important note is that the Ruby GC is doing a really horrible job during this benchmark.

The memory kept growing so I had to kill the process after a few seconds.

Threading with JRuby on the JVM

JRuby specifies the use of native threads based on the operating system support using the getNativeThread call [2]. JRuby’s implementation of threads using the JVM means there is no GIL/GVL. This allows CPU-bound processes to utilize all cores of a machine without having to deal with forking (which, in the case of Resque, can be very expensive).

When trying to execute the GIL safe code above JRuby spits out a concurrency error: ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users

We can either add a mutex around this code or modify it to not worry about concurrent access. I chose the latter:

threads = []

objs = {}
objs['test'] = []
1_000_000.times do
  objs['test'] << Object.new
end

50.times do |num|
  threads << Thread.new do
    1_000_000.times do
      objs[num] = [] if objs[num].nil?
      objs[num] << Object.new
    end
  end
end

threads.map(&:join)

Compared to the MRI version, ruby running on the JVM was able to make some optimizations and keep memory usage around 800MB for the duration of the test:
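For completeness, the other option mentioned earlier, putting a mutex around the shared array, could be sketched like this. The sizes are scaled down so it finishes quickly, and this is not the code that was benchmarked.

```ruby
require 'thread'

objs = []
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may append to the shared array
      lock.synchronize { objs << Object.new }
    end
  end
end

threads.each(&:join)
puts objs.size
```

The trade-off is that every append now contends for the same lock, which is why giving each thread its own array is often the cheaper choice.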

Now that we have a better understanding of the differences between forking and threading in Ruby, let’s move on to Sidekiq and Resque.

Sidekiq and Resque

Resque’s view of the world

Resque assumes chaos in your environment. It follows the forking model familiar from C and Ruby and makes a complete copy of the Resque parent process each time a new job needs to be run. This has its advantages in guarding against memory leaks, long-running workers, and locking. You run into an issue with forking, though, when you need to increase the number of workers on a machine: you end up without enough spare CPU cycles, since the majority are taken up by handling all the forking.

Resque follows a simple fork-and-do-work model: each worker takes a job off the queue and forks a new process to do the job.
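The fork-per-job loop can be sketched roughly as follows. This is a simplification under stated assumptions, not Resque's actual source: an in-memory array stands in for the Redis queue, and the job body is a made-up hashing task.

```ruby
require 'digest'

# Stand-in for a Redis-backed queue; each job is just an array of numbers.
jobs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
exit_statuses = []

until jobs.empty?
  job = jobs.shift

  # One fresh child process per job: whatever memory the job uses
  # goes back to the operating system when the child exits.
  pid = Process.fork do
    job.each { |n| Digest::SHA256.hexdigest(n.to_s) }
  end

  # The parent waits for the child before taking the next job.
  _, status = Process.wait2(pid)
  exit_statuses << status.success?
end

puts "#{exit_statuses.count(true)} jobs completed"
```

Each job pays for a `fork` and a `wait`, which is the overhead that shows up in the CPU graphs later on.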

Resque @ Github

Sidekiq’s view of the world

Unlike Resque, Sidekiq uses threads and is extremely easy to use as a drop-in replacement for Resque, since both are built around the same perform method. When you dig into the results below you can see that Sidekiq’s claim of being able to handle a larger number of workers and a larger amount of work is true. Because it uses threads and does not have to allocate a new stack and address space for each fork, you get that overhead back and are able to do more work with a threaded model.

Sidekiq follows the actor pattern. So compared to Resque which has N workers that fork, Sidekiq has an Actor manager, with N threads and one Fetcher actor which will pop jobs off Redis and hand them to the Manager. Sidekiq handles the “chaos” portion of Resque by catching all exceptions and bubbling them up to an exception handler such as Airbrake or Errbit.
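A heavily simplified sketch of that shape, using only the standard library: one producer plays the fetcher, a `Queue` plays the hand-off, and N threads do the work. Sidekiq's real implementation is built on actors and Redis, so every name here is an assumption made for illustration.

```ruby
require 'digest'
require 'thread'

queue = Queue.new        # stands in for jobs fetched from Redis
processed = Queue.new    # thread-safe record of finished job ids
worker_count = 4

workers = worker_count.times.map do
  Thread.new do
    # :stop is a sentinel telling the worker to shut down
    while (job = queue.pop) != :stop
      begin
        job[:args].each { |n| Digest::SHA256.hexdigest(n.to_s) }
        processed << job[:id]
      rescue StandardError => e
        # Errors are caught per job instead of killing the worker thread
        warn "job #{job[:id]} failed: #{e.message}"
      end
    end
  end
end

# The "fetcher": push ten jobs, then one stop signal per worker
10.times { |id| queue << { id: id, args: [id] * 20 } }
worker_count.times { queue << :stop }

workers.each(&:join)
puts "processed #{processed.size} jobs"
```

No process is created per job here; the only per-job cost is a queue pop, which is where the threaded model gets its overhead back.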

Now that we know how Sidekiq and Resque work we can get on to testing them and comparing the results.

Sidekiq @ Github

The Test Code

The idea behind the test was to pick a CPU bound processing task, in this case SHA256 and apply it across a set of 20 numbers, 150,000 times.

require 'sidekiq'
require 'resque'
require 'digest'


# Running:
# sidekiq -r ./por.rb -c 240
#
# require 'sidekiq'
# require './por'
# queueing: 150_000.times { Sidekiq::Client.enqueue(POR, [rand(123098)]*20) }
# queueing: 150_000.times { Resque.enqueue(POR, [rand(123098)]*20) }

class POR
  include Sidekiq::Worker

  @queue = :por

  # Instance method used by Sidekiq
  def perform(arr)
    arr.each do |a|
      Digest::SHA2.new << a.to_s
    end
  end

  # Class method used by Resque
  def self.perform(arr)
    arr.each do |a|
      Digest::SHA2.new << a.to_s
    end
  end

end

Test Machine

Model Name: Mac Pro
Model Identifier: MacPro4,1
Processor Name: Quad-Core Intel Xeon
Processor Speed: 2.26 GHz
Number of Processors: 2
Total Number of Cores: 8
L2 Cache (per Core): 256 KB
L3 Cache (per Processor): 8 MB
Memory: 12 GB
Processor Interconnect Speed: 5.86 GT/s

This gives us a total of 16 logical cores to use for our testing (8 physical cores, assuming Hyper-Threading is enabled). I’m also using a Crucial M4 SSD.

Results

Time to Process 150,000 sets of 20 numbers

| Type | Time to Completion (seconds) |
| --- | --- |
| Sidekiq (JRuby) 150 Threads | 88 |
| Sidekiq (JRuby) 240 Threads | 89 |
| Sidekiq (JRuby) 50 Threads | 91 |
| Sidekiq (MRI) 5x50 | 98 |
| Sidekiq (MRI) 3x50 | 120 |
| Sidekiq (MRI) 50 | 312 |
| Resque 50 | 396 |
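To put those timings in perspective, the arithmetic for throughput and speedup relative to Resque can be worked out with a few lines of Ruby. The numbers are the ones from the table; only the variable names are mine.

```ruby
jobs = 150_000

timings = {
  'Sidekiq (JRuby) 150 threads' => 88,
  'Sidekiq (JRuby) 240 threads' => 89,
  'Sidekiq (JRuby) 50 threads'  => 91,
  'Sidekiq (MRI) 5x50'          => 98,
  'Sidekiq (MRI) 3x50'          => 120,
  'Sidekiq (MRI) 50'            => 312,
  'Resque 50'                   => 396
}

baseline = timings['Resque 50']

# [name, jobs per second, speedup versus Resque]
summary = timings.map do |name, seconds|
  [name, (jobs / seconds.to_f).round, (baseline / seconds.to_f).round(2)]
end

summary.each do |name, rate, speedup|
  puts format('%-30s %5d jobs/s  %.2fx vs Resque', name, rate, speedup)
end
```

The fastest JRuby run works out to about 1,705 jobs per second, 4.5 times the Resque run.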


All about the CPU

Resque: 50 workers

Here we can see that the forking is taking its toll on the available CPU we have for processing. Roughly 50% of the CPU is being wasted on forking and scheduling those new processes. Resque took 396 seconds to finish and process 150,000 jobs.

Sidekiq (MRI) 1 process, 50 threads

We’re not fully utilizing the CPU. When running this test it pegged one CPU at 100% usage and kept it there for the duration of the test. We have a slight overhead with system CPU usage. Sidekiq took 312 seconds with 50 threads using MRI Ruby. Let’s now take a look at doing things a bit Resque-ish and use multiple Sidekiq processes to get more threads scheduled across multiple CPUs.

Sidekiq (MRI) 3 processes, 50 threads

We’re doing better. We’ve cut our processing time to a bit more than a third and we’re utilizing more of our resources (CPUs). 3 Sidekiq processes with 50 threads each (for a total of 150 threads) took 120 seconds to complete 150,000 jobs.

Sidekiq (MRI) 5 processes, 50 threads

As we keep adding more processes that get scheduled to different cores we’re seeing the CPU usage go up even further, however with more processes comes more overhead for process scheduling (versus thread scheduling). We’re still wasting CPU cycles, but we’re completing 150,000 jobs in 98 seconds.

Sidekiq (JRuby) 50 threads

We’re doing much better now with native threads. With 50 OS level threads, we’re completing our set of jobs in 91 seconds.

Sidekiq (JRuby) 150 threads & 240 Threads

We’re no longer seeing much of an increase in CPU usage, and only a slight decrease in processing time. As we keep adding more and more threads we end up running into contention when accessing Redis, limited by how quickly we can pop things off the queue.

Overview

Even if we stick with stock MRI Ruby and go with Sidekiq, we’re going to see a huge decrease in CPU usage while also gaining a little bit of performance.

Sidekiq, overall, provides a cleaner, more object oriented interface (in my opinion) to inspecting jobs and what is going on in the processing queue.

In Resque you would do something like: Resque.size("queue_name"). However, in Sidekiq you would take your class, in this case, POR and call POR.jobs to get the list of jobs for that worker queue. (note: you need to require 'sidekiq/testing' to get access to the jobs method).

The only thing I find missing from Sidekiq that I enjoyed in Resque was the ability to inspect failed jobs in the web UI. However Sidekiq more than makes up for that with the ability to automatically retry failed jobs (although be careful you don’t introduce race conditions and accidentally DOS yourself).

And of course, JRuby comes out on top and gives us the best performance and bang for the buck (although your mileage may vary, depending on the task).

Further Reading

Deploying with JRuby: Deliver Scalable Web Apps using the JVM (Pragmatic Programmers)

JRuby Cookbook

Sidekiq & Resque

Sidekiq

Resque

I saw the question of “How can I prevent a class from being reopened again in Ruby?” pop up on the Ruby mailing list. While this is somewhat against the nature of ruby, it can be accomplished:

{% codeblock lang:ruby %}
class Foo
  def Foo.method_added(name)
    raise "This class is closed for modification"
  end
end

class Foo
  def testing
    p "test"
  end
end
{% endcodeblock %}

This will raise an exception any time someone reopens the class and tries to define a new instance method.
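Another way to get a similar effect, offered here as an alternative sketch rather than what the mailing list thread settled on, is to freeze the class object once it is fully defined. Ruby then refuses any later method definitions on it.

```ruby
class Foo
  def testing
    'test'
  end
end

# Freezing the class object blocks any later modification of it
Foo.freeze

begin
  class Foo
    def another
      'nope'
    end
  end
rescue RuntimeError => e
  # FrozenError on Ruby 2.5+, a plain RuntimeError on older versions
  message = e.message
end

puts message
```

Existing methods keep working after the freeze; only changes to the class are rejected.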

Writing Dependable Ruby & a Reddit CLI

Aug 20, 2012 - 7 minutes

View Source on Github

When you work on your code and are finished for the day, is what you have committed worry free? If another developer were to push your code in the middle of the night, would they be calling you at 3am?

Let’s see how we can improve our development cycle with testing so we can avoid those early morning calls. We’ll go over some of the basics with a simple project to start.

The most important part about TDD is getting quick feedback based on our desired design (the feedback loop).

Here is an example of how fast the tests run:

While this is a somewhat contrived example for the Reddit CLI we’re making, it applies equally well when writing Rails applications. Only load the parts you need (ActionMailer, ActiveSupport, etc.); usually you don’t need to load the entire Rails stack. This can make your tests run in milliseconds instead of seconds, so you get feedback right away.

Before we go further into the testing discussion, let’s set up a spec helper.

{% codeblock spec/spec_helper.rb lang:ruby %}
require 'rspec'
require 'vcr'
require 'pry'

VCR.configure do |c|
  c.cassette_library_dir = 'fixtures/vcr_cassettes'
  c.hook_into :fakeweb # or :webmock
end
{% endcodeblock %}

Now how do we start doing TDD? We first start with a failing test.

{% codeblock Reddit API Spec (Pass 1) - spec/lib/reddit_api_spec lang:ruby %}
require 'spec_helper'
require './lib/reddit_api'

describe RedditApi do
  let(:reddit) { RedditApi.new('ProgrammerHumor') }
  context "#initializing" do
    it "should form the correct endpoint" do
      reddit.url.should eq "http://reddit.com/r/ProgrammerHumor/.json?after="
    end
  end
end
{% endcodeblock %}

When we create a new instance of the Reddit API we want to pass it a subreddit, and then we want to make sure it builds the URL properly.

{% codeblock Reddit API (Pass 1) - lib/reddit_api.rb lang:ruby %}
require 'json'
require 'rest-client'

class RedditApi
  REDDIT_URL = "http://reddit.com/r/"
  attr_reader :url, :stories

  def initialize(subreddit)
    @subreddit = subreddit
    @after = ""
    @url = "#{REDDIT_URL}#{subreddit}/.json?after=#{@after}"
  end
end
{% endcodeblock %}

Next we want to make the actual HTTP request to the Reddit api and process it.

{% codeblock Reddit API Spec (Pass 2) - spec/lib/reddit_api_spec lang:ruby %}
require 'spec_helper'
require './lib/reddit_api'

describe RedditApi do
  let(:reddit) { RedditApi.new('ProgrammerHumor') }
  context "#initializing" do
    it "should form the correct endpoint" do
      VCR.use_cassette('reddit_programmer_humor') do
        reddit.url.should eq "http://reddit.com/r/ProgrammerHumor/.json?after="
      end
    end
  end

  context "#fetching" do
    it "should fetch the first page of stories" do
      VCR.use_cassette('reddit_programmer_humor') do
        reddit.stories.count.should eq(25)
      end
    end
  end
end
{% endcodeblock %}

We’ve now added a VCR wrapper and added an expectation that the reddit api will return a list of stories. We use VCR here to again ensure that our tests run fast. Once we make the first request, future runs will take milliseconds and will hit our VCR tape instead of the API.

Now we need to introduce three new areas: requesting, processing, and a Story object class.

{% codeblock Story - lib/story.rb lang:ruby %}
Story = Struct.new(:title, :score, :comments, :url)
{% endcodeblock %}

{% codeblock Reddit API (Pass 2) - lib/reddit_api.rb lang:ruby %}
require 'json'
require 'rest-client'
require './lib/story'

class RedditApi
  REDDIT_URL = "http://reddit.com/r/"
  attr_reader :url, :stories

  def initialize(subreddit)
    @subreddit = subreddit
    @after = ""
    @url = "#{REDDIT_URL}#{subreddit}/.json?after=#{@after}"
    request
    process_request
  end

  def request
    @request_response = JSON.parse(RestClient.get(@url))
  end

  def process_request
    @stories = []
    @request_response['data']['children'].each do |red|
      d = red['data']
      @stories << Story.new(d['title'], d['score'],
                            d['num_comments'], d['url'])
    end
    @after = @request_response['data']['after']
  end
end
{% endcodeblock %}

What can we do now? The API lets us make a full request and get a list of Story struct objects back. We’ll be using this array of structs later on to build the CLI.

The only thing left for this simple CLI is a way to get to the next page. Let’s add our failing spec:

{% codeblock Reddit API Spec (Pass 3) - spec/lib/reddit_api_spec lang:ruby %}
require 'spec_helper'
require './lib/reddit_api'

describe RedditApi do
  let(:reddit) { RedditApi.new('ProgrammerHumor') }
  context "#initializing" do
    it "should form the correct endpoint" do
      VCR.use_cassette('reddit_programmer_humor') do
        reddit.url.should eq "http://reddit.com/r/ProgrammerHumor/.json?after="
      end
    end
  end

  context "#fetching" do
    it "should fetch the first page of stories" do
      VCR.use_cassette('reddit_programmer_humor') do
        reddit.stories.count.should eq(25)
      end
    end

    it "should fetch the second page of stories" do
      VCR.use_cassette('reddit_programmer_humor_p2') do
        reddit.next.stories.count.should eq(25)
      end
    end
  end
end
{% endcodeblock %}

And let’s make the test pass:

{% codeblock Reddit API (Pass 3) - lib/reddit_api.rb lang:ruby %}
require 'json'
require 'rest-client'
require './lib/story'

class RedditApi
  REDDIT_URL = "http://reddit.com/r/"
  attr_reader :url, :stories

  def initialize(subreddit)
    @subreddit = subreddit
    @after = ""
    @url = "#{REDDIT_URL}#{subreddit}/.json?after=#{@after}"
    request
    process_request
  end

  def next
    @url = "#{REDDIT_URL}#{@subreddit}/.json?after=#{@after}"
    request
    process_request
    self
  end

  def request
    @request_response = JSON.parse(RestClient.get(@url))
  end

  def process_request
    @stories = []
    @request_response['data']['children'].each do |red|
      d = red['data']
      @stories << Story.new(d['title'], d['score'],
                            d['num_comments'], d['url'])
    end
    @after = @request_response['data']['after']
  end
end
{% endcodeblock %}

We also allow method chaining since we return self after calling next (so you could chain several next calls, for instance).

Another important principle to keep in mind is the “Tell, Don’t Ask” rule. Without tests, we might have gone this route:

{% codeblock bad_example.rb lang:ruby %}
@reddit = Reddit.new('ProgrammerHumor')

# User presses next

@reddit.url = "http://reddit.com/r/ProgrammerHumor/.json?after=sometoken"
{% endcodeblock %}

Not only would we not be telling the object what we want, we would be modifying the internal state of an object as well. By implementing a next method we abstract the idea of a URL and any tokens we may need to keep track of away from the consumer. Doing TDD adds a little extra step of “Thinking” more about what we want our interfaces to be. What’s easier? Calling next or modifying the internal state?

I’m kind of cheating a bit here. I found a nice “table” gem that outputs what you send in as a formatted table (think MySQL console output). Let’s just make sure everything is being sent around properly and STDOUT is printing the correct contents:

{% codeblock Reddit CLI Spec (Pass 1) - spec/lib/reddit-cli.rb lang:ruby %}
require 'spec_helper'
require 'stringio'
require './lib/reddit-cli'

describe RedditCli do
  let(:subreddit) { "ProgrammerHumor" }
  context "#initializing" do
    before(:all) do
      $stdout = @fakeout = StringIO.new
    end

    it "should print out a story" do
      api_response = double(RedditApi)
      api_response.stub!(:stories =>
                         [Story.new("StoryTitle", "Score",
                                    "Comments", "URL")])
      $stdin.should_receive(:gets).and_return("q")
      cli = RedditCli.new(api_response)
      $stdout = STDOUT
      @fakeout.string.include?('StoryTitle').should be_true
    end
  end
end
{% endcodeblock %}

We’re doing several things here. First we’re pointing $stdout (temporarily) at a StringIO held in an instance variable so we can see what gets output. Next we’re mocking out the RedditApi, since we don’t actually need to hit that class or the VCR tapes; we just need to stub out the expected results (stories) and pass the response object along to the CLI class. And finally, once we’re finished, we set $stdout back to the proper constant.
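If you find yourself redirecting $stdout in more than one spec, the swap-and-restore can be pulled into a small helper. This is a sketch; `capture_stdout` is a name I made up, not something RSpec provides here. The `ensure` puts the real $stdout back even if the block raises.

```ruby
require 'stringio'

# Hypothetical helper: runs the block with $stdout redirected to a
# StringIO and returns everything that was printed.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

output = capture_stdout { puts 'StoryTitle' }
```

With that in place a spec can simply assert on the returned string, for example checking that it includes the story title.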

And the class for output:

{% codeblock Reddit CLI (Pass 1) - lib/reddit-cli.rb lang:ruby %}
require './lib/reddit_api'
require 'terminal-table'

class RedditCli
  def initialize(api)
    @rows = []
    @api = api
    @stories = api.stories
    print_stories
    print "\nType ? for help\n"
    prompt
  end

  def print_stories
    @stories.each_with_index { |x, i| @rows << [i, x.score, x.comments, x.title[0..79]] }
    puts Terminal::Table.new :headings => ['#', 'Score', 'Comments', 'Title'], :rows => @rows
  end

  def prompt
    print "\n?> "
    input = STDIN.gets.chomp
    case input
    when "?"
      p "Type the # of a story to open it in your browser"
      p "Type n to go to the next page"
      prompt
    when "quit", "q"
      # fall through and return, which ends the prompt loop
    when "n"
      @rows = []
      @stories = @api.next.stories
      print_stories
      prompt
    else
      print "#=> Opening: #{@stories[input.to_i].url}"
      `open #{@stories[input.to_i].url}`
      prompt
    end
  end
end
{% endcodeblock %}

And finally, a little wrapper in the root directory:

{% codeblock Wrapper - reddit-cli.rb lang:ruby %}
require './lib/reddit_api'
require './lib/reddit-cli'

subreddit = ARGV[0]
RedditCli.new(RedditApi.new(subreddit))
{% endcodeblock %}

An Important Note

When working with external resources, whether it be a gem or a remote API, it’s important to wrap those endpoints in your own abstraction. For instance, with our Reddit CLI we could have avoided those first two classes entirely, written everything in the CLI display class, and worked with the raw JSON. But what happens when Reddit changes their API? If this CLI class was huge or incorporated many other components, this could be quite a big code change. Instead, what we wrote encapsulates the API inside a RedditApi class that returns a generic Story struct we can work with and pass around. The CLI, and any other code, doesn’t care if the API changes. If it does, we only have to update the one API class to mold the new API to the output we were already generating.
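To make the boundary concrete, here is a stripped-down sketch of the mapping step on its own. The `stories_from` helper and the sample JSON are made up for illustration; only the field names mirror what the RedditApi class above reads.

```ruby
require 'json'

Story = Struct.new(:title, :score, :comments, :url)

# Hypothetical helper: the only place that knows the shape of the raw JSON.
def stories_from(raw_json)
  JSON.parse(raw_json)['data']['children'].map do |child|
    d = child['data']
    Story.new(d['title'], d['score'], d['num_comments'], d['url'])
  end
end

# Made-up sample payload with the same nesting the post relies on
sample = '{"data":{"children":[{"data":{"title":"Hello","score":42,' \
         '"num_comments":7,"url":"http://example.com"}}]}}'

stories = stories_from(sample)
puts stories.first.title
```

If the API renamed `num_comments` tomorrow, only this one method would change; everything that consumes `Story#comments` would carry on untouched.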

End Result & Source Code

View Source on Github

I was working on my blog and moving some posts around when I kept getting a Psych::SyntaxError when generating it with Jekyll and ruby 1.9.x. Unfortunately the default stack trace doesn’t provide much information on what file was causing the issue, so a quick way to find out is opening up irb:

{% codeblock Example to run in irb - sample.rb lang:ruby %}
require 'yaml'

Dir.foreach("source/_posts").each { |f| YAML.load_file("source/_posts/" + f) unless f == "." || f == ".." }
{% endcodeblock %}
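The one-liner above stops at the first bad file and leaves you to read the file name out of the trace. A slightly longer sketch that rescues the error and reports every offending file might look like this (`find_bad_yaml` is a name I made up):

```ruby
require 'yaml'

# Hypothetical helper: returns [path, error message] pairs for every
# file in `dir` that fails to parse as YAML.
def find_bad_yaml(dir)
  files = Dir.glob(File.join(dir, '*')).select { |path| File.file?(path) }.sort
  files.each_with_object([]) do |path, bad|
    begin
      YAML.load_file(path)
    rescue Psych::SyntaxError => e
      bad << [path, e.message]
    end
  end
end

find_bad_yaml('source/_posts').each do |path, message|
  puts "#{path}: #{message}"
end
```

Only syntax errors are rescued here; anything else, such as a permissions problem, will still surface as a normal exception.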

Moved domains to my name

Aug 19, 2012 - 1 minutes

Moved everything over to my other domain, joshrendek.com, in case you’re wondering why you got redirected.

Never Set Instance Variables Again

Aug 16, 2012 - 1 minutes

Tired of doing this in every method in Ruby?

{% codeblock lang:ruby %}
class Person
  def initialize(name)
    @name = name
  end
end
{% endcodeblock %}

Use the awesome power of Ruby and metaprogramming to automatically set method parameters as instance variables:

{% codeblock lang:ruby %}
class Person
  def initialize(name)
    method(__method__).parameters.collect { |x| instance_variable_set("@#{x[1]}", eval(x[1].to_s)) }
  end
end
{% endcodeblock %}

Now you can access your parameters being passed in as instance variables for an object. You can extract this out into a method to apply to all objects or just make a simple extension to include it in files that you wanted to use it in. While this is a trivial example, for methods with longer signatures this becomes a more appealing approach. I’ll probably extract this out into a gem and post it here later.
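One possible shape for that extraction, sketched as a mixin. The module and method names are hypothetical, and it relies on `Binding#local_variable_get`, which needs Ruby 2.1 or newer, so it swaps out the `eval` used above.

```ruby
# Hypothetical mixin: copies every named parameter of a method from the
# caller's binding into an instance variable of the same name.
module AutoIvars
  def set_ivars_from(caller_binding, method_name)
    method(method_name).parameters.each do |_kind, param|
      instance_variable_set("@#{param}", caller_binding.local_variable_get(param))
    end
  end
end

class Person
  include AutoIvars
  attr_reader :name, :age

  def initialize(name, age)
    set_ivars_from(binding, __method__)
  end
end

person = Person.new('Josh', 30)
puts "#{person.name} is #{person.age}"
```

Passing `binding` explicitly keeps the helper free of `eval`, at the cost of one extra argument at each call site.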

Let’s start out by logging into our machine and installing some prerequisites (these can also be found by running rvm requirements):

sudo apt-get -y install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion mysql-client libmysqlclient-dev libsasl2-dev mysql-server

Let’s also install nodejs:

curl -O http://nodejs.org/dist/v0.8.4/node-v0.8.4.tar.gz
tar xzvf node-v0.8.4.tar.gz
cd node-v0.8.4
./configure && make && sudo make install

Now we can install ruby and RVM:

curl -L https://get.rvm.io | bash -s stable --ruby
source /home/ubuntu/.rvm/scripts/rvm
rvm use 1.9.3 --default
echo 'rvm_trust_rvmrcs_flag=1' > ~/.rvmrc
# sudo su before this
echo 'RAILS_ENV=production' >> /etc/environment
rvm gemset create tester

And lastly nginx:

sudo apt-get install nginx

Now let’s make a simple rails application back on our development machine with 1 simple root action:

rails new tester -d mysql
echo 'rvm use 1.9.3@tester --create' > tester/.rvmrc
cd tester
bundle install
rails g controller homepage index
rm -rf public/index.html
# Open up config/routes.rb and modify the root to point to homepage#index
rake db:create
git init .
git remote add origin https://github.com/bluescripts/tester.git # replace this with your git repo
git add .; git commit -a -m 'first'; git push -u origin master
rails s

Open your browser and go to http://localhost:3000 to check that all is good. Now let’s make some modifications to our Gemfile:

source 'https://rubygems.org'
gem 'rails', '3.2.6'
gem 'mysql2'
group :assets do
  gem 'sass-rails',   '~> 3.2.3'
  gem 'coffee-rails', '~> 3.2.1'
  gem 'uglifier', '>= 1.0.3'
end
gem 'jquery-rails'
gem 'capistrano', :group => :development
gem 'unicorn'

and re-bundle:

bundle

Now let’s start prepping for deployment and compile our assets.

capify .
rake assets:precompile # don't forget to add it to git!

Make a file called config/unicorn.rb:

# config/unicorn.rb
# Set environment to development unless something else is specified
env = ENV["RAILS_ENV"] || "development"

site = 'tester'
deploy_user = 'ubuntu'

# See http://unicorn.bogomips.org/Unicorn/Configurator.html for complete
# documentation.
worker_processes 4

# listen on a Unix domain socket,
# we use a shorter backlog for quicker failover when busy
listen "/tmp/#{site}.socket", :backlog => 64

# Preload our app for more speed
preload_app true

# nuke workers after 30 seconds instead of 60 seconds (the default)
timeout 30

pid "/tmp/unicorn.#{site}.pid"

# Production specific settings
if env == "production"
  # Help ensure your application will always spawn in the symlinked
  # "current" directory that Capistrano sets up.
  working_directory "/home/#{deploy_user}/apps/#{site}/current"

  # feel free to point this anywhere accessible on the filesystem
  shared_path = "/home/#{deploy_user}/apps/#{site}/shared"

  stderr_path "#{shared_path}/log/unicorn.stderr.log"
  stdout_path "#{shared_path}/log/unicorn.stdout.log"
end

before_fork do |server, worker|
  # the following is highly recommended for Rails + "preload_app true"
  # as there's no need for the master process to hold a connection
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

  # Before forking, kill the master process that belongs to the .oldbin PID.
  # This enables 0 downtime deploys.
  old_pid = "/tmp/unicorn.#{site}.pid.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  # the following is *required* for Rails + "preload_app true",
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end

  # if preload_app is true, then you may also want to check and
  # restart any other shared sockets/descriptors such as Memcached,
  # and Redis.  TokyoCabinet file handles are safe to reuse
  # between any number of forked children (assuming your kernel
  # correctly implements pread()/pwrite() system calls)
end

Now let’s set up config/deploy.rb to be more Unicorn and git friendly. Take note of the default environment settings, which are taken from the output of rvm info on the server. This is a modified version of ariejan.net’s:

require "bundler/capistrano"

set :scm,             :git
set :repository,      "git@github.com:bluescripts/tester.git"
set :branch,          "origin/master"
set :migrate_target,  :current
set :ssh_options,     { :forward_agent => true }
set :rails_env,       "production"
set :deploy_to,       "/home/ubuntu/apps/tester"
set :normalize_asset_timestamps, false

set :user,            "ubuntu"
set :group,           "ubuntu"
set :use_sudo,        false

role :web,    "192.168.5.113"
role :db,     "192.168.5.113", :primary => true

set(:latest_release)  { fetch(:current_path) }
set(:release_path)    { fetch(:current_path) }
set(:current_release) { fetch(:current_path) }

set(:current_revision)  { capture("cd #{current_path}; git rev-parse --short HEAD").strip }
set(:latest_revision)   { capture("cd #{current_path}; git rev-parse --short HEAD").strip }
set(:previous_revision) { capture("cd #{current_path}; git rev-parse --short HEAD@{1}").strip }

default_environment["RAILS_ENV"] = 'production'

default_environment["PATH"]         = "/home/ubuntu/.rvm/gems/ruby-1.9.3-p194/bin:/home/ubuntu/.rvm/gems/ruby-1.9.3-p194@global/bin:/home/ubuntu/.rvm/rubies/ruby-1.9.3-p194/bin:/home/ubuntu/.rvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games"
default_environment["GEM_HOME"]     = "/home/ubuntu/.rvm/gems/ruby-1.9.3-p194"
default_environment["GEM_PATH"]     = "/home/ubuntu/.rvm/gems/ruby-1.9.3-p194:/home/ubuntu/.rvm/gems/ruby-1.9.3-p194@global"
default_environment["RUBY_VERSION"] = "ruby-1.9.3-p194"

default_run_options[:shell] = 'bash'

namespace :deploy do
  desc "Deploy your application"
  task :default do
    update
    restart
  end

  desc "Setup your git-based deployment app"
  task :setup, :except => { :no_release => true } do
    dirs = [deploy_to, shared_path]
    dirs += shared_children.map { |d| File.join(shared_path, d) }
    run "#{try_sudo} mkdir -p #{dirs.join(' ')} && #{try_sudo} chmod g+w #{dirs.join(' ')}"
    run "git clone #{repository} #{current_path}"
  end

  task :cold do
    update
    migrate
  end

  task :update do
    transaction do
      update_code
    end
  end

  desc "Update the deployed code."
  task :update_code, :except => { :no_release => true } do
    run "cd #{current_path}; git fetch origin; git reset --hard #{branch}"
    finalize_update
  end

  desc "Update the database (overwritten to avoid symlink)"
  task :migrations do
    transaction do
      update_code
    end
    migrate
    restart
  end

  task :finalize_update, :except => { :no_release => true } do
    run "chmod -R g+w #{latest_release}" if fetch(:group_writable, true)

    # mkdir -p is making sure that the directories are there for some SCM's that don't
    # save empty folders
    run <<-CMD
      rm -rf #{latest_release}/log #{latest_release}/public/system #{latest_release}/tmp/pids &&
      mkdir -p #{latest_release}/public &&
      mkdir -p #{latest_release}/tmp &&
      ln -s #{shared_path}/log #{latest_release}/log &&
      ln -s #{shared_path}/system #{latest_release}/public/system &&
      ln -s #{shared_path}/pids #{latest_release}/tmp/pids &&
      ln -sf #{shared_path}/database.yml #{latest_release}/config/database.yml
    CMD

    if fetch(:normalize_asset_timestamps, true)
      stamp = Time.now.utc.strftime("%Y%m%d%H%M.%S")
      asset_paths = fetch(:public_children, %w(images stylesheets javascripts)).map { |p| "#{latest_release}/public/#{p}" }.join(" ")
      run "find #{asset_paths} -exec touch -t #{stamp} {} ';'; true", :env => { "TZ" => "UTC" }
    end
  end

  desc "Zero-downtime restart of Unicorn"
  task :restart, :except => { :no_release => true } do
    run "kill -s USR2 `cat /tmp/unicorn.tester.pid`"
  end

  desc "Start unicorn"
  task :start, :except => { :no_release => true } do
    run "cd #{current_path} ; bundle exec unicorn_rails -c config/unicorn.rb -D"
  end

  desc "Stop unicorn"
  task :stop, :except => { :no_release => true } do
    run "kill -s QUIT `cat /tmp/unicorn.tester.pid`"
  end

  namespace :rollback do
    desc "Moves the repo back to the previous version of HEAD"
    task :repo, :except => { :no_release => true } do
      set :branch, "HEAD@{1}"
      deploy.default
    end

    desc "Rewrite reflog so HEAD@{1} will continue to point to the next previous release."
    task :cleanup, :except => { :no_release => true } do
      run "cd #{current_path}; git reflog delete --rewrite HEAD@{1}; git reflog delete --rewrite HEAD@{1}"
    end

    desc "Rolls back to the previously deployed version."
    task :default do
      rollback.repo
      rollback.cleanup
    end
  end
end

def run_rake(cmd)
  run "cd #{current_path}; #{rake} #{cmd}"
end

Now let’s try deploying (if this is the first time you’ve cloned from git on the server, you may need to log in to it first and accept the SSH host key):

cap deploy:setup

Create your database config file in shared/database.yml:

production:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: tester_production
  pool: 5
  username: root
  password:
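
Before running the cold deploy it can be worth confirming that the file parses and has the entries Rails will need. The following is a minimal sketch, not part of the original setup: the `missing_db_keys` helper and the `REQUIRED_KEYS` list are hypothetical names chosen for illustration, and the list of required keys is an assumption rather than an official schema.

```ruby
require 'yaml'

# Keys the mysql2 entry above relies on (illustrative assumption, not an official schema).
REQUIRED_KEYS = %w[adapter database username].freeze

# Returns the required keys that are missing or blank for the given environment.
def missing_db_keys(yaml_text, env = 'production')
  config = YAML.safe_load(yaml_text) || {}
  settings = config[env] || {}
  REQUIRED_KEYS.select { |key| settings[key].to_s.strip.empty? }
end

sample = <<~YAML
  production:
    adapter: mysql2
    encoding: utf8
    reconnect: false
    database: tester_production
    pool: 5
    username: root
    password:
YAML

missing = missing_db_keys(sample)
puts missing.empty? ? 'database.yml looks complete' : "missing: #{missing.join(', ')}"
```

On the server you would pass `File.read('shared/database.yml')` instead of the inline sample.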

Go into current and create the database if you haven’t already:

rake db:create
# cd down a level
cd ../
mkdir -p shared/pids

Now we can run the cold deploy:

cap deploy:cold
cap deploy:start

Now we can configure nginx:

Open up /etc/nginx/sites-enabled/default:

upstream tester {
	server unix:/tmp/tester.socket fail_timeout=0;
}
server {
	listen 80 default;
	root /home/ubuntu/apps/tester/current/public;
	location / {
		proxy_pass  http://tester;
		proxy_redirect     off;

		proxy_set_header   Host             $host;
		proxy_set_header   X-Real-IP        $remote_addr;
		proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

		client_max_body_size       10m;
		client_body_buffer_size    128k;

		proxy_connect_timeout      90;
		proxy_send_timeout         90;
		proxy_read_timeout         90;

		proxy_buffer_size          4k;
		proxy_buffers              4 32k;
		proxy_busy_buffers_size    64k;
		proxy_temp_file_write_size 64k;
	}

	location ~ ^/(images|javascripts|stylesheets|system|assets)/  {
		root /home/ubuntu/apps/tester/current/public;
		expires max;
		break;
	}
}

Now restart nginx and visit http://192.168.5.113/ (replace with your server’s hostname or IP). You should be all set!
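
If you would rather check from a script than a browser, here is a rough sketch using Ruby’s standard `net/http`. The helper names `healthy_status?` and `check_site` are made up for this example, and treating any 2xx or 3xx response as "up" is an assumption you may want to tighten.

```ruby
require 'net/http'
require 'uri'

# True for 2xx/3xx status codes (an assumed definition of "healthy").
def healthy_status?(code)
  (200..399).cover?(code.to_i)
end

# Fetches the URL and reports whether the response status looks healthy.
# Returns false if the connection fails or times out.
def check_site(url, timeout: 5)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, open_timeout: timeout, read_timeout: timeout) do |http|
    healthy_status?(http.get(uri.request_uri).code)
  end
rescue SystemCallError, Timeout::Error, SocketError => e
  warn "request failed: #{e.class}: #{e.message}"
  false
end

# Only makes a request when a URL is passed on the command line.
puts(check_site(ARGV[0]) ? 'up' : 'down') if ARGV[0]
```

Save it under any name you like and pass the server URL as the first argument, for example `http://192.168.5.113/`.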

The problem with Pingdom

Jul 22, 2012 - 3 minutes

I had a query that, after adding indexes, was taking anywhere from 1.5 to 5 ms to return on my local machine. In production and staging environments it was taking 500+ ms to return.

Running EXPLAIN on the query showed the optimizer choosing a different path in each environment:

The good optimizer:

*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: activities
         type: ref
possible_keys: index_activities_on_is_archived,index_activities_on_equipment_id,index_activities_on_date_completed,index_activities_on_shop_id
          key: index_activities_on_shop_id
      key_len: 5
          ref: const
         rows: 1127
     filtered: 100.00
        Extra: Using where

The bad optimizer:

*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: activities
         type: index_merge
possible_keys: index_activities_on_is_archived,index_activities_on_equipment_id,index_activities_on_date_completed,index_activities_on_shop_id
          key: index_activities_on_shop_id,index_activities_on_is_archived
      key_len: 5,2
          ref: NULL
         rows: 1060
        Extra: Using intersect(index_activities_on_shop_id,index_activities_on_is_archived); Using where
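
Spotting the differences between two EXPLAIN rows by eye gets tedious, so here is a small sketch that parses the vertical (`\G`) output and prints only the columns that differ. The helper names `parse_explain_row` and `explain_diff` are hypothetical, and the sample rows are abbreviated copies of the two plans above.

```ruby
# Parses one row of vertical EXPLAIN output into a Hash of column => value.
def parse_explain_row(text)
  text.each_line.with_object({}) do |line, row|
    next if line.strip.start_with?('*') # skip the "*** 2. row ***" separator
    key, value = line.split(':', 2)
    next if value.nil?                   # skip lines without a "column: value" pair
    row[key.strip] = value.strip
  end
end

# Returns { column => [left, right] } for every column whose value differs.
def explain_diff(left, right)
  (left.keys | right.keys).each_with_object({}) do |col, diff|
    diff[col] = [left[col], right[col]] unless left[col] == right[col]
  end
end

good = parse_explain_row(<<~PLAN)
  *************************** 2. row ***************************
           type: ref
            key: index_activities_on_shop_id
        key_len: 5
            ref: const
           rows: 1127
          Extra: Using where
PLAN

bad = parse_explain_row(<<~PLAN)
  *************************** 2. row ***************************
           type: index_merge
            key: index_activities_on_shop_id,index_activities_on_is_archived
        key_len: 5,2
            ref: NULL
           rows: 1060
          Extra: Using intersect(index_activities_on_shop_id,index_activities_on_is_archived); Using where
PLAN

explain_diff(good, bad).each { |col, (l, r)| puts "#{col}: #{l} -> #{r}" }
```

The line to watch is `type`: the fast plan does a single-index `ref` lookup, while the slow one falls back to an `index_merge` intersection of two indexes.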

My first thought was it might have been the MySQL versions since I was running 5.5 locally and 5.0 in production, but that turned out not to be the case.

Next was to make sure my database was an exact replica of the one in production. After ensuring this I still ended up with the same results from the optimizer.

My last guess was server configuration. The issue ended up being that query caching was turned off in production and staging but not on my local machine. Turning it on, restarting mysqld, and re-running the query produced the good optimizer results on both my local machine and production.