MongoDB Query Performance Gotchas

At my current job, we use mongodb as our main database and as you can imagine we query our database a lot. The followings are what I found while investigating some slow performance. Hopefully it will prevent you from making the same mistake or pinpoint the source of problem quickly. We are using Rails 4 with Mongoid.

1. Don’t use regular expression with

/i

in queries.

One of the offending query looked pretty innocent. Customer is embedded document with email attribute and email field is indexed.


def self.with_email(email)
  self.where("customer.email" => /\Aemail\z/i).count
end

When I ran

Benchmark.bmbm

where the method was repeated 100 times, it returned the following result.


       user     system      total        real
   0.090000   0.010000   0.100000 ( 27.656723)

Compare that with when you call the same without regexp.


def self.with_email(email)
  self.where("customer.email" => email).count
end

Performance result is day and night.


       user     system      total        real
   0.080000   0.000000   0.080000 (  0.122888)

So, why the difference? I couldn’t quite figure it out myself. I knew that Mongodb uses B-Tree for index and index for email was already created. What about the regular expression was making it slow? I thought perhaps the anchor tags (

\A

and

\z

) were causing it, but that wasn’t it. I tried using substring of email and anchoring the beginning of email string (using

^

), but that didn’t yield difference either.

I finally turned to Stack Overflow for assistance, and within a few minutes, someone provided exact answer I was looking for. They key to the answer was in output of

explain

method.

Look at the

nscanned

and

indexBounds

in the output of explain method for the first method. It scanned all index because the upper and lower bounds are not defined (

""

and

{}

).


{
                     "cursor" => "BtreeCursor customer.email_1",
                 "isMultiKey" => false,
                          "n" => 781,
            "nscannedObjects" => 781,
                   "nscanned" => 500000,
    "nscannedObjectsAllPlans" => 781,
           "nscannedAllPlans" => 500000,
               "scanAndOrder" => false,
                  "indexOnly" => false,
                    "nYields" => 1397,
                "nChunkSkips" => 0,
                     "millis" => 406,
            "indexBounds" => {
    "customer.email" => [
        [0] [
            [0] "",
            [1] {}
        ],
        [1] [
            [0] /test/i,
            [1] /test/i
        ]
    ]
  }
}

Compare that to the output of explain method for the second method. Exactly the same upper and lower bounds, and it only scanned extremely small number of objects. This explains the big performance difference between the two methods.


{
                     "cursor" => "BtreeCursor customer.email_1",
                 "isMultiKey" => false,
                          "n" => 230,
            "nscannedObjects" => 230,
                   "nscanned" => 230,
    "nscannedObjectsAllPlans" => 230,
           "nscannedAllPlans" => 230,
               "scanAndOrder" => false,
                  "indexOnly" => false,
                    "nYields" => 1,
                "nChunkSkips" => 0,
                     "millis" => 0,
            "indexBounds" => {
    "customer.email" => [
        [0] [
            [0] "test@email.com",
            [1] "test@email.com"
        ]
    ]
  }
}

2. Don’t convert collection to array

Another slow performance was due to the following line of code. ProductA collection had about 20 documents and ProductB collection had about 2 documents. Another seemingly innocent code, but why?


products = ProductA.all + ProductB.all

Back to measuring performance with Benchmark (this time I just did it with Benchmark.bm).


Benchmark.bm do |x|
  x.report {n.times {|n| products = ProductA.all + ProductB.all}}
end
       user     system      total        real
  99.810000   2.630000 102.440000 (114.282440)

Let’s confirm the performance of retrieval.


Benchmark.bm do |x|
  x.report {n.times {|n| ProductA.all.count}}
end
       user     system      total        real
   0.430000   0.000000   0.430000 (  0.437339)

Benchmark.bm do |x|
  x.report {n.times {|n| ProductB.all.count}}
end
       user     system      total        real
   0.440000   0.000000   0.440000 (  0.456449)

That’s pretty reasonable performance. So, it must be something in the addition of two collections. When you add two collections using Mongoid, it returns an array, so it might be something related to converting to an array. Let’s check the performance on that.


Benchmark.bm do |x|
  x.report {n.times {|n| ProductA.all.to_a}}
end
       user     system      total        real
 110.070000   1.640000 111.710000 (117.081572)

Ah-ha! Looks like we found the problem. Converting to an array is quite slow. In order to use single collection, without converting to an array, I created a parent class – ProductParent, and ProductA and ProductB would inherit from that class. In such way, one collection is used and subclasses are distinguished by just

type

field. Actually, it should have been parent/subclass structure since ProductA and ProductB are kinds of a Product, and both share many of the same attributes.

After the re-arrangement, the following is the performance.


Benchmark.bm do |x|
  x.report {n.times {|n| ProductParent.all.count}}
end
       user     system      total        real
   0.480000   0.010000   0.490000 (  0.490489)

What a difference it made!

Finding these two performance gotchas have been a fun journey, and I hope they are useful to you. Do you have any gotchas you have found? Let me know, and I will post a link here!

User-friendly 500 and 404 pages on Rails 3

This came in handy for me, so I wanted to share this particular way of handling 404 and 500 errors.

First, rescue errors and tell what to do in application_controller.rb.

  if Rails.env.production?
    unless Rails.application.config.consider_all_requests_local
      rescue_from Exception, with: :render_500
      rescue_from ActionController::RoutingError, with: :render_404
      rescue_from ActionController::UnknownController, with: :render_404
      rescue_from ActionController::UnknownAction, with: :render_404
      rescue_from ActiveRecord::RecordNotFound, with: :render_404
    end
  end

In the same application_controller.rb, then you create the methods specified above.

  def render_404(exception)
    @not_found_path = exception.message
    respond_to do |format|
      format.html { render template: 'errors/not_found', layout: 'layouts/application', status: 404 }
      format.all { render nothing: true, status: 404 }
    end
  end

  def render_500(exception)
    logger.info exception.backtrace.join("\n")
    respond_to do |format|
      format.html { render template: 'errors/internal_server_error', layout: 'layouts/application', status: 500 }
      format.all { render nothing: true, status: 500}
    end
  end

Since it’s using the application layout, you will see the error message you specify in the layout. The following is my 500 error page, /views/errors/internal_server_error.html.haml. For 404 page, you can also use @not_found_path instance variable in the view as well.

#errors-page
  .thumbnail.errors
    .errors-500
    .caption
      %h5 Sorry
  %h1 500 Internal Server Error

  - # Do not remove this line. It is used for development
  - # purposes. When an error is found. In development, it
  - # shows the logs, in production it is nil.
  .error-msg
    = @log

Cheers,

Simple Ruby on Rails app using Twilio API

Twilio created cloud telephony services made mainly for developers. I’ve been meaning to figure out how to use their service, and I had a chance to develop a Ruby on Rails app using Twilio app called Phonein. You can check out the source code here. It’s nothing fancy, and totally not optimized. I also used Twitter’s Bootstrap CSS/JS package.

While Twilio had a few samples for Ruby on Rails, it was kind of outdated and a few things were incorrect. I wanted to share what I did to use Twilio API to help you come up to speed quickly.

Basically my app is for home-care professionals, and it allows them to check in and out using client’s phone. For Twilio, the followings are its requirements.

  • Identify client by phone number
  • Identify home-care professional by 6-digit identification code
  • Check in the home-care professional at the client’s location
  • Read out tasks for the home-care professional
  • Check out the home-care professional from the client’s location

Twilio has its own markup language called TwiML, and in Twilio controller I created, TwiML is used to create views (or voices). I also created certain POST actions under Twilio controller that will receive input from phone (digits punched in by a person) and use it to look up information.

1. Answer incoming call

When you sign up with Twilio, you can configure destination URL Twilio will invoke when it receives an incoming call. For mine, it would be “http://phonein.herokuapp.com/twilios/incoming”. So, I have “incoming” action defined in “Twilios” controller.

Controller in app/controller/twilios_controller.rb

Just like a regular Ruby on Rails app, controller gets objects ready for the view. It looks up client using the incoming phone number. If client is not found, it says (instead of displaying) an error message and hangs up. If client is found, it asks user to enter a 6-digit code. @post_to object contains URL that will be invoked after user interacts with Twilio. In our case, it’s after user enters code.

  
class TwiliosController < ApplicationController      
  BASE_URL = "http://phonein.herokuapp.com/twilios"         

  def incoming          
    # Get client by phone number          
    client_phone = params['From'][2..params['From'].size]          
    @client = Client.find_by_phone(client_phone)               
    
    if @client.nil?               
      render :action => "no_client.xml.builder"
    else
      @post_to = BASE_URL + "/verify?client_id=#{@client.id}"
      render :action => "incoming.xml.builder", :layout => false
    end
  end
end

View in app/views/twilios/incoming.xml.builder

xml.instruct!
xml.Response do
  xml.Gather(:action => @post_to, :numDigits => 6) do
    xml.Say "Welcome to #{@client.name}'s residence. Please enter your 6 digit code."
  end
end

View in app/views/twilios/no_client.xml.builder

xml.instruct!
xml.Response do
  xml.Say "Client could not be found."
  xml.Hangup
end

2. Verifies home-care professional, checks in, and reads tasks.

Next, look up the home-care professional by the 6-digit code. The numbers user punched will be in ‘Digits’ parameter. If user is found but has not checked in before, it checks in the user and gives some options. If user has already checked in before, user is presented with a few options including an option to check out.

Controller in app/controller/twilios_controller.rb

In addition to incoming action.

class TwiliosController < ApplicationController   

  def verify     
    @client = Client.find(params[:client_id])         
    @agent = Agent.find_by_code(params['Digits'])          
    if @agent.nil?       
      # If no agent is found, say "no agent found" error message and hang up.       
      @post_to = BASE_URL + "/verify?client_id=#{@client.id}"       
      render :action => "no_agent.xml.builder"
      return
    else
      if @agent.checked_in?(@client.id) 
        @message = "You have already checked in."
      else
        @agent.check_in(@client.id)
        @message = "Now you are checked in."
      end
    end 

    # Default action is direction
    @post_to = BASE_URL + "/direction?agent_id=#{@agent.id}&client_id=#{@client.id}"
    render :action => "direction.xml.builder", :layout => false
  end
end

View in app/views/twilios/direction.xml.builder

xml.instruct!
xml.Response do
  xml.Gather(:action => @post_to, :numDigits => 1) do
    xml.Say "Welcome #{@agent.name}."
    xml.Say "#{@message}"
    xml.Say "Please press 1 to read the task. Press 2 to check out. Press 3 to hear about Yang. Or Press 4 to hang up."
  end
end

View in app/views/twilios/no_agent.xml.builder

xml.instruct!
xml.Response do
  xml.Gather(:action => @post_to, :numDigits => 6) do  
    xml.Say "Agent could not be found. Please enter your 6 digit code."
  end
end

3. Read tasks, check out or hear about Yang

In addition to incoming and verify actions.

This is where main messages are configured and played. Depending on the option user chooses, it will either 1) say the tasks again, 2) check the user out, 3) say a few things about Yang, or 4) just hang up.

class TwiliosController < ApplicationController   

  def direction         
    @client = Client.find(params[:client_id])     
    @agent = Agent.find(params[:agent_id])     
    @message = @client.task_list     
    @post_to = BASE_URL + "/direction?agent_id=#{@agent.id}&client_id=#{@client.id}"          

    # 1 to hear the tasks again, 2 to check out, 3 to hang up.     
    if params['Digits'] == '1'       
      render :action => "direction.xml.builder", :layout => false
    elsif params['Digits'] == '2'
      @agent.check_out(@client.id)
      @goodbye_message = "Thank you for your service today."
      render :action => 'goodbye.xml.builder', :layout => false
    elsif params['Digits'] == '3'
      @message = "Yang is the most awesome guy ever - both personally and professionally. He is pretty sexy, too."
      render :action => 'direction.xml.builder', :layout => false
    elsif params['Digits'] == '4'
      @goodbye_message = "Have a great day."
      render :action => 'goodbye.xml.builder', :layout => false
    end
  end
end

View in app/views/twilios/goodbye.xml.builder

xml.instruct!
xml.Response do
  xml.Say "#{@goodbye_message}"
  xml.Say "Good-bye."
  xml.Hangup
end

Have fun!

That’s it! My codes are so not refactored and optimized. I did it to see what I can do with Twilio app. I hope this gives you a tip of what you can do with Twilio API. Go bananas!

Facebook Connect with Rails (using Omniauth and Devise) [Update]

This is an update to my last post about Facebook Connect with a Rails app. At the time I was using facebooker plugin (yeah, a plugin, not a gem), which has been discontinued for the longest time, and thus wouldn’t work with current Facebook connect.

Since then, I’ve used omniauth, omniauth-facebook, and devise gems to implement Facebook connect with a few Rails app I have been toying with. So, this is kind of an update to my last post about integrating Facebook Connect with a Rails app.

1. First, you need the following gems in your Gemfile.

gem 'devise'
gem 'omniauth'
gem 'omniauth-facebook'
gem 'oauth2'

Make sure you install them by running “bundle install” command.

2. Next configure devise gem.

rails generate devise:install

3. Apply devise to a model. 99.9% of time, this would be the User model.

rails generate devise User

4. Next, generate authentication model with the following columns. Token column is extra, if you want to save an access token.

rails g model Authentication user_id:integer provider:string uid:string token:string

5. Configure omniauth by modifying config/initializers/omniauth.rb

Rails.application.config.middleware.use OmniAuth::Builder do
  # The following is for facebook
  provider :facebook, [APP ID], [SECRET KEY], {:scope => 'email, read_stream, read_friendlists, friends_likes, friends_status, offline_access'}

  # If you want to also configure for additional login services, they would be configured here.
end

6. After user authenticates with whatever provider you specify, user needs to be redirected to omniauth call, so add the following line in your routes.rb.

match '/auth/:provider/callback' => 'authentications#create'

7. Then in Authentications controller, you figure out whether to create a new user or log the user in, if the user is an existing user. For complete hash, take a look at omniauth-facebook github page.

def create
  auth = request.env["omniauth.auth"]

  # Try to find authentication first
  authentication = Authentication.find_by_provider_and_uid(auth['provider'], auth['uid'])

  if authentication
    # Authentication found, sign the user in.
    flash[:notice] = "Signed in successfully."
    sign_in_and_redirect(:user, authentication.user)
  else
    # Authentication not found, thus a new user.
    user = User.new
    user.apply_omniauth(auth)
    if user.save(:validate => false)
      flash[:notice] = "Account created and signed in successfully."
      sign_in_and_redirect(:user, user)
    else
      flash[:error] = "Error while creating a user account. Please try again."
      redirect_to root_url
    end
  end
end

8. In User model, store essential information with apply_omniauth method.

has_many :authentications, :dependent => :delete_all
def apply_omniauth(auth)
  # In previous omniauth, 'user_info' was used in place of 'raw_info'
  self.email = auth['extra']['raw_info']['email']
  # Again, saving token is optional. If you haven't created the column in authentications table, this will fail
  authentications.build(:provider => auth['provider'], :uid => auth['uid'], :token => auth['credentials']['token'])
end

9. In Authenication model,

belongs_to :user

10. In your view, user clicking on /auth/facebook/ link will be redirected to Facebook to log in.

<%= link_to 'Login with Facebook', '/auth/facebook/' %>

11. This method doesn’t do FB login in a popup. For that, you have to use FB Javascript SDK, and you can use the example here.

Fresh installation of Ruby, Rails, Git, RubyGems, and Postgresql 8.x

Most people need to do this only when they get a new system. I had to go through this to help out a potential brogrammer, and it would have been much easier if I pointed him to one place where he could just following direction. Instruction here applies to Mac OS X 10.6.x.

For Windows users, use the convenient installer done by guys at EngineYard from here (http://railsinstaller.org/) to install Ruby, Rails, etc., and installer from Postgres from here (http://www.postgresql.org/download/windows/) to install Postgresql.

1. Install Xcode

Xcode is included in the Mac OS X install disc. You can install from the DVD or you can down load it from here,
http://developer.apple.com/xcode/. If you already have xcode installed, you are good to go.

2. Install Git

Download and install git if you don’t have one already. Do the following and see if returns a version.

git --version

Best place is to go here (http://git-scm.com/), and download and install the latest stable release.

3. Install RVM and Ruby

RVM lets you manage set of gems for Ruby/Rails version combination. This is the best way to install Ruby and Rails. Follow the link below and install RVM and Ruby version 1.9.2 (which is latest, stable version as of this time).

curl -L get.rvm.io | bash -s stable --ruby

3.1. Then, update the .bash_profile (or whatever shell profile or rc file you need)

echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function' >> ~/.bash_profile

3.2. Reload the shell.

source ~/.bash_profile

3.3. Test to see if installation was successful.

rvm list known

3.4. Install Ruby 1.9.2 (or 1.9.3)

rvm install 1.9.2

3.5. It’s probably a good to set 1.9.2 as default as well.

rvm use 1.9.2 --default

4. Install RubyGems if not already installed

Doing the following command in terminal and see if it returns version number.

gem -v

If it doesn’t return a version number, follow the steps below to install RubyGems.

4.1 Download the latest stable version from here (http://rubygems.org/pages/download)

4.2 Unzip and Install

tar xzvf rubygems-1.X.Y.tgz
cd rubygems-1.X.Y
sudo ruby setup.rb

5. Configure RVM gemset

Now, you have to actually make a choice whether you want to install the latest version 3.1 or the one before 3.0.x. If you just do gem install rails, it will install 3.1. I am sticking with 3.0.10 for a little longer, since that’s what I am used to. But, if you are starting new, you should just install 3.1. When you search for code examples online, make sure it’s for 3.1, since 3.1 is quite different from 3.0.x.
rvm gemset create rails31

Also set default gemset.

rvm use 1.9.2@rails31 --default

6. Install Rails

Depending on what you decided above,

gem install rails -v 3.1.0

7. Install Postgresql 8.x

I’ve used MySQL mostly before, but after Oracle’s purchase of MySQL and since I deploy most of my apps on Heroku, which only supports Postgresql, I have been using Postgresql for quite sometime. For typical Rails app, it shouldn’t matter whether you use MySQL or Postgresql. You need to be careful when you need to write raw SQL statement. Postgresql is known to be more strict with the syntax.

Latest Postgresql is 9.x, but I believe pg gem only supports up to 8.x. Better stick with safe version.  Follow the link below to download and install Postgresql 8.x. Remember the admin username and password.

http://www.enterprisedb.com/products-services-training/pgdownload

pg gem will be installed when you create a Rails app with

rails new [APP NAME] -d postgresql

and install gems with

bundle install

8. Configure Postgresql for Rails

The most you have to configure is to create a user and password for each app.

8.1 Login as admin (whatever admin username you picked when you installed the Postgresql)

psql -U admin

8.2 Create a username and password for your app.

create role [USERNAME] with createdb login password '[PASSWORD]';

You would use the username and password you just created in /config/database.yml file in your Rails app.

Full text search on Heroku

If you want to have full text search capability on Heroku, you should definitely use texticle or acts_as_tsearch. It is possible to use acts_as_farret on Heroku, but since you can only write to /tmp directory and your index file will be deleted sooner or later, you should not use it.

I first used texticle, and later switched over to acts_as_tsearch because I wasn’t happy with performance. Acts_as_tsearch is working really fine for me, and since your database on Heroku is already Postgres, there should be minimal configuration required. I even switched my local database from MySQL to Postgres to make both development and production environments consistent.

How to convert from MySQL to Postgres

I have been using MySQL for probably as long as I could remember. For Bloglation, search capability is an important feature since it’s hard to browse each post one by one. I will probably implement tagging functionality, but even so, it’s important to be able to search the contents with a keyword(s). While Ultrasphinx works well, Heroku only supports WebSolr… I was using acts_as_ferret using /tmp for index files, but the problem using the /tmp directory is that ferret index files most likely to disappear at some point.

Then, I found out that Postgres supports full-text search and since Heroku uses Postgres, I could use other plug-ins like acts_as_tsearch or texticle for free. Free is important to me, since it’s not making any money.

Searching online, there are various ways to do it like Pivotal Labs’ script or Heroku’s Taps gem, but I wanted to do it in an old way like AEdifice to check everything is going alright at each step.

1. First thing to do is to backup MySQL

For me, it was important to backup preserving encoding, since it had many different languages. First I pulled db from Heroku thinking that I Continue reading

Bloglation – Translate, Save, and Share!

Last Thursday, I released private alpha version of Bloglation, which lets a user translate any web page, save and share. It’s supposed to be private, but I need to get some good feedback from real users. If you are bi-lingual (or not) and interested in translating cool ideas, concepts and/or knowledge, please go ahead. And don’t forget to send me any comments/feedback you have.

I also wanted to maintain a separate blog just for bloglation. You can find it here.

Installing acts_as_ferret with pagination and deploying on Heroku

OMG!

This shouldn’t have been this difficult, but it has because while there are many cool tutorials are out there, they are mostly outdated, and for some reason, the instruction on Heroku was not accessible.

While I picked acts_as_ferret because Heroku supports it, many seemed to prefer Thinking Sphinx. So, if you are not constrained (like me with Heroku), you should try that out too.

1. Install acts_as_ferret

Full instruction is outlined on github, so you should check it out. You can also find the installation instruction and complete list of methods here, too.

While the instruction asks you to put version name, since Heroku only has version 0.4.3 installed, specifying a version will break it.

Continue reading