Rake Tasks 102
This is a follow-up post to Rake Tasks 101. In the 101 post we created Rake tasks, set up dependencies and made our tasks reusable by passing in parameters. In Rake Tasks 102 we’ll be building on those practices, interfacing with a Rails environment and using cron to automate our Rake task.
Our tasks will search Twitter for any mentions of “daneharrigan” and add the most recent one to our Tweet model. In this article I’m making a few assumptions: you’re on a Linux/Unix-based machine, you have John Nunemaker’s Twitter gem installed in your Rails project, and your Tweet model was built with one of these two commands:
# rails 2
$ script/generate model Tweet username:string message:string tweeted_at:datetime
# rails 3
$ rails generate model Tweet username:string message:string tweeted_at:datetime
Task Setup
Let’s create our Rake file as Rails.root/lib/tasks/twitter.rake and get started. First we’ll make a reusable task called :search in the :twitter namespace. This task will search Twitter for whatever parameter we pass it. Next, we’ll make a task called :daneharrigan. This task will live in a :search namespace, nested in the :twitter namespace. Nesting namespaces wasn’t covered in the 101 post, so take note of how it’s done here.
namespace :twitter do
  desc 'Search Twitter for the parameter you pass in'
  task :search, :query do |cmd, args|
    # some very impressive search code...
    Rake::Task['twitter:search'].reenable
  end

  namespace :search do
    desc 'Search Twitter for "@daneharrigan" and save it in the database'
    task :daneharrigan => :search do
      # save results from :search and be happy
    end
  end
end
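To see nested namespaces, task parameters and reenable working together without touching Twitter, here is a minimal sketch that can be run as a plain Ruby script. The names demo, greet and log are made up for illustration and are not part of the tasks in this post; the only assumption is that the rake gem is installed.

```ruby
require 'rake'

# Hypothetical log used only so we can inspect what ran.
log = []

namespace :demo do
  desc 'Greet whoever is passed in'
  task :greet, [:name] do |t, args|
    log << "hello #{args[:name]}"
    # A task runs only once per process unless it is re-enabled.
    t.reenable
  end

  # A namespace may share its name with a task; they do not clash.
  namespace :greet do
    desc 'Greet Dane'
    task :dane do
      Rake::Task['demo:greet'].invoke('Dane')
    end
  end
end

Rake::Task['demo:greet:dane'].invoke
# Works a second time only because the task re-enabled itself.
Rake::Task['demo:greet'].invoke('Stuart')

puts log
```

From the command line the same tasks would be run as rake demo:greet:dane or rake "demo:greet[Stuart]".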
Instance Variables in Tasks
Why would we ever want to use instance variables in a Rake task? The same reason you use an instance variable in a Ruby class. You want to make certain data available to multiple areas of your code. Instance variables in Rake tasks are no different, but that instance variable will be available to any other task run at that time. For example, if we set @name equal to “Dane” in the :search task, we can do puts @name in the :daneharrigan task and see the @name output when running rake twitter:search:daneharrigan. That makes things really easy, but you run the risk of overwriting instance variables from other higher-level tasks.
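Here is a small sketch of that sharing, runnable as a plain Ruby script with the rake gem installed. The task names set_name and show_name and the seen array are hypothetical and exist only for this illustration.

```ruby
require 'rake'

# Hypothetical array used only to record what the second task saw.
seen = []

task :set_name do
  # Task blocks are closures, so @name is set on the object that
  # defined the tasks, not on the task itself.
  @name = 'Dane'
end

task :show_name => :set_name do
  # Same object, so the instance variable is visible here.
  seen << @name
  puts @name
end

Rake::Task[:show_name].invoke
```

Because both blocks share one object, any other task that also assigns @name would silently overwrite the value.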
I took a look through the Rake tasks that come with Rails 3 and I didn’t see anything that we could conflict with. I’ll digress for just a moment and say the “rails:update” task does set the @app_generator instance variable, so that is a potential conflict, but I can’t think of a scenario where you’d need to set “rails:update” as a dependency of any new task. Please share your scenario if you have one!
Instance variables look safe enough, but I think we could do better. How about storing our data in an object? This sounds a lot safer than using instance variables.
Objects in Tasks
You can create your class file in Rails.root/lib or Rails.root/app/models. Either location will yield identical results for what we’re doing. As your code changes, pick whichever location makes the most sense to you. The following is our object that will store our data between Rake tasks:
class TwitterStore
  def self.search(query)
    @results = Twitter::Search.new(query)
  end

  def self.latest_result
    @results.first
  end
end
The TwitterStore object has only a search method and a latest_result method. You can certainly get fancier at this step or even use an ActiveRecord model instead, so feel free to use your creative license.
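The reason this works is that @results inside a class method belongs to the class object itself, so it survives for as long as the Rake process runs. Below is a rough sketch of the same pattern that runs without the Twitter gem. ResultStore, the backend argument and fake_backend are hypothetical stand-ins, and the Array() guard is an addition of mine, not something TwitterStore does.

```ruby
# Hypothetical stand-in for TwitterStore that takes its search
# backend as an argument so it can run without the Twitter gem.
class ResultStore
  class << self
    def search(query, backend)
      # Stored on the class object, so later callers can read it.
      @results = backend.call(query)
    end

    def latest_result
      # Array(nil) is [], so this returns nil if search never ran.
      Array(@results).first
    end
  end
end

# Fake backend returning data shaped loosely like a search result.
fake_backend = lambda do |query|
  [{ :from_user => 'someone', :text => "hi @#{query}" }]
end

before_search = ResultStore.latest_result
ResultStore.search('daneharrigan', fake_backend)
after_search = ResultStore.latest_result

puts before_search.inspect
puts after_search[:text]
```

Without a guard like this, calling latest_result before search would raise a NoMethodError on nil.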
We have our object and we have our task, but at this point Rake is unaware of any object or model in Rails, and that includes our nifty TwitterStore. Rails comes with a handy :environment task that sets up this awareness. We just need to set :environment as a task dependency or invoke it within the task. For us, we’ll be choosing the latter.
namespace :twitter do
  desc 'Search Twitter for the parameter you pass in'
  task :search, :query do |cmd, args|
    Rake::Task[:environment].invoke
    # Rake is now aware of our Rails environment!
    TwitterStore.search args[:query]
  end
  # ...
end
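For comparison, the other option, declaring :environment as a dependency, might look like the sketch below. So that it runs outside a Rails project, the :environment task here is a stand-in I define myself (in a real project Rails supplies it), and the order array is a hypothetical helper for recording what ran.

```ruby
require 'rake'

# Hypothetical array recording the order in which tasks ran.
order = []

# Stand-in for the :environment task that Rails normally defines.
task :environment do
  order << :environment
end

namespace :twitter do
  desc 'Search Twitter for the parameter you pass in'
  # [:query] declares the parameter; => :environment is the dependency.
  task :search, [:query] => :environment do |t, args|
    order << "search:#{args[:query]}"
  end
end

Rake::Task['twitter:search'].invoke('daneharrigan')

puts order.inspect
```

The dependency form runs :environment before the task body starts, which keeps the body free of setup code.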
Putting the Pieces Together
You can see we’re putting the pieces together now. We set up the Rails environment within our task, called TwitterStore.search and passed args[:query] to the search method. Now for the :daneharrigan task.
task :daneharrigan do
  # Use the full task name: the namespace scope no longer applies
  # by the time the task body runs.
  Rake::Task['twitter:search'].invoke('daneharrigan')
  result = TwitterStore.latest_result
  params = {
    :username => result[:from_user],
    :message => result[:text],
    :tweeted_at => result[:created_at].to_datetime
  }
  Tweet.find_or_create_by_username_and_message_and_tweeted_at(params)
end
In the :search task we called TwitterStore.search, which makes the response available to the :daneharrigan task through the latest_result method. I decided to use the find_or_create_by method because it’s an easy way to make sure we don’t store the same tweet more than once.
Our tasks are complete, so let’s give it a try. Run rake twitter:search:daneharrigan, then check your Tweet model to see what data is populated.
Cron Jobs
If you aren’t familiar with cron or a cron job, I recommend reading over Wikipedia’s page on it.
Time to set up our cron job! Before we can start we need to know what the cron job does, where on the system it has to run from to work properly and how often it runs. After we answer those questions we put them all together.
When you’re answering “what does the cron job do?” make sure to always use full paths to your executable files. Cron runs jobs with a minimal environment, so your usual $PATH is typically not available and cron needs to know exactly where files live. For example, write /usr/bin/rake twitter:search:daneharrigan as opposed to rake twitter:search:daneharrigan.
Now “where does the cron job have to be in the system to run properly?” We know we want to run our Rake task, but that can’t be run from just anywhere. It needs to run from within our Rails project directory. Let’s say /home/dane/twitter_store.
Finally, “how often does it run?” How about every 5 minutes? That’s reflected in the cron entry as */5 * * * *.
We’ve answered all 3 questions so let’s put them together.
*/5 * * * * cd /home/dane/twitter_store && /usr/bin/rake twitter:search:daneharrigan
You know how a cron entry should look, but how do you actually add an entry to the cron? Run crontab -e from the command line. This will launch the system’s default editor or whatever you have set in $EDITOR. Fill out your entry there, save it and you’re set!
And We’re Done
We created our Rake tasks, made them aware of the Rails environment, passed data between tasks through a storage class and added an entry to our cron to run every 5 minutes. We’re done! I hope this post gave you additional understanding to enhance your own tasks. Please do comment if there are questions or other areas of Rake you’d like to know about.
I’d like to thank Gokul Janga and Stuart Ellis for suggesting these topics in the Rake Tasks 101 comments. Thanks, guys!