<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chris Chandler &#187; Ruby</title>
	<atom:link href="http://chrischandler.name/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://chrischandler.name</link>
	<description>Squandering time as a raving egomaniac</description>
	<lastBuildDate>Thu, 03 Jun 2010 23:16:54 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using HBase&#8217;s Thrift interface with Ruby</title>
		<link>http://chrischandler.name/ruby/using-hbases-thrift-interface-with-ruby/</link>
		<comments>http://chrischandler.name/ruby/using-hbases-thrift-interface-with-ruby/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 03:46:33 +0000</pubDate>
		<dc:creator>Chris Chandler</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[thrift]]></category>

		<guid isPermaLink="false">http://chrischandler.name/?p=76</guid>
		<description><![CDATA[Using HBase's Thrift interface with Ruby the language and some basic examples.]]></description>
			<content:encoded><![CDATA[<p>In my continued fooling around with various key-value stores I&#8217;ve finally come across <a href="http://hadoop.apache.org/hbase/">HBase</a>.  Naturally, since I do my day-to-day programming in <a href="http://chrischandler.name/ruby/">Ruby</a>, I wanted to set up some basic examples.  Though HBase does support a RESTful interface, I thought I would get the <a href="http://incubator.apache.org/thrift/">Thrift</a> interface working for some better throughput.</p>
<p>If you need help getting Thrift running, take a look at my post on <a href="http://chrischandler.name/ruby/using-cassandras-thrift-interface-with-ruby/">Cassandra&#8217;s Thrift interface</a>, which lists all the prerequisites.</p>
<p>The example assumes a table &#8220;t1&#8221; and a column &#8220;f1&#8221;.</p>
<p><script src="http://gist.github.com/216619.js"></script></p>
]]></content:encoded>
			<wfw:commentRss>http://chrischandler.name/ruby/using-hbases-thrift-interface-with-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Cassandra&#8217;s Thrift interface with Ruby</title>
		<link>http://chrischandler.name/ruby/using-cassandras-thrift-interface-with-ruby/</link>
		<comments>http://chrischandler.name/ruby/using-cassandras-thrift-interface-with-ruby/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 03:47:35 +0000</pubDate>
		<dc:creator>Chris Chandler</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[cassandra]]></category>
		<category><![CDATA[snow leopard]]></category>
		<category><![CDATA[thrift]]></category>
		<category><![CDATA[ubuntu]]></category>

		<guid isPermaLink="false">http://chrischandler.name/?p=70</guid>
		<description><![CDATA[A quick example of how to use Cassandra's Thrift interface to connect with Ruby]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve been trying to figure out how to work with <a href="http://incubator.apache.org/cassandra">Cassandra</a> then you&#8217;ve probably come across <a href="http://incubator.apache.org/thrift">Thrift</a>.  Thrift is a library written in the spirit of <a href="http://code.google.com/apis/protocolbuffers/">Google&#8217;s protocol buffers</a>, but developed by Facebook and then open-sourced in 2007.  The long and short of it is that Thrift enables you to create RPC-style calls in a platform-independent and XML-free way that is extremely efficient and surprisingly easy to work with once you get all the pieces working.</p>
<p><a href="http://jetfar.com">Rich Atkinson</a> already has a great blog post on how to get up and running with Thrift on <a href="http://jetfar.com/installing-cassandra-and-thrift-on-snow-leopard-a-quick-start-guide/">Snow Leopard</a>.  So if that&#8217;s what you&#8217;re running, I&#8217;m going to suggest you check it out.  If you&#8217;re running Ubuntu you&#8217;ll need to satisfy the following dependencies:</p>
<pre>sudo aptitude -q -y install libexpat1-dev libboost1.37-dev g++ autoconf automake libtool</pre>
<p>and the source can be obtained with:</p>
<pre>svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift</pre>
<p>and then you can proceed with the standard &#8220;configure &#038;&#038; make &#038;&#038; make install&#8221;.</p>
<p>Hopefully at this point you have the Thrift native libraries installed.  Since this is about Ruby, you should also install the Thrift gem that will take advantage of the native libraries.</p>
<pre>sudo gem install thrift</pre>
<p>Armed with both the native library and the gem, let&#8217;s go ahead and navigate to your Cassandra install&#8217;s interface directory (cassandra/interface) and build the Ruby code:</p>
<pre>thrift --gen rb:new_style cassandra.thrift</pre>
<p>This will generate (as of this writing&#8230;) three files: gen-rb/cassandra.rb, gen-rb/cassandra_constants.rb, and gen-rb/cassandra_types.rb.  At this point you can create a temp.rb file in the gen-rb folder to play around with connections.  Here&#8217;s a short example of how to make a GET request for a specific key:</p>
<p><script src="http://gist.github.com/208103.js"></script></p>
<p>It&#8217;s worth noting that there *is* a gem available on GitHub from <a href="http://github.com/fauna/cassandra">fauna/cassandra</a> that creates a much easier-to-work-with client, but since the interface for Cassandra is still evolving and changing, the client is broken at the moment.  As far as I know this only applies to Cassandra 0.4.1 DEV and newer.  I&#8217;m very much looking forward to a working update.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrischandler.name/ruby/using-cassandras-thrift-interface-with-ruby/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Provisioning script for Ubuntu Intrepid and Ruby 1.9.1</title>
		<link>http://chrischandler.name/ruby/provisioning-script-for-ubuntu-intrepid-and-ruby-1-9-1/</link>
		<comments>http://chrischandler.name/ruby/provisioning-script-for-ubuntu-intrepid-and-ruby-1-9-1/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 01:16:28 +0000</pubDate>
		<dc:creator>Chris Chandler</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[easy]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[provisioning]]></category>
		<category><![CDATA[setup]]></category>
		<category><![CDATA[slicehost]]></category>
		<category><![CDATA[ubuntu intrepid]]></category>

		<guid isPermaLink="false">http://chrischandler.name/?p=66</guid>
		<description><![CDATA[A simple provisioning script for setting up a standard Rails stack based on Ruby 1.9.1 for Ubuntu Intrepid.]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a simple gist I use to provision either <a href="http://aws.amazon.com/ec2/">Amazon EC2</a> AMIs or <a href="http://slicehost.com">Slicehost</a> images running Ubuntu Intrepid.  It&#8217;ll set up all the requirements to build Ruby 1.9.1 from source since the official Ubuntu package isn&#8217;t due out until <a href="https://wiki.ubuntu.com/KarmicKoala">Karmic Koala</a> is released.</p>
<p>It has a handful of constants at the top of the file you need to define for everything to work right.  Of note are the application name and the machine&#8217;s FQDN it should answer on.  If you&#8217;re using EC2 you might have to tweak some configuration afterward since the FQDN in DNS probably won&#8217;t match the IP of the machine&#8217;s interface.</p>
<p>Also, if you plan on using authorized_keys and deploying from a git repository it makes things a lot easier if you tar and gzip the relevant files and put them in an S3 bucket to pull from.  The script handles this case as well.</p>
<p><script src="http://gist.github.com/200289.js"></script></p>
<p>As always, make sure you understand what a provisioning script does before you accept it with blind faith.  At <a href="http://flatterline.com">Flatterline</a> we use this as our base template.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrischandler.name/ruby/provisioning-script-for-ubuntu-intrepid-and-ruby-1-9-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Symmetric indices to make JOINs faster</title>
		<link>http://chrischandler.name/ruby/symmetric-indices-to-make-joins-faster/</link>
		<comments>http://chrischandler.name/ruby/symmetric-indices-to-make-joins-faster/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 08:10:03 +0000</pubDate>
		<dc:creator>Chris Chandler</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[btrees]]></category>
		<category><![CDATA[explain]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[join]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[rails]]></category>

		<guid isPermaLink="false">http://chrischandler.name/?p=57</guid>
		<description><![CDATA[Has-and-belongs-to-many relationships are quite common, but they require some special database/index love to make sure their performance stays up as data volume increases.]]></description>
			<content:encoded><![CDATA[<p>I am frequently asked how to increase the performance of Rails, and here&#8217;s a great starting point.  This advice generalizes to just about any database or platform that relies on B-Tree indices.  If you&#8217;re using MySQL out of the box, then this definitely applies.</p>
<p>Consider the following three models which are a very basic &#8220;has and belongs to many&#8221; setup:</p>
<p><script src="http://gist.github.com/191818.js"></script></p>
<p>So as you can see, a user can be in many groups and a group can have many users, all by way of the memberships join relationship.</p>
<p>The two example use cases I&#8217;m going to work with are:</p>
<ol>
<li>Given a user, what groups is he/she in?, and</li>
<li>Given a group, who are the members? </li>
</ol>
<p>Both of these are pretty typical, but can yield surprisingly different results from the database&#8217;s perspective.  If we try to get this data from the console we either start with a user and navigate to groups, or the reverse.  Here&#8217;s the MySQL EXPLAIN output from the console (I recommend viewing the RAW output unless I can figure out how to make GitHub display it correctly):</p>
<p><script src="http://gist.github.com/191820.js"></script></p>
<p>Totally understanding the output of the EXPLAIN syntax is well outside the scope of this post, but we&#8217;re going to need to cover the basics.  The first thing you should notice is the word ALL in the type column and the NULL in possible_keys.  This indicates that MySQL&#8217;s query optimizer has no index to read from and will be forced to perform a table scan to return the result.  In general, this will kill your performance.  Note the rows value of 100.  This value will be whatever the size of your table is.  If you have 500,000 records, then the database will check all 500,000 rows.</p>
<p><em>It&#8217;s worth noting that for small datasets you may see ALL alongside a non-NULL possible_keys value.  This means the optimizer believes that scanning the table will be faster than actually loading the index into memory.  This is generally fine.</em></p>
<p>So let&#8217;s go ahead and add a composite index on [user_id,group_id].  The SQL is:</p>
<pre>ALTER TABLE memberships ADD INDEX test_index(user_id,group_id)</pre>
<p>Now let&#8217;s repeat the previous queries.</p>
<p><script src="http://gist.github.com/191821.js"></script></p>
<p>This is where I see most people stop when it comes to performance optimization.  Note, though, that the two EXPLAIN results aren&#8217;t the same!  If you join from users to groups (the first query) you see a massive speedup.  Only one row is consulted (instead of 100) and it&#8217;s in the index.  A further benefit we see in both queries is the &#8220;Using index&#8221; in the Extra column.  This means that MySQL can determine the query result without ever checking the actual table because all required info is in the index (i.e. no extra disk hits).  Unfortunately, joining from groups to users (second query) still (sorta) sucks.  It says index instead of ALL, but that just means it will have to scan the entire index rather than scan the entire table on disk.  This is a marginal improvement at best, so 50% of our use cases still suck.</p>
<p>Here&#8217;s the explanation: B-Tree indices are unidimensional structures.  That means that the interior nodes of the index tree are strongly ordered, and thus cannot be arbitrarily accessed out of order.  If that doesn&#8217;t make any sense, it means that joining from users to groups is not an equivalent operation to joining groups to users because of the ordering of the index elements.</p>
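<p>To make the ordering argument concrete, here&#8217;s a rough Ruby sketch that models the [user_id,group_id] index as a sorted array of pairs.  This is a simplification for illustration only, not how MySQL actually lays out a B-Tree, and the method names (groups_for_user, users_for_group) and sample data are made up for this example:</p>
<pre>
# A composite index on (user_id, group_id), modeled as a sorted array of pairs.
# Simplified stand-in for an ordered index; not MySQL's real on-disk layout.
memberships = [[3, 10], [1, 20], [2, 10], [1, 10], [3, 30], [2, 30]]
INDEX = memberships.sort   # ordered by user_id first, then by group_id

# Leading column: binary search to the first matching entry, then read the
# adjacent entries.  Returns [group_ids, number of matching entries read].
def groups_for_user(index, user_id)
  start = index.bsearch_index { |(uid, _gid)| uid >= user_id }
  return [[], 0] if start.nil?
  hits = index[start..-1].take_while { |(uid, _gid)| uid == user_id }
  [hits.map { |(_uid, gid)| gid }, hits.size]
end

# Trailing column: entries for one group are scattered through the ordering,
# so every entry has to be examined.  Returns [user_ids, entries examined].
def users_for_group(index, group_id)
  hits = index.select { |(_uid, gid)| gid == group_id }
  [hits.map { |(uid, _gid)| uid }, index.size]
end

p groups_for_user(INDEX, 1)    # prints [[10, 20], 2]
p users_for_group(INDEX, 10)   # prints [[1, 2, 3], 6]
</pre>
<p>Looking up by user_id touches only the neighbouring entries that match, while looking up by group_id has to walk the whole structure, which is the same asymmetry the EXPLAIN output shows.</p>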
<p>So let&#8217;s clean up the second use case by adding an additional index, exactly like the first, except the order of the elements is reversed.  Here&#8217;s the SQL: </p>
<pre>ALTER TABLE memberships ADD INDEX test_index1(group_id,user_id)</pre>
<p>Now we have symmetric indices.  Let&#8217;s run our queries again with our second index in place:</p>
<p><script src="http://gist.github.com/191823.js"></script></p>
<p>Voila!  Now it doesn&#8217;t matter which way we join the tables because we have an index that is correctly ordered based on the directionality of the join.  You can even see that the optimizer selects a different index (key) depending on which direction you join the tables, exactly as expected.  Also, both queries now only require consulting the exact number of rows necessary and won&#8217;t involve any further disk hits as both queries can be satisfied with data available entirely within the index.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrischandler.name/ruby/symmetric-indices-to-make-joins-faster/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Converting latitude and longitude to timezones</title>
		<link>http://chrischandler.name/ruby/converting-latitude-and-longitude-to-timezones/</link>
		<comments>http://chrischandler.name/ruby/converting-latitude-and-longitude-to-timezones/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 07:16:04 +0000</pubDate>
		<dc:creator>Chris Chandler</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[l10n]]></category>
		<category><![CDATA[timezones]]></category>

		<guid isPermaLink="false">http://chrischandler.name/?p=39</guid>
		<description><![CDATA[A quick method for converting a latitude and longitude coordinate pair into a timezone.]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve fought it out with <a href="http://en.wikipedia.org/wiki/Internationalization_and_localization">localization</a> (l10n) of timezones then you know it can be a pain in the ass.  Further, suppose you&#8217;re localizing arbitrary information where all you&#8217;ve really been given is an address.  The relevant information isn&#8217;t necessarily in your system and the user might be in the wrong timezone anyway, so no sense in using that.</p>
<p>Here&#8217;s a quick Ruby means to convert latitude and longitude into a timezone <a title="ActiveSupport" href="http://as.rubyonrails.org/">ActiveSupport</a> recognizes.  This snippet only relies on <a title="Hpricot" href="http://wiki.github.com/why/hpricot/hpricot-xml">Hpricot</a> and the freely available <a title="Geonames" href="http://www.geonames.org/">Geonames</a> API:</p>
<p><script src="http://gist.github.com/157902.js"></script></p>
<p>GMT offsets are a convenient way to move time data in and out of UTC, and they save you from having to deal with arbitrary string names.</p>
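<p>As a rough illustration of that last point, here&#8217;s a small plain-Ruby sketch that moves a time in and out of UTC given a GMT offset in hours.  It assumes you already have the offset in hand (for example from the lookup above); the helper names offset_seconds, utc_to_local and local_to_utc are made up for this example.  Keep in mind that a fixed offset knows nothing about daylight saving rules:</p>
<pre>
# Convert a GMT offset in hours to seconds.  Fractional offsets such as 5.5
# are handled by rounding to whole seconds.
def offset_seconds(gmt_offset_hours)
  (gmt_offset_hours * 3600).round
end

# UTC instant to the same instant expressed at the given fixed offset.
def utc_to_local(utc_time, gmt_offset_hours)
  utc_time.getlocal(offset_seconds(gmt_offset_hours))
end

# Local wall-clock fields at the given fixed offset to a UTC instant.
def local_to_utc(year, month, day, hour, min, gmt_offset_hours)
  Time.new(year, month, day, hour, min, 0, offset_seconds(gmt_offset_hours)).utc
end

utc = Time.utc(2009, 7, 29, 7, 16, 4)
local = utc_to_local(utc, -7)
puts local.strftime('%Y-%m-%d %H:%M:%S %z')      # 2009-07-29 00:16:04 -0700
puts local_to_utc(2009, 7, 29, 0, 16, -7).hour   # 7
</pre>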
]]></content:encoded>
			<wfw:commentRss>http://chrischandler.name/ruby/converting-latitude-and-longitude-to-timezones/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
