socialinks_now_uses_Sphinx
<p>At Socialinks, the number of links we process has pushed our 'LIKE' search beyond the point of being usable.</p>
<p>We considered several options:</p>
<ul>
<li>Solr: We already know Solr from past projects, so setting it up would have been trivial for us.</li>
<li>Xapian: We struggled with its documentation and eventually gave up.</li>
<li>MySQL FULLTEXT: We want to keep the search "database" separate from the data "database", which makes FULLTEXT undesirable.</li>
<li>Sphinx: We had never used it before, but we had heard a lot of good things about it. Plus, <a href="http://jeremy.zawodny.com/blog/archives/010869.html" target="_blank">craigslist.org uses it</a>.</li>
</ul>
<p>We decided to use Sphinx, mainly because we wanted to gain expertise in it, and we have been pleasantly surprised:</p>
<ul>
<li>Configuring Sphinx is painless because it is built to perform full-text search on top of an RDBMS. Just one config file and we were ready to go.</li>
<li>Its indexer is a simple command-line tool that can be run from cron (as the documentation recommends).</li>
<li><span style="text-decoration: underline;">Negative:</span> The Python client API is only available in the source download, and it was quite a struggle to find. On top of that, the client silently fails if it is newer than the Sphinx server.</li>
</ul>
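<p>To make the silent client failure mentioned above easier to spot, a thin wrapper around the query call can help. The following is only a minimal sketch: it assumes the Python client bundled with the Sphinx source (<code>sphinxapi.py</code>), whose <code>Query</code> returns <code>None</code> on failure and whose <code>GetLastError</code> reports the reason. The names <code>checked_query</code> and <code>SearchFailed</code> are hypothetical helpers, not part of Sphinx.</p>

```python
class SearchFailed(RuntimeError):
    """Raised when the Sphinx client returns no result set."""


def checked_query(client, text, index="*"):
    """Run a query and raise instead of silently returning None.

    `client` is assumed to behave like sphinxapi.SphinxClient:
    Query() returns None on failure and GetLastError() returns
    a message describing what went wrong.
    """
    result = client.Query(text, index)
    if result is None:
        # Surface the client's own error message, if it has one.
        raise SearchFailed(client.GetLastError() or "unknown Sphinx client error")
    return result
```

<p>With a real client, this would be called as <code>checked_query(client, "some keywords")</code> after <code>client.SetServer(host, port)</code>. A client/server version mismatch should then show up as an exception instead of an empty page of results.</p>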
<p>As of right now, indexing and searching are, unsurprisingly, lightning fast because our database is still small compared to many big forums and social link sites. It is nevertheless big enough to make 'LIKE' search completely unusable, even with carefully crafted queries and denormalization (our tables are mostly denormalized). Thus, we cannot yet say how fast or robust Sphinx is at a larger scale.</p>