A few days back I had to work on a project that integrated Nutch and Solr to provide high-performance search. We have millions of domains, so there was no other way but to separate the data for each domain in both Nutch and Solr. I maintained a separate location for the seed, crawldb, segments and index in Nutch, and one core for every domain in Solr. We used Solr for indexing from Nutch. I thought this post could help with a similar task. This post...
Read More: Running multiple Solr instances
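To make the per-domain separation concrete, here is a minimal sketch of how each domain could be mapped to its own seed, crawldb, segments and index locations plus a Solr core name. The root path `/data/nutch`, the `layout_for` helper and the naming rule are assumptions made up for illustration; they are not taken from the original project and do not call any Nutch or Solr API.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DomainLayout:
    """Per-domain locations for Nutch data plus the matching Solr core name."""
    seed_dir: PurePosixPath
    crawldb: PurePosixPath
    segments: PurePosixPath
    index: PurePosixPath
    solr_core: str


def layout_for(domain: str, root: str = "/data/nutch") -> DomainLayout:
    """Build an isolated layout for one domain.

    The root directory and the naming rule are hypothetical. Note that this
    simple rule can map different domains (e.g. "a-b.com" and "a.b.com") to
    the same key, so a real setup would need a collision-free scheme.
    """
    # Lowercase the domain and collapse anything that is not a letter
    # or digit into a single underscore.
    key = re.sub(r"[^a-z0-9]+", "_", domain.lower()).strip("_")
    base = PurePosixPath(root) / key
    return DomainLayout(
        seed_dir=base / "seed",
        crawldb=base / "crawldb",
        segments=base / "segments",
        index=base / "index",
        solr_core=key,
    )


if __name__ == "__main__":
    for name in ("example.com", "another-site.org"):
        print(layout_for(name))
```

Deriving every path and the core name from one key keeps the crawl data and the Solr core for a domain easy to match up, and means a crawl for one domain never touches another domain's directories.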
Nutch cheat sheet
What is Nutch, you ask? Nutch is a very popular open source, Java-based search engine built on top of Lucene, a library that has itself been ported to C, C++, C#, Python, Perl and Ruby. It provides all of the tools you need to run your very own search engine. The current version of Nutch (as of October 2010) is 1.2. Some important gotchas on Nutch: Founded in 2003 by Doug Cutting, the Lucene creator, and Mike...
Read More
