Hadoop is a collection of sub-projects from the ASF, written in Java, for doing hardcore distributed computing. If you have a look at their Powered By page, you will find that most of the big boys are there. If you work in the technology field, you must have heard of a couple of these Hadoop sub-projects. We will be using these three throughout the post: Hadoop Common: The common utilities that support the other Hadoop subprojects. HDFS: A...
Running multiple Solr instances
A few days back I had to work on a project that integrated Nutch and Solr to provide high-performance search. We have millions of domains, so there was no way around separating the data for each domain in both Nutch and Solr. In Nutch I maintained a separate location for each domain's seed list, crawldb, segments and index, and in Solr I used one core per domain. We used Solr to index the content crawled by Nutch. I thought this post could help anyone facing a similar task. This post...
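The per-domain separation described above can be sketched in Java. This is a minimal illustration, not the author's code: the class name `DomainLayout`, the directory names under a common base path, the rule for deriving a core name from a domain, and the choice to have every core share one template `instanceDir` while getting its own `dataDir` are all assumptions. The only real API it refers to is Solr's CoreAdmin `CREATE` action (`name`, `instanceDir` and `dataDir` parameters), and it only builds the request URL rather than sending it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical helper that keeps each domain's Nutch data (seed list, crawldb,
 * segments, index) under its own directory and maps the domain to its own
 * Solr core. All names and the directory layout are illustrative assumptions.
 */
public class DomainLayout {
    private final Path base;     // assumed common parent directory for all domains
    private final String domain; // e.g. "example.com"

    public DomainLayout(Path base, String domain) {
        this.base = base;
        this.domain = domain;
    }

    /** Core name: lower-cased domain with every character outside [a-z0-9] replaced by '_'. */
    public String coreName() {
        return domain.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "_");
    }

    // One root directory per domain, so crawls never share crawldb or segments.
    private Path root()    { return base.resolve(coreName()); }
    public Path seedDir()  { return root().resolve("seed"); }
    public Path crawlDb()  { return root().resolve("crawldb"); }
    public Path segments() { return root().resolve("segments"); }
    public Path indexDir() { return root().resolve("index"); }

    /**
     * Builds a CoreAdmin CREATE request URL for this domain's core. All cores are
     * assumed to share one template instanceDir (same schema and solrconfig.xml)
     * and to differ only in their dataDir.
     */
    public String createCoreUrl(String solrBase, String templateInstanceDir) {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + enc(coreName())
                + "&instanceDir=" + enc(templateInstanceDir)
                + "&dataDir=" + enc("data/" + coreName());
    }

    // URL-encodes a query parameter value (Java 10+ Charset overload).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DomainLayout d = new DomainLayout(Paths.get("/data/nutch"), "Example.com");
        System.out.println(d.crawlDb());
        System.out.println(d.segments());
        System.out.println(d.createCoreUrl("http://localhost:8983/solr", "domain_template"));
    }
}
```

One consequence of deriving the core name directly from the domain is that distinct domains such as `a-b.com` and `a.b.com` both map to `a_b_com`; with millions of domains, a collision-free mapping (for example, a lookup table) would likely be needed.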
Nutch cheat sheet
What is Nutch, you ask? Nutch is a very popular open-source, Java-based search engine built on top of Lucene, which has been ported to C, C++, C#, Python, Perl and Ruby. It provides all the tools you need to run your very own search engine. The current version of Nutch (as of October 2010) is 1.2. Download the PDF version of this cheat sheet. Some important gotchas about Nutch: Founded in 2003 by Doug Cutting, the Lucene creator, and Mike...
