A few days back I had to work on a project that integrated Nutch and Solr to provide high-performance search. We have millions of domains, so there was no option but to separate the data for each domain in both Nutch and Solr. I maintained a separate location for the seed, crawldb, segments and index in Nutch, and one core for every domain in Solr. We used Solr to index the data crawled by Nutch. I thought this post could help with a similar task. It covers the Solr part only; I will write about Nutch soon.
Solr is an open source search server, deployed as a Java web app, built on top of the popular open source search library Lucene. Lucene itself is a Java library. Both Solr and Lucene are maintained by the Apache Software Foundation, and Lucid Imagination provides commercial support for them. Lucene is one of the top 5 Apache projects, with around 6000 downloads per day. With Solr and Lucene you get a whole lot of features. Here are some of them:
- Faceted search
- Spell-checking
- Hit highlighting
- Advanced analysis/tokenization
- Efficient Replication to other Solr Search Servers
- Multiple core instances (from version 1.3)
- Configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, binary)
- and many more…
Solr can run in a number of servlet containers. In this post we will use Tomcat, with Ubuntu as the OS. So, what are we waiting for? Let’s start…
Before installing Solr we need a few packages. Let’s install the JDK first. Fire up your terminal and issue this command:
sudo apt-get install sun-java6-jdk sun-java6-jre
Now open the .bashrc file. It is located in your home directory, so you do not need sudo to edit it. You can open it in vi or nano, for example with the following command:
nano ~/.bashrc
Append the following line to the end of your .bashrc. The JVM path may be different on your box, so check it first:
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.16
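One way to check the path on your own box is to resolve where the javac binary really lives and strip the trailing bin/javac. This is only a sketch: the helper name java_home_from_javac is made up for illustration, and it assumes GNU readlink (as shipped with Ubuntu) and a JDK that is already installed.

```shell
# Derive a JAVA_HOME candidate from the full path to a javac binary.
# (java_home_from_javac is a hypothetical helper, not part of any package.)
java_home_from_javac() {
    # $1: resolved path to javac, e.g. /usr/lib/jvm/<jdk-dir>/bin/javac
    dirname "$(dirname "$1")"
}

# If a JDK is on the PATH, follow the symlink chain and print the candidate.
if command -v javac >/dev/null 2>&1; then
    java_home_from_javac "$(readlink -f "$(command -v javac)")"
fi
```

Whatever directory this prints is the value to use in the export line above.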
Now install the Tomcat server. You can install it via apt-get:
sudo apt-get install tomcat6
If you want to set a password for the tomcat6 administration panel, you can do so in /etc/tomcat6/tomcat-users.xml.
It is time to install Solr itself:
sudo apt-get install solr-common solr-tomcat libxpp3-java
If you want to modify the Solr schema, you can edit it here: /etc/solr/conf/schema.xml
Now restart Tomcat and test Solr:
sudo service tomcat6 restart
Visit this URL: http://127.0.0.1:8080/solr/admin/
We are done with the single instance setup.
Now we will create multiple cores that share a single Solr instance, each with its own config and schema files. New cores can be created dynamically. Let’s add a new XML file named solr.xml in the /etc/solr directory and put the following content in it:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
  </cores>
</solr>
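XML copied from a web page often ends up with curly quotes, which Solr will not parse. As a rough sketch, you could write the file from the shell with a quoted heredoc and check it before copying it into place. The scratch location under /tmp is an assumption made for this example.

```shell
# Write solr.xml to a scratch location first (this path is an assumption).
SOLR_XML="${TMPDIR:-/tmp}/solr.xml"

# The quoted 'EOF' keeps the shell from touching anything inside the heredoc.
cat > "$SOLR_XML" <<'EOF'
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
  </cores>
</solr>
EOF

# Confirm the attributes were written with plain straight quotes.
if grep -q 'persistent="true"' "$SOLR_XML"; then
    echo "solr.xml written to $SOLR_XML"
else
    echo "solr.xml does not look right, check the quotes" >&2
fi

# Then copy it into place, for example:
#   sudo cp "$SOLR_XML" /etc/solr/solr.xml
```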
Restart tomcat6:
sudo service tomcat6 restart
Now we can create a new core by visiting a URL like this (it is a single line):
http://localhost:8080/solr/admin/cores?action=CREATE&name=newCore&instanceDir=/usr/share/solr&config=/usr/share/solr/conf/solrconfig.xml&schema=/usr/share/solr/conf/schema.xml&dataDir=data/newCore
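With one core per domain, typing these URLs by hand gets old quickly. Here is a minimal sketch of how the CREATE call could be scripted. The helper core_create_url and the example core names are made up for illustration, and it assumes core names contain only URL-safe characters, since nothing is URL-encoded.

```shell
# Base URL of the CoreAdmin handler (matches adminPath in solr.xml).
SOLR_ADMIN="http://localhost:8080/solr/admin/cores"
# Solr home used for instanceDir, config and schema, as in the URL above.
SOLR_HOME="/usr/share/solr"

# Print the CREATE URL for one core name (hypothetical helper).
core_create_url() {
    printf '%s?action=CREATE&name=%s&instanceDir=%s&config=%s&schema=%s&dataDir=%s\n' \
        "$SOLR_ADMIN" "$1" "$SOLR_HOME" \
        "$SOLR_HOME/conf/solrconfig.xml" "$SOLR_HOME/conf/schema.xml" "data/$1"
}

# Print the URLs for a few made-up domain cores.
for core in example_com example_org example_net; do
    core_create_url "$core"
done

# To actually create a core, pass the URL to curl. Keep it quoted so the
# shell does not treat the & characters as background operators:
#   curl "$(core_create_url example_com)"
```

Each core gets its own data directory under data/, while all of them share the same solrconfig.xml and schema.xml. Point config and schema at different files if a domain needs its own.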
There are some other actions, such as: RELOAD, RENAME, STATUS, ALIAS, SWAP, UNLOAD, LOAD
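Most of these actions take the core name in a core parameter, and RENAME and SWAP take a second name in other. A small sketch of building such URLs follows; core_action_url and the name renamedCore are hypothetical, and the CoreAdmin wiki page remains the place to check which parameters your Solr version accepts.

```shell
# Base URL of the CoreAdmin handler (matches adminPath in solr.xml).
SOLR_ADMIN="http://localhost:8080/solr/admin/cores"

# Print a CoreAdmin URL (hypothetical helper).
# $1: action, $2: core name, $3: optional second core (RENAME, SWAP)
core_action_url() {
    url="$SOLR_ADMIN?action=$1&core=$2"
    if [ -n "${3:-}" ]; then
        url="$url&other=$3"
    fi
    printf '%s\n' "$url"
}

core_action_url STATUS newCore
core_action_url RELOAD newCore
core_action_url RENAME newCore renamedCore

# To run one of them against a live server:
#   curl "$(core_action_url STATUS newCore)"
```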
We can view the statistics of the newly created core at http://localhost:8080/solr/newCore/admin/stats.jsp. If everything went well, you should see the statistics page for that core.
Further reading:
http://wiki.apache.org/solr/CoreAdmin
http://wiki.apache.org/solr/SolrInstall
12 Responses to “Running multiple solr instances”



Very informative kickstart for even the newbies.
Great article! Can you briefly explain why one may need to run multiple instances of Solr? I’m at a juncture deciding if that’s what I should do, or just bloat my schema file with fields that some records will use, and some won’t, defining a “data_type” field so I can filter.query on customer data records, or product data records, for example.
Or would it be better to run two instances, one for customer data, and a second for product data? One note, I need to pull data related between these two sources, in addition to pulling them independently of each other.
thanks for your help! Solr Rocks!!
Ferdous vai,
This tutorial is the best one for playing with Nutch and Solr. You might be happy to hear that I'm doing my thesis project on designing a search engine that ranks results not only on page rank and so on, but also associates user review or user rating info with each link/page clicked from the search results. I have been studying and surfing a lot of white papers, web sites, forums and blogs to learn about the best search engine technologies. After three months of study I finally stuck with Apache Lucene and decided to implement my search engine using Ubuntu 10.04 LTS server, Hadoop for the cluster (1 master and 3 slave nodes), Nutch as the crawler and Solr as the search server with Tomcat.
I still haven't decided the best way to associate the user ratings or review data with each page in the Solr index, but my primary plan is to have two servers for different purposes. The first will be the basic search engine, I mean the Solr search server using the cluster; the other could be a simple ISPConfig 3 server that holds the rating info for all the pages indexed in the Solr search server by Nutch. For example, I want to host the domain for the search engine on that ISPConfig 3 server, and at the same time it could work as the repository for the user ratings/reviews. I want to use Java all the way through this project, but other languages can be considered if a better solution is found for associating the user rating info.
Now my question is: do you think this could be done? If you didn't understand my objectives in this project I can describe them further, or send the link where I'm updating my documentation. Please suggest how to get it done. Right now I'm concentrating on designing the cluster and getting Hadoop running. After that I will start playing with Nutch and Solr. But the problem is I can't install Java on my Ubuntu. I am trying to follow Michael G. Noll’s Running Hadoop in multiple cluster. Here are the commands and brief info from his tutorial for installing Java.
…………………………………
In Ubuntu 10.04 LTS, the package sun-java6-jdk has been dropped from the Multiverse section of the Ubuntu archive. You have to perform the following four steps to install the package.
1. Add the Canonical Partner Repository to your apt repositories:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
2. Update the source list:
$ sudo apt-get update
3. Install sun-java6-jdk:
$ sudo apt-get install sun-java6-jdk
4. Select Sun’s Java as the default on your machine:
$ sudo update-java-alternatives -s java-6-sun
………………………………..
Every time I run the command it says the command is not a valid one. Damn… :(
I am asking you this rather than asking him because I'm sure I will get the best and quickest answer from you.
By the way, this tutorial rocks. Thanks bro. I can't wait any longer to get my hands dirty with the other tutorials you have posted here on all those open source components.
Dude,
I solved the problem of installing the JDK on my server. Yeah, it's repo politics in Lucid Lynx. Anyway, I am still looking forward to hearing from you about possible ways to associate user reviews/user ratings data with the Solr index. Any suggestions will be highly appreciated. Thanks in advance.