Friday, March 14, 2014

SolrCloud setup on a Windows machine

Getting Started

Solr is an open source search platform from the Apache Software Foundation. It is built on the Lucene libraries and provides powerful full-text search. The Solr 4 release brought in many features, including but not limited to SolrCloud.
On a single node, Solr has a core that is essentially a single index. If you want multiple indexes, you create multiple cores. With SolrCloud, a single index can span multiple Solr instances, which means that a single index can be made up of multiple cores on different machines. Please refer to the URL below for tutorials and wiki pages on how SolrCloud works.
https://cwiki.apache.org/confluence/display/solr/SolrCloud

Installation

ZooKeeper
Although Solr comes bundled with Apache ZooKeeper, you are discouraged from using this embedded ZooKeeper in production: shutting down a redundant Solr instance also shuts down its ZooKeeper server, which might not be quite so redundant.
SolrCloud uses ZooKeeper as a repository for cluster configuration and coordination. It can also be thought of as a distributed file system that holds information about all the Solr servers.
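The redundancy concern comes from ZooKeeper's majority quorum: an ensemble of n servers keeps working only while a strict majority of them is up. Here is a small sketch of that arithmetic (the function names are just illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} server(s): quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

A single embedded ZooKeeper (n = 1) survives no failures, and going from one server to two does not help either. That is why a separate ensemble usually has an odd number of servers, such as three or five.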
Download ZooKeeper (version 3.4.5, which goes along with Solr 4.6.0) from http://zookeeper.apache.org/releases.html. ZooKeeper is distributed as a tar archive that can be extracted into any folder of your choice.
Solr
Download Solr 4.6.0 from http://lucene.apache.org/solr/downloads.html. This is again a zip file that can be unzipped into any location (I unzipped it under C:\DevApps\Solr 4.6.0).
Tomcat
Tomcat can be used to deploy the solr.war that ships out of the box with the Solr download. Multiple Tomcat instances can be part of the cloud. Download Tomcat 7 from http://tomcat.apache.org/download-70.cgi. This is also a zip file that can be unzipped into any location.

ZooKeeper and Solr deployment on the local Windows machine

Create a folder, such as C:\solr, to hold the various Solr instances and the ZooKeeper configuration.
[Screenshot: Solr directory structure]
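The folders that later steps create under C:\solr can also be set up in one go. A minimal sketch, assuming Python 3 and the folder names used throughout this post; create_layout is a hypothetical helper, and ZooKeeper itself is still extracted separately:

```python
from pathlib import Path

# Folder names taken from the steps in this post; adjust if yours differ.
SOLR_LAYOUT = (
    "zookeeper-data",  # ZooKeeper dataDir (see zoo.cfg)
    "solr-conf",       # shared Solr config files, uploaded to ZooKeeper
    "solr-cli-lib",    # Solr jars used on the ZkCLI classpath
    "solr1",           # solr.solr.home of the first instance
    "solr2",           # solr.solr.home of the second instance
    "tomcat1",         # first Tomcat (HTTP port 7070)
    "tomcat2",         # second Tomcat (HTTP port 9100)
)


def create_layout(base: str) -> list:
    """Create the base folder and its subfolders; existing ones are kept."""
    root = Path(base)
    created = []
    for name in SOLR_LAYOUT:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For this post you would call create_layout(r"C:\solr"). Running it twice is harmless because existing folders are left alone.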

ZooKeeper Data

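Extract ZooKeeper into a central directory such as C:\solr\zookeeper-3.4.5.
Create a folder C:\solr\zookeeper-data. This is ZooKeeper's data directory: it holds ZooKeeper's snapshots and transaction logs, which include the Solr configuration you upload and the cluster state. The index itself is stored by the Solr instances, not here. (Any folder will do, but remember to configure the same folder as dataDir in zoo.cfg.)
Create C:\solr\zookeeper-3.4.5\conf\zoo.cfg (the download ships only conf\zoo_sample.cfg, which you can copy and rename). The contents should look like the following. Note the client port, 3181, and the dataDir (C:\solr\zookeeper-data).
[Screenshot: ZooKeeper configuration (zoo.cfg)]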
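For reference, a minimal standalone zoo.cfg along those lines needs tickTime, dataDir and clientPort. The sketch below renders one. The tickTime of 2000 ms is the value from ZooKeeper's zoo_sample.cfg, not something stated in this post, and render_zoo_cfg is a hypothetical helper. Forward slashes are used in dataDir because zoo.cfg is read as a Java properties file, where a backslash is an escape character.

```python
def render_zoo_cfg(data_dir: str, client_port: int = 3181,
                   tick_time_ms: int = 2000) -> str:
    """Render a minimal standalone zoo.cfg.

    zoo.cfg is loaded as a Java properties file, in which a backslash
    is an escape character, so Windows paths are written with forward
    slashes (C:/solr/zookeeper-data).
    """
    data_dir = data_dir.replace("\\", "/")
    lines = [
        f"tickTime={tick_time_ms}",   # basic time unit in milliseconds
        f"dataDir={data_dir}",        # snapshots and transaction logs
        f"clientPort={client_port}",  # port that Solr and zkCli connect to
    ]
    return "\n".join(lines) + "\n"
```

render_zoo_cfg(r"C:\solr\zookeeper-data") returns the three lines tickTime=2000, dataDir=C:/solr/zookeeper-data and clientPort=3181, which you can save as C:\solr\zookeeper-3.4.5\conf\zoo.cfg.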

Config, Data, Libraries

Solr ships with a set of config files, which are available in the example\solr\collection1\conf directory. collection1 is the default collection that comes with the Solr package (please read the tutorial to understand what a collection is). The screenshot below shows the config files that come with the downloaded package.
[Screenshot: Solr config structure]

Next, copy the config files and libraries to a central location that both Solr instances can access.
Create two folders, solr1 and solr2, under C:\solr, as shown in the Solr directory structure screenshot.
Copy C:\DevApps\Solr 4.6.0\solr-4.6.0\example\solr\solr.xml to C:\solr\solr1 and C:\solr\solr2. The XML contents need to be modified, because the default solr.xml comes with the Jetty server configuration. Make sure both copies of solr.xml (in solr1 and solr2) have contents like the following.
[Screenshot: solr.xml]
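As an illustration only, here is a sketch that generates a minimal Solr 4.x "new-style" solr.xml for Tomcat. It assumes the solrcloud section should read its port and context from the -Dport and -DhostContext system properties that the Tomcat startup.bat sets later in this post. As far as I know, the example solr.xml shipped with Solr 4.6.0 reads the port from jetty.port instead, which is the Jetty-specific part. The shardHandlerFactory section of the example file is left out, build_solr_xml is a hypothetical helper, and ET.indent needs Python 3.9+. Check the element names against the solr.xml of your own Solr version.

```python
import xml.etree.ElementTree as ET


def build_solr_xml(default_port: int = 8080) -> str:
    """Build a minimal new-style solr.xml whose <solrcloud> section reads
    its settings from Java system properties (${name:default})."""
    solr = ET.Element("solr")
    cloud = ET.SubElement(solr, "solrcloud")
    # (element type, name attribute, ${property:default} placeholder)
    settings = [
        ("str", "host", "${host:}"),
        ("int", "hostPort", "${port:%d}" % default_port),  # set via -Dport
        ("str", "hostContext", "${hostContext:solr}"),     # set via -DhostContext
        ("int", "zkClientTimeout", "${zkClientTimeout:15000}"),
    ]
    for tag, name, value in settings:
        ET.SubElement(cloud, tag, name=name).text = value
    ET.indent(solr, space="  ")  # pretty-print (Python 3.9+)
    return ET.tostring(solr, encoding="unicode")
```

Because both Tomcats pass their own -Dport value, the same solr.xml can be written to both C:\solr\solr1 and C:\solr\solr2.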

Copy the config files from C:\DevApps\Solr 4.6.0\solr-4.6.0\example\solr\collection1\conf\* to the C:\solr\solr-conf folder.
To copy the Solr libraries, unzip C:\DevApps\Solr 4.6.0\solr-4.6.0\example\webapps\solr.war to C:\solr-war and copy C:\solr-war\WEB-INF\lib\* to C:\solr\solr-cli-lib.
Add C:\solr\solr-cli-lib to C:\solr\solr-conf\solrconfig.xml.
[Screenshot: solrconfig.xml]

[Screenshot: solr-cli-lib folder]
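The three steps above (copy the config files, pull the jars out of solr.war, reference the jar folder from solrconfig.xml) can be scripted. Here is a minimal sketch, assuming Python 3.8+ and the paths used in this post. It relies on the fact that a .war file is an ordinary zip archive. The `<lib dir="..." />` line it adds is a standard solrconfig.xml directive, but the text does not show exactly where the original config placed it, so the position right after `<config>` is my assumption. All three function names are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def copy_conf(src_conf: str, dest_conf: str) -> None:
    """Copy collection1/conf/* into the shared solr-conf folder."""
    shutil.copytree(src_conf, dest_conf, dirs_exist_ok=True)  # Python 3.8+


def extract_war_libs(war_path: str, lib_dir: str) -> list:
    """Copy every WEB-INF/lib/*.jar from solr.war into lib_dir.

    A .war file is a zip archive, so no full unzip is needed.
    """
    dest = Path(lib_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(war_path) as war:
        for entry in war.namelist():
            if entry.startswith("WEB-INF/lib/") and entry.endswith(".jar"):
                target = dest / Path(entry).name
                with war.open(entry) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
                copied.append(target.name)
    return copied


def add_lib_directive(solrconfig: str, lib_dir: str) -> bool:
    """Insert <lib dir="..." /> right after the opening <config> tag.

    Returns False if the directive is already present. Plain text
    insertion keeps the file's comments and formatting intact.
    """
    path = Path(solrconfig)
    text = path.read_text(encoding="utf-8")
    directive = '<lib dir="%s" />' % lib_dir.replace("\\", "/")
    if directive in text:
        return False
    start = text.index("<config")
    end = text.index(">", start) + 1
    path.write_text(text[:end] + "\n  " + directive + text[end:],
                    encoding="utf-8")
    return True
```

For this post that would be copy_conf(...\collection1\conf, r"C:\solr\solr-conf"), then extract_war_libs(...\solr.war, r"C:\solr\solr-cli-lib"), then add_lib_directive(r"C:\solr\solr-conf\solrconfig.xml", r"C:\solr\solr-cli-lib").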

Tomcat configuration

Unzip Apache Tomcat into C:\solr\tomcat1 and C:\solr\tomcat2, and copy C:\DevApps\Solr 4.6.0\solr-4.6.0\example\webapps\solr.war into the webapps folder of both Tomcat instances.

Starting the ZooKeeper server

Now start the ZooKeeper server using C:\solr\zookeeper-3.4.5\bin\zkServer.cmd (on Windows; on other platforms use zkServer.sh).
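Before moving on, you can check that ZooKeeper is answering on port 3181. ZooKeeper 3.4.x replies imok to the four-letter command ruok sent over a plain TCP connection. Newer releases may require enabling four-letter commands through 4lw.commands.whitelist. Here is a minimal Python sketch; zookeeper_is_ok is a hypothetical helper:

```python
import socket


def zookeeper_is_ok(host: str = "localhost", port: int = 3181,
                    timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command and expect 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server writes its answer and then closes the connection.
            while True:
                chunk = sock.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:  # connection refused, timeout, reset, ...
        return False
    return reply == b"imok"
```

zookeeper_is_ok() returning False usually means the server is not running yet or zoo.cfg uses a different clientPort.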

Using ZooKeeper to manage the config files

Use the following command to upload the config to ZooKeeper under an arbitrary name, "myconfig":
java -classpath .;/solr/solr-cli-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:3181 -confdir /solr/solr-conf -confname myconfig
Then use the following command to link a collection, here arbitrarily named "mycollection", to the uploaded config:
java -classpath .;/solr/solr-cli-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mycollection -confname myconfig -zkhost localhost:3181
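If you script these two commands, the only platform-specific part is the classpath separator: ";" on Windows, as in the commands above, and ":" elsewhere. The sketch below builds the same argument lists with os.pathsep and runs them through subprocess. The helper names are hypothetical. The ZkCLI class and its -cmd, -zkhost, -confdir, -confname and -collection options are the ones used above.

```python
import os
import subprocess

ZKCLI_CLASS = "org.apache.solr.cloud.ZkCLI"


def zkcli_args(lib_dir: str, *options: str) -> list:
    """Build a 'java -classpath ... ZkCLI <options>' argument list."""
    # Java itself expands a trailing '*' classpath entry to all jars in
    # that folder, so no shell globbing is involved.
    classpath = os.pathsep.join([".", os.path.join(lib_dir, "*")])
    return ["java", "-classpath", classpath, ZKCLI_CLASS, *options]


def upconfig_args(lib_dir: str, zkhost: str, confdir: str, confname: str) -> list:
    return zkcli_args(lib_dir, "-cmd", "upconfig", "-zkhost", zkhost,
                      "-confdir", confdir, "-confname", confname)


def linkconfig_args(lib_dir: str, zkhost: str, collection: str, confname: str) -> list:
    return zkcli_args(lib_dir, "-cmd", "linkconfig", "-collection", collection,
                      "-confname", confname, "-zkhost", zkhost)


def run(args: list) -> None:
    """Run one command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)
```

For example, run(upconfig_args(r"C:\solr\solr-cli-lib", "localhost:3181", r"C:\solr\solr-conf", "myconfig")) followed by run(linkconfig_args(r"C:\solr\solr-cli-lib", "localhost:3181", "mycollection", "myconfig")) mirrors the two commands above.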

Starting the ZooKeeper client to verify the collection

Start the ZooKeeper client with zkCli.cmd: zkCli.cmd -server localhost:3181
You can then check the collection that was created with the ls /collections command.

Modifying Tomcat server configurations to use different port numbers

Each Tomcat needs its own ports so that two Solr instances can run on the same machine.
In tomcat1, update C:\solr\tomcat1\apache-tomcat-7.0.41\conf\server.xml: change the shutdown port to 7005 and the HTTP connector port to 7070.
In tomcat2, update C:\solr\tomcat2\apache-tomcat-7.0.41\conf\server.xml: change the shutdown port to 9005 and the HTTP connector port to 9100.
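Editing both server.xml files by hand works fine. Here is a sketch of the same edit with Python's ElementTree (Python 3.8+ for keeping comments), assuming the default Tomcat 7 layout: a `<Server port="...">` root and one non-AJP `<Connector>` for HTTP. Tomcat 7's default file also declares an AJP connector on port 8009. The post does not change it, so the sketch leaves it alone too. Keep in mind, though, that both instances would then try to listen on 8009. set_tomcat_ports is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def set_tomcat_ports(server_xml: str, shutdown_port: int, http_port: int) -> None:
    """Rewrite the shutdown port and the HTTP connector port in server.xml.

    Comments inside <Server> are kept (Python 3.8+); the license comment
    above it and the original attribute formatting are not preserved.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(server_xml, parser=parser)
    server = tree.getroot()  # <Server port="8005" shutdown="SHUTDOWN">
    server.set("port", str(shutdown_port))
    for connector in server.iter("Connector"):
        # Skip AJP connectors (protocol "AJP/1.3" or an Ajp* class name).
        if "ajp" not in connector.get("protocol", "HTTP/1.1").lower():
            connector.set("port", str(http_port))
    tree.write(server_xml, encoding="utf-8", xml_declaration=True)
```

For tomcat1 that would be set_tomcat_ports(r"C:\solr\tomcat1\apache-tomcat-7.0.41\conf\server.xml", 7005, 7070), and 9005 and 9100 for tomcat2.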

Adding Solr options to Tomcat's Java options

Add an environment variable SOLR_DIST_HOME with the value "C:\solr"; you can set it in your startup script.
Add SOLR_OPTS to each Tomcat's startup script as shown below.
Tomcat1 - startup.bat
SET SOLR_OPTS=-Dsolr.solr.home="%SOLR_DIST_HOME%\solr1" -Dport=7070 -DhostContext=solr -DzkClientTimeout=20000 -DzkHost=localhost:3181
SET JAVA_OPTS=%JAVA_OPTS% %SOLR_OPTS%
Tomcat2 - startup.bat
SET SOLR_OPTS=-Dsolr.solr.home="%SOLR_DIST_HOME%\solr2" -Dport=9100 -DhostContext=solr -DzkClientTimeout=20000 -DzkHost=localhost:3181
SET JAVA_OPTS=%JAVA_OPTS% %SOLR_OPTS%
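The two SOLR_OPTS lines differ only in the Solr home folder and the port. A small sketch that derives them from one table, so that adding a third instance means adding one more entry. The helper is hypothetical; the system property names are exactly the ones in the startup.bat lines above.

```python
# (Solr home folder under SOLR_DIST_HOME, Tomcat HTTP port) per instance
INSTANCES = [("solr1", 7070), ("solr2", 9100)]
ZK_HOST = "localhost:3181"


def solr_opts(home_folder: str, port: int, zk_host: str = ZK_HOST,
              zk_client_timeout_ms: int = 20000) -> str:
    """Return the value for 'SET SOLR_OPTS=...' in one Tomcat's startup.bat."""
    return " ".join([
        f'-Dsolr.solr.home="%SOLR_DIST_HOME%\\{home_folder}"',
        f"-Dport={port}",      # must match that Tomcat's HTTP connector port
        "-DhostContext=solr",  # solr.war is deployed under /solr
        f"-DzkClientTimeout={zk_client_timeout_ms}",
        f"-DzkHost={zk_host}",
    ])


if __name__ == "__main__":
    for folder, http_port in INSTANCES:
        print(f"SET SOLR_OPTS={solr_opts(folder, http_port)}")
        print("SET JAVA_OPTS=%JAVA_OPTS% %SOLR_OPTS%")
```

The printed lines match the two startup.bat snippets above.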

Time to start Tomcat and create the collection and shards

Start both Tomcat instances using startup.bat. You can see the Solr instances at the localhost:7070/solr/# and localhost:9100/solr/# URLs.
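Tomcat needs a little while to unpack and start solr.war, so if you script the next step it helps to wait until each instance answers. Here is a minimal polling sketch; wait_for_http and the timeout values are my own choices:

```python
import time
import urllib.request


def wait_for_http(url: str, timeout_s: float = 120.0,
                  interval_s: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet: refused connection, 404 while deploying, ...
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_for_http("http://localhost:7070/solr/admin/cores?action=STATUS") and the same URL on port 9100, using Solr's CoreAdmin STATUS action as a cheap readiness check.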

Create the collection that is already linked in ZooKeeper using the CREATE action of Solr's Collections API. The action takes various parameters; please read the tutorial for more information about them. Here I used numShards=2, which creates two shards across the Solr instances, with replicationFactor=2, so each shard has two replicas. That gives 2 * 2 = 4 cores in total; spread over two Solr nodes, each node hosts two of them, hence maxShardsPerNode=2.
http://localhost:7070/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2
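The same URL and the capacity arithmetic from the previous paragraph can be expressed in a few lines. A sketch with hypothetical helper names. The capacity rule (numShards * replicationFactor cores must fit into maxShardsPerNode * live nodes) is my understanding of the check Solr applies on CREATE, not something stated in this post.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, max_shards_per_node: int) -> str:
    """Build a Collections API CREATE URL like the one above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def fits_cluster(num_shards: int, replication_factor: int,
                 max_shards_per_node: int, live_nodes: int) -> bool:
    """Each shard needs replication_factor cores, and each node may host
    at most max_shards_per_node of them."""
    total_cores = num_shards * replication_factor
    return total_cores <= max_shards_per_node * live_nodes
```

With two Tomcats, fits_cluster(2, 2, 2, 2) is True (4 cores, room for 4), while maxShardsPerNode=1 would leave room for only 2 of the 4 cores.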

There you are: the collection is created. Once you have hit this URL in your browser, the Cloud view of the admin UI (http://localhost:7070/solr/#/~cloud) shows the cloud that is set up, with the collection and its shards. In my setup the leaders were on 7070, and 9100 takes over if you bring 7070 down. The cores for the replicas, mycollection_shard1_replica1, mycollection_shard2_replica1, mycollection_shard1_replica2 and mycollection_shard2_replica2, are created in C:\solr\solr1 and C:\solr\solr2.
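With numShards=2 and replicationFactor=2 you should end up with four core folders split across C:\solr\solr1 and C:\solr\solr2. Here is a small sketch for checking that. It assumes the `<collection>_shard<N>_replica<M>` naming seen above holds for your Solr version and that each core gets its own folder under a Solr home. Solr decides which core lands in which folder, so the check only looks at the union of both homes. Both function names are hypothetical.

```python
from pathlib import Path


def expected_core_names(collection: str, num_shards: int,
                        replication_factor: int) -> list:
    """Core names following the <collection>_shard<N>_replica<M> pattern."""
    return [
        f"{collection}_shard{s}_replica{r}"
        for s in range(1, num_shards + 1)
        for r in range(1, replication_factor + 1)
    ]


def missing_cores(solr_homes: list, expected: list) -> list:
    """Expected core folders not found in any of the given Solr homes."""
    found = set()
    for home in solr_homes:
        found.update(p.name for p in Path(home).iterdir() if p.is_dir())
    return sorted(set(expected) - found)
```

For this post: missing_cores([r"C:\solr\solr1", r"C:\solr\solr2"], expected_core_names("mycollection", 2, 2)) should return an empty list.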


