Friday, March 14, 2014

SolrCloud setup on a Windows machine

Getting Started

Solr is an open source search platform from the Apache Software Foundation. It is built on the Lucene libraries and provides powerful full-text search. The Solr 4 release brought in many features, including SolrCloud.
On a single node, Solr has a core that is essentially a single index. If you want multiple indexes, you create multiple cores. With SolrCloud, a single index can span multiple Solr instances. This means that a single index can be made up of multiple cores on different machines. Please refer to the URL below for tutorials and wiki pages on how SolrCloud works.
https://cwiki.apache.org/confluence/display/solr/SolrCloud

Installation

ZooKeeper
Although Solr comes bundled with Apache ZooKeeper, you should consider yourself discouraged from using this internal ZooKeeper in production, because shutting down a redundant Solr instance will also shut down its ZooKeeper server, which might not be quite so redundant.
SolrCloud uses ZooKeeper as a repository for cluster configuration and coordination. It can also be thought of as a distributed file system that contains information about all the Solr servers.
Download ZooKeeper (version 3.4.5, which goes with Solr 4.6.0) from http://zookeeper.apache.org/releases.html. ZooKeeper is distributed as a tar file and can be extracted into any folder of your choice.
Solr
Download Solr 4.6.0 from http://lucene.apache.org/solr/downloads.html. This is a zip file and can be unzipped into any location (I did it under C:\DevApps\Solr 4.6.0).
Tomcat
A Tomcat server can be used to deploy the solr.war that ships in the Solr download. Multiple Tomcat instances can be part of the cloud. Download Tomcat 7 from http://tomcat.apache.org/download-70.cgi. This is also a zip file and can be unzipped into any location.

ZooKeeper and Solr deployment on the local Windows machine

Create a folder where you can store the various Solr instances and the ZooKeeper configuration, for example C:\solr
Solr Directory structure -

Zookeeper Data

Unzip ZooKeeper into a central directory like C:\Solr\ZooKeeper-3.4.5
Create a folder C:\solr\zookeeper-data. This folder will hold ZooKeeper's own data, which includes the Solr configuration and cluster information that you upload later; the indexed documents themselves stay in the data directory of each Solr core. (This can be any folder, but remember to configure the same folder in zoo.cfg.)
Update C:\solr\zookeeper-3.4.5\conf\zoo.cfg. The contents should look like below. Please note the port number 3181 and the dataDir (solr/zookeeper-data).
Zookeeper Configuration
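
As a rough illustration of what such a zoo.cfg contains, here is a minimal Python sketch that renders the file text. Only the client port 3181 and the zookeeper-data folder come from this post; the tickTime, initLimit and syncLimit values are assumptions borrowed from the sample config that ships with ZooKeeper, and render_zoo_cfg is a hypothetical helper name.

```python
def render_zoo_cfg(data_dir, client_port, tick_time=2000, init_limit=10, sync_limit=5):
    """Return the text of a minimal standalone zoo.cfg.

    tick_time, init_limit and sync_limit are assumed defaults, not values
    taken from the original post.
    """
    lines = [
        f"tickTime={tick_time}",
        f"initLimit={init_limit}",
        f"syncLimit={sync_limit}",
        # zoo.cfg is a Java properties file, where a backslash is an escape
        # character, so the Windows path is written with forward slashes.
        "dataDir=" + data_dir.replace("\\", "/"),
        f"clientPort={client_port}",
    ]
    return "\n".join(lines) + "\n"


cfg_text = render_zoo_cfg(r"C:\solr\zookeeper-data", 3181)
print(cfg_text)
```

The printed text could be saved as conf\zoo.cfg; the two lines that matter for this setup are dataDir and clientPort.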

Config, Data, Libraries

Solr has a set of config files. They are available in the example\solr\collection1\conf directory. Collection1 is the default collection that comes with the Solr package (please read the tutorial to understand what a collection is). The screenshot below shows the config files that come with the downloaded package.
Solr Config structure

Now you are going to copy the config files and libraries to a centralized location that can be accessed by both Solr instances.
Create two folders, solr1 and solr2, under C:\solr as shown in the Solr Directory structure screenshot.
Copy C:\DevApps\Solr 4.6.0\solr-4.6.0\example\solr\solr.xml to C:\solr\solr1 and C:\solr\solr2. The XML contents need to be modified, because the default solr.xml comes with the "Jetty" server configuration. The contents should look like below. Make sure both solr.xml files (in solr1 and solr2) have the same contents.
Solr.xml
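
The exact solr.xml is not reproduced here, so the following is only a sketch of the kind of change involved. It assumes the default file reads the port from a Jetty-specific property (jetty.port) and rewrites that placeholder to the generic port property that the -Dport option used later in this post would supply. The sample XML is an approximation of the shipped file, and use_generic_port_property is a hypothetical helper.

```python
# Approximation of the <solrcloud> section of the example solr.xml.
# The exact defaults are assumptions, not copied from the post.
DEFAULT_SOLR_XML = """<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>"""


def use_generic_port_property(solr_xml):
    """Swap the Jetty-specific port placeholder for a plain 'port' property."""
    return solr_xml.replace("${jetty.port:", "${port:")


tomcat_solr_xml = use_generic_port_property(DEFAULT_SOLR_XML)
print(tomcat_solr_xml)
```

With a placeholder of this shape, the hostPort value is resolved from the port system property when Tomcat starts.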

Copy the config files from C:\DevApps\Solr 4.6.0\solr-4.6.0\example\solr\collection1\conf\* to C:\Solr\solr-conf folder.
To copy the solr libraries - Unzip the C:\DevApps\Solr 4.6.0\solr-4.6.0\example\webapps\solr.war to C:\solr-war and copy the C:\solr-war\WEB-INF\lib\* to C:\solr\solr-cli-lib
Add the C:\solr\solr-cli-lib directory as a library location in C:\solr\solr-conf\solrconfig.xml
Solrconfig.xml

Solr-Cli-Lib

Tomcat configuration

Unzip Apache Tomcat into C:\solr\tomcat1 and C:\solr\tomcat2 and copy C:\DevApps\Solr 4.6.0\solr-4.6.0\example\webapps\solr.war to the webapps folder of both Tomcats.

Starting the ZooKeeper server

Now start the ZooKeeper server using C:\solr\zookeeper-3.4.5\bin\zkServer.cmd (on Windows; otherwise use zkServer.sh)

Using ZooKeeper for managing the config files

Use the command below to upload the config to ZooKeeper under an arbitrary name, "myconfig"
java -classpath .;/solr/solr-cli-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:3181 -confdir /solr/solr-conf -confname myconfig
Use the command below to link an arbitrary collection name, "mycollection", to the config that was uploaded
java -classpath .;/solr/solr-cli-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mycollection -confname myconfig -zkhost localhost:3181
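
The two commands differ only in the -cmd value and its options. A small Python sketch that assembles both argument lists makes this visible. zkcli_args is a hypothetical helper, and the ";" classpath separator is the Windows one used in the commands above.

```python
def zkcli_args(cmd, zk_host, lib_dir="/solr/solr-cli-lib", sep=";", **options):
    """Build the argument list for one org.apache.solr.cloud.ZkCLI call.

    Extra keyword arguments become "-name value" pairs in the order given.
    """
    args = [
        "java", "-classpath", f".{sep}{lib_dir}/*",
        "org.apache.solr.cloud.ZkCLI",
        "-cmd", cmd,
        "-zkhost", zk_host,
    ]
    for name, value in options.items():
        args += [f"-{name}", value]
    return args


upconfig = zkcli_args("upconfig", "localhost:3181",
                      confdir="/solr/solr-conf", confname="myconfig")
linkconfig = zkcli_args("linkconfig", "localhost:3181",
                        collection="mycollection", confname="myconfig")
print(" ".join(upconfig))
print(" ".join(linkconfig))
```

Each list could be handed to subprocess.run from the C: drive once ZooKeeper is running. The option order differs slightly from the hand-written commands, which should not matter for named options.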

Starting the ZooKeeper client to verify the collection that was created

Start the ZooKeeper client using zkCli.cmd, for example: zkCli.cmd -server localhost:3181
You can then check the collection that was created using the ls /collections command.

Modifying Tomcat server configurations to use different port numbers

Each Tomcat needs its own port numbers so that two Solr instances can run on the same machine.
Update the C:\solr\tomcat1\apache-tomcat-7.0.41\conf\server.xml in tomcat1 - Change the shutdown port # to 7005 and the HTTP connector port to 7070.
Update the C:\solr\tomcat2\apache-tomcat-7.0.41\conf\server.xml in tomcat2 - Change the shutdown port # to 9005 and the HTTP connector port to 9100.
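
The same port change can be scripted. Below is a minimal Python sketch that works on a trimmed-down server.xml; SAMPLE_SERVER_XML is an assumption that only mimics the shape of the default Tomcat 7 file, and set_tomcat_ports is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for Tomcat's conf\server.xml (an assumption;
# the real file contains more elements and comments).
SAMPLE_SERVER_XML = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
  </Service>
</Server>"""


def set_tomcat_ports(server_xml, shutdown_port, http_port):
    """Return server.xml text with new shutdown and HTTP connector ports."""
    root = ET.fromstring(server_xml)
    # The shutdown port is the "port" attribute of the <Server> element.
    root.set("port", str(shutdown_port))
    for connector in root.iter("Connector"):
        # Only HTTP connectors are changed; the AJP connector is left alone.
        if connector.get("protocol", "HTTP/1.1").startswith("HTTP"):
            connector.set("port", str(http_port))
    return ET.tostring(root, encoding="unicode")


tomcat1_xml = set_tomcat_ports(SAMPLE_SERVER_XML, 7005, 7070)
tomcat2_xml = set_tomcat_ports(SAMPLE_SERVER_XML, 9005, 9100)
print(tomcat1_xml)
```

ElementTree drops XML comments when it parses, so editing the real file by hand keeps it more readable. If both Tomcats keep an AJP connector on the same port, that port would probably need distinct values as well.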

Adding Solr options to the Tomcat JVM options

Add an environment variable SOLR_DIST_HOME with the value "C:\solr". You can set it in your startup script.
Add SOLR_OPTS to each Tomcat's startup script as shown below.
Tomcat1 - startup.bat
SET SOLR_OPTS=-Dsolr.solr.home="%SOLR_DIST_HOME%\solr1" -Dport=7070 -DhostContext=solr -DzkClientTimeout=20000 -DzkHost=localhost:3181
SET JAVA_OPTS=%JAVA_OPTS% %SOLR_OPTS%
Tomcat2 - startup.bat
SET SOLR_OPTS=-Dsolr.solr.home="%SOLR_DIST_HOME%\solr2" -Dport=9100 -DhostContext=solr -DzkClientTimeout=20000 -DzkHost=localhost:3181
SET JAVA_OPTS=%JAVA_OPTS% %SOLR_OPTS%
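
The two startup scripts differ only in the Solr home folder and the port. As a sketch, the option strings can be generated from those two values; solr_opts is a hypothetical helper and the remaining settings are the ones shown above.

```python
def solr_opts(solr_home, port, zk_host="localhost:3181",
              host_context="solr", zk_client_timeout=20000):
    """Build the SOLR_OPTS value for one Tomcat instance."""
    return (
        f'-Dsolr.solr.home="{solr_home}" -Dport={port} '
        f"-DhostContext={host_context} -DzkClientTimeout={zk_client_timeout} "
        f"-DzkHost={zk_host}"
    )


# Solr home folder and HTTP port per Tomcat, as configured in this post.
instances = {"tomcat1": ("solr1", 7070), "tomcat2": ("solr2", 9100)}
opts = {
    name: solr_opts(f"%SOLR_DIST_HOME%\\{home}", port)
    for name, (home, port) in instances.items()
}
for name, value in opts.items():
    print(f"{name}: SET SOLR_OPTS={value}")
```

The -Dport value has to match the HTTP connector port configured in that Tomcat's server.xml, since this is the port Solr registers for the node in ZooKeeper.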

Time to start the tomcat and create collection and shards

Start both tomcats using the startup.bat. You can see the solr instances using the localhost:7070/solr/# and localhost:9100/solr/# urls.

Create the collection that is already linked in ZooKeeper using the CREATE action of the Collections API provided by Solr. The API has various parameters; please read the tutorial for more information about them. Here I have used numShards=2, which creates two shards, with a replicationFactor of 2. That gives 2 * 2 = 4 cores in total. Spread over the two Solr instances, each node hosts two of them, hence maxShardsPerNode=2.
http://localhost:7070/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2
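
The arithmetic behind those parameters can be sketched in a few lines of Python. create_collection_url is a hypothetical helper, and deriving maxShardsPerNode from the number of live nodes is an illustration of the reasoning rather than something Solr does for you.

```python
import math
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, live_nodes):
    """Build a Collections API CREATE URL for the given cluster shape."""
    # Every shard gets replication_factor cores in total.
    total_cores = num_shards * replication_factor
    # Each node must be allowed to host its share of those cores.
    max_shards_per_node = math.ceil(total_cores / live_nodes)
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://localhost:7070/solr", "mycollection",
                            num_shards=2, replication_factor=2, live_nodes=2)
print(url)
```

For two shards, a replication factor of 2 and two Tomcat nodes, this prints the same URL as above.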

There you are, the collection is created. When you open the Solr admin page (localhost:7070/solr/#) in your browser, you will see the cloud that is set up, with its shards and the collection. The leader is 7070, and 9100 will take over if you bring down 7070. Please see the cores with the replicas (mycollection_shard1_replica1, mycollection_shard2_replica1, mycollection_shard1_replica2, mycollection_shard2_replica2) created in C:\solr\solr1 and C:\solr\solr2.



Wednesday, April 1, 2009

Quick review of Design patterns

What is a design pattern and why is it so popular in the software industry? Is it restricted to software or can it be used in any area? When did it originate and how did it become accepted by the world?

What is a design pattern?
When we give this question to Google, we get a definition from the wiki: "A pattern is the idea of capturing the architectural design ideas as reusable descriptions". This might be a bit complex to understand, so let me try to explain it in my own way, using the example of preparing tea.
Anybody can prepare tea in any number of ways. The main ingredients of tea are water, tea powder, sugar and milk; the last two are optional. The most common way is to boil the water and then add the tea powder. We could instead add the tea powder to the water first and then boil it, but this might take a little longer, as water boils faster than a mix of tea and water. Hence the first way performs better than the second. When we prepare tea at home this hardly matters, but think about a tea shop.
Let me say here that adding the tea powder after the water boils is a pattern! Will this work for preparing coffee as well? Yes. So I can put the definition of a pattern as "A reusable procedure that, when used in various processes such as software, manufacturing, healthcare or any other business, gives the best results." Of course, a procedure that does not give good output can never be universally accepted. Also, since it is already defined and proven, implementing it is easier.

Is it restricted to software or can it be used in any area? When did it originate and how did it become accepted by the world?
Patterns, as the word suggests, originated as an architectural concept, introduced by Christopher Alexander in 1977. The software industry made them popular! Yes. In 1987, Kent Beck and Ward Cunningham tried to define their ideas as patterns and applied them in their programming. Later, it was Erich Gamma and his co-authors Richard Helm, Ralph Johnson and John Vlissides who popularised those concepts through their book "Design Patterns: Elements of Reusable Object-Oriented Software". Thus, slowly, the concept of design patterns penetrated the software world along with OOAD concepts, which point directly to C++ and, mostly, Java. Most design patterns are based on object-oriented concepts and are used by people who talk about objects.

Now we know about the origin of design patterns. We can move on to the various types of design patterns that are already defined and used in various applications.

Tuesday, March 24, 2009

IBM buys SUN

The big software giant IBM is buying another software bull, Sun Microsystems. The talks are going on and on. For more information, please see this serverside.com link.
IBM buying SUN

Sunday, March 22, 2009

My second year anniversary, March 22

I am celebrating my second anniversary today - my job quit anniversary. :) I quit my job and relocated to the USA with my family, my husband and kid. It's been 2 years since then. Time passes very fast!

Wednesday, February 25, 2009

Wonderful lyrics - Thamizh!

I love this song for its lyrics and wonderful music!


Tuesday, February 17, 2009

JavaFX

JavaFX is a newcomer to the RIA area. Contributed by Sun Microsystems, this excellent technology handles graphics, media, web services, animation and many other dynamic activities on desktop as well as mobile platforms. This is a real breakthrough amidst the other competitors who are now ruling Web 2.0.

I have joined the JavaFX passion course which is hosted online. Today being the first day, I was able to browse through the sample applications provided by Sun (Thanks to all the contributors). I really hope to venture out in this technology.

JavaFX online Training

JavaFX