Upgrade / Migrate Solr 3.x to Solr 4

The objective of this article is to provide insight into how to upgrade / migrate Solr 3.x to Solr 4.  Every organization has its own set of requirements, and its Solr configuration files reflect those business requirements.  As a result, every Solr configuration is unique in its schema (fields and data structure, analyzers, tokenizers, stopwords, boosting & blocking, synonyms, etc.) and in its solrconfig (indexConfig, request handlers, etc.).



Improvements and New Features

Solr 4 brings numerous improvements and new features over Solr 3.x.  The final Solr 4 release also contains numerous bug fixes, optimizations, and improvements since the Solr 4 Beta release.


New Hardware Requirements

Taking advantage of SolrCloud in Solr 4 does require new hardware.  SolrCloud uses a ZooKeeper cluster as:

    • The central configuration store for the cluster
    • A co-ordinator for operations requiring distributed synchronization
    • The system-of-record for cluster topology

For production, it is recommended that you use an external ZooKeeper ensemble rather than having Solr run embedded ZooKeeper server(s).  Check out the article Zookeeper Cluster (Multi-Server) Setup.  This is why new hardware is required: the ZooKeeper ensemble needs its own servers.


Upgrade / Migrate Solr 3.x to Solr 4

This article lists the changes required in your existing Solr configuration and helps reduce the risk of errors in configuration changes during the upgrade.  We will go through the changes one configuration file at a time, in the order below.  Let’s begin with solrconfig.xml.

    • solrconfig.xml
    • schema.xml
    • solr.xml

solrconfig.xml Configuration

We will go through the required configuration changes step by step.  Open the solrconfig.xml file in your favorite text editor and make the following changes.

Step 1

Update the luceneMatchVersion value to match the Solr 4 release you are upgrading to.
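As a sketch, assuming the target release is Solr 4.0 (use the constant that matches your own release), the setting would look like this:

```xml
<!-- Assumption: upgrading to Solr 4.0; later 4.x releases use their own constant -->
<luceneMatchVersion>LUCENE_40</luceneMatchVersion>
```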

Step 2

Solr 4 introduces the Soft Commit option.  A Soft Commit behaves like Auto Commit in that it ensures changes become visible to searches, but it does not ensure that data is synced to disk.  This makes it faster and more Near-Realtime friendly.
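A minimal sketch of how hard and soft commits might be combined inside the update handler.  The intervals are illustrative assumptions, not recommendations; tune them to your indexing load.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: syncs data to disk; 15 s is an illustrative interval -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <!-- Do not open a new searcher on hard commit; visibility is left to soft commits -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes changes visible without syncing to disk; 1 s is illustrative -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```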

Step 3

Solr 4.0 introduces the Transaction Log, which is used for the real-time get operation.  The update log accepts dir as a parameter naming the directory where the transaction log is stored; it defaults to the Solr data directory.  The updateLog configuration should reside inside your  <updateHandler ...> .... </updateHandler>  element.
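A minimal sketch of the updateLog definition, assuming the log should live in the Solr data directory (the solr.data.dir property with an empty default):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log; dir falls back to the data directory when the property is unset -->
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>
```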

Step 4

Once the updateLog configuration is defined, a Near-Realtime (real-time get) request handler is also required.
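A minimal sketch of such a handler, assuming it is registered under the conventional /get path:

```xml
<!-- Real-time get handler; returns the latest version of a document by id -->
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>
```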

Step 5: Solr Replication

In Classic Solr, the role of the replication handler remains as-is.  In SolrCloud mode, however, the replication handler is mandatory: SolrCloud uses it to bulk-transfer segments when nodes are added or need to recover.  Apply the change that matches your deployment:

Classic Solr migration/upgrade: preserve the existing replication handler definition as-is.

SolrCloud upgrade: remove the existing replication handler definition and add a SolrCloud-mode definition in its place.
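For SolrCloud mode, a minimal sketch of the handler definition (no master/slave sections are configured, since SolrCloud drives replication itself):

```xml
<!-- SolrCloud: replication handler is used internally for recovery and new nodes -->
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
```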

Step 6: Solr index Configuration

The <indexDefaults> and <mainIndex> configuration sections were deprecated in Solr 3.6 and discontinued in Solr 4.0; a new <indexConfig> section was introduced instead.  Remove the old configuration sections and add the new <indexConfig> section.

Sample <indexConfig> section:
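A minimal sketch of such a section.  The values are illustrative assumptions; carry over the settings you previously had in <indexDefaults> and <mainIndex>.

```xml
<indexConfig>
  <!-- Illustrative values; migrate your own settings from the old sections -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <lockType>native</lockType>
  <useCompoundFile>false</useCompoundFile>
</indexConfig>
```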

Since Solr 3.6, the <useCompoundFile> value is false by default.

The <maxFieldLength> setting is discontinued.  To achieve similar behavior, include LimitTokenCountFilterFactory in your fieldType definition.

For Example:
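A minimal sketch of a fieldType that caps the number of indexed tokens.  The type name text_limited, the analyzer chain, and the limit of 10000 are assumptions for illustration.

```xml
<!-- Hypothetical field type; name and analyzer chain are illustrative -->
<fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stops indexing after maxTokenCount tokens, similar to the old maxFieldLength -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
  </analyzer>
</fieldType>
```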

Step 7

Make a list of the external library dependencies referenced in solrconfig.xml.  Use this list to take those libraries from the Solr 4.0 artifact (apache-solr-4.x.x.zip) and place them in the appropriate location during the upgrade.

We are done with the solrconfig.xml configuration changes; let’s move on to schema.xml.
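Library references in solrconfig.xml typically take the form of <lib> directives like the sketch below.  The paths and jar name pattern are assumptions based on the layout of the Solr 4.0 download and will differ in your installation.

```xml
<!-- Assumed layout: paths are relative to the core's instanceDir -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
```

Each directive you find this way points at jars that need to be replaced with their Solr 4 counterparts.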


schema.xml Configuration

schema.xml needs fewer changes for the upgrade/migration.  They are:

Step 1

Update the schema version to match Solr 4, i.e. 1.5.

Some background on schema versions (from the version notes in the schema.xml shipped with the Solr artifact):
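The sketch below shows the version attribute together with the version notes as they commonly appear in the sample schema.xml shipped with Solr 4.  Treat the notes as a recollection and verify them against your own copy; the schema name is a placeholder.

```xml
<!--
  1.0: multiValued attribute did not exist, all fields are multiValued by nature
  1.1: multiValued attribute introduced, false by default
  1.2: omitTermFreqAndPositions attribute introduced, true by default
       except for text fields
  1.3: removed optional field compress feature
  1.4: autoGeneratePhraseQueries attribute introduced, off by default
  1.5: omitNorms defaults to true for primitive field types
       (int, float, boolean, string...)
-->
<schema name="example" version="1.5">
  <!-- types and fields go here -->
</schema>
```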

Step 2

Note that the updateLog defined in solrconfig.xml for the near-realtime search function requires a supporting field in schema.xml.  Add that field definition inside the  <fields> ..... </fields>  tag.
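The field in question is _version_.  A minimal sketch, assuming a field type named long is already defined in your <types> section:

```xml
<fields>
  <!-- Required by updateLog and real-time get; "long" type is assumed to exist -->
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
```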

Step 3

org.apache.lucene.search.DefaultSimilarity has been moved to org.apache.lucene.search.similarities.DefaultSimilarity.  If your schema.xml references the old class, update the reference.

PS: The sample configuration shipped with the Solr 4 and 3.6.1 artifacts has plenty of sample fieldType definitions (with tokenizers and analyzers) for languages, path hierarchy, payloads, geospatial location, etc.  Make use of them as your requirements dictate.
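As a sketch, assuming your schema.xml declares the similarity globally, the change looks like this:

```xml
<!-- Solr 3.x reference (old package, remove it):
     <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->

<!-- Solr 4 reference (new package) -->
<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>
```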


solr.xml Configuration

solr.xml provides the list of Solr cores to Solr.  Solr 4 extends it with settings for SolrCloud.  The available options are listed below (the tag and attribute details cover everything available up to Solr 4).

XML Tag ‘<solr …>’ attributes are

  • persistent: persisting configuration changes to disk – default is  false
  • sharedLib: Shared library directory path for Solr cores – default is  null
  • zkHost: ZooKeeper host name with port number; if absent, Solr tries to read it from a system property.  More importantly, this attribute determines whether Solr starts in Cloud mode or Classic mode – default is  null
  • coreLoadThreads: Number of Solr core loading threads – default is 3; a minimum of 2 is required

XML Tag ‘<cores …>’ attributes are

  • adminPath: Path of the request handler used for Solr core management.  It is a mandatory attribute – default is  null
  • defaultCoreName(optional): Solr core name to use when no core name is specified in the request URL
  • zkClientTimeout: ZooKeeper client connection negotiation timeout in milliseconds – default is  15000
  • host(optional): host name of solr – default is  localhost
  • hostPort: Solr host port number for eg.: values are 8080, 9090, etc – default is  8983
  • hostContext: solr context name – default is solr for eg.:  http://localhost:8080/<hostContext>
  • leaderVoteWait: Shard leader vote waiting in milliseconds – default is  180000
  • shareSchema: sharing Solr schema configuration – default is  false
  • adminHandler: Solr Administration handler – default is  null
  • managementPath: Solr Management path – default is  null

XML Tag ‘<core …>’ attributes are

  • name: Name of the Solr core – default is  collection1
  • shard: Name of the Solr shard  for eg.: shard1, shard2, etc.
  • collection: Name of the Solr Collection – default is  collection1
  • instanceDir: Solr core directory
  • dataDir: Data directory of the Solr core
  • schema: Name of the solr core schema file name – default is  schema.xml
  • config: Name of the solr core config file name – default is  solrconfig.xml
  • properties: The solr core properties file name
  • loadOnStartup: Boolean value for Solr core loading – default is  null
  • swappable: Boolean value for whether the Solr core is swappable – default is  null

Sample solr.xml definition for SolrCloud; define yours similarly.
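A minimal sketch built from the attributes listed above.  The ZooKeeper host names zk1, zk2, and zk3 are hypothetical, and the core, shard, and collection names are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- zkHost values are hypothetical; setting zkHost starts Solr in Cloud mode -->
<solr persistent="true" zkHost="zk1:2181,zk2:2181,zk3:2181">
  <cores adminPath="/admin/cores"
         defaultCoreName="collection1"
         host="${host:}"
         hostPort="${jetty.port:8983}"
         hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <!-- One core belonging to shard1 of collection1 -->
    <core name="collection1"
          instanceDir="collection1"
          shard="shard1"
          collection="collection1"/>
  </cores>
</solr>
```

For Classic mode, the same file without the zkHost, shard, and collection attributes should apply.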

We are done with solr.xml, let’s move on!


I have upgraded Solr Configuration files. What next?

Classic Solr:

  • Place your newly upgraded Solr configuration files solrconfig.xml and schema.xml in their respective locations
  • Deploy your Solr 4 WAR file
  • Start the Solr Instance(s)

SolrCloud:

Check out this article ‘SolrCloud Cluster (Single Collection) Deployment‘.


Pay Attention to Deprecated Elements in Solr 4

The next important thing to take care of is deprecated items in Solr 4.0.  For reference, the Lucene core deprecated list and the Solr core Javadocs for 4.0 are at:

http://lucene.apache.org/core/4_0_0/core/deprecated-list.html and http://lucene.apache.org/solr/4_0_0/solr-core/index.html

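Another simple way to find deprecated classes referenced in your Solr configuration is to look for deprecation messages in the Solr log.

For instance, if the log reports a deprecated update request handler class, replace it with the UpdateRequestHandler class.  Handle other deprecation messages similarly.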
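As an illustration, assuming the class reported is solr.XmlUpdateRequestHandler (an assumption; check the class named in your own log), the change in solrconfig.xml would be:

```xml
<!-- Before: Solr 3.x definition (assumed deprecated class, remove it):
     <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/> -->

<!-- After: Solr 4 definition -->
<requestHandler name="/update" class="solr.UpdateRequestHandler"/>
```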


What will happen to existing Solr Index?

The last and most important concern for everyone is ‘What will happen to the existing Solr index?’

Classic Solr

When migrating from 3.x Classic Solr to 4.x Classic Solr, issue an optimize command after the migration and Solr will take care of the rest (index optimization, index format, etc.).  To take advantage of Solr 4 features such as Near-Realtime Get, a full re-index is required.  Otherwise, to keep using the older index version, set luceneMatchVersion to your existing version, for example LUCENE_33.
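As a sketch, the optimize command can be sent as an XML update message posted to the core's update handler.  The exact URL depends on your deployment and is not shown here.

```xml
<!-- POST this body to the core's /update handler with Content-Type: text/xml -->
<optimize waitSearcher="true"/>
```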

SolrCloud

When you upgrade / migrate Solr 3.x to Solr 4 (SolrCloud), a full re-index is required and recommended in order to take advantage of Solr 4/SolrCloud features such as shard ranges, Near-Realtime search, document versions & counts, SolrCloud clustering capabilities, etc.


Your Migration/Upgrade Journey Starts Here

I hope this article gives you an idea of, and insight into, the ‘Elements to take care of when you migrate / upgrade Solr 3.x to Solr 4‘.  All the best!