Solr 5: First look

Installation

Setting up Solr5

Solr 5 is now a stand-alone service and it is no longer necessary to run it in a container like Tomcat. The advantage of this is that Tomcat does not have to be installed any longer, which simplifies maintaining and securing the server Solr is running on.

Installation is also much simpler:

  1. Download Solr from the download page
  2. Extract the archive (e.g. to solr5)
  3. cd into the directory (e.g. cd solr5)
  4. The bin subdirectory contains the script install_solr_service.sh, which must be run as root with the path to the downloaded archive as its first argument. Run this script with:
    sudo ./bin/install_solr_service.sh <path-to-solr-5.0.0.tgz>

Installation defaults / options:

  • The installation directory defaults to /opt/solr, and is configurable with the '-i'-option
  • The data directory defaults to /var/solr, and is configurable with the '-d'-option.
  • The port defaults to 8983 (configuration option '-p')
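
To see how these options combine, here is a small dry-run sketch that assembles the install command and prints it instead of running it. The archive name and the directories /opt/solr5 and /var/solr5 are assumptions chosen for the example; substitute your own values.

```shell
# Dry run: build the install command and print it instead of executing it.
# All values below are example assumptions; adjust them to your server.
SOLR_ARCHIVE="solr-5.0.0.tgz"   # path to the downloaded archive
INSTALL_DIR="/opt/solr5"        # -i: installation directory
DATA_DIR="/var/solr5"           # -d: data directory
SOLR_PORT="8983"                # -p: port Solr listens on

INSTALL_CMD="sudo ./bin/install_solr_service.sh $SOLR_ARCHIVE -i $INSTALL_DIR -d $DATA_DIR -p $SOLR_PORT"
echo "$INSTALL_CMD"
```

Once the printed command looks right, run it from the extracted Solr directory.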

Notes:

  • The script will install Solr 5 as a service, so that you can easily use 'sudo service solr start', 'sudo service solr stop', etc.
  • The script also creates a 'solr'-user and -group which are used to run the server.

Tweaking the installation

The file solr.in.sh lives in the data directory (default /var/solr). This script is run as part of the Solr startup procedure and is the place where the most important variables for Solr are set.

For example, autocommit options that are handy for debugging and development can be set here ('Why do I have to wait 2 minutes before something is in the index...'):

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=60000"
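
If you script your server setup, lines like these can be appended without creating duplicates. The sketch below is one possible way to do that; the SOLR_IN_SH variable and the add_solr_opt function are names made up for this example, and the file defaults to a scratch file so the sketch can be tried safely. Keep in mind that such properties only have an effect when the solrconfig.xml of the core actually reads them.

```shell
# Append a SOLR_OPTS line to solr.in.sh only if it is not there yet.
# SOLR_IN_SH is a hypothetical variable; it defaults to a scratch file.
# Point it at /var/solr/solr.in.sh (as root) for real use.
SOLR_IN_SH="${SOLR_IN_SH:-$(mktemp)}"

add_solr_opt() {
  # $1 is a single JVM option such as -Dsolr.autoCommit.maxTime=60000
  line="SOLR_OPTS=\"\$SOLR_OPTS $1\""
  grep -qxF -- "$line" "$SOLR_IN_SH" || printf '%s\n' "$line" >> "$SOLR_IN_SH"
}

add_solr_opt "-Dsolr.autoSoftCommit.maxTime=3000"
add_solr_opt "-Dsolr.autoCommit.maxTime=60000"
add_solr_opt "-Dsolr.autoCommit.maxTime=60000"   # second call is a no-op

cat "$SOLR_IN_SH"
```

The values are in milliseconds, so a soft commit happens after at most 3 seconds and a hard commit after at most 60 seconds.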

The directory in which the new create core command acts can also be set here by editing the SOLR_HOME variable, e.g.:

SOLR_HOME=/var/solr/cores

To use this, make sure that solr.xml is present in that directory, that the solr user owns the directory, and that the service is restarted:

sudo chown -R solr:solr /var/solr

sudo cp <solr install dir>/server/solr/solr.xml /var/solr/cores/

sudo service solr restart

Creating cores

The solr executable in the bin directory can create cores, so copying existing cores and adding them to solr.xml is no longer necessary. Furthermore, Solr 5 uses the new core discovery mechanism to detect cores, so settings like these:

<cores adminPath="/admin/cores" defaultCoreName="solrdev">

<core name="xxx" instanceDir="cores/xxx" />

<core name="yyy" instanceDir="cores/yyy" />

</cores>

in solr.xml are no longer present.
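
With core discovery, Solr walks the Solr home directory and treats every directory that contains a core.properties file as a core. The sketch below imitates that lookup on a scratch directory; the directory, the two example cores and the list_cores function are made up for this illustration, and it assumes every core.properties has a name= line.

```shell
# Core discovery: every directory under the Solr home that contains a
# core.properties file is treated as a core. This sketch builds a scratch
# Solr home (a stand-in for e.g. /var/solr/data) and lists the cores in it.
DEMO_HOME="$(mktemp -d)"
mkdir -p "$DEMO_HOME/xxx" "$DEMO_HOME/yyy"
echo "name=xxx" > "$DEMO_HOME/xxx/core.properties"
echo "name=yyy" > "$DEMO_HOME/yyy/core.properties"

list_cores() {
  # Print the name= value of each core.properties below directory $1
  find "$1" -name core.properties -exec sed -n 's/^name=//p' {} \; | sort
}

CORES="$(list_cores "$DEMO_HOME")"
echo "$CORES"
```

Running list_cores against your real Solr home is a quick way to see which cores Solr should pick up at startup.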

To create a new core, change to the directory where Solr was installed and run:

sudo bin/solr create -c devel1

This command will fail with an error:

Failed to create core 'devel1' due to: Error CREATEing SolrCore 'devel1': Unable to create core [devel1] Caused by: /var/solr/data/devel1/data

due to permission errors. Fix them with:

sudo chown -R solr:solr /var/solr/

and rerun the command:

sudo bin/solr create -c devel1

If you now point your browser to

http://localhost:8983/solr/#/devel1

you will see the new core.
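
The same check can be done from the command line through the CoreAdmin STATUS call. The sketch below only builds the URL, using the default host and port from this article; the core_status_url function is a name made up for the example.

```shell
# Build the CoreAdmin STATUS URL for a core. Host and port are the defaults
# used in this article; the function name is made up for this sketch.
SOLR_BASE="http://localhost:8983/solr"

core_status_url() {
  # $1 is the core name
  echo "$SOLR_BASE/admin/cores?action=STATUS&core=$1&wt=json"
}

STATUS_URL="$(core_status_url devel1)"
echo "$STATUS_URL"
# With Solr running, fetch it with: curl -s "$STATUS_URL"
```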

Furthermore, if you get an error like:

 Failed to create core 'devel1' due to: Error CREATEing SolrCore 'devel1': Unable to create core [devel1] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr5/data/devel1/conf'

 make sure the directories '/var/solr5/data/devel1'  and '/var/solr5/data/devel1/conf' do not already exist. If they do exist, simply remove them and issue the command again.
 

 

By the way: if you go to the root of the server, e.g.:

http://localhost:8983/

you will get a verbose 404 error like:

Error 404 - Not Found.
No context on this server matched or handled this request.
Contexts known to this server are:

    /solr ---> o.e.j.w.WebAppContext{/solr,file:/opt/solr5/solr-5.0.0/server/solr-webapp/webapp/},/opt/solr5/solr-5.0.0/server/webapps/solr.war

Just add 'solr' to the path to go to the admin pages like this:

http://localhost:8983/solr/#/

Customizing your schema

Drupal schema

When using the 4.x schema from the Drupal ApacheSolr module only a few points prevent the Solr core from running:

Line 99:

<fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>

Line 122:

<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

Both 'FloatField' and 'DateField' were deprecated in Solr 4.8 and are no longer supported in Solr 5:

The following legacy numeric and date field types, deprecated in Solr 4.8, are no longer supported: BCDIntField, BCDLongField, BCDStrField, IntField, LongField, FloatField, DoubleField, SortableIntField, SortableLongField, SortableFloatField, SortableDoubleField, and DateField. Convert these types in your schema to the corresponding Trie-based field type and then re-index. See SOLR-5936 for more information.

See also SOLR-5936

When both fields are changed to their Trie-based variants, the core will start, which is not to say that it runs optimally!

Furthermore, solr.admin.AdminHandlers is deprecated in solrconfig.xml; remove line 1044:
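
The class swap can also be scripted. The sketch below runs sed on a scratch file that stands in for the Drupal schema.xml; the two lines in it are example input, and in a real schema the Trie types may need extra attributes such as precisionStep, so treat the result as a starting point. Remember to re-index afterwards.

```shell
# Scratch copy standing in for the Drupal schema.xml.
SCHEMA="$(mktemp)"
cat > "$SCHEMA" <<'EOF'
<fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
EOF

# Swap the removed legacy classes for their Trie-based counterparts and
# write the result to a new file, leaving the original untouched.
sed -e 's/class="solr\.FloatField"/class="solr.TrieFloatField"/' \
    -e 's/class="solr\.DateField"/class="solr.TrieDateField"/' \
    "$SCHEMA" > "$SCHEMA.new"

cat "$SCHEMA.new"
```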

<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />

Also, the extraction and clustering libs are no longer in the same location (and are probably not necessary); remove lines 71 and 72:

<lib dir="${solr.contrib.dir:../../../contrib}/extraction/lib" />

<lib dir="${solr.contrib.dir:../../../contrib}/clustering/lib/" />

 

Securing the data

Obviously, securing the Solr server by IP via Tomcat is no longer an option. To secure the server running Solr we must add an 'IPAccessHandler' to the Jetty configuration.

In the Solr installation directory there is the file server/etc/jetty.xml. If you open this file, you will see the following in the Handlers section:

 

 <!-- =========================================================== -->
    <!-- Set handler Collection Structure                            -->
    <!-- =========================================================== -->
    <Set name="handler">
      <New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
        <Set name="handlers">
         <Array type="org.eclipse.jetty.server.Handler">
           <Item>
             <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
           </Item>
           <Item>
             <New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
           </Item>
           <Item>
             <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler"/>
           </Item>
         </Array>
        </Set>
      </New>
    </Set>
 

We want to wrap the item '<New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>' with an IPAccessHandler, which can be done by editing the file to:

 

<!-- =========================================================== -->
    <!-- Set handler Collection Structure                            -->
    <!-- =========================================================== -->
    <Set name="handler">
      <New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
        <Set name="handlers">
         <Array type="org.eclipse.jetty.server.Handler">
           <Item>
                <!-- here begins ip securing  -->
                <New class="org.eclipse.jetty.server.handler.IPAccessHandler">
                        <Call name="addWhite">
                                <!-- list of args with ip-addresses -->
                                <Arg>127.0.0.1</Arg>
                        </Call>

                        <Call name="addWhite">
                                <!-- list of args with ip-addresses -->
                                <Arg>127.0.0.2</Arg>
                        </Call>

                        <Set name="handler">
                                <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
                        </Set>
                </New>
                <!-- end of securing -->

           </Item>
           <Item>
             <New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
           </Item>
           <Item>
             <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler"/>
           </Item>
         </Array>
        </Set>
      </New>
    </Set>
 

Inside the IPAccessHandler element there is a list of addWhite calls, each with an IP address that is allowed to access Solr 5.

Obviously, 127.0.0.2 is only given here as an example and should be replaced by your own IP address to give you remote access to the server.

If Drupal is hosted on the same server, 127.0.0.1 should certainly be added to allow the Drupal site to index its data.
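
After restarting Solr it is worth checking the whitelist from a machine that should be blocked and from one that should be allowed. The sketch below is one way to do that with curl; the function names and the host solr.example.com are made up for the example, and it assumes that a non-whitelisted address gets a 403 response.

```shell
# Probe a Solr server and report whether this machine is allowed in.
# The function names are made up for this sketch; curl must be installed.
solr_status_code() {
  # Print only the HTTP status code for the admin URL on host $1
  curl -s -o /dev/null -w '%{http_code}' "http://$1:8983/solr/"
}

explain_status() {
  case "$1" in
    200) echo "allowed" ;;
    403) echo "blocked" ;;        # assumed reply for a non-whitelisted IP
    000) echo "no connection" ;;  # curl could not reach the server
    *)   echo "unexpected status $1" ;;
  esac
}

# Example, run from the machine whose access you want to test:
#   explain_status "$(solr_status_code solr.example.com)"
```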

Future to-dos

Of course this is only a short first look and we still have to look deeper into things like security and performance. For now we are using Tika as the extraction service, but the location of the extraction libs is also something that should be fixed in the Drupal solrconfig.xml.