Introduction to Solr
Solr is a search platform built on Apache Lucene. It is written in Java and uses the Lucene library for indexing and search.
Solr is accessed over HTTP through REST-like APIs that can return results as XML or JSON.
The following feature list comes directly from the Solr website:
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces – XML, JSON and HTTP
- Comprehensive HTML Administration Interfaces
- Server statistics exposed over JMX for monitoring
- Linearly scalable, auto index replication, auto failover and recovery
- Near Real-time indexing
- Flexible and Adaptable with XML configuration
- Extensible Plugin Architecture
In this tutorial, you will install Solr from its binary distribution.
Prerequisites
This tutorial has the following requirements:
- An Ubuntu 14.04 Droplet with at least 1 GB of RAM. The amount of RAM you actually need depends heavily on your specific situation.
- A non-root sudo user.
- Java installed. You can refer to one of our other tutorials for Ubuntu 14.04 on how to install Java.
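As a quick check of the Java prerequisite, the following sketch tests whether a java binary is on the PATH and prints its version. The helper name require_cmd is only illustrative and is not part of Solr or Ubuntu.

```shell
# Return 0 if the named command is on PATH, 1 otherwise.
# (require_cmd is an illustrative helper name.)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# 'java -version' writes to stderr, so redirect it to stdout before filtering.
if require_cmd java; then
    java -version 2>&1 | head -n 1
fi
```

If the check reports that java is missing, install Java before continuing with the next step.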
Step One: Installing Solr
First, install Solr 5.2.1. Begin by downloading the Solr distribution.
You will find a suitable mirror on the official page.
After finding a suitable mirror, copy the link of ‘solr-5.2.1.tgz’ from the mirror.
As an example, we will use https://archive.apache.org/dist/lucene/solr/5.2.1/
Next, download the file into your home directory.
$ cd ~
$ wget https://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz
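Before installing, it can be worth checking that the downloaded archive is intact. The sketch below compares a file's SHA-1 digest with an expected value. It assumes the download site publishes a SHA-1 digest for the archive and that sha1sum is available. The helper name verify_sha1 is illustrative.

```shell
# Succeed only if the SHA-1 digest of file $1 equals the hex string $2.
# (verify_sha1 is an illustrative helper name.)
verify_sha1() {
    actual=$(sha1sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Example usage; replace EXPECTED_DIGEST with the digest published for the archive:
# verify_sha1 solr-5.2.1.tgz EXPECTED_DIGEST && echo "checksum OK"
```

A failed check usually means the download was interrupted or corrupted, in which case downloading the file again is the simplest fix.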
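Next, extract the service installation script from the archive:
$ tar xzf solr-5.2.1.tgz solr-5.2.1/bin/install_solr_service.sh --strip-components=2
Then install Solr as a service by running the script:
$ sudo bash ./install_solr_service.sh solr-5.2.1.tgz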
Check that the server is running with the following command:
$ sudo service solr status
The output should look similar to the following:
Found 1 Solr nodes:
Solr process 2750 running on port 8983
. . .
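If you want to use the port in a script, it can be pulled out of the status output. This is a minimal sketch that assumes the status line keeps the "running on port N" wording from the sample output. The function name solr_port_from_status is illustrative.

```shell
# Read status output on stdin and print the first port number found in a
# "running on port N" line. (solr_port_from_status is an illustrative name.)
solr_port_from_status() {
    sed -n 's/.*running on port \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with the sample status line; on a live server, pipe in the output
# of 'sudo service solr status' instead.
echo "Solr process 2750 running on port 8983" | solr_port_from_status
```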
Step Two: Creating a Collection
In this step, you will create a simple Solr collection.
Solr can host multiple collections, but this tutorial uses only one.
To create a new collection, run the command below. It runs as the solr user to avoid permission errors.
$ sudo su - solr -c "/opt/solr/bin/solr create -c gettingstarted -n data_driven_schema_configs"
In this command, ‘gettingstarted’ is the name of the collection and ‘-n’ specifies the configset.
Solr ships with three configsets by default. This tutorial uses the ‘schemaless’ one, which means you can supply fields with any name and Solr will guess their types automatically.
The collection is now created and you can start adding data. The default schema has only one required field: ‘id’.
It has no other default fields, only dynamic fields. If you would like to take a look at the schema, it is explained clearly in the file at ‘/opt/solr/server/solr/gettingstarted/conf/schema.xml’.
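The schema can also be inspected over HTTP through Solr's Schema API instead of reading the file. The sketch below builds the URL that lists the fields of a collection. It assumes Solr is reachable on localhost at the default port 8983. SOLR_URL and schema_fields_url are illustrative names.

```shell
# Assumption: Solr runs on this machine on the default port.
SOLR_URL="http://localhost:8983/solr"

# Build the Schema API URL that lists the fields of collection $1.
schema_fields_url() {
    echo "${SOLR_URL}/$1/schema/fields?wt=json"
}

# Requires a running Solr instance:
# curl -s "$(schema_fields_url gettingstarted)"
```

With the schemaless configset, the list of fields should grow as you index documents that contain new field names.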
Step Three: Adding and Querying Documents
In this step, you will check out the Solr web interface and add a couple of documents to the collection.
When you visit ‘http://server_ip_here:8983/solr’ in your web browser, you should see the Solr web interface.
This web interface contains a lot of useful information that can help you debug any problems you run into.
Collections are divided into cores, which is why the web interface contains many references to cores.
Right now, the collection ‘gettingstarted’ contains only one core, which is also named ‘gettingstarted’.
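The list of cores is also available from the command line through Solr's CoreAdmin API. This sketch only builds the status URL. It assumes Solr is reachable on localhost at port 8983. SOLR_URL and core_status_url are illustrative names.

```shell
# Assumption: Solr runs on this machine on the default port.
SOLR_URL="http://localhost:8983/solr"

# Build the CoreAdmin STATUS URL, which reports every core on this node.
core_status_url() {
    echo "${SOLR_URL}/admin/cores?action=STATUS&wt=json"
}

# Requires a running Solr instance:
# curl -s "$(core_status_url)"
```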
On the left-hand side you should see the Core Selector pull-down menu, which lets you select ‘gettingstarted’ to view more information.
Once you have selected the ‘gettingstarted’ core, choose ‘Documents’. Documents hold the actual data that Solr makes searchable.
Because this example uses a schemaless configuration, you can use any field.
To add a single document, copy the following example into the Document(s) field.
{
  "number": 1,
  "president": "George Washington",
  "birth_year": 1732,
  "death_year": 1799,
  "took_office": "1789-04-30",
  "left_office": "1797-03-04",
  "party": "No Party"
}
Click Submit Document to add the document to the index. After a few moments, you should see the following:
Status: success
Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 509
  }
}
You may add as many documents as you like, with a similar structure or a completely different one. You can also continue with just one document.
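Documents can also be added without the web interface by posting JSON to the collection's update handler. The following is a sketch under a few assumptions: Solr is reachable on localhost at port 8983, curl is installed, and the second president is used purely as sample data. The names SOLR_URL, update_url, sample_docs and post_docs are illustrative.

```shell
# Assumption: Solr runs on this machine on the default port.
SOLR_URL="http://localhost:8983/solr"

# Build the update URL for collection $1; commit=true makes the
# documents searchable as soon as the request completes.
update_url() {
    echo "${SOLR_URL}/$1/update?commit=true"
}

# Print a JSON array holding one sample document.
sample_docs() {
    cat <<'EOF'
[
  {
    "number": 2,
    "president": "John Adams",
    "birth_year": 1735,
    "death_year": 1826,
    "took_office": "1797-03-04",
    "left_office": "1801-03-04",
    "party": "Federalist"
  }
]
EOF
}

# POST the JSON read from stdin to collection $1.
post_docs() {
    curl -s -X POST -H 'Content-Type: application/json' \
        --data-binary @- "$(update_url "$1")"
}

# Requires a running Solr instance:
# sample_docs | post_docs gettingstarted
```

A response with "status": 0 in the responseHeader indicates that the update was accepted, as in the web interface.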
Next, select Query on the left side to query the document you have added.
This screen is pre-filled with default values. Click ‘Execute Query’ and you should see at most 10 documents, depending on how many you have added:
{
  "responseHeader": {
    "status": 0,
    "QTime": 58,
    "params": {
      "q": "*:*",
      "indent": "true",
      "wt": "json",
      "_": "1436827539345"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "number": [ 1 ],
        "president": [ "George Washington" ],
        "birth_year": [ 1732 ],
        "death_year": [ 1799 ],
        "took_office": [ "1789-04-30T00:00:00Z" ],
        "left_office": [ "1797-03-04T00:00:00Z" ],
        "party": [ "No Party" ],
        "id": "1ce12ed2-add9-4c65-aeb4-a3c6efb1c5d1",
        "_version_": 1506622425947701200
      }
    ]
  }
}
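The same query can be sent from the command line. This sketch assumes Solr is reachable on localhost at port 8983 and that curl is installed. SOLR_URL, select_url and solr_query are illustrative names. The parameters q, wt and indent are the ones shown in the response above.

```shell
# Assumption: Solr runs on this machine on the default port.
SOLR_URL="http://localhost:8983/solr"

# Build the select (query) URL for collection $1.
select_url() {
    echo "${SOLR_URL}/$1/select"
}

# Run query $2 against collection $1 and return indented JSON.
# -G sends the URL-encoded parameters as a GET query string.
solr_query() {
    curl -s -G "$(select_url "$1")" \
        --data-urlencode "q=$2" \
        --data-urlencode "wt=json" \
        --data-urlencode "indent=true"
}

# Requires a running Solr instance:
# solr_query gettingstarted '*:*'      # match all documents
# solr_query gettingstarted 'number:1' # match on a single field
```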
Conclusion
There are many more options available, but you should now have Solr installed and be ready to start using it for your own site.