It's relatively easy to add advanced, scalable search capabilities to almost any application these days. All you have to do is sprinkle a few calls to a popular open source search service into your code. That is what I did to the rudimentary news feed micro-service that I have implemented many times. In this blog post, you will learn how the two most popular open source search projects, ElasticSearch and Solr, compare in terms of performance under load on this micro-service.
Here is a list of frequently asked questions regarding the document at the bottom of this page.
Q: Which versions did you compare?
A: Solr 5.3.1 (released September 2015) vs. ElasticSearch 2.3.3 (released May 2016).
Q: Which implementation of the news feed did you use?
A: The Scala version.
Q: You mentioned not being able to use the ElasticSearch Java client due to a version incompatibility. Could you elaborate on that?
A: The latest version of the client depends on a very old version of jackson-core, while Finatra depends on a newer version that is not backwards compatible.
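To make that conflict concrete, here is a minimal sketch of what such a clash looks like in an sbt build. The coordinates and version numbers below are illustrative assumptions, not the exact ones from my build.

```scala
// build.sbt -- a sketch of the dependency clash (versions are illustrative).
libraryDependencies ++= Seq(
  // Finatra transitively requires a newer jackson-core.
  "com.twitter" %% "finatra-http" % "2.2.0",
  // The ElasticSearch 2.x Java client ships in the server artifact
  // and transitively pins an older jackson-core.
  "org.elasticsearch" % "elasticsearch" % "2.3.3"
)
```

Running `sbt evicted` surfaces the conflict: sbt keeps the newest jackson-core, and code compiled against the older, incompatible version can then fail at runtime.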
Q: You mentioned an already developed load test. Where is that?
A: For the first two hours, I ran the load test application with its third parameter, percent-searches, set to 0. For the last hour, I ran the same tool again with percent-searches set to 100.
Q: Where can I learn more about how you measured the results?
A: See this CLI tool for extracting Solr or ElasticSearch statistics and this write-up on how I collected and analyzed the data for this research.
Q: What is a soft commit?
A: The document has been committed to the in-memory index, and is searchable, but has not yet been fsynced to disk.
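For Solr, a soft commit can be requested explicitly through SolrJ. The sketch below assumes a local Solr 5.x core named `newsfeed` and an illustrative schema; it is only meant to show the call, not my actual setup.

```scala
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.common.SolrInputDocument

object SoftCommitExample extends App {
  // Assumes a local Solr 5.x core named "newsfeed" (illustrative).
  val solr = new HttpSolrClient("http://localhost:8983/solr/newsfeed")

  val doc = new SolrInputDocument()
  doc.addField("id", "story-1")
  doc.addField("text", "breaking news")
  solr.add(doc)

  // commit(waitFlush, waitSearcher, softCommit): a soft commit makes the
  // document searchable via the in-memory index without fsyncing to disk.
  solr.commit(false, true, true)

  solr.close()
}
```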
Q: How real-world were these tests? Are the results applicable to my situation?
A: The news feed micro-service is somewhat rudimentary and academic. The testing started from zero documents, and the setups were the default, developer-friendly ones. These factors lowered the cost of testing without decreasing the accuracy of the results, but they also lower the likelihood that the results are relevant to your specific situation. You should probably conduct your own tests if performance is a critical consideration.
Q: It looks to me like Solr outperformed ElasticSearch in just about every category. Does that mean you recommend Solr over ElasticSearch?
A: Not necessarily. The feature sets of the two technologies are not identical, so if you need a feature that is stronger in one than the other, that might be the deciding factor. The ElasticSearch API has a better design, from a REST perspective, than the Solr API. Finally, ElasticSearch performance may be good enough for your situation.
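To make the REST design point concrete, here is a minimal sketch of the same keyword search issued against each engine's out-of-the-box HTTP API. The index/core name `newsfeed` and the field name `text` are assumptions for illustration, not the actual schema from my tests.

```scala
import scala.io.Source

object SearchApiShapes extends App {
  // ElasticSearch: resource-oriented paths, one _search endpoint per index.
  val es = Source.fromURL(
    "http://localhost:9200/newsfeed/_search?q=text:news").mkString

  // Solr: a single select handler driven by query parameters.
  val solr = Source.fromURL(
    "http://localhost:8983/solr/newsfeed/select?q=text:news&wt=json").mkString

  println(es)
  println(solr)
}
```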
Q: You implied that the difference in performance may be attributable to the difference in how the micro-service connected to each search engine. Doesn't that invalidate your results?
A: I don't think so. I am evaluating two software applications, and that evaluation includes their client libraries. One library was unusable for me and could well be unusable in your situation too. If I have to fork the library in order to use it, then that is no longer a fair comparison.