Glenn Engstrand

Using Solr/Lucene to Surface the Big Data of Social Media

On Wednesday, May 9, I will be speaking at the 2012 Lucene Revolution conference in Boston on Using Solr/Lucene to Surface the Big Data of Social Media. Solr is an open source technology best known for its capabilities as a search engine. Big Data is a recent IT trend in which large amounts of data (in terms of both volume and rate) are collected and used, more than a single relational database can handle. Social Media is any system where users contribute content and express affinity for other content, and those actions get published to the user’s social graph.

In this presentation, I will focus on how to use Solr as a kind of NoSQL solution for Big Data. Topics will include scaling Solr both up and out, sharding, replication, caching, SOA, indexing, and synchronization. I will also give advice on how best to integrate Solr with other open source technologies such as Jetty, RabbitMQ, Spring, Ehcache, and HOWL.

If you are attending this conference, then I hope that you catch my presentation. If not, and you are interested in Solr as a NoSQL solution, then be sure to check back with this blog post, as I will include links to the published slide deck afterwards.

Update May 14: Back from the conference. Many Apache committers talked about Lucene 4. SolrCloud is Solr on ZooKeeper. Good keynote from Hortonworks. Lame keynote from Microsoft. O’Reilly published a nice review of the conference too.

Update May 18: Looks like the sponsoring organisation, Lucid Imagination, has published my presentation. Check out the other presentations too, such as automata invasion and Solr Cloud.

Update June 10: Here is the video of my presentation in Boston last month.

Update June 17: This experience is based on my work at Zoosk which has now been documented on their developer blog.

Update October 2012: Version 4 of Solr/Lucene has now been released for general availability (GA).
