One of the annual conferences that I always look forward to is Lucene/Solr Revolution. The reason is not only the highly technical nature of the conference, but also that you get a glimpse of how the future of search is evolving in the open and how Solr is being pushed to its limits to handle big data use cases. The key themes that emerged across the different tracks, for me, were inferring user intent (what the user intends to say in that search box, rather than just matching keywords), analytics, and different approaches to relevancy.
Trey Grainger's Talk — Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
This was, hands down, my favorite talk of the conference, from the author of the book Solr in Action, and each slide has some key insights. The main idea is how CareerBuilder has built an intent engine & knowledge graph using the Solr index & query logs alone, and how, based on that graph, one can semantically determine the user's intent with additional techniques.
- Users really want to search for "things" rather than strings.
Intent Engine & Knowledge Graph
- Generate domain-specific phrases by mining the query logs for commonly searched phrases
- Identify the entities in an incoming query using SolrTextTagger (built on Lucene FSTs)
- Enrich the query with related concepts based on semantics
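The three steps above can be sketched in miniature. This is a toy illustration, not CareerBuilder's actual pipeline: the query log, phrase threshold, relatedness table, and greedy matcher below are all invented stand-ins (the real entity tagging is done by SolrTextTagger's FST lookup, and relatedness is derived semantically from the index rather than hand-written).

```python
from collections import Counter

# Hypothetical mini query log; in practice this is millions of entries.
query_log = [
    "machine learning engineer", "machine learning engineer",
    "java developer", "java developer", "java developer",
    "registered nurse",
]

# Step 1: mine commonly searched phrases from the log.
phrase_counts = Counter(query_log)
known_phrases = {q for q, n in phrase_counts.items() if n >= 2}

# Hypothetical relatedness data; in the talk this comes from semantic
# co-occurrence in the index, not a hand-written table.
related = {
    "java developer": ["j2ee", "spring", "hibernate"],
}

def tag_entities(query, phrases):
    """Step 2: greedy longest-match tagging, a toy stand-in for
    SolrTextTagger's FST-based lookup."""
    tokens = query.lower().split()
    entities, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in phrases:
                entities.append(candidate)
                i = j
                break
        else:
            i += 1
    return entities

def enrich(query):
    """Step 3: expand tagged entities with related concepts."""
    entities = tag_entities(query, known_phrases)
    expansions = [c for e in entities for c in related.get(e, [])]
    return entities, expansions

print(enrich("senior java developer"))
# (['java developer'], ['j2ee', 'spring', 'hibernate'])
```

The point of the FST in the real system is that this longest-match lookup stays fast even with millions of mined phrases.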
For those who want to delve deeper into semantic search & query augmentation, definitely check out Trey Grainger's previous talks.
Learning to Rank(Machine Learned Ranking) in Solr — Bloomberg
This is an extension of Trey Grainger's talk and my first introduction to learning to rank (LTR) systems. In IR systems, once you have retrieved the top x documents that match the user's query, how can you rerank them using machine-learned models? The Bloomberg team has integrated a reranking component into Solr so that others can build their own LTR pipelines and get automatic relevancy tuning from the models.
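The two-pass idea can be sketched roughly as follows. This is not Bloomberg's Solr plugin, just an illustration of the concept: the candidate documents, feature values, and linear weights are all made up, and a real LTR setup would train the model on relevance judgments or click data rather than hard-coding weights.

```python
# First pass: top-x candidates from Solr, as (doc_id, base_score) pairs.
candidates = [("d1", 9.2), ("d2", 8.7), ("d3", 8.1)]

# Hypothetical per-document features; a real system extracts these
# (query-document match signals, popularity, freshness, ...) inside Solr.
features = {
    "d1": {"base": 9.2, "clicks": 0.1, "fresh": 0.2},
    "d2": {"base": 8.7, "clicks": 0.9, "fresh": 0.8},
    "d3": {"base": 8.1, "clicks": 0.5, "fresh": 0.9},
}

# A linear model's learned weights (invented here; training produces them).
weights = {"base": 0.1, "clicks": 2.0, "fresh": 1.0}

def ltr_score(doc_id):
    """Second pass: score a candidate with the learned model."""
    f = features[doc_id]
    return sum(weights[k] * f[k] for k in weights)

# Rerank only the top-x candidates; everything below stays in base order.
reranked = sorted((d for d, _ in candidates), key=ltr_score, reverse=True)
print(reranked)
```

The design point is that the expensive model scoring runs only over the small reranking window, not the whole index, which is why it can be bolted onto an existing first-pass retrieval.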
Solr Troubleshooting — Tree Map
Alexandre, one of the maintainers of the solr-start.com site (very useful for learning about the UpdateProcessor components & Analyzers), gave a talk on troubleshooting Solr. He described a tree-map approach, adapted from root cause analysis, for isolating and debugging the issues we face in Solr. The approaches he described give you a mental model to narrow down where a problem occurs, instead of just saying "query X does not return document Y."
Etsy’s talk – Search Ranking for Fairness
Fiona Condon from Etsy's Search Experience team brought a different perspective on relevancy: how to ensure fairness & uniqueness in the search results on their e-commerce site. They focused on preventing sellers from gaming the system to land at the top of the result set. Some key takeaways: invest in small changes to the default scoring system, and build a feedback loop on ranking changes.
Chris Bredesen's talk "Evolving to Open" covered how the Red Hat Customer Portal transitioned from GSA to Apache Solr, and the challenges & lessons learnt along the way.
There were a couple of other interesting talks that I attended:
- Ted Sullivan on "
- "Searching for Better Code" — Grant shared how the Lucidworks engineering team uses their own Fusion product as part of their dogfooding efforts, with use cases like search & analysis of their code, Jira integration, etc.
- Timothy Potter on Solr & Spark (a more modern alternative to Hadoop) and how Solr can be used as a Spark SQL data source. He described how Fusion, along with Spark, can run distributed aggregation jobs for use cases like top queries and top documents.
- Joel Bernstein's talk on the upcoming parallel computing capabilities in SolrCloud.
There are additional talks that I have yet to explore, and I am waiting for the remaining videos. In the meantime, check out the slides and videos from the conference. Thanks to Lucidworks for organizing a fantastic conference, and thanks to my employer Red Hat for sending me.