Using Stats to Not Break Search

At Envato (where I work) we’ve recently started a development blog called webuild.envato.com for everyone who helps build our various sites - developers, designers, product people, ops, whatever.

I wrote one of the launch articles: “Using Stats to Not Break Search”. It’s about the statistical approach we used, while moving from Solr to Elasticsearch, to test that search relevancy hadn’t been broken.

How do you change pretty much everything in your search backend and still remain confident that nothing has broken (at least not in a way you didn’t expect)?

We can use statistics to do that, in particular a measure called Spearman’s rank correlation coefficient. Let’s have a look at it and see how we can use it to compare search results before and after a change, to make sure relevancy rankings haven’t gotten screwed up in the process.
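To give a rough feel for the idea, here is a minimal sketch in Python. It is not the code from the article: the function name `spearman_rho` and the sample document IDs are made up for illustration, and it assumes each result list contains no duplicate IDs, so there are no tied ranks. It compares only the documents that appear in both result lists, using the standard no-ties formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a document's rank before and after the change.

```python
def spearman_rho(before, after):
    """Spearman's rho for two ranked lists of document IDs.

    Assumes no duplicate IDs within a list (so no tied ranks).
    Only documents present in both lists are compared.
    """
    common = set(before) & set(after)
    # Re-rank within the shared documents so both rankings cover the same items.
    before_shared = [doc for doc in before if doc in common]
    after_shared = [doc for doc in after if doc in common]

    n = len(before_shared)
    if n < 2:
        raise ValueError("need at least two shared results to compare")

    # Map each document to its position in the "after" ranking.
    after_rank = {doc: rank for rank, doc in enumerate(after_shared)}

    # Sum of squared rank differences across the shared documents.
    d_squared = sum(
        (rank - after_rank[doc]) ** 2
        for rank, doc in enumerate(before_shared)
    )
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical result lists for the same query, before and after the change.
solr_results = ["a", "b", "c", "d", "e"]
elasticsearch_results = ["a", "c", "b", "d", "e"]

rho = spearman_rho(solr_results, elasticsearch_results)
print(rho)
```

A value of 1 means the two rankings agree exactly, -1 means one is the reverse of the other, and values near 0 mean the orderings are unrelated. In this sketch two neighbouring results swap places, so the score comes out just below 1.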

Go and check it out.