Naive Bayes Categorisation (with some help from Elasticsearch)

Back in November, I gave a talk during one of the Friday Hackers and Painters sessions at Plug-in@Block 71, aptly titled “How I do categorisation and some naive bayes sh*t” by Calvin Cheng. I promised I’d write a follow-up blog post with the materials I presented during the talk, so here it is.

My Quora Codesprint Submission

(this is x-posted on Quora)

I’ve had some experience in the past with machine learning, but I feel like I still don’t have a proper methodology. I’d like to hear what you guys think about what I’ve done here.

Vectorisation and the logsumexp trick

Having played around with and written up some simple machine learning algorithms over the last couple of months, I've decided to write about some useful tricks of the trade that I've learnt. These are not specific to any algorithm, but are things I've found useful regardless of the kind of approach you intend to take. Many machine learning techniques can be performed as a series of matrix operations. As a result, it isn't uncommon to see the term “vectorisation” being thrown about in machine learning papers.
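To make the two ideas in the title concrete, here is a minimal sketch, assuming NumPy is available. The `logsumexp` helper and the `log_scores` values are made up for illustration and are not taken from the original post. It computes log(sum(exp(x))) for every row of a matrix in one vectorised call, and subtracts each row's maximum first so that the exponentials cannot underflow to zero.

```python
import numpy as np


def logsumexp(log_values, axis=-1):
    """Compute log(sum(exp(log_values))) along `axis` in a numerically stable way.

    Illustrative helper; it does not handle rows made up entirely of -inf.
    """
    log_values = np.asarray(log_values, dtype=float)
    # Shift by the maximum so the largest term becomes exp(0) = 1.
    peak = np.max(log_values, axis=axis, keepdims=True)
    shifted_sum = np.sum(np.exp(log_values - peak), axis=axis, keepdims=True)
    # Add the maximum back after taking the log, then drop the kept axis.
    return np.squeeze(peak + np.log(shifted_sum), axis=axis)


# Hypothetical unnormalised log scores: 2 documents x 3 classes.
log_scores = np.array([
    [-1000.0, -1001.0, -1002.0],
    [-2.0, -1.0, -3.0],
])

# Naive version: exp(-1000) underflows to 0.0, so the first row becomes log(0) = -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_scores), axis=1))

# Stable, vectorised version: one call handles every row, no Python loop.
stable = logsumexp(log_scores, axis=1)

# Normalise each row into log posteriors by broadcasting the row-wise result.
log_posteriors = log_scores - stable[:, np.newaxis]
```

The naive calculation loses the first row entirely, while the shifted version keeps it finite, and exponentiating `log_posteriors` gives rows that sum to one. SciPy ships a ready-made `scipy.special.logsumexp` if you would rather not write your own.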

