Talk in English - UK at php Central Europe Conference 2017
Track Name: Guru
Short URL: https://joind.in/talk/b4818
I will share with you a story about hashes: what they're good at, what they're bad at and, most importantly, how to use them in a not-so-typical way.
I was faced with a challenge: search a database of questions (about 2 million records) and find the duplicates among them. It may look like a pretty simple problem, but doing this efficiently was not trivial. I will explain the algorithms used, discuss their benefits, and show you how I tweaked them to fit our needs. My main topic will be MinHash and LSH, with a little reminder about general hashing algorithms.
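To give a rough idea of what MinHash and LSH do, here is a minimal sketch in Python. It is not the implementation from the talk. The word-shingle size, the number of hash functions, the number of bands, and every function name (`shingles`, `minhash_signature`, `estimate_jaccard`, `lsh_candidates`) are assumptions chosen for illustration. MinHash compresses each question into a short signature whose agreement rate approximates the Jaccard similarity of the shingle sets, and LSH banding groups signatures into buckets so that only questions sharing a bucket need to be compared, instead of all pairs.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures, bands=16):
    """Split each signature into bands; items sharing any band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(key)
    pairs = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


# Hypothetical sample data, not taken from the talk.
questions = {
    "q1": "how do i reverse a string in php",
    "q2": "How do I reverse a string in PHP",
    "q3": "what is the capital city of france",
}
signatures = {key: minhash_signature(shingles(q)) for key, q in questions.items()}
print(lsh_candidates(signatures))
```

In this sketch `q1` and `q2` differ only in letter case, so they produce the same shingles and always share a bucket. More bands with fewer rows each make the filter more permissive, while fewer bands with more rows make it stricter, which is the main knob one would tune for a real data set.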
Comments
I was expecting a somewhat clearer explanation of the ideas.
It would help to have concrete metrics: what was the volume of data, what were the processing times, and how much time was saved with each algorithm and each improvement?
It was OK, but when a lecture is a case study of a problem and its solution, I need more research, e.g. which other approaches were considered. A simple "I haven't come up with anything else" is not enough for me.