Hadoop, MapReduce, Weka & Python Pandas, Oh My? A Data Mining and Machine Learning Primer



I felt like this talk was full of lists and lists of data mining technologies, but never actually stopped to explain what they are or demonstrate how they work. Left with no valuable takeaways.

Anonymous at 21:35 on 8 May 2015

Just to respond to Stu's comment, which I think is great feedback, thanks! I didn't want anyone to walk away feeling they wasted their time, and I am sensitive to that, so you have my sympathy. I tried to communicate the nature of my talk in the description.

I did rate this talk as a "Beginner's" talk in the OpenWest rating system, and the description of the talk is at the end of this comment. I apologize that you left feeling unfulfilled, but your description was exactly the point of the talk -- to give a survey/overview of the field to newcomers.

If you had wanted a deep dive on anything I was familiar with, I believe I indicated my willingness to stay as long as needed after the set time to answer questions. I will take your comment and improve my next talk. Thanks again!

"From old school Java-based toolkits like Weka to the latest and greatest toolkit for machine learning like Python Pandas, here's what you need to know for an introduction to the world of Big Data and Machine Learning. Geared towards students and professionals who need to understand the basics of this topic, we will present various concepts from cluster-based file systems, "Not Only SQL" Databases, Map Reduce algorithms and the range of tools for machine learning and data mining including Weka, Python Pandas, R and many other Python toolkits. "

Oh, and for anyone paying attention: thanks to the guys who corrected me on Pandas. I got that wrong in my talk and my slides; it's not an ML toolkit, it's a math toolkit. Mea culpa.

James had a lot of enthusiasm for the topic, but I'm afraid I have to echo the concern that while a TON of great tools were listed, there was not much depth on what the tools are really for, how they work, how to use them, etc.

In the future, I would maybe try organizing the talk by starting with the basic Big Data concepts (ML, Map-Reduce, cluster-file systems), having a bit more description and explanation of each concept, and then quickly mentioning the most popular tools that use it. Trying to list out every tool (Hive, Pig, Mahout, YARN, etc., etc., etc.) in 45 minutes is just too overwhelming (buzzword soup!).

Please take this as constructive criticism; I did enjoy the talk. Thanks for presenting! I intend to check out some of the Utah Big Data events in the future.

Basic but informative.

Anonymous at 12:12 on 11 May 2015

Good overview of important programs. Useful information that inspired me to look into the programs with which I am not as familiar. Fluid speaking style, kept our attention. Would attend again.