Hadoop is a big data technology suite from the Apache Software Foundation that provides a range of tools for storing, interacting with and manipulating large data sets, and for solving big data problems that may relate to the structure of the data just as much as its size. With many companies approaching the point where they need to handle large amounts of data, either right now or in order to scale in the future, Hadoop is one of the core technologies that can help you get there. In this talk I’ll provide an outline of HDFS, Hive and Spark, three of the core Hadoop tools, and how you can use them; cover the differences between Hadoop and other technologies such as Elasticsearch; and introduce Presto, a distributed SQL query engine that lets you query your big data clusters (HDFS, MySQL, PostgreSQL or Cassandra alike) with simple SQL queries. Finally, I’ll talk about how you can utilise these resources from within your PHP application, allowing your platform to interact with your huge volumes of data without having to copy your business logic into another language.

Comments

Pieter Gerber at 13:17 on 27 Sep 2018

Very interesting topic and well presented. Well done Michael.

Enjoyed the talk. Would've loved to see more detail on Hadoop, but with it being such a wide topic it's unfair to expect that in such a short timeframe. What I was particularly interested in was Presto. Well done! Thank you.

Andre Smith at 13:58 on 27 Sep 2018

Michael seems very knowledgeable. The talk did seem a bit more rushed than his first one earlier today. But he is a very good public speaker, and conveys his message very clearly and understandably.

Michael seems very passionate about Hadoop and it shows in this talk.

William Stam at 09:07 on 28 Sep 2018

thank you for a very interesting talk! it was far far too short tho :( it left me wanting to know more tho! next talk you get 3 days for the intro, deal?

Liam Norman at 00:26 on 29 Sep 2018

Really good talk and great delivery. Hadoop looks like a very good solution to scaling with big data.

Justin Fossey at 13:31 on 2 Oct 2018

Learning about Hadoop was just amazing. When I looked at it in the past myself, I often found myself confused and struggled to see how all the pieces fit together.

Michael really did a great job of providing the history, showing how all the different pieces fit together, and explaining how Hadoop is not one thing, but lots of different things arranged together in different ways.

It's unfortunate he had to leave early and could not be part of the speaker panel, as I think his topics and experiences would have been a great addition to the discussion.

I think the only issue I had was that he unfortunately ran out of time for questions. The size and scope of the topic really leaves you with a few unanswered questions, but otherwise a really great talk.