Hadoop is a big data technology suite from the Apache Software Foundation that provides a range of tools for storing, interacting with and manipulating large data sets, and for solving big data problems that relate to the structure of the data as much as to its size. Many companies are reaching the point where they need to handle large amounts of data, either right now or in order to scale in future, and Hadoop is one of the core technologies that can help. In this talk I'll provide an outline of HDFS, Hive and Spark, three of the core Hadoop tools, and how you can use them; explain the differences between Hadoop and other technologies such as Elasticsearch; and introduce Presto, a distributed SQL query engine that lets you query your big data clusters (HDFS, MySQL, PostgreSQL or Cassandra alike) with simple SQL queries. Finally, I'll talk about how you can use these resources from within your PHP application, allowing your platform to interact with your huge volumes of data without having to copy your business logic into another language.


Alistair Shaw at 13:18 on 30 Sep 2017

Fantastic overview of Hadoop and how to interact with it in PHP. Will be playing with it myself next week :)

Ian Smith at 14:00 on 30 Sep 2017

I came into this talk a complete beginner to Hadoop. So I'm not sure the talk was aimed at me.

Michael is clearly very knowledgeable and covered a lot of ground in the talk. I would have loved to have seen a common data example that could be followed through, culminating in how to interact with it using Phresto. As a summary of services the talk was great.

Jeroen v.d. Gulik at 14:16 on 30 Sep 2017

Great introduction to Hadoop. Good delivery, with clear explanation of the concepts.

Nathan Dunn at 15:43 on 30 Sep 2017

An insightful talk running through the basics of Hadoop. As a complete beginner to Hadoop, I now understand the basic principles, and it was great to see how fast it was at processing large amounts of data. The only thing missing was a real-world example of the kinds of data you can aggregate using Hadoop.

Informative talk, but the demo felt a little rough and a few concepts were a little rushed. I think more detail on fewer aspects would have hit the nail on the head.

This talk was a good introduction to Hadoop. The explanation of the database and how it interacts with the distributed file system was a little unclear but apart from this I felt like everything else was presented well.

Simon R Jones at 11:09 on 1 Oct 2017

Great overview of Hadoop, some of it went a bit over my head - there was a lot of tech being explained - but it’s a really interesting area. Some more practical examples may have been useful.

Chris Emerson at 16:26 on 1 Oct 2017

A well-presented talk on Hadoop and its use with PHP. I'm not sure I have any use cases for it but came just out of interest as I have no knowledge of Hadoop at all, and did learn a lot about what it is and what it is capable of. Something to keep in mind for future use if the right application comes up.

Mark Railton at 21:14 on 1 Oct 2017

Really insightful overview of Hadoop and how to use it with PHP, though I felt it was somewhat too much of a topic to fit into a 45 min talk and it felt rushed.

Great talk about Hadoop. Nice demos. And thank you for the phresto library.

Your talk has piqued my interest in the Hadoop cluster at work; thanks for the introduction to the technology stack behind Hadoop!

Erik Smit at 16:54 on 8 Oct 2017

Great and clear introduction to Hadoop. Gave me some nice inside info on multi source query possibilities. It was a lot of information in 45 minutes.