Automate your Workflow: Removing Tedium in Everyday Work



Anonymous at 17:33 on 23 Oct 2014

I don't understand where the AI in Diffbot is. It looks like Chrome DevTools + XSLT.

Interesting talk.

I'm not sure I understood everything about Diffbot, but I really appreciate the approach of automating repetitive and tedious tasks.
And I was glad to see the PHP editor for SitePoint, a good place to find articles on PHP and other web topics.

Well, while Diffbot seems like a nice piece of software, I didn't see the link with PHP or even with the development workflow...

I'm not sure I understood everything about Diffbot, but I'll definitely take a look at it for easily generating APIs based on URLs.

The only (big) drawback is that it seems to be a commercial product (and not a cheap one).
This is understandable, but is Forum PHP meant for that?

I don't really understand where the "machine learning part" is.

@all: Thank you very much for the feedback. It's very important to me, as it shows me which areas I need to work on and which explanations need to be made more approachable. I'll try to address each comment individually now.

@Cedric: The magic of Diffbot is that it uses visual extraction of data - it doesn't rely only on HTML markup, but tries to learn and recognize what's important to humans. It gets better over time, so much so that even if the markup changes (which would essentially break the API in competing products), Diffbot will adapt to these changes and keep everything working.

@Romain: Thanks!

@Andre: I would be happy to clarify further. Is there anything in particular that confuses you?

@Rodrigue: I'm not sure what you mean; could you clarify? If you're asking where to find Diffbot, it's on its website or @diffbot on Twitter. The slides and source code links are coming soon, as soon as I regain proper internet access at home.

@Dits: See my comment to Cedric above, which explains part of the machine learning. Why do you think it's a big drawback that it's commercial? You can get a free token for two weeks, and you can also apply for a permanently free token if you have an open source or educational project. Get in touch with me if you'd like to know more about that.

It's also important to note that Diffbot has a database, and a mega-crawler called Crawlbot (more on that on another occasion). You can unleash it on an entire domain and it will return all of the harvested content. Not only that, but it will also ignore landing pages and other trivial pages, recognizing them from previous experience, so you only get back what you really need. What's more, this data is then saved in both JSON and markup format on Diffbot's servers, and once the crawl is done you can search it quickly, Elasticsearch-style, without having to re-crawl. You're essentially getting an entire database backend. This is why it's a commercial product - it needs to cover these costs somehow.

@all: I'm sorry if I focused on Diffbot too much in my talk - it was adapted from a workshop format, and the other aspects of my automation (which I didn't have the chance to cover here) use different tools. I picked the first aspect because it was the most diverse (using the widest array of technologies), but I could just as easily have skipped Diffbot and talked about something else (for example, invoice generation uses Swiftmailer, HTML2PDF, Gearman, Symfony's EventDispatcher, etc.). Stay tuned for the code; I'll post the link here, and it should be very interesting to everyone who attended the talk.

Once again, thanks for the feedback!

Anonymous at 21:24 on 24 Oct 2014

Nice talk; it could have been better if it were less formal.
Good insight into how you approach your job as an editor for a very good tech site.

Thanks for the presentation,

I must admit I came based on the author and the topic alone, as I'm a regular reader of SitePoint, which has very interesting articles. The topic sounded good too; however, I hadn't read that it was almost all about Diffbot!

I still appreciated the talk, delivered in perfectly understandable English, because my company has a similar use case in a quite different area (checking product prices). But, like @Dits, I must admit I think it's a bit expensive; unfortunately, there's no chance we can use it.

And regarding what @Rodrigue said, I think he wanted to know why it was presented at a PHP conference. Probably because it's developed in PHP? But I must admit I don't remember seeing any PHP there.

Thanks for this product presentation.

@nelson Thanks! What do you mean by formal exactly?

@guoillaume Thanks for the feedback! I'm sorry if it felt too Diffbot-focused. The reason it was in the presentation is twofold: 1. I'm genuinely curious about what people can build with it, and 2. while it isn't built in PHP, it's perfect for combining with PHP thanks to the simplicity of libraries like Guzzle. Combining it with PHP removed a huge part of my workload, and that was the gist of it.

I showed you some PHP in the form of the Guzzle and Laravel implementation, but this wasn't supposed to be an in-depth tutorial. Stay tuned for the slides; more code will follow. I'll definitely try to go more in depth with code the next time I talk about this, though. Thanks for letting me know.

Maybe your talk would have been more appropriate if it had been more about your job as an editor (choice of subjects, things like that) for a very good tech site.

@alexis: Hmm, interesting, thanks. Could you be more specific about what you'd like to hear?

This talk provided us with an amazing glimpse of the current state of automation available to users.
I do wish more of the server code were released, though.

Thank you, Thierry! Much appreciated.