@all: Thank you very much for the feedback. It's very important to me, because it shows me which areas I need to work on and which explanations need to be made more approachable. I'll try to address each comment individually now.
@Cedric: The magic of Diffbot is that it uses visual extraction of data - it doesn't rely only on HTML markup, but tries to learn and recognize what's important to humans. It gets better over time, so much so that even if the markup changes (which would essentially break competitors' APIs), Diffbot will know how to adapt to these changes and keep everything working.
@Romain: Thanks!
@Andre: I would be happy to clarify more. Is there anything in particular that confuses you?
@Rodrigue: Not sure what you mean, could you clarify? If you mean the links, Diffbot is at diffbot.com or @diffbot on Twitter. The slides and source code links are coming soon, as soon as I regain proper internet access at home.
@Dits: See my reply to Cedric above, which explains part of the machine learning. Why do you think it's a big drawback that it's commercial? You can have a free god token for two weeks, and you can also apply for a permanently free token if you have an open source or educational project. Get in touch with me if you'd like to know more about that.
It's also important to note that Diffbot has a database and a mega-crawler called Crawlbot (more on that on another occasion). You can unleash it on an entire domain and it will return the entire harvested content. Not only that, but it will also ignore landing pages and other trivial pages, recognizing them from previous experience, so you only get back what you really need. What's more, this data is then saved in both JSON and markup format on Diffbot's servers, and once the crawl is done you can fast-search it Elasticsearch-style without having to re-crawl. You're essentially getting an entire database backend. This is why it's a commercial product - it needs to cover these costs somehow.
@all: I'm sorry if I focused on Diffbot too much in my talk - it was adapted from a workshop format, and the other aspects of my automation (which I didn't have the chance to cover here) use different tools. I picked the first aspect because it was the most diverse (using the widest array of different technologies), but I could have just as easily skipped Diffbot and talked about something else (for example, invoice generation uses Swiftmailer, HTML2PDF, Gearman, Symfony's EventDispatcher, etc.). Stay tuned for the code. I'll post the link here, and it should be very interesting to everyone who attended the talk.
Once again, thanks for the feedback!
Thanks, everyone, for your feedback. :) Don't hesitate to try out this technology, it's really worth a look! And if you want more info, I'm available on Twitter. :)
Ubermuda, much better on stage than at a foosball table. :P
I didn't know about "12 factor", but I really liked the explanations of each of the different points. :) It was very clear and the message came across well: "these aren't laws, they're recommendations."
An honest account of real-world experience, pragmatic choices, a critical eye, and impressive numbers! I loved it. :)
Great, worth doing again if the opportunity arises. Very good approach!
Hamza
We are currently facing the same kind of problem, and this experience report was just spot on. :D
Kenny, I hate you. :D
(but I loved it. ^^)
Great experience report. It's inspiring and makes you want to try it. :)
OK, code.JS is a web technology and there is a connection, but in the end I would have liked this presentation to talk a bit about PHP. So, conclusion: we never saw the cohabitation, and that's a real shame.