While exposing data to developers through APIs is becoming more common, most of the data found on the internet is still available only as raw HTML, often buried in seemingly chaotic tags. This talk aims to be a quick introduction for data scientists to politely extracting data from a website and storing it in a structured database with the help of the Python library Scrapy, and to extending Scrapy to fit their specific needs.
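
As a rough illustration of the "polite" and "structured database" parts, here is a minimal sketch rather than material from the talk itself. The setting names are standard Scrapy settings, but the values, the `SQLitePipeline` class, the `pages` table, and the `url`/`title` item fields are all assumptions chosen for the example. A Scrapy item pipeline is a plain Python class exposing hook methods, so the sketch only needs the standard library; the spider that produces the items is left out.

```python
import sqlite3

# Real Scrapy setting names; the values are illustrative assumptions.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # respect robots.txt rules
    "DOWNLOAD_DELAY": 1.0,                # seconds to wait between requests
    "AUTOTHROTTLE_ENABLED": True,         # back off when the server slows down
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per site
    "USER_AGENT": "example-research-bot (+https://example.com/contact)",
}


class SQLitePipeline:
    """Hypothetical Scrapy-style item pipeline that stores items in SQLite."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the database, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item; re-scraping a URL overwrites its row.
        self.conn.execute(
            "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
            (item["url"], item["title"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends.
        self.conn.close()
```

In a Scrapy project, a class like this would typically be registered through the `ITEM_PIPELINES` setting, and the values in `POLITE_SETTINGS` would live in the project's `settings.py` or a spider's `custom_settings`.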
