Righteous Wrath Online Community

General => Tech Chat => Topic started by: Darren Dirt on July 27, 2016, 07:50:58 PM

Title: IMPORT.IO -- extract tabular data from almost any webpage
Post by: Darren Dirt on July 27, 2016, 07:50:58 PM
 
https://www.import.io/

Apparently you just paste in a URL and it will use some algorithmic smarts to figure out which parts of the page's contents are relatively structured data.
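For a rough idea of the simplest version of "find the structured data on a page", here is a minimal Python sketch that pulls the rows out of plain HTML <table> elements using only the standard library. This is NOT import.io's algorithm (the site doesn't say how its detection works, and it presumably copes with far messier markup); the names TableExtractor and extract_tables and the SAMPLE markup are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <table> on a page as a list of rows.

    Illustrative only: handles flat tables, not nested ones.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table currently being read
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def extract_tables(html):
    """Return every table found in `html` as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables


# Made-up sample markup, standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Tool</th><th>Handles pagination</th></tr>
  <tr><td>Example A</td><td>yes</td></tr>
  <tr><td>Example B</td><td>no</td></tr>
</table>
"""

if __name__ == "__main__":
    for table in extract_tables(SAMPLE):
        for row in table:
            print(row)
```

A real page would be fetched first (for example with urllib.request) and its HTML passed to extract_tables. Data laid out with divs or loaded by JavaScript would need something smarter than this, which is presumably where services like this earn their keep.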


found via http://blog.silk.co/post/142737643047/data-journalism-tools-part-1-extracting-and



PS wow I had no idea there were *so many* "data scraper" services and tools out there! A bunch of them, along with their more intriguing features, are described here (including dealing with pagination as well as infinite scroll):
https://www.import.io/post/great-alternatives-to-every-feature-youll-miss-from-kimono-labs/

Title: Re: IMPORT.IO -- extract tabular data from almost any webpage
Post by: Mr. Analog on July 28, 2016, 09:23:22 AM
There are plenty available because they are primarily used by unscrupulous bot scripters / datamining sites, either to collect data and create "aggregator" sites or to do keyword scraping for bots.

There is a surprisingly sophisticated world of bots out there just waiting for content creators to tag certain things or use certain words, particularly on social media sites that don't offer "trending" information in their public APIs.

For example: if you put "MILF" in a Tumblr post on a sufficiently popular blog, you will immediately be followed by bots. It's amazing to watch.