Bob Koure
Apr 28, 2023


There was an article in WaPo a few weeks back detailing the sites being mined as training data for large language models.

Medium was on the list. I would guess they were web scraping: if it's on the Internet where humans can read it, search engines (and other automated processes) can read it too. There are ways for a website to tell robots not to read it (usually robots.txt, or a robots meta tag in the HTML; there is also an X-Robots-Tag HTTP header, among other mechanisms), but that requires the robot to actually respect the indicator. It's sort of a 'please don't walk on the grass' sign.
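To illustrate how voluntary that sign is, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page, using Python's standard-library urllib.robotparser. The site URL and the "MyCrawler" user-agent name are hypothetical; the point is that nothing enforces this check, the crawler has to opt in.

    from urllib.robotparser import RobotFileParser

    # A site's robots.txt might contain, for example:
    #   User-agent: *
    #   Disallow: /private/
    # Respecting these rules is entirely voluntary on the crawler's side.

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # hypothetical site
    rp.read()  # fetch and parse the robots.txt file

    url = "https://example.com/some-article"
    if rp.can_fetch("MyCrawler", url):  # "MyCrawler" is a made-up user agent
        print("Allowed to fetch:", url)
    else:
        print("robots.txt asks us not to fetch:", url)

A scraper that simply skips the can_fetch() check reads the page anyway, which is presumably how content ends up in training sets regardless of a site's wishes.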

My point? It's already happened. The question is more about how to move forward.
