Bob Koure
Apr 28, 2023


There was an article in WaPo a few weeks back detailing the sites being mined as training data for large language models.

Medium was on the list. I would guess they were web scraping: if it's on the Internet where humans can read it, search engines (and other automated processes) can read it too. There are ways for a website to tell robots not to read it (usually robots.txt, or a robots meta tag in the HTML; there is also an X-Robots-Tag HTTP header, among other mechanisms), but that requires the robot to actually respect the indicator. It's sort of a 'please don't walk on the grass' sign.
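To illustrate how voluntary that sign is, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page, using Python's standard-library urllib.robotparser. The site URL and the "MyCrawler" user-agent name are hypothetical; the point is that nothing enforces this check, the crawler has to opt in.

    from urllib.robotparser import RobotFileParser

    # A site's robots.txt might contain, for example:
    #   User-agent: *
    #   Disallow: /private/
    # Respecting these rules is entirely voluntary on the crawler's side.

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # hypothetical site
    rp.read()  # fetch and parse the robots.txt file

    url = "https://example.com/some-article"
    if rp.can_fetch("MyCrawler", url):  # "MyCrawler" is a made-up user agent
        print("Allowed to fetch:", url)
    else:
        print("robots.txt asks us not to fetch:", url)

A scraper that simply skips the can_fetch() check reads the page anyway, which is presumably how content ends up in training sets regardless of a site's wishes.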

My point? It's already happened. The question is more about how to move forward.
