Then, it seems, a good solution would be to have the server owner declare in advance what is intended use and what isn't. Accessing information without providing the correct password is certainly unintended use, as is guessing passwords. And accessing while knowing the password is definitely the intended mode of operation.
A logical next step is to make that declaration machine-readable. Oh, wait: suddenly we're back to the server software and configuration, which the server developer/administrator had screwed up.
My question is: why don't we take that logical step and simplify things, instead of relying on some "should be common sense" and "you should've known you weren't supposed to do that" completely-gray-area?
Sort of, but in machine-readable form and at a well-known location (like /robots.txt), so you could read and comply with them before you access the site.
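For illustration, here's a minimal sketch of how a client could check such a hypothetical machine-readable terms file before scraping. The /terms.txt path, the "Directive: value" line format, and the directive names are all my invention for the sake of the example; no such standard exists:

```python
# Hypothetical machine-readable terms file, analogous to /robots.txt.
# The path (/terms.txt) and the "Directive: value" format are assumptions.

def parse_terms(text):
    """Parse 'Key: value' lines into a dict of declared usage terms."""
    terms = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, _, value = line.partition(":")
            terms[key.strip().lower()] = value.strip().lower()
    return terms

def scraping_allowed(terms):
    """A polite client refuses to scrape unless the site explicitly allows it."""
    return terms.get("programmatic-access", "deny") == "allow"

example = """
# Imagine this served at https://example.com/terms.txt
Programmatic-Access: allow
Rate-Limit: 10/minute
"""

terms = parse_terms(example)
print(scraping_allowed(terms))  # True for this example file
```

A client would fetch this once before the first real request, the same way well-behaved crawlers fetch /robots.txt today.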
As for those exact terms, I suspect (IANAL) they prohibit almost any access to the site: for example, they forbid any programmatic access to obtain the information, and I haven't heard of any non-software user-agent implementations.
You can translate "programmatic" as "automated", as in "someone coded a program/tool to, in a programmatic way, access the website and retrieve the data".
As opposed to a human being, in a non-programmatic way, opening their browser and accessing the website.
> someone coded a program/tool to, in a programmatic way, access the website and retrieve the data
Doesn't Firefox, for example, fit this description perfectly? Yes, I do manually enter the base URL to access, but if that's the distinguishing feature...
> As opposed to a human being in a non-programmatic way, opening his browser and accessing the website.
... then manually typing in ./scrape.py www.att.com is non-programmatic, too. :)
Or maybe I'm not getting the correct meaning of "automated" due to bad English comprehension and false analogies from other languages. But I always thought every request on the Internet is automated, made by some kind of hardware+software combo, so forbidding "programmatic" access is complete nonsense (access control and rate-limiting are the proper solutions).
(And, if that matters, the author of scrape.py does not need to conform to AT&T's TOS if they don't actually use the script themselves.)
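To illustrate what "rate-limiting as the proper solution" could look like server-side, here's a generic token-bucket sketch; it's a textbook technique, not any particular server's implementation, and the parameter values are arbitrary:

```python
import time

class TokenBucket:
    """Generic token-bucket rate limiter: roughly `rate` requests per
    second, with bursts up to `capacity`. A sketch, not production code."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=3)
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 rapid requests pass the burst, the rest are throttled
```

The server enforces the declared limit mechanically instead of arguing afterwards about what "programmatic" meant.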
Wait: so before accessing a website I have to go read its terms of use?
What if I set up a website, put a clause saying "you agree to pay $50/page view" in there, and hide it away somewhere? Google's crawlers will find my site in no time, and then I can start raking in the dollars, right?