Curl, 17 years old today (haxx.se)
420 points by bagder on March 20, 2015 | hide | past | favorite | 95 comments


> "If it doesn't load through curl, it's broken." --someone

So, so true. Thanks, curl.


That's pretty much my own test for a Web based API - if I can drive it from the command line using curl then great, if I can't then it's broken.
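For example, a smoke test can be as simple as this (endpoint and payload entirely made up):

```shell
# fetch a resource; -i prints the status line and headers too
curl -i https://api.example.com/v1/widgets/42

# POST JSON; --fail makes curl exit non-zero on HTTP errors,
# which is handy in scripts
curl --fail -H 'Content-Type: application/json' \
     -d '{"name": "sprocket"}' \
     https://api.example.com/v1/widgets
```

If you can't drive it like that, something's off.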


Curl is great and all, but I don't think this quote makes much sense. Are there examples of services that would, for example, load in a browser but not through curl? I'd think it's rather the opposite, since you can throw pretty much anything at curl and it will work.


Anything that requires JavaScript to fetch and display content.


Yeah, building web apps this way is a fad. I predict everyone will go back to full page reloads and server-rendered content.

/s


On a serious note, “progressive enhancement – still the right way”. I don't think client-side rendering is going away but everyone who's relied on it exclusively has learned the hard way that it's just too unreliable and slow to have a failure mode which is an empty page unless a lot of complex code works perfectly.

Just think about how many engineer-hours Twitter flushed with that silly #! kludge – and then when they switched back, saw an 80% improvement in page load time.


> Just think about how many engineer-hours Twitter flushed with that silly #! kludge – and then when they switched back, saw an 80% improvement in page load time.

No hours were wasted, and they didn't really switch back. They're just using HTML5's History API on browsers that support it now. Essentially the same mechanism under the hood, just prettier URLs for it.


They did more than just switch to the history API. During that period, if anything went wrong, you saw a blank page and, of course, robots saw only the generic launcher HTML instead of any content.

Now, here's what a tweet looks like without JavaScript enabled:

https://www.dropbox.com/s/me7kinvje7ly781/Screenshot%202015-...

Here's what it looks like with JavaScript enabled:

https://www.dropbox.com/s/04pjdlkuht6t2ja/Screenshot%202015-...

(The main difference would be that things like the search & menus are either interactive controls or simple links to basic HTML forms, depending on whether JavaScript loads)

During the hashbang era you couldn't use a page until the full client-side render finished. Now, however, all of the content is available with fairly rich markup:

https://redbot.org/?uri=https%3A%2F%2Ftwitter.com%2Facdha%2F...


I think he was referring to the DOM rendering on the Twitter website, which used to happen client-side until recently, when they switched back to server-side rendering. It's actually very similar to pre-rendering React components on the server and then mounting them, so it might not have been a total waste; it was a fad that didn't make much sense at the time, though.


>but everyone who's relied on it exclusively has learned the hard way that it's just too unreliable and slow to have a failure mode which is an empty page unless a lot of complex code works perfectly.

FUD. I develop HTML5 gambling for a living and this anti-javascript sentiment on HN is getting really tiresome. You honestly sound like a bunch of old people, complaining that a PC isn't a typewriter or a fountain pen.

Yeah your fountain pen doesn't require power and it writes your name really well, but that doesn't mean that the PC isn't better.

Client side rendering means you will have to test in all browsers, writing android apps means you have to test on a lot of units.


You rather spectacularly missed the point: I wasn't saying not to do the fancy stuff but rather to start with something which degrades well and then have your JavaScript enhance that basic experience.

If you want to know why this is a good idea, you should start using something like getsentry.com or errorception.com to record your JavaScript errors. That won't tell you who couldn't execute JavaScript at all but it'll show how many times something didn't load due to a flaky ISP, adware, buggy anti-virus, odd browser settings, etc. With progressive enhancement, those people still have a reasonable chance of at least seeing the content on the page. With a pure JS approach, they're only going to see a blank and will probably be heading over to a competitor whose site degrades well.

(Note that this is only the question of the site working at all. In most cases, the progressive site will also render considerably faster – Twitter found an 80% improvement! – since the pure-JS approach breaks the browser's prefetch optimizations and requires much more work to achieve comparable performance)


> I develop HTML5 gambling for a living and this anti-javascript sentiment on HN is getting really tiresome.

I've been using websites since the early 90s and this pro-single-page sentiment is getting really tiresome. You are breaking the web. You are destroying users' security.

Sure, there are plenty of reasons to use JavaScript, and plenty of places where it's appropriate. It probably is a good idea for games and so forth. But requiring users to load and execute constantly-changing code from across the web in order to read a page or submit a form is in-friggin-sane.

Someone else pointed out that it'd be nice if browsers offered more support for things that certain types of developers clearly want to do. I completely agree; it'd definitely be nice to take advantage of many of the technologies which currently exist to do more, in a more structured way. But requiring code execution in order to read data is madness.


I really wish building single-page apps was better supported by browsers. The current setup is to try to kludge over the fact that DOM is for documents, not apps, and use the awfulness that is CSS and current layout paradigms (note, no CSS is strictly worse than yes CSS, but that doesn't make CSS good).

Breaking away from the DOM but keeping the ability to remotely load code and static assets would be the best of both worlds. Something like Google Maps is an application, not a document. Why are we rendering it with a document renderer? Why are we styling it with CSS, which is again, document-oriented? The good part about single page apps is that they deliver a package of code, then are able to keep local state and communicate with the server over a stateless protocol. Oh, and the current breed runs on a platform (browsers) that is installed on every PC and mobile device. Browsers just need a better format for delivering packages of code and better support for these than "manipulate the DOM".


Your server should just be providing data that's displayed by the JS App. The JS app can handle a bunch of requests, but each request should provide something useful and be accessible from curl.

bonus, it makes it easier to write your app in other languages or for other platforms since the web server really is a server and the front-end is just a client.


The two are not necessarily mutually exclusive, especially with JavaScript frameworks you can run on the server to render the initial page, which then gets wired up to the client-side application. i.e. "Isomorphic JavaScript" [1][2]

[1] http://nerds.airbnb.com/isomorphic-javascript-future-web-app...

[2] http://isomorphic.net/


Speaking as someone who often uses text-mode browsers on the web, I wish this would actually happen.


Well somebody tell Google and Facebook, they're probably missing out on a lot of revenue because of their broken websites.


Website != API. A cleverly-written API can exist in the same space as a Website (even the same URLs, if you differentiate by things like Accept: headers), but they're not the same thing.

It's perfectly acceptable for Websites to include, and even require the use of, JavaScript. It's also perfectly acceptable to offer a JavaScript client for your API. In fact, you pretty much have to do that if you want your Website to work with the API anyway. What's not OK is for an API to require downloading and using additional JavaScript while the API is being used.

As a contrived (and somewhat ludicrous) example, let's say that I queried "http://foo.com/bars" for some kind of collection. It's OK to return the collection. It's even OK to use an HTTP redirect if the collection actually resides elsewhere: HTTP 302, perhaps, with a Location of "http://foo.com/bazes". What's not OK is to return a line of JavaScript reading "window.location = 'http://foo.com/bazes';" which might work for the browser, but wouldn't work for most other clients.
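curl makes the difference easy to see: with -L it follows a real Location header, but it (sensibly) does nothing with a line of JavaScript. Reusing the hypothetical URL from above:

```shell
# follow any HTTP redirects and report where we ended up;
# -w prints "write-out" variables after the transfer
curl -sL -o /dev/null \
     -w 'status=%{http_code} final_url=%{url_effective}\n' \
     http://foo.com/bars
```

A JS-based "redirect" would just come back as a 200 with a one-line script body, and every non-browser client stops there.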


Broken != unprofitable.


I love curl so much. I just learned that you can 'copy to curl command' from the chrome inspector's network panel by right clicking on any request!!

I want to make a library that reads the curl command (and maybe request syntax?) and outputs a function that will do that command.


curl has a --libcurl option which you can add to a curl command line, and then it'll generate a libcurl-using C code template for that same operation...
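For instance (output filename is arbitrary):

```shell
# run the transfer and also emit equivalent libcurl C code into get.c
curl --libcurl get.c https://example.com/

# the template is ordinary libcurl code; build it against libcurl
cc get.c -lcurl -o get
```

Handy for turning a one-off command line into the skeleton of a program.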


That's incredibly useful, never knew that. Thanks!


Similarly you can right-click and "Copy as cURL" from the Network pane of Firefox's web developer tools.


You can also edit requests and resend them from the FF dev tools. Alas, you can't (yet) do that from the Chrome dev tools.



Of course for PHP, you can also just use the curl bindings ;)


So many languages supported, but no plain C?


As someone else mentioned, this is actually built into Curl with the "--libcurl file.c" option.


That is very nice!


Wow! Thank you, what a great project!


We just built something[1] similar, but it uses the HAR request object to describe the request.

[1] https://github.com/Mashape/httpsnippet


Wow—that's incredibly useful. I didn't know about it either. Thanks for the tip!


Can I just say thank you for all those hours of hard work the maintainers have put in over the years?


Yes, you can! Here's the link:

http://curl.haxx.se/donation.html


Good call!


    alias wget='echo "How dare you." && curl -O'
    brew rm wget
Happy birthday.


Also, we have aria2 already :P


I'm surprised it's so new. And wget is only a year older... what did people use before then?


Sometimes just plain telnet. It wasn't pretty, but it worked:

    $ telnet example.com 80
    GET /
It should be noted that the above isn't a strict HTTP request header, it's missing quite a lot of detail, but it works as an example.
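For comparison, a fuller request for a modern server looks something like this (example.com as a stand-in host, nc instead of telnet so it can be piped):

```shell
# HTTP/1.1 requires a Host header; a blank line ends the headers
printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' \
  | nc example.com 80
```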


Add two newlines and that is a perfectly valid HTTP/0.9 request.


Pre curl/wget that was probably sufficient.

HTTP/1.1 wasn't around until 1996 so back then Host: headers weren't even "optional". They simply didn't exist yet.


That's what HTTP/0.9 requests look like!


Those were the early days of HTTP, so no, people didn't really get a lot of HTTP "manually" like that. We used other existing tools to get gopher and ftp though.

When I started my work on httpget in 1996, I didn't know about wget and I didn't become aware of it until several years later....


I would figure that by 1997 most people stopped using gopher/archie/veronica. I think I stopped by 1995 or so.

Nowadays we use curl/wget for web dev, it seems? People probably used them for the same thing back then, but I also imagine that lots of people just used their own ad-hoc scripts or whatever; I wish I had firsthand knowledge of that.

I used wget to download copies of websites when I found one that was mostly documents/images rather than webapps.


I use wget for plain, old regular downloads a lot. If I need to save a file to a specific place (e.g. datasheets to a doc/ directory in my project) and I have a terminal already open there (which I usually do), doing wget and middle-click is far more convenient than dealing with the browsers' "Save to..." dialog. Especially ever since they started considering a sane download manager to be one of those confusing features that users totally don't want.


> I use wget for plain, old regular downloads a lot.

I loved wget for that. What are your options? Mine was always -r -p -k -nH --cut-dirs=2 -np


In the HTTP/1.0 days I used telnet quite a bit. Or libwww-perl, when Perl was the language of choice for web applications. LWP has copyright notice:

Copyright 1995-2009, Gisle Aas

Copyright 1995, Martijn Koster

Round about then I wrote something similar to WWW::Mechanize in order to automate access to the Orange Web-SMS gateway, which had an inconveniently deep login procedure.


Perl? I have only the vaguest recollection of those days. It looks like LWP is about 20 years old[1]. It's based on libwww which goes back to 1992[2].

[1] http://search.cpan.org/~ether/libwww-perl/ (see changelog)

[2] https://en.wikipedia.org/wiki/Libwww


`lynx -dump` was a very useful tool.


I had to use that the other year whilst attempting to migrate an old old old Slackware box running on two Pentium 3s with an array of SCSI disks (so quite fancy for its time!). The machine had kernel sources that were different to the kernel running so no chance of compiling various bits of software, and I couldn't grab any GCC (EGCS?) binaries from anywhere.

I can't remember what I eventually did but I do remember discovering lynx -dump, so thanks for the reminder!


I still can't stop myself typing `lynx -mime_header` to see server headers!


why not

  lynx -head -dump http://httpbin.org
Your version dumps the body as well, which is hard to parse if you just need the headers.

On another note: Thanks for curl - it's always high up in my charts somehow. Right now #2 in

  history 0 | awk '{print $2}' | sort | uniq -c | sort -n -r | head -n 20


How long have GET/POST commands been around? Probably not as long, but just curious.


The first documented version of HTTP, HTTP/0.9 (1991), had only GET: http://www.w3.org/pub/WWW/Protocols/HTTP/AsImplemented.html

Basic HTTP as defined in 1992 had GET, PUT, HEAD, POST, LINK, TEXTSEARCH, CHECKIN, etc.: http://www.w3.org/Protocols/HTTP/Methods.html

More generic info: http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol


I believe they originally come from the libwww-perl project? According to Wikipedia it was first released with perl 4.0036, which was in February 1993, but I can't find out if it included GET and HEAD etc. already then...


They're mentioned in a February 1996 changelog entry for libwww-perl, so at least 19 years.


RFC 1945, dated May 1996, mentions the GET and POST methods:

http://www.rfc-base.org/txt/rfc-1945.txt

HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification reflects common usage of the protocol referred to as "HTTP/1.0".


I used netcat - and still do for interacting with protocols in ways that curl doesn't support. AFAIK it was released in 1995, so it's only three years older than curl.


telnet and get


It's slightly weird to me that I am older than Curl and Wget. They always seemed like Unix Monoliths to me; I had just assumed they had always existed.


THIS. I have the exact same feelings. Same with the 'Kubuntu is 10 years old' link on the homepage. I think we just grew up at the perfect time!


Curl is great, but I also recently came across HTTPie (https://github.com/jakubroztocil/httpie) which has some nice features for playing around with HTTP APIs (JSON formatting, syntax highlighting, etc)
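A quick taste, if you haven't tried it (httpbin.org as a test target):

```shell
# key=value pairs are sent as a JSON body by default
http POST httpbin.org/post name=sprocket

# Header:value sets a request header
http GET httpbin.org/headers Accept:application/json
```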


I abuse curl most weeks. How many more web apps would fail at a header-only request, if not for prodding from curl users?
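That probe is a single flag:

```shell
# -I (--head) sends a HEAD request and prints only the response headers
curl -sI https://example.com/
```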


Daniel Stenberg, the maintainer, was also on the packet pusher podcast recently, talking about HTTP/2 - http://packetpushers.net/show-224-http2-its-the-biggest-netw...


I was wgetting all my http requests until about 2 years ago. Curl's undeniable coolness won me over after 15 years. Now I practically live in curl when I'm setting up webservices, and libcurl for PHP does something on nearly every page request I have.

For PHP people, curl_multi_exec is the new event loop.


> Rough estimates say we may have a billion users already.

This cannot be true by a long shot. Or am I missing something?


It all depends on how you count and what a "user" is of course, but this little list helps to give a picture: http://curl.haxx.se/docs/companies.html


The most common language used on websites is PHP which uses libcurl to handle HTTP requests.


PHP applications may use libcurl when making outbound HTTP requests, right? That is only a small subset of these web sites. Most web sites just normally process HTTP requests.

This does not seem to explain the 1 billion users figure.


WordPress uses libcurl, according to a google search there are 74,652,825 sites using WordPress. How many unique visitors do you think these sites get? I would guess over a billion.


Oh come on, you just don't get it do you?

Just because Wordpress, which is a blog platform, uses libcurl for something you didn't even state (probably some outbound http stuff), doesn't mean it uses it to process most of the incoming requests for the millions of users.

We don't say Solitaire is the most successful game ever just because it's installed with every copy of Windows.


If you count a user as a system which has curl installed, and has run it at some point, is curl on iOS and Android? If so, you will be in that ballpark, I imagine.


I've seen it in various toasters, smart TVs and other embedded devices...so this is also a large "user" base.


That wouldn't be that surprising depending on what they mean. Assuming that Google or Facebook use curl in various applications, 1 billion sounds about right.


Depends if you're defining that as someone who has manually run a cURL command or as someone who has used an app/hardware that uses cURL "behind the scenes". If you use the latter I would say 1 billion might be a significant underestimate.


Isn't it installed on OS X by default?


I don't know but just having it installed as part of OS distribution doesn't really count as "user", now does it?


curl is just awesome, thanks so much to the author and all maintainers over the years! It's still my go to application for testing and debugging HTTP requests.


Happy bday!


cUrl, I am as old as you <3


Curl is pretty great to use, but be warned it has old smelly code and is probably full of security issues. I wouldn't use it for anything too critical.


That's sufficiently vague to be useless. Would you care to elaborate?


And still better than wget, which only does http 1.0 and this has problems due to lacking a 'host' header. Curl just works.


> And still better than wget, which only does http 1.0 and this has problems due to lacking a 'host' header.

Well I had to check, but this is not true at all:

   $ nc -l -p 9999
   GET / HTTP/1.1
   User-Agent: Wget/1.15 (linux-gnu)
   Accept: */*
   Host: localhost:9999
   Connection: Keep-Alive
Also wget can recursively mirror webpages and there are nice options to carefully select contents you want to download. It's quite dated though, I wish it could use an external downloader (like aria2) and only do the walking and converting links part itself.


Do note that just a plain wget command isn't enough to properly archive/"back up" a webpage, because it doesn't save headers and other important metadata. Make sure to always use WARCs, which preserve metadata and have wonderful support, including the possibility to get absorbed into the Wayback Machine. More info:

http://www.archiveteam.org/index.php?title=Wget_with_WARC_ou...

http://www.archiveteam.org/index.php?title=The_WARC_Ecosyste...
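A minimal sketch based on those Archive Team pages (filename prefix is arbitrary):

```shell
# save the page plus a WARC record of the full HTTP request/response
# exchange (headers and all) alongside the normal downloaded files
wget --warc-file=example --page-requisites https://example.com/
```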


That mirroring option in wget has been very useful over the years, even more so if you remember to add --no-parent, else you'll end up with the ENTIRE site.
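Something along these lines (URL hypothetical):

```shell
# mirror one subtree only; --no-parent keeps wget from walking up
# the directory tree and grabbing the entire site
wget --mirror --no-parent --convert-links --page-requisites \
     https://example.com/docs/
```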


Sorry I stand corrected.

I stumbled upon this info when writing a redirector app recently. You're right, I should have verified that it was true.


wget to make a static copy of a site is the one thing I use it for. It's awesome. I tend to reach for curl first in most other cases, for one thing it's included in the "minimal" RHEL installation so I can pretty much depend on it always being there, whereas wget is extra.


I love aria2c

I use it all the time even as the down loader for ArchLinux's pacman (package manager).


It's also used internally in apt-fast[1] and speeds up the process by an order of magnitude for me.

[1] https://github.com/ilikenwf/apt-fast


multiple connection/source downloads is the most significant feature. Great program.
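e.g. (URL made up):

```shell
# split the download into 8 pieces, up to 8 connections per server
aria2c -x 8 -s 8 https://example.com/big.iso
```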



curl supports FTP, FTPS, Gopher, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, POP3, IMAP, SMB/CIFS, SMTP, RTMP and RTSP. Wget only supports HTTP, HTTPS and FTP.

Ha! Good to know :)


Being this misinformed takes effort. I take it you haven't used wget the last decade?


It was a simple mistake, but the 1.1 support was added in 2011: http://en.m.wikipedia.org/wiki/Wget



