> the Identity team started laying cookies on www.theguardian.com in advance. This was a nice touch because it meant that visitors would still be logged into the site when we eventually changed domain.
Everything else? Yeah uhm, not very interesting. As they wrote themselves, there's a thing called 301, Moved Permanently.
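The 301 mechanics they relied on can be sketched in a few lines: any request arriving on one of the retired hosts gets a permanent redirect to the same path on the new canonical host. A minimal Python sketch; the host list comes from the article's stated goal, everything else is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

# Old hosts being retired (from the article's stated goal) and the
# single new canonical host.
OLD_HOSTS = {"www.guardian.co.uk", "m.guardian.co.uk", "www.guardiannews.com"}
NEW_HOST = "www.theguardian.com"

def redirect_target(url: str):
    """Return (status, location): 301 to the new host for old-host URLs,
    200 with the URL unchanged otherwise. Path and query are preserved."""
    parts = urlsplit(url)
    if parts.hostname in OLD_HOSTS:
        return (301, urlunsplit(parts._replace(netloc=NEW_HOST)))
    return (200, url)
```

The point of using 301 rather than 302 is that clients and search engines treat the move as permanent, which is exactly what makes it hard to roll back later.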
Management wanted a "big splash" public rollout. Development teams wanted to avoid a "big bang" development effort. They solved this by going live many months ahead of time but ONLY for clients using special headers. This allowed anyone to test the system while still not making it "public" until the day of the big reveal.
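The header-gating technique can be sketched as a tiny request filter: clients that send an agreed-upon opt-in header get the new domain's behaviour, everyone else keeps getting the old one until launch day. A hedged sketch; the header name `X-New-Platform` and its value are placeholders, not the Guardian's actual header.

```python
# Sketch of a header-gated rollout. Testers install a browser plugin
# that adds a secret opt-in header; the public never sees the new
# behaviour until the gate is removed on launch day.
# "X-New-Platform" is a made-up placeholder header name.

def choose_host(headers: dict) -> str:
    """Pick which site to serve based on an opt-in preview header."""
    if headers.get("X-New-Platform", "").lower() == "opt-in":
        return "www.theguardian.com"   # pre-launch preview behaviour
    return "www.guardian.co.uk"        # the public default
```

On launch day, flipping the default return value makes the new domain public without any further deployment.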
I had not previously heard of that particular technique (using special HTTP headers) and it's a useful one.
I doubt it. I think they've actually just made up an HTTP header.
(Cookies wouldn't work anyway... how would you place them? What about expiration? How would they interact with existing cookies? What if you have to clear your cookies while debugging? Whereas a custom header requires a browser plugin, but otherwise is innocuous.)
I read that and understood how it would be beneficial, but not how it would be possible. Say I own x.com, and y.com and even have them both being served from the same box. If a user requests a page for x.com, how can I get them to accept a cookie for y.com?
On the old domain, make a request to the new domain with query parameters that have the information necessary to login as that user. You can do this using e.g. a hidden image, an iframe, or using javascript. The request on the new domain saves that login information in a cookie.
You could do it through redirects from x.com to y.com/login, which sets cookies on y.com's domain, and then redirects back to x.com. Either only do it on login, or set another cookie on x.com once you've done it once so you don't repeat the round-trip. (I actually work on a web property that does something similar, although for different reasons.)
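The redirect dance described above can be sketched as two tiny handlers: x.com bounces the logged-in user through y.com's login endpoint, which stores the session in a y.com cookie and bounces straight back. Everything here (the domains, the token format, the handler shapes) is illustrative, not any site's actual implementation.

```python
# Sketch of cross-domain login propagation via redirects.
# Handlers return (status, headers) tuples; x.com and y.com are
# placeholder domains.

def x_com_handler(user_token: str, already_synced: bool):
    """On x.com: if the session hasn't been synced yet, bounce via y.com."""
    if not already_synced:
        location = ("https://y.com/login?token=" + user_token
                    + "&return=https://x.com/")
        return (302, {"Location": location})
    return (200, {})

def y_com_login_handler(token: str, return_url: str):
    """On y.com: store the session in a y.com-scoped cookie, bounce back."""
    return (302, {
        "Set-Cookie": "session=" + token + "; Domain=y.com; Secure; HttpOnly",
        "Location": return_url,
    })
```

Because the Set-Cookie header is emitted by y.com itself, it can legitimately scope the cookie to y.com, which a response from x.com never could.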
I was thinking the same thing. As I understand it, that's only possible for subdomains of the same domain.
I suppose one way of doing it is with some kind of script loaded from a pre-hosted theguardian.com, so the cookie is set by the domain the JS is served from.
The other answers to your question are technically all feasible, but are all extremely unlikely to be how this was actually done.
This is just what third party cookies are. Cookies set from the server-side (ie, the Set-Cookie header) can set whatever domain they want -- and if that cookie happens to not match the domain of the page you're on, that's what's called a third party cookie.
Some browsers (primarily Safari IIRC), however, will automatically reject those cookies, either in all instances or depending on if you've interacted with that domain before.
Thank you, and I appreciate learning. But I think I'm reading something else on Wikipedia. The article there says you can only set cookies on "the top domain and its subdomains" [1], and that third-party cookies are those set by page assets (like images within a page) that are served from a different domain. [2]
Ah, yes. I should have been clearer in my comment. Obviously, the request needs to be served from the domain or subdomain in question, but it doesn't need to match the URL of the page you're on. It can be a simple image request; it doesn't need to be an iframe, and it doesn't need to involve any JavaScript at all.
My point was that it doesn't require much complexity at all; just an HTTP request served by the new domain, whose response sets the cookies that need to live on that domain.
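Python's standard library can illustrate the mechanics: a response served from the new domain can emit a Set-Cookie header scoped to that domain (or a parent of it), regardless of which page embedded the request. A sketch using `http.cookies`; the cookie name and value are arbitrary placeholders.

```python
from http.cookies import SimpleCookie

# A response served from www.theguardian.com can set a cookie scoped to
# .theguardian.com, even if the request was a 1px image embedded in a
# page on www.guardian.co.uk. From the embedding page's perspective,
# that is a third-party cookie.
cookie = SimpleCookie()
cookie["GU_ID"] = "abc123"                    # placeholder session id
cookie["GU_ID"]["domain"] = ".theguardian.com"
cookie["GU_ID"]["path"] = "/"

header_line = cookie.output(header="Set-Cookie:")
print(header_line)
```

The browser stores the cookie against .theguardian.com, so the user is already logged in when the domain switch happens.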
I'm surprised they went live with this without an expiration date on their permanent redirects. Now there's no way back, even if anything breaks. Looks like an unintentional Big Bang launch to me.
I had always heard an interesting story about that: since the newspaper was printed in Manchester, the first press runs were the ones that had to be sent to far-off London, while the later runs (with misspellings often fixed) went to the closer cities.
On the other hand, newspapers based in London would send their typo-ridden newspapers to far-off locales first, and the corrected editions would stay in London.
Since the tastemakers were in London, this resulted in a situation where the newspaper became notorious for being riddled with errors.
No idea if there's any truth to it; the Wikipedia page presents a different story, one that sounds like problems with the collaboration tools (e.g. TTY) used between the two cities.
Not a very helpful article. I was hoping they'd share how they managed the SEO portion in a way that would prevent a drop in rankings. They glossed over almost every point.
In previous years, say 5+ years ago, this was a scary concept to anyone working on sites who still believed in nonexistent SEO voodoo. But it has become commonplace, and it's more than simple to 301 a site from one domain to another, updating the usual suspects like Google to make sure it all goes smoothly. So nothing really that super here. Just nice to hear about the process behind the scenes, and that everything was taken into account, as it should be.
I am not sure how many domain migrations you have done in the past but based on this comment my guess would be few if any. There is a bit more to it than just slapping on a couple of 301's and hoping for the best.
Have done many. Obviously there's a lot of legwork with links and content, but it's not some technically groundbreaking thing or grand mystery like it used to be. Maybe I've just gotten used to it.
> Our goal was simple: “to serve all desktop and mobile traffic on www.theguardian.com and no longer serve any content on www.guardian.co.uk, m.guardian.co.uk or www.guardiannews.com”
Great!
So is the consensus that .mobi was one of the worst ideas in existence?
Once upon a time it was thought that device TLDs would be a useful thing, that's all. It just so happened that the smartphone was invented in the interim, and media queries, responsive design and, heck, plain HTML became the standard way of delivering mobile content.
I've been doing large-site SEO for almost a decade, working for brands like eBay, Disney and others; it's not that weird. It's just not the thing people know me for ;)
> If the Host was www.theguardian.com, we would rewrite all the URLs on the site to be www.theguardian.com. If the Host was www.guardian.co.uk we would rewrite all the URLs on the site to be www.guardian.co.uk.
They couldn't change all URLs to be relative, so instead they wrote a filter which would rewrite absolute URLs to match the selected hostname. A simple fix for a relatively complex problem.
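That filter can be sketched as a response post-processor: take the rendered HTML and rewrite every absolute link to whichever canonical host the request's Host header matched. A simplified sketch; the Guardian's real implementation would live in their serving stack, and only the two host names are taken from the article.

```python
import re

# The hosts the site answered on during the transition (per the article).
SERVED_HOSTS = ("www.theguardian.com", "www.guardian.co.uk")

def rewrite_urls(html: str, request_host: str) -> str:
    """Rewrite absolute URLs in the page so every link stays on the host
    the user arrived on, instead of bouncing them between domains."""
    if request_host not in SERVED_HOSTS:
        return html  # unknown Host header: leave the page untouched
    pattern = "|".join(re.escape(h) for h in SERVED_HOSTS)
    return re.sub(pattern, request_host, html)
```

So a page templated with hard-coded theguardian.com links, served on the old host, comes out with guardian.co.uk links, and vice versa.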
Or a hack which will never be removed from the code-base, depending on your point of view.
I'm intrigued as to why changing to relative URLs wasn't possible. If nothing else, pushing 'http://www.theguardian.com' out for every link adds up to a lot of bytes on a busy site.