Southwest Airlines grounds flights due to computer outage (ktvu.com)
29 points by gridscomputing on June 22, 2013 | 16 comments


I would really hate to be the developer/sysadmin who potentially struck the fatal blow to their system(s). It's one thing in a business if people can't get email or work with some CRM-type tool. It's another thing entirely when thousands of people are stranded at airports.

Good luck to those poor souls working triage right now. Even if it wasn't your fault, they are gonna have someone's head.


If it's any consolation, in this sort of large-scale outage at least part of the cause is often management pushing back on recommendations by the IT staff.


Man I always underestimate the reach and extent of software, especially in enterprise. It really is everywhere.

As an aside, does anyone know if things like ticketing systems are developed in house or outsourced to places like, say, IBM?


I worked for Southwest as recently as last year.

In the industry, ALL reservation and ticketing systems are written by specialists. The major players are Sabre (think IBM) / Navitaire / ITA (which was bought by Google not too long ago). Southwest is a great company, and they pay well, but their engineering staff is made up of the neckbeard brigade . . . lots of guys in their 50s that have either retired before retiring, or finally decided to learn this newfangled "Java" language and move on from C and COBOL. There is simply not the engineering skill, nor the organizational will, to develop anything that complex in house. They tried once about 5 years ago and after 4 months it exploded in a ball of flames because they couldn't solve the throughput issue.

This outage was almost certainly a Sabre issue. The reservation system itself runs on old school IBM big iron and is written in C. The system is so damn old that when you go to southwest.com, all the information is retrieved via screen scraping. Not XML, not JSON, not even CORBA. The key point is when the spokesman describes the "weight" of the airplane, something that is calculated by the system according to how many people checked in, how much luggage, etc. Without the system up and running, the flight cannot be "closed" and therefore tracked properly.

Southwest hosts all of their own in-house applications (except for a few things on GCE), but the Sabre system is run out of some nuclear-proof bomb shelter in Tulsa. It goes down once or twice a year, but typically it's only out for 10-15 minutes. For planes to be called back to the gate, the outage would have had to have been over 60 minutes, as that's how long they buffer in a separate system in case it does go down.

Finally, Southwest pays Sabre by . . . wait for it, requests executed per second.


> The reservation system itself runs on old school IBM big iron written in C. The system is so damn old that when you go [to] southwest.com, all the information retrieved is done via screen scraping.

Man, say what you will about old mainframes and critical systems and C and COBOL versus racks of commodity x86 boxes and Java and Ruby and "scaling out" and load balancers and reverse proxies and ..., but those old business apps running on the AS/400s and mainframes?

They just work(TM).

I agree that these old systems need to come into the 21st century but some of the most stable, reliable systems I've seen in my career are 20- or 30-year-old applications running on that old iron.

(mmm, 5150, 3270, CICS, JCL, RJE, IBM printers as big as refrigerators... sigh... sometimes I miss those days.)


Can we all agree that your point (while well taken)... is really really funny in light of the subject of this article? I'd say this was a pretty spectacular instance of just not working.


Well - a lot depends on what part failed - was it the backend big iron, or the front end scraping system?


I believe Southwest uses a system called SAAS, which they call "a Sabre Product" though it is actually based on the old Braniff Cowboy system and modified in-house. I remember talk of them switching to Sabre proper, but it's been years since I've been in that industry. Sabre I believe was made by some IBM engineers originally.

Some relevant links if you want to dig in. Working with these systems was a lot of fun (I worked at several airlines):

Sabre: https://en.wikipedia.org/wiki/Sabre_(computer_system)

Global Distribution System: https://en.wikipedia.org/wiki/Global_distribution_system

Sabre Company site: http://www.sabreairlinesolutions.com/home/


This is 100% correct. Southwest uses SAAS, which is so old that last year they begrudgingly updated it because the big iron Sabre was using to host these things was no longer supported by IBM.

SAAS is mostly a Sabre product, though Southwest has requested so many modifications that at this point it is basically custom.


Usually they are a group of specialized programs, systems and services, like Amadeus (reservations) and Carmen (crew scheduling), that have to be customized for each company.


More than likely outsourced, but probably not from places like IBM. I recall there was a rental car company that used Unisys systems and I believe all custom software.


The original software was from IBM, if memory serves, although I believe Southwest modified it extensively.


"Some flights were on the taxiway and diverted back to the terminal after the problem was detected"

Why? You've checked these guys in, the baggage is loaded, the pilots (presumably) know where they are flying to... why would you pull them back in?


The business is probably so computer-dependent that the business is essentially the software systems executing. Planes flying, at this point, are mere physical side effects of software systems executing. If your systems aren't executing, your business isn't running.

Add in probable regulatory requirements that are also implemented by software, and that probably makes it illegal to fly.


It may have been related to the Pilot Operating Handbook procedures that the airline's Air Operator's Certificate requires.

They still have the means to manually calculate the weight/balance etc. Some (most?) airlines require the pilots to do a manual cross-check worksheet of the fuel, weight/balance and a few other safety-of-flight items before they take off. This helps avoid mistakes like over-rotation on takeoff.
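
For anyone curious what that manual cross-check boils down to arithmetically, here is a minimal sketch in Python. It is only the textbook moment calculation (weight times arm, summed, divided by total weight), not any airline's actual worksheet or Sabre's logic. The names `LoadItem`, `weight_and_balance` and `within_limits`, and every number in the example, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadItem:
    """One line of a hypothetical load sheet."""
    name: str
    weight_lb: float
    arm_in: float  # distance from the reference datum, in inches


def weight_and_balance(items):
    """Return (total weight, centre-of-gravity arm) for a list of LoadItems."""
    total_weight = sum(item.weight_lb for item in items)
    total_moment = sum(item.weight_lb * item.arm_in for item in items)
    return total_weight, total_moment / total_weight


def within_limits(total_weight, cg_arm, max_weight_lb, fwd_limit_in, aft_limit_in):
    """True if the load is under max weight and the CG sits inside the envelope."""
    return total_weight <= max_weight_lb and fwd_limit_in <= cg_arm <= aft_limit_in


# Illustrative numbers only; real aircraft figures come from the manufacturer.
load_sheet = [
    LoadItem("empty aircraft", 1000.0, 100.0),
    LoadItem("passengers", 200.0, 120.0),
    LoadItem("fuel", 300.0, 90.0),
]
total, cg = weight_and_balance(load_sheet)
ok = within_limits(total, cg, max_weight_lb=1600.0, fwd_limit_in=95.0, aft_limit_in=105.0)
```

The point of doing it by hand as well as by computer is that the two results can be compared, which is exactly how the kind of discrepancy in the next story gets caught.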

One particularly amusing case was in a 767, where the Captain handed a copy of the worksheet to a young girl who loved math problems, and was travelling in first class as a VIP. The young girl found a mathematical error in the worksheet. It turned out the flight computer had the fuel CG incorrect by 2-3% in certain cases, and both were incorrect.

A quick call to Boeing, and 20 minutes later, fuel transferred and the flight proceeded.

I suspect the Southwest aircraft returning to the gate was a procedural abort rather than a safety-of-flight issue.


Unable to calculate the weight of the plane, perhaps?



