The IT failure was widespread, affecting multiple operational elements of BA’s service, including flight planning, logistics, bookings, check-in and customer service. This left the company unable to process the huge numbers of travellers arriving at airports over the bank holiday weekend, and left those travellers with little or no information on the situation beyond the company’s Twitter account.
While BA has stated that the IT failure was caused by a ‘power supply issue’, no specific details have been released. The fact that the failure was not immediately dealt with by backup systems, but instead brought the company’s entire operational capability to a grinding halt, has many airline and IT experts puzzled.
“No organisation should have a single point of failure,” said Colin Cochrane, director of Scottish infrastructure specialists Cosnadh. “Even the most basic device has two power supplies.”
Multiple reports in the media ask whether BA’s decision to outsource much of its IT support to India was a significant factor in the failure. There is also speculation that a lack of training in switching to backup systems and a shortage of experienced staff may have compounded the problem.
The fact that a power failure – one of the most easily anticipated problems for an IT centre – caused such chaos has left many asking how a company so dependent upon access to accurate real-time data allowed its digital infrastructure to collapse so completely.
Business continuity processes are practised by millions of companies around the world, especially those that handle sensitive data or rely entirely on secure and robust access to online services.
“The use of multiple data centres, multi-vendor power suppliers, geographically diverse sites, UPS batteries, generators, these are standard practices for any business, let alone one which handles such critical data,” says Cochrane.
“Many organisations utilise high availability services, or ‘automatic failover’ to meet and minimise the recovery point and return to operations objectives. Key to this is application dependency, you need to know what parts of the system depend upon other services, how they connect and what order to bring them online. This takes time to test, time-stamp and certify the results.”
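Cochrane’s point about application dependency can be illustrated with a short sketch. The example below is an assumption-laden illustration, not a description of BA’s actual systems: the service names and the dependency map are hypothetical, chosen only to echo the functions mentioned above. It uses Python’s standard `graphlib` module to work out an order in which services could be brought back online so that nothing starts before the things it relies on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map (illustrative only, not BA's real architecture).
# Each service maps to the set of services that must be running before it starts.
DEPENDENCIES = {
    "power": set(),
    "network": {"power"},
    "database": {"power", "network"},
    "bookings": {"database"},
    "flight_planning": {"database"},
    "check_in": {"bookings"},
    "customer_service": {"bookings", "flight_planning"},
}


def startup_order(dependencies):
    """Return the services in an order where each follows its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means no valid recovery order exists until the design is fixed.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    for step, service in enumerate(startup_order(DEPENDENCIES), start=1):
        print(f"{step}. start {service}")
```

In a real recovery plan, a map like this would have to be documented and rehearsed in advance, which is the testing and certification effort Cochrane describes.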
The root of the problem may well lie in the company’s procedures for dealing with a critical digital services failure, says Cochrane. “Any business, regardless of whether they’re in-sourcing or out-sourcing should have processes in place for business continuity, which should be documented and tested as part of the audit and insurance governance.”
Not The First Instance
There have been similar failures at other airlines around the world in the recent past. In 2016, US airline Delta lost an estimated $100 million after a fire in a data centre caused similar levels of delays and cancellations. The fact that BA has encountered an almost identical situation may well point to a fundamental failure in the company’s business continuity plans.
BA now faces compensation costs estimated at £100 million as passengers begin the process of claiming for cancelled and delayed flights. In addition, despite recent cost-cutting measures in IT, including the controversial outsourcing decision, the company may have to set aside further funds to address the failure of its IT infrastructure and business resilience strategy.
“It would be money well spent when brand reputation and business revenue has been put at such risk,” says Cochrane.
At the time of writing, the British Airways website is back online and says the company is ‘closer to full operational capacity’. Passengers are advised to remain at home unless they have a confirmed booking for today and know their flight is operating.