On Thursday 29th of March, Iomart, the Scottish hosting company, suffered a major network outage – only the company’s second in 20 years according to a spokesman – which took a number of services in the North of the country offline.
For a company which prides itself on 100% uptime, the loss of service was a significant disruption. The company’s normal resilience and fallbacks didn’t stop customers losing service, with companies including Virgin East Coast and ParentPay among those affected.
A spokesman for Iomart told DIGIT that the initial problem was caused by a hardware fault on the west coast of the UK, between the company’s data centres in Manchester and Slough, at around 09:45. The company’s monitoring systems raised an alert and engineers began investigating. At this point no customers lost service, as Iomart’s network fallbacks rerouted traffic over the undamaged part of the network.
Unfortunately, at 12:24, on the other side of the country, an extensive fibre break occurred between Northallerton and Bradford in Yorkshire. According to Iomart, a farmer putting in a new drainage system cut through all of the fibre cables in a bundle on the Zayo network, including those used by Iomart. With the network cut on one side of the country and damaged on the other, customers north of Manchester and Glasgow were cut off.
It was at this point that customers and users took to Twitter to complain that services were down and access lost.
Looks like we’ve ran into some IT issues this afternoon! If you’re having trouble booking or collecting tickets, or accessing tickets through the app, have no fear – the Web Wizards are on the case!
Apologies for any inconvenience caused. pic.twitter.com/SJMNBuROSF
— Virgin Trains EC (@Virgin_TrainsEC) March 29, 2018
With service lost, Iomart and its network partner Zayo worked to address both issues. The company applied a patch, creating a tunnel from its Manchester facility to the one in Slough by 15:35, which returned most services to normal.
The hardware issue on the west coast was fixed by 16:30, allowing the company to switch back to the main network, but the company left the tunnel between Manchester and Slough in place in case of further problems.
The fibre break was not fixed until 03:50 on Saturday morning, with the repair delayed by flooding on site. The fix gave the company full operational capability back.
The Iomart spokesman told DIGIT this was only the second time in 20 years of operation that the company had been hit by such an outage. While the company is aware that this does not help customers affected by the downtime, Angus McSween, the company CEO, has been vocal about ensuring the situation cannot occur again.
According to Iomart, the company is now exploring a number of options to increase the network’s resilience and add further redundancy. In the meantime, it is keeping the tunnel from Manchester to Slough in place, giving the data centre a third line. This may be replicated across the other data centres, or replaced with an alternative solution currently under discussion.
Mr McSween told DIGIT: “We saw a ‘perfect storm’ of incidents on Thursday. While most of our customers weren’t impacted, those with services north of Manchester did experience a service interruption and for that I am truly sorry.
“Our uptime record, I would argue, is as good, if not better, than most of the competition, but that does not mean we ever become complacent. We will learn from this event and are already taking steps to make our network even more resilient.”