Or The day things went wrong!
As senior developer here at Virgin Wines, my life is usually relatively well planned out: I work on the enhancements and improvements we have in the pipeline for the site.
All that went out of the window on March 25th when suddenly I was told that something wasn’t right with the website and customers were having trouble placing orders.
Sound the alarm!
That sort of news is a major alert. After all, if you can’t place an order, we don’t have a business. Everything else gets abandoned and the whole team concentrates on just one thing – GET IT FIXED!
After checking all our internal systems and finding nothing wrong, we went through the transaction logs, which showed that the problem wasn’t with our system but with our payment provider, Worldpay.
A bit of background here.
When an order is placed we don’t take payment straight away – we wait until the order is despatched. But we do need to check that the money is available. So we send an authorisation request to Worldpay. They check with the cardholder’s bank and come back to us with a “Yes” or “No”. If the answer is “Yes”, the order goes through and the money is reserved for future payment.
In this case Worldpay either wasn’t responding or was taking so long that our system gave up waiting and assumed a “No”.
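As a rough illustration of that behaviour (not the actual site code), here is a minimal Python sketch. The function name `authorise_or_assume_no` and the `send_request` callable are hypothetical; `send_request` stands in for whatever sends the authorisation request, and it is assumed to return "Yes" or "No" and to raise `TimeoutError` when no answer arrives in time.

```python
def authorise_or_assume_no(send_request):
    """Return True only when the provider gives an explicit "Yes".

    `send_request` is a hypothetical callable standing in for the real
    authorisation call. It is assumed to return the provider's answer as
    a string and to raise TimeoutError if we give up waiting.
    """
    try:
        answer = send_request()
    except TimeoutError:
        # No reply in time: the order is treated exactly like a decline.
        return False
    # Anything other than a straight "Yes" also counts as a failure.
    return answer == "Yes"
```

The weakness is visible in the `except` branch: a provider outage and a genuinely declined card produce the same result, so every order fails while the provider is down.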
The backup plan.
By now something like 10 minutes had gone by. We knew it wasn’t something we could fix ourselves; we had to wait for Worldpay to sort it out. So we put our backup plan into action.
Customer Services put a manual system into place so customers could phone or email their orders in. We would record these and place them manually as soon as the problem was fixed.
We put messages at strategic places across the website to inform customers of the problem and provide the contact details so they could place their orders by phone or email.
And we kept talking to Worldpay to find out when they would fix the problem – it took around 11 hours!
The Post Mortem.
When something like this happens the priority is to fix it. But once the crisis has passed, the post mortem begins.
What could we have done better?
Can we change anything to stop this happening again?
As it happens the answer to both these questions was “Yes”. So we’re going to make some changes.
- We are changing the site so that we can get messages to customers more quickly and in a more meaningful manner than the rather “ad hoc” method we used.
- We will rewrite the way in which we request authorisations from Worldpay and which responses we accept. There was more information in the response, but if it wasn’t a straight “Yes” we treated it as a failure. In future we will have a much more comprehensive “conversation” with Worldpay and will be able to react more intelligently to this sort of situation.
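To give a flavour of the second change, here is a minimal Python sketch of one way to keep "declined" separate from "provider unavailable". It is an illustration under assumptions, not the planned implementation: the names `Outcome`, `classify_authorisation`, `next_step` and `send_request` are all hypothetical, and the real response from the provider carries more detail than the plain "Yes"/"No" strings used here.

```python
from enum import Enum


class Outcome(Enum):
    APPROVED = "approved"        # explicit "Yes" from the provider
    DECLINED = "declined"        # explicit "No" from the provider
    UNAVAILABLE = "unavailable"  # no usable answer at all


def classify_authorisation(send_request):
    """Map the result of a hypothetical `send_request` callable to an Outcome.

    `send_request` is assumed to return the provider's answer as a string
    and to raise TimeoutError or ConnectionError when it cannot be reached.
    """
    try:
        answer = send_request()
    except (TimeoutError, ConnectionError):
        return Outcome.UNAVAILABLE
    if answer == "Yes":
        return Outcome.APPROVED
    if answer == "No":
        return Outcome.DECLINED
    # An answer we do not recognise is not proof of a decline.
    return Outcome.UNAVAILABLE


def next_step(outcome):
    """Choose what to do with the order (the actions are illustrative only)."""
    if outcome is Outcome.APPROVED:
        return "accept order"
    if outcome is Outcome.DECLINED:
        return "ask customer for another card"
    return "hold order and retry authorisation later"
```

With three outcomes instead of two, an outage no longer has to look like a wall of declined cards: orders could be held and retried, and the customer could be shown a message that matches what actually happened.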
To be fair, we weren’t alone in having these problems. There were lots of businesses in exactly the same situation. Hopefully they have also held their internal investigations to see what can be done if something like this were to happen again.
The important thing is that we learn from situations like this and improve our systems to cope. This was a unique failure, but now that it has happened once I want to be sure we’re in the best possible position to cope should it happen again.
So now I have more, unexpected development to schedule in. But that’s all part of the job – and I love it!