Publish your capacity issues
We were unpleasantly surprised by a capacity issue in Amsterdam yesterday, which is still ongoing. You can't create any new instances in this zone.
In our business model, it is essential that we can continue to scale upwards on demand. I believe that's one of the main reasons for anybody to choose a cloud provider over running their own servers.
I contacted support and got the following answer: "Unfortunately we do not currently have an ETA on the additional capacity in Amsterdam. Please check back in a few days."
I can live with this right now, because we're still in beta and our users might tolerate an answer like "We do not currently have an ETA on getting our site back online. Please check back in a few days." But if this happens after launch, it will severely hurt our business.
To my surprise, this issue is not published anywhere, not on blog.digitalocean.com, not on digitaloceanstatus.com. I asked why, and got the answer "Because it's not a network impacting event. It's a capacity issue and we don't put capacity issues online".
I feel that this is a very serious issue, but the answers I've received give me the impression that DigitalOcean does not agree. In that case, we'll be forced to find another provider in order not to jeopardize our business.
And that would be sad, because we would really prefer to stay with DO, which we've been very satisfied with so far.
Would you please publish such issues, let us know what you're doing to fix them, and give an ETA?
Thanks for bringing up this issue, and apologies for the delayed response. We have implemented a large number of changes, and are continuing to implement more, to resolve this matter permanently.
First and foremost, since this was first brought up at the close of 2013, we have raised a large Series A round led by Andreessen Horowitz. The primary purpose of this fundraise was to better finance the company so that we could purchase equipment as necessary and continue to scale.
Since that round closed, we have repeatedly improved our procurement and provisioning process to better meet the demands of our customers.
There are many incremental steps that we have taken to ensure capacity in all regions including those that are outside of the US.
The next step we have taken is to provide better international coverage so that customers can select the facilities that make the most sense for their needs. To that end, we have launched our first Asia-Pacific region in Singapore, and we will be launching the UK in July to provide better coverage for Europe.
Lastly, as we have continued to expand, we see that providing several datacenters in a single region is essential for high availability. At the same time, we do not want to make datacenter selection more complex for customers when a region supports multiple datacenters.
So we are rolling out a new initiative in the next 3 months to convert our existing named 1, 2, 3, etc. datacenters into a selection of only two. This way we will handle the interlink between different datacenters in the background, provide more flexibility for growth, and provide high availability as well.
When we finish rolling out this last initiative, it should make capacity issues a thing of the past.
Zachary DuBois commented
Agreed, NYC2 and SGP1 are the only ones available right now.
I want to agree on this one. The only way to work around this currently is to keep the maximum number of machines you might need running all the time. That means that a cloud provider devolves into a VPS provider.
The way that Amazon solves this is by providing multiple levels of service:
* reservation - you will get this many machines
* normal instance - this machine will keep running (unless something horrible happens)
* spot - this instance may disappear at any time, but you won't pay nearly as much
Basically, Amazon can keep its reserved capacity above normal usage and sell the difference as spot instances. If all the reservation customers suddenly need their reservations at the same time, then the spot instances have to wait a while until the surge (e.g. a major shopping day) dies down.
Spot instances become a cheap way to do large-scale data processing and similar work.
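To make the three tiers above concrete, here is a toy sketch of how a single zone could hand out machines under that scheme. It is only an illustration of the idea as described in this comment, not how Amazon or DigitalOcean actually schedule capacity; the `Pool` class, its method names, and the numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy model of one zone's capacity (all names here are hypothetical)."""
    capacity: int            # total machines in the zone
    reserved_total: int      # machines promised to reservation holders
    reserved_in_use: int = 0
    on_demand: int = 0
    spot: int = 0

    def free(self) -> int:
        return self.capacity - self.reserved_in_use - self.on_demand - self.spot

    def _make_room(self, needed: int) -> None:
        # Evict spot instances until `needed` machines are free.
        shortfall = max(0, needed - self.free())
        self.spot -= min(self.spot, shortfall)

    def claim_reserved(self, n: int) -> int:
        # Reservation holders always get what they were promised.
        granted = min(n, self.reserved_total - self.reserved_in_use)
        self._make_room(granted)
        self.reserved_in_use += granted
        return granted

    def launch_on_demand(self, n: int) -> int:
        # Normal instances may never eat into the reserved headroom,
        # but they do take priority over spot instances.
        allowed = self.capacity - self.reserved_total - self.on_demand
        granted = max(0, min(n, allowed))
        self._make_room(granted)
        self.on_demand += granted
        return granted

    def launch_spot(self, n: int) -> int:
        # Spot instances only get whatever happens to be idle right now.
        granted = max(0, min(n, self.free()))
        self.spot += granted
        return granted


pool = Pool(capacity=10, reserved_total=4)
pool.launch_spot(10)       # quiet day: spot fills the whole zone
pool.launch_on_demand(8)   # only 6 fit beside the reserved headroom
pool.claim_reserved(4)     # surge: reservations push out remaining spot
print(pool)
```

The point of the sketch is that a reservation never fails and a normal instance is never evicted, while spot instances soak up whatever is idle and are the first to go during a surge.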