Es scheint sich langsam zu einer never ending story zu entwicklen. Die Amazon Web Services (AWS) haben erneut mit einem Ausfall in der Region US-EAST-1 in North Virginia zu kämpfen. Dabei handelt es sich, wie erst kürzlich, um einen Stromausfall. Dieses Mal auf Grund schwerer Stürme.

Schwere Stürme sind für den Ausfall verantwortlich

Grund für den Stromausfall sind laut dem Stromversorger Dominion Virginia Power schwere Stürme mit 80 Meilen pro Stunde, ~~welche die Netzteile zerstört haben~~ die zu massiven Schäden geführt haben. Dominion Virginia Power versorgt mehrere Rechenzentren in der Region Virginia.

A line of severe storms packing winds of up to 80 mph has caused extensive damage and power outages in Virginia. Dominion Virginia Power crews are assessing damages and will be restoring power where safe to do so. We appreciate your patience during this restoration process. Additional details will be provided as they become available.

Viele Amazon Services betroffen

Von dem Ausfall sind dieses Mal deutlich mehr Services betroffen, als noch bei dem Ausfall vor zwei Wochen. Darunter Amazon CloudSearch, Amazon CloudWatch, Amazon Elastic Compute Cloud, Amazon Elastic MapReduce, Amazon ElastiCache, Amazon Relational Database Service und AWS Elastic Beanstalk.

Hier das Protokoll des Ausfalls.

Amazon CloudSearch (N. Virginia)

10:16 PM PDT We are investigating elevated error rates impacting a limited number customers. The high error rates appear related to a recent loss of power in a single US-EAST-1 Availability Zone. We are working to recover the impacted search domains and reduce the error rates which they are experiencing.

Amazon CloudWatch (N. Virginia)

8:48 PM PDT CloudWatch metrics for EC2, ELB, RDS, and EBS are delayed due to lost power due to electrical storms in the area. CloudWatch alarms set on delayed metrics may transition into INSUFFICIENT DATA state. Please see EC2 status for the latest information.
10:19 PM PDT CloudWatch metrics and alarms are now operating normally.

Amazon Elastic Compute Cloud (N. Virginia)

8:21 PM PDT We are investigating connectivity issues for a number of instances in the US-EAST-1 Region.
8:31 PM PDT We are investigating elevated errors rates for APIs in the US-EAST-1 (Northern Virginia) region, as well as connectivity issues to instances in a single availability zone.
8:40 PM PDT We can confirm that a large number of instances in a single Availability Zone have lost power due to electrical storms in the area. We are actively working to restore power.
8:49 PM PDT Power has been restored to the impacted Availability Zone and we are working to bring impacted instances and volumes back online.
9:20 PM PDT We are continuing to work to bring the instances and volumes back online. In addition, EC2 and EBS APIs are currently experiencing elevated error rates.
9:54 PM PDT EC2 and EBS APIs are once again operating normally. We are continuing to recover impacted instances and volumes.
10:36 PM PDT We continue to bring impacted instances and volumes back online. As a result of the power outage, some EBS volumes may have inconsistent data. As we bring volumes back online, any affected volumes will have their status in the “Status Checks” column in the Volume list in the AWS console listed as “Impaired.” If your instances or volumes are not available, please login to the AWS Management Console and perform the following steps: 1) Navigate to your EBS volumes. If your volume was affected and has been brought back online, the “Status Checks” column in the Volume list in the console will be listed as “Impaired.” 2) You can use the console to re-enable IO by clicking on “Enable Volume IO” in the volume detail section. 3) We recommend you verify the consistency of your data by using a tool such as fsck or chkdsk. 4) If your instance is unresponsive, depending on your operating system, resuming IO may return the instance to service. 5) If your instance still remains unresponsive after resuming IO, we recommend you reboot the instance from within the Management Console. More information is available at: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/monitoring-volume-status.html

Amazon ElastiCache (N. Virginia)

8:43 PM PDT This service is currently affected by a power event. Please see the EC2 status for further information.
9:25 PM PDT We can confirm that a large number of cache clusters are impaired. We are actively working on recovering them.
10:21 PM PDT We are continuing to recover impacted Cache Nodes. Our APIs are operating normally.

Amazon Relational Database Service (N. Virginia)

8:33 PM PDT We are investigating connectivity issues for a number of RDS Database Instances in the US-EAST-1 region.
9:24 PM PDT We can confirm that a large number of RDS instances are impaired. We are actively working on recovering them.
10:43 PM PDT RDS APIs are operating normally. We are continuing to recover impacted RDS instances and volumes.

AWS Elastic Beanstalk (N. Virginia)

9:00 PM PDT This service is currently affected by a power event. Please see the EC2 status for further information.

Erneut bekannte Webseiten und Services betroffen

Der Ausfall hat mit Instagram, Pinterest und Netflix wieder viele bekannte Webseiten und Services betroffen. Ebenfalls der PaaS Anbieter Heroku, der viele Startups und mobile Anwendungen zu seinen Kunden zählt ist betroffen. Wohingegen Pinterest und Netflix erreichbar sind, ist Instagram vollständig down. Erst am 14. Juni 2012 gab es in North Virginia einen Stromausfall im Amazon Rechenzentrum.