Topic Last Modified: 2013-02-26
Microsoft Office 365 offerings are delivered by highly resilient systems that help to ensure high levels of service. Service continuity provisions are part of the Office 365 system design. These provisions enable Office 365 to recover quickly from unexpected events such as hardware or application failure, data corruption, or other incidents that affect users. These service continuity solutions also apply during catastrophic outages (for example, natural disasters or an incident within a Microsoft data center that renders the entire data center inoperable).
Note that after recovery from catastrophic outages there is a period of time before full data center redundancy is restored for the service. For example, if Data Center 1 fails, services are restored by resources in Data Center 2. However, there may be a period of time until services in Data Center 2 have service continuity support either by restored resources in Data Center 1 or new resources in Data Center 3. The Office 365 Service Level Agreement (SLA) applies during this time.
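The Data Center 1, 2, and 3 example above can be sketched as a small state model. This is only an illustration of the reasoning, not a description of how Office 365 is implemented: the `Service` class and its `fail_active` and `add_standby` methods are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """Hypothetical model: one active data center plus zero or more standbys."""
    active: str
    standbys: List[str] = field(default_factory=list)

    @property
    def fully_redundant(self) -> bool:
        # Redundancy exists only while at least one standby is available.
        return len(self.standbys) > 0

    def fail_active(self) -> None:
        # The active data center is lost; promote the first standby.
        if not self.standbys:
            raise RuntimeError("no standby data center available")
        self.active = self.standbys.pop(0)

    def add_standby(self, name: str) -> None:
        # A restored or newly provisioned data center rejoins as a standby.
        self.standbys.append(name)


svc = Service(active="Data Center 1", standbys=["Data Center 2"])
svc.fail_active()                  # Data Center 2 now serves customers
gap = not svc.fully_redundant      # True: the period described above
svc.add_standby("Data Center 3")   # redundancy is restored
```

In this sketch the service stays available throughout; only the `fully_redundant` flag changes, which mirrors the period during which the SLA continues to apply.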
Microsoft ensures that customer data is available whenever it is needed through the following features:
- Data storage and redundancy: Customer data is stored in a redundant environment with robust data protection capabilities to enable availability, business continuity, and rapid recovery. Multiple levels of data redundancy are implemented, ranging from redundant disks to guard against local disk failure to continuous, full data replication to a geographically diverse data center.
- Data monitoring and maintenance: Along with avoiding data loss, Office 365 helps maintain data performance by:
  - Monitoring databases: Databases are regularly checked for:
    - Blocked processes
  - Completing preventative maintenance: Preventative maintenance includes database consistency checks, periodic data compression, and error log reviews.
The Office 365 development and operations teams are complemented by a dedicated Office 365 support organization, which plays an important role in providing customers with business continuity. Support staff have deep knowledge of the service and its associated applications, as well as direct access to Microsoft experts in architecture, development, and testing.
The support organization is closely aligned with operations and product development, offers fast resolution times, and provides a channel for customers’ voices to be heard. Feedback from customers provides input to the planning, development, and operations processes.
- Online issue tracking: Customers need to know that their issues are being addressed, and they need to be able to track timely resolution. The Office 365 portal provides a single web-based interface for support. Customers can use the portal to add and monitor service requests and receive feedback from Microsoft support teams.
- Self-help, backed by continuous staff support: Office 365 offers a wide range of self-help resources and tools that can help customers to resolve service-related issues without requiring Microsoft support.
Before customers enter service requests, they can access knowledge base articles and FAQs that provide immediate help with the most common problems. These resources are continually updated with the latest information, which helps avoid delays by providing solutions to known issues. However, when an issue arises that needs the help of a support professional, staff members are available for immediate assistance by telephone and through the administration portal 24 hours a day, 7 days a week.
For more information about support, see the Support service description.
A service incident is an event that affects the delivery of a service. Service incidents occur when a portion of the service infrastructure becomes unresponsive and unavailable to customers. Service outages may be caused by hardware or software failure in the Microsoft data center, a faulty network connection between the customer and Microsoft, or a major data center challenge such as fire, flood, or regional catastrophe. Most service incidents can be addressed using Microsoft technology and process solutions and are resolved within a short time. However, some service incidents are more serious and can lead to long-term outages.
There are two types of service incidents:
- Planned downtime (maintenance events): Planned downtime results from regular Microsoft-initiated service updates to the infrastructure and deployed software applications. Planned maintenance notifications inform customers about service infrastructure work that might affect some Office 365 services. Customers are notified no later than five days in advance of all planned maintenance through the Service Health Dashboard on the Office 365 portal. Microsoft typically plans downtime for times when service usage is historically at its lowest, which is on Fridays and Saturdays during the following regional time windows:
The Americas: 21:00 to 03:00 Pacific Time (GMT-8)
Europe, the Middle East, and Africa: 20:00 to 02:00 (GMT)
Asia Pacific and Greater China: 22:00 to 04:00 (GMT+8)
- Unplanned downtime: Unplanned events occur when one or more of the services included in the Office 365 suite are unavailable or unresponsive.
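As an illustration of the planned maintenance windows listed above, the following minimal sketch checks whether a given moment falls inside a region's window. It rests on several assumptions that the text does not confirm: each window opens on a Friday or Saturday evening and runs six hours past midnight, the stated GMT offsets are fixed (daylight saving time is ignored), and the region keys and the `in_planned_window` function are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta, timezone

# Window start hour (local) and fixed UTC offset per region, as listed above.
# Fixed offsets ignore daylight saving time; that is a simplifying assumption.
WINDOWS = {
    "americas": (21, timezone(timedelta(hours=-8))),
    "emea": (20, timezone.utc),
    "apac": (22, timezone(timedelta(hours=8))),
}
WINDOW_LENGTH = timedelta(hours=6)
START_DAYS = {4, 5}  # Friday and Saturday (Monday == 0); assumed start days


def in_planned_window(moment: datetime, region: str) -> bool:
    """Return True if a timezone-aware datetime falls inside the region's window."""
    start_hour, tz = WINDOWS[region]
    local = moment.astimezone(tz)
    # A window that opened yesterday evening may still be running this morning,
    # so test both today's and yesterday's possible start times.
    for days_back in (0, 1):
        day = (local - timedelta(days=days_back)).date()
        start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
        if start.weekday() in START_DAYS and start <= local < start + WINDOW_LENGTH:
            return True
    return False
```

For example, passing a Friday 22:00 timestamp at GMT-8 with the region `"americas"` returns `True`, while the same call for a Sunday 22:00 timestamp returns `False`.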
Microsoft Customer Service and Support recognizes that timely and accurate communications are critical for customer organizations and partners when a service-impacting event occurs. Microsoft notifies Office 365 subscribers by updating the Service Health Dashboard, which is available on the Office 365 portal. For more information, see Service Health.
Microsoft’s commitment to continuous improvement involves analysis of customer-impacting unplanned service incidents to minimize future recurrence. In some situations, identifying the root cause of a service incident can be hindered by incomplete forensic data.
For customer-impacting unplanned service incidents, Microsoft will provide a Post Incident Review (PIR). This detailed report includes:
- An incident summary and event timeline.
- Broad customer impact and root cause analysis.
- Actions being taken for continuous improvement.
Because of the time and resources required to conduct an in-depth analysis after an incident, Microsoft will provide the PIR within five business days following resolution of the service incident. Administrators can also request a PIR using a standard online service request submission through the Office 365 portal or a phone call to Microsoft Customer Service and Support.
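To make the five-business-day window concrete, the sketch below computes the latest date on which a PIR would be expected for a given resolution date. It assumes that business days are Monday through Friday and that public holidays are not excluded; the text does not define either, and `add_business_days` is a hypothetical helper written for this example.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days after `start`.

    Assumes a Monday-to-Friday working week and ignores public holidays.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


# An incident resolved on Friday, 2013-03-01 gives a PIR date of 2013-03-08.
pir_due = add_business_days(date(2013, 3, 1), 5)
```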