This article provides an introduction to the new voice resiliency features offered with Microsoft Lync Server 2010 communications software. Resiliency architecture will be covered along with the new Survivable Branch Appliance device and scenarios about what the user will experience during a failure.
Author: Keith Hanna
Publication date: June 2010
Product versions: Microsoft Lync Server 2010 communications software, Microsoft Office Communications Server 2007 R2
Microsoft Office Communications Server 2007 R2 introduced the capability to continue a call that is in progress if the user’s pool fails; however, this applied only to calls that were established before the server failed. During the failure, any features of the call that required signaling, including hold, transfer, and so on, were unavailable, limiting the user experience. Without signaling support, new calls cannot be established.
Lync Server 2010 solves this problem with the introduction of native voice resiliency within the product, ensuring that voice functionality is preserved even in the event of a data center or pool-level failure.
Resiliency in Office Communications Server 2007 R2
Office Communications Server 2007 R2 introduced the Metropolitan Data Center design to support resiliency. This design involves a single enterprise pool that spans two data centers. The network connection between the two data centers must have low-latency links in place. A stretched VLAN is also required because of the active/passive Microsoft SQL Server clustered database back end. These requirements will probably mean that the two data centers must be in the same metropolitan area (hence the name). This architecture is covered in the Site Resiliency White Paper. Figure 1 shows a high-level view of this architecture.
Figure 1. Metropolitan Data Center high-level architecture
In addition to the Office Communications Server 2007 R2 workloads that are supported (instant messaging and presence), Lync Server introduces support for the voice workloads.
In this single-pool architecture, there is no loss of features in the failover from one site to the other, and all Lync Server servers are active (except for the passive database node in the database cluster). However, the Metropolitan Data Center architecture is complicated to deploy and expensive to implement.
Resiliency in Lync Server
Lync Server can provide voice resiliency across multiple pools without the need for the Metropolitan architecture, which can significantly reduce infrastructure costs.
Lync Server also introduces the Survivable Branch Appliance (SBA). It is a hardware device that includes a subset of Lync Server capabilities, including a Mediation Server and a gateway. The SBA enables users in a remote branch to continue placing and receiving voice calls during a WAN failure.
In Lync Server, a user registers with a primary registrar pool and is assigned a failover pool (or back-up registrar pool). When the user cannot connect to the primary registrar pool, the client will attempt to register against the back-up registrar pool after a configurable time period. Enough time should be allowed to help ensure that transient network problems don’t cause the clients to fail over to the back-up pools. This time-before-failover period is configured by using the Windows PowerShell Set-CsRegistrar command.
Note: An SBA can be designated as only a primary registrar pool.
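The time-before-failover behavior described above can be illustrated with a minimal sketch. This is not Lync client code: the class, the field `failover_after_s`, and the 300-second value are hypothetical names and numbers chosen only to show the decision, which is that the client keeps targeting the primary registrar until it has been unreachable for the configured period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrarAssignment:
    """Hypothetical client-side view of a user's registrars."""
    primary: str
    backup: Optional[str]
    failover_after_s: float  # assumed name for the time-before-failover period


def choose_registrar(assignment: RegistrarAssignment,
                     primary_unreachable_s: float) -> str:
    """Return the registrar the client should target.

    primary_unreachable_s is how long the primary has been unreachable;
    0 means it is currently reachable.
    """
    if primary_unreachable_s <= 0:
        return assignment.primary
    waited_long_enough = primary_unreachable_s >= assignment.failover_after_s
    if assignment.backup is not None and waited_long_enough:
        return assignment.backup
    # Still inside the waiting period, or no back-up assigned:
    # keep retrying the primary so a short outage does not trigger failover.
    return assignment.primary


# Illustrative values only.
alice = RegistrarAssignment(primary="Pool 1", backup="Pool 2", failover_after_s=300.0)
```

With these illustrative values, a 30-second network interruption leaves Alice on Pool 1, while an outage that lasts the full 300 seconds moves her registration to Pool 2.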
Figure 2 shows this high-level architecture. A pool may have only a single back-up registrar defined. Figure 2 also shows that the SBA in the Branch Office can point to either Pool 1 or Pool 2 as its back-up registrar. Pool 1 and Pool 2 can act as back-up registrars for each other.
Figure 2. Pool resiliency high-level architecture
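The registrar relationships described for Figure 2 can be modeled as a small consistency check. The sketch below is illustrative only; the `Registrar` class and `validate_topology` function are hypothetical and are not part of any Lync Server tooling. It encodes two rules stated in this article: a registrar has at most one back-up (a single field), and an SBA can be only a primary registrar, never a back-up. The check that a back-up must be a defined registrar is an added sanity check, not a rule from the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Registrar:
    """Hypothetical model of a registrar pool or SBA."""
    name: str
    is_sba: bool = False
    backup: Optional[str] = None  # at most one back-up registrar


def validate_topology(registrars: List[Registrar]) -> List[str]:
    """Return a list of rule violations (empty when none are found)."""
    by_name: Dict[str, Registrar] = {r.name: r for r in registrars}
    problems: List[str] = []
    for r in registrars:
        if r.backup is None:
            continue
        target = by_name.get(r.backup)
        if target is None:
            # Sanity check added for this sketch.
            problems.append(f"{r.name}: back-up {r.backup} is not defined")
        elif target.is_sba:
            # An SBA can be designated as only a primary registrar.
            problems.append(f"{r.name}: SBA {target.name} cannot act as a back-up")
    return problems


# The arrangement described for Figure 2: the pools back each other up,
# and the branch SBA points to Pool 2.
topology = [
    Registrar("Pool 1", backup="Pool 2"),
    Registrar("Pool 2", backup="Pool 1"),
    Registrar("Branch Office SBA", is_sba=True, backup="Pool 2"),
]
```

Calling `validate_topology(topology)` returns an empty list for this arrangement, whereas pointing Pool 1 at the SBA as its back-up would be reported as a violation.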
Note: There is no requirement for the primary and back-up pools to be of equal capacity or version. A Standard Edition pool can act as a back-up for an Enterprise Edition pool. However, when planning pool capacity, you should take into account the resulting capacity in the event of a failure to help ensure that the servers don’t become overloaded.
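The capacity consideration in the note above comes down to simple arithmetic: after a failure, the back-up pool must carry its own users plus the users that fail over to it. The sketch below assumes, for simplicity, that every registered user counts equally toward capacity; the function names and all figures are hypothetical and are not Lync Server sizing guidance.

```python
def registrations_after_failover(own_users: int, failed_over_users: int) -> int:
    """Users the back-up pool must register once the primary pool has failed."""
    return own_users + failed_over_users


def has_headroom(capacity: int, own_users: int, failed_over_users: int) -> bool:
    """True if the back-up pool can absorb the failed-over users.

    Assumes each registered user counts equally toward capacity,
    which is a simplification made for this sketch.
    """
    return registrations_after_failover(own_users, failed_over_users) <= capacity
```

For example, with hypothetical figures, a back-up pool sized for 5,000 users that already homes 3,000 cannot absorb 8,000 more, so `has_headroom(5000, 3000, 8000)` is `False`.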
The Lync Server architecture has moved the registration and routing services onto individual front-end servers within a pool. User services (presence and so on) and conferencing still depend on the back-end database. This means that in the event of a failure of the pool, the registration and routing services have a back-up provider. The other services don’t, so they are unavailable in a failure scenario.
Note: Applications that are hosted within a pool, such as Conferencing Auto Attendant, Conferences, Presence-based routing, Response Group Service, Call, and Call Park, aren’t available in a failure scenario.
Failure Scenarios
Using the architecture in the diagram shown in Figure 3, a number of failure scenarios will be considered and the results explained. These scenarios are as follows:
- Primary pool failure (or loss of connectivity to a primary pool)
- Back-up pool failure (or loss of connectivity to a back-up pool)
- SBA failure
Figure 3. Failure scenarios
Table 1 shows the resulting impact to Alice and Bob under each failure scenario.
| Failure Scenario | Impact to Alice (Primary: Pool 1) (Back up: Pool 2) | Impact to Bob (Primary: Branch Office) (Back up: Pool 2) | Comments |
| --- | --- | --- | --- |
| Failure of (or loss of connectivity to) Pool 1. | Voice services will failover to Pool 2. Instant messaging (IM) and presence information is unavailable. | No impact. Bob is not reliant upon Pool 1. | Any applications such as response groups or conferencing that are hosted in Pool 1 will be unavailable. |
| Failure of (or loss of connectivity to) Pool 2. | No impact. Alice is not reliant upon Pool 2 for any services unless a failure occurs. | Voice services will continue. IM and presence information is unavailable. | Any applications such as response groups or conferencing that are hosted in Pool 2 will be unavailable. |
| Failure of (or loss of connectivity to) SBA. | No impact. | Voice services will failover to Pool 2. IM and presence information is available as usual. | |
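The outcomes in Table 1 follow from two questions for each user: did the failed component host the user's primary registrar, and did it host the user's IM and presence services? The sketch below encodes that reasoning for a single failure. It is an illustration, not product logic. The `user_services_pool` field is an assumption inferred from the table (Alice's IM and presence are taken to live in Pool 1, Bob's in Pool 2), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class User:
    """Hypothetical model of a user's service placement."""
    name: str
    primary: str             # primary registrar
    backup: str              # back-up registrar
    user_services_pool: str  # assumed host of IM and presence


def impact(user: User, failed: str) -> Dict[str, Union[str, bool]]:
    """Describe the effect on a user when a single component fails."""
    # Voice registration moves to the back-up only if the primary failed.
    registrar = user.backup if failed == user.primary else user.primary
    return {
        "voice_registrar": registrar,
        "voice_failed_over": failed == user.primary,
        # IM and presence have no back-up provider.
        "im_presence_available": failed != user.user_services_pool,
    }


alice = User("Alice", primary="Pool 1", backup="Pool 2", user_services_pool="Pool 1")
bob = User("Bob", primary="SBA", backup="Pool 2", user_services_pool="Pool 2")
```

Under these assumptions, `impact(bob, "Pool 2")` reports that Bob's voice stays on the SBA while IM and presence are unavailable, and `impact(bob, "SBA")` reports a voice failover to Pool 2 with IM and presence still available, matching the corresponding rows of Table 1.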
User Experience During a Failure
As previously mentioned, Office Communications Server 2007 R2 enables users to continue a call during a failure, but the call has limited functionality. This prevents the user from putting the call on hold or transferring it.
Lync Server provides ongoing voice service, but degraded IM and presence service, to the user. During the failure period, the client will visually indicate that only limited functionality is available while the server is in failover mode. The user won’t have access to their buddy list. During this time, calls can be placed and received, but other functionality is limited.
Lync Server introduces new resiliency to the configuration of multi-pool environments and builds upon the architecture described in the Site Resiliency White Paper for Office Communications Server 2007 R2.