Communication & Collaboration
Building an Emergency Operations Center on Groove and SharePoint
At a Glance:
- Emergency Operations Center design basics
- Building on Groove and SharePoint
- Lessons learned and best practices
Last year, as the rest of the world watched on television, Louisiana faced one of the worst disasters in U.S. history. Hurricane Katrina hurtled through the Louisiana and Mississippi Gulf Coast on August 29th, destroying a good part of New Orleans and killing hundreds of people. In the aftermath, as thousands of evacuees fled the affected area, Louisiana State University (LSU) in Baton Rouge, 80 miles northwest of New Orleans, became a primary center for medical operations. Amid the chaos, many dedicated LSU staff worked tirelessly to help these evacuees. As part of that effort, my Microsoft colleague Matthew Lonergan and I helped LSU use Microsoft® Office Groove to manage the vast numbers of people needing assistance.
This is not another article about Katrina, though. It is about the proactive steps LSU has taken to prepare for the next emergency, whatever it is and whenever it occurs. In the months after the storm, recognizing how important a centralized command and control system is during an emergency, LSU instituted a formal Emergency Operations Center (EOC), which became the university's all-hazards C4I (command, control, communications, computers, and intelligence) center. Led by LSU's Chief of Police, the EOC includes a small number of key personnel from many LSU departments, including public safety, facilities, public relations, procurement, and information technology. This article focuses on IT: the IT needs of an EOC and the types of technology that can help satisfy them.
Weather Map of the Site Hit by Katrina
When dealing with a crisis, technology can help satisfy two major classes of need. First, in any emergency there is a need to collect and store information. After Katrina, we tracked evacuees and medical volunteers; in future disasters, we may need to track specific supplies and equipment. Second, as a corollary to data collection, there is a need to share information. Once data is gathered, you must be able to share it in ways that meet the needs of the mission at hand. For example, medical volunteer data must be made available to the police at the medical evacuation site so they can verify the identity of anyone claiming to be a volunteer.
IT systems operating in a crisis environment are faced with a number of constraints not commonly seen in typical day-to-day IT operations. This means that many common technologies used to satisfy the major needs identified earlier are not effective in an emergency environment. For example, information sharing within an organization is often done through file shares and access is controlled with access control lists (ACLs) based on Active Directory® users and groups. In an emergency, organizations will likely be collaborating with groups and individuals that don't have accounts in the organization's Active Directory and who may not have connectivity to file shares.
Based on our experiences with Katrina, we identified a number of constraints often faced by IT in an emergency. First, network connectivity and bandwidth are not guaranteed. While many organizations, including LSU, have large-scale wireless deployments and fast wired connections, how many of those systems could sustain a tenfold increase in demand? In an emergency, with mass mobilization of resources and spotty power, organizations can't rely on having fast, reliable connectivity.
Second, the Internet isn't always there. This may seem to follow from the previous constraint, but it's important to highlight how fragile an Internet uplink can be. You may have redundant hardware and fully meshed routing, and your ISP may have the same, but the local power company may not be able to restore power to that equipment before the backup generators run out of fuel.
Third, the definition of "users" expands greatly. Normally, an IT group considers its own organization's staff (and perhaps its customers and business partners) as its user base. Thus, identity provisioning and management systems, staffing levels, and general business processes are designed with a relatively well-established number and range of scenarios in mind. How many organizations are ready to bring potentially hundreds of volunteers, contractors, and various civilian and military governmental staff into their information systems in a rapid and secure way?
Finally, people need to access data using a variety of machines and connectivity scenarios. New users will often bring their own computers, configured in their own ways with their own sets of applications on them. You won't be able to use Systems Management Server to do a mass installation of Office or another application to provide a common baseline of software.
Understanding these needs and constraints led us to make some very specific choices about the types of systems we deployed in LSU's EOC. The system we designed needed not only to collect data easily but also to share it. We needed to share data securely with people on our network, with those connected through random Internet access points, and with those connecting from other government agencies. We needed to share this data effectively and with little user interaction, even when network connectivity was sporadic and bandwidth poor. We needed to provision new users rapidly without requiring them to authenticate against our Active Directory. And we needed a software platform that users could install quickly and easily themselves, without requiring any specific installation source.
EOC Design Basics
Given these needs, we built LSU's EOC around two Microsoft collaboration products—Groove and Windows® SharePoint® Services (WSS)—plus the infrastructure products that support them, mainly Windows Server® 2003 R2 and SQL Server™ 2005 (see Figure 1). Groove and WSS complement each other to satisfy the information collection and sharing needs, and they're ideally suited for operating in the constrained environment of an emergency situation. However, to provide a platform suitable for use in an EOC, we had to look at the solution from the bottom up, ensuring that everything from the physical plant to Active Directory was configured in a way that allowed these products to work optimally in an emergency.
Figure 1 Architecture of the EOC Computing Environment
There are both obvious and non-obvious parts of an EOC's physical plant that impact its IT operations. At LSU, the EOC (shown in Figure 2) is located in the university's Public Safety building, which also houses the LSU Police Department. So, in this case, physical security is already quite strong. There is redundant power to the building and directly attached generators with sufficient fuel for long-term operations without grid-provided current.
Figure 2 Interior of the University's EOC
Less obvious is the fact that the machine room is not adjacent to any exterior wall and has a cooling system that can be powered separately from the rest of the building's air conditioning. When running on generator power, this lets us save fuel by cooling only the machine room, keeping the machines properly cooled even if the people aren't.
Another non-obvious aspect of the physical design is the server itself. Rather than a high-end quad-CPU server, we're using a relatively basic dual-CPU system to increase power efficiency. We're also using local storage on the WSS computer to eliminate the need to power a storage area network (SAN), network attached storage (NAS), or other external chassis. Finally, we've ensured that not only the servers but also the workstations, projectors, and networking equipment are on an uninterruptible power supply (UPS).
Because we need to ensure continuity of operations even in the event of a total power loss, the WSS environment is contained within an Active Directory forest completely separate from the primary LSU Active Directory. This forest, which is dedicated for use by the LSU Police Department, is wholly contained within the physical plant I've described. Thus, even a complete campus-wide outage of power or network connectivity will not affect operations within the EOC.
The EOC's Active Directory is based on Windows Server 2003 and uses a relatively simple single-forest, single-domain, single-site model. Two domain controllers (DCs) provide redundant directory services, and each DC performs a daily system state backup. These backups are replicated both to onsite storage and, when the network is available, to an offsite storage location elsewhere in the state.
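The backup policy above (always keep an onsite copy, add an offsite copy only when the network allows) can be sketched in a few lines of Python. This is a minimal illustration, not LSU's actual procedure: it assumes each backup is a single file and that both destinations are reachable as directory paths, and the function name `replicate_backup` and the `offsite_available` flag are hypothetical.

```python
import shutil
from pathlib import Path


def replicate_backup(backup_file, onsite_dir, offsite_dir, offsite_available):
    """Copy a backup to onsite storage, and offsite only if the network is up.

    Hypothetical helper; returns the list of destination paths written.
    """
    written = []
    # The onsite copy is mandatory: it must succeed with no WAN connectivity.
    onsite_dir = Path(onsite_dir)
    onsite_dir.mkdir(parents=True, exist_ok=True)
    written.append(Path(shutil.copy2(backup_file, onsite_dir)))
    # The offsite copy is best-effort and skipped while the network is down.
    if offsite_available:
        offsite_dir = Path(offsite_dir)
        offsite_dir.mkdir(parents=True, exist_ok=True)
        written.append(Path(shutil.copy2(backup_file, offsite_dir)))
    return written
```

Treating the offsite copy as best-effort means a WAN outage never blocks the onsite copy, which fits the EOC's goal of operating without external connectivity.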
User objects have been provisioned for permanent EOC staff members. These accounts are used solely for access to the SharePoint site, not for logging on to workstations. Each EOC workstation is dedicated to a particular EOC role, such as Operations Commander or Logistics Specialist. Because the EOC is designed to operate around the clock, the people filling these roles will change throughout the day. No data is stored locally on the workstations; it lives only in SharePoint and Groove. Shared logon accounts are normally not a recommended practice, but in this case there is little value in having individuals log on to their workstations with their own accounts. Instead, we use a dedicated, heavily restricted user account on all EOC workstations that is configured to log on automatically after a restart. This lets us quickly move anyone into any EOC role without provisioning new accounts or managing passwords. It also has a significant advantage for Groove synchronization, as I'll detail later.
The DNS design within the EOC forest is also relatively simple. We utilize a dedicated, non-registered suffix with all zone authority held by the DCs. DNS forwarding is used and references LSU's primary public name servers for both LSU and Internet names.
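In effect, the DCs answer authoritatively for names under the EOC's private suffix and forward everything else. The toy Python model below shows that decision. The suffix `eoc.internal`, the DC addresses, and the forwarder addresses (from the 192.0.2.0/24 documentation range) are hypothetical placeholders, since the real values aren't given here.

```python
# Hypothetical placeholders; the EOC's real suffix and server addresses
# are not published.
LOCAL_SUFFIX = "eoc.internal"
DOMAIN_CONTROLLERS = ("10.0.0.11", "10.0.0.12")
FORWARDERS = ("192.0.2.10", "192.0.2.11")  # stand-ins for LSU's public name servers


def servers_consulted(name):
    """Return which DNS servers ultimately answer a query in this toy model."""
    fqdn = name.rstrip(".").lower()
    # The DCs hold all zone authority for the EOC's private suffix.
    if fqdn == LOCAL_SUFFIX or fqdn.endswith("." + LOCAL_SUFFIX):
        return DOMAIN_CONTROLLERS
    # LSU and Internet names are forwarded to LSU's public name servers.
    return FORWARDERS
```

One consequence of this design is that names inside the EOC forest keep resolving even when the forwarders are unreachable; only LSU and Internet lookups fail during an outage.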
Because the EOC is within the Public Safety building, it is segmented from the rest of campus and from the Internet at large. In many ways, it more closely resembles the classical corporate network design of a border firewall—with limited ingress ports and translated internal address spaces—than the more open designs often found in higher education. Access to the SharePoint site from the Internet is restricted to HTTPS, and no other connectivity is allowed from the public Internet to the EOC network.
Egress access from the EOC network to the Internet occurs through the local router's Network Address Translation (NAT) capability. Egress filtering is limited because of the variety of protocols used by agencies that may partner with the EOC in case of emergency. For example, assisting staff from local, state, and federal agencies may each require outbound connectivity to a different set of ports and destinations to facilitate access back to their organizations' networks. Because of the way Groove utilizes relay servers, workspace synchronization can also occur over this NAT connection. In the future, if we choose to implement more strict egress filtering, Groove would still function by wrapping its communications within HTTP frames for outbound access.
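The resulting border policy is asymmetric: almost nothing comes in, and almost anything goes out. The Python sketch below models it as a single decision function. The internal SharePoint address is hypothetical, and the `strict_egress` mode (outbound web ports only) is a hypothetical stand-in for the tighter egress filtering mentioned above, under which Groove would fall back to HTTP.

```python
from dataclasses import dataclass

SHAREPOINT_HOST = "10.0.0.20"  # hypothetical internal address of the WSS server
WEB_PORTS = {80, 443}


@dataclass(frozen=True)
class Flow:
    inbound: bool    # True: Internet -> EOC; False: EOC -> Internet (via NAT)
    dest_host: str
    dest_port: int
    protocol: str = "tcp"


def permitted(flow, strict_egress=False):
    """Decide whether the border device lets a connection through."""
    if flow.inbound:
        # The only ingress allowed is HTTPS to the SharePoint site.
        return (flow.protocol == "tcp" and flow.dest_port == 443
                and flow.dest_host == SHAREPOINT_HOST)
    if strict_egress:
        # Hypothetical tighter policy: outbound web traffic only.
        return flow.protocol == "tcp" and flow.dest_port in WEB_PORTS
    # Current policy: permissive egress so partner agencies can reach home networks.
    return True
```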
Building on Groove and SharePoint
Now that we've established the basic infrastructure components, let's look at the solution they support. SharePoint is the core of our EOC communication platform, providing an easy-to-use, full-featured Web site for storing documents, contacts, and task lists. Because it's a Web-based application, we can easily make its content available to anyone physically in the EOC building or anywhere on the Internet (assuming we have network connectivity). What we can't do with SharePoint is take the data offline, and that's where Groove comes in.
Groove provides key pieces of the overall solution. Through its Mobile Workspace for SharePoint, Groove allows us to take the entire SharePoint site and make it available offline. Groove users synchronize the contents of the SharePoint site to their local computers and can then view, edit, and add to the site even if they don't have a network connection. When they reconnect, the content is automatically synchronized back to SharePoint for the rest of the EOC to use.
Using Groove forms, we can quickly and easily collect large amounts of data in any emergency situation. During Katrina, we used Groove to track the location and health of thousands of evacuees and volunteers. Users could take a Groove-equipped laptop to a site with no power or network connectivity, collect as much information as battery power allowed, and then return to a place with network connectivity. As soon as the Groove client reconnected to the workspace, it began synchronizing the data collected offline with all its peers (see Figure 3). This is a tremendously powerful capability because it allows us to rapidly integrate dispersed information into a single distributed database.
Figure 3 Synchronizing Data through Groove and SharePoint
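The offline collection pattern described above is essentially store-and-forward: entries queue locally while disconnected and are pushed to every peer on reconnection. The Python sketch below models only that queuing behavior. It is a hypothetical illustration (the `FieldClient` class is invented) and says nothing about Groove's actual synchronization protocol, which is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class FieldClient:
    """Toy store-and-forward model of an offline laptop (not Groove's protocol)."""
    name: str
    online: bool = False
    outbox: list = field(default_factory=list)

    def record(self, entry):
        # Data entry works whether or not the network is reachable.
        self.outbox.append(entry)

    def sync(self, peers):
        """Push queued entries to every peer if connected; return how many were sent."""
        if not self.online:
            return 0  # keep everything queued until connectivity returns
        for peer in peers:
            peer.extend(self.outbox)
        sent = len(self.outbox)
        self.outbox.clear()
        return sent
```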
Groove forms are easily created and modified, but their most important capability during a crisis is autonomous deployment. Once a user is provisioned into a Groove workspace, any forms created or modified in that space are added automatically to their workspace during the normal sync process. This allows us to react quickly to changing data collection needs on the fly without having to expend any effort updating the clients. For example, during Katrina, once we learned of the problems people were having locating friends and family, we modified our evacuee data collection form to include a "Transferred to" field that allowed us to provide much better intelligence on the location of anyone who had been through our medical centers.
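The "Transferred to" change works because records collected under the old form simply show the new field as blank, while new records fill it in. A hypothetical Python sketch of that idea follows; the field names and the `render` and `locate` helpers are illustrative and do not reflect Groove's forms format.

```python
# Hypothetical field lists; not Groove's actual form definition format.
EVACUEE_FORM_V1 = ["name", "date_of_birth", "condition"]
# Revised form, distributed to every workspace member in the normal sync.
EVACUEE_FORM_V2 = EVACUEE_FORM_V1 + ["transferred_to"]


def render(record, form_fields):
    """View a record through a form definition; fields it lacks appear blank."""
    return {f: record.get(f, "") for f in form_fields}


def locate(records, name):
    """Return the known transfer destinations for an evacuee, skipping blanks."""
    views = (render(r, EVACUEE_FORM_V2) for r in records)
    return [v["transferred_to"] for v in views
            if v["name"] == name and v["transferred_to"]]
```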
Groove Design Elements
Because of the criticality of the EOC and the fact that Office Groove 2007 is still in beta, our current EOC software stack is built on an earlier Groove 3.1 release. Before the beginning of the next hurricane season, we plan on upgrading to Groove 2007 and SharePoint Server 2007, especially because of the advances in the integration between the two products. For now, though, our Groove deployment consists of the EOC workstations themselves, the laptops of some permanent EOC staff, and the machines that control our wall-mounted displays. The display systems allow us to project content in Groove onto the large displays in the EOC for easier analysis.
On the EOC workstations, Groove is configured with role-based accounts and automatic logon. As mentioned previously, EOC workstations are role-based so we have individual Groove accounts set up on a per-role basis. In other words, we have separate Groove accounts for the Incident Commander, Operations Commander, and Public Liaison roles, but not individual per-user accounts for the people who fill these roles. This approach allows us to easily transition work between shifts and to facilitate better collaboration, because whoever may be filling a role at a given time fully assumes its identity. It also makes tracking easier, since all events based on user names are represented by roles rather than individuals. For example, when viewing a discussion board or chat transcript, postings will be identified by role, making it easier for volunteers who don't know EOC staff members by name to understand. This approach also helps avoid creating replication storms during shift-change operations when all staff rotate out, which would require complete workspace synchronizations if we were using per-user accounts.
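The role-based scheme can be pictured as a fixed mapping from workstation to role account: whoever sits down inherits the role's identity. In the hypothetical Python sketch below, the workstation names are invented; the point is that two shifts posting from the same workstation produce a single identity in the transcript, so a shift change introduces no new workspace members to synchronize.

```python
# Hypothetical workstation names mapped to the role-based Groove accounts.
WORKSTATION_ROLE = {
    "EOC-WS01": "Incident Commander",
    "EOC-WS02": "Operations Commander",
    "EOC-WS03": "Public Liaison",
}


def post(transcript, workstation, text):
    """Record a posting under the workstation's role, not the individual."""
    role = WORKSTATION_ROLE[workstation]
    transcript.append((role, text))
    return role


transcript = []
post(transcript, "EOC-WS02", "Shelter B at capacity")    # day-shift staffer
post(transcript, "EOC-WS02", "Shelter C now accepting")  # night-shift staffer
```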
We utilize Groove Hosted Services to easily create and manage accounts on our own. The Hosted Services interface allows us to import account information from Active Directory or from Microsoft Excel® workbooks, making bulk account creation straightforward. Given our requirement to operate in the complete absence of any external network connectivity, we've already created all of the accounts we need and activated Groove on each workstation. This means we can continue using Groove even without any connectivity to the Hosted Services interface.
Two primary workspaces are used in the EOC. The first is the Mobile Workspace for SharePoint, which simply takes SharePoint offline and allows the synchronization of portal data changes between the Groove and SharePoint databases. The second is a more traditional workspace containing our data collection forms. We have pre-provisioned our workspaces with forms for tasks we know will be important in any emergency, such as tracking evacuees or patients and tracking medical volunteers. As noted earlier, we can quickly modify these forms as new needs crop up, and the changes reach every member automatically. One of the important features of Groove 2007 is its integration with Microsoft Office InfoPath®, which will make form creation even easier.
SharePoint Design Elements
The EOC SharePoint design is quite simple. We use the basic Team Workspace template to provide the standard portal toolset. The primary tools we use are the Document Library, Links, and Contacts. Using Links and Contacts gives us an easy way to have shared favorites and address books; in combination with Groove, we have these data sources available offline as well.
One custom page we created drives the wall-mounted displays and shows a list of major issues as they occur. To build it, we took a Custom List tool and optimized it for the wall-mounted displays, removing borders, headers, and tables to conserve screen space and adding a meta tag that forces the page to refresh every 15 seconds. Because any user can update this list through a Web browser, it is simple for staff to post an alert to the wall-mounted displays.
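The refresh relies on the standard HTML meta refresh tag. The Python sketch below generates a stripped-down, self-refreshing page from a list of issue strings to show the mechanism; it is a standalone illustration with a hypothetical function name, not the customized SharePoint list view LSU actually built.

```python
from html import escape


def render_issue_board(issues, refresh_seconds=15):
    """Build a bare, self-refreshing HTML page listing current major issues.

    The meta refresh tag tells the browser to reload every `refresh_seconds`,
    so newly posted issues appear on the wall displays without anyone touching them.
    """
    items = "\n".join(f"<li>{escape(issue)}</li>" for issue in issues)
    return (
        "<!DOCTYPE html>\n<html><head>"
        f'<meta http-equiv="refresh" content="{int(refresh_seconds)}">'
        "<title>EOC Major Issues</title></head>\n"
        # No borders or headers: maximize usable screen space on the displays.
        '<body style="margin:0">\n<ul>\n'
        f"{items}\n</ul>\n</body></html>"
    )
```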
SharePoint is backed up using the Stsadm.exe command-line backup and restore tool. Like the system state data from a DC, this backup is replicated both to local storage and to an offsite location. Because the backup can easily be restored to any Windows Server 2003 R2 system, even if the EOC itself were destroyed we could set up operations with the same toolset anywhere we can put a machine running Windows Server. After a few minutes of data restoration and reconfiguration of the Groove Mobile Workspace, we'd be able to continue using our existing set of tools and processes.
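Stsadm.exe is driven by an `-o` operation plus parameters; the `backup` and `restore` operations take a site `-url` and a `-filename`, with an optional `-overwrite`. The Python sketch below only assembles those command lines and runs them solely if you call `run`. The site URL, backup path, and helper names are hypothetical, and you should confirm the exact syntax against the stsadm help output on your own server.

```python
import subprocess

# Hypothetical values; substitute the EOC's real site URL and backup path.
SITE_URL = "https://eoc.example.org"
BACKUP_FILE = r"D:\Backups\eoc-site.dat"


def stsadm_cmd(operation, site_url, backup_file, overwrite=True):
    """Build an stsadm backup/restore command line for a site collection."""
    if operation not in ("backup", "restore"):
        raise ValueError("operation must be 'backup' or 'restore'")
    cmd = ["stsadm.exe", "-o", operation, "-url", site_url, "-filename", backup_file]
    if overwrite:
        cmd.append("-overwrite")  # replace an existing file (backup) or site (restore)
    return cmd


def run(cmd):
    """Execute on the WSS server; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

On a replacement server, the same helper would produce the restore command once the offsite copy of the backup file has been brought to that machine.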
Here are a few of the lessons we learned in the process of setting up the EOC.
Groove Synchronization Limitations
Groove workspace synchronization slows as the total number of records in a workspace approaches 10,000. In most EOC activation scenarios, 10,000 records is likely far more than is needed. In a large-scale disaster like Katrina, however, we did track more than 10,000 evacuees, which significantly degraded the performance of the Groove client. The solution is simple: as the total number of items in any given workspace approaches 10,000, create a new workspace and use it for all new items. This approach does make querying somewhat more complex, because the data then spans multiple workspaces, each of which must be queried individually. However, a single instance of Groove on a single computer can join many workspaces, so all of the data is still available through the Groove client.
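The rollover rule is simple enough to express directly. The hypothetical Python sketch below keeps records in a list of workspaces, opens a new one once the current one reaches a configurable limit (the default of 9,000 is an assumed safety margin below the roughly 10,000 records at which synchronization slowed), and shows why queries must then visit every workspace. It models the bookkeeping only, not Groove itself.

```python
class PartitionedWorkspaces:
    """Hypothetical model: roll new records into a fresh workspace at a size limit."""

    def __init__(self, limit=9_000):
        # Assumed headroom below the ~10,000 records where sync slowed.
        self.limit = limit
        self.workspaces = [[]]

    def add(self, record):
        if len(self.workspaces[-1]) >= self.limit:
            self.workspaces.append([])  # open a new workspace for new items
        self.workspaces[-1].append(record)

    def query(self, predicate):
        # Each workspace is searched individually and the results combined.
        return [r for ws in self.workspaces for r in ws if predicate(r)]
```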
Another approach may be Groove Data Bridge, which allows an organization to develop a synchronization link between Groove and a traditional database system like SQL Server. The data bridge could be used to archive older records into SQL Server for more permanent storage, keeping the number of active records in Groove within the synchronization limit. At LSU, we chose not to use this approach primarily because of the complexity involved and the relatively infrequent need to work with such large datasets.
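The archiving idea can be sketched without Groove Data Bridge itself. In the Python example below, SQLite (from the standard library) stands in for SQL Server and a plain list of tuples stands in for active Groove records; the function name, table layout, and record shape are all hypothetical. Records are removed from the active set only after the archive insert commits.

```python
import sqlite3


def archive_oldest(active, conn, keep):
    """Move the oldest records out of `active` so that at most `keep` remain.

    `active` is a list of (record_id, created_iso, payload) tuples standing in
    for Groove records; `conn` is a SQLite connection standing in for SQL Server.
    Returns the number of records archived.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive (id TEXT PRIMARY KEY, created TEXT, payload TEXT)"
    )
    excess = len(active) - keep
    if excess <= 0:
        return 0
    active.sort(key=lambda rec: rec[1])  # oldest first; ISO dates sort as text
    moved = active[:excess]
    with conn:  # commits on success, rolls back if the insert fails
        conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", moved)
    del active[:excess]  # drop from the active set only after the commit
    return len(moved)
```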
Groove Deployment and Activation Constraints
Because of the hosted aspect of some Groove services (notably Groove Relay Server used by clients to find each other over the Internet), the Groove client must be activated prior to use. Since one of the primary objectives of the EOC's IT design is to facilitate disconnected operations, we obviously don't want to be dependent upon having an Internet connection available during an emergency. Furthermore, because each instance of Groove is uniquely dedicated to a particular user, it's not possible to deploy fully activated installations of Groove using traditional tools like Systems Management Server.
Because of these constraints, we deployed Groove as part of a standard image to all EOC workstations, but in a non-activated state. This meant that after imaging these systems, the Groove application binaries were available, but the product was not activated and no workspaces were joined. Thus, to complete our deployment, we created all of our role-based accounts (using the Hosted Services Web-based management tool) and manually activated each machine and joined it to the appropriate workspace. This provided a good balance between efficiency of deployment and the product's unique post-installation configuration needs.
Have Generic Role-Based Accounts Ready for Volunteers
During a major emergency, there will most likely be many volunteers who can enter data into the Groove workspace but who are not permanent EOC staff members and don't have assigned roles. Pre-staging user accounts for this data entry allows LSU to tap the services of such volunteers more rapidly during an emergency. In our design, we have a number of generic accounts pre-built and ready for use by volunteers or any third party we may need to collaborate with.
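Handing out pre-staged accounts amounts to checking items out of a fixed pool. The Python sketch below is a hypothetical illustration with placeholder account names. The assignment log that records which volunteer holds which generic account is an added assumption in this sketch, not part of the design described above, but it preserves some accountability when many people share generic identities.

```python
class VolunteerAccountPool:
    """Hand out pre-staged generic accounts; nothing is provisioned at crisis time."""

    def __init__(self, accounts):
        self.available = list(accounts)  # hypothetical pre-built account names
        self.assigned = {}  # account -> volunteer (assumed accountability log)

    def check_out(self, volunteer):
        if not self.available:
            raise RuntimeError("no pre-staged volunteer accounts left")
        account = self.available.pop(0)
        self.assigned[account] = volunteer
        return account

    def check_in(self, account):
        # Return the account to the pool when the volunteer leaves.
        del self.assigned[account]
        self.available.append(account)
```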
Have the Groove Enterprise Installer Ready
While Groove can be installed from www.groove.net, again, we do not want any dependency on Internet connectivity during an emergency. Even when Internet connectivity is available, the installer on www.groove.net downloads files on demand, meaning it must download the entire application separately to each machine where it's used. In a bandwidth-constrained environment, this process may not complete quickly or successfully, or it may consume bandwidth needed for higher-priority traffic. Rather than relying on the Web installer, use the Groove Enterprise Installer, an MSI-based installation package. Like other MSI-based packages, it can be stored entirely on a local network share or USB disk and has no Internet dependencies during setup.
Groove Mobile Workspace for SharePoint Synchronization
An important point to keep in mind about the Mobile Workspace for SharePoint is that only the computer that establishes the workspace can synchronize between the two environments. In other words, while all other Groove synchronization is peer-to-peer, the Mobile Workspace for SharePoint is one-to-one and only between the original creator of the workspace and SharePoint. Because of this, it's vital that the workspace be created on a machine that's always on and always connected to the network. In our case, the machine used to drive the wall-mounted displays is an ideal choice for this task.
An EOC is a unique environment for most IT professionals. Many services often taken for granted, such as high bandwidth and Internet connectivity, cannot be relied on during an emergency. The systems put in place to manage emergencies must therefore cope with spotty connectivity and widely dispersed users at the very time when secure, reliable access to data is most crucial. By combining SharePoint and Groove, LSU has built an EOC that can survive serious disasters and still meet the needs of the university and the people of Louisiana.
John Morello spent six years with Microsoft as a Senior Consultant and is now the Deputy CISO at Louisiana State University.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.