Understanding Microsoft Exchange Server 2003 Operations
Topic Last Modified: 2004-10-19
Information Technology (IT) operations refers to the day-to-day management of an IT infrastructure. An IT operation incorporates all the work required to keep a system running smoothly. This process typically includes the introduction and control of small changes to the system, such as mailbox moves and hardware upgrades, but it does not affect the overall system design. Within a Microsoft® Exchange Server 2003 organization, the procedures, roles, and responsibilities that are involved in operations need to be formalized.
Implementing Exchange Server 2003 operations procedures according to Microsoft Operations Framework (MOF) requires:
- An understanding of MOF MOF is a collection of best practices, principles, and models that give you technical guidance on the management of IT projects such as daily Exchange Server 2003 operations. Following MOF guidelines will help you achieve mission-critical production system reliability, availability, supportability, and manageability for Microsoft products.
- Familiarity with best practices for Exchange organizations It is recommended that you implement proven and practical procedures to manage an Exchange Server 2003 organization. Using the tried, tested, and documented methods of managing operations in your organization may be more efficient than developing your own methods.
- Separating operations into daily, weekly, and monthly processes Document the operations tasks performed regularly in your company. Documenting how and when tasks are performed ensures that the information is preserved when your operations staff changes jobs or leaves the company. New employees also benefit from this documentation because it helps them quickly learn how your IT department conducts its Exchange operations.
- Deploying the tools required for operating an Exchange Server 2003 organization Several tools are available to help troubleshoot problems and automate tasks. You can define a standard set of tools so the tasks performed by the operations team are done efficiently, consistently, and in a controlled manner. You should also implement processes to track incidents and major configuration changes.
This topic gives you the understanding, tools, and best practices required to maintain an Exchange Server 2003 environment. It explains how the management of Exchange Server 2003 fits in with the overall MOF model. It will help you design your operational management environment and give you the means to implement procedures to keep your environment running smoothly.
Microsoft Operations Framework (MOF) is a template on which you can design the procedures, controls, and roles required for the efficient operation of your IT infrastructure.
MOF provides guidelines about how to plan, deploy, and maintain IT operational processes in support of mission-critical service solutions. MOF is a generic model so you must adapt many of the recommendations for use in your company. When you see references to “roles” in the MOF model, understand that a single person may be assigned many roles, especially in small companies. But even if you represent the whole IT department, the procedures and recommendations in this model are generally applicable.
MOF is a structured and flexible model that is based on:
- Microsoft consulting and support teams and their experiences working with enterprise customers and partners, as well as internal IT operations groups at Microsoft.
- The IT Infrastructure Library (ITIL), which describes the processes and best practices required for the delivery of mission-critical service solutions.
- ISO/IEC 15504 from the International Organization for Standardization (ISO), which provides a normalized approach to assessing software process maturity.
MOF provides recommendations for deployments of various Microsoft products, such as Microsoft Windows® Server™ 2003 and Microsoft Exchange Server 2003. For detailed information about Microsoft Operations Framework, see http://go.microsoft.com/fwlink/?LinkId=21640.
MOF complements and integrates with the Microsoft Solutions Framework (MSF). MSF is a disciplined approach to managing technology projects based on Microsoft internal practices, the experiences of Microsoft Product Support Services in working with customers and partners, and industry best practices in software development and project management. MSF is a deployment approach for the design and implementation of IT systems (for example, a migration project to move from Lotus Notes to Exchange Server 2003), whereas MOF addresses the daily management of a system or environment, such as an Exchange Server 2003 organization.
The MOF process model is composed of quadrants, operations management reviews, and service management reviews. Figure 1.1 shows how the MOF cycle works.
From the figure, you can see how the MOF process model moves clockwise and is split into four integrated quadrants: changing, operating, supporting, and optimizing.
These quadrants form a spiral life cycle that applies to IT operations from a specific application to a complete operations environment with multiple data centers. The process model is supported by Service Management Functions (SMFs) and an integrated team model and risk model. Each quadrant is supported by a corresponding operations management review (also known as a review milestone), during which the effectiveness of that quadrant's SMFs is assessed. Although the model describes the MOF quadrants sequentially, activities from all quadrants can occur at the same time.
Briefly, the quadrants cover the following activities:
- Changing A change is planned and tested during the changing phase. After a Release Readiness Review, the change is rolled out to the production environment and enters the operating phase. The Release Readiness Review should not be the first time the release is evaluated; it should be a final review milestone before the actual deployment. Using SMFs provides a process and task road map and helps ensure a successful deployment and rollout for managed releases.
- Operating The goal of an Operations Review is to provide the processes, procedures, and tools that make supporting the system as simple and efficient as possible. Think of the SMFs in this quadrant as the typical data center activities, such as system administration, monitoring, and batch processing. These activities guarantee the smooth and predictable operation of the release.
- Supporting The supporting phase is the process of maintaining the system, using these tools and procedures. This quadrant contains the main SMFs required to provide ongoing support to the users of the IT service solutions. As with any process, system, application, or service, problems can start when operations start. The support and operations staff must identify, assign, and resolve problems quickly to meet the requirements set forth in the service level agreements (SLAs). The SLA Review is a measurement of how effectively the system is performing. Issues that come out of the SLA Review may highlight areas where improvements are required.
- Optimizing The mission of service for this quadrant is to reduce costs while maintaining or improving service levels. An improvement to the system might require a change to hardware, software, or procedures. The Release Approved Review evaluates the proposals for change, accounting for items like costs, risks, and benefits. Approved changes are fed into the changing quadrant and the process starts over. This iterative process typically occurs naturally as the various teams gradually introduce changes to the system to achieve improvements.
The MOF framework formally describes the steps involved in this improvement cycle, assigning responsibilities for each step and enabling the whole process to be managed. At the end of each phase, there is a review point. In a large IT department, this is likely to be a review meeting between the people or teams involved, such as release management, operations, and security. In a smaller company, a review point may be only a checkpoint that indicates you are ready to proceed. Figure 1.2 shows the relationship between MOF and MSF.
The MSF process can help develop a solution in response to a business need such as the requirement to consolidate server resources. In this case, the solution may outline how to deploy powerful mailbox servers that are running Exchange Server 2003. After the solution is deployed, the design and deployment team hands the environment to the teams described in the MOF model. These teams manage the daily operations, and provide feedback about requirements or suggestions for change to the design team. Again, this is an iterative process that you can use to refine and continuously improve your solution.
Service management functions (SMFs) are the roles of people or teams in the organization, such as support professional or system administration. The SMFs represent the foundation of the MOF process model. Although SMFs are cross-functional and cross-quadrant, the primary role of an SMF applies to a specific stage in the quadrant. For example, system administration is part of the operating quadrant, and release management is part of the changing quadrant. SMFs and the MOF quadrant of the cycle that each SMF applies to are discussed in detail in this section. The IT department of your company may comply with these roles and quadrants. Figure 1.3 shows these service management functions within the MOF cycle.
- Changing The processes in this quadrant address the introduction of new solutions, technologies, systems, applications, hardware, and processes in the environment. This includes:
  - Change management Involves managing the development, testing, and rollout of changes to the production environment. A key goal of the change management process is to identify and provide detailed information to everyone who will be affected by the impending change.
  - Configuration management Involves identifying, documenting, and tracking components of the environment and the relationships between them. Configuration management is also responsible for maintaining the definitive software library (DSL), which houses the master copies of all the software deployed in the IT environment.
  - Release management Involves releasing new software, hardware, and process releases into the production and managed preproduction environment. Release management considers all aspects of a release, whether technical or non-technical. Make sure that releases are well defined, maintained, and scheduled for each IT service.
- Operating The processes in this quadrant revolve around effective and efficient execution of day-to-day tasks.
  - System Administration Involves maintaining the messaging systems and coordinating the IT teams.
  - Security Administration Involves maintaining a safe and secure computing environment.
  - Directory Services Administration Involves managing user accounts, organizational units, and other Active Directory® directory service objects. Directory Services Administration focuses on daily operations, maintenance, and support of the organization.
  - Network Administration Involves maintaining the physical network infrastructure, such as servers, routers, and firewalls, to ensure that messaging systems can communicate with each other.
  - Service Monitoring and Control Involves monitoring system performance to ensure that daily operations are compliant with SLAs.
  - Storage Management Involves maintaining the data repositories in your messaging organization to ensure the availability of data. This includes backup and capacity planning.
  - Job Scheduling Involves scheduling maintenance jobs during off-peak hours (for example, backups and batch processes), considering the available capacity.
- Supporting The processes in this quadrant revolve around the resolution of incidents, problems, and inquiries.
  - Service Desk Provides guidance about setting up and running the organizational unit or department that is the single point of contact between the users and the provider of IT services. Service Desk organizes the activities and customer communications about incidents, problems, and inquiries related to production systems.
  - Incident Management Involves managing the process of resolving any fault or disruption to the production system, including escalation to and communication with other SMFs.
  - Problem Management Focuses on structuring the escalation process of investigation, diagnosis, resolution, and closure of problems.
- Optimizing Focuses on changes to optimize performance or capacity, increase availability, or decrease costs in the delivery of IT services.
  - Service Level Management Involves monitoring the performance of the IT department and periodically reviewing its compliance with SLAs.
  - Financial Management Involves justifying required changes and other expenditures in terms of cost versus benefit. An example is weighing the cost of hiring additional user helpdesk staff against the benefits of a reduced waiting time for support calls.
  - Capacity Management Involves monitoring the capacity of your messaging systems to ensure compliance with performance measures defined in SLAs.
  - Availability Management Involves managing, monitoring, and reporting the availability, reliability, and maintainability of your messaging systems.
  - Workforce Management Involves providing best practices and assessing staff requirements, developing skills and positive team attitudes, and transferring knowledge.
  - Security Management Defines and communicates the organization's security plans, policies, guidelines, and relevant regulations defined by the associated external industry or government agencies.
  - Infrastructure Management Ensures coordination of infrastructure development efforts, translating strategic technology initiatives to functional IT environmental elements, managing the technical plans for IT engineering, hardware, and enterprise architecture projects, and ensuring that quality tools and technologies are delivered.
The MOF Process Model and MOF Team Model are the core models that define Microsoft Operations Framework. The MOF Team Model provides guidelines for organizing teams, and the functions and competencies of each role cluster. The role clusters in the Team Model work with the SMFs of the Process Model. The Team Model role clusters enable the SMF processes to be followed. The MOF Team Model also suggests combinations of functions that should be kept separate. For example, the team that tests a change before it is released to the production environment should be separate from the team that developed the change. Figure 1.4 shows how these role clusters combine within MOF.
The MOF Team Model defines seven role clusters. These roles frequently reflect how teams are organized in a medium to large environment. The role clusters are discussed in this section:
- Release The release team manages the roll-out of changes into the production environment, is responsible for configuration management, maintains licensing information, and forms a liaison between development and operations groups.
- Service The service team is responsible for end-to-end management for a specific service such as provisioning a messaging solution service across the organization. The service team is involved in the design, deployment, and operations phases of the solution.
- Infrastructure The infrastructure team plans and manages the IT infrastructure, including capacity forecasting, managing standard builds and system images, and monitoring system availability and connectivity.
- Support The support team represents the liaison between users and helpdesk support. This team manages compliance with SLAs, provides incident and problem resolution, and maintains a knowledge base of common resolutions. This team is also responsible for providing customer feedback to design teams.
- Operations The operations team manages user accounts and mailboxes, monitors performance and availability of systems, manages connectivity with external systems, monitors queues and logs, and maintains firewalls.
- Partner The partner team liaises between suppliers and partners such as Internet Service Providers (ISPs). This team manages contractual SLAs with third parties, evaluates alternative suppliers, manages procurement, and makes purchasing decisions.
- Security The security team detects virus attacks, intrusion attempts, denial of service, and other attacks. This team must monitor the usage of IT resources and compliance with standards, and is responsible for audit tracking and reporting. This team frequently manages Public Key Infrastructure (PKI) technologies required for message signing and encryption, as well as external testing including mail relay testing and penetration testing.
This detailed MOF model forms the basis for organizing resources and responsibilities to operate, support, optimize, and make changes to an IT infrastructure.
Best practices are recommendations based on knowledge and experience gained by IT professionals across many environments. They provide standard procedures for typical tasks that your messaging administrators must accomplish daily and list the tools they should use to manage the Exchange Server 2003 organization.
Typical tasks for Exchange administration include the following:
- Capacity and Availability Management Define how and what to measure to predict future capacity requirements and to report on the capacity, reliability, and availability of your systems. You must ensure that servers that are running Exchange are sized to handle the load on the system, and that unplanned downtime is kept under the levels defined in the SLA. Additionally, you will need to upgrade hardware to continue to meet the defined requirements.
- Change Management and Configuration Management Control how changes are made to your IT systems. This should cover testing, application, feedback and contingency plans, documentation of all changes, and buyoff from management if problems occur. Keep a record of your software and hardware assets and their configurations.
- System Administration Outline standard methods for doing administrative tasks, such as database administration and messaging administration.
- Security Administration Have a detailed policy and plan to protect the data confidentiality, data integrity, and data availability of your IT infrastructure. This includes day-to-day activities and tasks related to maintaining and adjusting the IT security infrastructure.
- System Troubleshooting Outline methods for dealing with unexpected issues, including steps to prevent similar issues in the future.
- Service Level Agreements Maintain a set of goals for the performance of your IT systems and regularly measure performance against these goals.
- Documentation Document standard procedures, configuration information, and lessons learned, and make them available to the staff who need them. As changes to the configuration are made, update the documentation accordingly.
The purpose of capacity management and availability management is to measure and control system performance, so it is recommended that you implement procedures for both. By setting baselines and monitoring the system for trends, you can determine whether the system is available and whether it can handle current and projected demand.
Capacity management involves planning, sizing, and controlling service capacity to ensure that the minimum performance levels specified in your SLA are met or exceeded. Good capacity management ensures that you can provide IT services at a reasonable cost and still meet the levels of performance defined in your SLAs with the client. The performance criteria in an SLA can include the following:
- System Response Time This is the measured time that the system takes to do typical actions. Examples include the time taken for a client to log on and authenticate to the domain, the time allowed to do a full backup of an Exchange store, and the time taken to retrieve an item from a mailbox or public folder.
- Storage Capacity This is the capacity of a storage system, whether it is a network share, a backup tape drive, or an Exchange store. Examples include the minimum amount of storage space to be provided per user and the amount of time that backup tapes must be kept before being overwritten.
Adjusting capacity is frequently a case of ensuring that enough physical resources are available, such as disk space and network bandwidth. Table 1 lists typical resolutions for capacity-related issues.
Table 1 Typical resolutions for capacity-related issues

| Issue | Typical resolution |
| --- | --- |
| Slow logon to mailboxes | Introduce another domain controller to the site or increase network bandwidth |
| Slow retrieval of documents from a public folder | Ensure that a replica of the public folder is available locally |
| Access through Microsoft Office Outlook® Web Access is too slow | Increase the available bandwidth of the Internet connection |
| Shortage of free space on a network share | Add more disks to the server or storage array |
Capacity is affected by system configuration and depends on physical resources such as network bandwidth. For example, if a server that is running Exchange Server 2003 is configured to log troubleshooting information to disk, log files may use up disk space over time. This can reduce the disk capacity available to the Exchange store. Capacity management is the process of keeping the capacity of a system within acceptable levels and addresses the following issues:
- Reacting to changes in requirements Capacity requirements need to be adjusted to account for changes in the system or the organization. For example, if you increase the maximum allowed size of messages to the Internet, you may notice a corresponding increase in traffic across the Internet connection. Then, you may need to increase the capacity of the connection to avoid unacceptable delays in message transfer.
- Predicting future requirements Some capacity requirements change predictably over time and can be planned for in advance. For example, the total volume of mail in an Exchange store typically increases at a fairly constant rate. By looking at how the Exchange store size has changed over the last six months, you may be able to predict approximately when it will reach the limit. The maximum size of Exchange stores on a Standard Edition Exchange 2003 server is 16 gigabytes (GB). Therefore, you will need to plan to upgrade to Enterprise Edition, introduce mailbox size quotas, or add Exchange servers and move some mailboxes to those new servers.
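The prediction described above can be sketched as a simple linear projection. The following is a minimal illustration only: the function name, the sample measurements, and the assumption of a constant growth rate are all hypothetical, and the store sizes would have to come from your own monitoring data.

```python
from datetime import date, timedelta

STORE_LIMIT_GB = 16.0  # Standard Edition store limit cited in the text


def days_until_limit(samples, limit_gb=STORE_LIMIT_GB):
    """Estimate days from the last sample until the store reaches limit_gb.

    samples: list of (date, size_gb) tuples, oldest first.
    Uses the average growth rate between the first and last sample.
    Returns None if the store is not growing.
    """
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples spanning at least one day")
    growth_per_day = (last_size - first_size) / elapsed_days
    if growth_per_day <= 0:
        return None
    return max(0.0, (limit_gb - last_size) / growth_per_day)


# Hypothetical measurements: the store grows 1 GB every 30 days, from 8 GB.
samples = [(date(2004, 1, 1) + timedelta(days=30 * i), 8.0 + i) for i in range(7)]
remaining = days_until_limit(samples)
projected_date = samples[-1][0] + timedelta(days=remaining)
print(f"Limit reached in about {remaining:.0f} days, around {projected_date}")
```

A real store rarely grows at a perfectly constant rate, so treat the result as a planning estimate and recalculate it as new measurements arrive.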
Availability management is the process of ensuring that any IT service consistently and cost-effectively delivers the level of availability required by the customer. Availability management is not concerned only with minimizing loss of service, but also with ensuring that appropriate action is taken if service is lost. In an Exchange Server 2003 environment, you may be concerned about whether the Exchange store service is available, whether an SMTP connector is functioning, and so on. An SLA defines what frequency and length of outages are acceptable, allowing for certain periods when the system is unavailable for planned maintenance and unexpected failures.
If you need to provide reports to your management about systems availability, or if you have financial or other penalties associated with missing availability targets, you must record availability data. Even if you do not have such formal requirements, it is a good idea to at least know how frequently a system has failed in a certain time period (for example, the last 12 months) and how long it took to recover from each failure. This information will help you measure and improve your team’s effectiveness in responding to a system failure. It can also provide you with useful information if there is a dispute.
Measures related to availability are as follows:
- Availability This is typically expressed as the time that a system or service is accessible, compared to the time that it is down. It is typically expressed as a percentage. (You may see references to “three nines” or “five nines”. These refer to 99.9% or 99.999% availability.)
- Reliability This is a measure of the time between failures of a system and is sometimes expressed as Mean (or average) Time Between Failures (MTBF).
- Time to Repair This is the time taken to recover a service after a failure has occurred and is sometimes expressed as Mean (or average) Time To Repair (MTTR).
Availability, reliability, and time to repair are related as follows:
Availability = (MTBF - MTTR) / MTBF
For example, if a server fails twice over a six-month period and is unavailable for an average of 20 minutes, the MTBF is three months or 90 days and the MTTR is 20 minutes. Therefore,
Availability = (90 days - 20 minutes) / 90 days = 99.985%
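As a quick check of the arithmetic, the same calculation can be written as a short script. This is only a sketch; the function name is illustrative, and both values are converted to minutes so that the units match.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(mtbf_minutes, mttr_minutes):
    """Availability = (MTBF - MTTR) / MTBF, expressed as a percentage."""
    if mtbf_minutes <= 0:
        raise ValueError("MTBF must be positive")
    return (mtbf_minutes - mttr_minutes) / mtbf_minutes * 100


mtbf = 90 * MINUTES_PER_DAY  # 90 days between failures, in minutes
mttr = 20                    # 20 minutes to repair
print(f"{availability_percent(mtbf, mttr):.3f}%")  # prints 99.985%
```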
Availability management is the process of ensuring that availability is maximized and kept within the parameters defined in SLAs. Availability management includes the following processes:
- Monitoring Examining when and for how long services are unavailable.
- Reporting Availability figures should be regularly provided to management, users, and operations teams. These reports should highlight trends and identify areas that are doing well and areas that require attention. The report should summarize compliance with targets set in the SLAs.
- Improvement If availability falls under targets defined in the SLAs, or where the trend is toward reduced availability, the availability management process should define what steps are planned. This should include working with other responsible teams to highlight reasons for outages and to plan remedial actions to prevent a recurrence of the outages.
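The monitoring and reporting steps above could be sketched as follows, assuming outages are already recorded with a start and end time. The `Outage` record, the service names, and the 99.9% target are hypothetical placeholders, not values from any particular SLA or monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    service: str
    start: datetime
    end: datetime


def availability_report(outages, period_start, period_end, target_percent):
    """Return {service: (availability_percent, meets_target)} for the period.

    Only services with at least one recorded outage appear in the result.
    """
    period = (period_end - period_start).total_seconds()
    downtime = {}
    for outage in outages:
        # Count only the part of the outage that falls inside the period.
        start = max(outage.start, period_start)
        end = min(outage.end, period_end)
        if end > start:
            seconds = (end - start).total_seconds()
            downtime[outage.service] = downtime.get(outage.service, 0.0) + seconds
    report = {}
    for service, seconds_down in downtime.items():
        percent = (period - seconds_down) / period * 100
        report[service] = (percent, percent >= target_percent)
    return report


# Hypothetical outage log for a 30-day reporting period.
outages = [
    Outage("Exchange store", datetime(2004, 9, 5, 2, 0), datetime(2004, 9, 5, 2, 45)),
    Outage("SMTP connector", datetime(2004, 9, 12, 14, 0), datetime(2004, 9, 12, 14, 10)),
]
report = availability_report(
    outages, datetime(2004, 9, 1), datetime(2004, 10, 1), target_percent=99.9
)
for service, (percent, ok) in report.items():
    print(f"{service}: {percent:.3f}% ({'meets' if ok else 'misses'} target)")
```

Services that miss the target are the ones the improvement step would pick up for remedial planning.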
Capacity and availability measurements are repetitive tasks that are ideally suited to automated tools and scripts such as Microsoft Operations Manager, which is discussed later in this topic.
Sometimes you must introduce changes to your IT environment, such as new technologies, systems, applications, hardware, tools, and processes, and also changes in roles and responsibilities. An effective change management system lets you introduce changes to your IT environment quickly and with minimal service disruption. A change management system brings together the teams involved in modifying a system. An example is the introduction of Microsoft Office Outlook® Web Access. Outlook Web Access is an integrated component of Exchange Server 2003 that uses a Web browser and an Internet or intranet connection to enable you to read your corporate e-mail messages, schedules, and other information that is stored on an Exchange server. Deployment of Outlook Web Access in your organization requires involvement from several teams such as:
- Test Team This team must load-test Outlook Web Access on a test server and provide the instructions to implement Outlook Web Access on the production servers. The test team must evaluate Outlook Web Access by using specified types and versions of popular Web browsers, such as Internet Explorer 6.0.
- Exchange Administrators This team administers the system after the change is deployed in the production environment. They must understand the effect of the changes and incorporate them in their procedures before the changes are put into production.
- Network Team This team is responsible for changes in firewall rules to allow access from the Internet to the Outlook Web Access servers.
- Security Team This team assesses security and minimizes risks. The security team must review known vulnerabilities and ensure that security risks are minimized.
- User Acceptance Team This team is composed of users who are willing to test the system and offer feedback for improvements.
The change management process defines the responsibilities of each team and schedules the work to be performed, incorporating checks and tests where they are required. Change controls will vary depending on the complexity and expected effect of a change. They can vary from automatic approval of minor changes, to change review meetings, to full project-level reviews. To illustrate this better, the groups of changes are discussed in this section.
- Major Changes Major changes have a global effect on the system and may require input from various teams. An example of this is upgrading from Exchange Server 5.5 to Exchange Server 2003. Major changes affect many different teams and perhaps different systems. The change management process may follow a similar procedure to the example discussed earlier about deploying Outlook Web Access in your organization, but will probably include one or more change review meetings to inform the teams that will be involved in the change or be affected by the change.
- Significant Changes Significant changes require significant resources to plan, build, and implement. Appropriate change controls should be introduced to ensure that the effect of the change is understood, deployment procedures are tested, and the rollback and contingency plans are ready. An example of a significant change is deploying a new service pack.
- Minor Changes Minor changes do not significantly affect the IT environment, for example, modifying certain Exchange system policy settings.
- Standard Changes Standard changes are performed regularly and are well understood and documented. Examples include creating a new mailbox or changing the scope of backups. Regular changes should be documented in standard operating procedures (SOPs), but they do not require change controls. For example, a procedure for creating a new mailbox may state that all new mailboxes will have a storage limit of 100 megabytes (MB), will have IMAP and Outlook Web Access enabled, and all other client protocols disabled. The change management process should review any changes to the procedure, but should not, for example, be involved in creating every mailbox.
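One way to picture the boundary between a standard change and one that needs review is to encode the SOP defaults and flag any request that deviates from them. The sketch below uses the mailbox values from the example SOP above; the class, function, and protocol labels are hypothetical and do not correspond to any Exchange API.

```python
from dataclasses import dataclass

# Defaults from the example SOP in the text; all names here are illustrative.
SOP_STORAGE_LIMIT_MB = 100
SOP_ENABLED_PROTOCOLS = frozenset({"IMAP", "Outlook Web Access"})


@dataclass(frozen=True)
class MailboxRequest:
    alias: str
    storage_limit_mb: int = SOP_STORAGE_LIMIT_MB
    enabled_protocols: frozenset = SOP_ENABLED_PROTOCOLS


def sop_deviations(request):
    """List the ways a request differs from the SOP; empty means standard."""
    deviations = []
    if request.storage_limit_mb != SOP_STORAGE_LIMIT_MB:
        deviations.append(
            f"storage limit {request.storage_limit_mb} MB, "
            f"SOP says {SOP_STORAGE_LIMIT_MB} MB"
        )
    extra = request.enabled_protocols - SOP_ENABLED_PROTOCOLS
    missing = SOP_ENABLED_PROTOCOLS - request.enabled_protocols
    if extra:
        deviations.append(f"extra protocols enabled: {sorted(extra)}")
    if missing:
        deviations.append(f"SOP protocols disabled: {sorted(missing)}")
    return deviations


standard = MailboxRequest(alias="jsmith")
custom = MailboxRequest(
    alias="archive",
    storage_limit_mb=500,
    enabled_protocols=frozenset({"IMAP", "POP3"}),
)
for request in (standard, custom):
    issues = sop_deviations(request)
    route = "standard change" if not issues else "needs change review"
    print(request.alias, "->", route, issues)
```

Requests that match the SOP proceed without change controls, while any deviation is routed to the change management process.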
The following example of change management examines how different teams interact and the actions that are performed when a new service pack is deployed. These actions are organized and managed by the change management process.
- Raise a change request The security team has assessed the latest service pack and confirmed that it resolves a possible vulnerability in the production system. The team raises a Change Request to have the new service pack applied to all Exchange servers.
- Service pack release notes review The Exchange administrator team reviews the service pack release notes to identify the effect on the system.
- A series of lab tests is done The Exchange administrator team must perform test updates on a server in a non-production environment to decide whether the service pack can be applied successfully without affecting any of the installed applications and server systems. If there are third-party or internally-created applications that interface with Exchange in a production environment, these should also be tested. These tests can also be used to estimate the time required to perform the upgrades.
- Users are informed of the outage The Exchange administrator team or user help desk informs all affected users about the planned maintenance cycle and how long the server will be unavailable.
- A full backup of the Exchange store is performed before the upgrade The Exchange administrator team must ensure that there is a valid backup in place to be able to revert to the original system state if the service pack installation fails. It is recommended that the backup be restored to a standby server to have this system readily available if there are problems.
- The service pack is deployed The Exchange administrator team does the installation during the planned maintenance cycle.
It is recommended that you implement a procedure for scheduling changes to avoid disruptions in overlapping sections of your work. For example, two teams may both be planning a minor change to a system. One team may be applying a service pack while another team is installing a custom form for an expenses claim application that runs under Exchange Server 2003. Neither team is affected by the changes that the other team is planning and they may not necessarily know about what changes the other team is planning. If both changes occurred at the same time, there could be problems implementing the changes. Also, if there are issues after the changes have been applied, for example if the expense claim application fails, it may be difficult to decide which change should be rolled back. There should be regular maintenance periods set up between IT and management to test the changes and accept them.
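A minimal sketch of how a change schedule could be checked for clashes follows. It assumes each planned change records a team, a target system, and a maintenance window; the `PlannedChange` type, the server names, and the dates are hypothetical and are not part of any Microsoft tool.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class PlannedChange:
    """One scheduled change; all field names are hypothetical."""
    team: str
    system: str
    start: datetime
    end: datetime


def find_conflicts(changes):
    """Return pairs of changes on the same system whose windows overlap."""
    conflicts = []
    for a, b in combinations(changes, 2):
        same_system = a.system == b.system
        # Two windows overlap when each one starts before the other ends.
        overlaps = a.start < b.end and b.start < a.end
        if same_system and overlaps:
            conflicts.append((a, b))
    return conflicts


# Illustrative schedule: a service pack and a custom form planned for the same server.
schedule = [
    PlannedChange("Exchange admins", "EXCH01",
                  datetime(2004, 11, 6, 22, 0), datetime(2004, 11, 7, 2, 0)),
    PlannedChange("Application team", "EXCH01",
                  datetime(2004, 11, 6, 23, 0), datetime(2004, 11, 7, 0, 0)),
    PlannedChange("Application team", "EXCH02",
                  datetime(2004, 11, 6, 23, 0), datetime(2004, 11, 7, 0, 0)),
]

for first, second in find_conflicts(schedule):
    print(f"Conflict on {first.system}: {first.team} and {second.team}")
```

A check like this only flags overlapping windows on the same system. Deciding which change moves to another maintenance period remains a change management decision.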
Configuration management is the process of recording and tracking hardware and software assets and system configuration information. It is generally used to track software licenses, maintain a standard hardware and software build for client computers and servers, and define naming standards for new computers. Configuration management generally covers the following categories:
- Hardware This category tracks what pieces of equipment the IT organization owns, where they are located, and who uses them. This information enables an organization to plan and budget for upgrades, maintain standard hardware builds, report on the value of IT assets for accounting purposes, and help prevent theft.
- Software This category tracks what software is installed on each computer, the version number, and where the licenses are held. This information helps plan upgrades, ensure that software is licensed, and detect the existence of unauthorized (and unlicensed) software.
- Standard Builds This category tracks the current standard build for the client computers and servers and whether the client computers and servers meet this standard. The existence and enforcement of standard builds helps support staff because they are required to maintain only a limited number of versions of each piece of software.
- Service Packs and Hotfixes This category tracks which service packs are tested and approved for use and which computers are up-to-date. This information is important to minimize the risk of computers being compromised and to detect users who have installed unapproved updates.
- System Configuration Information This category tracks the function of a system, the interaction between system elements, and what processes depend on the system running smoothly. For example, a connector to a third-party e-mail system may be configured on a single server. The e-mail system’s dependence on this server should be understood and contingency plans may be required if there is a failure. If a second connector is installed on another server, dependencies and contingency plans will probably change.
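The standard build and service pack categories above lend themselves to a simple comparison of recorded inventory against approved levels. The following sketch assumes the inventory is already available as a dictionary. The computer names, operating systems, and approved levels are invented for illustration.

```python
def find_noncompliant(inventory, approved_levels):
    """Return (computer, reason) pairs for machines that differ from the standard.

    inventory: computer name -> {"os": str, "service_pack": int}
    approved_levels: operating system name -> approved service pack number
    """
    problems = []
    for name in sorted(inventory):
        record = inventory[name]
        approved = approved_levels.get(record["os"])
        if approved is None:
            problems.append((name, "operating system is not in the standard build list"))
        elif record["service_pack"] < approved:
            problems.append((name, "service pack is older than the approved level"))
        elif record["service_pack"] > approved:
            # A newer level may indicate an update that was installed without approval.
            problems.append((name, "service pack is newer than the approved level"))
    return problems


# Hypothetical data for illustration only.
approved_levels = {"Windows XP": 2, "Windows 2000 Server": 3}
inventory = {
    "WS-001": {"os": "Windows XP", "service_pack": 2},
    "WS-002": {"os": "Windows XP", "service_pack": 1},
    "SRV-001": {"os": "Windows 2000 Server", "service_pack": 4},
    "SRV-002": {"os": "Windows NT 4.0", "service_pack": 6},
}

for computer, reason in find_noncompliant(inventory, approved_levels):
    print(f"{computer}: {reason}")
```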
After you determine the purpose of your configuration management exercise and decide what items need managing, you need to implement configuration management by collecting data and reporting data. The simplest approach for small organizations is to collect data manually (number and model of client computers, operating system, software installed) and store it in a Microsoft Office Word or Microsoft Office Excel document. For larger, more complex, and constantly changing systems, the discovery of assets and collection of detailed information must be automated. Decide what information is relevant to your organization and record it in a database.
The configuration management database is a useful tool for support staff and management in the following areas:
- Security Audits The database enables you to identify Exchange servers and client computer systems that need to have hotfixes applied or that have missed the installation of a service pack or the latest antivirus updates.
- Software Installation If you identify client computers that already have Microsoft Office Outlook 2003 installed, this will save time if you are manually deploying Outlook 2003.
- Configuration Information If you maintain an up-to-date list of all Exchange 2003 connectors to mail systems, fax servers, and so on, you will be able to troubleshoot connectivity problems more quickly and effectively.
- Planning Upgrades A capacity review may reveal that additional storage space is required on your Exchange 2003 servers. If each server has an internal RAID controller, but each has a different model and a different number of disks installed, the configuration management database will indicate what type of disk can be installed, how many, and what the upgrade path will be in each case.
There are many tools to discover, audit, and report assets. Some of these tools are discussed in this section.
- Automated Scripts You can write simple scripts to report items like the operating system, service pack level, and existence of software on a specific set of computers. You can tailor these scripts to an organization’s exact requirements; however, the number of scripts required and their complexity can make scripts expensive to create and maintain.
- Automated Tools Depending on the size of your business and your organizational needs, you may want to consider using automated tools. Tools such as Microsoft Systems Management Server (SMS) incorporate standard report templates (such as service pack level) and also enable you to create customized reports, for example, for a custom application. Microsoft Operations Manager (MOM) can also be used to report on hardware and software configurations. For more information about SMS and MOM, see the Systems Management Server and Microsoft Operations Manager sub-topics in “Tools and Technologies for Operating an Exchange 2003 Server Organization” later in this topic.
There are also tools that can be used to record configuration data and make it accessible to the appropriate IT personnel:
- Public Folders These can be useful for storing configuration data, because they can be accessed throughout the organization and can be easily controlled so that only appropriate staff can view or change items.
- Microsoft Windows SharePoint® Services Windows SharePoint Services is the Windows Server 2003 component that helps organizations increase individual and team productivity by enabling them to create Web sites for information sharing and document collaboration. Users can collaborate on documents, tasks, and events and share contacts and other information. Additionally, Windows SharePoint Services enables managers of teams and sites to manage site content and user activity. The SharePoint environment is designed for flexible deployment, administration, and application development.
- Custom Databases For larger organizations, it may be useful to store configuration information in a Microsoft Office Access or Microsoft SQL Server™ database, so more advanced queries can be run on the information. For example, you could list all Windows XP client computers that do not have Service Pack 2 installed.
- Automated Tools Tools such as SMS not only gather the data automatically, but also store it in a central database where it can be used for standard and custom queries and reports.
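The kind of query described under Custom Databases can be sketched as follows. This example uses an in-memory SQLite database through Python's standard `sqlite3` module purely as a stand-in for Access or SQL Server. The `computers` table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a real configuration database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE computers ("
    "name TEXT PRIMARY KEY, os TEXT NOT NULL, service_pack INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [
        ("WS-001", "Windows XP", 2),
        ("WS-002", "Windows XP", 1),
        ("WS-003", "Windows XP", 0),
        ("SRV-001", "Windows Server 2003", 0),
    ],
)


def xp_clients_missing_sp(connection, required_sp):
    """List Windows XP computers whose service pack is below required_sp."""
    rows = connection.execute(
        "SELECT name FROM computers "
        "WHERE os = ? AND service_pack < ? ORDER BY name",
        ("Windows XP", required_sp),
    )
    return [name for (name,) in rows]


print(xp_clients_missing_sp(conn, 2))
```

The same SELECT statement would need only minor changes to run against a production database, provided the inventory is stored in a comparable table.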
Configuration management is closely related to change management. Configuration management identifies the need for change and identifies and records that a change has occurred. For example, the configuration management database can be used to identify servers that require a hotfix. Change management then defines the process for applying the hotfix.
Conversely, if a new software package is rolled out, the change management process should feed this information to the configuration management system. The configuration management tools will probably need to be configured to identify the new software so that they can discover and track where and when the software is deployed.
System administration includes the day-to-day administrative tasks, both planned and on-demand, that are required to keep an IT system operating smoothly. Typically, system administration tasks are covered by written procedures. These procedures ensure that the same standard tools and methods are used by all support staff.
In an Exchange Server 2003 environment, typical system administration tasks include creating mailboxes, backing up and archiving mailboxes and public folder data, monitoring logs, maintaining and recovering mailboxes, and updating antivirus scanners.
There are several resources that help you define what standard procedures are required in your organization and how to do them. For more information about how to administer your Exchange organization, see the Exchange Server 2003 Administration Guide (http://go.microsoft.com/fwlink/?LinkId=21769). Because each organization is unique, you will have to customize and adapt these resources to suit your requirements.
Standard procedures will change and documentation will occasionally need to be revised. As changes are made, the change management process should identify how each change is likely to affect how administrative tasks are performed. Use the change management function to update and control the documentation.
Frequently, change management takes over where system administration finishes. If a task is covered by a standard procedure, it is part of the system administration function. If there is no standard procedure for a task, it should be handled using the change management function.
Roles and responsibilities for doing system administration tasks depend on whether the organization follows a centralized or decentralized model, or a combination.
The Centralized Model In a centralized model, one or several controlled administrative groups maintain complete control of the Exchange system. This administrative model is similar to a data center where all administration tasks are performed by a single information technology group. Roles and responsibilities within the team should be defined according to experience and expertise.
The Decentralized Model Decentralized organizations are located in several geographic locations and have Exchange servers and teams of administrators in different locations. For example, there may be local administration staff and one or more Exchange servers for each office in each country. Alternatively, there may be a cluster of Exchange servers and an administrative team for North America and one for Europe. Sometimes, you will want to ensure that administrators are responsible only for their own geographical area and that they do not have permission to administer other areas. In Exchange Server 2003, you can do this by using the Delegate Control Wizard to assign administrators to specific administrative groups.
An organization must be prepared to deal with unexpected problems and should have a procedure to manage problems from the point at which they are reported until their resolution. Information about how support staff diagnosed a problem should be recorded and used in the future, to avoid unnecessarily repeating work that has already been completed.
Figure 1.5 shows the system troubleshooting process and the interactions with other operations roles.
- Classify and Prioritize This task is typically performed by the service desk. For example, a problem may be grouped as a messaging issue or a hardware issue. The problem is then routed to the appropriate support team for investigation. The rules for determining the priority of a problem, together with the time to respond and time to resolve, are typically defined in the SLA.
- Investigate and Diagnose The appropriate support team diagnoses the problem and proposes changes to resolve the problem. If the solution is simple and does not require change control, the solution can be applied immediately. If the solution is not simple, a request for change should be raised and the proposed work should be managed by the change management process, frequently under a “fast-track” procedure. Any changes that are made should be recorded using the configuration management process.
- Close and Record After testing the resolution, the problem should be closed. If there are lessons to be learned from the problem, an entry should be created in the knowledge base.
- Review and Trend Analysis Periodic reviews of recent problems should be performed to identify problem trends. For example, if your users are experiencing frequent problems with slow logons to their mailboxes, network bandwidth issues may be the cause. Problem resolution times and the effect of any outages on system availability should be reviewed and compared with the SLA. The person who liaises with the customer on service issues, such as an account manager, should be informed of any significant problems.
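As a rough illustration of how priority rules and SLA targets could be applied to a problem ticket, the sketch below derives response and resolution deadlines from a priority level. The priority names and time limits are hypothetical. Real values would come from your own SLA.

```python
from datetime import datetime, timedelta

# Hypothetical SLA targets: priority -> (time to respond, time to resolve).
SLA_TARGETS = {
    "high": (timedelta(minutes=30), timedelta(hours=4)),
    "medium": (timedelta(hours=2), timedelta(hours=24)),
    "low": (timedelta(hours=8), timedelta(days=5)),
}


def sla_deadlines(reported_at, priority):
    """Return (respond_by, resolve_by) for a problem reported at reported_at."""
    respond_within, resolve_within = SLA_TARGETS[priority]
    return reported_at + respond_within, reported_at + resolve_within


def resolution_breached(reported_at, priority, resolved_at):
    """Return True when the problem was resolved after its SLA deadline."""
    _, resolve_by = sla_deadlines(reported_at, priority)
    return resolved_at > resolve_by


reported = datetime(2004, 10, 19, 9, 0)
respond_by, resolve_by = sla_deadlines(reported, "high")
print("Respond by:", respond_by)
print("Resolve by:", resolve_by)
print("Breached:", resolution_breached(reported, "high", datetime(2004, 10, 19, 14, 30)))
```

Recording the breach flag with each closed ticket gives the periodic review a simple figure to compare against the SLA.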
Service desk tools enable staff to record, classify, and prioritize new problems. They will then provide the workflow processes to manage the problem “ticket” through investigation and diagnosis, often by more than one support team. They will frequently provide reports on resolution times and historical trends. They may also include a knowledge base database, which can be used to search through past problems. The Microsoft Knowledge Base is a useful record of support issues that have been encountered by Microsoft. For more information, see the Microsoft Help and Support Web site at http://go.microsoft.com/fwlink/?linkid=14898.
Third-party service desk software is available, but it typically requires customization to suit the organization’s needs, such as the organization of teams, reporting requirements, and measures required by the SLA.
The service level agreement (SLA) is a document that defines what services your customer expects from you. The complexity and content of this document depends largely on whether customers are internal (within your company) or external.
If your customer is external, the SLA may be part of a legal contract with financial incentives and penalties for performance that falls inside or outside defined levels of service. Defining these levels of service should be part of the overall contract negotiation.
As with all contracts, it is important that both parties understand what is expected of them and what to expect. The SLA defines these expectations. The contents of the document should change infrequently and only because of negotiations with the customer.
If your customer is internal, you may still want to define the services expected of operations teams and of IT systems. The SLA may be created by the operations staff and intended as a set of goals for the availability of IT services within your organization. Alternatively, performance levels may be set by management and used as benchmarks when assessing staff performance.
Service level agreements include components that define criteria of minimum levels of availability, support, and capacity.
- Availability Define the hours and the operating systems on which e-mail and other Exchange services will be available (this may include handheld PDAs and mobile phones). Any routine maintenance that affects service availability should be defined. Define external factors that affect service, for example the loss of Internet connectivity.
- Support Define the hours when support for a system will be available. Specify methods for customers to contact support staff, how incidents are grouped, and target time to respond and to resolve the incident. Define frequency and content of feedback to the customer.
- Capacity Define the maximum allowed size of a user’s mailbox, together with steps to take if the limit is exceeded. Define the maximum allowed time to do standard tasks, such as the time to retrieve a document from a public folder. Define the maximum number of users and agree to a process to follow to increase capacity if more users are added.
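An availability criterion is easier to review when it is expressed as a number. The sketch below computes availability as the share of scheduled service time that was not lost to outages and compares it with a target. The 99.9 percent target and the outage figure are illustrative assumptions, not values from this topic.

```python
def availability_percent(scheduled_minutes, outage_minutes):
    """Availability as the percentage of scheduled time the service was up."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled_minutes must be positive")
    if not 0 <= outage_minutes <= scheduled_minutes:
        raise ValueError("outage_minutes must be between 0 and scheduled_minutes")
    return 100.0 * (scheduled_minutes - outage_minutes) / scheduled_minutes


def meets_target(scheduled_minutes, outage_minutes, target_percent):
    """Return True when measured availability reaches the SLA target."""
    return availability_percent(scheduled_minutes, outage_minutes) >= target_percent


# A 30-day month of round-the-clock service with 90 minutes of unplanned outage.
minutes_in_month = 30 * 24 * 60
measured = availability_percent(minutes_in_month, 90)
print(f"Availability: {measured:.2f}%")
print("Meets 99.9% target:", meets_target(minutes_in_month, 90, 99.9))
```

Whether planned maintenance counts as an outage depends on how the SLA defines routine maintenance, so the scheduled minutes should reflect that definition.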
The Microsoft Operations Framework (MOF) model is composed of many service management functions. Documentation about how and when tasks are performed can be shared with members of the same team or with other teams. The method of storing and sharing documentation can vary according to the type of function. For example, the procedures for system administration may be stored as Word documents because they are likely to be printed and referenced frequently. Configuration management information may be automatically generated and stored in a database for easy searching and indexing. Some documentation may be sensitive and should be restricted. An example of sensitive documentation is a document describing security measures to prevent e-mail spoofing, spam, viruses, and attacks from malicious users. These documents should be made available only to the appropriate people. A mechanism for version control has to be in place so that, when documentation changes, old copies in circulation can be replaced.
A documentation management system acts as a central repository for documents, ensuring that only the latest revision of a document is available. You can also consider archiving the older version of the document for reference purposes. Microsoft SharePoint® Portal Server is one of many appropriate applications for managing documents. To prevent the accidental use of an out-of-date procedure, discourage the IT staff from using local copies of documents. Documents can be restricted and viewed or edited only by people with permission to do this. Draft documents can be held pending approval, for example when waiting for an associated change request to be approved.
Several tools and management functions have been discussed that are suited to using databases. The configuration management process is likely to use automated processes that store large amounts of data that require indexing and searching. Support staff may search a database of past problems and resolutions when troubleshooting new problems.
It is likely that there will be different databases being used for different purposes. Decide if these databases should be linked or consolidated. For example, if the service desk identifies several problems with a common theme (such as new software causing a problem with a particular network card), they can query the configuration database to predict how many computers might be affected.
There are many IT services that should be monitored automatically and in real-time. Also, there could be critical situations in which operations staff must be alerted immediately, for example, a message queue backlog. If it were to be discovered only during a manual check at the end of a working day, the delay in mail getting through could have serious implications for a business.
There are many more complex monitoring tasks that are not commonly automated and these should be covered by regular manual checks. For more information about the need for regularly documented checks and maintenance tasks, see "System Administration" earlier in this topic. The following are lists of some of the standard tasks relevant to Exchange Server 2003. Use these lists as a basis for generating standard procedures for your organization. These lists will be discussed in more detail later in this topic.
It is recommended that the following tasks and procedures be performed daily:
- Examine the Windows Event Logs for Exchange warnings and errors The Exchange server event logs should be checked for unexpected warnings and errors. You can do this manually on each Exchange server or by using a tool like Microsoft Operations Manager (MOM), which can consolidate logs and filter out certain entries.
- Check Backup Jobs Ensure that the previous night’s backup jobs have run and investigate any errors or warnings. There should be a procedure for media rotation, labeling, and storage, according to the backup strategy being used. If applicable (based on the type of the backup that was run), determine whether the transaction logs have been flushed from the disk as part of the backup process.
- Check Performance Monitors You may be using Windows Performance Monitor or more advanced tools such as MOM to check key performance indicators, such as free disk space and message queue lengths for an overall view of the state of the server or system. Use the alert feature of these tools to set up warnings for any sudden changes or problems and set baselines that you can modify with the growth of your organization.
- Check Intrusion Detection Logs If you have dedicated intrusion detection software or if your firewall produces logs of intrusion attempts, review the logs for the previous day, investigating repeated authentication attempts and other suspicious activity.
- Check Antivirus Updates Check that the automatic antivirus signature update has been working on each Exchange server and/or gateway and that all the signatures are up-to-date. If you manually update antivirus signatures, update them daily.
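Some of the daily checks above reduce to comparing a reading with a threshold or a baseline. The following sketch shows that idea for free disk space and message queue length. It is not a substitute for Performance Monitor or MOM. The 15 percent threshold, the tolerance factor, and the sample queue reading are assumptions, and the script does not read Exchange queues itself.

```python
import shutil


def low_disk_space(total_bytes, free_bytes, min_free_fraction=0.15):
    """Return True when the free fraction of a volume drops below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def queue_backlog(queue_length, baseline_length, tolerance=2.0):
    """Return True when a queue is longer than tolerance times its baseline."""
    return queue_length > baseline_length * tolerance


# shutil.disk_usage reports total, used, and free bytes for the volume holding the path.
usage = shutil.disk_usage(".")
if low_disk_space(usage.total, usage.free):
    print("WARNING: free disk space is below the threshold")

# The queue length would come from your monitoring tool; 120 is a made-up reading.
if queue_backlog(queue_length=120, baseline_length=40):
    print("WARNING: message queue is well above its baseline")
```

Keeping the baseline as an explicit parameter makes it easy to raise as the organization grows.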
It is recommended that the following tasks and procedures be performed weekly:
- Archive Event Logs If event logs are not configured to overwrite events as required, they must be regularly archived and purged. This is especially true of security logs, which may be required when investigating attempted security breaches.
- Check for Security Updates Identify any new service packs, hotfixes, or updates. If appropriate, test these in a test lab and use the change control procedures to arrange for deployment to the production servers.
- Review SLA Performance Figures Check the key performance data for the previous week. Review performance against the requirements of the SLA. Identify trends and items that have not met their targets.
- Check Public Folder Replication Check that public folder replication is up-to-date. If replication is failing, users might not be able to access data, or they may be accessing data from remote sites, resulting in gradually increasing WAN traffic.
- Archive Data Archive data to CD, DVD, tape, or similar media. After a user has left and depending on your organization’s policy, you may have to leave the mailbox for a period of time and then archive it to maintain a reasonable Exchange store size.
- Environmental Tests Air conditioning, temperature and humidity monitors, and physical security measures should be periodically checked and maintained.
It is recommended that the following tasks and procedures be performed monthly:
- Security Checks Depending on the level of required security, it may be appropriate to perform regular audits of security, including firewall rules, user rights, group membership, delegate rights, and so on.
- Capacity Planning Review capacity figures for the previous month and produce a plan for any upgrades that may be required in the coming months to keep the system operating within limits specified by the SLA.
- Disaster Recovery Test Do a system recovery for a single server to test hardware. This will simulate a complete hardware failure for one server and ensure that the resources, plans, and data are available for recovery. Try to rotate each month, so that failure of a different server or other piece of equipment (for example, a mail relay server, front-end Exchange server, back-end Exchange server, or firewall) is tested every time.
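One simple way to rotate the monthly recovery test is to step through a fixed list of equipment by calendar month. The sketch below assumes the list is maintained by hand. The entries echo the examples above and are not a prescribed set.

```python
from datetime import date

# Hypothetical rotation list; replace with the equipment in your own environment.
RECOVERY_TARGETS = [
    "mail relay server",
    "front-end Exchange server",
    "back-end Exchange server",
    "firewall",
]


def recovery_test_target(day, targets=RECOVERY_TARGETS):
    """Pick the equipment to test in the month containing day."""
    # Count months on a continuous scale so the rotation carries across years.
    months_elapsed = day.year * 12 + (day.month - 1)
    return targets[months_elapsed % len(targets)]


for month in range(1, 5):
    test_day = date(2005, month, 1)
    print(test_day.strftime("%Y-%m"), "->", recovery_test_target(test_day))
```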
The following tasks are performed as-required, but are frequently also covered by standard procedures:
- New Users and Leavers New users typically require a user account, a mailbox, certain rights and group memberships, possibly an e-mail copy of the organization’s IT and security procedures, and so on. For this to occur quickly, the exact procedure should be documented. People who leave the organization must have access to their mailbox and other systems revoked (often urgently). You may require a policy to define what should be done with e-mail destined for the user (for example, whether it should be re-routed or rejected). You will also need a procedure to explain what happens to a user’s Exchange data after they leave.
- Public Folder Creation You can grant users permission to create some public folders, but other folders (especially top-level folders) should be created by administrators only. A procedure should define who can make requests and what permissions should be applied.
- Mailbox Recovery Recovery of an entire mailbox can be done by using the Mailbox Recovery Center. You can do this quickly and safely if it is standardized in a procedure.
- Full Security Audit This may be performed regularly, in response to an upgrade or redesign of the messaging system, or in response to an attempted (or successful) security breach. The procedure may involve port scans on servers and firewalls, audits of security fixes, and third-party penetration tests.
- Update Performance Baselines Performance baselines should be updated after an upgrade or configuration change. Baselines will be used to measure performance changes and to detect problems that affect system performance.
- Database Maintenance Use disk defragmentation to perform database maintenance. Defragmenting your hard disks helps increase disk performance and helps ensure that your Exchange servers run smoothly and efficiently.
- Other Database Maintenance Other database maintenance can be categorized under system troubleshooting. It is useful to have a procedure about how to use Isinteg.exe, Eseutil.exe, and other standard tools in response to specific problems.
The basic set of tools for administering Exchange servers and users includes the Windows Server 2003 administrative tool pack and the Exchange Server 2003 system management tools. This section outlines some of the tools for managing the operations of an Exchange Server 2003 organization. These management tasks and tools are divided into six groups:
- Active Directory and Permissions Management Exchange Server 2003 is tightly integrated with, and depends on Active Directory. User attributes that pertain to Exchange servers (e-mail address, ability to connect using POP3, mailbox server, and so on) are stored as user attributes in Active Directory. Therefore, an important tool for managing Exchange users is the Active Directory Users and Computers MMC snap-in.
- Security Updates and Software Updates Messaging systems can be especially vulnerable to malicious attacks because most messaging systems are connected to the Internet and must be able to accept unsolicited connections from unknown systems. It is very important to apply all security updates to servers exposed to public networks as soon as they are available. Ensure that you test the security updates in a test environment before you deploy them in your production environment. To verify that your servers are up-to-date and report back on missing service packs and hotfixes for the operating system and applications, you can use Microsoft Baseline Security Analyzer (MBSA). MBSA, which is available as a free download from Microsoft, should be used to do audits on systems. For more information, see the Microsoft Trustworthy Computing: Security Web site (http://go.microsoft.com/fwlink/?LinkId=26388).
Note: MBSA looks for security updates on Exchange Server 5.5 and above. It will not check how Exchange is configured.
Conversely, Software Update Services (SUS) is a tool for automatically deploying security updates and other necessary updates. It is especially useful for updating workstations, but can also be used to update servers. SUS automatically downloads each update over the Internet as it is released by Microsoft. It allows you to test a new update before automatically deploying it. Using Active Directory policies, you can control which computers receive updates and how the updates are applied. Workstations can be configured to download and install updates and restart as needed without manual intervention. Servers are typically configured to download updates only. An administrator must manually install the updates and restart the server at a convenient time. For more information about SUS, see http://go.microsoft.com/fwlink/?LinkId=35215.
- Security Management Use the IIS Lockdown Tool to secure Internet Information Server 4.0, 5.0, and 6.0. The IIS Lockdown (IISlockd.exe) tool is required for Windows® 2000 Server only. In Windows Server™ 2003, IIS Lockdown is a core part of Internet Information Services (IIS). For more information about how to use the IIS Lockdown tool for a server running Exchange Server 2003 on Windows 2000, see Security Operations Guide for Exchange 2000 Server (http://go.microsoft.com/fwlink/?linkid=11906). The IIS Lockdown Tool is composed of two parts. The first part changes configuration settings within IIS to prevent common attacks against Web servers. It also provides the option to remove or disable unused services and Web pages. The second part is referred to as URLScan, which prevents malicious and malformed requests from Web browsers. There are templates provided within IIS Lockdown for various server roles that disable only the services that are not required for a specific server role. When you are running IIS Lockdown on an Exchange server, make sure that you use the template for the appropriate version of Exchange. For more information about the IIS Lockdown Wizard, see Microsoft Knowledge Base article 325864, "HOW TO: Install and Use the IIS Lockdown Wizard" (http://go.microsoft.com/fwlink/?LinkId=3052&kbid=325864).
- Microsoft Operations Manager In medium to large enterprises, or in organizations where high availability is important, consider using an automated tool to track the performance and availability of servers and for capacity management. These tools can be configured to report on long-term trends such as a gradual drop in response to MAPI clients as the number of clients increases. They can also be used to alert support staff if a failure occurs or there is a rapid degradation of service. Microsoft Operations Manager is an application for monitoring and alerting, managing event logs, reporting, and trend analysis. It is used to monitor servers and automatically logs and analyzes events and statistics from many computers. For more information about Microsoft Operations Manager, see the Microsoft Operations Framework Web site (http://go.microsoft.com/fwlink/?linkid=21640).
- Systems Management Server Systems Management Server is a tool that enables you to automate configuration management tasks by gathering information about hardware and software assets, storing this data in a SQL Server database, and allowing custom queries and reports to be generated. Besides doing the reporting duties of configuration management, Systems Management Server can deploy software, service packs, and hotfixes. Systems Management Server can be configured to automatically detect devices on a network and to deploy agents to each server or workstation.
Each agent gathers configuration data from the device and passes the information back to the central database. Systems Management Server can therefore be used to automate many of the regular changes discussed in the change management section. Windows Server 2003 software deployment services can be used to deploy software to computers or users across an organization or in a specific organizational unit (OU). With the flexibility of Systems Management Server you could, for example, deploy software only to clients within a specific OU, running Windows XP, with at least 2 GB of free disk space, and at least 256 MB RAM. For more information about Systems Management Server, see http://go.microsoft.com/fwlink/?LinkId=34976.
- Exchange Server 2003 Tools Standard tools to manage an Exchange Server 2003 organization include Exchange System Manager and the Exchange Migration Wizard. For more information about using Exchange System Manager to do standard tasks, see the topic, "Daily Operations Tasks." The Exchange Migration Wizard is used to migrate user mailboxes from other messaging systems, such as Exchange Server 5.5, Exchange 2000 Server, Microsoft Mail for PC Networks, or Lotus cc:Mail to Exchange Server 2003.
- Automation Tools and Scripts Many administrative tasks can be automated by using scripts. Microsoft Windows Script Host (WSH) is the host application that enables scripts to be run and is installed by default on Windows Server 2003 and Windows XP Professional. Of the scripting languages, VBScript (a derivative of Visual Basic®) is the most commonly used for administrative tasks, where speed of writing is more important than elegance and efficiency of the code. There are many examples of useful scripts on the Web that can be customized to suit your needs. For useful script examples, see the Script Center on the Microsoft TechNet Web site (http://go.microsoft.com/fwlink/?linkid=33284).