Deploying and Supporting Enterprise Search
Using SharePoint Server 2007 to Help Employees Locate Information and People
at Microsoft
Technical White Paper
Published: July 19, 2007
Situation
The amount of information stored on the Microsoft intranet is growing exponentially.
Locating that information has become an increasing problem.
Solution
Microsoft IT has deployed the Enterprise Search feature of Microsoft Office SharePoint
Server 2007 to help users locate relevant information faster and easier than ever
before.
Benefits
- Improved ability to locate relevant content on the Microsoft intranet
- Improved ability to publish content that can be located easily
- Improved ability to find employee and subject matter contacts
- Improved integration with line-of-business applications
- Improved consistency in search functions and search results
- Reduced effort to administer enterprise search
Products & Technologies
- Microsoft Office SharePoint Server 2007 Enterprise Edition
- Microsoft SQL Server 2005
- Windows Server 2003
Executive Summary
There are more than 120,000 employees, contractors, and vendors at Microsoft, and
they create and store an incredible amount of digital information in the form of
Microsoft® Office documents, pictures, videos, Web pages, and other formats.
The information is stored on servers throughout the Microsoft network. In many cases,
several people could use a document or file to complete assignments if only it were
accessible or advertised for use.
For example, an employee on the Office User Assistance team might be developing
user documentation for Microsoft Office Word. At the same time, another employee
might be creating marketing collateral for a specific vertical industry marketing
campaign that includes Office Word as a part of the solution. If the employee creating
the campaign is unaware of the user documentation created by the other employee,
the documentation is underutilized. Enterprise search can help an employee take
advantage of other employees' knowledge and information. This is the business value
of enterprise search. Considering the size of Microsoft and the diversity of jobs,
including product development, marketing, operations, sales, and administration,
the volume of information involved is overwhelming.
Files and information are stored on a variety of internal content sources, such
as document libraries, intranet Web sites, within Exchange Server 2007 data stores,
file shares, databases, and local hard disk drives. To make the situation even more
challenging, employees continually add new types of content every day.
The problem is how to identify and crawl the information that Microsoft has today
and to keep the index up to date with all the new content and content types in the
future in a way that is useful to a variety of users. Naturally, maintaining security
while ensuring that the index is current is essential to enterprise search.
The solution is the Enterprise Search feature provided in Microsoft Office SharePoint®
Server 2007. The new enhancements of the Search feature incorporate fast searching
and the Indexing Service. Fast searching builds off the Indexing Service of Office
2003 to create database catalogs of the Office files that are available on a computer's
hard disk, in projects, and in SharePoint Web folders.
Microsoft built the Indexing Service feature from the Index Server technology of
Microsoft Windows 2000 to create a better cataloging experience for fast searching
and to offer the ability to search any Office file for information that the file
contains.
Enterprise Search helps employees to collaborate more easily, reduce duplication
of efforts, and perform job functions more efficiently with more depth than in the
past. With Enterprise Search, employees can more easily find the people, information,
tools, and software necessary to perform their day-to-day job functions. Employees
can be more productive and improve the quality of their work without relying on
others to assist them in completing their tasks.
By using flexible and innovative indexing rules, Office SharePoint Server 2007
can find relevant data so that Microsoft employees can locate the people and information
that they are looking for with greater accuracy. Performance improvements in Office
SharePoint Server 2007 help employees locate people and information faster
than before.
This white paper covers:
- Teams at Microsoft that were involved in the development, deployment, and ongoing
management of the solution.
- Background on Enterprise Search at Microsoft.
- Business values achieved in deploying the Enterprise Search solution.
- Architectural design of the solution.
- Products and technologies used in creating the solution.
- Administration of the shared search services for the solution.
- Administration of indexed sites and content for the solution.
- User experience when using the Enterprise Search solution.
- Migration of Enterprise Search from Microsoft Office SharePoint Portal Server 2003
to Office SharePoint Server 2007.
- The best practices that Microsoft used in developing and deploying the solution.
This document shares the experiences of Microsoft teams in deploying Enterprise
Search at Microsoft. Because of the significant amount of knowledge that these teams
gained, the experience provides relevant guidance to organizations that want to
help improve employee productivity, improve the quality of work produced, reduce
duplication of efforts, and take advantage of the financial investment in digital
assets by deploying Enterprise Search.
This white paper assumes that readers are technical decision makers and are already
familiar with SharePoint Portal Server 2003, Office SharePoint Server 2007,
and Microsoft SQL Server 2005. Many of the principles and techniques described
in this paper can be employed to manage deployment and operations risk within any
organization, and the design considerations for enterprise search can likewise be
applied to almost any enterprise-scale IT environment through Microsoft products.
However, this paper is based on the experience and recommendations of Microsoft
Information Technology (Microsoft IT) as an early adopter. It is not intended to
serve as a procedural guide. Each enterprise environment has unique circumstances;
therefore, each organization should adapt the plans and lessons learned described
in this paper to meet its specific needs.
Note: For security reasons, the sample names of forests, domains, internal
resources, organizations, and internally developed security file names used in this
paper do not represent real resource names used within Microsoft and are for illustration
purposes only.
Introduction
Enterprise search has been a service in one form or another on the Microsoft intranet
for more than a decade. With the release of SharePoint Portal Server 2003,
the landscape for enterprise search at Microsoft began to change. SharePoint Portal
Server 2003 provided a holistic approach to locating information in the Microsoft
intranet and a consolidation of search services through a set of centralized services.
This became the foundation for the current solution.
It is worth noting that although the focus of this paper is on the Enterprise Search
features of SharePoint technologies, Microsoft makes heavy use of other collaborative
features of SharePoint, ranging from document libraries, to lists, to blogs, to
wikis. Over the past 10 years, the variety of collaboration has exponentially increased
the amount of content that Microsoft crawls.
Shared Services Provider
The architecture of Office SharePoint Server 2007 is based on shared service
providers (SSPs). An SSP represents a set of services that can be configured a single
time and shared across many different Office SharePoint Server 2007 portal
sites and Windows SharePoint Services sites. The SSP feature creates groups of shared
services and related shared resources managed on a central administrative console.
Administrators create and configure SSPs so that they are available to multiple
intranet sites within a farm. An administrator assigns each Web application in a
farm to an SSP, and although an administrator can create multiple SSPs for a farm,
an individual Web application or site can be associated with only one SSP.
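The association rule described above (many SSPs per farm, but only one SSP per Web application) can be pictured with a small model. This is a minimal Python sketch for illustration only; the `Farm` class, its methods, and the example URLs are hypothetical and do not correspond to any SharePoint API.

```python
class Farm:
    """Toy model of a farm: several SSPs, each Web application bound to one."""

    def __init__(self):
        self.ssps = set()
        self.assignments = {}  # Web application URL -> SSP name

    def create_ssp(self, name):
        self.ssps.add(name)

    def assign(self, web_app, ssp):
        if ssp not in self.ssps:
            raise ValueError(f"unknown SSP: {ssp}")
        # Assigning again replaces the earlier association, so a Web
        # application is never bound to two SSPs at the same time.
        self.assignments[web_app] = ssp

    def web_apps_for(self, ssp):
        return sorted(app for app, s in self.assignments.items() if s == ssp)


farm = Farm()
farm.create_ssp("SSP-A")
farm.create_ssp("SSP-B")
farm.assign("http://portal.example", "SSP-A")
farm.assign("http://teams.example", "SSP-A")
farm.assign("http://portal.example", "SSP-B")  # moves the portal to SSP-B
```

Storing the association as a mapping keyed by Web application is what enforces the "only one SSP" constraint in this sketch.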
Microsoft IT has data centers that provide enterprise shared services in Redmond
(WA), Dublin, and Singapore. The SSP feature enables Microsoft IT to simplify management
of search services in just three major regional data centers. Three full-time employees
are dedicated to managing the SSPs.
The SSP in Redmond crawls content worldwide on the Microsoft intranet. Sites that
connect to this SSP can create search scopes on their sites that contain results
from content worldwide. The SSPs in Dublin and Singapore index content in their
respective regions and provide search services for Web applications in the Europe,
Middle East, and Africa (EMEA) region and the Asia region, respectively.
Figure 1 illustrates the distribution of shared enterprise search services at Microsoft.
Figure 1. Shared enterprise search services at Microsoft
The data centers in the three locations support the SharePoint services and infrastructures
deployed worldwide at Microsoft. The current SharePoint services and infrastructure
worldwide include:
- More than 350,000 sites and subsites.
- More than 400 geographic locations.
- Approximately 13 terabytes of content stored in the IT-hosted SharePoint deployments.
Unlike most enterprise organizations, Microsoft does not require all SharePoint
sites to be managed by Microsoft IT. Employees can create their own SharePoint sites
at any time. Even so, the SSPs index content on those sites.
Table 1 describes the characteristics of the current enterprise search services
that are located in Redmond.
Table 1. Characteristics of the Enterprise Search Solution in Redmond
Characteristic: Index content
Description:
- Approximately 27 million items are indexed.
- Indexed items include content stored on Office SharePoint Server 2007 sites,
SharePoint Portal Server 2003 sites, Windows SharePoint Services sites, file
shares, Microsoft Exchange public folders, custom Web sites, and structured data
sources.
- Indexed content is gathered from more than 25 content sources, some with hundreds
of individual start addresses (six of which are scheduled for daily incremental
indexing).
Characteristic: User profiles
Description:
- Imported from the Active Directory® directory service.
- Imported from custom data sources through the Business Data Catalog feature in Office
SharePoint Server 2007.
- Profile data supplied by end users is replicated between each of three regions.
Characteristic: Integration
Description:
- Additional Business Data Catalog connections to other line-of-business applications
(like the Microsoft customer relationship management [CRM] system and the in-house
library).
Characteristic: Database and index sizes
Description:
- SSP search database is approximately 340 gigabytes (GB).
- SSP profiles database is approximately 70 GB.
- Search index is approximately 300 GB.
Characteristic: Volume of queries
Description:
- Approximately 500,000 queries per month.
Sites Using Shared Enterprise Search Services
On a daily basis, employees throughout Microsoft use several sites that use enterprise
search services. Table 2 lists a few examples of sites on the Microsoft intranet
that use the shared enterprise search services and a description of each site.
Table 2. Sites That Use the Shared Enterprise Search Services
Site: MSW
Description:
This site is the primary intranet portal for Microsoft. Employees use this site
as the starting place for locating information that relates to their employment
and job function at Microsoft.
Typical usages for this site include locating:
- Campus services information.
- Daily news.
- Internal content and newsletter information.
- Links to internal resources for common tasks such as ordering hardware, software,
and planning travel.
Site: Division-level or department-level portals
Description:
Some examples are:
- ITWeb, an internal portal for technical support.
- LCAWeb, a portal for the Legal and Corporate Affairs group.
- InfoWeb, a portal for sales and marketing information.
- InfoPlus, a portal for employee productivity documentation.
Site: Team sites
Description:
Thousands of sites that are self-provisioned through a sign-up page on the ITWeb
site.
Site: My Sites
Description:
Personal sites available for each employee.
Each of the sites in Table 2 uses a common infrastructure for enterprise search
services. The benefits that Microsoft realized by deploying shared enterprise search
services include:
- Improved consistency in search functions and search results.
- Improved availability of enterprise search.
- Reduction of effort to administer and support enterprise search.
- Reduction of redundant crawling by multiple systems.
Organization
The teams that were involved in the deployment of enterprise search services at
Microsoft include Microsoft IT and the Office SharePoint Server 2007 development
team.
Microsoft IT
Microsoft IT differs from most other IT organizations because its first priority
is to provide as much feedback to the product development groups as possible. Microsoft
IT adopts this priority to help ensure that Microsoft products are thoroughly tested
in an enterprise-sized production environment prior to delivery to customers. For
example, from an operational perspective, this means that Microsoft IT may leave
a server offline longer than another organization might if doing so will provide
valuable data that will help the product development groups improve Microsoft products.
Microsoft IT's second priority is providing excellent service levels to employees.
For enterprise search, Microsoft IT must ensure that the services meet or exceed
the business and technical requirements of the various people that use the service.
The team's third priority is sharing best-practice information with Microsoft customers.
Because Microsoft IT runs software prior to release, it can transfer knowledge about
its experiences to customers (such as this white paper and other documents).
Microsoft IT is responsible for the design, development, deployment, and operations
for enterprise shared services at Microsoft. The following groups within Microsoft
IT have specific responsibilities in managing the enterprise search:
- Information Services team
- Operations team
- Engineering team
Information Services Team
The Information Services team is responsible for the business requirements and management
of the shared services, in addition to developing, testing, and documenting the
solutions. Table 3 lists the teams within the Information Services team and
the responsibilities of each team.
Table 3. Information Services Team Descriptions and Responsibilities
Team: Program Management
Description and responsibility:
This team provides business ownership of the shared services. The team also is responsible
for business requirements, rules, end-user experience, and deployment of new technologies
and features. Responsibilities also include being the liaison to the product development
groups, managing the testing process, managing the budget, and providing project
management reporting. The team also provides tier 2 support and consulting with
site administrators for sites that consume the shared search services.
This team consists of:
- One team lead.
- One individual dedicated to Search and Best Bets. Best Bets are specifically selected
search result URLs identified as highly relevant content based on a combination
of search keywords or synonyms.
- One individual dedicated to People Profiles and My Site experience.
- One individual dedicated to taxonomy and metrics.
- A vendor resource (equal to one or two full-time employees) who provides editorial
support; analysis of end-user query logs for search keyword and Best Bets development;
and the first tier of non-technical end-user feedback and questions.
Team: Solutions Planning, Development and Test
Description and responsibility:
This team is a shared resource that primarily provides services for the primary
corporate portals. However, it also provides consulting for other teams and builds
solutions that can be reused throughout the intranet. This team worked on specific
projects around enterprise search during the course of the migration, but also worked
on other aspects of the deployment.
This 12-person team is responsible for:
- Program management of solutions.
- Development and testing of custom code.
- Customization of Office SharePoint Server 2007 based solutions.
- Creation of proof-of-concept and production code.
- Production code updates after deployment.
Operations Team
The Operations team is a shared resource that supports the shared services, customer
portals, and content hosting. This team is responsible for maintaining the server
infrastructure, incident management, end-user technical support, Helpdesk training,
monitoring, upgrades, patch management, hotfix management, and backup and restore
services. It contains one full-time employee whose primary responsibility is oversight
of any configuration changes on the three regional SSPs.
Engineering Team
The Engineering team is a shared resource that is responsible for designing new
deployments or infrastructure changes, testing, and publishing documentation and
guidance. This includes providing detailed guidance, operations, and configuration
documents to the Operations team.
Office SharePoint Server 2007 Development Team
The Office SharePoint Server 2007 development team provides guidance on the
features and configuration settings to Microsoft IT. In return, Microsoft IT provides
feedback on the use of Office SharePoint Server 2007 in the Microsoft production
environment and feedback on the end-user experience. Much of the feedback that Microsoft
IT provides results in new features or enhancements of the product.
Previous Enterprise Search Services with SharePoint Portal Server 2003
The enterprise search services based on SharePoint Portal Server 2003 were
similar to the deployment illustrated earlier in Figure 1. Just as with the
current solution, the Redmond data center indexed content from worldwide sources
on the Microsoft intranet. The Dublin and Singapore data centers indexed content
and provided search services for regional content in the EMEA and Asia regions,
respectively.
The SharePoint Portal Server 2003 based solution provided improvements (from
previous solutions) that caused a proliferation of SharePoint-based sites for collaboration
within Microsoft. A large portion of employees began creating sites to replace traditional
file shares and to collaborate by using a quasi-social computing method. The server
infrastructure at Redmond for the previous SharePoint Portal Server 2003 based
solution included the following servers:
- Various Web front-end servers. Each of the top-level portals had a different number
of servers based on the scaling requirements.
- Two query servers that each had dual processors and 4 GB of memory.
- Three index servers that each had dual processors and 3 GB of memory.
- Two database servers running Microsoft SQL Server 2000 in an active/passive
cluster configuration.
Despite improvements in SharePoint Portal Server 2003, challenges still existed
with managing the service at the scale of Microsoft, particularly regarding crawl
performance and multiple indexes. A large index could take up to two weeks to complete
crawls, and content did not appear in the search until the crawl was completed and
the index was propagated to the query servers. Challenges also existed in providing
enterprise search for all content, because relevance could be skewed during queries
of multiple indexes of varying sizes.
Microsoft IT collected user surveys on the SharePoint Portal Server 2003 based
enterprise search solution. These surveys assisted Microsoft IT in identifying user
satisfaction and dissatisfaction. This information was in addition to the operations
experience that Microsoft IT gained and the challenges that Microsoft IT encountered
during the support of the solution.
As the SharePoint development team planned the next release of SharePoint, Microsoft
IT worked with that team to:
- Identify new features to improve user satisfaction.
- Improve efficiencies in common administration tasks to further reduce the cost of
operation and support.
- Test code improvements in a real-world environment.
- Validate user interface design.
- Gather employee feedback on search features.
After the release of SharePoint Portal Server 2003, the SharePoint development
team identified ways to:
- Scale the number of items indexed to even larger numbers.
- Reduce the time to index the same amount of content.
- Reduce the response time for returning search results.
Shared Services Architecture
Microsoft provides centralized shared services for many products and services (such
as Microsoft Exchange Server 2007, Active Directory, file services, print services,
and SharePoint-based shared services). Like most organizations, Microsoft is transitioning
from departmentalized services to centralized services in managed data centers.
Office SharePoint Server 2007 provides a number of improvements over SharePoint
Portal Server 2003 for providing shared services. Microsoft specifically designed
Office SharePoint Server 2007 to enable organizations to provide shared collaborative
services and enterprise search services.
Shared Services Provided Within Microsoft
Microsoft IT manages and operates a number of shared services within Microsoft.
These services support all SharePoint sites connected to the centralized Web farms
that Microsoft IT hosts.
The enterprise search service is the primary service provided within Microsoft,
and it is the focus of this paper. This search service is for SharePoint Web sites
that are hosted on the Microsoft shared services (such as MSW, InfoWeb, My Sites,
and other sites throughout the Microsoft intranet).
Other shared services that interact with the enterprise search services include:
- Importing information from Active Directory. Active Directory is the centralized
repository for much of the information about employees. This information can be
imported into Office SharePoint Server 2007 for use by SharePoint sites hosted
in the shared services, in addition to People search.
- Synchronizing user profile information. Microsoft developed custom code to
replicate the user-supplied profile data between regions. When a user makes a change
to his or her profile, the code writes the changes to the SSP store in each region.
Note: This functionality is not a standard feature in Office SharePoint Server 2007.
However, Microsoft developed the custom code by using published application programming
interfaces (APIs), such as the change log and profile store Web services. Microsoft
customers can use the same code in conjunction with a Microsoft Services engagement.
- Maintaining integration of line-of-business applications by using Business Data
Catalog. Business Data Catalog enables the enterprise search services to index
the information in these line-of-business applications. Microsoft IT maintains the
configuration of Business Data Catalog to ensure that the appropriate information
is being indexed.
- Generating usage reports. Office SharePoint Server 2007 provides search
usage reports at both the SSP level and the local site collection level. These reports
give content owners and site administrators information about what information users
are searching for on their sites. For example, reports can tell a content owner
which search result is most popular for information on their site.
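The profile synchronization described in the list above (a change to a user's profile is written to the SSP store in each region) can be pictured with the following minimal Python sketch. It is not the custom code that Microsoft developed and it does not call the change log or profile store Web services; the function name, the in-memory dictionaries that stand in for the regional stores, and the sample user and property are all hypothetical.

```python
def replicate_profile_change(regional_stores, user, changes):
    """Apply one user's profile changes to the store in every region."""
    for store in regional_stores.values():
        # Create the user's record if the region has not seen it yet,
        # then overwrite only the properties that changed.
        store.setdefault(user, {}).update(changes)


# Hypothetical in-memory stand-ins for the three regional profile stores.
stores = {"Redmond": {}, "Dublin": {}, "Singapore": {}}
replicate_profile_change(stores, "user1", {"skills": "enterprise search"})
```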
Different categories of sites connect to the shared services. These site categories
include:
- Self-service sites. These include sites such as My Sites or sites that can
be self-provisioned by employees. There is a large number of these sites, but they
typically require little customization. Users do not pay to create and use these
sites. The Microsoft Helpdesk provides support for these sites.
- Custom sites and portals. These sites typically have higher traffic, have
dedicated Web development teams and dedicated content management teams, and require
customization of search, Web Part, and other services. Microsoft IT has adopted
a tiered level of service for these sites. These tiers range from platinum services,
which allow the highest level of customization and support, to the silver level,
where fewer customizations are allowed. Cost models are associated with these levels,
and teams must pay for the hosting, consulting, and support services, depending
on what level of customizations and service they require.
Search Services within Microsoft
Microsoft runs the enterprise search services on an SSP that is independent from
the farms that host portals and team sites. The SSP at Redmond crawls content worldwide.
A user looking for specific information about a topic regardless of where the information
is stored would use the search services provided by the Redmond data center (typically
by going to the search features on MSW).
In addition to the search services in Redmond, Microsoft has deployed search services
in other regions. Microsoft took this approach because shared search services are
not supported over wide area network (WAN) links. Also, having search services available
in a local region enables faster performance for users in that region. This approach
also enables content uploaded in a region to be indexed more quickly for use within
a search on a local site.
Users who want to search content within their region perform searches by using the
regional search services within the regional data centers. For example, a user in
France looking for as many results as possible within France or Spain would use
the search on a site connected to the shared services provider in the Dublin regional
data center.
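The choice between the worldwide and regional search services can be summarized as a simple lookup. The Python sketch below is an illustration under stated assumptions: the function name, the region keys, and the `scope` values are hypothetical. In practice the choice is made by which site a user searches from, not by code like this.

```python
# Regional SSP locations as described in this paper.
REGIONAL_SSP = {"EMEA": "Dublin", "Asia": "Singapore"}


def ssp_for_search(user_region, scope):
    """Return the data center whose SSP would serve the search."""
    if scope == "worldwide":
        # Only the Redmond SSP crawls content worldwide.
        return "Redmond"
    # Regional scope: use the local SSP if the region has one.
    return REGIONAL_SSP.get(user_region, "Redmond")


# A user in France (EMEA) searching for regional content.
print(ssp_for_search("EMEA", "regional"))
```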
Server Infrastructure at Redmond
The server infrastructure at Redmond supports the SSP services in the Office SharePoint
Server 2007 based solution. Larger farms, such as the MSW server farm and the
hosted http://team server farm, have computers that are Web front ends and are dedicated
crawl target servers. A crawl target server is a server in a Web farm that
is removed from the pool of Web front-end servers that handle service requests so
that the index services can crawl the content without adversely affecting user response
time.
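The idea of a crawl target server can be sketched as splitting a farm into two pools: one that answers user requests and one that the index server crawls. This Python sketch is illustrative only; the function and the server names are hypothetical, and it does not show how the split is actually configured in Office SharePoint Server 2007.

```python
def split_farm(servers, crawl_targets):
    """Return (user_pool, crawl_pool) for a Web farm.

    Servers listed in crawl_targets are taken out of the pool that
    handles user requests so that crawling does not slow users down.
    """
    user_pool = [s for s in servers if s not in crawl_targets]
    crawl_pool = [s for s in servers if s in crawl_targets]
    return user_pool, crawl_pool


# Hypothetical three-server farm with one dedicated crawl target.
user_pool, crawl_pool = split_farm(["web1", "web2", "web3"], {"web3"})
```

Passing an empty set of crawl targets returns every server in the user pool, which corresponds to adding the crawl target back into the Web server farm.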
Table 4 describes the server infrastructure at Redmond for the Office SharePoint
Server 2007 based solution.
Table 4. Redmond Office SharePoint Server 2007 Infrastructure
Server: Web front end
Number: Varies
Description and server configuration:
Computers that host the Web sites belong to server farms that are separate from
the SSP farm.
For example, MSW has two active computers as Web front-end servers. A third computer
serves as a dedicated crawl target server that can be added into
the Web server farm if required.
Server: Query server
Number: 3
Description and server configuration:
Two processors (one 64-bit processor with two cores in each processor)
8 GB of memory
Disk configuration:
- Operating system, 50 GB (redundant array of independent disks [RAID] 1)
- Program files, 229 GB (RAID 1)
- Index, 558 GB (RAID 1+0)
Server: Index server
Number: 1
Description and server configuration:
Eight processors (four 64-bit processors with two cores in each processor)
16 GB of memory
Disk configuration:
- Operating system, 50 GB (RAID 1)
- Program files, 18 GB (RAID 1)
- Index, 300 GB (RAID 1+0)
- SSP Dump, 600 GB (RAID 1+0) (used for backups)
Server: Database server
Number: 2
Description and server configuration:
Clustered through Windows Clustering
Eight processors (two 64-bit processors with four cores in each processor)
10 GB of memory
Disk configuration:
- Operating system, 16 GB (RAID 1)
- Program files, 9 GB (RAID 1)
- Data, 300 GB (RAID 1+0)
- Logs, 100 GB (RAID 1+0)
- Temporary database (TempDB), 26 GB (RAID 1+0)
In addition to the production farm environment, Microsoft maintains a pre-production
farm environment that has two query servers, one index server, and one database
server. All servers have similar configurations to their production environment
counterparts.
Microsoft IT uses the pre-production environment for testing purposes, such as deploying
updates and other configuration changes before moving them to the production environment.
The pre-production environment is also available for custom portals while they are
under development. This enables developers to test connections to shared services
and verify custom code.
Crawls are not scheduled on a regular basis in the pre-production environment. Instead,
Microsoft IT runs the crawls on an as-needed basis. Microsoft IT also synchronizes
the pre-production environment with the production environment by occasionally restoring
backups taken from the production farm.
Server Infrastructure in EMEA and Asia
The infrastructure used for the search services in the Dublin (EMEA) and Singapore
(Asia) regions is configured similarly to Redmond, but on a smaller scale. Table 5
describes the server infrastructure in EMEA and Asia for the Office SharePoint Server 2007 based
solution.
Table 5. EMEA and Asia Office SharePoint Server 2007 Infrastructure
Server: Web front end
Number: Varies
Description and server configuration:
Computers that host the Web sites belong to server farms that are separate from
the SSP farm.
Server: Query server
Number: 2
Description and server configuration:
Two processors (one 64-bit processor with two cores in each processor)
8 GB of memory
Disk configuration:
- Operating system, 50 GB (RAID 1)
- Program files, 229 GB (RAID 1)
- Index, 558 GB (RAID 1+0)
Server: Index server
Number: 1
Description and server configuration:
32 processors (four 64-bit processors with eight cores in each processor)
8 GB of memory
Disk configuration:
- Operating system, 50 GB (RAID 1)
- Program files, 18 GB (RAID 1)
- Index, 300 GB (RAID 1+0, storage area network [SAN])
- SSP Dump, 1.56 terabytes (RAID 5, SAN) (for backups)
Server: Database server
Number: 2
Description and server configuration:
Clustered through Windows Clustering
Eight processors (two 64-bit processors with four cores in each processor)
10 GB of memory
Disk configuration:
- Operating system, 16 GB (RAID 1)
- Program files, 9 GB (RAID 1)
- Data, 300 GB (RAID 1+0, shared SAN)
- Logs, 100 GB (RAID 1+0, shared SAN)
- TempDB, 26 GB (RAID 1+0, shared SAN)
Administration of Shared Enterprise Search Services
One of the primary advantages to shared enterprise search services
is that a significant portion of the day-to-day administration is performed centrally.
Because of these centralized administration tasks, Microsoft is able to dramatically
reduce the amount of effort (and cost) to maintain and operate enterprise search
services. Another advantage is a lower overall network load, because the same content
is no longer crawled by multiple indexing servers.
Administration of Shared Enterprise Search Services in Redmond
Redmond is the largest provider of shared enterprise search services for Microsoft (as measured
by the amount of information indexed and the number of sites consuming the search
service), so its level of administration is representative of what other organizations
of similar size and complexity would likely perform. Microsoft validates many
of the administrative procedures and processes first in Redmond, and then uses them
in the regional data centers.
The following sections provide details on the configuration of the Redmond SSP.
The information about the configuration settings provides a context for how Microsoft
IT manages the SSP.
Note: Although the configuration settings provided in Office SharePoint Server 2007
are flexible enough to enable the level of management described in the following
sections, this level of management is not required in every enterprise scenario.
Configuring Content Sources and Index Crawls
Content sources are the foundation of indexing. As an essential part of administering
the search service, Microsoft IT reviews and revises the content sources on a regular
basis. The configuration of content sources determines the type of indexing performed
(for example, full or incremental index) and the schedule for indexing the content.
Maintaining Content Sources
A content source can specify one or more start addresses (URLs), but each start
address within a content source references the same type of content. For example,
a content source for crawling Web sites can contain many URLs for Web sites, but
it cannot contain a URL to an Exchange public folder.
The overall administrative goal is to minimize the number of content sources required.
Microsoft reduced the number of content sources from more than 200 (in the SharePoint
Portal Server 2003 based solution) to 25 in the Office SharePoint Server 2007 based
solution. This reduction was in large part due to a set of new search scope features
in the 2007 version that enable more flexibility in scope creation and no longer
require content sources to be tagged with a specific source group. Microsoft continues
to regularly review content sources to look for methods of consolidating content
sources or having new content crawled by an existing content source. This reduces
complexity in crawl scheduling and management.
Table 6 lists the types of content sources for the Redmond SSP and the number
of each type of content source.
Table 6. Types and Number of Content Sources for the Redmond SSP
Content type | Number
SharePoint sites | 13
Web sites (other than SharePoint sites) | 6
Network shared folders | 1
Exchange public folders | 1
Business Data Catalog | 3
Custom | 1
Some of the content sources listed in Table 6 contain multiple start addresses.
Although the larger hosted farms are crawled at the server level, a few content
sources have as many as a few hundred individual start address URLs that cover sites
not hosted on the central SharePoint farms. Adding the URL of the top-level site
in a site collection causes all sub sites to be crawled (unless specifically excluded).
Selecting Indexing Types
Microsoft performs full indexing (crawls all content in the content source) and
incremental indexing (updates the index based on only content that has changed).
Microsoft runs full crawls as required for system changes such as adding a new managed
property or deploying a crawl-related hotfix. However, the goal is to reduce the
frequency of the full index creations to quarterly or less (if possible) to take
advantage of the faster incremental index creation.
Microsoft optimized incremental index creation in Office SharePoint Server 2007
by using new features such as the change log feature and the security-only crawl
feature. The change log feature allows the system to index only new or changed content
within the SharePoint store. The security-only feature indexes just the security
information on the content rather than requiring a full crawl of the content itself
each time an administrator changes the security settings. Both of these features
help Microsoft IT keep the search index updated more quickly and efficiently than
before.
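The difference between a full crawl and an incremental crawl driven by a change log can be illustrated with a minimal Python sketch. It is a simplified model, not the implementation in Office SharePoint Server 2007: the ChangeEntry type, the token numbering, and both functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEntry:
    token: int   # position in the change log; higher means more recent
    url: str     # item the entry refers to
    kind: str    # "content", "security", or "delete"


def full_crawl(store):
    """Rebuild the content index from every item in the store."""
    return dict(store)


def incremental_crawl(index, acl_index, store, acls, change_log, last_token):
    """Apply only the changes logged after last_token and return the new token."""
    for entry in sorted(change_log, key=lambda e: e.token):
        if entry.token <= last_token:
            continue  # already processed by an earlier crawl
        if entry.kind == "delete":
            index.pop(entry.url, None)
            acl_index.pop(entry.url, None)
        elif entry.kind == "security":
            # Security-only update: refresh permissions, leave content untouched.
            acl_index[entry.url] = set(acls[entry.url])
        else:
            index[entry.url] = store[entry.url]
        last_token = entry.token
    return last_token
```

In this model, the cost of an incremental crawl depends on the number of logged changes rather than on the size of the store, which is why fewer full crawls are needed.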
Scheduling Index Crawls
For the largest server farm in Redmond, Microsoft IT adopts a tiered approach to
scheduling when content is crawled to update the index. The team categorizes each
content source into a tier based on how frequently the content source requires incremental
indexing. Microsoft IT divides the schedules into the following tiers:
1. Daily or multiple times per day
2. Three times per week
3. Once per week
Microsoft crawls the larger portals that contain managed content (such as the MSW
portal and the portals for IT, finance, and sales and marketing) more frequently.
Microsoft crawls the larger portals with less managed content and a more collaborative
environment (such as the My Site portal) once a week. Microsoft indexes the People
Profile data once a day.
Configuring Index Propagation
In the internal deployment at Microsoft, the index service and the query service
run on separate computers. Because the two services run on separate
servers, the search service has to copy (or propagate) the content index
from the index server to the query server. The continuous propagation feature in
Office SharePoint Server 2007 helps ensure that indexed content appears in
the search results within a few minutes of when the content is indexed.
Configuring the Content Access Account
To access content for indexing purposes, Microsoft uses one content access account
per region for all content sources, instead of managing multiple accounts and passwords.
Microsoft uses region-based accounts to minimize authentication traffic across WAN
connections.
In SharePoint Portal Server 2003, the content access account needed administrator-level
permissions to effectively index SharePoint content and gather the security permissions
for that content. In Office SharePoint Server 2007, a new kind of permission
(Full Read access) gives the indexer this information without requiring administrator-level
permissions.
Microsoft IT configures a content access account for each region to have the appropriate
permissions set at the server level for the server farms. Microsoft IT also posts
the information to administrators for sites outside the hosted environment so that
they can grant the correct level of access to the regional crawling accounts.
Configuring the Custom Protocol Handler
For its enterprise search solution, Microsoft developed only one custom protocol
handler that enables it to index an internal custom application. A protocol handler
implements the protocol for accessing a content source in its native format. The
indexing services use protocol handlers to expand the data sources that can be indexed.
In this case, the data store of the custom application had some custom requirements
that required the new protocol handler. Microsoft originally developed the protocol
handler for the SharePoint Portal Server 2003 deployment and then later made
minor modifications to adapt the handler for use in a 64-bit environment.
Configuring Crawl Rules
The current solution uses about 120 crawl rules. Crawl rules enable Microsoft to
customize the behavior of the index engine for a particular path. These rules indicate
paths that should be explicitly included or excluded from being crawled (and subsequently
indexed) and other settings, such as following complex link structures that use
parameters.
Even though Microsoft has reduced the number of rules compared to the SharePoint
Portal Server 2003 based solution, the Microsoft goal is to minimize the number
of crawl rules and require site content administrators to set permissions locally
to include or exclude content from being indexed.
Although Office SharePoint Server 2007 enables content to be crawled on the
Internet, Microsoft IT focuses the current enterprise search on the intranet only
at this time. Microsoft IT configured a crawl rule that explicitly prevents the
indexer from going outside the firewall by excluding the http://*.* path.
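The effect of ordered include and exclude rules can be sketched in Python. This is an illustration only: it assumes that the first matching rule decides the outcome, it uses shell-style wildcard matching from the standard library rather than the product's own matching logic, and the host names and the rule list are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered rule list: (pattern, include?). Host names are made up.
CRAWL_RULES = [
    ("http://msw/*", True),   # an intranet portal that should be crawled
    ("http://*.*", False),    # the exclusion path described above
]


def should_crawl(url, rules=CRAWL_RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL."""
    candidate = url.lower()
    for pattern, include in rules:
        if fnmatchcase(candidate, pattern):
            return include
    return default  # no rule matched; fall back to the default behavior
```

Because the include rule is listed first, an intranet page such as http://msw/default.aspx is still crawled even though its URL contains a dot, while a fully qualified Internet address falls through to the exclusion.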
Configuring Crawler Impact Rules
Crawler impact rules enable administrators to specify how many simultaneous requests
for pages are made to a specified URL and the number of seconds to wait between
each request. Because Microsoft dedicates Web front-end servers as crawl targets
for larger server farms, Microsoft configures crawler impact rules only for sites
that are not directly administered by the teams listed earlier in Table 3.
For example, Microsoft has an internal site hosted by a team outside the centrally
hosted farms. That team needed the number of requests reduced from the default
setting of eight requests per second to three requests per second in order
to lighten the crawl load on its environment.
Selecting File Types to Index
The file type inclusions/exclusions list contains the list of extensions that identify
which file types the crawler should include or exclude from the index. For the crawler
to extract the contents and properties of a particular type of file, a filter (iFilter)
for that file type must be installed on the server on which the index service is
running.
Microsoft performed minimal configuration and customization for file types. In addition
to the default file types, Microsoft has installed iFilters for the new XML Paper
Specification (XPS) and Microsoft Office OneNote® file types. Microsoft plans
to include .zip file types in the near future and .pdf file types when a 64-bit
version of the .pdf iFilter is available (the computers running the index services
have 64-bit processors). Adobe and other vendors develop 64-bit versions of the
.pdf iFilter.
Optimizing Crawl Performance
Because of the scale and complexity of the content indexed in the Redmond SSP, Microsoft
adjusted several crawl-related configuration settings. The changes listed in Table 7
helped improve the performance of content crawling at Microsoft.
Note: This information is shared as a reference on the internal deployment
at Microsoft and may not be suitable for all deployments.
Table 7. Changes in Configuration Settings to Improve Crawl Performance
Configuration setting | Description
Performance level | Configuration setting in the Shared Service Provider Central Administrative section that increases the responsiveness of the index service. Microsoft IT changed this value to 5.
Connection time time-out | Configuration setting in the Shared Service Provider Central Administrative section that determines the number of seconds that the computer performing the crawl should wait before timing out when contacting the computer hosting the content being crawled. Microsoft IT changed this value to 120 seconds.
Request acknowledgement time-out | Configuration setting in the Shared Service Provider Central Administrative section that determines the number of seconds that the computer performing the crawl should wait for a request acknowledgement from the computer hosting the content being crawled. Microsoft IT changed this value to 120 seconds.
Configuring Managed Properties
The properties that are part of the Search user experience are managed properties
(such as properties that are available for search results and advanced search).
Crawled properties are properties that the search index service component
discovers when crawling content. To make a crawled property available for the Search
experience, the administrator must map a crawled property to a managed property.
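The relationship between crawled and managed properties can be pictured as a lookup table. The following Python sketch is illustrative only; the property names and the to_managed function are assumptions and do not reflect the product's property store.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
MANAGED_PROPERTY_MAP = {
    "Author": ["doc:author", "mail:from"],
    "Region": ["profile:region"],
}


def to_managed(crawled, mapping=MANAGED_PROPERTY_MAP):
    """Project an item's crawled properties onto the managed properties.

    The first mapped crawled property that has a value wins; crawled
    properties without a mapping are not exposed to search.
    """
    managed = {}
    for name, sources in mapping.items():
        for source in sources:
            value = crawled.get(source)
            if value not in (None, ""):
                managed[name] = value
                break
    return managed
```

In this sketch, a crawled property that has no mapping (for example, a page count) never appears in the result, which mirrors the requirement that an administrator map it before users can search on it.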
Microsoft started with the default managed properties and then made the following
customizations:
- Profiles. Properties (such as language fluencies, preferred name, and
region) were added to profiles based on the business requirements and the information
stored in the People Profile store.
- Business Data Catalog. Properties (such as region, sale district, and Web
site) were added to Business Data Catalog based on the business requirements and
the information stored in the CRM and Library catalog systems.
Microsoft added other custom properties for portals that required specific properties
for their local search centers. Local site administrators for these sites must contact
the SSP administrator to create the managed properties before they are available
for local use. Microsoft adds new managed properties as required, but it tries to
schedule for quarterly updates because adding new managed properties requires a
full crawl of the content sources.
Creating Shared and Local Scopes
A search scope defines a subset of information in the search index on which a user
can perform queries. Shared scopes are created in the SSP. Local scopes are created
at the individual site collection level.
Shared scopes are available to all sites that are configured to use a particular
SSP. Local scopes are created by a local site administrator and are available only
to the individual site collection and sub sites on which they are created. Scopes
can be created based on content sources, folder paths, or specific properties (such
as all content authored by a specific user).
For the Redmond SSP, the search scopes defined for MSW (Intranet, People, and Customers)
are shared scopes. All other scopes are local scopes. Site administrators have created
more than 200 local scopes in the SSP at Redmond.
Administrators create shared and local scopes by re-creating the scopes based on
the current index (known as scope compilation). Office SharePoint Server 2007
performs automatic scope compilation, which adjusts the frequency of scope compilations
over time, depending on the number of scope changes. Scope compilation is required
because scopes are not incrementally updated, but rather are compiled as a task.
The SSP administrator can initiate a compilation at any time. The scope compilation
for the SSP at Redmond typically takes less than one minute.
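Scope rules and scope compilation can be sketched as follows. This Python example is a simplification under stated assumptions: an item belongs to a scope if it matches any one rule, and the Scope class, the compile_scope function, and the sample field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    content_sources: frozenset = frozenset()       # rule: content source name
    path_prefixes: tuple = ()                      # rule: folder path
    properties: dict = field(default_factory=dict) # rule: property value

    def matches(self, item):
        """True if the item satisfies any rule of this scope."""
        if item["content_source"] in self.content_sources:
            return True
        if item["url"].startswith(self.path_prefixes):
            return True
        props = item.get("properties", {})
        return any(props.get(k) == v for k, v in self.properties.items())


def compile_scope(scope, index):
    """Recompute the scope's membership from the current index."""
    return sorted(item["url"] for item in index if scope.matches(item))
```

Because membership is recomputed from the whole index rather than updated item by item, newly indexed content appears in a scope only after the next compilation.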
Designating Authoritative Sites
Authoritative sites contain information that is more relevant to the organization
than other sites. Designating authoritative sites is a method of increasing (or
decreasing) the relevance of content within search results. Results from authoritative
sites are considered more relevant than results from non-authoritative sites.
Authoritative sites affect how Office SharePoint Server 2007 calculates relevance
rankings for items in the list of search results. Relevance ranking for an item
is determined (in part) by how many clicks away result items are from URLs that
are listed as top-level authoritative, second-level authoritative, or third-level
authoritative sites. An item that is one click away from a top-level authoritative
site is considered the most relevant. Second-level and third-level authoritative
sites have less authority by one or two clicks, respectively. For example, a result
item that is one click away from a third-level authoritative site is less relevant
than a result item that is one click away from a second-level authoritative site.
An administrator can also make a site less relevant by configuring it as a non-authoritative
site. Demotion to a non-authoritative site is independent of any click-distance
penalties. Microsoft designates a site as a non-authoritative site when search results
from that site appear too high in the search results.
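The click-distance idea can be approximated with a shortest-path search over the link graph. The following Python sketch assumes that a second-level site starts one click behind a top-level site and a third-level site two clicks behind, as described above. The function name and the graph representation are hypothetical, and the actual ranking algorithm combines click distance with other factors that are not modeled here.

```python
import heapq


def click_distances(links, authoritative):
    """Return the effective click distance of every reachable page.

    links: dict mapping a page to the pages it links to.
    authoritative: dict mapping a URL to its level (1, 2, or 3).
    A lower distance means the page is treated as more relevant.
    """
    best = {}
    # Level 1 starts at distance 0, level 2 at 1, level 3 at 2.
    heap = [(level - 1, url) for url, level in authoritative.items()]
    heapq.heapify(heap)
    while heap:
        dist, page = heapq.heappop(heap)
        if page in best:
            continue  # a shorter path to this page was already found
        best[page] = dist
        for target in links.get(page, ()):
            if target not in best:
                heapq.heappush(heap, (dist + 1, target))
    return best
```

In this model, a page that is one click from a second-level site receives a distance of 2, and a page that is one click from a third-level site receives a distance of 3, which matches the ordering in the example above.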
Microsoft started with a limited number of sites that contain critical internal
content that applies to most employees as top-level authoritative sites. (Table 2
earlier in the paper lists these sites.) Microsoft IT added the major product group
portals, such as the Windows division site and the Microsoft Office division site,
as second-level authoritative sites.
Configuring Thesaurus and Noise Words
A search administrator can use the Thesaurus to customize the way queries are handled
by doing either query expansion or query replacement. For example, an administrator
can configure the system to:
- Search for both "hot fixes" and "hotfixes" if the user entered
either one of those terms.
- Replace "hot fixes" with "hotfixes" at query time.
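The two behaviors differ in what is sent to the index: expansion searches for every term in a set, while replacement substitutes one term for another. The following Python sketch shows the distinction; the data structures and function names are assumptions and do not represent the format of the Thesaurus.xml file.

```python
# Hypothetical thesaurus data for illustration.
EXPANSION_SETS = [{"hot fixes", "hotfixes"}]   # search for every term in the set
REPLACEMENTS = {"hot fixes": "hotfixes"}       # substitute before searching


def expand(query):
    """Query expansion: return the query plus every equivalent term."""
    normalized = query.strip().lower()
    terms = {normalized}
    for group in EXPANSION_SETS:
        if normalized in group:
            terms |= group
    return sorted(terms)


def replace(query):
    """Query replacement: return the substitute term, or the query unchanged."""
    normalized = query.strip().lower()
    return REPLACEMENTS.get(normalized, normalized)
```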
Noise words are words (such as "the," "a," and
"that") that are excluded from the index during content crawling due to
their common occurrence and low value in terms of search queries.
Microsoft did not customize the Thesaurus.xml and NoiseWords.txt files because the
default configuration contains the appropriate configuration for most instances
and it also helped the SharePoint development team validate the default settings.
Microsoft may customize the Thesaurus.xml file in the future, but it has no plans
to modify the NoiseWords.txt file because that file is sufficient as is.
Performing Search Usage Analysis
Microsoft IT performs regular analysis of the search usage for searches performed
at the SSP level and at the local site levels. Microsoft IT analyzes the data from
a number of sources, including the following search-related reports that Office
SharePoint Server 2007 provides:
- Queries over the previous 30 days
- Queries over the previous 12 months
- Top query origins per site collection over the previous 30 days
- Queries per scope over the previous 30 days
- Top queries over the previous 30 days
- Search results top destination pages
- Queries with zero results
- Most-clicked Best Bets
- Queries with zero Best Bets
- Queries with low click-through
Microsoft IT performs this analysis to help:
- Improve the user experience with enterprise search services.
- Develop Best Bets keywords for site collections.
- Optimize the hardware and software configuration for the computers, the operating
system, and search services.
- Provide feedback to management on the effectiveness of enterprise search services.
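Several of the reports listed above reduce to simple aggregations over a query log. The following Python sketch shows how top queries, queries with zero results, and queries with low click-through might be derived. The log record layout, the threshold, and the summarize function are assumptions for illustration and do not describe the built-in reports.

```python
from collections import Counter


def summarize(log, top_n=3, low_click_through=0.2):
    """Summarize a query log.

    log: iterable of dicts with 'query' (str), 'results' (int), 'clicked' (bool).
    """
    counts, clicks, zero = Counter(), Counter(), Counter()
    for row in log:
        query = row["query"].strip().lower()
        counts[query] += 1
        if row["results"] == 0:
            zero[query] += 1
        if row["clicked"]:
            clicks[query] += 1
    # Queries that return results but are rarely clicked may need Best Bets.
    low = sorted(
        q for q in counts
        if q not in zero and clicks[q] / counts[q] < low_click_through
    )
    return {
        "top_queries": counts.most_common(top_n),
        "zero_results": sorted(zero),
        "low_click_through": low,
    }
```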
Performing Day-to-Day Operations
Microsoft IT performs day-to-day operations tasks to ensure the continuity and responsiveness
of enterprise search services. In addition to the administrative tasks discussed
earlier, Microsoft IT performs search business management, performs monitoring,
and performs backup and restore procedures.
Performing Search Business Management
The Program Management team of the Information Services team is responsible for
both the business management of MSW and the overall health of the search services.
These tasks include:
- Managing content sources. The team adds and removes start addresses for content
sources as required. The team also performs a full review of start addresses quarterly.
- Managing authoritative sites. The team reviews and updates authoritative
sites as needed based on business requirements, portal updates or changes, and end-user
relevance feedback.
- Collaborating with site administrators. Collaboration with site administrators
helps ensure that their appropriate site content is included in the enterprise search
if the content is not hosted on the central SharePoint farms. This collaboration
also includes assisting site collection administrators in customizing their sites.
- Responding to end-user and support issues. The team responds to end-user
questions and support issues relating to search (such as content coverage, relevance,
or query syntax). The team also responds to operational or technical issues with
the search service.
- Performing Best Bets editorial tasks. These tasks entail identifying candidate
URLs for specific search strings, writing useful titles and descriptions that will
appear for them in the search results, and tagging the appropriate keywords and
synonyms. Other tasks may include reviewing the keywords and Best Bets on a regular
schedule and identifying new Best Bets based on reports from the search usage reporting
(such as top queries that return zero Best Bets).
- Analyzing query logs. The team reviews the query logs on a regular basis
to determine details about the queries performed and query results, as well as identifying
new trends and terms to add to the corporate vocabularies. The corporate
vocabularies are a list of terms that are unique to an organization.
Performing Monitoring and Health Status Tasks
Microsoft uses Microsoft Operations Manager 2005 to help perform end-to-end
monitoring and health status tasks for the search services. Microsoft uses the following
management packs for its enterprise search services:
- Microsoft Office SharePoint Server 2007 Management Pack for Microsoft Operations
Manager 2005
- Microsoft Windows SharePoint Services 3.0 Management Pack for Microsoft Operations
Manager 2005
- Microsoft Web Sites and Services Management Pack for Microsoft Operations Manager 2005
- Internet Information Services (IIS) Management Pack for Microsoft Operations Manager 2005
- Windows Base Operating System Management Pack for Microsoft Operations Manager 2005
In addition to the monitoring that Microsoft Operations Manager 2005 provides,
Microsoft performs URL monitoring specifically for tracking availability of MSW
search. The URL monitoring software submits four specific queries (one for each
tab within the Search Center, plus one extra) to the MSW site. Then, the URL monitoring
software tracks the response times to ensure that they are within a specific threshold.
Microsoft also uses software that checks for the presence of specific terms on the
query response page. If the query response page does not have the specific terms,
the software generates an alert to the Operations team. Microsoft uses
this software to monitor general availability rather than to measure query performance.
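A URL monitoring check of this kind can be sketched in a few lines of Python. The example is a hypothetical illustration, not the monitoring software that Microsoft uses: the fetch callable, the threshold, and the alert wording are all assumptions. Passing fetch as a parameter keeps the sketch independent of any particular HTTP library.

```python
import time


def check_search_page(fetch, url, required_terms, max_seconds):
    """Run one availability check and return a list of alerts (empty if healthy).

    fetch: callable that takes a URL and returns the page text.
    """
    started = time.monotonic()
    try:
        body = fetch(url)
    except Exception as exc:
        return [f"request failed: {exc}"]
    elapsed = time.monotonic() - started

    alerts = []
    if elapsed > max_seconds:
        alerts.append(f"slow response: {elapsed:.1f}s")
    page = body.lower()
    missing = [term for term in required_terms if term.lower() not in page]
    if missing:
        alerts.append("missing terms: " + ", ".join(missing))
    return alerts
```

Checking for expected terms catches the case where the server answers quickly but returns an error page instead of search results.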
Besides the monitoring described previously, Microsoft performs the following monitoring
and health status tasks on a regular basis:
- Adjusting crawler impact rules. Microsoft reviews the crawler impact rules
and adjusts them as needed.
- Reviewing crawl logs. Microsoft regularly reviews the crawl logs for errors
and uses them for troubleshooting when end users or site administrators report content missing from
a search.
- Removing query result items. In rare cases where Microsoft must immediately
remove content from search results (usually due to business or legal requirements),
Microsoft removes individual items from the query results by using the administrative
user interface for single-item removal.
Microsoft also performs capacity planning and analyzes performance trends by using
the event logs and performance monitoring counters (such as disk, memory, and application
performance counters) provided by the Windows Server® 2003 operating system
and products running on Windows Server 2003.
Performing Backup and Restore
Microsoft performs full backups of each of the three SSPs (in Redmond, Dublin, and
Singapore) once a week—on Tuesday evenings—by using the backup tools that ship with
Office SharePoint Server 2007 (by using either the graphical user interface
or stsadm.exe). This backup includes both the SQL Server databases and the search
index.
In addition to the backups performed through Office SharePoint Server 2007,
Microsoft performs backups of the SQL Server databases nightly by using a third-party
database compression-acceleration tool.
The complete backup of the Redmond SSP farm typically takes about eight hours.
A full restore of the environment would take approximately twelve hours.
Administration of Shared Enterprise Search Services in Other Regions
The day-to-day management of the SSPs located in the EMEA and Asia regions is similar
to the Redmond farm, but on a much smaller scale. Because the regional data centers
index only content in their local region, each farm has fewer than 5 million items
indexed; but like the Redmond farm, they are growing. Microsoft is working to standardize
settings across farms as much as possible so that many of the settings are consistent
between Redmond and the other regions.
The other regions have fewer than 10 content sources and contain a minimal number
of crawl rules.
For each of the regional farms, Microsoft IT:
- Scheduled a daily incremental crawl of each content source.
- Replicated profile data between the regions.
- Synchronized profile data with shared services at Redmond.
Administration of Indexed Sites and Content
The SSP-level administrator performs most of the search administrative tasks (such
as crawling). However, the local site administrators perform a number of search‑related
tasks.
Many sites connected to the SSP in Redmond use the basic search that is available
to a site after Microsoft provisions the site on the farm. When a new site is created,
the default search scope is This Site. As users browse to the sub sites,
the search scope automatically includes sub sites in the search results. The search
scope This Site includes the contents of the current Web site and sub sites.
At Microsoft, the local SharePoint administrators for each site are responsible
for creating and defining local search scopes within a site collection if they require
this functionality. The local administrators create search scopes if they need search
scopes in addition to the This Site scope or if they choose to use any of
the shared scopes that the SSP provides. Administrators can base these additional
search scopes on the content source, a folder path, or managed properties. The administrators
also determine the usage of the search scope within the site (such as search scope
drop-down list and advanced search).
The local content owners are the individuals best suited to determine which content
is the most relevant for specific keywords. Local content owners manage Best Bets
at the local site level.
User Experience of Enterprise Search
When Microsoft provisions a new Web application on the SharePoint farms hosted by
Microsoft IT, Office SharePoint Server 2007 automatically creates the search
for the local site. Site collection administrators can customize the user's search
experience by creating a search center, adding custom scopes, modifying the appearance
of the search results, and adding Best Bets.
User Experience on MSW
MSW is one of the most heavily used intranet sites at Microsoft. MSW is the place
where employees start their search for information within the company. MSW was one
of the first sites migrated to Office SharePoint Server 2007 and was deployed
in production while the product was still in the beta stages.
Figure 2 illustrates the Search tab on the MSW home page. The Search
tab has high-level search scopes that users can select (Intranet, People,
and Customers) to narrow the search results.
Figure 2. Search tab on MSW home page
After the users perform their search, the results appear in the search center in
Office SharePoint Server 2007 (as illustrated in Figure 3). The search
center has a tab for each of the search scopes.
Figure 3. Search center results for MSW
Search Center Intranet Tab
The Intranet tab (and Intranet search scope on the MSW home page)
includes results from all content in the index, except for content that is on the
People and Customers tabs. Employees use this tab when they are looking
for general information about content on the Microsoft intranet, because this
search covers the most comprehensive set of indexed content.
Intranet Tab Customizations
Microsoft IT customized the behavior of searching through the Intranet tab
in the following ways:
- Changed the location of the Best Bets results Web Part on the search results
page. By default, the Best Bets results Web Part is on the right side of the
main page that lists search results. Based on user feedback, the MSW team decided
to move the Web Part to the top of the search results on the left side in order
to improve visibility.
- Included a custom Web Part for Glossary definitions. Microsoft added a custom
Web Part to the right-side margin of the search results in place of the existing
keyword definition Web Part that the search contains by default. When users enter
a term (such as an acronym) in the Intranet search query, a list of glossary definitions
appears. For example, if a user enters DOS in the query, a list of definitions
for DOS appears (such as denial of service and disk operating system). Microsoft
uses a custom Web Part for this view to take advantage of its internal custom-built
taxonomy management system, which houses the corporate vocabularies and definitions
for thousands of terms.
- Included a link to add a Windows Internet Explorer® 7 search provider. Users click this link to add MSW search as a search provider to Internet Explorer 7.
Internet Explorer 7 supports multiple search providers. For more information
about adding a search provider to Internet Explorer 7, refer to "Search
Provider Extensibility in Internet Explorer 7" at
http://msdn2.microsoft.com/en-us/library/ms532996.aspx.
- Included a New Search link. Users click this link to start a new search within
the Search Center. This link is important because there is no Search Center tab on the MSW home page (due to space
restrictions on the user interface).
Best Bets Customizations
Microsoft IT selects the keywords and synonyms from its corporate vocabulary. Microsoft
IT regularly refines the Best Bets by reviewing search results metrics (such as
query volume trends, most frequently performed queries, click-through rates on query
results, or queries that return zero results).
Any changes in the Best Bets take effect immediately, so users can see the updated
Best Bets results. An administrator can also set the preferred order of the items.
Microsoft tries to maintain fewer than five Best Bets URLs per keyword.
Search Center People Tab
The People tab (and People search scope on the MSW home page) includes
results that relate to employees at Microsoft. Employees use this tab when they
are trying to locate other employees or find information about other employees.
Microsoft collects the information indexed for the People tab by importing
information from other sources into Office SharePoint Server 2007. These sources
include:
- Active Directory. Microsoft imports the indexed employee information from
Active Directory into Office SharePoint Server 2007. Office SharePoint Server 2007
contains more than 126,000 active user profiles that are synchronized from the information
in Active Directory. Microsoft IT synchronizes the information in Office SharePoint
Server with Active Directory daily. Microsoft uses the user profiles from Active
Directory to help create customized pages for users.
- Feedstore. This database—which pulls data from 39 internal sources and feeds
about 500 subscribing applications—provides data replication from these data sources
to Office SharePoint Server 2007. Besides the use in people search, Microsoft
uses Feedstore to transfer information to internal data warehouses on a per-application
basis.
Search Center Customers Tab
The Customers tab (and Customers search scope on the MSW home page)
includes results from information that relates to Microsoft customers. Employees
use this tab when they are looking for information about the top enterprise Microsoft
customers and partner accounts. This tab takes advantage of the Business Data Catalog
feature in Office SharePoint Server 2007. This feature enables administrators
to integrate structured data sets from line-of-business applications easily into
SharePoint. Microsoft used the Business Data Catalog feature to create the Customers
tab. Figure 4 illustrates this tab.
Figure 4. Search center results for Customers tab
For MSW search, Microsoft indexes approximately 200,000 enterprise top-level accounts
(parent accounts) that are stored in the internal CRM system. Microsoft decided
to index the top accounts based on user feedback, although the users can drill down
to related accounts (such as subsidiaries) through the customer profile pages (provided
in the CRM system) available through the search results. The solution also defines
relationships between customers or partners and their subsidiaries.
Microsoft IT developed the necessary integration for indexing the information on
the Customers tab in about one month (with the assistance of the SharePoint
development team). The initial deployments were based on beta versions of Office
SharePoint Server 2007.
Application Definition File for Business Data Catalog
Business Data Catalog retrieves specific data fields from the line-of-business applications
and stores the fields in a local database. Business Data Catalog supports the following
types of XML application definition files:
- Model. An application definition file that contains the base XML metadata
for a system.
- Resource. An application definition file that enables users to index or read
the localized names, properties, and permissions, in any combination.
Business Data Catalog enables the following actions to be performed on the data
stored in the internal CRM system:
- Enumerate. Crawls the list of companies for the purposes of indexing.
- Find specific. Retrieves a single item based on the company ID for crawling
detailed content about each company and for retrieving detailed data to populate
Web Parts on Web pages used to view customer details.
- Find. Retrieves a list of child accounts of a parent account for crawling
detailed content about each child account and for retrieving detailed data to populate
Web Parts on Web pages used to view child account details.
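The three actions above can be pictured with a minimal Python sketch. It is not the Business Data Catalog API or the application definition file that Microsoft IT wrote. The Account class, the in-memory ACCOUNTS data, and the function names are hypothetical stand-ins for the internal CRM system, and the sketch assumes that only top-level accounts are enumerated for crawling.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Account:
    account_id: str
    name: str
    parent_id: Optional[str] = None  # None marks a top-level (parent) account


# Hypothetical stand-in for the CRM data; the real schema is not shown in this paper.
ACCOUNTS: Dict[str, Account] = {
    a.account_id: a
    for a in [
        Account("C1", "Contoso"),
        Account("C2", "Contoso Europe", parent_id="C1"),
        Account("C3", "Contoso Asia", parent_id="C1"),
        Account("F1", "Fabrikam"),
    ]
}


def enumerate_ids() -> Iterator[str]:
    """Enumerate: yield the IDs of top-level accounts for the crawler to visit."""
    for account in ACCOUNTS.values():
        if account.parent_id is None:
            yield account.account_id


def find_specific(account_id: str) -> Optional[Account]:
    """Find specific: return one account by its ID, or None if it is unknown."""
    return ACCOUNTS.get(account_id)


def find_children(parent_id: str) -> List[Account]:
    """Find: return the child accounts of the given parent account."""
    return [a for a in ACCOUNTS.values() if a.parent_id == parent_id]
```

In this sketch, a crawler would call enumerate_ids() to get the list of companies and then call find_specific() for each ID, while a profile page would call find_specific() and find_children() to fill its Web Parts.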
Customer Profile Page
When a user clicks a customer's name in the results on the Customers tab,
Office SharePoint Server 2007 displays the customer profile page. The customer
profile page provides detailed information about each customer account (such as geographic
location, industry, and contact information). The customer profile page also displays
the Microsoft account team contacts for the customer and the account hierarchy (any
parent or child accounts).
The customer profile page contains Web Parts that display the customer account information.
These Web Parts display information retrieved through Business Data Catalog to the
users (by using the "find specific" and "find" actions in Business Data Catalog).
Future Search Center Webcasts Tab
In the near future, Microsoft will add a Webcasts tab (and search scope option)
to the MSW site. Microsoft executives commonly use webcasts to host "wireside"
chats and to hold large-audience meetings that would be logistically difficult to
host on a regular basis. Employees will be able to locate events and the supporting
presentations and documents from the events by using the Webcasts tab. The
initial index data will come from the Studioscast service (an intranet-based service
that provides webcasts to employees).
User Experience on Other Intranet Sites
Another Web site that uses the shared enterprise search services is ITWeb. Although
MSW and ITWeb use the same enterprise search services, the user experience for search
services on ITWeb is slightly different from that of MSW. Figure 5 illustrates
the home page for ITWeb.
Figure 5. Search user interface on ITWeb home page
The designers of ITWeb added the search user interface to the right side of the
page. Users select search scopes through a drop-down list. Table 8 lists the
search scopes on the ITWeb search user interface and the search result limitations
of each search scope.
Table 8. Search Scopes for ITWeb and the Search Results
| This search scope | Limits the scope of the results to |
| This site | Results within the current site. On the home page, this means only ITWeb. On other sites, it means only those sites. |
| ITWeb | Results within ITWeb. This search scope excludes other sites and content sources from the search results. |
| InfoPlus | Results within InfoPlus. This search scope excludes other sites and content sources from the search results. |
| Intranet | Results throughout the Microsoft intranet. This search scope includes all indexed sites and content sources in the search results. This search scope redirects users to the Intranet tab on the MSW search interface. |
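The effect of the scopes in Table 8 can be sketched as a simple filter over result URLs. This is an illustration only, not how Office SharePoint Server 2007 evaluates scope rules. The host names, the SCOPE_HOSTS mapping, and the function names are assumptions, because the paper does not give the real intranet addresses. The sketch also treats the Intranet scope as "no restriction" and leaves out the redirect to MSW.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse

# Hypothetical host names for the two site-specific scopes.
SCOPE_HOSTS: Dict[str, Set[str]] = {
    "ITWeb": {"itweb.example"},
    "InfoPlus": {"infoplus.example"},
}


def in_scope(scope: str, result_url: str, current_site: str = "itweb.example") -> bool:
    """Return True if result_url falls inside the named search scope."""
    host = urlparse(result_url).hostname or ""
    if scope == "Intranet":
        return True  # all indexed sites and content sources
    if scope == "This site":
        return host == current_site  # only the site the user is on
    return host in SCOPE_HOSTS.get(scope, set())


def filter_results(scope: str, urls: List[str], current_site: str = "itweb.example") -> List[str]:
    """Keep only the result URLs that the scope allows, preserving order."""
    return [u for u in urls if in_scope(scope, u, current_site)]
```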
Another example of sites that use the shared enterprise search service is the My
Sites portal. In Redmond, Microsoft IT configured the default search scope on the
My Sites portal at the SSP level to point to the Intranet search on MSW.
Measuring Effectiveness of User Experience
Microsoft constantly strives to improve the user experience for enterprise search
services. Microsoft uses the following resources to help measure the effectiveness
of the user experience for enterprise search services:
- Office SharePoint Server 2007 search usage reports. Microsoft IT reviews the
reports in Office SharePoint Server 2007 to identify the searches performed by users
and their search results. The types of statistics include the most-popular links
clicked by users.
- WebTrends reporting console reports. Microsoft IT reviews the reports in
the WebTrends reporting console to collect statistics on usage of pages and the
search engine. WebTrends is a non-Microsoft product that monitors and tracks Web
site usage.
- Feedback from users and surveys. Employees complete surveys on their satisfaction
with search services on an ad hoc basis. Employees can also contact Microsoft IT
directly by e-mail to provide feedback.
- Usability testing. As a part of the design, development, and testing process,
Microsoft IT performs usability testing on the user experience. Employees who participate
in the usability testing comment on the user interface, process flow, and interaction
to identify any areas of improvement.
- Focus group feedback. During product development, Microsoft IT conducts focus
groups on various aspects of the user interface and experience. These focus groups
help Microsoft IT identify methods for improving overall user experience.
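As an illustration of the kind of statistic that the search usage reports provide, the following minimal sketch counts the most-clicked result links in a click log. The log format (pairs of query text and clicked URL) and the function name are assumptions made for this example. They do not reflect the format of the Office SharePoint Server 2007 or WebTrends reports.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clicked(click_log: Iterable[Tuple[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most-clicked result URLs with their click counts.

    Each click_log entry is a hypothetical (query, clicked_url) pair.
    """
    counts = Counter(url for _query, url in click_log)
    return counts.most_common(n)
```

Comparing the most-clicked links for a query with the Best Bets defined for the same keyword is one way that such a report could feed back into relevance tuning.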
Migration to Office SharePoint Server 2007
Microsoft IT began preparing for the move to Office SharePoint Server 2007
in early 2005. The team had to ensure that the migration of enterprise search services
from the previous solution to Office SharePoint Server 2007 caused minimal
outages of required services. To accomplish this, and to complete the migration
at a reasonable pace, Microsoft IT co-hosted the services. That is, during the migration
process, Microsoft IT ran the SharePoint Portal Server 2003-based and Office
SharePoint Server 2007-based solutions on different hardware.
Microsoft took different approaches for moving sites and content to the new platform.
After the new SSPs were installed and configured, Microsoft IT used both the database attach and gradual upgrade
methods to update sites so that they could consume search from the new SSPs.
The database attach method attaches a SharePoint content database from SharePoint
Portal Server 2003 to Office SharePoint Server 2007. The gradual upgrade
method runs both the previous and new versions, so that the sites can be moved gradually
to the new environment, and both versions of the sites are available for transferring
customizations or for comparison.
The teams that own some of the larger portals, such as MSW, chose to migrate their
sites rather than upgrade. This method moved the sites to a new design and organization,
and it took advantage of the new content publishing and workflow features available
in Office SharePoint Server 2007.
In a complex IT environment (such as that of Microsoft), planning and preparation
were essential for a successful enterprise-wide upgrade to Office SharePoint Server 2007.
Microsoft IT took the following steps when planning the search aspects of the configuration
of the new SSPs:
1. Reviewed the existing content sources to:
   - Update start addresses
   - Ensure that the right content was being crawled
   - Note any changes due to site upgrades, content migration, and related processes
2. Consolidated the list of content sources and created a new, smaller list based on:
   - Protocol handlers required
   - Location of the content
3. Added new content types to the search by:
   - Identifying new content to be indexed by new Business Data Catalog sources
   - Beginning projects to develop connections as required
4. Reviewed the existing list of crawl rules and updated as required for the consolidated content sources
5. Identified the hierarchy of authoritative sites by:
   A. Creating an initial list based on the list of MSW Best Bet root URLs
   B. Updating the list based on organizational charts
   C. Updating the list based on large top-level sites (listed earlier in Table 2)
6. Reviewed requirements of managed properties
7. Installed Office SharePoint Server 2007 on new hardware
8. Created the new content sources
9. Added the refined crawl rules
10. Configured the file types and added new file types and iFilters as required
11. Configured the content access account for the content sources and other SSP level settings
12. Indexed content
13. Configured managed property mappings
14. Configured authoritative sites
15. Created shared scopes
16. Established crawl schedules
17. Set up service monitoring and enabled usage analysis
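Step 2 in the list above, consolidating content sources by protocol handler and location, can be sketched as a grouping of start addresses. This is only one possible reading of that step. The sketch assumes that the URL scheme stands in for the protocol handler and that the host name stands in for the location of the content. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlparse

GroupKey = Tuple[str, Optional[str]]


def consolidate(start_addresses: Iterable[str]) -> Dict[GroupKey, List[str]]:
    """Group start addresses into candidate content sources.

    Each group is keyed by (scheme, host). Duplicate addresses are dropped
    and the addresses in each group are sorted.
    """
    groups: Dict[GroupKey, set] = defaultdict(set)
    for address in start_addresses:
        parsed = urlparse(address)
        groups[(parsed.scheme, parsed.hostname)].add(address)
    return {key: sorted(addresses) for key, addresses in groups.items()}
```

Each resulting group is a candidate for a single content source, which fits the later recommendation to minimize the number of content sources.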
In January 2006, Microsoft switched MSW search to the Office SharePoint Server 2007
SSP farm for search, while the rest of the portal remained on SharePoint Portal
Server 2003. Microsoft performed this migration by replacing the existing search
server with a new search server that pointed to the search center on a site created
on the new farm.
Microsoft removed all navigation and all elements not related to search from this
new site, and then it added the existing MSW branding and appearance to ensure some
consistency in the user experience between the two sites. Because the migration
occurred so early in the Office SharePoint Server 2007 product development,
Microsoft wrote code to allow the sites to revert to the SharePoint Portal Server 2003-based
search if required.
In June 2006, Microsoft launched an Office SharePoint Server 2007-based version
of MSW in production (based on the Office SharePoint Server 2007 Beta 2 Technical
Refresh). At the same time, Microsoft upgraded the SSP farm to the same release.
Over the next few months, Microsoft deployed some subsequent internal builds to
the SSP farm.
Microsoft continued to upgrade each of the regional farms to subsequent beta builds
throughout 2006. Microsoft upgraded each farm to the official released version of
the code by the end of 2006.
Best Practices
Microsoft IT has gained practical, real-world experience with designing, deploying,
administering, and operating enterprise search by using Office SharePoint Server 2007
and SQL Server 2005. Because of this experience, Microsoft IT recommends the
following best practices in the areas of deployment and architecture:
- Implement fault tolerance for SQL Server by using Windows Clustering.
Office SharePoint Server stores the search index in a database that SQL Server hosts.
Ensure that there are no service outages by configuring two or more computers running
SQL Server 2005 as nodes in a cluster. An administrator can use Windows Clustering
to create a cluster.
- Place search and TempDB databases on dedicated high-speed disk drives.
Indexing and search queries create a large amount of disk activity on the disk drives
where the search and TempDB databases are stored. To improve performance, place
each database on a separate, dedicated high-speed drive.
- Run index services, search services, and SQL Server on high-performance computers. Indexing and search create high processor and memory utilization on the computers
that run them. To reduce the amount of time for indexing and search response, run
index and search services in Office SharePoint Server 2007 and SQL Server 2005
on dedicated computers with sufficient processor and memory system resources.
- Dedicate a front-end server in a Web farm to be a crawl target server for large
sites. Crawling content on large sites consumes a large amount of system
resources on the server that hosts the sites. Dedicating one of the servers in the
Web farm to be a crawl target server allows the index to be updated during peak
periods of use.
- Populate development, test, or pre-production environments by using backups of
the SSP index in the production environment. Restore backups of the SSP index
in the production environment for development and testing purposes. Although some
indexing occurs in these environments after the backups are restored, restoring
from the production index avoids the need to re-index the full set of content.
- Ensure that all servers have the latest hotfixes, updates, and service packs. Installing the latest hotfixes, updates, and service packs on all servers helps
to minimize any security threats and helps to minimize function-related or feature-related
product problems. Ensure that all servers are running the same level of hotfixes
and service packs, because running mixed levels can have unpredictable results.
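The last practice, keeping all servers at the same level of hotfixes and service packs, lends itself to a simple consistency check. The following minimal sketch assumes that an inventory of server names and patch-level labels is already available from some other tool. The function name, server names, and labels are hypothetical. The sketch treats the most common level as the baseline, which is not necessarily the latest level.

```python
from collections import Counter
from typing import Dict


def find_patch_outliers(levels: Dict[str, str]) -> Dict[str, str]:
    """Return the servers whose patch level differs from the most common level.

    `levels` maps a server name to a patch-level label (hypothetical inventory data).
    """
    if not levels:
        return {}
    baseline, _count = Counter(levels.values()).most_common(1)[0]
    return {server: level for server, level in levels.items() if level != baseline}
```

An empty result means that every server in the inventory reports the same level. It does not confirm that the level is current.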
Best practices in the area of administration include:
- Minimize the number of content sources. Reducing the number of content sources
reduces the administrative complexity. Fewer content sources equate to fewer entities
to administer, manage, and monitor.
- Use a separate content source for People Profiles. For large My Site deployments,
creating a separate content source for People Profiles enables specific crawl configuration
settings (such as the type of crawl or crawl frequency) for the My Site content.
- Review start addresses. The larger the number of start addresses, the more
content is indexed. Review the start addresses in each content source and eliminate
any unnecessary start addresses.
- Review and update crawl rules regularly. Crawl rules determine the content
sources to crawl, file types to crawl, and other crawl-related criteria. Review
and update the crawl rules to ensure that all content is included in the index.
- Review and update Best Bets and keywords regularly. Best Bets and keywords
determine the relevance of content. Review and update the Best Bets and keywords
to help identify relevance for all content.
- Review and update the search metadata property schema regularly. Crawled
properties, managed properties, and the mapping between the properties determine
the content metadata to index and include in search queries. Update the search metadata
property schema to include the new metadata added to content.
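A review of start addresses can begin with a check for addresses that sit underneath another start address. The following minimal sketch assumes that a crawl covers everything below a start address, so a nested address adds nothing. Whether that holds depends on the crawl settings of the content source. The function name and sample addresses are hypothetical.

```python
from typing import Iterable, List


def redundant_start_addresses(addresses: Iterable[str]) -> List[str]:
    """Return the start addresses that are nested under another start address.

    Addresses are compared case-insensitively with trailing slashes removed.
    """
    normalized = sorted({a.rstrip("/").lower() for a in addresses})
    redundant = []
    for candidate in normalized:
        for other in normalized:
            # A candidate is nested if it extends another address by a path segment.
            if candidate != other and candidate.startswith(other + "/"):
                redundant.append(candidate)
                break
    return redundant
```

The output is a list of candidates for review rather than a list to delete automatically, because some nested addresses exist to apply different crawl settings.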
Best practices in the area of content management include:
- Include keywords within file names. The indexing service identifies content
where keywords are included in the file name within the URL path as more relevant than
content where keywords are not included in the file name. If possible, make the
file name (the rightmost part of the URL) readable. Use spaces (%20) between keywords
in the file names so that the search engine is able to identify keywords in the
URL.
- Include keywords or descriptive text within anchor text. Anchor text is the
text between the <A HREF> and </A> tags in hyperlinks (the text that
appears on the Web page and that the user clicks to go to the content). Typically,
this text is highly indicative of the content referenced by the link. As such, the
search engine uses this text in a separate relevance ranking computation after indexing.
Anchor text often carries better information about a document than the document
itself (for indexing purposes). Search crawls anchor text from the following
elements:
  - HTML anchor elements
  - Windows SharePoint Services link lists
  - Office SharePoint Portal Server 2003 listings
  - Microsoft Office Word 2007, Microsoft Office Excel® 2007, and Microsoft
    Office PowerPoint® 2007 hyperlinks (only for files that use the new Open
    XML Formats)
- Include keywords and descriptive text within titles. Make sure that the titles
of Web pages or documents convey the content and are readable as search results.
Avoid titles like "Home" or "Index." Instead, create titles that focus on the content
covered on the page. Titles have a higher weight than the full text when ranked
by the search engine.
- Include relevant metadata in Microsoft Office documents and HTML pages. Files
that Microsoft Office creates can contain information that describes attributes
about the file (such as author, manager, keywords, or custom properties). The indexing
service crawls the metadata, and the query processor uses the metadata to determine
relevance of the file in a search query. Consider adding metadata, in the form of
keywords, to Microsoft Office documents or HTML pages. This can be especially useful
when the document is not rich in text (such as a spreadsheet or an image-rich document).
- Include relevant metadata in files stored in SharePoint document libraries. Files stored in SharePoint document libraries contain information that describes
attributes about the file (such as author, department, keywords, or custom properties).
The indexing service crawls the metadata and uses the metadata to determine relevance
of the file in a search query. For example, ensure that the author property reflects
the person who actually authored the document, not the person who created the template
or uploaded the document to a site.
- Place important content higher in site hierarchy.
Documents and Web pages that appear high in the URL hierarchy in an organization
(with fewer slash marks) tend to be identified as more important. In instances where
you can place a priority or importance on content, placing the sites that contain
the more important content higher in the site hierarchy helps ensure that users
will find the content that is of the most use to them.
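Two of the practices above, readable file names and shallow URL hierarchies, can be checked with a short script. The following minimal sketch is not the ranking logic of the search engine. It only shows which keywords a reader (or a simple word breaker that splits on spaces) could pick out of a file name, and how many path segments a URL has. The function names and sample URLs are hypothetical.

```python
from typing import List
from urllib.parse import unquote, urlparse


def file_name_keywords(url: str) -> List[str]:
    """Return the lowercase words in the file name (the rightmost part of the URL).

    %20 is decoded to a space, the extension is dropped, and words are split on spaces.
    """
    path = urlparse(url).path
    file_name = unquote(path.rsplit("/", 1)[-1])
    stem = file_name.rsplit(".", 1)[0]
    return [word.lower() for word in stem.split()]


def url_depth(url: str) -> int:
    """Return the number of non-empty path segments; fewer means higher in the hierarchy."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])
```

For example, a file named "Search%20Deployment%20Guide.doc" yields three separate keywords in this sketch, while "SearchDeploymentGuide.doc" yields a single run-together token.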
Best practices in the areas of monitoring and operations include:
- Monitor indexing and search services with Microsoft Operations Manager 2005. Download the Office SharePoint Server 2007 Management Pack, SQL Server Management
Pack, Windows Base Operating System Management Pack for Microsoft Operations Manager 2005,
and Internet Information Services (IIS) Management Pack for Microsoft Operations
Manager 2005 to monitor the indexing and search services. These management
packs can help identify when the services are not running and collect statistics
about the health and status of the services.
- Pause index creation instead of stopping an index
build if crawling must be temporarily halted. Stopping an index build requires
that a full update is performed the next time the index is built. Pausing an index
build enables an administrator to resume the index build at the point where he or
she paused the process.
- Perform systematic backups of the index and SSP
databases. This helps ensure a quick recovery from a potential disaster. For
example, a full crawl of an index such as the one on the Redmond SSP can take a
few weeks to complete.
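The difference between pausing and stopping an index build can be illustrated with a toy model. The following sketch is not the Office SharePoint Server 2007 crawler. The CrawlSketch class is hypothetical, and it simplifies "a full update is required" to "the next crawl starts again from the first item."

```python
from typing import List


class CrawlSketch:
    """Toy crawl that tracks how far through a list of URLs it has progressed."""

    def __init__(self, urls: List[str]) -> None:
        self.urls = list(urls)
        self.next_index = 0  # position of the next URL to crawl
        self.work_done = 0   # total crawl operations performed, including repeats

    def crawl(self, budget: int) -> None:
        """Crawl up to `budget` URLs, starting at the saved position."""
        end = min(self.next_index + budget, len(self.urls))
        self.work_done += end - self.next_index
        self.next_index = end

    def pause(self) -> None:
        """Pausing keeps the saved position, so the next crawl() resumes there."""

    def stop(self) -> None:
        """Stopping discards the saved position, so the next crawl() starts over."""
        self.next_index = 0

    @property
    def finished(self) -> bool:
        return self.next_index == len(self.urls)
```

In this model, a crawl of 10 items that is paused after 6 finishes with 10 operations in total, while the same crawl stopped after 6 needs 16. At the scale of a crawl that takes weeks, the repeated work is what makes pausing preferable.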
Conclusion
Locating the appropriate information or people in an organization can be a difficult
and time-consuming process. Without enterprise search services, employees at Microsoft
might:
- Spend countless hours duplicating effort on multiple projects and teams.
- Be unaware of critical information produced by other employees.
- Make business or technical decisions with incomplete or inaccurate data.
- Require more technical support time and Helpdesk resources to resolve technical
issues.
- Require additional managerial and human resources time to address job-related issues.
Microsoft IT reviewed the existing content that is indexed, existing content sources,
business requirements, and technical requirements. Then, the team deployed shared
enterprise search services for employees to find relevant, up-to-date information
that is essential in performing their day-to-day job functions. The same enterprise
search services put Microsoft employees in contact with peers all over the world,
enabling them to collaborate more quickly and easily than ever before.
Microsoft IT combined the Enterprise Search feature in Office SharePoint Server 2007,
SQL Server 2005, Windows Server 2003, custom-developed Web Parts, and
integration with other line-of-business services and applications into its enterprise
search services solution. Developing and deploying this solution provided the following
benefits:
- Increased ability to locate relevant content on the Microsoft intranet
- Increased employee satisfaction with enterprise search services
- Improved ability to publish content so that the content can be located
- Improved ability to find employees and subject matter contacts
- Improved integration with line-of-business applications
- Improved consistency of search functions and search results
- Improved overall uptime and access to enterprise search services
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information through the World Wide Web,
go to:
http://www.microsoft.com
http://www.microsoft.com/technet/itshowcase
For information about IFilters, go to:
http://msdn2.microsoft.com/en-us/library/ms691105.aspx
The information contained in this document represents the current view of Microsoft
Corporation on the issues discussed as of the date of publication. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy
of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user.
Without limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by
any means (electronic, mechanical, photocopying, recording, or otherwise), or for
any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Microsoft, the furnishing
of this document does not give you any license to these patents, trademarks, copyrights,
or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious,
and no association with any real company, organization, product, domain name, e-mail
address, logo, person, place, or event is intended or should be inferred.
© 2007 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, Excel, Internet Explorer, OneNote, PowerPoint, SharePoint, Windows, and Windows Server are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other
countries.
All other trademarks are property of their respective owners.