Microsoft SharePoint Portal Server 2001 Resource Kit


Approximately 25,000 Microsoft users conduct nearly 125,000 searches each month (which translates to 750,000 queries against the search server) across the corporate intranet, called corpnet. At this volume, even a small improvement in performance benefits many users.

As Microsoft's Information Technology Group (ITG) approached beta testing of Microsoft® SharePoint™ Portal Server 2001 in the summer of 2000, it used Microsoft Site Server 3 as its enterprise search solution. With Site Server 3, Microsoft employees worldwide could easily find and aggregate information from across the enterprise.

Migrating to SharePoint Portal Server yielded two key benefits:

More relevant and timely search results delivered to users. 

  • Latency, or response time, improved by 22 percent. 

  • Indexes updated nightly by using adaptive updates. 

Improved crawling performance. 

  • Full update of an index nearly three times faster than Site Server 3. 

  • Adaptive update of an index seven times faster than Site Server 3. 

Microsoft employees now enjoy more relevant and timely search results because of the improvements in performance, nightly updates to the content indexes, and the new probabilistic ranking algorithm used for relevancy ranking.

This chapter describes the ITG deployment plan of SharePoint Portal Server and the subsequent results. It provides detailed information and recommendations based on this deployment. It includes technical information on the existing environment, design decisions, deployment steps, and testing considerations. It concludes with a summary of recommendations based on this experience.

Note This is not intended to serve as a procedural guide. The intranet site names provided are for illustration only and do not necessarily reflect actual names.

Planning


The migration from Site Server 3 to SharePoint Portal Server for intranet search at Microsoft included the following stages: Planning, Analysis and Design, Deployment, and Management.

Identifying Deployment Goals

In addition to running the enterprise IT utility, ITG plays a strategic role as one of Microsoft's early adopters, testing and deploying Microsoft software before customer release. All ITG early adoption efforts must show tangible business benefits to Microsoft beyond testing for scale and load in a real-world production environment. This was true for the SharePoint Portal Server beta deployments.

Among other benefits and services, this deployment extends the "Microsoft software as a service" model to continue to provide:

  • A customer-specific service-level agreement for each portal owner that defines the service and clearly states the procedures for support and maintenance over time. 

  • Search across multiple (even disparate) content sets. 

  • Better performance and more timely and relevant results. 

  • The inclusion of existing content and additional content in the index. 

The project team established one key metric to measure its success: the system had to handle the stress of crawling about 6 million documents in a time frame that matched existing results. The existing enterprise search solution indexed only about 3 million documents. The team also planned to add more intranet content to the indexes, and ITG required additional room for content growth over time.

To verify that SharePoint Portal Server would handle the same load as Site Server 3, the team ran both products in parallel for 30 days before retiring the Site Server 3 solution.

Establishing a Project Timeline

ITG began planning in the summer of 2000 to test SharePoint Portal Server as an enterprise index and search technology through all interim releases, including Beta 1, Beta 2, Release Candidates, and the final release-to-manufacturing (RTM) version.

The team divided the project into the following four phases:

Planning

  • Establish the team. 

  • Collect information on the current environment. 

  • Develop a project plan. 

Analysis and Design

  • Create the architecture and select the hardware. 

  • Review and redefine the catalogs. 

Deploying

  • Install the hardware and software. 

  • Configure servers running SharePoint Portal Server. 

  • Create workspaces. 

  • Set up content sources and site rules. 

  • Complete property mapping from custom document properties to the SharePoint Portal Server schema. 

  • Modify Active Server Pages (ASPs) for searching and for returning results. 

  • Test crawling. 

  • Test searching. 

  • Operate Site Server 3 and SharePoint Portal Server in parallel. 

Managing

  • Make the transition to production. 

  • Manage operations and perform maintenance. 

The team spent about nine months on this effort from beginning to end, working part-time. From midsummer when the team was formed until the end of the year 2000, the team spent most of its time testing the index and search capabilities of SharePoint Portal Server and optimizing for the goal of 6 million documents, as shown in Figure 27.1.

The migration to production began in early January 2001 with development of the search page and completion of the final tests of crawling. In mid-February, ITG set up the parallel environment. Before RTM in mid-March, SharePoint Portal Server replaced Site Server 3 for search queries on the primary corporate portal, called MSWeb, and the Product Group Portal. After RTM, ITG began converting all major portals at Microsoft to SharePoint Portal Server for search. When this process is complete, Microsoft will retire the Site Server 3 solution throughout the corporate intranet.


Figure 27.1 Project and development timeline 

Based on its experience, the team estimates that a typical enterprise customer migration of similar scale might take approximately three months, as illustrated in the following table.

Typical Enterprise Project Timeline 

| Activity | Month 1 | Month 2 | Month 3 |
| 1. Planning | 1 week | | |
| 2. Catalog review (optional; can run in parallel with activities 3 and 4) | 1–4 weeks | | |
| 3. Hardware installation and setup | 1 week | | |
| 4. Configuration of servers and workspaces, and setup of site rules | 1 week | | |
| 5. Test of crawling operations | | 2 weeks | |
| 6. Modification of existing custom ASP pages | | 1 week | |
| 7. Test of search page | | 1 week | |
| 8. Parallel operations | | | 2–4 weeks |
| 9. Ongoing catalog and index review | | | |

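The estimate of roughly three months can be checked against the timeline with a little arithmetic. The following Python sketch sums the durations in weeks for activities 1 through 8. It assumes that activities 5, 6, and 7 run one after another and that the optional catalog review overlaps only activities 3 and 4, as the table notes; activity 9 is ongoing and is left out.

```python
def timeline_weeks():
    """Return the (shortest, longest) total in weeks for activities 1-8."""
    # (min, max) durations in weeks, taken from the timeline table.
    planning = (1, 1)
    catalog_review = (1, 4)        # optional; overlaps activities 3 and 4
    hardware_setup = (1, 1)
    server_configuration = (1, 1)
    crawl_tests = (2, 2)
    asp_changes = (1, 1)
    search_page_tests = (1, 1)
    parallel_operations = (2, 4)

    # Activities 3 and 4 run back to back; the catalog review runs beside them,
    # so this stretch lasts as long as the longer of the two branches.
    setup = tuple(a + b for a, b in zip(hardware_setup, server_configuration))
    overlap = tuple(max(a, b) for a, b in zip(catalog_review, setup))

    stages = [planning, overlap, crawl_tests, asp_changes,
              search_page_tests, parallel_operations]
    return sum(s[0] for s in stages), sum(s[1] for s in stages)


shortest, longest = timeline_weeks()
print(f"{shortest} to {longest} weeks")  # prints "9 to 13 weeks"
```

Nine to thirteen weeks is consistent with the team's estimate of approximately three months; the longer end applies only when the catalog review takes the full four weeks and parallel operations run for four weeks.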
Collecting Information

The next part of the planning process was collecting information about several critical components of the existing environment.

Hardware Specifications

Hardware specifications for both crawl and search servers:

  • Specify similar hardware for test comparison. 

  • Plan any upgrades in accordance with ITG hardware standards. 

Architecture Diagrams

Architecture diagrams, indicating:

  • Hardware, network, and propagation paths 

Catalog Information

List of all Site Server 3 catalogs, including:

  • The server on which the catalogs are stored 

  • Who owns the catalogs 

  • How frequently the catalogs are crawled 

  • What start addresses and site rules are contained in the catalogs 

  • Whether complex URLs are enabled 

Key metrics for each individual catalog and across all catalogs:

  • For index (per catalog): number of site rules, number of documents, catalog size, time to crawl, and propagation time 

  • For search (total and per catalog): number of queries per month and at peak load 

Network Environment

Network factors, including:

  • Networking protocols 

  • Firewall configuration 

The project team examined the existing network environment for possible factors that would affect deployment, but determined that they did not need to make any configuration changes.

User Environment

Unique environmental factors, including:

  • The Microsoft corpnet spans the world. Consequently, SharePoint Portal Server must crawl documents in multiple languages for inclusion in the content indexes and must allow users to submit queries in multiple languages. SharePoint Portal Server allows users to submit queries in English, French, Italian, German, Swedish, Spanish, Dutch, Japanese, Chinese Simplified, Chinese Traditional, Korean, and Thai. 

  • Corpnet is in use worldwide 24 hours a day, 7 days a week, 365 days a year. 

  • Security is enforced per document at the file share level. 

Analysis and Design


After collecting information and creating a deployment plan, the project team synthesized information to provide a description of the existing infrastructure and a vision of the new infrastructure.

Searching Using Site Server

Originally, most sites within Microsoft did not offer any type of search. Individual departments or groups built their own sites, and the overhead of setting up, running, and maintaining a search capability on each site was burdensome. The major business division portals—such as IT, HR, Product, Finance, Sales, Support, Legal, Operations, and Microsoft Corporate—offered some search capability. The basic problem was that they all set up their own environments and often crawled each other's sites, resulting in duplication of efforts, sometimes three or four times over.

ITG built a centralized search solution with Site Server 3 as its backbone, using dedicated servers for crawling and searching. Site Server created a catalog for each site. The owner of each site or portal specified what content to include in or exclude from the catalog for its site, and what content on its site, if any, should not be crawled.

After developing the process for including content in an index, ITG created a set of custom ASP pages—one for querying and one for returning results. ITG modified these pages to fit each portal's needs for custom query capabilities and custom results sets. One by one, the major portals moved to this search solution because they could get better search capabilities for less effort. After all the major intranet sites had migrated, a number of second-tier sites also implemented this search solution.

Site Server 3 Infrastructure

The Site Server 3 architecture at Microsoft consisted of one search server and two crawl servers. Each crawl server built its catalogs by crawling their content and then propagated those catalogs to the search server. This architecture, shown in Figure 27.2, ensured that the search capability was always available to users.


Figure 27.2 Site Server 3 search solution architecture 

This solution crawled about 3 million corporate intranet documents and files, handling nearly 30,000 queries per day. There were 48 catalogs on these servers, and many sites requested that their searches include several of these catalogs.

Searching with SharePoint Portal Server

SharePoint Portal Server is a complete solution—integrated document management, corporate portal, and search. However, this deployment implements only the search and index creation aspects of SharePoint Portal Server. Because this migration does not include the dashboard site and document management features, separate teams started projects to test those features.

The project team modeled the new design largely on the existing Site Server 3 design. The team modified the existing set of custom search and results pages to handle SharePoint Portal Server in addition to Site Server 3. In this design, as each portal migrates to SharePoint Portal Server, the portals simply change their Web forms to point to the new query page on the SharePoint Portal Server computer.

SharePoint Portal Server Propagation Model

The propagation model includes two servers dedicated to creating and maintaining indexes and one server dedicated to searching as part of the centralized search service, as illustrated in Figure 27.3.


Figure 27.3 Enterprise search tiered server architecture 

The server dedicated to searching stores a copy of the index propagated from the index workspaces of the servers dedicated to creating indexes.

Note The task of creating an index is resource intensive. Consequently, SharePoint Portal Server lets you create an index workspace on a separate server to isolate the tasks of creating and maintaining indexes from other SharePoint Portal Server tasks. After you create the index, SharePoint Portal Server propagates it to the server dedicated to searching. SharePoint Portal Server can propagate the index immediately after creating it, or you can schedule index creation to coincide with times of low network traffic.

For more information about this scenario, see Chapter 3, "Introducing SharePoint Portal Server: Configuration Flexibility."

Architecture Comparison

The migration to SharePoint Portal Server did not change the basic architecture for searching across corpnet. The Site Server 3 architecture used two servers dedicated to crawling content and one search server. The search server had four processors; of the two crawl servers, one had four processors and the other had two. The largest Site Server 3 catalog resided on a crawl server with four processors. The SharePoint Portal Server deployment uses the same architecture as Site Server 3, except that both servers used for creating and maintaining indexes have four processors. This difference in hardware configuration did not affect the results, because most performance measurements were made by using the largest catalog.

The project team estimated that additional RAM might also help performance. A master merge is an MSSearch process in which separate content index sub-files are merged into a single content index file. Because SharePoint Portal Server performs master merges less frequently while updating indexes, performance on the server used for creating and maintaining indexes improves with additional memory. Previous tests of additional RAM on the servers running Site Server 3 and Microsoft Windows NT® 4 did not show significant performance gains. However, the ITG corporate server standard for operating systems changed from Windows NT 4 to Microsoft Windows® 2000 Advanced Server. Windows 2000 makes better use of additional memory than Windows NT 4. Therefore, the project team doubled RAM to 512 megabytes (MB) on each server that hosted an index workspace.
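To make the idea of a master merge concrete, the following Python sketch merges several inverted-index fragments, each mapping a word to a sorted list of document IDs, into a single index. It is a conceptual model only: the fragment layout is an assumption for illustration and does not describe the MSSearch file format.

```python
import heapq


def master_merge(fragments):
    """Merge inverted-index fragments into one index.

    Each fragment is a dict mapping a word to a sorted list of document IDs.
    """
    postings_by_word = {}
    for fragment in fragments:
        for word, doc_ids in fragment.items():
            postings_by_word.setdefault(word, []).append(doc_ids)

    merged = {}
    for word in sorted(postings_by_word):
        combined = []
        # heapq.merge performs a k-way merge of lists that are already sorted.
        for doc_id in heapq.merge(*postings_by_word[word]):
            if not combined or combined[-1] != doc_id:
                combined.append(doc_id)  # skip IDs repeated across fragments
        merged[word] = combined
    return merged


fragments = [{"search": [1, 4]}, {"index": [3], "search": [2, 4]}]
print(master_merge(fragments))  # {'index': [3], 'search': [1, 2, 4]}
```

In this model, each merge reads and rewrites the postings of every fragment, which illustrates why merging is expensive on a large index.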

The project team estimated hard disk size requirements based on the index size in Site Server 3 and added room for growth. After determining this number, they doubled it to hold a backup copy of the indexes on the server. The ITG standard hard disk configuration for running SharePoint Portal Server places the document store that includes documents and associated metadata on one hard disk, the content indexes on a second disk, and the logs on a third disk to minimize bottlenecks and maximize input/output (I/O) throughput.

Server Configurations

The following table lists the server configurations that ITG used for this project.

Enterprise Search Hardware Configurations 

| Hardware configuration | Enterprise search | Index 1 | Index 2 |
| Processor | 4 X 550 megahertz (MHz) | 4 X 550 MHz | 4 X 400 MHz |
| Memory (initial) | 512 MB RAM | 512 MB RAM | 512 MB RAM |
| Memory (final) | 2 gigabytes (GB) RAM | 2 GB RAM | 512 MB RAM |
| Disk space | 92 GB | 68 GB | 35 GB |
| OS | Windows 2000 Advanced Server SP1 | Windows 2000 Advanced Server SP1 | Windows 2000 Advanced Server SP1 |

Note As the table shows, the team increased RAM in one of the crawl servers to test scalability; this nearly doubled the crawl speed. The team also increased RAM in the search server to provide approximately 1 GB for the server to cache the property store. This reduced latency.

SharePoint Portal Server Architecture

Figure 27.4 shows the current architecture at Microsoft for enterprise search.


Figure 27.4 SharePoint Portal Server architecture 

Reviewing the Catalog

Site Server 3 creates catalogs to enable searching of content. SharePoint Portal Server creates indexes. An index is a resource that is built to enable full-text search of documents, document properties, and content stored outside the workspace but made available through content sources. A workspace can include multiple propagated indexes. When you create the workspace, SharePoint Portal Server automatically creates one index. You can propagate indexes only from index workspaces and only to a single destination workspace on another server (usually a server that is used primarily for searching). A destination workspace can accept indexes from up to four index workspaces. An index workspace is designed to manage only content sources.
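The propagation rules above can be expressed as a simple check on a planned topology. In this Python sketch, which is a hypothetical planning aid and not a SharePoint Portal Server tool, the plan is a dictionary keyed by index workspace, so each index workspace automatically has exactly one destination. The check then verifies that every destination is on another server and receives indexes from no more than four index workspaces. The server and workspace names are illustrative.

```python
from collections import Counter

MAX_SOURCES_PER_DESTINATION = 4  # limit stated for a destination workspace


def check_propagation(plan):
    """Return a list of problems found in a propagation plan.

    plan maps (index_server, index_workspace) to
    (search_server, destination_workspace). Using a dict guarantees that each
    index workspace propagates to a single destination.
    """
    problems = []
    for (src_server, src_workspace), (dst_server, _) in plan.items():
        if src_server == dst_server:
            problems.append(f"{src_workspace}: destination must be on another server")

    fan_in = Counter(plan.values())
    for (dst_server, dst_workspace), count in sorted(fan_in.items()):
        if count > MAX_SOURCES_PER_DESTINATION:
            problems.append(
                f"{dst_server}/{dst_workspace}: {count} index workspaces "
                f"(max {MAX_SOURCES_PER_DESTINATION})")
    return problems


plan = {
    ("Index1", "HumanResourcesWeb"): ("Search", "HumanResourcesWeb"),
    ("Index2", "ITGPortal"): ("Search", "ITGPortal"),
}
print(check_propagation(plan))  # []
```

An empty list means the plan satisfies the rules as modeled here.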

The review identified 48 catalogs in the Site Server 3 environment. The primary intranet catalog included approximately 2.5 million documents; the remaining half million documents were spread across the other 47 catalogs.

Search Scopes

There were two main reasons to redefine the catalogs using search scopes. First, many of these catalogs wasted resources crawling the same content. Second, because the SharePoint Portal Server search service is multi-threaded, it was possible for SharePoint Portal Server to have two threads crawling the same content at the same time.

Search scopes in SharePoint Portal Server offer the ability to restrict searching to a subset of an index. Scopes label entries in the full-text index so that they can be quickly identified by queries to deliver faster and more relevant information. The design of the index handles the search scopes by ensuring that the server passes the correct catalog parameters to the custom search page.

The project team created search scopes to help classify content for a single index without having to create additional workspaces. For example, suppose that Human Resources Web and Legal Web wanted to offer search of their own sites, but both wanted to include the Policy site. Instead of creating a separate workspace for each site and crawling the Policy site twice, the team created a single workspace with three search scopes: a scope called "Policy" for the content source pointing to the Policy site, a scope called "Legal" for all the content sources pointing to the Legal sites, and a scope called "HR" for all the content sources pointing to the Human Resources site. This reduced the number of index workspaces from three to one and prevented crawling the Policy site twice. From the Human Resources site, users can search the Human Resources and Policy sites by using the different search scopes. Likewise, from the Legal site, users can search the Policy and Legal sites. Queries return more relevant results because they use only the relevant search scopes.
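The following Python sketch models the Human Resources, Legal, and Policy example: one index whose entries carry scope labels, and a query that returns only entries in the requested scopes. The URLs are hypothetical, and representing each document as a set of words is a simplification; a real index stores far more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    url: str
    words: frozenset
    scopes: frozenset  # scope labels attached when the content is crawled


def scoped_search(index, term, scopes):
    """Return URLs of entries that contain term and carry any requested scope."""
    wanted = set(scopes)
    return sorted(entry.url for entry in index
                  if term in entry.words and entry.scopes & wanted)


# One workspace; the Policy site is crawled once and shared by both portals.
index = [
    IndexEntry("https://policy/leave.htm", frozenset({"leave"}), frozenset({"Policy"})),
    IndexEntry("https://hrweb/benefits.htm", frozenset({"leave", "benefits"}), frozenset({"HR"})),
    IndexEntry("https://legalweb/contracts.htm", frozenset({"leave"}), frozenset({"Legal"})),
]

print(scoped_search(index, "leave", ["HR", "Policy"]))     # from the HR site
print(scoped_search(index, "leave", ["Legal", "Policy"]))  # from the Legal site
```

Both portals see the single Policy entry, but neither sees the other portal's pages.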

Query Performance

Another consideration in catalog review and redesign was query performance and load balancing. Although search scopes are useful, overusing them can cause performance issues. One logical extension of search scopes is to crawl everything in one workspace and create a scope for each content source. In that case, all queries run against the index of that single workspace, with many scopes. However, as the number of search scopes increases, query performance declines and the index size increases. Because of this, the project team decided to limit search scopes to two or three, mainly in smaller workspaces.

An alternative approach is to create a workspace for each site or group of sites on the intranet, and then create queries that span several workspaces. Query performance also declines as the number of index workspaces included in a query increases, so the team decided to limit these queries to two or three workspaces as well.

Duplication

The team reviewed the existing catalog structure to eliminate redundant crawling, examining the content sources and redesigning them. During the process, the team closely examined scopes or queries across index workspaces that might compromise performance. In certain cases, performance improved when the same content was crawled twice from different workspaces so that each search could run against a single workspace, rather than running one query across multiple workspaces.
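Redundant crawling can be found mechanically once the catalogs are listed with their start addresses. This Python sketch groups start addresses by the catalogs that crawl them and reports any address crawled by more than one catalog. The catalog names and addresses are hypothetical, and treating addresses as equal after lowercasing and removing a trailing slash is a simplifying assumption.

```python
from collections import defaultdict


def find_duplicate_crawls(catalogs):
    """Return {start address: [catalogs]} for addresses crawled by 2+ catalogs."""
    crawled_by = defaultdict(set)
    for catalog, start_addresses in catalogs.items():
        for address in start_addresses:
            # Simplifying assumption: case and a trailing slash do not matter.
            crawled_by[address.lower().rstrip("/")].add(catalog)
    return {address: sorted(names)
            for address, names in crawled_by.items() if len(names) > 1}


catalogs = {
    "HumanResourcesWeb": ["https://hrweb/", "https://policy/"],
    "LegalWeb": ["https://legalweb/", "https://Policy"],
}
print(find_duplicate_crawls(catalogs))  # {'https://policy': ['HumanResourcesWeb', 'LegalWeb']}
```

Each reported address is a candidate for a shared workspace with search scopes, or for a deliberate duplicate crawl when a single-workspace query performs better.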

To conduct the review of the catalogs, the team described each Site Server 3 catalog in a Microsoft Excel spreadsheet, as shown in the following tables.

Reviewing Content Sources 

| Content source | Hops and depth | Adaptive | Scope | Schedule |
| \\server01\d$\Inetpub\handbook | This folder and all subfolders | Yes | Handbook | None |
| \\server01\d$\Inetpub\humanresourcesWeb | This folder and all subfolders | Yes | None | None |
| https://search1/sas/dir.asp?setid=1 | 1 page hop, 0 site hops | No | None | Weekly |

Reviewing Site Path Rules 

| Site path rule | Crawl account | Complex URLs |
| Avoid file://server01/d$\inetpub\handbook\*_vti*\* | | |
| Crawl file://server01/d$\inetpub\handbook\* | default | Yes |
| Avoid file://server01/d$\inetpub\humanresourcesrweb\*_vti*\* | | |
| Crawl file://server01/d$\inetpub\hrweb\* | default | No |

Reviewing Catalog Information 

| Source | Display mapping |
| \\server01\d$\Inetpub\handbook | https://corphandbook/ |
| \\server01\d$\Inetpub\hrweb | https://hrwebsite/ |

The team then compared the catalogs and identified those to consolidate. The initial examination cut the number of catalogs by more than half, from 48 to 20. After several iterations, the team reduced the number of catalogs to 11.

Consolidation and Workspace Creation

As an outcome of this exercise, the team decided to create a one-to-one correspondence between remaining catalogs and workspaces. Figure 27.4 shows the final layout of the servers and workspaces.

Identifying Key Points

The key points learned in the Analysis and Design phase were:

  • Deployment requires no significant hardware change. Additional memory or processors improve performance. 

  • Migration is a great time to review and clean up catalogs. 

  • Catalog redesign requires a variety of approaches:

    • Remove duplicate crawls of content where possible. 

    • Limit searches to no more than two or three scopes or workspaces. 

Deployment


The deployment phase included installing hardware and software, modifying settings, and testing. After deploying the SharePoint Portal Server environment, ITG ran it in parallel with Site Server 3.

Installing and Modifying Settings

This section reviews the installation and configuration for the workspaces. In particular, it reviews the process for creating content sources.

Install Hardware and Operating Systems

The project team installed the hardware for the SharePoint Portal Server deployment in the same data center as the Site Server 3 environment, so network connectivity and other environmental variables remained the same.

Next, the team installed the operating system. For more information about installation requirements, see Chapter 11, "Installing SharePoint Portal Server." You must deploy a server dedicated to searching before deploying a server dedicated to index workspaces. When you create an index workspace, you must specify the destination workspace, as shown in Figure 27.5. Therefore, the project team began by first configuring the server dedicated to searching and then configuring the servers that would host index workspaces.


Figure 27.5 Creating an index workspace 

Specify Workspace Settings

The team specified the settings as detailed in the following table.

Workspace configuration settings 

| Setting | Enterprise search | Index 1 | Index 2 |
| Catalog name | All catalogs propagate to the Enterprise Search server | BestbetsCorpPortal, HumanResourcesWeb, corporate portal, WebCat2, WindowsUA | ITG portal, KBInt portal, corporate portal param, Product Group Portal, SAP portal, MSWordTest |
| General | | | |
| Indexing Resource Usage | 1 (Background) | 5 (Dedicated) | 5 (Dedicated) |
| Search Resource Usage | 5 (Dedicated) | 1 (Background) | 1 (Background) |
| Site Hit Frequency Rules | None | None | None |
| Proxy Server | | | |
| Do not connect using a proxy server | Disable | Enable | Enable |
| Use the proxy server settings of the default content access account | Enable | Disable | Disable |
| Use the proxy server specification below | Disable | Disable | Disable |
| Default file types in Site Server 3 | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch |
| File types removed from Enterprise Search | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht |

On the servers hosting index workspaces, the team set Indexing Resource Usage to 5 (Dedicated). This allows index creation to use full system resources when the server crawls content.

Note SharePoint Portal Server provides resource usage controls for searching and index creation, the two resource-intensive processes that are commonly performed on SharePoint Portal Server computers.

It is recommended that you balance resource usage to optimize performance depending on your server configuration. If you distribute searching and index creation across multiple servers, dedicate resources on each computer to the specific task that each computer performs. If you use one server to perform both index creation and searching, balance resource usage evenly between the two processes.

By design, this enterprise search solution does not crawl content outside the firewall. To allow SharePoint Portal Server to crawl only internal sites but without having to specify many rules (for example, exclude all *.com, *.edu, *.org), the team disabled the proxy server on each of the servers that hosted index workspaces. This prevented crawling anything outside the corporate environment.

To minimize unnecessary security changes, the team configured SharePoint Portal Server to crawl and propagate content by using the same accounts as Site Server 3. Like Site Server 3, SharePoint Portal Server respects access control lists (ACLs), which maintains the security implemented on each of the original content sites.

Create Workspaces

The team created one workspace to correspond to each Site Server 3 catalog. After creating all the workspaces, the team created the content sources. Figure 27.6 shows an example of the content sources (called start address in Site Server 3) in one workspace.


Figure 27.6 Example of content sources 

Most workspaces contained several content types. A single content source cannot refer to different content types, but you can refer to multiple content types in a workspace.

During testing, the team discovered the following tips for properly configuring hops and depth:

  • To crawl an entire site, set SiteHops to 0 and set page depth to unlimited. 

  • To crawl a single page, set SiteHops to 0 and set page depth to 0. 

  • For a custom crawl, set the number of site hops and the page depth manually. 
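The effect of these settings can be modeled as a breadth-first crawl with two limits. In this Python sketch, page depth is assumed to count links followed from the start page and site hops to count moves to a different host; a dictionary of links stands in for fetching real pages, and the URLs are hypothetical. This illustrates the tips above and is not the SharePoint Portal Server crawler.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_scope(links, start, page_depth=None, site_hops=0):
    """Return the set of URLs reached from start within the given limits.

    links: dict mapping each URL to the URLs it links to (stands in for fetching).
    page_depth: maximum links followed from the start page; None means unlimited.
    site_hops: maximum number of moves to a different host.
    """
    seen = {start}
    queue = deque([(start, 0, 0)])  # (url, links followed, hosts crossed)
    while queue:
        url, depth, hops = queue.popleft()
        if page_depth is not None and depth >= page_depth:
            continue  # depth limit reached; do not follow this page's links
        for target in links.get(url, []):
            crossed = hops + (urlsplit(target).netloc != urlsplit(url).netloc)
            if crossed > site_hops or target in seen:
                continue
            seen.add(target)  # simplification: each URL is visited once
            queue.append((target, depth + 1, crossed))
    return seen


links = {
    "http://a/": ["http://a/1", "http://b/"],
    "http://a/1": ["http://a/2"],
    "http://b/": ["http://b/1"],
}
print(sorted(crawl_scope(links, "http://a/")))                # entire site a
print(sorted(crawl_scope(links, "http://a/", page_depth=0)))  # single page
```

With SiteHops at 0 and unlimited depth, the crawl stays on host a; with depth 0, only the start page is included.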

For tracking purposes, the project team created a matrix showing which workspaces and sites used complex URLs and which content sources used which protocols, as shown in the following table.

Note The team restricted the use of complex URLs to well-known parameterized URLs, to minimize the risk of crawling URLs that continued to generate additional links without end.

Tracking Spreadsheet 

| Workspace name | Complex URL | File protocol | HTTP protocol | Exchange protocol |
| bestbetsCorporatePortal (Index 1) | Y | N | Y | N |
| Corporate Portal Intranet (Index 1) | N | Y | Y | Y |
| HumanResourcesWeb (Index 1) | Y | Y | Y | N |
| WebCatalog2 (Index 1) | Y | N | Y | N |
| WindowsUA (Index 1) | Y | Y | Y | Y |
| Corporate Portal Param (Index 2) | Y | Y | Y | Y |
| ITG portal (Index 2) | Y | Y | Y | Y |
| KBInt portal (Index 2) | N | Y | N | N |
| Product Group Portal (Index 2) | N | Y | Y | Y |
| SAPWeb (Index 2) | Y | Y | Y | Y |
| MSWordTest (Index 2) | Y | Y | Y | Y |

Modify Additional Settings

The team specified three additional settings when configuring content sources: site path rules, Access/Display mappings, and file types.

Figure 27.7 shows the properties page for modifying site path rules in a single workspace.


Figure 27.7 Example of site path rules 

The spreadsheet of catalogs created during the Analysis and Design phase contained the site path rules and mappings. It is critically important that the site path rules be set exactly as intended. For more information about adding content sources, see Appendix B, "For More Information."

Create Content Sources

The following principles can assist you when you need to create content sources:

  • Site path rules match in order from the top down. 

  • Use the asterisk (*) character with care. For example, an inclusion rule for https://searchserver/* crawls all subdirectories on the site. By contrast, an inclusion rule for https://searchserver/ crawls only the home page of that server. 

  • Enable complex URLs to crawl links with parameters following a question mark (?) in the link; for example, default.asp?name=abc. 

  • To exclude a protocol, add a site restriction that excludes it, such as one of the following: 

    File:* 

    https://* 

    Exch:* 
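The first three principles can be sketched in Python as a first-match rule evaluator. The rule list and server name are hypothetical, and Python's fnmatch wildcards (where "*" matches any run of characters, including "/") only approximate SharePoint Portal Server's own matching. Treating a URL that matches no rule as not crawled is also an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, in order: (action, pattern, complex URLs allowed).
RULES = [
    ("avoid", "https://searchserver/*_vti*/*", False),
    ("crawl", "https://searchserver/*", True),
]


def should_crawl(url, rules):
    """Apply site path rules from the top down; the first match decides."""
    for action, pattern, allow_complex in rules:
        if fnmatchcase(url, pattern):  # "*" matches any characters, including "/"
            if action == "avoid":
                return False
            # A link with parameters after "?" is a complex URL.
            return allow_complex or "?" not in url
    return False  # in this sketch, URLs matching no rule are not crawled


print(should_crawl("https://searchserver/_vti_bin/x.dll", RULES))        # False
print(should_crawl("https://searchserver/default.asp?name=abc", RULES))  # True
```

Reversing the two rules would let the broad crawl rule match the _vti folders first, which is why rule order matters.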

Map Properties across Workspaces

SharePoint Portal Server crawls the text content of a Microsoft Office document and standard Office summary properties. If you want to include additional properties, you must create a document profile in the workspace with those properties. SharePoint Portal Server includes the metadata from the document profile in the index.

Important When SharePoint Portal Server propagates the indexes to a server dedicated to searching, the destination server must possess the same document profiles.

You must map properties of HTML documents or custom metadata of external documents to a document profile. This allows SharePoint Portal Server to crawl the additional properties. HTML files usually store custom properties in <META> tags. For more information about mapping custom properties, see Chapter 25, "Crawling Custom Metadata."
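As an illustration of where such custom properties come from, this Python sketch reads the <META> tags of an HTML page with the standard html.parser module and keeps the values whose names match a property in a document profile. The "META_" prefix mirrors the property names used later in this chapter but is an assumed convention here; the sketch does not reproduce the property mapping script described in Chapter 25.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <META> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not attribute values.
        if tag == "meta":
            values = dict(attrs)
            name, content = values.get("name"), values.get("content")
            if name and content is not None:
                self.meta[name] = content


def map_to_profile(html, profile_properties):
    """Return META values whose assumed 'META_<name>' property is in the profile."""
    reader = MetaTagReader()
    reader.feed(html)
    return {f"META_{name}": value for name, value in reader.meta.items()
            if f"META_{name}" in profile_properties}


page = ('<html><head><META NAME="Keyword" CONTENT="benefits">'
        '<meta name="Categories" content="HR">'
        '<meta name="robots" content="noindex"></head></html>')
print(map_to_profile(page, {"META_Keyword", "META_Categories", "Title"}))
```

Tags that have no matching profile property, such as robots here, are ignored, which mirrors the rule that only properties in the document profile reach the index.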

To map properties between servers, the project team performed the following procedure.

To map properties between servers:
  1. Create document profiles for each index workspace. 

    The team created a document profile called "Search Custom Tags" for each index workspace. Each workspace included additional metadata, as shown in the following table. 

    Example of property mapping for index workspaces 

    | bestbetsCorporatePortal | CorporatePortal | HumanResourcesWeb | LibraryCatalog |
    | META_Categories | META_Categories | META_Categories | META_MainAuthor |
    | META_PageURL | META_PageURL | META_PageURL | META_itemtype |
    | META_XMLTerms | META_XMLTerms | META_XMLTerms | META_pubdate |
    | META_Keyword | META_Keyword | META_Keyword | META_subtitle |
    | Keywords | Keywords | Keywords | Keywords |
    | Description | Description | Description | Description |
    | Title | Title | Title | Title |
    | Author | Author | Author | Author |

  2. Create a document profile on the server dedicated to searching. 

    The team created a document profile with the same name used in step 1 on the server dedicated to searching. This document profile includes all the properties of the document profiles from each index workspace, as shown in the following table. 

    Example of property mapping for server dedicated to searching 

    Server dedicated to searching: META_Categories, META_PageURL, META_XMLTerms, META_Keyword, META_MainAuthor, META_itemtype, META_pubdate, META_subtitle, Keywords, Description, Title, Author 

    Note The document profile on the server dedicated to searching must contain the union of the properties of all the document profiles on the servers that host index workspaces. Any property that is mapped and crawled on a server that maintains indexes, but is missing from the document profile on the server dedicated to searching, is not available in the index workspace that propagates to the search server. 

  3. Run the property mapping script. 

    The team ran the property mapping script for each index workspace. For more information about this script, see Chapter 25. 

    Note The account under which the property mapping script runs must have administrator rights on the server and the Coordinator role on the workspace. 

  4. Restart services on the servers. 

    To flush the caches, the team restarted the following services on the servers hosting index workspaces:

    • SharePoint Portal Server 

    • Microsoft Exchange Information Store 

    • Microsoft Search 

  5. Start a full update of the index. 

    After restarting the services, the team reset the index and began a full update. 
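The rule that the search server's profile must hold the union of every index workspace profile is easy to check before the first propagation. The following Python sketch is illustrative only: it assumes you have listed each document profile's properties yourself (no export mechanism is described here), and the function names are hypothetical. It builds the union of the index workspace profiles from the example tables and reports any property the search server's profile lacks.

```python
from typing import Dict, List


def union_of_profiles(profiles: Dict[str, List[str]]) -> List[str]:
    """Return every property that appears in any index workspace profile.

    Order follows first appearance, which makes the result easy to compare
    with a profile that was defined by hand.
    """
    seen = set()
    merged: List[str] = []
    for properties in profiles.values():
        for name in properties:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def missing_on_search_server(profiles: Dict[str, List[str]],
                             search_profile: List[str]) -> List[str]:
    """Properties crawled on some index server but absent on the search server."""
    present = set(search_profile)
    return [name for name in union_of_profiles(profiles) if name not in present]


# Property lists taken from the example tables for the index workspaces.
COMMON = ["Keywords", "Description", "Title", "Author"]
WEB = ["META_Categories", "META_PageURL", "META_XMLTerms", "META_Keyword"] + COMMON
PROFILES = {
    "bestbetsCorporatePortal": WEB,
    "CorporatePortal": WEB,
    "HumanResourcesWeb": WEB,
    "LibraryCatalog": ["META_MainAuthor", "META_itemtype", "META_pubdate",
                       "META_subtitle"] + COMMON,
}

if __name__ == "__main__":
    print(union_of_profiles(PROFILES))
    # A search-server profile that forgot the LibraryCatalog properties:
    print(missing_on_search_server(PROFILES, WEB))
```

A nonempty result from missing_on_search_server names properties that would be crawled on an index server but would not be available after propagation.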

Modify Search Pages

The existing search solution allowed customized query and results sets for each portal. Because of this, the team chose not to use the default dashboard site provided as part of SharePoint Portal Server.

By contrast, many customers may have only a single centralized search page to which all internal sites link. These customers could simply replace the existing page with the Search dashboard from SharePoint Portal Server and avoid creating custom search pages.

On each portal, users submit queries from a search box. Each query is then redirected to an ASP page hosted on the server dedicated to searching. With Site Server 3, that page takes the following steps:

  1. Accepts the query from the referring site 

  2. Executes the search by using Site Server 3 Component Object Model (COM) objects 

  3. Receives the results set 

  4. Converts the results into Extensible Markup Language (XML) 

  5. Returns the XML to the user's browser (Microsoft uses Microsoft Internet Explorer 5.5 and passes XML to the client) 

The transition from Site Server 3 to SharePoint Portal Server required the project team to modify step 2 and step 4 of the preceding process. For step 2, the team changed the query so that it used the Structured Query Language (SQL) syntax with full-text extensions instead of native Site Server 3 COM objects.

The following example illustrates a SELECT statement using WebDAV in SharePoint Portal Server.

SELECT "urn:schemas-microsoft-com:office:office#Office", "DAV:parentname", "DAV:href", 
"urn:schemas-microsoft-com:office:office#Title", 
"urn:schemas.microsoft.com:fulltextqueryinfo:description", 
"urn:schemas-microsoft-com:office:office#META_PageURL", "urn:schemas-microsoft-com:office:office#META_Categories", 
rank, "DAV:getcontentlength", "DAV:getcontenttype", "DAV:getlastmodified" 
FROM TABLE corpportal..SCOPE() 
WHERE WITH ("urn:schemas-microsoft-com:office:office#Title", 
"urn:schemas.microsoft.com:fulltextqueryinfo:description", 
"urn:schemas.microsoft.com:fulltextqueryinfo:contents") AS #DocDesc (FREETEXT(#DocDesc, '401k') 
RANK BY COERCION(ABSOLUTE, 1000)) ORDER BY rank DESC 

Note The SELECT list returns the mapped meta properties (in the Office namespace).

The team used the workspace-level scope (corpportal..SCOPE()) to restrict results to one of the index workspaces. They also used a column group alias (WITH ... AS #DocDesc) together with FREETEXT and rank coercion. For more information about restricting search results, see Appendix B.
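Because a custom page splices the user's text into this statement, an apostrophe in the query (for example, employee's) would end the string literal early. The following is a minimal Python sketch of building a statement of the same shape, assuming the column group alias and rank coercion shown above. The function name and parameters are hypothetical, and the sketch writes the coercion clause as RANK BY COERCION(ABSOLUTE, 1000) so that the parentheses balance.

```python
def build_freetext_query(workspace: str, text: str, coerce_to: int = 1000) -> str:
    """Build a full-text SELECT statement shaped like the corpportal example.

    `workspace` names the index workspace for the SCOPE() clause; `text` is
    the raw query the user typed.
    """
    # The workspace name is spliced in unquoted, so accept only plain names.
    if not workspace.isalnum():
        raise ValueError(f"unexpected workspace name: {workspace!r}")
    # Double single quotes so user text cannot end the string literal early.
    literal = text.replace("'", "''")
    office = "urn:schemas-microsoft-com:office:office#"
    ftqi = "urn:schemas.microsoft.com:fulltextqueryinfo:"
    columns = ['"DAV:href"', f'"{office}Title"', f'"{ftqi}description"',
               "rank", '"DAV:getlastmodified"']
    return (
        f"SELECT {', '.join(columns)} "
        f"FROM TABLE {workspace}..SCOPE() "
        f'WHERE WITH ("{office}Title", "{ftqi}description", "{ftqi}contents") '
        f"AS #DocDesc (FREETEXT(#DocDesc, '{literal}') "
        f"RANK BY COERCION(ABSOLUTE, {coerce_to})) "
        "ORDER BY rank DESC"
    )


if __name__ == "__main__":
    print(build_freetext_query("corpportal", "employee's 401k"))
```

Rejecting unexpected workspace names matters because the sample ASP page later in this chapter takes the workspace from the ct request parameter.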

For step 4, the team changed how results are formatted. For Site Server 3, the page used a custom routine to build XML from the results set. SharePoint Portal Server returns XML natively, which eliminated that conversion. The team simply applied an Extensible Stylesheet Language (XSL) transformation to produce the formatting they wanted.

The following samples show the search ASP page used with Site Server 3 and the page used with SharePoint Portal Server.

Site Server 3 Search ASP Page Sample Code

This is a sample of the Site Server 3 ASP code.

<%@LANGUAGE="VBScript" %> 
<% ' Copyright 1997-1998 Microsoft Corporation. All rights reserved. %> 
<% 
DisplayText=Request("q1") 
RecordNum=Request("RecordNum") 
if RecordNum= "" then RecordNum=1 
%> 
<html> 
<head><title>Search Page</title> 
<meta http-equiv=content-type content="text/html; charset=iso-8859-1"> 
<meta http-equiv="content-language" content="EN"> 
</head> 
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17  
leftmargin=15 bgcolor="ffffff"> 
<form method=get>Search: 
<input type=Text name="q1" value="<%=DisplayText%>" size="23"> 
<input type=submit name="Search" value="Go"> 
<input type=hidden name="ct" value="MyCatalog"> 
</form> 
<% 
If DisplayText <> "" Then 
%>Searching for <b><%=DisplayText%></b> 
<% 
' Set query and utility objects, and define query object properties. 
set util = Server.CreateObject("MSSearch.util") 
set Q = Server.CreateObject("MSSearch.Query") 
Q.SetQueryFromURL(Request.QueryString) 
Q.MaxRecords = 25 
Q.SortBy = "Rank[d],DocTitle" 
Q.Columns = "DocTitle, DocAddress, FileWrite, Size, Description, FileName, " & _
"DocSignature, Rank, DetectedLanguage, MimeType, SiteName, NNTP_MessageID" 
' Create the recordset holding the search results. 
on error resume next 
set RS = Q.CreateRecordSet("sequential") 
if err then 
createerror = err.description 
createerrnumber = err.number 
end if 
' Error description. 
if err then 
Response.write createerror 
' Display results 
else 
Response.write "<table><tr><td><font size = 2>" 
' Set up number found. 
NumberFound= RS.Properties("RowCount") 
if RS.Properties("RowLimitExceeded") = true then 
NumberFound = "More than " & NumberFound 
end if 
' Set up loop to iterate through results. 
Do while not RS.EOF 
' Set up title for links, providing an alternative if DocTitle is blank. 
if RS("DocTitle") <> "" then 
Title = RS("DocTitle") 
else 
Title = "No title: " & RS("DocAddress") 
end if 
' Set up link itself. 
Link = RS("DocAddress") 
' One table is used for each search result. 
Response.write "</font></td></tr><tr><td> </td></tr></table>" 
Response.write "<table cellpadding=0 cellspacing=0>" 
Response.write "<tr><td width=21><font size=2><p>" 
Response.write "<table cellpadding=1 cellspacing=1 border=0><tr><td valign=top>" 
Response.Write "<font size='2'>" & RS("Rank") & "</font>" 
%> 
</td></tr></table> 
<% 
Response.Write "</font></td>" 
Response.Write "<td bgcolor='#80BBDD'><font size=2>" 
%> 
<a <% = LinkTarget %> href='<% = Link %>'><% = Title %></a> 
</font></td></tr><tr><td></td><td><font size=2> 
<% Response.write util.TruncateToWhiteSpace(RS("Description"),250) %> 
</font></td></tr> 
<tr><td></td><td height=5></td></tr> 
<tr><td></td><td> 
<font color=808080 size=1>[<% = util.TruncateToWhiteSpace(RS("FileWrite"), 12) %>] 
<% iSize = CInt(CLng(RS("Size"))/1024) %> 
  (<% = iSize %>k)   
</font> 
<%  
' Increment the results. 
RS.MoveNext 
RecordNum = RecordNum + 1 
Loop 
Response.write "</font></td></tr></table>" 
' If there are more results pages, set up the "More Results" link. 
if RS.Properties("MoreRows") = true then 
Q.StartHit = RS.Properties("NextStartHit") 
' Repeat query with new start hit. 
L_MoreResults_link = "More Results" 
MoreLink = "<a href=?" & Q.QueryToURL & "&" _ 
& "DisplayText=" & Server.URLEncode(DisplayText) & "&" _ 
& "RecordNum=" & RecordNum _ 
& ">" & L_MoreResults_link & "</a>" 
end if  
%><% = MoreLink %> 
</font></td> 
</tr> 
</table> 
<%  
End if 
End If 
%> 

SharePoint Portal Server Search ASP Page Sample Code

This is a sample of the SharePoint Portal Server ASP code.

<%@LANGUAGE="VBScript" %> 
<% ' Copyright 2001 Microsoft Corporation. All rights reserved. %> 
<%  
DisplayText=Request("q1") 
ct=Request("ct") 
If DisplayText = "" Then %> 
<html> 
<head><title>Search Page</title> 
<meta http-equiv=content-type content="text/html; charset=iso-8859-1"> 
<meta http-equiv="content-language" content="EN"> 
</head> 
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17  
leftmargin=15 bgcolor="ffffff"> 
<form method=get>Search: 
<input type=Text name="q1" value="<%=DisplayText%>" size="23"> 
<input type=submit name="Search" value="Go"> 
<input type=hidden name="ct" value="MyCatalog"> 
</form> 
<%  
Else 
Response.ContentType = "text/xml" 
Response.Write("<?xml version='1.0' encoding='ISO-8859-1'?>" & vbCRLF) 
Response.Write("<Results xmlns:dt='urn:schemas-microsoft-com:datatypes'>")  
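' Assumes an MSXML XSLTemplate holding the results style sheet was cached in
' Application("StyleTransform"), for example when the application starts.
set oProc = Application("StyleTransform").createProcessor 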
Set xh = Server.CreateObject("Msxml2.SERVERXMLHTTP") 
' Double apostrophes so the query text cannot end the SQL string literal early,
' then HTML-encode it so characters such as < and & do not break the XML body.
safeText = Server.HTMLEncode(Replace(DisplayText, "'", "''"))
strQuery = "<?xml version=""1.0"" encoding=""utf-8""?><a:searchrequest xmlns:a=""DAV:""><a:sql>" & _
"SELECT ""rank"", ""DAV:href"", ""urn:schemas-microsoft-com:office:office#Title"", " & _
"""urn:schemas.microsoft.com:fulltextqueryinfo:description"", " & _
"""DAV:getcontentlength"", ""DAV:getlastmodified"" " & _
"FROM " & ct & "..SCOPE() " & _
"WHERE WITH (""urn:schemas-microsoft-com:office:office#Title"", " & _
"""urn:schemas.microsoft.com:fulltextqueryinfo:description"", " & _
"""urn:schemas.microsoft.com:fulltextqueryinfo:contents"") AS #DocDesc (FREETEXT (#DocDesc, " & _
"'" & safeText & "')) " & _
"ORDER BY ""rank"" DESC</a:sql></a:searchrequest>" 
'Make DAV request 
xh.setTimeouts 0, 6000, 6000, 0 
xh.open "SEARCH", "https://myServer/myWorkspace", False 
xh.setRequestHeader "content-type", "text/xml" 
xh.setRequestHeader "range", "rows=0-9" 
xh.setRequestHeader "MS-Search-MaxRows", 200 
xh.setRequestHeader "MS-Search-UseContentIndex", "t" 
xh.send strQuery 
'Process DAV response 
if xh.Status <> 207 then
' A successful SEARCH request returns 207 Multi-Status.
Response.Write "<error>Status: " & xh.Status & ". Status Text: " & xh.statusText & "</error>" 
Response.Write "<errorReason><![CDATA[" & xh.responseText & "]]></errorReason>" 
elseif xh.responseXML.parseError.errorCode <> 0 then 
Response.Write "<error>XML response error code = " & xh.responseXML.parseError.errorCode & " " & xh.responseXML.parseError.reason & "</error>" 
elseif xh.responseXML.selectSingleNode("a:multistatus").hasChildNodes = false then 
' An empty multistatus element means that no documents matched.
Response.Write("<ResultSet totalhits='0'><error>No documents match your query.</error></ResultSet>") 
else 
' Display results by applying the XSL transformation to the response.
oProc.input = xh.responseXML.documentElement 
oProc.transform 
Response.Write(oProc.output) 
end if 
Response.Write "</Results>" 
End If 
%> 
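The page above passes the 207 Multi-Status response to a cached XSL template. To inspect the same response outside ASP, for example while debugging a style sheet, the following Python sketch walks a WebDAV multistatus document with the standard library's ElementTree. The sample response is simplified and hypothetical. In particular, it assumes that a property such as urn:schemas-microsoft-com:office:office#Title is returned as an element named Title in the namespace urn:schemas-microsoft-com:office:office#, so verify the element names against a real response first.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DAV = "{DAV:}"
# Assumption: the Office property namespace ends with '#', as in the SELECT list.
OFFICE = "{urn:schemas-microsoft-com:office:office#}"


def parse_multistatus(xml_text: str) -> List[Tuple[str, str]]:
    """Return (href, title) pairs from a DAV:multistatus search response."""
    root = ET.fromstring(xml_text)
    if root.tag != DAV + "multistatus":
        raise ValueError("not a DAV:multistatus document")
    results = []
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href", default="")
        # Take the title from the first prop element that carries one.
        title = ""
        for prop in response.iter(DAV + "prop"):
            title = prop.findtext(OFFICE + "Title", default="")
            if title:
                break
        # Same fallback as the Site Server 3 sample page.
        results.append((href, title or "No title: " + href))
    return results


SAMPLE = """<?xml version="1.0"?>
<a:multistatus xmlns:a="DAV:" xmlns:d="urn:schemas-microsoft-com:office:office#">
  <a:response>
    <a:href>https://myServer/myWorkspace/benefits.htm</a:href>
    <a:propstat><a:status>HTTP/1.1 200 OK</a:status>
      <a:prop><d:Title>401(k) Plan Overview</d:Title></a:prop>
    </a:propstat>
  </a:response>
  <a:response>
    <a:href>https://myServer/myWorkspace/untitled.htm</a:href>
    <a:propstat><a:status>HTTP/1.1 200 OK</a:status><a:prop/></a:propstat>
  </a:response>
</a:multistatus>"""

if __name__ == "__main__":
    for href, title in parse_multistatus(SAMPLE):
        print(title, "->", href)
```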

Testing

Testing included two tasks. The project team verified that SharePoint Portal Server met the criteria for creating and maintaining indexes for the identified content. They also verified that it met the criteria for searching: the workspace propagation process and its speed, basic search functionality, and the custom search page.

Index Testing

The team identified two goals for testing the process of creating an index:

  • SharePoint Portal Server can crawl all the content crawled by using Site Server 3. 

  • SharePoint Portal Server can crawl up to 6 million documents. 

The second goal verified the scalability of the SharePoint Portal Server search solution. ITG based its 6 million target on the 3 million documents in the index at the start of testing, plus occasional additional content sources and a growth factor.

To measure crawl performance, the test team established several metrics. The following table lists each metric and where its data is found.

Index Test Metrics (data collected: where it is found)

  • Number of documents: Event Viewer, SharePoint Portal Server Administration in Microsoft Management Console (MMC), ASP event log 

  • Crawl status: SharePoint Portal Server Administration in MMC, Web folders view 

  • Crawl start time: Event Viewer application log 

  • Crawl end time: Event Viewer application log 

  • Crawl duration: manual calculation from the crawl start and end times 

  • Catalog size: SharePoint Portal Server Administration in MMC 

  • Property store size: file <…\SharePoint Portal Server\\FTData\SharepointPortalServer\sps.edb>, checked by using Windows Explorer. Property store size applies at the server level, not the catalog level. 

The team executed each crawl several times, refining the rules until they were satisfied that the proper content was being included in the index. They used the dashboard search on the server dedicated to searching to help with this check.

The team used the event viewer and gatherer log viewer from SharePoint Portal Server to examine the system to ensure that the index was operating normally and without problems. Figure 27.8 shows an example of the event viewer entries for starting and stopping the index.

Figure 27.8 Example event viewer entries 

The following table shows an example of the data collected to track crawls.

Example Index Test Metrics (full crawls)

Catalog name                         Documents   Crawl duration  Propagation duration  Catalog size  Property store size
bestbetsCorpPortal (Index 1)         851         1 min           1 min                 1 MB          4.61 GB
Corporate Portal Intranet (Index 1)  2,920,178   3,127 min       65 min                5,081 MB
HumanResourcesWeb (Index 1)          3,927       24 min          1 min                 4 MB
WebCatalog2 (Index 1)                17,882      24 min          1 min                 14 MB
WindowsUA (Index 1)                  14,198      8 min           1 min                 14 MB
CorpPortal Param (Index 2)           694         3 min           1 min                 1 MB          1.04 GB
ITG portal (Index 2)                 13,250      37 min          1 min                 13 MB
KBInt portal (Index 2)               226,474     269 min         15 min                325 MB
Product Group Portal (Index 2)       159,257     224 min         19 min                605 MB
SAPWeb (Index 2)                     3,609       47 min          1 min                 3 MB
MSWordTest (Index 2)                 15,233      11 min          1 min                 24 MB

Property store size is measured per server rather than per catalog, so it is listed only once for each index server.

SharePoint Portal Server completed the full crawls with satisfactory results at a volume of about 3 million documents. ITG added more content sources for scale testing. Eventually, SharePoint Portal Server crawled just over 6 million documents. Crawl performance did not drop off due to the size of the index.

Next, the team tested incremental updates on each of the catalogs. The incremental crawls took about half the time of the original full index and proved successful.

Finally, the team tested adaptive crawling on the largest catalogs in multiple passes until the number of documents modified converged. In doing so, the team discovered that convergence took about eight passes for the largest workspace. In these passes, crawl time was reduced from 51 hours for a full index to less than 8 hours for the shortest adaptive crawl, a nearly sevenfold improvement. Figure 27.9 shows the index times per pass.

Figure 27.9 Adaptive crawl times 

The testing process involved the following steps:

  • Perform a full index: n days 

  • Perform an incremental index: day n+1 

  • Perform an adaptive update and track the number of documents changed: each night 

An index has converged when the number of documents updated, or the crawl time, reaches a steady state from night to night. After convergence, the crawl time remains approximately the same each night, unless SharePoint Portal Server detects a large change in content, such as a new site coming online.
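One way to make "steady state" concrete is a tolerance test on nightly crawl times. The following Python sketch declares convergence at the first pass of a run of nights whose crawl time differs by no more than a fixed fraction from the night before. The 10 percent tolerance, the two-night window, and the sample hours are assumptions for illustration, not ITG measurements.

```python
from typing import Optional, Sequence


def converged_at(crawl_hours: Sequence[float],
                 tolerance: float = 0.10,
                 stable_nights: int = 2) -> Optional[int]:
    """Return the 1-based pass at which crawl time settled, or None.

    A night is "stable" when its crawl time differs from the previous
    night's by no more than `tolerance` (as a fraction). Convergence is
    reported at the pass that starts a run of `stable_nights` stable nights.
    """
    run = 0
    for i in range(1, len(crawl_hours)):
        previous, current = crawl_hours[i - 1], crawl_hours[i]
        if previous > 0 and abs(current - previous) / previous <= tolerance:
            run += 1
            if run == stable_nights:
                # crawl_hours[i - stable_nights] is the first settled pass.
                return i - stable_nights + 1
        else:
            run = 0
    return None


if __name__ == "__main__":
    # Hypothetical nightly crawl times in hours, starting with the full index.
    hours = [51, 40, 30, 22, 16, 12, 9.5, 7.9, 7.7, 7.8]
    print(converged_at(hours))
```

With this hypothetical series the function reports pass 8; a tighter tolerance or a longer window would delay the verdict.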

Search Testing

ITG tested three additional features. First, they tested the workspace propagation process and times. Next, they tested the basic searching by using the dashboard site. Finally, they tested the custom search page.

When examining propagation, first confirm that it completes successfully. ITG also needed an estimate of how long propagation took to complete. The following table lists the metrics and their sources.

Note You should measure the duration of propagation, from the start of the process on the server hosting the index workspace to the end of the process on the search server.

Search Test Metrics (data collected: where it is currently found)

  • Propagation status: SharePoint Portal Server Administration in MMC, Event Viewer 

  • Propagation start time: Event Viewer (on both servers) 

  • Propagation end time: Event Viewer (on the search server) 

  • Propagation duration: manual calculation from the data collected 
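Because propagation starts on one server and ends on another, the measured duration is only as good as the agreement between the two clocks. The following is a minimal Python sketch, assuming the start time is copied from Event Viewer on the server hosting the index workspace, the end time from Event Viewer on the search server, and that you know how far apart the two clocks are. The timestamp format and the sample times are assumptions. Without the offset, the same subtraction gives a crawl duration from the start and end events on a single server.

```python
from datetime import datetime, timedelta

EVENT_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # Assumed date/time layout copied from Event Viewer.


def propagation_minutes(start_on_index_server: str,
                        end_on_search_server: str,
                        search_clock_ahead: timedelta = timedelta(0)) -> float:
    """Minutes between two event timestamps recorded on different servers.

    `search_clock_ahead` is how far the search server's clock runs ahead of
    the index server's clock; it is subtracted from the end time so that
    both timestamps are on the same clock.
    """
    start = datetime.strptime(start_on_index_server, EVENT_FORMAT)
    end = datetime.strptime(end_on_search_server, EVENT_FORMAT) - search_clock_ahead
    if end < start:
        raise ValueError("end precedes start; check the clock offset")
    return (end - start).total_seconds() / 60


if __name__ == "__main__":
    print(propagation_minutes("3/14/2001 2:05:00 AM", "3/14/2001 3:10:00 AM"))
    print(propagation_minutes("3/14/2001 2:05:00 AM", "3/14/2001 3:12:00 AM",
                              search_clock_ahead=timedelta(minutes=2)))
```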

The ITG team tested the results of simple full-text queries against SharePoint Portal Server. They compared these results with the results of the same queries in Site Server 3 to ensure that crawling had returned the proper documents and had followed the site rules.

Finally, after completing the custom ASP page modifications, they tested the ASP pages. The test involved both the query and results pages. Final tests measured performance and accuracy of the results sets.

To measure query latency, the ASP page recorded the exact time of each request and of its response in an SQL database, along with other data used to track usage metrics. The team created a set of 47 queries, most of them drawn from the top 100 queries run the previous month. The set included one-term and two-term phrases and some unusual queries. They ran the set on Site Server 3 and then on SharePoint Portal Server. For each query, the team recorded the latency of the first request and of the next four requests for the same term. These latency times, in seconds, are shown in the following table.

ASP Page Performance Testing (latency in seconds)

Product                    Initial  #2    #3    #4    #5
Site Server 3              1.11     0.84  0.81  0.81  0.86
SharePoint Portal Server   4.28     0.65  0.65  0.65  0.65

ITG attributed the gap between the initial response times of SharePoint Portal Server and Site Server 3 to caching. Because Site Server 3 was already in production and taking queries, many queries and terms were already loaded into memory, which reduced its initial response time. SharePoint Portal Server had none of the terms in memory, so every first query required reading from the disk. Subsequent queries with SharePoint Portal Server were 22 percent faster than with Site Server 3.
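The 22 percent figure can be reproduced from the table above by averaging the warm-cache latencies (#2 through #5) for each product and comparing the averages. A short Python check using those values:

```python
from statistics import mean

# Warm-cache latencies in seconds (queries #2 through #5) from the table above.
site_server_3 = [0.84, 0.81, 0.81, 0.86]
sharepoint_portal_server = [0.65, 0.65, 0.65, 0.65]

old, new = mean(site_server_3), mean(sharepoint_portal_server)
improvement = (old - new) / old * 100  # percentage reduction in latency

print(f"Site Server 3: {old:.2f} s, SharePoint Portal Server: {new:.2f} s")
print(f"Improvement: {improvement:.0f} percent")
```

Site Server 3 averages 0.83 seconds and SharePoint Portal Server 0.65 seconds, a reduction of about 22 percent. The initial request is left out because, as explained above, it mostly reflects what was already in the cache.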

In addition to showing faster query rates with SharePoint Portal Server, tests showed that the server dedicated to searching could take advantage of additional memory. When the team increased RAM on this server from 1 GB to 2 GB, latency dropped. They allotted 1 GB of RAM to the operating system and SharePoint Portal Server and 1 GB to caching the property store. Loading a large part of the property store into memory improved performance by speeding access to the data used in search queries. The figures in the preceding table were measured after the team added the memory but before they ran the "warm-up" script.

To facilitate this pre-loading or "warm up" of the cache, the team developed a script that runs immediately after crawling completes and propagates. This script loads the cache with data, so the cache is ready when the service enters production. For more information about this script, see Appendix B.
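ITG's own warm-up script is described in Appendix B. Purely to illustrate the idea, the following Python sketch replays a list of common queries against the search page so that the terms and property store data they touch are read into memory before users arrive. The search page URL and the q1 and ct field names come from examples in this chapter; the query list, the workspace name, and the pluggable fetch function are assumptions, and the sketch does not handle authentication such as NTLM.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Example search page URL from this chapter; substitute your own server.
SEARCH_PAGE = "https://spsearch/search/default.asp"


def warm_up_urls(queries: Iterable[str], workspace: str) -> List[str]:
    """Build one search-page URL per query, using the q1 and ct form fields."""
    return [f"{SEARCH_PAGE}?{urlencode({'q1': q, 'ct': workspace})}" for q in queries]


def http_get(url: str) -> bytes:
    """Plain GET that reads the whole body so the server completes the query."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def warm_up(queries: Iterable[str], workspace: str,
            fetch: Callable[[str], object] = http_get) -> int:
    """Send each warm-up query once and return the number of requests sent.

    Pass a different `fetch` to add credentials or to run without a network.
    """
    sent = 0
    for url in warm_up_urls(queries, workspace):
        fetch(url)
        sent += 1
    return sent


if __name__ == "__main__":
    # Hypothetical query list; print the URLs without contacting a server.
    for url in warm_up_urls(["401k", "expense report"], "CorporatePortal"):
        print(url)
```

Running the replay right after propagation completes, as the text describes, means the first real users see warm-cache latencies rather than the cold first-request figure.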

Note If you set the maximum cache size too high, you can leave insufficient memory for SharePoint Portal Server, the operating system, and any other applications on your server. A good rule of thumb is to leave at least 0.5 GB for use by SharePoint Portal Server and the operating system. For example, on a server with 2 GB of physical memory, set the minimum cache size to 1 GB and the maximum cache size to 1.5 GB (or less, if you have other applications running).

You must leave enough memory for other processes and for monitoring Microsoft Search objects in Performance Monitor.
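The rule of thumb in the note above can be expressed as a small calculation. This Python sketch only computes a number and does not change any server setting. It reserves 0.5 GB for the operating system and SharePoint Portal Server, as the note suggests, and subtracts an estimate for other applications; the function and parameter names are hypothetical.

```python
def max_cache_gb(physical_gb: float,
                 other_apps_gb: float = 0.0,
                 reserve_gb: float = 0.5) -> float:
    """Largest maximum cache size that still leaves `reserve_gb` free.

    `reserve_gb` is memory kept for the operating system and SharePoint
    Portal Server; `other_apps_gb` covers anything else on the server.
    """
    available = physical_gb - reserve_gb - other_apps_gb
    if available <= 0:
        raise ValueError("not enough memory to allocate a cache")
    return available


if __name__ == "__main__":
    # The note's example: 2 GB of physical memory, nothing else running.
    print(max_cache_gb(2.0))
```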

Results of Testing

The search tests yielded the following results:

  • Average latency time was reduced by 22 percent (after the cache was pre-loaded). 

  • On the server dedicated to searching, adding memory improves performance because more of the property store can be cached. 

  • To maintain NTLM credentials, ASP pages must be on the search server. 

After developing the custom ASP pages, the team validated the search results through testing. They added a link to the results page for Site Server 3, asking users to try the new search page that relied on SharePoint Portal Server. From this process, the team monitored the following data:

  • Query string and number of times it was requested 

  • Total number of queries 

  • Total number of unique users 

  • Average response times 
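All four measures come from a single pass over the request log. The following Python sketch assumes each logged request has been read into a record holding a user, the query string, and the request and response times. That record layout, the case-and-whitespace normalization of query strings, and the sample rows are assumptions, because the chapter does not describe the schema of the SQL database the ASP page wrote to.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass
class SearchRequest:
    """One row of a hypothetical search log."""
    user: str
    query: str
    requested: datetime
    responded: datetime


def summarize(log: Iterable[SearchRequest]) -> Dict[str, object]:
    """Compute the four monitored measures from logged search requests."""
    rows = list(log)
    if not rows:
        return {"queries": Counter(), "total": 0, "unique_users": 0, "avg_seconds": 0.0}
    # Normalize case and spacing so that "401K" and "401k " count as one query.
    queries = Counter(" ".join(r.query.lower().split()) for r in rows)
    latency = [(r.responded - r.requested).total_seconds() for r in rows]
    return {
        "queries": queries,                           # query string -> times requested
        "total": len(rows),                           # total number of queries
        "unique_users": len({r.user for r in rows}),  # total number of unique users
        "avg_seconds": sum(latency) / len(latency),   # average response time
    }


if __name__ == "__main__":
    t = datetime(2001, 3, 14, 9, 0, 0)
    sample = [
        SearchRequest("user1", "401k", t, t.replace(microsecond=650000)),
        SearchRequest("user2", "401K ", t, t.replace(second=1)),
        SearchRequest("user1", "expense report", t, t.replace(microsecond=500000)),
    ]
    report = summarize(sample)
    print(report["queries"].most_common(1), report["total"], report["unique_users"])
```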

Key Points


The key points learned during the Deployment phase and index testing were:

  • The test crawled about 3 million documents, with the largest catalog averaging 970 documents per minute. This represented a nearly threefold increase over 330 documents per minute with Site Server 3. In addition, SharePoint Portal Server crawled a larger number of documents and more diverse file types. 

  • Although the hardware was comparable, the SharePoint Portal Server deployment used more powerful hardware. 

  • Adaptive crawling reduced crawling time from 51 hours to less than 8 hours, a nearly sevenfold improvement. 

  • By using adaptive crawling, SharePoint Portal Server updates the largest index nightly instead of weekly. This results in more timely and relevant information. 

  • Additional RAM and processors make a significant difference in crawl performance with SharePoint Portal Server. 

  • To improve performance, optimize the site rules and content sources. 

Management


To transition a portal that uses Site Server 3 for searching to SharePoint Portal Server, ITG modified the URL on the page where users perform search queries to point to the SharePoint Portal Server computer dedicated to searching. For example:

  • Existing URL: https://siteserver3/search/default.asp 

  • New URL: https://spsearch/search/default.asp 

First, the team modified the URL for searching on the primary corporate portal, MSWeb, to point to the SharePoint Portal Server computer. As expected, the load immediately increased from a few thousand queries per week to nearly 30,000 per day.

Next, the team modified the URL for searching on the Product Group Portal to point to the SharePoint Portal Server computer. This portal used the dashboard site included with SharePoint Portal Server for searching, instead of a custom ASP page. The team simply added a new Web Part to the search dashboard. This site handles about 2,000 searches per month.

The team continued this process for each of the major business portals across the corporate intranet. In addition to completing the transition to SharePoint Portal Server, ITG must continue to monitor performance for SharePoint Portal Server and must implement a disaster recovery plan that is compatible with SharePoint Portal Server. The following sections review these tasks.

Monitoring Performance

Through effective monitoring, ITG has determined that this deployment meets performance expectations. ITG also captures monitoring data for trend analysis to predict future problems and fine-tune alert thresholds.

Note Although separate administration is possible, ITG administers Windows 2000 and SharePoint Portal Server together.

Server activity for SharePoint Portal Server generates performance data that Windows 2000 can track and log on the system. The data is described as a performance object and is typically named for the component generating the data. Every performance object provides counters that represent data on specific aspects of the object. ITG monitors standard Windows 2000 Advanced Server performance objects, along with several specific objects for SharePoint Portal Server. For example, to monitor MSSearch, select the performance object called Microsoft Gatherer and the Heartbeats counter.

The performance objects to monitor for enterprise search include:

  • Microsoft Gatherer 

  • Microsoft Gatherer Projects 

  • Microsoft Search 

  • Microsoft Search Catalogs 

  • Microsoft Search Indexer Catalogs 

The following table describes the counters that ITG routinely monitors.

Monitoring Performance Objects 

Microsoft Gatherer

  • Documents Filtered (and Rate): Number of documents the gatherer attempted to crawl since the service started. 

  • Documents Successfully Filtered (and Rate): Number of documents successfully crawled since the service started. 

  • Documents Delayed Retry: A nonzero value means the Microsoft Web Storage System is having problems; by default, the gatherer retries until the condition clears. 

  • Reason to Back off: A nonzero value means crawling is paused because of high disk I/O, low memory, or a similar condition. 

  • Server objects: Number of servers crawled. 

  • Time outs: A high value indicates network problems. 

  • Adaptive Crawl Accepts: Documents accepted by adaptive update. 

  • Adaptive Crawl Error Samples: Documents accessed for error sampling. 

  • Adaptive Crawl Errors: Documents that adaptive update incorrectly rejects. 

  • Adaptive Crawl Excludes: Documents that adaptive update excludes. 

  • Adaptive Crawl False Positives: Number of false positives, which occur when adaptive update predicts that a document has changed when it has not. A high value means the adaptive update algorithm is not modeling changes in the documents correctly. 

  • Adaptive Crawl Total: Documents to which adaptive update logic was applied. 

Microsoft Gatherer Projects

  • Crawls In progress: Number of concurrent crawls. 

  • Status Success (and Rate): Number of documents successfully filtered for this workspace. 

  • Status Error: Number of errors. 

  • URLs in History: Number of URLs covered in all crawls. 

  • Waiting Documents: Gatherer queue length; 0 means the gatherer is idle. 

Microsoft Search

  • Failed Queries: Number of failed queries. 

  • Successful Queries (and Rate): Number of successful queries. 

Microsoft Search Indexer Catalogs

  • Merge progress (0 to 100 percent): A value below 100 means indexes are being merged; crawling can be paused during the merge. 

  • Number of Documents: Number of documents in the catalog included in the index. 

  • Index Size: Size of the index, in megabytes. 
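Several explanations in the table translate directly into alert rules. The following Python sketch encodes a few of them against a dictionary of counter readings keyed by performance object and counter name. How the readings are collected and how per-catalog instances are handled are left out, and the Time outs threshold is a placeholder to tune from your own trend data, not an ITG value.

```python
from typing import Dict, List, Tuple

# (performance object, counter) -> most recent reading
Readings = Dict[Tuple[str, str], float]

# Placeholder threshold; derive a real one from your own trend data.
TIMEOUT_THRESHOLD = 100.0


def check_counters(readings: Readings) -> List[str]:
    """Return a warning for each reading that the table describes as a problem."""

    def value(obj: str, counter: str) -> float:
        return readings.get((obj, counter), 0.0)

    warnings: List[str] = []
    if value("Microsoft Gatherer", "Documents Delayed Retry") > 0:
        warnings.append("Documents delayed for retry: check the Web Storage System")
    if value("Microsoft Gatherer", "Reason to Back off") != 0:
        warnings.append("Crawling paused (high disk I/O, low memory, or similar)")
    if value("Microsoft Gatherer", "Time outs") > TIMEOUT_THRESHOLD:
        warnings.append("High time-out count: check the network")
    merge = readings.get(("Microsoft Search Indexer Catalogs", "Merge progress"))
    if merge is not None and merge < 100:
        warnings.append(f"Index merge {merge:.0f}% complete: crawling may pause")
    return warnings


if __name__ == "__main__":
    sample = {
        ("Microsoft Gatherer", "Reason to Back off"): 1,
        ("Microsoft Search Indexer Catalogs", "Merge progress"): 40,
    }
    for line in check_counters(sample):
        print(line)
```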

Planning for Disaster Recovery

The backup and restore process is the only substantial change in operation from Site Server 3. SharePoint Portal Server provides a built-in script for backing up the entire server, including all workspace and catalog information, to an image file. You can then restore this image on another server.

ITG uses the MSDMBACK utility installed with SharePoint Portal Server to copy the backup files to disk each night, and then uses Windows 2000 Backup to back up those files to tape.

Important The output from MSDMBACK must be saved to a local drive.

Windows 2000 Backup attempts to lock files while backing them up, which prevents crawls from continuing. For this reason, servers that host index workspaces must exclude the following directory from the Windows 2000 backup:

operating_system_drive\Program Files\SharePoint Portal Server\Data\FTData\SharepointPortalServer 

The MSDMBACK utility takes a snapshot of all necessary SharePoint Portal Server data directories as part of its backup.

Each night, ITG runs the backup process on each server that hosts an index workspace and on the enterprise portal server, using MSDMBACK to back up the data to another partition as follows:

  1. From the operating_system_drive\Program Files\SharePoint Portal Server\Bin directory, run the following command: 

cscript msdmback.vbs /b "path_to_backup_file_name"

    where the path_to_backup_file_name parameter is the name of the backup file to be created. 

  2. ITG places the preceding command in a .cmd file that Windows 2000 Task Scheduler runs on a schedule. The task's Start in setting specifies operating_system_drive\Program Files\SharePoint Portal Server\Bin. 

    Note The script and the scheduled task must run under an account that has administrator privileges on each server. 

  3. Run a full Windows 2000 backup to tape every night for each server. 
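If you want each night's image to keep a distinct name, a small generator can write the .cmd file for the scheduled task. This Python sketch only produces text. It uses the cscript msdmback.vbs /b syntax from step 1 and adds cd /d to the Bin directory, which has the same effect as the task's Start in setting. The C: system drive, the D:\SPSBackup folder on a second local partition, and the date-stamped file name are assumptions.

```python
from datetime import date

# Assumed locations; substitute your system drive and local backup partition.
BIN_DIR = r"C:\Program Files\SharePoint Portal Server\Bin"
BACKUP_DIR = r"D:\SPSBackup"


def backup_command_file(day: date) -> str:
    """Return the text of a .cmd file that runs MSDMBACK for the given day."""
    image = rf"{BACKUP_DIR}\sps-{day:%Y%m%d}"  # Hypothetical naming pattern.
    return "\r\n".join([
        "@echo off",
        f'cd /d "{BIN_DIR}"',
        f'cscript msdmback.vbs /b "{image}"',
        "",  # Trailing CRLF.
    ])


if __name__ == "__main__":
    print(backup_command_file(date.today()), end="")
```

Whatever generates the file, the scheduled task that runs it still needs an account with administrator privileges on the server.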

Note that the backup process stores the passwords for content sources in encrypted form in the backup image. The optional password supplied during backup encrypts only these passwords; it does not encrypt the rest of the backup image, including the documents and metadata. If the administrator loses the password that was used to create the backup image and then restores the image, the restoration succeeds, but the restored information for the content source access account is invalid. In addition, subsequent crawls of this content source may fail because of authentication failures.

The backup process also stores user name and password pairs that are used for content sources in encrypted registry files. The optional password provided during restoration decrypts only the user name and password pairs. If the administrator loses the password that was used to create the backup image, and then tries to restore the image, the restoration succeeds but leaves the user name and password pairs for content sources blank.

Identifying Key Points

Organizations that want to make the transition from Site Server 3 to SharePoint Portal Server may find the approach taken by Microsoft ITG, outlined in the following list, helpful:

  • Run a test server to learn how the new technology operates. 

  • Take the opportunity to examine site rules, property mappings, and catalogs; make necessary changes for enhanced performance. 

  • Build a custom ASP page if the SharePoint Portal Server dashboard site user interface is not used. 

  • Run Site Server 3 and SharePoint Portal Server in parallel to test performance. 

  • When the acceptance criteria have been met, remove the old environment and leave the SharePoint Portal Server environment. 

Summary


Migrating to SharePoint Portal Server yielded two key benefits:

More relevant and timely search results delivered to users.

  • Latency, or response time, improved by 22 percent. 

  • Indexes updated nightly by using adaptive updates. 

Improved crawling performance.

  • Full update of an index nearly three times faster than Site Server 3. 

  • Adaptive update of an index seven times faster than Site Server 3. 

In addition to these benefits, ITG identified the following key points:

The migration process is straightforward. Migrating to SharePoint Portal Server is not complex. SharePoint Portal Server can use the same architecture as Site Server 3, on similar hardware. You can begin the catalog review process while ordering hardware and learning the product. Modifying existing ASP pages is simple, or you can use the built-in user interface included with SharePoint Portal Server instead. Day-to-day operation differs little from administering Site Server 3.

Review and refine existing catalogs. The best time to review existing catalogs is before implementation. Over time, your catalogs have probably lost accuracy: start addresses no longer exist, your servers crawl the same content multiple times, and some catalogs are redundant or unnecessary. As you review catalog definitions, you can also review your internal customers' requirements, giving them the opportunity to redefine and refine their requirements for searching.

SharePoint Portal Server gives improved full update performance and adaptive crawling benefits. With SharePoint Portal Server, you can crawl more content in the same amount of time as Site Server 3, using similar hardware. This provides room for growth. Alternatively, you can crawl existing content with less hardware than Site Server 3. This allows you to buy less expensive hardware. In addition, you can update existing content more frequently than Site Server 3, using similar hardware. This provides more timely and relevant results to your users.

Adding memory improves performance. Adding memory provides a quicker and less expensive way to improve performance than adding servers to your infrastructure.

