Microsoft SharePoint Portal Server 2001 Resource Kit
Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
Approximately 25,000 Microsoft users conduct nearly 125,000 searches each month (which translates to 750,000 queries against the search server) across the corporate intranet, called corpnet. Consequently, a small increase in performance can improve the search experience for users.
As Microsoft's Information Technology Group (ITG) approached beta testing of Microsoft® SharePoint™ Portal Server 2001 in the summer of 2000, it used Microsoft Site Server 3 as its enterprise search solution. With Site Server 3, Microsoft employees worldwide could easily find and aggregate information from across the enterprise.
Migrating to SharePoint Portal Server yielded two key benefits:
- More relevant and timely search results delivered to users.
  - Latency, or response time, improved by 22 percent.
  - Indexes updated nightly by using adaptive updates.
- Improved crawling performance.
  - Full update of an index nearly three times faster than Site Server 3.
  - Adaptive update of an index seven times faster than Site Server 3.
Microsoft employees now enjoy more relevant and timely search results because of the improvements in performance, nightly updates to the content indexes, and the new probabilistic ranking algorithm used for relevancy ranking.
This chapter describes the ITG deployment plan of SharePoint Portal Server and the subsequent results. It provides detailed information and recommendations based on this deployment. It includes technical information on the existing environment, design decisions, deployment steps, and testing considerations. It concludes with a summary of recommendations based on this experience.
Note This is not intended to serve as a procedural guide. The intranet site names provided are for illustration only and do not necessarily reflect actual names.
Planning
The migration from Site Server 3 to SharePoint Portal Server for intranet search at Microsoft included the following stages: Planning, Analysis and Design, Deployment, and Management.
Identifying Deployment Goals
In addition to running the enterprise IT utility, ITG plays a strategic role as one of Microsoft's early adopters, testing and deploying Microsoft software before customer release. All ITG early adoption efforts must show tangible business benefits to Microsoft beyond testing for scale and load in a real-world production environment. This was true for the SharePoint Portal Server beta deployments.
Among other benefits and services, this deployment extends the "Microsoft software as a service" model to continue to provide:
A customer-specific, service-level agreement for each portal owner that defined the service and clearly stated the procedures for support and maintenance over time.
Search across multiple (even disparate) content sets.
Better performance and more timely and relevant results.
The inclusion of existing content and additional content in the index.
The project team established one key metric to measure their success. The team had to ensure that the system handled the stress of crawling about 6 million documents in a time frame that matched their existing results. The existing enterprise search solution included only about 3 million documents in an index. The team also planned to add additional intranet content to the indexes. In addition, ITG required additional room for the growth of content over time.
To verify that SharePoint Portal Server would handle the same load as Site Server 3, the team ran both products in parallel for 30 days before retiring the Site Server 3 solution.
Establishing a Project Timeline
ITG began planning in the summer of 2000 to test SharePoint Portal Server as an enterprise index and search technology through all interim releases, including Beta 1, Beta 2, Release Candidates, and the final release-to-manufacturing (RTM) version.
The team divided the project into the following four phases:
- Planning
  - Establish the team.
  - Collect information on the current environment.
  - Develop a project plan.
- Analysis and Design
  - Create the architecture and select the hardware.
  - Review and redefine the catalogs.
- Deploying
  - Install the hardware and software.
  - Configure servers running SharePoint Portal Server.
  - Create workspaces.
  - Set up content sources and site rules.
  - Complete property mapping from custom document properties to the SharePoint Portal Server schema.
  - Modify Active Server Pages (ASPs) for searching and for returning results.
  - Test crawling.
  - Test searching.
  - Operate Site Server 3 and SharePoint Portal Server in parallel.
- Managing
  - Make the transition to production.
  - Manage operations and perform maintenance.
The team spent about nine months on this effort from beginning to end, working part-time. From midsummer when the team was formed until the end of the year 2000, the team spent most of its time testing the index and search capabilities of SharePoint Portal Server and optimizing for the goal of 6 million documents, as shown in Figure 27.1.
The migration to production began in early January 2001 with development of the search page and completion of the final tests of crawling. In mid-February, ITG set up the parallel environment. Before RTM in mid-March, SharePoint Portal Server replaced Site Server 3 for search queries on the primary corporate portal, called MSWeb, and the Product Group Portal. After RTM, ITG began converting all major portals at Microsoft to SharePoint Portal Server for search. When this process is complete, Microsoft will retire the Site Server 3 solution throughout the corporate intranet.
Figure 27.1 Project and development timeline
Based on its experience, the team estimates that a typical enterprise customer migration of similar scale might take approximately three months, as illustrated in the following table.
Typical Enterprise Project Timeline
| | Month 1 | Month 2 | Month 3 |
|---|---|---|---|
| 1. Planning | 1 week | | |
| 2. Catalog review (optional) Note: can parallel activities 3 and 4 | 1–4 weeks | | |
| 3. Hardware installation and setup | 1 week | | |
| 4. Configuration of servers and workspaces, and setup of site rules | 1 week | | |
| 5. Test of crawling operations | | 2 weeks | |
| 6. Modify existing custom ASP pages | | 1 week | |
| 7. Test of search page | | 1 week | |
| 8. Parallel operations | | | 2–4 weeks |
| 9. Ongoing catalog and index review | | | |
Collecting Information
The next part of the planning process was to collect information about several critical components of the existing environment.
Hardware Specifications
Hardware specifications for both crawl and search servers:
Specify similar hardware for test comparison.
Comply with upgrades in accordance with ITG hardware standards.
Architecture Diagrams
Architecture diagrams, indicating:
- Hardware, network, propagation paths
Catalog Information
List of all Site Server 3 catalogs, including:
The server on which the catalogs are stored
Who owns the catalogs
How frequently the catalogs are crawled
What start addresses and site rules are contained in the catalogs
Whether complex URLs are enabled
Key metrics for each individual catalog and across all catalogs:
For index (per catalog): number of site rules, number of documents, catalog size, time to crawl, and propagation time
For search (total and per catalog): number of queries per month and at peak load
Network Environment
Network factors, including:
Networking protocols
Firewall configuration
The project team examined the existing network environment for possible factors that would affect deployment, but determined that they did not need to make any configuration changes.
User Environment
Unique environmental factors including:
The Microsoft corpnet spans the world. Consequently, SharePoint Portal Server must crawl documents in multiple languages for inclusion in the content indexes and must allow users to submit queries in multiple languages. SharePoint Portal Server allows users to submit queries in English, French, Italian, German, Swedish, Spanish, Dutch, Japanese, Chinese Simplified, Chinese Traditional, Korean, and Thai.
Corpnet is in use worldwide 24 hours a day, 7 days a week, 365 days a year.
Security is enforced per document at the file share level.
Analysis and Design
After collecting information and creating a deployment plan, the project team synthesized information to provide a description of the existing infrastructure and a vision of the new infrastructure.
Searching Using Site Server
Originally, most sites within Microsoft did not offer any type of search. Individual departments or groups built their own sites, and the overhead of setting up, running, and maintaining a search capability on each site was burdensome. The major business division portals—such as IT, HR, Product, Finance, Sales, Support, Legal, Operations, and Microsoft Corporate—offered some search capability. The basic problem was that they all set up their own environments and often crawled each other's sites, resulting in duplication of efforts, sometimes three or four times over.
Site Server 3 became the backbone of this centralized search solution. It was set up with dedicated servers for crawling and searching. Site Server created a catalog for each site. The owner of each site or portal specified what content to include or exclude from the catalog for its site, in addition to what, if any, content on its site should not be crawled.
After developing the process for including content in an index, ITG created a set of custom ASP pages—one for querying and one for returning results. ITG modified these pages to fit each portal's needs for custom query capabilities and custom results sets. One by one, the major portals moved to this search solution because they could get better search capabilities for less effort. After all the major intranet sites had migrated, a number of second-tier sites also implemented this search solution.
Site Server 3 Infrastructure
The Site Server 3 architecture at Microsoft consisted of one search server and two crawl servers. The crawl servers included content in their indexes from their respective catalogs, and then propagated the information from the catalogs to the search server. This architecture, shown in Figure 27.2, ensured that the search capability was always available to users.
Figure 27.2 Site Server 3 search solution architecture
This solution crawled about 3 million corporate intranet documents and files, handling nearly 30,000 queries per day. There were 48 catalogs on these servers, and many sites requested that their searches include several of these catalogs.
Searching with SharePoint Portal Server
SharePoint Portal Server is a complete solution—integrated document management, corporate portal, and search. However, this deployment implements only the search and index creation aspects of SharePoint Portal Server. Because this migration does not include the dashboard site and document management features, separate teams started projects to test those features.
The project team modeled the new design largely on the existing Site Server 3 design. The team modified the existing set of custom search and results pages to handle SharePoint Portal Server in addition to Site Server 3. In this design, as each portal migrates to SharePoint Portal Server, the portals simply change their Web forms to point to the new query page on the SharePoint Portal Server computer.
SharePoint Portal Server Propagation Model
The propagation model includes two servers dedicated to creating and maintaining indexes and one server dedicated to searching as part of the centralized search service, as illustrated in Figure 27.3.
Figure 27.3 Enterprise search tiered server architecture
The server dedicated to searching stores a copy of the index propagated from the index workspaces of the servers dedicated to creating indexes.
Note The task of creating an index is resource intensive. Consequently, with SharePoint Portal Server, you can create an index workspace on a separate server to isolate the tasks associated with creating and maintaining indexes from other SharePoint Portal Server tasks. After you create the index, SharePoint Portal Server propagates it to the server dedicated to searching. SharePoint Portal Server propagates the index immediately after creating it, or you can schedule the creation of the index to coincide with times of low network traffic.
For more information about this scenario, see Chapter 3, "Introducing SharePoint Portal Server: Configuration Flexibility."
Architecture Comparison
The migration to SharePoint Portal Server did not change the basic architecture for searching across the corpnet. The Site Server 3 architecture used two servers dedicated to crawling content and one search server. The hardware configuration for the Site Server 3 architecture included one server with four processors, used for searching, and two servers used for crawling, one with four processors and one with two. The largest Site Server 3 catalog existed on a server with four processors. The SharePoint Portal Server architecture is the same as the Site Server 3 architecture except that both servers used for creating and maintaining indexes have four processors. This difference in hardware configuration did not affect the results because most performance measures were made by using the largest catalog.
The project team estimated that additional RAM might also help performance. A master merge is an MSSearch process in which separate content index sub-files are merged into a single content index file. Because SharePoint Portal Server performs master merges less frequently while updating indexes, performance on the server used for creating and maintaining indexes improves with additional memory. Previous tests of additional RAM on the servers running Site Server 3 and Microsoft Windows NT® 4 did not show significant performance gains. However, the ITG corporate server standard for operating systems changed from Windows NT 4 to Microsoft Windows® 2000 Advanced Server. Windows 2000 makes better use of additional memory than Windows NT 4. Therefore, the project team doubled RAM to 512 megabytes (MB) on each server that hosted an index workspace.
The project team estimated hard disk size requirements based on the index size in Site Server 3 and added room for growth. After determining this number, they doubled it to hold a backup copy of the indexes on the server. The ITG standard hard disk configuration for running SharePoint Portal Server places the document store that includes documents and associated metadata on one hard disk, the content indexes on a second disk, and the logs on a third disk to minimize bottlenecks and maximize input/output (I/O) throughput.
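The sizing rule described above can be sketched as a small calculation. This is a minimal illustration in Python, not ITG's actual worksheet: the function name and the sample figures are hypothetical, and the source does not state how much growth allowance the team used.

```python
def estimate_index_disk_gb(current_index_gb: float, growth_headroom_gb: float) -> float:
    """Estimate the disk space to reserve for content indexes.

    Follows the rule described in the text: start from the existing index
    size, add room for growth, then double the total so that a backup copy
    of the indexes fits on the same server.
    """
    if current_index_gb < 0 or growth_headroom_gb < 0:
        raise ValueError("sizes must be non-negative")
    return 2 * (current_index_gb + growth_headroom_gb)


# Hypothetical figures: a 10 GB index with 5 GB of growth headroom.
print(estimate_index_disk_gb(10, 5))
```

The doubling applies to the total including growth, so the backup copy still fits after the index has grown.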
Server Configurations
The following table lists the server configurations that ITG used for this project.
Enterprise Search Hardware Configurations
| Hardware configuration | Enterprise search | Index 1 | Index 2 |
|---|---|---|---|
| Processor | 4 X 550 megahertz (MHz) | 4 X 550 MHz | 4 X 400 MHz |
| Memory (initial) | 512 MB RAM | 512 MB RAM | 512 MB RAM |
| Memory (final) | 2 gigabytes (GB) RAM | 2 GB RAM | 512 MB RAM |
| Disk space | 92 GB | 68 GB | 35 GB |
| OS | Windows 2000 Advanced Server SP1 | Windows 2000 Advanced Server SP1 | Windows 2000 Advanced Server SP1 |
Note As the table shows, the team increased RAM in one of the crawl servers to test scalability; this nearly doubled the crawl speed. The team also increased RAM in the search server to provide approximately 1 GB for the server to cache the property store. This reduced latency.
SharePoint Portal Server Architecture
Figure 27.4 shows the current architecture at Microsoft for enterprise search.
Figure 27.4 SharePoint Portal Server architecture
Reviewing the Catalog
Site Server 3 creates catalogs to enable searching of content. SharePoint Portal Server creates indexes. An index is a resource that is built to enable full-text search of documents, document properties, and content stored outside the workspace but made available through content sources. A workspace can include multiple propagated indexes. When you create the workspace, SharePoint Portal Server automatically creates one index. You can propagate indexes only from index workspaces and only to a single destination workspace on another server (usually a server that is used primarily for searching). A destination workspace can accept indexes from up to four index workspaces. An index workspace is designed to manage only content sources.
The review identified 48 catalogs in the Site Server 3 environment. The primary intranet catalog included approximately 2.5 million documents; the remaining half million documents were spread across the other 47 catalogs.
Search Scopes
There were two main reasons to redefine the catalogs using search scopes. First, many of these catalogs wasted resources crawling the same content. Second, because the SharePoint Portal Server search service is multi-threaded, it was possible for the SharePoint Portal Server to have two threads crawling the same content at the same time.
Search scopes in SharePoint Portal Server offer the ability to restrict searching to a subset of an index. Scopes label entries in the full-text index so that they can be quickly identified by queries to deliver faster and more relevant information. The design of the index handles the search scopes by ensuring that the server passes the correct catalog parameters to the custom search page.
The project team created search scopes to help classify content for a single index without having to create additional workspaces. For example, suppose that Human Resources Web and Legal Web wanted to offer search of their own sites, but both wanted to include the Policy site. Instead of having two separate workspaces for each and crawling the Policy site twice, the team created a single workspace with three search scopes. The team created a scope of the content source pointing to the Policy site called "Policy" and then created a scope for all the content sources pointing to the Legal sites called "Legal." They also created a scope, called "HR," for all the content sources pointing to the Human Resources site. This reduced the number of index workspaces from three to one and prevented crawling the Policy site twice. From the Human Resources site, users can also search the Human Resources and Policy sites by using the different search scopes. Likewise, from the Legal site, users can also search the Policy and Legal sites by using the different search scopes. The queries return more relevant query results by using only the relevant search scopes.
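The Human Resources, Legal, and Policy example can be modeled as two small lookup tables. The sketch below is only an illustration in Python of how one workspace with three scopes serves two sites; the URLs and the helper name are hypothetical and do not come from SharePoint Portal Server.

```python
# One workspace: each content source is crawled once and labeled with one scope.
# The URLs are placeholders, not real intranet addresses.
CONTENT_SOURCE_SCOPE = {
    "https://policy/": "Policy",
    "https://legal/": "Legal",
    "https://hr/": "HR",
}

# Each site's search page queries only the scopes that are relevant to it.
SITE_SCOPES = {
    "HR": ["HR", "Policy"],
    "Legal": ["Legal", "Policy"],
}


def sources_for_site(site: str) -> list:
    """Return the content sources that a site's search covers."""
    scopes = set(SITE_SCOPES[site])
    return sorted(
        source for source, scope in CONTENT_SOURCE_SCOPE.items() if scope in scopes
    )


print(sources_for_site("HR"))
print(sources_for_site("Legal"))
```

Both sites reach the Policy content, yet the Policy source appears only once in the workspace, so it is crawled only once.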
Query Performance
Another consideration in catalog review and redesign was query performance and load balancing. Although search scopes are useful, overusing them can cause performance issues. One logical extension of search scopes is to crawl everything in one workspace and create a scope for each content source. In that case, every query runs against the index of that single workspace with its many scopes. However, as the number of search scopes increases, query performance declines and the index size increases. Because of this, the project team decided to limit search scopes to only two or three, and mainly in smaller workspaces.
An alternative approach is to create a workspace for each site or group of sites on the intranet, and then create a query that spans several workspaces. Query performance also declines as you increase the number of index workspaces included in the query, so the team decided to limit these queries to two or three workspaces as well.
Duplication
The team reviewed the existing catalog structure to eliminate redundant crawling. They reviewed the content sources and created a better design. During the process, the team closely examined scopes or queries across index workspaces that might compromise performance. In certain cases, performance was improved by crawling the same content twice from different workspaces and having search run a query against one workspace rather than having multiple dashboard sites query only one workspace.
To conduct the review of the catalogs, the team described each Site Server 3 catalog in a Microsoft Excel spreadsheet, as shown in the following tables.
Reviewing Content Sources
| Content source | Hops and depth | Adaptive | Scope | Schedule |
|---|---|---|---|---|
| \\server01\d$\Inetpub\handbook | This folder and all subfolders | Yes | Handbook | None |
| \\server01\d$\Inetpub\humanresourcesWeb | This folder and all subfolders | Yes | None | None |
| https://search1/sas/dir.asp?setid=1 | 1 page hop, 0 site hops | No | None | Weekly |
Reviewing Site Path Rules
| Action | Site path rule | Crawl account | Complex URLs |
|---|---|---|---|
| Avoid | file://server01/d$\inetpub\handbook\*_vti*\* | | |
| Crawl | file://server01/d$\inetpub\handbook\* | default | Yes |
| Avoid | file://server01/d$\inetpub\humanresourcesrweb\*_vti*\* | | |
| Crawl | file://server01/d$\inetpub\hrweb\* | default | No |
Reviewing Catalog Information
| Source | Display Mappings |
|---|---|
| \\server01\d$\Inetpub\handbook | https://corphandbook/ |
| \\server01\d$\Inetpub\hrweb | https://hrwebsite/ |
The team then compared the catalogs and identified candidates for consolidation. The initial examination cut the number of catalogs by more than half, from 48 to 20. After several iterations, the team reduced the number of catalogs to 11.
Consolidation and Workspace Creation
As an outcome of this exercise, the team decided to create a one-to-one correspondence between remaining catalogs and workspaces. Figure 27.4 shows the final layout of the servers and workspaces.
Identifying Key Points
The key points learned in the Analysis and Design phase were:
- Deployment requires no significant hardware change. Additional memory or processors improve performance.
- Migration is a great time to review and clean up catalogs.
- Catalog redesign requires a variety of approaches:
  - Remove duplicate crawls of content where possible.
  - Limit searches to no more than two or three scopes or workspaces.
Deployment
The deployment phase included installing hardware and software, modifying settings, and testing. After deploying the SharePoint Portal Server environment, ITG ran it in parallel with Site Server 3.
Installing and Modifying Settings
This section reviews the installation and configuration for the workspaces. In particular, it reviews the process for creating content sources.
Install Hardware and Operating Systems
The project team installed the hardware for the SharePoint Portal Server deployment in the same data center as the Site Server 3 environment, so network connectivity and other environmental variables remained the same.
Next, the team installed the operating system. For more information about installation requirements, see Chapter 11, "Installing SharePoint Portal Server." You must deploy a server dedicated to searching before deploying a server dedicated to index workspaces. When you create an index workspace, you must specify the destination workspace, as shown in Figure 27.5. Therefore, the project team began by first configuring the server dedicated to searching and then configuring the servers that would host index workspaces.
Figure 27.5 Creating an index workspace
Specify Workspace Settings
The team specified the settings as detailed in the following table.
Workspace configuration settings
| | Enterprise search | Index 1 | Index 2 |
|---|---|---|---|
| Catalog Name | All catalogs propagate to the Enterprise Search server | BestbetsCorpPortal, HumanResourcesWeb, corporate portal, WebCat2, WindowsUA | ITG portal, KBInt portal, corporate portal param, Product Group Portal, SAP portal, MSWordTest |
| General | | | |
| Indexing Resource Usage | 1 (Background) | 5 (Dedicated) | 5 (Dedicated) |
| Search Resource Usage | 5 (Dedicated) | 1 (Background) | 1 (Background) |
| Site Hit Frequency Rules | None | None | None |
| Proxy Server | | | |
| Do not connect using a proxy server | Disable | Enable | Enable |
| Use the proxy server settings of the default content access account | Enable | Disable | Disable |
| Use the proxy server specification below | Disable | Disable | Disable |
| Default File Types in Site Server 3 removed from catalog | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch |
| Removed from Enterprise Search: | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht |
The team specified a System Resource Usage of 5 as the default for the servers hosting index workspaces. This allows full system resource usage when the server crawls content.
Note SharePoint Portal Server provides resource usage controls for searching and index creation, the two resource-intensive processes that are commonly performed on SharePoint Portal Server computers.
It is recommended that you balance resource usage to optimize performance depending on your server configuration. If you distribute searching and index creation across multiple servers, dedicate resources on each computer to the specific task that each computer performs. If you use one server to perform both index creation and searching, balance resource usage evenly between the two processes.
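The recommendation above can be summarized as a lookup by server role. The following Python sketch is only an illustration: the role names and the function are hypothetical, and the value used for a combined server is an assumption, because the text says only that usage should be balanced evenly.

```python
# Resource usage runs from 1 (Background) to 5 (Dedicated), as in the settings table.
BACKGROUND, DEDICATED = 1, 5


def resource_usage(role: str) -> dict:
    """Suggest indexing and search resource usage for a server role."""
    if role == "search":
        return {"indexing": BACKGROUND, "search": DEDICATED}
    if role == "index":
        return {"indexing": DEDICATED, "search": BACKGROUND}
    if role == "combined":
        # Assumption: "balanced evenly" is modeled as the midpoint of the scale.
        midpoint = (BACKGROUND + DEDICATED) // 2
        return {"indexing": midpoint, "search": midpoint}
    raise ValueError(f"unknown role: {role!r}")


for role in ("search", "index", "combined"):
    print(role, resource_usage(role))
```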
By design, this enterprise search solution does not crawl content outside the firewall. To allow SharePoint Portal Server to crawl only internal sites but without having to specify many rules (for example, exclude all *.com, *.edu, *.org), the team disabled the proxy server on each of the servers that hosted index workspaces. This prevented crawling anything outside the corporate environment.
To minimize unnecessary security changes, SharePoint Portal Server uses the same accounts to crawl and propagate content as Site Server 3. As with Site Server 3, SharePoint Portal Server respects Access Control Lists (ACLs). The use of ACLs maintains security as implemented in each of the original content sites.
Create Workspaces
The team created one workspace to correspond to each Site Server 3 catalog. After creating all the workspaces, the team created the content sources. Figure 27.6 shows an example of the content sources (called start address in Site Server 3) in one workspace.
Figure 27.6 Example of content sources
Most workspaces contained several content types. A single content source cannot refer to different content types, but you can refer to multiple content types in a workspace.
During testing, the team discovered the following tips for properly configuring hops and depth:
To crawl this entire site, set SiteHops to 0 and set page depth to unlimited.
To crawl a single page, set SiteHops to 0 and set page depth to 0.
Custom: Manual setup for the number of sites and hops.
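These settings can be pictured as two limits that a crawler checks before following a link. The Python sketch below is a simplified model, not the product's crawler logic: the class, the function, and the exact comparison rules are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrawlLimits:
    site_hops: int              # how many times the crawl may leave the starting site
    page_depth: Optional[int]   # None means unlimited page depth


ENTIRE_SITE = CrawlLimits(site_hops=0, page_depth=None)
SINGLE_PAGE = CrawlLimits(site_hops=0, page_depth=0)
ONE_PAGE_HOP = CrawlLimits(site_hops=0, page_depth=1)  # a custom setting


def may_follow(limits: CrawlLimits, page_hops_taken: int,
               site_hops_taken: int, crosses_site: bool) -> bool:
    """Decide whether a link may be followed under the given limits."""
    if crosses_site and site_hops_taken >= limits.site_hops:
        return False
    if limits.page_depth is not None and page_hops_taken >= limits.page_depth:
        return False
    return True


print(may_follow(SINGLE_PAGE, 0, 0, crosses_site=False))   # no links are followed
print(may_follow(ENTIRE_SITE, 50, 0, crosses_site=False))  # any depth on the same site
```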
For tracking purposes, the project team created a matrix showing which workspaces and sites used complex URLs and which content sources used which protocols, as shown in the following table.
Note The team restricted the use of complex URLs to well-known parameterized URLs, to minimize the risk of crawling URLs that continued to generate additional links without end.
Tracking Spreadsheet
| Workspace name | Complex URL | File protocol | HTTP protocol | Exchange protocol |
|---|---|---|---|---|
| bestbetsCorporatePortal (Index 1) | Y | N | Y | N |
| Corporate Portal Intranet (Index 1) | N | Y | Y | Y |
| HumanResourcesWeb (Index 1) | Y | Y | Y | N |
| WebCatalog2 (Index 1) | Y | N | Y | N |
| WindowsUA (Index 1) | Y | Y | Y | Y |
| Corporate Portal Param (Index 2) | Y | Y | Y | Y |
| ITG portal (Index 2) | Y | Y | Y | Y |
| KBInt portal (Index 2) | N | Y | N | N |
| Product Group Portal (Index 2) | N | Y | Y | Y |
| SAPWeb (Index 2) | Y | Y | Y | Y |
| MSWordTest (Index 2) | Y | Y | Y | Y |
Modify Additional Settings
The team specified three additional settings when configuring content sources: site path rules, Access/Display mappings, and file types.
Figure 27.7 shows the properties page for modifying site path rules in a single workspace.
Figure 27.7 Example of site path rules
The spreadsheet of catalogs created during the Analysis and Design phase contained the site path rules and mappings. It is critically important that the site path rules be set exactly as intended. For more information about adding content sources, see Appendix B, "For More Information."
Create Content Sources
The following principles can assist you when you need to create content sources:
Site path rules match in order from the top down.
Use the asterisk (*) character with care. For example, an inclusion rule for https://searchserver/* crawls all subdirectories on the site. By contrast, an inclusion rule for https://searchserver/ crawls only the home page of that server.
Enable complex URLs to crawl links with parameters following a question mark (?) in the link; for example, default.asp?name=abc.
To exclude a protocol, add a site restriction as follows:
File:*
https://*
Exch:*
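The first two principles, top-down matching and careful use of the asterisk, can be illustrated with a small first-match-wins evaluator. This is a sketch in Python that uses the standard `fnmatch` wildcard semantics as a stand-in; it is not the matcher that SharePoint Portal Server uses, and the `_vti` rule, the case-insensitive comparison, and the default for unmatched URLs are assumptions.

```python
from fnmatch import fnmatchcase

# (action, pattern) pairs, evaluated from the top down; the first match wins.
RULES = [
    ("avoid", "https://searchserver/*_vti*/*"),
    ("crawl", "https://searchserver/*"),
]


def decide(url: str, rules=RULES, default: str = "avoid") -> str:
    """Return the action of the first rule whose pattern matches the URL."""
    for action, pattern in rules:
        # Assumption: matching is case-insensitive.
        if fnmatchcase(url.lower(), pattern.lower()):
            return action
    return default


print(decide("https://searchserver/hr/_vti_cnf/page.htm"))  # hits the avoid rule first
print(decide("https://searchserver/hr/page.htm"))           # falls through to crawl

# Without the trailing asterisk, only the home page itself matches.
home_only = [("crawl", "https://searchserver/")]
print(decide("https://searchserver/", home_only))
print(decide("https://searchserver/hr/page.htm", home_only))
```

If the two rules were listed in the opposite order, the broad crawl rule would match first and the avoid rule would never take effect, which is why rule order matters.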
Map Properties across Workspaces
SharePoint Portal Server crawls the text content of a Microsoft Office document and standard Office summary properties. If you want to include additional properties, you must create a document profile in the workspace with those properties. SharePoint Portal Server includes the metadata from the document profile in the index.
Important When SharePoint Portal Server propagates the indexes to a server dedicated to searching, the destination server must possess the same document profiles.
You must map properties of HTML documents or custom metadata of external documents to a document profile. This allows SharePoint Portal Server to crawl the additional properties. HTML files usually store custom properties in <META> tags. For more information about mapping custom properties, see Chapter 25, "Crawling Custom Metadata."
To map properties between servers, the project team performed the following procedure.
To map properties between servers:
Create document profiles for each index workspace.
The team created a document profile called "Search Custom Tags" for each index workspace. Each workspace included additional metadata, as shown in the following table.
Example of property mapping for index workspaces
| Workspace: bestbetsCorporatePortal | Workspace: CorporatePortal |
|---|---|
| META_Categories | META_Categories |
| META_PageURL | META_PageURL |
| META_XMLTerms | META_XMLTerms |
| META_Keyword | META_Keyword |
| Keywords | Keywords |
| Description | Description |
| Title | Title |
| Author | Author |

| Workspace: HumanResourcesWeb | Workspace: LibraryCatalog |
|---|---|
| META_Categories | META_MainAuthor |
| META_PageURL | META_itemtype |
| META_XMLTerms | META_pubdate |
| META_Keyword | META_subtitle |
| Keywords | Keywords |
| Description | Description |
| Title | Title |
| Author | Author |
Create a document profile on the server dedicated to searching.
The team created a document profile with the same name used in step 1 on the server dedicated to searching. This document profile includes all the properties of the document profiles from each index workspace, as shown in the following table.
Example of property mapping for server dedicated to searching
Server dedicated to searching
META_Categories
META_PageURL
META_XMLTerms
META_Keyword
META_MainAuthor
META_itemtype
META_pubdate
META_subtitle
Keywords
Description
Title
Author
Note The document profile on the server dedicated to searching must contain the union of the properties of all the document profiles on the servers that host index workspaces. Any properties that are mapped and crawled on the server that maintains indexes, but are not present in the document profile on the server dedicated to searching, are not available in the index workspace that propagates to the search server.
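The union requirement in the preceding note can be checked with ordinary set operations. The Python sketch below uses the property names from the example tables in this section; the helper functions are hypothetical and are not part of the property mapping script.

```python
COMMON = {"Keywords", "Description", "Title", "Author"}
PORTAL_META = {"META_Categories", "META_PageURL", "META_XMLTerms", "META_Keyword"}
LIBRARY_META = {"META_MainAuthor", "META_itemtype", "META_pubdate", "META_subtitle"}

# Document profile properties for each index workspace.
INDEX_WORKSPACE_PROFILES = {
    "bestbetsCorporatePortal": COMMON | PORTAL_META,
    "CorporatePortal": COMMON | PORTAL_META,
    "HumanResourcesWeb": COMMON | PORTAL_META,
    "LibraryCatalog": COMMON | LIBRARY_META,
}


def required_search_profile(profiles: dict) -> set:
    """Union of the properties in every index workspace profile."""
    return set().union(*profiles.values())


def missing_on_search_server(search_profile: set, profiles: dict) -> set:
    """Properties that were crawled but are absent from the search server profile."""
    return required_search_profile(profiles) - set(search_profile)


required = required_search_profile(INDEX_WORKSPACE_PROFILES)
print(len(required))
print(sorted(missing_on_search_server(COMMON | PORTAL_META, INDEX_WORKSPACE_PROFILES)))
```

A non-empty result from the second function points to properties that would not be searchable after propagation.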
Run the property mapping script.
The team ran the property mapping script for each index workspace. For more information about this script, see Chapter 25.
Note The account under which the property mapping script runs must have administrator rights on the server and the coordinator role on the workspace.
Restart services on the servers.
To flush the caches, the team restarted the following services on the servers hosting index workspaces:
SharePoint Portal Server
Microsoft Exchange Information Store
Microsoft Search
Start a full update of the index.
After restarting the services, the team reset the index and began a full update.
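The union requirement from step 2 can be checked mechanically before running the mapping script. The following Python sketch is illustrative only: the property names are copied from the example tables above, while the helper name `missing_properties` and the dictionary layout are assumptions made for this example and are not part of SharePoint Portal Server.

```python
# Property lists copied from the example tables in this section.
COMMON = {"Keywords", "Description", "Title", "Author"}
PORTAL = {"META_Categories", "META_PageURL", "META_XMLTerms", "META_Keyword"} | COMMON
LIBRARY = {"META_MainAuthor", "META_itemtype", "META_pubdate", "META_subtitle"} | COMMON

# One document profile per index workspace (hypothetical data structure).
index_profiles = {
    "bestbetsCorporatePortal": PORTAL,
    "CorporatePortal": PORTAL,
    "HumanResourcesWeb": PORTAL,
    "LibraryCatalog": LIBRARY,
}

# The profile on the server dedicated to searching.
search_profile = PORTAL | LIBRARY


def missing_properties(index_profiles, search_profile):
    """Return properties mapped on an index workspace but absent from the search profile."""
    required = set().union(*index_profiles.values())
    return sorted(required - search_profile)


# An empty list means the search profile contains the union of all index profiles.
print(missing_properties(index_profiles, search_profile))
```

Any name the helper returns would be crawled on the index server but unavailable after propagation, which is the situation the preceding note warns about.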
Modify Search Pages
The existing search solution allowed customized query and results sets for each portal. Because of this, the team chose not to use the default dashboard site provided as part of SharePoint Portal Server.
By contrast, many customers may have only a single centralized search page to which all internal sites link. These customers could simply replace the existing page with the Search dashboard from SharePoint Portal Server and avoid creating custom search pages.
From each portal, a user uses a search box to submit queries. After submission, the user is redirected to a hosted ASP page on the server dedicated to searching. Site Server 3 takes the following steps during this process:
Accepts the query from the referring site
Executes the search by using Site Server 3 Component Object Model (COM) objects
Receives the results set
Converts the results into Extensible Markup Language (XML)
Returns the XML to the user's browser (Microsoft uses Microsoft Internet Explorer 5.5 and passes XML to the client)
The transition from Site Server 3 to SharePoint Portal Server required the project team to modify step 2 and step 4 of the preceding process. For step 2, the team changed the query so that it used the Structured Query Language (SQL) syntax with full-text extensions instead of native Site Server 3 COM objects.
The following example illustrates a SELECT statement using WebDAV in SharePoint Portal Server.
SELECT "urn:schemas-microsoft-com:office:office#Office", "DAV:parentname", "DAV:href",
  "urn:schemas-microsoft-com:office:office#Title",
  "urn:schemas.microsoft.com:fulltextqueryinfo:description",
  "urn:schemas-microsoft-com:office:office#META_PageURL",
  "urn:schemas-microsoft-com:office:office#META_Categories",
  rank, "DAV:getcontentlength", "DAV:getcontenttype", "DAV:getlastmodified"
FROM TABLE corpportal..SCOPE()
WHERE WITH ("urn:schemas-microsoft-com:office:office#Title",
  "urn:schemas.microsoft.com:fulltextqueryinfo:description",
  "urn:schemas.microsoft.com:fulltextqueryinfo:contents") AS #DocDesc
  (FREETEXT (#DocDesc, '401k') RANK BY COERCION (ABSOLUTE, 1000))
ORDER BY rank DESC
Note The SELECT list returns the mapped meta properties (in the Office namespace).
The team used the workspace-level scope to restrict results to one of the index workspaces. They also used group aliasing in addition to freetext and rank coercion. For more information about restricting search results, see Appendix B.
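To make the roles of the workspace scope and the group alias concrete, here is a minimal Python sketch that assembles a query of the same shape. It is not the team's code. The function name `build_query`, the workspace-name check, and the quote-doubling rule for the search term are assumptions made for illustration, and rank coercion is left out to keep the example short.

```python
TITLE = "urn:schemas-microsoft-com:office:office#Title"
DESCRIPTION = "urn:schemas.microsoft.com:fulltextqueryinfo:description"
CONTENTS = "urn:schemas.microsoft.com:fulltextqueryinfo:contents"


def build_query(workspace, term, columns=("DAV:href", TITLE, DESCRIPTION)):
    """Build a FREETEXT query over a group alias, scoped to one index workspace."""
    # Hypothetical safeguard: accept only simple alphanumeric workspace names.
    if not workspace.isalnum():
        raise ValueError("unexpected workspace name: %r" % workspace)
    # Assumption: a single quote inside the search term is escaped by doubling it.
    safe_term = term.replace("'", "''")
    select_list = ", ".join('"%s"' % c for c in columns) + ", rank"
    # The group alias lets one FREETEXT predicate cover several properties.
    alias_group = ", ".join('"%s"' % c for c in (TITLE, DESCRIPTION, CONTENTS))
    return (
        "SELECT " + select_list
        + " FROM " + workspace + "..SCOPE()"
        + " WHERE WITH (" + alias_group + ") AS #DocDesc"
        + " (FREETEXT (#DocDesc, '" + safe_term + "'))"
        + " ORDER BY rank DESC"
    )


print(build_query("corpportal", "401k"))
```

Changing the `workspace` argument is what restricts results to a different index workspace; the rest of the statement stays the same.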
To modify step 4 in the preceding process, the team modified the process for formatting results. Originally, the page used a custom routine to create XML from the results set for Site Server 3, but SharePoint Portal Server returns XML natively. This eliminated the need to convert results to XML. The team simply applied an Extensible Stylesheet Language (XSL) transformation to achieve the formatting they wanted.
Samples of the ASP pages for Site Server 3 and SharePoint Portal Server are provided in the following code.
Site Server 3 Search ASP Page Sample Code
This is a sample of the Site Server 3 ASP code.
<%@LANGUAGE="VBScript" %>
<% ' Copyright 1997-1998 Microsoft Corporation. All rights reserved. %>
<%
DisplayText=Request("q1")
RecordNum=Request("RecordNum")
if RecordNum= "" then RecordNum=1
%>
<html>
<head><title>Search Page</title>
<meta http-equiv=content-type content="text/html; charset=iso-8859-1">
<meta http-equiv="content-language" content="EN">
</head>
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=15 bgcolor="ffffff">
<form method=get>Search:
<input type=Text name="q1" value="<%=DisplayText%>" size="23">
<input type=submit name="Search" value="Go">
<input type=hidden name="ct" value="MyCatalog">
</form>
<% If DisplayText <> "" Then %>Searching for <b><%=DisplayText%></b>
<%
  ' Set query and utility objects, and define query object properties.
  set util = Server.CreateObject("MSSearch.util")
  set Q = Server.CreateObject("MSSearch.Query")
  Q.SetQueryFromURL(Request.QueryString)
  Q.MaxRecords = 25
  Q.SortBy = "Rank[d],DocTitle"
  Q.Columns = "DocTitle, DocAddress, FileWrite, Size, Description, FileName, DocSignature, Rank, DetectedLanguage, MimeType, SiteName, NNTP_MessageID"
  ' Create the recordset holding the search results.
  on error resume next
  set RS = Q.CreateRecordSet("sequential")
  if err then
    createerror = err.description
    createerrnumber = err.number
  end if
  ' Error description.
  if err then
    Response.write createerror
  ' Display results
  else
    Response.write "<table><tr><td><font size = 2>"
    ' Set up number found.
    NumberFound= RS.Properties("RowCount")
    if RS.Properties("RowLimitExceeded") = true then
      NumberFound = "More than " & NumberFound
    end if
    ' Set up loop to iterate through results.
    Do while not RS.EOF
      ' Set up title for links, providing an alternative if DocTitle is blank.
      if RS("DocTitle") <> "" then
        Title = RS("DocTitle")
      else
        Title = "No title: " & RS("DocAddress")
      end if
      ' Set up link itself.
      Link = RS("DocAddress")
      ' One table is used for each search result.
      Response.write "</font></td></tr><tr><td> </td></tr></table>"
      Response.write "<table cellpadding=0 cellspacing=0>"
      Response.write "<tr><td width=21><font size=2><p>"
      Response.write "<table cellpadding=1 cellspacing=1 border=0><tr><td align=top>"
      Response.Write "<font size='2'>" & RS("Rank") & "</font>"
%>
</td></tr></table>
<%
      Response.Write "</font></td>"
      Response.Write "<td bgcolor='#80BBDD'><font size=2>"
%>
<a <% = LinkTarget %> href='<% = Link %>'><% = Title %></a>
</font></td></tr><tr><td></td><td><font size=2>
<% Response.write util.TruncateToWhiteSpace(RS("Description"),250) %>
</font></td></tr>
<tr><td></td><td height=5></td></tr>
<tr><td></td><td>
<font color=808080 size=1>[<% = util.TruncateToWhiteSpace(RS("FileWrite"), 12 ) %>]
<% iSize = CInt(CLng(RS("Size"))/1024) %>
(<% = iSize %>k)
</font>
<%
      ' Increment the results.
      RS.MoveNext
      RecordNum = RecordNum + 1
    Loop
    Response.write "</font></td></tr></table>"
    ' If there are more results pages, set up the "More Results" link.
    if RS.Properties("MoreRows") = true then
      Q.StartHit = RS.Properties("NextStartHit")
      ' Repeat query with new start hit.
      L_MoreResults_link = "More Results"
      MoreLink = "<a href=?" & Q.QueryToURL & "&" _
        & "DisplayText=" & Server.URLEncode(DisplayText) & "&" _
        & "RecordNum=" & RecordNum _
        & ">" & L_MoreResults_link & "</a>"
    end if
%><% = MoreLink %>
</font></td>
</tr>
</table>
<%
  End if
End If
%>
SharePoint Portal Server Search ASP Page Sample Code
This is a sample of the SharePoint Portal Server ASP code.
<%@LANGUAGE="VBScript" %>
<% ' Copyright 2001 Microsoft Corporation. All rights reserved. %>
<%
DisplayText=Request("q1")
ct=Request("ct")
If DisplayText = "" Then
%>
<html>
<head><title>Search Page</title>
<meta http-equiv=content-type content="text/html; charset=iso-8859-1">
<meta http-equiv="content-language" content="EN">
</head>
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=15 bgcolor="ffffff">
<form method=get>Search:
<input type=Text name="q1" value="<%=DisplayText%>" size="23">
<input type=submit name="Search" value="Go">
<input type=hidden name="ct" value="MyCatalog">
</form>
<%
Else
  Response.ContentType = "text/xml"
  Response.Write("<?xml version='1.0' encoding='ISO-8859-1'?>" & vbCRLF)
  Response.Write("<Results xmlns:dt='urn:schemas-microsoft-com:datatypes'>")
  set oProc = Application("StyleTransform").createProcessor
  Set xh = Server.CreateObject("Msxml2.SERVERXMLHTTP")
  ' Build the WebDAV SEARCH request body.
  ' Note: DisplayText and ct are inserted into the query without escaping.
  strQuery = "<?xml version=""1.0"" encoding=""utf-8""?><a:searchrequest xmlns:a=""DAV:""><a:sql>" &_
    "SELECT ""rank"", ""DAV:href"", ""urn:schemas-microsoft-com:office:office#Title"", ""urn:schemas.microsoft.com:fulltextqueryinfo:description"", ""DAV:getcontentlength"", ""DAV:getlastmodified"" " &_
    "FROM " & ct & "..SCOPE() " &_
    "WHERE WITH (""urn:schemas-microsoft-com:office:office#Title"", ""urn:schemas.microsoft.com:fulltextqueryinfo:description"", ""urn:schemas.microsoft.com:fulltextqueryinfo:contents"") AS #DocDesc (FREETEXT (#DocDesc, '" & DisplayText & "')) " &_
    "ORDER BY ""rank"" DESC</a:sql></a:searchrequest>"
  'Make DAV request
  xh.setTimeouts 0, 6000, 6000, 0
  xh.open "SEARCH", "https://myServer/myWorkspace", False
  xh.setRequestHeader "content-type", "text/xml"
  xh.setRequestHeader "range", "rows=0-9"
  xh.setRequestHeader "MS-Search-MaxRows", 200
  xh.setRequestHeader "MS-Search-UseContentIndex", "t"
  xh.send strQuery
  'Process DAV response
  if xh.Status <> 207 then
    Response.Write "<error>Status: " & xh.Status & ". Status Text: " & xh.statusText & "</error>"
    Response.Write "<errorReason><![CDATA[" & xh.responseText & "]]></errorReason>"
  else
    if xh.responseXML.parseError.errorCode <> 0 then
      Response.Write "<error>XML response error code = " & xh.responseXML.parseError.errorCode & " " & xh.responseXML.parseError.reason & "</error>"
    end if
    'Display results
    if xh.responseXML.selectSingleNode("a:multistatus").haschildnodes = false then
      Response.Write("<ResultSet totalhits='0'><error>No documents match your query.</error></ResultSet>")
    else
      oProc.input = xh.responseXML.documentElement
      oProc.transform
      Response.Write(oProc.output)
    end if
  end if
  Response.Write "</Results>"
End If
%>
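The response handling in the preceding page (check for status 207, then check whether the multistatus element has children) can also be sketched outside ASP. The Python example below is an illustration, not part of the deployment. The `SAMPLE` document is a hypothetical response invented for this example and follows the general WebDAV multistatus layout; the function name `extract_hits` is also an assumption.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree notation for the "DAV:" namespace

# Hypothetical response body, invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<a:multistatus xmlns:a="DAV:">
  <a:response>
    <a:href>https://myServer/myWorkspace/benefits/401k.htm</a:href>
    <a:propstat>
      <a:status>HTTP/1.1 200 OK</a:status>
      <a:prop><a:getcontentlength>2048</a:getcontentlength></a:prop>
    </a:propstat>
  </a:response>
</a:multistatus>"""


def extract_hits(status, body):
    """Return (href, size) pairs from a multistatus body; an empty list means no matches."""
    if status != 207:
        raise RuntimeError("search failed with HTTP status %d" % status)
    root = ET.fromstring(body)
    hits = []
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        size = response.findtext(
            "%spropstat/%sprop/%sgetcontentlength" % (DAV, DAV, DAV)
        )
        hits.append((href, int(size) if size else None))
    return hits


print(extract_hits(207, SAMPLE))
```

The ASP page applies an XSL transformation to the response instead of extracting values; this sketch only shows how the success and empty-result cases are told apart.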
Testing
Testing included two tasks. The project team verified that SharePoint Portal Server met the criteria for creating and maintaining indexes for the identified content. In addition, they verified that SharePoint Portal Server met the criteria for searching, including the criteria for workspace propagation process and speed, basic functionality, and the custom search page.
Index Testing
The team identified two goals for testing the process of creating an index:
SharePoint Portal Server can crawl all the content crawled by using Site Server 3.
SharePoint Portal Server can crawl up to 6 million documents.
The second goal verified the scalability of the SharePoint Portal Server search solution. ITG's goal was 6 million documents. That number was based on 3 million documents in the index at the beginning of the test, plus additional occasional sources, and an additional number used as a growth factor.
To measure crawl performance, the test team established several metrics. The following table shows these metrics according to source.
Index Test Metrics
Data collection | Found at |
---|---|
Number of documents | Event viewer, SharePoint Portal Server Administration in Microsoft Management Console (MMC), ASP event log |
Crawl status | SharePoint Portal Server Administration in MMC, Web folders view |
Crawl start time | Event viewer application log |
Crawl end time | Event viewer application log |
Crawl duration | Manual calculation using the preceding data |
Catalog size | SharePoint Portal Server Administration in MMC |
Property store | Folder <…\SharePoint Portal Server\\FTData\SharepointPortalServer\sps.edb>, by using Windows Explorer |
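The crawl duration metric is derived by subtracting the crawl start time from the crawl end time. As a small worked example in Python, the sketch below does that subtraction. The timestamps and their format are hypothetical values chosen for illustration; they are not taken from ITG's event logs.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this example


def crawl_minutes(start, end):
    """Return the whole minutes elapsed between two event-log timestamps."""
    started = datetime.strptime(start, FMT)
    ended = datetime.strptime(end, FMT)
    if ended < started:
        raise ValueError("crawl end time precedes crawl start time")
    return int((ended - started).total_seconds() // 60)


# Hypothetical start and end times, a little over 52 hours apart.
print(crawl_minutes("2000-08-01 22:00", "2000-08-04 02:07"))  # 3127
```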
The team executed each crawl several times. They refined the rules until they were satisfied the proper content was actually being included in the index. They used the dashboard search on the server dedicated to searching to assist with this check.
The team used the event viewer and gatherer log viewer from SharePoint Portal Server to examine the system to ensure that the index was operating normally and without problems. Figure 27.8 shows an example of the event viewer entries for starting and stopping the index.
Figure 27.8 Example event viewer entries
The following table shows an example of the data collected to track crawls.
Example Index Test Metrics
Catalog name | Full crawl # of docs | Full crawl duration | Full crawl prop. duration | Full crawl catalog size | Full crawl property store size |
---|---|---|---|---|---|
bestbetsCorpPortal (Index 1) | 851 | 1 min | 1 min | 1 MB | 4.61 GB |
Corporate Portal Intranet (Index 1) | 2,920,178 | 3,127 min | 65 min | 5,081 MB | |
HumanResourcesWeb (Index 1) | 3,927 | 24 min | 1 min | 4 MB | |
WebCatalog2 (Index 1) | 17,882 | 24 min | 1 min | 14 MB | |
WindowsUA (Index 1) | 14,198 | 8 min | 1 min | 14 MB | |
CorpPortal Param (Index 2) | 694 | 3 min | 1 min | 1 MB | 1.04 GB |
ITG portal (Index 2) | 13,250 | 37 min | 1 min | 13 MB | |
KBInt portal (Index 2) | 226,474 | 269 min | 15 min | 325 MB | |
Product Group Portal (Index 2) | 159,257 | 224 min | 19 min | 605 MB | |
SAPWeb (Index 2) | 3,609 | 47 min | 1 min | 3 MB | |
MSWordTest (Index 2) | 15,233 | 11 min | 1 min | 24 MB | |
SharePoint Portal Server completed the full crawls with satisfactory results at a volume of about 3 million documents. ITG added more content sources for scale testing. Eventually, SharePoint Portal Server crawled just over 6 million documents. Crawl performance did not drop off due to the size of the index.
Next, the team tested incremental updates on each of the catalogs. The incremental crawls took about half the time of the original full index and proved successful.
Finally, the team tested adaptive crawling on the largest catalogs in multiple passes until the number of documents modified converged. In doing so, the team discovered that convergence took about eight passes for the largest workspace. In these passes, crawl time was reduced from 51 hours for a full index to less than 8 hours for the shortest adaptive crawl, a nearly sevenfold improvement. Figure 27.9 shows the index times per pass.
Figure 27.9 Adaptive crawl times
The testing process involved the following steps:
Perform a full index: n days
Perform an incremental index: day n+1
Perform an adaptive update and track the number of documents changed: each night
When an index reaches a steady state of number of documents updated or crawl time, it has converged. After convergence, the crawl time remains approximately the same each night, unless SharePoint Portal Server detects a large change in content such as a new site coming online.
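One way to decide when crawl times have reached a steady state is to compare each night's crawl time with the previous night's. The Python sketch below is an assumption-laden illustration, not ITG's method: the 10 percent tolerance, the two-night stability window, and every value in `nightly_hours` except the 51-hour starting point are invented for the example.

```python
def converged_at(nightly_hours, tolerance=0.10, stable_nights=2):
    """Return the 1-based pass at which crawl time has stabilized, or None.

    A pass counts as stable when its crawl time differs from the previous
    pass by no more than `tolerance` (as a fraction). Convergence is declared
    after `stable_nights` consecutive stable passes.
    """
    streak = 0
    for i in range(1, len(nightly_hours)):
        prev, cur = nightly_hours[i - 1], nightly_hours[i]
        if prev > 0 and abs(cur - prev) / prev <= tolerance:
            streak += 1
            if streak >= stable_nights:
                return i + 1
        else:
            streak = 0
    return None


# Hypothetical crawl times in hours; only the first value (51) comes from the text.
nightly_hours = [51, 30, 20, 14, 10.5, 9, 8.2, 7.9, 7.8]
print(converged_at(nightly_hours))  # 8
```

The same test could be applied to the number of documents updated each night instead of crawl time.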
Search Testing
ITG tested three additional features. First, they tested the workspace propagation process and times. Next, they tested the basic searching by using the dashboard site. Finally, they tested the custom search page.
When examining propagation, it is important to determine that propagation completes successfully. In addition, ITG needed an estimate of how long the propagation took to complete. The following table outlines the metrics and their sources.
Note You should measure the duration of propagation, from the start of the process on the server hosting the index workspace to the end of the process on the search server.
Search Test Metrics
Data collection | Currently found at |
---|---|
Propagation status | SharePoint Portal Server Administration in MMC, Event Viewer |
Propagation start time | Event Viewer (on both servers) |
Propagation end time | Event Viewer (on search server) |
Propagation duration | Manual calculation from data collected |
The ITG team tested the results for simple full-text queries that used SharePoint Portal Server. After performing queries, they compared the results seen in Site Server 3 queries with those in SharePoint Portal Server to ensure that crawling returned the proper documents and appropriately followed the rules.
Finally, after completing the custom ASP page modifications, they tested the ASP pages. The test involved both the query and results pages. Final tests measured performance and accuracy of the results sets.
For query latency, the ASP page recorded the exact time of the request and the exact time of the response in an SQL database, along with other relevant data used to track usage metrics. The team created a set of 47 queries, most of them from the top 100 queries run the previous month. This set included one-term and two-term phrases and some unusual queries. They ran this set of queries on Site Server 3 and then on SharePoint Portal Server. The data collected included the time of the first request of a query and then the results of the next four queries for the same term. These latency times, in seconds, are shown in the following table.
ASP Page Performance Testing
Product | Initial | #2 | #3 | #4 | #5 |
---|---|---|---|---|---|
Site Server 3 | 1.11 | 0.84 | 0.81 | 0.81 | 0.86 |
SharePoint Portal Server | 4.28 | 0.65 | 0.65 | 0.65 | 0.65 |
ITG attributed the disparity between the initial response times of SharePoint Portal Server and Site Server 3 to caching. Because Site Server 3 was already in use and taking queries, many queries and terms were already loaded into memory, which reduced its initial response time. SharePoint Portal Server had none of the terms in memory, so every initial query required reading from the disk. Subsequent queries with SharePoint Portal Server averaged 0.65 seconds, 22 percent faster than the Site Server 3 average of 0.83 seconds.
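The 22 percent figure follows from the repeat-query columns (#2 through #5) of the preceding table. The short Python calculation below reproduces it; the variable and function names are illustrative.

```python
# Latency in seconds for repeat queries #2 through #5, from the table above.
site_server = [0.84, 0.81, 0.81, 0.86]
sharepoint = [0.65, 0.65, 0.65, 0.65]


def mean(values):
    return sum(values) / len(values)


def improvement_percent(old, new):
    """Percentage reduction in mean latency when moving from old to new."""
    return (mean(old) - mean(new)) / mean(old) * 100


print(round(mean(site_server), 2))                            # 0.83
print(round(improvement_percent(site_server, sharepoint)))    # 22
```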
In addition to faster query rates with SharePoint Portal Server, tests determined that the server dedicated to searching was capable of taking advantage of additional memory. When the team increased RAM from 1 GB to 2 GB on this server, latency time dropped. They allowed 1 GB of RAM for running the operating system and SharePoint Portal Server and 1 GB of RAM for caching the property store. Loading a large part of the property store helped improve performance by speeding access to data used in search queries. The numbers in the previous table were from the testing once the team added the additional memory, but before they ran the "warm-up" script.
To facilitate this pre-loading or "warm up" of the cache, the team developed a script that runs immediately after crawling completes and propagates. This script loads the cache with data, so the cache is ready when the service enters production. For more information about this script, see Appendix B.
Note If you set the maximum cache size too high, you can leave insufficient memory for SharePoint Portal Server, the operating system and any other applications on your server. A good rule-of-thumb is to leave at least 0.5 GB for use by SharePoint Portal Server and the operating system. For example, on a server with 2 GB of physical memory, set the minimum cache size to 1 GB and the maximum cache size to 1.5 GB (or less, if you have other applications running).
You must leave enough memory for other processes and for monitoring Microsoft Search objects in Performance Monitor.
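The rule of thumb in the preceding note reduces to simple arithmetic. The Python sketch below expresses it; the function name and the `other_apps_gb` parameter are assumptions added for illustration, and the result is an upper bound for planning, not a recommended setting.

```python
def max_cache_gb(physical_gb, reserved_gb=0.5, other_apps_gb=0.0):
    """Upper bound for the maximum cache size under the 0.5 GB rule of thumb.

    reserved_gb is the memory left for SharePoint Portal Server and the
    operating system; other_apps_gb covers any other applications.
    """
    cache = physical_gb - reserved_gb - other_apps_gb
    if cache <= 0:
        raise ValueError("not enough physical memory to reserve a cache")
    return cache


print(max_cache_gb(2))  # 1.5, matching the 2 GB example in the note
```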
Results of Testing
The search tests yielded the following results:
Average latency time was reduced by 22 percent (after the cache was pre-loaded).
On the server dedicated to searching, adding memory improves performance because more of the property store can be cached.
To maintain NTLM credentials, ASP pages must be on the search server.
After developing the custom ASP pages, the team validated the search results through testing. They added a link to the results page for Site Server 3, asking users to try the new search page that relied on SharePoint Portal Server. From this process, the team monitored the following data:
Query string and number of times it was requested
Total number of queries
Total number of unique users
Average response times
Key Points
The key points learned during the Deployment phase and index testing were:
The test crawled about 3 million documents, with the largest catalog averaging 970 documents per minute. This represented a nearly threefold increase over 330 documents per minute with Site Server 3. In addition, SharePoint Portal Server crawled a larger number of documents and more diverse file types.
Although comparable, the SharePoint Portal Server deployment used hardware that is more powerful.
Adaptive crawling reduced crawling time from 51 hours to less than 8 hours, a nearly sevenfold improvement.
By using adaptive crawling, SharePoint Portal Server updates the largest index nightly instead of weekly. This results in more timely and relevant information.
Additional RAM and processors make a significant difference in crawl performance with SharePoint Portal Server.
To improve performance, optimize the site rules and content sources.
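The "nearly threefold" and "sevenfold" figures in these key points can be checked with a quick calculation. The Python lines below use only the numbers quoted in the list; because the adaptive crawl took less than 8 hours, the second ratio is a lower bound.

```python
def ratio(larger, smaller):
    """How many times larger the first value is than the second."""
    return larger / smaller


# Crawl rate in documents per minute: SharePoint Portal Server vs. Site Server 3.
print(round(ratio(970, 330), 2))  # 2.94, "nearly threefold"

# Crawl time in hours: full index vs. an 8-hour adaptive crawl.
print(ratio(51, 8))  # 6.375; the actual ratio is higher because the crawl took under 8 hours
```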
Management
To transition a portal that uses Site Server 3 for searching to SharePoint Portal Server, ITG modified the URL on the page where users perform search queries to point to the SharePoint Portal Server computer dedicated to searching. For example:
Existing URL: https://siteserver3/search/default.asp
New URL: https://spsearch/search/default.asp
First, the team modified the URL for searching on the primary corporate portal, MSWeb, to point to the SharePoint Portal Server computer. As expected, the load immediately increased from a few thousand queries per week to nearly 30,000 per day.
Next, the team modified the URL for searching on the Product Group Portal to point to the SharePoint Portal Server computer. This portal used the dashboard site included with SharePoint Portal Server for searching, instead of a custom ASP page. The team simply added a new Web Part to the search dashboard. This site handles about 2,000 searches per month.
The team continued this process for each of the major business portals across the corporate intranet. In addition to completing the transition to SharePoint Portal Server, ITG must continue to monitor performance for SharePoint Portal Server and to implement a disaster recovery plan that is compatible with SharePoint Portal Server. The next section reviews these steps.
Monitoring Performance
Through effective monitoring, ITG has determined that this deployment meets performance expectations. ITG also captures monitoring data for trend analysis to predict future problems and fine-tune alert thresholds.
Note Although separate administration is possible, ITG administers Windows 2000 and SharePoint Portal Server together.
Server activity for SharePoint Portal Server generates performance data that Windows 2000 can track and log on the system. The data is described as a performance object and is typically named for the component generating the data. Every performance object provides counters that represent data on specific aspects of the object. ITG monitors standard Windows 2000 Advanced Server performance objects, along with several specific objects for SharePoint Portal Server. For example, to monitor MSSearch, select the performance object called Microsoft Gatherer and the Heartbeats counter.
The performance objects to monitor for enterprise search include:
Microsoft Gatherer
Microsoft Gatherer Projects
Microsoft Search
Microsoft Search Catalogs
Microsoft Search Indexer Catalogs
The following table describes the counters that ITG routinely monitors.
Monitoring Performance Objects
Performance object | Counter | Explanation |
---|---|---|
Microsoft Gatherer | Documents Filtered (and Rate) | Number of documents attempted to be crawled since the service started. |
Microsoft Gatherer | Documents Successfully Filtered (and Rate) | Number of documents successfully crawled since the service started. |
Microsoft Gatherer | Documents Delayed Retry | Non-0 means the Microsoft Web Storage System is having problems; by default, retries until cleared. |
Microsoft Gatherer | Reason to Back off | Non-0 means crawling is paused, because of high disk I/O, low memory, etc. |
Microsoft Gatherer | Server objects | Number of servers crawled. |
Microsoft Gatherer | Time outs | Too high means network problems. |
Microsoft Gatherer | Adaptive Crawl Accepts | Documents accepted by adaptive update. |
Microsoft Gatherer | Adaptive Crawl Error Samples | Documents accessed for error sampling. |
Microsoft Gatherer | Adaptive Crawl Errors | Documents that adaptive update incorrectly rejects. |
Microsoft Gatherer | Adaptive Crawl Excludes | Documents that adaptive update excludes. |
Microsoft Gatherer | Adaptive Crawl False Positives | Number of false positives that occur when the adaptive update has predicted that a document has changed when it has not. If this number is high, the adaptive update algorithm is not modeling the changes in the documents correctly. |
Microsoft Gatherer | Adaptive Crawl Total | Documents to which adaptive update logic was applied. |
Microsoft Gatherer Projects | Crawls In progress | Number of concurrent crawls. |
Microsoft Gatherer Projects | Status Success (and Rate) | Number of documents successfully filtered for this workspace. |
Microsoft Gatherer Projects | Status Error | Number of errors. |
Microsoft Gatherer Projects | URLs in History | Number of URLs covered in all crawls. |
Microsoft Gatherer Projects | Waiting Documents | Gatherer queue length; 0 means idle. |
Microsoft Search | Failed Queries | Number of failed queries. |
Microsoft Search | Successful Queries (and Rate) | Number of successful queries. |
Microsoft Search Indexer Catalogs | Merge progress 0–100% | Non-100 means indexes are currently being merged; the crawl can be paused during that time. |
Microsoft Search Indexer Catalogs | Number of Documents | Number of documents in the catalog included in the index. |
Microsoft Search Indexer Catalogs | Index Size | Size of the index in megabytes. |
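Several explanations in the table describe threshold-style rules ("non-0 means...", "0 means idle", "non-100 means..."). The Python sketch below shows how such rules might be applied to one set of counter samples. It does not read real performance counters: the `samples` dictionary, the function name `evaluate`, and the message wording are assumptions for illustration.

```python
def evaluate(samples):
    """Apply the table's threshold rules to a dict of counter name -> value."""
    findings = []
    if samples.get("Documents Delayed Retry", 0) != 0:
        findings.append("Web Storage System problems: documents are being retried")
    if samples.get("Reason to Back off", 0) != 0:
        findings.append("crawling is paused (back-off)")
    # Only report idleness when the counter was actually sampled.
    if "Waiting Documents" in samples and samples["Waiting Documents"] == 0:
        findings.append("gatherer is idle")
    if samples.get("Merge progress", 100) != 100:
        findings.append("index merge in progress")
    return findings


# Hypothetical snapshot of counter values.
samples = {
    "Documents Delayed Retry": 0,
    "Reason to Back off": 2,
    "Waiting Documents": 1500,
    "Merge progress": 40,
}
print(evaluate(samples))
```

Counters such as Time outs, which the table describes only as "too high", need a site-specific threshold and are left out of the sketch.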
Planning for Disaster Recovery
The backup and restore process represents the only substantial change in the operation of SharePoint Portal Server over Site Server 3. SharePoint Portal Server provides a built-in script for backing up the entire server with all the workspace and catalog information to an image file. You can then restore this image on another server.
ITG uses the MSDMBACK utility installed with SharePoint Portal Server to copy the backup files to disk each night, and then uses Windows 2000 Backup to back up those files to tape.
Important The output from MSDMBACK must be saved to a local drive.
Because Windows 2000 Backup attempts to lock the files while backing up, which prevents crawls from continuing, servers that host index workspaces must be set to exclude the following directory from the Windows 2000 backup:
operating_system_drive\Program Files\SharePoint Portal Server\Data\FTData\SharepointPortalServer
The MSDMBACK utility takes a snapshot of all necessary SharePoint Portal Server data directories as part of its backup.
Each night, ITG runs the backup process on each server that hosts an index workspace and the enterprise portal server. MSDMBACK is run, backing up the data to another partition according to the following steps:
From the directory: operating_system_drive\Program Files\SharePoint Portal Server\Bin, run the following command:
cscript msdmback.vbs /b "path_to_backup_file_name"
where the path_to_backup_file_name parameter is the name of the backup file to be created.
The preceding command is entered into a .cmd file that is scheduled to run by using the Windows 2000 task scheduler. Note that the start in parameter specifies operating_system_drive\Program Files\SharePoint Portal Server\Bin.
Note The scheduled task that runs the script must run under an account that has administrator privileges on each server.
Run a full Windows 2000 backup to tape every night for each server.
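As an illustration of the first step, the Python sketch below builds the `msdmback.vbs` command line shown above and rejects UNC paths, in line with the requirement to save the output to a local drive. The function name, the example path, and the validation rule are assumptions. The check cannot detect a drive letter that is mapped to a network share, and the sketch only composes the command; it does not run it.

```python
import ntpath


def backup_command(backup_file):
    """Compose the msdmback.vbs command line for a backup file on a lettered drive."""
    drive, _ = ntpath.splitdrive(backup_file)
    # A UNC path yields a drive such as \\server\share rather than "D:".
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("backup file must be on a local drive, for example D:\\...")
    return 'cscript msdmback.vbs /b "%s"' % backup_file


# Hypothetical backup location on a second partition.
print(backup_command(r"D:\Backups\sps_backup.bak"))
```

A scheduled task would run the resulting command from the `Bin` directory named in step 1.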
The backup process stores passwords for content sources in encrypted form in the backup image. The optional password used for the backup image (provided during backup) encrypts only these passwords. It does not encrypt the remainder of the backup image, including the documents and metadata. If the administrator loses the password that was used to create the backup image and attempts to restore, the restoration succeeds, but the restored information for the content source access account is invalid. In addition, subsequent crawls of this content source may fail because of authentication failures.
The backup process also stores user name and password pairs that are used for content sources in encrypted registry files. The optional password provided during restoration decrypts only the user name and password pairs. If the administrator loses the password that was used to create the backup image, and then tries to restore the image, the restoration succeeds but leaves the user name and password pairs for content sources blank.
Identifying Key Points
Organizations that want to make the transition from Site Server 3 to SharePoint Portal Server may find the approach taken by Microsoft's ITG group (outlined in the following list) to be helpful:
Run a test server to learn how the new technology operates.
Take the opportunity to examine site rules, property mappings, and catalogs; make necessary changes for enhanced performance.
Build a custom ASP page if the SharePoint Portal Server dashboard site user interface is not used.
Run Site Server 3 and SharePoint Portal Server in parallel to test performance.
When the acceptance criteria have been met, remove the old environment and leave the SharePoint Portal Server environment.
Summary
Migrating to SharePoint Portal Server yielded two key benefits:
More relevant and timely search results delivered to users.
Latency, or response time, improved by 22 percent.
Indexes updated nightly by using adaptive updates.
Improved crawling performance.
Full update of an index nearly three times faster than Site Server 3
Adaptive update of an index seven times faster than Site Server 3
In addition to these benefits, ITG identified the following key points:
The migration process is straightforward. Migrating to SharePoint Portal Server is not complex. SharePoint Portal Server can use the same architecture as, and similar hardware to, Site Server 3. You can begin the catalog review process while ordering hardware and learning the product. Modifying existing ASP pages is simple, and you can use the built-in user interface included with SharePoint Portal Server. You encounter few changes in day-to-day operation from administering Site Server 3.
It is recommended that you review and refine existing catalogs. The appropriate time to review existing catalogs is before implementation. Over time, your catalogs have probably lost accuracy. Start addresses do not exist anymore; your servers crawl the same content multiple times; some catalogs are redundant or unnecessary. As you review catalog definitions, you can also review your internal customer requirements. Customers now have the opportunity to redefine and refine their requirements for searching.
SharePoint Portal Server gives improved full update performance and adaptive crawling benefits. With SharePoint Portal Server, you can crawl more content in the same amount of time as Site Server 3, using similar hardware. This provides room for growth. Alternatively, you can crawl existing content with less hardware than Site Server 3. This allows you to buy less expensive hardware. In addition, you can update existing content more frequently than Site Server 3, using similar hardware. This provides more timely and relevant results to your users.
Adding memory improves performance. Adding memory provides a quicker and less expensive way to improve performance than adding servers to your infrastructure.
This chapter describes the ITG deployment plan of SharePoint Portal Server and its subsequent results. It provides detailed information and recommendations based on this deployment. It includes technical information on the existing environment, design decisions, deployment steps, and testing considerations. It concludes with a summary of recommendations based on this experience.