Planning Your Information Structure Using Microsoft Office SharePoint Portal Server 2003
This is a sample chapter from the Microsoft SharePoint Products and Technologies Resource Kit. You can obtain the complete resource kit (ISBN 0-7356-1881-X), which includes a companion CD-ROM, from Microsoft Press.
Microsoft Office SharePoint Portal Server 2003 has several compelling features and tools that make it an ideal solution for implementing your information management system. Managing information well is critical to any organization’s success. Poor information management leads to inefficient collaboration, ineffective decision-making processes, and lost business opportunities.
SharePoint Portal Server 2003 does not automatically organize your information into an overall taxonomy for you. Instead, you’ll need to plan and implement your own taxonomy. (A taxonomy is a method of organizing or categorizing information and information resources.) Because each environment is unique, you should take the planning process seriously and understand that installing SharePoint Portal Server 2003 is not the same thing as implementing it.
In this chapter, we’ll outline the key decision areas that must be addressed before a production server is ever built, specifically during the architecting and planning phases. You shouldn’t be surprised to learn that the key decision areas are built on the information management features of this product.
Key Information Management Features of SharePoint Portal Server 2003
The key information management features of SharePoint Portal Server 2003 include the following:
Search and indexing
Best Bets and keywords
Areas and Topics
Personal sites
Site directory area
The purpose of this chapter is not to discuss how to implement or administer each of these features. Instead, the focus is on planning the implementation of each feature and showing how the features relate to the overall information management picture.
Where to Start
When beginning the planning process for a new information management system, you’ll need to start by answering two basic questions:
What information does your organization need to have and use to be successful?
Where is that information right now?
It might seem a bit silly, but most IT professionals bypass these two questions and immediately get to work on building an information management system without ever considering where their information currently resides. Because most organizations don’t have an overall structure to their information, they don’t have a good understanding of where their business-critical information currently resides and, surprisingly, what that information really is (and isn’t). When asked, most project managers and system administrators will acknowledge that the organization’s current information resides in a plethora of disconnected data islands, including the following:
Local hard drives
Web pages and websites
Document management systems
Users’ home directories
These data islands cannot be connected unless they are first identified and inventoried for the data that they host. Only after an organization knows where its information resides and what that information is can it really begin the process of detailing how that information will be accessed using SharePoint Portal Server 2003, how it will be structured, and how information growth will be accommodated.
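As a starting point for the inventory described above, a short script can summarize what a single data island holds. The following is a minimal sketch, assuming the island is a folder tree that the script can read; the function name inventory is hypothetical and is not part of SharePoint Portal Server 2003.

```python
import os
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files and total bytes under root, grouped by file extension."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Group by lowercase extension; files without one go under "(none)".
            ext = Path(name).suffix.lower() or "(none)"
            counts[ext] += 1
            try:
                sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: count it, but skip its size.
                pass
    return counts, sizes
```

Running inventory against each file share or home directory root gives a rough picture of how many documents of each type exist and how much space they occupy. It does not tell you which of those documents are business critical; that still requires the team-level analysis described in this section.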
This is not an easy process. Much of this work will need to be accomplished at the team or departmental levels. And this process can become time consuming, requiring thoughtful analysis and methodical discovery of where employees go to get the information they need to perform their jobs and what that information is.
Many administrators who are asked to play the role of an information architect jump ahead and begin the planning process by looking at the organizational chart. They quickly conclude that they should have site collections for each department, a child portal site for each division, and perhaps an overall portal site for the entire organization. These hastily determined structures are nearly always modified from their original configuration because the decisions used to construct them lack any real basis.
Even though SharePoint Products and Technologies is very flexible and can support both the technical needs of an organization and its cultural needs, it’s not wise to jump ahead of the planning process and start near the end. As you read through this chapter, you’ll begin to understand what is meant by this, but for now, the following example will suffice.
You might start the process by quickly looking at your organizational chart and concluding that you’ll need one document library for each collaboration team in your organization. However, upon closer inspection, you’ll find that you need a different document library for each unique combination of user permission assignment, document profile matrix, approval group, and site placement. Unless you know which documents you want to host in a portal site, which ones you want to host in a team site, and who should access these documents, it will be difficult to know how many document libraries to create and where to create them.
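The counting rule in this example can be sketched in a few lines. The document sets, group names, and site names below are invented for illustration; the only idea taken from the text is that each unique combination of permissions, profile, approval group, and site placement calls for its own document library.

```python
from collections import defaultdict

# Hypothetical inventory rows:
# (document set, permission group, document profile, approval group, site)
document_sets = [
    ("Lab reports", "Research-Contributors", "Report", "Research-Leads", "research team site"),
    ("Safety sheets", "Research-Contributors", "Report", "Research-Leads", "research team site"),
    ("Press releases", "Marketing-Authors", "Announcement", "Legal", "corporate portal site"),
    ("Price lists", "Sales-Authors", "Announcement", "Legal", "corporate portal site"),
]


def plan_libraries(rows):
    """Group document sets by their unique (permissions, profile, approvers, site) combination."""
    libraries = defaultdict(list)
    for name, permissions, profile, approvers, site in rows:
        libraries[(permissions, profile, approvers, site)].append(name)
    return dict(libraries)


plan = plan_libraries(document_sets)
for combination, names in plan.items():
    print(combination, "->", names)
```

With this sample data, the four document sets collapse into three libraries, because the two research document sets share every attribute. The press releases and price lists share a profile, approval group, and site, but differ in permissions, so they need separate libraries.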
Perhaps your organization will allow the end users to create their own sites and document libraries. And certainly, SharePoint Portal Server 2003 and Windows SharePoint Services support this. But it is best—at least for capacity-planning purposes—to have at least a rough idea of what your document library matrix will look like.
As part of the architecting and planning processes, an organization is well advised to determine where its information currently resides and what that information is. Once you have that information, the planning processes presented in this chapter for using SharePoint Portal Server 2003 to implement a new structure for your information will be of more value to you.
It is important to understand that many of the planning processes we discuss in this chapter are ongoing processes an organization will perform well into the future. SharePoint Portal Server 2003 can be implemented without doing much of this work, but most system administrators and architects who have skipped the planning processes have found that their implementations were much less effective and successful than those of administrators who did perform the planning processes. Even with good planning, however, most organizations will need to return to the concepts presented in this chapter multiple times as they hone their organization-wide taxonomy.
Key Decision Areas for SharePoint Portal Server 2003
There are several key decision areas that you’ll want to address as part of your pre-implementation planning process. We’ll discuss these areas in this section.
This section describes the components of the SharePoint Portal Server search functionality. It provides guidelines on planning a customized version of each search component for your solution.
Overview of Search Functionality
Four components contribute to the SharePoint Portal Server search functionality: content sources, content indexes, source groups, and search scopes.
A content source is a location where content is stored. A content source specifies the starting place for crawling a file system, a portal site, a SharePoint site, an Exchange Public Folder, a Lotus Notes database or a website. The content can be located in a different portal site on the same server, on another server within your intranet, or on the Internet.
SharePoint Portal Server builds a content index by crawling the locations specified by the content sources and storing the results—such as Web pages and files. For some content source types such as file shares and SharePoint sites, the content index also stores the appropriate security credentials on the crawled content. This enables the search results to show only items to which a user has access by enforcing the security settings on each document as the result set is built.
You need to consider what information will and will not be hosted in SharePoint Portal Server. For example, documents that will remain on a file server will need to be crawled if those documents are to appear in the search result set. Knowing that you have those documents and where they currently reside allows you to make an informed decision about whether the documents should be moved into a SharePoint document library or crawled and left in their current location.
Generally speaking, older documents that will not change need not be placed in a SharePoint document library if the only method needed to find those documents is the Search method. However, if users will need to browse the taxonomy hierarchy to find older documents, you’ll need to move the documents into a SharePoint document library or at least link to them using a links list.
A content index is a flat text file that holds the data from crawled content in a content source. Updating a content index requires crawling the locations specified by the content sources and storing the results on the job server. Propagating a content index consists of copying the index from the index server to the search servers. A portal site search returns results from the content indexes stored on the search servers.
Every portal site includes content indexes that allow users to search for documents inside or outside the portal site. After content is included in an index, the content appears in search results on that portal site.
Source groups, topics, and areas are the elements of search scopes. Search scopes allow you to define the breadth and depth of searches within portal sites and across portal sites. These assignable components can create very flexible searches.
A source group is a list consisting of one or more content sources. Source groups are one of the elements used to define search scopes. Source groups are created and managed at the shared services level (if you are using shared services) and can be assigned in any combination to a portal site search scope. These characteristics allow you to easily define search scopes across portal site boundaries. For example, if you want your marketing portal site users to be able to search content on the sales portal site, you can create a search scope consisting of a source group that encompasses all sales portal site data and a source group that encompasses all marketing portal site data.
A search scope is a list of one or more source groups, in combination with any specified areas and topics located on the portal site on which they are defined. Search scopes allow users to narrow their searches based on the topics, areas, and content sources of items on the portal site.
Search scopes can be limited by topics and areas, or by groups of content sources. Source groups outside the portal site can be grouped, and you can limit your search scope to exclude or include particular source groups.
Search scopes are defined by a portal site administrator and are exposed only to the portal site on which they are created. For example, a search scope created on the Human Resources portal site named “this portal site” might consist of a source group containing content sources that define all content on the HR file server. This scope would be available only on the HR portal site.
You can use search scopes from remote portal sites to give your overall search scope taxonomy consistency throughout the portal sites in your organization. This is an advanced topic that is covered in Chapter 22, “Managing External Content in Microsoft Office SharePoint Portal Server 2003.”
Search scopes appear to all users in a drop-down list next to the portal site search box. These search scopes are typically limited to specific topics and source groups that are important and common enough to make them useful to users in the organization as a separate searchable scope.
Planning Content Indexes
SharePoint Portal Server 2003 comes with two content indexes. You can create as many as you need. However, keep in mind that each search query has to be run on each index and the results aggregated before they are returned to the user. The result is that the more indexes you have, the longer search results take to generate. Also, the more index files you have, the greater the possibility that ranking in the result set will be skewed. This is because ranking is determined on a per-index file basis and there is no support for single-set ranking when the result set is generated from multiple index files. The advantage of having more (and thus smaller) index files is that propagation between an Index and Search server in a server farm scenario is much faster than copying a few very large index files from the Index to the Search server.
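A toy model can make the ranking concern concrete. The sketch below is not the product’s ranking algorithm. It assumes, purely for illustration, that each index scores a hit relative to the best hit in that same index, and the function names and sample documents are invented.

```python
def search_index(index, term):
    """Return (doc, rank) pairs; rank is relative to the best hit in this index only."""
    hits = {doc: text.lower().split().count(term) for doc, text in index.items()}
    hits = {doc: n for doc, n in hits.items() if n > 0}
    if not hits:
        return []
    best = max(hits.values())
    return [(doc, n / best) for doc, n in hits.items()]


def search_all(indexes, term):
    """Run the query once per index, then aggregate without reranking."""
    merged = []
    for index in indexes:  # one query per index: cost grows with the index count
        merged.extend(search_index(index, term))
    # Highest rank first; document name breaks ties.
    return sorted(merged, key=lambda pair: (-pair[1], pair[0]))


index_a = {"a1": "budget budget budget budget plan", "a2": "budget notes"}
index_b = {"b1": "budget memo"}
results = search_all([index_a, index_b], "budget")
print(results)
```

In this model, b1 and a2 each mention the term once, yet b1 ties for the top position because it is the best hit in its own small index, while a2 falls to the bottom. Keeping most queries within a single index, or keeping indexes similar in size and document count, reduces this kind of skew.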
Advanced Search Administration mode should be enabled so that you can create and manage additional indexes. In addition, when you create content sources, you can specify the index in which the content source will appear and the source group.
When planning content indexes, consider the following factors:
Number of documents.
If the number of documents is very large, you should consider breaking the content sources into many content sources with smaller scopes. Doing this will make the size of the indexes more manageable.
The default content access account is used as the security context through which all content sources are crawled. This account needs only Read access, regardless of which content source it is crawling. You can use nearly any account, but one must be specified as the Default Content Access Account. The account should have appropriate access to other internal resources such as site directories, portal site contents, and shared folders, which are part of the personal site.
For shared folders, you should use an access account that is a member of the SharePoint Portal Server administrator group. This level of access allows for crawling all types of content and its properties in portal sites and site directories.
To provide different accounts for crawling content, you must use include and exclude rules. You should add rules that include the paths that you want to be crawled with different access accounts. These rules are applied to all content sources bound to a content index, which can create additional crawling. To avoid duplicate or additional crawling, you need to assign rules only to specific content sources.
Creating separate content indexes allows you to use more flexible scheduling that is based on the nature of content in each content source. Some content sources change quickly and need more updates, while other content sources are less volatile and require fewer updates.
Having smaller content indexes provides faster propagation of content indexes to search servers. However, there is a drawback. When a user initiates a search, the search engine has to run that query against each index and then aggregate the results. The more indexes you have, the longer it takes.
In addition, if you’re in a federated search environment, you need to remember that the number of indexes you have and the size of each will affect how fast these indexes can be propagated or copied from the indexing servers to the search servers.
Finally, you should remember that if ranking documents in the result set is important, you should build your index/source group/content source/search scope topology in such a way that most queries query only one index file at a time. Ranking is performed on a per-index file basis, and there is no method supported to group results from multiple index files and have the result set reranked as a single unit. Another solution is to make sure that your index files are statistically equivalent with the same approximate size and number of documents.
Backup and restore.
Smaller content indexes also provide more flexibility when backing up and restoring.
SharePoint Portal Server does not allow spaces in content index names. Spaces are supported in source group names. Also, note that Non_Portal_Content and Portal_Content exist by default.
Planning Search Scopes
Search scopes should be planned with the end user in mind. By this, we mean that the scopes should reflect the most natural way that people will want to search for information. A good search-scope matrix will allow educated users to tightly define the portion of the overall index they want to search, giving them a leaner, yet still meaningful, result set. A best practice is to ask representatives from each interested party, department, division, or team to help you create a search-scope matrix that will enhance the users’ experience in the portal site by allowing them to search for targeted information.
You’ll also need to consider using a hierarchical approach to your scope matrix. For example, let’s suppose you have a research department with three teams: Chemicals, Data Modeling, and Quality. Each team produces documents of importance to the larger enterprise. In such a scenario, you might find yourself creating four search scopes: chemicals, data modeling, quality, and research. The fourth scope, research, would encompass the documents from all three teams. By using a hierarchical approach, you can give the portal site user flexibility in defining the portion of the overall index that needs to be searched.
In many cases, building the scope matrix will be more art than technology, meaning that the search-scope matrix will be built over time in response to constructive feedback. In some environments, you might have to place a small icon near the Search Web Part that will take the user to a Web page that outlines the various search scopes in the matrix and the information that will be searched via each scope.
If you’ll be using multiple portal sites, you can create a consistent search-scope experience across all the portal sites by propagating each scope to the other portal site or sites. If you’re planning to have multiple portal sites, you should also plan to propagate your scopes across all your portal sites unless you have a specific reason for not doing this.
If you want to see changes to a search-scope definition immediately, you must reset Internet Information Services (IIS) by using the IISRESET command. This should be done only during the setup process and not when the system is live in production.
Planning Content Sources
When operating a corporate portal site with multiple divisional portal sites, you should, at a minimum, configure content sources as follows:
One content source for the corporate portal site.
One content source for each divisional portal site.
One content source for the people in your organization. This content source is set up by default. It returns matches based on entries in the profile database as well as content from a user’s personal site.
One content source for each divisional Windows SharePoint Services virtual server in your organization.
You’ll also need to remember the old adage “garbage in, garbage out.” The tighter and more defined your content sources are, the leaner and more meaningful the data in the result set will be. For example, if you need to crawl five documents on a given website that hosts 200 documents, it would not be a best practice to crawl the entire website. Your index would end up with 195 unneeded documents. In a situation like this, you might consider creating one content source to crawl all five documents or even five content sources, one for each document, depending on their location in the directory structure and if they are shared via the same share or different shares.
The point here is to remember that what you crawl gets placed in your index and you should crawl only information that is required to be in your index. This gets back to our earlier discussion about where your information is and what it consists of. Knowing this will help you build a tight list of content sources that will, in turn, give you a tight index that will return highly meaningful results to your users when they issue a query in the Search Web Part. Good planning is paramount to a successful deployment.
Planning Source Groups
As a starting point, you should define your source groups as follows:
Use one source group for each index. Doing this allows you to broadly define search scopes for each portal site. For example, if your organization chose to hold all its indexed content in a single content index, the source group assigned to that index would allow you to easily define a search scope on all content managed by that index.
Use one source group for each content source. Doing this allows you to more narrowly define search scopes for each portal site. For example, defining a separate content source for the corporate portal site and one for each divisional portal site would allow you to easily define a search scope on the portal site content that each source group crawls.
When you select the content source as SharePoint Portal Server Site Directory, define the address of the portal site for the content source (for example, http://sales/*), and then save definitions of content source parameters, the address of the content source automatically changes—for example, it changes to sps://sales/site$$$site/scope=*.
There is no way to directly crawl the virtual server of another team site. If you want to have certain site collections indexed, you have two options. The first is to add the site collection to another site directory manually. The second option—which is less optimal—is to add a separate content source for each site collection. The first option is the preferred and recommended approach.
Planning Deltas Between Source Groups and Content Indexes
A best practice is to create a source group for each content index so that you’ll have maximum flexibility in creating the search scopes. To use the example from the preceding sections, you’d create a content source for each team’s document group (chemicals, data modeling, and quality). You would assign each content source to its own source group, such as Chemicals Source Group, Data Modeling Source Group, and Quality Source Group. Then, for the research search scope, you’d select all three source groups to offer portal site users the ability to search documents across the research department.
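The research department example can be written down as a simple mapping, which is a convenient way to review a scope matrix before configuring anything. This is a planning sketch only: the content source labels are placeholders, and the dictionaries do not correspond to any SharePoint Portal Server 2003 API.

```python
# Each source group lists the content sources assigned to it (placeholder labels).
source_groups = {
    "Chemicals Source Group": ["Chemicals content source"],
    "Data Modeling Source Group": ["Data Modeling content source"],
    "Quality Source Group": ["Quality content source"],
}

# Each search scope lists the source groups it includes.
search_scopes = {
    "chemicals": ["Chemicals Source Group"],
    "data modeling": ["Data Modeling Source Group"],
    "quality": ["Quality Source Group"],
    "research": [
        "Chemicals Source Group",
        "Data Modeling Source Group",
        "Quality Source Group",
    ],
}


def content_sources_for(scope):
    """Resolve a search scope to the content sources it ultimately searches."""
    sources = []
    for group in search_scopes[scope]:
        sources.extend(source_groups[group])
    return sources


print(content_sources_for("research"))
```

Resolving the research scope returns all three content sources, while each team scope returns only its own. Writing the matrix this way makes it easy to spot a scope that reaches more, or less, content than intended.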
Propagating Content Indexes
Index propagation occurs only when the Search application and the Index application are run on two different servers in a medium or large server farm. Propagation happens automatically at the end of every successful update (or crawl). Before propagation can be successful, the following conditions must be met:
You have configured a search service account for the server farm. This account must have local administrator permissions on the search (destination) server.
The destination server is on a trusted domain.
There is sufficient disk space available on the destination server. For each propagated index, allow for more than twice its size in disk space to accommodate both the current index and the propagating index.
SMB (Server Message Block) traffic must be enabled between the two servers, and if there is a firewall between them, the appropriate ports must be opened too. These ports include the common Netbios and RPC (Remote Procedure Call) ports.
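The disk space condition above lends itself to a quick pre-check. The sketch below applies the rule of thumb from this list (free space greater than twice the index size) and uses only the Python standard library. The function names are hypothetical, and the path you pass in is assumed to be a directory on the drive that will hold the propagated index.

```python
import shutil


def has_room_for_propagation(index_bytes, free_bytes):
    """True when free space exceeds twice the index size (current copy plus incoming copy)."""
    return free_bytes > 2 * index_bytes


def check_destination(path, index_bytes):
    """Apply the same rule to a real volume, using the free space reported for path."""
    return has_room_for_propagation(index_bytes, shutil.disk_usage(path).free)
```

For example, a 10-GB index needs more than 20 GB free on the destination server. Running check_destination on the search server itself, for each index that will be propagated, avoids discovering the shortage through a failed propagation.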
Propagation is considered successful if the index is successfully copied to any one search server. If you have a scenario where propagation is successful to one or more, but not all, search servers, the search servers to which propagation failed are taken offline, an error is logged in the event log, and the error appears on the propagation status page.
If propagation fails because of lack of disk space on the destination server, SharePoint Portal Server 2003 logs an error in the Application Log of the Windows Server 2003 Event Viewer (event log) of both the destination search server and the index management server.
The contents of a new index are not accessible on the search server until propagation has been completed. The index is not accessible if propagation fails, even if a previous propagation was successful.
Updating Content Indexes
The four methods for updating content indexes are the full update, the incremental update, the incremental (inclusive) update, and the adaptive update.
The method used to update the index can have a significant effect on performance. The following topics describe these four methods.
During a full update, SharePoint Portal Server updates all content in a content index. A full update of a content source includes adding new content, modifying changed content, refreshing the content index for existing unchanged content, and removing deleted content from the content index. This is the most time-consuming and resource-intensive type of update, and it should be done only in the following situations:
If you create a new rule that affects only one content source.
If files are renamed in a specific content source.
If it is the first crawl of a content source.
If you include or exclude a new file type.
If permissions are changed on documents in the content source. While all updates pick up permission changes, only the full updates pick up changes to membership in local groups. This is why it is recommended that you not use local groups to secure content that SharePoint Portal Server crawls.
If there is a power outage. In this case, SharePoint will want to run full indexes to reset the index. It might be faster to reset the index files and clean them out before running a full index. (Resetting the index files is discussed in Chapter 22.)
If you change the noise word file.
If there is an area name change.
If you reset the content index.
Given all these changes that will force you to rerun a full index, it is best that you plan for them so that the full indexes can be run without overloading either the indexing server or the content source’s server.
An incremental update of a content source includes only changed content. SharePoint Portal Server does not remove deleted content from the content index and does not recrawl unchanged content. For this reason, performing an incremental update is faster than performing a full update.
You can perform an incremental update if you know that content has changed but you do not want to perform a full update and you don’t mind having some deleted content continue to appear in your index. A periodic incremental update refreshes the index without using the time or resources required for a full update, which enables you to perform a full update less frequently.
SharePoint Portal Server 2003 introduces another type of update known as the incremental (inclusive) update. This update is similar to the incremental update except that it includes deleted content. The incremental (inclusive) update also detects deleted entries in the Microsoft Windows SharePoint Services document libraries and lists. The incremental update detects modified or new documents and list items only.
The incremental update is the least expensive update if it is used with Windows SharePoint Services sites. The incremental (inclusive) update is more resource intensive than the regular incremental update and should therefore be run less often if performance is a top priority for you.
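The difference between the full, incremental, and incremental (inclusive) updates can be illustrated with a small simulation. This is a conceptual sketch, not how the crawler is implemented: the index and the content source are modeled as dictionaries that map a document name to a version number, and the function names are invented.

```python
def full_update(index, source):
    """Recrawl everything: the index becomes an exact copy of the source."""
    index.clear()
    index.update(source)
    return len(source)  # number of items crawled


def incremental_update(index, source, inclusive=False):
    """Crawl only new or modified items; drop deleted items only when inclusive."""
    crawled = 0
    for doc, version in source.items():
        if index.get(doc) != version:
            index[doc] = version
            crawled += 1
    if inclusive:
        for doc in list(index):
            if doc not in source:
                del index[doc]
    return crawled


source = {"a": 1, "b": 2, "d": 1}   # "b" was modified, "c" deleted, "d" added
index = {"a": 1, "b": 1, "c": 1}    # state after the previous crawl
print(incremental_update(index, source), index)
```

After the plain incremental update, the index has picked up the modified and new documents but still lists the deleted document c. Calling incremental_update again with inclusive=True removes it, and full_update reaches the same end state at the cost of recrawling every item.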
An adaptive update, like the incremental update, crawls only the content that, statistically speaking, is most likely to have changed since the last adaptive update. Because adaptive updates are likely to miss at least some content changes, these updates will crawl all the content in a content source every two weeks.
Unlike the incremental update, the adaptive update increases its efficiency every time it is run based on a statistical analysis of the historical information on what content has and has not changed. The time required for an adaptive update varies and is based on the different types of content sources and the protocol handler.
The recommended approach is to run adaptive updates daily for large source groups and more often for smaller source groups. Avoid running full updates whenever possible. However, how you choose to configure your index updates is dependent on your search requirements. If your organization is search intensive and requires immediate updates to the search indexes as new content is added or removed, you might need to schedule updates to occur more frequently. You must balance the search requirements with the time it takes to perform an update and propagate content indexes.
There is also a scheduling factor to consider. Updates are both processor and RAM intensive. You’ll want to ensure that you’re scheduling your updates to occur when your servers running SharePoint Portal Server and Windows SharePoint Services are not being backed up, scanned for viruses, or performing any other routine that consumes large processor resources, RAM resources, or both. In addition, you shouldn’t crawl content sources during their own nightly routines. Therefore, a best practice is to create a schedule matrix and schedule the crawling of content sources when those sources are being used the least.
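A schedule matrix can be as simple as a list of busy windows from which the free hours are derived. The following sketch assumes whole-hour windows on a 24-hour clock. The function name and the sample windows are hypothetical; substitute the backup, virus scan, and peak usage times from your own environment.

```python
def free_hours(busy_windows):
    """Return the hours (0-23) not covered by any (start, end) busy window.

    Windows are half-open [start, end) in whole hours. A window whose end is
    earlier than its start wraps past midnight; equal start and end is empty.
    """
    busy = set()
    for start, end in busy_windows:
        hour, stop = start % 24, end % 24
        while hour != stop:
            busy.add(hour)
            hour = (hour + 1) % 24
    return [h for h in range(24) if h not in busy]


schedule_matrix = [
    (1, 3),    # server backup
    (3, 4),    # virus scan
    (8, 18),   # business hours, when the content source is busiest
    (22, 1),   # content source's own nightly routine (wraps past midnight)
]
crawl_hours = free_hours(schedule_matrix)
print(crawl_hours)
```

With these sample windows, the remaining candidate hours are 4 through 7 and 18 through 21. Building one such list per content source shows at a glance where crawls can be scheduled without competing with other routines.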
Alerts provide notification when information of interest is added or updated on the portal site and associated content sources. You can define areas of interest and identify how and when you want to be notified. You can add an alert to track new matches to a search query, changes to content in an area, or a new site added to the Site Directory.
When configuring alerts, keep in mind that alerts for SharePoint Portal Server 2003 and Windows SharePoint Services are managed separately. There are three key differences between SharePoint Portal Server alerts and Windows SharePoint Services alerts:
SharePoint Portal Server alerts are managed by the user on the My Alerts page, which is located on the user’s personal site. Windows SharePoint Services alerts are managed through the Manage My Alerts link on the Site Settings page of each site.
SharePoint Portal Server alert notifications can be delivered to the user’s e-mail inbox, to the user’s My Alerts Summary Web Part on the personal site, or to both. Windows SharePoint Services alert notifications can be delivered only through e-mail.
SharePoint Portal Server users can set alerts on more items than Windows SharePoint Services users can.
SharePoint Portal Server can track alerts for the following items:
New listings and listings in general, such as listings for people, news, new list items, and so forth
Sites added to the Site Directory
SharePoint lists and libraries
List items (requires modification of a site path rule)
Portal site users
Windows SharePoint Services can track alerts on the following items within a site:
When shared services are enabled, management of alert settings is possible only through the Central Administration interface. Through this interface, you can set quotas for alerts per user and for all portal sites. You can define and adjust the default numbers based on the following criteria:
The number of users
How many alerts each user can have
How many alert results you want to be returned per alert
You can adjust these defaults based on the number of users and the alerts per user setting to determine the maximum settings for alerts at the site level. For example, if your company has 20,000 users and you are setting a limit of 10 search alerts per user and 20 other alerts per user, your maximum number of alerts for all portal sites is calculated as follows:
20,000 * 10 = 200,000 (maximum number of search alerts for all sites)
20,000 * 20 = 400,000 (maximum number of other alerts for all sites)
If you’re using shared services, these are the maximum numbers allowed for all divisional portal sites and the corporate portal site combined. They are not per site or per portal site.
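The quota arithmetic above is easy to rerun for your own numbers. This short sketch reproduces the example from the text; the function name is hypothetical.

```python
def max_alerts(users, alerts_per_user):
    """Upper bound on alerts across all portal sites for one alert type."""
    return users * alerts_per_user


users = 20_000
max_search_alerts = max_alerts(users, 10)  # search alerts per user
max_other_alerts = max_alerts(users, 20)   # other alerts per user
print(max_search_alerts, max_other_alerts)
```

For 20,000 users, this yields 200,000 search alerts and 400,000 other alerts, matching the figures above. Remember that with shared services these totals apply to all portal sites combined.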
Alerts on portal sites and Windows SharePoint Services sites are not consolidated.
Planning Topics and Areas
There are basically two ways to find information in the portal site: browse and search. Topics and areas give portal site users the ability to browse for information. Portal site users will use the Search Web Part to find information, a topic that is covered in Chapter 22.
A Topic is a specialized area that hosts specific Web Parts and Portal Listings that expose links to site collections and other URL-addressable content. Topics usually contain highlights of other areas or frequently used content, and they might or might not be limited to a single subject.
Areas are sites templated to host static information and to provide a method of structuring (or categorizing) your information and information sources. Similar in some respects to Topics, areas enable you to organize and structure your data in any manner that makes sense to your organization so that users can browse the area or Topic hierarchy to find the information they are looking for.
There are seven types of area templates and Portal Listings. We’ll briefly describe each one here:
TOC (Table of Contents) Category template.
This is the home page of the Topics area hierarchy. It is used to view three levels of topics in your organization. It is the tree view of the Topics areas.
Topic Category template.
This is the template used to create an individual Topic area.
News Category template.
This is the template used to create individual News areas.
News Home Category template.
This template “rolls up” news items from News areas under the News Home. If there are subareas—such as Public News, Corporate News, and Competitor News—you would see each of the latest items from those three areas in the News Home area. There is also a targeted Web Part named News for You, which allows you to target news items to audiences.
Community Category template.
This template is designed to create an online community for any purpose you might have, such as an event or a social or business concern.
Shared Page template.
This template is found under the Page tab of any area when you click Change Settings. You can select Inherit the Parent Template, which allows you to create multiple areas that use the same template but host different content.
Sites Directory template.
This template is used to organize site collections that are created as part of your overall collaborative effort.
When planning out your topics and areas, you need to gain the insight and recommendations from each interested party who will be using the portal site. How people think about information will heavily influence how they will want to browse for information.
For example, let’s suppose you’re on a marketing team charged with next year’s advertising campaign. You develop a marketing budget for the campaign. Now, who would have a legitimate business interest in seeing that budget? Well, several groups come immediately to mind: Accounting, Executives, Sales, Marketing, Content Development, and others. The sales team will quite likely want to browse for this budget differently than the accountants. Because you can’t possibly read the minds of your portal site users, a best practice is to glean from them what they think is the most logical way to structure the data they most commonly use and then to use those discussions with the different groups, teams, and departments to build an area and Topic hierarchy that makes sense.
Because areas are more flexible than Topics in how static information is presented, you might find yourself using Topics to structure your data and then using areas to present static information that might or might not be time limited. Areas can be used to structure data too. An area’s flexibility comes from the fact that different area templates are available when creating a new area. Each template offers different ready-to-use functionality and Web Part designs. The flexibility of SharePoint Portal Server also means that planning is an up-front activity that you cannot ignore.
Here are some recommendations when considering areas:
Use the area templates to present static, time-limited information. For example, if your company has an annual summer party, use the Community Category template to disseminate information about the party and use the date functions in the area’s properties to automatically remove the area from the portal site after the party has taken place.
Use the Topic Category template to create additional Topic areas and create a taxonomy based on the subject or topics of your information. But remember that you can use the Topic areas to create a taxonomy based on nearly any method of structuring data. For example, you can use the topic areas to create a taxonomy by
Or any other method of organizing your data that makes sense in your unique environment
Use the Sites Directory template to create a new or additional Site Directory.
Do not create too many top-level areas. If you do, your users will be forced to scroll sideways and this will diminish their positive experience in the portal site.
Base your area (and Topic) hierarchies on static information, not information that changes often or rapidly. For example, don’t create an area for each customer because some of your customers won’t be with you a year from now and new ones will continually appear. In this example, a best practice would be to create a generic Customers area and then organize your customers according to some other criteria within the Customers area.
Topics enable users to locate information faster by organizing content into logical groups. The following are some recommendations for planning Topics:
The tree of Topics should not be too deep—usually not more than three levels.
Select topics that users are likely to look for.
Find appropriate topics, especially for the top-level Topics, and retain those Topics for a long period of time rather than changing them frequently.
If you have too many Topics, select those that are pertinent to the most users, or organize the Topics into two levels.
Provide Topics that are unique to each portal site.
If you have duplicate Topics, qualify them with appropriate prefixes to avoid confusion. For example, if you want to create a topic named Contacts for each division, create IT–Contacts, HR–Contacts, and so forth. Alternatively, you can create a topic named Contacts, with subtopics such as IT, HR, and so forth.
Another example is the Location Topic, which is provided by default when you create a portal site. In this solution, all locations of organizations in the Corporate Portal site are listed under the Location Topic. Remove the Location Topic from other portal sites, or replace it with other topics that are unique and more relevant to a division.
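Two of the recommendations above (keep the tree to about three levels, and avoid duplicate Topic names) are easy to check mechanically once a proposed hierarchy is written down. The following Python sketch assumes the hierarchy is captured by hand as nested dictionaries; that representation, the function name, and the sample Topic names are illustrative assumptions, not something SharePoint Portal Server exports.

```python
def topic_issues(tree, max_depth=3):
    """Return warnings for Topics nested too deeply or sharing a name."""
    issues = []
    first_seen = {}  # topic name -> path where it was first used

    def walk(node, path):
        for name, children in node.items():
            current = path + [name]
            label = "/".join(current)
            if len(current) > max_depth:
                issues.append("too deep: " + label)
            if name in first_seen:
                issues.append("duplicate name: " + label + " and " + first_seen[name])
            else:
                first_seen[name] = label
            walk(children, current)

    walk(tree, [])
    return issues


# Hypothetical draft hierarchy used only to exercise the check.
draft = {
    "Divisions": {
        "IT": {"Contacts": {}},
        "HR": {"Contacts": {}, "Benefits": {"Health": {}}},
    }
}
for warning in topic_issues(draft):
    print(warning)
```

A duplicate warning is a prompt to apply one of the fixes described above, such as prefixed names (IT–Contacts, HR–Contacts) or a single Contacts Topic with divisional subtopics.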
After defining your Topic structure, you can add content (such as documents, list items, and persons) to each Topic and sub-Topic. Each Topic or sub-Topic can have its own document library to which documents are uploaded. You can assign specific groups to manage each of these Topics and sub-Topics by using the Manage Security option, which is on the list of actions in the Topic area. By the same token, you can restrict areas to certain users by using the same security option. By simply assigning permissions at the area level, you can restrict access to areas in the portal and essentially customize the look and feel of the portal through the use of permissions.
Planning Keywords and Keyword Best Bets
Keywords mark specific content as relevant to a particular word included in a search so that the specific content appears more prominently in search results. Users with the Create Area right can create keywords for common searches. The Create Area right is included by default in the Web Designer, Administrator, and Content Manager site groups. For organizational purposes, you can nest related keywords. For example, the keyword operating system could contain the keywords Windows 2000 and Windows XP.
Users with the Manage Area right can add keyword Best Bets for each keyword to identify items most relevant to that keyword. The Manage Area right is included by default in the Web Designer, Administrator, and Content Manager site groups. Keyword Best Bets are specific to individual keywords: Best Bets that you associate with a nested keyword are not returned for keywords above or below it in the nesting chain.
Best Bets are not limited to documents—you can also define people as Best Bets. For example, you can assign a person who is a subject matter expert in an area as the Best Bet. This facilitates person-to-person communication and knowledge transfer in organizations.
When a user types a keyword or synonym for a keyword in the search box, its keyword Best Bets are shown with the highest relevance in search results. These items are also identified with a distinctive icon as keyword Best Bets.
Keyword Control with SharePoint Portal Server 2003
SharePoint Portal Server 2003 allows keywords to be created at the portal site level and documents to be assigned as Best Bets to each keyword. For example, with a keyword such as SharePoint, a user on the IT portal site with the Manage Area right can assign a document with technical content as the Best Bet for that keyword. Then, when IT users search on that keyword through their IT division search scope, they get the best technical content for it. Likewise, a user with the Manage Area right on the Sales portal site can create the same SharePoint keyword and assign documents as Best Bets that are more suited to the needs of the Sales division. In all cases, if the user chooses the All sources search scope to search for the keyword, the search returns all documents assigned as Best Bets to that keyword.
You can reorganize keywords over time based on users’ needs. For example, you can refine the keyword definitions based on the most frequently searched keywords. You can filter the IIS log using a third-party utility to get a better understanding of what users are looking for. For example, you can filter to learn the ten most frequently searched keywords in each portal site, and assign Best Bets based on this analysis.
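If no third-party utility is at hand, a short script can do the same kind of filtering. The Python sketch below counts search terms in IIS logs written in the W3C extended format, using the #Fields directive to locate the cs-uri-query column. The query-string parameter name (k) and the sample log lines are assumptions made for illustration; check your own logs to see which parameter carries the search text.

```python
from collections import Counter
from urllib.parse import parse_qs


def top_search_terms(log_lines, param="k", limit=10):
    """Count search terms found in the cs-uri-query field of W3C log lines."""
    counts = Counter()
    query_index = None
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the lines that follow.
            fields = line[len("#Fields:"):].split()
            query_index = fields.index("cs-uri-query") if "cs-uri-query" in fields else None
            continue
        if not line or line.startswith("#") or query_index is None:
            continue
        columns = line.split()
        if len(columns) <= query_index or columns[query_index] == "-":
            continue
        # parse_qs decodes URL encoding, so "All+sources" becomes "All sources".
        for term in parse_qs(columns[query_index]).get(param, []):
            counts[term.strip().lower()] += 1
    return counts.most_common(limit)


# Hypothetical log excerpt; real logs usually contain more fields.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
    "2004-03-01 09:00:01 GET /Search.aspx k=SharePoint&s=All+sources 200",
    "2004-03-01 09:00:07 GET /Search.aspx k=sharepoint 200",
    "2004-03-01 09:01:12 GET /Search.aspx k=budget 200",
    "2004-03-01 09:02:30 GET /default.aspx - 200",
]
print(top_search_terms(sample_log))
```

Terms are lowercased before counting so that SharePoint and sharepoint are treated as the same search.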
Planning User Profiles
When administrators new to SharePoint hear about importing user profiles from Active Directory into SharePoint, they usually cringe at the thought of having to manage another directory similar to Active Directory. So let’s put your mind at ease: importing user profiles from Active Directory into SharePoint Portal Server 2003 does not create another directory for you to manage. All we’re doing is grabbing the rich directory information out of Active Directory and using that information in different ways in SharePoint Portal Server to provide the following features and benefits:
Searching for and connecting with people within your organization
Generating a personal site in the portal site for individual users
Providing better search results
Targeting content to audiences
Importing User Profiles
You can import user profile information directly from Microsoft Active Directory directory services or enter it manually. You can also customize the properties of the user profile to meet the needs of your organization or to map it to Active Directory properties.
If you have already invested in an infrastructure based on Active Directory or any LDAP (Lightweight Directory Access Protocol)-compliant directory, you can import the user information stored in that directory. With Active Directory, use the Import Profile feature to import the user information into SharePoint Portal Server. Importing user profiles requires a domain account. To use the incremental import feature from Active Directory on a Windows 2000 Server–based computer, a domain administrator account is required.
While you can import directory information from any LDAP-compliant database, only Active Directory is supported by Microsoft.
You can import user profiles from the same domain that SharePoint Portal Server is installed on or from any trusted domain. You can also configure and customize your connection to Active Directory to import users based on specific criteria as a script in an LDAP query. (For example, you can create a script that selects all the users that belong to a specific organizational unit [OU], or only users whose e-mail address property is not empty.)
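To make the two criteria in that example concrete, the Python sketch below builds a standard LDAP search filter for user objects with a non-empty e-mail address. In LDAP, membership in an OU is normally expressed through the search base (the distinguished name where the search starts) rather than in the filter itself. The OU and domain names are hypothetical, and how the base and filter are entered in the import settings is not shown here.

```python
from typing import Optional


def escape_filter_value(value: str) -> str:
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def user_import_filter(department: Optional[str] = None) -> str:
    """Build a filter for user objects whose mail attribute is present."""
    clauses = ["(objectCategory=person)", "(objectClass=user)", "(mail=*)"]
    if department:
        clauses.append("(department=" + escape_filter_value(department) + ")")
    return "(&" + "".join(clauses) + ")"


# Hypothetical search base that limits the import to one OU.
search_base = "OU=Sales,DC=example,DC=com"

print(search_base)
print(user_import_filter())
print(user_import_filter("R&D (EU)"))
```

The presence test (mail=*) matches any entry that has a value for the attribute, which is the usual way to express "e-mail address is not empty" in an LDAP filter.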
If you’re using other platforms, you can add users manually or write your own connector by using the object model. You can add more properties to the user profile if you need to extend the information that you want to be displayed about a user. However, any such updates will not be propagated, nor will they update Active Directory.
Updating User Profiles
After the first import from the directory, you can schedule incremental updates based on the frequency of users being added to Active Directory. In most cases, scheduling incremental updates daily and full updates weekly is sufficient.
Removing a user from Active Directory and fully updating the user profile does not remove a user profile from the profile database. Nevertheless, a user who is removed from Active Directory is not able to access the SharePoint Portal Server because users are authenticated through Internet Information Services, which authenticates users through Active Directory.
Audiences allow organizations to target content to users based on any property in Active Directory or by group membership. Hence, you can build an audience based on who is a member of a Windows Server 2003 security or distribution group. Moreover, you can create an audience based on any property assigned to the user account in Active Directory. For example, you can create an audience based on the Department field in the user account so that those who are assigned to the Accounting Department become members of the Accounting Audience.
Audiences are created based on a set of rules that you define, based on:
Windows security group, distribution list, or organizational hierarchy
User profile public property
A rule is a simple query based on properties of user profiles or the membership of users in security groups and distribution groups. If you have already created security groups or distribution groups in Active Directory, you can create audiences based on those groups. For example, you can define a rule for an audience group that reports to or under a specific manager (using the format of domain\managerUserName). You can also create an audience that belongs to a specific department if you have assigned a value to the Department field of user profiles.
After you create or make changes to an audience, you must compile it for use. Compiling an audience group is simply a matter of executing queries to find users who meet the criteria defined in rules.
Audiences can be compiled at will, or they can be compiled by using a schedule that you create. Any changes to security or distribution group membership, security or distribution member properties, or user profile public properties will not be reflected in the audience until it has been recompiled.
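Conceptually, compiling an audience means running each rule against every user profile and keeping the users who match. The Python sketch below models that idea with an in-memory list of profiles. The class, the functions, the choice between matching all rules or any rule, and the sample accounts are assumptions for illustration; they are not the SharePoint Portal Server object model.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    account: str            # domain\userName
    department: str = ""
    manager: str = ""       # account of the direct manager, if any
    groups: set = field(default_factory=set)


def reports_under(profile, manager, profiles_by_account):
    """Return True if the user reports to the manager directly or indirectly."""
    seen = set()
    current = profile.manager
    while current and current not in seen:
        if current == manager:
            return True
        seen.add(current)
        boss = profiles_by_account.get(current)
        current = boss.manager if boss else ""
    return False


def compile_audience(profiles, rules, match_all=True):
    """Evaluate every rule against every profile and return matching accounts."""
    combine = all if match_all else any
    return sorted(p.account for p in profiles if combine(rule(p) for rule in rules))


# Hypothetical profiles used only to exercise the rules.
profiles = [
    UserProfile(r"CORP\alice", "Accounting", r"CORP\dana", {"Finance-DL"}),
    UserProfile(r"CORP\bob", "Sales", r"CORP\alice"),
    UserProfile(r"CORP\carol", "Accounting", r"CORP\bob"),
    UserProfile(r"CORP\dana", "Executive"),
]
by_account = {p.account: p for p in profiles}

accounting = compile_audience(profiles, [lambda p: p.department == "Accounting"])
under_alice = compile_audience(
    profiles, [lambda p: reports_under(p, r"CORP\alice", by_account)]
)
print(accounting)
print(under_alice)
```

Because membership is computed only when the function runs, a change to a profile or group has no effect on the result until the audience is compiled again, which mirrors the recompilation requirement described above.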
To a point, you can customize the appearance of the portal site by using audiences and permissions. But the real purpose in using audiences is to target information to an individual user based on the audience rules. When planning for audiences, you’re basically asking the question, “Who needs to see which information?” and then deciding whether audiences (as opposed to permissions or sites) are the best way to direct the information to those users.
Targeting vs. Alerts
Targeting and alerts provide an efficient way of pushing and pulling information. The distinction between the two can be summed up as follows:
Using alerts, users can choose to be notified about certain types of content.
Using targeting, administrators or managers can push specific content to users and employees.
Targeting and Access Control
In some cases, you can control who can access specific content through access rights and managing security, but you should be aware of differences between managing security and targeting content and try to use each task as intended:
Access control lists (ACLs), rights, and permissions are used to manage security and to limit access to resources. Users’ ACLs are verified each time they navigate and access the content or perform actions on the portal sites.
Targeting filters the delivery of content. Targeting content is based on audiences, not on ACLs, and even Active Directory distribution groups can be used for audiences. For example, all users have access to the Links For You Web Part on the home page of a corporate portal site, but the content of the Web Part can vary for different users based on the items that are targeted to different audiences.
Targeting Links to Personal Sites
The SharePoint Portal Server administrator can target links to a user or a group of users that will be shown on the Links For You Web Part on a user’s personal site. This feature, called Manage Targeted Links On Personal Site, is available only on the portal site that provides shared services.
Planning Personal Sites
The My Site feature of SharePoint Portal Server 2003 enables each employee in the corporate portal site to create and manage a personal site. Personal sites provide a mechanism for person-to-person collaboration. Each personal site has two interfaces:
The public interface, which is accessible to all users. The owner of the site decides what information to share.
The private interface, which contains information available only to the owner of the site.
When shared services are enabled in the corporate portal site, this portal site will host personal sites by default and all personal site links in the other portal sites will redirect users to http://corp/Mysite or the portal site that is selected to host My Sites in a shared services deployment.
Prerequisites for Personal Sites
User profiles must be part of the planning for personal sites. To create a personal site, each user needs a user profile and the required permissions. To enable users to create personal sites, the portal site administrator must add users to the prebuilt member site group of the portal site. Audience groups must be created for targeting links and content.
Web Parts in Personal Sites
Some Web Parts are added to personal sites when they are created. Other Web Parts can be added after the site has been created. Web Parts perform three main functions:
Viewing and managing information that a user has selected. One example is My Alerts Summary, which shows a list of alerts that a user has subscribed to. Another is My Links Summary, which presents a list of links that a user has added.
Providing content and links that are targeted to the user by the portal site administrator or others. Examples of this are Links For You and News For You.
Providing content from the system. An example of this is Your Recent Documents, which shows a list of links to documents that have been recently uploaded or modified by the user.
Users can add Web Parts only to their private view. For consistency, all public views of the personal site look the same unless the user adds subsites and workspaces, which can be customized.
Changes made to the properties of a user profile through the portal site interface will not be reflected in Active Directory.
Disabling Personal Sites
The ability to create personal sites is enabled by default. There is no specific switch to turn it off. Rather, access to creating personal sites is controlled through portal site security. Disabling personal site creation is a matter of setting permissions on the various groups that have access to the portal site.
Planning Windows SharePoint Services Team Sites
The corporate portal site addresses the needs of all employees for accessing corporate-wide information, divisional portal sites provide information for department levels, and Windows SharePoint Services sites offer collaboration sites for workgroups and teams. There are different options for deploying team sites. We recommend (although this is not required) creating a separate virtual server for each SharePoint team site collection and for each divisional portal site. This configuration provides the maximum amount of flexibility for granular database backups because each Windows SharePoint Services Virtual Server has its own content database. This configuration also makes it easier to scale out and host Windows SharePoint Services on its own front-end Web servers. If you aren’t required to have a one-to-one relationship between your content databases and your site collections, you can host up to 50,000 site collections in a single virtual server.
For the purpose of distributing the load, multiple team site collections can use their own content database and all site collections can use the server farm configuration database. A separate site collection is assigned for hosting team sites with cross-divisional usage or work groups that do not have a portal site.
This model of site collections deployment provides the highest level of flexibility for scaling out. This model can be migrated to a separate server farm that has its own SQL Server cluster. It also provides a scale-out option for many team sites and allows more resources to be allocated for the corporate portal site, shared services, and the divisional portal site in the existing server farm. Partitioning and associating site collections for divisional portal sites provides the following benefits:
More flexibility to scale out
Better load distribution on content databases
Easier navigation on divisional portal sites and SharePoint sites
Integration of SharePoint site search scopes within the divisional portal site
More flexibility with delegation of ownership and administrative tasks
More flexibility with backup and restore of individual site collections
The drawback of this model is that it requires more memory because it hosts each site collection in a separate virtual server.
Configuring SQL Server 2000 Search on Windows SharePoint Services Sites
You must enable the search feature before your site members can use it. If you want to enable SQL Server 2000 searching, you must install the full-text searching feature for SQL Server 2000, and then enable searching in Windows SharePoint Services.
Note that this is not the same as using the Search Web Part in the portal site. The Search Web Part queries the full-text index produced by MSSearch.exe. MSSearch.exe is a different search engine than SQL Server 2000 Search. Moreover, you should be aware that indexes produced by these two different engines cannot be combined or propagated between portal sites and Windows SharePoint Services sites.
Enabling Searching for SQL Server 2000
To use the search feature with Windows SharePoint Services and SQL Server 2000, you must have full-text searching installed on your SQL Server computer. Full-text searching is usually installed by default, but if it is not installed on your server, you can install it using the SQL Server Setup tools. When you configure the search feature on a Windows SharePoint Services site, a link to searching content on the site’s associated portal site automatically appears on the Windows SharePoint Services search results page.
Providing the search feature to team sites allows for a granular search scope. However, keep the following points in mind:
Windows SharePoint Services team site search is provided by SQL Server 2000 full-text search. Enabling it creates an additional load on the SQL Server.
Configuring search on Windows SharePoint Services team sites does not supplant configuring SharePoint Portal Server–based search and indexing. They are two entirely separate search engines.
The following topics discuss other considerations for planning Windows SharePoint Services sites:
Sites storage management
Site collection ownership and administration
Sites archiving and autodeletion
Sites Storage Management
Each division will use SharePoint sites in a different way. Some divisions will create and use more SharePoint sites than others. To support different usage patterns, create a separate quota template for each division (instead of a default quota template), based on an estimate of the disk storage to allocate to the site collections of that division. You should also create a separate quota template for each site collection and adjust the quota based on usage of team sites in each organization.
Quotas give the administrator a high level of control over disk storage and content database size for team sites. When these quotas are reached, an e-mail message will notify the administrator to adjust the size or quota. You can write an SQL query script to increase quotas by using a simple update statement.
Site Collection Ownership and Administration
For each site collection, assign two owners to manage sites and to be responsible for users’ requests. Assigning two owners ensures that you do not lose the function when one owner is unavailable.
If a user tries to access a site to which he or she does not have access, the user is prompted for authentication three times. After this, a site access request form is shown, which the user can send to the site owner. Site collection owners also receive any quota or autodeletion notices, and they have site collection administrator privileges.
Making a user a site owner also adds the user to the list of site collection administrators. Removing users from the list of site owners also removes them from the list of site collection administrators, but it doesn’t change any other group membership or rights granted to them.
Windows SharePoint Services Sites Archiving and Autodeletion
Sites are usually created for the collaboration of project workgroups or project meetings. In most cases, team sites won’t be used after projects are completed. You should plan a policy for archiving and removing such team sites to reclaim the storage consumed by them. The Windows SharePoint Services administration interface provides such a capability. Set it to notify owners of site collections if sites are not used for a period of time. (The default setting is 90 days.) Based on the policy for retaining data in your organization, you can set up the autodeletion feature to delete unused sites. As a best practice, you should archive these SharePoint sites before they are removed because the deletion is permanent and the content will not be retrievable. You can use smigrate.exe to archive the sites and then allow the autodeletion feature to delete the sites.
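When drafting the archiving policy, it can help to see which sites a given threshold would flag. The Python sketch below applies the 90-day default to a hand-maintained table of last-used dates. The site URLs and dates are hypothetical, and the sketch does not read usage data from Windows SharePoint Services or call smigrate.exe.

```python
from datetime import date, timedelta


def sites_to_notify(last_used, today, unused_days=90):
    """Return the sites that have gone unused for at least `unused_days` days."""
    cutoff = today - timedelta(days=unused_days)
    return sorted(url for url, used in last_used.items() if used <= cutoff)


# Hypothetical inventory: site URL -> date of last use.
last_used = {
    "http://teams/sites/projA": date(2004, 3, 15),
    "http://teams/sites/projB": date(2004, 6, 1),
    "http://teams/sites/projC": date(2004, 4, 1),
}
flagged = sites_to_notify(last_used, today=date(2004, 6, 30))
print(flagged)
```

The flagged list is the set of candidates to archive before the autodeletion feature is allowed to remove them.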
Planning and Managing Properties
SharePoint Portal Server 2003 displays the properties (or metadata) of items crawled by the content index server. The properties are on the Manage Properties Of Crawled Content page. Based on your business needs, you can choose which properties are shown in the advanced search and which properties trigger alerts when they change. However, crawling content that has too many properties takes longer and uses more resources.
You can customize the properties of a document based on your needs. For example, you can add a property such as confidential to your documents, to distinguish the confidential documents or allow users to search and quickly locate these types of documents. To add the property, add a column to your Document Library view.
When the documents are crawled, the new property is discovered and added to the list of properties that can be viewed through the Manage Properties tool. In this case, confidential is shown under:
You can configure the confidential property to be included in Advanced Search. You can also use this property to trigger an alert if the object has changed.
In this chapter, we have discussed some of the planning questions and best practices that should be considered before your SharePoint Products and Technologies implementation. We have discussed how to plan for search and indexing, keywords, Best Bets, user profiles, audiences, personal sites, and Windows SharePoint Services sites.
You absolutely should not skimp on the planning portion of your overall deployment. Furthermore, the quality of your planning will directly affect the quality of your deployment and the ability of the user to have a positive portal site experience.