Plan metadata properties for search (Search Server 2010)

 

Applies to: Search Server 2010

Topic Last Modified: 2011-04-25

This article describes how to plan metadata properties for search in Microsoft Search Server 2010. When content is crawled, the crawler also crawls the metadata that is associated with that content, such as author, title, and e-mail address. The search system stores this information as crawled properties and managed properties. Crawled properties are all properties (such as author, title, or subject) that are extracted from documents during crawls. Managed properties are crawled properties that can appear in refined or advanced searches.

When users perform a general search, results include items that match on any crawled property. To provide refined search capabilities to users, however, you need to plan for managed properties. Because they can appear in refined searches, managed properties help users perform queries that are more successful and relevant. Search Server 2010 provides a default set of managed properties, but you can create new managed properties and map crawled properties to the managed properties that will appear in search results.

Refined searches can be performed only on managed properties, not crawled properties. To make a crawled property available for refined search queries, you must map the crawled property to a managed property. You can map multiple crawled properties to a single managed property or map a single crawled property to multiple managed properties. If a managed property has multiple crawled properties mapped to it, and a document contains values for more than one of the crawled properties, the order in which the properties are mapped and their priority determine the value of the managed property.

For example, three different document types might have different names for the property that identifies the author. One document type might name this property Author, another Writer, and a third Property3. Although all three are crawled properties, only the documents that have the Author property appear in search results when a user queries by author (for example, by typing author:John Smith in the search box). To ensure that documents that have the other property names appear in refined search results, you must map each of these crawled properties to the Author managed property.
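The effect of mapping can be pictured with a short Python sketch. It is a conceptual model only and does not use any Search Server 2010 API: the names `MAPPINGS`, `DOCUMENTS`, and `query_managed`, as well as the sample file names, are hypothetical and introduced purely for illustration.

```python
# Conceptual model only; none of these names come from Search Server 2010.

# Managed property -> crawled properties mapped to it (hypothetical data).
MAPPINGS = {"Author": ["Author", "Writer", "Property3"]}

# Crawled properties extracted from different document types (sample data).
DOCUMENTS = [
    {"name": "report.docx", "Author": "John Smith"},
    {"name": "memo.pdf", "Writer": "John Smith"},
    {"name": "notes.txt", "Property3": "John Smith"},
    {"name": "plan.docx", "Author": "Jane Doe"},
]


def query_managed(documents, managed_property, value, mappings):
    """Return names of documents where any mapped crawled property equals value."""
    crawled_names = mappings.get(managed_property, [])
    wanted = value.casefold()
    return [
        doc["name"]
        for doc in documents
        if any(str(doc.get(c, "")).casefold() == wanted for c in crawled_names)
    ]


# Only the Author crawled property is mapped: Writer and Property3 are missed.
only_author = query_managed(
    DOCUMENTS, "Author", "John Smith", {"Author": ["Author"]}
)

# All three crawled properties are mapped to the Author managed property.
all_mapped = query_managed(DOCUMENTS, "Author", "John Smith", MAPPINGS)

print(only_author)
print(all_mapped)
```

In this model, the query finds all three John Smith documents only when Writer and Property3 are also mapped to Author.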

In this article:


  • About managed properties


  • Reducing duplicate managed properties


  • Adding properties for key concepts in the information architecture


  • Scenario

About managed properties

To create a useful set of managed properties, you analyze the most important content to find metadata in the content that you can map to managed properties.

It is difficult to discover properties of content without first crawling content. Therefore, we recommend that you wait to plan managed properties until after you know what content is in each site collection. Then, you can crawl all that content by using a test server. After the crawl, you will have a list of crawled properties to compare against the information architecture when you create managed properties. Mapping properties can be difficult even after the crawl, because it is not always clear which content type or application uses a given property. If you are unsure about a particular property, you might want to set up a mapping in a test environment and experiment with searches over this property.
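One way to review the results of a test crawl is to list the crawled properties that are not yet mapped to any managed property, starting with the most frequent. The Python sketch below assumes that you have exported the crawled property names and document counts yourself. The dictionaries `crawled_counts` and `current_mappings`, the function `unmapped_crawled`, and all figures are hypothetical; nothing here is produced by Search Server 2010 itself.

```python
# Hypothetical export from a test crawl:
# crawled property name -> number of documents that carry it.
crawled_counts = {
    "Author": 1200,
    "Writer": 300,
    "Created By": 150,
    "Last saved by": 900,
    "CustomerRegion": 75,
}

# Hypothetical current mappings: managed property -> mapped crawled properties.
current_mappings = {"Author": {"Author"}, "Title": {"Title"}}


def unmapped_crawled(counts, mappings):
    """Return unmapped crawled property names, most frequent first."""
    mapped = set().union(*mappings.values())
    candidates = [name for name in counts if name not in mapped]
    return sorted(candidates, key=lambda name: -counts[name])


review_list = unmapped_crawled(crawled_counts, current_mappings)
print(review_list)
```

Each name on the resulting list is a candidate to compare against the information architecture. Frequency alone does not make a property a good match, so the list is a starting point for review rather than a mapping plan.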

Many of the most useful managed properties are automatically created when Search Server 2010 is installed. Use these managed properties as a starting point when planning the other managed properties. The properties that are automatically created include the following:

  • Author

  • Description

  • Site Name

  • Type

  • File Size

  • Last Modified Date

  • URL

  • Title

Keep in mind that in order to effectively search by using properties, the crawled properties must first be assigned values. For example, if you have a Microsoft Word 2010 document that has the Author property (which maps to a managed property named Author), and no value is assigned to the Author property on that document, the document is not displayed in search results when users query by using the Author property. To guarantee the best results for refined searches, consider implementing an enterprise content management solution that includes document metadata planning. For more information about document metadata planning, see Plan managed metadata (SharePoint Server 2010) and Content type and workflow planning (SharePoint Server 2010). For more information about planning an enterprise content management solution, see Enterprise content management planning (SharePoint Server 2010).
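As a rough illustration of why empty values matter, the following Python sketch flags documents that a query on the Author managed property could not return, because none of the mapped crawled properties holds a value. It is a conceptual model with hypothetical names (`documents_missing_value`, `sample_documents`) and sample data, not a Search Server 2010 feature.

```python
# Illustrative model only; not part of Search Server 2010.


def documents_missing_value(documents, crawled_names):
    """Return names of documents with no usable value in any listed property."""
    missing = []
    for doc in documents:
        values = [str(doc.get(name) or "").strip() for name in crawled_names]
        if not any(values):
            missing.append(doc["name"])
    return missing


sample_documents = [
    {"name": "budget.docx", "Author": "John Smith"},
    {"name": "draft.docx", "Author": ""},  # property present, no value
    {"name": "scan.pdf"},  # property absent
    {"name": "memo.doc", "Writer": "Jane Doe"},
]

not_findable_by_author = documents_missing_value(
    sample_documents, ["Author", "Writer"]
)
print(not_findable_by_author)
```

A report of this kind can help you decide where document metadata planning is most needed before you rely on a property for refined searches.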

For information about managing metadata properties, see Manage metadata properties for search (Search Server 2010).

Reducing duplicate managed properties

Some basic properties might appear as different crawled properties in different types of content. For example, the crawled properties might be Owner, Writer, and Created By, all of which are synonymous with Author. The most important thing you can do is to plan to reduce duplication. That is, plan to create one set of managed properties and map the crawled properties that have the same meaning to managed properties. In this case, you map Owner, Writer, and Created By to the managed property Author.

You can prioritize multiple crawled properties so that if more than one property is found during crawling, only the value of the highest-priority property is used for queries that use the managed property or properties. If you do not prioritize crawled properties, values for all crawled properties mapped to the managed property are used for queries. In this way, the managed property becomes multi-valued. This means that a query returns results for all content that contains values for any of the mapped properties that match the query.

A sensible approach for a single-value property is to choose the most common crawled property as the managed property, and then prioritize mapped properties by how often they occur. It is not always easy to determine which property is crawled most often, but one strategy is to prioritize properties that you know are associated with commonly used applications.

For example, Microsoft Office 2010 documents contain a default set of properties, such as Author, Title, Company, Type, and others. If most of your users use Microsoft Office 2010 and your content set also contains documents created in other applications, consider mapping properties of the documents that were created in the other applications to the properties in Office 2010 documents. If a document created with another application contains a property named Writer, consider mapping it to the managed property named Author.
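The difference between prioritized and non-prioritized mappings, and the idea of ordering mapped properties by how often they occur, can be sketched in Python. This is a simplified model of the behavior described above, not the product's implementation. The functions `rank_by_frequency` and `resolve_managed_value` and the sample `corpus` are hypothetical.

```python
from collections import Counter


def rank_by_frequency(documents, candidates):
    """Order candidate crawled properties by how many documents have a value."""
    counts = Counter(
        name for doc in documents for name in candidates if doc.get(name)
    )
    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=lambda name: -counts[name])


def resolve_managed_value(document, ordered_crawled, prioritize):
    """Return the value(s) the managed property takes for one document.

    ordered_crawled lists the mapped crawled properties, highest priority first.
    """
    found = [document[name] for name in ordered_crawled if document.get(name)]
    if prioritize:
        # Only the highest-priority crawled property that has a value is used.
        return found[:1]
    # Without prioritization every mapped value is kept (multi-valued).
    return found


# Sample crawled data (hypothetical).
corpus = [
    {"Author": "John Smith"},
    {"Author": "Jane Doe", "Writer": "J. Doe"},
    {"Author": "A. Lee"},
    {"Writer": "John Smith", "Created By": "import-service"},
]

priority_order = rank_by_frequency(corpus, ["Created By", "Writer", "Author"])
single_value = resolve_managed_value(corpus[1], priority_order, prioritize=True)
multi_value = resolve_managed_value(corpus[1], priority_order, prioritize=False)

print(priority_order)
print(single_value)
print(multi_value)
```

In this sample, Author occurs most often and so ranks first. With prioritization the second document contributes only its Author value, and without it the document contributes both its Author and Writer values.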

When mapping properties, take care not to map poorly matched or irrelevant properties, because imprecise mappings can reduce the relevance of search results. For example, mapping a property called Last saved by to the managed property called Author could yield search results that are less relevant. If possible, test searches for managed properties before initial deployment, and plan to review usage data for search queries during normal operations to fine-tune the properties you have mapped. For more information about reviewing usage data for search queries, see View Web Analytics reports (SharePoint Server 2010).

Adding properties for key concepts in the information architecture

In addition to the crawled properties that are mapped to managed properties by default, other crawled properties might clearly map to concepts in the information architecture that are captured by existing managed properties. For example, an organization might identify customer service as a key business process in its information architecture. Key concepts associated with customer service in the information architecture might include customers, customer service representatives, and customer service regions.

For each concept in the information architecture, ask yourself whether a crawled property represents this concept and can be mapped to a managed property. If so, map that crawled property to a managed property.

Scenario

A line-of-business application tracks customer and employee data, and the properties of that data are likely candidates for managed properties after they are registered in the Business Data Catalog and crawled as part of a business data content source. You might also find crawled properties for applications that should be mapped to these managed properties. Examples include a customer service representative identifier (ID) property in a separate data application, or an Author property for an application type that is used exclusively by customer service representatives. A search query that uses that property or a term associated with that property returns results for all items that contain any of the crawled properties mapped to the customer service representative ID managed property.

Each major business process identified in the information architecture will have a set of associated file types or business data applications that can be used to discover likely managed properties.

Note that although many concepts in the information architecture are not represented by properties, those concepts are useful during site structure planning and the implementation of other search features. The information architecture can identify managed properties that you overlooked. However, just because a concept is listed in the information architecture does not mean that there should be a managed property for that concept.