Chapter 3: Creating and Customizing WebMaps

With Content Analyzer, it's easy to make a map of your site. That map is really just a database of all your site resources, as well as their properties (such as URL, hyperlink text, MIME type, and so on). You can customize the map so that it works best for you. This chapter explains how to get started making your own maps and describes the various ways you can control the appearance of the maps. You'll learn how to:

  • Create a map. With Content Analyzer, you can map local sites--such as those on a local or networked file system or on an internal Web server--and public sites located on the World Wide Web. You can also copy a site from a Web server to your local hard disk (so you don't have to access the server to manage or browse the site), to another Web server (to create mirror sites), or to a CD (for CD mastering).

  • Change the way Content Analyzer maps your site. Content Analyzer includes a set of default options it uses when mapping a site. For example, because mapping large sites can be time-consuming, Content Analyzer sets a page number limit when exploring a site, unless you specify otherwise. Depending on the site and the task you're trying to accomplish, you can set the mapping options to meet your needs.

  • Explore the site further. Once you've created a map, you may decide to explore some areas of the site that weren't explored when you first mapped the site. You can continue to explore the entire site or only the portions you specify. Depending on the size of the site, you may want to explore a limited number of pages, or specific areas of interest.

  • Show or hide types of map objects. You can specify which objects you want to see--just pages, all objects, or some combination of specific object types. You can also show different resources in each view--for example, just pages in the Cyberbolic view, but pages, images, and audio files in the Tree view.

  • Change object labels. Sometimes the descriptive label of a map object is not the piece of information you want for the task you're trying to accomplish. You can quickly change the label of an object type to provide a different piece of information. For example, you could choose to label pages with the full URL, or choose size as the label for images.

Topics in this chapter:

Creating a WebMap

Changing the Default Mapping Options

Exploring More (or Less) of Your Site

Showing or Hiding Types of Objects

Changing How Objects are Labeled

Creating a WebMap

Making a WebMap is a very simple process: You choose New from the File menu, then give Content Analyzer the location of the site you want to map. The location can be a path and domain name if the site is on a local or networked file system, or it can be a URL if the site is on a Web server (either your own internal Web server or the public Web). Content Analyzer will then explore your entire site, create a map and generate reports (in HTML format) that give you information about your site (its structure, any broken links, and so on).

While this is the easiest way to create a map, there are a number of options you can change to customize the map-making process. For example, you can set a page limit, extend or restrict the domains and/or site paths to be mapped, and much more. See "Changing the Default Mapping Options" for details.

When you create a map, you can start at the home page, or you can map part of a site. Mapping part of a site is a particularly useful strategy if the site you want to map is very large; it's quicker to map portions of a large site rather than the entire site at once. For more information about working with large sites, see "Tips for Mapping Someone Else's Large Web Site," and "Strategies for Making Great Maps of Your Site" in Chapter 8, "Site Management Tips & Techniques."

Once you've created your map, it's easy to keep it up to date as your site changes. See "Keeping Maps Current" in Chapter 8, "Site Management Tips & Techniques" for details.

Note If you're using a proxy server, or if there are password-protected pages in the site you want to map, you'll want to change some of Content Analyzer's program options before you begin mapping. See Appendix A, "Content Analyzer Setup," for details.

To Map the Site
  1. Choose New from the File menu, and then choose either Map from File or Map from URL from the submenu.

    Shortcut: Press Ctrl+N or click the New Map button on the Main toolbar to display the New Map dialog box. Select the URL or File button and click OK. 

    New Map from URL and New Map from File dialog boxes 

Now, follow one of the next two procedures, depending on whether you're mapping from a URL or a file system.

To map from a file system
  1. Enter the path and file name for the home page (or any other page in the site where you want to start mapping) in the Home Page Path and file name text box.

  2. In the Domain and Site Root text box, enter the domain and root directory for the site.

    If you want to start mapping the site from a page other than the site's top home page, add the path to that page after the domain name (but don't include the page's file name). 

  3. If you have any CGI scripts in your site, and they're not in the disk directory \cgi-bin (where "\" is the site root on disk), enter or browse for their location in the CGI Bin Directory text box. If you don't enter a location, Content Analyzer won't be able to find the scripts, and they'll show up as broken links in the map. For instance, if you've created an alias directory for your CGI scripts called /usr/bin, you'd enter that alias in the CGI Bin Directory box.

  4. Click OK.

To map from a URL
  1. In the Home Page Address box, enter the URL of the site's home page (or any other page in the site where you want to start mapping).

    Shortcut: If your Web browser is open, you can copy a URL from your browser's Location text box and paste it to Content Analyzer's Home Page Address text box. 

    Be sure to include the domain, but you don't need to precede the URL with https://. For example, to map the Microsoft site, you'd enter www.microsoft.com/index.htm.

  2. Click OK.
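
As a rough illustration of the address rule in step 1, the following Python sketch adds a scheme only when you leave it out. It is not Content Analyzer's own code; the function name normalize_home_page_address and the default scheme are assumptions made for this example.

```python
def normalize_home_page_address(address, default_scheme="http"):
    """Return a full URL, adding a scheme only when the user omitted one.

    default_scheme is an assumption for this sketch; the manual does not
    say which scheme Content Analyzer supplies.
    """
    address = address.strip()
    if "://" in address:
        # The user typed a complete URL, so leave it untouched.
        return address
    return f"{default_scheme}://{address}"


full_url = normalize_home_page_address("www.microsoft.com/index.htm")
```

An address that already contains a scheme passes through unchanged, so pasting a URL from your browser works the same way as typing a bare domain.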

To cancel mapping

To cancel the mapping process at any point, click Cancel, or press Esc; the map will contain as much of the site as Content Analyzer has mapped so far.

Set Routes by URL Hierarchy Checkbox

By default, the Set Routes by URL Hierarchy checkbox is selected in the New Map dialog box. This causes Content Analyzer to build the map according to the hierarchy of URLs in the site. If you clear this checkbox, the map will be built according to the order in which links are discovered, without regard for where the links lead to in the hierarchy. For more information about setting routes, see "Different Ways to Set Routes" in Chapter 5, "All About Routes."
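
To make the difference between the two settings concrete, here is a minimal Python sketch. It is only one plausible reading of "hierarchy of URLs", not Content Analyzer's actual algorithm: the function names, the scoring rule, and the example.com addresses are all hypothetical.

```python
from urllib.parse import urlsplit


def routes_by_discovery(links):
    """Pick each object's parent as the first page found linking to it.

    links is a list of (source_url, target_url) pairs in discovery order.
    """
    parents = {}
    for source, target in links:
        parents.setdefault(target, source)
    return parents


def routes_by_url_hierarchy(links):
    """Pick the linking page whose directory is the deepest ancestor of the target.

    Hypothetical scoring rule: the longer the matching directory prefix,
    the better the parent. Host names are ignored to keep the sketch short.
    """
    parents, best_score = {}, {}
    for source, target in links:
        source_dir = urlsplit(source).path.rsplit("/", 1)[0] + "/"
        target_path = urlsplit(target).path
        score = len(source_dir) if target_path.startswith(source_dir) else -1
        if score > best_score.get(target, -2):
            best_score[target] = score
            parents[target] = source
    return parents


# The home page links to a deep page before the products index does.
links = [
    ("http://www.example.com/index.htm",
     "http://www.example.com/products/widgets/a.htm"),
    ("http://www.example.com/products/index.htm",
     "http://www.example.com/products/widgets/a.htm"),
]
```

With these links, routes_by_discovery places the widgets page under the home page (the first link found), while routes_by_url_hierarchy places it under /products/index.htm, which sits closer to it in the URL structure.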

Explore Entire Site and Generate Site Reports Checkboxes

By default, the Explore Entire Site and Generate Site Reports checkboxes are selected in the New Map dialog box. If you leave these options as they are, Content Analyzer maps your whole site, regardless of how many pages it contains. It also generates HTML site reports that you can view in your browser when mapping is completed.

If your site is very large, you may want to clear one or both of these checkboxes. If you clear Explore Entire Site and click OK, the Explore dialog box appears, allowing you to create a smaller map (by limiting the number of pages or levels to map). See "Exploring More (or Less) of Your Site" for details. If you clear Generate Site Reports, you'll save disk space, since the site reports can be large if your site has many resources. See Chapter 4, "Working with HTML Reports," for information about site reports.

Changing the Default Mapping Options

If you want to change any default mapping options, click the Options button on the New Map from URL or New Map from File dialog box. Typically, you might want to change the default when you're mapping from a URL but want to copy the site to your file system. Depending on whether you're mapping from a URL or the file system, some options may not be available, as noted in the following sections.

Mapping options are saved with the map. The next time you open your map, the options will be as you last set them. If you want to retain these options the next time you make a map of your site, you can use the Remap command to remap the site, without losing any of your map settings. See "Keeping Maps Current" in Chapter 8, "Site Management Tips & Techniques," for information.

Note If you want to look at or change mapping options later, without exploring or creating a new map, you can also access them with the Mapping Options command on the Mapping menu.

The General Tab

The following options are available on the General tab of the Mapping Options dialog boxes.

Ignore Case of URLs

If the site you're mapping is on a case-insensitive server (such as Windows NT), the link URLs to a particular resource could use inconsistent capitalization. For example, you might have used MYPAGE.HTM in a link URL in one place and Mypage.htm in another place to reference the same resource. If you did use inconsistent capitalization, you might see duplicate main-route objects in the WebMap. This occurs because, by default, Content Analyzer creates a unique main-route object for each differently specified URL (since UNIX servers are case-sensitive).

If this is your own site, consider changing the link URLs to use consistent capitalization. Not only will your maps be more usable, but your site will be more portable if you need to move it to a case-sensitive server.

If you don't want to fix your site's link URLs right now, you can eliminate any duplicate main-route objects caused by inconsistent capitalization. To do so, select Ignore Case of URLs in the Creation Options area of the New Map dialog box.
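
The effect of this option can be pictured with a short Python sketch. It assumes, for illustration only, that duplicates are merged by comparing lowercased URLs; the function name merge_duplicate_urls is hypothetical.

```python
def merge_duplicate_urls(urls, ignore_case=False):
    """Return one entry per distinct URL, keeping the first spelling seen.

    With ignore_case=False every differently capitalized URL is treated
    as its own object, as on a case-sensitive server.
    """
    first_seen = {}
    for url in urls:
        key = url.lower() if ignore_case else url
        first_seen.setdefault(key, url)
    return list(first_seen.values())


link_urls = ["/docs/MYPAGE.HTM", "/docs/Mypage.htm", "/docs/other.htm"]
```

With ignore_case left off, the list above yields three objects; with it turned on, the two spellings of the same page collapse into one.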

If No Default File, Map All Files in Directory (Mapping from File Only)

If your site contains any links that point to a directory root, but the link specifies no default file name (for example, /mysite/marketing/), Content Analyzer first looks for a default file in that directory (in this case in the marketing subdirectory). The default file's name must be the same as your site's home page. So if your home page is named index.html, Content Analyzer only looks for index.html in other directories (it won't look for names like default.html or welcome.html, other common default names). If Content Analyzer doesn't find a default file, any links that point to this directory will show up broken in the map, unless you select this option.

With this option selected (the default), if Content Analyzer doesn't find a default file in any directory that is pointed to by a directory-root link, it first "cooks up" a temporary page consisting only of links to every file in the referenced directory. Next, Content Analyzer maps that page and adds it to the map of the site. In essence, Content Analyzer simply creates a directory listing for that directory, and creates a page of links from it. (This ability to create a directory listing is a common feature of many servers, and Content Analyzer simulates this server feature when you map from file.)

You may want to clear this option if you always expect to have a default file in a directory that may have such directory-root links pointing to it (so that you can detect any broken links if the file doesn't exist).
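
The decision described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function names and the exact HTML of the temporary page are hypothetical.

```python
import html


def directory_listing_page(file_names, url_path):
    """Build a temporary page that links to every file in a directory."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            html.escape(url_path + name, quote=True), html.escape(name)
        )
        for name in sorted(file_names)
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>"


def resolve_directory_link(file_names, url_path, home_page_name, map_all_files=True):
    """Decide what a link to a bare directory (such as /mysite/marketing/) maps to.

    Returns a (kind, value) pair where kind is "default", "listing", or "broken".
    """
    if home_page_name in file_names:
        # A default file with the same name as the home page exists.
        return ("default", url_path + home_page_name)
    if map_all_files:
        # No default file: simulate a server-style directory listing.
        return ("listing", directory_listing_page(file_names, url_path))
    # Option cleared: the link is reported as broken.
    return ("broken", None)
```

When mapping a real directory you might pass os.listdir(directory) as file_names. Note that a directory containing only welcome.html still counts as having no default file when the home page is named index.html.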

Ignore Default WebMaps (Mapping from URL Only)

At times, you might try to create a WebMap for a site that already has one. Some Webmasters have created maps of their sites for your convenience. If a site contains a default map (which must be named default.wmp) in the site's root directory, Content Analyzer detects it when you try to map the site and displays it in the map window. Often this will be a complete map of the Web site. You can then use this prebuilt map for browsing the site.

If you want to bypass the default map to create a new map (if you're the site's Webmaster, for example), select the Ignore Default WebMaps option.

Verify Offsite Links

Your Web site (or the site you're mapping) can contain links to objects on other sites. Content Analyzer displays these offsite objects in blue, with a question mark for pages. When exploring a site, Content Analyzer does not automatically look up the URLs for offsite objects. That's because it can take extra time to verify the existence of objects on other sites, particularly if your site is large and has many links to other sites.

If you want Content Analyzer to simultaneously explore and verify that all offsite objects exist, select Verify Offsite Links. Depending on network traffic and the speed of your Internet connection, this might take some time. Any unavailable objects will appear in red. If you don't use this option, you can always verify links to offsite objects later by choosing Mapping|Verify Links. (See "Verifying Links" in Chapter 7, "Managing Links," for information.)

Content Analyzer only verifies that each offsite object exists. It doesn't look up any of the links on an offsite page, so a question mark still appears next to the offsite object. This prevents Content Analyzer from exploring every object in other sites, which could also be linked to yet other sites. You can always explore offsite pages individually (see "Manually Exploring a Particular Page").
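
The idea of checking that an object exists without following its links can be sketched in Python. This is a minimal illustration, assuming an HTTP HEAD request is an acceptable existence test; the function names are hypothetical and the manual does not say how Content Analyzer performs the check.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def url_exists(url, timeout=10):
    """Return True if the server answers a HEAD request without an error."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError, ValueError):
        # HTTP errors, network failures, and malformed URLs all count as unavailable.
        return False


def verify_offsite_links(urls, exists=url_exists):
    """Label each offsite URL without exploring any links on the page itself."""
    return {url: ("available" if exists(url) else "unavailable") for url in urls}
```

The exists parameter lets you substitute a different check, for example a stub when you want to try the function without a network connection.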

Honor Robot Protocol

The Honor Robot Protocol option (selected by default when you map a site on the Web) ensures that Content Analyzer will not explore any area of a site that is set up to block spiders. If you want to ignore robot protocol for your own site, clear this option; then you can build the map without any limitation on how much of the site you can map. However, ignoring robot protocol that has been set up on other sites is usually considered bad "netiquette."

Note If you use the Extensions tab to tell Content Analyzer to explore additional offsite domains, Content Analyzer automatically obeys any robot protocol that has been set up for those sites.

For more information about honoring robot protocol, see "Exploring Sites that Restrict Spiders."
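
For readers who want to see what honoring robot protocol involves, the following Python sketch uses the standard library's urllib.robotparser module. The helper name make_robot_checker, the agent name ExampleMapper, and the sample rules are assumptions for illustration; this is not how Content Analyzer is implemented.

```python
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt_lines, user_agent, honor_robot_protocol=True):
    """Return a function that says whether a URL may be explored.

    robots_txt_lines holds the lines of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)

    def may_explore(url):
        if not honor_robot_protocol:
            # Option cleared: map without regard to the site's rules.
            return True
        return parser.can_fetch(user_agent, url)

    return may_explore


rules = ["User-agent: *", "Disallow: /private/"]
may_explore = make_robot_checker(rules, "ExampleMapper")
```

Here may_explore allows http://www.example.com/index.htm but refuses anything under /private/. Passing honor_robot_protocol=False corresponds to clearing the option for your own site.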

Local Site Root Directory (Mapping from File Only)

When you first create a map from the file system, this option is informational only; it simply shows you the site root that you entered in the Home Page Path and file name box.

Later on, if you want to share a map you've created with a coworker who has the same site in a different file system location (or who has mapped the network drive containing the site to a different letter than you have), you can change the local site root by choosing Mapping|Mapping Options. See "Changing the Map's Local Path" in Chapter 8, "Site Management Tips & Techniques," for details.

User Agent (Mapping from URL Only)

Use this option to select which user agent (robot) Content Analyzer employs when it maps the site. You can enter the user agent of your choice, or use one of the drop-down choices: Content Analyzer Default (the default), Mozilla 2.0, Mozilla 2.0 (Internet Explorer compatible), and Mozilla 3.0. If you keep the default setting, Content Analyzer's robot will be identified to other sites as Microsoft. Unless your server delivers your site differently depending on the user agent, you probably shouldn't change the default.

If you've set up your site to be displayed differently depending on which browser people use, you may want to choose one of the other options. This ensures that your site is mapped the way it is browsed. Microsoft Internet Explorer uses Mozilla 2.0 (Internet Explorer compatible); this choice supplies a ready-to-use Internet Explorer 3.0 user agent string: Mozilla/2.0 (compatible; MSIE 3.0; Windows 95). Mozilla 3.0 is the latest version of Netscape Navigator's user agent. If you enter a user agent of your own, be sure to enter the precise string for the agent. For instance, for Netscape Navigator 3.0, the string is Mozilla/3.0 (Win95; I).
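
A user agent is simply a string sent in the User-Agent header of each request. The Python sketch below shows the idea with the two strings quoted above; the dictionary and function names are hypothetical, and no request is actually sent.

```python
from urllib.request import Request

# The two user agent strings quoted in this section.
USER_AGENT_STRINGS = {
    "Mozilla 2.0 (Internet Explorer compatible)": "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)",
    "Mozilla 3.0": "Mozilla/3.0 (Win95; I)",
}


def build_request(url, user_agent):
    """Prepare a request that identifies itself with the given user agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("http://www.example.com/", USER_AGENT_STRINGS["Mozilla 3.0"])
```

A server that tailors pages to the browser inspects this header, which is why the string must match the browser's string exactly.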

The Site Copy Tab (Mapping from URL Only)

You can use the Copy Site option to copy a site from a Web server to your local hard disk as Content Analyzer builds the map. Select the Copy Site checkbox, then enter (or browse for) the directory name in the Local Site Root Directory text box. Then click OK.

Note Be sure to select Ignore Default WebMaps on the General tab. If you don't select this option and there's a default WebMap on the site, Content Analyzer will not copy all the files you expect it to copy.

Site copies are useful for a variety of reasons. Perhaps you're setting up a mirrored site in order to split incoming requests between multiple servers. Or perhaps you're getting ready to master a CD of your site. There may also be times when someone needs to work with your site on a laptop (for example, a sales representative on the road), and that's easy to do when you've copied a site locally. In addition, working with a site locally is much faster than on the Web, because you don't have to deal with server traffic and long load times.

If you try to copy a site to a directory that already contains one or more files, Content Analyzer displays a warning message. You can still copy the site to this directory (by clicking OK in the message box), but some files might be overwritten if they have the same name. It's a good idea to store your site copies in separate directories, one for each site you map. Otherwise, you could end up with a jumble of files from several different sites in the same directory.
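
The following Python sketch shows one way a URL could be translated into a file location under the local site root, and how name clashes with existing files could be detected. The function names, the index.html default file name, and the C:\sitecopy directory are assumptions for this example, not details of Content Analyzer.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlsplit


def local_copy_path(url, local_site_root, default_file="index.html"):
    """Map a URL to the file it would occupy under the local site root."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        # A directory URL needs a file name; default_file is an assumption.
        path += default_file
    # Drop the leading "/" and any ".." so the copy stays inside the root.
    segments = [part for part in PurePosixPath(path).parts if part not in ("/", "..")]
    return PureWindowsPath(local_site_root, *segments)


def name_collisions(urls, local_site_root, existing_files):
    """List the URLs whose local copies would overwrite an existing file."""
    existing = {str(PureWindowsPath(name)).lower() for name in existing_files}
    return [
        url for url in urls
        if str(local_copy_path(url, local_site_root)).lower() in existing
    ]
```

Comparing lowercased paths reflects the case-insensitive file names of Windows. Keeping one directory per site, as recommended above, means name_collisions only ever reports files from the same site.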

The Extensions Tab

Use this tab to tell Content Analyzer to explore additional domains when it creates a map. By default, Content Analyzer only explores the domain you specified in the New Map dialog box. All other domains are shown in blue in the map, and are not explored automatically; you must click individual question marks if you want to explore these offsite resources.

To extend the mapping area, you can tell Content Analyzer to do one of the following:

  • Auto-explore all other domains (and/or relative paths) it encounters when mapping the site. To do this, select the All Other Domains button on the Extensions tab.

  • Auto-explore just the domains you specify. To do this, select the Auto-Explore URLs Starting With button.

In either case, Content Analyzer will obey any robot protocol that has been enabled for the sites you specify.

Note Be sure to use URL path syntax, not file syntax. For instance, if you want to specify a path that is currently located at c:\products\widgets, enter /products/widgets. Note also that you cannot use periods to indicate a relative path (for example, ../widgets). You must enter paths starting at the site root.

For instance, suppose you're mapping www.microsoft.com, but you have some links to www.somecompany.com. You want Content Analyzer to explore SomeCompany's site as well as Microsoft's. Here's what you'd do:

  1. Click the Auto-Explore URLs Starting With button.

  2. Enter the new domain in the New Entry text box.

  3. Click Add.

The Extensions feature is particularly useful if you want to completely map several sites that are part of a larger corporate intranet. The Extensions feature also comes in handy when you have important files in an area of your site that is not directly off the site root. For instance, suppose you're mapping a portion of your site, beginning with www.mysite/hobbies. And suppose you keep all your site's graphics in a directory called www.mysite/graphics. Since, by default, Content Analyzer only maps resources linked to the hobbies subdirectory and below, you'll need to add www.mysite/graphics to the Extended list.

Warning: If the Explore Entire Site checkbox is selected on the New Map dialog box, Content Analyzer will completely explore all sites you've specified on the Extensions tab (as well as the site you're mapping). This includes public Web sites that you have links to in your site. Be aware that this could take a very long time if some of these sites are large. You could potentially map the entire World Wide Web!

For more information about how Content Analyzer explores a site, see "Exploring More (or Less) of Your Site."

The Restrictions Tab

Use this tab to tell Content Analyzer not to explore the particular domains or paths you specify when it creates a map.

For example, suppose you're mapping a site (www.mysite.com), but you want to exclude the Corporate section (www.mysite.com/html/corporate) and the Products section (www.mysite.com/html/products) from the map. You'd do this:

  1. Select the Restrictions (Don't Auto-Explore URLs Starting With) checkbox.

  2. In the New Entry text box, enter each domain or path you want to ignore. In this case, you'd enter /html/corporate and /html/products. You don't have to enter the full domain, just the path relative to the site root.

  3. Click Add. Your entry appears in the Don't Auto-Explore URLs Starting With list box.

You can also restrict particular paths within domains you've added to the Extensions tab. For example, suppose you have link URLs in your site pointing to www.cnn.com. You've added this domain to the Extensions tab, but you don't want Content Analyzer to auto-explore a particular area of that site, /WORLD. You'd enter the full URL in the New Entry text box on the Restrictions tab: www.cnn.com/WORLD. Note that you must include the domain in this case; if you just enter /WORLD, Content Analyzer will assume that WORLD is a subdirectory off the root of your own site.
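
How the Extensions and Restrictions lists work together can be sketched in Python. This is a simplified model based only on the "URLs Starting With" wording; the function names are hypothetical, and real matching may differ (for example, in how it treats capitalization).

```python
def strip_scheme(url):
    """Remove a leading scheme such as http:// if one is present."""
    return url.split("://", 1)[-1]


def should_auto_explore(url, site, extensions=(), restrictions=()):
    """Decide whether a URL falls inside the area to be explored automatically.

    Entries that start with "/" are taken as paths off the root of the
    site being mapped; other entries must include the domain.
    """
    target = strip_scheme(url)

    def expand(entry):
        return site + entry if entry.startswith("/") else entry

    # Restrictions win over everything else.
    if any(target.startswith(expand(entry)) for entry in restrictions):
        return False
    allowed = [site] + [expand(entry) for entry in extensions]
    return any(target.startswith(prefix) for prefix in allowed)


site = "www.mysite.com"
```

With restrictions of /html/corporate and /html/products, pages in those sections are skipped. Adding www.cnn.com as an extension and www.cnn.com/WORLD as a restriction explores the CNN site except for its /WORLD area.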

For more information about exploring a site, see "Exploring More (or Less) of Your Site."

Exploring More (or Less) of Your Site

With Content Analyzer, you can discover the structure of as much (or as little) of the site as you like. That is, you can have Content Analyzer look up (explore) the URLs for all pages and other objects linked to them in the site, or you can choose to explore only some of them. You can choose to explore part of your site when you first create a map, then explore more later.

If the Explore Entire Site checkbox is selected when you first create a map, Content Analyzer explores the entire site and builds a complete map; you'll be able to view and work with all objects and links in the site. If the site is very large, however, you might want to limit exploration, since it could take a long time to build a complete map, especially if you're mapping a site on the Web and your connection to the Internet is slow. See "Tips for Mapping Someone Else's Large Web Site" for more details.

There are a variety of ways to explore a site, or part of a site:

  • When you create a new map, clear the Explore Entire Site checkbox, then use the Explore dialog box.

  • Click the question mark icon next to any unexplored page to manually explore that page (Tree view only).

  • Right-click a page in the map and choose Explore Branch to explore a certain region of the site.

  • Right-click a page in the map and choose Explore Page (Cyberbolic view only).

  • Choose Explore Site from the Mapping menu or click the Explore Site button on the Main toolbar to explore a particular number of pages or site levels.

  • Choose Explore Branch from the Mapping menu to explore only a particular region of the site.

The Explore Dialog Box

You can use the Explore dialog box when you first create a map (as long as you've cleared the Explore Entire Site checkbox in the New Map dialog box), or at any time while you're working with the map.

Note The Set Routes by URL Hierarchy checkbox will have the same setting as when you first mapped the site. For more information about route-setting, see Chapter 5, All About Routes.

Exploring the Entire Site

For Content Analyzer to explore (look up URLs for) more of the site as a whole, choose Explore|Site from the Mapping menu. Content Analyzer will explore the site from the top down, beginning at the home page (or where it left off the last time you explored), to the limits you specify in the Explore dialog box.

Exploring a Particular Region (Branch) of a Site

Choose Explore|Branch to explore just a particular region of the site, starting at any point in the map. Content Analyzer will explore only that branch of the map (that is, the selected page and its children, grandchildren, and so on--its "descendants"), up to the limits you specify in the Explore dialog box. Content Analyzer can explore a maximum of 150 pages. You might find this particularly useful with large sites, when you don't want to spend the time required to explore the entire site, but you do want to quickly analyze part of it.

Using Explore Branch to map one region of the Microsoft Cinemania site 

To use the Explore dialog box
  1. Choose Explore from the Mapping menu, or click the Explore button on the Main toolbar.

  2. If you want to explore more of the entire site, choose Site from the submenu.

  3. If you want to explore only a certain area of the site, select the page from which you want to start exploring, then choose Branch from the submenu. (Or, choose Explore Branch from the right-click menu.)

  4. In the Explore dialog box that appears, set the page and level limits as you like (these options are explained in the topics that follow these steps).

  5. If you want to set routes in the map by the structure of URLs in your site, be sure the Set Routes by URL Hierarchy checkbox is selected. (For more information about setting routes, see "Different Ways to Set Routes" in Chapter 5, "All About Routes.")

    Note: If you select this checkbox, all routes in the map will be set by URL hierarchy, even if you are exploring just one branch of the map.

  6. If you want to change any other mapping options, click the Options button. See "Changing the Default Mapping Options" for information about each option.

  7. Click OK.

A status box appears, showing a running count of all objects and links in the site. As Content Analyzer finds objects, it dynamically updates the list. You can click Stop to halt exploration at any time; the resulting map will contain all objects and links found so far.

Limiting the Number of Pages to Map

To save time, you can tell Content Analyzer to automatically explore only so many pages in a site. This is useful if a site is large, or if you prefer to explore parts of the site manually. When you're mapping from a URL on the Web, the default page limit is 100 pages. When you map from a local file, the suggested limit is 1000, because server congestion is not an issue when you map locally. You can enter a different number if you want Content Analyzer to map more or less of the site. You cannot leave the Page Limit text box blank; you must enter some limit.

When you're mapping from a URL, if you enter a number greater than 500, a warning message appears, recommending that you enter a smaller limit. It takes a long time to map more than 500 pages, and automatic exploration of such a large site can bog down the server hosting the site. This might not be a concern if you're exploring from a local or networked file system, but be aware that it does take longer to explore a large number of pages, no matter where the site is located. You might want to create several smaller maps of a large site instead of attempting to map the whole thing at once. (See "Tips for Mapping Someone Else's Large Web Site.")

Limiting the Explore Level

In the Explore dialog box, select Limit Levels To to limit the exploration to a particular level in the map hierarchy (the home page is considered "level 1"). You can enter any number, though you might want to limit exploration to the first two or three levels. Otherwise, it might take longer to build the map, particularly if you're mapping from the Web and your Internet connection is slow or network traffic is heavy.

You should specify a number greater than 1 because, when Content Analyzer creates a map, it automatically explores the first level.
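
The combined effect of the page limit and the level limit can be sketched as a breadth-first walk in Python. This is an illustration only: the explore function and the sample link table are hypothetical, and a dictionary of links stands in for actually fetching pages.

```python
from collections import deque


def explore(links, start, page_limit=100, level_limit=None):
    """Explore pages top-down, stopping at the page and level limits.

    links maps each page to the pages it links to. The start page is
    level 1, matching the numbering used in the Explore dialog box.
    """
    explored = []
    seen = {start}
    queue = deque([(start, 1)])
    while queue and len(explored) < page_limit:
        page, level = queue.popleft()
        explored.append(page)
        if level_limit is not None and level >= level_limit:
            # Children of this page stay unexplored (question marks).
            continue
        for child in links.get(page, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return explored


site_links = {"home": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
```

With a level limit of 2, only the home page and the pages it links to are explored. Starting the walk at a page other than the home page gives the effect of exploring a single branch.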

Changing the Mapping Options

Just as when you first create a map, you may want to change some of the default mapping options when you explore more of your site. When you click the Options button, you get the same mapping options that are available to you by clicking the Options button on the New Map dialog box.

You can also change the mapping options at any time by choosing Mapping|Mapping Options. See "Changing the Default Mapping Options" for information about each option.

Manually Exploring a Particular Page

A page that is accompanied by a question mark icon is unexplored--that is, Content Analyzer has not verified its URL and has not displayed its hyperlinks to other objects. Rather than exploring entire levels of a site, you can explore individual objects to see their links.

To explore a page
  • In the Tree view, click the question mark icon.

  • In the Cyberbolic view, right-click the page and choose Explore Page. (The question mark in Cyberbolic view isn't clickable; it's just an indicator to tell you the page isn't explored yet.)

    When you click a page's question mark icon, choose Explore Page in the Cyberbolic view, or use the Explore command to build the map automatically, the question mark icon does one of the following things: 

  • It disappears if the page contains no links to other objects.

  • In the Tree view, it turns into a minus if the page contains one or more links to other objects.

    Objects such as images or audio files might be hidden from view (because of the current display options), so there might be no plus or minus sign next to some pages, even though the children of those pages are actually contained in the map. (See "Showing or Hiding Types of Objects" for more information.) 

  • It remains a question mark if Content Analyzer couldn't access the page (because its URL isn't valid, or there are problems with the server or Internet connection). In this case, the label turns red, indicating that the page is unavailable.

To stop the exploration, click the Stop button or press Esc.

Exploring Sites that Restrict Spiders

When Content Analyzer explores an entire site, it operates as a spider--that is, it requests many Web pages in a very short period of time, far faster than you could do by exploring pages individually. (Spiders are also sometimes called robots or crawlers in Internet parlance.) When exploring a site automatically, Content Analyzer obeys any robot exclusion protocol that has been enabled for that site unless you choose to ignore it (see "Honor Robot Protocol" for more information). The robot exclusion protocol helps prevent the site's server from becoming bogged down with too many "hits" (accesses). This can be a real problem with popular sites, which might receive hundreds of thousands of hits per day.

When Content Analyzer encounters a site with an area that restricts spiders, it stops exploring that region of the site. When the map appears in the map window, any page that could not be accessed because of the site's robot exclusion protocol is shown with the Robot icon.

Content Analyzer is affected by robot protocol only when you explore the site using the Content Analyzer spider (when you create a new map, and when you use either the Explore|Site or Explore|Branch command). If you explore pages individually, Content Analyzer ignores any robot protocol. So if you've used the Explore command and you see Robot icons in your map, you can always explore the page(s) manually later to explore that area of the site.

In general, you can help prevent server overload and practice good "netiquette" by not using the Content Analyzer spider to explore large or particularly busy sites, unless it's your own site. If you want to use the spider to explore a large Web site that's not your own, try contacting the Webmaster of the site to obtain permission to ignore the robot protocol enabled for that site. However, a better strategy is to create several maps for different regions of the site. See "Tips for Mapping Someone Else's Large Web Site," and "Strategies for Making Great Maps of Your Site" in Chapter 8, "Site Management Tips & Techniques," for more information.

Tips for Mapping Someone Else's Large Web Site

This section presents some ideas for mapping large, busy sites that are not your own. For ideas on how to map and maintain your own large site, see "Strategies for Making Great Maps of Your Site" in Chapter 8, "Site Management Tips & Techniques."

Depending on the speed of your network connections and the load on the server where the site resides, it can take a long time to explore a large, complex site. If the site you want to map is quite large--say, more than 500 pages--consider the recommendations in this section to save time and cut down on Internet and server traffic.

Make separate maps for different areas of the site. Try creating several maps of one site, each starting at a different point. Rather than generating a map from the site's home page, you can begin mapping at any page. For instance, for a large corporate site that covers many different products, you could create a map for each different product area. Just enter the URL for any page in the Home Page Address text box in the New Map dialog box. You can then save the maps under different names and use them later to browse areas of the site.

Limit exploration of the site. When initially creating a map, clear the Explore Entire Site checkbox. Content Analyzer explores the home page and displays its child objects with question marks. You can then define limits in the Explore dialog box that appears. Enter a number (or accept the default) in the Page Limit text box to set a maximum number of pages to explore. If you enter a number greater than 500, Content Analyzer displays a warning message. You can also choose to limit the explore level (with the Limit Levels To checkbox). As an alternative, use the Explore|Branch command on the Mapping menu to explore only a particular part of the site.

You can also click question marks individually (in the Tree view) to explore the areas of the site you're most interested in. In the Cyberbolic view, you can select a map object and then choose Explore Page from the right-click menu.

Verify offsite links by exploring pages manually. Verifying links to other sites creates more Internet traffic, and if the network is busy, this process can be time-consuming. When you create a new map, or use the Explore command, click the Options button and make sure that Verify Offsite Links is cleared on the General tab. You can always explore the offsite objects manually later.

Save large maps. If you've taken the time to map and fully explore a large site, you'll probably want to save the map to disk so you can use it again later (rather than spending time remapping the site).
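
The page and level limits described under "Limit exploration of the site" can be sketched as a bounded breadth-first walk over a link graph. This is only an illustration under stated assumptions: the LINKS graph, the explore function, and the convention that the starting page is level 0 are hypothetical and are not taken from Content Analyzer itself.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (illustrative only).
LINKS = {
    "/": ["/products", "/support"],
    "/products": ["/products/a", "/products/b"],
    "/support": ["/support/faq"],
    "/products/a": [],
    "/products/b": [],
    "/support/faq": [],
}

def explore(start, links, page_limit, level_limit=None):
    """Breadth-first exploration bounded by a page count and a link depth.

    Returns (explored, unexplored): pages that were visited, and pages that
    were discovered but not explored (the "question mark" objects).
    Assumption: the starting page counts as level 0.
    """
    explored, unexplored = [], []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        over_pages = len(explored) >= page_limit
        over_levels = level_limit is not None and level > level_limit
        if over_pages or over_levels:
            unexplored.append(page)  # known, but its own links are not followed
            continue
        explored.append(page)
        for child in links.get(page, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return explored, unexplored

if __name__ == "__main__":
    print(explore("/", LINKS, page_limit=3))
    print(explore("/", LINKS, page_limit=10, level_limit=0))
```

With level_limit=0, only the starting page is explored and its children are left unexplored, which mirrors what you see after clearing the Explore Entire Site checkbox.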

Showing or Hiding Types of Objects

While it's sometimes useful to see all Web site resources in your map, you'll often want to reduce clutter and conceal objects that aren't particularly useful to you. For example, you might not want to see any images at all in your map. Or you might want to hide all alternate routes, so that each object is shown once and only once in the map. Deciding what to hide and show in your maps is an art, and it depends on what the map is for: your own management tasks, or browsing by other people who use your maps to explore your site.

Content Analyzer includes the default display option settings shown in the following figures. Each view has its own tab in the Display Options dialog box; that's because you can configure each view differently.

By default, in the Tree View, all map objects are displayed, including alternate routes. In the Cyberbolic View, only pages are shown; all other objects are hidden, as are alternate routes. Note that if you're using a map published by someone else, the default display options may have been changed by the map publisher.

You use the Display Options dialog box to determine which object types you want to view while using Content Analyzer. When you save a map, any display options you've changed are saved with the map. The next time you open the map, any hidden objects will still be hidden, but you can see them again by changing the display options. To hide an individual object (such as one particular image), see "Showing or Hiding Individual Objects" in Chapter 8, "Site Management Tips & Techniques."

Tree view default settings in the Display Options dialog box 

Cyberbolic view default settings in the Display Options dialog box 

You can permanently remove certain types of objects in the map using publish options, and then publish the map so that any hidden objects are actually removed from the map; they can't be retrieved. However, any objects you've hidden only with the Display Options dialog box will still be available in the published map. For more information about publishing options, see "Publishing WebMaps of Your Site" in Chapter 8, "Site Management Tips & Techniques."

Showing or Hiding Object Types

You use the Display Options dialog box to show and hide objects. Each view has its own tab inside the dialog box, so you can configure the display options for each view differently. The Show Only list (in both views) shows all the object types that can be present in a map. Of course, not all Web sites contain all object types, so some of the checkboxes may be irrelevant to the map you're working with right now.

To show or hide object types
  1. Choose Display Options from the View menu or click the Display Options button on the Main toolbar.

  2. Select the appropriate tab (either Tree or Cyberbolic) and specify which object types you want displayed in that view.

    To show all objects: Select the Show All Objects checkbox. All other checkboxes under Show Only will be dimmed and not selectable. 

    To show only particular object types: Select the object types in the Show Only list. 

    To show most (but not all) object types: Click Select All and then clear the one or two object types you don't want to show. If you want to show just a few object types, choose Deselect All and select the object types you want to show. If you later choose Show All Objects, your checkbox settings will be dimmed but will remain as you set them. 

  3. When you're satisfied with your settings, click OK.

  4. Repeat this process for the other view, if you like.

If you want to hide individual objects, see "Showing or Hiding Individual Objects" in Chapter 8, "Site Management Tips & Techniques."

Object Types in the Display Options Dialog Box

Not every MIME type you see in a map is listed separately in the Show Only list; some are included in a more general category. For example, the Gopher object type is a subtype of the Internet Services type. Note that pages are not included in the Show Only list because they're always displayed in a map. You can't hide all pages, because to do so would mean that nothing would be visible in the map. (You can, however, hide individual pages; see "Showing or Hiding Individual Objects" in Chapter 8, "Site Management Tips & Techniques," for information.)

Object Type | Included MIME Types
Applications | Java applets, executable files, PDF files, Microsoft Word documents, PostScript files, and other applications
Audio | WAV, AIFF, AU, and other audio files
Gateways | CGI script files
Images | GIF, JPEG, and other types of images
Internet Services | FTP, Telnet, Mailto, WAIS, NNTP, Gopher, and all Internet services other than HTTP
Text | Text files (other than HTML pages), including plain text
Unrecognized | Any object that Content Analyzer can't identify
Video | MPEG and other video file types
WebMaps | WebMaps
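
The idea of grouping specific MIME types into broader object types, and then filtering the map by those types, can be sketched in Python as follows. The MIME strings, the names MIME_TO_OBJECT_TYPE, PREFIX_TO_OBJECT_TYPE, object_type, and visible, and the prefix fallback are assumptions made for illustration. They are not an official list of what Content Analyzer recognizes, and the sketch covers only some of the categories.

```python
# Illustrative mapping from specific MIME types to broader object types.
# The MIME strings are assumptions, not an official Content Analyzer list.
MIME_TO_OBJECT_TYPE = {
    "application/pdf": "Applications",
    "application/postscript": "Applications",
    "application/msword": "Applications",
    "audio/x-wav": "Audio",
    "audio/x-aiff": "Audio",
    "image/gif": "Images",
    "image/jpeg": "Images",
    "text/plain": "Text",
    "video/mpeg": "Video",
}

# Fallback for "other" images, audio files, and video files.
PREFIX_TO_OBJECT_TYPE = {"image/": "Images", "audio/": "Audio", "video/": "Video"}

def object_type(mime_type):
    """Map a MIME type to an object-type category; HTML pages are 'Pages'."""
    if mime_type == "text/html":
        return "Pages"
    if mime_type in MIME_TO_OBJECT_TYPE:
        return MIME_TO_OBJECT_TYPE[mime_type]
    for prefix, category in PREFIX_TO_OBJECT_TYPE.items():
        if mime_type.startswith(prefix):
            return category
    return "Unrecognized"

def visible(objects, show_only):
    """Keep pages (always shown) plus objects whose type is in `show_only`."""
    return [
        (name, mime) for name, mime in objects
        if object_type(mime) == "Pages" or object_type(mime) in show_only
    ]

if __name__ == "__main__":
    objects = [("index.htm", "text/html"), ("logo.gif", "image/gif"),
               ("intro.wav", "audio/x-wav")]
    print(visible(objects, {"Images"}))
```

Selecting only Images in this sketch keeps the page and the GIF but drops the audio file, much as clearing the Audio checkbox would hide audio objects in a view.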

Showing or Hiding All Alternate Routes

When the Show Alternate Routes checkbox (in the Display Options dialog box, either view) is cleared, only the main route to an object appears in the map window. This can be useful when you just want to navigate around a site to see what's there. You might want to view alternate routes, however, if you're interested in seeing the different ways a site is connected by hyperlinks. In the Tree view, selecting the Show Alternate Routes checkbox lets you view alternate routes as either object or link icons; alternate routes are only viewable as objects in the Cyberbolic view. In the Tree view, you can also opt to expand the alternate routes, if you like.

Note For detailed information about main and alternate routes, see Chapter 5, "All About Routes."

Note that you can hide all alternate routes to individual objects using the Properties dialog box.
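
If it helps to think of the distinction in code, the following Python sketch splits the links of a small site into main routes and alternate routes. It rests on a simplifying assumption that is not taken from this manual: the first link found to an object in breadth-first order is treated as its main route. Chapter 5 describes how routes actually work. The function name classify_routes and the sample link graph are hypothetical.

```python
from collections import deque

def classify_routes(start, links):
    """Split links into main routes and alternate routes.

    Assumption for illustration: the first link discovered to an object
    (in breadth-first order) is its main route; every later link to an
    object already in the map is an alternate route.
    """
    main, alternate = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        parent = queue.popleft()
        for child in links.get(parent, []):
            if child in seen:
                alternate.append((parent, child))
            else:
                seen.add(child)
                main.append((parent, child))
                queue.append(child)
    return main, alternate

if __name__ == "__main__":
    # Hypothetical site: /a links to /b and back to the home page.
    links = {"/": ["/a", "/b"], "/a": ["/b", "/"], "/b": []}
    main, alternate = classify_routes("/", links)
    print("main routes:", main)
    print("alternate routes:", alternate)
```

Showing only the main routes places each object in the map exactly once, which is the effect of clearing Show Alternate Routes.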

To hide all alternate routes
  1. Choose Display Options from the View menu or click the Display Options button on the Main toolbar.

  2. Select the appropriate tab (Tree or Cyberbolic), depending on which view you want this change to affect.

  3. Clear Show Alternate Routes.

  4. Click OK.

  5. Repeat this process (steps 1 through 4) for the other view, if you like.

To show all alternate routes
  1. Choose Display Options from the View menu or click the Display Options button on the Main toolbar.

  2. Select the appropriate tab (Tree or Cyberbolic), depending on which view you want this change to affect.

  3. Select Show Alternate Routes. In the Tree view, you have several options (these options are not available in the Cyberbolic view):

    If you want to show alternate routes that point to a location within the same page, select the Within Page checkbox. 

    If you want to see alternate routes represented as link icons in the map window, select Link Icons under the Display As area. 

    If you want alternate routes to be expandable, select the Allow Expansion checkbox. 

  4. Click OK.

  5. If you like, repeat steps 1 through 4 for the other view.

For more information about alternate routes and changing routes, see Chapter 5, "All About Routes."

Changing the Default Display Options

Content Analyzer comes with default display options (shown in the figures at the beginning of this section). By default, the Tree view displays all map objects, including alternate routes; the alternate routes are displayed with link icons but are not expandable. The Cyberbolic view displays only pages and hides alternate routes. You can change these settings for the current map and then save your changes as the default for new maps. The Tree and Cyberbolic views can have different defaults. (Existing maps retain the settings you've created for them unless you load the defaults. See "Loading Default Settings into Existing Maps.")

When you change the default settings and save them, you override the default settings that come with Content Analyzer. You cannot recover the original default settings unless you enter them manually and then click Save As Default.

To change the default display options
  1. Choose Display Options from the View menu or click the Display Options button on the Main toolbar.

  2. Select the appropriate tab (Tree or Cyberbolic), depending on which view you want to set defaults for.

  3. Change the display options as you like.

  4. Click Save As Default to save your new defaults for the current map and for all new maps you create.

    If you want to set defaults for both views, repeat steps 2 through 4 for the other view; defaults for the Tree and Cyberbolic views are saved separately.

  5. Click OK.

Loading Default Settings into Existing Maps

You can load the default display options into any existing map by clicking Load Default. (New maps automatically use the default settings.) If you haven't changed the default settings, you can use Load Default to use the settings that came with Content Analyzer.

To load the default display options
  1. Choose Display Options from the View menu or click the Display Options button on the Main toolbar.

  2. Select the appropriate tab (Tree or Cyberbolic), depending on which view you want to load defaults for.

  3. Click Load Default.

  4. If you like, repeat steps 2 and 3 for the other view.

Changing How Objects are Labeled

You can change the descriptive labels for any object types in the map (your choices apply to both the Tree and the Cyberbolic views). For example, if you want to see the full URLs of pages instead of their names, you can use the Labels dialog box to change the label. Changing the label for an object type comes in handy for many site management tasks, especially after you search for objects with a particular characteristic.

Any changes you make to labels will appear in the map itself, as well as in the Properties dialog box and results windows.

Depending on which object type you've selected, different labels are available. See "Object Types in the Labels Dialog Box" for details.

In Cyberbolic view, you can also change the pop-up label that appears when the cursor is positioned over a map object. See section 2.3.6, "Changing the Cyberbolic pop-up label," for more information.

Each object in a WebMap is represented by an icon and a label. (In Cyberbolic view, the page icon is not included, but pages are still easy to identify because they are the only labels without any accompanying icons.)

Labels can be different for each object type 

In new maps, pages are labeled with their Name property (by default, the page's HTML TITLE tag), and images are labeled with their ALT string.

Note Changes you make to labels appear in the WebMap itself, as well as in the Links window (View|Object Links) and in the results window (if you perform a search, or use the Compare and Update command).

To change the label for an object type

  1. Choose Label Options from the View menu or click the Labels button on the Main toolbar.

  2. If you want to use the same labels for all object types, select Apply to All Object Types. If you want to use different labels for each object type, select an object type from the list.

  3. Choose a First Choice for the label; this will be the default for the selected object type.

    The label choices available in the drop-down list depend on which object type you've selected; the choices correspond to the properties for that object type (as maintained in the Properties dialog box). For a detailed description of the label choices, see "Object Types in the Labels Dialog Box."

  4. Select an alternate choice for the label (if your first choice for the display label is not available for a particular object). If you don't specify an alternate choice, and the first choice isn't available for an object, the Link URL will be used as the label.

    For example, some images might not have an ALT string (descriptive text for users of nongraphical browsers). So for images, you might want to specify ALT String as the first choice, and Hyperlink Text as the alternate choice.

  5. Click Apply whenever you want to see how your label choices are affecting the map window.

  6. If you want to save your label settings as the default to be used with all new maps, click Save As Default. Anytime you want to use these label settings on an existing map, click Load Default.

  7. When you've finished making changes, click OK.

Note If you'd like to assign custom labels to individual objects, you can do so by changing the object's name in the Properties dialog box.
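
The way the first choice, the alternate choice, and the Link URL fallback interact can be sketched in a few lines of Python. The function name choose_label and the dictionary representation of an object's properties are hypothetical. The sketch also assumes that the Link URL is used when neither the first nor the alternate choice is available; the procedure above states that fallback only for the case where no alternate choice is specified.

```python
def choose_label(obj, first_choice, alternate_choice=None):
    """Pick a label: first choice, else alternate choice, else the Link URL.

    `obj` is a hypothetical dict of property name -> value; an empty or
    missing value means the property is not available for that object.
    """
    for prop in (first_choice, alternate_choice):
        if prop and obj.get(prop):
            return obj[prop]
    return obj.get("Link URL", "")

if __name__ == "__main__":
    # An image without an ALT string falls back to its hyperlink text.
    image = {"ALT String": "", "Hyperlink Text": "Company logo",
             "Link URL": "images/logo.gif"}
    print(choose_label(image, "ALT String", "Hyperlink Text"))
    # With no alternate choice, the Link URL is used instead.
    print(choose_label(image, "ALT String"))
```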

Object Types in the Labels Dialog Box

The available object types are shown in the following table, along with the default label settings that are included with Content Analyzer. (If you're using a WebMap published by someone else, the defaults may have been changed by the map publisher.) Note that not every MIME type is available as a separate choice; some are part of a larger object-type category.

Object Type | Included MIME Types | Default First Choice Label | Default Alternate Choice Label
Pages | HTML pages, HTML pages with data-entry form | Name (TITLE by default) | Hyperlink Text
Images | GIF, JPEG, and other types of images | ALT String | Link URL
Gateways | CGI script files | Hyperlink Text | Link URL
Internet Services | FTP, Telnet, Mailto, WAIS, NNTP, Gopher, and all other Internet services (except HTTP) | Hyperlink Text | Link URL
Other | All object types that aren't included in other object categories, including WebMaps, audio and video files, text files, PostScript files, PDF files, and other applications | Hyperlink Text | Link URL
Alternate Routes | All secondary occurrences of objects in the WebMap (green labels) | Hyperlink Text | Corresponding Object Label

Labels Available for All Object Types

The following labels are available for all object types (except alternate routes).

Label Type | Description
Author | The name or e-mail address of the person responsible for maintaining the object. This label is derived from the Author field on the Annotations tab of the Properties dialog box.
Date | A date, as specified in the Date field on the Annotations tab of the Properties dialog box. The date format is retrieved from the Windows Regional Settings Properties in the Control Panel.
Full Path | The full URL (Uniform Resource Locator) without the https:// protocol and domain name. For example, if you have a full URL of https://www.mysite.com/manual/basics.htm, the full path would be everything after the domain name, or /manual/basics.htm.
Hyperlink Text (default) | The text (if any) used in the link reference on the object's main-route parent page.
Link URL | The link URL as specified on the object's main-route parent page. The link URL is most often relative (that is, the pathname is relative to the site's parent page).
Local Path | The complete local directory path and file name, starting with the drive letter; for example, c:\myfiles\project\phase2\base.htm. When you copy a site locally or map from file, the local path is shown on the General tab in the Properties dialog box.
MIME Type | The object's specific MIME type. For example, an image MIME type would be labeled image/gif or image/jpeg.
Name | The object's name, as specified in the Name field on the Annotations tab of the Properties dialog box. By default, the Name field for pages is the TITLE tag. However, you can enter anything you like in this field (in the Properties dialog box).
Size | The size of the HTML page itself, not including any inline resources (such as images).
Source Path | The object's source path, as specified in the Source Path field on the Annotations tab of the Properties dialog box.
URL | The full URL (Uniform Resource Locator) or Internet address for an object, including the protocol and domain name. For example, https://www.mysite.com.
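
The relationship between the URL label and the Full Path label can be illustrated with Python's standard urllib.parse module. This is a sketch only: the function name full_path and the sample address are hypothetical, and it assumes that the full path is just the path component of the URL.

```python
from urllib.parse import urlsplit

def full_path(url):
    """Return the part of a URL that follows the protocol and domain name."""
    return urlsplit(url).path

if __name__ == "__main__":
    print(full_path("https://www.mysite.com/manual/basics.htm"))  # /manual/basics.htm
```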

Labels Available for Alternate Routes

Alternate routes have three label choices available:

Label Type | Description
Corresponding Object Label | The same label that was chosen for the object type on the main route. For example, if the link points to an image and you've chosen Author as the label for images, any alternate routes to images will also be labeled with the Author.
Hyperlink Text (default) | The text (if any) used in the link reference on the object's alternate-route parent page.
Link URL | The link URL as specified on the object's alternate-route parent page. The link URL is most often relative (the pathname is relative to the site's parent page).

Additional Labels for HTML Pages

In addition to the label choices available for all objects, you can choose one of the following labels for HTML pages:

Label Type | Description
First Heading | The text in the first HTML heading (H1, H2, H3...) found on the page.
Load Size | The full loading size (when loaded by a Web browser) of the page, including inline images, audio and video files, and so on.
Modified Date | The date the page was last modified.
Title | The text string in the first HTML TITLE tag. (By default, the Name label choice is the same as the Title for HTML pages.)
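
The difference between the Size and Load Size labels comes down to simple arithmetic, sketched here in Python. The function name load_size and the byte counts are made up for illustration.

```python
def load_size(page_size, inline_sizes):
    """Load Size: the HTML page itself plus all of its inline resources."""
    return page_size + sum(inline_sizes)

if __name__ == "__main__":
    # Hypothetical page: 4,200 bytes of HTML plus two inline images.
    print(load_size(4200, [12000, 8500]))  # 24700
```

For this hypothetical page, the Size label would show only the 4,200 bytes of HTML, while the Load Size label would also count the two images.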

Additional Labels for Images

In addition to the label choices available for all objects, you can choose the following labels for images:

Label Type | Description
ALT String (default) | The optional text string included with the HTML IMG tag (to accommodate people with nongraphical browsers or who have turned off graphics downloading).
Modified Date | The date the image was last modified.

Additional Labels for Gateways

In addition to the label choices available for all objects, you can choose one of the following labels for gateways:

Label Type | Description
Method | The method by which data entered in a form is passed to the CGI script for processing.
Modified Date | The date the gateway was last modified.