Microsoft Dynamics

Troubleshooting Microsoft Dynamics CRM

Aaron Elder

At a Glance:

  • Illustrating a Solution Architecture
  • Fundamental troubleshooting principles
  • Tools for diagnosing server issues
  • Steps to resolving CRM errors

Contents

The Stack
Ground Rules
Troubleshooting Server Issues
Play at Home Example
Wrapping Up

I’m back for a second article on Microsoft Dynamics CRM (see the first article, "Deploying Microsoft Dynamics CRM 4.0"), with a focus on something I am passionate about: troubleshooting. Troubleshooting Microsoft CRM is not much different from troubleshooting any other n-tier Web application built on the Microsoft stack. As such, this article is not a how-to guide or a collection of "101 Tips & Tricks." Instead, I will discuss the basics of Dynamics CRM components and the tools for isolating, understanding, and solving problems. For this article, I will focus only on server-side Microsoft Dynamics CRM issues.

The Stack

With any complex system, be it the human body or an n-tiered Web application that leverages many equally complex subsystems and external systems, it is important to understand "the system stack." The stack is basically a blueprint of a system that allows you to understand all the components of that system and how they build and layer on each other. The stack could also be called the Solution Architecture Diagram, as it illustrates the components of the solution—in this case Microsoft Dynamics CRM. Once you understand the solution, you also need to understand how the solution has been deployed. For this, you need a System Architecture Diagram, which illustrates where each component of the solution sits in relation to the others in your deployment. Understanding all of this is critical to being able to isolate the problem.

Figure 1 shows a Solution Architecture Diagram of Microsoft Dynamics CRM 4.0, depicting all of its major components and their interdependencies. Note that each component could in turn have its own diagram of equal or greater complexity. Computer systems are all about abstraction, the process by which a system component makes available a set of features that another, dependent component can rely on, hiding the internal complexity of that component. Abstraction is the reason you need to isolate a problem when troubleshooting: because each component hides its internals from the components that depend on it, an error that surfaces in one component may actually originate in another.

Figure 1 UML Package Diagram Depicting a Solution Architecture

To express the solution architecture, I have used a UML Package diagram. Arrows point in the direction of the "dependency." For example, the CRM Email Router "depends on" an SMTP Server and the CRM Platform's Web Services. A full diagram would be very complex, but this provides a basic model.

Now you can think about how Microsoft Dynamics CRM components are deployed within your enterprise. For the purposes of this article, we'll use a typical deployment architecture, as shown in Figure 2. Understanding how the system architecture relates to the solution architecture is vital when it comes to isolating problems. Without knowing where components are running, you could spend hours trying to find and fix a problem that isn't even happening on the machine you are trying to fix!

Figure 2 A Typical Deployment Architecture

Ground Rules

Before you start troubleshooting a problem with Dynamics CRM, you need to understand a few fundamental troubleshooting principles. First, let's walk through a basic workflow of troubleshooting and some guidelines for how you know when it is safe to proceed to the next step.

1. A problem or error is identified and reproduced.

  • Have you identified the problem and have you read the error message?
  • Is the error message generic? If so, have you taken steps to find the "real error"? Hint: If the error says "Something happened, contact your System Administrator," remember that you are probably that System Administrator and therefore need to do more digging to find the real error. Before you move to Step 2, you must be sure you are dealing with the actual error.

2. The problem needs to be understood.

  • Did you read and comprehend what the error message is saying?
  • Do you have a consistent set of steps to reproduce the issue reliably?

3. The problem needs to be isolated.

  • What systems in your System Architecture can you rule out as a cause of or influence on the problem?
  • What components of the Solution Architecture can you rule out as a cause of or influence on the problem?

4. The fix needs to be identified and understood.

  • Are you able to find support articles, blog posts, or newsgroup postings that suggest fixes that apply to your exact problem?
  • Before applying a fix, do you understand why it will correct the problem?

5. The fix needs to be applied and verified.

  • Does the applied fix resolve the issue? You will need to be able to reproduce the issue (Step 2) in order to be sure. Because you understand the fix, you should know which other areas of the system it might affect; have you re-tested them?
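
The workflow above is essentially a series of gates: you do not move on until every question in the current step can be answered yes. As a minimal sketch of that idea in Python (the STEPS table paraphrases the questions above, and current_step is a hypothetical helper, not part of any CRM tool):

```python
# Each step of the troubleshooting workflow with the questions that must
# all be answered "yes" before moving on. Paraphrased from the article;
# the structure and names are illustrative only.
STEPS = [
    ("Identify and reproduce", [
        "Have you read the error message?",
        "Have you found the real error, not a generic one?",
    ]),
    ("Understand", [
        "Do you understand what the error message says?",
        "Can you reproduce the issue reliably?",
    ]),
    ("Isolate", [
        "Have you ruled out systems in the System Architecture?",
        "Have you ruled out components of the Solution Architecture?",
    ]),
    ("Identify and understand the fix", [
        "Does the fix apply to your exact problem?",
        "Do you understand why it corrects the problem?",
    ]),
    ("Apply and verify", [
        "Does the fix resolve the reproduced issue?",
        "Have you re-tested other affected areas?",
    ]),
]


def current_step(answers):
    """Return the name of the first step whose questions are not all
    answered yes, or None when every step is complete.

    `answers` maps question text to True/False; a missing question
    counts as not yet answered.
    """
    for name, questions in STEPS:
        if not all(answers.get(q, False) for q in questions):
            return name
    return None
```

For example, answering yes to both Step 1 questions and nothing else returns "Understand", which is the step where you should keep working.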

Troubleshooting Server Issues

With an understanding of the troubleshooting process, we can now move on to the tools needed for diagnosing problems within Microsoft CRM.

DevErrors—When Microsoft Dynamics CRM submits data to the server, information is passed to ASP.NET and processed. Any errors are handled by a global exception handler at the ASP.NET layer. For usability and sometimes security reasons, the real error is hidden from the caller (that's you or your user) and a "pretty error" is displayed instead. Typically this error says something like "You do not have sufficient privileges" or "The requested record was not found." Unfortunately, these pretty errors come from a "white list." Of the hundreds of thousands of errors that could be thrown by CRM or any related component (SQL, SRS, .NET, Windows, and so on), only error codes that the CRM team thought might happen have a pretty string associated with them. The rest get handled by the dreaded catchall "An error has occurred, please contact your System Administrator." This, of course, is not of much use to you, the System Administrator.

Since one of our ground rules is to get the real error, you need to be able to tell when CRM is lying or at least not telling you the whole truth. The truth serum is to enable DevErrors in the [CRMWEB]\web.config file: find the DevErrors key in the appSettings section (it is set to Off by default) and change its value to On, like so:

<add key="DevErrors" value="On"/>
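
If you need to flip this setting on several servers, a small script can make the edit for you. The following is a minimal Python sketch, assuming the DevErrors key sits under <configuration><appSettings> and that the root element carries no XML namespace; set_dev_errors is a hypothetical helper. Because ElementTree does not preserve comments or the original formatting, treat it as illustrative and back up web.config before writing its output.

```python
import xml.etree.ElementTree as ET


def set_dev_errors(config_xml, enabled):
    """Return config_xml with the DevErrors appSetting set to On or Off.

    Assumes <configuration><appSettings><add key="DevErrors" .../>; the
    appSettings element and the key are created if they are missing.
    ElementTree drops comments and the XML declaration, so the output is
    not byte-for-byte identical to the input.
    """
    root = ET.fromstring(config_xml)
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    entry = app_settings.find("add[@key='DevErrors']")
    if entry is None:
        entry = ET.SubElement(app_settings, "add", key="DevErrors")
    entry.set("value", "On" if enabled else "Off")
    return ET.tostring(root, encoding="unicode")
```

Reading [CRMWEB]\web.config, passing its text through set_dev_errors(text, True), and writing the result back would enable DevErrors on one server; in a load-balanced farm you would repeat this on each Web server, and call it again with False when you are done.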

Be sure to keep your System Architecture in mind when doing this. If you have two servers configured in a load-balanced environment, you will want to isolate the server where the error is happening or, alternatively, be sure to enable DevErrors on both servers. Once DevErrors is enabled, you will see errors that look something like the one in Figure 3.

Figure 3 A Microsoft CRM Error Report

Figure 3 shows several items on the left that provide different sets of information:

The Detailed Error—The first screen (the default) shows you the real error from Microsoft Dynamics CRM's point of view. This includes the Date, Time, and Server name where the error occurred, as well as the Error Description, Call Stack, Error Number (if available), the Source File and Line Number where the error occurred (if available), and the URL that was requested—all useful when trying to figure out what went wrong.

The ASP.NET Error—The next item is the real error from ASP.NET's perspective. This provides much the same information as the CRM error, but adds the options to "Show Detailed Compiler Output" and "Show Complete Compilation Source."

Diagnostic Info—The third screen, shown in Figure 4, provides basic information about the server where the error occurred, as well as details about the client and user that made the request. The server information includes the operating system, .NET runtime version, server name, and the path where CRM is installed, along with the specific CRM database used and the settings in the web.config. For the client, the screen shows the browser version, screen resolution, bit depth, and more. Information on the user making the request (at least from CRM's point of view) includes the user's domain and name, CRM user name, CRM User ID, Business Unit ID, and Organization ID.

Figure 4 The Diagnostic Info Screen

What the User Would Have Seen—The final item, shown in Figure 5, shows the error from the end user's perspective, as it would appear if DevErrors were turned off.

Figure 5 The Error Message the User Would Have Seen

Please note that DevErrors helps only with errors that happen while a Web application request is being processed, and only with requests that submit a full page to the server. AJAX requests, such as those made when publishing customizations, working with workflows, or performing grid actions, do not support DevErrors. For these, you will have to use tracing.

TIP: If the information on the DevErrors page is cut off and you can't resize the window to see more, simply double-click anywhere on the page and a resizable window will open.

Tracing—If a CRM error happens anywhere other than from a direct Web request, the best way to get the real error is to use CRM tracing. Tracing can be enabled and configured by following the steps in the "How to enable tracing in Microsoft Dynamics CRM" article at support.microsoft.com/kb/907490. Or you can use a tool such as CrmDiagTool, available at box.net/shared/6oxfqi2ida.

TIP: In CRM 4.0, the "Trace Directory" setting is ignored; server trace files are written to the Trace folder under the CRM installation directory.

Tracing can be scary for the novice, so don't get frustrated. In general, tracing should be used only when troubleshooting an issue. Depending on how you configure tracing, there can be a significant performance impact when it is running, and if you have verbose logging on, a heavily used system can easily create hundreds of megabytes of logs per hour. The article mentioned above provides a detailed explanation of all the ways to configure tracing and how to enable tracing for client and server.

TIP: When emailing out trace logs for help, be sure to compress them first. Log files are just text and usually compress by 90 percent or more.
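
Compression is easy to script. Here is a minimal Python sketch using the standard zipfile module; zip_trace_logs is a hypothetical helper, and the Trace folder path you pass in depends on where CRM is installed. Files that a running CRM process is still writing to may not contain the latest entries, so stop the corresponding service first if you need them.

```python
import zipfile
from pathlib import Path


def zip_trace_logs(trace_dir, archive_path):
    """Compress every *.log file in trace_dir into one ZIP archive.

    Returns (original_bytes, compressed_bytes) so you can see how much
    the plain-text logs shrank.
    """
    # Collect the logs before creating the archive.
    logs = sorted(Path(trace_dir).glob("*.log"))
    original = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for log in logs:
            original += log.stat().st_size
            # Store only the file name, not the full server path.
            zf.write(log, arcname=log.name)
    return original, Path(archive_path).stat().st_size
```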

Log File Structure—When tracing is enabled on the server, logs will be placed in the Trace folder, which is located where you installed CRM. Each service has its own log file and each file will by default grow to 10MB before a new file is started. Since the log files are actively being written to by the various CRM processes, you will not get the absolute latest trace information until the corresponding service (either IIS or Async Service) is stopped. When you open the folder you will see files such as

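  • CRMDEV-VPC-CrmAsyncService-bin-20090415-1.log
  • CRMDEV-VPC-w3wp-CRMWeb-20090415-1.log

The naming convention is [MACHINE NAME]-[CRM PROCESS]-[YYYYMMDD]-[SEQUENCE].log. Note that both the machine name (here, CRMDEV-VPC) and the process portion (for example, w3wp-CRMWeb) can themselves contain hyphens.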
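
Because both the machine name and the process portion can contain hyphens, the safest way to parse these names is to anchor on the date and sequence number at the end. The following Python sketch does that; it requires you to supply the machine name (CRMDEV-VPC in the examples above), and parse_trace_name and latest_log are hypothetical helpers. latest_log picks the newest file for one process by date, then by sequence number, which is usually the file to open first.

```python
import re
from datetime import date

# Anchor on the trailing "-YYYYMMDD-N.log"; everything before it is
# "[MACHINE NAME]-[CRM PROCESS]", which may contain more hyphens.
_TRACE_NAME = re.compile(
    r"^(?P<prefix>.+)-(?P<day>\d{8})-(?P<seq>\d+)\.log$", re.IGNORECASE)


def parse_trace_name(filename, machine_name):
    """Split a trace file name into (process, date, sequence).

    Returns None for names that do not follow the convention or do not
    start with the given machine name.
    """
    m = _TRACE_NAME.match(filename)
    if m is None:
        return None
    prefix = m.group("prefix")
    if not prefix.lower().startswith(machine_name.lower() + "-"):
        return None
    process = prefix[len(machine_name) + 1:]
    day = m.group("day")
    try:
        stamp = date(int(day[:4]), int(day[4:6]), int(day[6:]))
    except ValueError:  # eight digits that are not a real date
        return None
    return process, stamp, int(m.group("seq"))


def latest_log(filenames, machine_name, process):
    """Return the newest file name for one process, or None."""
    candidates = []
    for name in filenames:
        parsed = parse_trace_name(name, machine_name)
        if parsed and parsed[0] == process:
            candidates.append((parsed[1], parsed[2], name))
    return max(candidates)[2] if candidates else None
```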

The log file contains loads of information, with items written in chronological order. Note that the trace log writes the latest event at the bottom of the file, while within a call stack items are written in reverse chronological order (newest item first).

TIP: When looking for errors in the log, try searching for ": Error" (a colon followed by a space, then Error).
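
The same search is easy to script. This minimal Python sketch returns every line containing ": Error" together with its line number; find_errors and find_errors_in_file are hypothetical helpers, and the file encoding (UTF-8 below) is an assumption, so adjust it if your logs use another encoding.

```python
def find_errors(log_text, marker=": Error"):
    """Return (line_number, line) pairs for lines containing marker.

    Line numbers are 1-based. Because the trace writes the newest event
    at the bottom of the file, the last pair is the most recent match.
    """
    return [(number, line)
            for number, line in enumerate(log_text.splitlines(), start=1)
            if marker in line]


def find_errors_in_file(path, encoding="utf-8"):
    """Read a trace file and search it; undecodable bytes are replaced."""
    with open(path, encoding=encoding, errors="replace") as f:
        return find_errors(f.read())
```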

Event Log—The Windows Event Log is another place to look for errors that occur within Microsoft Dynamics CRM, its dependent components, or other areas of the system. Just like the Trace Log, the Event Log will generally provide more detail about an error than the message displayed in the browser.

Microsoft CRM does not log all errors to the Event Log. For example, a disabled user trying to log in is logged while an attempt to update a record that no longer exists is not logged. Though this isn't documented, CRM logs errors to the Event Log from the following subsystems:

  • MSCRMPerfCounters
  • MSCRMPlatform
  • MSCRMKeyArchiveManager
  • MSCRMKeyGenerator
  • MSCRMEmail
  • MSCRMDeletionService
  • MSCRMReporting
  • MSCRMWebService
  • MSCRMAsyncService
  • ASP.NET 2.0

The ASP.NET 2.0 bucket acts as a "catch most" for Application layer errors. In addition, the Microsoft Dynamics CRM Email Router Service has its own Event Log (MSCRMEmailLog) that can be configured independently to log a wide range of information, warnings, and errors.

Because event logging is always on and, unlike tracing, does not need to be enabled, the Event Log is a good starting place to look for issues.

Reading a Call Stack—Call stacks come in all shapes and sizes and all too often are overlooked by nondeveloper troubleshooters. It is not uncommon for system engineers to simply "ignore the developer stuff" and research only the error message or code. I recommend that you not do this—even though the call stack looks like code, it is designed to be human readable and to tell what happened right up until the error occurred. Look at the following example:

[ReportServerException: The Report Server Windows service 'ReportServer' is not running.
   The service must be running to use Report Server. (rsReportServerServiceUnavailable)]
   at Microsoft.Reporting.WebForms.ServerReport.SetDataSourceCredentials(DataSourceCredentials[] credentials)
   at Microsoft.Crm.Web.Reporting.SrsReportViewer.SetExecutionCredentials(ServerReport report)

[CrmReportingException: The Report Server Windows service 'ReportServer' is not running.
   The service must be running to use Report Server. (rsReportServerServiceUnavailable)]
   at Microsoft.Crm.Web.Reporting.SrsReportViewer.SetExecutionCredentials(ServerReport report)
   at Microsoft.Crm.Web.Reporting.SrsReportViewer.ConfigurePage()
   at Microsoft.Crm.Application.Controls.AppUIPage.OnPreRender(EventArgs e)
   at System.Web.UI.Control.PreRenderRecursiveInternal()
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

What you see here is a list of all the things the system did (the calls), listed in reverse chronological order (the stack), up until the last thing it "tried" to do. In this case, the earliest call shown, at the bottom of the stack, was a call to System.Web.UI.Page.ProcessRequestMain().

When reading call stacks, it is important to read the name of each call. For each call listed, the word between the last period and the opening parenthesis is the method name. The words before the method name are the namespace followed by the class name, and the items within the parentheses are the parameters. In this case, the method ProcessRequestMain was called first; it belongs to the Page class in the System.Web.UI namespace, and it expects two Boolean (True/False) parameters. Right away, by reading the namespace, we know that this call is not to any Microsoft Dynamics CRM code; it is actually calling code in the base .NET Framework (denoted by the System root namespace) and specifically ASP.NET (denoted by the System.Web namespace).

As we read "up" the stack, we see that ProcessRequestMain then called PreRenderRecursiveInternal, which then called OnPreRender. The OnPreRender method is the first method that is actually part of the Microsoft Dynamics CRM code base, as denoted by the Microsoft.Crm namespace. As we continue up the call stack, we see that CRM makes a call into the Reporting Services method SetDataSourceCredentials, in the Microsoft.Reporting.WebForms namespace. This method then throws an exception of type ReportServerException with the error that the Report Server is not running. By reading the call stack, you can tell that this error is not coming from CRM at all; it is coming from SQL Server Reporting Services (SSRS) and is then being "bubbled up" through CRM as a CrmReportingException. Based on your system architecture, you will have to determine where SSRS is running in order to know where to go to start the service and resolve the issue.
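
The reading technique just described can be sketched in Python. This minimal example, with hypothetical helpers parse_frame and calls_in_order, splits each "at ..." line into namespace, class, method, and parameters, then reverses the frames so they read oldest call first. It handles simple frames like the ones in this example; generic types or parameter lists with nested commas would need a more careful parser.

```python
import re

# One "at Namespace.Class.Method(parameters)" line of a .NET call stack.
_FRAME = re.compile(r"^\s*at\s+(?P<qualified>[^\s(]+)\((?P<params>[^)]*)\)")


def parse_frame(line):
    """Split one frame into (namespace, class, method, parameters).

    Returns None for lines that are not frames, such as the
    [ExceptionType: message] header lines.
    """
    m = _FRAME.match(line)
    if m is None:
        return None
    parts = m.group("qualified").split(".")
    if len(parts) < 2:
        return None
    method, class_name = parts[-1], parts[-2]
    namespace = ".".join(parts[:-2])
    params = [p.strip() for p in m.group("params").split(",") if p.strip()]
    return namespace, class_name, method, params


def calls_in_order(stack_text):
    """Return the frames of one stack section, oldest call first.

    .NET lists frames newest first, so the parsed list is reversed.
    """
    frames = [f for f in (parse_frame(line) for line in stack_text.splitlines()) if f]
    return list(reversed(frames))
```

Applied to the CrmReportingException section above, calls_in_order yields ProcessRequestMain, PreRenderRecursiveInternal, OnPreRender, ConfigurePage, and SetExecutionCredentials, in that order.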

Reading call stacks in this manner can help you isolate where an error is actually coming from. If the topmost call in the stack (the most recent one) were something like YourCompany.Crm.Extensions.UpdateRecord(), that would tell you the error appears to be coming from code written by your developers or perhaps from an ISV solution you purchased.
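
That ownership check can be made systematic by matching each frame's fully qualified name against known root namespaces. In the Python sketch below, the OWNERS table and helper names are hypothetical; the Microsoft.Crm, Microsoft.Reporting, System, and System.Web prefixes come from the example above, System.Data.SqlClient is added as the ADO.NET client for SQL Server, and you would add your own company's and ISVs' namespaces. Because a .NET stack lists the newest call first, error_origin looks at the first frame.

```python
# Root namespace prefixes and the component that owns them. Checked in
# order, so more specific prefixes must come before general ones.
OWNERS = [
    ("Microsoft.Crm.", "Microsoft Dynamics CRM"),
    ("Microsoft.Reporting.", "SQL Server Reporting Services"),
    ("System.Data.SqlClient.", "SQL Server client (ADO.NET)"),
    ("System.Web.", "ASP.NET"),
    ("System.", ".NET Framework"),
]


def owner_of(qualified_method):
    """Return the component that owns a fully qualified method name.

    Anything not matched is treated as custom or ISV code.
    """
    for prefix, owner in OWNERS:
        if qualified_method.startswith(prefix):
            return owner
    return "Custom or ISV code"


def error_origin(frames_newest_first):
    """Owner of the topmost (most recent) frame, where the error surfaced."""
    return owner_of(frames_newest_first[0]) if frames_newest_first else None
```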

It is not uncommon for CRM errors to actually come from SQL Server, for example when a referential integrity (RI) or other SQL-level constraint is violated, or from the .NET Framework itself.

Play at Home Example

Now let's give you a chance to play at home. Let's assume you have created a new user in CRM, and that user is trying to use the system for the first time and gets the error shown in Figure 6.

Figure 6 A CRM Error Received By a First-Time User

What steps would you take to solve this problem? To resolve this issue, let's follow our basic troubleshooting workflow.

1. A problem or error is identified and reproduced.

You should ask the user what he was trying to do when he received the error, then attempt to reproduce the steps to see if you can recreate the error.

2. The problem needs to be understood.

Let's read the error message from the troubleshooter's point of view: "the logged-on user does not have the appropriate security permissions."

To understand the problem, you have to be able to answer two questions: Who is "the logged-on user"? What "security permission" does he "not have"?

3. The problem needs to be isolated.

In this case, you can answer both questions by using CRM tracing. You know that tracing is needed because this error page is in a dialog and does not provide the information that DevErrors would. CRM does not log privilege errors like this to the Event Log. Let's enable tracing and reproduce the issue using the same user. The Trace Log provides the following detailed error:

MSCRM Error Report:
Error: Exception has been thrown by the target of an invocation.
Error Number: 0x80040220
Error Message: SecLib::CrmCheckPrivilege failed. Returned hr = -2147220960 on UserId: 
  e76c5f50-40b3-dc11-8797-0003ffb8057d and PrivilegeId: 7863e80f-0ab2-4d67-a641-37d9f342c7e3
Error Details: SecLib::CrmCheckPrivilege failed. Returned hr = -2147220960 on UserId: 
  e76c5f50-40b3-dc11-8797-0003ffb8057d and PrivilegeId: 7863e80f-0ab2-4d67-a641-37d9f342c7e3
Source File: Not available
Line Number: Not available
Request URL: https://localhost:5555/AscentiumCrmDev/sfa/accts/edit.aspx?id={906C2F37-8D28-DE11-8D9F-0003FFB23445}
Stack Trace Info: [CrmSecurityException: SecLib::CrmCheckPrivilege failed. Returned hr = -2147220960 on UserId: 
  e76c5f50-40b3-dc11-8797-0003ffb8057d and PrivilegeId: 7863e80f-0ab2-4d67-a641-37d9f342c7e3] at
  Microsoft.Crm.BusinessEntities.SecurityLibrary.CheckPrivilege(Guid user, Guid privilege, ExecutionContext context)
…

From this error and by reading the call stack details, you can see that the problem is caused by a CRM Check Privilege failure. You can see the GUID of the user who made the request as well as the GUID of the privilege he tried to use.
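
The trace reports one code in two forms: Error Number 0x80040220 and the signed decimal hr = -2147220960. Both are the same 32-bit value, as the following Python sketch shows by masking the signed value to 32 bits (and reversing the conversion). It also pulls the two GUIDs out of the message with a regular expression; the helper names are hypothetical, and the pattern assumes the message text looks like the one above.

```python
import re
import uuid

# Matches "...CrmCheckPrivilege failed. Returned hr = <n> on UserId: <guid>
# and PrivilegeId: <guid>", tolerating line breaks between the parts.
_CHECK_PRIVILEGE = re.compile(
    r"CrmCheckPrivilege failed\.\s+Returned hr = (?P<hr>-?\d+)\s+on UserId:\s+"
    r"(?P<user>[0-9a-fA-F-]{36})\s+and PrivilegeId:\s+(?P<priv>[0-9a-fA-F-]{36})")


def hr_to_hex(hr):
    """Render a signed 32-bit HRESULT the way the Error Number shows it."""
    return "0x{:08X}".format(hr & 0xFFFFFFFF)


def hex_to_hr(text):
    """Convert "0x80040220"-style text back to the signed decimal form."""
    value = int(text, 16)
    return value - 2**32 if value >= 2**31 else value


def parse_check_privilege(message):
    """Return (error number, user id, privilege id), or None."""
    m = _CHECK_PRIVILEGE.search(message)
    if m is None:
        return None
    return (hr_to_hex(int(m.group("hr"))),
            uuid.UUID(m.group("user")),
            uuid.UUID(m.group("priv")))
```

The two GUIDs it extracts are the values you plug into the SQL queries that follow.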

If you perform a Live Search on the privilege ID 7863e80f-0ab2-4d67-a641-37d9f342c7e3, the first hit is the Microsoft CRM SDK.

Following this link, you can see that the privilege the user needs is prvWriteAccount, which grants the user update rights on the Account entity. The same method works for any of the hundreds of out-of-the-box privileges, because their GUIDs are all known and documented in the SDK. If a search for the privilege ID turns up nothing, the privilege might belong to one of your custom entities, in which case you will need to query your CRM organization database in SQL Server to find out which privilege is being requested. The following script yields the same information:

SELECT PrivilegeId, Name
FROM PrivilegeBase
WHERE PrivilegeId = '7863e80f-0ab2-4d67-a641-37d9f342c7e3'

Now that you know what privilege is needed, you need to verify which user is missing it. While you can often assume that the calling end user is the one, you can't always be sure, such as when actions are performed via code, plug-ins, or custom extensions. In such cases, you may want to run a query to find the name of the user CRM thought was trying to use the privilege. The following script will handle this:

SELECT SystemUserId, DomainName
FROM SystemUserBase
WHERE SystemUserId = 'e76c5f50-40b3-dc11-8797-0003ffb8057d'

TIP: If the user GUID is all zeros (00000000-0000-0000-0000-000000000000), the request was probably made as the SYSTEM user, which means the calling account was likely a service account such as Network Service. System accounts do not typically get CRM roles; instead they are granted elevated privileges via the PrivUserGroup in Active Directory.

4. The fix needs to be identified and understood.

You can now go into CRM and check to see what roles the user has, as shown in Figure 7.

Figure 7 User Roles in CRM

You can then drill down and see that the Salesperson role is indeed missing the Write on Account privilege as shown in Figure 8.

Figure 8 Core Records of the Salesperson Role shows it’s missing the Write privilege under Account.

5. The fix needs to be applied and verified.

To fix the issue, grant the Write privilege on Account to this role and save it. Before you do, make sure you understand which other users hold this role, because the change will affect them too. Once you apply the fix, ask the user to try again to verify that the problem has been resolved. You should then disable tracing and call the case closed.

Wrapping Up

Troubleshooting Microsoft Dynamics CRM means following basic ground rules and a methodology that includes identifying the real issue, narrowing scope, isolating the issue, and understanding the fix. You'll find the DevErrors, Event Logging, and Tracing tools in Microsoft Dynamics CRM critical in your troubleshooting efforts.

Aaron Elder (Microsoft Dynamics CRM MVP) works for Ascentium, a technology consulting and interactive marketing agency. Visit the Ascentium blog at ascentium.com/blog/crm.