Using Visual Studio Team System 2005 to Measure Code Stability at Microsoft
Note on IT
Published: November 28, 2007
The Microsoft Information Technology (Microsoft IT) department supports the development
and improvement of approximately 2,500 internal business applications. As part of
its mandate to improve how these applications are developed, Microsoft IT developed
the Line Of Code Counter tool to measure code stability across its development groups.
The tool is a fast, easy-to-use code counter that includes an algorithm to estimate
defect density. Project managers and developers at Microsoft IT use it to obtain
uniform and detailed software development metrics.
Document Definition
A Note on IT is a short, technically deep drilldown on a specific topic related
to Microsoft IT and is usually associated with an existing IT Showcase document.
A Note might illustrate how Microsoft IT performs a specific operational task step
by step or configures a hardware device or software application. It might also relate
details of a best practice or contain key information that customers regularly request
about Microsoft IT's operations.
Intended Audience
Project managers, software developers, and software product testing managers.
Products & Technologies
- Microsoft Visual Studio 2005
- Microsoft Visual Studio Team System
- Microsoft Visual Studio 2005 Team Foundation Server
- Formalized development methodologies such as Agile/Scrum and TSP/PSP
- Source control systems
Introduction
Microsoft IT supports approximately 2,500 internal business applications. These
applications are a critical part of managing the day-to-day business operations
at Microsoft. These business operations include the following:
- Sales
- Marketing
- Software licensing and operations
- Services
- Other corporate functions such as human resources (HR), legal, and finance.
Microsoft IT employs approximately 10,000 engineers who develop and improve these
internal business applications. The OEM Information Technology (OEM IT) group, which
contains about 100 people, maintains approximately three million lines of code (LOC).
As part of its mandate to support internal business applications, Microsoft IT implemented
the Program Delivery Engineering Excellence strategic initiative. The goal of this
initiative is to improve code quality, developer productivity, and the accuracy of
schedule and budget estimates for software development at Microsoft IT.
To make these improvements, Microsoft IT created a series of metrics that it could
use to define guidelines for software development among all the development groups
at Microsoft IT. Additionally, Microsoft IT had to provide these groups with the
tools to gather the appropriate metrics.
This document describes how Microsoft IT used the Microsoft® Visual Studio® 2005
development system to create a tool to measure code stability. All the development
groups in Microsoft IT now use this tool to count lines of code and to measure code
stability. Additionally, this document describes how code changes, also known as
code churn, may affect the code stability of a programming project.
An IT Showcase webcast called How Microsoft IT Uses Visual Studio Team System 2005
to Measure Software Code Stability includes a discussion on this topic.
The webcast is available at
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032355807&Culture=en-US.
Note: For security reasons, the sample names of forests, domains, internal
resources, organizations, and internally developed applications and files used in
this document do not represent actual names used within Microsoft and are for illustration
purposes only. In addition, the contents of this document describe how Microsoft
IT runs its enterprise data center. The procedures and processes included in this
document are not intended to be prescriptive guidance on how to run a generic data
center and may not be supported by Microsoft Customer Service and Support.
Background
Program managers at Microsoft IT face challenges similar to those at any other large
software development organization: accurately estimating the size of a project and
determining the overall stability of the code in that project.
To overcome these challenges, program development at Microsoft IT has started moving
toward more formalized development methodologies and frameworks, including the following:
- Waterfall
- Agile/Scrum
- Team Software Process/Personal Software Process (TSP/PSP)
Additionally, program development teams use Microsoft Visual Studio Team System
for operations such as source control, work item tracking, and reporting.
To improve and measure the software development process, Microsoft IT now uses the
following metrics:
- The project size as measured in the total number of LOC
- The project resources that have been assigned to the particular project
- A system of defect tracking to determine the overall success of the project
- The budget requirements for the project
Counting Lines of Code
The number of lines of code in a program is a standard method to describe the size
of a particular program. Although counting the lines of code may not be the most
accurate method to determine the size of a programming project, it has the advantage
of simplicity. Additionally, this method can be used among a broad spectrum of programming
languages.
By counting the lines of code in a project, a project manager can more easily track
code changes. This helps project managers to determine the overall stability of
a project's code.
Note: By tracking a trend of decreasing code changes in a project, a project
manager can determine whether the project is approaching completion.
Although counting the lines of code in a project has become a standard method to
track project size and completion, this method of project measurement has several
disadvantages. These disadvantages include the following:
- No standard specifies which lines of code should be counted toward a program's
size. As a result, one count may include comments, automatically generated code, and
blank lines, while another count of the same program excludes them.
- Counting lines of code is a tedious process. Additionally, the costs that are associated
with counting lines of code are often assigned to the project overhead budget.
- The physical code count may not reveal the effects of code changes in a project.
For example, if existing lines are modified but none are added or deleted, the physical
line count stays the same even though the code has changed. Therefore, the physical
code count may not accurately indicate the stability of the code in a project.
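The following Python sketch illustrates the last point. It is not part of the Line Of Code Counter tool; the helper names physical_loc and changed_lines are hypothetical, and treating every non-blank line as a line of code is a simplification of a real counting standard. Every line of the three-line file changes between versions, yet the physical line count stays at three.

```python
import difflib


def physical_loc(lines):
    """Count non-blank physical lines (a simplified stand-in for a full LOC count)."""
    return sum(1 for line in lines if line.strip())


def changed_lines(old_lines, new_lines):
    """Count the lines that differ between two versions of a file."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    touched = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A one-for-one replacement counts each line once, not twice.
            touched += max(i2 - i1, j2 - j1)
    return touched


old_version = ["int a = 1;", "int b = 2;", "return a + b;"]
new_version = ["int a = 10;", "int b = 20;", "return a * b;"]

print(physical_loc(old_version), physical_loc(new_version))  # 3 3
print(changed_lines(old_version, new_version))  # 3
```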
Note: A Microsoft Research document, Use of Relative Code Churn Measures to Predict
System Defect Density, discusses the effects of code churn in large programming
projects. The research was performed against programming projects that contain millions
of lines of code. The results of this research indicate that relative code churn measures
are good predictors of defect density in a programming project. To view this document,
visit the following Web site:
http://research.microsoft.com/research/pubs/view.aspx?type=Publication&id=1359.
Microsoft Line Of Code Counter Tool
To obtain consistent metrics for all its programming groups, Microsoft IT had to
provide managers and developers with a tool that was easy to use and that provided
a consistent framework in which to count lines of code in different programming
languages. This tool had to provide the following features:
- Separate measured code by the distinct programming languages that are used in the
project.
- Provide a set of common rules to determine which code is included in or excluded
from the count.
- Easily connect to the various source control systems (repositories) that are used
at Microsoft IT.
To do this, Microsoft IT used the features and the functionality in Visual Studio
Team System to create the Line Of Code Counter tool. This tool is a flexible 32-bit
program that can be used as a stand-alone client or as a Visual Studio integrated
development environment (IDE) add-in. The tool has the following features:
- It handles many different programming languages.
- It performs many different kinds of code counts.
- It handles comments, system-generated code, blank lines, and code churn.
- It connects to many different repositories.
- It provides an estimated defect density that is based on code churn.
- It is customizable. A user can change the kinds of objects that are counted during
a counting task.
- It generates detailed reports. In addition, a user can export the report information
to a Microsoft Office Excel® worksheet or to a Portable Document Format (PDF)
file.
- It is fast. The tool can parse 10 million lines of code in less than one hour.
Standard for Counting
As the first step toward developing the Line Of Code Counter tool, Microsoft IT
specified a standard definition of the code that counts as valid for a code counting
operation. For this definition, Microsoft IT specified the following:
- The tool should count only physical lines of code (lines as they appear in the source
file) and not logical lines of code (statements, which may span several physical lines
or share one).
- The tool should count code according to a set of customizable counting rules.
Table 1 shows the default rules that Microsoft IT has specified as the counting
standard for the tool.
Table 1. Default Counting Rules
Object | C# | ASP, ASPX, cascading style sheets | SQL | C++ | Microsoft Visual Basic® .NET | XML, XSD, HTML | Scripting languages
User-written code | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Generated code | No | No | No | No | No | No | No
Blank lines | No | No | No | No | No | No | No
Comments | No | No | No | No | No | No | No
Metadata | No | Not applicable | Yes | Not applicable | Not applicable | Yes | Not applicable
Reused code | No | No | No | No | No | No | No
File types | .cs | .asp, .aspx, .css | .sql, .sp, .tbl, .vew, .fn | .cpp, .h, .idl, .def | .vb, .vbs, .frm, .cls | .xml, .xsd, .htm, .html | .js, .vbs, .cmd
The following list describes the objects that are defined in this table:
- User-written code. This includes declarations, compiler directives, export
symbols, variable assignments, and all executable source code. Additionally, this
code includes language syntax elements that appear on a line of their own.
- Generated code. This includes the code that is generated by Visual Studio
or by any other code generator.
- Comments. This includes single-line and multiple-line comments. If a comment
appears on the same line as valid source code, the line is counted as a line of code.
- Metadata. This is the XML data that some programs use to pass messages among
subsystems or functions.
- Reused code. This represents any source code library that an IT application
team has not developed. This includes Enterprise Library, the .NET Software Development
Kit (SDK), a product SDK, third-party controls, and so on.
Customizing the Counting Standard
To obtain the greatest flexibility in the Line Of Code Counter tool, Microsoft IT
decided that the counter engine rules that are defined in the tool should not be
a static set of built-in rules. Instead, Microsoft IT created a mechanism to allow
for customization of the counter engine rules. This customization enables a project
manager or a developer to modify the kinds of code objects that are considered as
valid code for a counting task.
Therefore, the counter engine in the Line Of Code Counter tool references an XML
file to determine which code elements to exclude during a counting task. The entries
that are specified in this XML file modify the counting standard. By default, all
code elements that are not specified in the XML file are counted as valid code elements.
By adding code elements to the XML file or by removing code elements from the XML
file, a project manager or a developer can modify the kinds of code that are counted
during a counting task. The following code example shows the organization of the
XML file.
<?xml version="1.0" encoding="utf-8" ?>
<lineCounters version="1.0.0.4">
  <!--
  lineCounter attributes:
    name="[userReadableName]" - the name of the programming language
  fileExtension value:
    identifies the files that are associated with the programming language
  A lineCounter element can contain any number of fileExtension and codeArea elements.
  codeArea attributes:
    name="[userReadableName]" - the name to display in the counter result
    isCode="[true|false]" - true if code that matches this codeArea is counted as LOC
    multiLine="[true|false]" - true if the code area spans more than one line.
      Default is false.
    caseSensitive="[true|false]" - true if the expressions are case sensitive.
      Default is true.
    description="[user readable description]"
  Multi-line codeAreas must contain a startExpression element and an endExpression element.
  Single-line codeAreas can contain any number of expression elements.
  By default, all lines are considered lines of code unless otherwise matched.
  -->
  <lineCounter name="C#">
    <fileExtension>cs</fileExtension>
    <codeArea name="Autogenerated Windows form code" isCode="false" multiLine="true">
      <startExpression>^\s*\#region Windows Form Designer generated code\s*$</startExpression>
      <endExpression>^\s*\#endregion\s*$</endExpression>
    </codeArea>
    <codeArea name="Autogenerated Web form code" isCode="false" multiLine="true">
      <startExpression>^\s*\#region Web Form Designer generated code\s*$</startExpression>
      <endExpression>^\s*\#endregion\s*$</endExpression>
    </codeArea>
    <codeArea name="Autogenerated Component designer code" isCode="false" multiLine="true">
      <startExpression>^\s*\#region Component Designer generated code\s*$</startExpression>
      <endExpression>^\s*\#endregion\s*$</endExpression>
    </codeArea>
    <codeArea name="Blank lines" isCode="false">
      <expression>^\s*$</expression>
    </codeArea>
    <codeArea name="// comments" isCode="false">
      <expression>^\s*//.*$</expression>
    </codeArea>
    <codeArea name="/* */ comments" isCode="false" multiLine="true">
      <startExpression>^\s*/\*.*$</startExpression>
      <endExpression>[*]*\*/</endExpression>
    </codeArea>
  </lineCounter>
</lineCounters>
By default, when the Line Of Code Counter tool runs, it counts all code elements
as valid code. However, this XML file changes the default behavior of the counter
engine. The counter engine parses the following values in this file to determine
whether to count a code element during a counting task:
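- fileExtension. This value identifies the files that belong to the programming
language that a lineCounter element describes. Each lineCounter element may contain
any number of fileExtension and codeArea elements.
- isCode. This value specifies whether the lines that a codeArea matches are counted.
If the isCode value is True, the code is counted. If the isCode value
is False, the code is not counted.
- expression, startExpression, and endExpression. These regular expressions identify
the lines that belong to a codeArea. A single-line codeArea uses expression elements;
a multi-line codeArea uses a startExpression element and an endExpression element.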
If a code element is not specified in this file, or if a code element that is specified
has an isCode value of True, the counter engine considers the element to
be a countable code element during a counting task.
This flexibility enables a project manager or a developer to customize the Line
Of Code Counter tool to measure other languages or to filter other code objects.
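The following Python sketch shows one way a counter engine could apply a configuration in this format. It is illustrative only: the actual tool is built with Visual Studio, and its engine is not shown in this document. The sketch assumes that codeArea elements are tried in file order with the first match winning, that a missing isCode attribute means the matched lines are counted, and that a multi-line area closes on the first line that matches its endExpression. The trimmed configuration string and the names load_rules and count_lines are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, trimmed-down configuration in the same format as the file above.
CONFIG = r"""
<lineCounters version="1.0.0.4">
  <lineCounter name="C#">
    <fileExtension>cs</fileExtension>
    <codeArea name="Blank lines" isCode="false">
      <expression>^\s*$</expression>
    </codeArea>
    <codeArea name="// comments" isCode="false">
      <expression>^\s*//.*$</expression>
    </codeArea>
    <codeArea name="/* */ comments" isCode="false" multiLine="true">
      <startExpression>^\s*/\*.*$</startExpression>
      <endExpression>[*]*\*/</endExpression>
    </codeArea>
  </lineCounter>
</lineCounters>
"""


def _flag(element, name, default):
    """Read a true/false attribute, using the default when it is absent."""
    return element.get(name, default).strip().lower() == "true"


def load_rules(xml_text, extension):
    """Return the codeArea rules of the lineCounter that claims this file extension."""
    root = ET.fromstring(xml_text.strip())
    for counter in root.iter("lineCounter"):
        extensions = {e.text.strip().lower() for e in counter.findall("fileExtension")}
        if extension.lower() not in extensions:
            continue
        rules = []
        for area in counter.findall("codeArea"):
            flags = 0 if _flag(area, "caseSensitive", "true") else re.IGNORECASE
            rule = {
                "name": area.get("name"),
                # Assumption: a missing isCode means the matched lines are counted.
                "is_code": _flag(area, "isCode", "true"),
                "multi_line": _flag(area, "multiLine", "false"),
            }
            if rule["multi_line"]:
                rule["start"] = re.compile(area.findtext("startExpression"), flags)
                rule["end"] = re.compile(area.findtext("endExpression"), flags)
            else:
                rule["patterns"] = [
                    re.compile(e.text, flags) for e in area.findall("expression")
                ]
            rules.append(rule)
        return rules
    raise ValueError(f"no lineCounter handles .{extension} files")


def count_lines(lines, rules):
    """Tally each line as 'code' or under the name of the non-code area it matched."""
    counts = Counter()
    open_area = None  # multi-line codeArea that the current line is inside, if any
    for line in lines:
        matched = None
        if open_area is not None:
            matched = open_area
            if open_area["end"].search(line):
                open_area = None
        else:
            # Assumption: codeAreas are tried in file order and the first match wins.
            for rule in rules:
                if rule["multi_line"]:
                    if rule["start"].search(line):
                        matched = rule
                        if not rule["end"].search(line):
                            open_area = rule  # the area continues on later lines
                        break
                elif any(p.search(line) for p in rule["patterns"]):
                    matched = rule
                    break
        if matched is None or matched["is_code"]:
            counts["code"] += 1
        else:
            counts[matched["name"]] += 1
    return dict(counts)


rules = load_rules(CONFIG, "cs")
sample = [
    "using System;",
    "",
    "// a full-line comment",
    "/* a block comment",
    "   that ends here */",
    "int x = 1; // shares a line with code, so it counts",
]
print(count_lines(sample, rules))
# {'code': 2, 'Blank lines': 1, '// comments': 1, '/* */ comments': 2}
```

The last sample line contains a comment, but because the // comments expression matches only lines that begin with //, the line is counted as code. This matches the counting standard for comments in Table 1.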
Code Retrieval
The Line Of Code Counter tool can connect to and retrieve code from the following
repositories:
- Microsoft Visual Studio 2005 Team Foundation Server
- The Microsoft Visual SourceSafe® version control system
- The file system
To enable the Line Of Code Counter tool to retrieve code from the various repositories,
Microsoft IT used the following Visual Studio 2005 application programming
interfaces (APIs):
- Microsoft.TeamFoundation
- Microsoft.TeamFoundation.Client
- Microsoft.TeamFoundation.Server
- Microsoft.TeamFoundation.Common
- Microsoft.TeamFoundation.VersionControl.Client
- Microsoft.TeamFoundation.VersionControl.Common
- Microsoft.TeamFoundation.WorkItemTracking.Client
By using the functionality that is available in these APIs, the Line Of Code Counter
tool can retrieve code based on the following characteristics:
- Latest version
- Work item number
- Changeset number
- Difference between two file versions (Diff)
Additionally, Microsoft IT was able to create a user interface (UI) that abstracts
the connection to the particular repository. Figure 1 shows how this connection
appears in the Line Of Code Counter UI.
Figure 1. Connecting to a repository
Note: The Line Of Code Counter tool uses the credentials of the user who
is currently logged on to connect to the repository.
Count Types
To measure the source code in a repository, the Line Of Code Counter tool can perform
16 distinct count types. A user selects a count type in the Count Type area of a
counter task. Figure 2 shows the Count Type area of the Line Of Code Counter user
interface.
Figure 2. Count Type selection
Table 2 lists the available count types among the various source control systems.
Table 2. Available Count Types
Count type | Visual Studio Team Foundation Server | Visual SourceSafe | File system
Latest version | Yes | Yes | Yes
Latest - Changeset | Yes | No | No
Latest - Work item | Yes | No | No
Latest - Label | Yes | Yes | No
Latest - Change ID | No | No | No
Diff - Date range | Yes | Yes | No
Diff - Changeset | Yes | No | No
Diff - Label | Yes | Yes | No
Diff - Change ID | No | No | No
Diff with Previous - Changeset | Yes | No | No
Diff with Previous - Change ID | No | No | No
Diff with Previous - Label | Yes | Yes | No
Diff - File system | No | No | Yes
To support the Diff count types, the Line Of Code Counter tool uses the differencing
algorithm that is available in the Visual Studio Team Foundation Server API. This
algorithm lets the tool quickly obtain the differences between any two versions of
a file. The following declaration shows the DiffFiles method that provides this
functionality.
public static void DiffFiles(
    VersionControlServer versionControl,  // connection to Team Foundation version control
    IDiffItem source,                     // first item to compare
    IDiffItem target,                     // second item to compare
    DiffOptions diffOpts,                 // output format and comparison options
    String fileNameForHeader,             // file name shown in the diff header
    bool wait);
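The Team Foundation Server differencing API is not reproduced here. As a stand-in, the following Python sketch uses the standard difflib module to turn the difference between two versions of a file into added, modified, and deleted line counts, the code churn measurements that this document discusses. The function name churn_counts is hypothetical, and the rule for splitting a changed block into modified, added, and deleted lines is an assumption; the Team Foundation Server algorithm may group changes differently.

```python
import difflib


def churn_counts(old_lines, new_lines):
    """Classify the differences between two file versions as added, modified, or deleted lines."""
    counts = {"added": 0, "modified": 0, "deleted": 0}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_n, new_n = i2 - i1, j2 - j1
        if tag == "insert":
            counts["added"] += new_n
        elif tag == "delete":
            counts["deleted"] += old_n
        elif tag == "replace":
            # Assumption: pair old and new lines one-for-one as modified lines;
            # any surplus on either side counts as added or deleted.
            paired = min(old_n, new_n)
            counts["modified"] += paired
            counts["added"] += new_n - paired
            counts["deleted"] += old_n - paired
    return counts


old = ["int a = 1;", "int b = 2;", "Log(a);", "return a + b;"]
new = ["int a = 1;", "int b = 3;", "return a + b;", "// done"]
print(churn_counts(old, new))  # {'added': 1, 'modified': 1, 'deleted': 1}
```

In this example, the changed assignment counts as modified, the removed Log call as deleted, and the new trailing line as added.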
Report Generation
A critical feature of the Line Of Code Counter tool is its ability to generate detailed
and flexible reports. Microsoft IT requires flexible reports that provide separate
code counts that can be exported into Office Excel. Figure 3 displays the report
selection area of the Line Of Code Counter user interface.
Figure 3. Report selection user interface
The Line Of Code Counter tool generates the following kinds of reports:
- Standard reports. This option allows for the following report types:
- Task Summary report
- Summarize by Folder report
- Excluded LOC report
- Summarize by Language report
- Summarize by File report
- M Report. These reports are included in the Task Summary section of a Standard
report. The M Report uses the relative code churn measures that the research document
Use of Relative Code Churn Measures to Predict System Defect Density describes.
For more information about how to use the Line Of Code Counter tool to help predict
defect density in a software project, see the "Estimating Defect Density"
section of this document.
- PSP metrics. This option allows for the following report types:
Figure 4 displays part of a sample report that the Line Of Code Counter tool generates.
Figure 4. Sample report excerpt
This report excerpt contains code count information that a project manager or a
developer can use to determine the status of a project. Additionally, this report
excerpt contains information that can indicate whether a trend of decreasing code
changes exists. This kind of trend may indicate that a project is approaching code
stability.
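This document does not say how the tool detects such a trend. One simple approach, shown in the following Python sketch as an assumption rather than the tool's method, fits a least-squares line to the churned lines in each period and checks the sign of its slope. The function name churn_trend_slope and the weekly figures are hypothetical.

```python
def churn_trend_slope(churn_per_period):
    """Least-squares slope of churned lines over periods 0, 1, 2, ...

    A negative slope means churn is falling, which may indicate that the code
    is stabilizing.
    """
    n = len(churn_per_period)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(churn_per_period) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(churn_per_period))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


weekly_churn = [1200, 950, 700, 400, 250]  # hypothetical churned lines per week
slope = churn_trend_slope(weekly_churn)
print(f"{slope:.1f} lines per week")  # -245.0 lines per week
print("decreasing" if slope < 0 else "not decreasing")  # decreasing
```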
Estimating Defect Density
A critical part of project development that Microsoft IT performs is the assignment
of resources for program testing prior to deploying a software product in the production
environment. By estimating the potential for software defects, Microsoft IT can
more accurately gauge the testing resources and the testing budget that must be
allocated for a particular project.
Defect Density Algorithm
To develop an algorithm to estimate defect density, Microsoft Research analyzed
project management data from large programming projects. This algorithm uses the
following version control history for a file:
- The number of times that the selected files have been modified
- The time period in which the modifications occurred
- The number of files that were actually modified
By using this information together with the code churn measurements (added lines
of code, modified lines of code, and deleted lines of code), Microsoft Research developed
relative metrics together with an algorithm that calculates the potential number
of defects per 1,000 lines of code (KLOC). To develop this algorithm, Microsoft
Research applied a multiple regression technique to the relative code churn metrics.
In the research, the relative code churn measures discriminated between fault-prone
and non-fault-prone binaries with an accuracy of 89 percent.
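Relative measures normalize absolute churn by size so that components of different sizes can be compared. The following Python sketch computes three ratios of this kind from hypothetical churn data. The field names, the choice of ratios, and the treatment of churned lines as added plus modified lines are assumptions for illustration; the exact definitions of the relative measures are given in the Microsoft Research paper, and measures based on change counts and time periods are omitted here.

```python
from dataclasses import dataclass


@dataclass
class ChurnHistory:
    """Hypothetical absolute churn data for one component between two versions."""
    total_loc: int
    added_loc: int
    modified_loc: int
    deleted_loc: int
    file_count: int
    files_churned: int


def relative_churn(history):
    """Return illustrative relative churn ratios (not the paper's full measure set)."""
    if history.total_loc <= 0 or history.file_count <= 0:
        raise ValueError("total_loc and file_count must be positive")
    churned = history.added_loc + history.modified_loc  # assumption: churned = added + modified
    return {
        "churned_per_loc": churned / history.total_loc,
        "deleted_per_loc": history.deleted_loc / history.total_loc,
        "files_churned_ratio": history.files_churned / history.file_count,
    }


component = ChurnHistory(total_loc=20_000, added_loc=1_500, modified_loc=500,
                         deleted_loc=400, file_count=80, files_churned=20)
print(relative_churn(component))
# {'churned_per_loc': 0.1, 'deleted_per_loc': 0.02, 'files_churned_ratio': 0.25}
```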
Defect Density Reports
The report excerpt in Figure 5 shows defect density metrics.
Figure 5. Estimated defect density information
Note: The estimated defect density information that this figure shows is
generated from the code churn information that appeared earlier in Figure 4.
The information that appears in this area of the report is the M Report information
that is generated in the Task Summary section of a Standard report. This report
contains relative code churn measures that are identified by M1 through M8. Each
of these values is described in the "Relative Code Churn Measures" section
of the Use of Relative Code Churn Measures to Predict System Defect Density
research document. To view this document, visit the following Microsoft Web site:
http://research.microsoft.com/research/pubs/view.aspx?type=Publication&id=1359.
By applying a multiple regression algorithm to the M1 through M8 values, the Line
Of Code Counter tool calculates an estimated defect density per KLOC.
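The regression coefficients that the tool applies are not published in this document. The following Python sketch shows only the general technique: it fits an ordinary least-squares multiple regression by solving the normal equations, then applies the fitted coefficients to a new set of measures. The function names fit_linear_model and predict are hypothetical, and the training data is synthetic, generated from a known linear formula so that the fit can be checked; it is not defect data.

```python
def fit_linear_model(rows, targets):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    design = [[1.0, *map(float, row)] for row in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y] for the normal equations (X^T X) b = X^T y.
    aug = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in design) for j in range(k)]
        row.append(sum(r[i] * y for r, y in zip(design, targets)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefficients, measures):
    """Intercept plus the weighted sum of the relative churn measures."""
    return coefficients[0] + sum(b * m for b, m in zip(coefficients[1:], measures))


# Synthetic data generated from y = 1 + 2*m1 + 3*m2, so the fit can be checked.
history = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
density = [1, 3, 4, 6, 8]
coefficients = fit_linear_model(history, density)
print([round(c, 6) for c in coefficients])  # [1.0, 2.0, 3.0]
print(predict([1.0, 2.0, 3.0], [0.5, 0.25]))  # 2.75
```

Multiplying an estimated density by project size gives an expected defect count; for example, 2.5 defects per KLOC across 40 KLOC suggests about 100 defects, a figure a project manager could use when budgeting for testing.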
Project managers in Microsoft IT use this information to budget for testing resources
and to better estimate project completion dates. This information makes assessing
the status of a programming project much easier.
Conclusion
Microsoft IT has used Visual Studio Team System to develop an easy-to-use and flexible
tool to improve the software development process for its internal development groups.
This Line Of Code Counter tool has the following primary benefits for the project
teams in Microsoft IT:
- Simplicity and flexibility. The tool is easy to use for any team in Microsoft IT,
not only for project managers and developers. Additionally, the tool can be customized
for many different kinds of code counting operations.
- Speed. The tool minimizes the time that is required to obtain metrics. Because
the tool performs a counting task so quickly, teams are more likely to use it.
- Standardization. Because all the development groups in Microsoft IT can use
the tool, obtaining consistent and comparable code metrics is easier.
- Predictability. The tool improves the implementation of development methodologies,
such as TSP/PSP. Counting added, modified, and deleted lines of code is an important
aspect of the TSP/PSP proxy-based estimation process. The tool helps reduce project
estimation errors, and project managers can predict the required testing resources
more accurately. This ability helps teams achieve project delivery goals.
The Line Of Code Counter tool is available as a free download from Microsoft. To
obtain the Line Of Code Counter tool, visit the following Microsoft Web site: http://download.microsoft.com/download/D/D/8/DD894C7B-9E15-4981-8936-AF7CBEE66646/LOCCounter.zip.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information through the World Wide Web,
go to:
http://www.microsoft.com
http://www.microsoft.com/technet/itshowcase