Notification Services Capacity Planning and Performance Tuning
Notification Services Product Team
Microsoft® SQL Server™ Notification Services
Summary: Discover performance data for standard Notification Services components on three common system designs. When you complete this document, you will have a clear idea of the capabilities of Notification Services, common performance bottlenecks, and ways that you can design and test your own systems and Notification Services applications for optimal performance. (30 printed pages)
Notification Services applications manage subscriber and subscription data, collect events, generate notifications, and format and distribute the notifications. Overall application performance is a combination of the performance for each of these application functions, plus the performance of the database system that stores and processes the application data.
Microsoft® SQL Server™ Notification Services is a platform for developing and deploying applications that generate notifications, format them, and send the resulting messages to subscribed users. Notifications are personalized, timely messages that can be sent to a wide variety of devices.
A Notification Services application provides an interface for managing subscriber and subscription data and multiple ways of collecting events from external sources. To generate notifications, Notification Services performs matches between subscriptions and event data. After generating notification data, Notification Services formats the notification and distributes it to a delivery channel, such as an SMTP gateway.
A Notification Services application has several major components: the subscriber and subscription management application, the event collector, the generator that creates notifications, the distributor that formats and sends notifications, and the databases that store application data.
Figure 1. Notification Services architecture
The performance of a Notification Services application is determined by how quickly Notification Services can process subscription, subscriber, and event data, how quickly it can run the match rules that generate notifications, and how quickly it can format and distribute notifications. The following factors affect performance:
- Application settings, as defined in the application definition file (ADF)
- External factors, such as the availability of external delivery systems
- System design: the hardware and software that support the application
This document provides performance data and system design recommendations to help you:
- Choose optimal application settings for the Notification Services components and for the databases.
- Design and configure the system for optimal performance. We provide performance data for all major components of a Notification Services application, using multiple settings and system designs, so you can estimate how a well-configured system should perform. We also provide system design guidelines for optimal performance.
- Measure the performance of your Notification Services application. When you design an application, you should perform a series of tests to determine how well the application is running and to establish baseline measurements for system performance.
- Troubleshoot performance problems. We provide solutions for common performance bottlenecks.
When you complete this document, you will have a clear idea of the capabilities of Notification Services, common performance bottlenecks, and ways that you can design and test your own systems and Notification Services applications for optimal performance.
As you prepare to build a Notification Services application, you want an idea of the performance you can expect from that application. Your application can use standard components provided with Notification Services, or it can use custom event providers, content formatters, and delivery protocols built for the application. Each component choice has performance implications. For example, the standard File System Watcher event provider can process thousands of events per second, depending on system design. However, a custom event provider that uses the Event and EventCollector classes might only be able to submit hundreds of events per second.
In the Performance Data section, we provide performance estimates for each of the major Notification Services components. In each component subsection, we show the results of a series of performance tests based on settings you might choose when building your own application.
Each component subsection provides the following information:
- An introduction to the component test
- A description of the set of tests we ran
- The results of all the performance tests
- Conclusions about component performance
- Recommendations for improving performance
First, we present information about the systems we used to run these tests.
Hardware Specifications for Performance Testing
An instance of Notification Services Standard Edition is deployed on one server. An instance of Notification Services Enterprise Edition can be deployed on one server or can be scaled out using multiple servers. The performance of your Notification Services application will depend on the number of servers as well as the CPU, memory, and disk resources available.
We ran all of our performance tests on the following three systems. We chose these systems to illustrate the performance differences between three common system designs.
- The DualProc system is a single server with two 1.7-gigahertz (GHz) Pentium 4 processors and 2 gigabytes (GB) of RAM. The Notification Services instance, the SQL Server instance that hosts the databases, and the subscription management application are all located on the server. All program files, data files, database files, and log files are located on a single SCSI disk.
Figure 2. DualProc system design
- The QuadProc system is a single server with four 700-megahertz (MHz) Pentium III processors and 4 GB of RAM. The Notification Services instance, the SQL Server instance that hosts the databases, and the subscription management application are all located on the server. This configuration uses three SCSI disks on a high-end RAID controller. The program files, data files, and tempdb are located on one disk; the instance and application database files are located on a second disk; and the log files are located on a third disk.
Figure 3. QuadProc system design
- The Combo system is a combination of the two systems above. SQL Server is located on the four-processor server; the program files and tempdb are located on one SCSI disk; the instance and application database files are located on a second SCSI disk; and the log files are located on a third SCSI disk. The Notification Services program files and data files, and the subscription management application, are located on the only disk of the two-processor server.
Figure 4. Combo system design
Running the tests on these three systems shows whether the performance is significantly affected by processing and disk resources, and whether separating the instance from the databases significantly affects performance.
Subscriber Management Performance
The Notification Services API contains classes for managing subscriber information (such as subscriber name and the notification delivery address). Any application, such as a Web page, that manages subscriber data uses these classes to add, update, and delete subscriber records in the instance database. The subscription management application can be located on the same server as the databases or on a remote server.
Subscriber Management Testing Methodology
To obtain performance data for subscriber management, we used a C# subscription management application. This application was located on the same server as the Notification Services instance. We ran the following tests using this application:
- Add 100,000 subscribers with randomly selected names to an empty application. This test approximates the task of adding an initial set of subscribers to an instance of Notification Services.
- Add an additional 10,000 subscribers with randomly chosen names to determine how many subscribers can be added per second to an existing instance of Notification Services.
- Delete 10,000 randomly chosen subscribers from a set of 110,000 subscribers to determine how many subscribers can be deleted per second from an existing instance of Notification Services.
- Update 10,000 randomly chosen subscribers from a set of 100,000 subscribers to determine how many records can be updated per second in an instance of Notification Services.
These tests used the three hardware configurations noted earlier in this paper: DualProc, QuadProc, and Combo. On the DualProc and QuadProc systems, the subscription management application was located on the same server as the Notification Services instance and the databases. On the Combo system, the databases were located on the QuadProc server and the subscription management application was located on the DualProc server.
For each test, we measured the elapsed time to complete the operation. We then computed the rate at which subscribers were added, deleted, and updated. All results except for "Combo, Multithreaded" are for a single-threaded client.
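The rate calculation described above can be sketched as follows. This is a minimal illustration, not the test harness used for this paper: the names `measure_rate` and `rate_per_second` are hypothetical, and the `operation` callable stands in for whatever call adds, updates, or deletes one subscriber.

```python
import time


def rate_per_second(count, elapsed_seconds):
    """Convert an operation count and an elapsed time into operations per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return count / elapsed_seconds


def measure_rate(operation, items, clock=time.perf_counter):
    """Run operation(item) once per item and return (elapsed_seconds, rate).

    `operation` is a hypothetical stand-in for one subscriber add, update,
    or delete call. `clock` is injectable so the function can be tested
    without real waiting.
    """
    start = clock()
    count = 0
    for item in items:
        operation(item)
        count += 1
    elapsed = clock() - start
    return elapsed, rate_per_second(count, elapsed)
```

For example, 10,000 operations that complete in 20 seconds correspond to `rate_per_second(10000, 20.0)`, or 500 operations per second.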
Subscriber Management Test Results
The following graph shows how many subscribers were loaded, added, deleted, or updated per second on our three test systems. On the Combo system, we ran the test using a single-threaded and a multithreaded application.
Figure 5. Results for adding, updating, and deleting subscriber data
The test results show that performance is best when the database is on the QuadProc server (the QuadProc and Combo systems). Also note that although the single-threaded application on the Combo system had mediocre performance, making the subscriber management application multithreaded significantly improved performance when loading or adding subscribers.
In a typical subscription management application, the performance for deleting and updating subscribers is lower than that for loading and adding subscribers. When deleting or updating an existing subscriber record, the application first gets the subscriber object (which involves a round-trip to the database server), and then deletes or updates the subscriber record, which involves another round-trip to the database server.
When looking at the performance counters for this application, it was apparent that the primary performance bottleneck on the DualProc system was the disk subsystem, not the processing resources.
Subscriber Management Test Conclusions
Based on the subscriber management test results, we drew the following conclusions about subscriber management performance:
- The disk subsystem is the primary factor when determining the rate at which subscribers can be added, deleted, or updated.
- The number of CPUs is not a critical factor when managing subscribers.
- If a single-threaded subscription management application is located on a separate server, there is a decrease in subscriber management performance. The decline in performance is due to the overhead of accessing the databases over the network.
- Deleting and updating subscribers is slower than adding subscribers, primarily because the application must first locate the record and then update or delete it.
Recommendations for Managing Subscribers
Managing subscribers is primarily a database operation. Follow the SQL Server guidelines for improving database performance to improve performance when managing subscriber data. In particular, the following recommendations are key:
- Make sure your database log files are on a high-performance disk.
- Size the instance database appropriately so that the database does not have to autogrow when subscribers are added.
- If the performance of your subscription management application is a concern, use multiple subscription management applications or a multithreaded subscription management application.
Notification Services creates indexes on the subscriber data, so you do not need to create indexes on this data to optimize subscriber management performance.
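The multithreaded approach recommended above can be sketched as follows. This is only an illustration of spreading add operations across worker threads; it does not use the Notification Services API. The `add_subscriber` function is a hypothetical stand-in for the real call that writes one subscriber record, and the sketch assumes that call is safe to invoke from several threads at once.

```python
from concurrent.futures import ThreadPoolExecutor


def add_subscriber(subscriber_id, store):
    """Hypothetical stand-in for one call that adds a subscriber record.

    A real application would write to the instance database here.
    """
    store.append(subscriber_id)  # list.append is thread-safe in CPython


def load_subscribers(subscriber_ids, worker_count, add=add_subscriber):
    """Add every subscriber using `worker_count` threads and return the store."""
    store = []
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # list() drains the results so any worker exception is raised here.
        list(pool.map(lambda subscriber_id: add(subscriber_id, store), subscriber_ids))
    return store
```

A call such as `load_subscribers(ids, worker_count=4)` issues the adds from four threads. Whether this helps in practice depends on where the bottleneck is; in the tests described here, the gain appeared when the application ran on a separate server from the databases.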
Subscription Management Performance
Just as with subscriber management, the Notification Services API contains classes for managing subscriptions. Applications that manage subscriptions use these classes to add, update, and delete subscriptions in application databases. The subscription management application can be located on the same server as the databases, or on a separate server.
Subscription data contains the subscriber ID, information about what the subscriber is interested in (such as a stock symbol and trigger price), and possibly some schedule information that indicates when a notification should be delivered.
Subscription Management Testing Methodology
To obtain performance data for subscription management, we used a C# subscription management application. This application was located on the same server as the Notification Services instance. We ran the following tests using this application:
- Load 100,000 unscheduled (event-driven) subscriptions to an empty application, one subscription per subscriber. Vary the size of the subscriptions to determine how subscription size affects performance. This test approximates the task of adding an initial set of subscriptions to an application.
- Load 100,000 subscriptions to an application using a subscription size of 200 characters, but varying the type of subscription (event-driven or scheduled). This test determines whether there is a performance impact for scheduled subscriptions.
This series of tests approximates the number of subscriptions that can be added per second, depending on subscription size and subscription type. The size of the subscription (number of characters) varies with the application. Additionally, a scheduled subscription has some overhead.
In this section, we are not providing performance numbers for adding, updating, or deleting subscriptions. Based on the data we collected, the rate at which you can add additional subscriptions is very close to the rate at which you can load the initial subscriptions. There is some performance degradation when updating or deleting subscriptions, but this is similar to the degradation of updating or deleting subscribers, which is covered in the previous subsection.
These tests used the three hardware configurations DualProc, QuadProc, and Combo. On the DualProc and QuadProc systems, the subscription management application was located on the same server as the Notification Services instance and the databases. On the Combo system, the databases were located on the QuadProc server and the subscription management application was located on the DualProc server.
For each test, we measured the elapsed time to complete the operation. We then computed the rate at which subscriptions were added. All results are for a single-threaded client.
Subscription Management Test Results
The following sections show the results of the two subscription management tests.
Test 1: How Subscription Size Affects Performance
Three sets of 100,000 event-driven subscriptions were added to an application. In each set, all subscriptions were the same size: either 20, 200, or 1000 characters per subscription. The following graph shows how many subscriptions could be added per second for each subscription size on each test system.
Figure 6. Results for adding subscriptions of various sizes
The test results show that as the subscription size gets larger, the application is able to add fewer subscriptions per second. The QuadProc is consistently the fastest system, because it has a fast disk subsystem and the databases are local to the subscription management application. The Combo has a fast disk subsystem, but the subscription management application is on a separate server from the databases. The DualProc configuration with the single disk is the slowest performer.
Test 2: How Subscription Type Affects Performance
Two sets of 100,000 subscriptions were added to an application, and each subscription contained 200 characters of data. The first set contained event-driven subscriptions and the second set contained scheduled subscriptions. The following graph shows how many subscriptions could be added per second for each subscription type on each test system.
Figure 7. Results for adding event-driven and scheduled subscriptions
The results show there is a minimal decline in performance when using scheduled subscriptions.
Subscription Management Test Conclusions
Based on the subscription management test results, we drew the following conclusions about subscription management performance:
- Performance decreases as the subscription size increases.
- There is little performance difference between adding scheduled versus event-driven subscriptions.
- Database performance is important for subscription management performance. The systems that had the databases on the QuadProc server performed the best.
- Subscription management performance is best when the subscription management application is on the database server. (However, this is often not the best place for a subscription management application because it consumes resources from the database server and because you usually do not want database servers directly accessible on the Internet; as a result, subscription management applications are often located on a separate Web server.)
Recommendations for Managing Subscriptions
Managing subscriptions is primarily a database operation. Follow the SQL Server guidelines for improving database performance to improve performance when managing subscription data. In particular, the following recommendations are key:
- Make sure your database log files are on a high-performance disk.
- Size the application database appropriately so that the database does not need to autogrow when subscriptions are added. (Your application database must also be large enough to store the volume of unvacuumed events, notifications, and other application data that will accumulate.)
- Create the proper indexes on your subscription data.
- If you want the highest performance from your subscription management application, use multiple subscription management applications or a multithreaded subscription management application. When we used a multithreaded client, we saw a 50% to 100% performance improvement compared to a single-threaded client.
Event Provider Performance
Event providers are components that submit events to Notification Services applications. Event provider performance is important for Notification Services applications that receive very large numbers of events.
Event providers use one of three methods to collect and submit data to the application database:
- Event providers can use an EventLoader class to submit an XML document or a Stream object as a source of events and write the events to the event table.
The standard File System Watcher event provider uses the EventLoader class to submit an XML document from a named folder to an application.
- Event providers can use SQL Server stored procedures to write data from another database table or query to the event table.
The standard SQL Server event provider uses stored procedures to gather events through a user-defined query and submit them to a Notification Services application.
- Event providers can use the Event class to hand off events to an event collector, and then use the EventCollector class to commit the set of events as a batch.
Each of these methods has its own performance implications. The Event class does not require the events to be available in a document or in a database, but when you use the Event class, events are written one at a time to an event collector, and then committed in a batch. Depending on the application, this can be a good solution, although you might not be able to process as many events per second.
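The write-one-at-a-time, commit-as-a-batch pattern described above can be sketched as follows. The `BatchCollector` class is hypothetical and exists only to show the shape of the pattern; it is not the Notification Services Event or EventCollector API, and the `commit_batch` callable stands in for whatever writes a batch to the event table.

```python
class BatchCollector:
    """Hypothetical buffer that mimics the write-then-commit pattern."""

    def __init__(self, commit_batch):
        # commit_batch is a callable that receives the list of pending events.
        self._commit_batch = commit_batch
        self._pending = []

    def write(self, event):
        """Hand off a single event; nothing is committed yet."""
        self._pending.append(event)

    def commit(self):
        """Commit all pending events as one batch and return how many were sent."""
        batch, self._pending = self._pending, []
        if batch:
            self._commit_batch(batch)
        return len(batch)


# Usage: three events are written individually, then committed as one batch.
committed_batches = []
collector = BatchCollector(committed_batches.append)
for price in (10.5, 11.0, 11.25):
    collector.write({"symbol": "XYZ", "price": price})  # illustrative event fields
events_sent = collector.commit()
```

Each `write` call is a separate hand-off, which is why this approach carries more per-event overhead than submitting a whole document or query result at once.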
Event Provider Testing Methodology
To obtain performance data for event providers, we ran the following tests on the File System Watcher event provider, the SQL Server event provider, and a custom event provider that uses the Event and EventCollector classes:
- Vary the event size while keeping the number of events per batch the same, to determine how event size affects performance. Individual tests use 25, 50, 100, 250, 500, 1000, or 2000 characters per event, and each event batch contains 10,000 events.
- Vary the event batch size while using a constant event size, to determine how event batch size affects performance. Individual tests use 1, 10, 100, 1000, or 10,000 events per batch, and each event contains 250 characters.
All event provider tests use Notification Services Enterprise Edition; performance is comparable with Notification Services Standard Edition.
Event Provider Test Results
The following sections contain the results of the event provider tests. There are three event providers, and two tests per event provider. The test results are grouped by event provider type.
File System Watcher Event Provider Tests
This section contains the results of running the event provider tests against the File System Watcher event provider. The File System Watcher monitors a folder for events submitted in XML files.
Test 1: How Event Size Affects File System Watcher Event Provider Performance
Files containing XML event data, each containing 10,000 events, were dropped to the event collection folder. In each file, all events were the same size: either 25, 50, 100, 250, 500, 1000, or 2000 characters per event. The following graph shows how many events were collected per second for each event size on each test system.
Figure 8. File System Watcher results for adding events of varying sizes
The results of this test show that performance declines as the event size increases. They also show that performance is best when the databases are on the four-processor server (in the QuadProc and Combo systems). Performance was lowest on the DualProc system.
Note that the File System Watcher submits all events in one batch. When the event provider is on a remote server (not the database server), networking overhead might be a consideration. However, because all the events are submitted at once, networking overhead has a minimal impact on performance.
Test 2: How Batch Size Affects File System Watcher Event Provider Performance
Files containing XML event data were dropped to the event collection folder. In each file, all events contained 250 characters of data. Individual files contained 1, 10, 100, 1000, or 10,000 events; all events in a file were submitted in one batch. The following graph shows how many events were collected per second for each batch size on each test system.
Figure 9. File System Watcher results for adding events in varying batch sizes
The results of this test show that as the batch size increases, more events can be processed per second. This is true for all systems for up to about 1000 events per batch. On the DualProc system, performance levels off at 1000 events per batch; performance continues to improve on the other systems.
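One way to reason about this curve is a simple cost model in which every batch pays a fixed overhead and every event pays a fixed cost. This model is an assumption made here for illustration, not something measured in these tests, and the constants in the usage example are hypothetical rather than taken from the test data.

```python
def events_per_second(batch_size, per_batch_seconds, per_event_seconds):
    """Estimate throughput when each batch costs a fixed overhead plus a per-event cost.

    Both cost parameters are hypothetical inputs, not measured values.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch_seconds = per_batch_seconds + batch_size * per_event_seconds
    return batch_size / batch_seconds


# Illustrative constants only: 50 ms per batch, 0.5 ms per event.
estimated_rates = {
    size: events_per_second(size, per_batch_seconds=0.05, per_event_seconds=0.0005)
    for size in (1, 10, 100, 1000, 10000)
}
```

Under this model, throughput rises with batch size because the fixed overhead is spread over more events, and it flattens as it approaches the ceiling of one event per `per_event_seconds`. A system with a higher per-event cost would be expected to level off sooner.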
SQL Server Event Provider Tests
This section contains the results of running the event provider tests against the SQL Server event provider. The SQL Server event provider uses a stored procedure to run a Transact-SQL query. The query returns data, which the event provider then submits to the application.
Test 1: How Event Size Affects SQL Server Event Provider Performance
For each iteration of this test, a Transact-SQL query produced 10,000 events of the same size. Individual tests used 25, 50, 100, 250, 500, 1000, or 2000 characters per event. The following graph shows how many events could be collected per second for each event size on each test system.
Figure 10. SQL Server results for adding events of varying sizes
The results of this test show that performance declines as the events get larger. When the databases are located on the DualProc server with the single disk, performance is significantly lower. This is because the events are being gathered from and submitted to database tables, so the performance of the database system greatly affects the performance of the SQL Server event provider.
Test 2: How Batch Size Affects SQL Server Event Provider Performance
For each iteration of this test, a Transact-SQL query produced events containing 250 characters of data. Queries for individual tests produced an event batch size of 1, 10, 100, 1000, or 10,000 events. The following graph shows how many events were collected per second for each batch size on each test system.
Figure 11. SQL Server results for adding events in varying batch sizes
The results of this test show that the SQL Server event provider is more efficient with large event batches than with small event batches.
Event and EventCollector Class Tests
This section contains the results of running the event provider tests using the Event and EventCollector classes to programmatically submit events to an application.
Test 1: How Event Size Affects Event Class Performance
For this test, we used the Event class to create 10,000 events of the same size and then submit the events as a batch using the EventCollector class. In individual tests, the events contained 25, 50, 100, 250, 500, 1000, or 5000 characters per event. The following graph shows how many events were collected per second for each event size on each test system.
Figure 12. Event class results for adding events of varying sizes
Using the event classes, fewer events are processed per second than with the other event providers tested. However, as with the other event providers, the larger the events, the fewer you can submit per second.
For the event providers that use the event classes, locating the Notification Services instance on a separate server from the databases has a negative impact on performance. This is because the event provider submits individual events, which adds networking overhead for scaled-out systems. The SQL Server event provider runs on the database server, so there is no significant network overhead for this provider. The File System Watcher event provider submits a batch of events in one XML document, so the network overhead for this provider is minimal.
Test 2: How Batch Size Affects Event Class Performance
For this test, we used the Event class to create events containing 250 characters and then submit event batches containing 1, 10, 100, 1000, or 10,000 events using the EventCollector class. The following graph shows how many events were collected per second for each event batch size on each test system.
Figure 13. Event class results for adding events in varying batch sizes
Fewer events are processed per second than with the other event providers tested. However, event providers that use the event classes are relatively efficient at processing small batches. When each batch contains just a few events, this event provider demonstrates performance comparable to the SQL Server event provider, and better than the File System Watcher event provider.
Note that the QuadProc and Combo systems had almost identical performance. Using the multithreaded application on the Combo server improved performance significantly from the previous test.
Event Provider Test Conclusions
The results of the event provider tests lead to the following conclusions about event provider performance:
- Event collection becomes more efficient as the batches contain larger numbers of events.
- Event collection is more efficient when individual events are smaller.
- The File System Watcher and SQL Server event providers provide similar performance. The event classes are significantly slower than the standard event providers unless the batches contain only a few events.
- Hosting the databases on a robust system with ample processing power, and locating the data and log files on separate physical disks, improves event collection performance.
- If an application located on a remote server uses the event classes to submit events, the overhead of writing the individual events reduces performance. However, this can be overcome by using a multithreaded event collection application.
Recommendations for Gathering Events
Based on these tests and general recommendations from Microsoft, do the following to improve event provider performance:
- Ensure that the database system has adequate processing power and RAM. You can use the SQL Server performance counters to ensure that your system is not running out of processing resources or memory.
- Ensure that the database data and log files are located on separate physical disks.
- Use the right event provider for the job:
  - If the event data is located in XML files, use the File System Watcher event provider. The File System Watcher is more efficient than the event classes.
  - If the event data is located in database tables, use the SQL Server event provider. It is the fastest event provider for this job.
  - If you need to submit individual events using an application, use the event classes.
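The provider selection guidance above can be summarized as a small lookup. The source labels (`"xml_files"`, `"database_tables"`, `"application"`) and the function name are hypothetical, chosen only for this sketch.

```python
# Maps where the event data lives to the event provider recommended above.
# The dictionary keys are hypothetical labels used only for this example.
RECOMMENDED_PROVIDER = {
    "xml_files": "File System Watcher event provider",
    "database_tables": "SQL Server event provider",
    "application": "Event and EventCollector classes",
}


def recommend_event_provider(event_source):
    """Return the recommended provider for a given event source label."""
    try:
        return RECOMMENDED_PROVIDER[event_source]
    except KeyError:
        raise ValueError("unknown event source: %r" % (event_source,))
```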
For more information on building a custom event provider, see "Developing a Custom Event Provider" in Notification Services Books Online.
Generator Performance
Notification Services generates notifications by running one or more rules to match subscriptions to event data. These rules are Transact-SQL statements, which allows for very flexible rule definition.
Notification Services fires a rule either when a new event batch arrives or according to a schedule defined for the rule. Typically, a rule firing produces one batch of raw notification data.
In Notification Services Enterprise Edition, the application developer can define a maximum notification batch size, which reduces batch sizes and produces more notification batches. Multiple distributors can process these batches in parallel, which can improve formatting and distribution performance.
Some applications use more than one rule to generate notifications. An application developer might define notification generation rules to support various subscription types, or to optimize performance by allowing the same rule to run in parallel using multiple generator threads.
Generator Testing Methodology
The performance tests in this section measure generator performance when varying the notification batch size and when varying the number of generator threads.
To evaluate notification generation performance, we performed the following tests on each of the three systems described earlier, using relatively simple event rules:
- Measure generator performance when generating 10,000 notifications from one batch of events using one generator thread. Vary the number of resulting notification batches from 1 to 32. Use the results to determine whether performance is affected by generating more, but smaller, notification batches.
The <NotificationBatchSize> element limits the number of notifications that can be included in a batch. Increasing the number of notification batches might improve distribution performance by allowing multiple distributors to work on multiple notification batches simultaneously.
- Measure generator scalability by using eight event rules, each generating one batch of 2,500 notifications (for a total of 20,000 notifications). Vary the number of generator threads to see how the generator performs under different settings. (Multiple generator threads enable the application to run rules in parallel.)
Because the second test requires Notification Services Enterprise Edition, we used Enterprise Edition for both tests.
Generator Test Results
The following sections show the results of the generator tests.
Test 1: Measure Generator Performance When Producing Various Numbers of Notification Batches
In this test, we generated 10,000 notifications. The following graph shows the performance of the generator when generating 1, 2, 4, 8, 16, and 32 batches out of the 10,000 notifications.
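The relationship between a maximum batch size and the resulting number of batches can be sketched as follows. The sketch assumes that the limit simply caps each batch and that batches are filled in order; it does not describe how Notification Services assigns notifications to batches internally, and the function names are hypothetical.

```python
def batch_count(total_notifications, max_batch_size):
    """Number of batches needed when no batch may exceed max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return -(-total_notifications // max_batch_size)  # integer ceiling division


def batch_size_for(total_notifications, desired_batches):
    """Smallest maximum batch size that yields at most desired_batches batches."""
    if desired_batches < 1:
        raise ValueError("desired_batches must be at least 1")
    return -(-total_notifications // desired_batches)  # integer ceiling division


# Maximum batch sizes that split 10,000 notifications into 1 to 32 batches.
sizes = {count: batch_size_for(10000, count) for count in (1, 2, 4, 8, 16, 32)}
```

Under this assumption, a limit of 2,500 notifications per batch splits the 10,000 notifications into four batches, and a limit of 313 splits them into 32 batches, the last of which is slightly smaller than the others.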
Figure 14. Generator results with varying batch sizes
On all systems, performance declined as the number of batches increased. However, the decline was relatively small.
The DualProc system could generate approximately 850 notifications per second when producing only one batch of notifications. This number declined to about 650 notifications per second when producing 32 batches of notifications.
The performance of the QuadProc and Combo systems was very similar. Both systems could generate about 1100 notifications per second when producing only one batch of notifications. The number declined to 950 notifications per second when producing 32 batches.
This test shows that the overhead of creating more batches is low, and that spreading the database data and log files over multiple disks, as is done on the QuadProc server, improves performance.
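To put the decline in proportion, the approximate figures quoted above can be turned into a relative drop. The following is a minimal sketch, not part of the test harness; the function name and the dictionary are illustrative, and the rates are the rounded values from the text rather than exact readings from Figure 14.

```python
def relative_decline(start_rate: float, end_rate: float) -> float:
    """Return the fractional drop in throughput from start_rate to end_rate."""
    return (start_rate - end_rate) / start_rate


# Approximate notifications per second when producing 1 batch and 32 batches,
# as quoted in the text (rounded values, not exact measurements).
results = {
    "DualProc": (850, 650),
    "QuadProc/Combo": (1100, 950),
}

declines = {name: relative_decline(one, many) for name, (one, many) in results.items()}

for system, decline in declines.items():
    print(f"{system}: {decline:.1%} lower at 32 batches than at 1 batch")
```

With these rounded inputs, the drop is about 24% on the DualProc system and about 14% on the QuadProc and Combo systems for a 32-fold increase in the number of batches.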
Test 2: Measure Generator Scalability
For this test, we generated eight batches of 2,500 notifications each using eight event rules. We ran the test four times: with 1, 2, and 4 generator threads, and with the number of threads determined by Notification Services (a <ThreadPoolSize> value of 0). The following graph shows the performance of the generator for each generator threadpool setting on each test system.
Figure 15. Generator results with varying threadpool sizes
The best performance occurs when the number of generator threads is set equal to or double the number of processors on the server that hosts the databases. On the DualProc system, the database is running on a server with two processors and only one physical disk; performance is significantly lower on this system.
For more information about how Notification Services prioritizes and fires rules, see "Generator Settings Considerations" in Notification Services Books Online.
Generator Test Conclusions
The results of the generator tests lead to the following conclusions about generator performance:
- Generator performance is dependent on processing power and on the performance of the database system's disk subsystem.
- Setting a maximum notification batch size to create more notification batches has a relatively minor impact on generator performance.
- Increasing the number of generator threads can significantly improve performance.
For more information about optimizing database performance, see "Writing Efficient Notification Generation Queries" in Notification Services Books Online.
Recommendations for Generating Notifications
Based on these tests and general recommendations from Microsoft, do the following to improve generator performance:
- Analyze the performance of your rules by running the NSNotificationBatchDetails reporting stored procedure; use the GenerationTimeInMS value. Rule firing performance has the biggest impact on notification generation.
- Optimize the performance of your rules by using indexes on the event and subscription tables. You can use the SQL Server Index Tuning Wizard to determine which indexes can improve the performance of your rules.
- When determining a limit for notification batch sizes, balance the efficiencies of scale with the ability to share work among multiple distributors. For example, if your system uses two distributors, you might want to aim for four batches of notifications per generator rule firing so that the workload can be balanced across the distributors. However, very small batch sizes are inefficient. Also remember that digest and multicast delivery work within notification batches, so if you produce too many batches, you will not use digest or multicast delivery efficiently.
- When possible, run multiple rules in parallel on a multiprocessor system. Configure the number of threads using the <ThreadPoolSize> value in the ADF. (This is configurable with Notification Services Enterprise Edition.) A good rule of thumb is to use between one and two times the number of CPUs on the database server. Alternatively, set the <ThreadPoolSize> value to 0 (zero) to let Notification Services determine the optimum value.
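The batch-size and thread-count guidelines above reduce to simple arithmetic. The following is a minimal sketch for planning purposes only. The helper names are hypothetical, the figure of 10,000 notifications per rule firing is an assumed workload, and the results are starting points for testing, not values that Notification Services computes.

```python
import math
from typing import Tuple


def batch_size_for(notifications_per_firing: int, target_batches: int) -> int:
    """Smallest batch size limit that yields at most target_batches batches."""
    return math.ceil(notifications_per_firing / target_batches)


def thread_pool_range(db_server_cpus: int) -> Tuple[int, int]:
    """Rule of thumb from the text: one to two times the database server CPU count."""
    return (db_server_cpus, 2 * db_server_cpus)


# Assumed workload: 10,000 notifications per rule firing, two distributors,
# aiming for four batches per firing so the work can be shared.
print(batch_size_for(10_000, 4))   # candidate <NotificationBatchSize> value
print(thread_pool_range(4))        # candidate generator <ThreadPoolSize> range
```

Under these assumptions the candidate batch size limit is 2,500 and the candidate thread range for a four-processor database server is 4 to 8. Confirm either value by testing, because very small batches are inefficient and reduce the benefit of digest and multicast delivery.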
After Notification Services generates a batch of notifications, the batch is ready for the distributor. The distributor partitions the batch into work items, which are notifications from a notification batch that will be sent through the same delivery channel. The distributor then formats and distributes the notifications. Separating a batch into work items allows the distributor to efficiently process notifications in parallel.
When the distributor formats notifications, it takes the raw notification data and applies formatting based on the destination device and locale. The formatted notifications are distributed through delivery channels and handed off to external delivery services. A delivery channel specifies a delivery protocol, such as SMTP, and the delivery information, such as the address and authentication information.
The performance of a Notification Services application is typically limited by the choice of delivery protocol. Notification Services can usually generate and format notifications much faster than any delivery protocol can deliver them.
Distributor Testing Methodology
To evaluate distributor formatting and distribution performance, we performed the following tests on each of the systems:
- Format 10,000 notifications of various sizes using the XSLT content formatter. We discarded the formatted notifications without delivering them so we could focus solely on formatter performance.
- Distribute 5000 notifications using each of the three standard delivery protocols to determine how many notifications can be distributed per second on the test systems.
- Determine the performance impact of digest delivery when formatting notifications. We formatted 10,000 notifications and varied the number of notifications to be included within a digest message from 1 to 10. Again, we discarded the formatted notifications without delivering them, to focus solely on digest formatter performance.
- Determine the performance impact of multicast delivery by sending 2000 notifications using the SMTP protocol while varying the multicast ratio from 1 to 100 notifications per multicast message.
Distributor Testing Results
The following sections show the results of the four distributor tests.
Test 1: Formatting Notifications of Various Sizes using the XSLT Formatter
For this test, we formatted 10,000 notifications using the XSLT content formatter. In individual tests, all formatted notifications were the same size: either 25, 50, 100, 250, 500, 1000, or 2000 characters per message. The <ThreadPoolSize> value was set to one so that only one distributor thread was running on each of the test systems. The following graph shows how many notifications were formatted per second on each test system.
Figure 16. Formatting results with varying notification sizes
The results of this test show that as the size of the formatted notification increases, the content formatter produces fewer formatted notifications per second. On our test systems, the XSLT content formatter was able to format between 400 and 500 notifications per second when the formatted notification size was 500 characters or fewer. When the formatted notification size was 2000 characters, the XSLT content formatter was able to format between 100 and 200 notifications per second.
Test 2: Distributing Notifications Using Three Built-In Delivery Protocols
We distributed 5000 notifications with minimal formatting using a single-threaded distributor. In individual tests, we distributed the notifications using one of the three standard delivery protocols: File, HTTP Extension, and SMTP.
The File protocol simply writes the formatted notifications to a file at a specified location. For this test, the file was located on the local server. The HTTP extension protocol posts a file to a Web server using HTTP. The SMTP protocol routes resulting notifications to an SMTP server.
The following graph shows how many notifications were distributed per second for each delivery protocol on each test system.
Figure 17. Distributor results for standard delivery protocols
This test shows that the File protocol is the most efficient, followed by the HTTP extension, and then SMTP. However, the delivery protocols used by an application are often determined by user requirements, not by protocol efficiency. Knowing how efficient a delivery protocol is at delivering notifications will help you plan the capacity for your system. If an application requires greater capacity, you can scale out distribution. For more information, see "Case Study: Stock Trading Application" later in this paper.
Note that increasing the number of distributor threads can improve performance. In our tests, there was a 25% to 50% improvement for the HTTP extension and SMTP delivery protocols when the distributor was multithreaded.
Test 3: Formatting Notifications Using Digest Delivery
In this test, we turned on digest delivery and formatted 10,000 notifications using the XSLT content formatter and a single-threaded distributor. Each notification was 200 characters when formatted. Individual tests used different notification-to-digest message ratios: either 1, 2, 3, 4, 5, or 10 notifications per digested message. This was done by modifying the size of the notification batch.
The following graph shows how many notifications were formatted per second for different notification-to-digest message ratios.
Figure 18. Distributor results for digest delivery
This test shows that using digest delivery reduces formatting performance, especially when there are relatively few notifications per digested message. However, digesting can be advantageous when using an expensive delivery protocol such as SMTP because digesting decreases the number of messages that the delivery protocol must send.
Test 4: Formatting and Distributing Notifications Using Multicast Delivery
In this test, we turned on multicast delivery, and then formatted and distributed 2000 notifications using the SMTP delivery protocol with a single-threaded distributor. Individual tests used various notification-to-multicast message ratios. The first test sent one message to one recipient. The second test sent one message to two recipients: the distributor formatted a notification once and sent it to two subscribers. Subsequent tests sent one message to 4, 10, and 100 recipients.
The following graph shows how many notifications were formatted and distributed per second on each test system.
Figure 19. Distributor results for multicast delivery
The results of this test show the dramatic benefit of using multicast with expensive delivery protocols like SMTP. Note that not all delivery protocols support multicast delivery.
Distributor Testing Conclusions
The results of the distributor tests lead to the following conclusions about distributor performance:
- The size of the formatted notification affects performance. The larger the message, the fewer that can be formatted per second. To improve formatting performance, reduce the amount of work that the formatter must do.
- Of the built-in delivery protocols, the File delivery protocol is the most efficient, followed by the HTTP extension protocol, and then SMTP. Often you must use one particular delivery protocol. In the cases of HTTP and SMTP, be aware of the performance implications, and consider using multiple distributors.
- Digesting notifications has a negative impact on formatter performance when few messages are included in each digested message. When possible, increase the number of notifications included in each digest message by increasing the notification batch size limit. Remember that when using digest delivery effectively, fewer messages need to be distributed, which has a positive performance impact for expensive delivery protocols.
- Multicast delivery has a positive impact on performance, especially when more messages are included in each multicast. This is especially useful for expensive delivery protocols. The same message will be formatted and distributed for multiple subscribers, which can greatly improve distributor performance.
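The benefit of digest and multicast delivery comes from the reduction in the number of messages that the delivery protocol must send. The following is a minimal sketch of that relationship. The function name is hypothetical, and it assumes the best case in which every message carries the full number of notifications, which depends on how notifications fall within each batch.

```python
import math


def messages_to_send(notifications: int, notifications_per_message: int) -> int:
    """Best-case message count when each message carries the full ratio."""
    return math.ceil(notifications / notifications_per_message)


# Ratios used in the multicast test: 2000 notifications sent over SMTP.
for ratio in (1, 2, 4, 10, 100):
    print(f"{ratio:>3} per message -> {messages_to_send(2000, ratio)} messages")
```

At a ratio of 100, the 2000 notifications in the multicast test require as few as 20 SMTP messages instead of 2000.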
Recommendations for Notification Formatting and Delivery
Be aware that peak delivery periods and latency requirements can have a significant impact on your application design. Consider the following when designing your applications:
- If the rate of notification generation and delivery is relatively consistent throughout the day, your primary concern is the average delivery rate of the delivery protocol.
- If there are peak periods for notification generation and delivery, you must plan your system around these peak loads.
- If your application has strict latency requirements for delivering notifications, you must plan your system around these requirements.
To improve the formatting and distribution performance of your application, follow these recommendations, which are listed in order of importance:
- Pick the right delivery protocol. Delivery protocol performance is usually the critical factor controlling the performance of your system. For example, an SMTP mail server might process 20 messages per second, while HTTP posting might process over 200 messages per second.
- Increase the number of distributors. Notification Services Enterprise Edition allows you to improve performance by configuring two or more distributors. For example, by increasing the number of distributors from one to three, we increased the number of messages posted using HTTP from 250 per second to 700 per second. When you configure multiple distributors, modify the <NotificationBatchSize> value in the ADF to ensure that the generator produces smaller batches. This allows the distributors to work on different batches and thus share the work more evenly.
- If appropriate, use digest delivery or multicast delivery. While the use of these options is application- and protocol-dependent, digesting messages or using multicast can reduce the formatting and distribution load.
- Increase the number of distributor threads. If the content formatter is expensive compared to the delivery protocol, increasing the number of distributor threads can improve performance. If the distributor is running on a single server, set the distributor <ThreadPoolSize> value to 0 (Enterprise Edition) or between 1 and 3 on Standard Edition. If the distributor is running on multiple servers, use fewer distributor threads so that the distributors will balance their workloads. For example, you might try setting the <ThreadPoolSize> value equal to the number of CPUs on the server.
- If content formatting is complex, reduce the complexity or increase the CPU resources. In most applications, the content formatters do relatively simple formatting and consume relatively little CPU time. If your formatting is very complex, you can improve performance by running it on a system with fast processors.
- Improve database server performance. Each time Notification Services attempts to deliver a notification, it updates the status of the notification in the database to reflect the results of the delivery attempt. This may result in significant database update activity. You can improve performance by placing the database log files on a dedicated disk or a RAID disk array.
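The scale-out figures quoted above (250 messages per second with one distributor, 700 with three) can be used for a first estimate of how many distributors a target rate requires. The following is a minimal sketch. The function names and the target of 1,000 messages per second are hypothetical, and it assumes that the scale-out efficiency observed at three distributors holds for other counts, which these tests did not measure.

```python
import math


def scaling_efficiency(single_rate: float, distributors: int, combined_rate: float) -> float:
    """Observed combined throughput as a fraction of perfect linear scaling."""
    return combined_rate / (distributors * single_rate)


def distributors_needed(target_rate: float, single_rate: float, efficiency: float) -> int:
    """Estimated distributor count, assuming constant scale-out efficiency."""
    return math.ceil(target_rate / (single_rate * efficiency))


# Figures from the text: 250 messages/sec with one distributor, 700 with three.
efficiency = scaling_efficiency(250, 3, 700)
print(f"scale-out efficiency: {efficiency:.0%}")

# Hypothetical target rate of 1,000 messages per second.
print(distributors_needed(1000, 250, efficiency))
```

With these inputs the efficiency is about 93%, and the estimate for 1,000 messages per second is five distributors. Treat the result as a starting point for testing, because the delivery service or the database server can become the limiting factor first.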
Each of the previous tests looked at a portion of an application in isolation. To determine the performance of a complete Notification Services application, we developed a sample application and measured the number of notifications it generated, formatted, and then sent per second.
The sample application is a stock trading application in which notifications are sent to subscribers when a stock price crosses a user-defined trigger price. The application was designed as follows:
- We collected events for 1000 stocks. There was a stock quote for each stock each minute, so we collected 1000 events per minute.
- We had 200,000 subscribers, each with five subscriptions, totaling one million subscriptions. Some stock symbols had more subscriptions than others.
- We used an XSLT content formatter to format the notifications.
- We used a custom delivery channel using the HTTP extension delivery protocol to distribute notifications.
- We used four notification generation rules for different types of triggers, such as "stop" and "limit" orders.
- We set a maximum notification batch size of 2500. Each notification generation rule produced four batches of notifications each minute, for a total of 16 notification batches for the application each minute, or an average of 40,000 notifications per minute.
- Each event was stamped with an arrival time, and each notification had a delivery time. (The difference in time was the latency.)
To analyze this application, we determined how many notifications were sent per second. Knowing that delivery protocol performance is often the key to application performance, we ran the test three times using one, two, and three distributor servers. The following graph shows the results of these tests.
Figure 20. Case study results for 1, 2, and 3 distributors
This application generates a large number of notifications, more than most applications. Using one or two distributors, the distributors could not send notifications as fast as the application could produce them. Using three distributor servers, the distributors were able to keep up, sending approximately 700 notifications per second.
In each test, the latency of delivering notifications was less than two minutes.
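The load in this case study follows directly from the application design. The following is a minimal sketch of the arithmetic. The constant and function names are illustrative, and it assumes every batch is filled to the maximum batch size, which is how the 40,000 notifications per minute average is reached.

```python
RULES = 4                          # notification generation rules
BATCHES_PER_RULE_PER_MINUTE = 4    # batches each rule produced per minute
MAX_BATCH_SIZE = 2500              # maximum notification batch size

batches_per_minute = RULES * BATCHES_PER_RULE_PER_MINUTE
notifications_per_minute = batches_per_minute * MAX_BATCH_SIZE
required_rate = notifications_per_minute / 60  # notifications per second


def keeps_up(delivery_rate: float) -> bool:
    """True if the distributors send at least as fast as the generator produces."""
    return delivery_rate >= required_rate


print(batches_per_minute, notifications_per_minute, round(required_rate, 1))
print(keeps_up(700))  # approximate rate measured with three distributor servers
```

The generator produces about 667 notifications per second, so the approximately 700 notifications per second delivered by three distributor servers is just enough to keep up.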
When you design a system to support your Notification Services applications, you design it around the peak usage periods. For example, if your peak load begins at 8:00 in the morning, with your application sending an average of 300 notifications per second for the next 30 minutes, you must design the application around this peak period. If your system is designed for a maximum of 100 notifications per second, you will have a large backlog of notifications after this peak period, so the latency between receiving an event and sending the notification can be significant.
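The size of the backlog in this example can be estimated with a short calculation. The following is a minimal sketch. The function names are hypothetical, and the drain time assumes that no new notifications are generated after the peak ends unless a rate is supplied.

```python
def backlog_after_peak(arrival_rate: float, capacity: float, peak_seconds: float) -> float:
    """Notifications still waiting when the peak period ends."""
    return max(0.0, (arrival_rate - capacity) * peak_seconds)


def drain_seconds(backlog: float, capacity: float, arrival_rate_after_peak: float = 0.0) -> float:
    """Time needed to clear the backlog with the capacity left over after the peak."""
    spare = capacity - arrival_rate_after_peak
    if spare <= 0:
        raise ValueError("the backlog never drains: no spare capacity")
    return backlog / spare


# Example from the text: 300 notifications/sec for 30 minutes,
# on a system designed for a maximum of 100 notifications/sec.
backlog = backlog_after_peak(300, 100, 30 * 60)
print(backlog)                             # notifications waiting at 8:30
print(drain_seconds(backlog, 100) / 60)    # minutes to clear the backlog
```

Under these assumptions, 360,000 notifications are waiting when the peak ends, and the last of them is sent about an hour later.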
To design the system, typically you should analyze performance for three application functions: event collection, notification generation, and notification formatting and delivery. Because many applications use expensive delivery protocols, and sometimes use expensive formatting, the performance of the distributor is often the key factor in overall application performance.
You can use the graphs in this paper to estimate the performance of the components for your system. (If you use custom components, consider running similar tests of your own.) Estimate the number of events collected and notifications generated per second, and then look at the graphs in this paper to determine whether a system similar to the ones we used will support the throughput.
In some cases, you may not know the number of notifications an application will generate. You can make an estimate based on the number of subscriptions that you expect will produce a notification each day. For example, if your application has 20 million subscriptions, and you think each subscription will get one notification a day, then you need a system that can handle 20 million notifications a day. If there are peak periods for notification delivery, estimate the percentage of subscriptions that will produce a notification during those periods.
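These estimates convert to a required delivery rate as follows. The following is a minimal sketch. The function names are hypothetical, and the peak figures (25% of subscriptions producing a notification within a 30-minute window) are assumed values chosen only to illustrate the calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(notifications_per_day: float) -> float:
    """Average notifications per second if the load is spread evenly over the day."""
    return notifications_per_day / SECONDS_PER_DAY


def peak_rate(subscriptions: float, fraction_in_peak: float, peak_seconds: float) -> float:
    """Notifications per second during a peak period."""
    return subscriptions * fraction_in_peak / peak_seconds


# Example from the text: 20 million subscriptions, one notification each per day.
print(round(average_rate(20_000_000), 1))

# Assumed peak: 25% of subscriptions produce a notification within 30 minutes.
print(round(peak_rate(20_000_000, 0.25, 30 * 60), 1))
```

Spread evenly, 20 million notifications a day is about 231 notifications per second. With the assumed peak, the required rate rises to about 2,778 notifications per second, which is why peak periods drive the system design.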
Use the information in the graphs and in the recommendation sections in this paper to improve the performance of your system where necessary. In addition, the following guidelines should help you design a high-performing system:
- Choose the correct edition of Notification Services for your applications. If you need the following features, you must use Notification Services Enterprise Edition:
- Multiple generator threads, which can increase the number of notifications an application can generate.
- Use of more than three distributor threads, which can improve the performance of formatting and distributing individual notification batches.
- Scaling out of formatting and distribution across multiple servers, which is often necessary if your application uses an expensive delivery protocol and sends many notifications per second during peak periods.
- Limitations on notification batch size, which can improve performance when using multiple distributors.
- Multicasting, which allows the application to send one message to multiple subscribers, decreasing the formatting and distribution load when many subscribers want the same information.
- Place your Notification Services databases on your most powerful system, because database performance is key to Notification Services performance. The optimal system has a fast disk subsystem with multiple physical disks, adequate processing power, and abundant memory. Use the systems described in this paper as a guideline, and monitor disk, processor, and memory usage with Windows performance counters.
In addition, use the following guidelines when designing the system:
- Place database log files on their own dedicated physical disk. Also place the tempdb database on its own dedicated physical disk.
- If your application uses two or more notification generation rules that can be fired in parallel, select a multiprocessor server for your database server and configure two or more generator threads. This will permit the generator to fire the rules in parallel.
- If a database runs out of disk space, SQL Server can autogrow the file by an amount determined when you created the database files in the ADF and the configuration file. However, the database autogrow operation is expensive, so allocate ample space for your database files when you initially define them.
- Choose the proper servers for the distributors. When posting messages using HTTP, one distributor can post about 200 notifications per second. Because formatting and distribution is CPU-intensive, consider using dual-processor servers with 1 GB of RAM.
- For most applications, event providers can run on the same server as the generator. This is usually sufficient unless the volume of events is very high.
If you use a custom event provider, consider placing it on a server similar to the distributor server. You can use the same server for the distributor and the event provider if your event and notification load is not too high. Use tests similar to the ones shown in this document to determine the performance of your custom event providers.
- The generator and the SQL Server event provider are primarily database components. Therefore the performance of the database server has the biggest impact on the performance of these components, not the server that hosts them.
To get started with system design, you might find it useful to start with one of our standard configurations:
- Place the databases and Notification Services on one dual-processor server.
- Place the databases and Notification Services on one quad-processor server.
- Place the databases on the quad-processor server and Notification Services on the dual-processor server.
- Place the databases on the quad-processor server and Notification Services on three dual-processor servers; scale the distributor across the three servers.
For more information about system design and instructions for deploying Notification Services applications on various system configurations, see "Hardware Configurations" in Notification Services Books Online.
Measuring Application Performance
When you planned your application and the host system, you first came up with a preliminary system design. For this system, therefore, you know the number and type of servers and how many disk drives the system will have. Using this information, you can complete your application by specifying the location of the databases, event providers, generators, and distributors.
When you deploy your application for testing, run the following tests to check performance:
Note: When testing an application, include a simple notification generation rule. This will provide a performance baseline when examining more complex rules.
- Add a large set of subscribers, such as 10,000, and measure the time it takes the application to complete this operation. You can design a small, custom application for testing purposes that simply bulk-loads the subscribers.
- Add one subscription per subscriber, for a total of 10,000 subscriptions, and measure the time it takes the application to complete this operation. You can design a small, custom application for testing purposes that simply bulk-loads a set of subscriptions.
- Add an event batch that is similar to a production event batch, and wait for Notification Services to distribute the notifications.
- Use the built-in reporting stored procedures to get average performance data for event collection, generation, and distribution.
- Use the NSQuantumList stored procedure to locate a quantum of interest. If you submit one batch of events, the results of this stored procedure will show you in which quantum or quanta the processing occurred. The following example shows how to run this stored procedure:
EXEC NSQuantumList '2002-12-21 00:04', '2002-12-21 00:08'
The results contain two columns that show how many notifications were generated during each quantum: EventNotificationsGenerated and ScheduledNotificationsGenerated. Look for non-zero values to locate quanta of interest.
- Using a quantum of interest from the previous step, run the NSQuantumDetails stored procedure to find an event batch of interest. For example, if you want information about quantum 2, run this:
EXEC NSQuantumDetails 2
The third result set shows each event batch that was committed during the quantum. Note the StartCollectionTime and StopCollectionTime values for an event batch of interest.
- Run the NSDiagnosticEventClass stored procedure, isolating an event batch of interest. For example, if the StartCollectionTime value is '2002-12-21 00:05:21.380' and the StopCollectionTime value is '2002-12-21 00:05:41.083', run the stored procedure with values similar to this:
EXEC NSDiagnosticEventClass 'MyApp', 'MyEventClass', 1, '2002-12-21 00:04', '2002-12-21 00:06'
The results of this stored procedure contain several values that help you analyze performance:
AvgEventsCollectedPerSecond shows how many events were collected per second.
AvgEventNotificationBatchGenerationTime shows the average latency between collecting the events and creating raw notification data.
AvgEventNotificationBatchWaitTillDistribution shows the average latency between a notification batch becoming available and being picked up for distribution.
AvgEventNotificationBatchSucceedDeliveryTime shows the average latency between picking up a notification batch and successfully sending the notifications.
Additional stored procedures are available to help you analyze performance, such as NSDiagnosticNotificationClass and NSDiagnosticDeliveryChannel. For more information about using the reporting stored procedures, see "Stored Procedure Reference" and "Using Reports to Analyze Performance" in Notification Services Books Online.
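When you run these reports repeatedly, a small helper can derive the reporting window from the collection times. The following is a minimal sketch. The function name and the padding rule (start one minute before the minute in which collection began, end at the next whole minute after collection stopped) are assumptions inferred from the example values shown earlier, not documented behavior; the stored procedure call itself is copied from that example.

```python
from datetime import datetime, timedelta
from typing import Tuple

COLLECTION_FORMAT = "%Y-%m-%d %H:%M:%S.%f"   # format of the collection time values
WINDOW_FORMAT = "%Y-%m-%d %H:%M"             # format used in the report examples


def padded_window(start: str, stop: str, pad_minutes: int = 1) -> Tuple[str, str]:
    """Return a minute-aligned window that encloses the collection interval."""
    start_dt = datetime.strptime(start, COLLECTION_FORMAT)
    stop_dt = datetime.strptime(stop, COLLECTION_FORMAT)
    # Truncate to the minute, then pad backward at the start and forward at the end.
    window_start = start_dt.replace(second=0, microsecond=0) - timedelta(minutes=pad_minutes)
    window_stop = stop_dt.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return window_start.strftime(WINDOW_FORMAT), window_stop.strftime(WINDOW_FORMAT)


begin, end = padded_window("2002-12-21 00:05:21.380", "2002-12-21 00:05:41.083")
statement = f"EXEC NSDiagnosticEventClass 'MyApp', 'MyEventClass', 1, '{begin}', '{end}'"
print(statement)
```

For the collection times in the earlier example, the helper reproduces the same window, 00:04 to 00:06. Adjust the padding if your notifications take longer to generate and distribute.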
After you have completed these basic tests, you can tune performance by adjusting application settings:
- If using Notification Services Enterprise Edition, adjust the <ThreadPoolSize> value to enable multiple generator threads. Try a value of 0 and let Notification Services determine the optimal number of threads.
- Adjust the <ThreadPoolSize> value for the distributor, allowing the distributor to process multiple work items.
- Reduce the <NotificationBatchSize> value to break up the notifications into more batches so that multiple distributors can work on notification batches simultaneously.
- If you do not need the various distributor logging options, explicitly turn them off. For more information, see "Defining the <DistributorLogging> Node" in Notification Services Books Online.
Most application problems concern the speed at which events are collected or at which notifications are generated, formatted, or distributed. Use the following recommendations for troubleshooting poor performance.
Notifications are not being distributed fast enough.
Cause: The delivery channel cannot keep up with the notification load.
Solution: If using a custom delivery protocol, test and tune the performance. If the delivery protocol is optimized, try increasing the <ThreadPoolSize> value for the distributor. If this doesn't work, scale out the distributor onto multiple servers.
Cause: The content formatter is too slow.
Solution: Complex formatting decreases performance; try simplifying the formatting. If you are using a custom content formatter, test and tune the performance of the formatter. The next step is to try increasing the <ThreadPoolSize> value for the distributor. If this doesn't work, scale out the distributor onto multiple servers.
Notifications are not being generated fast enough.
Cause: The notification generation rule is complex.
Solution: Complex rules need tuning. Using SQL Server Query Analyzer, run the rule and display the execution plan. You can also use the rule in the Index Tuning Wizard to view suggested indexes for the query.
Cause: The processors on a multiprocessor system are not being used efficiently.
Solution: Increase the <ThreadPoolSize> value for the generator. This value should be between one and two times the number of processors on the database server.
Events are not being picked up fast enough.
Cause: You are using the Event and EventCollector classes in a custom event provider.
Solution: Optimize the disk system on the database server and make sure the database log file is on a separate physical disk. Use a multithreaded event collection application to improve the number of events that can be processed.
Subscribers and subscriptions are not being added fast enough.
Cause: The disk subsystem for the database server is not adequate.
Solution: Optimize the disk system on the database server and make sure the database log file is on a separate physical disk.