Middle Tier Application Data Caching with SQL Server 2000
Ian Jose, SQL Server Group
Microsoft® SQL Server™ 2000
Summary: How to use Microsoft SQL Server 2000 to construct and maintain data copies in cache solutions. (17 printed pages)
Middle tier applications often use a single database management system (DBMS) to store data, which can expose scaling limitations as the number of user requests increases. Caching, a technique used to increase application performance by copying data and then using the copied data in place of the original data, can dramatically increase the throughput (the number of application requests serviceable per unit time) and scalability of middle tier applications. It does this by offloading a subset of data requests from a single DBMS to one or more servers that hold closely synchronized copies of the data.
This paper contains a preliminary discussion about caching in general and then caching in the context of middle tier applications. The next sections show how Microsoft® SQL Server™ 2000 can be used to construct and maintain the data copies used in cache solutions. The paper concludes with real-world examples that illustrate how SQL Server 2000 has been used as a caching solution to improve middle tier application performance.
Caching can enhance application performance by reducing latency and increasing throughput. In middle tier application caching, typically the only noticeable benefit is increased application throughput because reductions in latency within the middle tier are usually not noticeable to the user.
In this paper, standard replication terminology is used to describe the servers in a caching topology; the data source is referred to as the Publisher, and the cache is referred to as the Subscriber. For more information about replication, see SQL Server Books Online.
Caching is most appropriate when the cached data is accessed repeatedly from the Subscriber. If cached data is accessed infrequently (Figure 1), caching does not improve throughput.
Note Although the word access can refer to both read and write operations, here it refers only to read operations. The way caches work with data that is written by the end user is described briefly at the end of this section.
Figure 1. Application without caching
If users access the same data repeatedly from the Subscriber, caching data can provide great benefits by improving throughput. It is possible to increase the throughput of an application by offloading processing from a CPU-bound or I/O-bound Publisher to the cache. The Publisher is accessed once to load the data into the cache, but thereafter data can be accessed repeatedly from the cache without impact on the Publisher (Figure 2). Using SQL Server transactional replication, for example, you can forward incremental data changes to Subscribers, and the demand on the Publisher is limited only to forwarding changes from a previous data state already present at the Subscriber.
Figure 2. Single Subscriber cache
A cache typically uses a different technology from its data provider; this is due to the combined requirements of memory speed and capacity in most data retrieval activities. For example, in the case of an in-memory cache for disk-resident data, the combination of different technologies delivers the best of both worlds: low latency from memory and high volume from disk. In some situations, however, it is appropriate to use the same technology for the cache and for the data provider, such as when the data provider has other responsibilities in addition to providing data. In this case, a subset of the data provider's responsibility is offloaded to the cache.
Using the same technology for the cache and the data provider is also appropriate when there are multiple caches. For example, a single SQL Server DBMS might store both catalog data and order processing data for an e-commerce Web site. Caching data in one or more SQL Server Subscribers can increase system throughput. The Publisher is still responsible for operations on order processing data, but read operations against catalog data can be handled by any one of the multiple Subscribers (Figure 3). The Publisher has the new responsibility of keeping the Subscribers up-to-date, but this is substantially less work than processing all application catalog queries.
Figure 3. Multiple Subscriber cache
Although caching has some clear advantages, consider your particular data access requirements carefully before implementing a cache. For example, if you have data for which replication latency must be extremely low (such as data that is updated frequently or data that must reflect changes immediately), caching is less appropriate because more effort is required on the part of the updater (usually the Publisher) to keep all copies of the data current. Even so, you may want to weigh the very real benefits of allowing users to access out-of-date data at the Subscriber (at least for read operations), even when the data has already been updated elsewhere. If you can tolerate a moderate increase in latency in this way, caching becomes a viable option again and you could reap some of the performance benefits of caching. However, if you require serial transactions, you must have the most current data and must forego caching.
Caching read/write data is similar to caching read-only data. However, updates are available immediately only if users are consistently routed to the same cache, which is not usually the case. Changes made at any Subscriber must be propagated first to the Publisher and then to all other Subscribers that cache the same data before all users are guaranteed to see the effects of an update. The increased update activity may outweigh the benefits of offloading read operations. Additionally, to maintain update durability, it is necessary to use the same durability techniques on each cache as those used on the Publisher, which can further offset the benefits of caching.
Caching can increase middle tier application throughput for data that is mostly read, which is the type of data common in middle tier applications. These applications also have requirements for caching technologies that are often more stringent than the requirements of client-side applications.
Middle Tier Application Data
There are a wide variety of middle tier applications, and each application has different classes of data with disparate characteristics. Consider caching data to increase throughput if your data:
- Is mostly read.
- Does not need to be read in its most current form.
A common middle tier application today is the e-commerce application. This type of application allows shoppers to browse and order merchandise through their Internet browsers. Most of the application requests are read operations on catalog data. Each individual customer using the site queries the same catalog data repeatedly. Even if the same data is not returned repeatedly to users, the e-commerce application can access it repeatedly. For example, the application can perform full-text searches.
The catalog data changes infrequently and it is only changed by the e-commerce business, not by the customer. The catalog data usually does not need to be queried in its most current form, and in a well-designed system, update propagation latency should only be a matter of seconds. Users spend most of their time browsing catalog data; they only occasionally perform an insert or update operation to order a product. During order processing, only the most current information (such as the product price) can be used, because the customer and the e-commerce business are entering into a contract. In this example, the catalog data is appropriate for caching; order data is not appropriate for caching.
Table 1. Example of application data comparison
| Web application data | Source | Lifespan | Partitionable by users | Shared between users or across requests | Must be durable | Suitable for middle tier application data caching |
* The entire data set, not the query result, is common across users and user requests for free catalog data.
One other aspect of Web data that should be mentioned is data mining, which often occurs on user profile and order processing data. Both types of data are normally partitionable in an online transaction processing (OLTP) system. However, for data mining, they are typically rolled up in a nonpartitioned cache because data mining queries are designed to answer questions that pertain to the entire user space or enterprise. Caching for data mining is common, and many techniques discussed in this paper are applicable to data mining. For more information about implementing data mining in SQL Server 2000, see the topic "Introduction to Data Mining Models" in SQL Server Books Online.
Middle Tier Application Cache Requirements
Because middle tier applications often have large data sets and a high number of users, middle tier data caches must satisfy a number of requirements:
- Cached data must be relatively up-to-date.
- The cache service must be consistent.
- The entire middle tier application, including the cache, must be scalable and highly available.
- The technology used must be easy to maintain.
- The cache must be able to evolve incrementally over time in response to application and schema changes.
- Application administrators must be able to monitor application and cache operation.
The amount of data stored in a middle tier application cache can range from data sets small enough to be stored completely in memory to data sets so large that they can only be stored across multiple disks. For example, several middle tier applications exist today to sell books over the World Wide Web. Their databases often contain information about millions of books. Information about popular books is accessed repeatedly, while information about others is almost never requested. The total size of the data, including secondary indexes, is much larger than can be conveniently stored in memory.
To handle large data sets, the middle tier application cache should have features similar to those of database management systems to make good use of hardware resources, such as buffer management algorithms that keep the most recently used data in memory. Additionally, when cached data sets are large and expensive to construct, they should be constructed only infrequently, such as when setting up a new cache. In this case, DBMS features such as persistence, logging, and recovery are crucial to ensure that these expensive data states are reliably maintained.
Rich query support
When entire data sets are too large to return for a given user request, you need the flexibility provided by rich query support. This facilitates queries on the data that select only a small portion of the cached data. Again consider a Web-based bookseller application. Each user request is for books that match specific criteria, often based on full-text searches of title and content. The total number of books in the cache is large, but only a few are returned to the user during a shopping experience.
Cache access performance should be consistent and predictable. Regular operations should not disrupt service. For example, propagation of updates from the Publisher should not block service to such an extent that it is noticeable to users.
Fast data propagation
The data cached must be relatively up-to-date. If it lags too far behind the data on the Publisher, it will no longer be an adequate substitute for the original data. For example, a common middle tier application is online publishing of stock information. Most sites have disclaimers indicating that prices shown may be as much as 20 minutes old. However, all these sites strive to keep latency to a minimum.
In addition to being relatively up-to-date, cached data must match a consistent state that existed at some time in the Publisher. When cached data is allowed to be inconsistent, applications must handle anomalies in the data. For example, if a middle tier application caches customer order information and the caching technology does not ensure consistency, anomalies can occur within the cached data. The application must then check for data consistency and take deliberate action when inconsistencies are found in order to present the user with a consistent image of the data. To simplify application development, caching technologies must ensure the consistency of cached data.
The data required in the cache is often a subset of data stored at the Publisher, so the data set must be filtered. Either the entire data set is filtered or only the rows that change are filtered as the changes are propagated to the cache.
Middle tier applications are server applications that must run continuously. Although some downtime is possible, maintenance requirements for middle tier applications must be kept to a minimum. Regular operations should be self-supporting; they should not require regular intervention by application administrators to keep the application running with acceptable performance.
Middle tier applications evolve over time, and, therefore, the nature of data in the cache also changes occasionally. Cached data may gain a new property or lose an existing one, which requires changes to the table schema. Changes in query patterns on the cached data may require index changes.
Caches should accept these changes without requiring entire data sets to be invalidated and reloaded. This is especially important when the amount of cached data is large. For example, a middle tier application that publishes yellow page information on the World Wide Web is modified to begin publishing Web site addresses for their business customers. At the time of the change, only a few businesses have Web sites and the total change to the data within the cache is small. However, the meta data for the cached data will contain a new property. The amount of data in the cache is large, and reloading all caches may require either a prohibitive amount of downtime for the entire site or excessive resources to make the change without affecting users. The cost in resources should be commensurate with the amount of data changes actually occurring in the cache.
Caches should have support for monitoring so that application administrators can watch the operation for anomalies. This includes being able to detect changes in application usage patterns, hardware and software failures, and denial of service attacks.
The features that make SQL Server 2000 a great DBMS for storing middle tier application data also make it a great technology for caching the same data. SQL Server 2000 can store and efficiently access very large amounts of data. Efficient algorithms ensure that most often used data is kept in memory for fast access. After large data sets are imported into the cache, its logging and recovery technology ensures that this data does not become corrupted by events such as application failures or brief power interruptions. Only a media failure might require cached data to be reloaded from either a database backup or from the Publisher.
SQL Server 2000 has rich query capabilities that allow large data sets to be filtered for each user request. This support includes the Transact-SQL query language with precompiled stored procedures that have optimized query plans. Rich indexing is also supported to speed queries on the cached data. To optimize queries, indexes can be added at any time after the cached data is populated, which is particularly useful if cache query loads or query types change over time.
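As a sketch, an index added after the cache is populated might look like the following; the table, column, and index names are hypothetical:

```sql
-- Hypothetical cached catalog table: a nonclustered index added after
-- population, to speed a query pattern that emerged after deployment.
CREATE NONCLUSTERED INDEX IX_Catalog_Category
ON dbo.Catalog (CategoryID, ListPrice)
```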
Cache Subscribers are unlike Publishers in that their data can always be regenerated from the Publisher. In order to reduce maintenance requirements, you can elect not to back up cache Subscribers. Instead, you can handle media failures by constructing a new cache at the Subscriber. You can use the simple recovery model with cache Subscriber databases because no backups will exist. Using the simple recovery model also speeds up bulk update operations and minimizes disk usage for log files.
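For example, a cache Subscriber database (named CacheDB here purely for illustration) can be switched to the simple recovery model with a single statement:

```sql
-- Simple recovery model on a cache Subscriber: the log is truncated on
-- checkpoint, so bulk updates run faster and log disk usage stays small.
ALTER DATABASE CacheDB SET RECOVERY SIMPLE
```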
SQL Server 2000 also has rich replication support to keep caches closely synchronized with data Publishers. This replication support includes transactional replication and snapshot replication, the most appropriate technologies for caching. Transactional replication and snapshot replication maintain transactional consistency when replicating data changes. Objects that can be replicated, called articles, include tables, stored procedures, user-defined functions, views, and indexed views.
Transactional replication monitors changes to tables on the publishing server at the transaction level: insert, update, or delete operations. Changes made to the Publisher flow continuously or at scheduled intervals to the subscribing servers, depending on the settings you choose. Changes are propagated in near real time, typically with a latency of seconds. Only committed transactions are sent to the Subscribers, and transactions are guaranteed to be in the order in which they were committed at the Publisher. This guarantees transactional consistency: The cache state is always a state that existed at some time on the Publisher. An example of transactional replication topology is shown in Figure 4.
Transactional replication subscriptions must be synchronized to a recent data state before incremental updates can be applied. The process of creating this initial state in the Subscriber is similar to snapshot replication, which is described in the section Snapshot Replication later in this paper. To replicate incremental changes, transactional replication first propagates changes from the Publisher to the Distributor. The Distributor can be the same server as the Publisher, or it can be a separate server. In either case, it has a special database, called the distribution database, which holds changes harvested from the transaction log of the Publisher. These changes have not yet been distributed to all Subscribers. The process that moves transactions from the Publisher to the Distributor is called the Log Reader Agent. The process that distributes transactions from the distribution database to the Subscribers is called the Distribution Agent.
SQL Server 2000 transactional replication automatically replicates changes only to tables, but changes to other types of articles can still be incrementally propagated to subscriptions through the use of sp_addscriptexec. Transactional replication requires published tables to have primary keys, which are used to relate update operations on the Publisher to data already replicated to the Subscribers.
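If a table to be published lacks a primary key, one must be added before the table can become an article; a minimal sketch, with hypothetical table and column names:

```sql
-- Transactional replication requires a primary key on every published table.
ALTER TABLE dbo.Catalog
ADD CONSTRAINT PK_Catalog PRIMARY KEY (ProductID)
```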
Figure 4. Example of transactional replication server topology
Because SQL Server supports terabytes of data, it is a reliable, high-capacity database system that makes an appropriate platform for a cache. It has a robust backup and recovery system that can be used on both the Publisher and the cache. SQL Server provides high throughput, and transactional replication is a very efficient way to propagate changes.
Rich query support
Unlike many specialized caching products, SQL Server uses a mature dialect of Structured Query Language (SQL): Transact-SQL. SQL Server provides application developers with the same rich query support for cached data that they have for non-cached data. For example, an application could provide users ad hoc query access to cached data with no additional programming.
Transactional replication helps minimize contention and provide regular service from the cache by incrementally replicating updates made at the Publisher. Updates at the Publisher can be thought of as a serial stream of updating transactions, with each transaction taking the Publisher from one consistent state to another. In transactional replication, these updates are propagated to each cache as a similar set of serial transactions. However, adjacent transactions at the Publisher are often performed as a single transaction at each Subscriber because combining multiple transactions into a single transaction is more efficient. The number of Publisher transactions in a Subscriber transaction is configurable through the -CommitBatchSize parameter of the Distribution Agent, and can be limited to exactly one.
It is important for any cache solution to minimize contention so that operations on the cache complete in a reasonable amount of time. Any updating operation can cause contention because locks are acquired on updated rows and held for the duration of the operation. These locks prevent other update and read operations on the same rows. Contention is reduced when the number of updates in a transaction is small or when the transaction is committed quickly. Transactional replication keeps contention on the cache similar to contention on the Publisher because similar transactions are applied. Snapshot replication has much greater contention because it updates all rows in a cache in a single transaction.
Fast data propagation
SQL Server 2000 transactional replication can propagate changes made to published data quickly. The Log Reader Agent and the Distribution Agents can run continuously. Each agent operates on changes one batch at a time and processes the remaining changes in each batch immediately. (Batch sizes are configurable.) If an agent reaches the end of a batch and there are no remaining changes, the agent becomes inactive for two seconds and then polls for additional changes. This batch-oriented process usually leads to update propagation latencies of only a few seconds.
The Distributor must not be throughput-bound because this has a negative effect on data propagation latency. The Distributor can become throughput-bound if the distribution server has inadequate hardware resources, but this is rarely seen in practice. It is more common for throughput problems in the distribution process to result from a large number of indexes or triggers on published data at the Subscriber. As each update is applied at the Subscriber, all indexes are maintained and all triggers are fired; therefore, operations on Subscribers at update time must be carefully considered with throughput in mind.
Transactional replication replicates changes on transaction boundaries and holds locks while applying updates, just as the Publisher does for the original data, so queries on the cache are always transactionally consistent. The data the user sees may be latent, but it is always consistent because it represents a state that existed at some time on the Publisher.
Transactional replication supports filtering. Publications themselves are a form of filtering because they define an arbitrary set of replicatable objects within a publishing database. Additionally, transactional replication allows filtering within a table, so that only certain columns or rows are published to the cache. Row filtering is based on data in the row or related data in other tables. For filters that are based on data in other tables, the filters are applied against the live data (the complete data set at the Publisher) for all data that is not in the replicated row.
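A row filter is supplied when the article is defined; in this sketch, the publication, table, and filter clause are all hypothetical:

```sql
-- Publish only the rows of dbo.Catalog that the cache should hold.
-- @filter_clause is applied as a WHERE clause on the published table.
EXEC sp_addarticle
    @publication   = N'CatalogPub',
    @article       = N'Catalog',
    @source_owner  = N'dbo',
    @source_object = N'Catalog',
    @filter_clause = N'Region = N''US'''
```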
Transactional replication also permits more complex filters when the filter is expressed as an indexed view. An indexed view is a materialized form of a view that is incrementally maintained; transactional replication can replicate this object as though it were a table. For more information about indexed views and what types of views can be indexed, see SQL Server Books Online.
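As a sketch of such a filter, the following indexed view materializes a subset of a hypothetical catalog table; once the unique clustered index exists, the view is maintained incrementally and can be published as an article:

```sql
-- The view must be created WITH SCHEMABINDING to be indexable.
CREATE VIEW dbo.ActiveProducts
WITH SCHEMABINDING
AS
SELECT ProductID, Name, ListPrice
FROM dbo.Catalog
WHERE Discontinued = 0
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_ActiveProducts
ON dbo.ActiveProducts (ProductID)
```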
Transactional replication is self-maintaining; under normal conditions it does not require a database administrator to perform any maintenance. The Distribution Cleanup Agent removes transactions from the distribution database after they have been delivered to all Subscribers. Updates are not retained longer than the distribution retention period, which is a Distributor property with a default of 72 hours. This is true even when updates have not been distributed to every Subscriber (for example, when a Subscriber is removed from a topology but its subscription is not dropped). The Distribution Cleanup Agent deactivates such subscriptions and then cleans up the distribution database according to the remaining active subscriptions.
An expired subscription agent checks for subscriptions that have not synchronized within the subscription expiration period (a publication property with a default of 14 days). For example, when you remove a Subscriber without dropping its subscription, the subscription is automatically dropped after the time specified in this property has elapsed. No intervention is necessary in this process, but if the rate of updates propagated is so high that retaining 72 hours of changes causes resource shortages on the Distributor, the database administrator can set the appropriate distribution retention period.
In addition, the History Cleanup Agent deletes history information logged by replication agents. The amount of replication history retained is determined by the history retention period, a Distributor property with a default of 48 hours.
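Both retention periods are Distributor properties and can be adjusted with sp_changedistributiondb; the values below are illustrative only:

```sql
-- Reduce transaction retention from the default 72 hours to 24.
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value    = N'24'

-- Reduce agent history retention from the default 48 hours to 24.
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'history_retention',
    @value    = N'24'
```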
Dynamic schema support
Transactional replication includes support for updating publication schemas after subscriptions have been created and are actively receiving updates from a Publisher. You can update publication schemas after subscriptions have been created by:
- Adding articles to active publications.
- Dropping or adding columns, both to published tables and to their underlying tables.
- Writing scripts once at a Publisher and then running them in transaction order on each Subscriber.
To update publication schemas in these ways, use SQL Server Enterprise Manager, Transact-SQL scripts, or SQL Distributed Management Objects (SQL-DMO). In this paper, only script examples are used to illustrate how schema changes can be made.
Use the stored procedure sp_addarticle to add articles to an active publication. The addition of an article does not affect the distribution of updates to articles that are already published. When a new article is added, the Snapshot Agent creates a snapshot for the new articles only, and the Distribution Agent propagates these articles to subscriptions in transaction order. If the new article is a table, incremental updates to the table are subsequently propagated. At any point in time, all published data at a subscription is consistent.
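Adding a table article to an active publication might look like the following sketch (all names are hypothetical); after the article is added, the Snapshot Agent generates a snapshot for the new article only:

```sql
-- Add a new log-based table article to an existing publication.
EXEC sp_addarticle
    @publication   = N'CatalogPub',
    @article       = N'ProductReviews',
    @source_owner  = N'dbo',
    @source_object = N'ProductReviews',
    @type          = N'logbased'
```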
In addition to adding entire articles to a publication with active subscriptions, it is also possible to modify published articles. Use the stored procedures sp_repladdcolumn and sp_repldropcolumn to add and drop columns, respectively. The replication process continues uninterrupted when a schema is updated using these stored procedures. However, some contention may occur on the Subscribers because the add column or drop column operations require an exclusive table lock for a brief period of time. It is not necessary to rebuild an entire cached data set from the Publisher when a column is added or dropped, but procedures referencing the table must be recompiled for the updated schema.
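As a sketch, adding a nullable column to a published table (table and column names are hypothetical):

```sql
-- Adds the column to the table and to every publication containing it
-- (@publication_to_add defaults to 'all').
EXEC sp_repladdcolumn
    @source_object = N'Catalog',
    @column        = N'WebSiteUrl',
    @typetext      = N'nvarchar(256) NULL'

-- Dropping a column is symmetric:
EXEC sp_repldropcolumn
    @source_object = N'Catalog',
    @column        = N'WebSiteUrl'
```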
In addition, arbitrary scripts can be run at each Subscriber using sp_addscriptexec. This stored procedure takes a script file as a parameter and runs this script once at each Subscriber in order with other updates that are occurring on the Publisher. It is important not to use this mechanism to change column definitions in published tables because special stored procedures used in transactional replication must be updated in concert with updates to the column schema of published tables. However, indexes can be added or dropped, and many other changes are also possible.
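For example, a script that adds an index at every Subscriber could be distributed as follows; the publication name and UNC path are hypothetical:

```sql
-- The script file must be reachable from the Distributor; it runs once
-- at each Subscriber, ordered with the other replicated updates.
EXEC sp_addscriptexec
    @publication = N'CatalogPub',
    @scriptfile  = N'\\pubserver\scripts\add_catalog_index.sql'
```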
SQL Server 2000 replication has extensive monitoring support. The Replication Monitor in SQL Server Enterprise Manager graphically displays status for the Snapshot Agent, the Log Reader Agent, and the Distribution Agent. You can also view status for distribution cleanup, agent history cleanup, and expired subscription cleanup agents. It is also possible to monitor a group of distributors in SQL Server 2000 using the Replication Monitor Group feature.
Configure alerts for any agent to notify an administrator in the event of a failure or unexpected result. Configure notifications to use e-mail, a pager, or a net send notification to send a message to a database administrator. In addition, events can be written to the Windows Application Event Log so that all system events can be viewed from one location.
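As an illustration, a predefined replication alert can be wired to an operator with sp_add_notification in msdb; the operator name here is hypothetical:

```sql
-- Notify an operator by e-mail when a replication agent fails.
-- 'Replication: agent failure' is one of the predefined replication alerts.
EXEC msdb.dbo.sp_add_notification
    @alert_name          = N'Replication: agent failure',
    @operator_name       = N'CacheAdmin',
    @notification_method = 1   -- 1 = e-mail
```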
You can also query events directly from the following system tables in the distribution database:
- Snapshot history is stored in the MSsnapshot_history table.
- Log reader history is stored in the MSlogreader_history table.
- Distribution agent history is stored in MSdistribution_history table.
Most developers never access these tables directly because their schema can change from release to release, requiring unexpected changes to custom monitoring applications.
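Still, for ad hoc troubleshooting, a query such as the following (illustrative only, since these schemas can change between releases) shows recent Distribution Agent activity:

```sql
USE distribution

-- Most recent Distribution Agent history entries first.
SELECT agent_id, runstatus, time, comments
FROM dbo.MSdistribution_history
ORDER BY time DESC
```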
Snapshot replication is similar to transactional replication except that incremental changes to replicated objects are not propagated to Subscribers. Instead, new snapshots of replication objects are periodically produced and applied to Subscribers. Snapshot replication can replicate the same objects that transactional replication can. In terms of caching, snapshot replication features are the same as transactional replication features, except as noted in this section. Primary keys are not required for each replicated table because no incremental updates are applied to Subscribers. However, when articles are updatable through immediate or queued updating, primary keys are required so that updates to subscription article rows can be associated with the same logical row on the Publisher. Updates for all objects, including objects other than tables, are applied in snapshot replication.
In general, snapshot replication is less appropriate for caching than transactional replication, but it can still be employed for caching when:
- Cached data periodically changes all at once.
- Cached data can periodically be offline during updates.
However, snapshot replication can be an even more effective technology for caching than transactional replication when the update rate on the data is very high and when the total volume of data is so low that it can be fully updated in a time similar to an application's cache query.
For example, consider a foreign currency exchange application that serves a kiosk in an airport. At any one location, there are fewer than 20 currencies to convert. Currency conversion rates are changing every second in world currency markets, so cached data is always inaccurate to some degree. Snapshot replication can be used to efficiently update a currency conversion table because applying changes at each kiosk as a net data change, rather than replicating every update that occurred at the Publisher, is a more effective method of moving this data. Additionally, because the data set is so small, it can be fully updated in a time imperceptible to the user; any contention that occurs at the Subscriber is not noticeable to the user.
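A snapshot publication for the conversion table might be defined as in this sketch (all names are hypothetical; @repl_freq = N'snapshot' selects snapshot rather than transactional replication):

```sql
-- Define a snapshot publication and add the currency table as an article.
EXEC sp_addpublication
    @publication = N'CurrencyPub',
    @repl_freq   = N'snapshot',
    @status      = N'active'

EXEC sp_addarticle
    @publication   = N'CurrencyPub',
    @article       = N'ConversionRates',
    @source_owner  = N'dbo',
    @source_object = N'ConversionRates'
```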
Snapshot replication, like transactional replication, is self-maintaining. Under normal conditions, no maintenance work is required. Maintenance agents similar to those described for transactional replication clean up out-of-date history and abandoned subscriptions.
Snapshot replication also includes some support for dynamic schema changes. Articles can be added to publications with active subscriptions. In addition, columns can be added or dropped.
Propagating more substantial schema changes is possible if you first drop all subscriptions (which does not remove data from Subscribers) and then define a new snapshot publication. The new publication can be pushed to Subscribers of the previous publication. When objects of the same name are found, they are dropped and re-created using the same process used for a regular snapshot distribution. The only additional step is that all previously published objects on the Subscribers that were not contained in the new publication must be removed. SQL Server 2000 replication also provides ways to automate this activity through pre-snapshot or post-snapshot scripts. (For more information, see the topic "Executing Scripts Before and After the Snapshot is Applied" in SQL Server Books Online.)
Monitoring support for snapshot replication is similar to that of transactional replication, except that snapshot replication does not employ a Log Reader Agent because no propagation of incremental data changes occurs.
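Agent history can also be read directly from the distribution database with ordinary queries. A minimal sketch follows, assuming the default distribution database name; the history table and column names are as documented for SQL Server 2000:

```sql
-- Inspect recent Distribution Agent runs recorded in the distribution
-- database. Table and column names follow SQL Server 2000 Books Online.
USE distribution;
SELECT TOP 20
    agent_id,
    runstatus,                -- run outcome (for example, 2 = succeed, 6 = fail)
    time,
    delivered_transactions,
    comments
FROM dbo.MSdistribution_history
ORDER BY time DESC;
```

Similar history tables exist for the other agents (for example, MSlogreader_history and MSsnapshot_history), so the same approach works for Log Reader and Snapshot Agent monitoring.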
The following examples show how SQL Server 2000 with replication is being used today to increase middle tier application throughput with caching. The first example is from MS.com, the application group within Microsoft Corporation responsible for developing and operating the corporate Web site. The second example is from NASDAQ.com, the user information site for the NASDAQ stock exchange.
The MS.com group at Microsoft develops and operates the Web site www.microsoft.com. MS.com uses SQL Server 2000 and its replication support extensively to scale out mostly read content databases (Figure 5). Of the approximately 200 computers running SQL Server that are used in MS.com, about half are Publishers and the other half are caches.
Each content server houses one or more databases, with each database ranging in size from 2 to 10 gigabytes (GB). Together, these databases hold all the data behind the Microsoft.com Web site, more than 1 terabyte in total. Some of this data is normalized into rows and columns, and some is stored as XML documents in SQL Server image columns.
The cache servers cache the subset of the content that is read so frequently, by both users and application components, that a single commodity server cannot support the load.
Figure 5. MS.com scalable cache unit
SQL Server replication is used to create and maintain data in caching servers. MS.com runs the Log Reader and Distribution Agents continuously to minimize latency. In addition, Subscribers are located on the same subnet as their Distributors to further reduce latency. MS.com typically sees a distribution latency of only 2 to 3 seconds. MS.com uses local Distributors exclusively because it can partition Publishers by application; this partitioning ensures that no Publisher carries so great an update load that it cannot also serve as a replication Distributor.
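A continuously running agent is simply one whose command line carries the -Continuous parameter. The following is a sketch of a typical Distribution Agent command, with hypothetical server and database names; see the Distribution Agent utility reference in SQL Server Books Online for the full parameter list:

```
distrib -Publisher PUB01 -PublisherDB ContentDB -Publication ContentPub
        -Subscriber CACHE01 -SubscriberDB ContentDB
        -Distributor PUB01 -Continuous
```

With -Continuous, the agent polls for and delivers changes without exiting; without it, the agent processes pending changes and then stops until its next scheduled run.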
Network Load Balancing (NLB), a clustering feature of Microsoft Windows® 2000, is used to abstract sets of homogeneous Subscribers into single virtual servers. NLB provides load balancing, availability, and flexibility to the operations staff. For example, operators can remove a server without severely affecting service, provided the server is not the last cluster member. Additional computers can be added to these NLB clusters as read loads increase. The result of the MS.com architecture is a flexible server topology that can support worldwide loads.
Caches are synchronized with content servers using both snapshot and transactional replication. When snapshot replication is used, NLB first moves connections from the servers about to be resynchronized to the remaining servers in the cluster. After the servers to be resynchronized are drained of connections, new snapshots are applied, and the servers are re-enabled in the NLB cluster. NLB then moves connections from the remaining unsynchronized servers to the freshly synchronized ones quickly enough that users do not see new data and then subsequently see old data.
By repeating this process on all NLB cluster members, the entire cluster can be resynchronized without affecting query service. Scripting automates the entire process. When transactional replication is used, there is no need to enable and disable servers in NLB clusters because transactional replication replays updating transactions as they occurred on the content owner; no substantial locking occurs on caches.
The example of MS.com provides some important lessons about using replication for caching:
- Keep the Distributor and all Subscribers on the same subnet. This minimizes network latency and thereby minimizes replication latency.
- Use continuously running agents to minimize latency. Be aware of the following when using continuously running agents:
- When a continuously running agent encounters an exception condition and stops, it does not restart itself automatically. This behavior can be changed in Enterprise Manager. For example, to change the Log Reader Agent so that it restarts on failure:
  1. Expand ServerName, then Replication Monitor, then Agents, and finally Log Reader Agents.
  2. In the right pane, right-click the Log Reader Agent, and then select Agent Properties.
  3. Click the Steps tab.
  4. Select job step 2 (Run Agent), and then click Edit.
  5. Click the Advanced tab.
  6. On the On Failure Action menu, select Goto step: Log Reader Agent Startup Message.
  7. Click OK to exit this dialog box, and then click OK again to exit the main dialog box.
The alerting feature in SQL Server Agent jobs should be used to inform a site administrator when continuously running agents fail.
- If a continuous agent is changed to a scheduled agent, remove the -Continuous parameter manually from the replication agent command.
- If a continuous agent is running when it is updated to be a scheduled agent, manually stop it and restart it to begin scheduled operation.
- The snapshot location should be local to the Publisher, so that snapshot creation time is kept to a minimum. After the snapshot has been created, it can be moved to a location close to the Distributor or Subscriber for subsequent use in subscription initialization.
- Use scripting extensively to benefit from guaranteed repeatable results when you set up a new cache. When scripts are created, they run under the security context of their creator. Change this manually through sp_changeobjectowner to be the service account that is used at the new cache.
- SQL Server monitoring can be integrated into existing monitoring components using scheduled jobs to gather general server and replication information. SQL Server replication information is stored in history tables in the distribution database.
- In general, schema changes require subscription re-initialization. Some schema changes, however, do not require re-initialization, including changes to add or drop columns. For these two operations, schema changes should be made with sp_repladdcolumn and sp_repldropcolumn, respectively, whenever possible. It is possible to add articles or columns within a table article to an existing publication by using a partial re-initialization, which is less time-consuming than a full subscription re-initialization.
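The two column operations in the last lesson can be sketched as follows, assuming a hypothetical published table named Prices and a hypothetical publication named ContentPub:

```sql
-- Add a column to a published table and propagate the change to an
-- existing publication without a full subscription re-initialization.
-- Table, column, and publication names are hypothetical.
EXEC sp_repladdcolumn
    @source_object      = N'Prices',
    @column             = N'LastUpdated',
    @typetext           = N'datetime NULL',
    @publication_to_add = N'ContentPub';

-- Drop a column from the same published table; the drop is likewise
-- propagated to Subscribers.
EXEC sp_repldropcolumn
    @source_object = N'Prices',
    @column        = N'LastUpdated';
```

Both procedures propagate the schema change through replication itself, which is what makes these two operations exempt from the general re-initialization requirement.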
MS.com plans to continue using SQL Server 2000 and its replication features to develop scalable topologies of commodity servers that can handle the loads required of one of the world's most trafficked sites.
NASDAQ.com is a public information site that contains information about companies traded on the NASDAQ securities exchange. The public information includes continuously updated stock quotes as well as background information on traded companies. The total volume of data is less than 1 GB and can typically be kept in memory after loading from disk. NASDAQ.com employs SQL Server replication for both scalability and availability of data. The topology shown in Figure 6 reflects this, but this paper describes only how NASDAQ.com uses replication for scalability.
Figure 6. NASDAQ.com SQL Server/replication topology (click image to see larger picture)
NASDAQ.com uses both snapshot and transactional replication. Snapshot replication is used for data sets that are small in size and that usually change all at once. Transactional replication is used for larger data sets that change incrementally. Continuous push replication is used for transactional subscriptions in order to minimize latency. Using push subscriptions instead of pull subscriptions ensures that replication can be monitored and controlled from the Distributor alone. (For more information, see the topics "Push Subscriptions" and "Pull Subscriptions" in SQL Server Books Online.) Although Figure 6 shows three Subscribers, NASDAQ.com actually has four Subscribers in each geographic region at the time of this writing. In addition, the relative workloads of the Web servers and database servers are such that there are approximately 60 Web servers for the eight Subscriber database servers.
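A push subscription is created at the Publisher. The following is a minimal sketch, with all server, database, and publication names hypothetical:

```sql
-- Create a push subscription at the Publisher. The Distribution Agent for
-- a push subscription runs at the Distributor, so it can be monitored and
-- controlled there rather than at each Subscriber. All names are hypothetical.
EXEC sp_addsubscription
    @publication       = N'QuotesPub',
    @subscriber        = N'CACHE01',
    @destination_db    = N'QuotesCache',
    @subscription_type = N'push';
```

With a pull subscription (@subscription_type = N'pull'), the agent would instead run at each Subscriber, spreading monitoring and control across all of the cache servers.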
The example of NASDAQ.com reveals some additional lessons about using replication for caching:
- Replicate only the data needed by the queries that run at Subscribers. For example, ensure that worktables used to construct data at the Publisher are not published.
- Use snapshot replication for data that changes all at once at set intervals.
- Use transactional replication for data that changes incrementally.
- Use replication of stored procedure execution to improve replication performance. This is especially important for stored procedures that perform bulk updates. For more information, see the topic "Publishing Stored Procedure Execution" in SQL Server Books Online.
- Use replication to deploy applications as well as data. For example, publish stored procedures and use sp_addscriptexec to update them.
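The last two practices can be sketched as follows; the publication, procedure, and path names are hypothetical:

```sql
-- Publish the execution of a stored procedure so that only the procedure
-- call, not the resulting row-by-row changes, is replicated to Subscribers.
-- Publication and object names are hypothetical.
EXEC sp_addarticle
    @publication   = N'QuotesPub',
    @article       = N'usp_RefreshQuotes',
    @source_object = N'usp_RefreshQuotes',
    @type          = N'proc exec';

-- Deploy an application change (for example, an updated stored procedure)
-- to all Subscribers of a publication by posting a script file.
EXEC sp_addscriptexec
    @publication = N'QuotesPub',
    @scriptfile  = N'\\DISTRIB01\Scripts\update_procs.sql';
```

For a bulk update, replicating the single procedure call is far cheaper than replicating every row it touches; the 'serializable proc exec' article type can be used instead of 'proc exec' when the procedure must execute within a serializable transaction to replicate correctly.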
Caching is an effective technique used in middle tier applications to increase throughput. It is most appropriate for data that users read more often than update, and for data that does not need to be read in its most current form. There are many examples of data matching these caching requirements in middle tier applications today.
SQL Server 2000 with replication is an effective basis for a caching solution for middle tier applications that store data in SQL Server. SQL Server, in conjunction with transactional and snapshot replication, meets the needs of most middle tier application caches. Many SQL Server-based applications today use caching to increase throughput and improve the user experience.