Understanding the Role of Directory Services Versus Relational Databases
This paper examines the role of directory services and relational databases and provides a conceptual overview of how to optimize these resources.
Most companies recognize that installing, using, and maintaining distributed systems represents a cost to their bottom line. Most ongoing costs, such as those associated with daily data backups and managing users, are fairly easy to understand and predict. But other costs are less obvious:
Information Proliferation. There has been a proliferation of application-specific directories that contain similar information about users, machines and other network resources. E-mail systems have address books, for example, that contain much of the same information about users as kept by enterprise resource planning (ERP) systems. All directories must be kept up-to-date and synchronized with each other. In addition to problems that occur when information gets out of synchronization, identifying sources of synchronization problems—and restoring consistency to directories—can be very costly.
Client Configuration. Making sure that client machines continue to have the right software installed and are configured properly represents a significant and growing expense. For example, when a person moves from one department to another, an administrator needs to delete some applications and add others—and make sure that each is configured correctly. Improper configurations cause problems ranging from service interruptions to applications that damage corporate data unintentionally by applying out-of-date business rules.
Server Configuration. Many applications require administrators to assign clients to specific server-side resources, such as databases and application components, at the time of client deployment. Because clients are bound to specific machines, this kind of static configuration can hurt service levels. Users must wait for failed machines to be restarted before they can continue working. When users roam, they will still be connected to the same resources even if there are "closer" resources that can provide the same services more efficiently. And when server load increases, users can experience slower response times even if other servers have excess capacity.
Lack of Inter-Application Awareness. Most corporate infrastructures deliver very little synergy between applications, information kept about users, and infrastructure elements such as networks. For example, an employee's applications cannot detect his or her movement between different jobs within the company and reconfigure accordingly. Or participants in an important Internet videoconference may see jerky motions and hear distorted audio because another network user is consuming bandwidth by downloading games.
To address these issues, companies need dynamic applications that are more aware of the environment in which they are deployed, can sense and adapt to changes, and can share information about themselves with other applications. Early approaches focused on relational database-centric architectures but ran into important limitations. Recent enhancements in directory service technologies promise a more complete solution. As usual, though, it's not a case of "either/or", and architects need to understand how to balance the roles of databases and directory services in their networks.
The Directory Design Point
To understand the role of directory services versus databases, it is helpful to review the five key characteristics of a modern directory.
Hierarchical data organization
Directory services organize data hierarchically in a tree-like fashion. Within the tree there are two types of entities: Organizational Units (OUs) and objects. OUs are containers that can hold both other OUs and objects. A good analogy is a traditional file system with directories (or folders in the case of the Microsoft Windows® 2000 operating system) and files. Directories can contain other directories (i.e., sub-directories) and ultimately sub-directories contain files.
Such a model is useful for several reasons:
A container-based model makes it easy to group related entities together for simpler management. To understand this benefit, think of how hard it would be to manage very large lists of files if there were no concept of sub-directories.
The directory hierarchy can be used to mirror and model aspects of a network or organizational structure for even simpler management.
Ultimately, the hierarchy can be exploited for the knowledge that its structure represents. For example, if there is an OU in a directory called "All Users" and "Sales" is a child OU of "All Users", applications can infer that members of the "Sales" OU are users (and should therefore inherit some characteristics of other users) but are part of a collection that is somehow different from other users in the subtree (and therefore should be treated differently in other ways). The power of the tree-structured style of organization, and the role of inheritance, is explained further in the sections below on security and policy-based management.
In contrast, databases store information in flat tables where there is no inherent mechanism for reflecting the organizational structure of the data.
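The inference described above (a member of "Sales" is also a member of "All Users") can be sketched in a few lines. This is only an illustration: the slash-separated paths and the names `alice` and `bob` are assumptions made for the example, not the naming syntax of any particular directory product.

```python
# Hypothetical object paths, written root-first for readability.
def ancestors(path):
    """Return every container above `path`, nearest container first."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_member_of(path, container):
    """True if `container` appears anywhere above `path` in the tree."""
    return container in ancestors(path)


alice = "All Users/Sales/Alice"
bob = "All Users/Engineering/Bob"

# Both are users, but only Alice belongs to the Sales subtree.
print(is_member_of(alice, "All Users"), is_member_of(alice, "All Users/Sales"))
print(is_member_of(bob, "All Users"), is_member_of(bob, "All Users/Sales"))
```

An application can use the first check to apply behavior common to all users and the second to treat the Sales subtree differently.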
Object-style entity modeling
Directory services represent network entities as objects that contain attributes. Within a given directory (or even at the OU level), there can be many different types of objects. Each object type (or more appropriately "class") can contain whatever set of attributes is necessary to accurately model the entity represented by the object. All popular directory services also allow administrators to create new object classes and extend built-in classes (such as those representing users and computers) to contain attributes that are specific to an individual company.
By modeling network elements (such as people and machines) as objects, and supporting many different object classes within a single tree, directory services provide an exceptionally flexible way to store information about the various and diverse entities in a network. In contrast, relational databases typically would require a new table for each type of object, and it would be even more difficult to represent container-style relationships.
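As a rough illustration of object classes and schema extension, the sketch below keeps a tiny in-memory schema and lets an administrator add a company-specific attribute to a built-in class. The class names, attribute names, and helper functions are all hypothetical; real directory schemas are richer than this.

```python
# Hypothetical schema: each object class maps to the attributes it permits.
schema = {
    "user": {"name", "mail"},
    "computer": {"name", "operatingSystem"},
}


def extend_class(object_class, *new_attributes):
    """Add company-specific attributes to an existing class."""
    schema[object_class] |= set(new_attributes)


def create_object(object_class, **attributes):
    """Build an object, rejecting attributes its class does not define."""
    unknown = set(attributes) - schema[object_class]
    if unknown:
        raise ValueError(f"{object_class} has no attributes {sorted(unknown)}")
    return {"objectClass": object_class, **attributes}


# Extend the built-in "user" class, then store two different object types.
extend_class("user", "spendingLimit")
manager = create_object("user", name="Ann", spendingLimit=8000)
laptop = create_object("computer", name="PC-17", operatingSystem="Windows 2000")
```

Both objects can live side by side in the same tree, which is the flexibility the paragraph above contrasts with a table-per-type relational design.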
Flexible query support
Given the flexibility of directory services to store data and model entities, it is important that they also provide flexible ways to find data within the tree. In particular, directories must make it easy to locate objects and attributes within a "scope" of interest. Scopes should include location within the tree (such as all objects contained within the "Marketing" OU) and classes of objects (such as all objects of class "user") regardless of where they are located within the hierarchy.
In fact, through support for the Lightweight Directory Access Protocol (LDAP), all popular directory services are able to support these types of searches in a way that is both simple and efficient. Simplicity comes from the nature of the LDAP API, and efficiency comes from the fact that most directory vendors allow administrators to specify which attributes should be indexed in the underlying data store. With such indexing, searches for "all users who are managers and have spending limits over $5,000" can be performed quickly and with no need to search the entire tree. This type of indexing and efficiency is especially important as the industry is deploying directory services containing tens of millions of entries.
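The following sketch shows the idea behind a scoped, indexed search using plain in-memory data rather than a real LDAP client. The entries, the attribute names `title` and `spendingLimit`, and the `search` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical directory entries; "dn" is a root-first path.
entries = [
    {"dn": "Company/Marketing/Ann", "objectClass": "user",
     "title": "Manager", "spendingLimit": 8000},
    {"dn": "Company/Marketing/Raj", "objectClass": "user",
     "title": "Analyst", "spendingLimit": 1000},
    {"dn": "Company/Sales/Lee", "objectClass": "user",
     "title": "Manager", "spendingLimit": 3000},
    {"dn": "Company/Sales/PC-17", "objectClass": "computer"},
]

# Index the "title" attribute so a search need not scan every entry.
title_index = defaultdict(list)
for entry in entries:
    if "title" in entry:
        title_index[entry["title"]].append(entry)


def search(base, title, min_limit):
    """Find entries under `base` with the given title and a limit above `min_limit`."""
    candidates = title_index.get(title, [])  # index lookup, not a full scan
    return [
        e["dn"] for e in candidates
        if e["dn"].startswith(base + "/") and e.get("spendingLimit", 0) > min_limit
    ]


# An LDAP search filter for a similar query might look like this
# (attribute names are hypothetical):
#   (&(objectClass=user)(title=Manager)(spendingLimit>=5001))
print(search("Company", "Manager", 5000))
print(search("Company/Sales", "Manager", 1000))
```

The `base` argument plays the role of the tree scope, while the index on `title` stands in for the attribute indexing that directory vendors let administrators configure.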
Vendors such as Microsoft also offer other data access interfaces, such as the Active Directory Service Interfaces (ADSI), which are based on the Component Object Model (COM), and ActiveX Data Objects (ADO), which provides a SQL-style syntax for accessing directory data. The industry as a whole also seems poised to exploit advances in XML technologies to enable even greater flexibility. So, while relational databases clearly offer flexible query support, it is safe to say that directories provide interfaces that are at least as flexible and are highly adapted to the unique way that directory services store and model information.
Because a key goal is information sharing, directory services generally are deployed in ways that enable access from any user or application that can locate and connect to the directory via TCP/IP port 389. In order to prevent chaos and enforce proper security checks, directories have four important characteristics:
Before users can access data, they must provide credentials (e.g., provide what amounts to a user name and password) that the directory can use to verify the identity of the party requesting access.
Once verified, the directory service uses the credentials to enforce security on all operations (such as reads and updates) at the individual object and individual attribute level.
"Owners" of objects and OUs can "delegate" specific rights over their data to other authenticated users. In this way, an administrator could delegate the right to add users to the "Marketing" OU to a trusted individual in the marketing department.
If the user does not provide credentials, they will be treated as an "anonymous" user and be allowed to access only that data that is effectively labeled as "public" in the directory.
In contrast, databases typically provide security enforcement at just the "column" level and don't provide any built-in means for delegation. So, if a user is allowed to see "Salary" values, they will be allowed to retrieve the salary value for any row in the table. This type of a security model is fine when only "known" applications (i.e., applications that can be audited and controlled centrally) have access to the data and can be "trusted" to not provide improper access to users with insufficient authority. Here, the application will have access to much more information than the user does directly, but contains business rules that effectively implement a more granular access control model than the database itself.
Unfortunately, since directory services are designed for data sharing and usually are deployed in ways that ensure broad data access, there is no way to verify the identity of all the applications that may make a request for information—and what these applications will do with the data once they get it. Therefore, security must be enforced by the directory service itself instead of relying on the applications for any or all of the necessary controls. Further, there may be cases in which the owner of data needs the ability to lock down individual pieces of data (such as the direct-dial telephone number of the CEO in a directory where telephone numbers are generally accessible) in order to achieve the granularity of control required to make a class of data available for sharing. In the example of the CEO's telephone number, if attribute-level security were not provided, the only alternative might be to deny access to all telephone numbers to all applications.
It is important to note that there is no requirement that administrators assign a specific access control right to each attribute on each object. Instead, they can assign an access right on an OU higher up in the tree and let the right inherit downwards onto objects contained within the OU and OUs that are children (and grandchildren, etc.) of the OU where the right was applied. This enables administrators to achieve an ideal balance of high-level management of rights and the ability to apply specific rights when needed.
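Inheritance of access rights down the tree, combined with an attribute-level exception such as the CEO's telephone number, can be sketched as follows. The rule "the nearest explicit entry wins, otherwise deny" is a simplifying assumption for this example; real directory products use more elaborate evaluation rules. The paths and the `acl` table are hypothetical.

```python
# Hypothetical explicit rights: (node path, attribute) -> right.
acl = {
    ("Company", "telephoneNumber"): "read",                # set once, high in the tree
    ("Company/Executive/CEO", "telephoneNumber"): "deny",  # specific lock-down
}


def effective_right(path, attribute):
    """Walk from the object up to the root and return the nearest explicit right."""
    parts = path.split("/")
    for depth in range(len(parts), 0, -1):
        node = "/".join(parts[:depth])
        if (node, attribute) in acl:
            return acl[(node, attribute)]
    return "deny"  # assumption: nothing is readable unless a right is granted


print(effective_right("Company/Sales/Lee", "telephoneNumber"))
print(effective_right("Company/Executive/CEO", "telephoneNumber"))
```

One entry on the top-level container covers every telephone number in the tree, and a single additional entry protects the one value that must stay private.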
Perhaps the most defining characteristic of directory services is the way they replicate data to other directory servers in the network. In particular, most modern directories (the biggest exception being the iPlanet Directory) perform what is called "multi-master" replication. This term refers to the fact that there can be multiple replicas of a particular directory in a network, and all directory operations (such as reads and updates) can be performed against any replica even if that replica is out of network contact with the other replicas. This means that all replicas must store a full copy of the data (including access control information) and record a log of all updates so that replicas that were not reachable at the time of the update can be contacted (and brought up to date) at some time in the future.
While most relational databases support some form of multi-master update, they tend to be deployed in a single-master configuration to enable synchronous transactional updates. Transactions are not implemented—by design—in multi-master directory architectures because all replicas may not be reachable at once and the majority of transactions would abort due to failure to obtain locks against all of the replicas. This means that directory services trade transactional capabilities for higher availability access to data and the ability to continue localized operation (e.g., in a branch office) despite losing connectivity to some or all other servers in the network.
Of course, giving up transactional protections comes at a price. In the case of directory services, the tradeoffs include:
There is no way to lock an object or its attributes for update across all replicas. Transactions would normally create such locks automatically. Consequently, there is no way to update a collection of objects where atomicity and isolation semantics are enforced.
The same data may be updated on two replicas by two different applications within the same replication interval, resulting in a "collision". Directories resolve collisions automatically at the attribute level by picking a "winning update" (in what should be thought of as an arbitrary way) and propagating that update to all of the replicas that contain the "losing" update.
In both cases, all replicas eventually will converge on the same values for all objects and attributes. However, there is no way to ensure that other replicas will see the same set of updates in the same order as on the originating replica or that all replicas even "see" a given update (e.g., the replica that initiated the "winning" update in a collision situation never sees the update that "lost"). This means that attribute values will be the same across replicas but may be inconsistent from a referential integrity or application correctness perspective. For example, one application could set "Candidate = George Bush" and "Party = Republican" and another application could set "Candidate = Al Gore" and "Party = Democrat" and, due to collisions, "Candidate = George Bush" and "Party = Democrat" could be the result that is agreed upon between the replicas.
Similar, but transient, conditions can also result from the fact that updates affecting multiple objects are not transmitted between servers using transactions. This means that applications can read a group of objects that are in the process of being updated via replication from another server and receive what amounts to an intermediate (and inconsistent) state.
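The candidate and party collision described above can be reproduced with a small sketch of attribute-level conflict resolution. The numeric stamps used to pick a winner are an assumption made for the example; as noted earlier, the way a directory picks the winning update should be thought of as arbitrary.

```python
def merge(replica_a, replica_b):
    """Resolve collisions per attribute by keeping the value with the higher stamp."""
    merged = {}
    for attr in replica_a.keys() | replica_b.keys():
        candidates = [r[attr] for r in (replica_a, replica_b) if attr in r]
        merged[attr] = max(candidates, key=lambda value_stamp: value_stamp[1])
    return merged


# Each attribute holds (value, stamp); the stamp is a hypothetical tie-breaker.
replica_a = {"Candidate": ("George Bush", 12), "Party": ("Republican", 10)}
replica_b = {"Candidate": ("Al Gore", 11), "Party": ("Democrat", 13)}

converged = {attr: value for attr, (value, _) in merge(replica_a, replica_b).items()}
print(converged)
```

Both replicas reach the same result whichever one performs the merge, so they converge, yet the converged object pairs a candidate with the wrong party because each attribute was resolved independently.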
Lest the reader conclude that directory services have a considerable design flaw, it is helpful to note:
Such behavior is absolutely necessary to support operation when replicas are disconnected from the rest of the network. Disconnected operation is critically important to operations such as adding new users and resetting passwords, and being disconnected should not force a branch office system, for example, to suspend operation.
There are techniques for detecting and dealing with inconsistencies due to replication latency and collision. These include marking groups of updated objects with a unique identifier (such as a GUID) so that applications that read the data can verify that all components of the update have been received.
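The GUID-marking technique mentioned above might look like the following sketch. The `batchId` attribute, the two dictionaries standing in for replicas, and the object names are all hypothetical.

```python
import uuid


def write_group(store, updates):
    """Stamp every object in a multi-object update with the same batch identifier."""
    batch_id = str(uuid.uuid4())
    for name, attributes in updates.items():
        store[name] = {**attributes, "batchId": batch_id}
    return batch_id


def read_group(store, names):
    """Return the objects only if all are present and share one batch identifier."""
    objects = [store.get(name) for name in names]
    if any(obj is None for obj in objects):
        return None
    if len({obj["batchId"] for obj in objects}) != 1:
        return None  # some parts of the update have not replicated yet
    return objects


origin, replica = {}, {}
write_group(origin, {"route-a": {"hop": 1}, "route-b": {"hop": 2}})
replica.update(origin)                    # first update fully replicated

write_group(origin, {"route-a": {"hop": 3}, "route-b": {"hop": 4}})
replica["route-a"] = origin["route-a"]    # only half of the second update has arrived
partial = read_group(replica, ["route-a", "route-b"])

replica["route-b"] = origin["route-b"]    # the rest arrives later
complete = read_group(replica, ["route-a", "route-b"])
```

An application that receives `None` can retry later or fall back to another replica instead of acting on an intermediate state.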
Not all applications are going to be good candidates for directory integration.
Based on the five characteristics identified earlier, a number of "best practices" emerge for what types of data are appropriate to store in a directory service. In particular, application designers should look to directory services when data:
Is "interesting" throughout the network. Simply put, replication is "expensive" and networks with many replicas can consume a considerable percentage of available bandwidth by simply propagating updates. Therefore, before data is stored in a directory (and "pushed" via replication to the "far corners" of the network) it is important to make sure that some application in those corners will actually use the data.
Changes "slowly". Even if information is interesting throughout the network, if updates become too frequent, replication traffic can overwhelm even fast networks. Two "rules of thumb" for identifying data that is changing "too frequently" are:
The ratio of updates to reads is greater than 1:100
Updates occur more often than once in every interval of two times the maximum latency of the replication topology. Most directory services perform replication periodically (versus continuously), and topologies can be configured so that an update on one replica may need to take more than one "hop" to reach all other replicas. If the maximum latency in the system is computed (e.g., in two hours an update on any replica will reach all other replicas), then spacing updates at least two times that value apart (e.g., four hours) will ensure that each update will be seen by each replica, and remain unchanged on the replica, for a minimum of one replication cycle. This heuristic has been shown to simplify a number of application programming issues.
Benefits from hierarchical storage and security models. Directory services offer the greatest value when the data they store benefits from "container-style" storage and inheritance-based security. For example, data related to managing network operating systems and enterprise white pages usually maps well to a hierarchical model. In other cases, such as with the data corresponding to large Business to Consumer (B2C) e-commerce sites—where users are stored as huge, flat lists—the fit may not be as strong.
Doesn't require transactional update. As mentioned earlier, directory services do not support transactions, so any application that must observe the ACID properties (atomicity, consistency, isolation, and durability), such as an inventory database, is not a good fit for storing its data in a directory.
Doesn't require change history information. For many reasons beyond the scope of this article, it is not practical to create a centralized log that reflects all of the updates that have been applied on all replicas throughout the network. Therefore, applications that require a log of changes that have been made to data (e.g., for auditing reasons) are not a good fit for a directory service.
Is tolerant of replication latency and inconsistency. Due to the store and forward style of replication used by directory services, and the fact that replicas frequently are connected by unreliable network connections, applications must not assume that updates will propagate between replicas on any particular schedule or will arrive in any particular order.
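The two rules of thumb given above for data that changes "too frequently" can be turned into a quick check. The function name and parameters are hypothetical, and the thresholds are simply the ones stated in the list.

```python
def changes_too_frequently(updates, reads, update_interval_hours, max_latency_hours):
    """Apply the two rules of thumb for data that changes too often for a directory."""
    # Rule 1: the ratio of updates to reads is greater than 1:100.
    too_many_writes = updates * 100 > reads
    # Rule 2: updates arrive more often than once per two times the maximum
    # replication latency of the topology.
    too_fast = update_interval_hours < 2 * max_latency_hours
    return too_many_writes or too_fast


# White-pages style data: few writes, changed about once a day, 2-hour latency.
print(changes_too_frequently(5, 10_000, 24, 2))
# Event-log style data: changed every 15 minutes.
print(changes_too_frequently(5, 10_000, 0.25, 2))
```

Data that trips either rule is usually better kept in a database, with the directory holding at most a pointer to it.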
It is also important to note that a directory service can still play a role even if data doesn't meet all of the requirements outlined above. For example, an "inventory" database can be "published" in a directory so that applications can discover its location dynamically (see "Service Publication") even though the transactional nature of the data itself is a poor fit for storing directly in a directory service.
Based on the guidelines above, examples of good uses for a directory include:
"White pages" information. Contact and location information about people is of interest throughout the network and generally changes very slowly.
User credential & security group information. Security-related information may be needed at various locations in the network to ensure that access controls are being applied consistently regardless of where a user logs on in the network. And, this information doesn't tend to change quickly.
Network configuration and service policies. With more "edge devices" being used to support virtual private networking (VPN) to users and business partners, it is important that configuration information (such as which users have access and the level of link security they require) is made available to all possible points of access.
Application deployment policies. By integrating application deployment policies with a hierarchical directory, administrators can simplify management by assigning applications to OUs and groups instead of having to manage users one-by-one. For example, a human resources application can be assigned via a policy to all members of the "Personnel" OU.
"Not So Good" Examples
Examples of applications that are not as appropriate for directory services include:
Accounting. Accounting applications require transactional update protections and updates typically occur quickly.
Centralized data collection. It is tempting to use the replication features of a directory to propagate data (such as event log information) from remote servers to a centralized location. However, this would create traffic to all replicas, not just the ones located centrally. Combined with the frequency of updates, available network bandwidth could be consumed quickly.
Hardware inventory. Most data about the hardware configuration of client and server machines is not interesting to applications throughout the network. Here it is probably better to store a reference to an inventory information provider on each of the machine objects in the directory. Then when an application wants configuration data for a particular machine, it can look up the machine in the directory, find the interface attribute, and bind directly to the information provider on that machine.
Process control. Any application that is dependent on the timing of data propagation between replicas is a bad fit for a directory service.
Directory Integration Models
Based on the preceding information about the nature of directory services, it is possible to identify a series of architectural models where directories offer clear value when integrated with applications.
Directory Object Extension
Most applications use some form of repository to store information about users and service configurations. For example, e-mail systems have address books that contain lists of users along with their e-mail addresses and other information such as telephone numbers. In addition to basic tasks such as authentication and authorization, such information also enables valuable functionality within applications, such as the ability to search for people or shared resources. Within most companies, however, information about people and resources now exists (and must be maintained) within many different application-specific directories. This proliferation of data is largely the result of developers being unable to assume that a suitable repository will exist in the networks for which they were designing their application.
Directory services are well suited to address the proliferation problem. First, with the popularity of directory-enabled network operating systems (NOS) and enterprise directory products, it is now more likely than ever that networks will contain a standards-based directory service. Second, the granular and centralized security models of directory services are ideal for environments that include applications from multiple departments or organizations that do not want to trust other applications to implement security protections correctly. Finally, directories are designed to be extensible to accommodate change without significant disruption to other applications.
Such characteristics make it possible to architect applications to extend existing directory objects (such as those used to represent users) instead of implementing an entirely new repository—at least for the kind of data that is likely to be redundant between applications. Having information about users, machines, and other objects consolidated in a directory service allows administrators to view, change, and manage information in one place. Applications also can take advantage of information stored in the directory by other applications. For example, a human resources application could store the name of each employee's manager as an extra attribute on each employee's user object in a directory. The accounts payable system could store approval limits for expense reports on each manager's user object. A new online expense-report approval system could then be developed that would automatically send an employee's expense report to the person in the management chain with sufficient authority to approve a given report.
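The expense-report example can be sketched as a walk up the management chain. The `manager` and `approvalLimit` attribute names, the sample users, and the `find_approver` helper are assumptions made for illustration; in a real deployment the two attributes would be written by the human resources and accounts payable applications, respectively.

```python
# Hypothetical user objects carrying attributes added by two different applications.
users = {
    "pat": {"manager": "kim", "approvalLimit": 0},
    "kim": {"manager": "lou", "approvalLimit": 1000},
    "lou": {"manager": None, "approvalLimit": 50000},
}


def find_approver(employee, amount):
    """Return the first person up the management chain who may approve `amount`."""
    seen = set()  # guards against an accidental cycle in the manager attribute
    current = users[employee]["manager"]
    while current is not None and current not in seen:
        seen.add(current)
        if users[current].get("approvalLimit", 0) >= amount:
            return current
        current = users[current]["manager"]
    return None  # nobody in the chain has sufficient authority


print(find_approver("pat", 500))
print(find_approver("pat", 5000))
```

The approval system needs no repository of its own here; it only reads attributes that other applications already maintain on the shared user objects.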
In a network-computing environment there will be many different application components running on many different machines. User desktops will run client-side applications ranging from payroll and accounting systems to automated backup systems that run by themselves after business hours. Other machines will contain server-side elements including databases, shared application components, and network services such as file and print servers. Historically, client-side applications have used static configuration files to hold location information about the services they access. Because of the increasingly dynamic nature of most network environments, however, maintaining associations between client and services is becoming more difficult and adds significantly to ongoing administrative costs. Moving a resource from one server machine to another, for example, can be a time-consuming and costly process because each client machine may require an update to its configuration data.
Directory services address this issue by enabling the server side of applications to "publish" information about the services they provide. Then, when a client application needs access to a particular application or server, it:
Looks up the resource by name in the directory using a programming interface such as LDAP.
Retrieves the "binding" information associated with the resource (for instance, database name, connection point or TCP/IP port address).
Connects to the resource dynamically and begins using it.
To facilitate higher availability and performance, applications can also be programmed to support multiple providers of the same service within the same network. Each provider then registers itself with the directory service using the same name. When all providers are running and reachable, users see better performance because the load is shared across more than one machine. When a machine or network connection fails, users see higher availability because their client applications can locate alternative machines that provide the same service. Service publication also makes it easier for administrators to move services between machines (for instance, to take advantage of available CPU resources)—even on a daily basis—because the need to reconfigure clients is removed.
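The publish, look up, and connect sequence, including several providers registered under one name, can be sketched with an in-memory stand-in for the directory. The service name, host names, port, and the `is_reachable` callback are hypothetical; a real client would query the directory over LDAP and attempt an actual network connection.

```python
# Hypothetical in-memory stand-in for the directory: service name -> bindings.
directory = {}


def publish(service_name, host, port):
    """Server side: register binding information under a well-known name."""
    directory.setdefault(service_name, []).append((host, port))


def locate(service_name, is_reachable):
    """Client side: look up the name and return the first reachable binding."""
    for host, port in directory.get(service_name, []):
        if is_reachable(host, port):
            return host, port
    raise LookupError(f"no reachable provider for {service_name!r}")


# Two providers register the same service under the same name.
publish("inventory-db", "db1.example.com", 1433)
publish("inventory-db", "db2.example.com", 1433)

# Simulate a failed machine; the client finds the alternative on its own.
down = {"db1.example.com"}
binding = locate("inventory-db", lambda host, port: host not in down)
print(binding)
```

Because clients hold only the service name, an administrator can move the service to another machine by republishing it, with no change to client configuration.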
The process is analogous to making a telephone call to a business such as a hardware store to see if a product is in stock. As long as the consumer knows the store's name, he or she can look up the telephone number (in the phone book) of the store closest to home. If the hardware store is currently sold out of the product or the telephone line is busy, the person can look up the telephone numbers for other locations of stores with the same name—which may have the product in stock.
When most people think of "roaming" the first thing that probably comes to mind is working from a different company office when traveling or perhaps working from home over dial-up lines. Roaming in this sense means accessing services (such as e-mail) over a different communication path and using different devices (such as printers) than usual. Another example is working in an environment where computers are shared between multiple users. Here, roaming has more to do with the particular machine on which the person is working than the path they take over the network to get services. In both cases, the more seamless the user's experience (compared to where they usually work from, or the machine they last used) the happier and more productive that user is going to be.
One simple way to ensure a seamless experience is to store user configuration information in a directory service and read this information when a user logs on (or starts an application) to establish the services and configuration to that which they last used. Additionally, assuming that information about shared resources is stored in the directory, settings such as the "default printer" can automatically be set to sensible values with no intervention required from the user. Directory service features such as multi-master replication ensure that configuration and service data will be available regardless of where the user roams within the network—even if LAN/WAN links are unavailable to the replica that they usually access.
It is extremely common for companies to be organized in a hierarchical structure of some sort. For example, a company might have product development, marketing, sales, human resources, and accounting departments that all report to the chief executive officer. Within the sales group, there might be a headquarters staff and four regional managers who report to the vice president of sales. And so on.
Most companies also have the concept of groups that span across the different divisions and departments of the organization. For example, the Human Resources department may keep track of managers even though managers as a grouping concept spans across the organizational hierarchy. In other cases, people from different departments and different roles (for instance, managers and non-managers) will have to work together on a project. They also can be thought of as a group.
It is very common for companies to want to allocate and control resources, such as systems management functions, applications, file access and storage limits, based on where users reside in the organization and the groups of which they are members. For example, companies may decide to allow only managers to run certain human resources applications, or to configure backup applications to do full (as opposed to incremental) backups of machines used by employees in the accounting department.
Despite how intuitive this seems, most companies today must work very hard to implement these types of policies. For example, administrators typically have to know which applications to install on an employee's machine when the employee joins a department or is added to a group. If an individual moves to a different department or changes groups (for instance, because of a promotion) someone has to ensure that they have the set of applications and privileges appropriate for the new role. It is also common for administrators to have to configure applications on a person-by-person basis to implement policies such as backup intervals and storage limits. And even the best configuration management procedures can be thwarted when users decide to customize or change their application settings in ways that make standardized support more difficult.
To address these issues, some directory services (such as the Active Directory™ service) provide features that allow administrators to implement a policy-based installation, configuration, and management environment based on the organizational hierarchy and group definition information stored in the directory. Policy management features enable administrators to associate policy attributes, such as the names of applications that should be installed or settings that should be applied, with membership in organizational units (OUs) defined within the directory tree. In addition, administrators can further refine policies to include or exclude the members of security groups. For example, administrators could deny access to a timecard reporting application by default and then define a tree-wide policy that grants access to the timecard application but is applied only if the user is a member of the "manager" group.
It is also possible to architect applications to inspect hierarchy and group information and behave according to the location of the user in the tree (i.e., what OU they are stored in) and their group memberships. For example, the name of a database server could be stored in a policy and the policy applied to the accounting department organizational unit. When users join (or are moved into) the accounting department OU, their applications can be reconfigured automatically to access the correct database.
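A minimal sketch of policy resolution by OU and group membership follows. It combines the timecard and accounting database examples; the policy table, the setting names, and the rule that policies apply from the root downward (with later entries overriding earlier ones) are assumptions made for illustration, not the behavior of any specific product.

```python
# Hypothetical policies: (OU the policy is linked to, required group or None, settings).
policies = [
    ("Company", None, {"timecardApp": "denied"}),            # tree-wide default
    ("Company", "manager", {"timecardApp": "allowed"}),      # only for managers
    ("Company/Accounting", None, {"databaseServer": "acct-db"}),
]


def effective_policy(user_path, groups):
    """Apply policies from the root OU down to the user's own OU."""
    containers = user_path.split("/")[:-1]  # drop the user's own name
    settings = {}
    for depth in range(1, len(containers) + 1):
        ou = "/".join(containers[:depth])
        for linked_ou, group, values in policies:
            if linked_ou == ou and (group is None or group in groups):
                settings.update(values)
    return settings


print(effective_policy("Company/Sales/Lee", set()))
print(effective_policy("Company/Accounting/Dana", {"manager"}))
```

Moving a user object into the accounting OU is enough to change the database server that the user's applications resolve; nothing is configured per person.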
Another example of policy-based application integration is the Directory-Enabled Networking (DEN) initiative, formed originally by Microsoft and Cisco Systems, to develop standards in the Distributed Management Task Force (DMTF) for improving the manageability of networks by using a directory service. Here, the applications are network management systems that use policies to define and enforce Quality of Service (QoS) and other behaviors (such as VPN security levels) within a network. For example, by placing a QoS policy on an OU, members of that OU can be granted more bandwidth than members of other OUs. Assuming that a hospital has an OU that represents "emergency room doctors" who need high bandwidth to quickly receive and read CAT scans and other lab test results, applying this type of policy makes a lot of sense. And, being able to implement the behavior by simply associating a policy with an OU that will naturally and accurately represent who the "emergency room doctors" are at a given point in time eases management complexity.
Many companies have begun to realize the value of directory services because of the ways that directories help simplify user and machine management in NOS and Enterprise directory roles. For example, storing information about users and machines hierarchically in a NOS directory allows administrators to delegate management responsibilities to appropriate individuals within departments and groups. This frees administrators to focus on other tasks and gives more autonomy and control to users.
Extending the role of directory services to include application integration enables companies to extend these benefits: simplified management, enhanced network services, greater application functionality, lower TCO, and synergy between all directory-enabled components of the network-computing environment. This "raises the bar" for what a directory service needs to be—directories that offer just simplified administration or support a single application deny their users many important benefits.
Clearly, directory services are not a replacement for relational databases—and vice versa. But, when used together, directories and databases can significantly lower the cost and complexity of managing today's complex distributed systems.