This documentation is archived and is not being maintained.
Hey, Scripting Guy! Can This Relationship Be Saved?
The Microsoft Scripting Guys work for—well, are employed by—Microsoft. When not playing/coaching/watching baseball (and various other activities) they run the TechNet Script Center. Check it out at www.scriptingguys.com.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.
The Microsoft Scripting Guys
Download the code for this article: HeyScriptingGuy2007_09.exe (150KB)
Relationships do matter. But how would the Scripting Guys, of all people, know that relationships matter? Well, for one thing, we watch a lot of TV, and on TV they're always telling us that relationships matter.
And that's not just on soap operas or TV movies of the week. For example, there's a payday loan company commercial that encourages people to patronize their business. Is that because they're willing to loan money to people who probably shouldn't be getting a loan in the first place? No. Is that because they'll charge you as much as 900 percent annual interest on that loan? Of course not. It's because this payday loan company forms a personal relationship with you. As their commercial states, "Relationships matter."
Note: sure, we're crying—we're not ashamed of that. After all, you'd have to be pretty cold and callous not to tear up a little just thinking about the deep, enduring ties that bind a person and their payday loan company.
Of course, we know what you're thinking: "Gosh, if relationships matter to payday loan companies, then relationships must really matter to people who write scripts that interact with databases." And the fact of that matter is that relationships should matter to people who write scripts that interact with databases. Unfortunately, though, these script writers don't always realize that relationships matter. But that's OK; in this month's column we'll set everyone straight.
We know that a lot of system administrators use databases, often as a way to track their hardware inventory. The problem is that many of these databases are set up in somewhat less-than-optimal fashion. For example, suppose you need a database that keeps track of the disk drives attached to all your computers. If you don't have a background in database design, there's a good chance you'll create a one-table database like the one in Figure 1.
Figure 1 A one-table database
As you can see, that's a very simple design: there's a field for the computer name, and a couple of Yes/No fields used to indicate whether a computer has a C: drive and a D: drive. And that's about it. Crisp, clean, and to-the-point.
So what's wrong with this database? Well, to tell you the truth, pretty much everything. If you can be sure none of your computers have more than two disk drives, this design will work (sort of). However, what happens if you have a computer that has three disk drives? In that case, you're going to have to add another field, one that keeps track of drive E. OK, but what happens if you have a computer that has eleven disk drives? And what if you want to keep track of the properties for each of those drives? For example, suppose you'd like to keep track of the total size of each drive. In that case, you're going to need a Drive_C_Size field. And a Drive_D_Size field. And a Drive_E_Size field, and—well, you get the idea. (And heaven forbid you'd also like to keep track of available disk space, disk connector type, whether disk quotas have been enabled on the drive, and so forth.)
Here's a rule of thumb to keep in mind: any time an entity can have more than one of something (for example, a computer can have more than one disk drive), this "flat-file" (one-table) design isn't very good. True, in the case of disk drives you might think you can get away with it; after all, you're only going to have a finite number of drives connected to a computer. But suppose you want to query your database and get back a list of all the disk drives that are 100 gigabytes or larger. How would you go about doing something like that? Well, you'd have to first look for 100-gigabyte drives in the Drive_C_Size field, then look for the same thing in the Drive_D_Size field, then the Drive_E_Size field, then.... All in all, not a very efficient or effective approach. But what's the alternative? If you can't use a flat-file database, what can you use?
That's easy: a relational database. Relational databases are designed, among other things, to handle one-to-many relationships (for example, one computer that can have many disk drives). And before you ask, no, you don't have to go out and buy anything new in order to use a relational database; if you have a database of any kind (Microsoft® Access™, SQL Server™, Oracle, whatever), the odds are that you already have a relational database. You don't need to buy anything; you just need to know two things: 1) how to set up a relationship between two tables in that database; and 2) how to write a script that can take advantage of that relationship.
Note: according to TV you should never try to take advantage of a relationship. But this is different.
Because this column isn't called Hey, Database Design Guy!, we won't spend a lot of time discussing database design. Instead, we'll just show you a very simple relational database design that will enable us to talk about writing relational queries. In this database we have two tables. The first is named Computers and consists of a pair of fields: ComputerName and SerialNumber. The ComputerName field is where we'll store the computer name, and the SerialNumber field is where we'll store—oh, you guessed it. That's right, the serial number.
Then where in the world do we put our disk drive information? That goes in our second table, which is named DiskDrives. This table contains three self-explanatory fields: SerialNumber, DriveLetter, and DriveSize.
The key field here is SerialNumber. If that field name sounds familiar, well, it should: we have the exact same field in the Computers table. And that's not an accident or a coincidence; we did it on purpose. In order to establish a relationship between these two tables we need to have one field that appears in each table; that's how we can tell which disk drives belong to which computers.
Why did we link the two tables using serial number rather than, say, computer name? There's actually a good reason for that: computer names can (and often do) change. Serial numbers typically don't.
But enough talk. Let's take a look at Figure 2, a script that can search these two tables and retrieve a list of the disk drives that belong to each computer. Figure 3 shows the kind of data this script returns.
Const adOpenStatic = 3 Const adLockOptimistic = 3 Set objConnection = CreateObject("ADODB.Connection") Set objRecordSet = CreateObject("ADODB.Recordset") objConnection.Open _ "Provider = Microsoft.Jet.OLEDB.4.0; " & _ "Data Source = c:\scripts\test.mdb" objRecordSet.Open "SELECT Computers.*, DiskDrives.* FROM Computers " & _ "INNER JOIN DiskDrives ON Computers.SerialNumber = DiskDrives.SerialNumber " & _ "ORDER BY Computers.ComputerName, DiskDrives.Drive", objConnection, adOpenStatic, _ adLockOptimistic objRecordSet.MoveFirst Do Until objRecordset.EOF Wscript.Echo objRecordset.Fields.Item("ComputerName") Wscript.Echo objRecordset.Fields.Item("Drive") Wscript.Echo objRecordset.MoveNext Loop objRecordSet.Close objConnection.Close
We're not going to discuss the fundamentals of connecting to and working with databases in this month's column; if you're new to database scripting you might want to take a look at the Scripting Guys webcast "Database Scripting for System Administrators" (go.microsoft.com/fwlink/?LinkId=22089). All we will say is that we are connecting to an Access database (C:\Scripts\Test.mdb) and working with relational tables named Computers and DiskDrives. That much should be pretty clear from the script.
OK, one more thing we will say is that you need to make one small change to get this to work with Access 2007: it's important that the Provider needs to be changed from Microsoft.Jet.OLEDB.4.0 to Microsoft.ACE.OLEDB.12.0, like this:
objConnection.Open _ "Provider = Microsoft.ACE.OLEDB.12.0; " & _ "Data Source = c:\scripts\test.accdb"
That's it. What we're going to do, instead, is focus on the SQL query that retrieves data from our two tables:
objRecordSet.Open "SELECT Computers.*, " & _ "DiskDrives.* FROM Computers " & _ "INNER JOIN DiskDrives ON " & _ "Computers.SerialNumber = " & _ "DiskDrives.SerialNumber " & _ "ORDER BY ComputerName, Drive", _ objConnection, adOpenStatic, adLockOptimistic
Complicated? Maybe a little. So let's see if we can break this down a bit and make it easier to digest.
The first part of our query is actually pretty easy. We want to select all the fields in both tables: Computers and DiskDrives. Hence this bit of SQL:
That's not too bad; the asterisk, needless to say, is simply a wildcard character that means "everything."
If you've written SQL queries before (or if you've written Windows® Management Instrumentation, or WMI, queries, which use a subset of the SQL query language), you know the routine: after specifying the items you want to select, you next specify where you want to select these items from. Thus the next piece of our query puzzle:
Don't let this throw you. True, it's more complicated than a typical FROM statement, but for good reason. After all, in a typical SQL query we only select data from one table; this time we're selecting data from two tables at once.
Let's take a closer look at the syntax and see how it works. As you can see, we're asking the script to select data from the Computers table and from the DiskDrives table. Notice, however, that we don't use the word "and"; instead, we use the term INNER JOIN. This term defines the type of relationship and, in turn, describes how we're going to join information from two separate tables into a single recordset. (And yes, there are other types of joins; we'll discuss those in a few minutes.)
As we've implied, it's up to us to specify the type of relationship that exists between our tables; it's also up to us to specify exactly how those tables are related. That's what this bit of code does for us:
What we're doing here is defining how we're going to join our tables. Records will be grouped together any time the SerialNumber field in the Computers table matches the SerialNumber field in the DiskDrives table. What if we had used a different field (say, ComputerName) as our joining field? Well, then our code would look like this:
In case you're wondering, no, the two fields do not have to have the same name; they just need to contain the same data. Using the same name makes it easier to identify the relational field. There's just one minor catch to keep in mind: because we have two fields with the same name, we need to always use the syntax Table.Name.Field.Name when referring to one of these fields. That means using code like this: Wscript.Echo obj- Recordset.Fields.Item("Computers.SerialNumber").
That's really all we need; the rest of the query merely sorts the data, first by computer name and then by drive:
See, that wasn't so bad, was it? We need to pause for a few seconds now and talk about joins. Why did we use an inner join in our query? That's easy. An inner join returns only records that have matching values in each table. In other words, suppose we have serial number 121989. If that serial number appears in both the Computers and DiskDrives tables, the corresponding records will be returned. (Of course, the serial number can't just show up anywhere; it must be in the SerialNumber field.) Does this make sense? Good.
Now, suppose we have a computer that has the serial number 121989; however, there are no disk drives with that serial number. In that case, no data for the computer with serial number 121989 will be returned. That's because an inner join only returns information if a record has matching values in each of the joined tables.
In other words, an inner join query returns all the computers that have hard disks; what it won't return are any computers that don't have hard disks, or any hard disks that aren't installed in a computer. That's good; that's often what we want. But what if you do want a list of computers that don't have hard disks or hard disks that don't have computers? What then?
That's what the outer join is for. (Oh, so that's what the outer join is for!) For the time being, we're going to say that there are just two different types of outer joins: Left and Right. In our database we have two tables: Computers (which is considered the "left" table because it's the master table) and DiskDrives (which is considered the "right" table because it's, well, not the master table). Suppose we want our returned recordset to include all computers, even those that don't have any disk drives installed. In that case, we use the syntax LEFT OUTER JOIN, and a query that looks like this:
objRecordSet.Open "SELECT Computers.*, " & _ "DiskDrives.* FROM Computers " & _ "LEFT OUTER JOIN DiskDrives ON " & _ "Computers.SerialNumber = " & _ "DiskDrives.SerialNumber " & _ "ORDER BY ComputerName, Drive", _ objConnection, adOpenStatic, adLockOptimistic
As you probably figured out, a left outer join returns all records in the left table, even if there are no corresponding records in the right table. So, we get back all computers, but only records from the right table (disk drives) that are associated with a computer.
Alternatively, maybe we want a list of all the disk drives, including those that aren't installed in a computer. Because the DiskDrives table is the right table in the relationship we use the syntax RIGHT OUTER JOIN, like so:
objRecordSet.Open "SELECT Computers.*, " & _ "DiskDrives.* FROM Computers " & _ "RIGHT OUTER JOIN DiskDrives ON " & _ "Computers.SerialNumber = " & _ "DiskDrives.SerialNumber " & _ "ORDER BY ComputerName, Drive", _ objConnection, adOpenStatic, adLockOptimistic
Now that we think about it, if DiskDrives is the right table in the relationship, then by all rights Computers should be known as the husband table in the relationship instead of the left table. As some of the Scripting Guys know through painful experience, the husband is never the right entity in a relationship.
With the right outer join we'll get back all the records in the right table (all the disk drives), but only computers (records in the left table) that are associated with a disk drive.
This might not make a lot of sense at the moment, but if you play around with it a bit you'll see how it works. If you need a database to play around with, we've posted a sample one at microsoft.com/technet/scriptcenter/resources/tnmag/archive.mspx.
It is important to mention that any time you use an outer join you should also, at the very least, make On Error Resume Next the first line in your script. Why? Well, suppose we do a left outer join; in that case it's possible we'll get back some computers that don't have any hard disks installed. That's fine (after all, that's what we want to get back), at least until we encounter this line of code, which echoes back the drive letter:
Because no Drive field exists for this particular computer, the script will come to a crashing halt:
If we implement On Error Resume Next, however, the script can ignore the fact that a computer has no disk drives and it can continue on its merry way. Alternatively, you could use code such as this to check the value of the Drive field and then take the appropriate action:
If IsNull(objRecordset.Fields.Item("Drive")) _Then Wscript.Echo "No disk drives installed." Else Wscript.Echo _ objRecordset.Fields.Item("Drive") End If
With this code we check to see if we get back a Null value as opposed to an actual drive letter. If so, then we echo the message "No disk drives installed." If we don't get back a Null value, then we simply echo back the value of the Drive field. The net result is output, as shown in Figure 4.
Without going into much explanation, let's look at a couple more queries. For example, here's an inner join query that returns a list of computers and their installed disk drives, provided those disk drives are larger than 50 gigabytes (we're assuming that drive sizes are stored in gigabytes):
objRecordSet.Open "SELECT Computers.*, " & _ "DiskDrives.* FROM Computers " & _ "INNER JOIN DiskDrives ON " & _ "Computers.SerialNumber = " & _ "DiskDrives.SerialNumber " & _ "WHERE DriveSize > 50 " & _ "ORDER BY Computers.ComputerName, " & _ "DiskDrives.Drive", objConnection, _ adOpenStatic, adLockOptimistic
As you can clearly see, all we did was add a standard WHERE clause to the original query:
What if we only wanted information about Drive E on computers? No problem—just add the appropriate Where clause:
And here's a slightly more complicated query, one that returns a collection of computers that have no disk drives installed:
objRecordSet.Open _ "SELECT Computers.ComputerName, " & _ "DiskDrives.Drive " & _ "FROM Computers LEFT JOIN DiskDrives " & _ "ON Computers.SerialNumber = " & _ "DiskDrives.SerialNumber " & _ "GROUP BY Computers.ComputerName, " & _ "DiskDrives.Drive " & _ "HAVING (DiskDrives.Drive) Is Null " & _ "ORDER BY Computers.ComputerName", _ objConnection, adOpenStatic, adLockOptimistic
Like we said, this one's a little more complicated, and since we're running out of space we won't explain how it works. But it does work, which is the most important thing.
Well, maybe the second most important; as we've pointed out several times, it's relationships that truly matter. Which doesn't mean that relationships can't go awry. As Woody Allen said at the end of "Annie Hall," "A relationship...is like a shark; it has to constantly move forward or it dies. And I think what we got on our hands is a dead shark." Oh, if only Woody had known about inner joins and outer joins. With these query techniques, you'll always have successful relationships!