Hey, Scripting Guy! Famous Last Words

The Microsoft Scripting Guys

The famous Greek philosopher Socrates (who died in the year 399 BC, shortly before the Scripting Editor was born) is perhaps best known for saying, "An unexamined life is not worth living." Most people are familiar with this saying. Remarkably enough, however, the Scripting Guys recently discovered that Socrates never said that "an unexamined life is not worth living." As it turns out, that was a mistranslation made by a medieval scribe hundreds of years ago. Instead, what Socrates actually said was this: "An unexamined XML file is not worth having."

Now there's a philosophical saying that actually makes sense! XML is becoming more and more popular these days; a quick search of one of the Scripting Guys' test computers uncovered more than 500 XML files used by various systems or applications.

Needless to say, a lot of very important data is now stored in XML format. And that's fine. Unless, of course, that data goes unexamined. And, sadly, that's often the case, if for no other reason than this: many people don't have any idea how to even go about examining an XML file. In particular, they have no idea how to query and search that XML file.

Note If you're not up on your Greek philosophers, Socrates was a big proponent of going around talking about things but not all that big on going around doing things; in fact, his wife, Xanthippe, referred to him as a "worthless idler." You can see why the Scripting Guys have a soft spot in their hearts for Socrates.

Of course, some of you who are reading this might be thinking, "Wait a second: didn't the Scripting Guys already discuss searching an XML file in a previous column ("Chasing Cars … and XML" at technet.microsoft.com/magazine/cc162506)? In that case, why are they revisiting that topic now; are the Scripting Guys so lazy that they're willing to write the same column over and over again?"

Believe it or not, we're even lazier than that. But that's not the primary reason we're revisiting this topic. In our previous column, we talked about working with an XML file that was structured like the code in Figure 1, with each and every property value representing a single node in the file.

Figure 1 XML file with nodes only

<?xml version='1.0'?> 
  <INVENTORY> 
    <COMPUTER>
      <os>Windows XP</os>
      <department>Human Resources
      </department>
      <name>atl-ws-001</name>
    </COMPUTER> 
    <COMPUTER>
      <os>Windows XP</os>
      <department>Finance</department>
      <name>atl-ws-002</name>
    </COMPUTER> 
    <COMPUTER>
      <os>Windows Vista</os>
      <department>Finance</department>
      <name>atl-ws-003</name>
    </COMPUTER> 
  </INVENTORY>

That's nice, except—as people were quick to write in and point out—that's not the only way that XML files can be structured. In the preceding example, each computer in the file has its own node; in turn, each of those nodes has a number of child nodes (os, department, and name). However, it's also possible to construct an XML file in which the individual nodes don't have any child nodes; instead, additional property values are configured as attributes. You know, like the file shown in Figure 2.

Figure 2 XML file with attributes

<?xml version='1.0'?> 
  <HARDWARE> 
      <COMPUTER os="Windows XP" department="Human Resources">atl-ws-001</COMPUTER> 
      <COMPUTER os="Windows XP" department="Finance">atl-ws-002</COMPUTER> 
      <COMPUTER os="Windows Server 2003" department="IT">atl-fs-003</COMPUTER> 
      <COMPUTER os="Windows Vista" department="IT">atl-ws-004</COMPUTER> 
      <COMPUTER os="Windows Vista" department="Human Resources">atl-ws-005</COMPUTER> 
      <COMPUTER os="Windows Vista" department="Finance">atl-ws-006</COMPUTER> 
      <COMPUTER os="Windows XP" department="Sales">atl-ws-007</COMPUTER> 
      <COMPUTER os="Windows Server 2008" department="IT">atl-fs-008</COMPUTER> 
      <COMPUTER os="Windows XP" department="Human Resources">atl-ws-009</COMPUTER> 
      <COMPUTER os="Windows Vista" department="Sales">atl-ws-010</COMPUTER> 
  </HARDWARE>

Is that a problem? As a matter of fact, it is. The script we showed you a long time ago, in an issue far, far away, won't work with an XML file structured like the one shown in Figure 2. It's just not going to happen. And that is definitely a problem because many of you need to be able to read this type of XML file.

All you had to do was ask. Well, that and also wait a year and a half before we finally got around to addressing this issue.

Before we go much further, let's take a closer look at our XML file. In this case, we have a file with a primary node named HARDWARE; each individual computer exists as a child node of HARDWARE. In addition to this, each of these nodes has a pair of attributes: os (used to store the name of the operating system installed on the computer) and department (used to store the name of the department that owns the computer).

Suppose, as an example, that we would like to get a list of all the computers that are running Windows XP. Can we do that using a script? Of course we can:

Set xmlDoc = _
  CreateObject("Microsoft.XMLDOM")

xmlDoc.Async = "False"
xmlDoc.Load("HARDWARE.xml")

Set colNodes=xmlDoc.selectNodes _
  ("//HARDWARE/COMPUTER[@os='Windows XP']")

For Each objNode in colNodes
  Wscript.Echo objNode.Text 
Next

Let's see what we have here. To begin with, we create an instance of the Microsoft.XMLDOM object; as the name implies, this is the object that enables us to work with XML files. Once we've created the object, we next set the Async property to False; this allows the script know that we want to load the document synchronously rather than asynchronously. Why does the script care about that? To be honest, it probably doesn't; after all, scripts (like Scripting Guys) are inanimate objects.

However, you should care. If we loaded the document asynchronously, the script would continue to run even if the document had not been fully loaded. That's not good: imagine the problems that might occur should you try to perform a task on a document that doesn't exist yet. Loading an XML file synchronously guarantees that the file will be fully loaded before our script proceeds.

By amazing coincidence, we're now at the point where we load our XML file. That's something we do by calling the Load method and opening the file C:\Scripts\HARDWARE.xml, which brings us to this line of code:

Set colNodes=xmlDoc.selectNodes _
  ("//HARDWARE/COMPUTER[@os='Windows XP']")

What we're doing here is we're using the SelectNodes method to query the XML file and to tell the script which XML nodes we want to retrieve. Notice the syntax, which, admittedly, is a little unusual.

To begin with, we need to specify the path within the XML file. Our XML file has a top-level node named HARDWARE, followed by a series of second-level nodes named COMPUTER. Each of these second-level nodes represents a single record in our XML data file. That's why our selectNodes query starts out like this:

//HARDWARE/COMPUTER

At that point we encounter this construction:

[@os='Windows XP']

This portion of the query is similar to a WHERE clause in a standard SQL query. With SQL we might use a database query similar to this to retrieve a list of all the computers running the Windows XP operating system:

SELECT Name FROM Hardware 
    WHERE OS = 'Windows XP'

With the SelectNodes method we're doing something similar: we're asking to get back a list of all the computers where the os attribute (@os) is equal to Windows XP. And to indicate that this is a WHERE clause, we enclose the entire clause in square brackets.

Note This type of query is known as an XPath query. To find out more about XPath, look at msdn.microsoft.com/library/ms256115.aspx.

Like we said, it's a little weird, but it works. What if we wanted to get a list of all the computers that belong to the Finance department? That's no big deal; our call to selectNodes would look like this:

Set colNodes=xmlDoc.selectNodes _
  ("//HARDWARE/COMPUTER" & _
   "[@department='Finance']")

Again, our WHERE clause is enclosed in square brackets, we preface the attribute name (department) with the at sign (@), and then we indicate the value we're interested in.

Simple, huh?

After we have called the selectNodes method, we can then echo back the computer names simply by looping through the collection of returned values and echoing back the Text property, as shown in the following:

For Each objNode in colNodes
  Wscript.Echo objNode.Text 
Next

And what sort of information can we expect to get back? To tell you the truth, we fully expect to get back information like this:

atl-ws-001
atl-ws-002
atl-ws-007
atl-ws-009

In other words, it returns the names of all the computers that are still running Windows XP.

Incidentally, you're not limited to reporting back just the computer name; because the computer name is the "default" value for each node, that's simply what we get back if we reference the Text property.

Alternatively, we could specify exactly which attribute values we want returned. For example, take a look at this modified For Each loop:

For Each objNode in colNodes
    Wscript.Echo objNode.Text 
    Wscript.Echo objNode.Attributes. _
      getNamedItem("department").Text
    Wscript.Echo
Next

As you can see, in this loop we're still echoing back the value of the Text property; however, we've also tacked on this line of code:

Wscript.Echo objNode.Attributes. _
  getNamedItem("department").Text

In this case, we're using the get­NamedItem method to retrieve the value of the department attribute; we then echo back the Text property for that attribute. That's how we can specify which attribute values we do (and do not) echo back to the screen. (Note too that we added on a Wscript.Echo command to put a blank line between records.) When we run this script, we should get back the following:

atl-ws-001
Human Resources

atl-ws-002
Finance

atl-ws-007
Sales

atl-ws-009
Human Resources

And yes, you can write more complex queries if you choose. For example, suppose you'd like a list of all the computers running either Windows XP or Windows Vista. In SQL, you'd do that by writing an OR query. Guess what? You do the exact same thing when querying an XML file:

Set colNodes=xmlDoc.selectNodes _
    ("//HARDWARE/COMPUTER" & _
     "[@os='Windows XP' or " & _
     "@os='Windows Vista']")

See what we did here? Inside a single set of square brackets we provided two criteria: @os='Windows XP' or @os='Windows Vista'. That's all you have to do, and it's going to give us back a report that looks like Figure 3.

Figure 3 Result of OR query

atl-ws-001
Human Resources

atl-ws-002
Finance

atl-ws-004
IT

atl-ws-005
Human Resources

atl-ws-006
Finance

atl-ws-007
Sales

atl-ws-009
Human Resources

atl-ws-010
Sales

If you double-check the XML file, you'll see that the computers running Windows XP or Windows Vista are included in the output; computers running Windows Server 2003 or Windows Server 2008 are not included in the output. Nor should they be.

You have a question? You'd like to write a more restricted query, one that returns only those computers that are running Windows XP and belong to the Finance department? As you might have guessed, that's easy; all you have to do is write an AND query like this:

Set colNodes=xmlDoc.selectNodes _
    ("//HARDWARE/COMPUTER" & _
     "[@os='Windows XP' and " & _
     "@department='Finance']")

Doing that will then return the following dataset:

atl-ws-002
Finance

Why just one item in our report? You got it: because the Finance department has only a single computer that's running Windows XP.

We hope we've helped all of you to deal with this particular type of XML file. If it turns out that there's yet another type of XML file lurking out there, well, good luck with that.

Because we started this month's column off with a famous saying, we thought it would be fitting to end the column on a similar note. With that, we'd like to close with the words of Kaiser Wilhelm II, words we Scripting Guys hold dear: "I herewith renounce for all time claims to the throne of Prussia and to the German Imperial throne connected therewith."

Dr. Scripto's Scripting Perplexer

The monthly challenge that tests not only your puzzle-solving ability but also your scripting skills.

October 2008: VBScript Fill ‘Er Up

To solve this puzzle, simply fill in each row with the name of a VBScript function. When you're done, unscramble the letters in the blue squares to find one more function name.

fig10.gif

ANSWER:

Dr. Scripto's Scripting Perplexer

Answer: Octobe 2008: VBScript Fill ‘Er Up

puzzle_answer.gif

The Scripting Guys—Greg Stemp and Jean Ross—work for Microsoft.