How to: Manipulate Office Open XML Formats Documents

Summary: Office Open XML Formats files replace legacy binary Office system files. Learn about the components that are included in a formatted file and about several scenarios that show the versatility of these files. (25 printed pages)

Frank Rice, Microsoft Corporation

December 2006

Applies to: Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, Microsoft Office Word 2007

Download 2007 Office System Sample: Manipulating Office Open XML Formats Files.

Contents

  • Overview

  • Creating an Office Open XML Formats File

  • Exploring the Office Open XML Formats File

  • Manually Editing Documents Created by Using the Office Open XML Formats

  • Manipulating Office Open XML Formats Documents Programmatically

  • Conclusion

  • Additional Resources

Overview

In previous versions of Microsoft Office, files created in Microsoft Office Excel, Microsoft Office PowerPoint, and Microsoft Office Word were saved in a proprietary, single file format; they were known as binary files. The 2007 release of the Microsoft Office system introduces new file formats for Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, and Microsoft Office Word 2007, named the Office Open XML Formats.

The Office Open XML Formats are based on XML and ZIP archive technologies. Just as in previous versions of Microsoft Office, documents in the 2007 release are saved in a single file or container so that the process of managing documents stays simple. However, unlike legacy files, Office Open XML Formats files can be opened to reveal component parts that give you access to the structures that compose the file.

In this article, you examine Office Open XML Formats files by manually opening files and exploring each of the parts that make up the document. You also work with the documents programmatically. The files used in this article are available in the download that accompanies this article, 2007 Office System Sample: Manipulating Office Open XML Formats Files. If you do not have access to the download, you can substitute your own program files and image files in the examples.
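As noted above, an Office Open XML Formats file is an ordinary ZIP archive, so any ZIP-capable tool can list its parts. The following is a minimal sketch rather than part of the original sample download. The choice of Python and the helper name list_parts are assumptions made for illustration; only the standard zipfile module is used.

```python
import zipfile


def list_parts(path_or_file):
    """Return the sorted names of the parts stored in a package.

    path_or_file may be a file name such as "SampleWordDocument.docx"
    (hypothetical here) or any seekable binary file object.
    """
    if not zipfile.is_zipfile(path_or_file):
        raise ValueError("not a ZIP-based Office Open XML Formats file")
    with zipfile.ZipFile(path_or_file) as package:
        return sorted(package.namelist())
```

Calling list_parts on a document saved from Word 2007 should return names such as [Content_Types].xml and word/document.xml, without any need to rename the file to .zip first.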

Creating an Office Open XML Formats File

In this section, you examine the XML file format of a sample Word 2007 document that contains text, an image, and properties.

To create an Office Open XML Formats document in Word

  1. Start Word 2007.

  2. In the new document, paste the following text:

    Soaring with the American Bald Eagle

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nulla rutrum. Phasellus feugiat bibendum urna. Aliquam lacinia diam ac felis. In vulputate semper orci. Quisque blandit. Mauris et nibh. Aenean nulla. Mauris placerat tempor libero.

    Pellentesque bibendum. In consequat, sem molestie iaculis venenatis, orci nunc imperdiet justo, id ultricies ligula elit sit amet ante. Sed quis sem. Ut accumsan nulla vel nisi. Ut nulla enim, ullamcorper vel, semper vitae, vulputate vel, mi. Duis id magna a magna commodo interdum.

  3. Highlight the line Soaring with the American Bald Eagle and then, on the Home tab, in the Styles group, set the style to Title.

  4. Next, insert an image into the document:

    1. Place the cursor at the end of the first paragraph and press Enter to insert a new line.

    2. Then, on the Insert tab, click Picture, navigate to an image file (such as the Eagle1.gif file included in the download), and then click Insert.

  5. Now add some document properties:

    1. Click the Microsoft Office Button, point to Prepare, and then click Properties.

    2. In the Document Properties pane, add an author name, title, subject, and comments, as follows:

      Table 1. Document Property Settings

      Property    Text
      Author      Nancy Davolio
      Title       Soaring with the American Bald Eagle
      Subject     Bald Eagles
      Comments    A study of the bald eagle

  6. Next, add a comment into the document body:

    1. On the Review tab, click New Comment.

    2. In the comment balloon, type This is my comment.

      Your document should look similar to Figure 1.

      Figure 1. Sample Office Word 2007 document


  7. Next, save the document:

    1. Click the Microsoft Office Button, and then click Save As.

    2. In the Save as type list, select Word Document (*.docx) and then, in the File name box, type SampleWordDocument.docx.

    3. Click Save.

  8. Close Word.

Exploring the Office Open XML Formats File

In this section, you explore the sample document you just created.

To explore the Office Open XML Formats document

  1. Open Microsoft Windows Explorer.

  2. Navigate to the SampleWordDocument.docx file, right-click it, and then select Rename.

    Important

    Use the following three steps to extract the Office Open XML Formats files that you will use for the remainder of this article. Step 5 is different, depending on whether you are using Windows XP or Windows Vista.

  3. Add a .zip extension at the end of the file name, so that the file name is now SampleWordDocument.docx.zip.

  4. When prompted by the Rename Warning message, click Yes.

  5. Extract the container files:

    1. (If you are using Windows XP:) Right-click SampleWordDocument.docx.zip, point to Open With, and then click Compressed (zipped) Folders. The folders and parts that make up the document are displayed in the Explorer window.

    2. (If you are using Windows Vista:) Right-click SampleWordDocument.docx.zip and then click Extract All. In the Extract Compressed (Zipped) Folders dialog box, accept the default or choose a new location, and then click Extract. The folders and parts that make up the document are displayed in the Explorer window.

    In the next steps, you examine the key parts that are contained in the document.

  6. Examine the [Content_Types].xml part:

    1. Using Windows Explorer, find the file named [Content_Types].xml.

    2. Right-click the file, point to Open With, and then click Internet Explorer.

      In the root of every Office Open XML Formats document is a [Content_Types].xml part. The purpose of the [Content_Types].xml part is to identify every unique type of part found within the document. Each part needs to have its type listed in this part. Parts need to have identifiable types so that applications know how to use the part when rendering the document. The types also enable you to understand the part's purpose and how to use it.

    3. Close the file.

  7. Examine the _rels folder:

    Relationships represent a connection between two parts. Relationships are themselves stored in parts, which reside in subfolders named _rels. Any part that has related parts has a sibling _rels folder, created in the same folder as the part, that contains a .rels part defining its relationships. The name of a relationship part is derived by appending the .rels extension to the file name of the original part (for example, document.xml.rels). The relationship part for the document container as a whole is an exception; it is just named ".rels".

    1. In Windows Explorer, double-click the _rels folder, and then right-click the .rels file.

    2. Point to Open With, click Choose Program, click Internet Explorer, and then click Open.

    3. After you finish examining the part, close it.

  8. Examine the docProps folder:

    Document properties in the 2007 release are structured consistently across the three Microsoft Office system programs. Separated into three logical XML parts, they are stored in the docProps subfolder. This makes them easy for you to access because they are always in the same location and not mixed in with other document content.

    • In Windows Explorer, double-click the docProps folder, right-click the core.xml file, point to Open With, and then click Internet Explorer.

  9. Examine the core.xml part:

    • Open the core.xml part and observe that the properties you typed previously are displayed here.

      The core.xml part holds the typical document properties that users fill out to identify documents, such as Title, Subject, and Author.

  10. Examine the custom.xml part:

    • From Windows Explorer, open custom.xml in Internet Explorer.

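      The custom.xml part contains any custom document properties added to the document by a user, by a developer, or through custom logic. If the docProps folder does not contain a custom.xml part, the document has no custom properties and you can skip this step.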

  11. Examine the app.xml part:

    • From Windows Explorer, open app.xml in Internet Explorer.

      The app.xml part consists of the unique properties specific to the document at the application level, such as the number of pages, the number of lines of text, and the version of the application.

  12. Examine the word folder:

    The majority of the content-specific parts reside inside the word subfolder. Again, there is a relationship subfolder named _rels.

    • Double-click the _rels folder.

      Inside the _rels subfolder, the relationships used to link all document parts are found in the part named document.xml.rels.

  13. Examine the document.xml.rels part:

    • Open document.xml.rels in Internet Explorer.

      Relationships use IDs and Uniform Resource Identifiers (URIs) to locate parts. This allows all non-relationship parts to remain free of hard-coded references to other parts. This is discussed in more detail later in this article.

      In the word folder, also notice a styles.xml part.

  14. Examine the styles.xml part:

    • Open styles.xml in Internet Explorer.

      This part contains the definitions of the styles that are available in the document, such as the Title style you applied previously, including their fonts, accents, and shadings.

    Required and Optional Parts

    The use of parts in Office Open XML Formats files enables documents to be stored in a highly modular manner. Some parts are required for a document to be valid, such as the document.xml part and the fontTable.xml part.

  15. Examine the document.xml part:

    1. Open document.xml in Internet Explorer.

      The document.xml part contains the text for the main body of the document.

    2. After you examine the file, close Internet Explorer.

  16. Examine the fontTable.xml part:

    1. Open the fontTable.xml part in Internet Explorer.

      The fontTable.xml part contains the font settings for the document.

    2. After you examine the file, close Internet Explorer.

    Other parts are not required if the functionality they represent is not found within a document. Examples include comments, header parts, and footer parts, all of which are optional for Word documents. This makes it easy for you to navigate through the document structure without having to traverse through content that is not used.

    XML is designed for structured content and does not natively support binary content such as images or OLE objects. Binary data can be encoded into characters and stored in XML, but that requires an encoding/decoding process, making it inefficient for both applications and developers. With the 2007 release, there is no need to encode binary objects because you can store them in their native format as binary parts. This makes accessing binary objects in Office documents very simple.

    Media files are stored in the media folder.

  17. Examine the word\media and word\embeddings folders:

    • In Windows Explorer, double-click the media subfolder.

      Notice the .gif media file representing the image you inserted previously.

      Note

      You might notice that the file name of the image file has been changed from Eagle1.gif to image1.gif. This is done to address privacy concerns, in that a malicious person could derive a competitive advantage from the name of parts in a document, such as an image file. For example, an author might choose to protect the contents of a document by encrypting the textual part of the document file. However, if two images are inserted named old_widget.gif and new_reenforced_widget.gif, even though the text is protected, a malicious person could learn the fact that the widget is being upgraded. Using generic image file names such as image1 and image2 adds another layer of protection to Office Open XML Formats files.

  18. Close SampleWordDocument.docx.zip without saving.
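The two parts examined at the start of this procedure, [Content_Types].xml and _rels/.rels, can also be read with a few lines of code. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes the package was produced by Word 2007 so that the main document is reached through the standard officeDocument relationship.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def content_types(package):
    """Map part names and default extensions to their content types."""
    root = ET.fromstring(package.read("[Content_Types].xml"))
    types = {}
    # Default entries apply to every part with the given extension.
    for default in root.findall(CT_NS + "Default"):
        types["*." + default.get("Extension")] = default.get("ContentType")
    # Override entries apply to one specific part.
    for override in root.findall(CT_NS + "Override"):
        types[override.get("PartName")] = override.get("ContentType")
    return types


def relationships(package, rels_name="_rels/.rels"):
    """Return (Id, Type, Target) tuples from a relationships part."""
    root = ET.fromstring(package.read(rels_name))
    return [(rel.get("Id"), rel.get("Type"), rel.get("Target"))
            for rel in root.findall(REL_NS + "Relationship")]


def main_document_part(package):
    """Follow the package-level officeDocument relationship."""
    for _rel_id, rel_type, target in relationships(package):
        if rel_type == OFFICE_DOCUMENT:
            return target.lstrip("/")
    raise KeyError("no officeDocument relationship found")
```

Each function takes an open zipfile.ZipFile object. Passing rels_name="word/_rels/document.xml.rels" to relationships lists the parts that the main document links to, such as the styles, comments, and image parts.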

Manually Editing Documents Created by Using the Office Open XML Formats

The Office Open XML Formats have numerous advantages. One advantage that is particularly useful is the ability to work with documents that were created by using the 2007 release of the Office system without needing to have the Office programs yourself. This makes it possible for you to create server-based solutions that assemble, access, or edit documents in a scalable manner.

In the following steps, you manually edit a Word 2007 document. Note that these scenarios are only a small sample of what is possible with this new file format. Also note that, in most instances, users would never be expected to manually edit documents in this way. However, for developers, being able to explore a document created by using the 2007 release of the Office system without writing code is a great benefit when designing solutions or prototyping applications. As you saw previously, after you have access to the container file for a document, you can navigate the individual parts fairly easily. This also means you can edit, replace, or even add parts. In the following steps, you edit a Word 2007 document by modifying the XML in one of its parts. Specifically, you revise a comment in the document and update the document properties.

To modify the Office Open XML Formats document with XML

  1. In Word 2007, open the SampleWordDocument.docx document.

    Note

    You might need to remove the .zip extension from the file name if you have not already done so.

  2. Click the Microsoft Office Button, point to Prepare, and then click Properties. Note the Author, Title, Subject, and Comments entries, and then close the document.

  3. Open Windows Explorer and navigate to the SampleWordDocument.docx document.

  4. Extract the document files by following the steps in the Exploring the Office Open XML Formats File section.

    Reviewer comments in a Word 2007 document are stored in their own part named comments.xml. This separation from the body of the document enables you to easily locate and modify the part.

  5. Drag the comments.xml part out of the archive and onto the Windows desktop.

  6. Right-click the comments.xml part, point to Open With, and then click any text or XML editor, such as Notepad.

  7. Locate the following text, which is the comment you typed when you created the document:

    <w:t>This is my comment.</w:t>
    
  8. Replace or edit the text of the <w:t> element. For example, change it to:

    <w:t>This is my revised comment.</w:t>
    
  9. Save and close the file.

  10. From the Windows desktop, drag the comments.xml part into the word folder in the archive.

  11. When prompted by the Confirm File Replace warning message, click Yes.

    Next, you change a document property and then confirm the results of these changes. Document properties are stored in a subfolder of the archive root folder, making them very straightforward to access and edit.

  12. Double-click the docProps folder.

  13. Drag the core.xml part from the archive to the Windows desktop.

  14. Open core.xml in a text editor.

  15. Locate the following text:

    <dc:creator>Nancy Davolio</dc:creator>
    
  16. Replace or edit the text in the <dc:creator> element. For example, replace the text with your own name.

  17. Save and close the file and then drag the part back to the docProps folder in the archive.

  18. When prompted by the Confirm File Replace warning message, click Yes.

  19. Navigate to the document container by clicking the back arrow or the Up icon on the toolbar until you locate the .zip file.

  20. Remove the .zip extension from the file name and open the file in Word 2007.

  21. Click the Microsoft Office Button, point to Prepare, and then click Properties.

    Notice that the Author property has changed. In the document body, notice that the text of the reviewer comment has also changed.
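The property edit in the preceding procedure can be sketched in code as well. Python's zipfile module cannot rewrite an entry in place, so the sketch below copies every part into a new package and substitutes the edited core.xml. The function names and the choice of Python are assumptions made for illustration, not something the sample download provides.

```python
import zipfile
import xml.etree.ElementTree as ET

# Register the prefixes used in core.xml so they survive serialization;
# attribute values such as xsi:type="dcterms:W3CDTF" refer to them by name.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def replace_part(src, dst, part_name, new_bytes):
    """Copy package src to dst, substituting new_bytes for one part.

    src and dst must be different files (or file objects).
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = new_bytes if item.filename == part_name else zin.read(item)
            zout.writestr(item, data)


def set_author(src, dst, author):
    """Write a copy of the package whose dc:creator property is author."""
    with zipfile.ZipFile(src) as zin:
        root = ET.fromstring(zin.read("docProps/core.xml"))
    creator = root.find("{%s}creator" % NAMESPACES["dc"])
    if creator is None:
        raise KeyError("core.xml has no dc:creator element")
    creator.text = author
    new_xml = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    replace_part(src, dst, "docProps/core.xml", new_xml)
```

A call such as set_author("SampleWordDocument.docx", "Edited.docx", "Your Name") (both file names hypothetical) would produce a copy with the new Author property. The same replace_part helper could carry an edited comments.xml part.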

In the previous steps, you modified the document by editing existing XML parts found within the document. With the new file format, you can also replace entire document parts to change the content, format, or properties of a document. This enables many scenarios where existing document parts can be used to update individual documents or to update an entire library of documents.

One example of using existing parts to modify a document is changing the styles that the document uses. This can be advantageous if you need to manage multiple looks for documents but want to only maintain one physical version of the document. Changing all of the styles that a document uses is as easy as replacing the styles.xml part.

This scenario enables you to compile a collection of style parts for all of your documents and then create a program that enables users to choose different styles automatically. Behind the scenes, your program would just swap one prefabricated part for another. In the following steps, you do this manually.

To modify an Office Open XML Formats document by replacing existing parts

  1. Make a copy of SampleWordDocument.docx and name it AnotherSampleWordDocument.docx.

  2. In Word 2007, open AnotherSampleWordDocument.docx.

  3. On the Home tab, click Change Styles, point to Style Set, and then click Fancy.

    The document should look similar to Figure 2.

    Figure 2. Document formatted in the Fancy style


  4. Save and close the document.

  5. Extract the AnotherSampleWordDocument.docx document files by following the steps in the Exploring the Office Open XML Formats File section.

  6. Double-click the word folder and then drag the styles.xml part onto the Windows desktop. This part will be used to update the first document you created.

  7. Navigate to the document container by clicking the back arrow or the Up icon on the toolbar until you locate the .zip file.

  8. Now, open the SampleWordDocument.docx document in Word 2007 and notice the style of the body of the document.

  9. Close the document.

  10. Extract the SampleWordDocument.docx document files by following the steps in the Exploring the Office Open XML Formats File section.

  11. Double-click the word folder to open it and then drag the styles.xml part from the Windows desktop into the word folder, replacing the original file.

  12. When prompted by the Confirm File Replace warning message, click Yes.

  13. Navigate to the document container of the SampleWordDocument.docx file by clicking the back arrow or the Up icon on the toolbar until you locate the .zip file.

  14. Remove the .zip extension from the file name and then open the file in Word 2007.

    Notice that the style of the document has now changed to that seen in AnotherSampleWordDocument.docx.

    If you have not already done so, remove the .zip extension from the AnotherSampleWordDocument.docx.zip file name.
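The style swap you just performed by hand amounts to copying one part from a donor package into a target package. The following Python sketch shows one way this might look. The function name copy_part and its parameters are hypothetical, and the sketch assumes that the replaced part already exists in the target, so no relationship or content type entries need to change.

```python
import zipfile


def copy_part(donor, target, output, part_name="word/styles.xml"):
    """Write a copy of target to output, taking part_name from donor.

    donor, target, and output may be file names or binary file objects;
    output must not be the same file as target.
    """
    with zipfile.ZipFile(donor) as zdonor:
        new_bytes = zdonor.read(part_name)
    with zipfile.ZipFile(target) as zin:
        if part_name not in zin.namelist():
            raise KeyError(part_name + " is not present in the target package")
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                # Every part is copied unchanged except the one being swapped.
                data = new_bytes if item.filename == part_name else zin.read(item)
                zout.writestr(item, data)
```

For example, copy_part("AnotherSampleWordDocument.docx", "SampleWordDocument.docx", "Restyled.docx") would write a third file (the output name is made up) that has the body of the first sample document and the Fancy styles of the second.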

Another example of this type of scenario is using the header and footer document parts from one Word 2007 document to quickly duplicate the same settings in one or more other Word 2007 documents. Header and footer parts can be changed manually, as you will see in the following steps. Of course, the process can also be automated by using code. This is helpful for organizations that want to use standardized document headers and footers without the effort of managing those details on a per-document basis. Replacing the headers and footers is also straightforward if your header and footer format changes.

In the following steps, you add a simple header to the SampleWordDocument.docx. You then update this header with a different header from the AnotherSampleWordDocument.docx document.

To update the header of the Office Open XML Formats document

  1. In Word 2007, open SampleWordDocument.docx.

  2. On the Insert tab, click the drop-down arrow of the Header button, and then select the Alphabet built-in header. A header is added with the title of the document.

    The document should look similar to Figure 3.

    Figure 3. Sample document with Alphabet header


  3. Save and close the file.

  4. In Word 2007, open AnotherSampleWordDocument.docx.

  5. On the Insert tab, click the drop-down arrow of the Header button, and then select the Annual built-in header. A header is added with the title of the document and with a place for the year.

    The document should look similar to Figure 4.

    Figure 4. Sample document with Annual header


  6. Save and close the file.

  7. Extract the document files for AnotherSampleWordDocument.docx by following the steps in the Exploring the Office Open XML Formats File section.

  8. Double-click the word folder to open it, and then drag the header1.xml file onto the Windows desktop.

  9. Navigate to the document container by clicking the back arrow or the Up icon on the toolbar until you locate the .zip file.

  10. Remove the .zip file name extension from the AnotherSampleWordDocument.docx.zip file name.

  11. Extract the document files for SampleWordDocument.docx by following the steps in the Exploring the Office Open XML Formats File section.

  12. Double-click the word folder to open it, and then drag the header1.xml file from the Windows desktop into the word folder.

  13. When prompted by the Confirm File Replace warning message, click Yes.

  14. Navigate to the document container by clicking the back arrow or the Up icon on the toolbar until you locate the .zip file.

  15. Remove the .zip extension from the SampleWordDocument.docx.zip file name, and then open the file in Word 2007.

    Notice the new header.

Documents can also contain binary parts, such as image files or Microsoft Visual Basic for Applications (VBA) projects, that are just as easy to access as XML parts. Swapping binary parts creates many interesting possibilities. For example, you can replace an entire OLE object, such as a Microsoft Office Visio diagram, by exchanging its binary part. Doing this manually has little value, but think of a scenario where the image needs to be automatically updated from a server. Writing a tool to do this would be a relatively simple task. In the following steps, you swap the image binary file in the SampleWordDocument.docx document with another image.

To swap binary parts in the Office Open XML Formats document

  1. Extract the document files for SampleWordDocument.docx by following the steps in the Exploring the Office Open XML Formats File section.

  2. Locate the image1.gif image file by double-clicking the word folder and then double-clicking the media folder. (As noted previously, Word renamed Eagle1.gif to image1.gif when you inserted the picture.)

  3. Right-click image1.gif and then click Preview. This is the image that currently appears in the document.

  4. Locate the Eagle2.gif image in the download files and copy it to the Windows desktop (or you can substitute your own image file).

  5. On the Windows desktop, right-click the Eagle2.gif image, and then click Preview. This is the image that will replace the current image.

  6. Close the Preview window, right-click Eagle2.gif, click Rename, and then change the file name to image1.gif.

  7. Drag the now-renamed image1.gif from the Windows desktop to the media folder.

  8. When prompted by the Confirm File Replace warning message, click Yes.

  9. Navigate to the document container by clicking the back arrow or the Up button on the toolbar until you locate the .zip file.

  10. Remove the .zip extension from the file name and then open the file in Word 2007. Notice that the image is updated.
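A tool that updates the image automatically would not need to guess the generic file name that Word assigned; it could look the image up through the relationships of the main document. The following Python sketch illustrates that idea. The function names are hypothetical, and the sketch assumes that the replacement image has the same format as the original, so that the file name extension and content type stay valid.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
         "relationships/image")
DOCUMENT_RELS = "word/_rels/document.xml.rels"


def image_parts(package):
    """Return the names of the image parts linked from word/document.xml."""
    root = ET.fromstring(package.read(DOCUMENT_RELS))
    names = []
    for rel in root.findall(REL_NS + "Relationship"):
        if rel.get("Type") == IMAGE and rel.get("TargetMode") != "External":
            # Targets are relative to the folder of the source part (word).
            name = posixpath.normpath(posixpath.join("word", rel.get("Target")))
            names.append(name.lstrip("/"))
    return names


def swap_first_image(src, dst, image_bytes):
    """Copy src to dst with the first embedded image replaced.

    Returns the name of the part that was replaced.
    """
    with zipfile.ZipFile(src) as zin:
        names = image_parts(zin)
        if not names:
            raise KeyError("the document has no embedded image parts")
        target = names[0]
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                data = image_bytes if item.filename == target else zin.read(item)
                zout.writestr(item, data)
    return target
```

Because the lookup goes through document.xml.rels, the same code would keep working if Word had named the part image2.gif or stored it elsewhere in the package.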

Some parts are required in a Microsoft Office system document and must be present for the file to function properly, such as the document.xml part in a Word 2007 document. However, many parts are optional and exist only if the functionality they represent is present in the document. This means if you no longer require that functionality, you can simply remove the part and its associated relationship.

2007 Office documents that include project code are known as "macro-enabled" documents (and have the .docm extension for Word 2007, the .xlsm extension for Excel 2007, and the .pptm extension for PowerPoint 2007). Unlike "macro-free" documents, macro-enabled documents store code in a part. The type of part depends on the type of code found in the document. For example, a macro-enabled document that contains VBA code stores its data in the vbaProject.bin binary part.

Other project types include Excel 2007 workbooks that use Excel 4.0–style macros (XLM macros) or PowerPoint 2007 presentations that contain command buttons. These features are located in their own isolated parts so they can be easily identified and removed.

In the following steps, you create a simple macro-enabled document and demonstrate its functionality. You then remove the vbaProject.bin part and its associated relationship from the document and see the effects. Note that you can also remove the project part of a macro-enabled file by clicking the Microsoft Office Button, clicking Save As, and saving the file as a macro-free file (.docx, .pptx, or .xlsx). However, this requires that you open the document in the corresponding 2007 Office system program. Using the steps in the following procedure enables you to remove the project without using the Office program.

To remove a VBA project from an Office Open XML Formats document

  1. In Word 2007, open SampleWordDocument.docx.

  2. On the Developer tab, in the Code group, click Visual Basic to open the Visual Basic Editor. You can also open the Visual Basic Editor by pressing ALT+F11.

    Note

    If you do not see the Developer tab, you can quickly add it. Click the Microsoft Office Button, click Word Options, and then, on the Popular tab, select the Show Developer tab in the Ribbon option.

  3. In the Visual Basic Editor code window, type or paste the following statements:

    Sub SampleCode()
        ' Display a message box to confirm that the macro runs.
        MsgBox "Hello World"
    End Sub
    
    Note

    If you do not see the code window, you can display it by clicking the View menu and then clicking Code.

  4. On the Run menu, click Run Sub/UserForm to run the procedure. You could also press F5.

  5. Click OK to close the message box and then close the Visual Basic Editor.

    Next, save the document as a macro-enabled document:

  6. Click the Microsoft Office Button, point to Save As, and then click Word Document.

  7. In the Save as type list, select Word Macro-Enabled Document (*.docm), and then click Save. Close the document.

  8. Extract the document files for SampleWordDocument.docm by following the steps in the Exploring the Office Open XML Formats File section.

  9. Double-click the word folder and then double-click the _rels folder.

  10. Right-click document.xml.rels, click Open With, and then select a text editor such as Notepad.

    Note

    The file might be set as Read-only. If it is, close the file, right-click its file name, click Properties, and then clear the Read-only attribute check box. Then, reopen the file.

  11. Locate and delete the following Relationship element (the Id value might differ in your file):

    <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/vbaProject" Target="vbaProject.bin"/>
    
  12. Save and close the file.

  13. Right-click document.xml.rels and then click Copy.

  14. Navigate back to the SampleWordDocument.docm.zip file and open it.

  15. Double-click the word folder.

  16. Right-click the vbaProject.bin file and then click Delete. When prompted, click Yes to confirm the deletion.

  17. Double-click the _rels folder.

  18. Right-click and then click Paste.

  19. When prompted by the Confirm File Replace warning message (or by the Copy and Replace warning message in Windows Vista), click Yes.

  20. Navigate to the document container by clicking the back arrow or the Up button on the toolbar until you locate the .zip file.

  21. Right-click the SampleWordDocument.docm folder, and then click Delete to delete it. When prompted, click Yes in the confirmation dialog box.

  22. Remove the .zip extension from the SampleWordDocument.docm.zip file name. When prompted, click Yes in the Rename warning dialog box, and then open the file in Word 2007.

  23. Press ALT+F11 to view the VBA project. Notice that the subroutine is missing.

    Note

    You can achieve similar results by changing the file name extension from .docm to .docx. Files with a macro-free extension never execute code.
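The two manual edits in the preceding procedure, deleting the Relationship element and deleting the vbaProject.bin part, can be combined in a short script. The following Python sketch is one possible approach under stated assumptions. The function name is hypothetical, and the sketch mirrors only the manual steps, so it leaves [Content_Types].xml and the file name extension untouched.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_URI = "http://schemas.openxmlformats.org/package/2006/relationships"
VBA_REL = "http://schemas.microsoft.com/office/2006/relationships/vbaProject"
DOCUMENT_RELS = "word/_rels/document.xml.rels"

# Serialize the relationships part with a default namespace, as Word does.
ET.register_namespace("", REL_URI)


def remove_vba_project(src, dst):
    """Copy src to dst without the VBA project part and its relationship.

    Returns the sorted names of the parts that were removed.
    """
    with zipfile.ZipFile(src) as zin:
        root = ET.fromstring(zin.read(DOCUMENT_RELS))
        removed = set()
        for rel in list(root):
            if rel.get("Type") == VBA_REL:
                # Targets are relative to the folder of document.xml (word).
                name = posixpath.normpath(
                    posixpath.join("word", rel.get("Target")))
                removed.add(name.lstrip("/"))
                root.remove(rel)
        new_rels = ET.tostring(root, encoding="utf-8", xml_declaration=True)
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                if item.filename in removed:
                    continue  # drop the vbaProject.bin part itself
                if item.filename == DOCUMENT_RELS:
                    zout.writestr(item, new_rels)
                else:
                    zout.writestr(item, zin.read(item))
    return sorted(removed)
```

Run against a macro-enabled copy of the sample document, the function would be expected to return a list containing word/vbaProject.bin. An empty list means the document contained no VBA project relationship.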

Manipulating Office Open XML Formats Documents Programmatically

One of the important benefits of the Office Open XML Formats is the broad potential for custom solutions. You can use tools on almost any platform capable of working with XML and ZIP files to access and alter document contents. For example, you can write a server-side application in Microsoft Visual Studio with managed code (Microsoft Visual Basic .NET or C#) to programmatically generate 2007 Office system documents. You can use the powerful XML class library of the Microsoft .NET Framework to work with any of the XML document parts found in Office Open XML Formats files.

One powerful way to manipulate document parts and relationships is by using the System.IO.Packaging namespace, which is included with the Microsoft .NET Framework 3.0 and the Microsoft Windows Software Development Kit (SDK). The System.IO.Packaging namespace is discussed in more detail later in this article.

In the following steps you develop a console application that programmatically changes the formatting styles in a document without using Word 2007. The project does this by exchanging the document's styles part with a styles part extracted from another Word 2007 document. The result is that the target document adopts the "look and feel" of the styles borrowed from the other document.

Note

Several prefabricated code examples for working with Office Open XML Formats files are available as a download: 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005. After you download the examples, you can use the Code Snippet Manager feature in Microsoft Visual Studio 2005 (available on the Tools menu) to insert them into your projects.

Perform the following steps to change the formatting of a document.

To change the formatting styles in an Office Open XML Formats document

  1. First, create a folder and a subfolder to hold the document that you will update and to hold the styles.xml part. For this exercise, name the folder WordOpenXMLFormatSample. In the folder, add a subfolder named NewStylePart.

  2. Copy SampleWordDocument.docx into the WordOpenXMLFormatSample folder.

  3. Navigate to the AnotherSampleWordDocument.docx file, add the .zip file name extension to the end of the file name, and then open the file.

  4. Navigate to the styles.xml part in the word folder, right-click it, and then click Copy.

  5. Navigate to the WordOpenXMLFormatSample folder, and then to the NewStylePart subfolder. Right-click the NewStylePart subfolder, and then click Paste. The folder now contains the Word 2007 document with the default styles part, and the subfolder contains the styles.xml part from the document with the style called "Fancy."

  6. Start Visual Studio 2005.

  7. On the File menu, click New Project.

  8. In the New Project dialog box, from the Project Types tree view on the left side, click Other Languages, select Visual C#, and then select Console Application from the Templates list view on the right side.

  9. In the Name box, name this project StyleSwapper, and then click OK. Visual Studio creates all of the files in the project and stores them in a directory named after the project, such as drive\Visual Studio projects\StyleSwapper.

    To see exactly where the project will be saved to, or to change the location, do the following:

    1. On the Tools menu, click Options.

    2. In the Options dialog box, expand the Projects and Solutions node in the tree view, and then click General.

      In this screen, you can specify where your projects and templates are stored, and set other options.

    3. Note the location of your projects, and then click OK to close the window.

    When you create the project, Visual Studio automatically creates the new application with three containers: Properties, References, and Program.cs. You can see these containers in Solution Explorer. Visual Studio also creates an empty class where you will add most of your code.

    Note

    The Microsoft .NET Framework 3.0 (formerly WinFX) includes the System.IO.Packaging namespace, which simplifies working with 2007 Microsoft Office system documents programmatically from Visual Studio 2005. With the System.IO.Packaging namespace, you can add document parts, retrieve and update contents, or create new relationships, enabling you to build new documents or to alter existing documents. Some of the important members of the namespace are the Package object, the PackagePart object, and the PackageRelationship object. For more information about the System.IO.Packaging namespace, see System.IO.Packaging Namespace.

    Next, you add a reference from your project to the .NET Framework 3.0:

  10. On the Project menu, click Add Reference.

  11. On the Browse tab, in the Look In box, search for WindowsBase.dll. As of the date of publication of this article, the file is located at drive\Program Files\Reference Assemblies\Microsoft\WinFx\v3.0.

  12. Select the WindowsBase.dll file, and then click OK. Verify that the reference was created by clicking References in Solution Explorer.

  13. In Solution Explorer, right-click Program.cs, and then select View Code.

  14. Type the following code in the code window above the namespace statement:

    using System.IO;
    using System.IO.Packaging;
    

    To work with the contents of a 2007 Office system document, you need to open it. The System.IO.Packaging namespace has a top-level class named Package, which represents the document container. After you open a Package object, you can inspect its structure and manipulate its parts. Packages can be opened as read-only, write-only, or read/write.

  15. Add the following lines of code after the opening brace ({) of the class Program statement. These lines declare variables containing the location of the Word document and of the styles.xml part that you will insert into the document. These statements assume that the WordOpenXMLFormatSample folder was created on drive C.

    private static String stylePath = @"C:\WordOpenXMLFormatSample\NewStylePart\styles.xml";
    private static String packagePath = @"C:\WordOpenXMLFormatSample\SampleWordDocument.docx.zip";
    
    Note

    Strings that start with the @ symbol are known as verbatim string literals. The @ prefix tells the C# compiler to use the string exactly as it appears, so each backslash in the path is treated as an ordinary character rather than as the start of an escape sequence.

  16. The Main procedure is automatically executed when you run the project. Between the opening brace ({) and the closing brace (}) in Main, type the following line:

    SwapStylePart(packagePath, stylePath);
    

    This line calls the SwapStylePart procedure, which you will add next. It passes the paths to the Word 2007 document and to the styles.xml part.

  17. In the line just after the closing brace (}) of Main, add the following procedure:

    static void SwapStylePart(String packagePath, String stylePath)
    {
    
    }
    

    In the next steps, you add code to the SwapStylePart routine that opens the Word document as a Package object with read/write access. Notice the use of the using statement. Its purpose is to dispose of the Package object automatically when the block finishes executing, which saves any changes, closes the file, and releases the resources that the package holds.

  18. Type the following code into the SwapStylePart routine:

    using (Package package = Package.Open(packagePath, FileMode.Open,
      FileAccess.ReadWrite))
    {
    
    }
    

    To work with any part inside the 2007 Office system document package, you first need to locate it. You can reference a specific document part by using its URI, which is unique for each part.

  19. Type the following statements between the braces of the using statement:

    // Set the URI for the styles document part (/word/styles.xml).
    Uri uriPartTarget = new Uri("/word/styles.xml", UriKind.Relative);
    

    At the time this article was written, the System.IO.Packaging namespace did not allow you to copy over or replace an existing part. To swap the part, you must first remove the existing part, and then create a new part with the same URI. Note that deleting the part does not affect relationships that point to it from elsewhere in the package, such as the relationship from the main document part. Those relationships refer to the part by its URI, so they remain intact and apply to the new part.

  20. Type the following code just after the line you added in the previous step:

    // Delete the existing document part (/word/styles.xml).
    package.DeletePart(uriPartTarget);
    

    Adding a new document part to a package also requires the use of a URI. In this instance, you simply reuse the same URI to re-create the styles document part. One additional parameter is required when creating a part: its content type. The content types already in use in a document are listed in the [Content_Types].xml part, located at the root of the document container.

  21. Type the following code after the line you added in the previous step:

    // Create a new document part for styles (/word/styles.xml).
    PackagePart packagePartReplacement = package.CreatePart(uriPartTarget, "application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml");
    

    With the new styles part created, the final step is to copy the XML from the external styles.xml file (the copy that you saved in the NewStylePart folder) into the newly created part. The System.IO.Packaging namespace treats the content of a part as raw bytes rather than as XML, so you copy the content by using streams.

    In the following step, you add code that opens the external styles part as a stream and writes it to the new styles document part. To copy the stream, you call the CopyStream routine and pass in the source and destination streams.

  22. Type the following code after the lines you added in the previous step:

    using (FileStream fileStream = new FileStream(stylePath, FileMode.Open,
        FileAccess.Read))
    {
       // Load the new styles.xml using a stream.
       CopyStream(fileStream, packagePartReplacement.GetStream());
    }
    
  23. Next, add the CopyStream routine after the closing brace of the SwapStylePart routine:

    private static void CopyStream(Stream source, Stream target)
    {
       // Copy in chunks of 4,096 (0x1000) bytes.
       const int bufSize = 0x1000;
       byte[] buf = new byte[bufSize];
       int bytesRead = 0;

       // Read returns 0 when the end of the source stream is reached.
       while ((bytesRead = source.Read(buf, 0, bufSize)) > 0)
       {
          target.Write(buf, 0, bytesRead);
       }

       // Close both streams so that the new part is written to the package.
       source.Close();
       target.Close();
    }
    

    This procedure allocates a 4,096-byte (0x1000) buffer in memory to hold the data from the external styles.xml file as it is read in. While there are bytes left to read, each chunk is written to the new styles.xml part. When the copy is complete, both streams are closed.

    To see the application in action, you need to build the project:

  24. On the Build menu, click Build StyleSwapper.

    Note

    If there are errors during the build, you see a dialog box that asks if you want to run the last build. Click No and you will see any errors described in the Error List panel. If you do not see the Error List, on the View menu, click Error List.

    Assuming there are no errors, you are now ready to execute the code. But first, you might want to view the current document:

  25. In Word 2007, open SampleWordDocument.docx.

  26. Close Word 2007 and then add the .zip extension to the end of the SampleWordDocument.docx file name.

  27. Press F5. You see the Windows console appear and then, just as quickly, you see it disappear. Because this is a console application, there is no user interface other than the Windows console, which appeared briefly.

  28. Remove the .zip extension from the file name and then reopen the file in Word 2007. Notice that the style of the document has changed to the "Fancy" style, all without using Word 2007.
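The buffered copy from step 23 can be tried on its own, without a document package. The following is a minimal sketch, assuming that in-memory streams stand in for the external styles.xml file and for the new part. The CopyStreamSketch class name and the 10,000-byte test input are hypothetical and are not part of the sample download. This variant also returns the number of bytes copied and leaves both streams open, so that the caller's using statements control when they are closed.

```csharp
using System;
using System.IO;
using System.Text;

public static class CopyStreamSketch
{
    // Copies every byte from source to target through a fixed-size buffer
    // and returns the number of bytes copied. The streams are left open.
    public static long CopyStream(Stream source, Stream target)
    {
        const int bufSize = 0x1000; // 4,096 bytes
        byte[] buf = new byte[bufSize];
        long total = 0;
        int bytesRead;

        // Read returns 0 when the end of the source stream is reached.
        while ((bytesRead = source.Read(buf, 0, bufSize)) > 0)
        {
            target.Write(buf, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }

    public static void Main()
    {
        // Hypothetical stand-in for styles.xml: 10,000 bytes of text, which
        // is more than the buffer can hold in one pass.
        byte[] input = Encoding.UTF8.GetBytes(new String('x', 10000));

        using (MemoryStream source = new MemoryStream(input))
        using (MemoryStream target = new MemoryStream())
        {
            long copied = CopyStream(source, target);
            Console.WriteLine("Copied {0} bytes; target holds {1} bytes.",
                copied, target.Length);
        }
    }
}
```

Because the input is larger than the 4,096-byte buffer, the data is copied in three chunks. The loop behaves the same way for a styles.xml file of any size.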

The next exercise illustrates how you can use a custom application to perform bulk operations on Office Open XML Formats files without Word 2007. In this exercise, you create a managed application that searches folders and (optionally) subfolders for files that match certain criteria. The application then examines the document part in these files for the occurrence of a term and lists those files that contain the term. You can imagine using this application to identify documents that contain a particular client's name or that contain a specific product name. You could also add logic to the application to replace the term with another term, essentially replicating the search-and-replace feature of Word 2007 without using Word.
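The folder walk at the heart of this exercise can be tried on its own before you build the form. The following is a minimal sketch, assuming that a console program is enough for the experiment. The FolderSearchSketch class and the FindDocuments helper are hypothetical names and are not part of the sample download. Instead of recursing by hand, the sketch relies on the SearchOption enumeration, which the .NET Framework has included since version 2.0.

```csharp
using System;
using System.IO;

public static class FolderSearchSketch
{
    // Returns the full paths of the files under root that match the pattern,
    // optionally descending into subfolders.
    public static string[] FindDocuments(string root, string pattern,
        bool includeSubfolders)
    {
        SearchOption option = includeSubfolders
            ? SearchOption.AllDirectories
            : SearchOption.TopDirectoryOnly;
        return Directory.GetFiles(root, pattern, option);
    }

    public static void Main(string[] args)
    {
        // Use the folder from the command line, or the default folder
        // that the exercise uses.
        string root = args.Length > 0 ? args[0] : @"C:\WordDocuments";

        if (!Directory.Exists(root))
        {
            Console.WriteLine("Folder not found: " + root);
            return;
        }

        foreach (string path in FindDocuments(root, "*.docx", true))
        {
            Console.WriteLine(path);
        }
    }
}
```

One trade-off to keep in mind: with SearchOption.AllDirectories, a single subfolder that the program is not permitted to read causes the whole call to throw an UnauthorizedAccessException. Walking the folders one at a time, as the steps that follow do, makes it easier to skip such a folder.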

To search a group of Office Open XML Formats files for a keyword

  1. Start Visual Studio 2005.

  2. On the File menu, click New Project.

  3. In the New Project dialog box, from the Project Types tree view on the left side, select Visual C#. Then, select Windows Application, change the name of the project to KeywordSearch, and click OK. Visual Studio creates all of the files in the project.

  4. In Solution Explorer, right-click Form1.cs, and then click View Designer.

  5. On the Form1.cs [Design] tab, add the following controls to the form, and then set their properties so that the form looks similar to the form shown in Figure 5.

    Figure 5. KeywordSearch form

    Table 2. List of controls for the Office Open XML Formats File Keyword Search form

    Type      Properties
    --------  ------------------------------------------------------------
    Label     Text: Search Directory
    TextBox   Name: txtPath; Text: C:\WordDocuments (the default directory where the search begins)
    Label     Text: Search Pattern
    ComboBox  Name: cboMask; Items (Collection): *.docx, *.docm; Text: *.docx
    CheckBox  Name: ckbSubfolders; Text: Include Subfolders
    Label     Text: Search Term
    TextBox   Name: txtTerm
    Button    Name: btnSearch; Text: Search
    Button    Name: btnClose; Text: Close
    Label     Text: Results
    ListBox   Name: lbxResults

  6. In Solution Explorer, right-click Form1.cs, and then click View Code.

  7. In the code window, add the following statements below the existing using statements:

    using System.Xml;
    using System.IO;
    using System.IO.Packaging;
    

    To use the System.IO.Packaging namespace, you need to add a reference to the WindowsBase.dll library that is available in the .NET Framework 3.0 SDK (formerly WinFX):

  8. On the Project menu, click Add Reference.

  9. On the Browse tab, in the Look In box, search for the WindowsBase.dll file name. As of the publish date of this article, the file is located at drive\Program Files\Reference Assemblies\Microsoft\WinFx\v3.0.

  10. Select the WindowsBase.dll file, and then click OK.

  11. Just after the opening brace of public partial class Form1 : Form, add the following class variables (also called fields):

    FileInfo[] tempfiles;
    List<FileInfo> files = new List<FileInfo>();
    

    Next, add code to the Search button:

  12. On the Form1.cs [Design] tab, double-click the Search button to add the Click event for this button.

  13. Between the braces in the btnSearch_Click procedure, add the following code:

    Boolean match = false;

    // Ensure that the user added a search term.
    if (txtTerm.Text == "")
    {
       MessageBox.Show("Don't forget the search term.");
       return;
    }

    // Clear the results of any previous search.
    lbxResults.Items.Clear();
    files.Clear();

    List<FileInfo> returnedFiles;
    // Get the starting directory.
    DirectoryInfo dir = new DirectoryInfo(txtPath.Text);

    // Get the list of files.
    returnedFiles = GetDirFiles(dir);

    // Loop through the file list.
    foreach (FileInfo file in returnedFiles)
    {
       // Remember whether any file so far has matched.
       if (GetDocPart(file))
       {
          match = true;
       }
    }
    if (!match)
    {
       // No matching files were found.
       lbxResults.Items.Add("No matches.");
    }
    

    In this code, a Boolean variable is declared to indicate whether the search finds any matches. Next, the code prompts the user for a search term if the text box is blank. The other input controls on the form all have default values. A List<FileInfo> variable is then declared to hold the files returned from the search; the List class provides an array whose size can increase dynamically. The variable dir points to the directory where the search starts. A call to the GetDirFiles procedure then returns the list of matching files. The code loops through the returned files, calling the GetDocPart procedure to examine each file for the search term. If there are no matches, an appropriate message is added to the list box.

  14. Add the GetDirFiles procedure:

    public List<FileInfo> GetDirFiles(DirectoryInfo dir)
    {
       // Get the search pattern that is displayed in the combo box.
       // Reading Text also works when the user types a pattern.
       String selectedItem = cboMask.Text;

       // Get all matching files for the current directory.
       tempfiles = dir.GetFiles(selectedItem);

       // Add these files to the returned file list.
       foreach (FileInfo file1 in tempfiles)
       {
          files.Add(file1);
       }

       // Search subfolders if requested.
       if (ckbSubfolders.Checked)
       {
          // Get subfolders for the current directory.
          DirectoryInfo[] dirs = dir.GetDirectories();

          foreach (DirectoryInfo directory in dirs)
          {
             // Each recursive call adds its files to the shared list.
             GetDirFiles(directory);
          }
       }
       return files;
    }
    
    

    This procedure reads the search pattern that is displayed in the combo box. Next, the GetFiles method is called on the current directory to return all of the files that match the search pattern, and the returned files are added to the class-level list of files. If the user selects the Include Subfolders check box, each of the current directory's subfolders is searched by recursively calling the GetDirFiles procedure with a DirectoryInfo object for that subfolder. Finally, the list of files is returned to the calling (btnSearch_Click) procedure.

  15. Add the following code after the GetDirFiles procedure:

    private Boolean GetDocPart(FileInfo file)
    {
       // Retrieve the start part for the input file.
       const String documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
       const String dcPropertiesSchema = "http://purl.org/dc/elements/1.1/";

       // Open the package with read access.
       using (Package myPackage = Package.Open(file.FullName, FileMode.Open, FileAccess.Read))
       {
          // Get the main document part (document.xml).
          foreach (PackageRelationship relationship in myPackage.GetRelationshipsByType(documentRelationshipType))
          {
             // There should be only one document part in the package.
             Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), relationship.TargetUri);
             PackagePart documentPart = myPackage.GetPart(documentUri);

             // Set up a name table and a namespace manager for the part.
             // The string search below does not use them.
             NameTable nt = new NameTable();
             XmlNamespaceManager nsmgr = new XmlNamespaceManager(nt);
             nsmgr.AddNamespace("w", dcPropertiesSchema);

             XmlDocument doc = new XmlDocument(nt);
             doc.Load(documentPart.GetStream());

             // IndexOf returns -1 when the term is absent and a
             // zero-based position when it is present.
             if (txtTerm.Text.Length > 0 && doc.OuterXml.IndexOf(txtTerm.Text) >= 0)
             {
                lbxResults.Items.Add(file.FullName);
             }

             // There is only one document part, so exit the loop.
             break;
          }
       }

       // Report whether the list box holds any entries so far.
       return lbxResults.Items.Count > 0;
    }
    

    The procedure initially sets constants equal to the URI of the officeDocument relationship type and to the Dublin Core namespace used by the document properties schema, respectively. Next, the document is opened as a Package object. Through the System.IO.Packaging namespace, you gain access to the various parts of the package through their relationships with other parts (as defined in the relationship parts) and through the URI that contains the hierarchical path to the parts. An example of a URI for a graphic part is /word/media/picture.jpg. If you know the URI to a specific document part in a 2007 Microsoft Office system document, you can directly access, create, or delete the part. In the GetDocPart procedure, the URI of the main document part is resolved from the officeDocument relationship, and the document.xml part is then returned by calling the GetPart method of the Package object and passing in that URI.

    The document part is an XML file that contains references to several namespaces. As with all XML files, XML parsers access the various elements and attributes in the document through names prefixed with a namespace qualifier.

    Note

    Element names with no qualifier are said to be part of a default namespace.

    These namespace qualifiers must be resolved to their namespace references at run time. To make this task easier and more consistent, the .NET Framework includes the XmlNamespaceManager class, which provides various namespace management tools and works together with the NameTable class. The NameTable class internally stores attribute and element names. When an element or attribute name occurs multiple times in an XML document, it is stored only one time in the NameTable. When a namespace qualifier is encountered, it can then be resolved with the string in the name table. The namespace manager becomes useful if you extend the procedure to query the part by element name; the plain string search in this procedure does not consult it.

    In the next statements, an XmlDocument object is created and populated with the contents of the document part. Next, the XML of the document is scanned for the search term. Note that the C# IndexOf method is the counterpart of the Visual Basic InStr function; unlike InStr, IndexOf is zero-based and returns -1 when the string is not found. If the search term is found, the directory and file name are added to the form's list box. The procedure returns True when the list box contains at least one entry and False otherwise.

  16. Finally, in the Form1 Designer, double-click the Close button and add the following statement to the procedure:

    Close();
    

    To test the program:

  17. On drive C or at a location of your choice, create a folder named WordDocuments.

  18. Paste copies of the SampleWordDocument.docx document and the AnotherSampleWordDocument.docx document into the WordDocuments folder.

  19. Start Word 2007 and create a new document.

  20. Add some text (but do not include the term Eagle) and save the document as SearchSampleDocument.docx in the WordDocuments folder.

  21. Press F5 to run the project.

  22. In the form, in the Search Directory box, type the location of the .docx documents.

  23. In the Search Term box, type the term Eagle, and then click Search.

    The two documents containing the term "Eagle" are displayed in the form's list box, as shown in Figure 6. The SearchSampleDocument.docx document does not appear.

    Figure 6. The results of running the keyword search

  24. Close the form.
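Because step 15 searches the OuterXml property, a term can also match element names, attribute values, or namespace URIs (for example, "document" or "schemas"). The following is a minimal sketch of one way to match only the text of the document, assuming that a case-insensitive match is acceptable. The TextSearchSketch class and the ContainsTerm helper are hypothetical names, and the inline XML is a stand-in for a real document.xml part.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class TextSearchSketch
{
    // Returns true if the term occurs in the text content of the XML,
    // ignoring element names, attribute values, and namespace URIs.
    public static bool ContainsTerm(Stream xml, string term)
    {
        if (String.IsNullOrEmpty(term))
        {
            return false;
        }

        XmlDocument doc = new XmlDocument();
        doc.Load(xml);

        // InnerText concatenates the text nodes and leaves out the markup.
        string text = doc.DocumentElement.InnerText;
        return text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Hypothetical stand-in for a document.xml part. The word "Eagle"
        // is split across two runs, as Word sometimes stores text.
        string xml =
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:body><w:p><w:r><w:t>The Ea</w:t></w:r><w:r><w:t>gle has landed.</w:t></w:r></w:p></w:body>" +
            "</w:document>";
        byte[] bytes = Encoding.UTF8.GetBytes(xml);

        using (MemoryStream stream = new MemoryStream(bytes))
        {
            // The term appears in the text of the document.
            Console.WriteLine(ContainsTerm(stream, "Eagle"));
        }
        using (MemoryStream stream = new MemoryStream(bytes))
        {
            // The term appears only in the namespace URI.
            Console.WriteLine(ContainsTerm(stream, "schemas"));
        }
    }
}
```

In the GetDocPart procedure, you could pass documentPart.GetStream() to a helper like this one. Note that InnerText joins the text of adjacent paragraphs without a separator, so a term could in rare cases match across a paragraph boundary.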

That is all the code you need to search documents without Word 2007. Some refinements you could add include counting the number of occurrences of the search term per document, adding replace functionality, or searching other parts.
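The first of these refinements, counting occurrences, needs only a small loop around IndexOf. The following is a minimal sketch, assuming that occurrences should be counted without overlap and without regard to case. The OccurrenceCountSketch class and the CountOccurrences helper are hypothetical names and are not part of the sample download.

```csharp
using System;

public static class OccurrenceCountSketch
{
    // Counts non-overlapping, case-insensitive occurrences of term in text.
    public static int CountOccurrences(string text, string term)
    {
        if (String.IsNullOrEmpty(text) || String.IsNullOrEmpty(term))
        {
            return 0;
        }

        int count = 0;
        int index = 0;

        // IndexOf returns -1 when no further occurrence exists.
        while ((index = text.IndexOf(term, index,
            StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            count++;
            index += term.Length; // Continue after the current match.
        }
        return count;
    }

    public static void Main()
    {
        string sample = "The Eagle has landed. The eagle is a bird.";
        Console.WriteLine(CountOccurrences(sample, "Eagle"));
    }
}
```

In the keyword search application, the text argument could be the XML of the document part that GetDocPart loads, and the count could be appended to the entry that is added to the list box.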

Conclusion

In this article, you became familiar with the Office Open XML Formats file structure. You explored the file formats and saw how easy it is to access and edit 2007 Microsoft Office system documents by using the standards-based XML and ZIP technologies. You also learned how to manually and programmatically manipulate Office Open XML Formats files. With this knowledge, you have the foundation for creating custom solutions for your own organization.

Additional Resources

For more information, see the following resources: