Readme_Remove Duplicates Component Sample

This sample works only with SQL Server 2005 and SQL Server 2008. It will not work with any version of SQL Server earlier than SQL Server 2005.

The Remove Duplicates sample demonstrates the implementation of a data flow transformation component with asynchronous outputs. A component with asynchronous outputs receives separate PipelineBuffer objects for its input and for its output. The input buffers contain rows provided by upstream components. The output buffer starts out empty, and the component fills it, typically with rows from the input buffer, during a call to the ProcessInput method. In this sample, after all the rows have been received, they are sorted, and then the distinct rows are sent to one output and the duplicate rows to the other.
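
The receive, sort, and split behavior described above can be illustrated outside of Integration Services. The following is a minimal sketch in Python. It is not the sample's C# or Visual Basic code and it does not use the SSIS API. The function name split_duplicates and the representation of rows as tuples are assumptions made for illustration, as is the rule that the first row of each group of equal rows counts as the distinct row.

```python
def split_duplicates(rows):
    """Return (distinct, duplicates) for an iterable of row tuples.

    Assumption: the first row in each run of equal rows is treated as the
    distinct row, and every later row in that run is treated as a duplicate.
    """
    # All rows must be received before any decision is made, so the whole
    # input is buffered and sorted first. Sorting places equal rows next to
    # each other. (Values must be mutually comparable for sorted() to work.)
    buffered = sorted(rows)

    distinct = []
    duplicates = []
    for index, row in enumerate(buffered):
        # After sorting, a row is a duplicate exactly when it equals the
        # row immediately before it.
        if index > 0 and row == buffered[index - 1]:
            duplicates.append(row)
        else:
            distinct.append(row)
    return distinct, duplicates


rows = [(2, "b"), (1, "a"), (2, "b"), (3, "c"), (1, "a"), (1, "a")]
distinct, duplicates = split_duplicates(rows)
```

With this input, distinct holds one copy of each of the three different rows, and duplicates holds the three repeated rows. In an actual component, the two lists would correspond to the two outputs.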

For more information about components with asynchronous outputs, see "Creating a Transformation Component with Asynchronous Outputs" in SQL Server Books Online.

This sample is not supported on Itanium-based operating systems.

The Integration Services Data Flow Programming code samples are intended to demonstrate the core functionality that you need to implement to create a custom data flow component. The samples do not include full support for customization in the Advanced Editor. For example, you cannot use the Advanced Editor to add or remove inputs and outputs or to configure columns.

Samples are provided for educational purposes only. They are not intended to be used in a production environment and have not been tested in a production environment. Microsoft does not provide technical support for these samples.

If you already know how to locate, build, and install code samples, you can go directly to the section "Testing the Sample" and read about how to configure and run the code sample.

This sample requires that the following components be installed.

  • Microsoft Visual Studio
  • Microsoft SQL Server Integration Services

If the code samples were installed to the default installation location, the code sample is located in the following folder:

C:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Programming Samples\Data Flow\RemoveDuplicates Component Sample

The C# solution for the code sample is located in the CS directory, and the Visual Basic solution is located in the VB directory.

For information about the two-step process required to install the samples, see Considerations for Installing SQL Server Samples and Sample Databases.

If you have not already generated a strong name key file in the Samples folder, use the following procedure to generate this key file. The sample projects are configured to sign assemblies at build time with this key file. You can view the signing properties on the Signing tab of the Project Properties dialog box.

  1. To open a Microsoft Visual Studio command prompt, click Start, point to All Programs, point to Microsoft Visual Studio 2008, point to Visual Studio Tools, and then click Visual Studio 2008 Command Prompt.

    - or -

    To open a Microsoft .NET Framework command prompt, click Start, point to All Programs, point to Microsoft .NET Framework SDK 2.0, and then click SDK Command Prompt.

  2. At the command prompt, use the change directory (CD) command to change the current folder of the command prompt window to the Samples folder. The key file that you create in this folder will be used by all SQL Server code samples.

    To determine the folder where samples are located, click Start, point to All Programs, point to Microsoft SQL Server 2008, point to Documentation and Tutorials, and then click Samples Directory. If the default installation location was used, the samples are located in <drive>:\Program Files\Microsoft SQL Server\100\Samples.

  3. At the command prompt, run the following command to generate the key file:

    sn -k SampleKey.snk

    For more information about the strong-name key pair, see "Security Briefs: Strong Names and Security in the .NET Framework" in the .NET Development Center on MSDN.

To build the sample:

  1. From the File | Open menu, click Project, and then open the RemoveDuplicates.sln file for your preferred programming language.

  2. From the Build menu, click Build RemoveDuplicates to build the project.

This sample is provided in both Visual Basic and C#. To distinguish the assemblies for each version of the sample, the name of the output assembly has CS or VB appended. After successfully building the component, follow these steps in order to add it to a Data Flow task in Business Intelligence Development Studio.

To copy the assembly to the PipelineComponents folder:

  1. Open Windows Explorer or your preferred application for working in the file system.

  2. Copy the assembly (RemoveDuplicatesCS.dll or RemoveDuplicatesVB.dll) to the PipelineComponents folder, which is located under %SystemDrive%\Program Files\Microsoft SQL Server\100\DTS.

To install the assembly in the global assembly cache (GAC) by using Windows Explorer:

  1. Open Windows Explorer or your preferred application for working in the file system.

  2. Drag the assembly from the PipelineComponents folder to the folder where the GAC is located, at %windir%\assembly.

To install the assembly in the global assembly cache (GAC) by using gacutil.exe instead:

  1. Open a Command Prompt window.

  2. Type the following command to run gacutil.exe and install the C# version of the component into the GAC:

    gacutil.exe -iF "c:\Program Files\Microsoft Sql Server\100\DTS\PipelineComponents\RemoveDuplicatesCS.dll"

    - or -

    Type the following command to run gacutil.exe and install the Visual Basic version of the component into the GAC:

    gacutil.exe -iF "c:\Program Files\Microsoft Sql Server\100\DTS\PipelineComponents\RemoveDuplicatesVB.dll"

To add the component to the Toolbox:

  1. Open Business Intelligence Development Studio.

  2. Right-click the toolbox and then click Choose Items.

  3. In the Choose Toolbox Items dialog box, click the SSIS Data Flow Items tab.

  4. Click the check box next to your component, and then click OK.

    If the component is not displayed in the list, you can click Browse to locate the component yourself. However, if you have to browse for it, the component may not be installed correctly.

After you finish these steps, the component is visible in the Data Flow Items tab of the Toolbox, and can be added to the Data Flow task in SSIS Designer.

After the component is added to a Data Flow task in a package and connected to a component that will provide rows to it, you can configure it as follows in SSIS Designer.

  • Select the columns to be used by the component on the Input Columns tab of the Advanced Editor. Only the selected columns are passed to the next component in the data flow. The contents of each column are compared to determine whether a row matches other rows.
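
To illustrate how the column selection affects matching, the following minimal Python sketch keeps only the selected columns of each row and compares rows on those columns alone. As before, this is not the sample's code and it does not use the SSIS API. The names project, split_on_selected, and selected_columns are hypothetical, rows are represented as dictionaries for readability, and a set is used to detect matches for brevity instead of the sort that the sample performs.

```python
def project(row, selected_columns):
    """Keep only the selected columns, in the order they were selected."""
    return tuple(row[name] for name in selected_columns)


def split_on_selected(rows, selected_columns):
    """Split rows into (distinct, duplicates) using only selected columns.

    Only the projected values are kept, mirroring the statement that only
    the selected columns are passed to the next component.
    """
    seen = set()
    distinct = []
    duplicates = []
    for row in rows:
        key = project(row, selected_columns)
        if key in seen:
            # Same values in every selected column as an earlier row.
            duplicates.append(key)
        else:
            seen.add(key)
            distinct.append(key)
    return distinct, duplicates


rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Ann", "city": "Oslo"},
    {"id": 3, "name": "Bob", "city": "Rome"},
]

# Comparing on name and city only: the first two rows match.
by_name_city = split_on_selected(rows, ["name", "city"])

# Including id in the selection makes every row different.
by_all_columns = split_on_selected(rows, ["id", "name", "city"])
```

The same input rows produce one duplicate when only name and city are selected, and none when id is also selected, so the choice of columns determines what counts as a duplicate.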