PowerPivot Data Refresh Overview

A PowerPivot data refresh operation is a scheduled process that runs on an Analysis Services server instance in a SharePoint server farm. You can schedule data refresh as a server-side operation to re-import data from external data sources at regular intervals, replacing older data in PowerPivot workbooks that you open from SharePoint sites.

This topic explains how data refresh works. Administrators or workbook authors can use this information to gain an understanding of the overall process before troubleshooting any data refresh problems that arise.

Other resources for learning about PowerPivot data refresh include the following whitepaper, Everything you always wanted to know about PowerPivot data refresh but were afraid to ask.

This topic contains the following sections:

Data Refresh Architecture

Accounts used in data refresh

Configuring Data Refresh

Setting up a Data Refresh Schedule

Triggering Data Refresh

Processing Data Refresh

Canceling Data Refresh

Deleting Data Refresh Schedules

Note

PowerPivot data refresh is unrelated to the Excel data refresh feature and you cannot refresh PowerPivot data by using the standard data refresh capability in Excel. On a SharePoint server that has an installation PowerPivot for SharePoint, PowerPivot data refresh is configured through the Manage PowerPivot Data Refresh application page on your SharePoint site and executed only for PowerPivot data that is embedded in a published workbook.

Data Refresh Architecture

Refreshing PowerPivot data in an Excel workbook is a process that is triggered, performed, and managed by Analysis Services and PowerPivot server components in a SharePoint farm. A PowerPivot service application starts data refresh by running a data refresh job, and finishes data refresh by checking in the updated data file after the data is refreshed. Data retrieval is performed by the Analysis Services instance. It opens a connection to external data sources, submits the queries, and overwrites existing data with a newer result set. Once you set up a schedule, data refresh will run on its own, but you can view data refresh history to verify the status of actual processing outcomes.

In contrast with on demand data processing that is initiated by the user, data refresh processing is based on predefined schedules that determine when processing occurs. The schedule also defines which data sources participate in the refresh operation, and the credentials used to access the workbook and external data sources. The PowerPivot service uses this information to initiate, run, and save refreshed data to a workbook as an automated process.

In order for data refresh to succeed, the services must have access to stored credentials that are used to both run the data refresh job and connect to data sources. The SharePoint feature used for storing credentials is Secure Store Service. Secure Store Service is a SharePoint application that is part of the SharePoint product. You can enable it to support data refresh scenarios. SQL Server Setup can also enable and provision it for you, if you used the New Server installation option to set up your server.

Components that support data refresh include the following:

  • PowerPivot data refresh timer job

  • PowerPivot service application

  • Analysis Services service 

A PowerPivot Data Refresh timer job runs every minute on each PowerPivot server in the farm. The timer job calls the PowerPivot service to read predefined schedules in the PowerPivot application database. The timer job is run by the SharePoint 2010 Timer service that runs on every computer that is a member of the SharePoint farm. When a schedule is found, the PowerPivot Data Refresh timer job hands the request to the PowerPivot service application that is associated with the web application that provides the workbook. If there are multiple Analysis Services service instance instances in the farm, a load balancing scheme will be used to determine which server receives the request.

The server that gets the request adds it to the processing queue of the local Analysis Services service instance. The PowerPivot service application reads the schedule definition to get the workbook location, the list of data sources to refresh, and the credentials of the user under which the process will occur. It passes this information to the Analysis Services service instance.

The Analysis Services service instance checks for sufficient system resources, and then loads the PowerPivot data source. It then opens connections to external data sources using the credentials that are specified in either the schedule definition or within the PowerPivot data source. Current data from the external data source replaces existing rows in the PowerPivot data source. New columns or tables are ignored; data refresh cannot be used to add new tables or columns to the data source.

After all of the data is retrieved from external sources, the PowerPivot service application checks the content library to verify that the workbook can be saved. If the workbook is not checked out or open for editing, the service saves the updated workbook in the library and logs an event to the usage data collection system in the farm. If the file is checked out or open, the refreshed version of the workbook will be discarded.

Data refresh is turned off by default. For more information about how to enable and configure data refresh, see Enable and Configure PowerPivot Data Refresh.

Accounts used in data refresh

PowerPivot workbooks that are published to SharePoint can be automatically refreshed to use newer data from the external sources that provided the original data. At the scheduled time, the server will connect to external sources and re-import data using the queries and connection information within the workbook.

To establish a connection to external data sources and complete the data refresh operation, the server must have the following:

  • A user identity under which the data refresh operation is performed.

  • A user identity for connecting to external data sources.

Although the PowerPivot service application handles a data refresh request, it does not run the job as itself. Instead, it impersonates another user (either the unattended data refresh account or the Windows credentials of a SharePoint user) to run the process and ensure that data refresh history is collected for that user. Scheduled data refresh can also use stored credentials for connecting to external data sources. All credentials that are used in PowerPivot data refresh are stored in SharePoint, requiring that you can enable and configure Secure Store Service as a prerequisite step. You specify which accounts to use when you set up the schedule.

The following table describes the different options for specifying identities used in data refresh:

Purpose

Approach

Description

Requirements

Run the job

Use the PowerPivot unattended data refresh account.

When you choose this option for a workbook, all data refresh history and usage data that is collected is based on the unattended account. If you require a more granular approach, use one of the other approaches listed below.

A target application in Secure Store Service must be created in advance to store the unattended data refresh account.

SQL Server Setup creates the unattended data refresh account automatically if you use the New Server installation option. The account is based on Farm account used to provision all services for this installation type.

Otherwise, for all other installation scenarios, you must create this account manually.

For instructions, see Configure and Use the PowerPivot Unattended Data Refresh Account.

Use any Windows domain user account.

If you want data refresh history or usage data to be based on a specific user identity, you can configure target applications for the accounts you want to use.

You must create a target application in Secure Store Service to store the credentials of the account you want to use.

For instructions, see Configure any Windows account for data refresh.

Use the personal Windows account of the person scheduling the job.

The PowerPivot service will create target applications on the fly to store the Windows user credentials of the person who is configuring the data refresh schedule, and then delete the applications when the schedule no longer exists.

None. These credentials are stored automatically and then removed automatically when the schedule no longer exists.

Connect to external data sources

Use the connection string and credentials that are embedded within the workbook.

Connection information is stored inside the workbook when the data is first imported. The connection string, credentials, and queries are a permanent part of the PowerPivot workbook. You can override the credentials using either of the following options in this table.

None. The connection information is already available.

Use Windows user credentials

Use this option to override a Windows domain user name and password that are stored with the connection string.

You must create a target application in Secure Store Service to store the credentials of the account you want to use.

For instructions, see Configure any Windows account for data refresh.

Use database or third-party credentials

Use this option to override non-Windows credentials on the connection string.

You must create a target application in Secure Store Service to store the credentials of the account you want to use.

For instructions, see Configure a predefined account for accessing external or third-party data sources.

Configuring Data Refresh

Before users can set up a data refresh schedule, a farm or service application administrator must enable data refresh. Data refresh is not available by default. An administrator enables data refresh through the following actions:

  • Enables and configures Secure Store Service. This step enables the minimum level of data refresh functionality. Namely, a workbook owner or author can define a data refresh schedule to run under his or her credentials.

  • Installs any data providers used to import data into a PowerPivot workbook that are not already installed on the server.

  • Optionally, configures the PowerPivot Unattended Data Refresh account. This step enables the Use the data refresh account configured by the administrator option in a PowerPivot data refresh schedule page. A user might want to use the unattended account rather than his or her credentials to run a data refresh job. If the connection properties on the workbook are configured to use no authentication, the PowerPivot unattended data refresh account must be configured in order for data refresh to work.

  • Optionally, configures an arbitrary Windows user account that can be used to run data refresh jobs. As an administrator, you might want to make multiple accounts available for running data refresh. You can use Secure Store Service to create target applications for these accounts, and then make the target application IDs available to users so that they can specify them in the data refresh schedule.

A farm or server administrator can also configure the data refresh processing mode for the PowerPivot server. Data refresh processing mode is enabled by default, but as an administrator, you should know that this processing mode can be disabled for the server. If data refresh does not appear to be working, checking this server option should be your first step in troubleshooting the problem. For more information, see Configure Dedicated Data Refresh or Query-Only Processing.

Setting up a Data Refresh Schedule

A workbook author or owner defines the data refresh schedule for the workbook that he or she saved to SharePoint. The person who schedules data refresh must have Contribute permission to the workbook on the SharePoint site.

A workbook owner sets up data refresh by specifying the following:

  • A schedule that determines how often data is refreshed.

  • An account for running the data refresh job (either the unattended data refresh account or the Windows credentials of a SharePoint user).

  • Selected data sources to include in the data refresh operation. Data sources must be accessible over a network connection. If you import data from a local file on your workstation, that data source will most likely not be available to the server when it attempts to refresh data from that source. Permissions to the data files are also required. Not all data sources will meet the requirements for data refresh. You can exclude the data sources that would otherwise fail during a refresh operation.

  • Optionally, credentials used to access external data sources that provide updated data. By default, credentials that were used to get the data originally are reused for data refresh, but you can override this by entering different credentials in the schedule.

There is one PowerPivot data refresh schedule for each workbook. Any user who has Contribute permissions on the workbook can create or modify the data refresh schedule.

Data refresh is turned off by default, but a service application administrator can enable it for a PowerPivot service application. For more information about how to set up data refresh, see Schedule PowerPivot Data Refresh.

Triggering Data Refresh

Data refresh schedules are stored in a PowerPivot service application database and triggered by the PowerPivot Data Refresh timer job that runs on an application server. The timer job calls the PowerPivot service to read schedule information from its database. By default, the timer job scans at one minute intervals, which is the smallest interval allowed for timer jobs in a SharePoint farm.

The timer job runs on every physical server that has PowerPivot for SharePoint instance in the farm. The timer job itself is defined as farm-level job. You can view and configure timer job properties in the Monitoring page in Central Administration. The settings you define will apply to all PowerPivot data refresh timer jobs in the farm.

When the timer job detects a user-defined data refresh schedule, it hands a data processing request to a PowerPivot service to put into the processing queue of the local Analysis Services server instance. If there is no local instance of Analysis Services, the job adds the request to another instance that is associated with the Web application. Once a request is in the queue, it remains there until it is processed or cancelled.

You cannot prioritize or reorder the processing queue, but you can use a Run Now action to force an immediate data refresh. For more information, see Schedule PowerPivot Data Refresh.

Processing Data Refresh

Data refresh succeeds if the Analysis Services server instance is able to retrieve updated data from all of the data sources specified in the schedule. If retrieval fails for one data source, none of the refreshed data will be saved. 

The following illustration shows the steps in a data refresh operation.

Sequence diagram for data refresh

The following table describes server actions when a data refresh operation begins:

1

An authorization check verifies that the user has sufficient permissions to request updated data for the PowerPivot data source. The data refresh job impersonates the identity of the unattended data refresh account or the Windows user specified in the schedule definition, and under that identity opens the workbook file from the content database.

2

The server reads the list of data sources that are scheduled for the current data refresh operation. The list will consist of all data sources that currently contribute data to the PowerPivot data source, or individual data sources that were explicitly selected for data refresh operations. A user can choose a subset of the data sources for the schedule.

The service opens a connection to each data source using the connection string that is stored inside the PowerPivot data source. Depending on how the data refresh schedule is defined, the credentials for the connection will either be read from the connection string or from Secure Store if the person scheduling data refresh entered supplied credentials for connecting to a particular data source.

Each data source is opened and queried in parallel. If a data source is unavailable or if an error occurs, data refresh will fail with an error that indicates the cause of the problem. The data refresh job will stop processing and an error will be written to data refresh history.

3

If data refresh succeeds for all data sources, the data and workbook are saved to the content database. The operation is logged to data refresh history. The identity of the Windows user or the unattended data refresh account is included in the log entry.

The workbook cannot be checked out when the server attempts to save the refreshed data. If the workbook is checked out, the server will discard the refreshed data if it cannot save the updated workbook. If the workbook is not locked for editing, the updated workbook will be saved using the Windows credentials specified in the schedule or the unattended data refresh account.

Measuring System Resources for Data Refresh

Data refresh processing will only occur if the server has sufficient memory to initiate the process, where sufficient memory is measured as one and a half times the file size. Requests for data refresh are allocated using the same load balancing scheme used for on demand queries. If the PowerPivot service application uses round robin, data refresh requests are allocated to the next server. Health-based methodology allocates the request to the server that has the most available resources. Each server is queried for its system information. If all servers are under memory pressure, one or more data sources will be unloaded to make room for the data refresh job. Data sources that have been inactive the longest are removed first. If servers are not under memory pressure and there are multiple candidates, the allocation methodology uses other criteria, such as the availability of a cached PowerPivot data source that can be loaded immediately rather than retrieved from a content database.

By default, PowerPivot data refresh jobs and on demand queries share the same physical server resources on an application server. You can modify this behavior by configuring a specific Analysis Services service instance to support either on demand queries or data refresh only. You can also change the number of data refresh jobs that can run concurrently. For more information, see Configure Dedicated Data Refresh or Query-Only Processing.

Canceling Data Refresh

If the server is under pressure and cannot process a data refresh at the scheduled time, the request will remain in the queue and the server will process it when system resources become available.

If for some reason system resources never become available, the server will cancel the request, remove it from the queue, record the event in data refresh history, and optionally notify the document owner via email that the process was cancelled. The cancellation occurs the next time the process is scheduled to run. For example, if a data refresh scheduled for 9:00 a.m. on Wednesday never occurred, a cancellation would occur on the following Wednesday morning at 09:00 when the server detects an unprocessed request in the queue.

You cannot manually cancel a data refresh operation once it starts, but you can disable the schedule in advance to prevent it from running. For more information about how to disable data refresh for a specific workbook, see Schedule PowerPivot Data Refresh. For more information about how to turn off data refresh on the server, see Configure Dedicated Data Refresh or Query-Only Processing.

Deleting Data Refresh Schedules

Data refresh schedules are part of the metadata that is stored and maintained for a PowerPivot workbook. Because a schedule is not a separate file, you cannot delete it independently of the workbook that uses it. If you no longer want to refresh the data in a workbook, you can clear the Enable checkbox on the schedule. The schedule will no longer run, and after a period of inactivity the schedule information that was previously specified will be removed.

Deleting a workbook deletes any data refresh schedule that is associated with the workbook. If you subsequently recycle the workbook, the schedule information is also recycled and will resume on the same schedule.