Case Study: King Abdullah University of Science and Technology and host-named site collections (SharePoint Server 2010)
Published: October 2, 2012
While none of the Microsoft SharePoint Server 2010 design samples currently incorporate host-named site collections, King Abdullah University of Science and Technology (KAUST) provides a compelling business case for their use.
KAUST IT Services offers a large multilingual (English and Arabic) collaboration environment for the university. KAUST decided to migrate to SharePoint Server 2010 to take advantage of managed metadata and social tagging, improved search capabilities, improved business integration services (PerformancePoint Services, Reporting Services, Excel Services in SharePoint, and Access Services), improved publishing features, and a more flexible services architecture. SharePoint consultant Timo Heidschuster was hired to help KAUST migrate its SharePoint Server 2007 deployment to SharePoint Server 2010.
A major part of the consulting job was to fix a design issue that affected performance and scalability. Vanity URLs, one of the requirements driving website design for KAUST, had resulted in a large number of Web applications and application pools. Under the traditional path-based site collection model, each vanity URL was placed in a dedicated Web application with its own dedicated application pool. Yasser M Al-Harazi, IT engineer for KAUST, explained, “In the previous year we increased from three to up to 80 application pools and 80 URLs.” The large number of Web applications and application pools slowed performance and limited future growth. Al-Harazi noted that the environment will likely double or triple, depending on business needs.
SharePoint 2010 solution design
The solution was to migrate the KAUST deployment to SharePoint 2010 using host-named site collections. The primary difference between path-based and host-named site collections is that all path-based site collections in a Web application share the same host name (DNS name). In contrast, each host-named site collection in a Web application is assigned a unique DNS name. Host-named site collections provide a scalable solution in which each site collection has its own vanity host-name URL.
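The difference between the two models can be seen in the URLs alone. The following minimal Python sketch uses illustrative URLs (only http://hr.kaust.edu.sa comes from this case study; the others are hypothetical) and shows that path-based site collections in one Web application share a host name, while host-named site collections each have their own.

```python
from urllib.parse import urlsplit

# Hypothetical path-based URLs: one host name, site collections told apart by path.
path_based = [
    "http://intranet.example.edu/sites/hr",
    "http://intranet.example.edu/sites/finance",
]

# Host-named URLs: each site collection has its own DNS name.
# hr.kaust.edu.sa is mentioned in the case study; the second URL is hypothetical.
host_named = [
    "http://hr.kaust.edu.sa/",
    "http://finance.example.edu/",
]


def host_names(urls):
    """Return the set of distinct host names used by a list of URLs."""
    return {urlsplit(u).hostname for u in urls}


print(len(host_names(path_based)))  # one shared host name
print(len(host_names(host_named)))  # one host name per site collection
```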
Given the various types of sites, Microsoft and the KAUST team decided on seven application pools for hosting the university’s sites. The Microsoft-recommended limit is ten application pools. Three of the seven KAUST application pools are used for host-named site collections: one for published intranet content (such as http://hr.kaust.edu.sa) and the other two for collaboration sites for the many academic teams and departments. This design brings KAUST well within the recommended limit for application pools.
Each host-named site collection is configured to use a dedicated database to make sure that content databases do not grow too large. The size target for KAUST is 100-200 GB. Microsoft recommends limiting the size of content databases to 200 GB to help ensure system performance. The recommended limit for the number of databases per Web application is currently 300. This allows room for the KAUST environment to grow.
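To make the headroom concrete, the limits quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name and the sample inputs are assumptions, and the three limits are simply the figures stated in this case study.

```python
# Limits as quoted in this case study.
MAX_APP_POOLS = 10          # recommended application pool limit
MAX_DB_SIZE_GB = 200        # recommended content database size limit
MAX_DBS_PER_WEB_APP = 300   # recommended databases per Web application


def check_design(app_pools, db_sizes_gb):
    """Return warnings for a design with the given application pool count
    and content database sizes (in GB) for one Web application."""
    warnings = []
    if app_pools > MAX_APP_POOLS:
        warnings.append(f"{app_pools} application pools exceeds {MAX_APP_POOLS}")
    if len(db_sizes_gb) > MAX_DBS_PER_WEB_APP:
        warnings.append(f"{len(db_sizes_gb)} databases exceeds {MAX_DBS_PER_WEB_APP}")
    for index, size in enumerate(db_sizes_gb):
        if size > MAX_DB_SIZE_GB:
            warnings.append(f"database {index} is {size} GB, over {MAX_DB_SIZE_GB} GB")
    return warnings


# Ceiling for one Web application if every database sits at the size limit.
ceiling_tb = MAX_DBS_PER_WEB_APP * MAX_DB_SIZE_GB / 1000
print(ceiling_tb)

# Hypothetical example: seven pools, 80 databases of 150 GB each.
print(check_design(7, [150] * 80))
```

With seven application pools and databases kept in the 100-200 GB target range, the check returns no warnings, and a single Web application could in principle hold 300 such databases.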
One of the application pools is dedicated to hosting sites for partner collaboration. These sites will be migrated to an externally facing farm in the future, which will free up another application pool on this farm.
The following diagram shows the logical architecture of the KAUST environment.
In the diagram, the white boxes with dashed borders represent individual Web applications.
Managing host-named site collections
Microsoft also provided the KAUST team with a plan for managing the large number of host-named site collections. Host-named site collections can only be created by using Windows PowerShell. Two lines of code are used for each site collection: one to create the database and one to create the site and assign it to that database. After they are created, all host-named site collections are available in Central Administration and can be managed through this user interface. SharePoint 2010 products provide the ability to use managed paths with host-named site collections, and these managed paths are also created by using Windows PowerShell. However, these sites are not visible in Central Administration.
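The case study does not reproduce the two lines KAUST used. As a rough sketch of what such a pair could look like, the following Python helper generates the two Windows PowerShell commands for a batch of site collections, assuming the standard SharePoint 2010 cmdlets New-SPContentDatabase and New-SPSite. The helper name, database names, Web application URL, and owner account are all hypothetical placeholders, not values from the KAUST environment.

```python
def site_collection_commands(host_url, db_name, web_app_url, owner):
    """Build the two PowerShell lines for one host-named site collection:
    first create a dedicated content database, then create the site in it."""
    create_db = (
        f'New-SPContentDatabase -Name "{db_name}" -WebApplication "{web_app_url}"'
    )
    create_site = (
        f'New-SPSite -Url "{host_url}" -OwnerAlias "{owner}" '
        f'-HostHeaderWebApplication "{web_app_url}" -ContentDatabase "{db_name}"'
    )
    return [create_db, create_site]


# Hypothetical inputs; only the hr.kaust.edu.sa host name appears in the case study.
web_app = "http://webapp.example"
owner = "DOMAIN\\owner"
sites = [
    ("http://hr.kaust.edu.sa", "WSS_Content_HR"),
    ("http://finance.example.edu", "WSS_Content_Finance"),
]

for url, database in sites:
    for line in site_collection_commands(url, database, web_app, owner):
        print(line)
```

Generating the commands from a list keeps the one-database-per-site-collection rule consistent as the number of vanity URLs grows. The output would still need to be reviewed and run in the SharePoint 2010 Management Shell by an administrator.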
Microsoft provided the KAUST team with Windows PowerShell coaching and samples for their environment. Given the scale, the KAUST team is likely to use Windows PowerShell more often than Central Administration. Microsoft recommends using Windows PowerShell to manage a Web application when a large number of content databases are present.
Development tradeoffs with host-named site collections
While host-named site collections address specific architecture goals for KAUST, Mohammed Shokry, Web Analyst, notes that putting multiple sites in a single Web application can limit the development opportunities for sites. Changes made to the Web.config file for application-level configuration apply to all sites in a Web application. Also, in SharePoint 2010 Products, host-named site collections cannot be used with alternate access mappings.
KAUST achieves high availability in a single farm with two servers dedicated to each role, including a SQL Server database cluster. The two Web servers are virtualized using Microsoft Hyper-V Server 2008 to make these server roles easier to manage and move, if needed. These virtualized Web servers are placed on two separate host servers to achieve high availability. The following diagram shows the physical architecture for the SharePoint 2010 farm.
A second datacenter in a different city hosts a failover farm that takes advantage of Hyper-V Server to consolidate servers. The failover farm is designed as a warm standby farm and will allow limited access to SharePoint. The failover environment is not pictured in this case study.
Database recommendations, RAID settings, and SAN disk allocation
Microsoft recommended and implemented the following best practices for KAUST:
For high availability, split databases across different disks and logical unit numbers (LUNs).
Place databases and logs for the Web Analytics service application and the Usage database (for the Usage and Health Data Collection service application) on a different disk because of the increased write activity.
To improve SQL Server performance, change the NTFS allocation unit size of the disk from the default 4 KB to 64 KB. The average performance benefit is around 30% compared to the default formatting.
The following RAID settings are used for KAUST:
RAID 10 for logs
RAID 5 for data