Category: Enterprise Content Management

Revisiting RBS: SharePoint and Service Pack 1

Since the release of SharePoint 2010 there has been a lot of debate over the value and application of Remote BLOB Storage (RBS) with SharePoint 2010. That debate has been reinvigorated with the updated Plan for Software Boundaries guidance that was published with Service Pack 1 for SharePoint 2010. Frankly I am disturbed by the volume of blog posts that include inaccurate information about RBS and the updated Software Boundaries and Limits. It seems that some “experts’” influence far outweighs their competence on the subject of RBS. I have personally seen the confusion created in the ecosystem manifest itself in conversations I have with customers on a daily and weekly basis. For this reason I think it is pertinent to revisit the value of BLOB externalization, correct common misconceptions, and discuss how the updated guidance from Microsoft may impact your decision to implement SharePoint with a remote BLOB storage solution.

A Brief History of BLOB Externalization

First a brief history lesson before we tackle RBS (a history lesson that many of you already know). BLOB externalization is not a new concept. In fact most legacy ECM systems store unstructured data (files) separately from metadata stored within a relational database. Microsoft originally developed SharePoint this way using the same Web Storage System that Exchange Server uses. With the release of WSS 2.0/SharePoint 2003, Microsoft moved to storing all data (structured and unstructured) within SQL Server databases. Many vendors attempted to address database growth through the use of archive tools that pull BLOBs from the database in a post-processing batch job. While this worked to solve database bloat problems it created compatibility issues with out-of-the-box and third party SharePoint solutions, which had to be “aware” of the archive product in use and understand how to interpret the stub left behind. Definitely not the most elegant solution and it certainly didn’t address the core issue. It wasn’t until the release of Service Pack 1 for SharePoint 2007/WSS 3.0 that Microsoft introduced support for BLOB externalization via the EBS interface. Subsequently Microsoft introduced support for Remote BLOB Storage (RBS) along with continued support for EBS* with SharePoint 2010.

BLOB externalization isn’t about being able to leverage commodity disk but rather being able to leverage the “optimal” disk based on the content being managed/stored. The goal is to make sure that patient records, invoices, purchase orders, lunch menus, and vacation pictures land on the most optimal storage device. For obvious reasons not all content is created equal nor should it be treated as such. Consequently there are scenarios that SharePoint simply cannot support out of the box or with the RBS FileStream provider. For example, take SEC Rule 17a-4 (Electronic Storage of Broker-Dealer Records) requirements for client/customer records. Once declared a record, any client-related document (an IRA account opening document, for example) has specific storage requirements (it must be immutable and unalterable). Third party RBS products like Metalogix StoragePoint facilitate this scenario through support of WORM (Write Once, Read Many) and CAS (Content Addressable Storage) devices.

In the process of optimizing the storage environment for SharePoint, BLOB externalization accomplishes some critical goals. It is no secret that relational databases (including SQL Server) are not the ideal place to store large pieces of unstructured data. No, this isn’t a dig at SQL Server but rather stating the obvious fact that a system optimized for the storage of highly relational, fine-grained transactional data is not an ideal place to store large pieces of unstructured data. The problem with SQL Server is that the performance cost of storing BLOBs in the database is high. If you consider the RPC call between the web server and the database that contains a large payload (metadata plus the BLOB) and the IO, processor, and memory requirements for storing the BLOB, you have a very expensive process. Yes it is true that Microsoft has optimized BLOB performance in subsequent releases of SQL Server, but it is still more optimal to store BLOBs outside of the database when you consider a typical SharePoint farm under load or the process for executing a bulk operation such as a full crawl. The updated guidance from Microsoft would certainly support this assertion. Microsoft itself has documented this fact in many of its own publications and even alluded to the initial value of BLOB externalization as being a way to improve the performance of your SharePoint environment. Additionally SQL Server is very rigid in terms of the type of storage it can leverage and the methods by which you back up the environment. This brings me to my next point. What was the intent of providing BLOB externalization interfaces within the SharePoint product in the first place?

The original sizing guideline/limitation for content databases with WSS 3.0/SharePoint 2007 was 100GB (collaboration sites). With SharePoint 2010 Microsoft increased the size limit to 200GB and changed the limit yet again with Service Pack 1 for SharePoint 2010 (more on this later). These limits proved to be problematic for many looking to implement SharePoint pervasively throughout their organization. Not only is database growth a problem, there are challenges with segmentation of content to work around database size restrictions, along with SQL Server being a less than optimal place to store BLOBs. Additionally backup/restore would become a challenge as SharePoint environments continued to grow both in size and criticality.

Microsoft originally positioned BLOB externalization as a way to reduce the size of your SharePoint content database. While there is some debate on this topic, it is generally agreed upon in the SharePoint community that the content database size limitations did NOT include externalized BLOBs (this changes with Service Pack 1 for SharePoint 2010). When the StoragePoint team released StoragePoint 2.0 we spent quite a bit of time creating and shaping the messaging for the product, which included the following benefits that still hold true as the basis for BLOB externalization:

  • Reduce the size of your SharePoint content databases by externalizing BLOBs. Roughly 90-95% of your content database consists of BLOBs (this varies with auditing enabled). By externalizing the BLOBs we can reduce the size and number of databases required to support your environment.
  • Optimize performance by freeing SQL Server from the burden of managing unstructured pieces of data
  • Support the use a variety of storage platforms based on business requirements (storage costs, compliance, performance, etc) including SAN, NAS, and Cloud storage.
  • Create new opportunities to replicate and backup SharePoint content in a more efficient manner

If you consider that roughly 90-95% of a content database is comprised of BLOBs then you stand to have significant reduction in the database size and an increase in the size of the content you can manage per content database. One of the metrics that we often referred to with StoragePoint was the management of 2TB of content. If you reduce a 2TB content database by 95% you end up with a 102.4GB content database and 1945.6GB (1.9TB) of externalized BLOBs. This would be well within the database size limits for SharePoint 2010 and at the high end of the limit for SharePoint 2007. This sounds familiar doesn’t it? I think I have seen something like this in the SP1 limits for SharePoint 2010 … let’s take a look.
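The arithmetic above can be sketched in a few lines. This is a minimal illustration assuming the 95% BLOB ratio cited in this post; the actual ratio varies from farm to farm (auditing, for example, changes it).

```python
# Worked example of the content database reduction described above.
# Assumes BLOBs make up 95% of the content database (the rough figure
# cited in this post; actual ratios vary, e.g. with auditing enabled).

def externalize(total_gb, blob_ratio=0.95):
    """Split a content database into residual metadata and externalized BLOBs."""
    blobs_gb = total_gb * blob_ratio
    metadata_gb = total_gb - blobs_gb
    return metadata_gb, blobs_gb

metadata, blobs = externalize(2048)  # a 2TB content database
print(round(metadata, 1), round(blobs, 1))  # 102.4 GB database, 1945.6 GB of BLOBs
```

Running this for a 2TB database reproduces the 102.4GB/1945.6GB split quoted above.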

Service Pack 1 Considerations

Prior to the release of Service Pack 1 for SharePoint 2010 the content database size limit did not include externalized BLOBs (yes, yes this is debatable but I can tell you that this is generally accepted based on the lack of clarity in the original Plan for Software Boundaries documentation). Microsoft revised this guidance along with the database size limits. For SharePoint 2010 the 200GB size limit is still in effect with a new option to expand a “collaboration” site to 4TB. Now for the fun part … in order to expand a content database beyond the 200GB limit you need an optimized SQL Server disk subsystem. Specifically Microsoft recommends that you have 2 IOPS (inputs/outputs per second) per GB of storage. Note that I am generalizing a bit on the SP1 guidelines and limits so you can read them for yourself here.

While the database “size” limitation now includes externalized BLOBs in the calculation, externalized BLOBs are not included in the IOPS requirement. In order to manage a 4TB database you must have a disk subsystem that supports 2 IOPS per GB. If you are not familiar with this concept I can tell you that this is an expensive disk configuration (more on this below). With StoragePoint in place you can have a 4TB “content database” that consists of an approximately 200GB SQL database and 3.8TB of externalized BLOBs, all without an expensive disk requirement. Sound familiar? This is the same messaging that the StoragePoint team advertised with the original release of StoragePoint 2.0 with SharePoint 2007/WSS 3.0 SP1. If you do believe that the IOPS requirement includes the externalized BLOBs then you have to discount Microsoft’s support for NAS storage (via iSCSI) with the RBS FileStream provider; most NAS devices were not intended to support such a high level of IOPS. The new guidance simply reaffirms what the StoragePoint team has asserted all along. Using a 95% reduction in the database (a typical database is comprised of 95% BLOBs) you would end up with a 200GB content database (within Microsoft’s original guidelines). If you decide to keep the BLOBs in the database then you need lots of expensive disk to maintain the performance of your environment.

Let’s take a practical example following Microsoft’s database size limits and guidelines for disk performance and determine what a reasonable disk subsystem might look like. Remember that Microsoft requires .25 IOPS per GB for content databases over 200GB (2 IOPS per GB is highly recommended for optimal performance). Note that in order to keep things brief and to the point I am using some rough estimates to calculate IOPS. Disk performance is impacted by hard disk specs, RAID level, and controllers.

IOPS = 1/(Average Latency in ms + average seek time in ms)
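Plugging typical drive specs into this formula gives a rough per-disk estimate. A small sketch, with the caveat that the latency and seek figures below are generic assumptions rather than manufacturer numbers:

```python
def disk_iops(avg_latency_ms, avg_seek_ms):
    """Rough per-disk IOPS from the formula above (times converted to seconds)."""
    return 1.0 / ((avg_latency_ms + avg_seek_ms) / 1000.0)

# ~2ms rotational latency + ~3.5ms average seek for a 15,000 RPM SAS drive
# (assumed specs, not a manufacturer figure)
print(round(disk_iops(2.0, 3.5)))  # 182
```

The result is in the same ballpark as the 180 IOPS figure used for 15,000 RPM SAS drives in the tables below.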

The following tables illustrate the number of disks required to achieve both .25 IOPS per GB (minimum requirement) and 2 IOPS per GB (recommended). Note that for this example we will assume that the IOPS requirement applies to data beyond 200GB, leaving us with 3.8TB of data that requires an optimal disk configuration (minimum IOPS = 972; recommended IOPS = 7792). Note the following assumptions used when calculating IOPS in the tables below.

  1. For each disk type, estimated IOPS values were used. Actual IOPS will vary based on disk type and manufacturer.
  2. RAID 5 and RAID 10 disk configurations were used as these tend to be the most common configurations for database servers (RAID 10 being the preferred configuration).
  3. The IOPS calculations make the assumption that .25 IOPS/GB and 2 IOPS/GB is required for databases above 200GB. The initial 200GB of data is not included in the minimum and recommended IOPS calculations. Additional disks would be required, as including the 200GB in the calculations would require an additional 50 and 400 IOPS respectively.
  4. There is an IOPS penalty that varies based on the RAID configuration. For RAID 10 the IOPS penalty is calculated at .8 and for RAID 5 the IOPS penalty is calculated at .57.
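The assumptions above can be turned into a quick sizing calculation. This sketch uses the RAID penalty factors from point 4 and reproduces the first row of the minimum-IOPS table; treat it as a rough estimate, not a sizing tool.

```python
import math

# RAID write penalty factors from assumption 4 above
RAID_PENALTY = {"RAID 10": 0.8, "RAID 5": 0.57}

def disks_required(required_iops, iops_per_disk, raid_level):
    """Number of disks needed to hit an IOPS target after the RAID penalty."""
    effective = iops_per_disk * RAID_PENALTY[raid_level]
    return math.ceil(required_iops / effective)

# 3.8TB beyond the first 200GB at the minimum .25 IOPS/GB works out to ~972 IOPS
min_iops = 972
disks = disks_required(min_iops, 90, "RAID 10")  # 7200 RPM SATA at 90 IOPS/disk
max_iops = round(disks * 90 * RAID_PENALTY["RAID 10"])
print(disks, max_iops)  # 14 1008 -- matching the first row of the table below
```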

Disk Configuration Sample for Minimum IOPS

Drive Type     | IOPS per Disk | RAID Level | Disk Capacity (GB) | # Disks | Usable Capacity (GB) | Max IOPS
7200 RPM SATA  | 90            | RAID 10    | 1024               | 14      | 7168                 | 1008
10000 RPM SATA | 130           | RAID 10    | 1024               | 10      | 5120                 | 1040
10000 RPM SAS  | 140           | RAID 10    | 1024               | 10      | 5120                 | 1120
15000 RPM SAS  | 180           | RAID 10    | 1024               | 8       | 4096                 | 1152
7200 RPM SATA  | 90            | RAID 5     | 512                | 20      | 9216                 | 1026
10000 RPM SATA | 130           | RAID 5     | 512                | 14      | 6144                 | 1037.4
10000 RPM SAS  | 140           | RAID 5     | 512                | 14      | 6144                 | 1117.2
15000 RPM SAS  | 180           | RAID 5     | 512                | 10      | 4096                 | 1026

Disk Configuration Sample for Recommended IOPS

Drive Type     | IOPS per Disk | RAID Level | Disk Capacity (GB) | # Disks | Usable Capacity (GB) | Max IOPS
7200 RPM SATA  | 90            | RAID 10    | 1024               | 110     | 56320                | 7920
10000 RPM SATA | 130           | RAID 10    | 1024               | 76      | 38912                | 7904
10000 RPM SAS  | 140           | RAID 10    | 1024               | 70      | 35840                | 7840
15000 RPM SAS  | 180           | RAID 10    | 1024               | 56      | 28672                | 8064
7200 RPM SATA  | 90            | RAID 5     | 512                | 152     | 38912                | 7797.6
10000 RPM SATA | 130           | RAID 5     | 512                | 106     | 27136                | 7854.6
10000 RPM SAS  | 140           | RAID 5     | 512                | 98      | 25088                | 7820.4
15000 RPM SAS  | 180           | RAID 5     | 512                | 76      | 19456                | 7797.6

As you start calculating the IOPS requirements (both minimum and recommended) it quickly becomes apparent that achieving an “optimized” disk subsystem for your large database is going to be quite expensive and will most likely result in overprovisioned disks. When you begin to consider replication of environments for disaster recovery and nonproduction scenarios (i.e. moving production data into a nonproduction environment for testing), organizations will experience a 2-5X multiplier on the disk subsystem required to support SQL Server. Obviously this is not the ideal scenario for most organizations deploying SharePoint on any reasonable scale. RBS and products like Metalogix StoragePoint allow organizations to store content on the appropriate storage without the need to meet an expensive IOPS requirement.

Why Not Just Use the RBS FileStream Provider?

Somehow the RBS FileStream provider has evolved into a solution that some would actually consider for a medium or large scale SharePoint environment. I think folks forget why this provider was created in the first place. WSS 3.0 with the WID (Windows Internal Database) option does not have a database size limit. In theory, and in practice, organizations can and have stuffed large volumes of content into this “at no additional charge” product. With the release of SharePoint Foundation 2010 and SQL Server 2008 Express Edition, Microsoft introduced database size limits. SQL Server 2008 R2 Express Edition has a 10GB database size limitation (SQL Server 2008 Express Edition has a 4GB database size limitation) … wait for it … now you see the problem. How can a customer upgrade without buying SQL Server licenses? Enter the RBS FileStream provider.

The problem with the RBS FileStream Provider is that it lacks basic features required to call it an enterprise solution. There are obvious issues such as lack of user interface, lack of support for “remote” storage, and lack of a multithreaded garbage collection process (this issue plagues many StoragePoint competitors as they opt to use the OOB garbage collector with RBS). But more importantly it fails to address a very important challenge. RBS FileStream does not bypass SQL Server for the processing of BLOBs. RBS FileStream pulls the BLOB out of the initial RPC call and then redirects it right back to SQL Server using the FileStream column type. Again, for obvious reasons this is not an efficient process. I am not saying that the RBS FileStream provider is not a viable solution but organizations considering this option should proceed with caution. Backing out of the RBS FileStream provider once you have amassed large volumes of content can prove cumbersome and time consuming.

Backup and Restore Considerations

Backup/restore and disaster recovery can be a complex topic and for this reason I am not going to explore it in great detail in this post. Any RBS solution for SharePoint, including StoragePoint, will change the process for backing up and restoring SharePoint environments. What’s lost on most people is that this is not necessarily a negative aspect of RBS. Often the change is very positive and provides new ways of backing up SharePoint environments that weren’t previously possible.

Before we explore backup/restore processes it is important to first understand the anatomy of a BLOB when it is stored outside of SharePoint content databases. Externalized BLOBs are immutable, which means they will never change once they are written out to external storage. There is a one to one relationship between a BLOB and a given version of a file/document in SharePoint. This means that SharePoint will only create and delete BLOBs (StoragePoint actually deletes them as part of a garbage collection process). It may not be immediately apparent but this is actually a good thing. Traditionally you would back up SharePoint content databases using a simple or full recovery model. This means that you are taking full backups on a regular basis that contain objects that will never, ever change. This is less than efficient. By separating BLOBs from the database you can now back up (or replicate) a BLOB one time rather than capturing it in multiple backups. This approach reduces backup storage costs and makes new DR scenarios (warm/hot failover) possible.

In general the backup process involves backing up the content database followed by the external BLOB store(s). A farm level restore would involve restoring your BLOB store followed by your content database(s). In many cases it isn’t necessary to back up the external BLOB store as there are ways to replicate it to multiple locations. Item level restores tend to be the area of biggest concern when using an RBS solution like StoragePoint. Fortunately StoragePoint has some built in features to make item level restore feasible. StoragePoint includes a feature called “Orphaned BLOB Retention Policies” that allows for the retention of BLOBs for which the corresponding database item has been deleted. These retention policies are used in conjunction with item level restore tools to guarantee that item level restore is available for a definable period of time.


RBS is clearly a viable option for organizations deploying SharePoint where growth of the environment will be consistent or exponential over time. Microsoft’s updated guidelines and database size limits are a confirmation of sorts of the opportunity that RBS presents for SharePoint deployments. If you are deploying SharePoint in any capacity you should consider RBS as an option for optimizing storage for both active and archived content in your SharePoint environment.

StoragePoint 3.0 Webinar Series Announced

Yesterday we announced a new webinar series where we will review and demonstrate the features of StoragePoint 3.0. StoragePoint 3.0 is a highly anticipated release that provides significant advancements over our already robust BLOB remoting capabilities for SharePoint 2007 and SharePoint 2010. New features like Intelligent Archiving, Multiple Storage Endpoints, and advanced reporting capabilities significantly reduce the total cost of ownership of SharePoint deployments while providing improved performance, reduced backup/restore time frames, and flexible storage options. StoragePoint 3.0 is a must-have for organizations deploying any level of document management or enterprise content management solutions on SharePoint 2007 or SharePoint 2010. To register for an upcoming webinar please visit our Event Brite page. For more information on StoragePoint, visit our website.

When Not To Crawl Content

It is generally accepted that searching for content in MOSS or WSS 3.0 requires the content to first be crawled by the SharePoint Search Service. However in traditional Enterprise Content Management (ECM) scenarios this typically doesn’t make a lot of sense. If you evaluate how most organizations manage content, you will quickly see why crawling it doesn’t make a whole lot of sense. A typical ECM-related business process involves the capture (data stream or scanning of content), categorization, processing, and archival of content. In many cases significant time, money, and effort is expended in these business processes. So if you spent significant resources to capture and categorize content, why would you rely on a search technology that is better suited for unstructured, full-text queries to retrieve it? In most (and I say “most” because there are exceptions to this rule) ECM scenarios users are not conducting broadly scoped searches for content. Users’ search criteria are very targeted and specific. For example an accounting user might want to search for an invoice from a specific vendor based on vendor id and/or invoice number. A slightly broader search might be executed where the same accounting user is looking for all invoices from a specific vendor for the 2008 calendar year. In either case the search is targeted. Attempting to crawl this content doesn’t result in favorable outcomes. For starters, crawling content in SharePoint doesn’t occur immediately after content is added, and incremental crawls can take long periods of time to execute depending on how much content was added since the last incremental crawl. In many ECM scenarios users are required to immediately validate content once it’s archived to SharePoint, but requiring the content to first be crawled doesn’t support this process due to the latency by which items are made available for searching.

The performance challenges with crawling large volumes of content in SharePoint are well documented. If you are not familiar with SharePoint limitations I would recommend reviewing Microsoft’s TechNet article titled Plan for Software Boundaries (Office SharePoint Server) located here. If you have ECM scenarios where users are conducting targeted searches in SharePoint, I would suggest evaluating existing search utilities that leverage CAML (Collaborative Application Markup Language) or developing your own. In large volume scenarios it makes sense to exclude the content from the SharePoint crawl altogether. I have personally experienced extremely unfavorable crawl performance as a result of larger content volumes in SharePoint even when the underlying SharePoint server infrastructure was optimal.
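To illustrate the targeted-query approach suggested above, the following sketch builds a CAML query for the invoice scenario. The list field names (VendorID, InvoiceNumber) are hypothetical, and in a real farm the query would be executed through the SharePoint object model or web services rather than printed; Python is used here purely for illustration.

```python
# Hedged sketch: constructing a targeted CAML query for the invoice
# lookup described above. The field internal names are hypothetical.

def build_invoice_query(vendor_id, invoice_number):
    """Return a CAML <Where> clause matching a single vendor invoice."""
    return (
        "<Where><And>"
        f"<Eq><FieldRef Name='VendorID'/><Value Type='Text'>{vendor_id}</Value></Eq>"
        f"<Eq><FieldRef Name='InvoiceNumber'/><Value Type='Text'>{invoice_number}</Value></Eq>"
        "</And></Where>"
    )

print(build_invoice_query("ACME-001", "INV-2008-0042"))
```

Because a query like this hits indexed list columns directly, the result is available the moment the item is archived, with none of the crawl latency described above.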
