Enterprise vs. Open Source vs. SaaS BI: Some Thoughts

Updated: September 16, 2010

Let's consider the architecture, TCO, and best usage of each solution type in turn.

Enterprise BI

An enterprise BI solution such as IBM Cognos sets up data feeds from operational databases (order entry, for example) into a common central data store, known as a data warehouse (or multiple data marts). Data comes into the data store in hourly, daily, or weekly bursts, and the time not spent on "mass loads" is spent running queries against the data store using "BI software". Today, such a store may hold terabytes or more, and it is typically composed of relatively small numeric or text data records.

Enterprise BI solutions cost a lot, and a large part of the cost goes to database administration. The reason is that the data warehouse has to maximize query performance, day after day, year after year, as the data-store size increases by 50% a year. Only a fine-tuned, powerful database can handle the job, and every customer believes that its own fine-tuning needs are unique rather than "one size fits all" - so it's very hard to outsource the tuning that the administrator performs on the database.
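To make the growth figure concrete, here is a minimal Python sketch of what 50% annual growth does to a data store over several years. The function name and the 1 TB starting size are illustrative assumptions, not figures from any particular deployment.

```python
def projected_size_tb(initial_tb: float, years: int, annual_growth: float = 0.5) -> float:
    """Return the data-store size after compounding annual growth.

    annual_growth=0.5 is the 50%-a-year figure cited in the text;
    initial_tb is a hypothetical starting size chosen by the caller.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_tb * (1.0 + annual_growth) ** years


# Print a hypothetical five-year projection for a 1 TB store.
for year in range(6):
    print(f"year {year}: {projected_size_tb(1.0, year):.2f} TB")
```

Under these assumptions, a 1 TB store grows to roughly 7.6 TB in five years, which is why tuning that worked in year one has to be revisited again and again.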

However, the same is not true for SMBs. Up to a certain point (in my opinion, somewhere around 500-1000 employees in the company) these companies need raw power rather than customization. Moreover, it's possible to find a cheap enterprise database that will handle all the load that an SMB can throw at it, and deliver "near-lights-out administration" as well. The result is that, according to my studies, an SMB can save more than 50% in 3-year TCO by using one of these databases instead of Oracle.

Open Source BI

An open source BI solution such as Jaspersoft or Pentaho replaces the license cost of a full BI solution with an open-source "free" distribution of software, plus either a fee for services or an "enterprise edition" at moderate cost. The architecture of the open-source solution is pretty much the same as that of an enterprise BI solution, although the prevalence of open source communities on the Web has led to a significant presence of open-source BI software in public clouds.

The main attraction of open-source BI is the reduction in license costs. Note, however, that the open-source BI solution either uses an enterprise database, in which case overall costs are not reduced by much, or its own open-source database (typically MySQL), in which case the open-source solution won't scale as well and may be more appropriate for an SMB. The main potential problem with open-source BI is not the security of company data (since users can always take advantage of sophisticated Web security schemes and keep the physical architecture in the company itself), but rather the relative inexperience of today's open-source community with scaling databases. It is only very recently that open-source databases like MySQL have implemented some of the basic mechanisms that enterprise databases use to ensure data integrity and consistency, and Java programmers frequently betray a poor understanding of database schemas. Finally, for some SMBs, databases that offer "administration for dummies" are vital, because good database-administration personnel are just not out there to be hired, even in today's economy. All in all, open-source BI right now occupies a "middle tier" in the BI market - good for medium-to-large-scale implementations where Web knowledge is plentiful.

SaaS BI

Birst is a good example of the new breed of SaaS BI provider. The architecture is hosted and multi-tenant (multiple customers can share one BI "veneer" and physical data store). Instead of flowing operational data to an in-house data store, Birst redirects the data to a Birst data center "in the cloud." To implement Birst, one simply inserts new generic ETL software that feeds the hosted hardware, and the Birst solution auto-discovers the structure of the existing data. Thus, deployment is quick, and administration is cross-customer, cutting the costs (included in the price) of database administration. Moreover, the solution itself is necessarily quite agile: it can adapt more readily to, or be customized more quickly for, new data types and new kinds of transaction streams (with cost-saving load balancing).

However, many large-enterprise implementations do not just store new data in the data warehouse; they also store historical data. Moving massive amounts of new data to a geographically far-flung SaaS data center is much slower than moving multiple smaller streams of that data to a local data center or one with dedicated communications. Things are even worse when historical data is involved, because it can increase the amount being loaded by one or two orders of magnitude. The proof of this is in the new cloud concept of "data locality": although theory says that applications can be moved quickly between geographies in a public cloud, in fact implementers keep the data where it is, and "pretend" that the data has been moved along with the code -- because moving large amounts of data dynamically kills performance.
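A rough back-of-the-envelope calculation shows why volume matters so much here. The Python sketch below estimates bulk transfer time from data volume and link speed; the function name, the decimal unit conversion, and the efficiency parameter are my own illustrative assumptions, and real-world throughput over a wide-area link will vary.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move data_gb gigabytes over a network link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved
    (0.7 is an arbitrary placeholder, not a measured value).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # gigabytes -> megabits, decimal units
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical comparison: a new-data load versus a 100x historical backlog.
new_load = transfer_hours(100, 100, efficiency=1.0)
historical = transfer_hours(100 * 100, 100, efficiency=1.0)
print(f"new data: {new_load:.1f} h, with history: {historical / 24:.1f} days")
```

With these hypothetical numbers, 100 GB over a fully utilized 100 Mbit/s link takes a bit over two hours, while a backlog two orders of magnitude larger takes more than nine days over the same link.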

The result is that SaaS BI is especially good for one of two situations: handling a new SMB's BI, or serving as a complement to a larger organization's BI to do quick ad-hoc deeper data mining for particular, smaller data marts or tables.