Relational Technologies Under Siege:
Will Handsome Newcomers Displace the Stalwart Incumbents?
Published: October 16, 2014
Analyst: Neil Raden
After three decades of prominence, Relational Database Management Systems (RDBMS) are being challenged by a raft of new technologies. While RDBMS enjoy a position of incumbency, newer data management approaches are benefiting from a vibrancy powered by the effects of Moore’s Law and Big Data. Hadoop and NoSQL offerings were designed for the cloud, but are finding a place in enterprise architecture. In fact, Hadoop has already made a dent in the burgeoning field of analytics, previously the realm of data warehouses and analytical (relational) platforms.
• RDBMS are overwhelmed by new forms of data (so-called “big data”), including text, documents, machine-generated streams, graphs and others, but are counter-attacking with new development and features as well as acquisitions and partnerships
• Non-relational platform vendors assert that the relational model itself is too rigid and expensive for the explosion of information
• A fundamental drawback in RDBMS technology is the tight coupling of the storage, metadata and parser/optimizer layers, which prevents it from taking advantage of the separate storage and compute capabilities of Hadoop
• Advances in technology are not the key differentiators between RDBMS tools and Hadoop/Big Data NoSQL offerings. Requirements are. The continuing enterprise need for quality, integrated information and a “single version of the truth” argues for existing and enhanced relational data warehouses over the “good enough” mentality of cloud-based and Hadoop efforts that were developed for large internet companies. These differing requirements are what distinguish the analytical approaches
• The “new-new” is pretty exciting, but there is a rush to provide true SQL access to many of these platforms, an admission that the relational calculus will endure
• Desirable features of RDBMS will migrate to the distributed processing of Hadoop, but only once Hadoop solves its shortcomings in security, workload management and operability. Born-in-the-cloud SaaS applications built on NoSQL databases (including some yet to emerge) will operate seamlessly on this platform, but not for 3-5 years
• Surveys of “revenue intention” for new technology spending are misleading; only 15% of companies surveyed are using Hadoop, and many of those deployments are experiments.
• Recognize that RDBMS, Hadoop and NoSQL databases have vastly different purposes, capabilities, features and maturity
• When contemplating a move from an Enterprise Data Warehouse and/or on-premise ETL, take the long view of the effort, cost and disruption
• Determine exactly what your RDBMS vendor is planning for supporting “hybrid” environments because, for the time being, it will have an effect on the downstream activities of analytics
• There are many use cases for NoSQL/Big Data that are compelling and you should carefully consider them. In general, they go beyond your existing Data Warehouse/BI but are not necessarily a suitable replacement. In two years this will likely change.
• Go slow and do not throw away the baby with the bath water. The best approach is to experiment with a “skunk works” project or two to get a feel for whether the approach is right for your organization. Beyond that, design a careful Proof of Concept (PoC) that can actually “prove” your “concept.” Vendors tend to insert requirements and features that favor their product, which can undermine the validity of the PoC.
Relational database technology was adopted by the enterprise for its ability to host transactional/operational applications. By the late 1980s, vendors posted benchmarks of transactions/second that exceeded those of the purely proprietary databases, with the added benefit of an abstracted language, SQL, that allowed different flavors of databases to be designed, queried and maintained without the effort of learning a new proprietary language for each one.
Later, as the need grew for more careful data management for reporting and analytics, RDBMS were pressed into service as data warehouses, a role for which they were not well-suited in terms of scale and especially speed of complex queries and large table joins. This need was met in a number of ways, to some degree, but it took time.
This is precisely where we see Hadoop today: a tool that was built primarily to support search and indexing of unruly data on the Internet. However, its advantages in terms of cost and scale are so compelling that it is quickly being pressed into service as an enterprise analytics platform, even though it is sorely lacking in some features that data warehouses and analytical platforms (like Vertica, Netezza, Teradata etc.) already possess.
The trend for distributors of Hadoop is to claim that relational data warehouses are obsolete, or at best artifacts that have some enduring value. Curiously, with all of the attendant deficiencies of RDBMS in their view, they are mostly mute about RDBMS for transactional purposes, but that is likely to change.
Relational vendors are at work to put in place reference architectures (and products to support them) that are hybrid in nature. An emerging term is “polyglot persistence,” the ability of the first mover in an analytical query to parse and distribute pieces of the query to the logical location of the data and, preferably, the compute engine for that data, without having to bulk-load data and persist it to answer a question. The concept is similar to federating queries, but much more powerful, as a federation scheme usually involves designing a reference schema and then assembling and transforming the data into a single place to satisfy the query. In a hybrid architecture, there are actually multiple storage locations (even in-memory) and compute resources working in a cooperative fashion. This arrangement preserves the RDBMS as the origin of analytical queries and provider of the answer set, and it simplifies the maintenance and orchestration of downstream processes, especially analytics, visualization and data discovery.
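The difference between shipping data to the query and shipping the query to the data can be illustrated with a small sketch. It is not any vendor's implementation: the `Store` class, the `run_fragment` method and the `hybrid_average` function are hypothetical names invented for this illustration, and each "store" is simply an in-memory list standing in for a separate storage and compute location.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Store:
    """Hypothetical stand-in for one storage/compute location (RDBMS, Hadoop, in-memory)."""
    name: str
    rows: List[dict]

    def run_fragment(self, predicate: Callable[[dict], bool], column: str) -> dict:
        # The filter and partial aggregate run where the data lives;
        # only a two-number summary leaves the store, not the rows.
        values = [row[column] for row in self.rows if predicate(row)]
        return {"sum": sum(values), "count": len(values)}


def hybrid_average(stores: List[Store],
                   predicate: Callable[[dict], bool],
                   column: str) -> Optional[float]:
    """Coordinator: send the same query fragment to every store, then merge the partials."""
    partials = [store.run_fragment(predicate, column) for store in stores]
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count if count else None


# Illustrative data only.
warehouse = Store("warehouse", [
    {"region": "EU", "amount": 100.0},
    {"region": "US", "amount": 50.0},
])
hadoop = Store("hadoop", [
    {"region": "EU", "amount": 200.0},
    {"region": "EU", "amount": 300.0},
])

eu_average = hybrid_average([warehouse, hadoop],
                            lambda row: row["region"] == "EU",
                            "amount")
```

A federation scheme, by contrast, would first copy the qualifying rows from both stores into one place and compute the average there. In this sketch the coordinator plays the role the paragraph assigns to the RDBMS: it originates the query and assembles the answer set.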
RDBMS were mostly row-oriented, given their OLTP orientation, but some adopted a column orientation, the most visible being Sybase IQ. In the past few years, it became obvious that analytical applications would be better served by a columnar orientation, and products like Vertica emerged that combined it with a highly scalable MPP architecture. But today, there is an explosion of new databases of many types, such as (a sampling, not comprehensive):
• Column: Accumulo, Cassandra, HBase
• Wide Table: MapR-DB, Google BigTable
• Document: MongoDB, Apache CouchDB, Couchbase
• Key Value: Dynamo, FoundationDB, MapR-DB
• Graph: Neo4J, InfiniteGraph and Virtuoso
Keep in mind that none of these database systems are “general purpose”; most require programming interfaces and lack the kind of management and administrative features that IT departments demand.
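The row versus column distinction discussed above can be made concrete with a toy example. This is a simplified sketch under stated assumptions: the table, its field names and the `to_columns` helper are invented for illustration, and real columnar engines add compression, encoding and distributed execution that are not shown here.

```python
from typing import Dict, List

# Row orientation: each record is stored together, which suits OLTP-style
# access such as "fetch or update one order".
rows: List[dict] = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "amount": 200.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Transactional access touches every field of one record.
second_order = rows[1]

# Analytical access touches one field across all records; in a column layout
# the other columns are never read.
total_amount = sum(columns["amount"])
```

An aggregate over `amount` reads a single list in the column layout, while the row layout would have to visit every record and skip the fields it does not need. That is the basic reason analytical workloads tend to favor columnar storage.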
The explosion in database technology was inevitable as the effects of Moore’s Law caused a discontinuous jump in the flow and processing of information. Technology, however, is always a step ahead of business. The implementation of enterprise applications, information management and processing platforms is a carefully woven fabric that does not bear rapid disruption (unless, of course, that is the enterprise’s strategy). “Big data” can provide enormous benefits to organizations, but not to all of them. Many will find it preferable to rely on third parties to prepare and even interpret big data for them. For those that see a clear requirement, it is wise to consider the whole playing field and how the insights gained will find purchase and value. As Peter Drucker said, “Information is data endowed with relevance and purpose.”