Commenting on Software Technology Trends & Market Dynamics

Jnan Dash

Data Management, Circa 2011

These are exciting times for Data Management research and development

The world of Data Management has never been as vibrant as it is now. Only five years ago, if you were to start a new database product company, the VCs would have thought you crazy. Why start something in an established market with three leaders: Oracle, IBM (DB2), and Microsoft (SQL Server)? Then we started to notice "specialized" appliance products such as Netezza (now IBM) and Greenplum (now EMC) crop up to focus on large-scale data analytics. This trend was soon followed by Oracle (Exadata) and now HP (Vertica).

But what I am talking about is a list of new companies, backed by well-known VCs, addressing the Data Management problems of the Internet era. We can roughly divide the data world into two camps: operational data management and analytic data management.

Within the operational data camp, there are three categories:

  1. Traditional RDBMS (read Oracle, DB2, SQL Server, Sybase, Ingres, MySQL, etc.) and NewSQL products addressing mostly MySQL scalability and performance issues (e.g. Clustrix, Drizzle, VoltDB, NimbleDB, MySQL Cluster, etc.). I advise two companies in this category, ScaleDB and ScalArc.
  2. Traditional non-relational DBMS (Objectivity, Progress, Versant, etc.) and NoSQL, which has seen a lot of new activity. The NoSQL data management products deal with key-value stores, big tables, document data, or graph data. Examples of products include Couchbase, MongoDB, Riak, Voldemort, Berkeley DB, Hypertable, HBase, Cassandra, GraphDB, etc. They address very large numbers of simple structures and use parallel computing for performance. Google invented the MapReduce algorithm, which has an open source implementation in Hadoop, with HDFS as its file system.
  3. Distributed Data Grid and Cache technologies. Here Memcached came as an open source caching framework for MySQL and PHP applications. Other solutions include Terracotta, GigaSpaces, Oracle Coherence, etc. SAP is also pursuing an in-memory solution called HANA.
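To make the MapReduce idea mentioned above a little more concrete, here is a minimal single-process sketch in Python. It is only an illustration of the map, shuffle, and reduce steps using a word count; it is not Hadoop's API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big table"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be spread over many nodes, which is the property that makes the approach attractive for very large data sets.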

The Analytic Data Management space has two categories:

  1. Non-relational (like Hadoop, MapR, Piccolo, Dryad, etc.)
  2. Relational products like Infobright, Netezza, ParAccel, SAP Sybase IQ, Teradata, EMC Greenplum, HP Vertica, IBM InfoSphere, etc. The phrase Big Data is applied here, typically to data volumes exceeding a petabyte. Social networking sites like Facebook and Twitter are dealing with this.

I have seen the acronym SPRAIN (Scalability, Performance, Relaxed Consistency, Agility, Intricacy, and Necessity) used to explain why the incumbents are inadequate to address the new challenges of unstructured data as well as Big Data.

These are exciting times for Data Management research and development.

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at