Commenting on Software Technology Trends & Market Dynamics

Jnan Dash





IBM’s Big Commitment to Apache Spark

It will offer Apache Spark as a service on Bluemix

Last June IBM made a serious commitment to the future of Apache Spark with a series of initiatives:

  • It will offer Apache Spark as a service on Bluemix. (Bluemix is an implementation of IBM's Open Cloud Architecture based on Cloud Foundry, an open source Platform as a Service (PaaS). Bluemix delivers enterprise-level services that can easily integrate with your cloud applications without you needing to know how to install or configure them.)
  • It committed 3,500 researchers to work on Spark-related projects.
  • It will donate IBM SystemML (its machine learning language and libraries) to the Apache Spark open source project.

The question is: why did IBM make this move?

First, let us look at what Apache Spark is. Developed at UC Berkeley's AMPLab, Spark provides a comprehensive, unified framework for big data processing across data sets that differ both in nature (text data, graph data, etc.) and in source (batch vs. real-time streaming data). Spark enables applications in Hadoop clusters to run up to 100 times faster in memory and 10 times faster even when running on disk. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing. Developers can use these capabilities stand-alone or combine them in a single data pipeline. In other words, Spark is the next generation of Hadoop, which came with a batch pedigree and high latency.
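For readers unfamiliar with the Map and Reduce operations mentioned above, here is a minimal sketch of the pattern in plain Python. It does not use Spark or any Spark API; the function names and sample lines are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: turn one line of text into a per-line word count."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(lines):
    """Apply the map step to every line, then fold the partial counts together."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


# Hypothetical input, standing in for lines read from a large data set.
sample_lines = [
    "spark runs in memory",
    "hadoop runs on disk",
]
counts = word_count(sample_lines)
```

In a framework such as Spark, the same two steps would be distributed over many machines, with intermediate results kept in memory where possible.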

With other solutions for real-time analytics via in-memory processing already available, such as RethinkDB, an ambitious Redis project, or the commercial in-memory SAP HANA, IBM needed a competitive offering. Other vendors betting on Spark range from Amazon to Zoomdata. IBM will run its own analytics software on top of Spark, including SystemML for machine learning, SPSS, and IBM Streams.

At this week's Strata conference, several companies, such as Uber, described how they have deployed Spark end to end for fast real-time analytics.

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.