Data Science in the Enterprise by Amr Awadallah, CTO, Cloudera

May 11th, 2017 by Travis Miller

Hadoop emerged in the mid-2000s as open-source software designed to help companies capitalize on ever-increasing streams of big data. It wasn’t long before several leaders in the tech industry moved in, forming companies that built proprietary products on top of Hadoop. One of those companies, founded in 2008, was Palo Alto-based Cloudera Inc.

Nine years after its founding, Cloudera launched its initial public offering on April 27, 2017. Shortly thereafter, on May 11, the Hive Think Tank hosted Amr Awadallah, Cloudera co-founder and CTO, to share his perspective on data science in the enterprise.

During his presentation, Awadallah placed the need for continued advances in data science in a broader context. He described what he termed “the six waves of automation” that have occurred throughout human history, from prehistoric times to the present day, and noted the positive outcomes of each wave. Awadallah claimed that the wave we are witnessing today, the “automation of decision-making,” is no different: it too is having a strong positive impact in many fields, including healthcare, insurance, and law. He therefore argued that the continued growth of this wave should be welcomed.

From this optimistic vantage point, Awadallah contended that data science environments need marked improvement. He argued that data scientists often face several limitations that cripple their ability to explore, discover, and quantify new opportunities: lack of access to secured clusters, an inability to scale due to limited data storage, and an inflexible, unintuitive user experience.

As a proposed answer to these issues, Awadallah introduced Cloudera’s latest product, the Cloudera Data Science Workbench. He highlighted several of its features: easy, immediate collaboration between multiple users; support for Python, R, and Scala alike; real-time visualization of results; consumer ownership of IP; and more. To demonstrate the Workbench’s breadth of functionality and intuitive user experience, Awadallah devoted the latter half of his presentation to a live demo of the tool, along with an interactive question-and-answer session.

Cloudera’s initial public offering was an encouraging and exciting moment for leaders and innovators in the data and analytics space: it shows just how far big data companies and solutions can go. Moreover, as tools such as the Data Science Workbench continue to emerge for data science and data engineering, these forward strides should encourage and excite business people and consumers around the world.
