Every decade has a key technology, and a wave of new companies springs up
around it. For the 90s it was write-once-run-anywhere Java; for the 2000s the
whole world was caught up in the web. In the 2010s, the software sector is
buzzing with the term Big Data, alongside a significant increase in consumer
electronics such as tablets and phones. The software framework most sought
after for making sense of Big Data is Hadoop. There are three main next-gen
companies which want customers to ask big questions and venture into new
frontiers. In this blog I want to write about Hadoop and its allies.
Hadoop was created in 2005 by Doug Cutting (formerly at Yahoo, currently Chief
Architect at Cloudera and a director at the Apache Software Foundation) and
Mike Cafarella (currently a professor of computer science at the University of
Michigan).
Hadoop was quickly recognized as the go-to solution for distributed computing
on commodity hardware. The framework promised painless deployment and easy
maintenance, and it arrived at the right time to handle the big data explosion
(being a bit dramatic). Since its release under the Apache Software Foundation,
the project has amassed a lot of committers, and an ecosystem soon grew up
around Hadoop. A variety of companies started providing Hadoop consulting
services; the most popular among them are Cloudera, Hortonworks and MapR.
Cloudera
Three top engineers from Google, Yahoo and Facebook (Christophe Bisciglia, Amr
Awadallah and Jeff Hammerbacher) foresaw the need for analyzing, managing and
processing big data, so they teamed up with an Oracle employee and formed
Cloudera in early 2008.
Open Source Contribution
The employees of this company actively contribute to many open source projects.
Some popular ones are:
Apache Avro, Apache Bigtop, Apache Crunch, Apache Flume, Apache Hadoop, Apache
Hive, Apache Lucene, Apache MRUnit, Apache Oozie, Apache Sqoop, Apache Whirr,
Apache ZooKeeper, Cloudera Development Kit, Crepo, Hue, Impala, Kitten, ML,
Seism, and CDH (a 100% open source Hadoop distribution).
Other offerings from this company include Cloudera Manager and Cloudera
Navigator, centralized system management tools with a rich set of features,
including:
Manager
Deployment & Configuration, Service
Management, Service & Host Monitoring, Diagnostics API, Rolling
Updates/Restarts, SNMP Support, LDAP Integration, Configuration History &
Rollbacks, Operational Reports, Automated Disaster Recovery, BDR Add-on
Navigator
Data Audit – HDFS, HBase & Hive,
Navigator Add-on, Access Management, Technical Support and Indemnity, Core
Projects, Apache HBase, RTD Add-on, Cloudera Impala, RTQ Add-on, Cloudera
Manager, Cloudera Navigator, Navigator Add-on
Partners & Valuation
Over 160 active partners.
The current valuation is $700 million, or at least that is the word on the
street, along with an additional $65 million raised through a Series E round of
venture capital funding from Accel Partners.
Developers Community
Cloudera offers 5-day training courses at a price of around $2500. The training
covers Hadoop development, administration and data science. They also provide
certification.
Hortonworks
This company was incubated by Yahoo. Its employees are regular committers to
various Apache Hadoop ecosystem projects. The company primarily provides
consulting services and offers an end-to-end solution built on the Hadoop
framework. They are also bringing out various data products based on the newer
version of Hadoop.
Open Source Contribution
YARN, Tez, Stinger, Ambari, HDP platform
Partners & Valuation
Microsoft, Teradata, Talend and Rackspace are official partners. The company is
currently valued at $200 million.
Developers Community
Hortonworks offers training similar to Cloudera’s: 5 days at a price of around
$2500. The training covers Hadoop development, administration and data science,
and they provide certification as well. The advantage with Hortonworks is their
partnership with Microsoft: they provide training for Hadoop on the Windows
platform.
MapR
In my personal opinion, their business model is not viable: although they are
backed by EMC and Cisco, their online presence is overshadowed by Cloudera and
Hortonworks. This company provides a commercial Hadoop solution through three
products: M3, M5 and M7. The employees of this firm actively contribute to
HBase, Pig, Apache Hive and ZooKeeper.
Partners & Valuation
Google Compute Engine, EMC and Cisco are partners. The company is valued at $30
million.
What’s in it for us?
Well, Hadoop is free: anyone can download it and use it. The trick is to know
what to use it for and how to manage it. There is a need for a distributed
computing platform, and Hadoop itself is still in its nascent stages. It’s
definitely not a threat to existing database platforms; however, investing time
and money in Hadoop would be a safe bet for the future.