Last week, data upstart Hortonworks (HDP) went public and the stock rose 40% on its first day. Investors are in love with Hortonworks because it's the first company in the Hadoop space to go public, and the first public company built on commercializing open-source software since Red Hat (RHT).
Hortonworks' business is built on its own distribution of Apache Hadoop. The company has customized the open-source version of Hadoop and sells add-on software and consulting services around it.
Many people think of Hadoop as a database, but it really isn't. Hadoop is more of an ecosystem that allows massively parallel computing and enables a new type of database used to store "unstructured" data. These databases are often called "NoSQL" databases. Traditional "SQL" databases, which have been around since the 1970s, store structured data neatly in rows and columns, like a spreadsheet. To retrieve the data, the programmer simply queries the right row and column. SQL databases work great on "structured" data such as names, addresses and zip codes, which can be easily inserted into rows and columns.
Several years ago, Yahoo! (YHOO) engineers needed a way to store and analyze "unstructured" data. Unstructured data are messy and don't necessarily fit into rows and columns. The Yahoo! engineers developed a whole ecosystem that could handle this messy data and released it as freely available open-source software.
Think about trying to store tweets. Tweets can contain pictures, comments, links, re-tweets and replies. It could take a lot of work (and programming) to get that kind of data into a traditional SQL database. Instead, you could scrape millions of tweets off the Internet, dump them into a NoSQL database running in a Hadoop environment and let the database figure out the best way to store the data.
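To make the tweet example concrete, here is a minimal Python sketch of the difference between forcing tweets into fixed columns and storing each one whole as a document. The tweets, field names and the two-column schema are all invented for illustration; they are not Twitter's actual format or any particular database's API.

```python
import json

# Hypothetical tweets; the field names are illustrative only.
tweets = [
    {"user": "fan1", "text": "What a catch!", "links": ["http://example.com/a"]},
    {"user": "fan2", "text": "RT: What a catch!", "retweet_of": "fan1"},
    {"user": "fan3", "text": "Halftime pic", "photo": "halftime.jpg",
     "replies": [{"user": "fan1", "text": "Nice"}]},
]

# A SQL-style table needs its columns decided in advance (assumed schema).
COLUMNS = ("user", "text")


def to_row(tweet):
    """Flatten a tweet into the fixed columns; every other field is lost."""
    return tuple(tweet.get(col) for col in COLUMNS)


def to_document(tweet):
    """Keep the whole tweet as a self-describing JSON document."""
    return json.dumps(tweet, sort_keys=True)


rows = [to_row(t) for t in tweets]
documents = [to_document(t) for t in tweets]

# Fields that did not fit the fixed schema, per tweet.
dropped = [sorted(set(t) - set(COLUMNS)) for t in tweets]

for row, lost in zip(rows, dropped):
    print(row, "dropped:", lost)
```

Every tweet here loses something in the row version, while the document version keeps pictures, links and replies without any schema changes. That is the extra "work (and programming)" a fixed-column design would have to absorb.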
Hadoop's MapReduce function splits a job across many servers. Each server processes its own slice of the data in parallel (the "map" step), and the partial results are then combined into a final answer (the "reduce" step). Because of MapReduce, Hadoop can analyze massive data sets far faster than a single machine could. If you wanted to run sentiment analysis on millions of tweets sent during the Super Bowl, you would use Hadoop. Besides being freely available, Hadoop can run on inexpensive generic servers and can scale to truly gigantic proportions. Yahoo! runs Hadoop on tens of thousands of servers, and Google (GOOG), whose published MapReduce design inspired Hadoop, runs its own version of the technology across its data centers.
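The MapReduce idea can be sketched in a few lines of plain Python. This is a toy illustration, not Hadoop's actual API: the word lists, the sample tweets and the function names are all assumptions, and the "servers" are simply lists processed one after another rather than in parallel.

```python
from collections import Counter
from functools import reduce

# Toy word lists (assumed); a real sentiment model would be far richer.
POSITIVE = {"great", "love", "amazing"}
NEGATIVE = {"terrible", "hate", "boring"}


def map_chunk(tweets):
    """Map step: one worker tallies sentiment words in its own chunk."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            word = word.strip(".,!?")
            if word in POSITIVE:
                counts["positive"] += 1
            elif word in NEGATIVE:
                counts["negative"] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right


# Each inner list stands in for the tweets held on one server.
chunks = [
    ["Love this game!", "Boring first half."],
    ["Amazing catch, great throw!", "I hate that call."],
]

partials = [map_chunk(chunk) for chunk in chunks]  # parallel on a real cluster
totals = reduce(reduce_counts, partials, Counter())

print(dict(totals))
```

Because each chunk is scored independently, adding more servers lets the map step handle more tweets in the same amount of time; only the small partial tallies need to travel over the network to be merged.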
For a true orgy of data, enterprises are running Tableau Software (DATA) on top of their Hadoop environment and allowing their users to discover hidden trends with just a few clicks of the mouse.
As a leading Hadoop provider, Hortonworks has grown explosively. Its customer count more than quadrupled, from just 54 on Sept. 30, 2013 to 233 a year later.
For the nine months ending September 2014, Hortonworks had $33.3 million in revenue, up from just $1.6 million in all of 2012. The company, founded in 2011, has lost $86 million in its short life, but I think it will continue to see explosive growth over the next three years as corporate America wakes up to the possibilities of the Hadoop environment.
Obviously, this is a small company with a limited operating history, and that makes the stock very speculative. Following the IPO, Yahoo! owns about 17% of the company. Tech investors who like to take speculative flyers on big data names will find that Hortonworks fits the bill.