From the day Mark Zuckerberg started building Facebook in his Harvard dorm room in 2004, the site has been built on common open source software such as Linux, Apache, memcached, MySQL, and PHP. In that time, we've open sourced more than 20 different technologies, and scaled Facebook to reach over 350 million people around the world. Today we are pleased to announce that we are becoming a Gold sponsor of the Apache Software Foundation (ASF), which has been instrumental in fostering open source adoption and providing structure to build successful open source communities.
The ASF has over 100 different projects, all of which help the Web grow as it continues to evolve. As Jim Jagielski said, "sponsoring the ASF helps us grow existing projects, incubate new initiatives, promote community development, host user events, expand our outreach, and provide the infrastructure that keeps the Foundation running on a day-to-day basis." Beyond funding the ASF to help the organization grow, we want to keep focusing on building, releasing, and fostering great open source software that tackles hard scaling problems.
If you read our engineering blog, you'll know that you can't scale a site like Facebook simply by sharding your databases; it takes a combination of specialized technologies. Open source allows us not just to make technologies like memcached scale beyond their original intent, but also to release technologies like Thrift for others to build upon.
Over the past two years we've contributed the following open source projects to the Apache Software Foundation. While there's still work to do, our goal is to build robust communities of both developers and users around each.
- Thrift is a framework for scalable cross-language services development (it lets our PHP website code talk to our backend services in C++, Erlang, and Java). We released it in 2007, and it was brought into the Apache Incubator in 2008. Today about a half-dozen developers maintain the project, and there is language support for services written in C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk. Thrift is used by over a dozen different projects and companies, including both Cassandra and Hive. The C#, Cocoa, Java, and Ruby mappings are now maintained by non-Facebook developers. About a month ago the Thrift team made their first release, Thrift 0.2.0.
- Hive is a data warehouse infrastructure built on top of Apache Hadoop. It provides tools to easily query and analyze large data sets stored within Hadoop. Hive defines a simple SQL-like query language that enables people familiar with SQL to get started quickly. At the same time, this language allows programmers who are familiar with the MapReduce framework to perform more sophisticated analyses that may not be supported by the built-in capabilities of the language. Most of the data analysis at Facebook, both ad hoc and periodic jobs, is done using Hive. We store about eight petabytes of data within Hive. More than 200 people within the company use it every month, regularly running over 8,000 jobs per day. Beyond Facebook, Hive is used by the likes of CBS, Digg, hi5, and last.fm.
- Cassandra is a scalable, distributed, structured key-value store that we released in late 2008 and that was brought fairly quickly into the Apache Incubator. Today we use it for Inbox search, but the majority of development is now being led by Digg, Rackspace, and Twitter. We have also open sourced all of the same code that we use to run it in production. While we haven't been developing it actively, we're excited to see the community take on a life of its own and continue improving Cassandra and increasing its adoption.
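
To make the Hive description above a little more concrete: a SQL-like aggregate such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` corresponds, conceptually, to a map step that emits a key per row and a reduce step that combines the values for each key. The sketch below shows that idea in plain Python. It is only an illustration and not how Hive or Hadoop is actually implemented; the `page_views` table, the sample rows, and every function name here are hypothetical.

```python
from collections import defaultdict

# Hypothetical page-view rows as (user, page) pairs. Made-up sample data.
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/photos"),
    ("carol", "/home"),
    ("bob", "/photos"),
    ("alice", "/inbox"),
]


def map_phase(rows):
    """Emit one (page, 1) pair per input row, as a mapper would."""
    for _user, page in rows:
        yield page, 1


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def count_views_by_page(rows):
    """Rough equivalent of: SELECT page, COUNT(*) ... GROUP BY page."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for page, views in sorted(count_views_by_page(ROWS).items()):
        print(page, views)
```

The appeal of a SQL-like layer is that an analyst writes the one-line query while the map and reduce plumbing stays out of sight; a programmer who needs something the query language does not offer can still drop down to custom map and reduce logic.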
David, senior open programs manager, is looking for people who like solving big problems and love working on open source. We're always looking for amazing engineers!