Ultra-large-scale Sites

This became a research area almost by accident. I had to write an article on distributed systems in the context of computer science and media which - due to work on much research and work on secure systems - turned out to be too late for publishing. During a sabbatical term I started to extend the article. At that time mayspace, facebook, flickr and others were growing like crazy and it was clear that distributed systems played a central role in this: the content side of media production turned out to become distributed as well through Web2.0. And the single user in front of a PC became a distributed node within her groups, linked through many differnt relations and communication media.

I developed a big interest in researching and describing the interconnections between socially and technically distributed systems, between groups and their social sites which developed their own interconnnections.

The presidential election in the states showed at the same time the growing importance of social sites and their infrastructure which had to scale immensely during thw whole phase. And it was then that I discovered Todd Hoff's excellent portal on scalability issues: HighScalability.com. His continuously growing collection of articles on the technical infrastructure of large-scale sites like facebook or flickr provided me with the necessary input for my work. And I noticed that sometimes it was rather hard to understand what the chief architects of famous sites really meant when talking about problems and solutions.

And that was the birth of my current research: I am trying to find the core architectures, methods, components and algorithms that make systems scalable. Research questions are e.g. whether one can build extensible and scalable kernel and grow them incrementally into large scale sites. Or whether "revolutionary" phases are needed which will require complete re-writes.

But there is an educational and very practical goal besides my research interests: To enable beginners and professionel developers to build ever larger but still maintainable systems, using the experience from ultra-large-scale sites. In my course on ULS Systems at digicom I wrote that even if you do not need to build a system on such an extreme scale, you will benefit greatly from knowing those architectures. Only they make scalability problems really visible.

Currently me and my students of the system engineering and middleware courses are experimenting with large-scale infrastructures, measuring and optimizing etc. And at the same time we continue to look at core algorithms and problems like time in virtual systems, failure detection and basic architectural components. In other words the second half of the book is currently developing and sections on MMOGs, Clouds etc. are being added

The resulting book is under Creative Commons license and will be Open Accessed. After taking so much from Todd Hoff's site and the input and papers from my students I considered it only decent to give the work back to the public.

You will find here a growing set of materials, research papers etc. on the topic of ultra-large-scale sites and their use in social media.

Papers

Caching: A very good overview on different aspects of caching and caching tools can be found in the Caching Paper by Stadtrecher/Holder.Watch out for "dogpile" effects and thundering herds which recently brought down facebook.
Andreas Stiegler, Shattering and Feature Management in MMOGs extra slides and his presentation at the 6th gamesday: An excellent explanation of partitioning (logical and physical) in MMOGs as a necessary means to ensure performance. Feature Management along several dimension shown as being critical for the survival of game sites. Content mapping to server infrastructures is shown as a general principle and not - as in many business sites - an accident.
Isolde Scheurer, Eve-Online - A Shardless MMOG: After putting so much emphasis on partitioning it is almost a surprise to see a large MMOG without shards. Eve is not as large as WoW but has a growing user base. The paper explains the technical infrastructure and algorithms needed to allow a shardless, continuous world. The author gives various interesting economic and social reasons for shardless or sharded design. The paper is also a living proof the we discovered some core algorithms and techniques for scalability correctly in our course: asynchronous, stackless Python allowed a huge reduction in proxy servers for the Eve-online infrastructure.
Marc Seeger, Key-Value Stores - a practical overview: The rationale for regular RDBMS is under scrutiny in certain circles: Do we really need such a huge database engine for regular web sites and their data models? What can we save by using alternatives like key-value stores or column stores? And how will the APIs look to access those stores e.g. from Ruby? The author gives a very good comparison of various products and technologies (CouchDB, Tokyo Cabinet, Cassandra etc. in the NOSQL area.
Christian Schiepe, Monitoring and Tracking in ultra-large-scale sites: An overview of various tools and their technology and why this is a core topic for large scale sites. Ganglia, Munin etc. are explained and extension mechanisms shown. A good start for those interested in monitoring.
Stefan Zülch, Cloud-Computing Offerings (in German): On overview of various offerings in the cloud computing area. (currently reading it)
Clemens Kern, Quake-Live: New ideas in this free-to-play ego shooter and what makes it different. Interesting features like "skill-matching" are explained.
Patrick Bader On Language Support for Application Scalability: Large sites use many different languages and it almost looks like language does not matter at the first glance. This is wrong of course and a closer look at the use of languages in large sites reveals a multi-paradigm approach: take the best language for a specific job. The paper by Patrick Bader explains exactly what kinds of language paradigms are important, e.g. for multi-core architectures. Parallel programming support, fault-tolerance and hot update options are discussed as well as the strategy to deal with errors using Erlang as an example (supervision trees..).

Practical Scalability Engineering

The goal here is to set-up an environment (e.g. a mediawiki installation) and then grow it in a controlled way by doing load and performance tests, changing and extending architecture, components and algorithms, add monitoring and alarming from the beginning and os on. Use e.g. map/reduce to "outsource" critical processing, re-design data model and data base for performance and scalability, add caches, change to asynchronous processing where possible etc..

Planned Sessions:

Example of multi-tier architecture (wikimedia), scalability and component model, profiling options

Load Test Design, performance measurements and interception points, profiling technology

Database design and scalable data model

Monitoring approach and tools

Alarming and Failure Detection

Capacity Planning

Caching Architecture and measurements

Reporting technology

Optimizations for scalability after tests

Special component: map/reduce with Hadoop

Cloud computing and scalability (Visiting RUS?)

Special component: Scalaris as Wikimedia simulation

Special component: Erlang light http server comparison to apache

Middleware and Algorithms

Here we want to get some theoretical background on system engineering and management tasks.

Simulation with GridSim and Palladio

Pprinciples of failure detectors and partial synchronous systems

Failure models: what can go wrong and what are the effects on our architecture?

Chubby/Paxos Example to discuss the way from algorithm to implementation

Time in virtual machines: measuring cloud performance

A popular site goes social: how to scale a social network site, social graphs etc.

Group communication and group membership with virtual synchrony

Reasons for blocking in 2pc/3pc and how to make progress

Deanonymization in social networks

Scalability algorithms in MMOGs

Eventual consistence and its semantics: dynamo vs. Yahoo's new replicated store

Cloud API and restrictions

Resources

Most resources will simply show up in the draft to avoid duplicated efforts. Most important for participants in my course is probably our internal redmine for systems engineering and middleware

Facebook Scalability as a function of memory access latency, more: A few comments on Facebook scalability benchmarking.
Design beyond human abilities, more..: An inspiring talk by Richard Gabriel on ultra-large scale, self-sustaining systems and a few thoughts on computing beyond human and turing machines (aka: Digital Evolution and Hypercomputation). Crazy but interesting stuff.