Notes from the Linked-In: Lessons learned and growth and scalability session at QCon with Jean-Luc Vaillant.
Their architecture includes:
- Java (trying out some Ruby, adding some C++, as little as possible)
- Oracle 10g and MySQL
- ActiveMQ (tried OracleMQ, doesn't recommend it)
- Tomcat & Jetty
That raises the question of how to keep the RAM database in sync at all times. One option is to update the database and inform the other engines of changes through direct RPC, reliable multicast, or JMS. This has the typical problems of two-phase commit.
An alternative approach that LinkedIn has used is to log changes in a transaction log, which each graph engine can pull into RAM as necessary. The approach is currently Oracle-specific, but it is applicable to just about any database.
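To make the pull model concrete, here is a minimal Java sketch. It is not LinkedIn's implementation: the names (LogSyncSketch, ChangeLog, GraphEngine, catchUp) are hypothetical, and an in-memory list stands in for the database-side transaction log. The idea it illustrates is that each engine remembers the last sequence number it applied and asks only for newer entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LogSyncSketch {

    /** One logged change: a connection added or removed between two members. */
    static final class Change {
        final long seq;
        final boolean added;
        final int a;
        final int b;

        Change(long seq, boolean added, int a, int b) {
            this.seq = seq;
            this.added = added;
            this.a = a;
            this.b = b;
        }
    }

    /** Stand-in for the database transaction log (assumption: an in-memory list). */
    static final class ChangeLog {
        private final List<Change> entries = new ArrayList<>();

        synchronized void append(boolean added, int a, int b) {
            // Sequence numbers are 1-based and dense in this sketch.
            entries.add(new Change(entries.size() + 1, added, a, b));
        }

        /** Returns every change with a sequence number greater than lastSeq. */
        synchronized List<Change> since(long lastSeq) {
            return new ArrayList<>(entries.subList((int) lastSeq, entries.size()));
        }
    }

    /** One in-RAM graph engine that catches up by pulling from the log. */
    static final class GraphEngine {
        private final Map<Integer, Set<Integer>> adjacency = new HashMap<>();
        private long lastApplied = 0;

        /** Pulls and applies all unseen changes; returns how many were applied. */
        int catchUp(ChangeLog log) {
            List<Change> pending = log.since(lastApplied);
            for (Change c : pending) {
                if (c.added) {
                    adjacency.computeIfAbsent(c.a, k -> new HashSet<>()).add(c.b);
                    adjacency.computeIfAbsent(c.b, k -> new HashSet<>()).add(c.a);
                } else {
                    removeEdge(c.a, c.b);
                    removeEdge(c.b, c.a);
                }
                lastApplied = c.seq;
            }
            return pending.size();
        }

        private void removeEdge(int from, int to) {
            Set<Integer> neighbors = adjacency.get(from);
            if (neighbors != null) {
                neighbors.remove(to);
            }
        }

        boolean connected(int a, int b) {
            Set<Integer> neighbors = adjacency.get(a);
            return neighbors != null && neighbors.contains(b);
        }
    }
}
```

Because each engine tracks its own position in the log, an engine that falls behind or restarts simply pulls more entries the next time; no coordination between engines is needed.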
Once that's in place, the in-memory techniques for traversing the graph are far less painful: breadth-first traversal to get connections of various degrees, and using symmetry to find connections by searching from both sides.
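Both traversal ideas can be sketched in a few lines of Java. This is an illustration under assumptions, not the code from the talk: the class and method names (GraphTraversalSketch, degreesFrom, separation) are hypothetical, and the graph is assumed to be an undirected adjacency map from member id to neighbor ids. degreesFrom is a depth-limited breadth-first traversal; separation uses symmetry by expanding roughly half the distance from each side and looking for a member reached by both.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class GraphTraversalSketch {

    /** Breadth-first traversal: maps each member within maxDegree of start to its degree. */
    static Map<Integer, Integer> degreesFrom(Map<Integer, List<Integer>> graph,
                                             int start, int maxDegree) {
        Map<Integer, Integer> degree = new HashMap<>();
        Queue<Integer> queue = new ArrayDeque<>();
        degree.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int member = queue.remove();
            int d = degree.get(member);
            if (d == maxDegree) {
                continue; // do not expand beyond the requested degree
            }
            List<Integer> neighbors =
                    graph.getOrDefault(member, Collections.<Integer>emptyList());
            for (int neighbor : neighbors) {
                if (!degree.containsKey(neighbor)) {
                    degree.put(neighbor, d + 1);
                    queue.add(neighbor);
                }
            }
        }
        return degree;
    }

    /**
     * Degrees of separation between a and b, or -1 if more than maxDegree.
     * Relies on the graph being symmetric: searches about half the distance
     * from each side and combines the two partial results.
     */
    static int separation(Map<Integer, List<Integer>> graph, int a, int b, int maxDegree) {
        Map<Integer, Integer> fromA = degreesFrom(graph, a, (maxDegree + 1) / 2);
        Map<Integer, Integer> fromB = degreesFrom(graph, b, maxDegree / 2);
        int best = -1;
        for (Map.Entry<Integer, Integer> e : fromA.entrySet()) {
            Integer other = fromB.get(e.getKey());
            if (other != null) {
                int total = e.getValue() + other;
                if (best < 0 || total < best) {
                    best = total;
                }
            }
        }
        return best;
    }
}
```

Searching from both sides matters because the number of members reached grows quickly with each degree; two shallow searches usually touch far fewer members than one deep search.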
Having run into issues with read-write locks, he prefers copy-on-write.
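As a rough illustration of copy-on-write (a sketch, not the approach shown in the session), the hypothetical Java class below keeps the graph in an immutable snapshot behind a volatile reference. Readers never take a lock; a writer copies the map, changes the copy, and publishes it in a single reference assignment. Copying the whole map on every write is a simplification to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CopyOnWriteGraphSketch {

    // Readers always see a complete, immutable snapshot through this reference.
    private volatile Map<Integer, Set<Integer>> snapshot = Collections.emptyMap();

    /** Lock-free read: returns the member's connections in the current snapshot. */
    Set<Integer> connectionsOf(int member) {
        Set<Integer> connections = snapshot.get(member);
        return connections == null ? Collections.<Integer>emptySet() : connections;
    }

    /** Writers are serialized; each write builds a new snapshot and swaps it in. */
    synchronized void connect(int a, int b) {
        Map<Integer, Set<Integer>> next = new HashMap<>(snapshot);
        next.put(a, with(next.get(a), b));
        next.put(b, with(next.get(b), a));
        snapshot = Collections.unmodifiableMap(next);
    }

    /** Returns a new immutable set containing the old members plus one more. */
    private static Set<Integer> with(Set<Integer> old, int member) {
        Set<Integer> copy = new HashSet<>();
        if (old != null) {
            copy.addAll(old);
        }
        copy.add(member);
        return Collections.unmodifiableSet(copy);
    }
}
```

A reader that obtained a set before a write keeps seeing the old, consistent data, which is the trade-off: reads are cheap and never block, while writes pay for the copy. For plain lists and sets, the JDK's java.util.concurrent package already offers CopyOnWriteArrayList and CopyOnWriteArraySet.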