Simplified Amazon Dynamo-Style Key-Value Storage

Introduction

This is a Dynamo-style key-value store that provides both availability and linearizability at the same time; that is, the system should always complete read and write operations successfully, even under failures, and a read should always return the most recent value. To accomplish this, the system implements partitioning, replication, and failure handling. The replicated key-value store runs on 5 AVD instances (nodes), each of which acts as both a server and a client, and each node's Content Provider implements all of the storage functionality. A quorum approach is used to achieve linearizability. Key-value pairs are stored as files, with the key as the file name and the value as the file contents (a minimal sketch follows this paragraph). The system assumes that there can be at most 1 node failure at any given time, and it is required to support insert, query, delete, querying all keys in the Dynamo ring, and querying the keys stored on a particular AVD. A failure is emulated by force-closing an app instance rather than by killing an entire emulator instance.
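
The file-backed storage can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: the class name, the "key"/"value" column names, and the log tag are all assumptions.

```java
import android.content.ContentProvider;
import android.content.ContentValues;
import android.content.Context;
import android.database.Cursor;
import android.net.Uri;
import android.util.Log;

import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch: an insert() that persists a (key, value) pair as a
// file in the app's internal storage, with the key as the file name and the
// value as the file contents. A query() would read the file back the same
// way via openFileInput().
public class KeyValueProvider extends ContentProvider {

    @Override
    public Uri insert(Uri uri, ContentValues values) {
        String key = values.getAsString("key");
        String value = values.getAsString("value");
        try (FileOutputStream out =
                     getContext().openFileOutput(key, Context.MODE_PRIVATE)) {
            out.write(value.getBytes());
        } catch (IOException e) {
            Log.e("KeyValueProvider", "insert failed for key " + key, e);
        }
        return uri;
    }

    // Remaining ContentProvider overrides stubbed for brevity.
    @Override public boolean onCreate() { return true; }
    @Override public Cursor query(Uri uri, String[] projection, String selection,
                                  String[] selectionArgs, String sortOrder) { return null; }
    @Override public int delete(Uri uri, String selection, String[] selectionArgs) { return 0; }
    @Override public int update(Uri uri, ContentValues values, String selection,
                                String[] selectionArgs) { return 0; }
    @Override public String getType(Uri uri) { return null; }
}
```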

Design of Content Provider

  1. Membership: Every node knows every other node. That is, each node knows all the other nodes in the system and which partition of the Dynamo ring belongs to which node, so any node can forward a request to the correct node directly.
  2. Request Routing: Since membership is static and fully known, under no failure a request for a key is forwarded directly to its coordinator (the successor of the key on the ring), and the coordinator is in charge of serving the read/write operation (see the ring-lookup sketch after this list).
  3. Quorum Replication: Quorums are used to provide linearizability. The replication degree N is 3, meaning that given a key, the key's coordinator as well as the 2 successor nodes in the Dynamo ring store the key. The reader quorum size R and the writer quorum size W are both 2, which establishes R + W > N, so every read quorum overlaps every write quorum in at least one node. When the coordinator receives a get/put request it always contacts the other two nodes (the successors), collecting acknowledgements for a write or values for a read. All objects are versioned on writes in order to distinguish stale copies from the most recent copy; if the reader quorum returns different versions of the same object, the coordinator returns the most recent version (see the quorum sketch after this list).
  4. Failure Handling: A socket timeout is used to detect a node failure (see the timeout sketch after this list). When the coordinator for a request has failed, its successor is contacted instead. When a failed node recovers, it must copy all the writes it missed during the failure. For this purpose it queries its two predecessors and its two successors for their keys, since it is required to store its two predecessors' keys, and its own keys were stored on its two successors while it was down.
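
The membership and routing scheme in items 1 and 2 can be sketched as a sorted ring of SHA-1 node hashes, where the coordinator for a key is the first node clockwise from the key's hash. The names below (DynamoRing, genHash, port strings) are assumptions for this sketch, not necessarily the project's code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Formatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of static membership and routing: every node holds the
// full ring as a sorted map from SHA-1(node id) to the node's port, so the
// coordinator for any key is found locally with one lookup.
public class DynamoRing {

    // hash of node id -> node port; populated once with all 5 known nodes
    private final TreeMap<String, String> ring = new TreeMap<>();

    public void addNode(String nodeId, String port) throws NoSuchAlgorithmException {
        ring.put(genHash(nodeId), port);
    }

    // Coordinator = first node clockwise from the key's hash,
    // wrapping around to the smallest hash if none is larger.
    public String coordinatorFor(String key) throws NoSuchAlgorithmException {
        Map.Entry<String, String> e = ring.ceilingEntry(genHash(key));
        return (e != null) ? e.getValue() : ring.firstEntry().getValue();
    }

    // Preference list = coordinator plus its N - 1 = 2 successors on the ring.
    public List<String> preferenceList(String key) throws NoSuchAlgorithmException {
        List<String> nodes = new ArrayList<>();
        String h = ring.ceilingKey(genHash(key));
        String cur = (h != null) ? h : ring.firstKey();
        for (int i = 0; i < 3; i++) {
            nodes.add(ring.get(cur));
            String next = ring.higherKey(cur);
            cur = (next != null) ? next : ring.firstKey();
        }
        return nodes;
    }

    // SHA-1 of the input rendered as a hex string.
    private static String genHash(String input) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        Formatter f = new Formatter();
        for (byte b : sha1.digest(input.getBytes())) {
            f.format("%02x", b);
        }
        String hash = f.toString();
        f.close();
        return hash;
    }
}
```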
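
The quorum logic in item 3 comes down to picking the freshest of the R = 2 replies. Here is a minimal sketch, assuming each stored value carries a monotonically increasing version number; the exact versioning scheme used by the project is an assumption here.

```java
import java.util.List;

// Hypothetical sketch of quorum version resolution: every stored value
// carries a version that increases with each write, and a read returns the
// value with the highest version among the R = 2 replies. Because
// R + W = 4 > N = 3, at least one node in any read quorum has seen the
// latest completed write, so a stale copy never wins.
public class QuorumResolver {

    public static final class VersionedValue {
        final String value;
        final long version;

        VersionedValue(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // Return the freshest copy among the replies from the read quorum.
    public static VersionedValue resolve(List<VersionedValue> replies) {
        VersionedValue newest = replies.get(0);
        for (VersionedValue v : replies) {
            if (v.version > newest.version) {
                newest = v;
            }
        }
        return newest;
    }
}
```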
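
For item 4, failure detection can be sketched as a bounded TCP connect attempt: if a node does not respond within the timeout, it is treated as failed and the request goes to the next node on the ring. The timeout value below is an assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch of socket-timeout failure detection: if a node does
// not accept a connection within the timeout, treat it as failed and send
// the request to its successor instead.
public class FailureDetector {

    private static final int TIMEOUT_MS = 1500; // assumed timeout value

    // Returns true if the node accepted a TCP connection within the timeout.
    public static boolean isAlive(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), TIMEOUT_MS);
            return true;
        } catch (IOException e) {
            return false; // timed out or refused: assume the node is down
        }
    }
}
```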

Test Phases

  1. Phase 1 - Testing basic operations: This phase tests all the basic operations, i.e., insert, query, delete, @ (query all keys stored in a particular AVD) and * (query all the keys stored in the system). There is no concurrency in the operations and there is no failure either. ... passed
  2. Phase 2 - Testing concurrent operations with different keys: This phase tests whether the system can handle concurrent operations without any failure. The tester uses independent (key, value) pairs inserted/queried concurrently on all the nodes. ... passed
  3. Phase 3 - Testing concurrent operations with the same keys: This phase tests whether the system can handle concurrent operations on the same keys without any failure. The tester uses the same set of (key, value) pairs inserted/queried concurrently on all the nodes. ... passed
  4. Phase 4 - Testing one failure: This phase tests one failure with every operation. One node crashes before the operations start; after all the operations are done, the node recovers. This is repeated for each and every operation. ... passed
  5. Phase 5 - Testing concurrent operations with one failure: This phase executes operations concurrently and crashes one node in the middle of the execution. After some time, the failed node recovers, also in the middle of the execution. ... passed
  6. Phase 6 - Testing concurrent operations with one consistent failure: This phase crashes one node at a time consistently, i.e., one node crashes and then recovers, then another node crashes and recovers, and so on. There is a brief period of time between each crash-recover sequence. ... passed

Assumptions

  1. The system does not implement virtual nodes, i.e., the partitions in the Dynamo ring are static and fixed.
  2. The system does not implement hinted handoff, so it is acceptable to replicate a key on only two nodes while one node is down.