PennCloud
Course project for CIS 505 Software Systems @Penn
This is a class project for CIS 505 Software Systems (Fall 2020) at University of Pennsylvania.
PennCloud is a distributed cloud platform, with webmail service similar to Gmail, and a storage service similar to Google Drive. This project is built in C++, with Protobuf used for data serialization, gRPC used for communication between components and React for frontend applications. See each component and its features below:
Frontend
- Clients are directed to one of multiple frontend servers through a load balancer. Each frontend server periodically send a ping message to the frontend load balancer which keeps track of all frontend servers and their status.
- The frontend servers provide interfaces for webmail service and storage service for each user account.
- A special admin console page shows status of all frontend, backend servers and a way to view all key-value pairs stored in the BigTable.
Backend
Each frontend server also serves as a backend client with an interface to access storage operations in the backend servers. To make our application more robust, we built support for data replication, load balancing, checkpointing and recovery.
- Health check: A single backend master node keeps track of all backend server status (using a pinging service similar to frontend load balancer keeping track of frontend servers).
- Load Balancing: Each new backend client (frontend server) is assigned a backend server with the least load (fewest data entries). The backend client caches the address of the backend server for further requests.
- Quorum Replication: We implement quorum replication for stronger fault tolerance. To minimizing losing data in case a server dies, we write data to all healthy servers; and for faster read, we contact half of the healthy servers to get the data.
- Local Checkpointing & Recovery: The backend server node has its own BigTable storage, all changes are in memory and stored in a log file for each operation. It periodically dumps key-value pairs in memory to the local storage file (checkpointing). When a node fails, the backend master sends the failed node’s address to a healthy node who sends its recovery file.