Cloud: Scalability and Availability
Agenda: Scalability; Availability; the CAP Theorem; the Scalability / Availability / Consistency trade-off; BASE Transactions
Scale-out is the route to scalability.
Availability is achieved through replicas.
Replicas lead to eventual consistency; for transactions, BASE instead of ACID.
Scale-up vs. Scale-out
[Diagram: Scale-up replaces a $10,000 machine with a bigger $10,000 machine; scale-out grows from a few $1,000 machines to many $500 machines, with "# Machines" as the axis.]
Scale-up
Scale-out
Google PC? 33PC
Scalability through scale-out
[Diagram: Google File System architecture. The application's GFS client sends (file name, chunk index) to the GFS Master, which looks up its file namespace (/foo/bar maps to chunk 2ef0) and returns (chunk handle, chunk locations); the client then sends (chunk handle, byte range) to a GFS chunkserver and receives chunk data. Chunkservers sit on the Linux file system and report chunkserver state to the master, which sends them instructions. Scalable.]
[Diagram: Google MapReduce. A Master assigns Splits 0-4 of the input file to map-phase Workers, which write partitioned intermediate files (Partition 0, Partition 1) to local disk; reduce-phase Workers fetch the partitions and produce Output file 0 and Output file 1. Scalable.]
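The map/partition/reduce flow in the diagram can be sketched in a few lines. This is a word-count toy with invented names (`map_phase`, `mapreduce`), and an in-memory dict stands in for the intermediate files on local disk:

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in split.split():
        yield (word, 1)

def partition(key, n_reducers):
    # Hash-partition intermediate keys across the reduce workers.
    return hash(key) % n_reducers

def mapreduce(splits, n_reducers=2):
    # Intermediate "files": one partition per reducer, as in the diagram.
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for split in splits:
        for k, v in map_phase(split):
            partitions[partition(k, n_reducers)][k].append(v)
    # Reduce: sum the value list for each key.
    out = {}
    for part in partitions:
        for k, vs in part.items():
            out[k] = sum(vs)
    return out
```

The result is the same however the keys are partitioned, which is what lets the reduce phase run on independent workers.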
[Diagram: Google Bigtable. A Bigtable client uses the client library to open tables via the Bigtable Master, then talks directly to Bigtable Tablet Servers; the whole system is built on the Cluster Scheduling System, the Google File System, and the Chubby lock service. Scalable.]
Amazon Dynamo: consistent hashing. A key K in the range (A, B] is assigned to node B and replicated at nodes C and D. Scalable.
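Dynamo's placement rule can be sketched as a minimal consistent-hash ring. The class and method names here are mine, and virtual nodes (which Dynamo uses for load balancing) are omitted for brevity:

```python
import bisect
import hashlib

class ConsistentHashRing:
    # Nodes and keys are hashed onto the same ring; a key is stored on
    # the first node clockwise from the key's hash position.
    def __init__(self, nodes):
        self._ring = sorted((self._h(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _h(s):
        # Deterministic hash so placement is stable across runs.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def lookup(self, key):
        # First node clockwise from the key; wrap around the ring.
        i = bisect.bisect(self._hashes, self._h(key)) % len(self._ring)
        return self._ring[i][1]
```

The payoff is incremental scalability: adding a node moves only the keys that now hash to it, leaving every other key's placement untouched.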
[Diagram: Microsoft Azure SDS partitioning. Logical partitions (A-B, C-D, E-F, G-H, I-J, K-L) are mapped by a hash function (e.g. f(G) = 4601) onto the partition data overlay of service instances; a request for G is routed to the service instance owning that hash range. Scalable.]
[Diagram: Microsoft Azure SDS architecture. Front-end nodes (REST/SOAP, ACE logic, data access library) form the SDS front end. A master cluster handles provisioning, service management, deployment, and health monitoring through master nodes running the partition manager and data node components. The data cluster's data nodes each run SQL Server, fabric, and management services, with replication between them; SQL clients fetch the partition map to find their data. Scalable.]
Scalability
Scalable Scalable On Premise
Scalability and virtualization
Availability
Scale-out and failures: with a 3-year MTBF per machine, a cluster of 1,000 machines sees roughly one failure per day; Google clusters reportedly exceed 2,000 machines.
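The failure-rate claim is simple arithmetic:

```python
mtbf_days = 3 * 365        # per-machine MTBF of 3 years, in days
machines = 1000

# Expected cluster-wide failures per day: about 0.91, i.e. roughly
# one machine failing somewhere in the cluster every day.
failures_per_day = machines / mtbf_days
```

At 2,000 machines the expectation is nearly two failures a day, which is why fail-over must be automatic rather than an operator task.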
What does Google do about this? Ben Jai, Google Platforms Architect
Cloud = Scalability + Availability
Availability Fail-Over
Google File System and availability: chunks are replicated across chunk servers for fail-over.
[Diagram: GFS write flow. (1) The client asks the Master which chunkserver holds the lease; (2) the Master replies with the primary and secondary replica locations; (3) the client pushes the data to all replicas (primary, secondary A, secondary B); (4) the client sends the write request to the primary; (5) the primary assigns an order and forwards the request to the secondaries; (6) the secondaries report completion to the primary; (7) the primary replies to the client.]
Windows Azure and availability: the Azure storage system keeps data replicas and fails over between them.
[Diagram: MS Azure replication. A write goes to the primary (P), which forwards it to the secondaries (S); once a quorum of secondaries has acknowledged, the primary acknowledges the client. Reads return the value from the primary.]
[Diagram: MS Azure fail-over. When the primary (P) fails, one of the secondary replicas (S) is promoted to become the new primary.]
Glassfish: Fail-Over and Consistency
[Diagram: a 4-node Glassfish cluster. Each node's cache is replicated to another node in the cluster; when a node fails, its cached session data is recovered from the replica on a surviving node.]
CAP Theorem
CAP: of Consistency, Availability, and Partition tolerance, a distributed system can guarantee at most two at the same time.
C + A: Consistency + Availability, giving up partition tolerance (P); viable only when the network cannot partition, e.g. a single site.
C + P: Consistency + Partition tolerance, giving up availability (A); e.g. distributed databases that block during a partition.
A + P: Availability + Partition tolerance, giving up strict consistency (C); e.g. DNS.
Which two does the cloud pick from C, P, and A? A cloud must tolerate partitions (P) and stay available (A), so it relaxes consistency (C).
Availability vs. Consistency in e-commerce
For an e-commerce site, availability comes first: while the store is down, sales are lost.
Most accesses are read-only, primary-key lookups.
So e-commerce trades strict consistency for availability.
Scalability, Availability, Consistency: by CAP, the cloud cannot fully have all three of scalability, availability, and consistency at once.
[Diagram: the CAP triangle of C, A, and P; only two corners can be picked at once, and the cloud picks A and P.]
To be scalable and available, the cloud relaxes consistency.
[Diagram: Azure read/write flow, revisited. The client writes to the primary (P), which forwards the write to the secondaries (S) and acknowledges the client after their acks; reads return the value from the primary.]
Azure Consistency
Is Azure consistent? Reads are served by the primary, so clients see a consistent view.
Eventual Consistency
Eventual consistency: replicas may be temporarily inconsistent, but converge to a consistent state; DNS is the classic eventually consistent system.
Relaxing consistency to eventual consistency is the cloud's answer to CAP: scalable, available, eventually consistent.
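One minimal sketch of eventual consistency, assuming last-writer-wins timestamps and a gossip-style anti-entropy pass. The names (`Replica`, `anti_entropy`) are illustrative, not any product's API:

```python
class Replica:
    # One copy of a single value, tagged with the timestamp of the
    # last write it has seen.
    def __init__(self):
        self.value, self.ts = None, 0

    def write(self, value, ts):
        # Last-writer-wins: keep only the newest timestamped value.
        if ts > self.ts:
            self.value, self.ts = value, ts

def anti_entropy(replicas):
    # Propagate the newest write to every replica; after this pass
    # all replicas agree, i.e. the system has converged.
    latest = max(replicas, key=lambda r: r.ts)
    for r in replicas:
        r.write(latest.value, latest.ts)
```

Between writes and the anti-entropy pass, two clients can read different values; "eventually" is exactly the window before the pass completes.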
BASE vs. ACID consistency: ACID = Atomic, Consistent, Isolated, Durable.
BASE = Basically Available, Soft-State, Eventually Consistent: the deliberate counterpoint to ACID (bases vs. acids).
Soft-state: state that can be rebuilt, tolerating partial failure.
ACID transaction example:
Begin Transaction
  insert into ... values (10001, 100)
  update ... set ... = ... + 100 where ... = ...
  update ... set ... = ... + 100 where ... = ...
End Transaction
All statements between Begin and End take effect together, or none of them do.
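A runnable version of the same pattern, using Python's sqlite3 as a stand-in transactional store. The account ids (10001) and the 100-unit amount follow the slide; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (10001, 100), (10002, 300)")

def transfer(src, dst, amount):
    # Both updates commit together or roll back together (atomicity):
    # `with conn` wraps the block in BEGIN ... COMMIT, or ROLLBACK on
    # an exception, so the total balance is always preserved.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(10002, 10001, 100)
```

This is exactly the guarantee BASE gives up: in a BASE system the two updates may land at different times on different nodes.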
ACID transactions are coordinated by a transaction manager, which keeps the data consistent at every transaction boundary.
Soft-state with eventual consistency is the opposite of ACID's hard, always-consistent state.
Without an ACID transaction manager coordinating updates, state is allowed to drift and then converge.
A transaction run in this style, without the ACID transaction manager, is a BASE transaction.
Soft-state with eventual consistency: consistency is not maintained at every moment, but it is eventually restored.
Basically Available: in the cloud, basic availability is achieved with optimistic concurrency control.
[Diagram: lost update. Threads A and B share a counter; interleaved GetCount()/Count++ sequences (10, 11, 12) overwrite each other, so the final value is 12 instead of the expected 13.]
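The lost update can be replayed deterministically by simulating the interleaving with local variables instead of real threads (starting from 12, to match the transactional slide's numbers):

```python
count = 12                 # the shared counter

# The bad interleaving: both threads read before either writes back.
a_read = count             # Thread A: GetCount() -> 12
b_read = count             # Thread B: GetCount() -> 12
count = a_read + 1         # Thread A writes back 13
count = b_read + 1         # Thread B also writes 13: A's increment is lost
```

Two increments ran, but the counter moved by one; that is the anomaly the next two slides fix in different ways.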
[Diagram: the transactional fix. Threads A and B wrap GetCount()/Count++ in Begin Tx / Commit Tx, so the counter correctly goes 12, 13, 14; but the locking costs availability.]
[Diagram: the Basically Available fix. Threads A and B read the counter (12) and each call Q.PutMsg("add") on a worker queue instead of writing back; a worker calls Q.GetMsg() twice and applies Count++ each time, moving the counter 12, 13, 14 without locks.]
[Diagram: a Windows Azure datacenter. A load balancer (LB) fronts m Web Role instances (web sites: ASPX, ASMX, WCF) and n Worker Role instances, connected through a Queue; Tables and Blobs storage sit alongside.]
Queue with producers and consumers: (1) consumer C1 calls Dequeue(Q, 30 sec) and receives msg1, which becomes invisible in the queue for 30 seconds; (2) consumer C2 calls Dequeue(Q, 30 sec) and receives msg2.
(1) C1 dequeues msg1 with a 30-sec visibility timeout; (2) C2 dequeues msg2; (3) C2 processes msg2; (4) C2 deletes msg2 from the queue; (5) C1 crashes; (6) msg1's timeout expires and it reappears in the queue; (7) a later Dequeue(Q, 30 sec) receives msg1 again.
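The visibility-timeout behavior in steps 1-7 can be modeled with a small in-memory class. `VisibilityQueue` is my name for the sketch; the real Azure queue works over REST, but the semantics are the same:

```python
import time

class VisibilityQueue:
    # Messages become invisible for `timeout` seconds after a dequeue;
    # if the consumer does not delete them in time, they reappear.
    # This is at-least-once delivery.
    def __init__(self):
        self._msgs = {}  # message -> timestamp when it becomes visible

    def put(self, msg):
        self._msgs[msg] = 0.0          # visible immediately

    def dequeue(self, timeout):
        now = time.time()
        for msg, visible_at in self._msgs.items():
            if visible_at <= now:
                self._msgs[msg] = now + timeout   # hide, don't remove
                return msg
        return None                    # everything is currently invisible

    def delete(self, msg):
        # Called by the consumer only after processing succeeds.
        self._msgs.pop(msg, None)
```

If the consumer crashes (step 5), it simply never calls `delete`, and the timeout in `dequeue` does the recovery automatically (steps 6-7).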
Basic availability via optimistic locking
[Diagram: versioned entities. The store holds rows of (version : key, date, rating): 5 : Ch9, Jan-1, 3; 1 : Ch9, Jan-2, 2; 9 : Ch9, Jan-3, 6. The version column serves as the ETag; Clients A and B will both update the Ch9/Jan-2 row.]
[Diagram: Clients A and B both read the row 1 : Ch9, Jan-2, 2. Client A prepares an update to rating 5; Client B prepares an update to rating 4.]
[Diagram: Client A submits its update (Ch9, Jan-2, 5) with If-Match: 1, the version it read.]
[Diagram: the store accepts Client A's update because the version still matches, writes rating 5, and bumps the row's version from 1 to 2.]
[Diagram: Client B submits (Ch9, Jan-2, 4) with If-Match: 1, but the row is now at version 2; the store rejects the update with 412 Precondition Failed.]
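The whole If-Match/412 exchange condenses into a toy versioned store. Class and method names here are assumptions for the sketch, not the Azure Table API:

```python
class VersionedStore:
    # Each entity carries a version number (its ETag). Updates must
    # supply If-Match; a stale version yields 412 Precondition Failed.
    def __init__(self):
        self._data = {}                 # key -> (version, value)

    def put(self, key, value):
        self._data[key] = (1, value)

    def get(self, key):
        return self._data[key]          # returns (version, value)

    def update(self, key, value, if_match):
        version, _ = self._data[key]
        if version != if_match:
            return 412                  # someone else won the race
        self._data[key] = (version + 1, value)
        return 200
```

No lock is ever held: both clients proceed optimistically, and the loser simply re-reads and retries, which is what keeps the store basically available.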
Persistency: scale-out also changes how we think about persistence, alongside scalability, availability, and consistency.
In a 1,000-PC scale-out, each PC with 2 GB of RAM and a 500 GB disk gives the cluster roughly 2 TB of aggregate memory and 0.5 PB of aggregate disk; scale-out is what makes this scalability possible.
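The aggregate numbers check out:

```python
machines = 1000
ram_gb_each, disk_gb_each = 2, 500      # per-PC RAM and disk, from the slide

total_ram_tb = machines * ram_gb_each / 1000        # 2.0 TB aggregate RAM
total_disk_pb = machines * disk_gb_each / 1000**2   # 0.5 PB aggregate disk
```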
Running 24x365 with high availability, even volatile memory can play the role of persistent storage (e.g. Coherence-style in-memory data grids).
Indexing: key/value stores use hash indexing rather than conventional database indexes.
P2P/DHT: a foundation for both scalability and availability.
P2P/DHT (1): a P2P overlay network supports both scale-out and availability.
P2P/DHT (2): P2P systems often use a DHT (Distributed Hash Table), a hash table whose buckets are spread across the participating nodes.
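A DHT lookup can be sketched as a Chord-style ring walk where each node knows only its successor. The node ids and the 8-bit id space here are invented for the example:

```python
class DHTNode:
    # Minimal ring node: knows only its successor, Chord-style.
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self

def lookup(start, key_id):
    # Hop clockwise until key_id falls in the interval (node, successor];
    # that successor is the node responsible for the key.
    node = start
    while True:
        lo, hi = node.id, node.successor.id
        owned = (lo < key_id <= hi) if lo < hi else (key_id > lo or key_id <= hi)
        if owned:
            return node.successor
        node = node.successor

# A 3-node ring in an 8-bit id space: 10 -> 50 -> 200 -> 10.
n10, n50, n200 = DHTNode(10), DHTNode(50), DHTNode(200)
n10.successor, n50.successor, n200.successor = n50, n200, n10
```

Real DHTs keep finger tables so a lookup takes O(log n) hops instead of walking the whole ring, but the ownership rule is the same.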
Scale-out gives the cloud its scalability; the cloud also demands availability; and scalability plus availability force the trade-off with consistency.
Giving up ACID transactions, the cloud settles for eventually consistent, soft state.
Basic availability is achieved with optimistic concurrency control, and P2P/DHT techniques make the cloud scalable and available.
"A Note on Distributed Computing", Jim Waldo et al. http://www.sunlabs.com/techrep/1994/smli_tr-94-29.pdf
Key concept: partial failure.
The fallacies of distributed computing, formulated over 10 years ago; "The network is homogeneous" was added later by Gosling.
"Complexity Quanta and Platform Definition", summary of Jim Waldo's keynote at the 10th Jini Community Meeting. http://www.jini.org/files/meetings/tenth/video/complexity_quanta_and_platform_definition.mov http://www.jini.org/files/meetings/tenth/presentations/waldo_keynote.pdf
Complexity quanta: sequential (SEQ), then multi-threaded (MT), then multi-process (MP), then multi-machine (MM), then multi-machine unreliable (MMU), i.e. the Web.
Lamport: http://research.microsoft.com/users/lamport/pubs/pubs.html
Seq to MT, MT to MP, MP to MM, MM to MMU (the Web): each step up crosses a new quantum of complexity.