Friday, August 28, 2009

WEBDFS on 20 Amazon ec2 instances

Hello Again,

Download WEBDFS here:  http://code.google.com/p/webdfs/downloads/list

This blog post describes a real-world test of WEBDFS on 20 Amazon EC2 instances.

I set up 20 m1.large Amazon EC2 instances, uploaded ~250 GB of data, and downloaded ~1.5 TB of data from the files that were uploaded, for a total transfer of ~1.8 TB.

I have included all of the pertinent information at the end of this blog post.  Here are some highlights:
  • 500 threads (5 Java clients, 100 threads each)
  • 15 servers
  • ~250 GB uploaded (PUT requests; individual files between 50 KB and 10 MB)
  • ~1.5 TB downloaded (GET requests)
  • ~1.8 TB transferred in total
  • ~47 MB/sec upload rate
  • ~201 requests/sec overall
  • The data was evenly distributed across all nodes
  • 40 replicas were lost and totally unrecoverable, amounting to ~0.03% data loss
The final results tell me that WEBDFS performed quite well.  However, the data loss is undesirable and needs to be addressed.  How do we prevent it?  One strategy is to be atomic and durable: hold on to the client connection until we know beyond doubt that the first replica has been successfully written to disk.  This can be achieved with a checksum mechanism that verifies the uploaded file is not corrupted (see the sketch below).  We could also add an fsync() call after each file write, though that alone is not always reliable; I actually did this, tested it, and will report the results in a future post.  The point is that we do not want to tell the client an upload was OK when it was not, and we do not want to replicate a corrupted file.  That is exactly what happened here: 40 replicas became totally unrecoverable because we did not have a good copy from which to make other replicas.
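
As a rough illustration of the checksum idea (this is not actual WEBDFS code; the class and method names are made up for the sketch), the receiving node could compute an MD5 digest while writing the upload to disk, fsync the file, and only acknowledge the PUT when the digest matches a checksum supplied by the client:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumedWrite {

    /**
     * Write the uploaded stream to disk, fsync it, and verify the MD5 digest
     * against the checksum the client sent (e.g. in a request header).
     * Only when this returns true should the node tell the client "OK"
     * and start replicating the file to the next node.
     */
    public static boolean writeAndVerify(InputStream upload, String path,
                                         String expectedMd5Hex)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        FileOutputStream out = new FileOutputStream(path);
        try {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = upload.read(buf)) != -1) {
                md5.update(buf, 0, n);   // digest exactly what we write
                out.write(buf, 0, n);
            }
            out.flush();
            out.getFD().sync();          // fsync(): force the bytes to disk
        } finally {
            out.close();
        }
        return toHex(md5.digest()).equalsIgnoreCase(expectedMd5Hex);
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}

If the digests do not match, the node can discard the file and return an error, so a corrupted first replica is never acknowledged to the client and never copied to the second node.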

Another thing that might help would be to increase the replication factor.  However, there is something to consider when increasing the replication factor in WEBDFS (at least in its current state): it increases the write load on the overall system.  In the case of this test, increasing the replication factor from 2 to 3 would increase the write load by 50%, which could be quite significant.  The relative cost shrinks as the replication factor grows (going from 3 to 4 adds only 33% more writes), so this is a diminishing problem as we scale, but still something to consider.
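
To make that arithmetic concrete, here is a small sketch (using the upload total from this test; the loop and output format are just illustrative) of how the bytes written to disk grow with the replication factor:

public class ReplicationCost {
    public static void main(String[] args) {
        long uploaded = 250000850000L;              // ~250 GB uploaded in this test
        for (int r = 2; r <= 5; r++) {
            long written = uploaded * r;            // every uploaded byte is written r times
            double increaseOverPrev = 100.0 / (r - 1);   // cost of going from r-1 to r
            System.out.printf("r=%d: %d bytes on disk (+%.0f%% vs r=%d)%n",
                    r, written, increaseOverPrev, r - 1);
        }
    }
}

Going from 2 to 3 replicas adds 50% more disk writes, from 3 to 4 only 33%, and so on, which is the diminishing effect mentioned above.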

So how did the servers perform?   I have included a couple of load avg graphs below to show how a typical box performed, along with the graph from the one server that got really hot.

The graphs show the machines stayed mostly cool: load averages were in the 3-4 range most of the time on most machines, and the CPUs were rarely maxed out.  There was one exception: one machine went to a load average of 100 and stayed there for most of the test until I rebooted it, after which point it settled at a level similar to the rest of the boxes in the cluster.   I am not sure why that one server got so hot.  After analyzing the server logs, I do not see any particular request pattern that would explain it.  However, it is obvious that this server was the reason for the data loss and the problems experienced during the test.

Load avg graph of typical server during the test:



Below is the load avg graph of the server that became very hot during the test.  You can see that after about 1000 seconds the box suddenly became very hot.  None of the other boxes exhibited this behavior.  You can also see the obvious point where I rebooted the box, and that the server never went back to a high load average.  Further analysis is needed to determine why this happened.



Below are the specifics of the test and the results:

Hardware and Software:

Hardware: Amazon EC2 instance, m1.large
CPU: 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
Operating System: Fedora Core 9, 64-bit
Memory: 7.5 GB
File System: ext3
I/O: "High" (exact SAN specs not available from Amazon, AFAICT)
Storage: 850 GB instance storage (2×420 GB plus a 10 GB root partition)


Test Parameters:

Total servers: 15
Sub clusters: 5
Servers per sub cluster: 3
Total clients: 5
Threads per client: 100
Total threads: 500
Read:write ratio: 8:2
Replication degree: 2
Total to be uploaded: 250 GB
Total to be written to disk: 500 GB
Total uploaded by each client: 50 GB
Thread task description: Each thread uploaded and downloaded files between 50 KB and 10 MB in size.  An attempt was made to simulate real-world usage with a 20% write load, so for every 8 GETs there would be, on average, 2 PUTs.  Each thread slept for a random number of milliseconds between 0 and 1000 (0 and 1 second) between requests.  A rough sketch of this loop follows the file-size list below.
The file sizes uploaded were:
  • 50 KB
  • 100 KB
  • 500 KB
  • 1 MB
  • 3 MB
  • 5 MB
  • 10 MB
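
For reference, a single client thread could be sketched roughly as follows.  This is an approximation, not the actual test client source; the 8:2 weighting, the sleep, and the file sizes come from the parameters above, while the class name and the putFile/getFile helpers are hypothetical stand-ins for the real HTTP calls:

import java.util.Random;

/**
 * Rough sketch of one load-test thread: ~80% GETs, ~20% PUTs, with a
 * random 0-1000 ms pause between requests.
 */
public class ClientThread implements Runnable {

    // Approximate file sizes used in the test, 50 KB through 10 MB.
    private static final long[] SIZES = {
        50000L, 100000L, 500000L, 1000000L, 3000000L, 5000000L, 10000000L
    };

    private final Random rnd = new Random();
    private final long bytesToUpload;   // e.g. 50 GB per client / 100 threads

    public ClientThread(long bytesToUpload) {
        this.bytesToUpload = bytesToUpload;
    }

    public void run() {
        long uploaded = 0;
        try {
            while (uploaded < bytesToUpload) {
                if (rnd.nextInt(10) < 2) {                    // 2 of every 10 requests is a PUT
                    long size = SIZES[rnd.nextInt(SIZES.length)];
                    putFile(size);
                    uploaded += size;
                } else {
                    getFile();                                // re-fetch a previously uploaded file
                }
                Thread.sleep(rnd.nextInt(1001));              // pause 0-1000 ms between requests
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical helpers; the real client issues HTTP PUT/GET requests here.
    private void putFile(long sizeInBytes) { }
    private void getFile() { }
}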


Test Results


Total requests: 1067159
Total run time: ~5300 seconds
Total uploaded: 250000850000 bytes
Total upload rate: ~47169971.7 bytes/second
Avg downloaded per replica: 1933823.62 bytes
Total downloads: 808603
Requests per second: ~201 requests/second
Total data transferred: 1813697450000 bytes
Overall transfer rate: 342207066 bytes/second
Total original replicas: 129278
Total replicas written to disk: 258556
Total written to disk: 499200710256 bytes
Avg written to disk per replica: 1930725.69 bytes/replica
Total lost replicas: 0
Total corrupted but recoverable replicas: 2071
Total non-recoverable replicas due to corruption (both copies were corrupted): 40
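
As a sanity check, the derived rates above follow directly from the totals and the ~5300 second run time; this small snippet just reproduces that arithmetic:

public class RateCheck {
    public static void main(String[] args) {
        double runtimeSecs   = 5300.0;             // ~5300 second run
        long   totalRequests = 1067159L;
        long   uploadedBytes = 250000850000L;
        long   transferBytes = 1813697450000L;

        System.out.printf("requests/sec : %.0f%n", totalRequests / runtimeSecs);  // ~201
        System.out.printf("upload B/sec : %.0f%n", uploadedBytes / runtimeSecs);  // ~47169972 (~47 MB/s)
        System.out.printf("total  B/sec : %.0f%n", transferBytes / runtimeSecs);  // ~342207066 (~342 MB/s)
    }
}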

Data Distribution:

The list below shows how many replicas were saved on each server.  As we can see, the data was well distributed amongst the nodes.


Server 1: 17406 replicas
Server 2: 17914 replicas
Server 3: 17153 replicas
Server 4: 17329 replicas
Server 5: 17426 replicas
Server 6: 16955 replicas
Server 7: 17082 replicas
Server 8: 17089 replicas
Server 9: 17623 replicas
Server 10: 16679 replicas
Server 11: 17606 replicas
Server 12: 17256 replicas
Server 13: 16703 replicas
Server 14: 17190 replicas
Server 15: 17145 replicas
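
One quick way to quantify "well distributed" is to compare each server's replica count to the mean (258556 replicas / 15 servers ≈ 17237).  The sketch below does that for the counts listed above, and every node comes out within roughly 4% of the mean:

public class DistributionCheck {
    public static void main(String[] args) {
        // Replica counts per server, taken from the list above (servers 1-15).
        int[] replicas = {
            17406, 17914, 17153, 17329, 17426, 16955, 17082, 17089,
            17623, 16679, 17606, 17256, 16703, 17190, 17145
        };
        long total = 0;
        for (int r : replicas) total += r;
        double mean = (double) total / replicas.length;   // ~17237 replicas per server
        for (int i = 0; i < replicas.length; i++) {
            double deviation = 100.0 * (replicas[i] - mean) / mean;
            System.out.printf("server %2d: %d replicas (%+.1f%% vs mean)%n",
                    i + 1, replicas[i], deviation);
        }
    }
}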