Hello there! Continuing our series, let's look at AWS S3 performance optimization for uploads.
Single PUT upload
This is the default S3 upload method.
A file becomes an object and is uploaded with the PutObject API, with all of the data sent to S3 in a single stream.
The problem: if the stream fails, the entire upload fails and has to restart from the beginning, wasting internet bandwidth and time.
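To make the cost of a failed stream concrete, here is a minimal sketch of a single PUT with retries. It assumes `client` behaves like a boto3 S3 client (only `put_object` is used). The `put_with_retry` helper and the `FlakyPutClient` stand-in are hypothetical names made up for this example, and the builtin `ConnectionError` stands in for whatever network exception a real client would raise.

```python
class FlakyPutClient:
    """Hypothetical stand-in for an S3 client: the first put_object call fails."""

    def __init__(self):
        self.calls = 0

    def put_object(self, Bucket, Key, Body):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated network drop")
        return {"ETag": '"fake-etag"'}


def put_with_retry(client, bucket, key, body, max_attempts=3):
    """Upload `body` with one PutObject call per attempt.

    Returns the number of bytes pushed across all attempts, counting a
    failed attempt as a full send (the worst case, where the stream
    breaks near the end).
    """
    bytes_sent = 0
    for attempt in range(1, max_attempts + 1):
        bytes_sent += len(body)  # every attempt re-sends the whole object
        try:
            client.put_object(Bucket=bucket, Key=key, Body=body)
            return bytes_sent
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
    return bytes_sent


client = FlakyPutClient()
total = put_with_retry(client, "demo-bucket", "report.bin", b"x" * 1000)
print(f"object size: 1000 bytes, bytes sent: {total}")
```

One dropped connection doubles the traffic here, and the penalty grows with the size of the object.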
Many download tools split a transfer across multiple streams for this reason.
A single stream of data is not reliable when data is transferred over long distances.
Speed and reliability are the limitations of a single stream of data.
When data is transferred between two points, the effective speed is that of the slowest link on the path.
Data transfer protocols like BitTorrent were developed for fast, distributed transfer of data.
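The slowest-link point above can be shown with a little arithmetic. This is only a rough sketch: the function names and the link speeds are made-up illustration values, and it ignores protocol overhead and congestion.

```python
def path_throughput_mbps(link_speeds_mbps):
    """Effective speed of a path is that of its slowest link."""
    if not link_speeds_mbps:
        raise ValueError("at least one link is required")
    return min(link_speeds_mbps)


def transfer_seconds(size_megabytes, link_speeds_mbps):
    """Estimated transfer time; sizes in megabytes, speeds in megabits/s."""
    megabits = size_megabytes * 8  # 8 bits per byte
    return megabits / path_throughput_mbps(link_speeds_mbps)


# Hypothetical path: office LAN, home uplink, backbone link
links = [1000, 50, 300]
print(path_throughput_mbps(links))
print(transfer_seconds(500, links))
```

However fast the other hops are, the 50 Mbps link in this example sets the pace for the whole stream.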
A single PUT upload can transfer at most 5 GB of data.
The solution, multipart upload, improves speed and reliability by splitting the data into individual parts.
Multipart Upload
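AWS recommends multipart upload once an object reaches about 100 MB; this is a recommendation rather than a hard minimum.
A multipart upload can be split into a maximum of 10,000 parts, and each part can be between 5 MB and 5 GB in size.
The last part holds whatever is left over and can be smaller than 5 MB.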
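The part limits above can be checked before an upload starts. The following is a minimal sketch, not an official tool: `plan_parts` is a hypothetical helper, and the constants use binary units (MiB/GiB), which is how AWS documents these limits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size, part_size):
    """Return a list of (part_number, offset, length) tuples.

    Part numbers start at 1. The last part takes the leftover bytes and
    may be smaller than MIN_PART_SIZE.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for index in range(part_count):
        offset = index * part_size
        length = min(part_size, object_size - offset)
        parts.append((index + 1, offset, length))
    return parts


for part in plan_parts(12 * MIB, 5 * MIB):
    print(part)
```

A 12 MiB object with 5 MiB parts gives two full parts and a 2 MiB leftover part, which is allowed because it is the last one.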
Multipart upload is so effective because each part is treated as an individual upload.
If a part fails, only that part needs to be retried, which significantly reduces the risk.
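Here is a rough sketch of retrying only the failed part. It assumes `client` behaves like a boto3 S3 client and uses the `create_multipart_upload`, `upload_part`, `complete_multipart_upload` and `abort_multipart_upload` calls. `upload_in_parts` and the `FlakyFakeS3` stand-in are hypothetical names for this example, the builtin `ConnectionError` stands in for a real network exception, and the fake does not enforce the 5 MB minimum part size so that the demo can use tiny parts.

```python
class FlakyFakeS3:
    """Hypothetical in-memory stand-in for an S3 client; part 2 fails once."""

    def __init__(self):
        self.part_calls = []
        self.aborted = False
        self._failed_once = False

    def create_multipart_upload(self, Bucket, Key):
        return {"UploadId": "fake-upload-id"}

    def upload_part(self, Bucket, Key, PartNumber, UploadId, Body):
        self.part_calls.append(PartNumber)
        if PartNumber == 2 and not self._failed_once:
            self._failed_once = True
            raise ConnectionError("simulated network drop")
        return {"ETag": f'"etag-{PartNumber}"'}

    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        # The real response has other fields; the fake echoes the part list.
        return {"Parts": MultipartUpload["Parts"]}

    def abort_multipart_upload(self, Bucket, Key, UploadId):
        self.aborted = True


def upload_in_parts(client, bucket, key, data, part_size, max_attempts=3):
    """Upload `data` part by part, retrying only the part that failed."""
    if not data or max_attempts < 1:
        raise ValueError("need non-empty data and at least one attempt")
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    completed = []
    try:
        for number, start in enumerate(range(0, len(data), part_size), start=1):
            chunk = data[start:start + part_size]
            for attempt in range(1, max_attempts + 1):
                try:
                    response = client.upload_part(
                        Bucket=bucket, Key=key, PartNumber=number,
                        UploadId=upload_id, Body=chunk,
                    )
                    break  # this part is done; earlier parts are untouched
                except ConnectionError:
                    if attempt == max_attempts:
                        raise
            completed.append({"PartNumber": number, "ETag": response["ETag"]})
        return client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": completed},
        )
    except Exception:
        # Give up cleanly so the already-uploaded parts do not linger.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise


fake = FlakyFakeS3()
upload_in_parts(fake, "demo-bucket", "demo-key", b"x" * 25, part_size=10)
print(fake.part_calls)
```

In the demo only part 2 is sent twice; parts 1 and 3 each go over the wire once.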
Parts can be uploaded in parallel, so the transfer rate of the entire upload is the sum of the transfer rates of the individual parts.
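As a back-of-the-envelope sketch of that sum: the function name and the numbers below are hypothetical, and the optional cap at the capacity of the local link is an added assumption, since parallel parts cannot go faster than the connection they share.

```python
def aggregate_rate_mbps(per_part_rates_mbps, link_capacity_mbps=None):
    """Sum the rates of parts uploading in parallel.

    If link_capacity_mbps is given, the total is capped at that value
    (an assumption of this sketch: all parts share one local link).
    """
    total = sum(per_part_rates_mbps)
    if link_capacity_mbps is not None:
        total = min(total, link_capacity_mbps)
    return total


# Four parallel parts, each managing a hypothetical 20 Mbps
print(aggregate_rate_mbps([20, 20, 20, 20]))
print(aggregate_rate_mbps([20, 20, 20, 20], link_capacity_mbps=50))
```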
S3 Transfer Acceleration
Distributed teams around the world can use the public internet to upload data to a bucket in any AWS region, but we have no control over the path the data takes, and that path can be indirect.
Transfer Acceleration uses the network of AWS edge locations.
Transfer Acceleration has to be enabled on the S3 bucket.
By default, it's switched off. There are some restrictions on enabling it.
The bucket name cannot contain periods and must be DNS-compatible.
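A minimal sketch of checking the name and then switching acceleration on. It assumes `client` behaves like a boto3 S3 client, whose `put_bucket_accelerate_configuration` call takes the status shown. The name check is a simplified version of the bucket naming rules (3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit), and `can_accelerate`, `enable_acceleration` and `RecordingClient` are hypothetical names for this example.

```python
import re

# Simplified rule: DNS-style name with no periods at all.
_ACCEL_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def can_accelerate(bucket_name):
    """True if the name passes the simplified no-periods, DNS-style check."""
    return _ACCEL_NAME.fullmatch(bucket_name) is not None


def enable_acceleration(client, bucket_name):
    """Turn on Transfer Acceleration after validating the bucket name."""
    if not can_accelerate(bucket_name):
        raise ValueError(f"{bucket_name!r} cannot use Transfer Acceleration")
    client.put_bucket_accelerate_configuration(
        Bucket=bucket_name,
        AccelerateConfiguration={"Status": "Enabled"},
    )


class RecordingClient:
    """Hypothetical stand-in that records the request instead of calling AWS."""

    def __init__(self):
        self.requests = []

    def put_bucket_accelerate_configuration(self, Bucket, AccelerateConfiguration):
        self.requests.append((Bucket, AccelerateConfiguration))


demo = RecordingClient()
enable_acceleration(demo, "my-team-uploads")
print(demo.requests)
print(can_accelerate("my.team.uploads"))
```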
So the data is transferred to the nearest AWS edge location, and from there it travels over the AWS global network, which tends to use direct links.
The internet is a multipurpose public network built for flexibility and resilience, not for speed. The AWS network is built to connect region to region, so it is much faster and has lower latency (delay).
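Clients reach the edge network by sending requests to the bucket's accelerate endpoint instead of the regular regional one. The sketch below only builds that hostname. `accelerate_endpoint` is a hypothetical helper and the bucket name is a made-up example; the `s3-accelerate` and `s3-accelerate.dualstack` hostnames are the ones AWS documents.

```python
def accelerate_endpoint(bucket_name, dual_stack=False):
    """Build the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket_name:
        # Names with periods cannot use Transfer Acceleration.
        raise ValueError("bucket name must not contain periods")
    suffix = "s3-accelerate.dualstack" if dual_stack else "s3-accelerate"
    return f"https://{bucket_name}.{suffix}.amazonaws.com"


print(accelerate_endpoint("my-team-uploads"))
print(accelerate_endpoint("my-team-uploads", dual_stack=True))
```

The bucket itself stays in its own region; only the first hop changes.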