Category Archives: AWS

Programmatically get the ec2 spot price for an instance size

This Python script prints the spot price for the instance size passed in as a command-line argument. It only looks at the us-east region but can easily be modified for other AWS regions.

import sys, json, urllib.request
try:
    # Get the JSONP spot price data from Amazon
    response = urllib.request.urlopen("http://spot-price.s3.amazonaws.com/spot.js")
    jsonpstr = response.read().decode("utf-8")
    # Strip the JSONP callback wrapper and keep only the JSON between the outer parentheses
    data = json.loads(jsonpstr[jsonpstr.index("(") + 1 : jsonpstr.rindex(")")])
    # Parse it and retrieve the USD price for the instance size passed in sys.argv[1], e.g. m3.xlarge
    reg = [x for x in data["config"]["regions"] if x["region"] == "us-east"]
    instance = [y for x in reg[0]["instanceTypes"] for y in x["sizes"] if y["size"] == sys.argv[1]]
    print([x for x in instance[0]["valueColumns"] if x["name"] == "linux"][0]["prices"]["USD"])
except Exception:
    # Any failure (network error, unknown size, missing argument) prints 0
    print(0)
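
To show how the lookup might be adapted for other regions, here is a minimal sketch that takes the region as a parameter and works on the already-parsed data. The function name spot_price, its os_name parameter, and the sample dictionary are all hypothetical; the sample only mimics the structure the script above reads, and its prices are made up.

```python
def spot_price(data, region, size, os_name="linux"):
    """Return the USD price string for `size` in `region`, or None if not found."""
    for reg in data["config"]["regions"]:
        if reg["region"] != region:
            continue
        for instance_type in reg["instanceTypes"]:
            for entry in instance_type["sizes"]:
                if entry["size"] != size:
                    continue
                for column in entry["valueColumns"]:
                    if column["name"] == os_name:
                        return column["prices"]["USD"]
    return None


# Hypothetical sample in the same shape as the parsed feed; the values are made up.
sample = {
    "config": {
        "regions": [
            {
                "region": "us-east",
                "instanceTypes": [
                    {
                        "sizes": [
                            {
                                "size": "m3.xlarge",
                                "valueColumns": [
                                    {"name": "linux", "prices": {"USD": "0.0500"}},
                                ],
                            }
                        ]
                    }
                ],
            }
        ]
    }
}

print(spot_price(sample, "us-east", "m3.xlarge"))
```

Returning None for an unknown region or size lets the caller decide what to print, rather than relying on a catch-all exception handler.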

My Python knowledge is very minimal, so any suggested improvements would be appreciated.

Moving / Copying lots of s3 files quickly using gnu parallel

I recently had to copy a large number of files from one S3 folder to another using s3cmd, but found the process very slow as it is single-threaded. I looked into how to do this in parallel and discovered GNU Parallel, which was the answer to my dreams.

To use it, I simply create a list of the files that need to be moved and pass them to parallel with the appropriate copy command. The options used in the command below are:

The -j20 switch tells it to run up to 20 jobs in parallel.
--halt 1 means that if one job fails, no new jobs are started; the jobs already running are allowed to finish and the command then exits with a failure status.
{/} is a GNU Parallel specific token which denotes the basename of the line from the file passed to parallel, e.g. given s3://bucket/filename.txt, {/} returns filename.txt. Note that this requires a recent version of GNU Parallel, so install the latest stable version from source if necessary.

# filelist.txt just contains a list of s3 files, e.g. s3://bucket/filename.ext
# Parallel starts a new job to handle each line in the file (up to the limit set by -j)
# Note also that we escape the $ within the command passed to parallel. If we did not
# escape it then the variable would be expanded in the scope of the
# calling script rather than within the parallel call.
cat filelist.txt | parallel -j20 --halt 1 "filename={/};s3cmd cp s3://bucket/folder1/\$filename s3://bucket/folder2/\$filename;"
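
For comparison, the same idea can be sketched in Python with a thread pool instead of GNU Parallel. This is only an illustration under a few assumptions: the helper names copy_command and copy_all are hypothetical, each line of the list is taken to be the full source URL, and s3cmd is assumed to be on the PATH. Unlike --halt 1, this sketch keeps starting the remaining copies after a failure and simply reports which ones failed.

```python
import posixpath
import subprocess
from concurrent.futures import ThreadPoolExecutor


def copy_command(src_url, dest_prefix):
    """Build the s3cmd argument list that copies src_url under dest_prefix."""
    # posixpath.basename plays the role of the {/} token in GNU Parallel
    filename = posixpath.basename(src_url)
    return ["s3cmd", "cp", src_url, dest_prefix.rstrip("/") + "/" + filename]


def copy_all(urls, dest_prefix, jobs=20, run=subprocess.call):
    """Run one copy per URL, at most `jobs` at a time; return the URLs that failed."""
    urls = list(urls)
    commands = [copy_command(url, dest_prefix) for url in urls]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map returns results in input order, so they line up with urls
        exit_codes = list(pool.map(run, commands))
    return [url for url, code in zip(urls, exit_codes) if code != 0]


# Example usage (not executed here), reading the same filelist.txt as above:
#   with open("filelist.txt") as handle:
#       urls = [line.strip() for line in handle if line.strip()]
#   failed = copy_all(urls, "s3://bucket/folder2")
```

Threads are enough here because each worker just waits on an external s3cmd process. The run parameter exists only so the copy step can be swapped out when trying the sketch without touching S3.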

If you want to perform a more complex task in parallel, it can be cumbersome and unreadable to put everything on the command line. To get around this we can just use a bash function.

#!/bin/bash
function dosomethingabitmorecomplex {
    echo "Doing something with arg $1"
    sleep 10
    echo "finished doing something with arg $1"
}

# Since parallel runs each of its jobs in a new shell we need to
# export the function to ensure it can be accessed by that shell
export -f dosomethingabitmorecomplex
# testfile.txt just contains lines of text
# the {} token represents the line passed to parallel from the text file
parallel -j20 "dosomethingabitmorecomplex {}" < testfile.txt

A simple script to list s3 bucket sizes

Andrew Clarke's Blog!

Random thoughts
