Getting started with Amazon Web Services (AWS)
Because many of the datasets that we would like to analyze are already in Amazon's S3, we will use AWS for much of the computations we are doing.
Interacting with AWS
There are several ways to interact with AWS.
The web console
It's a website. You log in and click on buttons.
The command-line interface (CLI)
To install the CLI use
pip install awscli or
conda install -c conda-forge awscli, or include
awscli in your environment configuration
(see also documentation about Python package management)
The first time that you use the CLI, you should run:
Provide your credentials (ask Ariel for credentials if you don't have these
yet). When asked about your preferred region, enter
us-east-1 and preferred
file format is
This will create two text files in
can look at these files and even edit them, if you feel the need, but the CLI
should provide a safe way to do this.
You will need to add credentials if you want to use multiple profiles in parallel. For example, if you need to access data that is credentialed through the HCP
To add credentials for the HCP project (for example) use:
aws configure --profile hcp
This will add a section to your
~/.aws/credentials file with the access key id and
secret access key that you got from the HCP for access to their data. Different
credentials give you different priviliges. For example, the HCP credentials
will only give you read access to the HCP OpenAccess Amazon S3 Bucket.
This is a Python library that allows you to interact with AWS. It is highly useful, but the API can be a bit inconsistent and confusing.
This library extends Boto3 to provide a much nicer and more efficient interface to data stored in AWS S3.
This library packages Python code for execution on AWS Batch.
We don't have one yet, but we will.