KalyanV

Deploying an application to AWS using AWS CLI, Part 3b - Application Repository and Deployment to Instance

2021-02-06T00:00:00+00:00

Application Repository

As I mentioned in the Part 1 - Introduction post, I decided to use the application developed in Miguel Grinberg’s Flask Mega-Tutorial.

Since the focus of this blog series is on deploying to AWS, I decided to keep the application simple. Therefore, I cloned the application as it stands at Chapter 9: Pagination of Grinberg’s series. In addition, I applied the minor modifications needed to use the python-dotenv package, as described in Chapter 15.

To summarize:

Start with the application as modified up to and including Chapter 9 of Grinberg’s tutorial.
Apply the changes related to python-dotenv as described in Chapter 15.

I made these modifications and pushed the changes to the microblog_cli repository.

Preparing an EC2 Instance to Host the Application

For the Level-1 architecture, we install all tiers on a single EC2 instance. We install:

Nginx as our web server
Flask as the application server / business logic framework
PostgreSQL as the database

The preparation of the EC2 instance closely follows the instructions in Chapter 17: Deployment on Linux of Grinberg’s tutorial. The deployment instructions in the rest of this post differ from that chapter in a few ways:

We use Amazon Linux 2 for our EC2 instance instead of Ubuntu
We use a PostgreSQL database instead of MySQL
We skip the Password-less Logins section
We skip the Secure Your Server section, since we will use AWS’s Security Groups to secure our server

Executing Scripts on an EC2 Instance

The script shown below does the following things:

Checks that the required options are present
Accepts the host keys of the new host to avoid the interactive prompt
Installs the following software:
- python3 and related libraries
- git
- nginx
- PostgreSQL (repo and libraries)

A note about how the script is executed on the remote EC2 instance:

The script creates a sub-script to be executed on the remote EC2 instance (let’s call it remote_script).
The remote_script is created using the cat command along with a heredoc.
The remote_script is then passed as stdin to the ssh command, and ssh executes it on the remote EC2 instance.

Install Python, Git, Nginx and PostgreSQL Packages

#!/bin/bash
set -euo pipefail

ip_address=""
ssh_key_file=""
while getopts "i:k:" opt; do
    case "$opt" in
    i)
        ip_address=$OPTARG
        ;;
    k)
        ssh_key_file=$OPTARG
        ;;
    esac
done

if [[ $ip_address == "" || $ssh_key_file == "" ]]; then
    echo "$(basename $0): Required options are missing."
    echo "Usage: $(basename $0) -i instance-ip -k ssh-key-file"
    exit 1
fi

# Accept keys for the new host to avoid the interactive
# question when connecting using ssh for the first time.

ssh-keyscan $ip_address >> ~/.ssh/known_hosts

PG_ADMIN_PWD=`cat pg_admin_pwd.txt`

cat <<-ENDCMDS > /tmp/remote_script.sh
#!/bin/bash
set -euo pipefail

sudo yum -y update
sudo yum -y install python3 python3-venv python3-devel
sudo yum -y install git

# Install nginx
sudo amazon-linux-extras install -y nginx1

#
# Install postgresql 12
#

# Add repo
sudo tee /etc/yum.repos.d/pgdg.repo <<-PGREPO
[pgdg12]
name=PostgreSQL 12 for RHEL/CentOS 7
baseurl=https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-7-x86_64
enabled=1
gpgcheck=0
PGREPO

# Generate metadata cache and install postgresql 12
sudo yum makecache
sudo yum -y install postgresql12 postgresql12-libs postgresql12-server

Initialize PostgreSQL Database

Initialize and set up the PostgreSQL database:

- Initialize the database
- Start and enable the database service
- Set the admin user’s password

sudo /usr/pgsql-12/bin/postgresql-12-setup initdb

# Start and enable database service
sudo systemctl start postgresql-12
sudo systemctl enable postgresql-12

# Set postgresql admin user's password
sudo -i -u postgres -- bash -c "psql -c \"alter user postgres with password '$PG_ADMIN_PWD'\""

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/remote_script.sh

Download Application, Create & Configure Virtual Environment

MICROBLOG_PG_USER_PWD=`cat microblog_pg_user_pwd.txt`

cat <<-ENDCMDS > /tmp/app_install.sh
#!/bin/bash
set -euo pipefail

# Download the application source code
git clone https://github.com/vedala/microblog_cli microblog

# Create python virtual environment and install dependencies
cd microblog
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Install gunicorn
pip install gunicorn

echo "PATH=\\\$PATH:/usr/pgsql-12/bin" >> ~/.bash_profile

# Create .env file for environment variables
echo -n "SECRET_KEY=" > .env
python -c 'import uuid; print(uuid.uuid4().hex)' >> .env
echo "DATABASE_URL=postgresql://microblog:$MICROBLOG_PG_USER_PWD@localhost:5432/microblog" >> .env

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/app_install.sh

The echo command that modifies the PATH variable on the remote machine uses two levels of escaping. First, we do not want $PATH to be expanded on the local machine when the heredoc is written out, so we add a “\” before the $ sign. The second level is needed when the echo command runs on the remote instance: we do not want $PATH to be expanded there either, since the line appended to .bash_profile must read “PATH=$PATH:…”. For this second level we add two more backslashes in front of the one added above, giving three backslashes in total.
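The two rounds of unescaping can be traced with a small Python model. This is only a sketch: `strip_one_escape_level` is a hypothetical helper that imitates how an unquoted heredoc, and later a double-quoted string, drops a backslash that precedes a backslash, a dollar sign or a backtick. It is not how bash is implemented and it ignores other cases such as escaped newlines.

```python
import re

def strip_one_escape_level(text: str) -> str:
    """Simplified model of one round of shell unescaping.

    A backslash followed by a backslash, dollar sign or backtick is
    removed and the following character is kept literally.
    """
    return re.sub(r"\\([\\$`])", r"\1", text)

# Text as typed inside the heredoc in the local script (three backslashes).
typed_locally = r"PATH=\\\$PATH:/usr/pgsql-12/bin"

# Level 1: the unquoted heredoc is processed on the local machine.
in_remote_script = strip_one_escape_level(typed_locally)

# Level 2: echo's double-quoted argument is processed on the EC2 instance.
written_to_profile = strip_one_escape_level(in_remote_script)

print(in_remote_script)
print(written_to_profile)
```

After the first round one backslash remains in front of the dollar sign; after the second round the line contains a literal $PATH, which is what we want in .bash_profile.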

Install psycopg2

Our Python application needs a driver to access the PostgreSQL database. The library we will use is psycopg2. Here is how we install it:

cat <<-ENDCMDS > /tmp/install_psycopg2.sh
#!/bin/bash
set -euo pipefail

cd microblog
source venv/bin/activate

# Install C compiler, needed for "pip install" of psycopg2 postgresql driver
sudo yum -y install gcc

# Install libpq library
sudo yum -y install libpq5 libpq5-devel

# Install wheel package
pip install wheel

# Install postgresql driver
pip install psycopg2

# Use password (md5) authentication instead of the default ident authentication.
# Reload the postgres service so that the change takes effect.
sudo -u postgres sed 's/ident/md5/' /var/lib/pgsql/12/data/pg_hba.conf | sudo -u postgres tee /tmp/new_pg_hba.conf
sudo -u postgres mv /tmp/new_pg_hba.conf /var/lib/pgsql/12/data/pg_hba.conf
sudo systemctl reload postgresql-12

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/install_psycopg2.sh

Create Database Schema and Run Migrations

MICROBLOG_PG_USER_PWD=`cat microblog_pg_user_pwd.txt`

cat <<-ENDCMDS > /tmp/db_and_migrations.sh
#!/bin/bash
set -euo pipefail

sudo -i -u postgres -- bash -c "psql -c \"create database microblog;\""
sudo -i -u postgres -- bash -c "psql -c \"create user microblog with encrypted password '$MICROBLOG_PG_USER_PWD';\""
sudo -i -u postgres -- bash -c "psql -c \"grant all privileges on database microblog to microblog;\""

cd microblog
source venv/bin/activate
flask db upgrade

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/db_and_migrations.sh

Supervisor Setup

cat <<-ENDCMDS > /tmp/install_supervisor.sh
#!/bin/bash
set -euo pipefail

sudo easy_install supervisor
sudo mkdir /etc/supervisor
sudo echo_supervisord_conf | sudo tee /etc/supervisor/supervisord.conf
sudo mkdir /etc/supervisor/conf.d
sudo mkdir /var/log/supervisor

# Modify socket file location under sections [unix_http_server] and [supervisorctl]
sudo cp /etc/supervisor/supervisord.conf /tmp
sudo sed -i 's#tmp/supervisor.sock#var/run/supervisor.sock#' /tmp/supervisord.conf

# Modify items under [supervisord] section
sudo sed -i 's#^logfile=/tmp/supervisord.log#logfile=/var/log/supervisord.log#' /tmp/supervisord.conf
sudo sed -i 's#^pidfile=/tmp/supervisord.pid#pidfile=/var/run/supervisord.pid#' /tmp/supervisord.conf
sudo sed -i 's#^;childlogdir=/tmp#childlogdir=/var/log/supervisor#' /tmp/supervisord.conf

# Uncomment [include] section and modify files configuration
sudo sed -i 's#^\;\[include\]#[include]#' /tmp/supervisord.conf
sudo sed -i 's/^\;files.*/files=\/etc\/supervisor\/conf.d\/*.conf/' /tmp/supervisord.conf

sudo mv /tmp/supervisord.conf /etc/supervisor

# Create script for systemctl service
sudo tee /lib/systemd/system/supervisord.service <<-END_SERVICE_SCRIPT
[Unit]
Description=Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=network.target

[Service]
ExecStart=/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
ExecStop=/usr/bin/supervisorctl \\\$OPTIONS shutdown
ExecReload=/usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf \\\$OPTIONS reload
KillMode=process
Restart=on-failure
RestartSec=50s

[Install]
WantedBy=multi-user.target
END_SERVICE_SCRIPT

# Start and enable supervisord
sudo systemctl start supervisord
sudo systemctl enable supervisord

# Add supervisor configuration to monitor gunicorn
sudo tee /etc/supervisor/conf.d/microblog.conf <<-END_GUNI_MONITOR
[program:microblog]
command=/home/ec2-user/microblog/venv/bin/gunicorn -b localhost:8000 -w 4 microblog:app
directory=/home/ec2-user/microblog
user=ec2-user
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
END_GUNI_MONITOR

# Reload supervisor service
sudo supervisorctl reload

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/install_supervisor.sh

Nginx Setup

cat <<-ENDCMDS > /tmp/config_nginx.sh
#!/bin/bash
set -euo pipefail

cd microblog
mkdir certs
openssl req -new -newkey rsa:4096 -days 365 -nodes -x509 -keyout certs/key.pem -out certs/cert.pem -subj "/C=US/O=Microblog, AWS CLI/CN=mbawscli"

sudo tee /etc/nginx/conf.d/microblog.conf <<-END_NGINX_CFG
server {
    listen 80;
    server_name _;
    location / {
        return 301 https://\\\$host\\\$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name _;

    ssl_certificate /home/ec2-user/microblog/certs/cert.pem;
    ssl_certificate_key /home/ec2-user/microblog/certs/key.pem;

    access_log /var/log/microblog_access.log;
    error_log /var/log/microblog_error.log;

    location / {
        proxy_pass http://localhost:8000;
        proxy_redirect off;
        proxy_set_header Host \\\$host;
        proxy_set_header X-Real-IP \\\$remote_addr;
        proxy_set_header X-Forwarded-For \\\$proxy_add_x_forwarded_for;
    }

    location /static {
        alias /home/ec2-user/microblog/app/static;
        expires 30d;
    }
}
END_NGINX_CFG

sudo systemctl start nginx
sudo systemctl enable nginx

ENDCMDS

ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/config_nginx.sh

Deploying an application to AWS using AWS CLI, Part 2 - AWS Account, IAM User & AWS CLI Installation

2021-02-05T00:00:00+00:00

AWS Root Account and IAM User

Create a new AWS root account
Configure the root account
- Setup MFA
- Setup billing alert

IAM User

Create IAM user
- Create an IAM user with “AdministratorAccess” policy
- Give it some name, e.g. Developers
- Allow both console and programmatic access for the user
- Save credentials CSV file to local machine

Install AWS CLI

Installation basics
- Installing on macOS for a single user
- Installing version 2
- Used the AWS CLI User Guide as reference
Installation Steps
- Follow the instructions under Installing the AWS CLI –> AWS CLI version 2 –> macOS
- Copy and save the provided XML template to a file (choices.xml in the installer command below). This XML file specifies the location where aws-cli is installed. I wanted to install the AWS CLI executables in the bin folder under my home directory. The XML after modification looks as follows (I only changed the location of the directory):
```
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
  <plist version="1.0">
  <array>
      <dict>
      <key>choiceAttribute</key>
      <string>customLocation</string>
      <key>attributeSetting</key>
      <string>/Users/your_home_directory/bin</string>
      <key>choiceIdentifier</key>
      <string>default</string>
      </dict>
  </array>
  </plist>
```
- Download install package
```
  curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
```
- Run installer command, specify the XML file that you created in the previous steps:
```
  installer -pkg AWSCLIV2.pkg \
      -target CurrentUserHomeDirectory \
      -applyChoiceChangesXML choices.xml
```
- Create symlinks in a folder that holds your executables or symlinks to executables. I created a new folder to hold all my executable links and added this folder to the PATH variable in my .bash_profile:
```
  mkdir ~/executable_links
```
  Add to ~/.bash_profile:
```
  PATH=$HOME/executable_links:$PATH
```
  Create symlinks:
```
  cd ~/executable_links
  ln -s $HOME/bin/aws-cli/aws .
  ln -s $HOME/bin/aws-cli/aws-completer .
```
- Configure AWS CLI to use the IAM user credentials that we downloaded in an earlier step:
  - Run command aws configure:
    - Enter Access Key ID
    - Enter Secret Access Key
    - Enter region

Install jq

If not already present, install the jq utility on your local machine from the jq website.

Deploying an application to AWS using AWS CLI, Part 3 - Level-1 Architecture

2021-02-05T00:00:00+00:00

Level-1 Architecture

As I mentioned in Part 1 - Introduction, this blog series develops a series of deployment architectures that mirror the recommendations made in the Scaling Up to Your First 10 million Users video.

The original video describes a series of recommendations; I grouped them into a few “levels”. “Level” is my own term, used for the purpose of this blog series.

Create Security Group and Allow Incoming Traffic

Create security group:

aws ec2 create-security-group --group-name "my-sg-level1" \
    --description "Security group, level1" \
    --query 'GroupId' --output text > sg_id.txt

Allow incoming traffic on ports 22, 80 and 443 for ssh, http and https.

aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
    --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
    --protocol tcp --port 443 --cidr 0.0.0.0/0

Create Key Pair

Create a key pair and save the private key file:

aws ec2 create-key-pair --key-name UsEast1KP --query 'KeyMaterial' \
    --output text > acct93user1.pem

Obtain Image ID

Obtain image id for the latest Amazon Linux 2 image:

aws ec2 describe-images --filters "Name=name,Values=amzn2-ami-hvm-2.0*-x86_64-gp2" | jq -r '.Images[].Name' | sort | tail -1 > image_name.txt

IMAGE_NAME=`cat image_name.txt`
aws ec2 describe-images --filters "Name=name,Values=$IMAGE_NAME" | jq -r '.Images[].ImageId' > image_id.txt
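The selection logic of the pipeline above can be illustrated in Python. This is a sketch only: `latest_image` is a hypothetical helper, and the sample JSON is made up to have the same shape as describe-images output (an `Images` list whose entries carry `Name` and `ImageId`). It relies on the same assumption as `sort | tail -1`, namely that the date embedded in the image name makes the newest image sort last.

```python
import json

def latest_image(describe_images_json: str) -> dict:
    """Return the image whose Name sorts last, mirroring `sort | tail -1`."""
    images = json.loads(describe_images_json)["Images"]
    if not images:
        raise ValueError("no images matched the filter")
    return max(images, key=lambda image: image["Name"])

# Made-up sample with the same shape as describe-images output.
sample = json.dumps({
    "Images": [
        {"Name": "amzn2-ami-hvm-2.0.20201218.1-x86_64-gp2", "ImageId": "ami-11111111"},
        {"Name": "amzn2-ami-hvm-2.0.20210126.0-x86_64-gp2", "ImageId": "ami-22222222"},
        {"Name": "amzn2-ami-hvm-2.0.20200917.0-x86_64-gp2", "ImageId": "ami-33333333"},
    ]
})

newest = latest_image(sample)
print(newest["Name"], newest["ImageId"])
```

Picking the name and the image ID from the same record avoids the second describe-images call, at the cost of parsing the JSON outside of jq.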

Launch an Instance, obtain public IP address & instance ID

Launch an EC2 instance:

IMAGE_ID=`cat image_id.txt`
SG_ID=`cat sg_id.txt`
aws ec2 run-instances --image-id $IMAGE_ID --instance-type t2.micro \
    --key-name UsEast1KP --security-group-ids $SG_ID \
    --query 'Instances[0].InstanceId' --output text > instance_id.txt
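The heading above also mentions the public IP address, which Part 3b needs for its -i option. As an illustration of where that value lives, here is a Python sketch that pulls instance IDs and public IPs out of `aws ec2 describe-instances` output. The helper name `instance_summary` and the sample values are hypothetical; the sketch assumes only the `Reservations[].Instances[]` layout with `InstanceId` and `PublicIpAddress` keys.

```python
import json

def instance_summary(describe_instances_json: str) -> list:
    """List (instance id, public IP) pairs found in describe-instances output.

    PublicIpAddress is missing until the instance has been assigned one,
    so None is returned in that case.
    """
    data = json.loads(describe_instances_json)
    pairs = []
    for reservation in data["Reservations"]:
        for instance in reservation["Instances"]:
            pairs.append((instance["InstanceId"], instance.get("PublicIpAddress")))
    return pairs

# Made-up sample: one running instance and one without a public IP yet.
sample = json.dumps({
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0123456789abcdef0",
                        "PublicIpAddress": "192.0.2.10"}]},
        {"Instances": [{"InstanceId": "i-0fedcba9876543210"}]},
    ]
})

for instance_id, public_ip in instance_summary(sample):
    print(instance_id, public_ip)
```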

Install Required Software and Application to EC2 Instance

Follow the instructions in Part 3b to install the required software and the application on the EC2 instance. After completing the steps in Part 3b, come back here and continue with allocating the Elastic IP address.

Allocate and Associate Elastic IP Address

aws ec2 allocate-address

aws ec2 describe-addresses --query "Addresses[0].PublicIp" | sed 's/"//g' > elastic_ip_addr.txt

ELASTIC_IP_ADDR=`cat elastic_ip_addr.txt`
aws ec2 associate-address --instance-id `cat instance_id.txt` --public-ip $ELASTIC_IP_ADDR

Register Domain

Register a domain with a registrar of your choice. I registered my domain with GoDaddy. You can also register a domain using Route53.

The instructions outlined in the following sections are for the scenario when you register your domain using an outside registrar. Some of the steps are different in the case where the domain is registered using Route53.

Save the domain name in a text file domain_name.txt.

Create Hosted Zone

CALLER_REF=$(date +%Y-%m-%d-%H:%M:%S)
DOMAIN_NAME=`cat domain_name.txt`
RETURN_JSON=$(aws route53 create-hosted-zone --name $DOMAIN_NAME --caller-reference $CALLER_REF)
echo $RETURN_JSON | jq '.HostedZone.Id' | sed 's/"//g' | sed 's#/hostedzone/##' > hosted_zone_id.txt
echo $RETURN_JSON | jq '.ChangeInfo.Id' | sed 's/"//g' | sed 's#/change/##' > create_hz_change_id.txt

We create a hosted zone using the domain as the hosted zone’s name. Hosted zone creation also needs a caller reference parameter which is required to be unique. We use a timestamp as caller reference.

We also need to save two pieces of information from the json returned by the create hosted zone request: Hosted Zone Id and Change Id.

Hosted zone creation is treated as a request by Route53. For this reason, we need to save the Change ID information returned by the create-hosted-zone request. We can then use the change id to query the status:

CHANGE_ID=`cat create_hz_change_id.txt`
aws route53 get-change --id $CHANGE_ID --query "ChangeInfo.Status"

Requests to Route53 are assigned an initial status of PENDING. Upon completion of the request, the status changes to INSYNC. Before moving on to the record set creation step, we have to make sure the hosted zone creation request has the INSYNC status.
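Waiting for INSYNC amounts to a simple polling loop. The Python sketch below shows the idea under stated assumptions: `wait_for_insync` is a hypothetical helper, and the status check is passed in as a function so that the loop can be demonstrated with canned values. In practice that function would wrap the get-change command shown above.

```python
import time
from typing import Callable

def wait_for_insync(get_status: Callable[[], str],
                    attempts: int = 30,
                    delay_seconds: float = 10.0) -> bool:
    """Poll until get_status() returns "INSYNC" or the attempts run out."""
    for attempt in range(attempts):
        if get_status() == "INSYNC":
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False

# Stand-in for the real status check: PENDING twice, then INSYNC.
fake_statuses = iter(["PENDING", "PENDING", "INSYNC"])
result = wait_for_insync(lambda: next(fake_statuses), attempts=5, delay_seconds=0)
print(result)
```

Capping the number of attempts keeps a script from hanging forever if the request never completes.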

Create Record Sets

DOMAIN_NAME=`cat domain_name.txt`
aws route53 change-resource-record-sets --hosted-zone-id `cat hosted_zone_id.txt` \
    --change-batch '{
        "Changes": [
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": "'$DOMAIN_NAME'",
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [ { "Value": "'$(cat elastic_ip_addr.txt)'"} ]
                }
            },
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": "www.'$DOMAIN_NAME'",
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": "'$(cat hosted_zone_id.txt)'",
                        "DNSName": "'$DOMAIN_NAME'",
                        "EvaluateTargetHealth": false
                    }
                }
            },
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": "*.'$DOMAIN_NAME'",
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": "'$(cat hosted_zone_id.txt)'",
                        "DNSName": "'$DOMAIN_NAME'",
                        "EvaluateTargetHealth": false
                    }
                }
            }
        ]
    }' > /tmp/record_sets_create_info.txt

cat /tmp/record_sets_create_info.txt | jq '.ChangeInfo.Id' | sed 's/"//g' | sed 's#/change/##' > create_rs_change_id.txt
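Splicing shell variables into a quoted JSON string is easy to get wrong. As an alternative illustration, the change batch could be generated programmatically. The Python sketch below builds the same three changes; `build_change_batch` is a hypothetical helper and the domain, IP address and hosted zone ID are placeholder values.

```python
import json

def build_change_batch(domain: str, elastic_ip: str, hosted_zone_id: str) -> str:
    """Build the --change-batch JSON: one A record plus two alias records."""
    def alias(name: str) -> dict:
        # Alias record pointing at the apex A record in the same hosted zone.
        return {
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": hosted_zone_id,
                    "DNSName": domain,
                    "EvaluateTargetHealth": False,
                },
            },
        }

    changes = [
        {
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": elastic_ip}],
            },
        },
        alias("www." + domain),
        alias("*." + domain),
    ]
    return json.dumps({"Changes": changes}, indent=4)

# Placeholder values; the real ones come from the text files saved earlier.
batch = build_change_batch("example.com", "192.0.2.10", "Z0000000EXAMPLE")
print(batch)
```

The resulting string could be saved to a file and passed to the CLI with the file:// prefix instead of an inline argument.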

Verify that record set creation request has its status set to INSYNC:

CHANGE_ID=`cat create_rs_change_id.txt`
aws route53 get-change --id $CHANGE_ID --query "ChangeInfo.Status"

Get Delegation Set and Update Nameserver Records with Domain Registrar

HOSTED_ZONE_ID=`cat hosted_zone_id.txt`
aws route53 get-hosted-zone --id $HOSTED_ZONE_ID --query "DelegationSet" > delegation_set.txt

The delegation set is a list of name servers that looks similar to the list shown below:

{
    "NameServers": [
        "ns-xxx.awsdns-xx.net",
        "ns-xxx.awsdns-xx.com",
        "ns-xxxx.awsdns-xx.co.uk",
        "ns-xxxx.awsdns-xx.org"
    ]
}

We need to update our domain’s name servers on the domain registrar’s site.

Deploying an application to AWS using AWS CLI, Part 1 - Introduction

2021-02-04T00:00:00+00:00

Introduction to Deploying an Application to AWS using AWS CLI Blog Series

This blog series aims to replicate the architecture recommendations described in the Scaling Up to Your First 10 million Users video presented at AWS re:Invent 2015.

Example Application

I want this blog series to focus on developing infrastructure deployment scripts. Therefore, I decided to use an already available application. I picked the application developed in Miguel Grinberg’s Flask Mega-Tutorial. I list the reasons for picking this application below.

Why Flask Mega-Tutorial

A few reasons why I picked this application:

It uses a Python-based web framework, Flask. Flask is a micro framework that can be learned very quickly.
In my opinion, Grinberg’s tutorial is by far the best tutorial for learning Flask application development. In fact, it is the best tutorial I have seen for learning web application development: not just Python web development, but web development in general.

Application Technology Stack

Application uses:

Python-based Flask application framework
nginx as web server
PostgreSQL database

The application repository setup is described in more detail in Part 3b.

Tools Used for Deployment

AWS CLI
bash
jq

Basic Regular Expression Use In Python

2019-10-30T00:00:00+00:00

Every once in a while, I find the need to use regular expressions in Python programs. Most of the time my needs are simple, such as checking whether a string contains a word that may have its first letter capitalized.

Looking for a pattern at beginning of a string

In Python, regular expression functionality is provided by the re module. The most basic function for regular expression matching is match().

match() accepts two arguments (and an optional third argument which we will not discuss in this post). The first argument is the pattern we are looking for and the second argument is the string we want to search in.

match() looks for the pattern at the beginning of the string.

Using in a conditional

import re

if re.match("[lL]orem", "Lorem ipsum dolor sit amet."):
    print("matched")
else:
    print("not matched")

The above conditional works because:

match() returns None if there is no match
match() returns a match object if there is a match
a match object is always truthy, so a successful match evaluates to True in a conditional
None is falsy, so we can use re.match() directly in a conditional as shown above
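To make the return values visible, the short sketch below stores the results of match() instead of testing them directly. The second input string is made up for illustration; it contains the word, but not at the beginning.

```python
import re

# Pattern at the very start of the string: a match object is returned.
matched = re.match("[lL]orem", "Lorem ipsum dolor sit amet.")

# Same word, but not at the start: match() returns None.
not_matched = re.match("[lL]orem", "ipsum Lorem dolor sit amet.")

print(matched.group())      # the text that matched the pattern
print(not_matched)
print(bool(matched), bool(not_matched))
```

The group() method is available only on a match object, so it should be called after checking that the result is not None.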

Using Search

While match() may satisfy many search requirements, it has one obvious limitation: match() looks for the pattern only at the beginning of the string. The re module provides the search() function, which overcomes this limitation by looking for the pattern anywhere in the string.

As we did with match() above, we can use search() in a simple conditional, as follows:

if re.search("[dD]olor", "Lorem ipsum dolor sit amet."):
    print("match found")
else:
    print("match not found")

Here, we are looking for a pattern that may occur anywhere in the string, not just at the beginning. If the pattern is present, search() returns a match object and the if condition evaluates to True. If there is no match, search() returns None, which is falsy.
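The difference between the two functions is easiest to see by running both on the same input. A minimal sketch:

```python
import re

text = "Lorem ipsum dolor sit amet."
pattern = "[dD]olor"

at_start = re.match(pattern, text)    # only tries the beginning of the string
anywhere = re.search(pattern, text)   # scans the whole string

print(at_start)
print(anywhere.group(), anywhere.start())
```

Here match() finds nothing because the string does not begin with the pattern, while search() reports both the matched text and the position where it starts.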

Basic string match without using regular expressions

While this post is about using regular expressions, simple searches can be done using string operations:

in operator

if "dolor" in "Lorem ipsum dolor sit amet.":
    print("match found")

str.lower()

if "dolor" in "Lorem ipsum DOLOR sit amet.".lower():
    print("match found")

comparison operators

The “==” and “!=” operators can be used to test whether two strings are equal or not. The comparison is case-sensitive.
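A short sketch of the comparison operators, combined with lower() for a case-insensitive comparison. The variable names are only for illustration.

```python
first = "Lorem"
second = "lorem"

exact_equal = first == second                     # case-sensitive comparison
exact_differs = first != second
folded_equal = first.lower() == second.lower()    # compare after lowercasing both

print(exact_equal, exact_differs, folded_equal)
```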

Conclusion

In this post, I wanted to write about basic usage of regular expressions within python programs. Python’s re module provides a lot more functionality than described here.

Using PostgreSQL With Python on AWS Lambda

2019-06-10T00:00:00+00:00

While working on a personal project for setting up a basic data pipeline, described here, I ran into an issue where the psycopg2 library was not available on AWS Lambda. My lambda function uses this library to access data stored in a PostgreSQL RDS instance. It is understandable that the AMI does not include libraries such as psycopg2; it is the lambda function developer’s job to include any dependency libraries that the lambda function needs. AWS provides documentation here on deploying lambda functions with dependency libraries that are not available in the AMI.

In this blog post, I start with the method outlined in the AWS documentation on Lambda deployment package, describe issues encountered and the steps I took to resolve the issues.

1. Create deployment package as described in AWS documentation

In this section, we follow the instructions as outlined in the AWS documentation mentioned above. We use the virtual environment method.

Setup python virtual environment on development machine

On your development machine (Mac in our case), create a python virtual environment (we are using python 3.7.3, the latest version available at the time of writing). In this post, we are assuming you will create the virtual environment directory under your home directory.

$ python3.7 -m venv my_venv

Activate the virtual environment

$ source my_venv/bin/activate

Install psycopg2 library in the virtual environment. Although there are many libraries available for accessing postgreSQL from python, psycopg2 is the most widely used.

$ pip install psycopg2

Create python lambda function script

Create a directory that will be used to hold the lambda script and dependency library:

$ mkdir pypg_lambda

In the directory, create a file to hold your lambda script:

$ cd pypg_lambda
$ touch my_lambda.py

Add the following as the contents of the file my_lambda.py:

my_lambda.py

import sys
import logging
import psycopg2
import json
import os

# rds settings
rds_host  = os.environ.get('RDS_HOST')
rds_username = os.environ.get('RDS_USERNAME')
rds_user_pwd = os.environ.get('RDS_USER_PWD')
rds_db_name = os.environ.get('RDS_DB_NAME')

logger = logging.getLogger()
logger.setLevel(logging.INFO)

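try:
    conn_string = "host=%s user=%s password=%s dbname=%s" % \
                    (rds_host, rds_username, rds_user_pwd, rds_db_name)
    conn = psycopg2.connect(conn_string)
except psycopg2.Error as error:
    # Log the driver's message and stop; the handler cannot work without a connection.
    logger.error("ERROR: Could not connect to Postgres instance: %s", error)
    sys.exit(1)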

logger.info("SUCCESS: Connection to RDS Postgres instance succeeded")

def handler(event, context):

    query = """select id, name, job_title
            from employee
            order by 1"""

    with conn.cursor() as cur:
        rows = []
        cur.execute(query)
        for row in cur:
            rows.append(row)

    return { 'statusCode': 200, 'body': rows }

The above file is an example of a very simple lambda function that fetches rows from a table and returns them when the lambda function is invoked. This program has been adapted from code sample in this tutorial in AWS Lambda documentation.

You need to create an AWS RDS PostgreSQL instance with a database named mydatabase. In this database, a table named employee needs to be created.

-- Employee table

CREATE TABLE employee (
  id        INTEGER     NOT NULL,
  name      VARCHAR(40) NOT NULL,
  job_title VARCHAR(40) NOT NULL,
  PRIMARY KEY (id)
);

Insert a few rows into the employee table:

INSERT INTO employee(id, name, job_title) VALUES
    (1, 'Jack', 'Software Engineer'),
    (2, 'Jill', 'Senior Software Engineer'),
    (3, 'Joe', 'Engineering Manager');

Create deployment package

Enter pypg_lambda directory (if not already there):

$ cd pypg_lambda

Copy the psycopg2 package installed within the virtual environment to pypg_lambda directory:

$ cp -r ~/my_venv/lib/python3.7/site-packages/psycopg2 .

As mentioned previously, we created the virtual environment in the home directory of our development machine. Modify the cp command to suit your directory’s location.

Create the deployment package zip archive:

$ zip -r ../my_lambda.zip .

Create lambda function using the deployment package

Set environment variables related to RDS database instance

$ export RDS_HOST=<database host url>
$ export RDS_USERNAME=<username>
$ export RDS_USER_PWD=<password>
$ export RDS_DB_NAME=mydatabase

Set environment variables related to VPC for use with aws cli command. We can avoid this step by directly typing the details into the aws command. But setting these details as environment variables makes entering the command less tedious.

$ export role_arn=<AWS role arn>
$ export subnet_ids="subnet-xxxxxx,subnet-xxxxxx,..."  # comma separated list
$ export sec_group_id=<security group id>

Create the lambda function:

$ aws lambda create-function --region "us-east-1" \
    --function-name "mylambda"       \
    --zip-file fileb://../my_lambda.zip \
    --handler "my_lambda.handler"     \
    --role "${role_arn}"             \
    --runtime "python3.7"            \
    --timeout 60                     \
    --vpc-config SubnetIds="${subnet_ids}",SecurityGroupIds="${sec_group_id}" \
    --environment Variables="{RDS_HOST=${RDS_HOST},           \
                              RDS_USERNAME=${RDS_USERNAME},   \
                              RDS_USER_PWD=${RDS_USER_PWD},   \
                              RDS_DB_NAME=${RDS_DB_NAME}}"

Invoke the lambda function:

$ aws lambda invoke --function-name mylambda  ~/lambda_output.txt

The following error is encountered on invocation of the lambda function:

Unable to import module 'mylambda': No module named 'psycopg2._psycopg'

The psycopg2 folder under the deployment package folder on our machine contains the following library:

_psycopg.cpython-37m-darwin.so

To explore the possibility that lambda function is looking for _psycopg.so file, we rename the file:

mv _psycopg.cpython-37m-darwin.so _psycopg.so

And redeploy the lambda function:

Copy the psycopg2 directory from the virtual environment to pypg_lambda directory
Create a new zip archive from the deployment package folder pypg_lambda
Delete lambda function using AWS interface
Use aws lambda create-function to deploy using the updated deployment package

We invoke the lambda function again. This time the following error is encountered:

Runtime.ImportModuleError: Unable to import module 'mylambda': /var/task/psycopg2/_psycopg.so: invalid ELF header

We describe how we resolved this error in the next section.

2. Resolving “invalid ELF header” error

Background

As suggested here and here, the “invalid ELF header” error happens due to a mismatch between the machine where the lambda function deployment package is created and the machine where the lambda function is executed. We built the deployment package on a Mac, whereas the execution environment is AWS Lambda’s environment, which is the Amazon Linux AMI.
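This also explains why renaming the file did not help: the format is determined by the file’s contents, not its name. As a rough illustration, the Python sketch below classifies a compiled library by its first four bytes. `binary_format` is a hypothetical helper, it recognizes only ELF and 64-bit little-endian Mach-O files, and the demonstration uses tiny stand-in files rather than real libraries.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"
# 64-bit Mach-O magic number 0xfeedfacf as stored on little-endian Macs.
MACHO_64_MAGIC = b"\xcf\xfa\xed\xfe"

def binary_format(path: str) -> str:
    """Classify a compiled library by the first four bytes of the file."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
    if magic == ELF_MAGIC:
        return "ELF (Linux)"
    if magic == MACHO_64_MAGIC:
        return "Mach-O 64-bit (macOS)"
    return "unknown"

# Demonstrate on two stand-in files that carry only the magic bytes.
with tempfile.TemporaryDirectory() as workdir:
    linux_lib = os.path.join(workdir, "linux.so")
    mac_lib = os.path.join(workdir, "mac.so")
    with open(linux_lib, "wb") as handle:
        handle.write(ELF_MAGIC + b"\x00" * 12)
    with open(mac_lib, "wb") as handle:
        handle.write(MACHO_64_MAGIC + b"\x00" * 12)
    linux_result = binary_format(linux_lib)
    mac_result = binary_format(mac_lib)

print(linux_result)
print(mac_result)
```

Running such a check on the _psycopg library inside the deployment package before zipping it would reveal a macOS build early, without a round trip through Lambda.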

To remove the mismatch, we need to create the deployment package in the same environment that the AWS Lambda function runs in. The simplest approach is to spin up an EC2 instance and install the psycopg2 library in a virtual environment there. The steps we followed are described below:

Create an EC2 instance and connect to it

Launch an EC2 instance on AWS and connect to the instance (replace with the ip address of your instance):

$ ssh -i <aws-key-file> ec2-user@192.0.2.0

Setting up virtual environment on an EC2 instance

Python3 is not available on Amazon Linux, so we need to install it. The following commands install python3 and the other dependencies needed for creating a virtual environment and installing psycopg2 within it. We also install the C compiler here, which we need in a later step:

$ sudo yum install python3
$ sudo yum install gcc python-setuptools python-devel python3-devel
$ sudo yum install postgresql-devel

The above installs python 3.7.3, which is the latest version available at the time of writing.

As described in the previous section, create a virtual environment, activate it and install psycopg2 library:

$ python3 -m venv my_venv
$ source my_venv/bin/activate
$ pip install psycopg2

We now have the psycopg2 package file we need in the virtual environment. You need to copy the package from the EC2 instance to your development machine.

Clean up your deployment package working directory pypg_lambda:

$ cd pypg_lambda
$ rm -r psycopg2

Run the following command on your development machine to copy the package directory to the local machine:

$ scp -r -i <aws-key-file> \
    ec2-user@192.0.2.0:~/my_venv/lib/python3.7/site-packages/psycopg2 .

Create zip archive:

$ zip -r ../my_lambda.zip .
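
If the zip command is not available, roughly the same archive can be produced with Python's standard zipfile module. This is a sketch rather than part of the original workflow; the function name `zip_directory` is made up here. As with the zip command above, the archive path should be outside the directory being zipped, otherwise the archive would try to include itself.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip the contents of src_dir with archive paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store e.g. "psycopg2/__init__.py", not "pypg_lambda/psycopg2/..."
                zf.write(full_path, os.path.relpath(full_path, src_dir))
```

For example, `zip_directory("pypg_lambda", "my_lambda.zip")` run from the parent directory places the lambda script and the psycopg2 directory at the root of the archive.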

Create the lambda function and invoke it

Create the lambda function using the aws lambda create-function command as shown previously and invoke it.

A different error encountered

Running the lambda function generates the following error:

Runtime.ImportModuleError: Unable to import module 'mylambda': libpq.so.5: cannot open shared object file: No such file or directory

While we are still encountering an error, it is no longer the “invalid ELF header” error. We can consider that error resolved and move on to resolving the new one.

3. Resolving “libpq.so.x cannot open shared object file” error

Background

Searching for solutions to the “cannot open shared object file” error led us to this post on AWS forums. This forum post also provides a link to this Github project (we will refer to the Github project by its owner’s name, Jeff Kehler, in the rest of this post).

The solution requires us to link the libpq.so library statically, which in turn requires us to build postgreSQL and psycopg2 from source code.

We pick the following versions of postgreSQL and psycopg2 to build from source code:

picked version 10.0 of postgreSQL, since this is the version used by the Amazon RDS instance
picked version 2.8.3 of psycopg2, since this is the latest version available at the time of writing. I like to start with the latest version and see if it works, then work backwards to older versions if the more recent ones don’t work.

We download source code for postgreSQL and psycopg2 from the following locations:

postgreSQL source downloads
psycopg2 download page, click on source package link to download source code for the latest version

Upload source packages to the EC2 instance:

$ scp -i <aws-key-file> postgresql-10.0.tar ec2-user@192.0.2.0:~
$ scp -i <aws-key-file> psycopg2-2.8.3.tar ec2-user@192.0.2.0:~

Once again, we will be working in the home directory on the EC2 instance. The above commands copied the source code tar archives to EC2 instance’s home directory.

SSH into your EC2 instance and follow the steps below (as outlined in the Jeff Kehler project).

Compiling postgresql from source code

Extract the files from postgreSQL tar package:

$ tar -xf postgresql-10.0.tar

Enter the extracted postgresql source directory:

$ cd postgresql-10.0

Run the following three commands:

$ ./configure --prefix `pwd` --without-readline --without-zlib

In the above command, the argument provided to the prefix option is the absolute path of the postgreSQL source directory. You can type the path (/home/ec2-user/postgresql-10.0) or simply use `pwd` since we are already located in that directory.

$ make

$ make install

Next, build psycopg2 from source code. Once again, the instructions are as outlined in the Jeff Kehler project.

Compiling psycopg2 from source code and statically linking

Extract the files from psycopg2 tar package:

$ tar -xf psycopg2-2.8.3.tar

Enter the extracted psycopg2 source directory:

$ cd psycopg2-2.8.3

Edit setup.cfg file and make following changes:

set pg_config to the path of the pg_config file under the postgresql source directory (it was created there when postgresql was built from source code)
set static_libpq to 1

On our EC2 instance, the modified lines of setup.cfg look like:

...
pg_config = /home/ec2-user/postgresql-10.0/bin/pg_config
...
static_libpq = 1
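
The same edit could also be scripted instead of done by hand. The sketch below uses Python's configparser and assumes that both options live under a `[build_ext]` section of setup.cfg, which is worth checking against your copy of the file. The function name `set_build_options` is made up for this post. Note that configparser drops comments when it writes the file back, so editing by hand may be preferable if you want to keep them.

```python
import configparser
import io


def set_build_options(cfg_text, pg_config_path):
    """Return setup.cfg text with pg_config and static_libpq set for a static build."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumption: psycopg2 reads these options from the [build_ext] section.
    if not parser.has_section("build_ext"):
        parser.add_section("build_ext")
    parser.set("build_ext", "pg_config", pg_config_path)
    parser.set("build_ext", "static_libpq", "1")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

A typical use would be to read setup.cfg, pass its text along with `/home/ec2-user/postgresql-10.0/bin/pg_config` to `set_build_options`, and write the returned text back to setup.cfg.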

Build the library:

$ python3 setup.py build

After completion, a build directory will be created under the psycopg2-2.8.3 directory. Under the build folder there will be a folder with a name similar to lib.linux-x86_64-3.7. Under this folder there will be a folder psycopg2, which is the package we need.

Go back to your development machine and clean up the previous psycopg2 directory:

$ cd pypg_lambda
$ rm -r psycopg2

Copy the psycopg2 directory from the EC2 instance to your development machine. Enter the following command on your development machine:

$ scp -r -i <aws-key-file> \
    ec2-user@192.0.2.0:psycopg2-2.8.3/build/lib.linux-x86_64-3.7/psycopg2 .

Note: the Jeff Kehler project contains a ready-to-use psycopg2 library built for the Amazon Linux AMI. Since the Github repository is about 2 years old, the package is built to work with python 3.6. If you are using python 3.6 for the lambda function, you can download the psycopg2 directory from the project without having to build postgresql and psycopg2 from source code. Since we decided to use the latest python version (3.7 as of this writing), we had to build the library from source code ourselves. (Update, February 2021: the Jeff Kehler project now contains pre-built psycopg2 libraries for python 3.7 and 3.8.)

Create the lambda function and invoke it

As described in the sections above, create the deployment package zip archive, create the lambda function using the deployment package and invoke the lambda function.

Success!

We taste success on our third attempt. The lambda function invocation runs successfully and returns with the expected results:

{
    "statusCode": 200,
    "body": [
                [1, "Jack", "Software Engineer"],
                [2, "Jill", "Senior Software Engineer"],
                [3, "Joe", "Engineering Manager"]
            ]
}

References

AWS document on how to create deployment package in Python
- AWS Lambda Deployment Package in Python
Resolving “invalid ELF header” error
- TG4 Solutions blog post - How to resolve an invalid ELF header error quickly
- Stackoverflow answer
- Amazon Compute Blog post - this post is mainly about node.js, but it talks about building libraries using an EC2 instance.
AWS Lambda supports python 3.7
- AWS Compute Blog post
Accessing database from a Lambda function
- AWS Lambda tutorial
Resolving “libpq.so: cannot open shared object file” error
- AWS forum post - this post contains discussion about this issue and a solution suggested by a forum participant.
- Github project with steps on building psycopg2 library - this Github project is created by the forum participant mentioned in the previous reference. This project provides detailed steps to build postgresql and psycopg2 from source code. If you are using python 3.6, this project contains ready-to-use psycopg2 library built for AWS Lambda.
Links to postgresql and psycopg2 source code downloads
- postgreSQL source downloads
- psycopg2 download page, click on source package link to download source code for the latest version

Tutorial - Setting Up A Basic Data Pipeline

2019-05-23T00:00:00+00:00

A few months ago, I decided to develop a personal project to help me learn data engineering skills. I wrote this tutorial as documentation of my learning experience. I hope the tutorial will be useful to others who might be looking to learn basic data engineering skills.

The approach I took for the project was to implement a basic data pipeline involving the usual steps of a data engineering / data warehousing project. These steps are:

identify data source and acquire data
clean and prepare data
load data into data warehouse

Data Source

To find datasets that I could use for my demo data engineering project, I started with a simple internet search and found this post among several useful hits.

Of the datasets described in the blog post, I picked Walmart Recruiting Store Sales data. Some of the reasons for picking this dataset are:

retail data, easy to understand domain
this data is hosted on Kaggle, very good description of data is provided by Kaggle

Source data located here.

The data provided is historical sales data for 45 Walmart stores for years 2010 thru 2012.

The primary data is weekly department-wise sales amount for each store. Another piece of information included is whether a given week is a holiday week or a regular week. For the purpose of this data, only the following four holidays are considered: Super Bowl, Labor Day, Thanksgiving and Christmas.

In addition, the “features” data provides information such as temperature in the region, fuel price, markdowns, etc.

Of the files available from the data source, I used the following files for this data engineering project:

train.csv
features.csv

Schema Design

Applying dimensional design process on the data yields one dimension and two fact tables. The tables and their fields are listed below:

date dimension
- Id (PK)
- Date
- Is Holiday
- Holiday Name
sales fact
- Store (PK)
- Dept (PK)
- Date Key (PK)
- Weekly Sales
features fact
- Store (PK)
- Date Key (PK)
- Temperature
- Fuel Price
- Markdown 1
- Markdown 2
- Markdown 3
- Markdown 4
- Markdown 5
- CPI
- Unemployment

Data Cleaning and Preparation

I picked Stitch as the ETL tool for this project (I describe the process of selecting the ETL tool in the next section).

In this section, I describe the cleaning and preparation I performed while creating each of the dimension and fact CSV files. Because the ETL tool that I picked is a sync-only tool, I had to perform cleaning of certain data items that would otherwise be performed by an ETL tool (I found Stitch to be great even with this limitation).

Date dimension CSV file generation
- extract only Date and isHoliday columns from the source sales data
- sort on Date field and output only unique rows
- data for a few of the weeks is not available for the years 2010 and 2012. Add rows for the 2010 and 2012 dates that are missing from the data
- insert holiday_name field into data for appropriate rows
- add sequence data as first column, this will serve as primary key of the dimension table created from the CSV file
Sales fact CSV file generation
- delete IsHoliday field
- replace “Date” field with “Date Key” foreign key, lookup date key from date dimension created above
Features fact CSV file generation
- features data contains “NA” for missing data values, since these items are numeric, I convert them to 0.0
- delete IsHoliday field
- similar to what was done for sales fact generation, replace “Date” field with “Date Key” foreign key
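
The core of these steps can be sketched in a few lines of Python. This is a simplified illustration, not the code of the Lambda functions: the function names are made up, the column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `IsHoliday`, `MarkDown1`) are assumptions about the source CSV headers, the sample rows are invented, and the sketch skips adding missing weeks and holiday names. It also assumes dates are in `YYYY-MM-DD` form so that sorting them as strings sorts them chronologically.

```python
import csv
import io


def build_date_dimension(sales_rows):
    """Unique (Date, IsHoliday) pairs, sorted by date, with a sequence id as key."""
    unique = sorted({(r["Date"], r["IsHoliday"]) for r in sales_rows})
    return [
        {"Id": i, "Date": date, "IsHoliday": is_holiday}
        for i, (date, is_holiday) in enumerate(unique, start=1)
    ]


def build_sales_fact(sales_rows, date_dim):
    """Drop IsHoliday and replace Date with the key looked up in the dimension."""
    key_by_date = {d["Date"]: d["Id"] for d in date_dim}
    return [
        {"Store": r["Store"], "Dept": r["Dept"],
         "DateKey": key_by_date[r["Date"]], "Weekly_Sales": r["Weekly_Sales"]}
        for r in sales_rows
    ]


def build_features_fact(feature_rows, date_dim):
    """Convert "NA" to 0.0, drop IsHoliday and replace Date with the date key."""
    key_by_date = {d["Date"]: d["Id"] for d in date_dim}
    cleaned = []
    for r in feature_rows:
        row = {"Store": r["Store"], "DateKey": key_by_date[r["Date"]]}
        for column, value in r.items():
            if column not in ("Store", "Date", "IsHoliday"):
                row[column] = "0.0" if value == "NA" else value
        cleaned.append(row)
    return cleaned


# Invented sample rows, only to exercise the functions above.
SALES_CSV = """Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-12,100.0,TRUE
1,1,2010-02-05,90.0,FALSE
1,2,2010-02-05,50.0,FALSE
"""
FEATURES_CSV = """Store,Date,Temperature,MarkDown1,IsHoliday
1,2010-02-05,42.31,NA,FALSE
"""

sales = list(csv.DictReader(io.StringIO(SALES_CSV)))
features = list(csv.DictReader(io.StringIO(FEATURES_CSV)))
date_dim = build_date_dimension(sales)
sales_fact = build_sales_fact(sales, date_dim)
features_fact = build_features_fact(features, date_dim)
```

Building the date dimension first and passing it to the other two functions mirrors the order of the steps above, since both fact files need the date key lookup.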

Implementation

I implemented data cleaning and preparation using AWS Lambda functions. I upload the source data files to AWS S3 and the Lambda functions download the source data files, clean and prepare CSV files containing dimension & fact data and store the generated files back to S3.

Links to source code of the Lambda functions are:

Data Load

ETL Tool

Stitch is used as ETL tool (link).

Stitch is a sync-only provider. Stitch tool does not provide any transform ability.

Why Stitch

offers a free plan. The only cloud-based ETL tool I found that offers a free plan. (Update: as of December 2020, Stitch Data no longer offers a free plan).
once I started using Stitch, I found the service to be excellent. The free account allows only one destination to be added. This was adequate for my needs.

Setting up data source in Stitch

as mentioned above, after the cleaning and preparation step, the cleaned data files are uploaded to S3.
Stitch provides the ability to use many different types of sources, CSV files stored on AWS S3 is one of the supported sources.
on starting a new integration, I first pick an integration name.
this is used as the schema name in the postgres database.
next, I select AWS S3 CSV integration from the list of integrations presented. Next, I type in my S3 bucket name and file name.
grant access to S3 bucket. Directs me to create an IAM role, provides details such as AWS account id, role name, role policy to use for creating the IAM role.
setup CSV files to table name mapping.
setup integration frequency. Since this project needs just one-time load of data, I pick the default (30 minute) interval. Stitch starts the first load within minutes of setting up the integration. After data load is complete, I turn off the integration.

Setting up destination in Stitch

as mentioned above, Stitch free plan allows only one destination to be setup.
on the user interface for setting up the destination, I pick PostgreSQL as the destination type.
on picking PostgreSQL, I enter details such as RDS host endpoint, port, username, password and database name.
the interface provides a list of IP addresses and directs me to whitelist these IP addresses on my RDS instance.
after entering all details, Stitch checks if it can connect to the database and if successful, creates the destination.

Data Warehouse

The data is loaded into an AWS RDS PostgreSQL instance. The data is organized as a star schema. The Stitch tool creates the dimension and fact tables in the PostgreSQL instance.

Analysis

The following analyses are a sample of possible analyses that can be performed on the data.

Overview

Analysis buttons kick off ajax calls to AWS Lambda functions.
The lambda functions run analysis SQL queries on the postgres database and return the result to the web application.
Chartjs is used for rendering charts.

Implementation

Analysis 1 - Data Availability, Number of Weeks per Year

An extremely simple analysis. Counts the number of weeks for which data is available for each year.
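
A query along these lines could produce that count. The sketch below is an assumption-laden illustration, not the query used by the Lambda function: it uses an in-memory SQLite database as a stand-in for the PostgreSQL instance, the table and column names (`date_dim`, `sales_fact`, `date_key`) are hypothetical, the rows are invented, and the SQL may need adjusting for PostgreSQL date columns. It joins to the sales fact so that dates added to the dimension without any sales data are not counted.

```python
import sqlite3


def weeks_per_year(conn):
    """Count, per year, the weeks that have at least one sales row."""
    query = """
        SELECT substr(d.date, 1, 4) AS year,
               COUNT(DISTINCT s.date_key) AS weeks
        FROM sales_fact s
        JOIN date_dim d ON d.id = s.date_key
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(query).fetchall()


# Invented sample data; SQLite stands in for the RDS PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_dim (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute(
    "CREATE TABLE sales_fact (store INTEGER, dept INTEGER, "
    "date_key INTEGER, weekly_sales REAL)"
)
conn.executemany(
    "INSERT INTO date_dim (id, date) VALUES (?, ?)",
    [(1, "2010-02-05"), (2, "2010-02-12"), (3, "2011-01-07"), (4, "2011-01-14")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 100.0), (1, 2, 1, 50.0), (1, 1, 2, 250.0), (1, 1, 3, 200.0)],
)

print(weeks_per_year(conn))
```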

Analysis 2 - Week-of-Holiday Sales Compared to Annual Weekly Average

Compare the annual weekly average for the entire year to the weekly sales for the weeks that include a holiday.
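
The comparison can be sketched in plain Python as follows. This is one possible reading of the analysis, not the SQL the Lambda function runs: weekly sales are first summed across all stores and departments, the annual weekly average is the mean of those weekly totals, and each holiday week's total is reported next to it. The function name and the input row shape are made up for illustration.

```python
from collections import defaultdict


def holiday_vs_average(rows):
    """rows: iterable of (date, holiday_name_or_None, weekly_sales) tuples.

    Returns {year: {"average": float, "holidays": {holiday_name: total}}}.
    """
    weekly_totals = defaultdict(float)  # date -> sales summed over stores/depts
    holiday_of = {}                     # date -> holiday name
    for date, holiday, sales in rows:
        weekly_totals[date] += sales
        if holiday:
            holiday_of[date] = holiday

    totals_by_year = defaultdict(list)
    for date, total in weekly_totals.items():
        totals_by_year[date[:4]].append(total)

    result = {
        year: {"average": sum(totals) / len(totals), "holidays": {}}
        for year, totals in totals_by_year.items()
    }
    for date, name in holiday_of.items():
        result[date[:4]]["holidays"][name] = weekly_totals[date]
    return result
```

With rows for three weeks of 2010 totalling 150, 250 and 200, where the 250 week is the Super Bowl week, the function reports an average of 200 for 2010 next to a Super Bowl week total of 250.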

Deployment

The project is deployed at the following location:

Data Pipeline Tutorial - Demo

Jekyll Minima Theme - A Few Settings

2018-11-05T00:00:00+00:00

This post is a follow-up of my earlier post about building a jekyll site.

In this post, I will describe a few things that I modified on my site. For example, the footer shows site title twice - I add a _config.yml setting to show blog author name instead of one of the site titles.

Add site description

The blog site that you deployed earlier (as described in the post referenced above) should look pretty good, but we can make a few quick improvements to make it look even better.

Add a variable description to your _config.yml and set its value to a short description of your site.

_config.yml

description: This blog includes posts related to topic A, topic B and topic C.

Try in browser. The description now displays in footer.

Modify title above list of posts

On the Home/Index page, the list of posts is preceded by a title “Posts”. You can change it to a different title. Suppose you want the title to be “Latest Posts”. This is easily achieved by adding a variable to index.html’s front matter:

---
layout: home
list_title: "Latest Posts"
---

Try in browser. The Home page displays “Latest Posts” as the title above list of posts.

Add github, twitter and rss links to footer

Github, Twitter and RSS links can be added to footer by simply setting variables in _config.yml.

The link to RSS feed is already displayed at the end of list of posts on the Home page. But adding a variable in _config.yml and setting it to just rss also adds a link in the footer.

_config.yml

github_username: your_github_username
twitter_username: your_twitter_username
rss: rss

Try in browser. Github, Twitter and RSS links are displayed nicely stacked in the footer.

Modify footer to display blog author name

Now the setting that I mentioned at the beginning of this post - displaying blog author name in the footer, instead of displaying site title twice.

The Minima theme displays the site title a second time, because it looks for site author variable and if the author variable is not present it uses title variable as default value.

So, to make this change, you can simply add the author variable to _config.yml.

author: "Blog Author"

Try in browser. The footer should now display the blog author name instead of displaying the site title a second time.

Google Analytics

You can add Google analytics to your site by adding the following setting to _config.yml:

google_analytics: UA-XXXXXXXX-X

Customizing A Jekyll Theme Layout

2018-10-25T00:00:00+00:00

After I started a blog, as described in this post, I felt it would be cool to convert the author name shown on top of each post to a link to the about page. This was a minor change, but it required me to customize minima theme’s post layout.

I describe the steps that I used to customize minima’s post.html layout in this post.

Customizing theme files

You can modify a jekyll theme’s functionality by copying a specific file from theme gem and then modifying it. Jekyll uses local files of the same name to override the theme behavior. In addition, the local folder name has to be identical to the folder name in gem where you copied the file from.

Create a folder in your site root directory

Since you are modifying the post layout, you need to create a copy of the file in your local site. You need the following steps:

In the root directory of the site, create a _layouts directory:

$ mkdir _layouts

Locate minima gem’s post.html on your computer

You need to determine where Ruby gems are stored on your computer.

You can figure out Ruby gem folder location by running the command gem environment and looking for the value of INSTALLATION DIRECTORY field. The command output should look something like:

$ gem environment

RubyGems Environment:
  ...

  - INSTALLATION DIRECTORY: /path/to/your/ruby/installation/lib/ruby/gems/2.3.0

  ...

The minima gem files are located within this folder:

$ ls -l /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0

The file you want to copy and modify is located within the _layouts folder:

$ ls -l /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0/_layouts
default.html
home.html
page.html
post.html

An alternative way to figure out the location of gem files is by using the command bundle show minima.

Copy the gem post.html to your site

Copy _layouts/post.html from minima ruby gem folder into the local _layouts directory just created. Go to your site’s root directory and then run the following commands:

$ cd _layouts
$ cp /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0/_layouts/post.html .

Make author name a link

Open post.html in an editor and locate the line that you want to modify. The html that you want to modify is shown below:

<span itemprop="author" itemscope itemtype="http://schema.org/Person"><span class="p-author h-card" itemprop="name">{{ page.author }}</span></span>

Add an anchor element around {{ page.author }} as follows:

<span itemprop="author" itemscope itemtype="http://schema.org/Person"><span class="p-author h-card" itemprop="name"><a href="/about.html">{{ page.author }}</a></span></span>

Try in browser. In the post, the author name should now be a link. Clicking on the author name should take you to the about page.

Build A Blog Using Jekyll And Deploy To Github Pages And Set Custom Domain

2018-09-12T00:00:00+00:00

I recently decided to start a blog. I had used Wordpress in the past, so I knew I could get my blog up and running quickly using Wordpress. I was also slightly familiar with Jekyll. Doing a google search and reading a few blog posts educated me on benefits of Jekyll and static sites in general. I explored Jekyll a little more and loved it immediately.

The first thing that appealed to me about Jekyll was how programmer-friendly it was. Creating a site using Jekyll felt very similar to a developer’s day-to-day tasks. Another thing that appealed to me was Jekyll’s integration with GitHub Pages. Finally, free hosting provided by GitHub Pages (along with the ability to set custom domains) tipped my decision towards using Jekyll.

I have written this post to serve as a stand-alone tutorial, while also trying to keep it short. I briefly describe new terms and concepts as I introduce them, but do not go into much detail. Jekyll’s documentation is excellent and working through Quickstart and Step-by-Step Tutorial should provide you good background on Jekyll.

Let’s get started.

Install Ruby Development Environment

You need Ruby development environment setup on your computer. Jekyll documentation provides the requirements list here. In addition, you also need bundler. You can install bundler by using the command gem install bundler.

Install Jekyll

Jekyll is a ruby gem. Install it by running the following command in a terminal:

$ gem install jekyll

Create a new directory for your site

On your computer, create a directory to hold your site:

$ mkdir my-site

Create index.html in the new directory

$ cd my-site

Create index.html with some content, such as:

index.html

<h1>Welcome to my blog.</h1>

Serve the jekyll blog

In a terminal, run the following command

$ jekyll serve

This command generates the site files and runs a local web server at http://localhost:4000.

Install theme gem

You can use a theme to improve your site’s presentation. There is a wide selection of themes to choose from. You can get started with minima theme which is provided by Jekyll. You can install minima gem using the following command:

$ gem install minima

Create Gemfile

Create a file Gemfile in the root directory. Gemfile is used to specify which gems your Jekyll site uses.

Gemfile

source 'https://rubygems.org'
gem 'minima'

Create Jekyll config file and add theme

You also need to set the theme in Jekyll’s configuration file. Jekyll reads configuration from a file named _config.yml in your site’s root directory. Create _config.yml with the following contents:

_config.yml

theme: minima

After making any changes to _config.yml, you need to restart jekyll serve for Jekyll to pick up the configuration changes. Even after restarting jekyll serve, you will notice no difference in your site’s rendering. You will fix this next.

Update index.html to use a layout

In the previous section, you saw that the text of your index page rendered without any styling from the theme. This is happening because Jekyll is treating your index.html as a regular html file. You can tell Jekyll to use the theme’s home layout by adding the following to the top of your index.html:

---
layout: home
---

This is called the front matter. Jekyll does additional processing on any file containing a front matter.

Minima theme provides a home layout which is most suitable for a site’s index page. Among other things home layout adds a list of recent posts to the home page.

Try in browser. The index page of the site now renders in theme.

Add site title to config

With the site rendered in theme, it looks really good. Like most sites, you want your site to have a title too. The minima theme uses title variable’s value (if available) as title of your site. You can set title variable by adding the line title: MyAwesomeBlog to _config.yml. Your _config.yml should look like this now:

_config.yml

theme: minima
title: MyAwesomeBlog

Restart jekyll serve and refresh browser. You will notice that the value for title that you provided in _config.yml now becomes the title of your site.

Create about page

Create an About page by creating an about.md file in the site’s root directory.

about.md

---
layout: page
title: About
---
# About me
This page will contain information about me.

This file uses the page layout provided by the theme.

Try in browser. About link shows up in top bar. Jekyll automatically adds any html or markdown files that are in your root directory to navigation bar, using value of the variable title from the page’s front matter as link text.

Create projects page

Add another page to your site. Create projects.md in the site’s root directory.

projects.md

---
layout: page
title: Projects
---
# Projects
Projects will be listed here.

Try in browser. The navigation bar now shows Projects link too. Clicking on projects takes you to projects page.

With about and projects pages added, the site is in good shape now. Suppose you want to modify the order of items in the navigation bar, with About appearing to the right of Projects.

All top-level pages are added to navigation bar in alphabetical order. Reordering navigation items is easily done by using header_pages configuration setting. Add header_pages to configuration and set its value to a list of pages in the order you wish them to appear.

_config.yml

theme: minima
title: MyAwesomeBlog
header_pages:
  - projects.md
  - about.md

Try in browser. The About and Project items now appear in your preferred order.

Add a blog post

Creating a blog post is as simple as creating a directory and a file within that directory.

Create a folder called _posts in the root directory of your site:

$ mkdir _posts
$ cd _posts

Create a markdown file with year, month and day prefixed to the filename:

2018-09-12-my-first-post.md

---
layout: post
---
This is the contents of this blog post.

Notice that this file contains Jekyll front matter and sets the layout to post, which is another layout provided by minima theme.

Try in browser. The site lists the post you just added. Clicking on the post title takes you to the post. Notice that the hyphen separated text portion of the file name becomes the title of the post. Also notice that a link to RSS feed is added.
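
The file name to title mapping can be approximated with a few lines of Python. This is only an illustration of the naming convention, not Jekyll's actual implementation: the function name is made up, the pattern is stricter than what Jekyll accepts, and capitalizing each word of the hyphen-separated text is an assumption about how the default title is displayed.

```python
import re

# Date prefix, then the hyphen-separated text portion, then the extension.
POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(md|markdown|html)$")


def title_from_filename(filename):
    """Derive a display title from a post file name such as 2018-09-12-my-first-post.md."""
    match = POST_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a valid post file name: {filename}")
    slug = match.group(4)
    return " ".join(word.capitalize() for word in slug.split("-"))


print(title_from_filename("2018-09-12-my-first-post.md"))  # My First Post
```

A file name without the date prefix, such as my-first-post.md, does not match the pattern, which reflects the requirement that the year, month and day be prefixed to the file name.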

Adding author name to post

In the blog post, you will notice that there is no author name being displayed. Jekyll minima theme supports author name setting. It just needs author variable to have a value. You can set author variable in the front matter of your post. Since it is likely that all posts on your site are written just by you, it is simpler to set author once in _config.yml.

Add the following to _config.yml:

defaults:
  - scope:
      path: ""
    values:
      author: "Blog Author"

defaults is a special Jekyll setting that allows you to set front matter defaults. path under scope specifies which files this rule applies to. A blank path means the rule applies to all files in the site.

Restart jekyll serve and refresh browser page. The post now displays author name.

Deploying to GitHub Pages

This section describes how to host your site on GitHub Pages. GitHub allows you to host one user-level site on github pages. The github pages site for your github account should be created in a repository with the name username.github.io, where username is your GitHub username.

create a repository on GitHub with the name username.github.io.
add the github-pages gem to Gemfile. This is a gem provided by GitHub to manage Jekyll and its dependencies. Read this for more details.
Gemfile
```
source 'https://rubygems.org'
gem 'minima'
gem "github-pages", group: :jekyll_plugins
```
commit and push to your repository.
after a couple of minutes you can point your browser to http://username.github.io and you should see your site.

Using a custom domain

You can set a custom domain for your site you just deployed as follows:

purchase a domain name using service of your preference.
in the root directory of your blog site, create a file CNAME.
add the domain name as file’s contents.
CNAME
```
myawesomedomain.com
```
commit and push the changes to your github repository.
to connect the domain name to your site, you need to update ALIAS, A or ANAME records with your domain registrar.
for example, GoDaddy uses A records. If you registered your domain using GoDaddy, you can use IP addresses listed in this article to set A records.
set the www subdomain to redirect to myawesomedomain.com by adding a CNAME record with your domain registrar. This is not to be confused with the CNAME file that you created earlier.
if GoDaddy is your domain registrar, no action needs to be taken. GoDaddy automatically sets the CNAME record.

Add Disqus commenting

Comments are an essential component of any site. minima supports the Disqus commenting system. Comments can be enabled for your posts by setting a configuration parameter. These are the steps to add comments to your site:

sign-up for Disqus Basic account.
on Disqus, add your site as organization (you will use myawesomedomain.com).
in your site’s _config.yml, enable Disqus commenting by adding the following:
_config.yml
```
disqus:
  shortname: <site-shortname-from-disqus>
```
commit and push changes to your github repository.

After github pages regenerates the site in a few minutes, navigate to myawesomedomain.com in your browser. You should see Disqus comments displayed at bottom on your post page. Note that Disqus comments are not displayed when running the site locally using jekyll serve.

Conclusion

This post described how you can deploy your personal blog to GitHub Pages hosted site. We used Jekyll site generator since that is the technology that GitHub Pages uses internally. We saw how easy and quick it was to get a basic site up and running. Creating a post was equally straight-forward. Finally, we applied a custom domain to our site.
