<p>KalyanV: I blog here about software development topics related to projects and tools I happen to be currently working on.</p>
<p>Feed generated by Jekyll, updated 2023-12-28T17:32:21+00:00, https://kalyanv.com/feed.xml</p>
<h1>Deploying an application to AWS using AWS CLI, Part 3b - Application Repository and Deployment to Instance</h1>
<p>Kalyan Vedala, 2021-02-06, https://kalyanv.com/2021/02/06/deploy-application-to-aws-using-aws-cli-part-3b-application</p>
<h2 id="application-repository">Application Repository</h2>
<p>As I mentioned in the <a href="/2021/02/04/deploy-applications-to-aws-using-aws-cli-part-1-introduction.html">Part 1 - Introduction</a> post, I decided to use the application developed as part of Miguel Grinberg’s <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world">Flask Mega-Tutorial</a>.</p>
<p>Since the focus of this blog series is on deploying to AWS, I decided to keep the application simple. Therefore, I cloned the application as it stands at <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-ix-pagination">Chapter 9: Pagination</a> of Grinberg’s series. In addition, I applied the minor modifications needed to use the <code class="language-plaintext highlighter-rouge">python-dotenv</code> package as described in <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xv-a-better-application-structure">Chapter 15</a>.</p>
<p>To summarize:</p>
<ul>
<li>Start with the application, with modifications up to and including Chapter 9 of Grinberg’s tutorial.</li>
<li>Apply changes related to <code class="language-plaintext highlighter-rouge">python-dotenv</code> as described in Chapter 15.</li>
</ul>
<p>I made these modifications and pushed the changes to the <a href="https://github.com/vedala/microblog_cli">microblog_cli</a> repository.</p>
<h2 id="preparing-an-ec2-instance-to-host-the-application">Preparing an EC2 Instance to Host the Application</h2>
<p>For the Level-1 architecture, we install all tiers on a single EC2 instance. We install:</p>
<ul>
<li>Nginx as our web server</li>
<li>Flask as application server / business logic framework</li>
<li>PostgreSQL as database</li>
</ul>
<p>The preparation of the EC2 instance closely follows Grinberg’s tutorial, <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xvii-deployment-on-linux">Chapter 17: Deployment on Linux</a>. The deployment instructions in the rest of this post differ from Chapter 17 in a few ways:</p>
<ul>
<li>We use Amazon Linux 2 for our EC2 instance instead of Ubuntu</li>
<li>We use PostgreSQL database instead of MySQL</li>
<li>We skip the <em>Password-less Logins</em> section</li>
<li>We skip the <em>Secure Your Server</em> section, since we will use AWS’s Security Groups to secure our server</li>
</ul>
<h2 id="executing-scripts-on-an-ec2-instance">Executing Scripts on an EC2 Instance</h2>
<p>The script shown below does the following:</p>
<ul>
<li>Check for required options</li>
<li>Accept keys for the new host to avoid the interactive question</li>
<li>Install the following software:
<ul>
<li>python3 and related libraries</li>
<li>git</li>
<li>nginx</li>
<li>PostgreSQL (repo and libraries)</li>
</ul>
</li>
</ul>
<p>A note about how the script is executed on the remote EC2 instance:</p>
<ul>
<li>The script creates a sub-script to be executed on the remote EC2 instance (let’s call it remote_script).</li>
<li>The remote_script is created using the <code class="language-plaintext highlighter-rouge">cat</code> command along with a heredoc.</li>
<li>The remote_script is then passed as stdin to the ssh command, and the shell that ssh starts on the remote EC2 instance executes it.</li>
</ul>
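<p>The pattern described in the list above can be tried locally before pointing it at a real instance. The following is a minimal sketch, not the deployment script itself: the <code class="language-plaintext highlighter-rouge">greeting</code> variable and the temporary file are made up for illustration, and <code class="language-plaintext highlighter-rouge">bash -s</code> stands in for the ssh command, since both read the commands to run from stdin.</p>

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical local variable that we want baked into the generated script.
greeting="hello from the local machine"

# Build the sub-script with cat and an unquoted heredoc.
# $greeting is expanded now, while the file is being written.
remote_script=$(mktemp)
cat <<ENDCMDS > "$remote_script"
#!/bin/bash
set -euo pipefail
echo "$greeting"
ENDCMDS

# Feed the sub-script to a shell on stdin. "bash -s" is a local stand-in
# for "ssh -i key-file ec2-user@host", which also reads commands from stdin.
output=$(bash -s < "$remote_script")
echo "$output"
rm -f "$remote_script"
```

<p>Because the heredoc delimiter is unquoted, local variables are substituted when the sub-script is written, which is how values such as passwords reach the remote commands in the scripts in this post.</p>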
<h2 id="install-python-git-nginx-and-posgresql-packages">Install Python, Git, Nginx and PostgreSQL Packages</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
set -euo pipefail
ip_address=""
ssh_key_file=""
while getopts "i:k:" opt; do
case "$opt" in
i)
ip_address=$OPTARG
;;
k)
ssh_key_file=$OPTARG
;;
esac
done
if [[ $ip_address == "" || $ssh_key_file == "" ]]; then
echo "$(basename $0): Required options are missing."
echo "Usage: $(basename $0) -i instance-ip -k ssh-key-file"
exit 1
fi
# Accept keys for the new host to avoid the interactive
# question when connecting using ssh for the first time.
ssh-keyscan $ip_address >> ~/.ssh/known_hosts
PG_ADMIN_PWD=`cat pg_admin_pwd.txt`
cat <<-ENDCMDS > /tmp/remote_script.sh
#!/bin/bash
set -euo pipefail
sudo yum -y update
sudo yum -y install python3 python3-venv python3-devel
sudo yum -y install git
# Install nginx
sudo amazon-linux-extras install -y nginx1
#
# Install postgresql 12
#
# Add repo
sudo tee /etc/yum.repos.d/pgdg.repo <<-PGREPO
[pgdg12]
name=PostgreSQL 12 for RHEL/CentOS 7
baseurl=https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-7-x86_64
enabled=1
gpgcheck=0
PGREPO
# Generate metadata cache and install postgresql 12
sudo yum makecache
sudo yum -y install postgresql12 postgresql12-libs postgresql12-server
</code></pre></div></div>
<h2 id="initialize-postgresql-database">Initialize PostgreSQL Database</h2>
<p>Initialize and set up the PostgreSQL database:</p>
<ul>
<li>Initialize the database</li>
<li>Start and enable the database service</li>
<li>Set the admin user’s password</li>
</ul>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo /usr/pgsql-12/bin/postgresql-12-setup initdb
# Start and enable database service
sudo systemctl start postgresql-12
sudo systemctl enable postgresql-12
# Set postgresql admin user's password
sudo -i -u postgres -- bash -c "psql -c \"alter user postgres with password '$PG_ADMIN_PWD'\""
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/remote_script.sh
</code></pre></div></div>
<h2 id="download-application-create--configure-virtual-environment">Download Application, Create & Configure Virtual Environment</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MICROBLOG_PG_USER_PWD=`cat microblog_pg_user_pwd.txt`
cat <<-ENDCMDS > /tmp/app_install.sh
#!/bin/bash
set -euo pipefail
# Download the application source code
git clone https://github.com/vedala/microblog_cli microblog
# Create python virtual environment and install dependencies
cd microblog
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Install gunicorn
pip install gunicorn
echo "PATH=\\\$PATH:/usr/pgsql-12/bin" >> ~/.bash_profile
# Create .env file for environment variables
echo -n "SECRET_KEY=" > .env
python -c 'import uuid; print(uuid.uuid4().hex)' >> .env
echo "DATABASE_URL=postgresql://microblog:$MICROBLOG_PG_USER_PWD@localhost:5432/microblog" >> .env
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/app_install.sh
</code></pre></div></div>
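<p>The echo command that modifies the PATH variable on the remote machine uses
two levels of escaping, because the line is processed by a shell twice. First, the
local shell expands the unquoted heredoc while writing the sub-script. We do not want
$PATH expanded on the local machine, so we put a “\” before the $ sign. Second, the
remote shell processes the echo command when the sub-script runs. We do not want $PATH
expanded then either, since the line we append must read “PATH=$PATH:…”.
To get a backslash through to the remote shell, we add two more backslashes in front of
the one added above: the local heredoc turns “\\” into “\” and “\$” into “$”, so the
sub-script contains “\$PATH”, which the remote echo prints as a literal “$PATH”.</p>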
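<p>The two stages of escaping can be observed locally. This is a small sketch under the assumption that a second local bash process behaves like the shell on the remote instance for this purpose; the temporary file name is generated and not part of the original script.</p>

```shell
#!/bin/bash
set -euo pipefail

# Stage 1: the local, unquoted heredoc. "\\" collapses to "\" and "\$"
# collapses to "$", so the generated file contains \$PATH.
script_file=$(mktemp)
cat <<ENDCMDS > "$script_file"
echo "PATH=\\\$PATH:/usr/pgsql-12/bin"
ENDCMDS
generated=$(cat "$script_file")
echo "generated script line: $generated"

# Stage 2: a second shell (standing in for the remote instance) runs the
# generated line. Inside double quotes, \$ yields a literal dollar sign.
appended=$(bash "$script_file")
echo "line that would be appended: $appended"
rm -f "$script_file"
```

<p>The first message shows the line with a single backslash left in front of $PATH, and the second shows the line with $PATH unexpanded, which is the text that ends up in .bash_profile.</p>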
<h2 id="install-psycopg2">Install psycopg2</h2>
<p>Our Python application needs a driver to access the PostgreSQL database. The library
we will use is psycopg2. Here is how we install it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat <<-ENDCMDS > /tmp/install_psycopg2.sh
#!/bin/bash
set -euo pipefail
cd microblog
source venv/bin/activate
# Install C compiler, needed for "pip install" of psycopg2 postgresql driver
sudo yum -y install gcc
# Install libpq library
sudo yum -y install libpq5 libpq5-devel
# Install wheel package
pip install wheel
# Install postgresql driver
pip install psycopg2
# Use password authentication instead of the default ident authentication.
# Reload the postgres service so the change takes effect.
sudo -u postgres sed 's/ident/md5/' /var/lib/pgsql/12/data/pg_hba.conf | sudo -u postgres tee /tmp/new_pg_hba.conf
sudo -u postgres mv /tmp/new_pg_hba.conf /var/lib/pgsql/12/data/pg_hba.conf
sudo systemctl reload postgresql-12
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/install_psycopg2.sh
</code></pre></div></div>
<h2 id="create-database-schema-and-run-migrations">Create Database Schema and Run Migrations</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MICROBLOG_PG_USER_PWD=`cat microblog_pg_user_pwd.txt`
cat <<-ENDCMDS > /tmp/db_and_migrations.sh
#!/bin/bash
set -euo pipefail
sudo -i -u postgres -- bash -c "psql -c \"create database microblog;\""
sudo -i -u postgres -- bash -c "psql -c \"create user microblog with encrypted password '$MICROBLOG_PG_USER_PWD';\""
sudo -i -u postgres -- bash -c "psql -c \"grant all privileges on database microblog to microblog;\""
cd microblog
source venv/bin/activate
flask db upgrade
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/db_and_migrations.sh
</code></pre></div></div>
<h2 id="supervisor-setup">Supervisor Setup</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat <<-ENDCMDS > /tmp/install_supervisor.sh
#!/bin/bash
set -euo pipefail
sudo easy_install supervisor
sudo mkdir /etc/supervisor
sudo echo_supervisord_conf | sudo tee /etc/supervisor/supervisord.conf
sudo mkdir /etc/supervisor/conf.d
sudo mkdir /var/log/supervisor
# Modify socket file location under sections [unix_http_server] and [supervisorctl]
sudo cp /etc/supervisor/supervisord.conf /tmp
sudo sed -i 's#tmp/supervisor.sock#var/run/supervisor.sock#' /tmp/supervisord.conf
# Modify items under [supervisord] section
sudo sed -i 's#^logfile=/tmp/supervisord.log#logfile=/var/log/supervisord.log#' /tmp/supervisord.conf
sudo sed -i 's#^pidfile=/tmp/supervisord.pid#pidfile=/var/run/supervisord.pid#' /tmp/supervisord.conf
sudo sed -i 's#^;childlogdir=/tmp#childlogdir=/var/log/supervisor#' /tmp/supervisord.conf
# Uncomment [include] section and modify files configuration
sudo sed -i 's#^\;\[include\]#[include]#' /tmp/supervisord.conf
sudo sed -i 's/^\;files.*/files=\/etc\/supervisor\/conf.d\/*.conf/' /tmp/supervisord.conf
sudo mv /tmp/supervisord.conf /etc/supervisor
# Create script for systemctl service
sudo tee /lib/systemd/system/supervisord.service <<-END_SERVICE_SCRIPT
[Unit]
Description=Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=network.target
[Service]
ExecStart=/usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
ExecStop=/usr/bin/supervisorctl \\\$OPTIONS shutdown
ExecReload=/usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf \\\$OPTIONS reload
KillMode=process
Restart=on-failure
RestartSec=50s
[Install]
WantedBy=multi-user.target
END_SERVICE_SCRIPT
# Start and enable supervisord
sudo systemctl start supervisord
sudo systemctl enable supervisord
# Add supervisor configuration to monitor gunicorn
sudo tee /etc/supervisor/conf.d/microblog.conf <<-END_GUNI_MONITOR
[program:microblog]
command=/home/ec2-user/microblog/venv/bin/gunicorn -b localhost:8000 -w 4 microblog:app
directory=/home/ec2-user/microblog
user=ec2-user
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
END_GUNI_MONITOR
# Reload supervisor service
sudo supervisorctl reload
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/install_supervisor.sh
</code></pre></div></div>
<h2 id="nginx-setup">Nginx Setup</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat <<-ENDCMDS > /tmp/config_nginx.sh
#!/bin/bash
set -euo pipefail
cd microblog
mkdir certs
openssl req -new -newkey rsa:4096 -days 365 -nodes -x509 -keyout certs/key.pem -out certs/cert.pem -subj "/C=US/O=Microblog, AWS CLI/CN=mbawscli"
sudo tee /etc/nginx/conf.d/microblog.conf <<-END_NGINX_CFG
server {
listen 80;
server_name _;
location / {
return 301 https://\\\$host\\\$request_uri;
}
}
server {
listen 443 ssl;
server_name _;
ssl_certificate /home/ec2-user/microblog/certs/cert.pem;
ssl_certificate_key /home/ec2-user/microblog/certs/key.pem;
access_log /var/log/microblog_access.log;
error_log /var/log/microblog_error.log;
location / {
proxy_pass http://localhost:8000;
proxy_redirect off;
proxy_set_header Host \\\$host;
proxy_set_header X-Real-IP \\\$remote_addr;
proxy_set_header X-Forwarded-For \\\$proxy_add_x_forwarded_for;
}
location /static {
alias /home/ec2-user/microblog/app/static;
expires 30d;
}
}
END_NGINX_CFG
sudo systemctl start nginx
sudo systemctl enable nginx
ENDCMDS
ssh -i $ssh_key_file ec2-user@$ip_address < /tmp/config_nginx.sh
</code></pre></div></div>
<h1>Deploying an application to AWS using AWS CLI, Part 2 - AWS Account, IAM User & AWS CLI Installation</h1>
<p>Kalyan Vedala, 2021-02-05, https://kalyanv.com/2021/02/05/deploy-application-to-aws-using-aws-cli-part-2-initial-setup</p>
<h2 id="aws-root-account-and-iam-user">AWS Root Account and IAM User</h2>
<ul>
<li>Create a new AWS root account</li>
<li>Configure the root account
<ul>
<li>Setup MFA</li>
<li>Setup billing alert</li>
</ul>
</li>
</ul>
<h2 id="iam-user">IAM User</h2>
<ul>
<li>Create IAM user
<ul>
<li>Create an IAM user with “AdministratorAccess” policy</li>
<li>Give it some name, e.g. Developers</li>
<li>Allow both console and programmatic access for the user</li>
<li>Save credentials CSV file to local machine</li>
</ul>
</li>
</ul>
<h2 id="install-aws-cli">Install AWS CLI</h2>
<ul>
<li>Installation basics
<ul>
<li>Installing on macOS for a single user</li>
<li>Installing version 2</li>
<li>Used this guide as reference <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html">AWS CLI Users Guide</a></li>
</ul>
</li>
<li>Installation Steps
<ul>
<li>
<p>Follow the instructions under <em>Installing the AWS CLI</em> → <em>AWS CLI version 2</em> → <em>macOS</em></p>
</li>
<li>
<p>Copy and save the provided XML template to a file (<code class="language-plaintext highlighter-rouge">choices.xml</code> in the installer command below). This XML file specifies the location where we want aws-cli to be installed. I wanted to install the AWS CLI executables in the <code class="language-plaintext highlighter-rouge">bin</code> folder under my home directory. The XML after modifications looks as follows (I only modified the location of the directory):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<key>choiceAttribute</key>
<string>customLocation</string>
<key>attributeSetting</key>
<string>/Users/your_home_directory/bin</string>
<key>choiceIdentifier</key>
<string>default</string>
</dict>
</array>
</plist>
</code></pre></div> </div>
</li>
<li>
<p>Download install package</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
</code></pre></div> </div>
</li>
<li>
<p>Run installer command, specify the XML file that you created in the previous steps:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> installer -pkg AWSCLIV2.pkg \
-target CurrentUserHomeDirectory \
-applyChoiceChangesXML choices.xml
</code></pre></div> </div>
</li>
<li>
<p>Create symlinks in a folder that holds your executables or symlinks to executables. I created a new folder for my executable links and added it to the PATH variable in my .bash_profile:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> mkdir ~/executable_links
</code></pre></div> </div>
<p><strong>Add to ~/.bash_profile</strong>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> PATH=$HOME/executable_links:$PATH
</code></pre></div> </div>
<p>Create symlinks:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> cd ~/executable_links
ln -s $HOME/bin/aws-cli/aws .
ln -s $HOME/bin/aws-cli/aws-completer .
</code></pre></div> </div>
</li>
<li>
<p>Configure AWS CLI to use the IAM user credentials that we downloaded in an earlier step:</p>
<ul>
<li>Run command <code class="language-plaintext highlighter-rouge">aws configure</code>:
<ul>
<li>Enter Access Key ID</li>
<li>Enter Secret Access Key</li>
<li>Enter region</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2 id="install-jq">Install jq</h2>
<p>If not already present, install <code class="language-plaintext highlighter-rouge">jq</code> utility on your local machine from the <a href="https://stedolan.github.io/jq/download/">jq website</a>.</p>
<h1>Deploying an application to AWS using AWS CLI, Part 3 - Level-1 Architecture</h1>
<p>Kalyan Vedala, 2021-02-05, https://kalyanv.com/2021/02/05/deploy-application-to-aws-using-aws-cli-part-3-level-1</p>
<h2 id="level-1-architecture">Level-1 Architecture</h2>
<p>As I mentioned in <a href="/2021/02/04/deploy-applications-to-aws-using-aws-cli-part-1-introduction.html">Part 1 - Introduction</a>, this blog series develops a series of deployment architectures to mirror the recommendations made in the <a href="https://www.youtube.com/watch?v=vg5onp8TU6Q">Scaling Up to Your First 10 million Users</a> video.</p>
<p>The original video just describes a series of recommendations; I divided the recommendations into a few “levels”. “Level” is the word that I am using for the purpose of this blog series.</p>
<h2 id="create-security-group-and-allow-incoming-traffic">Create Security Group and Allow Incoming Traffic</h2>
<p>Create security group:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 create-security-group --group-name "my-sg-level1" \
--description "Security group, level1" \
--query 'GroupId' --output text > sg_id.txt
</code></pre></div></div>
<p>Allow incoming traffic on ports 22, 80 and 443 for ssh, http and https.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
--protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
--protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name my-sg-level1 \
--protocol tcp --port 443 --cidr 0.0.0.0/0
</code></pre></div></div>
<h2 id="create-key-pair">Create Key Pair</h2>
<p>Create a key pair and save the private key file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 create-key-pair --key-name UsEast1KP --query 'KeyMaterial' \
--output text > acct93user1.pem
# ssh refuses private key files that other users can read
chmod 400 acct93user1.pem
</code></pre></div></div>
<h2 id="obtain-image-id">Obtain Image ID</h2>
<p>Obtain image id for the latest Amazon Linux 2 image:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 describe-images --owners amazon --filters "Name=name,Values=amzn2-ami-hvm-2.0*-x86_64-gp2" | jq -r '.Images[].Name' | sort | tail -1 > image_name.txt
IMAGE_NAME=`cat image_name.txt`
aws ec2 describe-images --owners amazon --filters "Name=name,Values=$IMAGE_NAME" | jq -r '.Images[].ImageId' > image_id.txt
</code></pre></div></div>
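<p>The <code class="language-plaintext highlighter-rouge">sort | tail -1</code> step picks the latest image because the image names embed a zero-padded release date, so sorting the names as text also sorts them by date. A small sketch of that step on its own follows; the three image names are hypothetical examples in the same format, not output captured from AWS.</p>

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical image names in the format that the name filter matches.
names="amzn2-ami-hvm-2.0.20201218.1-x86_64-gp2
amzn2-ami-hvm-2.0.20210126.0-x86_64-gp2
amzn2-ami-hvm-2.0.20200917.0-x86_64-gp2"

# The embedded date is zero-padded (YYYYMMDD), so a byte-wise sort is
# also a chronological sort and the last line is the newest image.
latest=$(printf '%s\n' "$names" | LC_ALL=C sort | tail -1)
echo "$latest"
```

<p>Setting <code class="language-plaintext highlighter-rouge">LC_ALL=C</code> is an addition in this sketch; it keeps the ordering independent of the local machine’s locale settings.</p>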
<h2 id="launch-an-instance-obtain-public-ip-address--instance-id">Launch an Instance, obtain public IP address & instance ID</h2>
<p>Launch an EC2 instance:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IMAGE_ID=`cat image_id.txt`
SG_ID=`cat sg_id.txt`
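aws ec2 run-instances --image-id $IMAGE_ID --instance-type t2.micro --key-name UsEast1KP --security-group-ids $SG_ID --query 'Instances[0].InstanceId' --output text > instance_id.txt
# Wait until the instance is running, then print its public IP address
aws ec2 wait instance-running --instance-ids `cat instance_id.txt`
aws ec2 describe-instances --instance-ids `cat instance_id.txt` --query 'Reservations[0].Instances[0].PublicIpAddress' --output text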
</code></pre></div></div>
<h2 id="install-required-software-and-application-to-ec2-instance">Install Required Software and Application to EC2 Instance</h2>
<p>Follow the instructions in <a href="/2021/02/06/deploy-application-to-aws-using-aws-cli-part-3b-application.html">Part 3b</a> to install the required software and the application on the EC2 instance. After completing the steps in Part 3b, come back here and continue with allocating an Elastic IP address.</p>
<h2 id="allocate-and-associate-elastic-ip-address">Allocate and Associate Elastic IP Address</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 allocate-address
aws ec2 describe-addresses --query "Addresses[0].PublicIp" | sed 's/"//g' > elastic_ip_addr.txt
ELASTIC_IP_ADDR=`cat elastic_ip_addr.txt`
aws ec2 associate-address --instance-id `cat instance_id.txt` --public-ip $ELASTIC_IP_ADDR
</code></pre></div></div>
<h2 id="register-domain">Register Domain</h2>
<p>Register a domain with a registrar of your choice. I registered my domain with GoDaddy.
You can also register a domain using Route53.</p>
<p>The instructions outlined in the following sections are for the scenario when you register
your domain using an outside registrar. Some of the steps are different in the case where
the domain is registered using Route53.</p>
<p>Save the domain name in a text file <code class="language-plaintext highlighter-rouge">domain_name.txt</code>.</p>
<h2 id="create-hosted-zone">Create Hosted Zone</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CALLER_REF=$(date +%Y-%m-%d-%H:%M:%S)
DOMAIN_NAME=`cat domain_name.txt`
RETURN_JSON=$(aws route53 create-hosted-zone --name $DOMAIN_NAME --caller-reference $CALLER_REF)
echo $RETURN_JSON | jq '.HostedZone.Id' | sed 's/"//g' | sed 's#/hostedzone/##' > hosted_zone_id.txt
echo $RETURN_JSON | jq '.ChangeInfo.Id' | sed 's/"//g' | sed 's#/change/##' > create_hz_change_id.txt
</code></pre></div></div>
<p>We create a hosted zone using the domain as the hosted zone’s name. Hosted zone creation
also needs a caller reference parameter which is required to be unique. We use a timestamp
as caller reference.</p>
<p>We also need to save two pieces of information from the json returned by the create
hosted zone request: Hosted Zone Id and Change Id.</p>
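<p>Both IDs come back with a path-like prefix (<code class="language-plaintext highlighter-rouge">/hostedzone/</code> and <code class="language-plaintext highlighter-rouge">/change/</code>), which the sed commands above remove. As an alternative sketch, the same trimming can be done with bash parameter expansion. The ID values below are made-up placeholders, not real Route53 identifiers.</p>

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical values shaped like the Id fields in the response.
raw_zone_id="/hostedzone/Z0123456789EXAMPLE"
raw_change_id="/change/C0123456789EXAMPLE"

# ${var#prefix} removes the shortest match of prefix from the start.
hosted_zone_id="${raw_zone_id#/hostedzone/}"
change_id="${raw_change_id#/change/}"

echo "$hosted_zone_id"
echo "$change_id"
```

<p>This avoids one sed process per ID, at the cost of first reading the raw value into a shell variable.</p>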
<p>Route53 processes hosted zone creation asynchronously, as a change request. For this reason, we save
the Change ID returned by the <code class="language-plaintext highlighter-rouge">create-hosted-zone</code> request. We can then use the
change ID to query the status of the request:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CHANGE_ID=`cat create_hz_change_id.txt`
aws route53 get-change --id $CHANGE_ID --query "ChangeInfo.Status"
</code></pre></div></div>
<p>Requests to Route53 are assigned an initial status of <code class="language-plaintext highlighter-rouge">PENDING</code>. Upon completion of the
request, the status changes to <code class="language-plaintext highlighter-rouge">INSYNC</code>. Before moving on to the record set creation step,
we have to make sure the hosted zone creation request has the <code class="language-plaintext highlighter-rouge">INSYNC</code> status.</p>
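<p>Rather than re-running the status query by hand, the check can be wrapped in a polling loop. The function below is a sketch: the name <code class="language-plaintext highlighter-rouge">wait_for_insync</code>, the attempt limit and the sleep interval are assumptions made for this example, and only the <code class="language-plaintext highlighter-rouge">get-change</code> call comes from the commands shown earlier.</p>

```shell
#!/bin/bash
set -euo pipefail

# wait_for_insync CHANGE_ID [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Polls "aws route53 get-change" until the status is INSYNC.
# Returns 0 on success, 1 if the status never became INSYNC.
wait_for_insync() {
  local change_id=$1
  local max_attempts=${2:-30}   # assumed default, adjust as needed
  local sleep_seconds=${3:-10}  # assumed default, adjust as needed
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(aws route53 get-change --id "$change_id" \
      --query "ChangeInfo.Status" --output text)
    if [[ $status == "INSYNC" ]]; then
      echo "Change $change_id is INSYNC"
      return 0
    fi
    echo "Attempt $attempt: status is $status, waiting..."
    sleep "$sleep_seconds"
  done
  echo "Change $change_id did not reach INSYNC" >&2
  return 1
}
```

<p>It could be called as <code class="language-plaintext highlighter-rouge">wait_for_insync "$(cat create_hz_change_id.txt)"</code> before moving on to the record set creation step. The AWS CLI also provides a built-in waiter, <code class="language-plaintext highlighter-rouge">aws route53 wait resource-record-sets-changed --id</code>, which may be preferable when progress messages are not needed.</p>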
<h2 id="create-record-sets">Create Record Sets</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DOMAIN_NAME=`cat domain_name.txt`
aws route53 change-resource-record-sets --hosted-zone-id `cat hosted_zone_id.txt` \
--change-batch '{
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "'$DOMAIN_NAME'",
"Type": "A",
"TTL": 60,
"ResourceRecords": [ { "Value": "'$(cat elastic_ip_addr.txt)'"} ]
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "www.'$DOMAIN_NAME'",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "'$(cat hosted_zone_id.txt)'",
"DNSName": "'$DOMAIN_NAME'",
"EvaluateTargetHealth": false
}
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "*.'$DOMAIN_NAME'",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "'$(cat hosted_zone_id.txt)'",
"DNSName": "'$DOMAIN_NAME'", "EvaluateTargetHealth": false
}
}
}
]
}' > /tmp/record_sets_create_info.txt
cat /tmp/record_sets_create_info.txt | jq '.ChangeInfo.Id' | sed 's/"//g' | sed 's#/change/##' > create_rs_change_id.txt
</code></pre></div></div>
<p>Verify that the record set creation request has reached the <code class="language-plaintext highlighter-rouge">INSYNC</code> status:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CHANGE_ID=`cat create_rs_change_id.txt`
aws route53 get-change --id $CHANGE_ID --query "ChangeInfo.Status"
</code></pre></div></div>
<h2 id="get-delegation-set-and-update-nameserver-records-with-domain-registrar">Get Delegation Set and Update Nameserver Records with Domain Registrar</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>HOSTED_ZONE_ID=`cat hosted_zone_id.txt`
aws route53 get-hosted-zone --id $HOSTED_ZONE_ID --query "DelegationSet" > delegation_set.txt
</code></pre></div></div>
<p>The delegation set is a list of name servers that looks similar to the list shown below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"NameServers": [
"ns-xxx.awsdns-xx.net",
"ns-xxx.awsdns-xx.com",
"ns-xxxx.awsdns-xx.co.uk",
"ns-xxxx.awsdns-xx.org"
]
}
</code></pre></div></div>
<p>We need to update our domain’s name servers on the domain registrar’s site.</p>
<h1>Deploying an application to AWS using AWS CLI, Part 1 - Introduction</h1>
<p>Kalyan Vedala, 2021-02-04, https://kalyanv.com/2021/02/04/deploy-applications-to-aws-using-aws-cli-part-1-introduction</p>
<h2 id="introduction-to-deploying-an-application-to-aws-using-aws-cli-blog-series">Introduction to Deploying an Application to AWS using AWS CLI Blog Series</h2>
<p>This blog series aims to replicate the architecture recommendations described in the <a href="https://www.youtube.com/watch?v=vg5onp8TU6Q">Scaling Up to Your First 10 million Users</a> video presented at re:Invent 2015.</p>
<h2 id="example-application">Example Application</h2>
<p>I want this blog series to focus on developing infrastructure deployment scripts. Therefore, I decided to use an already available application. I picked the application developed in Miguel Grinberg’s Flask Mega-Tutorial. I list the reasons for picking this application below.</p>
<h2 id="why-flask-mega-tutorial">Why Flask Mega-Tutorial</h2>
<p>A few reasons why I picked this application:</p>
<ul>
<li>It uses a Python-based web framework, Flask. Flask is a micro framework that can be learned very quickly.</li>
<li>In my opinion, Grinberg’s tutorial is by far the best tutorial for learning Flask application development. In fact, it is the best tutorial I have seen for learning web application development: not just Python web development, but web development in general.</li>
</ul>
<h2 id="application-technology-stack">Application Technology Stack</h2>
<p>Application uses:</p>
<ul>
<li>Python-based Flask application framework</li>
<li>nginx as web server</li>
<li>PostgreSQL database</li>
</ul>
<p>The application repository setup is described in more detail in <a href="/2021/02/06/deploy-application-to-aws-using-aws-cli-part-3b-application.html">Part 3b</a>.</p>
<h2 id="tools-used-for-deployment">Tools Used for Deployment</h2>
<ul>
<li>AWS CLI</li>
<li>bash</li>
<li>jq</li>
</ul>
<h1>Basic Regular Expression Use In Python</h1>
<p>Kalyan Vedala, 2019-10-30, https://kalyanv.com/2019/10/30/basic-regular-expression-use-in-python</p>
<p>Every once in a while, I find the need to use regular expressions in Python
programs. Most of the time, my needs are simple, such as: check if a string
contains a word, where the word may have first letter capitalized.</p>
<h2 id="looking-for-a-pattern-at-beginning-of-a-string">Looking for a pattern at beginning of a string</h2>
<p>In Python, regular expression functionality is provided by the <code class="language-plaintext highlighter-rouge">re</code> module. The most
basic function for regular expression matching is the <code class="language-plaintext highlighter-rouge">match()</code> function.</p>
<p><code class="language-plaintext highlighter-rouge">match()</code> accepts two arguments (and an optional third argument which we will not
discuss in this post). The first argument is the pattern we are looking for and
the second argument is the string we want to search in.</p>
<p><code class="language-plaintext highlighter-rouge">match()</code> looks for the pattern at the beginning of the string.</p>
<h2 id="using-in-a-conditional">Using in a conditional</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re
if re.match("[lL]orem", "Lorem ipsum dolor sit amet."):
print("matched")
else:
print("not matched")
</code></pre></div></div>
<p>The above conditional works because:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">match()</code> returns <code class="language-plaintext highlighter-rouge">None</code> if there is no match</li>
<li>it returns a <code class="language-plaintext highlighter-rouge">match</code> object if there is a match</li>
<li>a <code class="language-plaintext highlighter-rouge">match</code> object is always truthy, so the condition evaluates as <code class="language-plaintext highlighter-rouge">True</code></li>
<li>since <code class="language-plaintext highlighter-rouge">None</code> is falsy (it is treated as <code class="language-plaintext highlighter-rouge">False</code> in a boolean context), we can use <code class="language-plaintext highlighter-rouge">re.match()</code> directly in a conditional as shown above.</li>
</ul>
<h2 id="using-search">Using Search</h2>
<p>While <code class="language-plaintext highlighter-rouge">match()</code> may satisfy many search requirements, it has one obvious
limitation - <code class="language-plaintext highlighter-rouge">match()</code> looks for pattern only at the beginning of the string.
The <code class="language-plaintext highlighter-rouge">re</code> module provides the function <code class="language-plaintext highlighter-rouge">search()</code> which overcomes this limitation.
Function <code class="language-plaintext highlighter-rouge">search()</code> looks for the pattern anywhere in the string.</p>
<p>As we did with match() above, we can use <code class="language-plaintext highlighter-rouge">search()</code> in a simple conditional, as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if re.search("[dD]olor", "Lorem ipsum dolor sit amet."):
print("match found")
else:
print("match not found")
</code></pre></div></div>
<p>Here, we are looking for the pattern anywhere in the string, not just at
the beginning. If the pattern is present, <code class="language-plaintext highlighter-rouge">search()</code> returns a <code class="language-plaintext highlighter-rouge">match</code> object and the <code class="language-plaintext highlighter-rouge">if</code> condition evaluates as <code class="language-plaintext highlighter-rouge">True</code>.
If there is no match, <code class="language-plaintext highlighter-rouge">search()</code> returns <code class="language-plaintext highlighter-rouge">None</code>, which is falsy,
so the <code class="language-plaintext highlighter-rouge">else</code> branch runs.</p>
<h2 id="basic-string-match-without-using-regular-expressions">Basic string match without using regular expressions</h2>
<p>While this post is about using regular expressions, simple searches can be done
using string operations:</p>
<p><ins>in operator</ins></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if "dolor" in "Lorem ipsum dolor sit amet.":
    print("match found")
</code></pre></div></div>
<p><ins>string.lower()</ins></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if "dolor" in "Lorem ipsum DOLOR sit amet.".lower():
    print("match found")
</code></pre></div></div>
<p><ins>comparison operators</ins></p>
<ul>
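<li>the <code class="language-plaintext highlighter-rouge">==</code> and <code class="language-plaintext highlighter-rouge">!=</code> operators can be used to test whether two strings are equal or not equal.</li>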
</ul>
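<p>The three string operations above can be combined into a small helper. This is only a sketch; the function name <code class="language-plaintext highlighter-rouge">contains_ignoring_case</code> is made up for this example.</p>

```python
def contains_ignoring_case(text, word):
    """Return True if word occurs anywhere in text, ignoring case."""
    return word.lower() in text.lower()


sentence = "Lorem ipsum DOLOR sit amet."

print(contains_ignoring_case(sentence, "dolor"))  # True
print("dolor" in sentence)                        # False, "in" is case-sensitive
print("dolor" == "Dolor")                         # False
print("dolor" == "Dolor".lower())                 # True
```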
<h2 id="conclusion">Conclusion</h2>
<p>In this post, I wanted to write about basic usage of regular expressions within
python programs. Python’s <code class="language-plaintext highlighter-rouge">re</code> module provides a lot more functionality than
described here.</p>Kalyan VedalaEvery once in a while, I find the need to use regular expressions in Python programs. Most of the time, my needs are simple, such as: check if a string contains a word, where the word may have first letter capitalized.Using PostgreSQL With Python on AWS Lambda2019-06-10T00:00:00+00:002019-06-10T00:00:00+00:00https://kalyanv.com/2019/06/10/using-postgresql-with-python-on-aws-lambda<p>While working on a personal project for setting up a basic data pipeline, described
<a href="/2019/05/23/tutorial-a-basic-data-pipeline.html">here</a>,
I ran into an issue where the psycopg2 library was not available
on AWS Lambda. My lambda function uses this library to access data stored in
a PostgreSQL RDS instance. It is understandable that the AMI does not include
libraries such as psycopg2; it is the lambda function developer’s job to include
any dependency libraries that the lambda function needs. AWS provides documentation
<a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html">here</a>
on deploying lambda functions with dependency libraries that are not
available in the AMI.</p>
<p>In this blog post, I start with the method outlined
in the AWS documentation on Lambda deployment package, describe issues
encountered and the steps I took to resolve the issues.</p>
<h2 id="1-create-deployment-package-as-described-in-aws-documentation">1. Create deployment package as described in AWS documentation</h2>
<p>In this section, we follow the instructions as outlined in the AWS documentation
mentioned above. We use the virtual environment method.</p>
<h3 id="setup-python-virtual-environment-on-development-machine">Setup python virtual environment on development machine</h3>
<p>On your development machine (Mac in our case), create a python virtual environment
(we are using python 3.7.3, the latest version available at the time of writing).
In this post, we are assuming you will create the virtual environment directory
under your home directory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3.7 -m venv my_venv
</code></pre></div></div>
<p>Activate the virtual environment</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ source my_venv/bin/activate
</code></pre></div></div>
<p>Install the psycopg2 library in the virtual environment. Although there are many libraries
available for accessing PostgreSQL from python, psycopg2 is the most widely used.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pip install psycopg2
</code></pre></div></div>
<h3 id="create-python-lambda-function-script">Create python lambda function script</h3>
<p>Create a directory that will be used to hold the lambda script and dependency library:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir pypg_lambda
</code></pre></div></div>
<p>In the directory, create a file to hold your lambda script:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd pypg_lambda
$ touch my_lambda.py
</code></pre></div></div>
<p>Add following as contents of the file <code class="language-plaintext highlighter-rouge">my_lambda.py</code>:</p>
<p><strong>my_lambda.py</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sys
import logging
import psycopg2
import json
import os

# rds settings
rds_host = os.environ.get('RDS_HOST')
rds_username = os.environ.get('RDS_USERNAME')
rds_user_pwd = os.environ.get('RDS_USER_PWD')
rds_db_name = os.environ.get('RDS_DB_NAME')

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Connect once at import time so warm invocations reuse the connection.
try:
    conn_string = "host=%s user=%s password=%s dbname=%s" % \
        (rds_host, rds_username, rds_user_pwd, rds_db_name)
    conn = psycopg2.connect(conn_string)
except psycopg2.Error:
    logger.error("ERROR: Could not connect to Postgres instance.")
    sys.exit()

logger.info("SUCCESS: Connection to RDS Postgres instance succeeded")

def handler(event, context):
    query = """select id, name, job_title
               from employee
               order by 1"""

    # Collect every row of the result set into a list.
    with conn.cursor() as cur:
        rows = []
        cur.execute(query)
        for row in cur:
            rows.append(row)

    return { 'statusCode': 200, 'body': rows }
</code></pre></div></div>
<p>The above file is an example of a very simple lambda function that fetches
rows from a table and returns them when the lambda function is invoked. This
program has been adapted from code sample in
<a href="https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds.html">this</a>
tutorial in AWS Lambda documentation.</p>
<p>You need to create an AWS RDS PostgreSQL instance with a database <code class="language-plaintext highlighter-rouge">mydatabase</code>.
In this database, a table <code class="language-plaintext highlighter-rouge">employee</code> needs to be created.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Employee table
CREATE TABLE employee (
id INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
job_title VARCHAR(40) NOT NULL,
PRIMARY KEY (id)
);
</code></pre></div></div>
<p>Insert a few rows into the <code class="language-plaintext highlighter-rouge">employee</code> table:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INSERT INTO employee(id, name, job_title) VALUES
(1, 'Jack', 'Software Engineer'),
(2, 'Jill', 'Senior Software Engineer'),
(3, 'Joe', 'Engineering Manager');
</code></pre></div></div>
<h3 id="create-deployment-package">Create deployment package</h3>
<p>Enter <code class="language-plaintext highlighter-rouge">pypg_lambda</code> directory (if not already there):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd pypg_lambda
</code></pre></div></div>
<p>Copy the psycopg2 package installed within the virtual environment to
<code class="language-plaintext highlighter-rouge">pypg_lambda</code> directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cp -r ~/my_venv/lib/python3.7/site-packages/psycopg2 .
</code></pre></div></div>
<p>As mentioned previously, we created the virtual environment in the home directory
of our development machine. Modify the <code class="language-plaintext highlighter-rouge">cp</code> command to suit your directory’s
location.</p>
<p>Create the deployment package zip archive:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ zip -r ../my_lambda.zip .
</code></pre></div></div>
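<p>Lambda resolves the handler <code class="language-plaintext highlighter-rouge">my_lambda.handler</code> against the root of the archive, so it is worth confirming that both <code class="language-plaintext highlighter-rouge">my_lambda.py</code> and the <code class="language-plaintext highlighter-rouge">psycopg2</code> directory sit at the top level of the zip file. The following is a minimal sketch of such a check; the function names are made up for this example.</p>

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


def looks_deployable(zip_path, handler_module="my_lambda", package="psycopg2"):
    """Check that the handler file and the dependency package are at the root."""
    entries = top_level_entries(zip_path)
    return f"{handler_module}.py" in entries and package in entries
```

<p>Calling <code class="language-plaintext highlighter-rouge">looks_deployable("../my_lambda.zip")</code> from the <code class="language-plaintext highlighter-rouge">pypg_lambda</code> directory should return <code class="language-plaintext highlighter-rouge">True</code> for the archive created above. It only checks the layout, not whether the library will actually load on Lambda.</p>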
<h3 id="create-lambda-function-using-the-deployment-package">Create lambda function using the deployment package</h3>
<p>Set environment variables related to RDS database instance</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export RDS_HOST=<database host url>
$ export RDS_USERNAME=<username>
$ export RDS_USER_PWD=<password>
$ export RDS_DB_NAME=mydatabase
</code></pre></div></div>
<p>Set environment variables related to VPC for use with <code class="language-plaintext highlighter-rouge">aws</code> cli command.
We can avoid this step by directly typing the details into the <code class="language-plaintext highlighter-rouge">aws</code> command.
But setting these details as environment variables makes entering the
command less tedious.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export role_arn=<AWS role arn>
$ export subnet_ids="subnet-xxxxxx,subnet-xxxxxx,..." # comma separated list
$ export sec_group_id=<security group id>
</code></pre></div></div>
<p>Create the lambda function:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ aws lambda create-function --region "us-east-1" \
    --function-name "mylambda" \
    --zip-file fileb://../my_lambda.zip \
    --handler "my_lambda.handler" \
    --role "${role_arn}" \
    --runtime "python3.7" \
    --timeout 60 \
    --vpc-config SubnetIds="${subnet_ids}",SecurityGroupIds="${sec_group_id}" \
    --environment "Variables={RDS_HOST=${RDS_HOST},RDS_USERNAME=${RDS_USERNAME},RDS_USER_PWD=${RDS_USER_PWD},RDS_DB_NAME=${RDS_DB_NAME}}"
</code></pre></div></div>
<p>Invoke the lambda function:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ aws lambda invoke --function-name mylambda ~/lambda_output.txt
</code></pre></div></div>
<p>The following error is encountered on invocation of the lambda function:</p>
<p><code class="language-plaintext highlighter-rouge">Unable to import module 'mylambda': No module named 'psycopg2._psycopg'</code></p>
<p>The psycopg2 folder under the deployment package folder on our machine contains
the following library:</p>
<p><code class="language-plaintext highlighter-rouge">_psycopg.cpython-37m-darwin.so</code></p>
<p>To explore the possibility that lambda function is looking for <code class="language-plaintext highlighter-rouge">_psycopg.so</code>
file, we rename the file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mv _psycopg.cpython-37m-darwin.so _psycopg.so
</code></pre></div></div>
<p>And redeploy the lambda function:</p>
<ul>
<li>Copy the psycopg2 directory from the virtual environment to <code class="language-plaintext highlighter-rouge">pypg_lambda</code> directory</li>
<li>Create a new zip archive from the deployment package folder <code class="language-plaintext highlighter-rouge">pypg_lambda</code></li>
<li>Delete lambda function using AWS interface</li>
<li>Use <code class="language-plaintext highlighter-rouge">aws lambda create-function</code> to deploy using the updated deployment package</li>
</ul>
<p>We invoke the lambda function again. This time, the following error is encountered:</p>
<p><code class="language-plaintext highlighter-rouge">Runtime.ImportModuleError: Unable to import module 'mylambda': /var/task/psycopg2/_psycopg.so: invalid ELF header</code></p>
<p>We describe how we resolved this error in the next section.</p>
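<p>As an aside, the two attempts fail differently because the interpreter only tries a fixed list of file name suffixes when importing an extension module. The sketch below illustrates this. The <code class="language-plaintext highlighter-rouge">linux_37</code> list is an assumption about what a CPython 3.7 build on 64-bit Linux typically accepts, and the function name is made up for this example.</p>

```python
import importlib.machinery


def importable_as(filename, module_name, suffixes=None):
    """Return True if filename is a name the import system would try for module_name."""
    if suffixes is None:
        # Suffixes accepted by the interpreter running this script.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any(filename == module_name + suffix for suffix in suffixes)


# Assumed suffix list for CPython 3.7 on 64-bit Linux (illustrative only).
linux_37 = [".cpython-37m-x86_64-linux-gnu.so", ".abi3.so", ".so"]

print(importable_as("_psycopg.cpython-37m-darwin.so", "_psycopg", linux_37))  # False
print(importable_as("_psycopg.so", "_psycopg", linux_37))                     # True
```

<p>With the original name, the file is never considered, hence “No module named”. After the rename, the file is found and loaded, and the loader then rejects its contents.</p>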
<h2 id="2-resolving-invalid-elf-header-error">2. Resolving “invalid ELF header” error</h2>
<h3 id="background">Background</h3>
<p>As suggested <a href="https://tg4.solutions/how-to-resolve-invalid-elf-header-error/">here</a>
and <a href="https://stackoverflow.com/a/34885155/3137099">here</a>,
the “invalid ELF header” error happens due to a mismatch between
the machine where the lambda function deployment package is created and the machine
where the lambda function is executed. We built the deployment package on a Mac,
whereas the execution environment is AWS Lambda’s environment, which is the
Amazon Linux AMI.</p>
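<p>The mismatch can be seen directly in the first four bytes of the library file. Linux shared objects start with the ELF magic number, while libraries built on a Mac use the Mach-O format. The following is a rough sketch of such a check; the function name is made up for this example.</p>

```python
def binary_format(path):
    """Guess the executable format of a file from its first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "ELF"
    # 64-bit and 32-bit Mach-O files as stored on little-endian machines.
    if magic in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"):
        return "Mach-O"
    # Universal (fat) Mach-O binary; Java class files share this magic number.
    if magic == b"\xca\xfe\xba\xbe":
        return "Mach-O universal"
    return "unknown"
```

<p>Running <code class="language-plaintext highlighter-rouge">binary_format("psycopg2/_psycopg.so")</code> on the file built on the Mac should report a Mach-O variant, which is why the Linux loader complains about the ELF header.</p>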
<p>To remove the mismatch, we need to create the deployment package in the same
environment as the one the AWS Lambda function runs in. The simplest approach is to spin
up an EC2 instance and install the <code class="language-plaintext highlighter-rouge">psycopg2</code> library in a virtual environment there.
Described below are the steps we followed to do this:</p>
<h3 id="create-an-ec2-instance-and-connect-to-it">Create an EC2 instance and connect to it</h3>
<p>Launch an EC2 instance on AWS and connect to the instance (replace with the ip
address of your instance):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ssh -i <aws-key-file> ec2-user@192.0.2.0
</code></pre></div></div>
<h3 id="setting-up-virtual-environment-on-an-ec2-instance">Setting up virtual environment on an EC2 instance</h3>
<p>Python3 is not preinstalled on our Amazon Linux instance, so we need to install it. The following
commands will install python3 and other dependencies needed for creating
a virtual environment and installing <code class="language-plaintext highlighter-rouge">psycopg2</code> within the virtual environment.
We are also installing the C compiler here, which we need in a later step:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo yum install python3
$ sudo yum install gcc python-setuptools python-devel python3-devel
$ sudo yum install postgresql-devel
</code></pre></div></div>
<p>The above installs python 3.7.3, which is the latest version available at
the time of writing.</p>
<p>As described in the previous section, create a virtual environment, activate
it and install <code class="language-plaintext highlighter-rouge">psycopg2</code> library:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 -m venv my_venv
$ source my_venv/bin/activate
$ pip install psycopg2
</code></pre></div></div>
<p>We now have the <code class="language-plaintext highlighter-rouge">psycopg2</code> package file we need in the virtual environment. You
need to copy the package from the EC2 instance to your development machine.</p>
<p>Clean up your deployment package working directory <code class="language-plaintext highlighter-rouge">pypg_lambda</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd pypg_lambda
$ rm -r psycopg2
</code></pre></div></div>
<p>Run the following command on your development machine to copy the package
directory to the local machine:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ scp -r -i <aws-key-file> \
ec2-user@192.0.2.0:~/my_venv/lib/python3.7/site-packages/psycopg2 .
</code></pre></div></div>
<p>Create zip archive:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ zip -r ../my_lambda.zip .
</code></pre></div></div>
<h3 id="create-the-lambda-function-and-invoke-it">Create the lambda function and invoke it</h3>
<p>Create the lambda function using the <code class="language-plaintext highlighter-rouge">aws lambda create-function</code> command
as shown previously and invoke it.</p>
<h3 id="a-different-error-encountered">A different error encountered</h3>
<p>Running the lambda function generates the following error:</p>
<p><code class="language-plaintext highlighter-rouge">Runtime.ImportModuleError: Unable to import module 'mylambda': libpq.so.5: cannot open shared object file: No such file or directory</code></p>
<p>While we are still encountering an error, we are no longer running into
the “invalid ELF header”. So we can consider the “invalid ELF header” error to
be resolved and let’s work on resolving the new error.</p>
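<p>The new error means that the dynamic linker could not locate <code class="language-plaintext highlighter-rouge">libpq</code> when loading the psycopg2 extension. The EC2 instance has it because we installed <code class="language-plaintext highlighter-rouge">postgresql-devel</code> there, but the Lambda environment does not. As a rough check of whether a given machine has the library, one can ask Python’s <code class="language-plaintext highlighter-rouge">ctypes.util.find_library</code>. This is a sketch; the wrapper function name is made up for this example.</p>

```python
import ctypes.util


def shared_library_available(name):
    """Return the library file name found for name, or None if not found."""
    return ctypes.util.find_library(name)


found = shared_library_available("pq")
if found:
    print("libpq found on this machine:", found)
else:
    print("libpq not found; a dynamically linked psycopg2 would likely fail to import")
```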
<h2 id="3-resolving-libpqsox-cannot-open-shared-object-file-error">3. Resolving “libpq.so.x cannot open shared object file” error</h2>
<h3 id="background-1">Background</h3>
<p>Searching for solutions to the “cannot open shared object file” error led us
to <a href="https://forums.aws.amazon.com/thread.jspa?messageID=680192">this</a> post
on AWS forums. This forum post also provides a link to
<a href="https://github.com/jkehler/awslambda-psycopg2">this</a> Github project (we will
refer to the Github project by its owner’s name, Jeff Kehler, in the rest of this
post).</p>
<p>The solution requires us to link the <code class="language-plaintext highlighter-rouge">libpq.so</code> library statically, which
in turn requires us to build postgreSQL and psycopg2 from source code.</p>
<p>We pick the following versions of postgreSQL and psycopg2 to build
from source code:</p>
<ul>
<li>picked version 10.0 of PostgreSQL, since this is the version used by the Amazon RDS instance</li>
<li>picked version 2.8.3 of psycopg2, since this is the latest version available at the time of writing. I like to start with the latest version and see if it works, then work backwards to older versions if the more recent ones don’t work.</li>
</ul>
<p>We download source code for postgreSQL and psycopg2 from the following locations:</p>
<ul>
<li><a href="https://www.postgresql.org/ftp/source/v10.0/">postgreSQL source downloads</a></li>
<li><a href="http://initd.org/psycopg/download/">psycopg2 download page</a>, click on <code class="language-plaintext highlighter-rouge">source package</code> link to download source code for the latest version</li>
</ul>
<p>Upload source packages to the EC2 instance:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ scp -i <aws-key-file> postgresql-10.0.tar ec2-user@192.0.2.0:~
$ scp -i <aws-key-file> psycopg2-2.8.3.tar ec2-user@192.0.2.0:~
</code></pre></div></div>
<p>Once again, we will be working in the home directory on the EC2 instance. The
above commands copied the source code tar archives to EC2 instance’s home
directory.</p>
<p>SSH into your EC2 instance and follow the steps below (as outlined in the Jeff
Kehler project).</p>
<h3 id="compiling-postgresql-from-source-code">Compiling postgresql from source code</h3>
<p>Extract the files from postgreSQL tar package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar -xf postgresql-10.0.tar
</code></pre></div></div>
<p>Enter the extracted postgresql source directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd postgresql-10.0
</code></pre></div></div>
<p>Run the following three commands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./configure --prefix `pwd` --without-readline --without-zlib
</code></pre></div></div>
<p>In the above command, the argument provided to the <code class="language-plaintext highlighter-rouge">prefix</code> option is the
absolute path of the postgreSQL source directory. You can type the path
(/home/ec2-user/postgresql-10.0) or simply use `pwd` since we are already
located in that directory.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ make install
</code></pre></div></div>
<p>Next, build psycopg2 from source code. Once again, the instructions are
as outlined in the Jeff Kehler project.</p>
<h3 id="compiling-psycopg2-from-source-code-and-statically-linking">Compiling psycopg2 from source code and statically linking</h3>
<p>Extract the files from psycopg2 tar package:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tar -xf psycopg2-2.8.3.tar
</code></pre></div></div>
<p>Enter the extracted psycopg2 source directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd psycopg2-2.8.3
</code></pre></div></div>
<p>Edit <code class="language-plaintext highlighter-rouge">setup.cfg</code> file and make following changes:</p>
<ul>
<li>set <code class="language-plaintext highlighter-rouge">pg_config</code> to pg_config file under postgresql source directory that was created there when postgresql was built from source code</li>
<li>set static_libpq to 1</li>
</ul>
<p>On our EC2 instance, the modified lines of setup.cfg look like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
pg_config = /home/ec2-user/postgresql-10.0/bin/pg_config
...
static_libpq = 1
</code></pre></div></div>
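<p>If you prefer to script the edit instead of making it by hand, the following is a minimal sketch using Python’s <code class="language-plaintext highlighter-rouge">configparser</code>. It assumes that both options live in a <code class="language-plaintext highlighter-rouge">[build_ext]</code> section of <code class="language-plaintext highlighter-rouge">setup.cfg</code>; check your copy of the file before relying on it. The function name is made up for this example.</p>

```python
import configparser


def enable_static_libpq(cfg_path, pg_config_path, section="build_ext"):
    """Point the build at a given pg_config and request a static libpq."""
    # The section name is an assumption; interpolation is disabled so that
    # any "%" characters in existing values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "pg_config", pg_config_path)
    parser.set(section, "static_libpq", "1")
    with open(cfg_path, "w") as f:
        parser.write(f)
```

<p>On our instance the call would be <code class="language-plaintext highlighter-rouge">enable_static_libpq("setup.cfg", "/home/ec2-user/postgresql-10.0/bin/pg_config")</code>. Note that <code class="language-plaintext highlighter-rouge">configparser</code> drops the comments in the file when it writes it back, so keep a copy of the original if you want them.</p>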
<p>Build the library:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 setup.py build
</code></pre></div></div>
<p>After completion, a build directory will be created under the psycopg2-2.8.3 directory.
Under the build folder there will be a folder with a name similar to lib.linux-x86_64-3.7.
Under this folder there will be a folder psycopg2, which is the package we need.</p>
<p>Go back to your development machine and clean up the previous psycopg2
directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd pypg_lambda
$ rm -r psycopg2
</code></pre></div></div>
<p>Copy the psycopg2 directory from the EC2 instance to your development machine.
Enter the following command on your development machine:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ scp -r -i <aws-key-file> \
ec2-user@192.0.2.0:psycopg2-2.8.3/build/lib.linux-x86_64-3.7/psycopg2 .
</code></pre></div></div>
<p>Note: the Jeff Kehler project contains a ready-to-use psycopg2 library
built for the Amazon Linux AMI. Since the Github repository is about 2 years old, the
package is built to work with python 3.6. If you are using python 3.6 for
the lambda function, you can download the psycopg2 directory from the project
without having to build postgresql and psycopg2 from source code. Since we
decided to use the latest python version (3.7 as of this writing), we had to
build the library from source code ourselves.
(<strong>Update</strong>, February 2021: the Jeff Kehler project now contains pre-built psycopg2 libraries
for python 3.7 and 3.8.)</p>
<h3 id="create-the-lambda-function-and-invoke-it-1">Create the lambda function and invoke it</h3>
<p>As described in the sections above, create the deployment package zip archive,
create the lambda function using the deployment package and invoke the lambda
function.</p>
<h3 id="success">Success!</h3>
<p>We taste success on our third attempt. The lambda function invocation runs
successfully and returns the expected results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"statusCode": 200,
"body": [
[1, "Jack", "Software Engineer"],
[2, "Jill", "Senior Software Engineer"],
[3, "Joe", "Engineering Manager"]
]
}
</code></pre></div></div>
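<p>Each row appears as a JSON array because a default psycopg2 cursor yields one tuple per row, and JSON has no tuple type. The following small sketch reproduces the shape of the output without a database; the rows are simply the sample rows inserted earlier.</p>

```python
import json

# Rows shaped like what a default psycopg2 cursor yields: one tuple per row.
rows = [
    (1, "Jack", "Software Engineer"),
    (2, "Jill", "Senior Software Engineer"),
    (3, "Joe", "Engineering Manager"),
]

response = {"statusCode": 200, "body": rows}
encoded = json.dumps(response)

# After a round trip through JSON, each row comes back as a list.
decoded = json.loads(encoded)
print(decoded["body"][0])  # [1, 'Jack', 'Software Engineer']
```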
<h2 id="references">References</h2>
<ul>
<li>AWS document on how to create deployment package in Python
<ul>
<li><a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html">AWS Lambda Deployment Package in Python</a></li>
</ul>
</li>
<li>Resolving “invalid ELF header” error
<ul>
<li><a href="https://tg4.solutions/how-to-resolve-invalid-elf-header-error/">TG4 Solutions blog post - How to resolve an invalid ELF header error quickly</a></li>
<li><a href="https://stackoverflow.com/a/34885155/3137099">Stackoverflow answer</a></li>
<li><a href="https://aws.amazon.com/blogs/compute/nodejs-packages-in-lambda/">Amazon Compute Blog post</a> - this post is mainly about node.js, but it talks about building libraries using an EC2 instance.</li>
</ul>
</li>
<li>AWS Lambda supports python 3.7
<ul>
<li><a href="https://aws.amazon.com/about-aws/whats-new/2018/11/aws-lambda-supports-python-37/">AWS Compute Blog post</a></li>
</ul>
</li>
<li>Accessing database from a Lambda function
<ul>
<li><a href="https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds.html">AWS Lambda tutorial</a></li>
</ul>
</li>
<li>Resolving “libpq.so: cannot open shared object file” error
<ul>
<li><a href="https://forums.aws.amazon.com/thread.jspa?messageID=680192">AWS forum post</a> - this post contains discussion about this issue and a solution suggested by a forum participant.</li>
<li><a href="https://github.com/jkehler/awslambda-psycopg2">Github project with steps on building psycopg2 library</a> - this Github project is created by the forum participant mentioned in the previous reference. This project provides detailed steps to build postgresql and psycopg2 from source code. If you are using python 3.6, this project contains ready-to-use psycopg2 library built for AWS Lambda.</li>
</ul>
</li>
<li>Links to postgresql and psycopg2 source code downloads
<ul>
<li><a href="https://www.postgresql.org/ftp/source/v10.0/">postgreSQL source downloads</a></li>
<li><a href="http://initd.org/psycopg/download/">psycopg2 download page</a>, click on <code class="language-plaintext highlighter-rouge">source package</code> link to download source code for the latest version</li>
</ul>
</li>
</ul>Kalyan VedalaWhile working on a personal project for setting up a basic data pipeline, described here, I ran into an issue where psycopg2 library was not available on AWS Lambda. My lambda function uses this library to access data stored in an PostgreSQL RDS instance. It is understandable that AMI image does not include libraries such as psycopg2, it is the lambda function developer’s job to include any dependency libraries that the lambda function needs. AWS provides documentation here on deploying lambda functions with dependency libraries that are not available in the AMI image.Tutorial - Setting Up A Basic Data Pipeline2019-05-23T00:00:00+00:002019-05-23T00:00:00+00:00https://kalyanv.com/2019/05/23/tutorial-a-basic-data-pipeline<p>A few months ago, I decided to develop a personal project to help me learn
data engineering skills. I wrote this tutorial as documentation of my learning
experience. I hope the tutorial will be useful to others who might be looking
to learn basic data engineering skills.</p>
<p>The approach I took for the project was to implement a basic data pipeline
involving the usual steps of a data engineering / data warehousing project.
These steps are:</p>
<ul>
<li>identify data source and acquire data</li>
<li>clean and prepare data</li>
<li>load data into data warehouse</li>
</ul>
<h2 id="data-source">Data Source</h2>
<p>To find datasets that I could use for my demo data engineering project,
I started with a simple internet search and found
<a href="https://www.springboard.com/blog/free-public-data-sets-data-science-project/">this post</a>
among several useful hits.</p>
<p>Of the datasets described in the blog post, I picked Walmart Recruiting Store Sales data.
Some of the reasons for picking this dataset are:</p>
<ul>
<li>retail data, easy to understand domain</li>
<li>this data is hosted on Kaggle, very good description of data is provided by Kaggle</li>
</ul>
<p>Source data located <a href="https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting">here</a>.</p>
<p>The data provided is historical sales data for 45 Walmart stores for years
2010 thru 2012.</p>
<p>The primary data is weekly department-wise sales amount for each store.
Another piece of information included is whether a given week is a holiday
week or a regular week. For the purpose of this data, only the following four
holidays are considered: Super Bowl, Labor Day, Thanksgiving and
Christmas.</p>
<p>In addition, the “features” data provides information such as temperature
in the region, fuel price and markdowns.</p>
<p>Of the files available from the data source, I used the following files for this
data engineering project:</p>
<ul>
<li>train.csv</li>
<li>features.csv</li>
</ul>
<h2 id="schema-design">Schema Design</h2>
<p>Applying dimensional design process on the data yields one dimension
and two fact tables. The tables and their fields are listed below:</p>
<ul>
<li>date dimension
<ul>
<li>Id (PK)</li>
<li>Date</li>
<li>Is Holiday</li>
<li>Holiday Name</li>
</ul>
</li>
<li>sales fact
<ul>
<li>Store (PK)</li>
<li>Dept (PK)</li>
<li>Date Key (PK)</li>
<li>Weekly Sales</li>
</ul>
</li>
<li>features fact
<ul>
<li>Store (PK)</li>
<li>Date Key (PK)</li>
<li>Temperature</li>
<li>Fuel Price</li>
<li>Markdown 1</li>
<li>Markdown 2</li>
<li>Markdown 3</li>
<li>Markdown 4</li>
<li>Markdown 5</li>
<li>CPI</li>
<li>Unemployment</li>
</ul>
</li>
</ul>
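<p>To illustrate how the date dimension and the sales fact relate, here is a minimal sketch that uses Python’s built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> module as a stand-in for the warehouse database. The table names, column names and sample values are made up for this example and are not the actual warehouse tables.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    id           INTEGER PRIMARY KEY,
    date         TEXT NOT NULL,
    is_holiday   INTEGER NOT NULL,
    holiday_name TEXT
);
CREATE TABLE sales_fact (
    store        INTEGER NOT NULL,
    dept         INTEGER NOT NULL,
    date_key     INTEGER NOT NULL REFERENCES date_dim(id),
    weekly_sales REAL NOT NULL,
    PRIMARY KEY (store, dept, date_key)
);
""")

# Made-up sample rows: one regular week and one holiday week.
conn.execute("INSERT INTO date_dim VALUES (1, '2010-02-05', 0, NULL)")
conn.execute("INSERT INTO date_dim VALUES (2, '2010-02-12', 1, 'Super Bowl')")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 100.0), (1, 2, 1, 50.0), (1, 1, 2, 200.0)],
)

# Join the fact to the dimension to total sales for holiday weeks.
holiday_totals = conn.execute("""
    SELECT d.holiday_name, SUM(s.weekly_sales)
    FROM sales_fact s
    JOIN date_dim d ON d.id = s.date_key
    WHERE d.is_holiday = 1
    GROUP BY d.holiday_name
""").fetchall()

print(holiday_totals)  # [('Super Bowl', 200.0)]
```

<p>The composite primary key on the fact table mirrors the (Store, Dept, Date Key) key listed above; the features fact would follow the same pattern with (Store, Date Key).</p>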
<h2 id="data-cleaning-and-preparation">Data Cleaning and Preparation</h2>
<p>I picked <a href="https://www.stitchdata.com/">Stitch</a> as the ETL tool for
this project (I describe the process of selecting the ETL tool
in the next section).</p>
<p>In this section, I describe the cleaning and preparation I performed while
creating each of the dimension and fact CSV files. Because the ETL tool that
I picked is a sync-only tool, I had to perform cleaning of certain data
items that would otherwise be performed by an ETL tool (I found Stitch to
be great even with this limitation).</p>
<ul>
<li>Date dimension CSV file generation
<ul>
<li>extract only Date and isHoliday columns from the source sales data</li>
<li>sort on Date field and output only unique rows</li>
<li>data for a few of the weeks is not available for the years 2010 and 2012. Add rows for the 2010 and 2012 dates that are missing from the data</li>
<li>insert holiday_name field into data for appropriate rows</li>
<li>add sequence data as first column, this will serve as primary key of the dimension table created from the CSV file</li>
</ul>
</li>
<li>Sales fact CSV file generation
<ul>
<li>delete IsHoliday field</li>
<li>replace “Date” field with “Date Key” foreign key, lookup date key from date dimension created above</li>
</ul>
</li>
<li>Features fact CSV file generation
<ul>
<li>features data contains “NA” for missing data values, since these items are numeric, I convert them to 0.0</li>
<li>delete IsHoliday field</li>
<li>similar to what was done for sales fact generation, replace “Date” field with “Date Key” foreign key</li>
</ul>
</li>
</ul>
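<p>The core of these steps can be sketched in a few lines of Python. This is not the code of the Lambda functions, only an illustration under assumptions: the column names and the tiny inline CSV samples are made up, and the steps of adding missing dates and holiday names are left out.</p>

```python
import csv
import io


def build_date_keys(sales_rows):
    """Map each unique date to a sequential key, assigned in date order."""
    unique_dates = sorted({row["Date"] for row in sales_rows})
    return {date: key for key, date in enumerate(unique_dates, start=1)}


def clean_features(features_rows, date_keys):
    """Swap Date for Date Key, drop IsHoliday and turn "NA" into 0.0."""
    cleaned = []
    for row in features_rows:
        out = {"Store": row["Store"], "Date Key": date_keys[row["Date"]]}
        for column, value in row.items():
            if column in ("Store", "Date", "IsHoliday"):
                continue
            out[column] = 0.0 if value == "NA" else float(value)
        cleaned.append(out)
    return cleaned


# Made-up sample data standing in for the source CSV files.
sales_csv = (
    "Store,Dept,Date,Weekly_Sales,IsHoliday\n"
    "1,1,2010-02-12,200.0,TRUE\n"
    "1,1,2010-02-05,100.0,FALSE\n"
)
features_csv = (
    "Store,Date,Temperature,MarkDown1,IsHoliday\n"
    "1,2010-02-05,42.31,NA,FALSE\n"
)

sales_rows = list(csv.DictReader(io.StringIO(sales_csv)))
features_rows = list(csv.DictReader(io.StringIO(features_csv)))

date_keys = build_date_keys(sales_rows)
cleaned = clean_features(features_rows, date_keys)

print(date_keys)  # {'2010-02-05': 1, '2010-02-12': 2}
print(cleaned)
```

<p>Sorting the dates as plain strings works here only because the sample uses the year-month-day format, which sorts chronologically.</p>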
<p><em><strong>Implementation</strong></em></p>
<p>I implemented data cleaning and preparation using AWS Lambda functions. I
upload the source data files to AWS S3; the Lambda functions download
the source data files, clean them, prepare CSV files containing dimension
and fact data, and store the generated files back to S3.</p>
<p>Links to source code of the Lambda functions are:</p>
<ul>
<li><a href="https://github.com/vedala/dataeng_wm/blob/master/lambda/prepare_datedim.sh">Date dimension Lambda function</a></li>
<li><a href="https://github.com/vedala/dataeng_wm/blob/master/lambda/prepare_salesfact.sh">Sales fact Lambda function</a></li>
<li><a href="https://github.com/vedala/dataeng_wm/blob/master/lambda/prepare_featuresfact.sh">Features fact Lambda function</a></li>
</ul>
<h2 id="data-load">Data Load</h2>
<p><em><strong>ETL Tool</strong></em></p>
<p>Stitch is used as ETL tool (<a href="https://www.stitchdata.com/">link</a>).</p>
<p>Stitch is a sync-only provider. Stitch tool does not provide any
transform ability.</p>
<p>Why Stitch</p>
<ul>
<li>offers a free plan. The only cloud-based ETL tool I found that offers a free plan. (<strong>Update</strong>: as of December 2020, Stitch Data no longer offers a free plan).</li>
<li>once I started using Stitch, I found the service to be excellent. The free account allows only one destination to be added. This was adequate for my needs.</li>
</ul>
<p>Setting up data source in Stitch</p>
<ul>
<li>as mentioned above, after the cleaning and preparation step, the cleaned data files are uploaded to S3.</li>
<li>Stitch provides the ability to use many different types of sources, CSV files stored on AWS S3 is one of the supported sources.</li>
<li>on starting a new integration, I first pick an integration name.</li>
<li>this is used as the schema name in the postgres database.</li>
<li>next, I select AWS S3 CSV integration from the list of integrations presented. Next, I type in my S3 bucket name and file name.</li>
<li>grant access to the S3 bucket. Stitch directs me to create an IAM role and provides the details needed to create it, such as AWS account id, role name and role policy.</li>
<li>set up the mapping from CSV files to table names.</li>
<li>set up integration frequency. Since this project needs just a one-time load of data, I pick the default (30 minute) interval. Stitch starts the first load within minutes of setting up the integration. After the data load is complete, I turn off the integration.</li>
</ul>
<p>Setting up destination in Stitch</p>
<ul>
<li>as mentioned above, the Stitch free plan allows only one destination to be set up.</li>
<li>on the user interface for setting up the destination, I pick PostgreSQL as the destination type.</li>
<li>on picking PostgreSQL, I enter details such as RDS host endpoint, port, username, password and database name.</li>
<li>the interface provides a list of IP addresses and directs me to whitelist these IP addresses on my RDS instance.</li>
<li>after entering all details, Stitch checks if it can connect to the database and if successful, creates the destination.</li>
</ul>
<p><em><strong>Data Warehouse</strong></em></p>
<p>Data is loaded into an AWS RDS PostgreSQL instance. The data is organized
as a star schema. The Stitch tool creates the dimension and fact tables
in the PostgreSQL instance.</p>
<h2 id="analysis">Analysis</h2>
<p>The following analyses are a sample of possible analyses that can be
performed on the data.</p>
<p><em><strong>Overview</strong></em></p>
<ul>
<li>Analysis buttons kick off AJAX calls to AWS Lambda functions.</li>
<li>The Lambda functions run analysis SQL queries on the postgres database and return the result to the web application.</li>
<li>Chart.js is used for rendering charts.</li>
</ul>
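<p>To make the flow above concrete, here is a hedged Python sketch of how a Lambda handler might shape query results into the <code class="language-plaintext highlighter-rouge">labels</code> and <code class="language-plaintext highlighter-rouge">datasets</code> structure that Chart.js consumes. The response shape (<code class="language-plaintext highlighter-rouge">statusCode</code>, <code class="language-plaintext highlighter-rouge">headers</code>, <code class="language-plaintext highlighter-rouge">body</code>) assumes the function sits behind an HTTP proxy integration, which this post does not state. The helper name <code class="language-plaintext highlighter-rouge">rows_to_chart_payload</code> and the sample rows are invented for this example, and the database query is replaced by a hard-coded list.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def rows_to_chart_payload(rows, series_label):
    """Convert (label, value) rows into a Chart.js style data object."""
    labels = [str(label) for label, _ in rows]
    values = [value for _, value in rows]
    return {"labels": labels, "datasets": [{"label": series_label, "data": values}]}


def lambda_handler(event, context):
    # Made-up stand-in for rows returned by an analysis SQL query.
    rows = [(2010, 48), (2011, 52), (2012, 43)]
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # Assumption: the page is served from another origin, so CORS is opened up.
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(rows_to_chart_payload(rows, "Weeks with data")),
    }


response = lambda_handler({}, None)
print(response["body"])
</code></pre></div></div>
<p>Keeping the row-to-payload conversion in its own function makes it easy to test without a database or an AWS account.</p>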
<p><em><strong>Implementation</strong></em></p>
<ul>
<li><a href="https://github.com/vedala/dataeng_wm/blob/master/lambda/analysis501.py">Analysis-1 Lambda function</a></li>
<li><a href="https://github.com/vedala/dataeng_wm/blob/master/lambda/analysis502.py">Analysis-2 Lambda function</a></li>
</ul>
<p><em><strong>Analysis 1 - Data Availability, Number of Weeks per Year</strong></em></p>
<p>An extremely simple analysis that counts the number of weeks for which
data is available in each year.</p>
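<p>A query of this kind could look like the following sketch. The table and column names (<code class="language-plaintext highlighter-rouge">date_dim</code>, <code class="language-plaintext highlighter-rouge">year</code>, <code class="language-plaintext highlighter-rouge">week</code>) are assumptions rather than the actual warehouse schema, the rows are sample data, and an in-memory SQLite database stands in for the PostgreSQL instance so that the example runs anywhere.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

# Hypothetical date dimension: one row per week for which data was loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE date_dim (date_key TEXT PRIMARY KEY, year INTEGER, week INTEGER)"
)
conn.executemany(
    "INSERT INTO date_dim VALUES (?, ?, ?)",
    [
        ("20100205", 2010, 5),
        ("20100212", 2010, 6),
        ("20100219", 2010, 7),
        ("20110107", 2011, 1),
        ("20110114", 2011, 2),
    ],
)

WEEKS_PER_YEAR_SQL = """
SELECT year, COUNT(DISTINCT week) AS weeks_available
FROM date_dim
GROUP BY year
ORDER BY year
"""


def weeks_per_year(connection):
    """Return (year, number of weeks with data) tuples, ordered by year."""
    return connection.execute(WEEKS_PER_YEAR_SQL).fetchall()


print(weeks_per_year(conn))  # [(2010, 3), (2011, 2)]
</code></pre></div></div>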
<p><em><strong>Analysis 2 - Week-of-Holiday Sales Compared to Annual Weekly Average</strong></em></p>
<p>Compares the average weekly sales for the entire year to the weekly sales for the
weeks that include a holiday.</p>
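<p>One way to express this comparison in SQL is to join each holiday week to the average computed over its year. The sketch below uses an assumed, simplified fact table (<code class="language-plaintext highlighter-rouge">sales_fact</code> with one row per week) and made-up numbers. SQLite again stands in for PostgreSQL.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

# Hypothetical, simplified fact table: one row per week.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "year INTEGER, week INTEGER, is_holiday INTEGER, weekly_sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        (2010, 5, 0, 100.0),
        (2010, 6, 1, 160.0),
        (2010, 7, 0, 100.0),
        (2011, 1, 0, 90.0),
        (2011, 6, 1, 110.0),
    ],
)

HOLIDAY_VS_AVERAGE_SQL = """
SELECT s.year, s.week, s.weekly_sales, a.avg_weekly_sales
FROM sales_fact AS s
JOIN (
    SELECT year, AVG(weekly_sales) AS avg_weekly_sales
    FROM sales_fact
    GROUP BY year
) AS a ON a.year = s.year
WHERE s.is_holiday = 1
ORDER BY s.year, s.week
"""


def holiday_vs_average(connection):
    """Return (year, week, holiday week sales, annual weekly average) tuples."""
    return connection.execute(HOLIDAY_VS_AVERAGE_SQL).fetchall()


for row in holiday_vs_average(conn):
    print(row)
</code></pre></div></div>
<p>With the sample rows, the 2010 holiday week (160.0) is compared against a yearly average of 120.0, and the 2011 holiday week (110.0) against 100.0.</p>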
<p><em><strong>Deployment</strong></em></p>
<p>The project is deployed at the following location:</p>
<p><a href="http://dataeng-walmart.s3-website-us-east-1.amazonaws.com/">Data Pipeline Tutorial - Demo</a></p>Kalyan VedalaA few months ago, I decided to develop a personal project to help me learn data engineering skills. I wrote this tutorial as documentation of my learning experience. I hope the tutorial will be useful to others who might be looking to learn basic data engineering skills.Jekyll Minima Theme - A Few Settings2018-11-05T00:00:00+00:002018-11-05T00:00:00+00:00https://kalyanv.com/2018/11/05/jekyll-minima-theme-a-few-settings<p>This post is a follow-up of my earlier
<a href="/2018/09/12/build-a-blog-using-jekyll-and-deploy-to-github-pages-and-set-custom-domain.html">post</a>
about building a jekyll site.</p>
<p>In this post, I will describe a few things that I modified on my site. For example,
the footer shows site title twice - I add a <code class="language-plaintext highlighter-rouge">_config.yml</code> setting to show blog author
name instead of one of the site titles.</p>
<h2 id="add-site-description">Add site description</h2>
<p>The blog site that you deployed earlier (as described in the post referenced above)
should look pretty good, but we can make a few quick improvements to make it look
even better.</p>
<p>Add a variable <code class="language-plaintext highlighter-rouge">description</code> to your <code class="language-plaintext highlighter-rouge">_config.yml</code> and set its value to a short
description of your site.</p>
<p><strong>_config.yml</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>description: This blog includes posts related to topic A, topic B and topic C.
</code></pre></div></div>
<p>Try in browser. The description now displays in footer.</p>
<h2 id="modify-title-above-list-of-posts">Modify title above list of posts</h2>
<p>On the Home/Index page, the list of posts is preceded by a title “Posts”. You can
change it to a different title. Suppose you want the title to be “Latest Posts”.
This is easily achieved by adding a variable to <code class="language-plaintext highlighter-rouge">index.html</code>’s front matter:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
layout: home
list_title: "Latest Posts"
---
</code></pre></div></div>
<p>Try in browser. The Home page displays “Latest Posts” as the title above list
of posts.</p>
<h2 id="add-github-twitter-and-rss-links-to-footer">Add github, twitter and rss links to footer</h2>
<p>Github, Twitter and RSS links can be added to footer by simply setting variables
in <code class="language-plaintext highlighter-rouge">_config.yml</code>.</p>
<p>The link to the RSS feed is already displayed at the end of the list of posts on the Home
page. But adding the <code class="language-plaintext highlighter-rouge">rss</code> variable to <code class="language-plaintext highlighter-rouge">_config.yml</code> and setting it to just <code class="language-plaintext highlighter-rouge">rss</code> also
adds a link in the footer.</p>
<p><strong>_config.yml</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>github_username: your_github_username
twitter_username: your_twitter_username
rss: rss
</code></pre></div></div>
<p>Try in browser. Github, Twitter and RSS links are displayed nicely stacked in
the footer.</p>
<h2 id="modify-footer-to-display-blog-author-name">Modify footer to display blog author name</h2>
<p>Now the setting that I mentioned at the beginning of this post - displaying
blog author name in the footer, instead of displaying site title twice.</p>
<p>The Minima theme displays the site title a second time, because it looks for
site <code class="language-plaintext highlighter-rouge">author</code> variable and if the <code class="language-plaintext highlighter-rouge">author</code> variable is not present it uses
<code class="language-plaintext highlighter-rouge">title</code> variable as default value.</p>
<p>So, to make this change, you can simply add <code class="language-plaintext highlighter-rouge">author</code> variable to <code class="language-plaintext highlighter-rouge">_config.yml</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>author: "Blog Author"
</code></pre></div></div>
<p>Try in browser. Footer should now display blog author name instead of the site
title being displayed a second time.</p>
<h2 id="google-analytics">Google Analytics</h2>
<p>You can add Google analytics to your site by adding the following setting to
<code class="language-plaintext highlighter-rouge">_config.yml</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>google_analytics: UA-XXXXXXXX-X
</code></pre></div></div>Kalyan VedalaThis post is a follow-up of my earlier post about building a jekyll site.Customizing A Jekyll Theme Layout2018-10-25T00:00:00+00:002018-10-25T00:00:00+00:00https://kalyanv.com/2018/10/25/customizing-a-jekyll-theme-layout<p>After I started a blog, as described in
<a href="/2018/09/12/build-a-blog-using-jekyll-and-deploy-to-github-pages-and-set-custom-domain.html">this</a>
post, I felt it would be cool to convert the author name shown on top of each post
to a link to the <code class="language-plaintext highlighter-rouge">about</code> page. This was a minor change, but it required me to
customize <code class="language-plaintext highlighter-rouge">minima</code> theme’s <code class="language-plaintext highlighter-rouge">post</code> layout.</p>
<p>I describe the steps that I used to customize <code class="language-plaintext highlighter-rouge">minima</code>’s <code class="language-plaintext highlighter-rouge">post.html</code> layout in
this post.</p>
<h2 id="customizing-theme-files">Customizing theme files</h2>
<p>You can modify a jekyll theme’s functionality by copying a specific file
from theme gem and then modifying it. Jekyll uses local files of the same name
to override the theme behavior. In addition, the local folder name has to be
identical to the folder name in gem where you copied the file from.</p>
<h2 id="create-a-folder-in-your-site-root-directory">Create a folder in your site root directory</h2>
<p>Since you are modifying the <code class="language-plaintext highlighter-rouge">post</code> layout, you need to create a copy of the file
in your local site. You need the following steps:</p>
<p>In the root directory of the site, create a <code class="language-plaintext highlighter-rouge">_layouts</code> directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir _layouts
</code></pre></div></div>
<h2 id="locate-minima-gems-posthtml-on-your-computer">Locate minima gem’s post.html on your computer</h2>
<p>You need to determine where Ruby gems are stored on your computer.</p>
<p>You can figure out Ruby gem folder location by running the command
<code class="language-plaintext highlighter-rouge">gem environment</code> and looking for the value of <code class="language-plaintext highlighter-rouge">INSTALLATION DIRECTORY</code>
field. The command output should look something like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gem environment
RubyGems Environment:
...
- INSTALLATION DIRECTORY: /path/to/your/ruby/installation/lib/ruby/gems/2.3.0
...
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">minima</code> gem files are located within this folder:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0
</code></pre></div></div>
<p>The file you want to copy and modify is located within the <code class="language-plaintext highlighter-rouge">_layouts</code> folder:</p>
<pre>
$ ls -l /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0/_layouts
default.html
home.html
page.html
<b>post.html</b>
</pre>
<p>An alternative way to figure out the location of gem files is by using the
command <code class="language-plaintext highlighter-rouge">bundle show minima</code>.</p>
<h2 id="copy-the-gem-posthtml-to-your-site">Copy the gem post.html to your site</h2>
<p>Copy <code class="language-plaintext highlighter-rouge">_layouts/post.html</code> from minima ruby gem folder into the local
<code class="language-plaintext highlighter-rouge">_layouts</code> directory just created. Go to your site’s root directory
and then run the following commands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd _layouts
$ cp /path/to/your/ruby/installation/lib/ruby/gems/2.3.0/gems/minima-2.5.0/_layouts/post.html .
</code></pre></div></div>
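<p>If you prefer to script this step, the copy can be sketched in Python as below. The function name <code class="language-plaintext highlighter-rouge">override_theme_file</code> is made up for this example, and the theme directory is assumed to be whatever path <code class="language-plaintext highlighter-rouge">bundle show minima</code> prints on your machine. The function keeps the folder name identical to the one in the gem and refuses to overwrite a file you may have already customized.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import shutil
from pathlib import Path


def override_theme_file(theme_dir, relative_path, site_dir):
    """Copy one theme file into the site, keeping the same relative path."""
    source = Path(theme_dir) / relative_path
    target = Path(site_dir) / relative_path
    if target.exists():
        # Do not clobber an override that has already been edited.
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
</code></pre></div></div>
<p>For example, <code class="language-plaintext highlighter-rouge">override_theme_file(theme_dir, "_layouts/post.html", ".")</code> run from the site's root directory would create <code class="language-plaintext highlighter-rouge">_layouts/post.html</code>, where <code class="language-plaintext highlighter-rouge">theme_dir</code> holds the gem path.</p>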
<h2 id="make-author-name-a-link">Make author name a link</h2>
<p>Open <code class="language-plaintext highlighter-rouge">post.html</code> in an editor and locate the line that you want to modify. The html
that you want to modify is shown below:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><span</span> <span class="na">itemprop=</span><span class="s">"author"</span> <span class="na">itemscope</span> <span class="na">itemtype=</span><span class="s">"http://schema.org/Person"</span><span class="nt">><span</span> <span class="na">class=</span><span class="s">"p-author h-card"</span> <span class="na">itemprop=</span><span class="s">"name"</span><span class="nt">></span>{{ page.author }}<span class="nt"></span></span></span>
</code></pre></div></div>
<p>Add an anchor element around <code class="language-plaintext highlighter-rouge">{{ page.author }}</code> as follows:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><span</span> <span class="na">itemprop=</span><span class="s">"author"</span> <span class="na">itemscope</span> <span class="na">itemtype=</span><span class="s">"http://schema.org/Person"</span><span class="nt">><span</span> <span class="na">class=</span><span class="s">"p-author h-card"</span> <span class="na">itemprop=</span><span class="s">"name"</span><span class="nt">><a</span> <span class="na">href=</span><span class="s">"/about.html"</span><span class="nt">></span>{{ page.author }}<span class="nt"></a></span></span></span>
</code></pre></div></div>
<p>Try in browser. In the post, the author name should now be a link. Clicking
on the author name should take you to the about page.</p>Kalyan VedalaAfter I started a blog, as described in this post, I felt it would be cool to convert the author name shown on top of each post to a link to the about page. This was a minor change, but it required me to customize minima theme’s post layout.Build A Blog Using Jekyll And Deploy To Github Pages And Set Custom Domain2018-09-12T00:00:00+00:002018-09-12T00:00:00+00:00https://kalyanv.com/2018/09/12/build-a-blog-using-jekyll-and-deploy-to-github-pages-and-set-custom-domain<p>I recently decided to start a blog. I had used Wordpress in the past, so I knew
I could get my blog up and running quickly using Wordpress. I was also slightly
familiar with Jekyll. Doing a google search and reading a few blog posts educated
me on benefits of Jekyll and static sites in general. I explored Jekyll a little
more and loved it immediately.</p>
<p>The first thing that appealed to me about Jekyll was how programmer-friendly it
was. Creating a site using Jekyll felt very similar to a developer’s day-to-day
tasks. Another thing that appealed to me was Jekyll’s integration with GitHub
Pages. Finally, free hosting provided by GitHub Pages (along with the ability
to set custom domains) tipped my decision towards using Jekyll.</p>
<p>I have written this post to serve as a stand-alone tutorial, while also trying
to keep it short. I briefly describe new terms and concepts as I introduce them,
but do not go into much detail. Jekyll’s documentation is excellent and working
through <a href="https://jekyllrb.com/docs/">Quickstart</a> and
<a href="https://jekyllrb.com/docs/step-by-step/01-setup/">Step-by-Step Tutorial</a> should
provide you good background on Jekyll.</p>
<p>Let’s get started.</p>
<h2 id="install-ruby-development-environment">Install Ruby Development Environment</h2>
<p>You need Ruby development environment setup on your computer. Jekyll documentation
provides the requirements list <a href="https://jekyllrb.com/docs/installation/">here</a>.
In addition, you also need <code class="language-plaintext highlighter-rouge">bundler</code>. You can install <code class="language-plaintext highlighter-rouge">bundler</code> by using the command
<code class="language-plaintext highlighter-rouge">gem install bundler</code>.</p>
<h2 id="install-jekyll">Install Jekyll</h2>
<p>Jekyll is a ruby gem. Install it by running the following command in a terminal:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gem install jekyll
</code></pre></div></div>
<h2 id="create-a-new-directory-for-your-site">Create a new directory for your site</h2>
<p>On your computer, create a directory to hold your site:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir my-site
</code></pre></div></div>
<h2 id="create-indexhtml-in-the-new-directory">Create index.html in the new directory</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd my-site
</code></pre></div></div>
<p>Create <code class="language-plaintext highlighter-rouge">index.html</code> with some content, such as:</p>
<p><strong>index.html</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><h1>Welcome to my blog.</h1>
</code></pre></div></div>
<h2 id="serve-the-jekyll-blog">Serve the jekyll blog</h2>
<p>In a terminal, run the following command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ jekyll serve
</code></pre></div></div>
<p>This command generates the site files and runs a local web server at
<a href="http://localhost:4000"><code class="language-plaintext highlighter-rouge">http://localhost:4000</code></a>.</p>
<h2 id="install-theme-gem">Install theme gem</h2>
<p>You can use a theme to improve your site’s presentation. There is a wide selection
of themes to choose from. You can get started with <code class="language-plaintext highlighter-rouge">minima</code> theme which is provided
by Jekyll. You can install <code class="language-plaintext highlighter-rouge">minima</code> gem using the following command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gem install minima
</code></pre></div></div>
<h2 id="create-gemfile">Create Gemfile</h2>
<p>Create a file <code class="language-plaintext highlighter-rouge">Gemfile</code> in the root directory. <code class="language-plaintext highlighter-rouge">Gemfile</code> is used to specify which
gems your Jekyll site uses.</p>
<p><strong>Gemfile</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>source 'https://rubygems.org'
gem 'minima'
</code></pre></div></div>
<h2 id="create-jekyll-config-file-and-add-theme">Create Jekyll config file and add theme</h2>
<p>You also need to set the theme in Jekyll’s configuration file. Jekyll reads
configuration from a file named <code class="language-plaintext highlighter-rouge">_config.yml</code> in your site’s root directory.
Create <code class="language-plaintext highlighter-rouge">_config.yml</code> with the following contents:</p>
<p><strong>_config.yml</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: minima
</code></pre></div></div>
<p>After making any changes to <code class="language-plaintext highlighter-rouge">_config.yml</code>, you need to restart <code class="language-plaintext highlighter-rouge">jekyll serve</code>
for Jekyll to pick up configuration changes. Even after restarting
<code class="language-plaintext highlighter-rouge">jekyll serve</code>, you will notice no difference in your site’s rendering. You
will fix this next.</p>
<h2 id="update-indexhtml-to-use-a-layout">Update index.html to use a layout</h2>
<p>In the previous section, you saw that the text of your index page rendered without
any styling from the theme. This is happening because Jekyll is treating your
<code class="language-plaintext highlighter-rouge">index.html</code> as a regular html file. You can tell Jekyll to use the theme’s <code class="language-plaintext highlighter-rouge">home</code>
layout by adding the following to the top of your <code class="language-plaintext highlighter-rouge">index.html</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
layout: home
---
</code></pre></div></div>
<p>This is called the front matter. Jekyll does additional processing on any file
containing a front matter.</p>
<p>Minima theme provides a <code class="language-plaintext highlighter-rouge">home</code> layout which is most suitable for a site’s index
page. Among other things <code class="language-plaintext highlighter-rouge">home</code> layout adds a list of recent posts to the home
page.</p>
<p>Try in browser. The index page of the site now renders in theme.</p>
<h2 id="add-site-title-to-config">Add site title to config</h2>
<p>With the site rendered in theme, it looks really good. Like most sites, you want
your site to have a title too. The <code class="language-plaintext highlighter-rouge">minima</code> theme uses <code class="language-plaintext highlighter-rouge">title</code> variable’s value
(if available) as title of your site. You can set <code class="language-plaintext highlighter-rouge">title</code> variable by adding the
line <code class="language-plaintext highlighter-rouge">title: MyAwesomeBlog</code> to <code class="language-plaintext highlighter-rouge">_config.yml</code>. Your <code class="language-plaintext highlighter-rouge">_config.yml</code> should look like
this now:</p>
<p><strong>_config.yml</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: minima
title: MyAwesomeBlog
</code></pre></div></div>
<p>Restart <code class="language-plaintext highlighter-rouge">jekyll serve</code> and refresh browser. You will notice that the value for
title that you provided in <code class="language-plaintext highlighter-rouge">_config.yml</code> now becomes the title of your site.</p>
<h2 id="create-about-page">Create about page</h2>
<p>Create an About page by creating a <code class="language-plaintext highlighter-rouge">about.md</code> file in the site’s root directory.</p>
<p><strong>about.md</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
layout: page
title: About
---
# About me
This page will contain information about me.
</code></pre></div></div>
<p>This file uses the <code class="language-plaintext highlighter-rouge">page</code> layout provided by the theme.</p>
<p>Try in browser. An About link shows up in the top bar. The minima theme automatically adds
any html or markdown files that are in your root directory to the navigation bar,
using the value of the variable <code class="language-plaintext highlighter-rouge">title</code> from the page’s front matter as link text.</p>
<h2 id="create-projects-page">Create projects page</h2>
<p>Add another page to your site. Create <code class="language-plaintext highlighter-rouge">projects.md</code> in the site’s root directory.</p>
<p><strong>projects.md</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
layout: page
title: Projects
---
# Projects
Projects will be listed here.
</code></pre></div></div>
<p>Try in browser. The navigation bar now shows Projects link too. Clicking on projects
takes you to projects page.</p>
<h2 id="ordering-navigation-items">Ordering navigation items</h2>
<p>With about and projects pages added, the site is in good shape now. Suppose you want
to modify the order of items in the navigation bar with <code class="language-plaintext highlighter-rouge">About</code> appearing to the right
of <code class="language-plaintext highlighter-rouge">Projects</code>.</p>
<p>All top-level pages are added to navigation bar in alphabetical order. Reordering
navigation items is easily done by using <code class="language-plaintext highlighter-rouge">header_pages</code> configuration setting. Add
<code class="language-plaintext highlighter-rouge">header_pages</code> to configuration and set its value to a list of pages in the order
you wish them to appear.</p>
<p><strong>_config.yml</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theme: minima
title: MyAwesomeBlog
header_pages:
- projects.md
- about.md
</code></pre></div></div>
<p>Try in browser. The <code class="language-plaintext highlighter-rouge">About</code> and <code class="language-plaintext highlighter-rouge">Projects</code> items now appear in your preferred order.</p>
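<p>The ordering rule can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not the theme's actual template code. The function name <code class="language-plaintext highlighter-rouge">navigation_titles</code> and the dictionary layout are invented for illustration.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def navigation_titles(pages, header_pages=None):
    """Return link texts for the navigation bar.

    pages maps a file path to that page's front matter.
    """
    # Without header_pages, fall back to every page in alphabetical order.
    paths = header_pages if header_pages is not None else sorted(pages)
    titles = []
    for path in paths:
        title = pages.get(path, {}).get("title")
        if title:  # pages without a title get no navigation link
            titles.append(title)
    return titles


pages = {
    "about.md": {"layout": "page", "title": "About"},
    "index.html": {"layout": "home"},
    "projects.md": {"layout": "page", "title": "Projects"},
}
print(navigation_titles(pages))  # ['About', 'Projects']
print(navigation_titles(pages, ["projects.md", "about.md"]))  # ['Projects', 'About']
</code></pre></div></div>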
<h2 id="add-a-blog-post">Add a blog post</h2>
<p>Creating a blog post is as simple as creating a directory and a file within that
directory.</p>
<p>Create a folder called <code class="language-plaintext highlighter-rouge">_posts</code> in the root directory of your site. Posts are
markdown (or html) files stored in this folder.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir _posts
$ cd _posts
</code></pre></div></div>
<p>Create a markdown file with year, month and day prefixed to the filename:</p>
<p><strong>2018-09-12-my-first-post.md</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
layout: post
---
This is the contents of this blog post.
</code></pre></div></div>
<p>Notice that this file contains Jekyll front matter and sets the layout to <code class="language-plaintext highlighter-rouge">post</code>,
which is another layout provided by <code class="language-plaintext highlighter-rouge">minima</code> theme.</p>
<p>Try in browser. The site lists the post you just added. Clicking on the post title
takes you to the post. Notice that the hyphen separated text portion of the file name
becomes the title of the post. Also notice that a link to RSS feed is added.</p>
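<p>To illustrate the naming convention, the following Python sketch splits a post filename into its date and a title derived from the hyphen separated text. It only approximates what Jekyll does and is not Jekyll's implementation. The function name <code class="language-plaintext highlighter-rouge">parse_post_filename</code> and the accepted extensions are assumptions for this example.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re
from datetime import date

# year-month-day-slug.extension, for example 2018-09-12-my-first-post.md
POST_FILENAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(?:md|markdown|html)$")


def parse_post_filename(filename):
    """Return (publication date, title) derived from a post filename."""
    match = POST_FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not a post filename: {filename}")
    year, month, day, slug = match.groups()
    # Turn the hyphen separated slug into a capitalized title.
    title = " ".join(word.capitalize() for word in slug.split("-"))
    return date(int(year), int(month), int(day)), title


post_date, post_title = parse_post_filename("2018-09-12-my-first-post.md")
print(post_date.isoformat(), post_title)  # 2018-09-12 My First Post
</code></pre></div></div>
<p>A <code class="language-plaintext highlighter-rouge">title</code> set in the post's front matter would take the place of the title derived from the filename.</p>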
<h2 id="adding-author-name-to-post">Adding author name to post</h2>
<p>In the blog post, you will notice that there is no author name being displayed.
Jekyll minima theme supports author name setting. It just needs <code class="language-plaintext highlighter-rouge">author</code> variable
to have a value. You can set <code class="language-plaintext highlighter-rouge">author</code> variable in the front matter of your post.
Since it is likely that all posts on your site are written just by you, it is
simpler to set <code class="language-plaintext highlighter-rouge">author</code> once in <code class="language-plaintext highlighter-rouge">_config.yml</code>.</p>
<p>Add the following to <code class="language-plaintext highlighter-rouge">_config.yml</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>defaults:
  - scope:
      path: ""
    values:
      author: "Blog Author"
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">defaults</code> is a special Jekyll setting that allows you to set front matter defaults.
<code class="language-plaintext highlighter-rouge">path</code> under <code class="language-plaintext highlighter-rouge">scope</code> specifies which files this rule applies to. A blank <code class="language-plaintext highlighter-rouge">path</code>
means the rule applies to all files in the site.</p>
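<p>Conceptually, front matter defaults are merged with each file's own front matter, and values set in the file win. The Python sketch below models that idea in a deliberately simplified way (Jekyll's real scope matching has more rules, such as matching on type). The function name <code class="language-plaintext highlighter-rouge">apply_defaults</code> is hypothetical.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def apply_defaults(path, front_matter, defaults):
    """Merge matching default values with a file's own front matter."""
    merged = {}
    for rule in defaults:
        scope_path = rule["scope"].get("path", "")
        # A blank scope path is a prefix of every path, so it matches all files.
        if path.startswith(scope_path):
            merged.update(rule["values"])
    # Values written in the file itself override the defaults.
    merged.update(front_matter)
    return merged


defaults = [{"scope": {"path": ""}, "values": {"author": "Blog Author"}}]

print(apply_defaults("_posts/2018-09-12-my-first-post.md", {"layout": "post"}, defaults))
print(apply_defaults("about.md", {"layout": "page", "author": "Guest"}, defaults))
</code></pre></div></div>
<p>The first call picks up the default author, while the second keeps the author set in the file.</p>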
<p>Restart <code class="language-plaintext highlighter-rouge">jekyll serve</code> and refresh browser page. The post now displays author name.</p>
<h2 id="deploying-to-github-pages">Deploying to GitHub Pages</h2>
<p>This section describes how to host your site on GitHub Pages. GitHub allows you
to host one user-level site on github pages. The github pages site for your github
account should be created in a repository with the name <code class="language-plaintext highlighter-rouge">username.github.io</code>,
where <code class="language-plaintext highlighter-rouge">username</code> is your GitHub username.</p>
<ul>
<li>create a repository on GitHub with the name <code class="language-plaintext highlighter-rouge">username.github.io</code>.</li>
<li>add the <code class="language-plaintext highlighter-rouge">github-pages</code> gem to Gemfile. This is a gem provided by GitHub to manage
Jekyll and its dependencies. Read
<a href="https://jekyllrb.com/docs/github-pages/#the-github-pages-gem">this</a> for
more details.<br />
<strong>Gemfile</strong>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>source 'https://rubygems.org'
gem 'minima'
gem "github-pages", group: :jekyll_plugins
</code></pre></div> </div>
</li>
<li>commit and push to your repository.</li>
<li>after a couple of minutes you can point your browser to <code class="language-plaintext highlighter-rouge">http://username.github.io</code> and
you should see your site.</li>
</ul>
<h2 id="using-a-custom-domain">Using a custom domain</h2>
<p>You can set a custom domain for your site you just deployed as follows:</p>
<ul>
<li>purchase a domain name using the service of your preference.</li>
<li>in the root directory of your blog site, create a file <code class="language-plaintext highlighter-rouge">CNAME</code>.</li>
<li>add the domain name as file’s contents.<br />
<strong>CNAME</strong>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myawesomedomain.com
</code></pre></div> </div>
</li>
<li>commit and push the changes to your github repository.</li>
<li>to connect the domain name to your site, you need to update <code class="language-plaintext highlighter-rouge">ALIAS</code>, <code class="language-plaintext highlighter-rouge">A</code> or
<code class="language-plaintext highlighter-rouge">ANAME</code> records with your domain registrar.</li>
<li>for example, GoDaddy uses <code class="language-plaintext highlighter-rouge">A</code> records. If you registered your domain using
GoDaddy, you can use IP addresses listed in <a href="https://help.github.com/articles/setting-up-an-apex-domain/#configuring-a-records-with-your-dns-provider">this</a> article to set <code class="language-plaintext highlighter-rouge">A</code> records.</li>
<li>set the <code class="language-plaintext highlighter-rouge">www</code> subdomain to redirect to <code class="language-plaintext highlighter-rouge">myawesomedomain.com</code> by adding a <code class="language-plaintext highlighter-rouge">CNAME</code>
record with your domain registrar. This is not to be confused with the <code class="language-plaintext highlighter-rouge">CNAME</code>
file that you created earlier.</li>
<li>if GoDaddy is your domain registrar, no action needs to be taken. GoDaddy
automatically sets the <code class="language-plaintext highlighter-rouge">CNAME</code> record.</li>
</ul>
<h2 id="add-disqus-commenting">Add Disqus commenting</h2>
<p>Comments are an essential component of any site. <code class="language-plaintext highlighter-rouge">minima</code> supports the Disqus commenting
system. Comments can be enabled for your posts by setting a configuration
parameter. These are the steps to add comments to your site:</p>
<ul>
<li>sign-up for Disqus Basic account.</li>
<li>on Disqus, add your site as organization (you will use <code class="language-plaintext highlighter-rouge">myawesomedomain.com</code>).</li>
<li>in your site’s <code class="language-plaintext highlighter-rouge">_config.yml</code>, enable Disqus commenting by adding the following:<br />
<strong>_config.yml</strong>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>disqus:
  shortname: <site-shortname-from-disqus>
</code></pre></div> </div>
</li>
<li>commit and push changes to your github repository.</li>
</ul>
<p>After github pages regenerates the site in a few minutes, navigate to
<code class="language-plaintext highlighter-rouge">myawesomedomain.com</code> in your browser. You should see Disqus comments
displayed at the bottom of your post page. Note that Disqus comments are
not displayed when running the site locally using <code class="language-plaintext highlighter-rouge">jekyll serve</code>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post described how you can deploy your personal blog to a GitHub Pages
hosted site. We used the Jekyll site generator since that is the technology
that GitHub Pages uses internally. We saw how easy and quick it was to get
a basic site up and running. Creating a post was equally straight-forward.
Finally, we applied a custom domain to our site.</p>Kalyan VedalaI recently decided to start a blog. I had used Wordpress in the past, so I knew I could get my blog up and running quickly using Wordpress. I was also slightly familiar with Jekyll. Doing a google search and reading a few blog posts educated me on benefits of Jekyll and static sites in general. I explored Jekyll a little more and loved it immediately.