Friday, August 10, 2018

Reset password for postgres user and load sample database

In this post, I want talk about three things, first, how to create a password for a postgres user. Second, how to load a csv into a sample database and finally how to access this table using python/psycopg package.
Setup password for Postgres user:
In my previous post, I created a user krishna in Postgres but I haven't come across during installation a password setup step. Quick search in StackOverflow showed this approach
$psql
psql (10.4 (Ubuntu 10.4-0ubuntu0.18.04))
Type "help" for help.

krishna=# \password
Enter new password: 
Enter it again: 
krishna=# 
Load csv into a Postgres database.
Found this example of postgres site, it has details of land sales in the UK, going back several decades, and is 3.5GB as of August 2016,click here for the file
-- Executing query:
CREATE TABLE land_registry_price_paid_uk(
  transaction uuid,
  price numeric,
  transfer_date date,
  postcode text,
  property_type char(1),
  newly_built boolean,
  duration char(1),
  paon text,
  saon text,
  street text,
  locality text,
  city text,
  district text,
  county text,
  ppd_category_type char(1),
  record_status char(1));
Query returned successfully with no result in 273 msec.

-- Executing query:
COPY land_registry_price_paid_uk FROM '/home/krishna/Downloads/pp_100k.csv' 
with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
Query returned successfully: 100000 rows affected, 1.1 secs execution time.

Install psycopg2 pip install psycopg2
$ pip install psycopg2
Collecting psycopg2
  Downloading https://files.pythonhosted.org/packages/7c/e6/d5161798a5e8900f24216cb730f2c2be5e4758a80d35c8588306831c0c99/psycopg2-2.7.5-cp27-cp27mu-manylinux1_x86_64.whl (2.7MB)
    100% |████████████████████████████████| 2.7MB 316kB/s 
Installing collected packages: psycopg2
Successfully installed psycopg2-2.7.5
Example call from python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/usr/bin/python
import psycopg2
import sys
import pprint
 
def main():
	conn_string = "host='localhost' dbname='oflc' user='krishna' password='*****'"
	# print the connection string we will use to connect
	print "Connecting to database\n	->%s" % (conn_string)
 
	# get a connection, if a connect cannot be made an exception will be raised here
	conn = psycopg2.connect(conn_string)
 
	# conn.cursor will return a cursor object, you can use this cursor to perform queries
	cursor = conn.cursor()
 
	# execute our Query
	cursor.execute("select * from land_registry_price_paid_uk limit 10")
 
	# retrieve the records from the database
	records = cursor.fetchall()
 
	# print out the records using pretty print
	# note that the NAMES of the columns are not shown, instead just indexes.
	# for most people this isn't very useful so we'll show you how to return
	# columns as a dictionary (hash) in the next example.
	pprint.pprint(records)
 
if __name__ == "__main__":
	main()

Tuesday, July 31, 2018

Binary Tree in Python - Part 1

In this post I want to write a little bit about the implementation details of Binary Tree's in my favorite language Python. Before we jump in, conceptually what is a binary tree?  A binary tree is a data structure with a node and each node having at most two child items.
Example binary tree from wikipedia
But wait isn't this a tree?

A tree in real life has trunk, and roots typically in the ground, branches, stems and leaves but in comp science we look at an inverted tree structure, we start with the root and go down to the leaves as shown in the numeric example above. Every circle is called a Node and each Node can have left and right child items.

How to materialize a Node?

In Python, we materialize a Node by defining a class as shown below

class Node(object):
  def __init__(self, data):
    self.data=data
    self.left=None
    self.right=None

Why should we define a class called Node?

Think of this node as our abstract or user defined data type, this is similar to int, float etc. By defining a Node we can leverage the left and right child items which are also of type Node. Remember, everything in Python is an object.  Take for example a variable assignment statement a=1, if you check the type of a we see its of type int, int itself is a class and every class has a constructor so when you create a variable b = int(), this automatically defaults to value zero.

a=1
print type(a)
#

b=int()
print b
#0 

c=int(1)
print c
#1

Going back to our Node class, we can instantiate objects of type Node this way

root=Node(5)
print root.data, root.left, root.right
# 5 None None
We can manually create a Binary tree structure by assigning left and right child items of root node with a variable of type Node
root=Node(5)
print root.data, root.left, root.right
# 5 None None
n4=Node(4)
print n4.data,n4.left,n4.right
# 4 None None 
n3=Node(3)
print n3.data,n3.left,n3.right
# 3 None None
root.left=n3
root.right=n4
print root.data, root.left.data, root.right.data
# 5 3 4
If we examine the type of root.left or root.right we see its an object of class '__main__.Node'. But why should we assign root.left=n3, why can't we say root.left=3? The latter assignment although assigns value three to root's left child its of type int and cannot attach any other Nodes to root.left. Also if you notice, root.left.data and n3.data are same. Originally root.left was None, later we assigned root.left = n3 so root.left is of type Node and hence we can access the data attribute through the dot notation.
print root.left.data, n3.data 
#3 3 
In part-2, I will cover how to automatically insert a new Node and other interesting methods and some common interview questions around binary tree

Saturday, July 28, 2018

Setup/Install Postgres 10 in Ubuntu

Just to make sure we have a clean install, check and purge any existing  postgres tools.

krishna@dev:~$ dpkg -l | grep postgres
ii  postgresql-client                          10+190                                     all          front-end programs for PostgreSQL (supported version)
ii  postgresql-client-10                       10.4-0ubuntu0.18.04                        amd64        front-end programs for PostgreSQL 10
ii  postgresql-client-common                   190                                        all          manager for multiple PostgreSQL client versions

Run purge

krishna@dev:~$ sudo apt-get --purge remove postgresql-client postgresql-client-10 postgresql-client-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  postgresql-client* postgresql-client-10* postgresql-client-common*
0 upgraded, 0 newly installed, 3 to remove and 2 not upgraded.
After this operation, 3,423 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 203926 files and directories currently installed.)
Removing postgresql-client (10+190) ...
Removing postgresql-client-10 (10.4-0ubuntu0.18.04) ...
Removing postgresql-client-common (190) ...
Processing triggers for man-db (2.8.3-2) ...
(Reading database ... 203671 files and directories currently installed.)
Purging configuration files for postgresql-client-common (190) ...

Install postgres10 using apt-get

krishna@dev:~$ sudo apt-get install postgresql-10
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  postgresql-client-10 postgresql-client-common postgresql-common sysstat
Suggested packages:
  locales-all postgresql-doc-10 isag
The following NEW packages will be installed:
  postgresql-10 postgresql-client-10 postgresql-client-common postgresql-common sysstat
0 upgraded, 5 newly installed, 0 to remove and 2 not upgraded.
Need to get 4,204 kB/5,167 kB of archives.
After this operation, 20.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://us.archive.ubuntu.com/ubuntu bionic/main amd64 postgresql-common all 190 [157 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu bionic-updates/main amd64 postgresql-10 amd64 10.4-0ubuntu0.18.04 [3,752 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu bionic/main amd64 sysstat amd64 11.6.1-1 [295 kB]
Fetched 4,204 kB in 3s (1,497 kB/s)
Preconfiguring packages ...
Selecting previously unselected package postgresql-client-common.
(Reading database ... 203669 files and directories currently installed.)
Preparing to unpack .../postgresql-client-common_190_all.deb ...
Unpacking postgresql-client-common (190) ...
Selecting previously unselected package postgresql-client-10.
Preparing to unpack .../postgresql-client-10_10.4-0ubuntu0.18.04_amd64.deb ...
Unpacking postgresql-client-10 (10.4-0ubuntu0.18.04) ...
Selecting previously unselected package postgresql-common.
Preparing to unpack .../postgresql-common_190_all.deb ...
Adding 'diversion of /usr/bin/pg_config to /usr/bin/pg_config.libpq-dev by postgresql-common'
Unpacking postgresql-common (190) ...
Selecting previously unselected package postgresql-10.
Preparing to unpack .../postgresql-10_10.4-0ubuntu0.18.04_amd64.deb ...
Unpacking postgresql-10 (10.4-0ubuntu0.18.04) ...
Selecting previously unselected package sysstat.
Preparing to unpack .../sysstat_11.6.1-1_amd64.deb ...
Unpacking sysstat (11.6.1-1) ...
Setting up sysstat (11.6.1-1) ...

Creating config file /etc/default/sysstat with new version
update-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode
Created symlink /etc/systemd/system/multi-user.target.wants/sysstat.service → /lib/systemd/system/sysstat.service.
Processing triggers for ureadahead (0.100.0-20) ...
Setting up postgresql-client-common (190) ...
Processing triggers for systemd (237-3ubuntu10.3) ...
Setting up postgresql-common (190) ...
Adding user postgres to group ssl-cert

Creating config file /etc/postgresql-common/createcluster.conf with new version
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
Removing obsolete dictionary files:
Created symlink /etc/systemd/system/multi-user.target.wants/postgresql.service → /lib/systemd/system/postgresql.service.
Processing triggers for man-db (2.8.3-2) ...
Setting up postgresql-client-10 (10.4-0ubuntu0.18.04) ...
update-alternatives: using /usr/share/postgresql/10/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-10 (10.4-0ubuntu0.18.04) ...
Creating new PostgreSQL cluster 10/main ...
/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/10/main --auth-local peer --auth-host md5
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/10/main ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile start

Ver Cluster Port Status Owner    Data directory              Log file
10  main    5432 down   postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
update-alternatives: using /usr/share/postgresql/10/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Processing triggers for systemd (237-3ubuntu10.3) ...
Processing triggers for ureadahead (0.100.0-20) ...

Postgres is installed at this point

Data file is stored here/var/lib/postgresql/10/main
Transaction logs are here: /var/log/postgresql/postgresql-10-main.log
Postgres application userid and group:
id postgres

uid=126(postgres) gid=133(postgres) groups=133(postgres),117(ssl-cert)
Postgres is listening on port: 5432
Database encoding is set to "UTF8" and cluster locale is "en_US.UTF-8"

Start script is 
/usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile start


For some reason, when I kick off the instance I kept getting this error 

postgres@dev:~$ /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... stopped waiting
pg_ctl: could not start server

Examine the log output.

Ended up restarting the server 

postgres@dev:~$ /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile status
pg_ctl: no server running
postgres@dev:~$ /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile restart
pg_ctl: PID file "/var/lib/postgresql/10/main/postmaster.pid" does not exist
Is server running?
starting server anyway
waiting for server to start.... done
server started
postgres@dev:~$ /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile status
pg_ctl: server is running (PID: 15574)
/usr/lib/postgresql/10/bin/postgres "-D" "/var/lib/postgresql/10/main" "-c" "config_file=/etc/postgresql/10/main/postgresql.conf"

let's create a role/user, so you are not using the application user id 


postgres@dev:~$ psql

psql (10.4 (Ubuntu 10.4-0ubuntu0.18.04))

Type "help" for help.

postgres=# CREATE ROLE krishna SUPERUSER LOGIN REPLICATION CREATEDB CREATEROLE;
CREATE ROLE
postgres=# CREATE DATABASE krishna OWNER krishna;
CREATE DATABASE
postgres=# \q
postgres@dev:~$ exit
logout

Create database

krishna=# create database oflc;
CREATE DATABASE

List all available databases

krishna=# \l 
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 krishna   | krishna  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 oflc      | krishna  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(5 rows)

Connect to oflc database and create test table 

krishna=# \c oflc
You are now connected to database "oflc" as user "krishna".

oflc=# create table test (c1 int);
CREATE TABLE

oflc=# insert into test values (1);
INSERT 0 1

oflc=# select * from test;
 c1 
----
  1
(1 row)

oflc=# drop table test;

DROP TABLE

Other useful commands

oflc=# \c
You are now connected to database "oflc" as user "krishna".
oflc=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 krishna   | krishna  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 oflc      | krishna  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(5 rows)

oflc=# \conninfo
You are connected to database "oflc" as user "krishna" via socket in "/var/run/postgresql" at port "5432".
oflc=# select * from information_schema.tables where table_schema='public';
oflc=# \q

Finally, shut down server

postgres@dev:~$ /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile stop
waiting for server to shut down.... done
server stopped

 To make life easier, wrap start|stop commands into a shell script 

 cat postgres_instance_manager.sh 
#!/bin/bash
#set -x

pg_start() {
sudo su - postgres -c '/usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile start'
}

pg_status(){
sudo su - postgres -c '/usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile status' 
}

pg_restart(){
sudo su - postgres -c '/usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile restart'
}

pg_stop(){
sudo su - postgres -c '/usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile restart'
}



while getopts ":srtk" OPTION; do
echo ${OPTION}
    case ${OPTION} in
        s)
                pg_start
                ;;
        r)
                pg_restart
                ;;
t)
pg_status
;;
k)
pg_stop
;;
\?)
                echo "Usage: postgres_instance_manager.sh [-s | START] [-r | RESTART] [-t | STATUS] [-k | STOP]"
                exit 1
                ;;
    esac
done