Data Modeling in Python

@PhilVacca

Data in Applications

  • Flat Files
  • Key / Value
  • Document Store
  • Distributed (Map/Reduce)
  • Relational Database

About This Talk

Concerned with relational data &
popular ways to model data in Python

Will not talk about web app design

Examples will use PostgreSQL

. . . And you should, too  ;)

Traditional Data Manipulation with SQL

"I was watching my friends have this argument about manipulating… data, and how many… datas could one guy manipulate at once"

Accessing the database with inline SQL

pyodbc to make connection

DB-API PEP-249

  • sqllite3
  • psycopg2
  • mysql-python

import psycopg2 as pg

host="192.168.99.100"
port="5432"
user="postgres"
dbname="postgres"

conn=pg.connect("host={} port={} user={} dbname={}".format(host, port, user, dbname))


cur = conn.cursor()

cur.execute("SELECT relnamespace::regnamespace, count(*) from pg_catalog.pg_class GROUP BY relnamespace;")
print(cur.fetchall())

some_parameter="pg_type"
cur.execute("SELECT relname from pg_catalog.pg_class WHERE relname = %s;", (some_parameter, ))
print(cur.fetchall())

conn.commit()

cur.close()
conn.close()


# [('pg_catalog', 208L), ('pg_toast', 36L), ('information_schema', 67L)]
[('pg_type',)]

Limitations

  • An implicit transaction is always in progress
  • Data type mappings vary by adaptor
  • Parameter mapping varies by adaptor
  • DB-API has own exception hierarchy
  • Not every adaptor even implements DB-API
  • Encourages string munging to create SQL statements

Enter the ORM

Object Relational Mapper

Conceptually borrowed from OOP

But you can use with other styles

Consistent typing within code

No change in code thinking (SQL is Set Based, Declarative)

So everybody loves ORMs, right?

Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.

Common ORM Disadvantages

Still have to drop into SQL to handle complicated things

Whitewashes all DB platforms to make them the "same"

Impedence mismatch

Maybe the people who say those things just haven't found the right one. . .

Popular Python ORMs

  • SQLObject
  • SQLAlchemy
  • Django ORM
  • PeeWee
  • PonyORM

ORM Types

Martin Fowler lists two types in Patterns of Enterprise Application Architecture (2003)

  • Active Record pattern
  • Data Mapper pattern

Active Record

The Django ORM uses an Active Record pattern

The ORM creates a 1:1 mapping of the relational data

Each object represents one distinct row of data

Data Mapper

SQLAlchemy is a Data Mapper

The ORM creates a generic layer that maps procedures to data

Each object represents a unit of work on a modelled data object (which may or may not correspond to a row of data)

Django ORM

Implements Active Record pattern

Written as part of Django framework (ca. 2005)

Known for ease of use, short learning curve

Simplifies writing easy queries (INSERT / UPDATE / DELETE)

Takes a similar amount of code relative to SQL for writing medium difficulty queries (GROUP BY, window functions)

But at least you get to write Python

Cannot feasibly handle complicated queries (GROUPING SETS, DISTINCT ON, aggregated sub-queries, CTE's, etc)

Other Limitations

Alex Gaynor has complaints

And he wrote the thing

Does not use same syntax for simple things like field names

  • filter(field_name=value)   Aggregate("field_name")
  • Q(field_name=value)        F("field_name")

Should be used to replace common queries, not to replace SQL


And while we're at it. . .

No support for composite keys

Cannot name automatically created objects like indexes

Primary keys are always serial integer

SELECT e.event_type
  , sum(e.end_time - e.start_time)
FROM event as e
GROUP BY e.event_type
;


Event.objects.values(
	"event_type"
).annotate(
	Sum(F("start_time") - F("end_time"))
)

THIS DOES NOT WORK

Aggregates take string (not expressions)

SQLAlchemy

Data Mapper pattern

Written by Michael Bayer @Zzzeek, started in 2006

Framework agnostic

Design focus on using different db's in a consistent way

. . . While still exposing the unique functionality of different platforms

PostgreSQL != MS SQL Server != MySQL != Oracle

Bayer calls this "Hand Coded" application design

More About SQLAlchemy

Simple queries are still simple!

Mildly complex and more difficult queries are feasible, and relatively simple!

Really tricky stuff is still possible

Heavier learning curve, because it assumes you understand the database

That aggregate query in SQLAlchemy


session.query(
	TimelineEvent.event_type,
	func.sum(TimelineEvent.end_time - TimelineEvent.start_time)
).group_by(
	TimelineEvent.event_type
).all()

This WORKS!

Data Modeling with SQLAlchemy

The engine and metadata


from sqlalchemy import create_engine, MetaData

engine = create_engine("postgresql://postgres@192.168.99.100/postgres")
metadata = MetaData()
from sqlalchemy import Table, Column, Integer, String

users = Table('user', metadata
	, Column('user_id', Integer, primary_key=True)
	, Column('user_name', String(50), unique=True, nullable=False)
)

metadata.create_all(engine)

The Declarative Model

from sqlalchemy import (
	declarative_base
	, Column, String, Integer
	, ForeignKey, UniqueConstraint
	)

SQLBase = declarative_base()
	
class hacker(SQLBase):
	__tablename__ = 'hacker'
	id = Column('hacker_id', Integer, primary_key=True)
	handle = Column(String(50), nullable=False, unique=True)

class coding_language(SQLBase):
	__tablename__ = 'coding_language'
	id = Column('language_id', Integer, primary_key=True)
	language = Column(String(50), nullable=False, unique=True)

class hacker_language(SQLBase):
	id = Column('hacker_language_id', Integer, primary_key=True)
	hacker = Column('hacker_id', Integer, ForeignKey('hacker.hacker_id'), nullable=False)
	language = Column('language_id', Integer, ForeignKey('coding_language.language_id'), nullable=False)
	UniqueConstraint('hacker_id', 'language_id', 'uq_hacker_language')

SQLBase.metadata.create_all(engine)
from sqlalchemy import sessionmaker
S = sessionmaker(bind=engine)
session = S()

hl = hacker_language
hl.hacker = 'Neo'
hl.language = 'Python'

try:
	session.add(hl)
except:
	session.rollback()

Reflection

SQLAlchemy can build models from the database

meta = MetaData()
meta.reflect(bind=engine)
hackerz = meta.tables['hacker']

Other concerns

  • Roles and object permissions
  • Many to one and many to many relationships
  • Importing bulk data (csv, excel) - Simple!
  • Database migrations - Feasible!
  • Ad hoc queries

Thank you Python MKE



This presentation was built in reveal.js