Baltimore City Parking Tickets Revenue Statistics with Pandas

This tutorial will teach you how to use Python to analyze more than a million Baltimore City parking tickets.  More specifically, we will be using a Python package called pandas.

It is not humanly possible to comprehend a dataset with millions of rows.  That’s why we will use the data analysis tool pandas to help us gain some insights into our dataset.  If you have never used pandas before, don’t sweat it. I’ll walk you through all the fundamentals that you need to start crunching your own datasets.


Baltimore City Parking Tickets Data

I created the dataset used in this tutorial from publicly available data on Baltimore’s Open Data website.  It can be downloaded as a CSV file from the Export tab of the dataset’s page.  Note that the CSV file is a few hundred megabytes in size.

I stumbled upon this dataset after I got my very own Baltimore City parking citation.  I used to crunch large financial datasets on a daily basis as a software engineer at a hedge fund, so I figured that I’d do an analysis on parking tickets in Baltimore and see how much money the city was pulling in from these fines.

Getting Set Up with Python and Pandas

I’ll assume that you have a recent version of Python installed on your machine, but if you don’t, you can learn how to install virtualenv with Python 3.

The first thing you’ll want to do is install the pandas package. We’ll do that with the pip package manager.  In a terminal, type the following.

pip install pandas

That’s it! You now have pandas installed.

If you haven’t already done so, download the Baltimore_City_Parking_Tickets.csv file from the Export tab.  Place this file in your home directory.

Finally, let’s hop into a Python interpreter in your terminal from your home directory. Simply type python into your terminal window.

python

Reading a CSV File with Pandas

The pandas Python package makes it super easy to work with delimited datasets such as CSV files.  To read our CSV file, we will use the read_csv function.

import pandas as pd
df = pd.read_csv('Baltimore_City_Parking_Tickets.csv')

We now have a variable called df with a default integer index (0, 1, 2, and so on).  It holds over 1 million rows and 7 columns of Baltimore City parking tickets.  We called our variable df, which is short for dataframe; this is common practice when working with pandas.  You can think of a dataframe as an Excel-like spreadsheet object.

You can see a preview of the data by typing the name of the df variable into your Python interpreter.  Examine your dataframe by accessing some of these properties and methods.

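df            # truncated preview of the whole dataframe
df.columns    # the column labels
df.head()     # the first 5 rows
df.tail()     # the last 5 rows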
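You may want to experiment before downloading the full file.  The sketch below runs the same kind of read on a tiny in-memory CSV.  The three rows are made up for illustration.  The sketch assumes the real file uses the seven column names that appear in the typed read further down.  The column order is a guess, but read_csv does not care about order.  The date strings assume the format that the typed read parses.

```python
import io

import pandas as pd

# Three made-up rows in the assumed layout of Baltimore_City_Parking_Tickets.csv
SAMPLE_CSV = """Citation,Tag,State,Address,ViolFine,ViolDate,Description
1001,ABC123,MD,100 N CHARLES ST,40.0,10/05/2016 08:15:00 AM,Fixed Speed Camera
1002,XYZ789,VA,200 E PRATT ST,75.0,10/06/2016 01:30:00 PM,Red Light Violation
1003,LMN456,MD,300 W BALTIMORE ST,32.0,11/02/2016 09:45:00 AM,All Other Parking Meter Violations
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)    # (3, 7): rows, columns
print(df.columns)  # the column labels
print(df.dtypes)   # ViolFine is inferred as a number; ViolDate stays text
```

On this sample, ViolFine comes back numeric because every value parses as a number.  ViolDate, however, is still plain text rather than a datetime, because read_csv does not parse dates unless asked to.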

This is a quick way to read a CSV file, but it is not a smart one.  read_csv only guesses at simple types.  Our violation dates are stored as plain strings rather than actual dates, and nothing guarantees that every column gets the type we expect.  Additionally, we want to use the violation date column as our index.  Ultimately, we want a sorted DatetimeIndex in order to do time series analysis.  Let’s reread our CSV file into memory, but this time, be smarter about it.

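# Use the violation date as the index and declare each column's type up front
index_col = 'ViolDate'
dtype = {'Description':str, 'ViolFine':float, 'Address':str, 'Citation':str, 'Tag':str, 'State':str}
df = pd.read_csv('Baltimore_City_Parking_Tickets.csv', index_col=index_col, dtype=dtype)
# Parse 12-hour timestamps such as '10/05/2016 08:15:00 AM' into datetimes
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %I:%M:%S %p')
# Sort chronologically so that time-based slicing and resampling behave
df = df.sort_index()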

Now we have a dataframe with a DatetimeIndex, over 1 million rows, and 6 typed columns.  Perfect!
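The sketch below repeats the typed read on a similar made-up sample to confirm it behaves as described.  It again assumes the real file's column names and date format.  The rows are deliberately out of order so that sort_index has something to do.

```python
import io

import pandas as pd

# Made-up rows, deliberately out of chronological order
SAMPLE_CSV = """Citation,Tag,State,Address,ViolFine,ViolDate,Description
1002,XYZ789,VA,200 E PRATT ST,75.0,10/06/2016 01:30:00 PM,Red Light Violation
1001,ABC123,MD,100 N CHARLES ST,40.0,10/05/2016 08:15:00 AM,Fixed Speed Camera
1003,LMN456,MD,300 W BALTIMORE ST,32.0,11/02/2016 09:45:00 AM,All Other Parking Meter Violations
"""

dtype = {'Description': str, 'ViolFine': float, 'Address': str,
         'Citation': str, 'Tag': str, 'State': str}
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col='ViolDate', dtype=dtype)

# Parse the 12-hour timestamps, then sort chronologically
df.index = pd.to_datetime(df.index, format='%m/%d/%Y %I:%M:%S %p')
df = df.sort_index()

print(type(df.index).__name__)           # DatetimeIndex
print(df.index.is_monotonic_increasing)  # True once sorted
print(df.shape)                          # (3, 6): ViolDate became the index
```

Declaring Citation as a string keeps identifiers from being treated as numbers.  This matters if real citation numbers have leading zeros, although that is an assumption about the data.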

Baltimore Parking Ticket Statistics with Python and Pandas

We are ready to start analyzing this dataset! There’s a lot of insight we can gain from this data. Let’s walk through a few examples.

How Much Money Does Baltimore Make From Parking Tickets?

You’re probably wondering, as was I, how much money Baltimore is pulling in from parking tickets.  We can easily find this answer by using the sum aggregate function.

df['ViolFine'].sum()

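As you’ll see, Baltimore City made a staggering amount from parking tickets over the course of two years: roughly $68 million, which is what both the per-violation and the monthly breakdowns later in this tutorial add up to.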
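sum is only one of the aggregations available on a column.  The sketch below uses a few made-up fines rather than the real data.  It shows how agg can return the total alongside the number of tickets and the average fine.  Note that sum, count and mean all skip missing values by default.  That matters if some rows in the real file have no fine recorded.

```python
import numpy as np
import pandas as pd

# Made-up fines; the last ticket has no fine recorded (NaN)
fines = pd.Series([40.0, 75.0, 32.0, np.nan], name='ViolFine')

# One call, three aggregations; the result is indexed by function name
summary = fines.agg(['sum', 'count', 'mean'])

# sum and mean ignore the NaN; count reports only the non-missing fines
print(summary)
```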

How Much Money Does Baltimore Make From Speed Cameras?

Speed cameras seem to be around every corner in Baltimore.  You’re probably in the minority if you live in Baltimore and have never received a speed camera ticket.

All joking aside, to find out how much Baltimore has made from speed camera tickets, we first need to consider only the rows with a description of ‘Fixed Speed Camera’.

df['Description'] == 'Fixed Speed Camera'

As you’ll see, this returns a boolean series: True for each row whose description is ‘Fixed Speed Camera’ and False otherwise.  We can use this boolean series to index into our dataframe and keep only the rows where the value is True.

df[df['Description'] == 'Fixed Speed Camera']

Perfect.  The row count at the bottom of the output shows right away that Baltimore issued 498,840 speed camera tickets over the course of these two years.  This is already a staggering insight.
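The row count is printed at the bottom of the filtered preview, but you can also compute it directly.  The sketch below works on made-up descriptions.  Because True counts as 1 and False as 0, summing the boolean mask gives the number of matching tickets.  That number agrees with len of the filtered dataframe.

```python
import pandas as pd

# Made-up rows standing in for the real dataframe
df = pd.DataFrame({
    'Description': ['Fixed Speed Camera', 'Red Light Violation',
                    'Fixed Speed Camera', 'Expired Tags'],
    'ViolFine': [40.0, 75.0, 40.0, 32.0],
})

is_speed_camera = df['Description'] == 'Fixed Speed Camera'

print(is_speed_camera.sum())     # 2: each True counts as 1
print(len(df[is_speed_camera]))  # 2: rows kept by the boolean index
```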

Moving on though, let’s use sum like we did before to get the total dollar amount.

df[df['Description'] == 'Fixed Speed Camera']['ViolFine'].sum()

Wow!  Baltimore City issued $19,953,600 in speed camera tickets between October 5, 2016 and October 5, 2018.

It’s worth noting at this point that you don’t have to chain the Python code on a single line.  You can also create intermediate variables if that helps you understand these concepts better.  The following code is equivalent to the code above and yields the same result.

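# Boolean mask: True where the ticket is a speed camera ticket
is_speed_camera = df['Description'] == 'Fixed Speed Camera'
# The fines of just those tickets (a series, not a dataframe)
speed_camera_fines = df[is_speed_camera]['ViolFine']
speed_camera_fines.sum()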
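A third option is to select the rows and the column in a single .loc call, shown here as a sketch on made-up rows.  This avoids chained indexing (df[...][...]).  Chained indexing works for reading values like this, but it can fail to modify the original dataframe if you later try to assign through it.

```python
import pandas as pd

# Made-up rows standing in for the real dataframe
df = pd.DataFrame({
    'Description': ['Fixed Speed Camera', 'Red Light Violation',
                    'Fixed Speed Camera', 'Expired Tags'],
    'ViolFine': [40.0, 75.0, 40.0, 32.0],
})

is_speed_camera = df['Description'] == 'Fixed Speed Camera'

# Rows where the mask is True, restricted to the ViolFine column, in one step
speed_camera_total = df.loc[is_speed_camera, 'ViolFine'].sum()
print(speed_camera_total)  # 80.0
```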

How Much Money Does Baltimore Make From Each Type of Ticket?

Now that we know Baltimore City is pulling in nearly $20 million over the course of two years from speed camera tickets alone, how much are they making from other types of tickets?  We can determine this with a single line of code.

df.groupby('Description')['ViolFine'].sum().sort_values(ascending=False)

This might seem complicated, but let’s take it one method at a time, from left to right.  First, we use groupby to put all tickets with the same violation description into the same bucket.  Then we sum the violation fines in each bucket.  Finally, we sort in descending order so that the largest totals appear first.
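To see each step in isolation, here is the same chain on a handful of made-up tickets.  The intermediate variable names are only for illustration.

```python
import pandas as pd

# Made-up tickets and fines
df = pd.DataFrame({
    'Description': ['Fixed Speed Camera', 'Red Light Violation',
                    'Fixed Speed Camera', 'Expired Tags',
                    'Red Light Violation', 'Fixed Speed Camera'],
    'ViolFine': [40.0, 75.0, 40.0, 32.0, 75.0, 40.0],
})

grouped = df.groupby('Description')['ViolFine']  # one bucket per description
totals = grouped.sum()                           # total fines in each bucket
ranked = totals.sort_values(ascending=False)     # largest total first
print(ranked)
```

Here Red Light Violation ranks first even though it has fewer tickets than Fixed Speed Camera, because its (made-up) fine is larger.  The ranking is by total dollars, not by ticket count.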

What we end up with is the following table, which shocked me, and will probably shock you too.  Remember that these numbers are over the course of two years… but still.

Baltimore City Tickets by Violation

Fixed Speed Camera: $19,953,600
Red Light Violation: $11,178,675
All Other Parking Meter Violations: $6,549,440
No Stop/Park Street Cleaning: $4,782,388
No Stopping/Standing Tow Away Zone: $3,664,088
Right on Red: $2,914,125
No Stop/Park Handicap: $2,671,342
Residential Parking Permit Only: $2,536,144
No Stopping/Standing Not Tow-Away Zone: $2,331,286
Expired Tags: $2,323,840
Abandonded Vehicle: $1,515,136
No Parking/Standing In Bus Stop/Bus Lane: $1,474,200
Obstruct/Impeding Movement of Pedestrian: $971,817
All Other Stopping or Parking Violations: $951,106
No Parking/Standing In Transit Stop: $829,192
No Stopping//Parking Stadium Event Camden: $683,196
Less Than 15 feet from Fire Hydrant: $613,230
Commercial Veh/Residence under 20,000 lbs: $610,848
Obstruct/Impeding Flow of Traffic: $398,433
Obstructing/Imped Traffic Xwalk/inter/school: $266,868
Exceeding 48 Hours: $201,760
Passenger Loading Zone: $199,424
Commercial Veh/Residence over 20,000 lbs: $163,150
Commercial Vehicle Obstruct/Imped Traffic Flow: $111,132
Fire Lane/Handicapped Violation: $86,421
No Parking/Standing In Bike Lanes: $49,385
Blocking Garage or Driveway: $20,898
No Parking/Stand Motor Home/Campr/Travel Trailer: $5,544
No Parking/Standing Vendor Truck: $3,514
No Stopping or No Parking Pimlico Event: $3,162
Unlawful Dumping/Waste Hauler w/o Permit: $2,510
In Taxicab Stand: $2,368

Red light violations bring in the second-most ticket revenue in Baltimore.  I would guess that, like the speed cameras, most of these come from the automated cameras you see mounted on poles at intersections.
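The table above ranks violations by total dollars, so “second in revenue” is not the same as “second most common.”  The sketch below uses made-up rows and fines.  It puts ticket counts next to revenue with a single agg call so that the two rankings can be compared.

```python
import pandas as pd

# Made-up tickets: Expired Tags is the most common but has the smallest fine
df = pd.DataFrame({
    'Description': ['Fixed Speed Camera', 'Red Light Violation',
                    'Fixed Speed Camera', 'Expired Tags',
                    'Expired Tags', 'Expired Tags'],
    'ViolFine': [40.0, 75.0, 40.0, 20.0, 20.0, 20.0],
})

# Number of tickets and total fines for each violation type
by_type = df.groupby('Description')['ViolFine'].agg(['count', 'sum'])

by_count = by_type.sort_values('count', ascending=False)  # most common first
by_revenue = by_type.sort_values('sum', ascending=False)  # most revenue first
print(by_count)
print(by_revenue)
```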

The type of ticket that I got this year, “Obstruct/Impeding Movement of Pedestrian,” brought in nearly a cool million bucks over two years.

What is Baltimore’s Monthly Ticket Revenue?

Now that we know how much Baltimore City makes from each type of ticket, let’s find out how much it makes from tickets every month.

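We can resample the data on a monthly basis (i.e. ‘M’) and aggregate the result with a sum, like before.  Other frequency strings work too, such as ‘D’ for daily or ‘Y’ for yearly.  In pandas 2.2 and later, the month-end and year-end aliases are spelled ‘ME’ and ‘YE’, and the older ‘M’ and ‘Y’ spellings are deprecated.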

df['ViolFine'].resample('M').sum()

This yields a series with 25 rows, one for each month from October 2016 through October 2018.  The last row is much smaller than the rest because October 2018 covers only its first few days.  October 2016 is also a partial month, since the data starts on October 5.
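Below is a sketch of the monthly resample on a few made-up fines indexed by timestamp.  It passes pd.offsets.MonthEnd() instead of the ‘M’ string so that it runs without a deprecation warning on both older and newer pandas releases.  Each row is labeled by its month-end date.  A month with no tickets still appears, with a total of 0.

```python
import pandas as pd

# Made-up fines indexed by violation timestamp (no tickets in November)
fines = pd.Series(
    [40.0, 75.0, 32.0, 40.0],
    index=pd.to_datetime(['2016-10-05 08:15', '2016-10-20 13:30',
                          '2016-12-01 09:45', '2016-12-15 10:00']),
    name='ViolFine',
)

# Bucket by calendar month (labeled by month-end date) and total each bucket
monthly = fines.resample(pd.offsets.MonthEnd()).sum()
print(monthly)
```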

Baltimore City Monthly Ticket Revenue

October 2016: $1,324,135
November 2016: $1,435,148
December 2016: $1,397,615
January 2017: $1,416,006
February 2017: $1,368,683
March 2017: $1,532,398
April 2017: $1,396,936
May 2017: $1,560,212
June 2017: $1,487,301
July 2017: $1,513,177
August 2017: $2,826,269
September 2017: $2,947,868
October 2017: $3,815,125
November 2017: $3,800,495
December 2017: $3,160,081
January 2018: $3,488,668
February 2018: $3,147,869
March 2018: $3,487,466
April 2018: $4,159,604
May 2018: $4,532,835
June 2018: $4,495,594
July 2018: $5,002,747
August 2018: $4,992,486
September 2018: $3,583,117
October 2018: $200,757

You’ll quickly see that Baltimore makes well over $1 million from tickets in every full month.  In July 2018, it took in more than $5 million.

Which Month Brought In the Most Ticket Revenue?

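Now that we know how much money Baltimore makes from tickets each month, we can easily find the largest monthly total with max.  We can find which month it was with idxmax.

df['ViolFine'].resample('M').sum().max()     # the largest monthly total
df['ViolFine'].resample('M').sum().idxmax()  # the month-end date of that month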

Baltimore issued a record $5,002,747 in tickets in July 2018.  Mind-blowing!
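max returns only the largest monthly total.  idxmax returns the label where that maximum occurs, which after a monthly resample is the month-end date.  Here is a sketch on made-up monthly totals.

```python
import pandas as pd

# Made-up monthly totals, labeled by month-end date as a monthly resample would be
monthly = pd.Series(
    [1200.0, 5000.0, 4900.0],
    index=pd.to_datetime(['2018-06-30', '2018-07-31', '2018-08-31']),
    name='ViolFine',
)

print(monthly.max())                 # 5000.0
best_month = monthly.idxmax()        # Timestamp('2018-07-31')
print(best_month.strftime('%B %Y'))  # July 2018
```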

Final Thoughts

I’m sure by now you see the power of the Python package pandas.  We were able to effortlessly comb through over a million rows of Baltimore City parking ticket data.  Along the way, we gained insights that would be very hard to reach by examining the raw data by hand.

As a resident of Baltimore for 10 years, all of the insights that we derived from the data were surprising to me. If nothing else, I hope that this tutorial helped you better understand how to use pandas to do basic data analysis on large datasets.

Let me know in the comments below what your thoughts are about the insights that we gained from the data or if you were able to find out anything else interesting about this Baltimore City parking ticket dataset.

Keep on crunching that data!

About The Author

With a strong software engineering background, Tony is determined to leverage the internet to positively impact as many people as possible.
