Accessing Data with Queries
What is SQL?
So if SQL is nothing without a database, what is an SQL database?
We will be using the U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) dataset.
It provides annual estimates of income and poverty statistics for all school districts, counties, and states.
It’s been documented by our RCA in CA 🔥!
Yes, it is only one table… we will start small.
Let’s start by using the saipe
table.
If we want to select all the columns in a given table, we use the SELECT
and FROM
clauses in conjunction with the *
wildcard.
Tip
If we want a single column, we can type the column name instead of the wildcard.
It’s super easy! In any SQL script, you can:
--
/*
and */
:Now that you know, let’s go back and comment our code.
Sometimes you don’t want to see all the results. For example, if you are exploring a database and you just want to see the top n rows to acquaint yourself with a table. This is a great practice when working with large databases.
In such cases, use the LIMIT
clause.
Note
Thanks to a common set of commands, SQL code written for any database will look familiar. However, variations in SQL from database to database exist and are often termed dialects. This means that some commands might be different in Oracle SQL, MSSQL, or other dialects. For example, in MSSQL we’d write the prior command as:
If we want only the unique values so that we can quickly see what states have been sampled we use DISTINCT
.
We can also do calculations with the values in a query.
For example, suppose that we want calculate the estimate number of households in poverty without small children (ages 0-4):
SELECT poverty_estimate_all_ages, poverty_estimate_age_0_4, poverty_estimate_all_ages - poverty_estimate_age_0_4
FROM saipe;
When we run the query, the expression poverty_estimate_all_ages - poverty_estimate_age_0_4
is evaluated for each row and appended to that row, in a new column with our resulting values.
Expressions can use any fields, any arithmetic operators (+
, -
, *
, and /
) and a variety of built-in functions. For example:
Write a query that returns the postal code, county FIPS, name, and difference between estimates for all ages and for people ages 0 to 17.
You got time ⏱️, but not too much.
Databases can also filter data – selecting only the data meeting certain criteria.
For example, let’s say we only want data from the state of California.
We can also filter numbers. For example, assume that we only want observations with poverty estimates for all ages greater than 1000:
Write a query that returns the state, county FIPS, name, and the estimated poverty poverty for ages. Filter these data to only include observations for California counties (excluding other counties, states, and national observations).
We can also sort the results of our queries by using ORDER BY
.
Go back to basic query and sort by poverty estimates for all ages:
The keyword ASC
tells us to order it in ascending order.
We could alternately use DESC
to get descending order.
Write a query that returns the state, county FIPS, name, the estimated poverty poverty for ages, and the estimated poverty for ages 0-17. Sort with the largest estimates at the top.
You got time ⏱️, but not too much.
Another note for ordering. We don’t actually have to display a column to sort by it.
For example, let’s say we want to sort counties in California by their estimate poverty for all ages, but we only want to see the county name.
We can do this because sorting occurs earlier in the computational pipeline than field selection. The computer is proceeding as follows:
🚨 Clauses are written in a fixed order: SELECT
, FROM
, WHERE
, then ORDER BY
🚨.
AND
or OR
to create more complex queries.This presentation was largely adapted from the wonderful course “Data Management with SQL for Ecologist” by datacarpentry.org and Cathy Tanimura’s “SQL for Data Analysis” book.
For Internal Use Only
✍️ Comment, comment, comment
One day, you will get up from your desk and (hopefully) go outside 🌄. When you come back, you may have forgotten your trail of thought…
This is how mistakes are made 😭.
We don’t like mistakes 😠.
Mistake are inevitable, but we can work to minimize them 🤞.
Commenting your code is a valuable practice, whether working alone or as part of a team, because it: