From Spreadsheets to Databases¶
Let's look at a scenario where you might want to use a spreadsheet instead of a database. More about what a database really is will become apparent during this dicussion, we hope. 👩💻
Databases vs Spreadsheets¶
Similarities¶
There are many similarities between databases and spreadsheets:
- Spreadsheets contain data that is carefully organized.
- Spreadsheets organize that data into rows and columns.
- Spreadsheets columns can have types. They can be formatted.
- Spreadsheets can have multiple tabs. These would be called tables in a database environment. Each table defines its own rows and columns.
- The rows in each table can be sorted by the values of certain columns.
- The rows of each table can be filtered by the values of certain columns.
- There is an ability to associate the values of certain columns/rows in a table with another table. This is often done by a lookup function in the spreadsheet. These lookups or associations would be referred to as foreign keys in a database.
Differences¶
Wow, that is a lot in common. Why even bother creating a database? Well, here are the differences between a spreadsheet and a database:
- Databases have rigid constraints (that you define) that cannot be violated. For example, you can define a constraint that specifies that every entry in the column must be unique (cannot have duplicates).
Click to read more about constraints
This is useful to create universal keys that are used to indicate unique records. For example, a person may have the same name, same birth date, and even the same birth place as another person.John Smith; June 5th, 1995; Milwaukee
could be such a person. However, each person is unique and specifying some king of randomized number that identifies them is often very useful. This unique key that makes each row in a table unique is called a Primary Key. -
Databases are self describing. A database's structure is very explicit and can be queried. This description of what the columns of each table contain is called the database's schema.
Click to read more about schemas
Schemas are very rigid (by design) and will not let any data be placed in any column of a table that does not match the type assigned to that column, or violates any other constraint placed on that column. For example, if a column is defined to contain a date format only legal dates can be placed in that column. -
Databases have users who are given certain permissions to create, read, update, and delete (CRUD) data in database tables and modify database schemas. SQLite is a one-user, all permissive database. Permissions can be handled in other ways in applications that use SQLite as a back-end.
Click to read more about permissions
A database root user will assign these roles to other named users. These permissions can be very granular in nature. For example, a person may be able to read data but not to modify data in a table. Another person might be able to modify only certain tables, or to modify only certain columns in a table. Other people may not be able to view table data that contains sensitive information. NOTE: SQLite will not help you solve the user permission problem. PostGres and other databases have permissions associated with users. The pictures shown below are examples of what a database with permissions is capable of doing. You can still use SQLite for an outward facing application, but the user permissions must be enforced by the application. -
Databases are usually inherently multi-user. That is, one person can be modifying one table, one modifying another table, and on viewing data from a third table all at the same time. This implies the necessity for the most important feature of databases: transactional integrity.
-
Databases possess a property known as transactional integrity. Let's say that you have multiple changes that you want to happen at the same time, for example, a user's name needs to change and a new address needs to be added for them. The name and address changes either both need to happen, or they both need to fail (and neither change happens). What we don't want to happen is that the address gets entered but the name fails to be changed, or vice versa. A database can wrap a transaction around both of these changes so that they either both happen or neither one does. This assures database integrity.
Info
We have another brief explainer about transactional integrity here.
Modifying our Spreadsheet¶
We will:
- Define a table with Excel (in this case)
Tip
If you are using a different spreadsheet application, like Google Sheets or Numbers on a Mac, you'll follow almost exactly the same process. Some of the menus might be a little different, but all spreadsheets can do the things we describe here.
- Create two seperate tables in Excel to replace the first table
- Export the information from Excel into tables
- Use DB Browser for SQLite to import the tables and confirm the operations
Info
We're using examples that are taken from the Coeur d'Alene Online Language Resource Center. We're grateful to Audra Vincent, Coeur d'Alene Language Programs Manager, for allowing us to use these examples - but please do note that they are only examples and not official language records. Any errors in the examples are the responsibility of the AILT Team.
The first step, defining the original table, is shown in the image above explaining the excel table schema.
root | nicodemus | english | originator |
---|---|---|---|
a | a | hello. | Lawrence Nicodemus |
a | a? | so. (lit. Is that so?), | Lawrence Nicodemus |
a | aaaa...! | cut out, knock off!, quit, stop. (lit. Cut it out!, Knock it off, quit it, Stop it!), | Lawrence Nicodemus |
a | aye | hey. | John Doe |
bc | buts | boots. | John Doe |
bc | etsbutsbutsshn | boots (to be wearing...). (lit. He is wearing boots) | Jane Doe |
bc | sbutsbutsshn | boot. (lit. a borrowed root) | Jane Doe |
bc | sbutsbutsshnmsh | rubber boots (putting on...). | Lawrence Nicodemus |
bl | buuli | bull. | Lawrence Nicodemus |
bm | bam | go (...fast and far), speeded (be...), be versatile. | Lawrence Nicodemus |
bm | bam | intoxicated. | Lawrence Nicodemus |
bm | bambmt | speedy (H/s is...). | Lawrence Nicodemus |
bm | bamp | speeded (He...), tipsy (He became...). | Jack Flash |
bm | sbamp | speeding. | Lawrence Nicodemus |
bm | itsbamp | intoxicated (H/s is being...). (lit. H/s is speeding) | Lawrence Nicodemus |
bm | bem | buzz. | Lawrence Nicodemus |
bm | bm | roar (...of motor or engine). | Lawrence Nicodemus |
bm | isbembish | whirr. (lit. It is...), | Lawrence Nicodemus |
bm | isbembmish | buzz. (lit. It is...) | Lawrence Nicodemus |
bm | isbembmish | hum, humming. (lit. H/s is making humming sounds/purring) | Lawrence Nicodemus |
bm | sbembmish | humming, any humming sound. | Lawrence Nicodemus |
bm | stbembm | bumblebee, hummingbird. | Lawrence Nicodemus |
bm | stbembm | hummingbird. | Lawrence Nicodemus |
bm | ubmmm | humming (continuous...). | Lawrence Nicodemus |
bn | benene | banana. | Lawrence Nicodemus |
bns | bins | bean. (l.w. from Engl.) | Lawrence Nicodemus |
bnwh | Sbiinwahulumkhw | Benewah County. (lit. Benewah land, named after a Coeur dAlene) | Lawrence Nicodemus |
bs | stbiiso | dragonfly. | Unknown |
btlym | beetlyem | Bethlehem. | Unknown |
Specific Problems with the Original Spreadsheet¶
What we have here is a problem. Well, actually, multiple problems.
- First of all, it is easy to misspell someone's name in the originator column and thus push invalid data into the database. Searching for, say, "Nicodemus" will not find an invalid originator "Nicodimus". Ideally, we would like to reference this information from one place so that we can guard against making this mistake.
- Second of all, what if we want to change the name in all of the instances. Maybe "Lawrence Nicodemus" wants to change his name to "Larry Nicodemus" because he is very informal. We have to do multiple work to change this name. It may appear in hundreds of entries. Not only that, but we run into the first problem again, where one or several of these entries are misspelled, or even left with the original name. Now we have more incorrect entries.
- Third of all, the entry in this column may not be unique. There may be more than one "Lawrence Nicodemus" in the world that contributes to entries in this table. Unlikely in this case, but "John Smith" would be a problem.
- Finally, what if we want to associate more information with people than just their name, such as an address or phone number? How do we do this in a systematic way in every table that might contain a name? We might want information associated with a name that shouldn't be shown in other tables, like phone numbers or social security numbers. Also, names aren't guaranteed to be unique, as discussed earlier.
From Specific Problems to General Problems¶
There are at least three fundamental problems you encounter in these three scenarios that databases help you to fix.
- Data Redundancy
Click to read more about data redundancy
The same data, which ideally should be stored in one place, is entered in a bunch of different places. This causes extra work and leads to errors.- Transactional Integrity
Click to read more about transactional integrity
I want to change all of the names from "Lawrence" to "Larry" but I am allowed to change one or two without changing the rest. This can be a huge problem for computers, where partial changes may enter the database and then a network error occurs, or the server running the database shuts down. This can cause partial changes to occur, which is VERY BAD especially if you are VISA and only part of a financial transaction gets entered. What you want are Transactions, which you canBEGIN
and then either COMMIT
or ROLLBACK
the changes. This ensures that everything you want to change/add/delete either happens or it doesn't. The importance of this property of databases cannot be overstated.- Lack of Data Constraints
Click to read more about data constraints
There is no way to really set an integrity constraint in a database, and where it can be done, it is often difficult. The constraint that we want here involves making sure that the name is always the same in all entries for the same person. We really ultimately want this key to be unique, in other words, we want to reference a person'sPRIMARY KEY
.Hidden problems lurk here as well.
- Multi-User Capability -
Click to read more about multi-user capability
This is a somewhat hidden problem of the spreadsheet approach. There is no way to share the workload in changing these entries. The only way is to share the spreadsheet and merge the results. New online spreadsheet sharing via Google Docs is a poor man's substitute for true multi-user access. It is the same problem you have with flat files, except that flat files are even worse, since you have no ability to seperate data into different tables except to put them in different sections of the file. And one person can easily stomp on another's work.Creating a New Spreadsheet¶
The solution involves doing the following two things (in the spreadsheet and eventually the database), and a third (in the database).
Tip
We say things like create a new table in your spreadsheet application. What that actually means is create a new tab or create a new sheet. Each page of a spreadsheet is a big table, so to make a new table you'll need a whole new page.
- Create a new table to hold the user entries, called users. The key is some number. In the database, this number will be assigned and will be unique. In the spreadsheet, care must still be taken to make sure that the number really references a row in another table.
- Create a new table called roots that is identical to the original table except that the originator column entries are replaced with the number of the row in the users table that referenced the original name contained in the table.
- The new roots table (in the database, not the spreadsheet) will contain a
FOREIGN KEY CONSTRAINT
that causes the originator column to be referenced to the users table column id that is thePRIMARY KEY
of the users table. In the spreadsheet, no such constraint exists. We must carefully enter the correct numbers because it is possible to enter an invald number (one that doesn't exist in the other spreadsheet).
The tables (tabs, or sheets) now look like this:
users
id | first | last | username | |
---|---|---|---|---|
1 | colrc@gmail.com | Lawrence | Nicodemus | original |
2 | jdoe@gmail.com | John | Doe | jdoe |
3 | jadoe@gmail.com | Jane | Doe | jadoe |
null | rdoe@gmail.com | Robert | Doe | rdoe |
roots
id | userId | english | nicodemus | root |
---|---|---|---|---|
1 | 1 | hello. | a | a |
null | 1 | so. (lit. Is that so?), | a? | a |
null | 1 | cut out, knock off!, quit, stop. (lit. Cut it out!, Knock it off, quit it, Stop it!), | aaaa...! | a |
2 | 2 | hey. | aye | a |
3 | 2 | boots. | buts | bc |
4 | 3 | boots (to be wearing...). (lit. He is wearing boots) | etsbutsbutsshn | bc |
5 | 3 | boot. (lit. a borrowed root) | sbutsbutsshn | bc |
6 | 1 | rubber boots (putting on...). | sbutsbutsshnmsh | bc |
7 | 1 | bull. | buuli | bl |
8 | 1 | go (...fast and far), speeded (be...), be versatile. | bam | bm |
9 | 1 | intoxicated. | bam | bm |
10 | 1 | speedy (H/s is...). | bambmt | bm |
11 | 4 | speeded (He...), tipsy (He became...). | bamp | bm |
12 | 1 | speeding. | sbamp | bm |
13 | 1 | intoxicated (H/s is being...). (lit. H/s is speeding) | itsbamp | bm |
15 | 1 | buzz. | bem | bm |
16 | 1 | roar (...of motor or engine). | bm | bm |
17 | 1 | whirr. (lit. It is...), | isbembish | bm |
18 | 1 | buzz. (lit. It is...) | isbembmish | bm |
19 | 1 | hum, humming. (lit. H/s is making humming sounds/purring) | isbembmish | bm |
20 | 1 | humming, any humming sound. | sbembmish | bm |
21 | 1 | bumblebee, hummingbird. | stbembm | bm |
22 | 1 | hummingbird. | stbembm | bm |
23 | 1 | humming (continuous...). | ubmmm | bm |
24 | 1 | banana. | benene | bn |
25 | 1 | bean. (l.w. from Engl.) | bins | bns |
26 | 1 | Benewah County. (lit. Benewah land, named after a Coeur dAlene) | Sbiinwahulumkhw | bnwh |
27 | null | dragonfly. | stbiiso | bs |
null | null | Bethlehem. | beetlyem | btlym |
Export the Data Tables From the Spreadsheet¶
Tip
If you are using a different spreadsheet application, like Google Sheets or Numbers on a Mac, you'll follow almost exactly the same process. Some of the menus might be a little different, but all spreadsheets can do the things we describe here.
In Excel:
- Switch to the correct tab for the data you want to export
- Choose File → Save As → CSV (Comma delimited)
- The file will be named something like
Spreadsheet Name(Tab Title)
. In my example this wassample_database(roots)
for the roots tab. This is often placed in a common directory. In my case, on a Windows system, this was the Downloads directory. Note: this directory and naming is typical of a Windows system. - You can open this file to look at it in a text editor such as NotePad (if you're using Windows) or Text Edit (if you're using a Mac). There are many other different text editors or code editors you can choose from (we use a multi-platform code editor called Visual Studio Code or VSCode).
Warning
Do not open and edit the exported file in Excel or Word, because these applications will, by default, add formatting and other hidden code back into the file when you save your edits. Use NotePad or Text Edit or similar instead.
- Note:
csv
means comma-separated value. Each table is translated into a row of data per line of text. Each line of text has each column's value for that row in it, in order by column in the spreadsheet. If the value is a string, it is quoted. Numbers are unquoted. Each value is separated by a comma.
Alright, now that our spreadsheet is ready, we're going to database land. 🤓
Next Steps¶
Next we'll create a database to hold the information from our spreadsheet. 🚀