Skip to content

From Spreadsheets to Databases

Let's look at a scenario where you might want to use a spreadsheet instead of a database. More about what a database really is will become apparent during this dicussion, we hope. 👩‍💻

Databases vs Spreadsheets

Similarities

There are many similarities between databases and spreadsheets:

  • Spreadsheets contain data that is carefully organized.
  • Spreadsheets organize that data into rows and columns.
  • Spreadsheets columns can have types. They can be formatted.
  • Spreadsheets can have multiple tabs. These would be called tables in a database environment. Each table defines its own rows and columns.

  • The rows in each table can be sorted by the values of certain columns.
  • The rows of each table can be filtered by the values of certain columns.
  • There is an ability to associate the values of certain columns/rows in a table with another table. This is often done by a lookup function in the spreadsheet. These lookups or associations would be referred to as foreign keys in a database.

Differences

Wow, that is a lot in common. Why even bother creating a database? Well, here are the differences between a spreadsheet and a database:

  • Databases have rigid constraints (that you define) that cannot be violated. For example, you can define a constraint that specifies that every entry in the column must be unique (cannot have duplicates).
    Click to read more about constraints This is useful to create universal keys that are used to indicate unique records. For example, a person may have the same name, same birth date, and even the same birth place as another person. John Smith; June 5th, 1995; Milwaukee could be such a person. However, each person is unique and specifying some king of randomized number that identifies them is often very useful. This unique key that makes each row in a table unique is called a Primary Key.
  • Databases are self describing. A database's structure is very explicit and can be queried. This description of what the columns of each table contain is called the database's schema.

    Click to read more about schemas Schemas are very rigid (by design) and will not let any data be placed in any column of a table that does not match the type assigned to that column, or violates any other constraint placed on that column. For example, if a column is defined to contain a date format only legal dates can be placed in that column.

  • Databases have users who are given certain permissions to create, read, update, and delete (CRUD) data in database tables and modify database schemas. SQLite is a one-user, all permissive database. Permissions can be handled in other ways in applications that use SQLite as a back-end.

    Click to read more about permissions A database root user will assign these roles to other named users. These permissions can be very granular in nature. For example, a person may be able to read data but not to modify data in a table. Another person might be able to modify only certain tables, or to modify only certain columns in a table. Other people may not be able to view table data that contains sensitive information. NOTE: SQLite will not help you solve the user permission problem. PostGres and other databases have permissions associated with users. The pictures shown below are examples of what a database with permissions is capable of doing. You can still use SQLite for an outward facing application, but the user permissions must be enforced by the application.

  • Databases are usually inherently multi-user. That is, one person can be modifying one table, one modifying another table, and on viewing data from a third table all at the same time. This implies the necessity for the most important feature of databases: transactional integrity.

  • Databases possess a property known as transactional integrity. Let's say that you have multiple changes that you want to happen at the same time, for example, a user's name needs to change and a new address needs to be added for them. The name and address changes either both need to happen, or they both need to fail (and neither change happens). What we don't want to happen is that the address gets entered but the name fails to be changed, or vice versa. A database can wrap a transaction around both of these changes so that they either both happen or neither one does. This assures database integrity.

Info

We have another brief explainer about transactional integrity here.

Modifying our Spreadsheet

We will:

  • Define a table with Excel (in this case)

Tip

If you are using a different spreadsheet application, like Google Sheets or Numbers on a Mac, you'll follow almost exactly the same process. Some of the menus might be a little different, but all spreadsheets can do the things we describe here.

  • Create two seperate tables in Excel to replace the first table
  • Export the information from Excel into tables
  • Use DB Browser for SQLite to import the tables and confirm the operations

Info

We're using examples that are taken from the Coeur d'Alene Online Language Resource Center. We're grateful to Audra Vincent, Coeur d'Alene Language Programs Manager, for allowing us to use these examples - but please do note that they are only examples and not official language records. Any errors in the examples are the responsibility of the AILT Team.

The first step, defining the original table, is shown in the image above explaining the excel table schema.

root nicodemus english originator
a a hello. Lawrence Nicodemus
a a? so. (lit. Is that so?), Lawrence Nicodemus
a aaaa...! cut out, knock off!, quit, stop. (lit. Cut it out!, Knock it off, quit it, Stop it!), Lawrence Nicodemus
a aye hey. John Doe
bc buts boots. John Doe
bc etsbutsbutsshn boots (to be wearing...). (lit. He is wearing boots) Jane Doe
bc sbutsbutsshn boot. (lit. a borrowed root) Jane Doe
bc sbutsbutsshnmsh rubber boots (putting on...). Lawrence Nicodemus
bl buuli bull. Lawrence Nicodemus
bm bam go (...fast and far), speeded (be...), be versatile. Lawrence Nicodemus
bm bam intoxicated. Lawrence Nicodemus
bm bambmt speedy (H/s is...). Lawrence Nicodemus
bm bamp speeded (He...), tipsy (He became...). Jack Flash
bm sbamp speeding. Lawrence Nicodemus
bm itsbamp intoxicated (H/s is being...). (lit. H/s is speeding) Lawrence Nicodemus
bm bem buzz. Lawrence Nicodemus
bm bm roar (...of motor or engine). Lawrence Nicodemus
bm isbembish whirr. (lit. It is...), Lawrence Nicodemus
bm isbembmish buzz. (lit. It is...) Lawrence Nicodemus
bm isbembmish hum, humming. (lit. H/s is making humming sounds/purring) Lawrence Nicodemus
bm sbembmish humming, any humming sound. Lawrence Nicodemus
bm stbembm bumblebee, hummingbird. Lawrence Nicodemus
bm stbembm hummingbird. Lawrence Nicodemus
bm ubmmm humming (continuous...). Lawrence Nicodemus
bn benene banana. Lawrence Nicodemus
bns bins bean. (l.w. from Engl.) Lawrence Nicodemus
bnwh Sbiinwahulumkhw Benewah County. (lit. Benewah land, named after a Coeur dAlene) Lawrence Nicodemus
bs stbiiso dragonfly. Unknown
btlym beetlyem Bethlehem. Unknown

Specific Problems with the Original Spreadsheet

What we have here is a problem. Well, actually, multiple problems.

  • First of all, it is easy to misspell someone's name in the originator column and thus push invalid data into the database. Searching for, say, "Nicodemus" will not find an invalid originator "Nicodimus". Ideally, we would like to reference this information from one place so that we can guard against making this mistake.
  • Second of all, what if we want to change the name in all of the instances. Maybe "Lawrence Nicodemus" wants to change his name to "Larry Nicodemus" because he is very informal. We have to do multiple work to change this name. It may appear in hundreds of entries. Not only that, but we run into the first problem again, where one or several of these entries are misspelled, or even left with the original name. Now we have more incorrect entries.
  • Third of all, the entry in this column may not be unique. There may be more than one "Lawrence Nicodemus" in the world that contributes to entries in this table. Unlikely in this case, but "John Smith" would be a problem.
  • Finally, what if we want to associate more information with people than just their name, such as an address or phone number? How do we do this in a systematic way in every table that might contain a name? We might want information associated with a name that shouldn't be shown in other tables, like phone numbers or social security numbers. Also, names aren't guaranteed to be unique, as discussed earlier.

From Specific Problems to General Problems

There are at least three fundamental problems you encounter in these three scenarios that databases help you to fix.

  • Data Redundancy
Click to read more about data redundancyThe same data, which ideally should be stored in one place, is entered in a bunch of different places. This causes extra work and leads to errors.
  • Transactional Integrity
Click to read more about transactional integrityI want to change all of the names from "Lawrence" to "Larry" but I am allowed to change one or two without changing the rest. This can be a huge problem for computers, where partial changes may enter the database and then a network error occurs, or the server running the database shuts down. This can cause partial changes to occur, which is VERY BAD especially if you are VISA and only part of a financial transaction gets entered. What you want are Transactions, which you can BEGIN and then either COMMIT or ROLLBACK the changes. This ensures that everything you want to change/add/delete either happens or it doesn't. The importance of this property of databases cannot be overstated.
  • Lack of Data Constraints
Click to read more about data constraintsThere is no way to really set an integrity constraint in a database, and where it can be done, it is often difficult. The constraint that we want here involves making sure that the name is always the same in all entries for the same person. We really ultimately want this key to be unique, in other words, we want to reference a person's PRIMARY KEY.

Hidden problems lurk here as well.

  • Multi-User Capability -
Click to read more about multi-user capabilityThis is a somewhat hidden problem of the spreadsheet approach. There is no way to share the workload in changing these entries. The only way is to share the spreadsheet and merge the results. New online spreadsheet sharing via Google Docs is a poor man's substitute for true multi-user access. It is the same problem you have with flat files, except that flat files are even worse, since you have no ability to seperate data into different tables except to put them in different sections of the file. And one person can easily stomp on another's work.

Creating a New Spreadsheet

The solution involves doing the following two things (in the spreadsheet and eventually the database), and a third (in the database).

Tip

We say things like create a new table in your spreadsheet application. What that actually means is create a new tab or create a new sheet. Each page of a spreadsheet is a big table, so to make a new table you'll need a whole new page.

  1. Create a new table to hold the user entries, called users. The key is some number. In the database, this number will be assigned and will be unique. In the spreadsheet, care must still be taken to make sure that the number really references a row in another table.
  2. Create a new table called roots that is identical to the original table except that the originator column entries are replaced with the number of the row in the users table that referenced the original name contained in the table.
  3. The new roots table (in the database, not the spreadsheet) will contain a FOREIGN KEY CONSTRAINT that causes the originator column to be referenced to the users table column id that is the PRIMARY KEY of the users table. In the spreadsheet, no such constraint exists. We must carefully enter the correct numbers because it is possible to enter an invald number (one that doesn't exist in the other spreadsheet).

The tables (tabs, or sheets) now look like this:

users

id email first last username
1 colrc@gmail.com Lawrence Nicodemus original
2 jdoe@gmail.com John Doe jdoe
3 jadoe@gmail.com Jane Doe jadoe
null rdoe@gmail.com Robert Doe rdoe

roots

id userId english nicodemus root
1 1 hello. a a
null 1 so. (lit. Is that so?), a? a
null 1 cut out, knock off!, quit, stop. (lit. Cut it out!, Knock it off, quit it, Stop it!), aaaa...! a
2 2 hey. aye a
3 2 boots. buts bc
4 3 boots (to be wearing...). (lit. He is wearing boots) etsbutsbutsshn bc
5 3 boot. (lit. a borrowed root) sbutsbutsshn bc
6 1 rubber boots (putting on...). sbutsbutsshnmsh bc
7 1 bull. buuli bl
8 1 go (...fast and far), speeded (be...), be versatile. bam bm
9 1 intoxicated. bam bm
10 1 speedy (H/s is...). bambmt bm
11 4 speeded (He...), tipsy (He became...). bamp bm
12 1 speeding. sbamp bm
13 1 intoxicated (H/s is being...). (lit. H/s is speeding) itsbamp bm
15 1 buzz. bem bm
16 1 roar (...of motor or engine). bm bm
17 1 whirr. (lit. It is...), isbembish bm
18 1 buzz. (lit. It is...) isbembmish bm
19 1 hum, humming. (lit. H/s is making humming sounds/purring) isbembmish bm
20 1 humming, any humming sound. sbembmish bm
21 1 bumblebee, hummingbird. stbembm bm
22 1 hummingbird. stbembm bm
23 1 humming (continuous...). ubmmm bm
24 1 banana. benene bn
25 1 bean. (l.w. from Engl.) bins bns
26 1 Benewah County. (lit. Benewah land, named after a Coeur dAlene) Sbiinwahulumkhw bnwh
27 null dragonfly. stbiiso bs
null null Bethlehem. beetlyem btlym

Export the Data Tables From the Spreadsheet

Tip

If you are using a different spreadsheet application, like Google Sheets or Numbers on a Mac, you'll follow almost exactly the same process. Some of the menus might be a little different, but all spreadsheets can do the things we describe here.

In Excel:

  • Switch to the correct tab for the data you want to export
  • Choose FileSave AsCSV (Comma delimited)
  • The file will be named something like Spreadsheet Name(Tab Title). In my example this was sample_database(roots) for the roots tab. This is often placed in a common directory. In my case, on a Windows system, this was the Downloads directory. Note: this directory and naming is typical of a Windows system.
  • You can open this file to look at it in a text editor such as NotePad (if you're using Windows) or Text Edit (if you're using a Mac). There are many other different text editors or code editors you can choose from (we use a multi-platform code editor called Visual Studio Code or VSCode).

Warning

Do not open and edit the exported file in Excel or Word, because these applications will, by default, add formatting and other hidden code back into the file when you save your edits. Use NotePad or Text Edit or similar instead.

  • Note: csv means comma-separated value. Each table is translated into a row of data per line of text. Each line of text has each column's value for that row in it, in order by column in the spreadsheet. If the value is a string, it is quoted. Numbers are unquoted. Each value is separated by a comma.

Alright, now that our spreadsheet is ready, we're going to database land. 🤓

Next Steps

Next we'll create a database to hold the information from our spreadsheet. 🚀