Collation Sequences, Character Sets and Case
"a A b B"
"A a B b"
Character Sets are lists of all the characters that your database will allow you to enter. If you're using UTF-8, an encoding of the Unicode standard, your character set can contain up to 1,112,064 different characters. If you're using an extended version of ASCII, your character set contains only 256. See the discussion of encodings for more information on these character encoding standards.
But you might have to make some decisions about what counts as one character in your writing system versus what counts as two or more characters. For example, in the Coeur d'Alene writing system developed by Lawrence Nicodemus, each of the following identifies one sound in the language: q, qw, q', and q'w. When we use two characters in combination to write one sound (like English sh), we call that a digraph. Sounds that are written with three characters are called trigraphs, and so on.
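If it helps to see this idea in code, here is a minimal Python sketch of splitting a word into letters when some letters are written with more than one character. The list of multi-character letters contains only the three named above, the function name split_letters is our own invention, and the sample strings are made-up sequences rather than real words. A real project would list every digraph and trigraph in its writing system.

```python
# Assumption: only the multi-character letters mentioned in the text.
# A real inventory would be longer.
MULTIGRAPHS = ["q'w", "qw", "q'"]

def split_letters(word, multigraphs=MULTIGRAPHS):
    """Split a word into letters, always trying the longest unit first."""
    units = sorted(multigraphs, key=len, reverse=True)
    letters = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                letters.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character letter starts here, so take one character.
            letters.append(word[i])
            i += 1
    return letters

# Made-up strings, for illustration only.
print(split_letters("q'wa"))  # ["q'w", 'a']
print(split_letters("qwq'"))  # ['qw', "q'"]
```

Trying the longest unit first matters: otherwise q'w would be read as q' followed by w.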
Digraphs and trigraphs may or may not create issues for us in our databases. Where trouble is most likely to arise is when we ask the computer to sort our data. For languages that use the apostrophe as a letter, and for those that use numerals or punctuation marks as letters, we may need to tell the computer to sort those characters differently than it would by default in UTF-8.
Collation Sequences are rules for the order of characters in a character set. Every character set has a default order that includes every single character in the set. If you want your data to be sorted in an order different from the default, you'll need to create your own collation sequence.
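Here is a rough sketch of what creating your own collation sequence can look like, using Python's built-in sqlite3 module and its create_collation method. Everything specific in it is an assumption made for illustration: the six-letter alphabet and its order, the collation name custom_alphabet, the table and column names, and the sample strings (which are not real words). Treat it as a starting point, not as the right order for any particular language.

```python
import sqlite3

# Assumption: a tiny, hypothetical alphabet in a hypothetical order.
ALPHABET = ["a", "q", "q'", "qw", "q'w", "w"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
UNITS = sorted(ALPHABET, key=len, reverse=True)  # longest letters first

def sort_key(word):
    """Turn a word into a list of letter ranks."""
    key = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                key.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Characters outside the alphabet sort after all known letters.
            key.append(len(ALPHABET) + ord(word[i]))
            i += 1
    return key

def collate(a, b):
    """Return -1, 0, or 1, which is what SQLite expects from a collation."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("custom_alphabet", collate)
conn.execute("CREATE TABLE words (form TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("qwa",), ("q'a",), ("qa",), ("q'wa",), ("wa",), ("aq",)],
)

default_order = [row[0] for row in conn.execute(
    "SELECT form FROM words ORDER BY form")]
custom_order = [row[0] for row in conn.execute(
    "SELECT form FROM words ORDER BY form COLLATE custom_alphabet")]

print(default_order)  # ['aq', "q'a", "q'wa", 'qa', 'qwa', 'wa']
print(custom_order)   # ['aq', 'qa', "q'a", 'qwa', "q'wa", 'wa']
```

In the default order the apostrophe sorts before every letter, so q'a lands ahead of qa. With the custom collation, each letter sorts where the alphabet list puts it.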
Collation Sequences include the ordering of upper case (capital) as well as lower case (small) letters, and a writing system has rules for when to use each. Your language might not use both upper and lower case letters, or the rules for when you use each kind might be different from the rules of English writing.
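To make the effect of case on ordering concrete, here is a small Python sketch that sorts the same four letters three ways. The variable names are our own, and the two case-aware orders are just two possible choices, not a recommendation.

```python
letters = ["b", "A", "a", "B"]

# Default: plain code point order, where every capital comes before
# every small letter.
code_point_order = sorted(letters)

# Group by letter, with the small letter first in each pair.
lower_first = sorted(letters, key=lambda ch: (ch.lower(), ch.isupper()))

# Group by letter, with the capital first in each pair.
upper_first = sorted(letters, key=lambda ch: (ch.lower(), ch.islower()))

print(code_point_order)  # ['A', 'B', 'a', 'b']
print(lower_first)       # ['a', 'A', 'b', 'B']
print(upper_first)       # ['A', 'a', 'B', 'b']
```

The last two results match the "a A b B" and "A a B b" orders shown at the top of this page.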
Here's a link to the SQLite documentation on collation sequences in case you need it.
How'd we do?
Do you feel like you understand what character sets, collation sequences, and cases are? If so, great! If not, we hope you'll ask more questions.
You can contact John, Gus, or Amy and we'll do the best we can to help!
At this point, you might go back to Day 1 🚀 or all the way back to our workshop home 🚀