Encodings

[Image: a cat at a computer, with the characters "x̣ʷ ɫ ʕ" and "� 􏿾 ⍰ "]

Your computer understands several different systems for finding the right character to show you, but all of them are encoding systems: each character you see on the screen has its own little numeric code.
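To see those numeric codes for yourself, here's a small Python sketch (Python is just our choice for illustration; nothing in this workshop depends on it). The built-in `ord()` returns a character's Unicode code point and `chr()` turns a code point back into a character. The sample characters are arbitrary.

```python
# Each character has a numeric code: its Unicode code point
codes = {ch: ord(ch) for ch in ["a", "A", "é", "ʕ", "💗"]}
for ch, code in codes.items():
    # Show the number in decimal and in the usual U+XXXX notation
    print(ch, code, f"U+{code:04X}")

# chr() goes the other way, from a number back to a character
back = chr(97)
print(back)  # a
```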

We don't really have to care about all that except when we encounter weird characters like these:

� □ ▯ 􏿾 ⍰

These symbols might be shown when you try to use a character that just isn't in the encoding system your computer is trying to use, or when text saved in one encoding is read as if it were in another.
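Here's a rough Python sketch of both situations, using the characters from the top of this page. Passing `errors="replace"` tells Python to substitute a placeholder instead of stopping with an error; the placeholder you see in other programs depends on that program and on your fonts.

```python
# "x̣ʷ ɫ ʕ" written with escapes: x + combining dot below, then ʷ, ɫ, ʕ
text = "x\u0323\u02b7 \u026b \u0295"

# Text -> bytes: ASCII has no code for these letters, so
# errors="replace" swaps each unencodable character for "?"
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)  # b'x?? ? ?'

# Bytes -> text: these are valid UTF-8 bytes, but reading them as ASCII
# fails, so errors="replace" inserts U+FFFD (�) for each byte it can't read
garbled = text.encode("utf-8").decode("ascii", errors="replace")
print(garbled)
```

Each of the four non-ASCII characters takes two bytes in utf-8, which is why the garbled version ends up with eight � marks rather than four.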

For the most part, we can get away with knowing a little bit about two different character encodings: ASCII (pronounced ask-key) and utf-8 (pronounced, uh, u-t-f eight), which is one kind of Unicode encoding system.

ASCII

In the olden days, early encoding systems were based on the characters English speakers had on their typewriters and adding machines, which were also the characters they needed to send electronically to each other. That transmission used a binary encoding system, just sequences of 1s and 0s (or 'on' and 'off' electrically), and computers today are still based on such systems.

The system that was first standardized in 1963 for this purpose is called ASCII, which stands for American Standard Code for Information Interchange. It defines 128 different characters (or 256 in its various 8-bit extended versions), including the letters of the Latin alphabet in both upper and lower case, the Arabic numerals 0 through 9, some arithmetic symbols, punctuation marks, and a few control characters like carriage return.
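A quick Python sketch of what ASCII codes look like; the characters are just examples. Because ASCII codes only run from 0 to 127, every one of them fits in 7 binary digits.

```python
# Print a few characters with their ASCII code in decimal and 7-bit binary
for ch in ["A", "a", "0", "!"]:
    code = ord(ch)
    print(ch, code, format(code, "07b"))  # e.g. A 65 1000001

# Encoding a whole string as ASCII gives one number (byte) per character
hi_bytes = list("Hi!".encode("ascii"))
print(hi_bytes)  # [72, 105, 33]

# Characters outside ASCII's 128 codes can't be encoded at all
try:
    "é".encode("ascii")
    has_ascii_code = True
except UnicodeEncodeError:
    has_ascii_code = False
print(has_ascii_code)  # False
```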

Info

If you're super interested in the history of encodings, we recommend that you start here

ASCII is very English-centric, and for many years it was the standard for computers as well as other kinds of machines. It's still used in some places, and if you are having trouble getting the computer to show you non-English characters, the problem might be that it's reading your text as ASCII.

Don't worry, we can fix that - if we know it's the problem!
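One common fix is to tell your program which encoding to use when it reads text, instead of relying on its default. Here's a minimal Python sketch of that idea; the file name `greeting.txt` is hypothetical, and the file is created in a temporary folder just for the demonstration.

```python
import os
import tempfile

text = "x\u0323\u02b7 \u026b \u0295"  # the same "x̣ʷ ɫ ʕ" as above

with tempfile.TemporaryDirectory() as folder:
    # Save the text as UTF-8 (hypothetical file name)
    path = os.path.join(folder, "greeting.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading it back as ASCII fails: those bytes aren't valid ASCII
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Naming the right encoding gets the original characters back
    with open(path, encoding="utf-8") as f:
        restored = f.read()

print(ascii_ok)          # False
print(restored == text)  # True
```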

Unicode

Fast-forward to 2024 and we have an encoding system that works well for almost all orthographies, and that we can also thank for our ability to write with emoji 💗💗💻💗💗.

Info

The Unicode system is a massive, global effort and you can learn more about it here

What makes Unicode possible is that it allows more bytes per character, so there are far more numbers available to identify a single character. There are actually several encodings of Unicode; the most widely used one today is called utf-8. utf-8 gives the computer from 1 to 4 bytes (8 to 32 bits) to name each character with, which is enough to encode every one of Unicode's 1,112,064 possible code points. utf-16 and utf-32 are other ways of encoding that same set of code points (using 2 or 4 bytes, and always 4 bytes, per character respectively). They can't represent any characters that utf-8 can't, so adding more emoji to the Unicode standard won't force us to switch away from utf-8.
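Here's a small Python sketch comparing how many bytes a few characters take in each encoding. We use the `-le` variants of utf-16 and utf-32 so that Python doesn't add a byte-order mark to the count; the sample characters are arbitrary.

```python
# Byte counts per character in utf-8, utf-16 and utf-32
sizes = {}
for ch in ["A", "é", "€", "💗"]:
    sizes[ch] = (
        len(ch.encode("utf-8")),      # 1 to 4 bytes
        len(ch.encode("utf-16-le")),  # 2 or 4 bytes
        len(ch.encode("utf-32-le")),  # always 4 bytes
    )
    print(ch, sizes[ch])

# Code points run from 0 to 0x10FFFF; the 2,048 "surrogate" code points
# (U+D800 to U+DFFF) are reserved, leaving this many encodable ones
total = 0x10FFFF + 1 - 2048
print(total)  # 1112064
```

Notice that "A" takes a single byte in utf-8 but more in the other two, while the emoji needs 4 bytes everywhere.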

How'd we do?

Do you feel like you understand what character encodings are? If so, great! If not, we hope you'll ask more questions.

You can contact John, Gus, or Amy and we'll do the best we can to help!

At this point, you might go back to Day 1 🚀 or all the way back to our workshop home 🚀