When you start using or creating a database, you generally start with the defaults options and fix issues along the way. Here’s a quick guide on how to choose your MySQL database encoding and the collation right from the beginning. This article will definitely help you to avoid the ache of debugging your encoding issues.
The database encoding decides how data is stored in the database. However, collation is used while comparing the data. For example: in a WHERE clause to be equal or LIKE, or with unique column constraints on text columns.
You can set both encoding and collation at the various levels and stages. These can be set on the server level, or at the database level, or at the table level or even for certain columns only. I will mention all the SQL queries related to these along with the instructions as you go further in this.
Let’s start with encoding
How to choose your MySQL encoding
There is a lot of encodings available and supported in MySQL. Choosing one of them is such a big task so go for some appropriate ones. If you need to handle special characters, like letters with accents, people will usually suggest you to opt for utf8 encoding. Be careful with utf8 encoding, this encoding does not support smileys, so if you want to save such data go for the other one. For that, you will need utf8mb4. utf8mb4 encoding is supported since MySQL 5.5.3. If you don’t want any of such special characters then you can simply go with the default, latin1 encoding.
NB: a note of warning when migrating from utf8 to utf8mb4. In utf8, a character can be encoded in a maximum of 3 bytes. This means UTF-8 can only hold 3 bytes per character. Utf8mb4 is actually the real 4-byte utf8 encoding, so holds 4 bytes per character. It adds an extra byte to store special characters like smileys. That changes the maximum length a column or index can hold. So if a column was of varchar(256) in utf8, it should now be varchar(191) in utf8mb4.
Now, I am moving towards MySQL collation.
How to choose your MySQL collation
Again, the collation is used while comparing data. Like: in a WHERE clause checking for equality or like clause, or with unique constraints on text columns. Here, you need to focus and get confirmed for the following points.
case-sensitivity important for you?
If a query for “hello” should match records like “HeLLo”, you’ll want a case-insensitive collation.
- Do you
care about accents?
If a query for “deja vu” should match records like “déjà vu”, you’ll want an accent-insensitive collation.
With utf8 you usually go with utf8_general_ci (or utf8mb4_unicode_ci with utf8mb4). Notice the “_ci” part at the end? It means “case insensitive” and implicitly also “accents insensitive”. With such a collation, lower/upper case and accents will be ignored in your searches and constraints on text columns.
If you care about accents, go for utf8_bin (or utf8mb4_bin). Careful, it’s also case-sensitive. There doesn’t seem to be a collation that ignores case but not accents (utf8_general_cs, for “case-sensitive”, is experimental and might not work for you). You’ll have to work around that in your queries.
The default collation for latin1 encoding is latin1_swedish_ci (notice the “_ci” at the end). For a case-sensitive collation, go for latin1_general_cs or latin1_bin.
- You care about accents and smileys: utf8mb4
- You care about accents but not smileys: utf8
- Your text content won’t have such special characters: latin1
- For utf8mb4: utf8mb4_unicode_ci (case-insensitive) or utf8mb4_bin
- For utf8: utf8_general_ci (case-insensitive) or utf8_bin
- For latin1: latin1_swedish_ci (case-insensitive) or latin1_bin
How to check your settings?
Here’s how to check the database settings in the database. You can simply run the query below in phpMyadmin directly. This query will show the list of all the databases with their respective charset and collation.
SELECT SCHEMA_NAME 'database', default_character_set_name 'charset', DEFAULT_COLLATION_NAME 'collation' FROM information_schema.SCHEMATA;
How to change MySQL database encoding and collation
Change encoding and collation of the database
I am doing the operations on “database_one” database. I have as make a highlight for this database in the picture above. In the above picture, you can see that the encoding for database_one is koi8r and collation is koi8r_general_ci. Now I am executing the query below. This will change the encoding of database_one to utf8mb4_unicode_ci and collation to utf8mb4
ALTER DATABASE database_one CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
Now you can see the list with the previous query again. this time the encoding and collation of database_one are different.
How to change the encoding and collation of a MySQL table
Firstly, I am going to check the collation and encoding of all the tables of database_one. I have dumped some tables from another database to this one. All of these tables have character_set_name = latin1 and collation_name = latin1_swedish_ci. So let’s have a look at the list of these tables using this query below. I am showing the result of this query in the image below query.
SELECT TAB.table_name, CHST.character_set_name, CHST.collation_name
FROM information_schema.`TABLES` TAB, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CHST
WHERE CHST.collation_name = TAB.table_collation AND TAB.table_schema = 'database_one';
Now I am changing the charset of brands table to
utf8mb4 and collation to
ALTER TABLE brands
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
And similarly changing the charset of categories table to utf8 and collation to utf8_general_ci.
ALTER TABLE categories
CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Below is the list of tables after changing their encodings.
Change the encoding and collation of a table on column level
Same as above, firstly, I will check the collation and encoding of all the columns in all the tables of database_one. In the operations done above, I have changed collations and encoding for brands and categories tables. so we will see different values for different columns.
I am fetching the columns which have collation_name. As only these columns will get affected by such operations. These are the columns that store text values like varchar, text, etc. So let’s have a look at the list of these columns using this query below. I am showing the result of this query in the image below query.
SELECT table_name, column_name, data_type, character_maximum_length, character_set_name, collation_name, column_type FROM information_schema.`COLUMNS` WHERE table_schema = 'database_one' AND character_set_name IS NOT NULL;
Now I am changing encoding for product_title column of cart table and cat_title column of categories table. The query to do so is below:
ALTER TABLE `cart` CHANGE `product_title` `product_title` VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE `categories` CHANGE `cat_title` `cat_title` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Now again fetch all columns to see the changes.