龙空技术网

MySQL中字符集和排序规则,为什么中文等字符编码下要使用utf8mb4

编程菌zfn 327

前言:

现时朋友们对“linux修改mysql字符集”大致比较珍视,大家都想要了解一些“linux修改mysql字符集”的相关内容。那么小编同时在网络上汇集了一些关于“linux修改mysql字符集””的相关文章,希望小伙伴们能喜欢,同学们快快来了解一下吧!

两个不同的字符集不能有相同的排序规则。且每个字符集都有一个默认的排序规则。

本篇大部分内容出自官网文档...

查看当前支持的字符集INFORMATION_SCHEMA.CHARACTER_SETS

INFORMATION_SCHEMA.CHARACTER_SETS 表中可以查看可用的字符集。

MAXLEN 列表示:The maximum number of bytes required to store one character.

MariaDB [(none)]> select * from INFORMATION_SCHEMA.CHARACTER_SETS;+--------------------+----------------------+-----------------------------+--------+| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                 | MAXLEN |+--------------------+----------------------+-----------------------------+--------+| big5               | big5_chinese_ci      | Big5 Traditional Chinese    |      2 || dec8               | dec8_swedish_ci      | DEC West European           |      1 || cp850              | cp850_general_ci     | DOS West European           |      1 || hp8                | hp8_english_ci       | HP West European            |      1 || koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian       |      1 || latin1             | latin1_swedish_ci    | cp1252 West European        |      1 || latin2             | latin2_general_ci    | ISO 8859-2 Central European |      1 |......

或者查看包含 utf8 的字符集,可以使用LIKE子句或WHERE子句。

MariaDB [(none)]> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS    ->   WHERE CHARACTER_SET_NAME LIKE '%utf8%';+--------------------+----------------------+---------------+--------+| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION   | MAXLEN |+--------------------+----------------------+---------------+--------+| utf8               | utf8_general_ci      | UTF-8 Unicode |      3 || utf8mb4            | utf8mb4_general_ci   | UTF-8 Unicode |      4 |+--------------------+----------------------+---------------+--------+2 rows in set (0.000 sec)
SHOW CHARACTER SET

相对于 INFORMATION_SCHEMA.CHARACTER_SETS 表,SHOW CHARACTER SET 语句更常用于查看支持的字符集。

MariaDB [(none)]> SHOW CHARACTER SET;+----------+-----------------------------+---------------------+--------+| Charset  | Description                 | Default collation   | Maxlen |+----------+-----------------------------+---------------------+--------+| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 || dec8     | DEC West European           | dec8_swedish_ci     |      1 || cp850    | DOS West European           | cp850_general_ci    |      1 || hp8      | HP West European            | hp8_english_ci      |      1 || koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 || latin1   | cp1252 West European        | latin1_swedish_ci   |      1 || latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |......

或者查看包含 utf8 的字符集,可以使用LIKE子句或WHERE子句。

MariaDB [(none)]> SHOW CHARACTER SET LIKE '%utf8%';+---------+---------------+--------------------+--------+| Charset | Description   | Default collation  | Maxlen |+---------+---------------+--------------------+--------+| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 || utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |+---------+---------------+--------------------+--------+2 rows in set (0.005 sec)
查看当前支持的排序规则Collation

一个给定的字符集至少有一个排序规则,大多数字符集有多个。

使用 INFORMATION_SCHEMA.COLLATIONS 表或 SHOW COLLATION 语句,可以显示一个字符集的排序规则。

SHOW COLLATION默认显示所有的排序规则,使用WHERE、LIKE等子句可以过滤显示指定字符集的排序规则。

查看默认字符集 utf8mb4 的排序规则。

MariaDB [(none)]> SHOW COLLATION WHERE Charset = 'utf8mb4';+------------------------------+---------+------+---------+----------+---------+| Collation                    | Charset | Id   | Default | Compiled | Sortlen |+------------------------------+---------+------+---------+----------+---------+| utf8mb4_general_ci           | utf8mb4 |   45 | Yes     | Yes      |       1 || utf8mb4_bin                  | utf8mb4 |   46 |         | Yes      |       1 || utf8mb4_unicode_ci           | utf8mb4 |  224 |         | Yes      |       8 || utf8mb4_icelandic_ci         | utf8mb4 |  225 |         | Yes      |       8 |.........| utf8mb4_unicode_nopad_ci     | utf8mb4 | 1248 |         | Yes      |       8 || utf8mb4_unicode_520_nopad_ci | utf8mb4 | 1270 |         | Yes      |       8 |+------------------------------+---------+------+---------+----------+---------+33 rows in set (0.008 sec)
[为什么推荐使用utf8mb4?]MySQL中的utf8和utf8mb4字符集真正支持存储Unicode字符

MySQL/MariaDB 的 utf8 编码并不是真正的UTF-8编码。在MySQL/MariaDB中,utf8最多只支持3个字节,上面的查询中已经可以看到。

而简体中文在 utf8 编码中,一般占3个字节,使用扩展区的中文会占用4个字节。UTF-8编码是可变长编码,英文等字母字符占1个字节,其他复杂字符会占用2-4个字节。

但是目前常使用的emoji表情、繁体字、特殊字符、不常见汉字等,会占用4个字节。这就导致,

如果在 MySQL/MariaDB 中使用 utf8 编码,将无法存储4个字节的汉字、emoji等表情。MySQL/MariaDB 的 utf8 编码占用最多3个字节是由于历史原因造成的。

utf8mb4才是真正的UTF-8编码,可以最多支持4个字节。

所以实际开发中,推荐使用 utf8mb4 编码字符集,这是真正的Unicode编码。尤其是在用到中文、emoji等字符的环境中。否则就会遇到字节数不正确,报错不正确的字符值(Incorrect string value)

MySQL 8.0开始,默认的字符编码已经改为utf8mb4,MariaDB 10.0仍为latin1。

UTF-8编码是U+2528D,属于CJK Unified Ideographs Extension B(中日韩统一表意文字扩充B)字符集的字符,处于第二辅助平面(SIP,表意文字补充平面),最多支持4个字节。

而Mysql的utf8编码则属于常见的基本多文种平面(BMP,即Unicode编码范围在0000-FFFF之内)的字符,最多支持3个字节。

关于utf8mb4_bin排序规则

使用utf8mb4字符集时,通常推荐使用utf8mb4_unicode_ci(或utf8mb4_general_ci)排序规则。

但,如果想要区分区大小写且同时区分重音,推荐使用utf8mb4_bin。一般区分大小写的也都区分重音。utf8_unicode_cs并不是很推荐!

指定字符集和排序规则修改MySQL服务器的排序规则

Linux系统下,修改 /etc/my.cnf 或 /etc/my.cnf.d/server.cnf 文件;Windows系统下,修改MySQL/Mariadb安装目录下data/my.ini文件,

在对应[mysql]、[mysqld]下设置:

[client]default-character-set = utf8mb4[mysql]default-character-set = utf8mb4[mysqld]character-set-client-handshake = FALSEcharacter-set-server = utf8mb4collation-server = utf8mb4_unicode_ci

设置完后重启(restart)MySQL/MariaDB。

还可以使用命令行,设置字符集和排序规则:

mysqld --character-set-server=utf8 --collation-server=utf8_unicode_cs

创建数据库时指定

创建数据库时指定字符集和排序规则的语法如下:

CREATE DATABASE IF NOT EXISTS database_name     DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

创建并查看一个数据库的排序规则和字符集:

CREATE DATABASE IF NOT EXISTS CHARSETTest     DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;-- 查看数据库show create database CHARSETTest;-- 返回-- CREATE DATABASE `CHARSETTest` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */

或:

CREATE DATABASE IF NOT EXISTS CHARSETTest1     DEFAULT CHARSET latin1 COLLATE latin1_swedish_ci;-- 查看数据库show create database CHARSETTest1;-- 返回-- CREATE DATABASE `CHARSETTest1` /*!40100 DEFAULT CHARACTER SET latin1 */
指定表和列的字符集和排序规则

如下,可以在创建一个表时,指定字符集和排序规则DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci。不指定时,默认使用所在数据库的字符集和排序规则。

同时,可以在列名后,指定当前列的排序规则,如 id char(4) COLLATE utf8mb4_unicode_ci。

CREATE TABLE `Product` (  `id` char(4) COLLATE utf8mb4_unicode_ci NOT NULL,  `name` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,  `type` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,  `sale_price` int(11) DEFAULT NULL,  `purchase_price` int(11) DEFAULT NULL,  `regist_date` date DEFAULT NULL,  PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

通过SHOW CREATE TABLE tablename;语句,可以完整查看创建表和列的语句,及其中用到的字符集、排序规则。

对应的,可以通过 ALTER TABLE 语句实现对列使用的字符编码的修改。

修改数据库的字符集和排序规则

ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

示例:

MariaDB [test]> ALTER DATABASE CHARSETTest1 CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;Query OK, 1 row affected (0.008 sec)MariaDB [test]> show create database CHARSETTest1;+--------------+-----------------------------------------------------------------------------------------------------+| Database     | Create Database                                                                                     |+--------------+-----------------------------------------------------------------------------------------------------+| CHARSETTest1 | CREATE DATABASE `CHARSETTest1` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ |+--------------+-----------------------------------------------------------------------------------------------------+1 row in set (0.006 sec)
修改表的字符集
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

示例:

MariaDB [test]> ALTER TABLE OrderTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;Query OK, 4 rows affected (0.035 sec)Records: 4  Duplicates: 0  Warnings: 0MariaDB [test]> show create table OrderTest;+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+| Table     | Create Table                                                                                                                                         |+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+| OrderTest | CREATE TABLE `OrderTest` (...) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+1 row in set (0.002 sec)
修改某一个列的排序规则
ALTER TABLE t_name MODIFY c_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
查看不同对象的字符编码或排序规则SHOW VARIABLES查看MySQL系统设置的排序规则

SHOW VARIABLES系统变量可以返回和排序规则相关的各个级别的默认设置。

如下,显示与字符集和排序规则相关的所有变量的设置:

MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';+--------------------------+--------------------+| Variable_name            | Value              |+--------------------------+--------------------+| character_set_client     | utf8mb4            || character_set_connection | utf8mb4            || character_set_database   | utf8mb4            || character_set_filesystem | binary             || character_set_results    | utf8mb4            || character_set_server     | utf8mb4            || character_set_system     | utf8               || collation_connection     | utf8mb4_general_ci || collation_database       | utf8mb4_unicode_ci || collation_server         | utf8mb4_unicode_ci |+--------------------------+--------------------+10 rows in set (0.001 sec)

最后三行分别是服务器、数据库和连接的默认排序规则。

SELECT @@collation_server;

查看当前数据库的字符集和排序规则全局变量

use <db_name>;SELECT @@character_set_database, @@collation_database;

如下查看:

MariaDB [test]> SELECT @@character_set_database, @@collation_database;+--------------------------+----------------------+| @@character_set_database | @@collation_database |+--------------------------+----------------------+| latin1                   | latin1_swedish_ci    |+--------------------------+----------------------+1 row in set (0.004 sec)

如果未指定数据库时,将返回MySQL系统级别的字符集和排序规则。

使用information_schema.schemata

查询information_schema.schemata表,也可以获取数据库的排序规则。而且不需要切换数据库,可直接查看指定数据库的collate。

如下:

MariaDB [(none)]> SELECT    ->    default_character_set_name,    ->    default_collation_name    -> FROM information_schema.schemata    -> WHERE schema_name = 'test';+----------------------------+------------------------+| default_character_set_name | default_collation_name |+----------------------------+------------------------+| latin1                     | latin1_swedish_ci      |+----------------------------+------------------------+1 row in set (0.001 sec)
使用环境变量variables
USE db_name;show variables like "character_set_database";-- or:-- show variables like "collation_database";
查看指定表的排序规则

创建一个存放ABab的数据表。后续操作基于此表演示。

CREATE TABLE OrderTest( letter char(1) NOT NULL ); INSERT INTO OrderTest values('B'),('b'),('A'),('a');

通过查询 INFORMATION_SCHEMA.TABLES 表,可以查看某个表的排序规则。

比如:

SELECT TABLE_SCHEMA  -- 会有警告,结果为dbname    , TABLE_NAME    , TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_NAME = 't_name';
MariaDB [test]> SELECT TABLE_SCHEMA    ->     , TABLE_NAME    ->     , TABLE_COLLATION    -> FROM INFORMATION_SCHEMA.TABLES    -> WHERE TABLE_NAME = 'OrderTest';+--------------+------------+-------------------+| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION   |+--------------+------------+-------------------+| test         | ordertest  | latin1_swedish_ci |+--------------+------------+-------------------+1 row in set, 2 warnings (0.003 sec)

除此之外,还可以使用 SHOW TABLE STATUS LIKE 't_name'; 或 SHOW TABLE STATUS where name like 't_name'; 语句查看表的 Collation 。(缺点是无法选择输出列...)

MariaDB [test]> SHOW TABLE STATUS LIKE 'OrderTest'\G*************************** 1. row ***************************            Name: OrderTest          Engine: InnoDB         Version: 10      Row_format: Dynamic            Rows: 4  Avg_row_length: 4096     Data_length: 16384 Max_data_length: 0    Index_length: 0       Data_free: 0  Auto_increment: NULL     Create_time: 2021-08-10 10:38:06     Update_time: 2021-08-10 10:38:06      Check_time: NULL       Collation: latin1_swedish_ci        Checksum: NULL  Create_options:         Comment:Max_index_length: 0       Temporary: N1 row in set (0.001 sec)
查看列的字符集和排序规则INFORMATION_SCHEMA.COLUMNS表

在 INFORMATION_SCHEMA.COLUMNS 表中,可以查看列的排序规则。

SELECT TABLE_NAME     , COLUMN_NAME     , character_set_name    , COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME = 't_name';

如果想要查看所在数据库,可以在附加上TABLE_SCHEMA列。

如下:

MariaDB [(none)]> SELECT TABLE_NAME    ->     , COLUMN_NAME    ->     , character_set_name    ->     , COLLATION_NAME    -> FROM INFORMATION_SCHEMA.COLUMNS    -> WHERE TABLE_NAME = 'OrderTest';+------------+-------------+--------------------+-------------------+| TABLE_NAME | COLUMN_NAME | character_set_name | COLLATION_NAME    |+------------+-------------+--------------------+-------------------+| ordertest  | letter      | latin1             | latin1_swedish_ci |+------------+-------------+--------------------+-------------------+1 row in set (0.011 sec)
SHOW FULL COLUMNS FROM <table>

SHOW FULL COLUMNS FROM <table>可以返回一个表所有列的信息,包括排序规则。

MariaDB [test]> SHOW FULL COLUMNS FROM OrderTest;+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+| Field  | Type    | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+| letter | char(1) | latin1_swedish_ci | NO   |     | NULL    |       | select,insert,update,references |         |+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+1 row in set (0.005 sec)
SQL语句中的显式使用排序规则使用COLLATE子句指定排序规则

默认查询使用默认排序规则,不区分大小写:

MariaDB [test]> select * from OrderTest where letter='a';+--------+| letter |+--------+| A      || a      |+--------+2 rows in set (0.000 sec)

指定字符集,查找严格区分大小写:

MariaDB [test]> select * from OrderTest where letter collate latin1_general_cs ='a';+--------+| letter |+--------+| a      |+--------+1 row in set (0.007 sec)
MySQL/MariaDB排序中如何严格按照字符的码点排序?

从下面的查询,可以看到,MySQL/MariaDB的Order By子句排序查询中,指定排序列的Collation为区分大小写的latin1_general_cs,并不会使结果按照字符码点严格排序。

MariaDB [test]> select * from OrderTest order by letter;+--------+| letter |+--------+| A      || a      || B      || b      |+--------+4 rows in set (0.000 sec)MariaDB [test]> select * from OrderTest order by letter collate latin1_general_cs;+--------+| letter |+--------+| A      || a      || B      || b      |+--------+4 rows in set (0.002 sec)

而要想实现按照字符的Unicode码值排序,可以使用二进制的latin1_bin、utf8mb4_bin等排序规则。如下:

MariaDB [test]> select * from OrderTest order by letter collate latin1_bin;+--------+| letter |+--------+| A      || B      || a      || b      |+--------+4 rows in set (0.002 sec)

标签: #linux修改mysql字符集 #mysql苹果表情 #mysql中文unicode #mysql中汉字用什么类型 #mysql命令行输入中文