前言:
现时朋友们对“linux修改mysql字符集”大致比较珍视,大家都想要了解一些“linux修改mysql字符集”的相关内容。那么小编同时在网络上汇集了一些关于“linux修改mysql字符集””的相关文章,希望小伙伴们能喜欢,同学们快快来了解一下吧!两个不同的字符集不能有相同的排序规则。且每个字符集都有一个默认的排序规则。
本篇大部分内容出自官网文档...
查看当前支持的字符集INFORMATION_SCHEMA.CHARACTER_SETS
INFORMATION_SCHEMA.CHARACTER_SETS 表中可以查看可用的字符集。
MAXLEN 列表示:The maximum number of bytes required to store one character.
MariaDB [(none)]> select * from INFORMATION_SCHEMA.CHARACTER_SETS;+--------------------+----------------------+-----------------------------+--------+| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |+--------------------+----------------------+-----------------------------+--------+| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 || dec8 | dec8_swedish_ci | DEC West European | 1 || cp850 | cp850_general_ci | DOS West European | 1 || hp8 | hp8_english_ci | HP West European | 1 || koi8r | koi8r_general_ci | KOI8-R Relcom Russian | 1 || latin1 | latin1_swedish_ci | cp1252 West European | 1 || latin2 | latin2_general_ci | ISO 8859-2 Central European | 1 |......
或者查看包含 utf8 的字符集,可以使用LIKE子句或WHERE子句。
MariaDB [(none)]> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS -> WHERE CHARACTER_SET_NAME LIKE '%utf8%';+--------------------+----------------------+---------------+--------+| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |+--------------------+----------------------+---------------+--------+| utf8 | utf8_general_ci | UTF-8 Unicode | 3 || utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 |+--------------------+----------------------+---------------+--------+2 rows in set (0.000 sec)SHOW CHARACTER SET
相对于 INFORMATION_SCHEMA.CHARACTER_SETS 表,SHOW CHARACTER SET 语句更常用于查看支持的字符集。
MariaDB [(none)]> SHOW CHARACTER SET;+----------+-----------------------------+---------------------+--------+| Charset | Description | Default collation | Maxlen |+----------+-----------------------------+---------------------+--------+| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 || dec8 | DEC West European | dec8_swedish_ci | 1 || cp850 | DOS West European | cp850_general_ci | 1 || hp8 | HP West European | hp8_english_ci | 1 || koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 || latin1 | cp1252 West European | latin1_swedish_ci | 1 || latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |......
或者查看包含 utf8 的字符集,可以使用LIKE子句或WHERE子句。
MariaDB [(none)]> SHOW CHARACTER SET LIKE '%utf8%';+---------+---------------+--------------------+--------+| Charset | Description | Default collation | Maxlen |+---------+---------------+--------------------+--------+| utf8 | UTF-8 Unicode | utf8_general_ci | 3 || utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |+---------+---------------+--------------------+--------+2 rows in set (0.005 sec)查看当前支持的排序规则Collation
一个给定的字符集至少有一个排序规则,大多数字符集有多个。
使用 INFORMATION_SCHEMA.COLLATIONS 表或 SHOW COLLATION 语句,可以显示一个字符集的排序规则。
SHOW COLLATION默认显示所有的排序规则,使用WHERE、LIKE等子句可以过滤显示指定字符集的排序规则。
查看默认字符集 utf8mb4 的排序规则。
MariaDB [(none)]> SHOW COLLATION WHERE Charset = 'utf8mb4';+------------------------------+---------+------+---------+----------+---------+| Collation | Charset | Id | Default | Compiled | Sortlen |+------------------------------+---------+------+---------+----------+---------+| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 || utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 || utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 || utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |.........| utf8mb4_unicode_nopad_ci | utf8mb4 | 1248 | | Yes | 8 || utf8mb4_unicode_520_nopad_ci | utf8mb4 | 1270 | | Yes | 8 |+------------------------------+---------+------+---------+----------+---------+33 rows in set (0.008 sec)[为什么推荐使用utf8mb4?]MySQL中的utf8和utf8mb4字符集真正支持存储Unicode字符
MySQL/MariaDB 的 utf8 编码并不是真正的UTF-8编码。在MySQL/MariaDB中,utf8最多只支持3个字节,上面的查询中已经可以看到。
而简体中文在 utf8 编码中,一般占3个字节,使用扩展区的中文会占用4个字节。UTF-8编码是可变长编码,英文等字母字符占1个字节,其他复杂字符会占用2-4个字节。
但是目前常使用的emoji表情、繁体字、特殊字符、不常见汉字等,会占用4个字节。这就导致,
如果在 MySQL/MariaDB 中使用 utf8 编码,将无法存储4个字节的汉字、emoji等表情。MySQL/MariaDB 的 utf8 编码占用最多3个字节是由于历史原因造成的。
utf8mb4才是真正的UTF-8编码,可以最多支持4个字节。
所以实际开发中,推荐使用 utf8mb4 编码字符集,这是真正的Unicode编码。尤其是在用到中文、emoji等字符的环境中。否则就会遇到字节数不正确,报错不正确的字符值(Incorrect string value)
MySQL 8.0开始,默认的字符编码已经改为utf8mb4,MariaDB 10.0仍为latin1。
UTF-8编码是U+2528D,属于CJK Unified Ideographs Extension B(中日韩统一表意文字扩充B)字符集的字符,处于第二辅助平面(SIP,表意文字补充平面),最多支持4个字节。
而Mysql的utf8编码则属于常见的基本多文种平面(BMP,即Unicode编码范围在0000-FFFF之内)的字符,最多支持3个字节。
关于utf8mb4_bin排序规则
使用utf8mb4字符集时,通常推荐使用utf8mb4_unicode_ci(或utf8mb4_general_ci)排序规则。
但,如果想要区分区大小写且同时区分重音,推荐使用utf8mb4_bin。一般区分大小写的也都区分重音。utf8_unicode_cs并不是很推荐!
指定字符集和排序规则修改MySQL服务器的排序规则
Linux系统下,修改 /etc/my.cnf 或 /etc/my.cnf.d/server.cnf 文件;Windows系统下,修改MySQL/Mariadb安装目录下data/my.ini文件,
在对应[mysql]、[mysqld]下设置:
[client]default-character-set = utf8mb4[mysql]default-character-set = utf8mb4[mysqld]character-set-client-handshake = FALSEcharacter-set-server = utf8mb4collation-server = utf8mb4_unicode_ci
设置完后重启(restart)MySQL/MariaDB。
还可以使用命令行,设置字符集和排序规则:
mysqld --character-set-server=utf8 --collation-server=utf8_unicode_cs
创建数据库时指定
创建数据库时指定字符集和排序规则的语法如下:
CREATE DATABASE IF NOT EXISTS database_name DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;
创建并查看一个数据库的排序规则和字符集:
CREATE DATABASE IF NOT EXISTS CHARSETTest DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;-- 查看数据库show create database CHARSETTest;-- 返回-- CREATE DATABASE `CHARSETTest` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */
或:
CREATE DATABASE IF NOT EXISTS CHARSETTest1 DEFAULT CHARSET latin1 COLLATE latin1_swedish_ci;-- 查看数据库show create database CHARSETTest1;-- 返回-- CREATE DATABASE `CHARSETTest1` /*!40100 DEFAULT CHARACTER SET latin1 */指定表和列的字符集和排序规则
如下,可以在创建一个表时,指定字符集和排序规则DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci。不指定时,默认使用所在数据库的字符集和排序规则。
同时,可以在列名后,指定当前列的排序规则,如 id char(4) COLLATE utf8mb4_unicode_ci。
CREATE TABLE `Product` ( `id` char(4) COLLATE utf8mb4_unicode_ci NOT NULL, `name` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL, `type` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL, `sale_price` int(11) DEFAULT NULL, `purchase_price` int(11) DEFAULT NULL, `regist_date` date DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
通过SHOW CREATE TABLE tablename;语句,可以完整查看创建表和列的语句,及其中用到的字符集、排序规则。
对应的,可以通过 ALTER TABLE 语句实现对列使用的字符编码的修改。
修改数据库的字符集和排序规则
ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
示例:
MariaDB [test]> ALTER DATABASE CHARSETTest1 CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;Query OK, 1 row affected (0.008 sec)MariaDB [test]> show create database CHARSETTest1;+--------------+-----------------------------------------------------------------------------------------------------+| Database | Create Database |+--------------+-----------------------------------------------------------------------------------------------------+| CHARSETTest1 | CREATE DATABASE `CHARSETTest1` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ |+--------------+-----------------------------------------------------------------------------------------------------+1 row in set (0.006 sec)修改表的字符集
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
示例:
MariaDB [test]> ALTER TABLE OrderTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;Query OK, 4 rows affected (0.035 sec)Records: 4 Duplicates: 0 Warnings: 0MariaDB [test]> show create table OrderTest;+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+| Table | Create Table |+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+| OrderTest | CREATE TABLE `OrderTest` (...) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------+1 row in set (0.002 sec)修改某一个列的排序规则
ALTER TABLE t_name MODIFY c_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;查看不同对象的字符编码或排序规则SHOW VARIABLES查看MySQL系统设置的排序规则
SHOW VARIABLES系统变量可以返回和排序规则相关的各个级别的默认设置。
如下,显示与字符集和排序规则相关的所有变量的设置:
MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';+--------------------------+--------------------+| Variable_name | Value |+--------------------------+--------------------+| character_set_client | utf8mb4 || character_set_connection | utf8mb4 || character_set_database | utf8mb4 || character_set_filesystem | binary || character_set_results | utf8mb4 || character_set_server | utf8mb4 || character_set_system | utf8 || collation_connection | utf8mb4_general_ci || collation_database | utf8mb4_unicode_ci || collation_server | utf8mb4_unicode_ci |+--------------------------+--------------------+10 rows in set (0.001 sec)
最后三行分别是服务器、数据库和连接的默认排序规则。
SELECT @@collation_server;
查看当前数据库的字符集和排序规则全局变量
use <db_name>;SELECT @@character_set_database, @@collation_database;
如下查看:
MariaDB [test]> SELECT @@character_set_database, @@collation_database;+--------------------------+----------------------+| @@character_set_database | @@collation_database |+--------------------------+----------------------+| latin1 | latin1_swedish_ci |+--------------------------+----------------------+1 row in set (0.004 sec)
如果未指定数据库时,将返回MySQL系统级别的字符集和排序规则。
使用information_schema.schemata
查询information_schema.schemata表,也可以获取数据库的排序规则。而且不需要切换数据库,可直接查看指定数据库的collate。
如下:
MariaDB [(none)]> SELECT -> default_character_set_name, -> default_collation_name -> FROM information_schema.schemata -> WHERE schema_name = 'test';+----------------------------+------------------------+| default_character_set_name | default_collation_name |+----------------------------+------------------------+| latin1 | latin1_swedish_ci |+----------------------------+------------------------+1 row in set (0.001 sec)使用环境变量variables
USE db_name;show variables like "character_set_database";-- or:-- show variables like "collation_database";查看指定表的排序规则
创建一个存放ABab的数据表。后续操作基于此表演示。
CREATE TABLE OrderTest( letter char(1) NOT NULL ); INSERT INTO OrderTest values('B'),('b'),('A'),('a');
通过查询 INFORMATION_SCHEMA.TABLES 表,可以查看某个表的排序规则。
比如:
SELECT TABLE_SCHEMA -- 会有警告,结果为dbname , TABLE_NAME , TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLESWHERE TABLE_NAME = 't_name';
MariaDB [test]> SELECT TABLE_SCHEMA -> , TABLE_NAME -> , TABLE_COLLATION -> FROM INFORMATION_SCHEMA.TABLES -> WHERE TABLE_NAME = 'OrderTest';+--------------+------------+-------------------+| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION |+--------------+------------+-------------------+| test | ordertest | latin1_swedish_ci |+--------------+------------+-------------------+1 row in set, 2 warnings (0.003 sec)
除此之外,还可以使用 SHOW TABLE STATUS LIKE 't_name'; 或 SHOW TABLE STATUS where name like 't_name'; 语句查看表的 Collation 。(缺点是无法选择输出列...)
MariaDB [test]> SHOW TABLE STATUS LIKE 'OrderTest'\G*************************** 1. row *************************** Name: OrderTest Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 4 Avg_row_length: 4096 Data_length: 16384 Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: NULL Create_time: 2021-08-10 10:38:06 Update_time: 2021-08-10 10:38:06 Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment:Max_index_length: 0 Temporary: N1 row in set (0.001 sec)查看列的字符集和排序规则INFORMATION_SCHEMA.COLUMNS表
在 INFORMATION_SCHEMA.COLUMNS 表中,可以查看列的排序规则。
SELECT TABLE_NAME , COLUMN_NAME , character_set_name , COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME = 't_name';
如果想要查看所在数据库,可以在附加上TABLE_SCHEMA列。
如下:
MariaDB [(none)]> SELECT TABLE_NAME -> , COLUMN_NAME -> , character_set_name -> , COLLATION_NAME -> FROM INFORMATION_SCHEMA.COLUMNS -> WHERE TABLE_NAME = 'OrderTest';+------------+-------------+--------------------+-------------------+| TABLE_NAME | COLUMN_NAME | character_set_name | COLLATION_NAME |+------------+-------------+--------------------+-------------------+| ordertest | letter | latin1 | latin1_swedish_ci |+------------+-------------+--------------------+-------------------+1 row in set (0.011 sec)SHOW FULL COLUMNS FROM <table>
SHOW FULL COLUMNS FROM <table>可以返回一个表所有列的信息,包括排序规则。
MariaDB [test]> SHOW FULL COLUMNS FROM OrderTest;+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+| letter | char(1) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |+--------+---------+-------------------+------+-----+---------+-------+---------------------------------+---------+1 row in set (0.005 sec)SQL语句中的显式使用排序规则使用COLLATE子句指定排序规则
默认查询使用默认排序规则,不区分大小写:
MariaDB [test]> select * from OrderTest where letter='a';+--------+| letter |+--------+| A || a |+--------+2 rows in set (0.000 sec)
指定字符集,查找严格区分大小写:
MariaDB [test]> select * from OrderTest where letter collate latin1_general_cs ='a';+--------+| letter |+--------+| a |+--------+1 row in set (0.007 sec)MySQL/MariaDB排序中如何严格按照字符的码点排序?
从下面的查询,可以看到,MySQL/MariaDB的Order By子句排序查询中,指定排序列的Collation为区分大小写的latin1_general_cs,并不会使结果按照字符码点严格排序。
MariaDB [test]> select * from OrderTest order by letter;+--------+| letter |+--------+| A || a || B || b |+--------+4 rows in set (0.000 sec)MariaDB [test]> select * from OrderTest order by letter collate latin1_general_cs;+--------+| letter |+--------+| A || a || B || b |+--------+4 rows in set (0.002 sec)
而要想实现按照字符的Unicode码值排序,可以使用二进制的latin1_bin、utf8mb4_bin等排序规则。如下:
MariaDB [test]> select * from OrderTest order by letter collate latin1_bin;+--------+| letter |+--------+| A || B || a || b |+--------+4 rows in set (0.002 sec)
标签: #linux修改mysql字符集 #mysql苹果表情 #mysql中文unicode #mysql中汉字用什么类型 #mysql命令行输入中文