龙空技术网

Doris从理论详解到千万级数据量场景使用(三)

尚硅谷教育 148

前言:

现在你们对“ge622qfapachepro”可能比较关心,看官们都想要分析一些“ge622qfapachepro”的相关资讯。那么小编也在网上收集了一些对于“ge622qfapachepro””的相关内容,希望咱们能喜欢,各位老铁们一起来学习一下吧!

Doris测试与实战

命令行操作

```shell### wm_doris01上启动fe[doris01]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh  --daemon[doris01]$ jps    19929Jps    19871PaloFe[doris02]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010--daemon[doris03]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010--daemon### wm_doris01,wm_doris02,wm_doris03上启动be[doris01]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh  --daemon[doris02]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh  --daemon[doris03]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh  --daemon[doris01]$ mysql -hdoris01 -P 9030 -urootmysql>SET PASSWORD FOR 'root' =PASSWORD('niaoshu123456');mysql> exit; mysql -hdoris01 -P 9030 -uroot -niaoshu123456mysql> alter system add backend "doris01:9050";mysql> alter system add backend "doris02:9050";mysql> alter system add backend "doris03:9050";mysql> alter system add broker niaoshu_broker_hdfs "doris01:8000","doris02:8000","doris03:8000";mysql> alter system add follower "doris02:9010";mysql> alter system add observer "doris03:9010";### 从节点首次启动sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010 --daemon[节点2,节点3] mysql>show proc '/backends';    ###查看Be状态mysql>show proc '/brokers';     ###查看Broker状态mysql>show proc '/frontends';   ###查看Fe状态```

注意

① 增加FE节点,FE分为Leader,Follower和Observer三种角色。默认一个集群只能有一个Leader,可以有多个Follower和Observer.其中Leader和Follower组成一个Paxos选择组,如果Leader宕机,则剩下的Follower会成为Leader,保证HA。Observer是负责同步Leader数据的不参与选举。如果只部署一个FE,则FE默认就是Leader。

② 在doris02再部署一台FE,doris03上部署Observer。--helper参数指定leader地址和端口号。

③ 增加BE节点,就像上面安装一样在mysql客户端,使用ALTER SYSTEM ADD BACKEND语句即可,相对应删除BE节点,使用==ALTER SYSTEM DECOMMISSION BACKEND"be_host:be_heartbeat_service_port"。

```shell###删除节点mysql> alter system drop follower "hadoop01:9010";mysql> alter system drop observer "hadoop01:9010";## 删除对应的 FE 节点: [ALTER SYSTEM DROP FOLLOWER[OBSERVER] "fe_host:edit_log_port";]### 提示:删除 Follower FE 时,确保最终剩余的 Follower(包括 Leader)节点为奇数#### 删除 BE 节点mysql> ALTER SYSTEM DECOMMISSION BACKEND "hadoop01:9050"[ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";]### borker删除mysql> ALTER SYSTEM DROP BROKER broker_name "hadoop01:9050";[ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";][ALTER SYSTEM DROP ALL BROKER broker_name;]```

提示:ALTER SYSTEMDROP BACKEND "be_host:be_heartbeat_service_port";注意:DROP BACKEND 会直接删除该 BE,并且其上的数据将不能再恢复!!!所以我们强烈不推荐使用 DROP BACKEND 这种方式删除 BE 节点。当你使用这个语句时,会有对应的防误操作提示。

重点说明:当做HA是其他节点首次启动没有使用 --helper 的解决办法,把alive值为false的节点的doris-mete文件夹下的所有文件删除[建议留一个备份]

然后重新使用以下命令启动:

sh /opt/module/doris/fe/bin/start_fe.sh --helper doris01:9010 --daemon

建表

```sqlcreate table niaoshu.niaoshu_snap (   uid int comment '用户uid',   tid int comment '老师id',   aid varchar(32) comment 'aid')duplicate key(uid)distributed by hash(uid) buckets 10;create table niaoshu.niaoshu_mid( uid int COMMENT '用户uid', k_id varchar(32) COMMENT '课程id',)duplicate key(uid)distributed by hash(uid) buckets 10;```

流式导入数据

```sqlcurl --location-trusted -u root -H "label:123" -H "column_separator:," -T naoshu.csv -X PUT  --location-trusted -u root -H "label:niaoshu_mid1" -H "column_separator:," -T niaoshu.csv-X PUT ```

第三方导入方式

尖叫提示: doris中除了olap引擎,其他引擎不存储数据。

1.直接连接mysql中的数据表,建立一样的数据模型

```sqlCREATE TABLE niaoshu.niaoshu_1(        par_date int,        a_id varchar(32) COMMENT 'aid',       b_id varchar(32) COMMENT'bid',       c_id varchar(32) COMMENT'cid',       d_id varchar(32)  COMMENT 'did'   ENGINE=mysql   PROPERTIES   (   "host" = "192.168.10.1",   "port" = "3306",   "user" = "root",   "password" = "abcd123456",   "database" = "niaoshu",   "table" = "nioashu_mid"   );

2.创建一个数据文件存储在HDFS上的 broker 外部表, 数据使用 "\t"分割,"\n" 换行

尖叫提示:==外部表只支持csv和textfile类型==

```sqlCREATE EXTERNAL TABLE niaoshu.niaoshu_2(       id  VARCHAR(32) COMMENT 'id',       a_type VARCHAR(32) COMMENT '学生课程类型',       b_name VARCHAR(32) COMMENT '学生姓名'    )ENGINE=broker   PROPERTIES (   "broker_name" = "niaoshu", "path" ="hdfs://hadoop:8020/user/hive/warehouse/wm_ods.db/niaoshu_test/*",   "column_separator" = "\t",   "line_delimiter" = "\n"   )   BROKER PROPERTIES (   "username" = "hadoop",   "password" = "abcd1234"   )```

统计hive中的数据;

以使用hive的外部表查找统计21条数据用时 1.78s。

3.使用brokers方式把数据导入doris中

建立动态分区表

```sqlcreate table niaoshu.niaoshu3( uid int COMMENT '用户id', name varchar(526) COMMENT '昵称', phone varchar(32) COMMENT '手机号')ENGINE=olapDUPLICATE KEY(par_date,uid)COMMENT 'hdfs中的数据导入doris-test'PARTITION BY RANGE(par_date)()DISTRIBUTED BY HASH(uid) BUCKETS 9 PROPERTIES(   "storage_medium" = "SSD",   "dynamic_partition.time_unit" = "DAY",   "dynamic_partition.end" = "7",   "dynamic_partition.prefix" = "p",   "dynamic_partition.buckets" = "64"    );         -- "dynamic_partition.start" = "-700",```

尖叫提示:==属性中的字段必须发在建表中的最前面==

查看动态分区:

SHOW DYNAMIC PARTITION TABLES;

1.展示指定db下指定表的所有分区信息

SHOW PARTITIONS FROMexample_db.table_name;

2.展示指定db下指定表的指定分区的信息

SHOW PARTITIONS FROM example_db.table_nameWHERE PartitionName = "p1";

3.手动添加分区,需要先关闭动态分区:

ALTER TABLE dw_t_xby_wxuser_snap  SET ("dynamic_partition.enable" ="false")

否则报错:Key columnsshould be a ordered prefix of the schema

分桶的字段必须是三个模型中的key。ex:DUPLICATE KEY

从hdfs 导入doris中, 6228517 ≈622W+ 用时2分钟左右,

plan_fragment_executor.cpp:555]Fragment 30b2646b7088479f-bd4d793a9b9dda65:(Active: 4s576ms, % non-child:0.00%)

- AverageThreadTokens: 1.00

- BytesReceived: 6.16 MB

- DeserializeRowBatchTimer: 21.636ms

- FirstBatchArrivalWaitTime: 0.000ns

- PeakReservation: 0

- PeakUsedReservation: 0

- RowsProduced: 140.09K

- SendersBlockedTimer: 0.000ns

- SendersBlockedTotalTimer(*): 0.000ns

BlockMgr:

- BlockWritesOutstanding: 0

- BlocksCreated: 0

- BlocksRecycled: 0

- BufferedPins: 0

- BytesWritten: 0

- MaxBlockSize: 8.00 MB

- MemoryLimit: 2.00 GB

- TotalBufferWaitTime: 0.000ns

- TotalEncryptionTime: 0.000ns

- TotalIntegrityCheckTime: 0.000ns

- TotalReadBlockTime: 0.000ns

DataBufferSender(dst_fragment_instance_id=30b2646b7088479f-bd4d793a9b9dda65):

EXCHANGE_NODE (id=8):(Active: 4s501ms, % non-child: 0.00%)

- ConvertRowBatchTime: 1.920ms

- MemoryUsed: 0

- RowsReturned: 140.09K

- RowsReturnedRate: 31.12 K/sec

使用show load查看load 任务,帮助命令:HELP SHOW LOAD;

定时每天导入hdfs数据进入doris中

```sql#!/bin/sh ### 每日执行生成唯一标签label_date=niaoshu_`date +%Y%m%d%H%M%S` export MYSQL_PWD='niaoshu123456' ### 用户拼接hadoop中的数据路径dt=`date -d '-1 day' +%Y%m%d` ### 用户写入doris中分区的时间par_date=`date -d '-1 day' +%Y-%m-%d` mysql -uroot -h doris01 -P 9030 -e"\ use wm_dws_db; LOAD LABEL wm_dws_db.$label_date ( DATA INFILE('hdfs://h_cluseter/user/hive/warehouse/naioshu.db/niaoshu3/par_date=$dt/*') INTO TABLE niaoshu3 FORMAT AS 'orc' (userid,name,phone)                                 SET(par_date=DATE_ADD(curdate(),interval -1day),uid=userid,name=name,phone=phone) )  WITH BROKER 'niaoshu_broker_hdfs' ( 'username'='niaoshu', 'password'='niaoshu123456', 'dfs.nameservices'='h_cluseter', 'dfs.ha.namenodes.emr-cluster'='nn1,nn2', 'dfs.namenode.rpc-address.emr-cluster.nn1'='hadoop01:8020', 'dfs.namenode.rpc-address.emr-cluster.nn2'='hadoop02:8020', 'dfs.client.failover.proxy.provider'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' )  PROPERTIES ( 'timeout'='3600', 'max_filter_ratio'='0.001' ); "```

查看具体的执行效果

$ mysql -hdoris01 -P 9030 -uroot-pniaoshu123456 mysql> use wm_dws_db mysql>  SHOW LOAD;

4.从kafka中导入数据进入到doris

①创建表

```sqlCREATE TABLE niaoshu.niaoshu4( par_date date COMMENT '触发日期', uid bigint COMMENT '游客默认0', a_name varchar(32) COMMENT '模块名', b_time datetime COMMENT '触发时间', c_os varchar(32) COMMENT '操作系统', c_ip varchar(32) COMMENT '客户端地址' ) DUPLICATE KEY(par_date,uid,a_name) PARTITION BY RANGE(par_date)() DISTRIBUTED BY HASH(par_date,uid,a_name) BUCKETS 32 PROPERTIES(   "storage_medium" = "SSD",   "dynamic_partition.enable" = "true",   "dynamic_partition.time_unit" = "DAY",   "dynamic_partition.end" = "3",   "dynamic_partition.prefix" = "p",    "dynamic_partition.buckets" ="12",   "replication_num" = "3" );```

②创建导入任务

```sqlCREATE ROUTINE LOADniaoshu.niaoshu04002 ON niaoshu04COLUMNS(p_tmp,uid,a_name,b_time,c_os,c_ip,par_date=from_unixtime(cast(p_tmpas int),'%Y-%m-%d'),event_time=from_unixtime(cast(p_tmp as int))),PROPERTIES(    "desired_concurrent_number"="24",    "strict_mode"= "false",    "max_batch_interval"= "20",   "max_batch_rows" = "300000",   "max_batch_size" = "209715200",   "format" = "json",   "jsonpaths"="[\"$.date_time\",\"$.uid\",\"$.a_name\",\"$.b_time\",\"$.c_os\",\"$.c_ip\"]")FROM KAFKA(   "kafka_broker_list"="hadoop01:9092,hadoop02:9092,hadoop03:9092",    "kafka_topic" ="testjson",   "property.group.id"="niaoshu_data_group_1",   "property.kafka_default_offsets" = "OFFSET_END",   "property.enable.auto.commit"="true",   "property.client.id"="niaoshu_id");```

查看加载

```sqlSHOW ALL ROUTINE LOAD;###具体命令和示例可以通过HELP SHOW ROUTINE LOAD TASK;HELP STOP ROUTINE LOAD;HELP PAUSE ROUTINE LOAD;HELP RESUME ROUTINE LOAD; ####修改表的字段类型### 添加表字段alter table niaoshu add columnstudy_time varchar(64) comment '学习时间'####重启任务 RESUME ROUTINE LOAD  for niaoshu04002; ###暂停任务PAUSE ROUTINE LOAD;```

kafka to doris 成功

注:kafka to doris 数据表有近7000万数据的实时查询。

Hue中集成doris,安装mysql添加方式添加即可。

大数据的切片机制有哪些

大数据之Kafka集群部署

大数据学习之部署Hadoop

大数据logstsh架构

大数据面试相关生产经验—热点问题

标签: #ge622qfapachepro #apache526