
Phoenix Quick Start

1. Introduction to Phoenix

  • Phoenix provides a near-standard SQL interface for operating on HBase data.
  • Phoenix maps to HBase in one of two ways: by creating a table or by creating a view.
    The differences are as follows (see the sketch after this list):
    With a table, you can insert, query, and delete data in HBase.
    With a view, you can generally only run queries.
    A table therefore looks more powerful than a view, but, just as in relational databases such as MySQL, dropping a view never affects the structure of the underlying table, and Phoenix views also support secondary indexes.
    When you create a table through Phoenix, a mapping to HBase is established automatically; when you drop that Phoenix table to remove the mapping, the underlying HBase table is deleted as well, together with any index tables associated with it.
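
A minimal sketch of the two mapping styles, assuming a pre-existing HBase table named "t_demo" with a column family "info" (both names are hypothetical):

-- Read-only mapping: dropping the view leaves the HBase table untouched
CREATE VIEW "t_demo" ("ROW" VARCHAR PRIMARY KEY, "info"."name" VARCHAR);
DROP VIEW "t_demo";

-- Read/write mapping: dropping the Phoenix table also deletes the HBase table
-- "t_demo" and any index tables built on it
CREATE TABLE "t_demo" ("ROW" VARCHAR PRIMARY KEY, "info"."name" VARCHAR);
DROP TABLE "t_demo";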

2. Common Phoenix Commands

  • Tip: the author's cluster has Kerberos authentication enabled, so the first step is to authenticate to Kerberos (kinit) on the machine. If HBase table-level access control is also enabled, then to query HBase tables through Phoenix the user must additionally be granted access to the HBase system tables that Phoenix depends on.

Note: Phoenix converts unquoted table and column names to uppercase, so lowercase table or column names must be enclosed in double quotes.
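
For example (the table name here is hypothetical):

CREATE TABLE my_table (id VARCHAR PRIMARY KEY);      -- stored as MY_TABLE with column ID
SELECT * FROM My_Table;                              -- works: unquoted names are folded to uppercase
CREATE TABLE "my_table" ("id" VARCHAR PRIMARY KEY);  -- stored exactly as my_table with column id
SELECT * FROM "my_table";                            -- double quotes are now required to reach it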

  • 2.1 Entering the Phoenix command line
[root@hdp39 ~]#  cd /usr/hdp/2.5.3.0-37/phoenix/bin
[root@hdp39 bin]# ./sqlline.py hdp39,hdp40,hdp41:2181 
(The cluster runs in Kerberos secure mode, so the user who runs kinit before executing this command must have the privileges needed to operate on HBase.)


  • 2.2 !help
    View the built-in sqlline commands.

  • 2.3 !tables

        List all the tables in the database.
    
       To view a table's structure, use !desc tableName:
    
0: jdbc:phoenix:hdp40,hdp41,hdp39:2181> !desc "t_hbase1"
+------------+--------------+-------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX  | NULLABLE  | REMARKS  | COLUMN_DEF  |
+------------+--------------+-------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+
|            |              | t_hbase1    | ROW          | 12         | VARCHAR    | null         | null           | null            | null            | 0         |          |             |
|            |              | t_hbase1    | id           | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             |
|            |              | t_hbase1    | salary       | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             |
|            |              | t_hbase1    | url          | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             |
|            |              | t_hbase1    | details      | 12         | VARCHAR    | null         | null           | null            | null            | 1         |          |             |
+------------+--------------+-------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+
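
A few other built-in sqlline commands that come in handy (the full list is printed by !help):

!columns "t_hbase1"        List the columns of a table
!indexes "t_hbase1"        List the indexes defined on a table
!primarykeys "t_hbase1"    Show a table's primary key columns
!outputformat vertical     Switch the result display format
!quit                      Exit the Phoenix command line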
  • 2.4 Creating tables
    • There are two cases: the table does not yet exist in HBase, or the table already exists in HBase.
    • a. Create a Phoenix table (no corresponding HBase table yet)
 [root@hdp07 bin]# ./sqlline.py hdp40,hdp41,hdp39:2181 /tmp/us_population.sql 
  The contents of us_population.sql are as follows:
  [root@hdp07 tmp]# cat us_population.sql
CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city));

After the table is created, it can be inspected from the HBase shell; the result is shown below (Phoenix attaches its coprocessors to the table):

  hbase(main):002:0> desc 'US_POPULATION'
Table US_POPULATION is ENABLED
US_POPULATION, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRe
gionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpoi
ntImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=or
g.apache.phoenix.index.PhoenixIndexBuilder'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VE
RSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1020 seconds
  • Loading data
[root@hdp07 bin]# ./psql.py -t US_POPULATION hdp40,hdp41,hdp39:2181 /tmp/us_population.csv
  • The contents of the CSV file:
[root@hdp07 tmp]# cat us_population.csv
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332

After loading, the data can be inspected as follows:
- Viewing data

In the Phoenix shell the data looks like this:
0: jdbc:phoenix:hdp40,hdp41,hdp39:2181> select * from us_population;
+--------+---------------+-------------+
| STATE  |     CITY      | POPULATION  |
+--------+---------------+-------------+
| AZ     | Phoenix       | 1461575     |
| CA     | Los Angeles   | 3844829     |
| CA     | San Diego     | 1255540     |
| CA     | San Jose      | 912332      |
| IL     | Chicago       | 2842518     |
| NY     | New York      | 8143197     |
| PA     | Philadelphia  | 1463281     |
| TX     | Dallas        | 1213825     |
| TX     | Houston       | 2016582     |
| TX     | San Antonio   | 1256509     |
+--------+---------------+-------------+
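
Because (state, city) is the composite primary key, a filter on the leading key column state becomes an efficient range scan rather than a full table scan. For example:

select city, population from us_population where state = 'CA' order by population desc;
-- returns the three CA rows loaded above (Los Angeles, San Diego, San Jose)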

In the HBase shell the same data looks like this. Note that the row key is the concatenation of the primary-key columns (state + city), '0' is Phoenix's default column family, the POPULATION values are stored in Phoenix's binary BIGINT encoding, and the extra _0 column is an empty marker cell that Phoenix adds to every row:

hbase(main):005:0> scan 'US_POPULATION'
ROW                      COLUMN+CELL
 AZPhoenix               column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x16MG
 AZPhoenix               column=0:_0, timestamp=1522308898804, value=x
 CALos Angeles           column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00:\xAA\xDD
 CALos Angeles           column=0:_0, timestamp=1522308898804, value=x
 CASan Diego             column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x13(t
 CASan Diego             column=0:_0, timestamp=1522308898804, value=x
 CASan Jose              column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x0D\xEB\xCC
 CASan Jose              column=0:_0, timestamp=1522308898804, value=x
 ILChicago               column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00+_\x96
 ILChicago               column=0:_0, timestamp=1522308898804, value=x
 NYNew York              column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00|A]
 NYNew York              column=0:_0, timestamp=1522308898804, value=x
 PAPhiladelphia          column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x16S\xF1
 PAPhiladelphia          column=0:_0, timestamp=1522308898804, value=x
 TXDallas                column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x12\x85\x81
 TXDallas                column=0:_0, timestamp=1522308898804, value=x
 TXHouston               column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x1E\xC5F
 TXHouston               column=0:_0, timestamp=1522308898804, value=x
 TXSan Antonio           column=0:POPULATION, timestamp=1522308898804, value=\x80\x00\x00\x00\x00\x13,=
 TXSan Antonio           column=0:_0, timestamp=1522308898804, value=x
  • You can also execute query SQL from a file, just as when creating the table, or type the corresponding statements directly at the Phoenix shell prompt, as shown below:
[root@hdp07 bin]# ./sqlline.py hdp40,hdp41,hdp39:2181 /tmp/us_population_queries.sql
.....
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
106/106 (100%) Done
Done
1/1          SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;
+--------+-------------+-----------------+
| State  | City Count  | Population Sum  |
+--------+-------------+-----------------+
| NY     | 1           | 8143197         |
| CA     | 3           | 6012701         |
| TX     | 3           | 4486916         |
| IL     | 1           | 2842518         |
| PA     | 1           | 1463281         |
| AZ     | 1           | 1461575         |
+--------+-------------+-----------------+
6 rows selected (0.122 seconds)

The contents of us_population_queries.sql are as follows:

[root@hdp07 tmp]# cat us_population_queries.sql
SELECT state as "State",count(city) as "City Count",
sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;
  • Inserting data
    0: jdbc:phoenix:hdp40,hdp41,hdp39:2181> UPSERT INTO us_population VALUES('YY','PHOENIX_TEST',99999);
    1 row affected (0.046 seconds)
    Note: upserting a row whose row key (primary key) already exists overwrites that row, which is equivalent to an UPDATE in a relational database (see the sketch after the next bullet).

  • Deleting data
    0: jdbc:phoenix:hdp40,hdp41,hdp39:2181> delete from us_population where CITY='PHOENIX_TEST';
    1 row affected (0.05 seconds)
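
A quick sketch of the UPSERT-as-UPDATE behavior using the same test row:

UPSERT INTO us_population VALUES('YY','PHOENIX_TEST',99999);  -- inserts the row
UPSERT INTO us_population VALUES('YY','PHOENIX_TEST',88888);  -- same primary key: overwrites it
SELECT * FROM us_population WHERE state = 'YY';               -- returns POPULATION = 88888
DELETE FROM us_population WHERE state = 'YY';                 -- clean up the test row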


  • b. Create a table or view for an HBase table that already exists
    (To work with HBase data through Phoenix, first create a Phoenix table or view that is mapped to the HBase table.)

    Create a Phoenix mapping table for an HBase table that already exists:

hbase(main):006:0> scan 'user1'
ROW                                  COLUMN+CELL                                                                                                                                                
 1                                   column=info:age, timestamp=1503298502726, value=25                                                                                                         
 1                                   column=info:name, timestamp=1503298496483, value=zhangsan                                                                                                  
 2                                   column=info:age, timestamp=1503298515289, value=22                                                                                                         
 2                                   column=info:name, timestamp=1503298508939, value=lisi                                                                                                      
 3                                   column=info:name, timestamp=1503298521481, value=wangswu 
- The CREATE TABLE statement at the Phoenix command line:
0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
create table "user1"( 
 "ROW" varchar primary key,
 "info"."age" varchar ,
  "info"."name" varchar); 


0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
select * from "user1";(后可带limit)
+------+------+-----------+
| ROW  | age  |   name    |
+------+------+-----------+
| 1    | 25   | zhangsan  |
| 2    | 22   | lisi      |
| 3    |      | wangswu   |
+------+------+-----------+
3 rows selected (0.083 seconds)

**Note: when querying, lowercase table and column names must be wrapped in double quotes (single quotes raise an error, since they denote string literals); unquoted names are automatically converted to uppercase.**
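
Since this section covers mapping either a table or a view onto an existing HBase table, note that the same mapping could have been created as a view instead (an alternative to the CREATE TABLE above, not something to run in addition to it, since the two cannot share the name). A view is read-only, but dropping it leaves the HBase table 'user1' intact:

create view "user1"( 
 "ROW" varchar primary key,
 "info"."age" varchar ,
  "info"."name" varchar); 
-- Queries work exactly as above; UPSERTs and DELETEs are generally not allowed
-- through the view, and DROP VIEW "user1" does not delete the underlying HBase table.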

Create a Phoenix mapping table for t_hbase10, an HBase table that already exists and holds 10 million rows:
create table "t_hbase10"("ROW" varchar primary key, "info"."id" varchar , "info"."salary" varchar);  
  • Before creating the Phoenix table (HBase-side screenshot omitted)

  • After creating the Phoenix table (HBase-side screenshot omitted)

  • Run a few query SQL statements and observe how long each one takes:
0: jdbc:phoenix:hdp39,hdp40,hdp41:2181>
 select count("id") from "t_hbase10";
Error: Operation timed out. (state=TIM01,code=6000)
java.sql.SQLTimeoutException: Operation timed out.

0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
select * from "t_hbase10" limit 2;
+----------+-----+---------+
|   ROW    | id  | salary  |
+----------+-----+---------+
| rowKey0  | B0  | 0       |
| rowKey1  | B1  | 1       |
+----------+-----+---------+
2 rows selected (0.42 seconds)
0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
select count(1) from "t_hbase10";
+-----------+
| COUNT(1)  |
+-----------+
| 10000000  |
+-----------+
1 row selected (5.571 seconds)


0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
select * from "t_hbase10" where "id"='B100000';
+---------------+----------+---------+
|      ROW      |    id    | salary  |
+---------------+----------+---------+
| rowKey100000  | B100000  | 100000  |
+---------------+----------+---------+
1 row selected (51.416 seconds)
0: jdbc:phoenix:hdp39,hdp40,hdp41:2181> 
select * from "t_hbase10" where "ROW"='rowKey100000';
+---------------+----------+---------+
|      ROW      |    id    | salary  |
+---------------+----------+---------+
| rowKey100000  | B100000  | 100000  |
+---------------+----------+---------+
1 row selected (0.042 seconds)
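
The timings above come down to the access path: filtering on "id" forces a full table scan over all 10 million rows, while filtering on "ROW" is a point lookup on the HBase row key. This can be confirmed with EXPLAIN (a sketch; the exact plan wording varies by Phoenix version):

EXPLAIN select * from "t_hbase10" where "id"='B100000';
-- the plan reports a FULL SCAN over t_hbase10 with a server-side filter on "id"

EXPLAIN select * from "t_hbase10" where "ROW"='rowKey100000';
-- the plan reports a POINT LOOKUP ON 1 KEY over t_hbase10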

Create a secondary index on the table so that filtering on "id" no longer requires a full scan (the column family of "id" in "t_hbase10" is "info"):

CREATE INDEX id_name ON "t_hbase10"("info"."id");
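
One caveat worth keeping in mind: by default Phoenix only uses a global secondary index when every column referenced by the query is available in the index table, so a plain SELECT * may still fall back to a full scan. Including "salary" makes the index a covering index for the queries above (the index name id_name_covered is arbitrary):

CREATE INDEX id_name_covered ON "t_hbase10"("info"."id") INCLUDE ("info"."salary");
-- with the covering index in place, this filter should be served from the
-- index table rather than a full scan of t_hbase10:
select * from "t_hbase10" where "id"='B100000';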
