Sqoop migration from MySQL to Hive: wrong column order causes NULL and garbled values
Problem description
When using Sqoop to import data from MySQL into Hive, the import finishes successfully, but the Hive table contains NULLs and garbled values.
Sqoop command:
# Change to the Sqoop bin directory
cd /opt/sqoop-1.4.7.bin__hadoop-2.6.0/bin/
./sqoop import \
--connect jdbc:mysql://master:3306/bookhub?useSSL=false \
--username root \
--password 123456 \
--table admins \
--hive-import \
--hive-table bookhub.admins \
--target-dir /user/hive/warehouse/bookhub.db/admins \
--fields-terminated-by '\001'
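Querying the imported table in Hive makes the symptom visible (a quick sketch; it assumes the hive client is on the PATH):
hive -e "SELECT * FROM bookhub.admins LIMIT 5;"
Values appear shifted into neighbouring columns, and anything that cannot be parsed as the column's type shows up as NULL.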
Cause:
The garbled values come from a wrong column order: values land in columns of a different data type, and anything that cannot be parsed as the target type ends up as NULL or garbage.
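One way to confirm the mismatch is to compare the column order reported by MySQL with the column order of the Hive table (a sketch; it assumes the mysql and hive clients are reachable from this host):
# Column order on the MySQL side
mysql -h master -u root -p -e "DESCRIBE bookhub.admins;"
# Column order of the Hive table used by the import
hive -e "DESCRIBE bookhub.admins;"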
Solution:
Add --columns id,name,sex,account,password
to explicitly define the column order Sqoop uses when importing into Hive:
./sqoop import \
--connect jdbc:mysql://master:3306/bookhub?useSSL=false \
--username root \
--password 123456 \
--table admins \
--hive-import \
--hive-table bookhub.admins \
--target-dir /user/hive/warehouse/bookhub.db/admins \
--fields-terminated-by '\001' \
--columns id,name,sex,account,password
Result:
After rerunning the import with --columns, each value lands in its own column and the NULL/garbled values no longer appear.
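A quick spot check in Hive (a sketch; the column names follow the --columns option above) should now show every value under its own column:
hive -e "SELECT id, name, sex FROM bookhub.admins LIMIT 5;"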
Extension:
Using a loop in a shell script to import several tables in one run
#!/usr/bin/env bash
# Database connection parameters (use the MySQL user root, which allows remote connections)
DB_URL="jdbc:mysql://master:3306/bookhub?useSSL=false"
DB_USER="root"
DB_PASS="123456"
# List of tables to migrate: admins, books, stocks, users, scores, orders, applications
TABLES=("admins" "books" "stocks" "users" "scores" "orders" "applications")
# Tables to migrate, together with their column lists
TABLES_COLUMNS=(
"admins,id,name,sex,account,password"
"books,id,name,author,category,publisher,descr,pic,price,status"
"stocks,id,book_id,book_condition,num"
"users,id,nickname,account,password,status,avatar,phone,address,sex,age"
"scores,id,book_id,user_id,score"
"orders,id,user_id,book_id,book_condition,quantity,total_price,create_time,status,pay_method,order_number"
"applications,id,book_id,user_id,create_time,book_condition,acq_price,result"
)
# Change to the Hadoop bin directory
cd /opt/hadoop-3.2.2/bin/
# Loop over each table
for table in "${TABLES[@]}"; do
# Remove the table's existing data directory in HDFS
sudo ./hadoop fs -rm -r /user/hive/warehouse/bookhub.db/"$table"
done
# Change to the Sqoop bin directory
cd /opt/sqoop-1.4.7.bin__hadoop-2.6.0/bin/
# Loop over each table again and run the Sqoop import
for entry in "${TABLES_COLUMNS[@]}"; do
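# Split each entry at the first comma: the table name comes first, the rest is its column list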
IFS=',' read -r table columns <<< "$entry"
if ! sudo ./sqoop import \
--connect "$DB_URL" \
--username "$DB_USER" \
--password "$DB_PASS" \
--table "$table" \
--delete-target-dir \
--target-dir "/user/hive/warehouse/bookhub.db/$table" \
--hive-import \
--hive-table "bookhub.$table" \
--fields-terminated-by '\001' \
--columns "$columns"; then
echo "Failed to migrate data for $table."
continue
fi
echo "Data migration for $table completed."
done
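One possible way to run the script (the file name migrate_bookhub.sh is just an example):
chmod +x migrate_bookhub.sh
./migrate_bookhub.sh 2>&1 | tee sqoop_migration.log
Note that passing --password on the command line exposes it in the process list; Sqoop's --password-file option is a safer alternative.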