
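Wrong column order in a Sqoop data migration

When Sqoop imports MySQL data into Hive, a wrong column order leads to NULL values and garbled data.
Problem description

Sqoop was used to import data from MySQL into Hive. The import succeeded, but the resulting table contained NULL values and garbled fields.

Sqoop command: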

# Change to the Sqoop bin directory
cd /opt/sqoop-1.4.7.bin__hadoop-2.6.0/bin/

# The JDBC URL is quoted so the shell does not treat "?" as a glob character
./sqoop import \
    --connect "jdbc:mysql://master:3306/bookhub?useSSL=false" \
    --username root \
    --password 123456 \
    --table admins \
    --hive-import \
    --hive-table bookhub.admins \
    --target-dir /user/hive/warehouse/bookhub.db/admins \
    --fields-terminated-by '\001'

[Screenshot: imported data showing NULL values]

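Cause:

The garbled output comes from a column-order mismatch: the fields written by Sqoop are in a different order from the columns of the Hive table, so values land in columns of the wrong data type. Hive returns NULL for any value it cannot parse as the column's declared type.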
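The effect can be reproduced without Hive. The following is a minimal sketch, not Hive's actual parser: `read_row` is a hypothetical helper that reads a delimited row positionally against a list of declared types and prints `NULL` wherever an `int` column receives non-numeric text. The column types and the sample values are assumptions made for illustration, and commas stand in for the `\001` delimiter so the rows stay readable.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: mimic positional reading of a delimited row
# against a typed schema. This is NOT Hive code.

# read_row "<comma-separated types>" "<comma-separated values>"
read_row() {
    local -a types values out
    local i v
    IFS=',' read -r -a types <<< "$1"
    IFS=',' read -r -a values <<< "$2"
    for i in "${!types[@]}"; do
        v="${values[i]:-}"
        # A value that cannot be parsed as the declared type is shown as NULL
        if [[ "${types[i]}" == "int" && ! "$v" =~ ^-?[0-9]+$ ]]; then
            v="NULL"
        fi
        out+=("$v")
    done
    (IFS=','; echo "${out[*]}")
}

# Assumed types for the columns id,name,sex,account,password
schema="int,string,string,string,string"

# Fields arrive in the order the table expects
read_row "$schema" "1,alice,f,admin01,secret"   # prints 1,alice,f,admin01,secret

# Fields arrive in a different order: the int column receives text
read_row "$schema" "alice,f,admin01,secret,1"   # prints NULL,f,admin01,secret,1
```

In the second call every value is shifted by one column, so the `id` column becomes NULL and the remaining string columns silently hold the wrong data.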

Solution:

Add --columns id,name,sex,account,password

This tells Sqoop explicitly which columns to import into Hive and in what order.

./sqoop import \
    --connect "jdbc:mysql://master:3306/bookhub?useSSL=false" \
    --username root \
    --password 123456 \
    --table admins \
    --hive-import \
    --hive-table bookhub.admins \
    --target-dir /user/hive/warehouse/bookhub.db/admins \
    --fields-terminated-by '\001' \
    --columns id,name,sex,account,password

Result:

[Screenshot: data after re-importing with --columns]
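Since the fix depends on the `--columns` list matching the Hive table column for column, a quick pre-flight comparison can catch a mismatch before the import runs. The sketch below is one possible way to do it: `compare_columns` is a hypothetical helper, and `hive_columns` is an assumed value that would have to be copied from the Hive table definition by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two comma-separated column lists by position.
# Prints the first difference and returns 1, or returns 0 when they match.
compare_columns() {
    local -a want got
    local i n
    IFS=',' read -r -a want <<< "$1"
    IFS=',' read -r -a got <<< "$2"
    n=${#want[@]}
    if (( ${#got[@]} > n )); then
        n=${#got[@]}
    fi
    for (( i = 0; i < n; i++ )); do
        if [[ "${want[i]:-}" != "${got[i]:-}" ]]; then
            echo "position $((i + 1)): expected '${want[i]:-}' but found '${got[i]:-}'"
            return 1
        fi
    done
    return 0
}

sqoop_columns="id,name,sex,account,password"   # value passed to --columns
hive_columns="id,name,sex,account,password"    # assumed; copy from the Hive table definition

if compare_columns "$hive_columns" "$sqoop_columns"; then
    echo "column order matches"
fi
```

A missing or extra column is reported the same way, because the shorter list is compared against an empty name at that position.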

Going further:

Importing several tables in a row with a loop in a shell script

#!/usr/bin/env bash

# Database connection parameters (use the MySQL user root, which allows remote connections)
DB_URL="jdbc:mysql://master:3306/bookhub?useSSL=false"
DB_USER="root"
DB_PASS="123456"

# Tables to migrate: admins, books, stocks, users, scores, orders, applications
TABLES=("admins" "books" "stocks" "users" "scores" "orders" "applications")
# Tables to migrate with their column names, as "table,col1,col2,..."
TABLES_COLUMNS=(
    "admins,id,name,sex,account,password"
    "books,id,name,author,category,publisher,descr,pic,price,status"
    "stocks,id,book_id,book_condition,num"
    "users,id,nickname,account,password,status,avatar,phone,address,sex,age"
    "scores,id,book_id,user_id,score"
    "orders,id,user_id,book_id,book_condition,quantity,total_price,create_time,status,pay_method,order_number"
    "applications,id,book_id,user_id,create_time,book_condition,acq_price,result"
)

# Change to the Hadoop bin directory; stop if it does not exist
cd /opt/hadoop-3.2.2/bin/ || exit 1

# Loop over the tables
for table in "${TABLES[@]}"; do
    # Delete the data already stored for the table
    sudo ./hadoop fs -rm -r "/user/hive/warehouse/bookhub.db/$table"
done

# Change to the Sqoop bin directory; stop if it does not exist
cd /opt/sqoop-1.4.7.bin__hadoop-2.6.0/bin/ || exit 1

# Loop over the tables again and run the Sqoop import for each one
for entry in "${TABLES_COLUMNS[@]}"; do
    # The first field is the table name; the rest stays together as the column list
    IFS=',' read -r table columns <<< "$entry"
    if ! sudo ./sqoop import \
        --connect "$DB_URL" \
        --username "$DB_USER" \
        --password "$DB_PASS" \
        --table "$table" \
        --delete-target-dir \
        --target-dir "/user/hive/warehouse/bookhub.db/$table" \
        --hive-import \
        --hive-table "bookhub.$table" \
        --fields-terminated-by '\001' \
        --columns "$columns"; then
        echo "Failed to migrate data for $table."
        continue
    fi
    echo "Data migration for $table completed."
done
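The script relies on one detail of the `read` builtin: when there are more fields than variable names, the last variable receives the whole remainder, separators included. That is how each `TABLES_COLUMNS` entry yields a table name and a ready-made `--columns` value. The following small sketch isolates that step; `split_entry` is a hypothetical name used only here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "table,col1,col2,..." the same way the loop does.
split_entry() {
    local table columns
    # With IFS set to a comma, "table" takes the first field and
    # "columns" takes everything after the first comma, unchanged.
    IFS=',' read -r table columns <<< "$1"
    printf 'table=%s columns=%s\n' "$table" "$columns"
}

split_entry "stocks,id,book_id,book_condition,num"
# prints: table=stocks columns=id,book_id,book_condition,num
```

Setting `IFS` on the same line as `read` limits the change to that single command, so the rest of the script keeps the default word splitting.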

