An Introduction to Sqoop2


What is Sqoop

Sqoop is a tool for efficiently moving data between structured, semi-structured, and unstructured data sources. Structured sources are those with a well-defined schema, such as relational databases; semi-structured examples include Cassandra and HBase; the unstructured store Sqoop supports is HDFS.
Sqoop exploits the parallelism of MapReduce to transfer data in batches, which speeds things up considerably. It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and export data from HDFS back into a relational database.

Differences between Sqoop1 and Sqoop2

  1. Sqoop2 introduces a Sqoop server (a Tomcat web application), which manages connectors centrally.
  2. Access methods are more varied: the server can be reached through a REST API, a Java API, a Web UI, or the CLI console.
  3. Security also improves somewhat. With Sqoop1, scripts that moved data between HDFS and MySQL had to specify the MySQL username and password explicitly in the script, which was far from ideal. With Sqoop2's CLI, an interactive prompt collects the password, so it is never displayed.
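As a small illustration of the REST access mode, the server exposes its API under the /sqoop web application on the HTTP port (12000 by default). A minimal sketch, assuming a hypothetical server on localhost with default ports (adjust host and port to your deployment):

```shell
# Hypothetical deployment values -- change these to match your server.
SQOOP_HOST=localhost
SQOOP_HTTP_PORT=12000

# Build the URL of the version endpoint, which reports server and API versions.
VERSION_URL="http://${SQOOP_HOST}:${SQOOP_HTTP_PORT}/sqoop/version"
echo "$VERSION_URL"

# With the server running, fetch it (left commented so the sketch is self-contained):
# curl -s "$VERSION_URL"
```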

    Sqoop1 architecture

    (architecture diagram)

    Sqoop2 architecture

    (architecture diagram)

The Sqoop2 distribution has two parts: a server and a client. The server is installed on one node of the Hadoop cluster (typically the master node); the client does not depend on the cluster and can be installed anywhere. The server is the single entry point for all client requests and also acts as the MapReduce client when jobs run.

Simple Sqoop1 usage examples

import moves data from a relational database into HDFS:

sqoop import --connect jdbc:mysql://192.168.81.176/hivemeta2db --username root --password passwd --table sds --where "sd_id > 100"

export is the reverse of import: it moves data from HDFS into a relational database:

sqoop export --connect jdbc:mysql://192.168.81.176/sqoop --username root --password passwd --table sds --export-dir /user/guojian/sds --staging-table sds_tmp
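Note that putting the password on the command line, as above, leaks it into the shell history and process list. One mitigation in Sqoop1 is an options file combined with an interactive password prompt; a sketch reusing the hypothetical connection details from the import example:

```shell
# Store the shared connection options in a file, one option per line,
# with the tool name (import) on the first line. Keep the file private.
cat > /tmp/sqoop-import.opts <<'EOF'
import
--connect
jdbc:mysql://192.168.81.176/hivemeta2db
--username
root
--table
sds
EOF
chmod 600 /tmp/sqoop-import.opts

# -P prompts for the password interactively instead of taking it as an argument.
# sqoop --options-file /tmp/sqoop-import.opts -P --where "sd_id > 100"
```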

Starting the Sqoop2 server

[hadoop@gateway2 sqoop]$ sqoop.sh server start
Sqoop home directory: /usr/local/sqoop
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:       
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /usr/local/sqoop/server
Using CATALINA_HOME:   /usr/local/sqoop/server
Using CATALINA_TMPDIR: /usr/local/sqoop/server/temp
Using JRE_HOME:        /usr/local/jdk
Using CLASSPATH:       /usr/local/sqoop/server/bin/bootstrap.jar

Sqoop2 client operations

Start the client:

[hadoop@gateway2 sqoop]$ sqoop.sh client
Sqoop home directory: /usr/local/sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000> show version --all
client version:
Sqoop 1.99.5-cdh5.9.0 source revision Unknown 
Compiled by root on Tue Jan 10 11:52:45 CST 2017
server version:
Sqoop 1.99.5-cdh5.9.0 source revision Unknown 
Compiled by root on Tue Jan 10 11:52:45 CST 2017
API versions:
[v1]
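By default the shell connects to a server on localhost:12000. To talk to a remote server, use the shell's set server command; the shell can also execute commands from a script file in batch mode. A sketch, with sqoop-master as a hypothetical server hostname:

```shell
# Write a batch script for the Sqoop2 shell (commands are the same as in
# an interactive session).
cat > /tmp/client.sqoop <<'EOF'
set server --host sqoop-master --port 12000 --webapp sqoop
show version --all
EOF

cat /tmp/client.sqoop

# Run it non-interactively once the server is reachable:
# sqoop.sh client /tmp/client.sqoop
```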

List the supported connectors

sqoop:000> show connector
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| Id | Name                   | Version         | Class                                                | Supported Directions |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| 1  | generic-jdbc-connector | 1.99.5-cdh5.9.0 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
| 2  | kite-connector         | 1.99.5-cdh5.9.0 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 3  | hdfs-connector         | 1.99.5-cdh5.9.0 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | kafka-connector        | 1.99.5-cdh5.9.0 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+

List all commands

sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

For help on a specific command type: help command

Examples

Create a JDBC link

sqoop:000> create link -c 1
Creating link for connector with id 1
Please fill following values to create new link object
Name: gateway2_test
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.10.10:3306/test?useUnicode=true&characterEncoding=UTF-8
Username: test
Password: ****
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and persistent id 1

Create an HDFS link

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs_root
Link configuration
HDFS URI: hdfs://ns1/
New link was successfully created with validation status OK and persistent id 2

List all links

sqoop:000> show link --all 
2 link(s) to show: 
link with id 1 and name gateway2_test (Enabled: true, Created by caixiaoyu at 17-1-22 6:17 PM, Updated by caixiaoyu at 17-1-22 6:17 PM)
Using Connector generic-jdbc-connector with id 1
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.10.10:3306/test?useUnicode=true&characterEncoding=UTF-8
Username: test
Password: 
JDBC Connection Properties: 
link with id 2 and name hdfs_root (Enabled: true, Created by caixiaoyu at 17-1-22 6:19 PM, Updated by caixiaoyu at 17-1-22 6:19 PM)
Using Connector hdfs-connector with id 3
Link configuration
HDFS URI: hdfs://ns1/

Create a job

A job connects a FROM link to a TO link: -f takes the FROM link id and -t the TO link id (here, from the HDFS link with id 2 to the JDBC link with id 1):

sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: hdfs2mysql_common.feature
From Job configuration
Input directory: /tmp/caixiaoyu/feature
Override null value: 
Null value:
To database configuration
Schema name: test
Table name: feature
Table SQL statement: 
Table column names: 
Stage table name: 
Should clear stage table:
Throttling resources
Extractors: 
Loaders: 
New job was successfully created with validation status OK and persistent id 1

List all jobs

sqoop:000> show job --all
1 job(s) to show: 
Job with id 2 and name hdfs2mysql_common.feature (Enabled: true, Created by caixiaoyu at 17-1-22 6:35 PM, Updated by caixiaoyu at 17-1-22 6:35 PM)
Using link id 2 and Connector id 3
  From Job configuration
    Input directory: /tmp/caixiaoyu/feature
    Override null value: 
    Null value: 
  Throttling resources
    Extractors: 
    Loaders: 
  To database configuration
    Schema name: test
    Table name: feature
    Table SQL statement: 
    Table column names: 
    Stage table name: 
    Should clear stage table:

Start the job

The -s flag runs the job synchronously, so the shell reports progress until it completes:

sqoop:000> start job -j 2 -s
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: caixiaoyu
Creation date: 2017-01-22 18:50:47 CST
Lastly updated by: caixiaoyu
External ID: job_1484048632162_13377
http://192.168.10.11:8041/proxy/application_1484048632162_13377/
2017-01-22 18:50:47 CST: BOOTING - Progress is not available
2017-01-22 18:56:26 CST: BOOTING - 0.00 %
2017-01-22 18:56:36 CST: RUNNING - 0.00 %
2017-01-22 18:56:46 CST: SUCCEEDED 
Counters:
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_LARGE_READ_OPS: 0
FILE_WRITE_OPS: 0
HDFS_READ_OPS: 39
HDFS_BYTES_READ: 20025
HDFS_LARGE_READ_OPS: 0
FILE_READ_OPS: 0
FILE_BYTES_WRITTEN: 1232583
FILE_BYTES_READ: 0
HDFS_WRITE_OPS: 0
HDFS_BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 0
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.JobCounter
TOTAL_LAUNCHED_MAPS: 9
MB_MILLIS_MAPS: 54996480
SLOTS_MILLIS_REDUCES: 0
VCORES_MILLIS_MAPS: 35805
SLOTS_MILLIS_MAPS: 71610
OTHER_LOCAL_MAPS: 9
MILLIS_MAPS: 35805
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 51
ROWS_WRITTEN: 51
org.apache.hadoop.mapreduce.TaskCounter
SPILLED_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
VIRTUAL_MEMORY_BYTES: 31632576512
MAP_INPUT_RECORDS: 0
SPLIT_RAW_BYTES: 1478
MAP_OUTPUT_RECORDS: 51
FAILED_SHUFFLE: 0
PHYSICAL_MEMORY_BYTES: 3226644480
GC_TIME_MILLIS: 804
CPU_MILLISECONDS: 12770
COMMITTED_HEAP_BYTES: 9128378368
Job executed successfully
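When a job is started without -s, start job returns immediately and the run can be polled or aborted from the shell afterwards. A sketch of the follow-up commands, written as a batch script with the same job id as above:

```shell
# Poll a running job from a batch script for the Sqoop2 shell.
cat > /tmp/job-status.sqoop <<'EOF'
status job -j 2
EOF

cat /tmp/job-status.sqoop

# stop job -j 2   would abort the run if it were still in progress.
# sqoop.sh client /tmp/job-status.sqoop
```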

Check the result in MySQL

mysql> select * from feature;
+---------+--------------+----------------+----------+-------------------------+------------+-------------+----------------+
| feature | feature_desc | online_display | sortrank | changetime              | featurepic | isrecommend | over_seas_sort |
+---------+--------------+----------------+----------+-------------------------+------------+-------------+----------------+
|      23 | 依山         | T              |        0 | 2014-05-26 14:06:00.367 | NULL       | NULL        | NULL           |
|      24 | 傍水         | T              |        0 | 2014-05-26 14:06:06.913 | NULL       | NULL        | NULL           |
|      25 | 赏花         | T              |        0 | 2014-05-26 14:07:42.453 | NULL       | NULL        | NULL           |
