In this post we will discuss handling NULL values during Sqoop import and export.
If a value is NULL in the source table, Sqoop will by default import it into HDFS as the string "null". That creates a problem when we later use NULL conditions in our Hive queries.
For example, let's insert a NULL value into the MySQL table "cities".
mysql> insert into cities values(6,7,NULL);
mysql> select * from cities;
By default, Sqoop writes these NULLs as the string "null" in HDFS. Let's sqoop the table and see what happens:-
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username sqoop -P --table cities --hive-import --hive-overwrite --hive-table vikas.cities -m 1
After executing the above sqoop import command, we can verify the sqooped data in HDFS.
As we can see, the city value is the string "null", not a real NULL.
So if we query the table in Hive we will get the string "null", and queries using the "is null" or "is not null" conditions will not return the appropriate result.
hive> select * from cities where city is not null;
hive> select * from cities where city is not NULL;
Please refer to the below screenshot, which shows the output of the above queries; the result is not as required.
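To make the problem concrete, here is a minimal local sketch of the record the default import writes for our NULL row. The file name `part-m-00000` and the Ctrl-A (`\001`) delimiter mimic Hive's warehouse layout; both are assumptions for illustration, using a local file to stand in for HDFS:

```shell
# Ctrl-A (\001) is Hive's default field delimiter.
d=$'\001'

# Simulate what the default Sqoop import writes for row (6, 7, NULL):
printf '6%s7%snull\n' "$d" "$d" > part-m-00000

# The third field is the literal four-character string "null", not a real
# NULL -- this is why Hive's "is null" / "is not null" filters misbehave.
cat -v part-m-00000    # -v shows the ^A delimiters
```

Because Hive's default SerDe only treats the sequence `\N` as NULL, the string "null" here is just ordinary text to Hive.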
To avoid this scenario, please find the solution below.
Solution:-
As we know, Hive stores a NULL value as "\N" in HDFS.
So while sqooping we have to tell Sqoop to write NULLs the same way, using the matching parameters in the sqoop command:
$ sqoop import … --null-string '\\N' --null-non-string '\\N'
So the final sqoop import command will be as below:-
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username sqoop -P --table cities --hive-import --hive-overwrite --null-string '\\N' --null-non-string '\\N' --hive-table vikas.cities -m 1
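With the two `--null-string`/`--null-non-string` arguments, the same row is now serialized with Hive's NULL marker. A quick local sketch (the file name and delimiter are again stand-in assumptions, not real HDFS output):

```shell
# Ctrl-A (\001) is Hive's default field delimiter.
d=$'\001'

# After the fixed import, the NULL city is stored as the two-character
# sequence \N, which Hive's default SerDe reads back as a real NULL:
printf '6%s7%s\\N\n' "$d" "$d" > part-m-00000

cat -v part-m-00000    # third field is \N, not the string "null"
```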
Data in HDFS:-
Hive query:- As per the below screenshot, we can see that queries dealing with NULL values are working fine now.
Similarly, we can export without any NULL-related issue, as below:-
Issue:- We now have "\N" values in HDFS, and if we sqoop this data back to MySQL, it will arrive as the literal string "\N", not as a NULL value.
Sqoop export command:
sqoop export --connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
-P \
--table citiestest \
--export-dir /user/hive/warehouse/vikas.db/cities/ \
--input-fields-terminated-by '\001'
The values after export from HDFS to MySQL:-
In order to avoid the above scenario, we have to include the below parameters in our sqoop export command.
--input-null-string '\\N' --input-null-non-string '\\N'
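A note on the escaping: the single quotes keep the shell from touching the backslashes, so Sqoop receives the literal text `\\N` and unescapes it once to the two-character sequence `\N` found in the HDFS files. We can check what the shell actually passes:

```shell
# printf '%s' does not interpret backslashes in its arguments,
# so this shows the argument exactly as Sqoop would receive it:
printf '%s\n' '\\N'    # prints \\N -- both backslashes survive the shell
```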
Sqoop export command:-
sqoop export --connect jdbc:mysql://localhost:3306/sqoop --username sqoop -P --table citiestest --export-dir /user/hive/warehouse/vikas.db/cities/ --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N'
After executing the above sqoop export command, we can see that the value now arrives as NULL in the table instead of "\N" (the expected result).