Tuesday, 9 December 2014

Updated Hadoop certification questions

1.You decide to use Hive to process data in HDFS. You have not created any Hive tables before now, and Hive is configured with its default settings. You run the following commands from the Hive shell:

CREATE DATABASE db1;
USE db1;
CREATE TABLE t1(name STRING, id INT, salary INT);

In which HDFS directory will the data for table t1 be stored?

A./user/hive/db1.db/t1
B./user/hive/t1/db1.db
C./user/hive/db1/t1
D./user/hive/warehouse/db1/t1
E./user/hive/warehouse/db1.db/t1
F./user/hive/warehouse/db1t1


Explanation:
When you create a database named db1, Hive creates a subdirectory of its warehouse directory (by default /user/hive/warehouse) named db1.db. All tables in that database are placed under this directory, so the data for t1 ends up in /user/hive/warehouse/db1.db/t1.
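If you want to check this yourself, here is a quick sketch, assuming the default warehouse location and that the commands above have already been run:

% hive -e 'USE db1; DESCRIBE FORMATTED t1'
% hadoop fs -ls /user/hive/warehouse/db1.db

The Location field in the DESCRIBE FORMATTED output should end in /user/hive/warehouse/db1.db/t1, and the fs -ls listing should show the t1 subdirectory.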

2.Assume that the two lines that set the mapper's key and value types (as indicated by the comment in the code) were commented out, and that the job was recompiled and the JAR file rebuilt. What would be the result if this new MapReduce job were run against the same input file as in the previous question?

A.The job fails, throwing a ClassCastException
B.The job fails, throwing an IOException
C.The job fails, throwing a JobConfigurationException
D.The job fails, throwing a NumberFormatException
E.The code won’t compile

Explanation: The job has an implicit IdentityMapper that simply passes along its input keys and values. Since the reducer emits a Text key and a Text value, the mapper is assumed to do so as well, unless the map output key and value classes are set explicitly. Without those two lines, the job fails when the reducer tries to read the map output, throwing an IOException; it does not throw a ClassCastException because the problem is detected while reading the intermediate data rather than while casting it. See the Serialization section of chapter 4 in Hadoop: The Definitive Guide, 3rd Edition, for more information.
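For reference, the two lines the question refers to would look something like the following in the job's driver code. This is only a sketch, since the actual driver from the previous question is not reproduced here, and the job and conf variable names are assumptions:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(conf, "example");  // conf is an existing Configuration (assumed)
job.setMapOutputKeyClass(Text.class);        // intermediate (map output) key type
job.setMapOutputValueClass(Text.class);      // intermediate (map output) value type

Commenting out the last two lines is the change the question describes: the intermediate key and value types are no longer declared explicitly, which leads to the failure described above.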


3.You submit a job to Hadoop and notice in the JobTracker's Web UI that the mappers are 85% complete while the reducers are 10% complete. What is the best explanation for this?

A.The progress attributed to the reducer refers to the transfer of data from completed Mappers.
B.The job is using a custom partitioner and this completion percentage refers to the progress of the partitioning operation.

Explanation: Although the reduce() method is not called until all of the mappers have completed, the transfer of map output from completed mappers to the reducers (the shuffle) begins as soon as individual mappers finish, and that copying is counted as part of the reducers' progress.

4.Which command will delete the Hive LOGIN table you just created?

A.% hive -e 'DROP TABLE Login'
B.% hive -e 'DELETE TABLE LOGIN'
C.% hive -e 'DROP LOGIN'
D.% hive -e 'REMOVE Login'
E.% sqoop delete-hive-table --connect jdbc:mysql://dbhost/db --table LOGIN

Explanation: Sqoop does not offer a way to delete a table from Hive, although it will overwrite the table definition during import if the table already exists and --hive-overwrite is specified. The correct HiveQL statement to drop a table is "DROP TABLE tablename". To find out more about the Hive command-line tool, see the Hive docs.
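If you are not certain the table exists, HiveQL also accepts an IF EXISTS clause, which avoids an error when the table is missing:

% hive -e 'DROP TABLE IF EXISTS Login'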

5.The example directory in HDFS is an empty directory, and the output directory does not exist. What is the result of running this job twice, as follows:

% hadoop jar example.jar Example example output
% hadoop jar example.jar Example example output

A.Both jobs will fail
B.The first job will fail, and the second job will succeed
C.The first job will succeed and the second job will fail
D.Both jobs will succeed

Explanation: There is no problem processing an empty input directory, so the first job succeeds. Because there is no input, no mappers are run, but the specified number of reducers (or 1, the default) still runs, and the job completes successfully. The second job fails because the output directory was created by the first job, even though it contains no output; Hadoop refuses to write to an output directory that already exists.
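To run the job a second time successfully, the output directory from the first run has to be removed first. A sketch (the -rm -r form is for newer fs shells; older releases use -rmr):

% hadoop fs -rm -r output
% hadoop jar example.jar Example example output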

6.Now that you have the USERS table imported into Hive, you need to make the log data available to Hive so that you can perform a join operation. Assuming you have uploaded the log data into HDFS, which approach creates a Hive table that contains the log data?

A.Create a table in the Hive shell and execute a LOAD command using org.apache.hadoop.hive.input.RegexInputFormat to load the log data into the table
B.Create a table in the Hive shell and execute the INSERT OVERWRITE command using org.apache.hadoop.hive.serde2.RegexSerDe to load the data into the table
C.Create an external table in the Hive shell using org.apache.hadoop.hive.serde2.RegexSerDe to extract the column data from the logs
D.Create an external table in the Hive shell using org.apache.hadoop.hive.input.RegexInputFormat to extract the column data from the logs
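As a sketch of the external-table approach, the statement below uses hypothetical column names, a hypothetical tab-separated log layout (the input.regex would have to match the real log format), and a hypothetical HDFS path:

CREATE EXTERNAL TABLE log_data (
  user_id STRING,
  event_time STRING,
  url STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(\\S+)\\t(\\S+)\\t(\\S+)")
LOCATION '/user/examples/logdata';

Because the table is EXTERNAL and points at the directory where the logs already live, Hive does not move or copy the data, and dropping the table later leaves the files in place.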
