
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Problem: Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Queries:
CREATE TABLE movies_temp (
  movie_id INT, 
  title STRING,
  genres ARRAY<STRING>
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

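-- Note: the mixed-case 'Snappy' in the TBLPROPERTIES below is what triggers the error (see Solution).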
CREATE EXTERNAL TABLE movies (
  movie_id INT, 
  title STRING,
  genres ARRAY<STRING>
) STORED AS ORC 
TBLPROPERTIES ('orc.compress'='Snappy','creator'='uvaraj', 'created_on' = '2016-12-30', 'description'='movie details');

INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;
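
To double-check which codec value a table actually carries before loading data, Hive can print a single table property. A quick check against the movies table above:

SHOW TBLPROPERTIES movies('orc.compress');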

Screenshots:
[Screenshot: executing the Hive queries with Tez as the execution engine; the error message appears in the Hive query builder.]

The highlighted error message shows the problem statement.


Solution: In the CREATE statement, replace 'orc.compress'='Snappy' with 'orc.compress'='SNAPPY'. The value is case-sensitive: it must match one of the uppercase constants of the ORC CompressionKind enum (NONE, ZLIB, SNAPPY, LZO).
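
For reference, the corrected CREATE statement (identical to the one above except for the codec name):

CREATE EXTERNAL TABLE movies (
  movie_id INT,
  title STRING,
  genres ARRAY<STRING>
) STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY','creator'='uvaraj', 'created_on' = '2016-12-30', 'description'='movie details');

If the table has already been created with the wrong value, one way to fix it in place, instead of dropping and recreating it, is ALTER TABLE. Note that orc.compress only applies to files written after the change, so rerun the INSERT OVERWRITE afterwards:

ALTER TABLE movies SET TBLPROPERTIES ('orc.compress'='SNAPPY');
INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;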
