
Posts

HDFS Health Check

HDFS supports the fsck command to check for various inconsistencies. It is designed to report problems with files, such as:

Corrupt blocks
Missing blocks
Under-replicated blocks

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects.

Command: sudo -u hdfs hdfs fsck / -files -blocks -locations

HDFS: Corrupted/Missing/Under-Replicated Blocks

As per the below screenshot of fsck output, there is 1 corrupt block, 1 missing block and 4 under-replicated blocks, and the status of HDFS is "CORRUPT". This indicates that HDFS health is bad, and these issues should be addressed as soon as possible to bring HDFS back to HEALTHY.

Corrupt block: a block is called corrupt by HDFS if it has at least one corrupt replica along with at least one live replica. As such, a corrupt block does not indicate unavailable data, but it does indicate an increased chance that data may become unavailable.

Missing block: ...
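A minimal health-check sketch, assuming superuser access as the hdfs user (-list-corruptfileblocks, -move and -delete are standard fsck options; which one to use depends on whether the data can be restored from outside HDFS):

# Full report with per-file block detail
sudo -u hdfs hdfs fsck / -files -blocks -locations

# List only the files that have corrupt or missing blocks
sudo -u hdfs hdfs fsck / -list-corruptfileblocks

# Recover what is salvageable, or remove files with no live replica
sudo -u hdfs hdfs fsck / -move     # moves affected files to /lost+found
sudo -u hdfs hdfs fsck / -delete   # deletes affected files (last resort)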
Recent posts

Fix: Under Replicated blocks in HDFS manually

Problem: Under-replicated blocks in HDFS.

Solution: Execute the below command to collect the list of under-replicated files in HDFS, then reset the replication factor for each file (see the loop sketched below):

sudo -u hdfs hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
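A sketch of the follow-up step, assuming a target replication factor of 3 (substitute your cluster's dfs.replication value):

# Re-set replication for each affected file; -w waits for it to complete
for f in $(cat /tmp/under_replicated_files); do
  echo "Fixing $f"
  sudo -u hdfs hdfs dfs -setrep -w 3 "$f"
done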

Running beyond physical memory limit

Problem: Running beyond physical memory limit.

Exception:
Container is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_e32_1486122208753_0488_01_000004

This error can happen when using the JOIN operator in a Pig script. It indicates that the task container does not have enough physical memory to complete the job.

Solution: Increase the Tez task memory:

SET tez.task.resource.memory.mb 2048;
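A sketch of where the setting goes in practice; the script path, input paths and relation names below are hypothetical, and only the SET line is the actual fix:

# Write a Pig-on-Tez script with the raised task memory, then run it
cat > /tmp/join_job.pig <<'EOF'
SET tez.task.resource.memory.mb 2048;
a = LOAD '/data/left'  USING PigStorage(',') AS (id:int, v:chararray);
b = LOAD '/data/right' USING PigStorage(',') AS (id:int, w:chararray);
j = JOIN a BY id, b BY id;
STORE j INTO '/data/joined';
EOF
pig -x tez /tmp/join_job.pig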

HDFS Disk Usage Error

Problem: HDFS Disk Usage Error.

Exception:
2017-02-09 06:15:41,946 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 34 Succeeded: 32 Running: 0 Failed: 1 Killed: 1 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-426, vertexId=vertex_1486122208753_0502_1_13, diagnostics=[Task failed, taskId=task_1486122208753_0502_1_13_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function. org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/tsluc/loan_dataset/refined/_temporary/1/_temporary/attempt_148612220875313_0502_r_000000_0/part-v013-o000-r-00000 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.

Ambari monitoring screen: Tez execution eng...
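With no nodes excluded, the "could only be replicated to 0 nodes" message points at the DataNodes running out of usable disk space. A few standard commands to confirm this (the /user path below is just an example):

# Capacity and remaining space per DataNode
sudo -u hdfs hdfs dfsadmin -report

# Filesystem-level usage
hdfs dfs -df -h /

# Largest consumers under a directory (example path)
hdfs dfs -du -h /user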

Load epoch timestamp value into hive table

Problem: Load an epoch timestamp value into a Hive table.

Solution: Use BIGINT and load into a temp table, then use the to_utc_timestamp() UDF to convert it into a specific timezone.

Queries:

CREATE EXTERNAL TABLE tags_temp (
  user_id INT,
  movie_id INT,
  tag STRING,
  date_time BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

CREATE EXTERNAL TABLE tags (
  user_id INT,
  movie_id INT,
  tag STRING,
  date_time TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress' = 'SNAPPY',
  'creator' = 'uvaraj',
  'created_on' = '2016-12-30',
  'description' = 'tags details'
);

INSERT OVERWRITE TABLE tags
SELECT user_id, movie_id, tag, to_utc_timestamp(date_time, 'UTC')
FROM tags_temp;
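A quick sanity check before loading the whole table; per the Hive docs, an integer argument to to_utc_timestamp is treated as epoch milliseconds, and the literal below is just a hypothetical sample value:

# Verify the conversion on one literal value first
hive -e "SELECT to_utc_timestamp(1483056000000, 'UTC');"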

Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Problem: Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Queries:

CREATE TABLE movies_temp (
  movie_id INT,
  title STRING,
  genres ARRAY<STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

CREATE EXTERNAL TABLE movies (
  movie_id INT,
  title STRING,
  genres ARRAY<STRING>
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress' = 'Snappy',
  'creator' = 'uvaraj',
  'created_on' = '2016-12-30',
  'description' = 'movie details'
);

INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;

Screenshots: Executing the Hive queries using Tez as the execution engine, and the error message in the Hive query builder. The highlighted error message tells the problem statement. Sol...
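The enum name is case-sensitive: CompressionKind has no constant named Snappy, and the working table in the epoch-timestamp post above uses 'SNAPPY' in uppercase. A sketch of the fix, keeping the table from the queries above:

# Point the table at the uppercase name, then reload the data
hive -e "
ALTER TABLE movies SET TBLPROPERTIES ('orc.compress' = 'SNAPPY');
INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;
"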