
Posts

HDFS Health Check

HDFS supports the fsck command to check for various inconsistencies. It is designed to report problems with files, such as:

Corrupt blocks
Missing blocks
Under-replicated blocks

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects.

Command: sudo -u hdfs hdfs fsck / -files -blocks -locations

HDFS: Corrupted/Missing/Under-Replicated Blocks

As per the below screenshot of fsck output, there is 1 corrupt block, 1 missing block and 4 under-replicated blocks, and the status of HDFS is "CORRUPT". This indicates that HDFS health is bad, and these issues should be addressed as soon as possible to bring HDFS back to HEALTHY.

Corrupt block: a block is called corrupt by HDFS if it has at least one corrupt replica along with at least one live replica. As such, a corrupt block does not indicate unavailable data, but it does indicate an increased chance that data may become unavailable.

Missing block: ...
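A minimal health-check sketch, assuming superuser access as the hdfs user (-list-corruptfileblocks, -move and -delete are standard fsck options; which one to use depends on whether the data can be restored from outside HDFS):

# Full report with per-file block detail
sudo -u hdfs hdfs fsck / -files -blocks -locations

# List only the files that have corrupt or missing blocks
sudo -u hdfs hdfs fsck / -list-corruptfileblocks

# Recover what is salvageable, or remove files with no live replica
sudo -u hdfs hdfs fsck / -move     # moves affected files to /lost+found
sudo -u hdfs hdfs fsck / -delete   # deletes affected files (last resort)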
Recent posts

Fix: Under Replicated blocks in HDFS manually

Problem: Under-replicated blocks in HDFS.

Solution: Execute the below command to collect the list of under-replicated files in HDFS, then reset the replication factor for each file (see the loop sketched below):

sudo -u hdfs hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
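A sketch of the follow-up step, assuming a target replication factor of 3 (substitute your cluster's dfs.replication value):

# Re-set replication for each affected file; -w waits for it to complete
for f in $(cat /tmp/under_replicated_files); do
  echo "Fixing $f"
  sudo -u hdfs hdfs dfs -setrep -w 3 "$f"
done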

Running beyond physical memory limit

Problem: Running beyond physical memory limit.

Exception:
Container is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_e32_1486122208753_0488_01_000004

This error can happen when using the JOIN operator in a Pig script. It indicates that the task container does not have enough physical memory to complete the job.

Solution: Increase the Tez task memory:

SET tez.task.resource.memory.mb 2048;
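A sketch of where the setting goes in practice; the script path, input paths and relation names below are hypothetical, and only the SET line is the actual fix:

# Write a Pig-on-Tez script with the raised task memory, then run it
cat > /tmp/join_job.pig <<'EOF'
SET tez.task.resource.memory.mb 2048;
a = LOAD '/data/left'  USING PigStorage(',') AS (id:int, v:chararray);
b = LOAD '/data/right' USING PigStorage(',') AS (id:int, w:chararray);
j = JOIN a BY id, b BY id;
STORE j INTO '/data/joined';
EOF
pig -x tez /tmp/join_job.pig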

HDFS Disk Usage Error

Problem: HDFS Disk Usage Error.

Exception:
2017-02-09 06:15:41,946 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 34 Succeeded: 32 Running: 0 Failed: 1 Killed: 1 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-426, vertexId=vertex_1486122208753_0502_1_13, diagnostics=[Task failed, taskId=task_1486122208753_0502_1_13_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function. org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/tsluc/loan_dataset/refined/_temporary/1/_temporary/attempt_148612220875313_0502_r_000000_0/part-v013-o000-r-00000 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.

Ambari monitoring screen: Tez execution eng...
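With no nodes excluded, the "could only be replicated to 0 nodes" message points at the DataNodes running out of usable disk space. A few standard commands to confirm this (the /user path below is just an example):

# Capacity and remaining space per DataNode
sudo -u hdfs hdfs dfsadmin -report

# Filesystem-level usage
hdfs dfs -df -h /

# Largest consumers under a directory (example path)
hdfs dfs -du -h /user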

Load epoch timestamp value into hive table

Problem: Load an epoch timestamp value into a Hive table.

Solution: Use BIGINT and load into a temp table, then use the to_utc_timestamp() UDF to convert it into a specific timezone.

Queries:

CREATE EXTERNAL TABLE tags_temp (
  user_id INT,
  movie_id INT,
  tag STRING,
  date_time BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

CREATE EXTERNAL TABLE tags (
  user_id INT,
  movie_id INT,
  tag STRING,
  date_time TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress' = 'SNAPPY',
  'creator' = 'uvaraj',
  'created_on' = '2016-12-30',
  'description' = 'tags details'
);

INSERT OVERWRITE TABLE tags
SELECT user_id, movie_id, tag, to_utc_timestamp(date_time, 'UTC')
FROM tags_temp;
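A quick sanity check before loading the whole table; per the Hive docs, an integer argument to to_utc_timestamp is treated as epoch milliseconds, and the literal below is just a hypothetical sample value:

# Verify the conversion on one literal value first
hive -e "SELECT to_utc_timestamp(1483056000000, 'UTC');"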

Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Problem: Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Snappy

Queries:

CREATE TABLE movies_temp (
  movie_id INT,
  title STRING,
  genres ARRAY<STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

CREATE EXTERNAL TABLE movies (
  movie_id INT,
  title STRING,
  genres ARRAY<STRING>
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress' = 'Snappy',
  'creator' = 'uvaraj',
  'created_on' = '2016-12-30',
  'description' = 'movie details'
);

INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;

Screenshots: Executing the Hive queries using Tez as the execution engine, and the error message in the Hive query builder. The highlighted error message tells the problem statement. Sol...
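The enum name is case-sensitive: CompressionKind has no constant named Snappy, and the working table in the epoch-timestamp post above uses 'SNAPPY' in uppercase. A sketch of the fix, keeping the table from the queries above:

# Point the table at the uppercase name, then reload the data
hive -e "
ALTER TABLE movies SET TBLPROPERTIES ('orc.compress' = 'SNAPPY');
INSERT OVERWRITE TABLE movies SELECT * FROM movies_temp;
"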