Skip to main content

Load epoch timestamp value into hive table

Problem: Load epoch timestamp into hive table.

Solution: 
  • Use BIGINT and load into temp table.
  • Use UDF function of to_utc_timestamp() to convert it into specific timezone.
Queries:
CREATE EXTERNAL TABLE tags_temp (
  user_id INT, 
  movie_id INT,
  tag STRING,
  date_time BIGINT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY "|"
STORED AS TEXTFILE
LOCATION 'LOCATION TO THE FILE IN HDFS';

CREATE EXTERNAL TABLE tags (
  user_id INT, 
  movie_id INT,
  tag STRING,
  date_time timestamp
)STORED AS ORC 
TBLPROPERTIES ('orc.compress'='SNAPPY','creator'='uvaraj', 'created_on' = '2016-12-30', 'description'='tags details');

INSERT OVERWRITE TABLE tags SELECT user_id,movie_id,tag,
to_utc_timestamp(date_time,'UTC') FROM tags_temp;

Comments

Popular posts from this blog

HDFS Health Check

HDFS supports the  fsck  command to check for various inconsistencies. It is designed for reporting problems with various files, Corrupt blocks Missing blocks  Under-replicated blocks Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Command:  sudo hdfs  hdfs fsck / --files --blocks  --locations HDFS: Corrupted/Missing/Under Replicated Blocks As per the below screenshot of  fsck  output, there is 1 corrupt block, 1 missing block and 4 under-replicated blocks and status of HDFS is “CORRUPT”. These indicates HDFS health is bad and these should be addressed ASAP to recover our HDFS into HEALTHY. Corrupt block:  block is called corrupt by HDFS if it has at least one corrupt replica along with at least one live replica. As such, a corrupt block does not indicate unavailable data, but they do indicate an increased chance that data may become unavailable. Missing block: ...

Fix: Under Replicated blocks in HDFS manually

Problem: Under replicated blocks in HDFS Solution: Execute the below command to fix the under replicated blocks in HDFS, sudo -u hdfs hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files