
Apache Ambari - Monitoring



Ambari’s monitoring tools build on two open source projects: Ganglia and Nagios. Both come preconfigured with Ambari out of the box once a cluster has been provisioned.

Ambari can monitor the health and status of a Hadoop cluster at a granular level and display the metrics on its web UI. The Ganglia integration provides not only a wide range of Hadoop-centric metrics (e.g. HDFS disk usage, HDFS links, DataNode status) but also histograms that help reveal usage patterns from a threshold and trend perspective.

The following are some of the widgets Ambari provides for monitoring the cluster:
  • HDFS Disk Usage
  • DataNodes Live
  • HDFS Links
  • Memory Usage
  • Network Usage
  • CPU Usage
  • Cluster Load
  • NameNode Heap
  • NameNode RPC
  • NameNode CPU WIO
  • NameNode Uptime
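The data behind widgets like these can also be fetched programmatically through Ambari’s REST API, which is served on port 8080 by default and uses HTTP basic auth. A minimal sketch follows; the host name `ambari.example.com`, cluster name `mycluster`, and the chosen fields are illustrative assumptions, not values from this post:

```python
import base64
import urllib.request


def build_ambari_url(host, cluster, fields, port=8080):
    """Build an Ambari REST API URL requesting specific fields of a cluster."""
    return (f"http://{host}:{port}/api/v1/clusters/{cluster}"
            f"?fields={','.join(fields)}")


def basic_auth_header(user, password):
    """HTTP basic-auth header for the Ambari API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


# Hypothetical host/cluster names; adjust for your environment.
url = build_ambari_url("ambari.example.com", "mycluster",
                       ["Clusters/health_report", "Clusters/total_hosts"])

# To actually issue the request (requires a live Ambari server):
# req = urllib.request.Request(url, headers=basic_auth_header("admin", "admin"))
# print(urllib.request.urlopen(req).read())
print(url)
```

The same pattern works for per-service metrics by extending the path (e.g. appending `/services/HDFS`), which is how the UI widgets above source their numbers.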

Ambari Metrics System (AMS) is a system for collecting, aggregating and serving Hadoop and system metrics in Ambari-managed clusters.
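AMS serves the collected metrics through the Metrics Collector’s timeline endpoint (port 6188 by default). The sketch below only constructs the query URL; the collector host, metric names, and target hostname are hypothetical examples:

```python
from urllib.parse import urlencode


def build_ams_query(collector_host, metric_names, hostname,
                    app_id="HOST", port=6188):
    """Build a query URL for the AMS Metrics Collector timeline API."""
    params = urlencode({
        "metricNames": ",".join(metric_names),
        "hostname": hostname,
        "appId": app_id,  # HOST for system metrics; otherwise a service name
    })
    return f"http://{collector_host}:{port}/ws/v1/timeline/metrics?{params}"


# Hypothetical hosts; adjust for your environment.
url = build_ams_query("metrics-collector.example.com",
                      ["cpu_user", "mem_free"],
                      "datanode1.example.com")
print(url)
```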

Ambari also utilizes Ganglia for detailed heatmaps, which are a great way to quickly spot hosts that are consuming too many resources or falling outside an acceptable threshold for a monitored value.

The following are a few of the heatmaps provided by Ambari:
  • Host Disk Space Used
  • Host Memory Used
  • Host CPU WIO
  • HDFS Bytes Read & Written
  • Garbage Collection Time
  • JVM Heap Memory Used

Example views from the Ambari UI (screenshots omitted):
  • Host Disk Space Used %
  • YARN: Total Allocatable RAM Utilized per NodeManager
  • Host Metrics
Nagios is primarily used for health checking and alerting. During the Ambari wizard installation, the installer must provide a support email address for Nagios alerts. These alert emails include the notification type, service, host, host address, state, date, and any other additional info.
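The fields carried by those alert emails can be modeled as simple records. The sketch below filters a batch of sample alerts down to the ones needing attention; the field names and sample data are illustrative, not Nagios’s exact template variables:

```python
# Each dict mirrors the email fields listed above: notification type,
# service, host, host address, state, and date. Sample data is hypothetical.
alerts = [
    {"type": "PROBLEM", "service": "NAMENODE", "host": "nn1",
     "address": "10.0.0.1", "state": "CRITICAL", "date": "2016-12-30 10:15"},
    {"type": "RECOVERY", "service": "DATANODE", "host": "dn3",
     "address": "10.0.0.13", "state": "OK", "date": "2016-12-30 10:20"},
]


def needs_attention(alert):
    """An alert warrants action if it reports a non-OK state."""
    return alert["state"] in ("WARNING", "CRITICAL", "UNKNOWN")


bad_hosts = [a["host"] for a in alerts if needs_attention(a)]
print(bad_hosts)  # hosts currently in a bad state
```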
