YARN and MR Memory Configuration:
Calculate Reserved Memory
When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:
- RAM (Amount of memory)
- CORES (Number of CPU cores)
- DISKS (Number of disks)
Reserved Memory = Reserved for system memory + Reserved for HBase memory (if HBase is on the same node).
Use the following table to determine the Reserved Memory per node.
| Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory |
|---|---|---|
| 4 GB | 1 GB | 1 GB |
| 8 GB | 2 GB | 1 GB |
| 16 GB | 2 GB | 2 GB |
| 24 GB | 4 GB | 4 GB |
| 48 GB | 6 GB | 8 GB |
| 64 GB | 8 GB | 8 GB |
| 72 GB | 8 GB | 8 GB |
| 96 GB | 12 GB | 16 GB |
| 128 GB | 24 GB | 24 GB |
| 256 GB | 32 GB | 32 GB |
| 512 GB | 64 GB | 64 GB |
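For illustration, the table above can be encoded as a small lookup helper. This is a sketch: the function name and structure are assumptions, and only the numbers come from the table.

```python
# Recommended reserved memory per node, from the table above:
# (total node RAM GB, reserved system GB, reserved HBase GB)
RESERVED = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4),
    (48, 6, 8), (64, 8, 8), (72, 8, 8), (96, 12, 16),
    (128, 24, 24), (256, 32, 32), (512, 64, 64),
]

def reserved_memory_gb(total_ram_gb, hbase_on_node):
    """Return total reserved memory (GB) for a node of the given size."""
    # Pick the largest table row that does not exceed the node's RAM.
    system, hbase = 1, 1
    for ram, sys_gb, hbase_gb in RESERVED:
        if total_ram_gb >= ram:
            system, hbase = sys_gb, hbase_gb
    return system + (hbase if hbase_on_node else 0)

print(reserved_memory_gb(48, hbase_on_node=True))  # prints 14 (6 GB system + 8 GB HBase)
```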
The next calculation is to determine the maximum number of containers allowed per node. The following formula can be used:
# of containers = min(2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)
Where DISKS is the value for dfs.data.dirs (number of data disks) per machine and MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value is dependent on the amount of RAM available (in smaller memory nodes, the minimum container size should also be smaller).
The following table outlines the recommended values:
| Total RAM per Node | Recommended Minimum Container Size |
|---|---|
| Less than 4 GB | 256 MB |
| Between 4 GB and 8 GB | 512 MB |
| Between 8 GB and 24 GB | 1024 MB |
| Above 24 GB | 2048 MB |
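The container-count formula and the minimum-container-size table can be combined into a short sketch. The helper names are assumptions; the logic follows the formula and table above.

```python
def min_container_size_mb(total_ram_gb):
    """Recommended minimum container size (MB), from the table above."""
    if total_ram_gb < 4:
        return 256
    if total_ram_gb <= 8:
        return 512
    if total_ram_gb <= 24:
        return 1024
    return 2048

def num_containers(cores, disks, available_ram_gb, min_container_gb):
    """# of containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE)."""
    return int(min(2 * cores, 1.8 * disks, available_ram_gb / min_container_gb))

# A 12-core, 12-disk node with 42 GB available (48 GB minus 6 GB reserved):
print(num_containers(12, 12, 42, 2))  # min(24, 21.6, 21) -> prints 21
```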
The final calculation is to determine the amount of RAM per container:
RAM-per-container = max(MIN_CONTAINER_SIZE, (Total available RAM) / # of containers)
With these calculations, the YARN and MapReduce configuration properties can be set as follows:
| Configuration File | Configuration Setting | Calculation |
|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | = containers * RAM-per-container |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = RAM-per-container |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = containers * RAM-per-container |
| mapred-site.xml | mapreduce.map.memory.mb | = RAM-per-container |
| mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * RAM-per-container |
| mapred-site.xml | mapreduce.map.java.opts | = 0.8 * RAM-per-container |
| mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * RAM-per-container |
| mapred-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 * RAM-per-container |
| mapred-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * RAM-per-container |
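The table above can be expressed as a small script that derives every property value from the two computed inputs. The property keys are the real yarn-site.xml and mapred-site.xml settings from the table; the helper function itself is an illustrative assumption, and the `-Xmx` formatting reflects how the java.opts values are typically written.

```python
def yarn_mr_settings(containers, ram_per_container_mb):
    """Derive the yarn-site.xml / mapred-site.xml values from the table above."""
    r = ram_per_container_mb
    return {
        "yarn.nodemanager.resource.memory-mb": containers * r,
        "yarn.scheduler.minimum-allocation-mb": r,
        "yarn.scheduler.maximum-allocation-mb": containers * r,
        "mapreduce.map.memory.mb": r,
        "mapreduce.reduce.memory.mb": 2 * r,
        "mapreduce.map.java.opts": f"-Xmx{int(0.8 * r)}m",
        "mapreduce.reduce.java.opts": f"-Xmx{int(0.8 * 2 * r)}m",
        "yarn.app.mapreduce.am.resource.mb": 2 * r,
        "yarn.app.mapreduce.am.command-opts": f"-Xmx{int(0.8 * 2 * r)}m",
    }

# 21 containers of 2 GB (2048 MB) each:
for key, value in yarn_mr_settings(21, 2048).items():
    print(f"{key} = {value}")
```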
Example
Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.
Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase
Minimum container size = 2 GB
If there is no HBase:
# of containers = min(2*12, 1.8*12, (48-6)/2) = min(24, 21.6, 21) = 21
RAM-per-container = max(2, (48-6)/21) = max(2, 2) = 2
| Configuration | Calculation |
|---|---|
| yarn.nodemanager.resource.memory-mb | = 21 * 2 = 42*1024 MB |
| yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
| yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42*1024 MB |
| mapreduce.map.memory.mb | = 2*1024 MB |
| mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
| mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
| mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
| yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
| yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
If HBase is included:
# of containers = min(2*12, 1.8*12, (48-6-8)/2) = min(24, 21.6, 17) = 17
RAM-per-container = max(2, (48-6-8)/17) = max(2, 2) = 2
| Configuration | Calculation |
|---|---|
| yarn.nodemanager.resource.memory-mb | = 17 * 2 = 34*1024 MB |
| yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
| yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34*1024 MB |
| mapreduce.map.memory.mb | = 2*1024 MB |
| mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
| mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
| mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
| yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
| yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
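Both scenarios of the example can be checked end to end with a short sketch. The function name is an assumption; the arithmetic follows the two formulas above.

```python
# End-to-end check of the example: 12 cores, 48 GB RAM, 12 disks, 2 GB min container.
def containers_and_ram(total_gb, reserved_gb, cores, disks, min_gb):
    """Return (# of containers, RAM-per-container in GB) for one node."""
    avail = total_gb - reserved_gb
    n = int(min(2 * cores, 1.8 * disks, avail / min_gb))
    ram = max(min_gb, avail // n)  # GB per container
    return n, ram

print(containers_and_ram(48, 6, 12, 12, 2))      # no HBase   -> (21, 2)
print(containers_and_ram(48, 6 + 8, 12, 12, 2))  # with HBase -> (17, 2)
```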
Calculate Reserved Memory
Host configuration: 16 GB RAM, 4 cores, and 1 disk per host.
Four hosts in total: 64 GB RAM, 16 cores, and 4 disks.
Reserved memory recommendation: Recommended Reserved System Memory + Recommended Reserved HBase Memory = 2 + 2 = 4 GB per host.
Total reserved memory recommendation (if HBase): 4 hosts * 4 GB = 16 GB.
Minimum container size: Total RAM per node = 16 GB, so the recommended minimum container size = 1024 MB.
# of containers = min(2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)
# of containers = min(2*16, 1.8*4, (64-16)/1) = min(32, 7.2, 48) = 7.2, rounded down to 7
RAM-per-container = max(1, (64-16)/7) = max(1, 6.9) = 6.9 GB