MapReduce(3): Partitioner, Combiner and Shuffling
Published: 2019-06-28

Partitioner:

Partitioning and combining take place between the Map and Reduce phases. Partitioning groups the intermediate data by key so that all records destined for the same reducer end up together. The number of partitions equals the number of reducers, so the data in a single partition is processed by a single reducer. HashPartitioner is the default Partitioner in Hadoop.
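
The idea behind hash partitioning can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: it mirrors the mask-and-modulo form that HashPartitioner applies to a key's hash code, and the `stable_hash` helper (CRC32 of the key's bytes) is an assumed stand-in for the hash function Hadoop would actually call on the key.

```python
import zlib


def stable_hash(key: str) -> int:
    """Assumed stand-in for a key's hash code: CRC32 of its UTF-8 bytes."""
    return zlib.crc32(key.encode("utf-8"))


def hash_partition(key: str, num_reducers: int) -> int:
    """Mask the hash to a non-negative value, then take it modulo the reducer count."""
    return (stable_hash(key) & 0x7FFFFFFF) % num_reducers


keys = ["apple", "banana", "apple", "cherry", "banana"]
for key in keys:
    # Every occurrence of the same key lands in the same partition.
    print(f"{key} -> partition {hash_partition(key, 3)}")
```

Because the partition index depends only on the key and the number of reducers, every mapper sends a given key to the same reducer without any coordination.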

 

A partitioner partitions the key-value pairs of the intermediate map output. It assigns each record to a partition by applying a rule to the key; the rule can be user-defined and behaves like a hash function. The total number of partitions is the same as the number of reducer tasks for the job. Within each mapper, records with the same key go into the same partition.
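
To make the "user-defined condition" concrete, here is a small Python sketch of a custom rule applied to one mapper's output. The rule itself (`first_letter_partition`, which splits the alphabet into equal ranges) and the sample records are hypothetical and chosen only for illustration; keys are assumed to be non-empty strings.

```python
from collections import defaultdict


def first_letter_partition(key: str, num_reducers: int) -> int:
    """Hypothetical rule: split the alphabet into num_reducers equal ranges."""
    offset = ord(key[0].lower()) - ord("a")
    offset = min(max(offset, 0), 25)  # clamp keys that do not start with a-z
    return offset * num_reducers // 26


def partition_map_output(pairs, num_reducers, partitioner):
    """Group one mapper's (key, value) pairs by the partition the rule assigns."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partitioner(key, num_reducers)].append((key, value))
    return dict(partitions)


map_output = [("apple", 1), ("zebra", 1), ("apple", 1), ("mango", 1), ("nut", 1)]
partitions = partition_map_output(map_output, 2, first_letter_partition)
print(partitions)
# {0: [('apple', 1), ('apple', 1), ('mango', 1)], 1: [('zebra', 1), ('nut', 1)]}
```

Both "apple" records end up in partition 0, which is the property the reducer relies on: one key, one partition.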

 

Partitioning runs locally, on the machine that executes the map task.

 

Combiner:

A combiner is a 'mini-reducer' (semi-reducer) that performs part of the reducer's work on the map side, before the data is transferred to the reducers. This cuts the amount of data sent over the network and so reduces network congestion.
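
A word-count sketch in Python shows the effect. It is a simplified model rather than Hadoop code: `map_words` and `combine` are hypothetical names, and the combiner simply sums the counts for each word within a single mapper's output.

```python
from collections import Counter


def map_words(line: str):
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts per word locally, before anything is sent to reducers."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


map_output = map_words("to be or not to be")
combined = combine(map_output)
print(len(map_output), "records before combining")  # 6 records before combining
print(len(combined), "records after combining")     # 4 records after combining
print(combined)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

This works because summing is commutative and associative, so partial sums can be merged later by the reducer. Hadoop does not guarantee how many times a combiner runs (possibly not at all), so a job must produce the same result with or without it.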

 

Shuffle:

Shuffling transfers map output partitions to the corresponding reduce tasks. The final output of a map task can contain multiple partitions, and each partition must go to a different reduce task. When a map task completes, it notifies the application master, which in turn notifies the corresponding reducers to copy the map output to their machines. Because shuffling can start before the map phase has finished, it saves time and lets the job complete sooner.
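
The data movement can be modelled in Python as follows. This is a toy simulation under stated assumptions: `map_outputs` is hypothetical sample data standing in for the partitioned output of two finished map tasks, and the network copy is reduced to a list concatenation.

```python
from collections import defaultdict

# Hypothetical data: map_outputs[m][p] holds the records that
# map task m produced for reducer p.
map_outputs = [
    {0: [("apple", 2)], 1: [("zebra", 1)]},
    {0: [("apple", 1), ("mango", 3)], 1: [("nut", 1)]},
]


def shuffle(map_outputs, reducer_id):
    """Fetch one reducer's partition from every map task, then sort and group by key."""
    fetched = []
    for task_output in map_outputs:
        fetched.extend(task_output.get(reducer_id, []))
    grouped = defaultdict(list)
    for key, value in sorted(fetched):
        grouped[key].append(value)
    return dict(grouped)


print(shuffle(map_outputs, 0))  # {'apple': [1, 2], 'mango': [3]}
print(shuffle(map_outputs, 1))  # {'nut': [1], 'zebra': [1]}
```

Each reducer pulls only its own partition from every map task, so after the shuffle all values for a key sit on one machine, ready for the reduce function.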

 

References:

https://www.cnblogs.com/hadoop-dev/p/5910459.html

https://blog.csdn.net/bitcarmanlee/article/details/60137837

http://geekdirt.com/blog/map-reduce-in-detail/

Using a hash function to map intermediate K,V pairs:

https://en.wikipedia.org/wiki/Hash_function

https://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm

https://data-flair.training/blogs/hadoop-partitioner-tutorial/

 

Reposted from: https://www.cnblogs.com/rhyswang/p/10946833.html
