MapReduce
MapReduce is a programming model for processing large data sets across many machines. The runtime takes care of:
- partitioning the input data
- scheduling the program’s execution across a set of machines
- handling machine failures
- managing the required inter-machine communication
model
- map: input key/value pairs $\to$ intermediate key/value pairs
- reduce: merges all intermediate values that share the same key (word-count sketch below)
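
A minimal word-count sketch of the two functions, with a sequential driver standing in for the distributed runtime (Python for illustration; the paper's interface is C++):

```python
from collections import defaultdict

def map_fn(name, contents):
    # map: (input key, value) -> intermediate (key, value) pairs
    for word in contents.split():
        yield word, 1

def reduce_fn(key, values):
    # reduce: combine all values that share a key
    return sum(values)

def map_reduce(inputs, map_fn, reduce_fn):
    # sequential driver standing in for the distributed runtime
    intermediate = defaultdict(list)
    for k, v in inputs:
        for ik, iv in map_fn(k, v):
            intermediate[ik].append(iv)
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}

print(map_reduce([("doc1", "the quick the")], map_fn, reduce_fn))
# -> {'quick': 1, 'the': 2}
```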
implementation
- a reduce worker sorts the intermediate pairs by key, so all occurrences of the same key are grouped together (sketched below)
- the master keeps the state of every task (idle, in-progress, completed) and the identity of the worker running it
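
A sketch of the reduce-side sort-then-group step, with my own naming:

```python
from itertools import groupby
from operator import itemgetter

def reduce_phase(pairs, reduce_fn):
    # sort intermediate pairs by key, then group, so each distinct key
    # becomes exactly one call to the user's reduce function
    pairs.sort(key=itemgetter(0))
    return [(k, reduce_fn(k, [v for _, v in grp]))
            for k, grp in groupby(pairs, key=itemgetter(0))]

print(reduce_phase([("b", 1), ("a", 1), ("b", 2)], lambda k, vs: sum(vs)))
# -> [('a', 1), ('b', 3)]
```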
Fault Tolerance
worker failure
- the master pings every worker periodically; a worker that stops responding is marked as failed
- completed map tasks on a failed worker are re-executed, because their output lives on that machine's local disk; completed reduce tasks are not, because their output is already in the global file system (see the sketch below)
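
A sketch of the master's bookkeeping on a worker failure, with hypothetical Task/state names:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str            # "map" or "reduce"
    state: str = "idle"  # idle / in-progress / completed
    worker: str | None = None

def handle_worker_failure(failed: str, tasks: list[Task]) -> None:
    for t in tasks:
        if t.worker != failed:
            continue
        # in-progress tasks of either kind go back to idle, and so do
        # *completed* map tasks: their output sat on the failed worker's
        # local disk. Completed reduce tasks keep their state, since
        # their output is already in the global file system.
        if t.state == "in-progress" or (t.state == "completed" and t.kind == "map"):
            t.state, t.worker = "idle", None
```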
master failure
the master writes periodic checkpoints of its state; if it dies, a new master can start from the last checkpoint (the paper notes their implementation actually just aborts the job, since a single master is unlikely to fail)
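
A minimal checkpoint/recover sketch, assuming the task table is JSON-serializable:

```python
import json, os, tempfile

def checkpoint(task_table: dict, path: str = "master.ckpt") -> None:
    # write-then-rename so a crash never leaves a half-written checkpoint
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(task_table, f)
    os.replace(tmp, path)

def recover(path: str = "master.ckpt") -> dict:
    # a new master resumes from the last checkpointed task table
    with open(path) as f:
        return json.load(f)
```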
semantics in the presence of failures
- a map task produces $R$ result files (one per reduce partition); a reduce task produces one output file
- deterministic map/reduce operators give output equivalent to a sequential execution; non-deterministic operators give weaker semantics, since different reduce tasks may see output from different re-executions of the same map task
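
The $R$ files per map task come from routing each intermediate key to one of $R$ partitions, by default with hash(key) mod $R$. A sketch with a stable hash:

```python
import zlib

def partition(key: str, R: int) -> int:
    # every occurrence of a key lands in the same reduce partition;
    # a stable hash (CRC32 here) matters across processes, where
    # Python's builtin hash() is salted per run
    return zlib.crc32(key.encode()) % R
```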
locality
- the master tries to schedule a map task on (or near) a machine that holds a replica of the task's input block, so most input is read from local disk instead of over the network
task granularity
- the number of map tasks $M$ and reduce tasks $R$ should both be much larger than the number of workers, for dynamic load balancing and fast recovery from failures
- choose $M$ so that each map task gets roughly 16 MB to 64 MB of input, and $R$ as a small multiple of the worker count (the paper's example: $M = 200{,}000$, $R = 5{,}000$ on 2,000 workers)
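
A hypothetical helper (mine, not the paper's) that turns these rules of thumb into numbers:

```python
def choose_granularity(input_bytes: int, workers: int,
                       split_mb: int = 64, reduce_multiple: int = 5):
    # M: one map task per ~split_mb of input; R: a small multiple of workers
    M = max(workers, input_bytes // (split_mb * 2**20))
    R = reduce_multiple * workers
    return M, R

print(choose_granularity(2**40, 2000))  # 1 TB on 2,000 workers -> (16384, 10000)
```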
Backup tasks
- near the end of the job, the master schedules backup executions of the remaining in-progress tasks; a task is marked completed when either the primary or the backup finishes, which blunts the effect of stragglers
refinement
combiner
- partially merges intermediate data before it is written out and shipped to reduce workers
- executed on the same worker that ran the map task; typically the same code as the reduce function, but its output goes to an intermediate file rather than the final output
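
A word-count combiner sketch: collapsing repeated keys on the map worker cuts network traffic:

```python
from collections import Counter

def combine(pairs):
    # partial merging on the map worker: collapse repeated keys locally
    # (e.g. many ("the", 1) pairs) before anything crosses the network
    merged = Counter()
    for key, value in pairs:
        merged[key] += value
    return list(merged.items())

print(combine([("the", 1), ("quick", 1), ("the", 1)]))
# -> [('the', 2), ('quick', 1)]
```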