Apache Hadoop YARN is a sub-project of Apache Hadoop, an open source project at the Apache Software Foundation (ASF). Hadoop is a highly active community project, and YARN shares its break-neck pace of innovation. Writing a book about such rapidly evolving software is an exhausting venture. When we began writing the first edition of this book, the ASF had released version 2.2.0. By the time this edition was finished, the releases had already advanced by about two minor versions, each of which brings a substantial number of new features that warrant additions to the book. We took 2.2.0, the first stable release of Hadoop-2, as the snapshot for the book’s coverage, and we hope to publish updated editions frequently to cover the latest changes to the software.
This book is intended to provide detailed coverage of Apache Hadoop YARN’s goals, its design and architecture, and how it expands the Apache Hadoop ecosystem to take advantage of data at scale beyond MapReduce. It focuses primarily on installing and administering YARN clusters and on helping users develop YARN applications, and it briefly covers the new frameworks, beyond MapReduce, that run on top of YARN.
Please note that, at this point, this book is not intended to be an introduction to Apache Hadoop itself. We assume that the reader has some basic knowledge of Hadoop version 1, some experience writing applications on top of the Hadoop MapReduce framework, and some understanding of the architecture and usage of the Hadoop Distributed File System (HDFS). In future editions of this book, we hope to expand our material on the MapReduce application framework itself and on how users can design and code their own MapReduce applications.