This book is a logical journey through a data science pipeline. Chapter 1 examines the many methods for getting,
cleaning, and arranging data into its purest form, along with basic data output to files and
plotting. Chapter 2 addresses the important concept of viewing our data as a matrix and presents an exhaustive
review of matrix operations. With the data in hand and its data structure settled,
Chapter 3 introduces the basic concepts that allow us to test the origin and validity of our data. In Chapter
4, we directly use the concepts from Chapters 2 and 3 to transform our data into stable and usable
numerical values. Chapter 5 contains a few useful supervised and unsupervised learning algorithms, as
well as methods for evaluating their success. Chapter 6 provides a quick guide to getting up and running
with MapReduce by using customized components suitable for data science algorithms. A few useful
datasets are described in Appendix A.