Ragu Pappu
2015-05-15T21:59:26+00:00
http://www.ragupappu.com/
Interactive Dublin (California) Weather Visualization Using Shiny
2015-05-07T00:00:00+00:00
http://www.ragupappu.com//2015/05/07/shiny-dublin-weather
<p>A few weeks back I wrote a <a href="/2015/03/12/dublintemp/">post</a> in which I visualized Dublin (CA) 2014 weather in the style of Edward Tufte's New York 2003 weather visualization. That post used a dataset containing 15 years of daily weather data, from 2000 to 2014, and showed the daily temperature highs and lows of 2014 against a background of daily <em>record</em> highs and lows and <em>average</em> highs and lows from the years before 2014. I wanted to write an interactive application using <a href="http://shiny.rstudio.com/">Shiny</a>, RStudio's web application framework for R, and the Dublin weather visualization seemed like a great candidate. The chart below shows the resulting Shiny app in action: an interactive version of that visualization. The user can select the range of years of temperature data that forms the background, anywhere between 2000 and 2014 inclusive.
<iframe src="https://ragupappu.shinyapps.io/shiny-dublin-temps/" style="border: none; width: 100%; height: 700px"></iframe></p>
<p>The above app is hosted on <a href="http://ragupappu.shinyapps.io/shiny-dublin-temps">my account</a> at <a href="https://www.shinyapps.io/">shinyapps.io</a>. The code for it is located <a href="https://github.com/ragupappu/shiny-dublin-temps">here</a>.</p>
<h2>Code Changes</h2>
<p>A Shiny app requires a minimum of two code modules named ui.R and server.R. I got started writing the app by copying those two modules from lesson 2 of the RStudio <a href="http://shiny.rstudio.com/tutorial/">Shiny tutorial</a> to my local folder and then making the necessary changes.</p>
<p>I added a <code>sliderInput</code> control to let the user select the start_year, which brackets the data up to and including 2014. By default the weather visualization displays the daily <em>record</em> highs and lows and the daily <em>average</em> highs and lows in the background. I added two <code>checkboxInput</code>s to let the user hide those background displays. The code for the original visualization post used a file, viz_temps.R, which I re-used for this post with several changes so that it works within the Shiny framework. I converted the R script in that file into a function called viztemps(), which is invoked from server.R.</p>
<p>The code changes in ui.R and server.R were simple and straightforward as you can see below. Other code modules are not shown (they are located <a href="https://github.com/ragupappu/shiny-dublin-temps">here</a>).</p>
<h3>ui.R</h3>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="kn">library</span><span class="p">(</span>shiny<span class="p">)</span>
<span class="c1"># Define UI for application</span>
shinyUI<span class="p">(</span>fluidPage<span class="p">(</span>
<span class="c1"># Application title</span>
titlePanel<span class="p">(</span><span class="s">"Dublin (California) 2014 Temperatures"</span><span class="p">),</span>
<span class="c1"># Show the temperature plot</span>
plotOutput<span class="p">(</span><span class="s">"DublinTemps"</span><span class="p">),</span>
hr<span class="p">(),</span>
<span class="c1"># Controls below the plot: a start-year slider and two checkboxes</span>
fluidRow<span class="p">(</span>
column<span class="p">(</span><span class="m">5</span><span class="p">,</span>
h4<span class="p">(</span><span class="s">"Data Range (Start Year to 2014)"</span><span class="p">),</span>
sliderInput<span class="p">(</span><span class="s">'start_year'</span><span class="p">,</span>
<span class="s">'Select StartYear'</span><span class="p">,</span>
min <span class="o">=</span> <span class="m">2000</span><span class="p">,</span>
max <span class="o">=</span> <span class="m">2014</span><span class="p">,</span>
value <span class="o">=</span> <span class="m">2004</span><span class="p">),</span>
offset<span class="o">=</span><span class="m">1</span>
<span class="p">),</span>
column<span class="p">(</span><span class="m">5</span><span class="p">,</span>
checkboxInput<span class="p">(</span><span class="s">'hide_hilows'</span><span class="p">,</span> <span class="s">'Hide background daily record highs and lows bars'</span><span class="p">),</span>
checkboxInput<span class="p">(</span><span class="s">'hide_avgs'</span><span class="p">,</span> <span class="s">'Hide background average daily highs and lows bars'</span><span class="p">),</span>
offset<span class="o">=</span><span class="m">1</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="p">))</span>
</code></pre></div>
<h3>server.R</h3>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="kn">library</span><span class="p">(</span>shiny<span class="p">)</span>
<span class="c1"># Preprocessing and summarizing data</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="c1"># Visualization development</span>
<span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
<span class="c1"># For text graphical objects (to add text annotation)</span>
<span class="kn">library</span><span class="p">(</span>grid<span class="p">)</span>
<span class="kn">source</span><span class="p">(</span><span class="s">"viz_temps.R"</span><span class="p">)</span>
<span class="c1"># Define server logic required to draw the temperature chart</span>
shinyServer<span class="p">(</span><span class="kr">function</span><span class="p">(</span>input<span class="p">,</span> output<span class="p">)</span> <span class="p">{</span>
output<span class="o">$</span>DublinTemps <span class="o"><-</span> renderPlot<span class="p">({</span>
viztemps<span class="p">(</span>input<span class="o">$</span>start_year<span class="p">,</span>
input<span class="o">$</span>hide_hilows<span class="p">,</span>
input<span class="o">$</span>hide_avgs
<span class="p">)</span>
<span class="p">})</span>
<span class="p">})</span>
</code></pre></div>
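<p>The <code>viztemps()</code> function itself is not shown in this post. As a rough illustration only, the sketch below shows one way a function could turn the three inputs into a data subset and a list of background layers to draw. The helper name <code>plan_background()</code>, its return structure, and the toy data frame are assumptions made for this example, not the code in the repository.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper (not from the repository): decide what the background
# of the chart should contain for a given set of UI inputs.
plan_background <- function(temps, start_year, hide_hilows, hide_avgs) {
  # Keep only the years the user asked for, from start_year up to 2014
  background <- temps[temps$year >= start_year & temps$year <= 2014, ]
  # Collect the names of the background layers that should be drawn
  layers <- character(0)
  if (!hide_hilows) layers <- c(layers, "record_highs_lows")
  if (!hide_avgs) layers <- c(layers, "average_highs_lows")
  list(data = background, layers = layers)
}

# Toy data standing in for the real temperature file
toy_temps <- data.frame(year = 2000:2014, tmaxF = seq(70, 84, by = 1))
plan <- plan_background(toy_temps, start_year = 2004,
                        hide_hilows = FALSE, hide_avgs = TRUE)
</code></pre></div>
<p>With the slider at 2004 and only the averages hidden, <code>plan$data</code> covers the years 2004 through 2014 and <code>plan$layers</code> names just the record highs and lows layer.</p>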
<p>To get the app to run on this webpage, I simply embedded an HTML iframe tag in this post's markdown file, using the code below:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"><iframe src="https://ragupappu.shinyapps.io/shiny-dublin-temps/" style="border: none; width: 100%; height: 700px"></iframe>
</code></pre></div>
<p><em><strong>NOTE</strong>: It can sometimes take up to a minute to draw the chart when the page first loads, and just as long to redraw it when the user changes the slider setting or toggles a checkbox. At the time of posting this article I know of no way to speed up the drawing. If you know how to do it, please post in the comments section.</em></p>
How I Set Up My Website Using GitHub Pages and Jekyll
2015-04-22T00:00:00+00:00
http://www.ragupappu.com//2015/04/22/setup-website-using-github-pages-and-jekyll
<p>I had been thinking of spiffying up my plain-looking WordPress-hosted blog for some time, and a few days back I got to do just that. Actually I did more than that: I ended up building a brand new website! While searching online for how-tos I discovered <a href="https://pages.github.com/">GitHub Pages</a> and <a href="http://www.jekyllrb.com/">Jekyll</a>. GitHub Pages allows you to host your website for free and Jekyll lets you build static websites. It costs me ~$50 per year to host my blog on WordPress, so the free hosting on GitHub sounded like a great deal. On top of that, the Jekyll website generator would put the look and feel of the blog, the whole design, completely under my control. And so I took on the project. The project required <a href="http://git-scm.com/">git</a>, which was already installed on my desktop PC.</p>
<h2>The Lanyon Theme</h2>
<p>There are several website <a href="http://jekyllthemes.org/">themes</a> for building a Jekyll website. I wanted a minimalistic look and started with the <a href="http://www.getpoole.com/">Poole</a> theme which provides a basic template for a website. Part way through I switched to the <a href="http://lanyon.getpoole.com/">Lanyon</a> theme because it emphasizes content by hiding the navigation in a side drawer. It is minimalist not only in looks but also in features, providing only a template: it does not feature, for example, "extras" such as an <em>Archive</em> page, or a <em>Comments</em> facility, or social media buttons, and so on. But that is easily addressed since adding extras is straightforward, and the web has many resources that show you how to do just that.</p>
<h2>Website Build Steps</h2>
<p>I built my website using these main steps:</p>
<ul>
<li>Started by setting up a repository using instructions on <a href="https://pages.github.com/">GitHub Pages</a></li>
<li>Downloaded theme files from the <a href="https://github.com/poole/lanyon">Lanyon</a> repository on GitHub by clicking the <code>Download Zip</code> button on the right-hand side. Extracted files from the downloaded zip file into the <em>username</em>.github.io directory created in the first step. Note: <em>username</em> represents the username of your GitHub account.</li>
<li>Changed directory to <em>username</em>.github.io and in a command terminal ran these commands:
<pre>
$ cd <i>username</i>.github.io
<i>username</i>.github.io$ jekyll build
<i>username</i>.github.io$ jekyll serve
</pre></li>
</ul>
<p>Next, I opened up a browser and went to http://localhost:4000 to look at my website. That was it! My website was hosted locally on my computer and ready to be tweaked.</p>
<p>I then made design changes (added pages, changed fonts, tweaked colors, etc.) and, when I was satisfied, pushed all of the changes to my repository on GitHub. Once the changes were on GitHub, my official website was accessible on the Internet.</p>
<h2>Website Design Changes</h2>
<p>Over the last few days I have made a number of changes to the Lanyon theme which I list below, and where available, provide the how-to links. The code for this site is available on <a href="https://github.com/ragupappu/ragupappu.github.io/">my GitHub repository</a>.</p>
<ul>
<li>Added <em>About</em> and <em>Archive</em> pages using <a href="http://joshualande.com/jekyll-github-pages-poole/">Joshua Lande's</a> Poole post</li>
<li>Added <a href="https://disqus.com/">Disqus</a> (to support comments) and <a href="http://www.google.com/analytics/">Google Analytics</a>, again using Joshua Lande's Poole post</li>
<li>Added a <em>Tags</em> page using Michael Lanyon's <a href="http://blog.lanyonm.org/articles/2013/11/21/alphabetize-jekyll-page-tags-pure-liquid.html">post</a> about tags.</li>
<li>Changed the landing page to show excerpts of posts instead of full posts</li>
<li>Added icon links for Twitter, GitHub, etc. to the <a href="https://github.com/ragupappu/ragupappu.github.io/blob/master/_includes/sidebar.html">sidebar</a>. Used font-awesome CSS file from Michael Lanyon's GitHub <a href="https://github.com/lanyonm/lanyonm.github.io">repo</a>.</li>
<li>Added tags below title of post</li>
<li>Added social media 'Share' links (Twitter, Facebook, Google+) to the bottom of each post. Kanishk Kunal explains how to add Share buttons to a Jekyll blog in this <a href="http://codingtips.kanishkkunal.in/share-buttons-jekyll/">post</a>.</li>
<li>Converted CSS stylesheets to Sass stylesheets using file structures similar to those in Michael Lanyon's GitHub <a href="https://github.com/lanyonm/lanyonm.github.io">repo</a>.</li>
<li>Designed my favicon (the tiny icon that shows on the tab of your browser) at <a href="http://www.favicon-generator.org/">Favicon and App Icon Generator</a> and replaced the original Lanyon theme favicon with the new one.</li>
<li>The Lanyon theme is a fixed-width two-column design. Changed widths for @media CSS statements in main.scss to percentage values (by replacing em values) to convert the website to a liquid layout. This <a href="http://maxdesign.com.au/articles/liquid/">article</a> by Max Design explains how to do liquid layouts.</li>
</ul>
<p>Tools are available to migrate posts from WordPress to other sites, but since I had only a few blog posts I simply copied them over from WordPress and saved them into the <i>_posts</i> folder as .md files. I embedded YAML front-matter in each file and renamed the files to the format required by Jekyll (YEAR-MONTH-DAY-title.md).</p>
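<p>As a small illustration of the naming format, the sketch below builds a Jekyll-style filename from a date and a title. The helper name <code>jekyll_filename()</code> and its slug rules (lowercase, hyphens between words) are assumptions made for this example; I renamed my own files by hand.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: build a post filename in the YEAR-MONTH-DAY-title.md format
jekyll_filename <- function(date, title) {
  slug <- tolower(title)
  slug <- gsub("[^a-z0-9]+", "-", slug)  # replace runs of other characters with a hyphen
  slug <- gsub("^-|-$", "", slug)         # trim a leading or trailing hyphen
  paste0(format(as.Date(date), "%Y-%m-%d"), "-", slug, ".md")
}

post_file <- jekyll_filename("2015-04-22", "Setup Website Using GitHub Pages and Jekyll")
</code></pre></div>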
<p>After pushing the initial set of website changes on to my GitHub repository, <a href="http://ragupappu.github.io/">ragupappu.github.io</a>, I linked my personal domain <a href="https://www.ragupappu.com">ragupappu.com</a> to it. My domain is hosted by <a href="http://www.namecheap.com">Namecheap</a> and I used David Ensinger's instructions <a href="http://davidensinger.com/2013/03/setting-the-dns-for-github-pages-on-namecheap/">here</a> to link the two.</p>
<h2>Quick Links</h2>
<ul>
<li><a href="https://pages.github.com/">GitHub Pages</a></li>
<li><a href="http://jekyllrb.com/">Jekyll</a></li>
<li><a href="http://jekyllrb.com/docs/home/">Jekyll Documentation</a></li>
<li>The <a href="http://lanyon.getpoole.com/">Lanyon</a> theme</li>
<li>My GitHub <a href="http://github.com/ragupappu/ragupappu.github.io">repository</a></li>
<li><a href="http://www.favicon-generator.org/">Favicon Generator</a></li>
<li>Joshua Lande's <a href="http://joshualande.com/jekyll-github-pages-poole/">Poole</a> post</li>
<li>Michael Lanyon's personal blog GitHub <a href="https://github.com/lanyonm/lanyonm.github.io">repo</a></li>
</ul>
Dublin (California) 2014 Weather Using R
2015-03-12T00:00:00+00:00
http://www.ragupappu.com//2015/03/12/dublintemp
<p>In the data visualization community Edward Tufte's chart of <a href="http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00014g">New York City 2003 weather</a> is well-known. A few weeks ago Brad Boehmke published a blog post with a similar chart for his city, Dayton, titled <a href="http://rpubs.com/bradleyboehmke/weather_graphic">Dayton's weather in 2014</a>, which inspired me to do a similar visualization for my city, Dublin, California. The result is the chart below. The R code to build the chart is located <a href="https://github.com/rpappu0206/tufte/tree/master/src">here</a>; it draws from Boehmke's post but also includes original code. Further below I describe the steps used to arrive at the chart.
<img src="http://www.ragupappu.com/assets/images/dublintemps/dublin2014temps.svg" alt="Dublin 2014 Weather testing"></p>
<p>I started by searching the <a href="http://ncdc.noaa.gov/">NCDC</a> (National Climatic Data Center) website for Dublin weather data but when the results came up empty I widened the search to neighboring towns. Luckily, data was available for Livermore Municipal Airport, located less than 10 miles from Dublin, which, for all practical purposes, could serve as a substitute for Dublin weather data. I obtained that <a href="https://github.com/rpappu0206/tufte/tree/master/data">data</a> from NCDC for the most recent 15-year period (January 01, 2000 through December 31, 2014) and used it as the raw data for this project. This data is hereafter referred to as Dublin weather data.</p>
<p>Unlike Tufte's original chart (shown below) which included graphics for Temperature and Precipitation, I decided to include only Temperature in my chart. </p>
<p><img src="http://www.ragupappu.com/assets/images/dublintemps/New-York-City-Weather-Chart-2003_3D887A31.jpg" alt="Edward Tufte's New York City Weather 2003 Chart"></p>
<p>The raw data contains daily high and low temperatures but also several other 'main' variables such as snowfall, snow depth, precipitation, type of weather, etc. Associated with each main variable are four variables - Measurement Flag, Quality Flag, Source Flag, and Time of Measurement. The <a href="https://github.com/rpappu0206/tufte/tree/master/doc">codebook</a> for the raw data, GHCND_documentation.pdf, explains these variables in detail. An excerpt of select columns from the raw data CSV file is shown below. The temperature data are in tenths of degrees Celsius.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">            station     date tmax quality.flag.4 source.flag.4
1 GHCND:USW00023285 20000101  117             NA             W
2 GHCND:USW00023285 20000102  133             NA             W
3 GHCND:USW00023285 20000103  133             NA             W
4 GHCND:USW00023285 20000104  144             NA             W
5 GHCND:USW00023285 20000105  161             NA             W
6 GHCND:USW00023285 20000106  144             NA             W</code></pre></div>
<p>I processed the data in two stages: Stage1 - load and extract relevant data, and Stage2 - clean the data and chart it. Accordingly, the two source files are named load.R and viz_temps.R.</p>
<p>In Stage1 I processed the raw data as follows: extract relevant columns of data into a dataframe, split Dates into Year, Month and Day data, and finally, save the dataframe to file 'livermore_15yr_temps.csv' for use as input to Stage2. The relevant columns of data are Dates, and temperature columns Tmax and Tmin and 'qualifier' data columns associated with each, titled Measurement Flag, Quality Flag and Source Flag.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Preprocessing and summarizing data</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="c1"># Load the NCDC 15-year (2000 Jan - 2014 Dec) Livermore (California) Airport daily weather</span>
weather_15yr_data <span class="o"><-</span> read.csv<span class="p">(</span><span class="s">"ncdc_livermore_15yr_weather.csv"</span><span class="p">,</span> stringsAsFactors<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> sep<span class="o">=</span><span class="s">","</span><span class="p">)</span>
<span class="c1"># convert all column names into lowercase (for convenience)</span>
<span class="kp">colnames</span><span class="p">(</span>weather_15yr_data<span class="p">)</span> <span class="o"><-</span> <span class="kp">tolower</span><span class="p">(</span><span class="kp">names</span><span class="p">(</span>weather_15yr_data<span class="p">))</span>
<span class="c1"># Create a dataframe containing data related only to temperature measurements; ignore other data</span>
all_temps_15yrs <span class="o"><-</span> weather_15yr_data <span class="o">%>%</span>
select<span class="p">(</span><span class="kp">date</span><span class="p">,</span>
tmax<span class="p">,</span> measurement.flag.4<span class="p">,</span> quality.flag.4<span class="p">,</span> source.flag.4<span class="p">,</span>
tmin<span class="p">,</span> measurement.flag.5<span class="p">,</span> quality.flag.5<span class="p">,</span> source.flag.5<span class="p">)</span>
<span class="c1"># Convert date into three columns - year, month, day</span>
<span class="c1"># The date in the raw data file is a string in the format YYYYMMDD</span>
<span class="c1"># First convert String object to class "Date"</span>
all_dates <span class="o"><-</span> <span class="kp">as.Date</span><span class="p">(</span><span class="kp">as.character</span><span class="p">(</span>all_temps_15yrs<span class="o">$</span><span class="kp">date</span><span class="p">),</span> format<span class="o">=</span><span class="s">"%Y%m%d"</span><span class="p">,</span> origin<span class="o">=</span><span class="s">"1970-01-01"</span><span class="p">)</span>
<span class="c1"># Extract parts of the date and make three columns</span>
all_dates <span class="o"><-</span> <span class="kp">as.POSIXlt</span><span class="p">(</span>all_dates<span class="p">)</span> <span class="c1"># POSIXlt object is a list of date parts</span>
year <span class="o"><-</span> all_dates<span class="o">$</span>year <span class="o">+</span> <span class="m">1900</span> <span class="c1"># years is num of years from 1900, therefore adding 1900</span>
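month <span class="o"><-</span> all_dates<span class="o">$</span>mon <span class="o">+</span> <span class="m">1</span> <span class="c1"># mon is zero-based (0-11), therefore adding 1</span>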
day <span class="o"><-</span> all_dates<span class="o">$</span>mday
<span class="c1"># Replace the date column with 'split' date part columns (i.e., year, month, day columns)</span>
all_temps_15yrs <span class="o"><-</span> <span class="kp">subset</span><span class="p">(</span>all_temps_15yrs<span class="p">,</span> select<span class="o">=-</span><span class="kp">date</span><span class="p">)</span> <span class="c1"># drop date column</span>
all_temps_15yrs <span class="o"><-</span> <span class="kp">cbind</span><span class="p">(</span>year<span class="p">,</span> month<span class="p">,</span> day<span class="p">,</span> all_temps_15yrs<span class="p">)</span> <span class="c1"># Add back the 'split' date column</span>
<span class="c1"># Save the data into a file. Data in this file will be used for visualization</span>
write.csv<span class="p">(</span>all_temps_15yrs<span class="p">,</span> file<span class="o">=</span><span class="s">"livermore_15yr_temps.csv"</span><span class="p">,</span> row.names<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span></code></pre></div>
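<p>To make the date-splitting step concrete, here is a minimal sketch on three sample dates in the raw file's YYYYMMDD format. The sample values and the <code>date_parts</code> name are made up for this illustration. Note that a POSIXlt object counts years from 1900 and months from 0, so both need an offset to give calendar values.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Sample dates in the raw file's YYYYMMDD format (illustrative values)
raw_dates <- c(20000101L, 20001231L, 20140704L)

# Convert to Date, then to POSIXlt to get at the individual date parts
parsed <- as.POSIXlt(as.Date(as.character(raw_dates), format = "%Y%m%d"))

date_parts <- data.frame(
  year  = parsed$year + 1900,  # POSIXlt counts years from 1900
  month = parsed$mon + 1,      # POSIXlt counts months from 0 (January)
  day   = parsed$mday          # day of the month is already 1-based
)
</code></pre></div>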
<p>In Stage2 I cleaned the data first and then processed it. I started by including these packages:</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Preprocessing and summarizing data</span>
<span class="kn">library</span><span class="p">(</span>dplyr<span class="p">)</span>
<span class="c1"># Visualization development</span>
<span class="kn">library</span><span class="p">(</span>ggplot2<span class="p">)</span>
<span class="c1"># For text graphical object (to add text annotation)</span>
<span class="kn">library</span><span class="p">(</span>grid<span class="p">)</span></code></pre></div>
<p>After reading livermore_15yr_temps.csv (the file created in Stage1) into a dataframe, I first cleaned the data by dropping invalid records: those with Tmax or Tmin values of -9999, and those with invalid entries in any of the three flag columns (Measurement, Quality and Source). Next I converted the temperature data from Celsius to Fahrenheit.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load 15 years temperature data</span>
all_temps_15yrs <span class="o"><-</span> read.csv<span class="p">(</span><span class="s">"livermore_15yr_temps.csv"</span><span class="p">,</span> stringsAsFactors<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> sep<span class="o">=</span><span class="s">","</span><span class="p">)</span>
<span class="c1"># Clean the data: drop records with invalid temp values, and missing or invalid</span>
<span class="c1"># measurement, quality or source flags</span>
all_temps_15yrs <span class="o"><-</span> all_temps_15yrs <span class="o">%>%</span>
filter<span class="p">(</span>tmax <span class="o">!=</span> <span class="m">-9999</span> <span class="o">&</span> <span class="c1"># drop missing max temp identified by -9999</span>
tmin <span class="o">!=</span> <span class="m">-9999</span> <span class="o">&</span> <span class="c1"># drop missing min temp identified by -9999</span>
<span class="kp">is.na</span><span class="p">(</span>measurement.flag.4<span class="p">)</span> <span class="o">&</span> <span class="c1"># keep tmax data with no special measurement info</span>
<span class="kp">is.na</span><span class="p">(</span>measurement.flag.5<span class="p">)</span> <span class="o">&</span> <span class="c1"># keep tmin data with no special measurement info</span>
<span class="kp">is.na</span><span class="p">(</span>quality.flag.4<span class="p">)</span> <span class="o">&</span> <span class="c1"># keep tmax data that did not fail quality check</span>
quality.flag.5 <span class="o">==</span> <span class="s">" "</span> <span class="o">&</span> <span class="c1"># keep tmin data that did not fail quality check</span>
source.flag.4 <span class="o">!=</span> <span class="s">" "</span> <span class="o">&</span> <span class="c1"># drop tmax data with no source (blank)</span>
<span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>source.flag.4<span class="p">)</span> <span class="o">&</span> <span class="c1"># or NA </span>
source.flag.5 <span class="o">!=</span> <span class="s">" "</span> <span class="o">&</span> <span class="c1"># drop tmin data with no source (blank)</span>
<span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>source.flag.5<span class="p">))</span> <span class="c1"># or NA</span>
<span class="c1"># Raw data temps are in Celsius degrees to tenths. Convert to Fahrenheit and scale for tenths</span>
all_temps_15yrs <span class="o"><-</span> all_temps_15yrs <span class="o">%>%</span>
mutate<span class="p">(</span>tmaxF <span class="o">=</span> tmax<span class="o">*</span><span class="m">0.18</span> <span class="o">+</span> <span class="m">32</span><span class="p">,</span> <span class="c1"># convert to Fahrenheit (F = C*9/5 + 32)</span>
tminF <span class="o">=</span> tmin<span class="o">*</span><span class="m">0.18</span> <span class="o">+</span> <span class="m">32</span><span class="p">)</span> <span class="c1"># note: temps are also being converted from tenths</span>
<span class="c1"># to real (normal) values</span></code></pre></div>
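<p>The factor 0.18 combines two steps: dividing by 10 to turn tenths of a degree into degrees Celsius, and multiplying by 9/5 to convert to Fahrenheit. A quick worked example on the first few tmax values from the excerpt shown earlier (the function name <code>tenths_c_to_f()</code> is just for this illustration):</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Convert tenths of degrees Celsius to degrees Fahrenheit:
# F = (tenths / 10) * 9/5 + 32 = tenths * 0.18 + 32
tenths_c_to_f <- function(tenths_c) tenths_c * 0.18 + 32

sample_tmax  <- c(117, 133, 144, 161)   # raw values, i.e. 11.7, 13.3, 14.4, 16.1 degrees C
sample_tmaxF <- tenths_c_to_f(sample_tmax)
# 117 becomes 53.06 F, 133 becomes 55.94 F, 144 becomes 57.92 F, 161 becomes 60.98 F
</code></pre></div>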
<p>Only 2 records in the 15-year dataset were invalid (0.036503% of the observations), and these were dropped. Next I computed the average max and average min temperature for each year to see how Dublin fared in 2014 relative to the previous 14 years.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Compute the average per year min and max temps</span>
avg_temps_each_year <span class="o"><-</span> all_temps_15yrs <span class="o">%>%</span>
group_by<span class="p">(</span>year<span class="p">)</span> <span class="o">%>%</span>
summarise<span class="p">(</span>avg_min <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>tminF<span class="p">),</span>
avg_max <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>tmaxF<span class="p">))</span> <span class="o">%>%</span>
ungroup<span class="p">()</span></code></pre></div>
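<p>As an aside, the share of dropped records quoted above can be checked with a little date arithmetic. The sketch assumes the raw file holds exactly one record per calendar day from January 1, 2000 through December 31, 2014.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Number of calendar days in the 15-year period, counting both end dates
n_days <- as.integer(as.Date("2014-12-31") - as.Date("2000-01-01")) + 1

# Two dropped records as a percentage of all daily observations
pct_dropped <- 2 / n_days * 100
</code></pre></div>
<p>Here <code>n_days</code> is 5479 (15 years of 365 days plus four leap days), and <code>round(pct_dropped, 6)</code> gives 0.036503.</p>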
<p>Sorting the average yearly temperatures yielded an interesting result: 2014 was the warmest of the past 15 years! In a <a href="http://www.nasa.gov/press/2015/january/nasa-determines-2014-warmest-year-in-modern-record/">report</a> in January NASA and NOAA found that 2014 was the warmest year in modern record, and the above results for Dublin temperatures align with that finding.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">## Source: local data frame [15 x 2]
##
## year avg_max
## 1 2014 76.96252
## 2 2013 74.76997
## 3 2012 73.74869
## 4 2008 73.25246
## 5 2001 73.15885
## 6 2007 72.73918
## 7 2009 72.67901
## 8 2002 72.47288
## 9 2006 72.40456
## 10 2003 72.40334
## 11 2005 72.10449
## 12 2004 72.04016
## 13 2000 71.72970
## 14 2011 71.44416
## 15 2010 70.91353</code></pre></div>
<div class="highlight"><pre><code class="language-text" data-lang="text">## Source: local data frame [15 x 2]
##
## year avg_min
## 1 2014 50.83096
## 2 2012 47.54246
## 3 2005 47.49578
## 4 2013 47.40110
## 5 2006 47.32077
## 6 2009 47.14416
## 7 2001 47.13036
## 8 2000 47.09041
## 9 2010 47.08499
## 10 2004 46.94934
## 11 2008 46.90803
## 12 2003 46.87786
## 13 2007 46.48926
## 14 2011 46.24860
## 15 2002 45.86690</code></pre></div>
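<p>Listings like the two above can be produced by sorting the yearly averages in descending order. A minimal sketch, using three rows copied from the listings rather than the full <code>avg_temps_each_year</code> dataframe (the names <code>avg_temps_sample</code>, <code>by_max</code> and <code>by_min</code> are made up for this example):</p>
<div class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# Three rows taken from the listings above, standing in for avg_temps_each_year
avg_temps_sample <- data.frame(
  year    = c(2002, 2013, 2014),
  avg_min = c(45.86690, 47.40110, 50.83096),
  avg_max = c(72.47288, 74.76997, 76.96252)
)

# Warmest year first, by average daily high and by average daily low
by_max <- avg_temps_sample %>% select(year, avg_max) %>% arrange(desc(avg_max))
by_min <- avg_temps_sample %>% select(year, avg_min) %>% arrange(desc(avg_min))
</code></pre></div>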
<p>With the preliminaries completed, I created two key dataframes: a <code>Past</code> dataframe containing 14 years of data from 2000 to 2013, and a <code>Present</code> dataframe containing the 2014 data. Using the <code>Past</code> dataframe I computed two additional metrics: (i) the highest and lowest temperature for each day of the year over the past 14 years, and (ii) the average daily maximum and average daily minimum temperatures. The former forms the broad wheat-colored background swath, as in Tufte's image, that shows the record daily temperatures over the 14-year period. The latter forms the light-brown swath that represents the <em>normal</em> daily temperature range over the same 14-year period. I created a <code>newDay</code> variable for use as the x-axis variable for days 1 through 365 of the year.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># create a dataframe that represents 14 years of historical temp data from 2000-2013</span>
Past <span class="o"><-</span> all_temps_15yrs <span class="o">%>%</span>
arrange<span class="p">(</span>year<span class="p">,</span> month<span class="p">,</span> day<span class="p">)</span> <span class="o">%>%</span> <span class="c1"># sort chronologically; arrange() ignores grouping in dplyr 0.5.0 and later</span>
group_by<span class="p">(</span>year<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>newDay <span class="o">=</span> <span class="kp">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="kp">length</span><span class="p">(</span>day<span class="p">)))</span> <span class="o">%>%</span> <span class="c1"># label days as 1:365 (will represent x-axis)</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
filter<span class="p">(</span>year <span class="o">!=</span> <span class="m">2014</span><span class="p">)</span> <span class="o">%>%</span> <span class="c1"># filter out 2014 data</span>
group_by<span class="p">(</span>newDay<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>upper <span class="o">=</span> <span class="kp">max</span><span class="p">(</span>tmaxF<span class="p">),</span> <span class="c1"># identify same day highest max temp from all years</span>
lower <span class="o">=</span> <span class="kp">min</span><span class="p">(</span>tminF<span class="p">),</span> <span class="c1"># identify same day lowest min temp from all years</span>
avg_upper <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>tmaxF<span class="p">),</span> <span class="c1"># compute same day average max temp from all years</span>
avg_lower <span class="o">=</span> <span class="kp">mean</span><span class="p">(</span>tminF<span class="p">))</span> <span class="o">%>%</span> <span class="c1"># compute same day average min temp from all years</span>
ungroup<span class="p">()</span>
<span class="c1"># create a dataframe that represents 2014 temperature data</span>
Present <span class="o"><-</span> all_temps_15yrs <span class="o">%>%</span>
arrange<span class="p">(</span>year<span class="p">,</span> month<span class="p">,</span> day<span class="p">)</span> <span class="o">%>%</span> <span class="c1"># sort chronologically; arrange() ignores grouping in dplyr 0.5.0 and later</span>
group_by<span class="p">(</span>year<span class="p">)</span> <span class="o">%>%</span>
mutate<span class="p">(</span>newDay <span class="o">=</span> <span class="kp">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="kp">length</span><span class="p">(</span>day<span class="p">)))</span> <span class="o">%>%</span> <span class="c1"># label days as 1:365 (will represent x-axis)</span>
ungroup<span class="p">()</span> <span class="o">%>%</span>
filter<span class="p">(</span>year <span class="o">==</span> <span class="m">2014</span><span class="p">)</span> <span class="c1"># filter out all years except 2014 data</span></code></pre></div>
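<p>The day-numbering idea behind <code>newDay</code> can be seen on a toy example. The sketch below sorts the rows chronologically and then numbers them within each year; the toy data and the use of <code>row_number()</code> are choices made for this illustration. One caveat to keep in mind: a leap year has 366 rows, so from March onward its day numbers run one ahead of the same calendar dates in other years.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# Toy data: the first three days of two years, deliberately out of order
toy <- data.frame(
  year  = c(2001, 2000, 2000, 2001, 2000, 2001),
  month = c(1, 1, 1, 1, 1, 1),
  day   = c(2, 3, 1, 1, 2, 3)
)

numbered <- toy %>%
  arrange(year, month, day) %>%       # sort chronologically first
  group_by(year) %>%
  mutate(newDay = row_number()) %>%   # 1, 2, 3, ... within each year
  ungroup()
</code></pre></div>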
<p>As we saw earlier, 2014 turned out to be the warmest year. To explore whether any daily records were set during the year, I created four additional dataframes called <em>PastLows</em>, <em>PastHighs</em>, <em>PresentLows</em>, and <em>PresentHighs</em>. The first two contain the record lows and highs of the past 14 years (2000 to 2013), and the second two contain the days in 2014 on which the low or high set a new record relative to those 14 years.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># create dataframe that represents the lowest same-day temperature from years 2000-2013</span>
PastLows <span class="o"><-</span> Past <span class="o">%>%</span>
group_by<span class="p">(</span>newDay<span class="p">)</span> <span class="o">%>%</span>
summarise<span class="p">(</span>Pastlow <span class="o">=</span> <span class="kp">min</span><span class="p">(</span>tminF<span class="p">))</span> <span class="c1"># identify lowest same-day temp between 2000 and 2013</span>
<span class="c1"># create dataframe that represents the highest same-day temperature from years 2000-2013</span>
PastHighs <span class="o"><-</span> Past <span class="o">%>%</span>
group_by<span class="p">(</span>newDay<span class="p">)</span> <span class="o">%>%</span>
summarise<span class="p">(</span>Pasthigh <span class="o">=</span> <span class="kp">max</span><span class="p">(</span>tmaxF<span class="p">))</span> <span class="c1"># identify highest same-day temps between 2000 and 2013</span>
<span class="c1"># create dataframe that identifies days in 2014 when temps were lower than in all previous 14 years</span>
PresentLows <span class="o"><-</span> Present <span class="o">%>%</span>
left_join<span class="p">(</span>PastLows<span class="p">)</span> <span class="o">%>%</span> <span class="c1"># merge historical lows to 2014 low temp data</span>
mutate<span class="p">(</span>record <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>tminF<span class="o"><</span>Pastlow<span class="p">,</span> <span class="s">"Y"</span><span class="p">,</span> <span class="s">"N"</span><span class="p">))</span> <span class="o">%>%</span> <span class="c1"># current year was a record low?</span>
filter<span class="p">(</span>record <span class="o">==</span> <span class="s">"Y"</span><span class="p">)</span> <span class="c1"># filter for 2014 record low days</span>
<span class="c1"># create dataframe that identifies days in 2014 when temps were higher than in all previous 14 years</span>
PresentHighs <span class="o"><-</span> Present <span class="o">%>%</span>
left_join<span class="p">(</span>PastHighs<span class="p">)</span> <span class="o">%>%</span> <span class="c1"># merge historical highs to 2014 high temp data</span>
mutate<span class="p">(</span>record <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>tmaxF<span class="o">></span>Pasthigh<span class="p">,</span> <span class="s">"Y"</span><span class="p">,</span> <span class="s">"N"</span><span class="p">))</span> <span class="o">%>%</span> <span class="c1"># current year was a record high?</span>
filter<span class="p">(</span>record <span class="o">==</span> <span class="s">"Y"</span><span class="p">)</span> <span class="c1"># filter for 2014 record high days</span></code></pre></div>
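<p>To make the record-day logic concrete, here is a minimal sketch on made-up numbers. The data frames <code>toy_past_lows</code> and <code>toy_present</code> are hypothetical stand-ins for <em>PastLows</em> and <em>Present</em>, and the explicit <code>by = "newDay"</code> is my addition (the code above lets dplyr pick the join column).</p>
<div class="highlight"><pre><code class="language-r" data-lang="r">library(dplyr)

# Toy stand-ins for PastLows and Present (values are made up for illustration)
toy_past_lows <- data.frame(newDay = 1:3, Pastlow = c(30, 32, 35))
toy_present   <- data.frame(newDay = 1:3, tminF   = c(28, 33, 35))

toy_present_lows <- toy_present %>%
  left_join(toy_past_lows, by = "newDay") %>%           # attach the historical low for each day
  mutate(record = ifelse(tminF < Pastlow, "Y", "N")) %>% # strictly lower than the old record?
  filter(record == "Y")                                  # keep only the record days

print(toy_present_lows)</code></pre></div>
<p>Only day 1 survives the filter. Day 3 ties the old record, and because the comparison is strict a tie does not count as a new record.</p>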
<p>At this point all the data required to create the chart was available. The x-axis variable (<code>newDay</code>) already existed, but the y-axis labels still had to be created, and they needed to show the degree symbol. I created them as follows.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># function: Turn y-axis labels into values with a degree superscript</span>
degree_format <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">,</span> <span class="kc">...</span><span class="p">)</span> <span class="p">{</span>
<span class="kp">parse</span><span class="p">(</span>text <span class="o">=</span> <span class="kp">paste</span><span class="p">(</span>x<span class="p">,</span> <span class="s">"*degree"</span><span class="p">,</span> sep<span class="o">=</span><span class="s">""</span><span class="p">))</span>
<span class="p">}</span>
<span class="c1"># create y-axis variable</span>
yaxis_temps <span class="o"><-</span> degree_format<span class="p">(</span><span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">120</span><span class="p">,</span> by<span class="o">=</span><span class="m">10</span><span class="p">))</span></code></pre></div>
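<p>As a quick check of what <code>degree_format()</code> returns, the sketch below calls it on three values. It relies on R's plotmath conventions, where <code>degree</code> is the degree symbol and <code>*</code> places two items next to each other. The variable name <code>toy_labels</code> is only for illustration.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Same function as above, repeated so this snippet runs on its own
degree_format <- function(x, ...) {
  parse(text = paste(x, "*degree", sep = ""))
}

toy_labels <- degree_format(c(0, 10, 20))

is.expression(toy_labels)  # TRUE: parse() returns an expression vector, not strings
length(toy_labels)         # 3: one plotmath expression per input value</code></pre></div>
<p>Because the labels are expressions rather than character strings, ggplot2 renders them as math text when they are passed to the <code>labels</code> argument of a scale.</p>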
<p>The stage was now set to actually create the chart using ggplot2. The chart was created in a series of steps adding layers at each step. Since I followed the steps in Boehmke's post I will not go into details except for short descriptions.</p>
<p><strong>Step 1</strong>: Create the canvas for the plot and show the 14-year record highs and lows as the broad background</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 1: create the canvas for the plot. Also plot the background lowest, highest 2000-2013 temps</span>
p <span class="o"><-</span> ggplot<span class="p">(</span>Past<span class="p">,</span> aes<span class="p">(</span>newDay<span class="p">,</span> tmaxF<span class="p">))</span> <span class="o">+</span>
theme<span class="p">(</span>plot.background <span class="o">=</span> element_blank<span class="p">(),</span>
panel.grid.minor <span class="o">=</span> element_blank<span class="p">(),</span>
panel.grid.major <span class="o">=</span> element_blank<span class="p">(),</span>
panel.border <span class="o">=</span> element_blank<span class="p">(),</span>
panel.background <span class="o">=</span> element_rect<span class="p">(</span>fill <span class="o">=</span> <span class="s">"seashell2"</span><span class="p">),</span>
axis.ticks <span class="o">=</span> element_blank<span class="p">(),</span>
<span class="c1">#axis.text = element_blank(), </span>
axis.title <span class="o">=</span> element_blank<span class="p">())</span> <span class="o">+</span>
geom_linerange<span class="p">(</span>Past<span class="p">,</span>
mapping<span class="o">=</span>aes<span class="p">(</span>x<span class="o">=</span>newDay<span class="p">,</span> ymin<span class="o">=</span>lower<span class="p">,</span> ymax<span class="o">=</span>upper<span class="p">),</span>
size<span class="o">=</span><span class="m">0.8</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#CAA586"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.6</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step1-1.png" alt="center"> </p>
<p><strong>Step 2</strong>: Add the average daily temperatures over the 14-year period, 2000-2013.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 2: Plot average low and high temps from 2000-2013</span>
p <span class="o"><-</span> p <span class="o">+</span>
geom_linerange<span class="p">(</span>Past<span class="p">,</span>
mapping<span class="o">=</span>aes<span class="p">(</span>x<span class="o">=</span>newDay<span class="p">,</span> ymin<span class="o">=</span>avg_lower<span class="p">,</span> ymax<span class="o">=</span>avg_upper<span class="p">),</span>
size<span class="o">=</span><span class="m">0.8</span><span class="p">,</span>
colour <span class="o">=</span> <span class="s">"#A57E69"</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step2-1.png" alt="center"> </p>
<p><strong>Step 3</strong>: Add the 2014 temperatures.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 3: Plot 2014 high and low temps</span>
p <span class="o"><-</span> p <span class="o">+</span>
geom_linerange<span class="p">(</span>Present<span class="p">,</span> mapping<span class="o">=</span>aes<span class="p">(</span>x<span class="o">=</span>newDay<span class="p">,</span> ymin<span class="o">=</span>tminF<span class="p">,</span> ymax<span class="o">=</span>tmaxF<span class="p">),</span> size<span class="o">=</span><span class="m">0.8</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#4A2123"</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step3-1.png" alt="center"> </p>
<p><strong>Step 4</strong>: Add the y-axis border and the x-axis gridlines</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 4: Add the y-axis border and the x-axis gridlines</span>
p <span class="o"><-</span> p <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">0</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">0</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">10</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">20</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">30</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">40</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">50</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">60</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">70</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">80</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">90</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">100</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">110</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span> <span class="o">+</span>
geom_hline<span class="p">(</span>yintercept <span class="o">=</span> <span class="m">120</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"ivory2"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">.1</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step4-1.png" alt="center"> </p>
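<p>The thirteen <code>geom_hline</code> calls in Step 4 could probably be collapsed into one, since <code>yintercept</code> accepts a vector. The sketch below shows the idea on a hypothetical toy canvas (<code>toy_df</code> and <code>p_toy</code> are not part of the original code). I kept <code>size</code> to match the code in this post. Newer ggplot2 releases prefer <code>linewidth</code> for lines and may warn about <code>size</code>.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r">library(ggplot2)

# Toy canvas; in the post the plot object `p` comes from Steps 1-3
toy_df <- data.frame(newDay = c(1, 365), tmaxF = c(40, 100))

p_toy <- ggplot(toy_df, aes(newDay, tmaxF)) +
  geom_vline(xintercept = 0, colour = "wheat4", linetype = 1, size = 1) +
  # a single layer draws all 13 gridlines at 0, 10, ..., 120
  geom_hline(yintercept = seq(0, 120, by = 10),
             colour = "ivory2", linetype = 1, size = .1)

print(p_toy)</code></pre></div>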
<p><strong>Steps 5 and 6</strong>: Add vertical gridlines to mark each month and add labels to the x and y axes.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 5: Add vertical gridlines to mark end of each month</span>
p <span class="o"><-</span> p <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">31</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">59</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">90</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">120</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">151</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">181</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">212</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">243</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">273</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">304</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">334</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span> <span class="o">+</span>
geom_vline<span class="p">(</span>xintercept <span class="o">=</span> <span class="m">365</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"wheat4"</span><span class="p">,</span> linetype<span class="o">=</span><span class="m">3</span><span class="p">,</span> size<span class="o">=</span><span class="m">.4</span><span class="p">)</span>
<span class="c1"># Step 6: Add labels to the x and y axes</span>
p <span class="o"><-</span> p <span class="o">+</span>
coord_cartesian<span class="p">(</span>ylim <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">120</span><span class="p">))</span> <span class="o">+</span>
scale_y_continuous<span class="p">(</span>breaks <span class="o">=</span> <span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">120</span><span class="p">,</span> by<span class="o">=</span><span class="m">10</span><span class="p">),</span> labels <span class="o">=</span> yaxis_temps<span class="p">)</span> <span class="o">+</span>
scale_x_continuous<span class="p">(</span>expand <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">),</span>
breaks <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">15</span><span class="p">,</span><span class="m">45</span><span class="p">,</span><span class="m">75</span><span class="p">,</span><span class="m">105</span><span class="p">,</span><span class="m">135</span><span class="p">,</span><span class="m">165</span><span class="p">,</span><span class="m">195</span><span class="p">,</span><span class="m">228</span><span class="p">,</span><span class="m">258</span><span class="p">,</span><span class="m">288</span><span class="p">,</span><span class="m">320</span><span class="p">,</span><span class="m">350</span><span class="p">),</span>
labels <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">"January"</span><span class="p">,</span> <span class="s">"February"</span><span class="p">,</span> <span class="s">"March"</span><span class="p">,</span> <span class="s">"April"</span><span class="p">,</span>
<span class="s">"May"</span><span class="p">,</span> <span class="s">"June"</span><span class="p">,</span> <span class="s">"July"</span><span class="p">,</span> <span class="s">"August"</span><span class="p">,</span> <span class="s">"September"</span><span class="p">,</span>
<span class="s">"October"</span><span class="p">,</span> <span class="s">"November"</span><span class="p">,</span> <span class="s">"December"</span><span class="p">))</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/steps5n6-1.png" alt="center"> </p>
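<p>The month-end positions used in Step 5 do not have to be typed by hand. As a small sketch, they can be derived from the month lengths of a non-leap year such as 2014. The names <code>days_in_month</code> and <code>month_ends</code> are mine, not from the original script.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Month lengths for a non-leap year (2014 is not a leap year)
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Running total gives the day-of-year on which each month ends
month_ends <- cumsum(days_in_month)

print(month_ends)
# 31 59 90 120 151 181 212 243 273 304 334 365</code></pre></div>
<p>These match the twelve <code>xintercept</code> values above, so the vector could be passed to a single <code>geom_vline(xintercept = month_ends, ...)</code> call. For leap years in the background data the positions after February would be off by one day, which is hardly visible at this scale.</p>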
<p><strong>Step 7</strong>: Mark days with record temperatures. Note that the chart below shows no record-low points: 2014, the warmest year in the dataset, did not set a single daily record low.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 7: Add points to mark the 2014 record high and low temps</span>
p <span class="o"><-</span> p <span class="o">+</span>
geom_point<span class="p">(</span>data<span class="o">=</span>PresentLows<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>newDay<span class="p">,</span> y<span class="o">=</span>tminF<span class="p">),</span> colour<span class="o">=</span><span class="s">"blue3"</span><span class="p">)</span> <span class="o">+</span>
geom_point<span class="p">(</span>data<span class="o">=</span>PresentHighs<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>newDay<span class="p">,</span> y<span class="o">=</span>tmaxF<span class="p">),</span> colour<span class="o">=</span><span class="s">"firebrick3"</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step7-1.png" alt="center"> </p>
<p><strong>Steps 8 and 9</strong>: Add a title to the chart and add explanatory text about the data in the top left.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 8: Add title to plot</span>
p <span class="o"><-</span> p <span class="o">+</span>
ggtitle<span class="p">(</span><span class="s">"Dublin (California) Weather in 2014"</span><span class="p">)</span> <span class="o">+</span>
theme<span class="p">(</span>plot.title<span class="o">=</span>element_text<span class="p">(</span>face<span class="o">=</span><span class="s">"bold"</span><span class="p">,</span>hjust<span class="o">=</span><span class="m">.012</span><span class="p">,</span>vjust<span class="o">=</span><span class="m">.8</span><span class="p">,</span>colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">,</span>size<span class="o">=</span><span class="m">18</span><span class="p">))</span>
<span class="c1"># Step 9: Add explanation text under the plot title</span>
grob1 <span class="o">=</span> grobTree<span class="p">(</span>textGrob<span class="p">(</span><span class="s">"Temperature\n"</span><span class="p">,</span>
x<span class="o">=</span><span class="m">0.02</span><span class="p">,</span> y<span class="o">=</span><span class="m">0.92</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">0</span><span class="p">,</span>
gp<span class="o">=</span>gpar<span class="p">(</span>col<span class="o">=</span><span class="s">"gray30"</span><span class="p">,</span> fontsize<span class="o">=</span><span class="m">10</span><span class="p">,</span> fontface<span class="o">=</span><span class="s">"bold"</span><span class="p">)))</span>
p <span class="o"><-</span> p <span class="o">+</span> annotation_custom<span class="p">(</span>grob1<span class="p">)</span>
grob2 <span class="o">=</span> grobTree<span class="p">(</span>textGrob<span class="p">(</span><span class="kp">paste</span><span class="p">(</span><span class="s">"Bars represent range between daily high and low temperatures.\n"</span><span class="p">,</span>
<span class="s">"Data set includes data from Jan 1, 2000 to December 31, 2014.\n"</span><span class="p">,</span>
<span class="s">"Average high temperature for 2014 was 76.9F making it\n"</span><span class="p">,</span>
<span class="s">"the warmest in 15 years since 2000."</span><span class="p">,</span> sep<span class="o">=</span><span class="s">""</span><span class="p">),</span>
x<span class="o">=</span><span class="m">0.02</span><span class="p">,</span> y<span class="o">=</span><span class="m">0.83</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">0</span><span class="p">,</span>
gp<span class="o">=</span>gpar<span class="p">(</span>col<span class="o">=</span><span class="s">"gray30"</span><span class="p">,</span> fontsize<span class="o">=</span><span class="m">8.5</span><span class="p">)))</span>
p <span class="o"><-</span> p <span class="o">+</span> annotation_custom<span class="p">(</span>grob2<span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/steps8n9-1.png" alt="center"> </p>
<p><strong>Step 10</strong>: Add annotation for the record high 2014 temperature points.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 10: Add annotation for points representing the record high 2014 temperatures</span>
<span class="c1"># Note: There were no record lows in 2014, it was the warmest year of the 15-yr period!</span>
grob3 <span class="o">=</span> grobTree<span class="p">(</span>textGrob<span class="p">(</span><span class="kp">paste</span><span class="p">(</span><span class="s">"In 2014 there were 60 days that were\n"</span><span class="p">,</span>
<span class="s">"hottest since 2000\n"</span><span class="p">,</span>sep<span class="o">=</span><span class="s">""</span><span class="p">),</span>
x<span class="o">=</span><span class="m">0.72</span><span class="p">,</span> y<span class="o">=</span><span class="m">0.9</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">0</span><span class="p">,</span>
gp<span class="o">=</span>gpar<span class="p">(</span>col<span class="o">=</span><span class="s">"firebrick3"</span><span class="p">,</span> fontsize<span class="o">=</span><span class="m">7</span><span class="p">)))</span>
p <span class="o"><-</span> p <span class="o">+</span> annotation_custom<span class="p">(</span>grob3<span class="p">)</span>
p <span class="o"><-</span> p <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">257</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">263</span><span class="p">,</span> y <span class="o">=</span> <span class="m">99</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">108</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"firebrick3"</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step10-1.png" alt="center"> </p>
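<p>The count of 60 record days in Step 10 is typed into the annotation text. One way to keep the label in sync with the data is to build it from the number of rows in <em>PresentHighs</em>. The sketch below uses a hypothetical three-row stand-in, <code>toy_present_highs</code>, with made-up values.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"># Toy stand-in for PresentHighs (three made-up record days)
toy_present_highs <- data.frame(newDay = c(14, 120, 257),
                                tmaxF  = c(75, 92, 99))

# Build the annotation text from the row count instead of a typed-in number
record_label <- paste("In 2014 there were ", nrow(toy_present_highs),
                      " days that were\n",
                      "hottest since 2000\n", sep = "")

cat(record_label)</code></pre></div>
<p>With the real data, <code>record_label</code> would take the place of the <code>paste(...)</code> call inside <code>textGrob()</code>.</p>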
<p><strong>Step 11</strong>: Finally, add a legend to explain the three "layers" of data: the background record highs and lows over the 14-year period, the average highs and lows over the same period, and the 2014 daily highs and lows.</p>
<div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Step 11: Add legend to explain difference between the different data point layers</span>
p <span class="o"><-</span> p <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">181</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">181</span><span class="p">,</span> y <span class="o">=</span> <span class="m">5</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">25</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#CAA586"</span><span class="p">,</span> size<span class="o">=</span><span class="m">3</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">181</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">181</span><span class="p">,</span> y <span class="o">=</span> <span class="m">11</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">19</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#A57E69"</span><span class="p">,</span> size<span class="o">=</span><span class="m">3</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">181</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">181</span><span class="p">,</span> y <span class="o">=</span> <span class="m">13</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">22</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#4A2123"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">177</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">179</span><span class="p">,</span> y <span class="o">=</span> <span class="m">18.7</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">18.7</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#A57E69"</span><span class="p">,</span> size<span class="o">=</span><span class="m">.5</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">177</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">179</span><span class="p">,</span> y <span class="o">=</span> <span class="m">11.2</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">11.2</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#A57E69"</span><span class="p">,</span> size<span class="o">=</span><span class="m">.5</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">177</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">177</span><span class="p">,</span> y <span class="o">=</span> <span class="m">11.2</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">18.7</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#A57E69"</span><span class="p">,</span> size<span class="o">=</span><span class="m">.5</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">183</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">185</span><span class="p">,</span> y <span class="o">=</span> <span class="m">13.25</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">13.25</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#4A2123"</span><span class="p">,</span> size<span class="o">=</span><span class="m">.3</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"segment"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">183</span><span class="p">,</span> xend <span class="o">=</span> <span class="m">185</span><span class="p">,</span> y <span class="o">=</span> <span class="m">21.75</span><span class="p">,</span> yend <span class="o">=</span> <span class="m">21.75</span><span class="p">,</span> colour <span class="o">=</span> <span class="s">"#4A2123"</span><span class="p">,</span> size<span class="o">=</span><span class="m">.3</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"text"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">165</span><span class="p">,</span> y <span class="o">=</span> <span class="m">14.75</span><span class="p">,</span> label <span class="o">=</span> <span class="s">"NORMAL RANGE"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2.1</span><span class="p">,</span> colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"text"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">170</span><span class="p">,</span> y <span class="o">=</span> <span class="m">25</span><span class="p">,</span> label <span class="o">=</span> <span class="s">"RECORD HIGH"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2.1</span><span class="p">,</span> colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"text"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">170</span><span class="p">,</span> y <span class="o">=</span> <span class="m">5</span><span class="p">,</span> label <span class="o">=</span> <span class="s">"RECORD LOW"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2.1</span><span class="p">,</span> colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"text"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">195</span><span class="p">,</span> y <span class="o">=</span> <span class="m">21.75</span><span class="p">,</span> label <span class="o">=</span> <span class="s">"ACTUAL HIGH"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2.1</span><span class="p">,</span> colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">)</span> <span class="o">+</span>
annotate<span class="p">(</span><span class="s">"text"</span><span class="p">,</span> x <span class="o">=</span> <span class="m">195</span><span class="p">,</span> y <span class="o">=</span> <span class="m">13.25</span><span class="p">,</span> label <span class="o">=</span> <span class="s">"ACTUAL LOW"</span><span class="p">,</span> size<span class="o">=</span><span class="m">2.1</span><span class="p">,</span> colour<span class="o">=</span><span class="s">"gray30"</span><span class="p">)</span>
<span class="kp">print</span><span class="p">(</span>p<span class="p">)</span></code></pre></div>
<p><img src="/../figs/step11-1.png" alt="center"> </p>
<p>Thanks to Boehmke's detailed post, I was able to create a chart of Dublin 2014 weather very much like Edward Tufte's New York 2003 weather chart.</p>
DIY: Desktop PC Using Shuttle DS81 Barebone PC System
2014-12-04T00:00:00+00:00
http://www.ragupappu.com//2014/12/04/DIY-Shuttle-DS81-Desktop-PC2
<p>I needed to run a 64-bit virtual machine on my laptop, only to find out that the laptop did not support virtualization even though it had a 64-bit CPU. That got me started on acquiring a new PC. I decided that I wanted a mid-to-high-end PC -- not quite a high-end gaming PC -- but one with sufficient horsepower to run multi-threaded applications with ease. I settled on the following minimum configuration: a 64-bit CPU with hardware support for virtualization, 8GB of RAM, and a 256GB hard drive. Superior graphics performance was not on my list of must-haves.</p>
<p>A search online revealed that PCs meeting this minimum configuration were priced upwards of $600. The prices seemed high, the desktop PCs looked bulky, and the laptop screens were tiny. It was then that I decided to build a desktop PC, a first for me. The project would be a fun learning experience. It would also put to good use some spare PC hardware in my garage, and maybe even save me some money.</p>
<h4>Barebone PC System</h4>
<p>I started the project by going online and searching for desktop PC parts. During the search I stumbled upon what are referred to as <em>barebone PC kits</em>. In general, these kits come with the PC motherboard installed in a case and include the power supply and cooling fans. The DIYer supplies the CPU, hard drive, RAM, and any other parts such as a graphics card or a CD/DVD drive.</p>
<p>I wanted a small form factor for my desktop PC and after a brief search online settled on the <a href="http://www.amazon.com/SHUTTLE-DS81-LGA1150-USB3-0-Barebone/dp/B00IXFFD4W/ref=sr_1_1?ie=UTF8&amp;qid=1416956458&amp;sr=8-1&amp;keywords=Shuttle+DS81&amp;pebp=1416956448003">Shuttle DS81</a> PC Barebone System. Its support for 4th generation Intel i3, i5, or i7 processors, its many ports (USB, SATA, HDMI, etc.), and a surprisingly small form factor for the punch it claimed to deliver were the main technical factors that influenced my selection. It also helped that the product had many positive reviews on Amazon, a few of them highly informative. The <a href="http://www.shuttle.eu/fileadmin/resources/download/docs/spec/barebones/DS81_e.pdf">DS81 product specifications</a> document contains a detailed list of features. Table 1 below is an excerpt from the DS81 features contained in the spec.
<p style="text-align: center;"><strong>Table 1. A few features of Shuttle DS81 PC Barebone System</strong></p></p>
<table>
<thead>
<tr>
<th style="text-align: center;"><strong>Item</strong></th>
<th style="text-align: center;"><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Chassis</td>
<td style="text-align: left;">1.35 L (19 x 16.5 x 4.3 cm)</td>
</tr>
<tr>
<td>Drive Bays</td>
<td style="text-align: left;">1x 2.5" supports hard disk or SSD</td>
</tr>
<tr>
<td>CPU Support</td>
<td style="text-align: left;">Socket LGA 1150; supports Core i7/i5/i3, Pentium, Celeron; supports Haswell; TDP max. 65W</td>
</tr>
<tr>
<td>BIOS</td>
<td style="text-align: left;">AMI BIOS, Watchdog, Power fail resume</td>
</tr>
<tr>
<td>Memory</td>
<td style="text-align: left;">2x DDR3-1333/1600 SO-DIMM, max. 2x8GB</td>
</tr>
<tr>
<td>Expansion slots</td>
<td style="text-align: left;">1x Mini-PCIe (supports mSATA 3G), supports half-size and full-size, 1x Full-size mSATA (supports SATA 6G), 1x Mini-PCIe half-size (supports WLAN kit)</td>
</tr>
<tr>
<td>Front panel</td>
<td style="text-align: left;">Power button, Power LED &amp; HDD LED, 4x USB 2.0, SD card reader, Headphone &amp; Microphone</td>
</tr>
<tr>
<td>Back panel</td>
<td style="text-align: left;">HDMI + 2x DisplayPort (DP), 2x USB 3.0 + 2x USB 2.0, 2x Gigabit LAN, RS232 + RS232/422/485, 2x Kensington Lock, Optional WLAN (2 antennas), 4-pin connector (supports power button, clear CMOS, +5VDC)</td>
</tr>
<tr>
<td>Onboard Connectors</td>
<td style="text-align: left;">1x SATA 6G, 1x SATA 3G, 1x USB 2.0, 1x VGA (analog video output), Clear CMOS jumper, LPC, 2x10 pin (2mm Pitch), Always-on-Jumper</td>
</tr>
<tr>
<td>Power supply</td>
<td style="text-align: left;">90W external</td>
</tr>
<tr>
<td>Accessory</td>
<td style="text-align: left;">1x SATA data/power cable (preinstalled), VESA mount (PV04), optional: WLAN kit (WLN-S), VGA port (PVG01)</td>
</tr>
</tbody>
</table>
<h4>Desktop PC Parts</h4>
<p>After selecting the Shuttle DS81 system I selected the internal parts of the desktop PC: the CPU, the storage, and RAM.</p>
<h5>CPU</h5>
<p>The DS81 uses Intel desktop processors, so the CPU had to be an Intel processor. Beyond that, there were two aspects to consider when selecting the CPU: speed rating and power dissipation. For speed, I wanted a <a href="https://www.cpubenchmark.net/high_end_cpus.html">fast</a> (but not necessarily a top-of-the-line) CPU. For power dissipation, the CPU maximum TDP (Thermal Design Power) had to be no greater than 65W, a limit imposed by the <a href="http://www.shuttle.eu/fileadmin/resources/download/docs/spec/barebones/DS81_e.pdf">DS81 spec</a>.</p>
<p>The Intel Core desktop CPUs come in a few flavors, each identified by an alphabetic suffix in the processor <a href="http://www.intel.com/content/www/us/en/processors/processor-numbers.html">name</a> and each optimized for one metric such as graphics performance, clock speed, or power. I selected the performance-optimized <a href="http://ark.intel.com/products/80812/Intel-Core-i5-4690S-Processor-6M-Cache-up-to-3_90-GHz">Intel Core i5-4690S</a> CPU based on a <a href="https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-4690+%40+3.50GHz&amp;id=2236">performance-price tradeoff</a>. It has a <a href="http://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-4690S+%40+3.20GHz">speed rating</a> of 7619 at 3.20GHz, and a max TDP of 65W.</p>
<h5>Storage</h5>
<p>The DS81 has a storage bay for a 2.5" hard disk or an SSD. It also has a full-size PCIe expansion slot that can support an mSATA 6 Gbps drive. I selected <a href="http://www.amazon.com/gp/product/B00B3X73EE/ref=oh_aui_detailpage_o06_s00?ie=UTF8&amp;psc=1">MyDigitalSSD 256GB</a> as the internal storage device, the one from which the PC normally boots up. The MyDigitalSSD package even includes a small screw and a 1.4mm screwdriver! I selected the SSD because it transfers data faster and generates less heat than a hard disk drive. The SSD capacity is modest at 256GB, but because the DS81 has several USB ports and high-capacity external hard drives are relatively inexpensive, I could easily add more storage in the future if needed. I could also add an internal hard drive or SSD since the storage bay is now unused.</p>
<h5>RAM</h5>
<p>The DS81 can support a maximum of 16GB of RAM (2 x 8GB). I selected <a href="http://www.amazon.com/gp/product/B006YG8X9Y/ref=oh_aui_detailpage_o01_s00?ie=UTF8&amp;psc=1">Crucial 8GB Single DDR3 RAM</a>. The single module leaves the second DS81 RAM slot empty, so another 8GB of RAM can be added in the future.</p>
<h5>Wireless LAN</h5>
<p>There is no wired networking capability near where I wanted to set up the desktop PC, and because the DS81 does not natively provide wireless networking capability, a wireless LAN card was necessary. I selected the <a href="http://www.amazon.com/gp/product/B00HUIJ4H0/ref=oh_aui_detailpage_o09_s00?ie=UTF8&amp;psc=1">Intel WiFi Link 5100 Half Size Wirless MINI PCIe Card</a>. This card would fit into the DS81 half-size PCIe expansion slot. I also selected a <a href="http://www.amazon.com/gp/product/B007XVHQ9M/ref=oh_aui_detailpage_o08_s00?ie=UTF8&amp;psc=1">Mini PCIe Wifi Antenna</a> kit to go with the wireless LAN card.</p>
<p><img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7187e.jpg" alt="Parts for DS81 PC">
<img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7193e.jpg" alt="Parts inside Shuttle DS81 box">
<img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7196e.jpg" alt="Parts inside Intel Core i5 CPU box"></p>
<h5>Miscellaneous Parts</h5>
<p>I already had a Logitech Wireless Keyboard and Mouse set and a 21" Samsung monitor, so I did not need to buy those parts.</p>
<p>The DS81 provides audio output via the HDMI and DisplayPort outputs, and also via a headphone jack on the front panel. I selected a pair of <a href="http://www.amazon.com/gp/product/B00GHY5F3K/ref=oh_aui_detailpage_o02_s00?ie=UTF8&amp;psc=1">USB powered speakers</a> by AmazonBasics.</p>
<p>The full list of parts acquired for this desktop and the cost of each part are listed in Table 2 below. The total cost of this desktop came out to $697.52.
<p style="text-align: center;"><strong>Table 2. Desktop PC Parts and Costs</strong></p></p>
<dl><dd>
<table width="456" cellspacing="0" cellpadding="4" align="center">
<thead>
<tr valign="top">
<th style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: .04in 0 .04in .04in;" width="33">
<p align="center">No.</p>
</th>
<th style="border-top: 1px solid #000000; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: .04in 0 .04in .04in;" width="340">
<p align="center">Part
Name</p>
</th>
<th style="border: 1px solid #000000; padding: .04in;" width="57">
<p align="center">Cost</p>
</th>
</tr>
</thead>
<tbody>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">1</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">SHUTTLE
DS81 LGA1150/Intel H81/ DDR3/ SATA3&amp;USB3.0/ A&amp;V&amp;2GbE/
90W Slim PC Barebone System, DS81</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">192.99</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">2</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">MyDigitalSSD
256GB (240GB) 50mm Bullet Proof 4 BP4 50mm mSATA Solid State
Drive SSD SATA III 6G - MDMS-BP4-240</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">119.95</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">3</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">Intel
Core i5-4690S LGA 1150 - BX80646I54690S</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">219.29</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">4</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">Crucial
8GB Single DDR3 1600 MT/s (PC3-12800) CL11 SODIMM 204-Pin
1.35V/1.5V Notebook Memory CT102464BF160B</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">72.49</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">5</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">Intel
WiFi Link 5100 Half Size Wirless MINI PCI-E Card 512AN_MMW
802.11a/b/g/n 2.4GHz and 5.0GHz 300 Mbps</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">9.19</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">6</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">Mini
PCIe Wifi Antenna Kit 3Dbi RP-SMA Antenna + 20cm U.fl/IPEX to
Bulkhead RP-SMA Pigtail</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">7.65</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33">
<p align="center">7</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="340">AmazonBasics
USB Powered Computer Speakers (A100)</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">13.99</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33"></td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 .05in .04in .04in;" width="340">
<p align="right">Total</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">635.55</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33"></td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 .05in .04in .04in;" width="340">
<p align="right">Local
Taxes at 9.75%</p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right">61.97</p>
</td>
</tr>
<tr valign="top">
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 0 .04in .04in;" width="33"></td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: none; padding: 0 .05in .04in .04in;" width="340">
<p align="right"><b>Grand
Total</b></p>
</td>
<td style="border-top: none; border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; padding: 0 .04in .04in;" width="57">
<p align="right"><b>697.52</b></p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
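<p>The totals in Table 2 can be reproduced with a few lines of Python. This is only a minimal sketch: the prices and the 9.75% tax rate are copied from the table, while the names <code>PART_PRICES</code>, <code>TAX_RATE</code>, and <code>cost_summary</code> are made up for illustration, and rounding the tax to the nearest cent is an assumption about how it was computed.</p>

```python
from decimal import Decimal, ROUND_HALF_UP

# Part prices in USD, copied from Table 2.
PART_PRICES = {
    "Shuttle DS81 barebone system": Decimal("192.99"),
    "MyDigitalSSD 256GB mSATA SSD": Decimal("119.95"),
    "Intel Core i5-4690S": Decimal("219.29"),
    "Crucial 8GB DDR3 SODIMM": Decimal("72.49"),
    "Intel WiFi Link 5100 card": Decimal("9.19"),
    "Mini PCIe WiFi antenna kit": Decimal("7.65"),
    "AmazonBasics USB speakers": Decimal("13.99"),
}

# Local tax rate quoted in Table 2 (9.75%).
TAX_RATE = Decimal("0.0975")


def cost_summary(prices, tax_rate):
    """Return (subtotal, tax, grand_total); tax is rounded to the nearest cent."""
    subtotal = sum(prices.values(), Decimal("0"))
    tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return subtotal, tax, subtotal + tax


subtotal, tax, grand_total = cost_summary(PART_PRICES, TAX_RATE)
print(subtotal, tax, grand_total)
```

<p>Using <code>Decimal</code> rather than floats keeps the cents exact, which matters when the result is compared against a receipt.</p>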
<h4>Assembling the PC</h4>
<p>I used these tools during assembly:</p>
<ul>
<li>Phillips head screwdriver #2</li>
<li>An old (expired) plastic credit card</li>
<li>A few tissue papers</li>
<li>A "flat-blade" 1/4" screwdriver</li>
</ul>
<p>The Shuttle DS81 package comes with a <em>DS81 Series Quick Guide</em> that identifies various parts of the DS81 box and the parts on the motherboard, and also contains detailed, illustrated installation instructions. Assembling the PC was simply a matter of following the instructions carefully. I installed parts in the following order: the CPU unit, the RAM module, the Wireless LAN card, and finally, the SSD.</p>
<p>The DS81 comes with cooling fans, so the cooling fan unit that came in the Intel CPU package was not used.</p>
<p>I started installation by removing the chassis cover, then the rack (on which the internal hard drive, if installed, is mounted), and then the ICE module attachment, which revealed the CPU socket. After locking the Intel i5-4690S CPU chip in the socket I applied thermal paste from the syringe onto the chip, starting in the middle of the chip and drawing a spiral outward. About three-quarters of the paste in the syringe was more than sufficient to cover most of the CPU chip. Next, using an old credit card, I spread the paste over the CPU to form a uniform layer, wiping away the excess paste that accumulated on the credit card as I evened out the layer. The Quick Guide has this warning:</p>
<blockquote>
<p>Please do not apply excess amount of thermal paste.</p>
</blockquote>
<p>So I made sure that the layer of thermal paste was not too thick (I reckon the layer was about 0.2mm thick). I also took care not to let the paste spill over the sides of the chip.</p>
<p><img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7194e.jpg" alt="DS81 showing CPU socket">
<img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7200e.jpg" alt="ICE module underside">
<img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7201e.jpg" alt="DS81 with CPU installed and with thermal compound"></p>
<p>The last CPU assembly step was to put back the ICE module. I carried out this step within ~2 minutes of applying the thermal paste, before it dried out. This last step was a bit tricky because the ICE module uses spring-loaded screws and also because the copper surface of the ICE module that lies over the CPU tended to slip sideways on the still-fluid thermal paste. Once the first screw was in place, however, the other two screws went in easily and the ICE module attached firmly.</p>
<p>Next I installed the RAM. I inserted the 8GB RAM module into the SODIMM slot nearest to the CPU socket lever. The second (empty) slot could be used for adding another 8GB of RAM in the future.</p>
<p>Next I installed the half-size PCIe Wireless LAN card. This card went into the lower of the two PCIe slots. After securing the card with a screw (provided on the motherboard itself) I started installing the wireless antenna, but that turned out to be a challenge because there was little room for fingers in that region of the DS81 box. To make room I unmounted the Wireless LAN card and disconnected two connector cables near the PCIe slots. I removed the right metal cutout for the antenna on the back of the DS81 by pushing hard on it with a flat-blade screwdriver. As I set about inserting the antenna screw connector I discovered that the cutout hole was a tad too small for the connector. I remedied the problem by pushing the screwdriver into the cutout hole and giving it 3-4 turns to shave off some metal. The hole was now a bit bigger, the antenna connector went through easily, and I was able to install the antenna kit. I re-installed the LAN card, screwed it in, and pressed the connector of the antenna cable onto the connector marked 1 on the Wireless LAN card. My antenna kit had only one antenna but a second one could be added if desired.</p>
<p><img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7208e.jpg" alt="DS81 with Wireless LAN card and antenna installed"></p>
<p>Finally, I installed the SSD card into the full-size PCIe slot. Because there was no internal hard-disk drive (HDD) to mount, I simply fastened the rack and completed the installation by replacing the cover and refastening the screws. The desktop PC was now ready for an Operating System (OS).</p>
<p><img src="http://www.ragupappu.com/assets/images/ds81pc/IMG_7209e.jpg" alt="DS81 with SSD card, ICE module installed"></p>
<h4>Installing the OS and Configuring BIOS</h4>
<p>I wanted a Linux-based PC and decided to install Ubuntu OS. As of this writing the stable release is Ubuntu 14.04.1 LTS. I prepared for the Ubuntu OS installation by running through the following steps on my Windows 7 laptop to create a bootable USB stick:</p>
<ul>
<li>Downloaded the 64-bit ISO file, ubuntu-14.04.1-desktop-amd64.iso from <a href="http://www.ubuntu.com/download/desktop">Ubuntu</a>, and saved it into the Downloads folder.</li>
<li>Inserted an available, empty 32GB USB stick into the laptop USB port. A USB stick of any capacity will work as long as it has a minimum of 2GB of free space.</li>
<li>Followed these <a href="http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows">instructions</a> on <em>How to Create a Bootable USB stick on Windows</em>.</li>
</ul>
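<p>The 2GB free-space requirement mentioned above is easy to check from a script before writing the ISO. The following is a minimal sketch using Python's standard <code>shutil.disk_usage</code>; the helper names are hypothetical, and counting a gigabyte as 10<sup>9</sup> bytes is an assumption.</p>

```python
import shutil

BYTES_PER_GB = 10**9   # assumption: decimal gigabytes, as drive makers count them
REQUIRED_FREE_GB = 2   # minimum free space quoted in the steps above


def has_enough_room(free_bytes, required_gb=REQUIRED_FREE_GB):
    """Return True if free_bytes covers the required number of gigabytes."""
    return free_bytes >= required_gb * BYTES_PER_GB


def stick_is_usable(mount_point):
    """Check the free space reported for the drive mounted at mount_point."""
    return has_enough_room(shutil.disk_usage(mount_point).free)


# Example call; "E:\\" is a hypothetical drive letter for the USB stick:
#   stick_is_usable("E:\\")
```

<p>Keeping the threshold check separate from the disk query makes the rule itself easy to test without a USB stick attached.</p>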
<p>After completing the above steps the bootable USB stick was ready to help install Ubuntu on the DS81 PC. But first the DS81 needed to be powered up and the BIOS configuration changed suitably.</p>
<p>I connected the USB keys of the wireless keyboard and mouse to the two USB 2.0 ports on the back of the DS81, plugged the HDMI cable from the monitor into the HDMI output on the back of the DS81, and powered up the DS81. As the computer started up, I pressed the <code>Esc</code> key to get into the BIOS settings screen. [Note: The computer powers up quickly, so if you miss hitting the key within the short time window, press <code>Ctrl</code>-<code>Alt</code>-<code>Del</code> simultaneously to reboot the computer, then make another attempt to get into the BIOS settings screen by pressing the <code>Esc</code> or <code>Del</code> key.]</p>
<p>In the BIOS screen I made changes in only two tabs: Advanced and Boot. The images below are screenshots of the Boot tab and the Advanced tab <em>after</em> changes. In the Boot tab, I set up Boot Option #1 to be the bootable USB device (to help set up the OS). Note that in order to be set up as one of the Boot Option devices the bootable USB device must be present in one of the USB ports <em>before</em> the computer is powered up. Next, I selected the internal SSD device as Boot Option #3 (it could also have been set up as Boot Option #2, but I just set it up as option #3). In the Advanced tab, I changed the setting under <code>mini-PCIe/mSATA Select</code> to mSATA.</p>
<p><img src="http://www.ragupappu.com/assets/images/ds81pc/20141204_125053e.jpg" alt="BIOS changes, Boot tab">
<img src="http://www.ragupappu.com/assets/images/ds81pc/20141204_125323e.jpg" alt="BIOS changes, Advanced tab"></p>
<p>After making the BIOS configuration changes, I saved the changes and exited (by hitting the <code>F4</code> key), and rebooted the computer by pressing <code>Ctrl</code>-<code>Alt</code>-<code>Del</code> simultaneously (alternatively, you can press the DS81 Power-On button). The DS81 booted from the bootable USB stick and I followed these excellent <a href="http://www.ubuntu.com/download/desktop/install-ubuntu-desktop">instructions</a> on the Ubuntu website to install Ubuntu onto the SSD inside the DS81. As I discovered during the OS installation, I did not need to install any drivers for the Wireless LAN card -- the card came up and connected to my home wireless network without issues. The OS installation completed successfully and my DIY desktop PC was ready for use!</p>
<p>Update (15 December 2014):
I came across the excellent <a href="https://sites.google.com/site/easylinuxtipsproject/">Easy Linux tips project</a> website while searching for some Linux information. It has a section titled 'Right after the installation' which contains links to Ubuntu OS tweaks, including topics such as <a href="https://sites.google.com/site/easylinuxtipsproject/first">Round off Ubuntu 14.04 neatly: do this first</a>, <a href="https://sites.google.com/site/easylinuxtipsproject/speed">Speed up your Ubuntu!</a>, and so on. I ran through the steps listed at those links and highly recommend them -- they will improve system performance and, for systems with SSD(s), longevity of the drive(s). Depending on your system and software setup you may not need to run all of the steps, but covering all the sections takes between 40 minutes and 2 hours, so plan accordingly.</p>
<h4>Conclusion</h4>
<p>This was my first DIY build-a-PC project and I thoroughly enjoyed doing it. Researching the PC form factor and the PC parts and choosing from among several options was a lot of fun. The web is a fantastic resource with countless product reviews, detailed instructions, and helpful community forums, and I would be remiss not to acknowledge the many, many people who helped indirectly to make this project feasible and smooth-sailing. And I am thankful to my family for constant encouragement and support.</p>
<p>Research for this DIY project took about 24 hours, spread over a week. Assembling the PC and installing the OS took about 6 hours, including the time to document build steps (taking notes and photos) and figure out ways around a couple of minor problems during assembly and software installation. The PC cost a bit more than I would have liked to spend, but I expect it will serve me well for a long time given what I consider its 'premier' specs. The PC runs very quietly; I can barely hear the CPU fan.</p>
<p>Overall, it was a fun and joyful project, one that helped me learn a few things, and one I would gladly do again.</p>
Cloudera Hadoop Developer Training – My experience
2014-11-21T00:00:00+00:00
http://www.ragupappu.com//2014/11/21/cloudera-hadoop-developer-training
<p>Last week (Nov 11-14) I attended the <a href="http://www.cloudera.com/content/cloudera/en/training/courses/developer-training.html">Cloudera Developer Training for Apache Hadoop</a> course and in this post I share my experience and takeaways from that training. But first, a brief bit about me to lay out the context for my experience with this training. I have worked in the Telecommunications industry and have several years of experience with Embedded Systems software design and development. About 2 months ago I decided to work in the Big Data space, which led me to this training course.</p>
<h3>Research and Preparation</h3>
<p>I did a lot of research online and talked with friends about ways to get started in Big Data. The general recommendation was to start by getting trained in Hadoop. I decided to go with <a href="http://www.cloudera.com/content/cloudera/en/home.html">Cloudera</a> training because it is the clear leader in the Big Data space. I reviewed the training course topics for the Apache Hadoop course and, intending to get credentialed as a <a href="http://www.cloudera.com/content/cloudera/en/training/certification/ccdh.html">Cloudera Certified Developer for Apache Hadoop</a>, a certification well-regarded in the industry, I registered for the Developer Training for Hadoop course.</p>
<p>I had about 6 weeks until the start of training and, wanting to make the most of the training class, I prepared for it. My goal was to have a few things covered in time for the start of training: a basic understanding of the two key Hadoop components, MapReduce and HDFS; high-level knowledge of the Hadoop ecosystem; a little bit of Hadoop programming; and some knowledge of the practical applications, issues, and limits of Hadoop as it relates to Big Data. To that end I relied heavily on the Web, Google search, and books. I started learning Hadoop from <a href="http://www.amazon.com/Hadoop-Beginners-Guide-Garry-Turkington/dp/1849517304/ref=sr_1_1_twi_1_pap?s=books&ie=UTF8&qid=1428696471&sr=1-1&keywords=hadoop+beginner%27s+guide">Hadoop Beginner's Guide</a>, and a week later started reading Tom White's <a href="http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520/ref=sr_1_1?ie=UTF8&amp;qid=1416625165&amp;sr=8-1&amp;keywords=Hadoop+the+definitive+guide">Hadoop: The Definitive Guide</a> in parallel. I installed Hadoop on my laptop and ran example programs from the first book. I pored over articles, blog posts, presentations, and white papers about anything Hadoop and Big Data. As the training date approached, I felt reasonably well-prepared. In particular, Tom White's book was very helpful -- it provides a great introduction to Hadoop with simple, clear explanations of fairly involved concepts, lots of real code examples in Java, C, Python, etc., and detailed discussions on the relative benefits and limitations of different methods used in Hadoop.</p>
<h3>Hadoop Developer Training</h3>
<p>The 4-day training was delivered by <a href="http://www.exitcertified.com/">ExitCertified</a> at their clean, quiet, and well-prepared facility in San Francisco. Our instructor, Joel, taught the class all four days. The class ran each day from 9AM to 5PM with a one-hour break for lunch around noon and two 5-10 minute breaks in each of the morning and afternoon sessions. Our class comprised 6 on-site and 8 remote trainees, the latter joining in via video conferencing. One of the trainees had been flown in from Ireland by her company to get trained on-site! The trainees came from diverse job backgrounds, industries, and work experience levels, and the majority did not have Hadoop development experience. I was somewhat puzzled by the rather small number of on-site trainees given the high demand for Hadoop developers in Silicon Valley. Maybe the upcoming Thanksgiving break had something to do with it. But the small number of trainees made for better opportunities for intimate discussions and detailed question-and-answer sessions.</p>
<p>The training started with each student signing into the Cloudera training portal, which gave access to the Cloudera Hadoop Training PowerPoint slides and the Developer Exercise Instructions document. Our instructor Joel also generously provided a USB thumb drive containing a lot of additional material, including his lecture slides (separate from the Cloudera training slides), articles, FAQs, demo code snippets, etc., to copy to our personal laptops for later study and reference. The setup for each student included a desktop with two monitors to facilitate viewing lecture slides and running lab exercises. The desktop came installed with a Virtual Machine (VM), which was our platform for running the lab exercises.</p>
<p>The Hadoop Developer Training Course attempts comprehensive coverage of Hadoop and includes a variety of <a href="http://www.cloudera.com/content/dam/cloudera/Resources/PDF/Datasheet/Developer_Training_for_Apache_Hadoop.pdf">topics</a>. It starts with a high-level introduction to Hadoop, moves on to detailed coverage of MapReduce and HDFS, and winds down with an introduction to some of the Hadoop ecosystem projects. In each class, time was split between lectures (65%) and hands-on lab exercises (35%). On Day 1 the topics were Motivation for Hadoop and Introduction to MapReduce and HDFS; on Day 2 and Day 3 there was in-depth coverage of MapReduce, including writing MapReduce programs, writing unit tests, and a detailed presentation of Reducers and Combiners; and on the final Day 4 the topics were other Hadoop ecosystem projects, including Hive, Pig, Impala, Sqoop, and Oozie. Each lecture on a key topic was followed by a lab exercise that explored that topic. On Day 4 the class got a bonus. Joel asked for a vote on <em>other</em> Hadoop ecosystem projects of interest and the class voted for Spark, Mahout, ZooKeeper, and Graph Processing, all of which he lectured on in good detail.</p>
<p>The Cloudera Hadoop training slide deck contains 600+ slides, so there was a lot of material to cover during the 4-day Developer Training course. And appropriately, Joel maintained a brisk pace in his lectures, which also allowed room for questions and discussions. Occasionally, some of the topics were skimmed or entirely skipped in favor of in-depth and detailed discussions of the more important topics around MapReduce. I liked Joel's lecture style: he placed just the right emphasis on details and highlights of important topics and frequently switched to his own lecture slides (that were detailed, accurate, and contained more up-to-date information) to supplement the Cloudera training material. Early on it was clear that Joel came prepared for the lectures, that he was hands-on (wrote code snippets), and was knowledgeable about the wider industry and related technologies. To keep students engaged in the often dense technical material Joel injected doses of humor in his lectures and frequently ended lectures on major topics with review questions directed at the trainees. At the end of lab exercises Joel went over the solution source code of the lab exercise problem, explaining the details and pointing out key pieces in the code.</p>
<p>Speaking of lab exercises, the Cloudera Hadoop Developer training <a href="http://www.cloudera.com/content/cloudera/en/training/courses/developer-training.html">webpage</a> states under the <em>Audience and Prerequisites</em> section that "<em>...Knowledge of Java is strongly recommended and is required to complete the hands-on exercises...</em>". This prerequisite is desirable but should not be viewed as a show-stopper. I do not have Java programming experience but during lab that limitation only slowed down my ability to complete the lab exercises. The solutions (i.e., full source code) for the lab exercises were also provided within the project directories and one could refer to those solutions while attempting the lab exercises, which was a big help for trainees without Java programming experience. The important thing was to understand how the technique or methodology presented during the lecture was implemented in code.</p>
<p>In summary, the following are my takeaways from the training:</p>
<ul>
<li>Hadoop is a powerful technology, and its MapReduce component is powerful but also complex; an optimal Hadoop implementation to solve a real-world problem requires a deep understanding of the various "components" of MapReduce. This training provides sufficient material to be able to start writing and debugging MapReduce programs using Hadoop. The training only helps flatten the learning curve but, like any endeavor toward mastery, only deliberate practice will make you a Hadoop expert.</li>
<li>Newer Hadoop ecosystem projects such as Pig and Hive hide some of the complexities of MapReduce from the user. The question then is this: does a user need to intimately understand MapReduce in order to solve the problem at hand (which is somewhat akin to the need to intimately understand internal combustion engines in order to drive a car)? In my opinion, the answer is, for many problems, no. For example, a candidate MapReduce problem could potentially be solved with just a few lines of a Pig script while requiring minimal to no understanding of MapReduce. So for somebody seeking to learn data analysis, for example, a more beneficial course might be the <a href="http://www.cloudera.com/content/cloudera/en/training/courses/data-analyst-training.html">Cloudera Data Analyst Training</a>, which covers Pig, Hive, and Impala in greater depth than the Developer Training for Hadoop course.</li>
<li>More recently, Apache Spark has been gaining popularity for problems requiring iterative computing (e.g., Machine Learning problems). According to the Spark project, it can run programs on a cluster up to 100x faster than Hadoop MapReduce when the data fits in memory. For someone attempting to get started in Big Data, the <a href="http://www.cloudera.com/content/cloudera/en/training/courses/spark-training.html">Cloudera Developer Training for Apache Spark</a> may be a more attractive, or at least equally good, alternative to Developer Training for Hadoop.</li>
</ul>
<h3>Conclusion</h3>
<p>Overall, I found the training useful. The classes held my interest and I gained some insights that I might not have easily discovered on my own by merely reading books or searching on the web. Another benefit of attending the classes on-site was that it provided opportunities to interact with other students in the class which helped me get a picture of the kinds of problems companies are attempting to solve using Hadoop. For its technical content, immediate use, and the high quality of the lecturer, I would happily recommend the Cloudera Developer Training for Hadoop course. But I would also urge anyone considering signing up for a training course, especially someone just getting started, to carefully evaluate various aspects of each Developer Training course, including course content, merits of the technology, and growing (or not) use of that technology in the market, and then sign up for one that (a) can help solve the problem(s) relatively quickly (not too steep learning curve), and (b) that uses a technology with demonstrated advantages over similar, perhaps competing, technologies.</p>
Installing Hadoop on Linux Mint2014-10-31T00:00:00+00:00http://www.ragupappu.com//2014/10/31/hadoop-install-linux-mint32<p>In this post I list the steps to install Hadoop on Linux Mint 17 and then to set up a single-node cluster. My system is a laptop running Windows 7 Home Premium. On this laptop Linux Mint runs as a guest in Oracle VirtualBox 4.3.16r59572, which is itself hosted on Windows 7. I am a Hadoop beginner and a relatively inexperienced Linux user, so my aim here is to document the installation and setup process in detail, to help the reader understand the purpose of and motivation for the steps involved.</p>
<p>The installation process documented here is a synthesis of steps and neat ideas from a few resources on the web that I discovered during my research. Those resources are included in the References section at the end of this post.</p>
<p>Hadoop is written in Java and needs a working Java installation (the Hadoop 1.x and 2.x release lines require Java 6 or later), so the installation process described here consists of two parts: Part I covers Java installation, and Part II covers Hadoop installation.</p>
<h3>Part I - Java Installation</h3>
<p>Log in to your Linux user account and open a terminal window (Menu → Terminal, or Menu → System Tools → Terminal). In this installation the Linux user account name is <code>user1</code>.
The first step before installing the Java Development Kit (JDK) is to determine the system architecture: is it a 32-bit (x86) system or a 64-bit (x64) system? The uname command, which prints system information, answers this.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ uname -m
i686
user1@user1-VBox ~ $ uname --kernel-name --processor
Linux i686
user1@user1-VBox ~ $
</code></pre></div>
<p>The lscpu command yields more user-friendly information.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ lscpu
Architecture: i686
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 10
CPU MHz: 2107.042
BogoMIPS: 4214.08
user1@user1-VBox ~ $
</code></pre></div>
<p>i686 refers to the Intel P6 microarchitecture. The first two rows of the lscpu output mean that the kernel is 32-bit (Architecture: i686) and that the CPU is operating in 32-bit mode (CPU op-mode(s): 32-bit). So, from the perspective of Linux Mint 17, my laptop is a 32-bit system. On a 64-bit Linux system the command returns a string containing '64', typically x86_64 for both Intel and AMD CPUs (Debian-based package tools refer to the same architecture as amd64).
<span style="text-decoration: underline;">Note</span>: <em>On my laptop the VirtualBox VM runs in 32-bit mode and so does Linux Mint 17. But Windows 7 on my laptop reports System Type as 64-bit Operating System. Why then are VirtualBox, and consequently Linux Mint 17, running in 32-bit mode? The answer is that it is hardware capability, not Windows, that determines the mode a VM runs in. For a VM to run in 64-bit mode the underlying hardware, specifically the CPU, must be 64-bit AND capable of virtualization support. Additionally, the virtualization support must be enabled. The processor in my laptop is an Intel Core2 Duo T6600 CPU which is a 64-bit CPU but does not support virtualization. Indeed, running the <a href="http://www.microsoft.com/en-us/download/details.aspx?id=592" target="_blank">Microsoft Hardware-Assisted Virtualization Detection Tool</a> on my laptop confirms this because it reports back the following: "This computer does not have hardware-assisted virtualization."</em></p>
<p>The latest version of Java (JDK), as of mid-October 2014, is JDK 8. Download the appropriate JDK binary for your system from <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">Oracle</a>. Since mine is a 32-bit system, I downloaded the file jdk-8u25-linux-i586.tar.gz and saved it in the Downloads directory (<code>/home/user1/Downloads</code>). For a 64-bit system the corresponding file to download would be jdk-8u25-linux-x64.tar.gz. The remaining installation instructions apply the same way to a 64-bit system as to a 32-bit system.
<span style="text-decoration: underline;">Note</span>: <em>This is not an RPM installation, so be careful to download only the .tar.gz file.</em></p>
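<p>The choice between the two download files can be sketched as a small shell function. This is only an illustration: the function name <code>jdk_tarball_for_arch</code> is made up for this example, and the file names are the two 8u25 tarballs mentioned above.</p>

```shell
#!/usr/bin/env bash
# Map a machine name (as printed by `uname -m`) to the matching JDK 8u25 tarball.
# jdk_tarball_for_arch is a hypothetical helper name used only for illustration.
jdk_tarball_for_arch() {
  case "$1" in
    i386|i486|i586|i686) echo "jdk-8u25-linux-i586.tar.gz" ;;  # 32-bit x86
    x86_64)              echo "jdk-8u25-linux-x64.tar.gz" ;;   # 64-bit x86
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage: print the tarball name for the running system.
#   jdk_tarball_for_arch "$(uname -m)"
```

<p>On the 32-bit VM described in this post the function would be called with i686 and would print the i586 tarball name.</p>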
<p>First, remove any existing installation of OpenJDK, IcedTea, and other Java packages unrelated to Oracle Java.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox ~ $ sudo apt-get update
user1@user1-Vbox ~ $ sudo apt-get remove 'openjdk*' 'icedtea*' # quoted so the shell passes the patterns to apt-get unexpanded
user1@user1-Vbox ~ $ sudo apt-get remove default-jre default-jdk gcj-jre gcj-jdk
user1@user1-Vbox ~ $ sudo apt-get remove icedtea-{6..7}-plugin icedtea-plugin
# clean up
user1@user1-Vbox ~ $ sudo apt-get autoremove
user1@user1-Vbox ~ $ sudo apt-get autoclean
</code></pre></div>
<p>Install Java in <code>/opt/java</code>. Create the java directory under <code>/opt</code>, change directory to <code>/opt/java</code>, copy the downloaded JDK tar file from the Downloads directory to the <code>/opt/java</code> directory, and extract the files.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox ~ $ sudo mkdir -p /opt/java
user1@user1-Vbox ~ $ cd /opt/java
user1@user1-Vbox /opt/java $ sudo cp /home/user1/Downloads/jdk-8u25-linux-i586.tar.gz .
user1@user1-Vbox /opt/java $ sudo tar -zxvf jdk-8u25-linux-i586.tar.gz
</code></pre></div>
<p>Create a sym-link called <code>current-java</code> to the just-created jdk1.8.0_25 directory. When it is time to upgrade JDK this sym-link will help simplify the upgrade process -- fewer steps will be required during upgrade compared to first-time installation.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox /opt/java $ sudo rm -f /opt/java/current-java
user1@user1-Vbox /opt/java $ sudo ln -s /opt/java/jdk1.8.0_25 /opt/java/current-java
</code></pre></div>
<p>Now current-java points to jdk1.8.0_25. When it is time to upgrade, simply delete current-java and re-create a sym-link to the newer version JDK folder.</p>
<p>Verify that the symbolic link has been created.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox /opt/java $ ls
current-java jdk1.8.0_25 jdk-8u25-linux-i586.tar.gz
</code></pre></div>
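<p>The upgrade step described above (delete the link, then re-create it) can also be done in a single command. The sketch below is an assumption-laden illustration rather than part of the original procedure: <code>repoint_current</code> is a made-up helper name and jdk1.8.0_31 is a made-up newer version.</p>

```shell
#!/usr/bin/env bash
# Repoint the current-java symlink at another JDK directory under the same root.
# repoint_current is a hypothetical helper name used only for illustration.
repoint_current() {
  local java_root="$1" new_jdk="$2"
  if [ ! -d "$java_root/$new_jdk" ]; then
    echo "no such JDK directory: $java_root/$new_jdk" >&2
    return 1
  fi
  # -s symbolic link, -f replace an existing link,
  # -n treat an existing link to a directory as a file instead of descending into it
  ln -sfn "$java_root/$new_jdk" "$java_root/current-java"
}

# Usage (as root; jdk1.8.0_31 is a made-up newer version):
#   repoint_current /opt/java jdk1.8.0_31
```

<p>Because everything else refers to <code>/opt/java/current-java</code>, nothing else needs to change when the link is repointed.</p>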
<p>Update the system Java settings. The following steps need to be executed only once, during initial installation; they are not required during an upgrade. Since several commands need to be executed, it is best to create a file that contains all of them and then run the file. Let's call this file updt_java_settings.cmds. Run these commands to create the file.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox /opt/java $ su
user1-Vbox java # cd /opt/java
user1-Vbox java # touch updt_java_settings.cmds # create the file
user1-Vbox java # chmod +x updt_java_settings.cmds # set execute permissions
</code></pre></div>
<p>The text below is the set of commands to update Java system settings. Copy the full text below to the clipboard.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">update-alternatives --install /usr/bin/jexec jexec /opt/java/current-java/lib/jexec 1065 &amp;&amp; sleep 1s
update-alternatives --set jexec /opt/java/current-java/lib/jexec &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/appletviewer appletviewer /opt/java/current-java/bin/appletviewer 1065 &amp;&amp; sleep 1s
update-alternatives --set appletviewer /opt/java/current-java/bin/appletviewer &amp;&amp; sleep 1s
# apt is skipped: JDK 8 no longer ships the apt annotation tool, and /usr/bin/apt belongs to the system package manager
# the file /opt/java/current-java/bin/ControlPanel was ignored
update-alternatives --install /usr/bin/extcheck extcheck /opt/java/current-java/bin/extcheck 1065 &amp;&amp; sleep 1s
update-alternatives --set extcheck /opt/java/current-java/bin/extcheck &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/idlj idlj /opt/java/current-java/bin/idlj 1065 &amp;&amp; sleep 1s
update-alternatives --set idlj /opt/java/current-java/bin/idlj &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jar jar /opt/java/current-java/bin/jar 1065 &amp;&amp; sleep 1s
update-alternatives --set jar /opt/java/current-java/bin/jar &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jarsigner jarsigner /opt/java/current-java/bin/jarsigner 1065 &amp;&amp; sleep 1s
update-alternatives --set jarsigner /opt/java/current-java/bin/jarsigner &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/java java /opt/java/current-java/bin/java 1065 &amp;&amp; sleep 1s
update-alternatives --set java /opt/java/current-java/bin/java &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javac javac /opt/java/current-java/bin/javac 1065 &amp;&amp; sleep 1s
update-alternatives --set javac /opt/java/current-java/bin/javac &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javadoc javadoc /opt/java/current-java/bin/javadoc 1065 &amp;&amp; sleep 1s
update-alternatives --set javadoc /opt/java/current-java/bin/javadoc &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javafxpackager javafxpackager /opt/java/current-java/bin/javafxpackager 1065 &amp;&amp; sleep 1s
update-alternatives --set javafxpackager /opt/java/current-java/bin/javafxpackager &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javah javah /opt/java/current-java/bin/javah 1065 &amp;&amp; sleep 1s
update-alternatives --set javah /opt/java/current-java/bin/javah &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javap javap /opt/java/current-java/bin/javap 1065 &amp;&amp; sleep 1s
update-alternatives --set javap /opt/java/current-java/bin/javap &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/java-rmi.cgi java-rmi.cgi /opt/java/current-java/bin/java-rmi.cgi 1065 &amp;&amp; sleep 1s
update-alternatives --set java-rmi.cgi /opt/java/current-java/bin/java-rmi.cgi &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/javaws javaws /opt/java/current-java/bin/javaws 1065 &amp;&amp; sleep 1s
update-alternatives --set javaws /opt/java/current-java/bin/javaws &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jcmd jcmd /opt/java/current-java/bin/jcmd 1065 &amp;&amp; sleep 1s
update-alternatives --set jcmd /opt/java/current-java/bin/jcmd &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jconsole jconsole /opt/java/current-java/bin/jconsole 1065 &amp;&amp; sleep 1s
update-alternatives --set jconsole /opt/java/current-java/bin/jconsole &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jcontrol jcontrol /opt/java/current-java/bin/jcontrol 1065 &amp;&amp; sleep 1s
update-alternatives --set jcontrol /opt/java/current-java/bin/jcontrol &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jdb jdb /opt/java/current-java/bin/jdb 1065 &amp;&amp; sleep 1s
update-alternatives --set jdb /opt/java/current-java/bin/jdb &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jhat jhat /opt/java/current-java/bin/jhat 1065 &amp;&amp; sleep 1s
update-alternatives --set jhat /opt/java/current-java/bin/jhat &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jinfo jinfo /opt/java/current-java/bin/jinfo 1065 &amp;&amp; sleep 1s
update-alternatives --set jinfo /opt/java/current-java/bin/jinfo &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jmap jmap /opt/java/current-java/bin/jmap 1065 &amp;&amp; sleep 1s
update-alternatives --set jmap /opt/java/current-java/bin/jmap &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jmc jmc /opt/java/current-java/bin/jmc 1065 &amp;&amp; sleep 1s
update-alternatives --set jmc /opt/java/current-java/bin/jmc &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jmc.ini jmc.ini /opt/java/current-java/bin/jmc.ini 1065 &amp;&amp; sleep 1s
update-alternatives --set jmc.ini /opt/java/current-java/bin/jmc.ini &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jps jps /opt/java/current-java/bin/jps 1065 &amp;&amp; sleep 1s
update-alternatives --set jps /opt/java/current-java/bin/jps &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jrunscript jrunscript /opt/java/current-java/bin/jrunscript 1065 &amp;&amp; sleep 1s
update-alternatives --set jrunscript /opt/java/current-java/bin/jrunscript &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jsadebugd jsadebugd /opt/java/current-java/bin/jsadebugd 1065 &amp;&amp; sleep 1s
update-alternatives --set jsadebugd /opt/java/current-java/bin/jsadebugd &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jstack jstack /opt/java/current-java/bin/jstack 1065 &amp;&amp; sleep 1s
update-alternatives --set jstack /opt/java/current-java/bin/jstack &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jstat jstat /opt/java/current-java/bin/jstat 1065 &amp;&amp; sleep 1s
update-alternatives --set jstat /opt/java/current-java/bin/jstat &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jstatd jstatd /opt/java/current-java/bin/jstatd 1065 &amp;&amp; sleep 1s
update-alternatives --set jstatd /opt/java/current-java/bin/jstatd &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/jvisualvm jvisualvm /opt/java/current-java/bin/jvisualvm 1065 &amp;&amp; sleep 1s
update-alternatives --set jvisualvm /opt/java/current-java/bin/jvisualvm &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/keytool keytool /opt/java/current-java/bin/keytool 1065 &amp;&amp; sleep 1s
update-alternatives --set keytool /opt/java/current-java/bin/keytool &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/native2ascii native2ascii /opt/java/current-java/bin/native2ascii 1065 &amp;&amp; sleep 1s
update-alternatives --set native2ascii /opt/java/current-java/bin/native2ascii &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/orbd orbd /opt/java/current-java/bin/orbd 1065 &amp;&amp; sleep 1s
update-alternatives --set orbd /opt/java/current-java/bin/orbd &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/pack200 pack200 /opt/java/current-java/bin/pack200 1065 &amp;&amp; sleep 1s
update-alternatives --set pack200 /opt/java/current-java/bin/pack200 &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/policytool policytool /opt/java/current-java/bin/policytool 1065 &amp;&amp; sleep 1s
update-alternatives --set policytool /opt/java/current-java/bin/policytool &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/rmic rmic /opt/java/current-java/bin/rmic 1065 &amp;&amp; sleep 1s
update-alternatives --set rmic /opt/java/current-java/bin/rmic &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/rmid rmid /opt/java/current-java/bin/rmid 1065 &amp;&amp; sleep 1s
update-alternatives --set rmid /opt/java/current-java/bin/rmid &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/rmiregistry rmiregistry /opt/java/current-java/bin/rmiregistry 1065 &amp;&amp; sleep 1s
update-alternatives --set rmiregistry /opt/java/current-java/bin/rmiregistry &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/schemagen schemagen /opt/java/current-java/bin/schemagen 1065 &amp;&amp; sleep 1s
update-alternatives --set schemagen /opt/java/current-java/bin/schemagen &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/serialver serialver /opt/java/current-java/bin/serialver 1065 &amp;&amp; sleep 1s
update-alternatives --set serialver /opt/java/current-java/bin/serialver &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/servertool servertool /opt/java/current-java/bin/servertool 1065 &amp;&amp; sleep 1s
update-alternatives --set servertool /opt/java/current-java/bin/servertool &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/tnameserv tnameserv /opt/java/current-java/bin/tnameserv 1065 &amp;&amp; sleep 1s
update-alternatives --set tnameserv /opt/java/current-java/bin/tnameserv &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/unpack200 unpack200 /opt/java/current-java/bin/unpack200 1065 &amp;&amp; sleep 1s
update-alternatives --set unpack200 /opt/java/current-java/bin/unpack200 &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/wsgen wsgen /opt/java/current-java/bin/wsgen 1065 &amp;&amp; sleep 1s
update-alternatives --set wsgen /opt/java/current-java/bin/wsgen &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/wsimport wsimport /opt/java/current-java/bin/wsimport 1065 &amp;&amp; sleep 1s
update-alternatives --set wsimport /opt/java/current-java/bin/wsimport &amp;&amp; sleep 1s
update-alternatives --install /usr/bin/xjc xjc /opt/java/current-java/bin/xjc 1065 &amp;&amp; sleep 1s
update-alternatives --set xjc /opt/java/current-java/bin/xjc &amp;&amp; sleep 1s
# system config for /opt/java/current-java/jre/bin/
# the file /opt/java/current-java/bin/ControlPanel was ignored
update-alternatives --install /usr/bin/java_vm java_vm /opt/java/current-java/jre/bin/java_vm 1065 &amp;&amp; sleep 1s
update-alternatives --set java_vm /opt/java/current-java/jre/bin/java_vm &amp;&amp; sleep 1s
</code></pre></div>
<p>Open updt_java_settings.cmds in your favorite editor and paste in the clipboard contents. Save the file, exit the editor, and execute all the commands.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1-Vbox java # /bin/bash updt_java_settings.cmds
</code></pre></div>
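<p>As an alternative to maintaining the long list by hand, the same kind of command pairs could be generated with a loop. The following is a minimal sketch, not the method used in this post: <code>emit_alternatives</code> is a made-up helper name, the priority 1065 is simply copied from the list above, and the function only prints commands so that the output can be reviewed (and unwanted entries removed) before anything is run.</p>

```shell
#!/usr/bin/env bash
# Print one --install/--set pair for every executable file in a JDK bin directory.
# emit_alternatives is a hypothetical helper name used only for illustration.
emit_alternatives() {
  local bin_dir="$1" priority="${2:-1065}" path name
  for path in "$bin_dir"/*; do
    # skip directories and files without an execute bit
    if [ ! -f "$path" ] || [ ! -x "$path" ]; then
      continue
    fi
    name="$(basename "$path")"
    echo "update-alternatives --install /usr/bin/$name $name $path $priority"
    echo "update-alternatives --set $name $path"
  done
}

# Usage: write the generated commands to a file, inspect it, then run it as root.
#   emit_alternatives /opt/java/current-java/bin > generated.cmds
```

<p>The generated list will not be identical to the hand-written one (for example, it picks up every executable in the directory), which is why reviewing the output first is part of the sketch.</p>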
<p>The Java installation is now complete. Verify that Java has installed properly.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1-Vbox java # java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) Client VM (build 25.25-b02, mixed mode)
</code></pre></div>
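<p>Besides the version string, it can be useful to confirm that the <code>java</code> command on the PATH really resolves, through the alternatives links and the current-java link, to the JDK under <code>/opt/java</code>. A minimal sketch follows; <code>real_path_of</code> is a made-up helper name, and <code>readlink -f</code> is assumed to be the GNU coreutils version found on Linux Mint.</p>

```shell
#!/usr/bin/env bash
# Print the real file behind a command after following every symlink in the chain.
# real_path_of is a hypothetical helper name used only for illustration.
real_path_of() {
  local cmd_path
  cmd_path="$(command -v "$1")" || return 1   # fail if the command is not found
  readlink -f "$cmd_path"
}

# Usage:
#   real_path_of java
```

<p>With the setup in this post, the printed path would be expected to lie under <code>/opt/java/jdk1.8.0_25</code>.</p>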
<p>Set up environment variables for use with the new Java installation. These variables can be set for each user or for the whole system: for an individual user make the changes in <code>~/.bashrc</code>, and for the whole system in <code>/etc/bash.bashrc</code>. Copy the text below into the appropriate file.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">### $CLASSPATH (the variable name that Java actually reads)
CLASSPATH="$CLASSPATH"
# add the current directory unless it is already an entry
if ! [[ "$CLASSPATH" =~ ^:*(.*:+)*\.{1}(:+.*)*$ ]]; then
  CLASSPATH="$CLASSPATH:."
fi
# add the parent directory unless it is already an entry
if ! [[ "$CLASSPATH" =~ ^:*(.*:+)*\.{2}(:+.*)*$ ]]; then
  CLASSPATH="$CLASSPATH:.."
fi
export CLASSPATH
### $JAVA_HOME
: ${JAVA_HOME:=/opt/java/current-java}
export JAVA_HOME
### $PATH
export PATH="$PATH:$JAVA_HOME/bin"
</code></pre></div>
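<p>One side effect of the last line in the block above is that the JDK bin directory is appended to PATH again every time the file is sourced. A guarded append avoids the duplicates. This is a sketch only, and <code>path_append</code> is a made-up helper name.</p>

```shell
#!/usr/bin/env bash
# Append a directory to PATH only when it is not already one of its entries.
# path_append is a hypothetical helper name used only for illustration.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;            # already present, nothing to do
    *) PATH="$PATH:$1" ;;   # not present, append it
  esac
}

# Usage in ~/.bashrc, after JAVA_HOME has been set:
#   path_append "$JAVA_HOME/bin"
#   export PATH
```

<p>Wrapping PATH in colons before matching means that an entry is recognized whether it sits at the start, in the middle, or at the end of the list.</p>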
<p>The above environment variables take effect in any new shell session, or immediately in the current one after sourcing the modified file. The remaining steps in this section, which create sym-links to the Java manpages, are optional.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-Vbox /opt/java $ su
user1-Vbox java # cd /opt/java
user1-Vbox java # touch manpagelinker.cmds # create the file
user1-Vbox java # chmod +x manpagelinker.cmds # set execute permissions
</code></pre></div>
<p>Open the manpagelinker.cmds file in your text editor, copy-paste the text below into it, and save the file.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">ln -s /opt/java/current-java/man/man1/appletviewer.1 /usr/share/man/man1/appletviewer.1
# apt.1 is skipped: JDK 8 no longer ships the apt annotation tool
ln -s /opt/java/current-java/man/man1/extcheck.1 /usr/share/man/man1/extcheck.1
ln -s /opt/java/current-java/man/man1/idlj.1 /usr/share/man/man1/idlj.1
ln -s /opt/java/current-java/man/man1/jar.1 /usr/share/man/man1/jar.1
ln -s /opt/java/current-java/man/man1/jarsigner.1 /usr/share/man/man1/jarsigner.1
ln -s /opt/java/current-java/man/man1/java.1 /usr/share/man/man1/java.1
ln -s /opt/java/current-java/man/man1/javac.1 /usr/share/man/man1/javac.1
ln -s /opt/java/current-java/man/man1/javadoc.1 /usr/share/man/man1/javadoc.1
ln -s /opt/java/current-java/man/man1/javafxpackager.1 /usr/share/man/man1/javafxpackager.1
ln -s /opt/java/current-java/man/man1/javah.1 /usr/share/man/man1/javah.1
ln -s /opt/java/current-java/man/man1/javap.1 /usr/share/man/man1/javap.1
ln -s /opt/java/current-java/man/man1/javaws.1 /usr/share/man/man1/javaws.1
ln -s /opt/java/current-java/man/man1/jcmd.1 /usr/share/man/man1/jcmd.1
ln -s /opt/java/current-java/man/man1/jconsole.1 /usr/share/man/man1/jconsole.1
ln -s /opt/java/current-java/man/man1/jdb.1 /usr/share/man/man1/jdb.1
ln -s /opt/java/current-java/man/man1/jhat.1 /usr/share/man/man1/jhat.1
ln -s /opt/java/current-java/man/man1/jinfo.1 /usr/share/man/man1/jinfo.1
ln -s /opt/java/current-java/man/man1/jmap.1 /usr/share/man/man1/jmap.1
ln -s /opt/java/current-java/man/man1/jmc.1 /usr/share/man/man1/jmc.1
ln -s /opt/java/current-java/man/man1/jps.1 /usr/share/man/man1/jps.1
ln -s /opt/java/current-java/man/man1/jrunscript.1 /usr/share/man/man1/jrunscript.1
ln -s /opt/java/current-java/man/man1/jsadebugd.1 /usr/share/man/man1/jsadebugd.1
ln -s /opt/java/current-java/man/man1/jstack.1 /usr/share/man/man1/jstack.1
ln -s /opt/java/current-java/man/man1/jstat.1 /usr/share/man/man1/jstat.1
ln -s /opt/java/current-java/man/man1/jstatd.1 /usr/share/man/man1/jstatd.1
ln -s /opt/java/current-java/man/man1/jvisualvm.1 /usr/share/man/man1/jvisualvm.1
ln -s /opt/java/current-java/man/man1/keytool.1 /usr/share/man/man1/keytool.1
ln -s /opt/java/current-java/man/man1/native2ascii.1 /usr/share/man/man1/native2ascii.1
ln -s /opt/java/current-java/man/man1/orbd.1 /usr/share/man/man1/orbd.1
ln -s /opt/java/current-java/man/man1/pack200.1 /usr/share/man/man1/pack200.1
ln -s /opt/java/current-java/man/man1/policytool.1 /usr/share/man/man1/policytool.1
ln -s /opt/java/current-java/man/man1/rmic.1 /usr/share/man/man1/rmic.1
ln -s /opt/java/current-java/man/man1/rmid.1 /usr/share/man/man1/rmid.1
ln -s /opt/java/current-java/man/man1/rmiregistry.1 /usr/share/man/man1/rmiregistry.1
ln -s /opt/java/current-java/man/man1/schemagen.1 /usr/share/man/man1/schemagen.1
ln -s /opt/java/current-java/man/man1/serialver.1 /usr/share/man/man1/serialver.1
ln -s /opt/java/current-java/man/man1/servertool.1 /usr/share/man/man1/servertool.1
ln -s /opt/java/current-java/man/man1/tnameserv.1 /usr/share/man/man1/tnameserv.1
ln -s /opt/java/current-java/man/man1/unpack200.1 /usr/share/man/man1/unpack200.1
ln -s /opt/java/current-java/man/man1/wsgen.1 /usr/share/man/man1/wsgen.1
ln -s /opt/java/current-java/man/man1/wsimport.1 /usr/share/man/man1/wsimport.1
ln -s /opt/java/current-java/man/man1/xjc.1 /usr/share/man/man1/xjc.1
</code></pre></div>
<p>Now create sym-links to the manpages.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1-Vbox java # /bin/bash manpagelinker.cmds
</code></pre></div>
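<p>The manpage links could likewise be created with a loop over the JDK's man1 directory instead of one line per page. The sketch below is an illustration under assumptions: <code>link_manpages</code> is a made-up helper name, and it uses <code>ln -sf</code>, which replaces an existing file of the same name in the target directory, so it should only be pointed at a directory where that is acceptable.</p>

```shell
#!/usr/bin/env bash
# Link every section-1 man page from a source directory into a target directory.
# link_manpages is a hypothetical helper name used only for illustration.
link_manpages() {
  local src_dir="$1" dest_dir="$2" page
  for page in "$src_dir"/*.1; do
    # when the glob matches nothing it stays literal, so skip non-existent names
    [ -e "$page" ] || continue
    # -f replaces an existing link, which makes the function safe to re-run
    ln -sf "$page" "$dest_dir/$(basename "$page")"
  done
}

# Usage (as root):
#   link_manpages /opt/java/current-java/man/man1 /usr/share/man/man1
```

<p>Because the links go through current-java, they keep working after the JDK is upgraded and the current-java link is repointed.</p>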
<p>Java installation is now complete. Next we start with Part II, Hadoop Installation.</p>
<h3>Part II - Hadoop Installation</h3>
<p>This part of the installation assumes that Java has been successfully installed and verified.</p>
<p>Create a new user group for Hadoop and create a new user account for Hadoop programming. This provides the benefit of keeping the Hadoop installation and user account(s) separate from other software installations running on the system. For example, security, permissions, data storage, etc. can be managed independently for Hadoop-related activities.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ cd ~
user1@user1-VBox ~ $ sudo addgroup hadoop
user1@user1-VBox ~ $ sudo adduser --ingroup hadoop hduser1
</code></pre></div>
<h4>Install and Configure SSH</h4>
<p>Hadoop requires communication between multiple processes running on one or more machines, which in turn means that the Hadoop user must be able to connect to the various hosts without entering a password. This is achieved with Secure Shell (SSH). We use OpenSSH, a free implementation of the SSH connectivity tools. The setup requires three steps:
i. install OpenSSH,
ii. generate an SSH key pair with an empty passphrase, and
iii. enable access to your local machine with the newly created keys.</p>
<p>In the main user account (<code>user1</code> in our setup) install OpenSSH.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ sudo apt-get install openssh-server
</code></pre></div>
<p>Generate the SSH key pair.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ su hduser1
hduser1@user1-VBox /home/user1 $ ssh-keygen -t rsa -P ""
</code></pre></div>
<p>Simply hit Enter to accept the default filename in which to save the key. This will create directory <code>/home/hduser1/.ssh</code> and save the private key in file <code>id_rsa</code> and the public key in file <code>id_rsa.pub</code> in that directory. Next, add this public key to the list of authorized keys to enable access to the local machine.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox /home/user1 $ cat $HOME/.ssh/id_rsa.pub &gt;&gt; $HOME/.ssh/authorized_keys
</code></pre></div>
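<p>With its default <code>StrictModes</code> setting, the OpenSSH server ignores an <code>authorized_keys</code> file whose permissions are too open, and a file created with <code>cat &gt;&gt;</code> inherits whatever the account's umask allows. The following is a minimal sketch, assuming a hypothetical helper named <code>tighten_ssh_perms</code> (it is not part of the original walkthrough), that restricts the directory and the key list to their owner.</p>

```shell
# Hypothetical helper (not part of the original walkthrough): restrict the
# SSH directory and authorized_keys to their owner so that sshd's
# StrictModes check accepts the keys.
tighten_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    if [ ! -d "$ssh_dir" ]; then
        echo "no such directory: $ssh_dir" >&2
        return 1
    fi
    chmod 700 "$ssh_dir"
    # authorized_keys may not exist yet; only change it when present
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}
```

<p>Run as <code>hduser1</code>, calling <code>tighten_ssh_perms</code> with no argument applies the change to <code>$HOME/.ssh</code>.</p>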
<p>Test the SSH setup by connecting to the local machine from <code>hduser1</code> with the command below. This also saves the host key of the local machine to the list of hosts known to user <code>hduser1</code>. Specifically, the local machine's host key is added to <code>hduser1</code>'s <code>known_hosts</code> file.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox /home/user1 $ ssh localhost
</code></pre></div>
<p>Enter <code>yes</code> to the question “Are you sure you want to continue connecting (yes/no)?” This step confirms to user <code>hduser1</code> that the host key presented by the local machine can be trusted. This completes SSH installation and configuration. Type <code>exit</code> (or press <code>Ctrl-d</code>) to close the connection to the local host.</p>
<h4>Install Hadoop</h4>
<p>Log in to the main user account (<code>user1</code> in our case). Download the latest Hadoop binaries from <a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/">Apache</a>. Click on the suggested mirror, then click on the 'stable' directory, and finally on the link to the hadoop-x.y.z.tar.gz file, where x.y.z represents the Hadoop version number. As of mid-October 2014 the stable Hadoop version is 2.5.1. Save the file into the local 'Downloads' directory, then unpack it.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ cd Downloads/
user1@user1-VBox ~/Downloads $ sudo tar zxvf hadoop-2.5.1.tar.gz
</code></pre></div>
<p>Relocate the Hadoop files to /usr/local, create a sym-link, and change ownership of the Hadoop directories and files to the hadoop group and <code>hduser1</code>.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~/Downloads $ sudo mv hadoop-2.5.1 /usr/local
user1@user1-VBox ~/Downloads $ cd /usr/local/
user1@user1-VBox /usr/local $ sudo ln -s hadoop-2.5.1 hadoop
user1@user1-VBox /usr/local $ sudo chown -R hduser1:hadoop hadoop
user1@user1-VBox /usr/local $ sudo chown -R hduser1:hadoop hadoop-2.5.1
</code></pre></div>
<h4>Configure the Environment</h4>
<p>In a freshly created user account on Linux Mint there is no <code>.bashrc</code> file. It needs to be created and then edited to add commands to configure the environment.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ su hduser1
hduser1@user1-VBox /home/user1 $ cd ~
hduser1@user1-VBox ~ $ touch .bashrc
</code></pre></div>
<p>Open <code>.bashrc</code> in your text editor and add the following lines.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"># Run the .bashrc file in /etc first if it exists
if [ -f /etc/bash.bashrc ]; then
. /etc/bash.bashrc
fi
# Hadoop variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
</code></pre></div>
<p><span style="text-decoration: underline;">Notes</span>:</p>
<ol>
<li><em>Environment variable JAVA_HOME is not defined or exported here because those actions happen at the system level via <code>/etc/bash.bashrc</code>.</em></li>
<li><p><em>Setting and exporting env variable $HADOOP_COMMON_LIB_NATIVE_DIR fixes the warning shown below when Hadoop is started (via start-dfs.sh) or stopped (via stop-dfs.sh).</em></p>
<p><em>Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop-2.5.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c &lt;libfile&gt;', or link it with '-z noexecstack'.</em></p></li>
</ol>
<p>Verify that Hadoop is installed properly.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox ~ $ source .bashrc
hduser1@user1-VBox ~ $ hadoop version
Hadoop 2.5.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 2e18d179e4a8065b6a9f29cf2de9451891265cce
Compiled by jenkins on 2014-09-05T23:11Z
Compiled with protoc 2.5.0
From source with checksum 6424fcab95bfff8337780a181ad7c78
This command was run using /usr/local/hadoop-2.5.1/share/hadoop/common/hadoop-common-2.5.1.jar
hduser1@user1-VBox ~ $
</code></pre></div>
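<p>Beyond <code>hadoop version</code>, it can help to confirm that every variable exported in <code>.bashrc</code> actually reached the current shell. The following is a minimal sketch, assuming a hypothetical helper named <code>missing_hadoop_vars</code>; the variable list simply mirrors the exports shown earlier.</p>

```shell
# Hypothetical helper: print the name of every listed variable that is
# unset or empty; the exit status is non-zero when anything is missing.
missing_hadoop_vars() {
    local name value missing=0
    for name in HADOOP_HOME HADOOP_COMMON_LIB_NATIVE_DIR HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
        # Look up the variable whose name is stored in $name
        # (the names come from the fixed list above, so eval is safe here)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

<p>For example, <code>missing_hadoop_vars &amp;&amp; echo "environment looks complete"</code> prints the confirmation only when nothing is missing.</p>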
<h4>Configure Hadoop</h4>
<p>Configure the directories Hadoop uses to store its data. By default, Hadoop uses the directory named by <code>hadoop.tmp.dir</code> as the temporary storage location for both the local filesystem and HDFS. Create this temporary storage area, and also create the directories in which HDFS stores namenode and datanode data. But first get back to the main user account (i.e., return to <code>user1</code>, since <code>hduser1</code> does not have the required permissions for the next steps).</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox ~ $ exit
user1@user1-VBox ~ $ sudo mkdir -p /app/hadoop/tmp
user1@user1-VBox ~ $ sudo mkdir -p /home/hduser1/data/hdfs/namenode
user1@user1-VBox ~ $ sudo mkdir -p /home/hduser1/data/hdfs/datanode
</code></pre></div>
<p>Change ownership of the directories to <code>hduser1</code> and the hadoop group, and change permissions to tighten security.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ sudo chown hduser1:hadoop /app/hadoop/tmp
user1@user1-VBox ~ $ sudo chown -R hduser1:hadoop /home/hduser1/data
user1@user1-VBox ~ $ sudo chmod 750 /app/hadoop/tmp
user1@user1-VBox ~ $ sudo chmod -R 750 /home/hduser1/data
</code></pre></div>
<p>Now configure these four .xml files: <code>core-site.xml</code>, <code>yarn-site.xml</code>, <code>mapred-site.xml</code>, and <code>hdfs-site.xml</code>. By default, file <code>mapred-site.xml</code> is not present. It can be created from <code>mapred-site.xml.template</code>.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ cd /usr/local/hadoop/etc/hadoop/
user1@user1-VBox /usr/local/hadoop/etc/hadoop $ sudo cp mapred-site.xml.template mapred-site.xml
</code></pre></div>
<p>Open each .xml file using your favorite editor, copy the text below for the corresponding file, paste it between the <code>&lt;configuration&gt;</code> and <code>&lt;/configuration&gt;</code> tags, and save and exit the file.</p>
<p><strong>core-site.xml</strong>
<pre><strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>hadoop.tmp.dir<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>/app/hadoop/tmp<strong>&lt;/value&gt;</strong>
  <strong>&lt;description&gt;</strong>A base for other temporary directories.<strong>&lt;/description&gt;</strong>
<strong>&lt;/property&gt;</strong>
<strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>fs.default.name<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>hdfs://localhost:54310<strong>&lt;/value&gt;</strong>
  <strong>&lt;description&gt;</strong>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  URI's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The URI's authority is used to
  determine the host, port, etc. for a filesystem.<strong>&lt;/description&gt;</strong>
<strong>&lt;/property&gt;</strong>
</pre>
<strong>yarn-site.xml</strong>
<pre><strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>yarn.nodemanager.aux-services<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>mapreduce_shuffle<strong>&lt;/value&gt;</strong>
<strong>&lt;/property&gt;</strong>
<strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>yarn.nodemanager.aux-services.mapreduce_shuffle.class<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>org.apache.hadoop.mapred.ShuffleHandler<strong>&lt;/value&gt;</strong>
<strong>&lt;/property&gt;</strong>
</pre>
<strong>mapred-site.xml</strong>
<pre><strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>mapreduce.framework.name<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>yarn<strong>&lt;/value&gt;</strong>
<strong>&lt;/property&gt;</strong>
</pre>
<strong>hdfs-site.xml</strong>
<pre><strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>dfs.replication<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>1<strong>&lt;/value&gt;</strong>
  <strong>&lt;description&gt;</strong>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.<strong>&lt;/description&gt;</strong>
<strong>&lt;/property&gt;</strong>
<strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>dfs.namenode.name.dir<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>file:///home/hduser1/data/hdfs/namenode<strong>&lt;/value&gt;</strong>
<strong>&lt;/property&gt;</strong>
<strong>&lt;property&gt;</strong>
  <strong>&lt;name&gt;</strong>dfs.datanode.data.dir<strong>&lt;/name&gt;</strong>
  <strong>&lt;value&gt;</strong>file:///home/hduser1/data/hdfs/datanode<strong>&lt;/value&gt;</strong>
<strong>&lt;/property&gt;</strong>
</pre>
<span style="text-decoration: underline;">Note</span>: <em>In the text above, the values of the namenode and datanode storage directories must be proper URIs, and therefore three forward slash ('/') characters are required, not just two (i.e., the third forward slash is not a typo).</em></p>
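<p>A mistyped property name does not produce an error; Hadoop simply falls back to its default for the intended setting. As a quick sanity check after editing, the sketch below assumes a hypothetical helper named <code>has_property</code> that looks for a property's <code>name</code> element in a configuration file. It is a plain-text match, not an XML parse, so treat it as a rough check only.</p>

```shell
# Hypothetical helper: succeed when FILE contains a <name> element for PROPERTY.
# This is a fixed-string text match, so it does not understand XML comments
# or unusual whitespace inside the element.
has_property() {
    local file="$1" property="$2"
    grep -qF -- "<name>${property}</name>" "$file"
}
```

<p>For example, <code>has_property /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication &amp;&amp; echo found</code> prints <code>found</code> when the property was pasted into that file.</p>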
<h4>Format HDFS</h4>
<p>This step is similar to formatting any new storage (e.g., a USB drive) prior to use. But first switch to the Hadoop user account (<code>hduser1</code>).</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">user1@user1-VBox ~ $ su hduser1
hduser1@user1-VBox /home/user1 $ cd ~
hduser1@user1-VBox ~ $ /usr/local/hadoop/bin/hdfs namenode -format
</code></pre></div>
<p>If the formatting goes through properly, the output will look something like this:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox $ /usr/local/hadoop/bin/hdfs namenode -format
14/10/17 12:23:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = user1-VBox/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.5.1
STARTUP_MSG: classpath = /usr/local/hadoop-
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 2e18d179e4a8065b6a9f29cf2de9451891265cce; compiled by 'jenkins' on 2014-09-05T23:11Z
STARTUP_MSG: java = 1.8.0_25
************************************************************/
14/10/17 12:23:50 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/10/17 12:23:50 INFO namenode.NameNode: createNameNode [-format]
14/10/17 12:23:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-96f0c84f-4438-4118-b039-ff311731f47e
14/10/17 12:23:52 INFO namenode.FSNamesystem: fsLock is fair:true
14/10/17 12:23:54 INFO util.GSet: Computing capacity for map cachedBlocks
14/10/17 12:23:54 INFO util.GSet: VM type = 32-bit
14/10/17 12:23:54 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
14/10/17 12:23:54 INFO util.GSet: capacity = 2^19 = 524288 entries
14/10/17 12:23:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/10/17 12:23:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/10/17 12:23:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
14/10/17 12:23:54 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/10/17 12:23:54 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/10/17 12:23:54 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/10/17 12:23:54 INFO util.GSet: VM type = 32-bit
14/10/17 12:23:54 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
14/10/17 12:23:54 INFO util.GSet: capacity = 2^16 = 65536 entries
14/10/17 12:23:54 INFO namenode.NNConf: ACLs enabled? false
14/10/17 12:23:54 INFO namenode.NNConf: XAttrs enabled? true
14/10/17 12:23:54 INFO namenode.NNConf: Maximum size of an xattr: 16384
Re-format filesystem in Storage Directory /home/hduser1/data/hdfs/namenode ? (Y or N) Y
14/10/17 12:24:01 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1140521248-127.0.1.1-1413573841475
14/10/17 12:24:01 INFO common.Storage: Storage directory /home/hduser1/data/hdfs/namenode has been successfully formatted.
14/10/17 12:24:02 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid &gt;= 0
14/10/17 12:24:02 INFO util.ExitUtil: Exiting with status 0
14/10/17 12:24:02 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at user1-VBox/127.0.1.1
************************************************************/
hduser1@user1-VBox $
</code></pre></div>
<p>Start HDFS and YARN.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox ~ $ start-dfs.sh
hduser1@user1-VBox ~ $ start-yarn.sh
</code></pre></div>
<p>Verify that the expected Hadoop processes are running properly using the <code>jps</code> command.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox ~ $ jps
5076 NodeManager
5382 Jps
4662 DataNode
4569 NameNode
4987 ResourceManager
4813 SecondaryNameNode
hduser1@user1-VBox ~ $
</code></pre></div>
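<p>Comparing the <code>jps</code> listing against the five expected daemons by eye is easy to get wrong, especially since <code>NameNode</code> is a substring of <code>SecondaryNameNode</code>. The following is a minimal sketch, assuming a hypothetical helper named <code>missing_daemons</code> that reads <code>jps</code> output on standard input and reports any daemon that is absent.</p>

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is absent; the exit status is non-zero when any is missing.
missing_daemons() {
    local listing daemon missing=0
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # SecondaryNameNode is not mistaken for NameNode
        if ! printf '%s\n' "$listing" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

<p>Typical use is <code>jps | missing_daemons</code>, which prints nothing when all five daemons are up.</p>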
<p>Validate the installation by running an example MapReduce job to compute the value of pi.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">hduser1@user1-VBox ~ $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar pi 4 1000
</code></pre></div>
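<p>The two arguments to the <code>pi</code> example are the number of map tasks and the number of samples per map, so the command above draws 4 × 1000 = 4000 points in total. The idea behind the estimate can be sketched locally: scatter points over the unit square and count the fraction that lands inside the quarter circle, which approaches pi/4. Note that the Hadoop example uses a quasi-Monte Carlo point sequence; the sketch below swaps in awk's ordinary pseudo-random generator, and the helper name <code>estimate_pi</code> is hypothetical.</p>

```shell
# Hypothetical helper: estimate pi from MAPS * SAMPLES pseudo-random points
# in the unit square, counting those inside the quarter circle of radius 1.
# This is plain Monte Carlo, not the quasi-Monte Carlo method Hadoop uses.
estimate_pi() {
    local maps="$1" samples="$2"
    awk -v n="$((maps * samples))" 'BEGIN {
        srand(42)                      # fixed seed so repeated runs agree
        inside = 0
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        printf "%.4f\n", 4 * inside / n
    }'
}
```

<p>Calling <code>estimate_pi 4 1000</code> uses the same total number of samples as the Hadoop job; with only 4000 points the result is a rough approximation, and larger arguments tighten it.</p>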
<h4>References</h4>
<p>[1] <a href="http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/">Running Hadoop on Ubuntu Linux (Single-Node Cluster)</a> by Michael G. Noll</p>
<p>[2] <a href="http://parambirs.wordpress.com/2014/05/20/install-hadoopyarn-2-4-0-on-ubuntu-virtualbox/">Install Hadoop/YARN 2.4.0 on Ubuntu (Virtual Box)</a> by Param Gyaan</p>
<p>[3] <a href="http://community.linuxmint.com/tutorial/view/1839">How to Completely Install Oracle JDK on Linux Mint 17</a></p>