Mérida, Yucatán

Monumento a la Patria at the End of Paseo Montejo

Merida is a beautiful unique city with the styling and diversity that an average American tourist would expect out of cities in western Europe. One of my most memorable experiences during my weeks bushwhacking in the Yucatan was cruising along the beautiful wide tree-lined streets of Paseo de Montejo during the Sunday Bici-Ruta. The city closes a series of streets to all automobile traffic every Sunday. This strip was modeled after the Champs de Elisye and showcases the mansions of the 1800’s.

The long car-less Paseo Montejo. What a dream if this could be everyday.

Sutton’s Temporal-Difference Learning

A critical aspect of research is the reproduction of previously published results. Yet most will find reproduction of research challenging since important parameters needed to reproduce results are often not stated in the papers. I’ve noticed nn the past 5 years there has been a sort of catharsis regarding the lack of reproducibility [1][2][3]. This isn’t an issue for wetlab science alone [4]. The obvious benefit of reproduction is to aid in your own understanding of the results. This then enables one to extend and compare new contributions to existing publications.

As a grad student working on Reinforcement Learning problems I struggled with the intuition behind Sutton’s Learning to Predict by the Methods of Temporal Differences paper. In this post we’ll reproduce the paper’s  “A Random-Walk” example (Page 19 Section 3.2). I recommend at least skimming the paper and/or ch. 6 & 12 of Sutton’s latest Reinforcement Learning textbook (available for free). At minimum it would help if the reader has a basic understanding of Value-Functions and Q-Learning.

Temporal-Difference (TD) algorithms work without an explicit model and learn from the experience and outcomes of iterating over multiple episodes (or sequences). TD Learning is similar to Monte Carlo methods, however TD can learn from individual steps without needing the final result. These are among other differences:

Monte Carlo methods Temporal Difference learning
MC must wait until the end of the episode before the return is known. TD can learn online after every step and does not need to wait until the end of episode.
MC has high variance and low bias. TD has low variance and some decent bias.
MC does not exploit the Markov property. TD exploits the Markov property.

Rather than computing the estimate of a next state, TD can estimate n-steps into future. In the case of TD-λ , we use lambda to set the myopicness of our reward emphasis.  The value of λ can be optimized to for a performance/speed tradeoff. The λ parameter is also called the trace decay parameter, with 0 ≤ λ ≤ 1. The higher the value, the longer lasting the traces. In this case, a larger proportion of credit from a reward can be given to more distant states and actions. λ = 1 is essentially Monte Carlo.

Topic Modeling Amazon Reviews

Adapted from Biel 2011

I found Professor Julian McAuley’s work at UCSD when I was searching for academic work identifying the ontology and utility of products on Amazon. Professor McAuley and his students have accomplished impressive work inferring networks of substitutable and complementary items. They constructed a browseable product graph of related products and discovered topics or ‘microcategories’ that are associated with product relationships to infer networks of substitutable and complementary products. Much of this work utilizes topic modeling, and as I’ve never applied it in academia or work, this blog will be a practical intro to Latent Dirichlet Allocation (LDA) through code.

More broadly what can we do with and what do we need to know about LDA?

  • It is an Unsupervised Learning Technique that assumes documents are produced from a mixture of topics
  • LDA extracts key topics and themes from a large corpus of text
  • Each topic is a ordered list of representative words (Order is based on importance of word to a Topic)
  • LDA describes each document in the corpus based on allocation to the extracted topics
  • Many domain specific methods to create training datasets
  • It is easy to use for exploratory analysis

We’ll be using a subset (reviews_Automotive_5.json.gz) of the 142.8 million reviews spanning May 1996 – July 2014 that Julian and his team have compiled and provided in a very convenient manner on their site.