Examining Your Presence on Twitter with Python

My Evil The Following mountain bike with absoluteBLACK's direct mount oval ring.


The purpose of this post is to show how a sponsorship/marketing manager might track their athletes or brand ambassadors. The code we're writing below can be used for many other applications, such as tracking general trends across locales, or HR insidiously monitoring whether employees are discussing the company in a manner consistent with social media policies (just kidding!). Since we're doing something specific, I hope this post doesn't get lost in the sea of yet another web scraping or Twitter data mining post built on yet another beginner's abstracted-away R or Python package.

Twitter helps me stay up to date with the #pydata community by following prominent contributors and hashtags. I also use Twitter (and Facebook) to see what famous athletes and brands are doing in the mountain bike world, and I discover events and great places to visit that I wouldn't have stumbled across in other ways. Most of these major brands sponsor professional athletes and, to a different extent, ambassadors who might be competitive amateurs or supporters of a local community, such as trail builders and K-12 instructors. These companies do so to spread word of their brand, technology, and experience. One especially well known and sought after sponsorship program is Patagonia's (ski patrollers and surf competitors take note!)

There are plenty of general social media platform management companies out there, like https://hootsuite.com/ and https://buffer.com/, that pair CRM tools with data analytics, but there aren't many sports-specific platforms, such as http://www.hookit.com/, that enable marketing managers to track and maximize the value of their athletes and sports marketing programs. These solutions differ in that they consider an athlete's reach via their placement in race events and travel coverage; the latter is an intelligent metric, since doing 100 races in your backyard isn't the same as doing 10 in a different country.

I was picked as one of absoluteBLACK's (a sweet component manufacturer out of England) ambassadors for 2016. Similar to other brands' programs, they expect their ambassadors to use social media to share photos or videos of their products in a lifestyle or action shot, accompanied by relevant tags. I thought it would be interesting to take a look at absoluteBLACK's presence on Twitter by examining who is talking about them, what they are posting, and from where. If you were in sponsorship or marketing, you could run this easy-to-configure script weekly and start examining your presence.
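To give a flavor of what such a weekly check tallies, here is a minimal pure-Python sketch. The tweet dicts are hypothetical sample data shaped roughly like Twitter search results; in a real script you would fetch them with a client library such as tweepy, and the `summarize_mentions` helper and its field names are my own illustration, not the actual gist's code.

```python
from collections import Counter

def summarize_mentions(tweets, brand="absoluteblack"):
    """Tally who is talking about the brand, which hashtags they pair
    with it, and where they are tweeting from."""
    authors, hashtags, places = Counter(), Counter(), Counter()
    for tw in tweets:
        text = tw["text"].lower()
        if brand not in text:
            continue  # skip tweets that never mention the brand
        authors[tw["user"]] += 1
        hashtags.update(w for w in text.split() if w.startswith("#"))
        if tw.get("place"):
            places[tw["place"]] += 1
    return authors, hashtags, places

# Hypothetical tweets in the general shape a search client returns:
sample = [
    {"user": "rider1", "text": "Loving my #ovalring from absoluteBLACK",
     "place": "Boulder, CO"},
    {"user": "rider2", "text": "absoluteBLACK chainring install day #mtb",
     "place": None},
]
authors, hashtags, places = summarize_mentions(sample)
```

Run weekly, diffing these counters against last week's would show your mentions trending up or down by author, tag, and location.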

Note: for fullscreen, click the gist link below this embedded view.


Lending Club Data Analysis Revisited with Python

2.5 years ago I analyzed Lending Club's issued loans data (yikes! I was using R back then!). It was the most visited blog post on my site in 2013 through 2014, and today it's still number 5. Reddit picked up my simple "35-hour work week with Python" post, which is now #1:


Lending Club is the first peer-to-peer lending company to register its offerings as securities with the Securities and Exchange Commission (SEC). Their operational statistics are public and available for download. It has been a while since I've posted an end-to-end solution blog post, and I would like to replicate the original with a bit more sophistication in Python, using the latest dataset from lendingclub.com. In summary, let's examine all the attributes Lending Club collects on users and how they influence the interest rates issued.
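As a rough sketch of that question, here is how one might relate a single attribute, loan grade, to the issued interest rate. The `grade` and `int_rate` column names follow Lending Club's public CSV, but the rows below are made-up samples; in practice you would load the full file with pandas rather than hand-rolling it like this.

```python
from collections import defaultdict

def mean_rate_by_grade(rows):
    """Average issued interest rate per loan grade."""
    totals = defaultdict(lambda: [0.0, 0])  # grade -> [rate sum, count]
    for row in rows:
        rate = float(row["int_rate"].rstrip("%"))  # e.g. "13.49%" -> 13.49
        totals[row["grade"]][0] += rate
        totals[row["grade"]][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

# Made-up rows in the shape of Lending Club's issued-loans file:
sample = [
    {"grade": "A", "int_rate": "7.49%"},
    {"grade": "A", "int_rate": "8.49%"},
    {"grade": "D", "int_rate": "17.99%"},
]
rates = mean_rate_by_grade(sample)
```

Repeating this for each collected attribute (income, DTI, purpose, and so on) is essentially the analysis the post walks through.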

Pure Python Decision Trees


By now we all know what Random Forests are. We know about their great off-the-shelf performance, ease of tuning and parallelization, as well as their importance measures. It's easy for engineers implementing RF to forget about its underpinnings. Unlike some of their more modern and advanced contemporaries, decision trees are easy to interpret. A neural net might obtain great results, but it is difficult to work backwards from and explain to stakeholders, as the weights of the connections between two neurons have little meaning on their own. Decision trees won't be a great choice for a feature space with complex relationships between numerical variables, but they're great for data with a simpler mix of numerical and categorical features.

I recently dusted off one of my favorite books, Programming Collective Intelligence by Toby Segaran (2007), and was quickly reminded how much I loved all the pure python explanations of optimization and modeling. It was never enough for me to read about and work out proofs on paper, I had to implement something abstract in code to truly learn it.

I went through some of the problem sets and, to my dismay, realized that the code examples were no longer hosted on his personal site. A quick Google search revealed that multiple kind souls had not only shared their old copies on GitHub, but even corrected mistakes and updated Python methods.

I'll be using some of this code as inspiration for an intro to decision trees with Python.
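As a taste of what that intro builds toward, here is a pure-Python sketch of the core step of growing a decision tree: scoring candidate splits by Gini impurity and picking the one with the most gain. This follows the spirit of Segaran's tree code rather than reproducing it; the toy data and function names are my own.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of the class labels in the last column of each row."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows):
    """Return (gain, (column, value)) for the split that most reduces impurity."""
    best = (0.0, None)
    current = gini(rows)
    for col in range(len(rows[0]) - 1):  # every feature column
        for value in {row[col] for row in rows}:  # every observed value
            left = [r for r in rows if r[col] == value]
            right = [r for r in rows if r[col] != value]
            if not left or not right:
                continue  # split must actually divide the data
            p = len(left) / len(rows)
            gain = current - p * gini(left) - (1 - p) * gini(right)
            if gain > best[0]:
                best = (gain, (col, value))
    return best

# Toy rows: [referrer, country, label]
data = [
    ["google", "UK", "buy"],
    ["google", "US", "buy"],
    ["direct", "UK", "skip"],
    ["direct", "FR", "skip"],
]
gain, split = best_split(data)  # splitting on referrer separates the classes
```

Recursing on the two halves of the winning split until no gain remains is all it takes to grow the full tree, which is exactly the kind of transparent construction that makes these models so explainable.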