
A Python post

Hello Python.

Over the last 18 months, I have adopted Python as my primary tool for data science work. I still like R, but I now think Python is the better choice. It takes a bit more time to get used to if you are not a software engineer, but the effort to learn it is worth it. R is known for better support for statistical models, since researchers usually implement new statistical methods in R, but in my opinion Python's support for data engineering and machine learning is significantly better.

Also, over the last few years, the R community has been taken over by the craze for ‘tidy’. While I think that many packages from that universe are quite good, it has almost become an unwritten rule in the R community that unless you love and cheer for tidy, your work will not be considered important or be recognized. Opposing the community's biggest cheerleaders has political consequences too. Anyway, in short, Python is the way to go for me in the near future.

I started this blog around three years ago and have been generating it with the R package blogdown. I have also been a huge fan of Yihui Xie ever since I discovered his work and listened to a couple of his interviews. Although my language of choice will be Python, I will continue to use this R package for future content creation.

The rest of this post is some very simple Python code to check that I can create a random pandas DataFrame and plot its columns using matplotlib in the R Markdown document that was used to create this post.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({'a': np.random.randn(100), 'b': np.random.randn(100)})
print(data.head())
##           a         b
## 0  0.558796  1.999457
## 1 -0.413915 -0.705748
## 2 -0.435213 -1.061820
## 3  0.664194  0.888226
## 4  0.214548  0.393927
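
The snippet above stops at printing the first rows. A minimal sketch of the plotting step might look like the following. The fixed seed, the `Agg` backend, the axis labels, and the output file name `random_columns.png` are my own assumptions for illustration, so the values will not match the printed output above.

```python
import matplotlib

# Assumption: use a non-interactive backend so the script also runs
# without a display (for example on a build server).
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: a seeded generator, so the figure is reproducible.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "a": rng.standard_normal(100),
    "b": rng.standard_normal(100),
})

# Draw one line per column against the row index.
fig, ax = plt.subplots(figsize=(8, 4))
for column in data.columns:
    ax.plot(data.index, data[column], label=column)

ax.set_xlabel("row index")
ax.set_ylabel("value")
ax.legend()

# Hypothetical file name; write the figure to disk instead of showing it.
fig.savefig("random_columns.png")
```

Inside an R Markdown Python chunk, you would probably drop the backend line and end the chunk with `plt.show()` instead of saving to a file, so that the figure is embedded in the rendered post.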