Earlier today, I noticed a tweet from well known R community member Jozef Hajnala. The tweet was about Springer releasing around 65 books related to data science and machine learning for free to download as PDFs.
In this article, we will look at various options for encoding categorical features. We will also present R code for each of the encoding techniques. Categorical feature encoding is an important data processing step required for using these features in many statistical modelling and machine learning algorithms.
I am working on a Shiny application which allows the user to upload data, do some analysis and processing on each variable in the data, and finally use the processed variables to build a statistical model.
If you search for the phrase “fastest man on earth” in Google, chances are that it will return the answer “Usain Bolt”. It certainly does so for me, even though the results might be different if Google decides to personalize the results for you.
Whenever we start working with data with which we are not familiar, our first step is usually some kind of exploratory data analysis. We may look at the structure of the data using the str function, or use a tool like the RStudio Viewer to examine the data.