How to handle new labels in Categorical data

(Naveen Honest Raj) #1

Every machine learning enthusiast knows that we should convert categorical data to numerical data before creating a model.
Many of us are familiar with scikit-learn's LabelEncoder, which maps categorical data to numerical data. But it does not provide a good way to handle newly occurring categories: calling transform on a label that was not seen during fit raises a ValueError.

Here's my approach. I usually pickle a list that maps the categories to IDs.
For example, consider a column with the unique values ["action", "horror", "thriller"]. I pickle dump these unique values. When I need to map a value to an ID (i.e., a number), I pickle load the list and find the value's index:
GenreList.index(input), where GenreList is the list loaded from the pickle.
When I see a new category, I append it to the list and pickle dump it again. So now my list will be ["action", "horror", "thriller", "romance"].
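The approach above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the helper names (load_genres, save_genres, genre_to_id) and the file name genres.pkl are hypothetical, and the file is written to a temporary directory just for the demo.

```python
import os
import pickle
import tempfile


def load_genres(path):
    """Return the pickled category list, or an empty list if no file exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def save_genres(path, genres):
    """Pickle dump the category list to disk."""
    with open(path, "wb") as f:
        pickle.dump(genres, f)


def genre_to_id(path, genre):
    """Map a category to its list index; append and re-dump if it is new."""
    genres = load_genres(path)
    if genre not in genres:
        genres.append(genre)       # new category goes at the end
        save_genres(path, genres)  # existing IDs stay unchanged
    return genres.index(genre)


# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "genres.pkl")
save_genres(path, ["action", "horror", "thriller"])

horror_id = genre_to_id(path, "horror")    # existing category
romance_id = genre_to_id(path, "romance")  # new category, appended

print(horror_id, romance_id, load_genres(path))
# 1 3 ['action', 'horror', 'thriller', 'romance']
```

Because new categories are only ever appended, the IDs already handed out for older categories never change.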

But this is not an efficient approach: every lookup scans the list, and every new category means dumping the whole list again. So to the enthusiasts out there, please share your approach to handling this scenario.

P.S. I hope my approach is helpful for small-scale code.