Thoughts on NLP's Rapid Growth as a Super Popular Domain in Machine Learning
NLP (Natural Language Processing) has seen tremendous growth in the last couple of years. I believe a few factors caused this uptick, and this edition of the NullData Newsletter discusses them.
Before this, the domain on the rise was CV (Computer Vision), but then NLP suddenly took off. The stark difference between these two domains is data availability. Text, which is what NLP deals with (typically natural language text), is vast and available almost everywhere. CV, on the other hand, needs image data, and there isn’t nearly as much of it as there is text.
Say you see an FB post with an image. It’s highly likely that you’ve also seen at least 3x as many text posts and images with text in them.
In short, text data is everywhere, which makes it a good candidate for further exploration.
Rise of Transformers
No, this isn’t Michael Bay’s Transformers. The paper “Attention Is All You Need” introduced a new type of neural architecture, the Transformer, in 2017, and as of today the paper has ~19K citations, which is a testament to how closely the rise of Transformers tracks the rise of NLP. It also gave rise to a family of BERT models, which quickly found their way into a lot of applications.
Hugging Face, Rasa, et al
Open Source NLP Companies
Open source company - it’s a phrase you don’t often hear, because most open source projects aren’t companies and most companies aren’t primarily open source projects. The rise of NLP changed that, thanks to its close ties with the likes of Hugging Face and Rasa, which are proper companies but are also heavily invested in the open source ecosystem. I’m not a big fan of startups raising funds, but I was super happy when I saw Hugging Face and Rasa do so, because it points to a new breed of open source company that makes open source projects more sustainable while the many people working on them get paid well, which I believe is fair.
Having these startups in the NLP space means new models get more visibility, easier access, better documentation, a strong community, a wider NLP ecosystem, training, and tutorials. All of this further fuelled applied NLP. Some research work never makes it into code until somebody actually spends the time to make it accessible. These companies and their contributors do that heavy lifting, which ultimately makes it easier for anyone to do cutting-edge, state-of-the-art NLP in just a few lines of Python 🐍 code, starting with `pip`.
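To make the "few lines of Python" point concrete, here is a minimal sketch using the Hugging Face `transformers` library. It assumes you have run `pip install transformers torch`. The helper name `label_sentiment`, the `demo` function, and the sample sentences are made up for illustration; only `pipeline("sentiment-analysis")` comes from the library, and it downloads a default English model the first time it runs.

```python
# Assumed setup (not part of this snippet): pip install transformers torch

def label_sentiment(texts, classifier):
    """Pair each text with the label and score returned by a pipeline-like callable.

    `classifier` is expected to take a list of strings and return a list of
    dicts with "label" and "score" keys, which is what the Hugging Face
    sentiment-analysis pipeline returns.
    """
    texts = list(texts)
    results = classifier(texts)
    return [
        (text, result["label"], round(float(result["score"]), 3))
        for text, result in zip(texts, results)
    ]


def demo():
    """Hypothetical demo: runs the real pipeline (needs network on first use)."""
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    samples = ["I love this newsletter.", "This was a waste of my time."]
    for row in label_sentiment(samples, classifier):
        print(row)
```

Calling `demo()` prints one `(text, label, score)` tuple per sentence. The exact labels and scores depend on whichever default model the library picks, so treat them as illustrative rather than fixed.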
All of the above are good reasons for the NLP ecosystem to thrive. But one very big difference between CV and NLP is the actual business case, which is what most organizations care about. I can build as many fun CV models as I like using ImageNet data, but organizations and analytics consulting CoEs struggled to find use cases for CV. This is mostly what happens when you try to fit technology into a business just for the sake of making presentations with “AI” mentioned in them.
On the other hand, applied NLP projects managed to fit nicely into business cases with ROI. Almost every company has some sort of text data (natural language or not). From everyone’s favorite, sentiment analysis, to topic modelling, to more advanced applications like semantic search or content recommendations, everything had a business case attached to it, and it was easy to translate the benefits of these applied NLP projects into the KPIs that most businesses care about (along with, of course, bragging about “AI” on their slides).
With some ROI in sight, data in hand, proven business cases, growing open source products, and a thriving ecosystem fuelled by continued research, the question is: why wouldn’t anyone pick up NLP? After all,
NLP is all you need!
If you made it through this, my ❤️-y thanks for reading this post. If you found it useful, please share it with your friends and let me know your thoughts. Feedback fuels me.
Hands-on Applied NLP Tutorials