Do you love 'Data' Science? I mean, the Data part!
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” ― Sir Arthur Conan Doyle, Sherlock Holmes
Hello,
Last week, we talked all about Artificial Intelligence (and Artificial Stupidity), which led me to think about the foundation of Data Science: the Data itself. I think Data is the least appreciated entity in the Data Science value chain. You might agree with me if you do Data Science outside competitive platforms like Kaggle, where the Data handed to you is what most Data Scientists can only dream about in their jobs.
I'm not sure if you're like me, but I find it extremely boring to write a bunch of Hive/SQL queries to extract Data from multiple tables and finally produce something I can channel into R/Python to build something meaningful or, to be honest, something cool that I can brag about to my friends when I talk Data Science. I've yet to find a Data Science conversation where building the Data gets as much appreciation as the Science itself.
The “AI Godfathers” have a big fan following, but not as many of us know Fei-Fei Li, whose contribution (with her team) of building ImageNet for AI is invaluable.
“One thing ImageNet changed in the field of AI is suddenly people realized the thankless work of making a dataset was at the core of AI research. People really recognize the importance the dataset is front and center in the research as much as algorithms.” - Fei-Fei Li
Meanwhile, venture capitalists aren't shy about putting their money where Data is created and curated. Recently, Silicon Valley startup Scale AI hit unicorn status. Scale AI's About Us page reads:
The Data Platform for AI
Scale AI has also open-sourced datasets, and that's sweet.
Zalando, which open-sourced Fashion-MNIST, published a nice paper listing the steps they took to publish the dataset. There are also free tools like labelImg and makesense.ai to help you annotate images for a typical image dataset. For NLP annotation, BRAT is a nice free, open-source tool. And if you are planning a pet project and don't have the dataset you need, Mat Kelcey's tutorial "counting bees on a rasp pi with a conv net" would be a tremendous help.
That said, if you appreciate Data Science as much as you'd appreciate the beauty of a Ferrari or a Lamborghini, let me remind you that the car is only useful if you've got oil in it, and that oil is your super-clean, labelled Data that's usable for Data Science and Machine Learning.
If you enjoyed this edition, hit Reply and share it with your friends!
Thank you for reading (and for sharing, if you do),
Abdul Majed.
PS: Starting with this newsletter, I'm adding one section, kept very minimal, that's not about Data Science. After all, do we really want to talk about AUC or neural net architectures all the time? Let me know by hitting Reply!
Entrepreneurship - Not Data Science
Gumroad’s Open Board Meeting
(Chris Zukowski) I quit my job to make video games full time: https://imgur.com/gallery/CVq5bO8