When our Sr. Data Scientists aren't teaching the intensive, 12-week bootcamps, they're working on a variety of other projects. This monthly blog series tracks and discusses some of their recent activities and accomplishments.
In our November edition of the Roundup, we shared Sr. Data Scientist Roberto Reif's excellent blog post on The Importance of Feature Scaling in Modeling. Now we're excited to share his follow-up, The Importance of Feature Scaling in Modeling Part 2.
"In the previous post, we demonstrated that by normalizing the features used in a model (such as Linear Regression), we can more accurately obtain the optimum coefficients that allow the model to best fit the data," he writes. "In this post, we will go deeper to analyze how a method commonly used to extract the optimum coefficients, known as Gradient Descent (GD), is affected by the normalization of the features."
Reif's writing is incredibly detailed as he eases the reader through the process, step by step. We highly recommend you take the time to read it through and learn a thing or two from a gifted instructor.
Another of our Sr. Data Scientists, Vinny Senguttuvan, wrote an article that was featured in Analytics Week. In the piece, titled The Data Science Pipeline, he writes about the importance of understanding a typical pipeline from start to finish, which gives you the ability to take on an array of responsibilities, or at the very least, to understand the entire process. He uses the work of Senthil Gandhi, Data Scientist at Autodesk, and his creation of the machine learning system Design Graph, as an example of a project that spans both the breadth and depth of data science.
In the post, Senguttuvan writes, "Senthil Gandhi joined Autodesk as Data Scientist in 2012. The big idea floating in the corridors was this. Tens of thousands of designers use Autodesk 3D to design products ranging from gadgets to cars to bridges. Today anyone using a text editor takes for granted tools like auto-complete and auto-correct. Features that help the users create their documents faster and with less errors. Wouldn’t it be fantastic to have such a tool for Autodesk 3D? Increasing the efficiency and effectiveness of the product to that level would be a true game-changer, putting Autodesk, already the industry leader, miles ahead of the competition."
Read more to find out how Gandhi pulled it off (and for more on his work and his approach to data science, read an interview we conducted with him last month).
Data Science Weekly recently featured a blog post from Sr. Data Scientist Seth Weidman. In the post, titled The 3 Tricks That Made AlphaGo Zero Work, Weidman writes about DeepMind's AlphaGo Zero, a program he calls a "shocking breakthrough" in Deep Learning and AI within the past year.
"...not only did it beat the prior version of AlphaGo — the program that beat 17-time world champion Lee Sedol just a year and a half earlier — 100–0, it was trained without any data from real human games," he wries. "Xavier Amatrain called it 'more [significant] than anything…in the last 5 years' in Machine Learning."
So, he asks, how did DeepMind do it? His post provides the answer, walking through the techniques AlphaGo Zero used, what made them work, and what they imply for future AI research.
Sr. Data Scientist David Ziganto created Linear Regression 101, a three-part blog series starting with The Basics, proceeding to The Metrics, and rounding out with Assumptions & Evaluation.
Ziganto describes linear regression as "simple yet surprisingly powerful." In these three instructional posts, he aims to "give you a deep enough fluency to effectively build models, to know when things go wrong, to know what those things are, and what to do about them."
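For a taste of what "The Basics" covers, here is a minimal sketch of fitting a linear regression and running a couple of simple checks. The data and helper names are our own invention for illustration, not code from Ziganto's series:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2 + 5x + noise (hypothetical example).
x = rng.uniform(-1, 1, 100)
y = 2 + 5 * x + rng.normal(0, 0.5, 100)

# Ordinary least squares via the normal equations.
X = np.column_stack([np.ones_like(x), x])  # prepend an intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)   # [intercept, slope]

# Two quick diagnostics in the spirit of "The Metrics" and
# "Assumptions & Evaluation": residuals of a well-specified model
# with an intercept average to ~0, and R^2 measures fit quality.
residuals = y - X @ beta
r_squared = 1 - (residuals**2).sum() / ((y - y.mean())**2).sum()
```

Knowing what these numbers mean, and what it looks like when they go wrong, is exactly the fluency the series aims to build.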
We think he does just that. See for yourself!
What were Metis Sr. Data Scientists up to last month? See here.