We thought it could be useful (and fun) to pick the collective brains of our Data Science Section committee members (as well as those of our impressive array of subscribers and followers) and put together a monthly newsletter. This will undoubtedly be biased but will hopefully surface materials that we collectively feel are interesting and relevant to the data science community at large.
So, without further ado, here goes our first attempt, creatively titled…
RSS Data Science Section
Industrial Strength Data Science Feb 2020 Newsletter
To give this some vague attempt at structure, we thought we would roughly break the newsletter down into three sections: Section and Member Activities; Posts We Like; Upcoming Events
Section and Member Activities
Our very own Section Chair, Martin Goodson, has been at his thought-provoking best, wading into the deep-learning vs semantic/symbolic learning debate and taking on the illustrious Gary Marcus. Either way, GPT2 is still pretty impressive!
On a similar theme, Richard Pugh presented on the impact of data science in the pharmaceutical industry
Posts We Like
It is easy to assume there is always a right way and a wrong way to do data science, and certainly in many instances some approaches are objectively better than others. However, we all know that it is often far more nuanced than non-practitioners might assume. Here’s an opinionated guide to Machine Learning we found interesting.
There has been some amazing progress in NLP over the last few years, with the previously mentioned GPT2 from OpenAI putting an impressively powerful model in anyone’s hands. This is an entertaining read giving some practical tips on using GPT2 in Python.
Google of course are ever-present in this space and recently made a big announcement of their own
Many of the technical skills you learn in academia are useful in the ‘real world’, but others don’t translate very well. Some useful pointers from David Dale on transitioning from Academia to Industry and Business
Regardless of your views on Facebook as a product, they employ some pretty impressive data scientists and produce some pretty impressive work (e.g. Prophet is great if you’ve not come across it). Reproducibility in machine learning is an increasingly important topic, and is surprisingly (or not so to those who do it…) difficult. While it is key in academia in order to build on the foundations of others, it is also crucial in an industrial setting to make sure you have complete audit trails and can reproduce decisions made in the past. This piece from the Facebook AI group provided some interesting commentary.
Finally, understanding why a machine learning model produces a given output is also an increasingly hot topic. Even though fundamentally the multi-dimensional nature of the underlying models makes it very complex and hard to “boil down” to a simple explanation, the field of ‘model explainability’ is looking to do so, and we found this a useful primer on the topic
Upcoming Events
This meetup on Feb 12th on detecting violent propaganda could be interesting
And this looks very useful on Feb 28th – London AI and Deep Learning on Operational AI and best coding practices
The open source data collection event next week (“Into the Light”, Feb 5th) hosted by The Economist looks like it could be interesting
That’s it for now. Tell us what you think! We will aim to get a new one out every month and would love to include commentary from followers and subscribers.
If you liked this, do please send it on to your friends (we are looking to build a strong committee of data science practitioners) and sign up for future updates here