Hi everyone,
Another month flies by (somehow lockdown days seem to go slowly but weeks disappear), and it’s time for the June edition of our Royal Statistical Society Data Science Section newsletter. Hopefully there are some interesting topics and titbits here to feed your data science curiosity…
As always, any and all feedback is most welcome! If you like these, please do send them on to your friends (we are looking to build a strong community of data science practitioners) and sign up for future updates here:
Industrial Strength Data Science June 2020 Newsletter
RSS Data Science Section
Covid Corner
We can’t not talk about COVID-19, and as always there are plenty of data-science-related themes to wade through.
- First of all there are increasing numbers of interactive charts and visualisations enabling granular slicing and dicing of the progression of the virus around the world.
- This COVID-19 infections tracker comes from the team that has led the field in accurately modelling US fatalities over the past few weeks, and whose model is currently one of those in use at the CDC. It provides a lot of granular, up-to-date detail on Rt (the effective reproduction number) around the world, as well as some useful forecasts.
- For those who haven’t seen it, the Imperial College model is now publicly available, with a nice summary of findings as well as the source code.
- Although numbers are still often used out of context in attention-grabbing headlines, it has been good to see more thoughtful analysis cropping up. David Spiegelhalter, President of the RSS in 2017-18 (and author of ‘The Art of Statistics’, which is a good read in these times), has been quite prominent in the press, including:
- Putting the risk of death from Covid-19 in context (following on from earlier analysis here).
- Digging into the topic of viral load and whether or not children are different from adults.
- As countries begin to step tentatively (and not so tentatively) out of lockdown, understanding in more detail exactly what drives the spread of the virus becomes increasingly important. STAT News highlights new research that has dug into the initial outbreak in the US.
- Finally on COVID, the LSE recently hosted an online session titled “Data-driven Responses to COVID-19: opportunities and limitations”, which covered a number of interesting and relevant topics.
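For those wondering what sits behind an Rt figure: Rt is the average number of people each infected person goes on to infect at time t. One common way to estimate it is the renewal equation, R_t = I_t / (sum over s of w_s * I_(t-s)), where I_t is new cases on day t and w_s is the probability that the serial interval (the gap between one case and the next in a chain of transmission) is s days. Below is a deliberately naive Python sketch of that point estimate; it is not the method used by the tracker above, and the serial-interval weights in the example are made-up assumptions.

```python
from typing import List, Optional


def estimate_rt(incidence: List[float], serial_interval: List[float]) -> List[Optional[float]]:
    """Naive renewal-equation estimate of the effective reproduction number.

    incidence[t]          : new cases on day t
    serial_interval[s - 1]: assumed probability that the serial interval is s days
    Returns R_t = I_t / sum_s w_s * I_{t-s}, or None where no earlier cases exist.
    """
    rt: List[Optional[float]] = []
    for t, cases in enumerate(incidence):
        # Infection "pressure" on day t: earlier cases weighted by the serial interval
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(serial_interval, start=1)
            if t - s >= 0
        )
        rt.append(cases / pressure if pressure > 0 else None)
    return rt


# Illustrative data only: cases doubling daily, with a hypothetical two-day serial interval
cases = [1, 2, 4, 8, 16, 32]
print(estimate_rt(cases, [0.5, 0.5]))
```

Real estimators go further: they smooth over a sliding window, account for reporting delays, and put uncertainty intervals around each value, which matters a great deal when daily counts are small or noisy.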
Committee Activities
It has been a quieter time for committee members this month, although we are playing an active role in joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation.
- There is still time to submit to NeurIPS, the Conference on Neural Information Processing Systems, which Danielle Belgrave is organising.
- Magda Woods is writing a paper with her ex-BBC colleagues, trying to understand what is helping some companies thrive during the crisis, and would love feedback from readers.
Elsewhere in Data Science
Lots of non-Covid data science going on, as always!
With a little more time at home on our hands (at least for some) we’ve come across some useful primers on relevant data topics:
- Understanding causality is generally high on many a data scientist’s list, but is often easier said than done. “A Survey of Learning Causality with Data: Problems and Methods” provides a useful summary of approaches and methods.
- In a similar “how does that really work” vein, Bayesian Optimisation is a topic that surfaces on a fairly regular basis; this post does a nice job of talking through a practical application in hyper-parameter optimisation.
- I find self-supervised learning an elegant way of generating training data, and this article gives some useful examples of the approach in action.
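To make the Bayesian Optimisation loop a little more concrete: fit a probabilistic surrogate (here a Gaussian process) to the evaluations made so far, then choose the next point to evaluate by maximising an acquisition function (here expected improvement), and repeat. Below is a minimal one-dimensional NumPy sketch, not the linked post’s code. It assumes an RBF kernel with a fixed length scale, a simple grid search over [0, 1] to maximise the acquisition function, and a toy quadratic standing in for a validation loss.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two 1-D arrays of points
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Posterior mean and std of a zero-mean GP with unit prior variance
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    v = np.linalg.solve(k_oo, k_qo.T)  # shape (n_obs, n_query)
    var = 1.0 - np.sum(k_qo * v.T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best):
    # EI for minimisation: expected amount by which a point beats the current best
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, n_iter=15):
    grid = np.linspace(0.0, 1.0, 201)  # candidate hyper-parameter values
    x_obs = np.array([0.0, 0.5, 1.0])  # small initial design
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Standardise observations so the unit-variance GP prior is sensible
        y_std = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)
        mean, std = gp_posterior(x_obs, y_std, grid)
        ei = expected_improvement(mean, std, y_std.min())
        x_next = grid[np.argmax(ei)]  # most promising candidate
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best]


# Toy objective standing in for, e.g., validation loss vs. a single hyper-parameter
x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
print(f"best x = {x_best:.3f}, loss = {y_best:.5f}")
```

The appeal for hyper-parameter tuning is that each call to `objective` (a full model training run, in practice) is expensive, while fitting the surrogate and scoring the grid is cheap. So it pays to think carefully about where to evaluate next rather than sampling at random.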
If you prefer your “brain-food” in audible form, Lex Fridman has had some fantastic conversations on his podcast recently; they are long but well worth the time.
- His conversation with Stephen Wolfram was epic. Wolfram is the founder and CEO of Wolfram Research, which produces Mathematica, Wolfram Alpha and the Wolfram Language, amongst other things. His background is in physics, although his work on cellular automata and computation brought him wider public recognition.
- An interesting part of the discussion focused on general intelligence and the work Wolfram has done in pulling together and codifying the underlying semantic knowledge base that drives Wolfram Alpha (which apparently powers Siri and Alexa). The Wolfram Language takes a high-level, abstracted approach, but it is certainly thought-provoking and worth exploring.
- His conversation with Ilya Sutskever was very insightful. Sutskever is one of the founders of OpenAI and a co-author, with Hinton, of the original AlexNet paper, so ‘influential’ in Deep Learning to say the least!
- Some great topics were covered, including a definition of Deep Learning as “the geometric mean of physics and biology”.
- There was also a discussion of the “Double Descent” phenomenon in Deep Learning, where model performance on a given data set first improves with model size (number of parameters), then deteriorates (as over-fitting kicks in), but then improves again! This is one of the drivers behind the recently released GPT-3 NLP model, with its 175 billion parameters… I definitely need to dig into this more as it’s never happened for me!
Is machine learning living up to the hype? There has been some recent commentary suggesting that progress in both machine learning research and the commercial application of machine learning has not been delivering the promised benefits.
- This article in Science digs into over 80 ML research papers from recent years and makes the case that the “eye catching advances in some AI fields are not real”.
- This arXiv paper highlights various “Troubling Trends in Machine Learning Scholarship”.
- And this article digs into “why Artificial Intelligence is so useless for business”.
A few more practical tips:
- This article revisits Google’s original ‘tech debt in machine learning’ paper and talks through what is still relevant to data science practitioners.
- This piece is an excellent summary of the different methods for actually getting your machine learning model into production; well worth a read.
For those wanting a bit more of a hands-on project…
- This tool (OpenTPOD) must be the simplest way of creating your own deep-learning-based object detection system from scratch!
- Similarly on object detection, if you want to get a little more “under the hood”, Facebook have open-sourced another interesting PyTorch application, DETR. It makes use of Transformers, which increasingly feel like the go-to building block for Deep Learning architectures.
- How about bringing your cartoon characters to life with pose animation from TensorFlow?
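For a taste of what is “under the hood” in DETR: it pairs a convolutional backbone with a Transformer encoder-decoder, and the core operation inside any Transformer is scaled dot-product attention, softmax(Q Kᵀ / √d) V. Each query produces a weighted average of the values, with weights set by how similar the query is to each key. Below is a minimal single-head NumPy sketch of just that operation, with no masking and no learned projections. It is not code from the DETR repository, and the array sizes are arbitrary illustrations.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Single-head attention without masking.

    q: (n_queries, d), k: (n_keys, d), v: (n_keys, d_v)
    Returns (output of shape (n_queries, d_v), weights of shape (n_queries, n_keys)).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # query-key similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))  # e.g. 3 "object queries" in a DETR-style decoder
keys = rng.normal(size=(5, 8))  # e.g. 5 image-feature positions
values = rng.normal(size=(5, 4))
out, attn = scaled_dot_product_attention(queries, keys, values)
print(out.shape, attn.sum(axis=1))
```

Full Transformers run several of these “heads” in parallel on learned linear projections of the inputs and stack them with feed-forward layers, but this weighted-averaging step is the heart of it.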
Updates from Members and Contributors
- Kevin O’Brien highlights the great work the R Forwards foundation is doing in promoting diversity and inclusion in the data science community:
- Some excellent best-practice guidance on including deaf and disabled participants at conferences, with guidelines for presenters, from Dr Liz Hare.
- An example of a successful recent event organised by the R Forwards foundation to help expand the R community in South Africa.
- Ole Schulz-Trieglaff announces that PyData Cambridge is now running online meetups every Wednesday; more info here.
- Finally, Glen Wright Colopy asked to include the following:
- “In June, the American Statistical Association is sponsoring a set of weekly podcasts celebrating precision medicine research at the Statistical and Applied Mathematical Sciences Institute (SAMSI).
Highlights include (i) machine learning and mathematical modelling of wound healing, (ii) big data squared – combining brain imaging and genomics for Alzheimer’s studies, and (iii) innovative trial design and master trials. You can hear about these episodes as they come out by joining the mailing list (https://www.podofasclepius.com/mail-list) or subscribing to the YouTube channel (https://www.youtube.com/channel/UCkEz2tDR5K6AjlKw-JrV57w)”
Again, hope you found this useful. Please do send it on to your friends (we are looking to build a strong community of data science practitioners) and sign up for future updates here:
And this feels like an appropriate way to conclude…
https://xkcd.com/2311/
– Piers