September Newsletter

Hi everyone-

I don’t know about you, but that didn’t feel particularly August-like… I miss the sun! Perhaps September will save the summer, together with some inspiration from the Paralympics … How about a few curated data science materials to peruse during the lull in the wheelchair rugby final?

Following is the September edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science September 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Thank you all for taking the time to fill in our survey responding to the UK Government’s proposed AI Strategy. We are working on a series of posts digging into the results, which we hope will be thought-provoking.

This year’s RSS Conference is almost here (Manchester from 6-9 September, register here), with some great keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers. There is online access to over 40 hours of content at the conference covering a wide variety of topics. The full list of the online content can be found here. We really hope to see you all there, particularly at “Confessions of a Data Scientist” (11:40-13:00 Tuesday, 7 September), chaired by Data Science Section committee member Louisa Nolan.  

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with events. The next talk is on September 7th, when Thomas Kipf, Research Scientist at Google Research in the Brain Team in Amsterdam, will discuss “Relational Structure Discovery”. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Many congratulations to Martin and the team for winning the Leading Innovators in Data Extraction Award during the FinTech Awards 2021!

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"The fact that diagnostic models recognize race in medical scans is startling. The mystery of how they do it only adds fuel to worries that AI could magnify existing racial disparities in health care"
  • The Stanford Institute for Human-Centered Artificial Intelligence released a comprehensive review of the opportunities and risks of what it calls “Foundation Models” – these are models (such as BERT, DALL-E, and GPT-3) that are trained on “broad data at scale and are adaptable to a wide range of downstream tasks”
    • The research paper is a weighty tome (available here) but definitely worth a look
    • A good review can be found here
"They create a single point of failure, so any defects, any biases which these models have, any security vulnerabilities . . . are just blindly inherited by all the downstream tasks"
  • Of course the models and algorithms could be perfect, but still cause harm if they are not solving the right problem, or the outputs are not used in the right way
    • Motherboard reports that police are apparently attempting to have evidence generated from a gunshot-detecting AI system altered
    • And a short but well-reasoned piece in defence of algorithms:
"These algorithms aren’t “mutant” in any meaningful sense – their outcomes are the inevitable consequence of decisions made during their design"

Developments in Data Science…
As always, lots of new developments…

  • All sorts of activity in the reinforcement learning/robotics space this month:
"As far as I know, this is an entirely unprecedented level of generality for a reinforcement-learning agent"
  • As always, lots of research is going on in the deep learning architecture space:
  • Similarly, investigation into methods that learn from smaller data sets continues
    • Researchers at Facebook, PSL Research and NYU have developed an elegant unsupervised pre-training method called VICReg that attempts to minimise issues of variance (identical representations for different inputs), invariance (dissimilar representations for inputs that humans find similar) and covariance (redundant parts of a representation); this shows great promise for enabling more efficient use of pre-training and data augmentation
    • This paper also gives a good survey of data augmentation methods for Deep Learning
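
For the curious, here is a rough NumPy sketch of how the three VICReg terms described above might be combined into a single loss. It is an illustration rather than the authors' implementation: the function name vicreg_loss, the argument names and the default weights (25, 25, 1) are assumptions made for this example, so do check the paper before relying on them.

```python
import numpy as np


def vicreg_loss(z_a, z_b, sim_weight=25.0, var_weight=25.0, cov_weight=1.0,
                gamma=1.0, eps=1e-4):
    """Illustrative VICReg-style loss for two batches of embeddings.

    z_a and z_b are (n, d) arrays holding embeddings of two augmented
    views of the same n inputs. All names and defaults are hypothetical.
    """
    n, d = z_a.shape

    # Invariance: embeddings of two views of the same input should be close.
    invariance = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    def variance_term(z):
        # Hinge on the per-dimension standard deviation: penalise dimensions
        # whose spread across the batch falls below gamma (guards against
        # every input collapsing to the same representation).
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(z):
        # Penalise off-diagonal covariance so that different dimensions
        # carry non-redundant information.
        centred = z - z.mean(axis=0)
        cov = centred.T @ centred / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)

    total = (sim_weight * invariance
             + var_weight * variance
             + cov_weight * covariance)
    parts = {"invariance": invariance,
             "variance": variance,
             "covariance": covariance}
    return total, parts
```

Passing in a collapsed batch (every row identical) should push the variance term up sharply, which is exactly the failure mode that term is there to penalise.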

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

"If we intervene early, the treatments can kick in early and slow down the progression of the disease and at the same time avoid more damage"
"Another method that we found to be effective was the use of unsupervised self-training. We prepared a set of 100 million satellite images from across Africa, and filtered these to a subset of 8.7 million images that mostly contained buildings. This dataset was used for self-training using the Noisy Student method, in which the output of the best building detection model from the previous stage is used as a ‘teacher’ to then train a ‘student’ model that makes similar predictions from augmented images."

How does that work?
A new section on understanding different approaches and techniques

"ML is notoriously bad at this inverse causality type of problems. They require us to answer “what if” questions, what Economists call counterfactuals. What would happen if instead of this price I’m currently asking for my merchandise, I use another price?"

Practical tips
How to drive analytics and ML into production

"Analytics isn’t primarily technical. While technical skills are useful, they’re not what separate average analysts from great ones."

Bigger picture ideas
Longer thought-provoking reads

"If you tell me a story and I say, ‘Oh, the same thing happened to me,’ literally the same thing did not happen to me that happened to you, but I can make a mapping that makes it seem very analogous. It’s something that we humans do all the time without even realizing we’re doing it. We’re swimming in this sea of analogies constantly."
"There’s a slightly humorous stereotype about computational complexity that says what we often end up doing is taking a problem that is solved a lot of the time in practice and proving that it’s actually very difficult"

Practical Projects and Learning Opportunities
As always, here are a few potential practical projects to keep you busy:

"All of the images in this post were synthesized by a combination of several machine learning models, directed by text that I provided, VQGAN for generation, and CLIP for directing the image to match the text."

Covid Corner

Still lots of uncertainty on the Covid front… vaccinations keep progressing in the UK, which is good news, but we still have very high community Covid case levels due to the Delta variant…

“In the end, many hundreds of predictive tools were developed. None of them made a real difference, and some were potentially harmful.”

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS
