February Newsletter

Hi everyone-

Well, January seemed to flash by in the blink of an eye- certainly the holiday period seems a long time ago already. All is not lost- the Winter Olympics seem to have crept up on us and are just about to start, which will no doubt provide some entertainment and distraction… as I hope will some thought-provoking data science reading materials.

Following is the February edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity. Check out our new ‘Jobs!’ section – an extra incentive to read to the end!

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science February 2022 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

The committee is busy planning out our activities for the year with lots of exciting events and even hopefully some in-person socialising… Watch this space for upcoming announcements.

We do in fact have a couple of spaces opening up on our committee (RSS Data Science and AI Section) – if you are interested in learning more, please contact James Weatherall.

Anyone interested in presenting their latest developments and research at the Royal Statistical Society Conference? The organisers of this year’s event – which will take place in Aberdeen from 12-15 September – are calling for submissions for 20-minute and rapid-fire 5-minute talks to include on the programme.  Submissions are welcome on any topic related to data science and statistics.  Full details can be found here. The deadline for submissions is 5 April.

Our very own Giles Pavey took part in a panel debate, exploring the role of AI in creating trustworthy digital commerce – see recording here

Meanwhile, Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The next talk will be tomorrow (February 2nd), when Sebastian Flennerhag, research scientist at DeepMind, will give a talk entitled “Towards machines that teach themselves”. Videos are posted on the meetup YouTube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

  • With the anniversary of the January 6th attack on the US Capitol, there is commentary in the mainstream press about misinformation and how algorithms can both exacerbate and help curb the problem – see here in the Washington Post for example.
"The provocative idea behind unrest prediction is that by designing an AI model that can quantify variables — a country’s democratic history, democratic “backsliding,” economic swings, “social-trust” levels, transportation disruptions, weather volatility and others — the art of predicting political violence can be more scientific than ever."
  • We’ve posted previously about bias in recruiting and hiring algorithms – so it’s welcome to see the Data and Trust Alliance’s publication of their Algorithmic Bias Safeguards for Workforce: criteria and education for HR teams to evaluate vendors on their ability to detect, mitigate, and monitor algorithmic bias in workforce decisions.
  • There was an interesting recent recommendation from the UK Law Commission that users of self driving cars should have immunity from a wide range of motoring offences. This is increasingly relevant, as the various self-driving car providers move towards commercial propositions- Waymo (Google/Alphabet’s self-driving unit), for instance, recently announced its first commercial autonomous trucking customer (interesting background on how Waymo does what it does here)
"While a vehicle is driving itself, we do not think that a human should be required to respond to events in the absence of a transition demand (a requirement for the driver to take control). It is unrealistic to expect someone who is not paying attention to the road to deal with (for example) a tyre blow-out or a closed road sign. Even hearing ambulance sirens will be difficult for those with a hearing impairment or listening to loud music.”
"People were more likely to roll with a positive suggestion than a negative one— participants also often found themselves in a situation where they wanted to disagree, but were only offered expressions of agreement. The effect is to make a conversation go faster and more smoothly" ... 
... "This technology (combined with our own suggestibility) could discourage us from challenging someone, or disagreeing at all. In making our communication more efficient, AI could also drum our true feelings out of it, reducing exchanges to bouncing “love it!” and “sounds good!” back at each other"

Developments in Data Science…
As always, lots of new developments on the research front and plenty of arXiv papers to read…

  • The research theme around making models more ‘efficient’ (whether that’s in terms of power consumption, model size, data usage etc) continues:
    • Focusing on reducing computational cost for low power network-edge usage, ‘Mobile-Former’ breaks all sorts of records
    • Interesting research into reducing/simplifying inputs to neural net models looks promising … and they said feature engineering was dead;-)
    • More progress on ‘few-shot learning’ (making accurate predictions with limited examples) – this time with ‘HyperTransformers’
    • Active Learning is an elegant approach to improving sample efficiency by focusing efforts in the most productive areas of the data space – however, watch out for outliers
  • Then some more random research directions…
“However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games such as Go.”
"Over the last several decades, I've witnessed a lot of change in the fields of machine learning (ML) and computer science. Early approaches, which often fell short, eventually gave rise to modern approaches that have been very successful. Following that long-arc pattern of progress, I think we'll see a number of exciting advances over the next several years, advances that will ultimately benefit the lives of billions of people with greater impact than ever before"

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

  • What seems like our now monthly update from ETH Zürich’s Robotic Systems Lab – this time ‘robots learning to hike’ (cue robot-dog interaction videos…).
  • In order for robots to take action, they have to understand the world around them, a far from trivial task: a couple of useful developments in this space using large language models to understand the relationship between objects and relevant actions, from MIT and also from Carnegie Mellon/Google Brain.
“In an effort to solve this problem, MIT researchers have developed a model that understands the underlying relationships between objects in a scene. Their model represents individual relationships one at a time, then combines these representations to describe the overall scene. This enables the model to generate more accurate images from text descriptions, even when the scene includes several objects that are arranged in different relationships with one another.”
“It's a monumental shift,” says Jahmy Hindman, Deere’s chief technology officer, of the new machine, revealed at the 2022 Consumer Electronics Show in Las Vegas. “I think it's every bit as big as the transition from horse to tractor.”
"'Is it safe to walk downstairs backwards if I close my eyes?'

GPT-3: Yes, there is nothing to worry about. It’s safe because the spiral stairs curve outwards, it will make your descent uncomfortable.

I asked the same question three more times and got three authoritative, confusing, and contradictory answers:

GPT-3: That depends. Do you have a TV?
GPT-3: No, it is not safe.
GPT-3: Yes, it is safe to walk downstairs backwards if you close your eyes."
“You’re playing a pot that’s effectively worth half a million dollars in real money,” he said afterward. “It’s just so much goddamned stress.”

How does that work?
A new section on understanding different approaches and techniques

  • For those with a programming background, vectorisation may come naturally, but it can be hard to think through if you are new to it … it does speed things up though, so worth digging into: good python tutorial here.
  • We are a section of the Royal Statistical Society, so it’s good to see a bit of stats once in a while- ‘Six Statistical Critiques That Don’t Quite Work’
  • If you’ve not come across Streamlit, you should definitely check it out – very quick and easy way to create apps in python.
  • JAX is a relatively new but very scalable framework for numerical methods (Bayesian sampling etc) developed at Google and used heavily at DeepMind – definitely worth exploring
  • It’s always good to understand at a low level how different modelling approaches work. If you’re unclear on the fundamentals of neural networks, this is an excellent introductory guide from Simon Hørup Eskildsen (love that it’s called ‘Napkin Math’!)
"In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch"
  • I know, we’ve had a fair few ‘this is how Transformers work’ posts over the last few months… but they are so central to many of the image processing and NLP improvements over the last few years that checking out another good one couldn’t hurt..
"It was in the year 2017, the NLP made the key breakthrough. Google released a research paper “Attention is All you need” which introduced a concept called Attention. Attention helps us to focus only on the required features instead of focusing on all features. Attention mechanism led to the development of the Transformer and Transformer-based models.."
  • Finally, variational autoencoders... unsupervised learning is an area of data science that can sometimes feel neglected, and variational autoencoders are a fantastic tool in the unsupervised learning arsenal, leveraging the power of Deep Learning.
  • For anyone interested in learning more about how DeepMind does what it does, I definitely recommend Hannah Fry’s podcast- the last episode, ‘A breakthrough unfolds’, tells the story well of how they went from winning at Go to predicting protein structures…
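
On the vectorisation point at the top of this list, here is a minimal sketch of the idea, assuming NumPy is installed. The function names and the toy arithmetic (multiply by a factor, add one) are purely illustrative and are not taken from the linked tutorial. The same calculation is written first as an explicit Python loop and then as a single whole-array expression:

```python
import numpy as np


def scale_loop(values, factor):
    """Non-vectorised baseline: handle one element at a time in Python."""
    result = []
    for v in values:
        result.append(v * factor + 1.0)
    return result


def scale_vectorised(values, factor):
    """Vectorised version: one expression applied to the whole array."""
    return np.asarray(values, dtype=float) * factor + 1.0


# Hypothetical toy data: the numbers 0..9999 as floats.
data = np.arange(10_000, dtype=float)

loop_out = scale_loop(data, 2.0)
vec_out = scale_vectorised(data, 2.0)

# Both approaches should agree element by element.
print(np.allclose(loop_out, vec_out))
```

The vectorised form hands the per-element work to NumPy's compiled routines rather than the Python interpreter, which is typically where the speed-up comes from. Wrapping each call in `timeit.timeit` is an easy way to check the difference on your own machine.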

Practical tips
How to drive analytics and ML into production

"I’m not a management expert, but I did try really hard during my first year managing and I’ve since spent time digesting the experience. My hope is that others will find a few of the things I learned useful when they’re at the start of their own management journey.”

Bigger picture ideas
Longer thought provoking reads – lean back and pour a drink!

"Isaac Newton apocryphally discovered his second law – the one about gravity – after an apple fell on his head. Much experimentation and data analysis later, he realised there was a fundamental relationship between force, mass and acceleration. He formulated a theory to describe that relationship – one that could be expressed as an equation, F=ma – and used it to predict the behaviour of objects other than apples. His predictions turned out to be right (if not always precise enough for those who came later).

Contrast how science is increasingly done today."
"These schemas were the subject of a competition held in 2016 in which the winning program was correct on only 58% of the sentences — hardly a better result than if it had guessed. Oren Etzioni, a leading AI researcher, quipped, 'When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.'”
"Repeatedly tap on a box of marbles or sand and the pieces will pack themselves more tightly with each tap. However, the contents will only approach its maximum density after a long time and if you use a carefully crafted tapping sequence. But in new experiments with a cylinder full of dice vigorously twisted back and forth, the pieces achieved their maximum density quickly. The experiments could point to new methods to produce dense and technologically useful granular systems, even in the zero gravity environments of space missions."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

Although there are still some Covid restrictions in place, the UK Government has eased a number of rules: to be fair, it’s quite hard to keep track. Omicron is far from gone though…

Updates from Members and Contributors

  • Kevin OBrien highlights a couple of excellent events:
    • The inaugural SciMLCon (of the Scientific Machine Learning Open Source Software Community) will take place online on Wednesday 23rd March 2022. SciMLCon is focused on the development and applications of the Julia-based SciML tooling - with expansion into R and Python planned in the near future.
    • JuliaCon, which will be free and virtual, with the main conference taking place from Wednesday 27th July to Friday 29th July 2022. (Julia is a high performance, high-level dynamic language designed to address the requirements of high-level numerical and scientific computing, and is becoming increasingly popular in Machine Learning, IoT, Robotics, Energy Trading and Data Science.)
  • Harald Carlens launched a very useful Discord server to help facilitate easier matchmaking for teams in the competitive ML community spanning across Kaggle and other platforms (AIcrowd/Zindi/DrivenData/etc), to go along with the mlcontests.com website. There are over 250 people on the server already and the audience is growing daily. More info here
  • Prithwis De contributed as chair at the 6th International Conference on Data Management, Analytics & Innovation, held during January 14-16, 2022.
  • Sarah Parker calls out the work of Professor Simon Maskell (Professor of Autonomous Systems, and Director of the EPSRC Centre for Doctoral Training in Distributed Algorithms at the University of Liverpool), who has developed a Bayesian model used by the UK Government to estimate the UK’s R number – the reproduction number – of COVID-19. More info here.


Jobs!
A new section highlighting relevant job openings across the Data Science and AI community (let us know if you have anything you’d like to post here…)

  • Holisticai, a startup focused on providing insight, assessment and mitigation of AI risk, has a number of relevant AI related job openings- see here for more details
  • EvolutionAI are looking for a machine learning research engineer to develop their award-winning AI-powered data extraction platform, putting state of the art deep learning technology into production use. Strong background in machine learning and statistics required.
  • AstraZeneca are looking for a Data Science Training Developer – more details here
  • Cazoo is looking for an experienced Principal Data Scientist to lead technical development of a wide range of ML projects – more details here (I’m biased… but this is an amazing job for the right person 😉 )

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS
