February Newsletter

Hi everyone-

Well... January seemed to fly by. 2021 has certainly started with a bang (Brexit!, Impeachment!, New President!, Vaccinations!) and the holidays seem an age ago. I hope you are surviving lockdown 3.0 as best you can… maybe there is room in the long dark evenings for a few curated data science reading materials?

Following is the February edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science February 2021 Newsletter

RSS Data Science Section

Covid Corner

I keep thinking we might be able to drop the ‘Covid Corner’ section from the newsletter, but sadly the pandemic is still very much alive. The vaccination roll-out in the UK does seem to be going well, however, with over 9m first dose vaccinations made (as of Feb 1st) which is great news.

"one side claims that the tests are more than 90% effective at what they do; the other side says they could be as low as 3%, depending on what you mean by “effective”."
  • Finally, this feels like a very exciting development. The recent breakthroughs in natural language processing (NLP) and language models (like GPT-2/3) are at heart based on understanding the likelihood of different sequences of letters and words, codified into word embeddings (vector representations). Applying this approach to other fields (remember chess?) feels very elegant, and the MIT researchers in this case have used the underlying gene sequences (‘letters’) of viruses to train their model. From this they are able to predict likely virus mutations using sequence data alone:
"The model achieved 0.85 AUC in predicting SARS-CoV-2 variants that were highly infectious and capable of evading antibodies."
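
For anyone curious what "understanding the likelihood of different sequences of letters" can look like in practice, here is a deliberately tiny sketch. It is not the MIT researchers' model: it assumes a toy four-letter alphabet, made-up training strings, and a simple smoothed bigram model (the function names `train_bigram` and `log_likelihood` are our own, purely for illustration). The idea is only that sequences resembling the training data score a higher likelihood than ones that do not.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed toy alphabet, not real viral sequence data


def train_bigram(sequences, alphabet=ALPHABET):
    """Estimate P(next letter | current letter) with add-one smoothing."""
    pair_counts = Counter()
    current_counts = Counter()
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[(cur, nxt)] += 1
            current_counts[cur] += 1
    size = len(alphabet)
    return {
        (cur, nxt): (pair_counts[(cur, nxt)] + 1) / (current_counts[cur] + size)
        for cur in alphabet
        for nxt in alphabet
    }


def log_likelihood(seq, probs):
    """Sum the log-probabilities of each letter-to-letter transition."""
    return sum(math.log(probs[(cur, nxt)]) for cur, nxt in zip(seq, seq[1:]))


# Made-up training strings, purely illustrative
training = ["ACGTACGTACGT", "ACGTACGAACGT", "ACGTTCGTACGT"]
model = train_bigram(training)

familiar = log_likelihood("ACGTACGT", model)  # follows the training pattern
unusual = log_likelihood("AAAATTTT", model)   # transitions rarely seen in training
print(familiar > unusual)
```

Real language models replace the bigram table with learned embeddings and deep networks, but the underlying question ("how likely is this sequence?") is the same.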

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

There is still time to register for our upcoming fireside chat with none other than Andrew Ng on February 10th. We are very excited for what is going to be a fantastic event: don’t miss out, sign up here.

As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon- we will keep you posted.

Giles Pavey has been discussing what it takes to build world class data science teams.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 11th February where Manzil Zaheer, a research scientist at Google, will talk about Big Bird: Transformers for Longer Sequences. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Finally, we are really pleased to include a call for contributions to the RSS 2021 Conference, 6-9 September in Manchester. The organisers are seeking submissions for contributed talks which can be on any topic related to statistics and data science (deadline April 6th).

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Big Government and AI
Governments around the world mapping out grand AI plans…

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

AI in Healthcare
Increasing utilisation of AI and machine learning in healthcare…

  • Exciting announcement from the Korea Institute of Science and Technology who have developed a prostate cancer urine screening test using machine learning.
  • Interesting comment published in Nature discussing how recent applications of AI to ageing research are leading to the emergence of the field of longevity medicine.
  • We have seen a number of studies in recent times highlighting the power of deep learning techniques in medical imaging and the automatic assessment of resulting scans- this review article in Nature assesses the overall gains over the last decade.
  • As the previous article alludes to, going from prototype to real world production in a healthcare setting is far from simple, and this article from Rachel Thomas of fast.ai highlights some of the underlying issues.
  • Interestingly, the FDA in the US has released an action plan focused on methods for approving AI and machine learning based applications in health care.

Developments in Data Science…
As always, lots of new developments…

  • Hot on the heels of GPT-3, OpenAI have released an amazing application, called DALL-E (Salvador Dali crossed with Pixar’s WALL-E…), a 12 billion parameter version of GPT-3 trained to generate images from text descriptions. You have to try this… Good summary here from MIT Technology Review.
“In the long run, you’re going to have models which understand both text and images. AI will be able to understand language better because it can see what words and sentences mean.”
  • Not to be outdone on the ‘my model has more parameters than your model’ stakes, Google recently announced their Switch Transformer Language Model with 1.6 trillion parameters.
  • Great summary, from Jeff Dean, head of Google AI, of Google’s research output in 2020 (over 800 publications) and what lies ahead for 2021. This is long, but well worth a read as it highlights the amazing breadth and depth of the output from the Google researchers.
"I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples"

How does that work?
A new section on understanding different approaches and techniques

Teams, people and production…
Still one of the biggest obstacles…

  • Interesting commentary from Gergely Orosz on the approach to motivating and empowering software engineers in Silicon Valley, very relevant also for Data Scientists and ML engineers.
  • What skills do you really need in your data team? Is it all about the models, or do you need more breadth, on both the business and engineering sides?
  • How do you scale a team at different stages of development? Useful advice here from Peter Gao.
  • If you want to put in place proper monitoring of your ML systems but aren’t quite ready for a full blown MLOps solution, how about giving this a try, from Jeremy Jordan?
  • A pretty bland ‘top x trends in data’ title, but some useful pointers on best practices in building out a modern data stack.

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Adriano Soares Koshiyama highlights what looks like an excellent upcoming UCL webinar on AI in the Judicial System on Feb 25 at 1pm: “In this webinar we welcome Dr Pamela Ugwudike (University of Southampton, Alan Turing Institute) and Charles Kerrigan (CMS partner and global head of Fintech) to present their perspectives from academia and industry”. Register here.
  • Rafael Garcia-Navarro has been doing some impressive work in conversational AI, implementing on top of Metaflow (Netflix’s MLOps framework) – definitely worth a read.
  • Kevin O’Brien draws our attention to a great write-up on the Climate Modeling Alliance (CliMA) project and how they use Julia (“Meet the team shaking up climate models”). Also, don’t forget JuliaCon 2021, from Wednesday 28th July to Friday 30th July 2021.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS
