Hi everyone-
It’s all go… US Presidential Elections, second waves, third tiers and second lockdowns, all while struggling to maintain some semblance of professionalism for the next Zoom call… Definitely time for ‘self-care’ via some selected data science reading materials!
Following is the November edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …
As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
Industrial Strength Data Science November 2020 Newsletter
RSS Data Science Section
Covid Corner
Lockdowns are looming again across Europe as COVID-19 cases continue to rise. As always numbers, statistics and models are front and centre in all sorts of ways.
- Positive Covid case numbers are still rising- the New York Times is a useful resource for comparing case rates across countries and regions on a like-for-like basis (US local area data, UK local area data). We often think we are faring better than the US, but right now that is not the case: London currently has double the infection rate of New York.
- Back in September, we were told by Sir Patrick Vallance, the government’s Chief Scientific Advisor, that unless we took collective measures to stem the spread of the virus, the estimated trajectory “would lead to 200 deaths a day by mid-November”.
- Since then we have seen the various alert levels, tiers and associated restrictions implemented.
- Sadly, though, we passed 200 deaths a day by mid-October, ahead of the schedule laid out by Vallance.
- Clearly the measures we have attempted to implement so far have not been successful. Hence the recently announced lockdown… let’s hope it proves more effective.
- In a similar vein, it appears that while the “Eat Out to Help Out” program may well have provided significant economic benefit for the hospitality sector, it may also have unintentionally helped drive the rise in Covid infections.
"Between 8 and 17% of the newly detected COVID-19 infection clusters
can be attributed to the scheme"
- One initiative we have been repeatedly told is critical to bringing the virus spread under control is our ‘Test and Trace’ program, on which a great deal of money has been spent.
- Indeed, we have certainly made great strides in the volume of tests being run, and on a per-capita basis are now doing better than many European countries.
- However, testing on its own does not necessarily help. An interdisciplinary team at UCL have put together this insightful dashboard highlighting the point. Leveraging data science best practice, they have broken the process down into different stages (Find, Test, Trace, Isolate, Support), defining key success metrics for each stage.
- Not only are the key metrics we can currently measure very weak (only 14% of close contacts were advised to isolate in the recent period), but we are not even capturing the data to measure certain pieces of the puzzle (we do not actually know how many of those advised to isolate do indeed do so, although survey data indicates this may be as low as 20%).
- It’s clear that we are still learning about how the virus spreads. Originally we thought it was only spread by symptomatic individuals via surface transmission but now we know asymptomatic people can also spread the virus, and there is increasing consensus that understanding ‘aerosol transmission’ could be the key to slowing the spread.
- Clearly, numbers, statistics and logical analytics are front and centre in this crisis: Carlo Rovelli, the renowned physicist, argues powerfully for far more widespread training in these fields.
In this uncertain world, it is foolish to ask for absolute
certainty. Whoever boasts of being certain is usually the least
reliable. But this doesn’t mean either that we are in the dark.
Between certainty and complete uncertainty there is a precious
intermediate space – and it is in this intermediate space that our
lives and our decisions unfold.
Committee Activities
We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.
As previewed in our last newsletter, and our recent release, we are excited to be launching a new initiative: AI Ethics Happy Hours. We are now working on organising the first event based on suggestions we have received.
The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these.
Janet Bastiman recently gave a talk at the Minds Mastering Machines conference with the provocative title “Your testing sucks”, using lessons from space exploration to highlight the areas where people in Data Science and AI should apply testing thinking.
Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and has been active with virtual events. Next up, on Wednesday November 4th, is “Beyond Accuracy: Behavioural Testing of NLP Models with CheckList”, by Marco Ribeiro, Senior Researcher at Microsoft Research. Videos are posted on the meetup YouTube channel – and future events will be posted here.
Elsewhere in Data Science
Lots of non-Covid data science going on, as always!
Bias and more bias…
Bias, ethics and diversity continue to be hot topics in data science…
- An interesting twist in the facial recognition saga: “Activists turn facial recognition tools against the police”. Facial recognition models are now relatively widely available so the key piece of the puzzle is often the training data set.
- In perhaps a sign of things to come, BMW have released their AI Code of Ethics: the key with all these initiatives will be how well these noble principles are actually adhered to.
- Causal Bayesian Networks look like an elegant approach to identifying and correcting bias in Machine Learning model training.
By defining unfairness as the presence of a harmful influence from
the sensitive attribute in the graph, CBNs provide a simple and
intuitive visual representation for describing different possible
unfairness scenarios underlying a dataset
Science Data Science …
An exciting application of data science and machine learning is in enabling scientific research. In fact, Demis Hassabis, CEO and co-founder of DeepMind, describes this as the area of AI application that he is most excited about. In this excellent interview with Azeem Azhar, he discusses the foundations of DeepMind, the systematic approach they have taken to generalising their AI breakthroughs, and specifically applications in scientific research.
- The first (and currently most accessible) avenue to advancing scientific discovery is in leveraging machine learning best practice to parse through the increasingly sizeable volumes of data generated in scientific experimentation.
- A great example of this approach is in the Warwick University discovery of 50 exoplanets from previously overlooked data collected by NASA’s now-defunct Kepler space telescope.
- Similarly, machine learning now enables the identification and count of craters on Mars.
- The area with more potential long term impact is in leveraging machine learning to help drive the direction of new scientific research, similar to the way in which AlphaZero allowed Go players to explore previously unknown strategic options.
- DeepMind had previously discussed their AlphaFold system, allowing exploration of potential protein structures.
- Recently they have released FermiNet, a novel Deep Learning architecture that allows scientists to explore the structure of complex molecules from first principles by estimating solutions to Schrödinger’s equation at scale.
Developments in Data Science…
As always, lots of new developments…
- An interesting new approach to training models with almost no data at all – “Less than One Shot” Learning using “soft labels”.
- A different way of reducing the amount of data required to train models, this time in reinforcement learning.
- What looks like an excellent new approach to tackling unbalanced multi-category data sets with semi-supervised and self-supervised learning.
- We know that “Wisdom of the Crowd” does indeed work in certain situations – “swarms” takes this one step further by connecting the crowd together and allowing for feedback. The approach leverages one of nature’s best optimisation techniques based on honey bees.
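For anyone who wants to see the basic “Wisdom of the Crowd” effect for themselves, here is a minimal sketch. It only covers the simple averaging case, not the connected “swarm” approach, and it assumes each guess is the true value plus independent Gaussian noise. The function name and all the parameter values are made up for illustration.

```python
import random
import statistics


def crowd_vs_individual(true_value=100.0, noise_sd=20.0, crowd_size=500, seed=0):
    """Compare the error of the crowd average with a typical individual error.

    Assumption: guesses are independent and unbiased (true value plus
    Gaussian noise). Real crowds are often neither, which is why this
    only works "in certain situations".
    """
    rng = random.Random(seed)
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(crowd_size)]

    # Error of the averaged guess
    crowd_error = abs(statistics.mean(guesses) - true_value)
    # Average error of a single guesser
    individual_error = statistics.mean([abs(g - true_value) for g in guesses])
    return crowd_error, individual_error


crowd_error, individual_error = crowd_vs_individual()
print(f"Crowd average error: {crowd_error:.2f}")
print(f"Typical individual error: {individual_error:.2f}")
```

Under these assumptions the crowd error shrinks roughly with the square root of the crowd size; correlated or biased guesses would break that.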
Applications of Data Science…
And as always, lots of new applications as well…
- Deciphering lost languages at MIT – the intriguing achievement here is managing the task without needing advanced knowledge of the target language’s relation to other languages.
- Using news sentiment to predict macroeconomic indicators- we have seen this claim many times in the past, but often as a result of overfitting. However, the Bank of England’s research in this area does look promising – worth checking out the working paper here.
- A different approach to A/B testing for more dynamic environments – switchback testing.
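To give a flavour of the switchback idea mentioned above, here is a rough sketch: instead of randomising users, we randomise time windows between control and treatment and compare the window-level metric. This is a simplified illustration under our own assumptions, not the method from the linked article: the function names, the hourly windows and the simulated numbers are all hypothetical.

```python
import random
import statistics


def assign_windows(n_windows, seed=0):
    """Randomly assign time windows to arms, half to each (n_windows must be even)."""
    if n_windows < 2 or n_windows % 2 != 0:
        raise ValueError("n_windows must be an even number >= 2")
    labels = ["control", "treatment"] * (n_windows // 2)
    random.Random(seed).shuffle(labels)
    return labels


def switchback_effect(assignments, window_metrics):
    """Difference in mean window-level metric: treatment minus control."""
    treated = [m for a, m in zip(assignments, window_metrics) if a == "treatment"]
    control = [m for a, m in zip(assignments, window_metrics) if a == "control"]
    if not treated or not control:
        raise ValueError("need at least one window in each arm")
    return statistics.mean(treated) - statistics.mean(control)


# Simulated day of hourly windows (all numbers are invented for illustration)
rng = random.Random(1)
assignments = assign_windows(24)
metrics = []
for hour, arm in enumerate(assignments):
    baseline = 10.0 + (2.0 if 8 <= hour < 20 else 0.0)  # hypothetical daytime uplift
    uplift = 1.5 if arm == "treatment" else 0.0          # hypothetical true effect
    metrics.append(baseline + uplift + rng.gauss(0, 0.5))

print(f"Estimated effect: {switchback_effect(assignments, metrics):.2f}")
```

A real analysis would also need to think about carry-over between adjacent windows and about how long each window should be; this sketch ignores both.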
AI Trends and Business
- Really interesting insight into how Apple is organised – no general managers, lots of specialists.
To create such innovations, Apple relies on a structure that centers
on functional expertise. Its fundamental belief is that those with
the most expertise and experience in a domain should have decision
rights for that domain.
Apple’s managers at every level, from senior vice president on down,
have been expected to possess three key leadership characteristics:
deep expertise that allows them to meaningfully engage in all the
work being done within their individual functions; immersion in the
details of those functions; and a willingness to collaboratively
debate other functions during collective decision-making
Having senior technical leaders who can “meaningfully engage in all the work being done within their function” is something we are big believers in for data science, and something that is lacking in many organisations.
- You would not normally expect a Venture Capital firm to be producing technical reviews, but Andreessen Horowitz have done an impressive job of identifying the components of a modern data platform architecture.
- Wired highlights the increasing number of companies attempting to “implement AI” but failing to see the return.
“We’re seeing that this blending of humans and machines is where
companies are performing well”
Practical Projects
As always, here are a few potential practical projects to while away the socially distanced hours:
- Machine Learning art experiments!
- “What colour is this?” Sounds easy, but maybe not the case…
- Machine Learning on an Arduino!
- Not the best recording, but an entertaining video tutorial for creating a model to generate tweets from your voice!
Updates from Members and Contributors
- Following on from his contribution last month, David Higgins has published (and is looking for feedback on) new guidelines for developing regulatory compatible medical AI devices. It is very relevant in the current climate and well worth a read.
- Adriano Koshiyama is organising what looks like an excellent (and free entry) event called TheAlgo 2020 Conference (thealgo.co), on November 12th. It is a joint initiative focused on AI and Disruptive Technologies between UCL AI Centre, HMRC, CDEI, and the Turing Institute. The registration link is here.
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
– Piers