Back in September 2019 while travelling in Japan, I just happened to come across details about a Datathon organised by Dementias Platform UK (DPUK) / Medical Research Council. I quickly submitted my application and found out in October 2019 that I had been selected.
Today’s the 4th of November and I’ll be heading over to the University of East Anglia tomorrow. It’s about a 5-hour journey door to door, and I’m certain it’ll be totally worth it.
The 3-day Datathon runs there from 6th to 8th November, and I’m really excited about it.
I’ll be writing about my experience each day, so this post is part of a ‘Series’ if you like.
About the DPUK Datathon
The objective of the DPUK Datathon is to explore predicting dementia using statistical and machine learning techniques.
The idea’s to bring together experts from a variety of different backgrounds and let them play with Big Data to explore some pertinent research questions.
I’m pleased and proud to see that it’s part-funded by the University of Exeter (my Alma Mater) as well as other stalwarts of erudition including Oxford, Cambridge, and King’s College London, to name a few.
DPUK’s Robust Infrastructure
Apart from the interesting work lined up, I find the infrastructure of DPUK particularly striking, especially in the context of the ethics and integrity of the data security, and the fact that I’ll be working with a rich, clean dataset.
Ethics & Integrity of Data Security
It’s pretty remarkable to see the amount of thought that’s gone into protecting the integrity of the data we’ll be working with.
As expected of course, there’s no way for me (or any researcher) to identify who the data relates to specifically. In other words, there are no individual identifiers and so I can’t see whose data I’m looking at.
It gets better though.
There is no way for us to export / extract / save the data outside of the secure portal!
This means that although all of the data is aggregated, DPUK still goes above and beyond in the context of ethics and integrity by preventing anyone from downloading the data (for any other purpose).
You can learn more about DPUK’s Data Security here.
Cleaned Data – Every Data Scientist’s Dream
A large part of my PhD empirical research time goes into organising and cleaning my datasets (i.e., in “data wrangling”). While this task can be quite mundane, it’s incredibly important because a “garbage dataset” will produce “garbage analysis” (“garbage in, garbage out”).
But while most if not all researchers realise the importance of data wrangling, few will suggest that it’s an activity they enjoy.
DPUK has removed all that misery.
What we’ll have is a totally cleaned, organised, and ready to use dataset that we can just play with.
It’s every Data Scientist’s dream.
Why I’m doing this
You probably already know that my domain expertise is in Finance. So why am I taking part in a medical related Datathon?
Exposure to Statistical and Machine Learning Techniques outside of Finance
First, the focus is largely on applying statistical and machine learning techniques to answer research questions. It’s not so much about medical domain knowledge, although that will undoubtedly help. The fact that it’s about the statistical side means that I can transfer my existing skills quite easily (he says, optimistically).
Data for Good
Secondly, this is an opportunity for me to be a part of the Data for Good movement. I’ve also recently signed up to DataKind UK, but the lack of funding means I can’t really take part in their programs as much as I’d like to. My participation in the DPUK Datathon is fully funded by DPUK (accommodation, travel, dinner, etc.).
Dementia is a deadly disease, and if my skills can be put towards helping reduce / alleviate / better predict this disease in any way whatsoever, then I’m only too happy to do it.
Working with the best and brightest
Third, and perhaps most importantly, it gives me the opportunity to work with some of the best and brightest minds in the country, arguably the world.
I can’t even fathom the amount of knowledge and perspective I’ll gain over these 3 days. But it’s not all about taking. I sincerely hope and trust that I’ll bring a lot to the table, too.
And that’s a wrap
That’s pretty much everything I wanted to talk about in this post. I’ll write about my experience of the Datathon each day and link to those posts below.