The last day of the DPUK Datathon. Did we find answers to our questions? Did we finally get some results? Yes we did, but unfortunately I can’t share precisely what they were.
NOTE: this post was written ‘as is’, while thinking out loud to help keep things authentic. Please excuse and pardon any grammatical and typographical errors, as well as possible incoherence in the structure.
On the last day of the DPUK Datathon, we were told about some terms and conditions which mean we can’t ‘publish’ details about what we found.
Apparently, the terms mean that we can see and use the available datasets and conduct analysis on them, but not share information about the results we may or may not have found.
The final push
With the data finally fully ‘cleaned’, it was ready for some final ‘processing’. This was the last step before we could conduct our embryonic analysis.
Creating the ‘final’ variables
This last stage involved creating some variables of interest using the available data.
For example, if we wanted to measure the wealth of an individual in their childhood, we might need to ‘estimate’ it. That’s because we might not have access to the actual wealth an individual had during that time.
‘Estimating’ the wealth can be as crude, simplistic, or complicated as one likes, of course. For example, one might look at the number of people in a household vs. the number of bedrooms.
If a family of four is living in a one-bedroom house, then it may well be the case that they’re not very wealthy. It’s a total assumption of course, and does not account for the real wealth they may have in any way whatsoever.
I’m not saying that we measured the wealth, or that we measured it this way. I’m just saying that that’s one possible way one might do it.
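To make that hypothetical a bit more concrete, here’s a minimal sketch of a crowding-style proxy in Python. To be clear, this isn’t a description of what we did: the `Household` class, the function names, and the threshold of 2 people per bedroom are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Household:
    # Hypothetical fields; real survey variables would be named differently.
    people: int
    bedrooms: int


def persons_per_bedroom(household: Household) -> Optional[float]:
    """Return people per bedroom, or None when bedrooms is missing or zero."""
    if household.bedrooms <= 0:
        return None
    return household.people / household.bedrooms


def is_crowded(household: Household, threshold: float = 2.0) -> Optional[bool]:
    """Flag a household as crowded when the ratio exceeds an assumed threshold."""
    ratio = persons_per_bedroom(household)
    if ratio is None:
        return None
    return ratio > threshold


# A family of four in a one-bedroom house, as in the example above.
family = Household(people=4, bedrooms=1)
print(persons_per_bedroom(family), is_crowded(family))
```

Returning `None` rather than a number when the bedroom count is unusable keeps missing data visibly missing, instead of quietly turning it into ‘not crowded’.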
Finalising the outcome variables
In the end, we made a collective decision to look at cases of dementia as well as cognition. Using the expertise within the team, we created six or seven different ‘outcome’ variables reflecting whether or not certain unidentifiable individuals had dementia and/or cognition-related effects.
These variables were then merged with the independent variables to create the final set of usable data for our analysis.
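Mechanically, a merge like this usually means joining two tables on a shared participant identifier. Here’s a rough sketch of that idea in plain Python. The IDs, variable names, and values are all invented, and I’m assuming an inner join (keeping only participants present in both tables), which may or may not suit a given analysis.

```python
def merge_on_id(predictors, outcomes):
    """Inner-join two {participant_id: {variable: value}} tables."""
    # Keep only the participants that appear in both tables.
    shared_ids = sorted(predictors.keys() & outcomes.keys())
    return [
        {"participant_id": pid, **predictors[pid], **outcomes[pid]}
        for pid in shared_ids
    ]


# Invented toy data: not real participants and not real variable names.
predictors = {
    "p01": {"crowded_childhood_home": True},
    "p02": {"crowded_childhood_home": False},
    "p03": {"crowded_childhood_home": False},
}
outcomes = {
    "p01": {"dementia_flag": 0},
    "p02": {"dementia_flag": 1},
    "p04": {"dementia_flag": 0},
}

analysis_rows = merge_on_id(predictors, outcomes)
for row in analysis_rows:
    print(row)
```

Participants `p03` and `p04` drop out here because each is missing from one of the two tables. With real data, it’s worth counting how many rows a join like this discards.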
Evaluating the impact of early experiences on cognition
We ran several univariate logistic (logit) regressions before running one ‘final’ regression with all the independent variables.
This was followed by running a couple of OLS regressions on some of the cognition measures, since those measures are closer to continuous than binary or categorical.
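For anyone unfamiliar with these two techniques, here’s a bare-bones sketch of each in plain Python: a univariate logistic regression fitted by Newton-Raphson, and a one-predictor OLS fit using the closed-form formulas. This is not our code, and the toy numbers are invented. In practice one would reach for a statistics package, which also reports standard errors and p-values.

```python
import math


def fit_univariate_logit(x, y, max_iter=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Assumes y is coded 0/1 and the data are not perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares (closed form)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented toy data: one predictor, one binary and one continuous outcome.
exposure = [0, 1, 2, 3, 4, 5, 6, 7]
binary_outcome = [0, 0, 1, 0, 1, 0, 1, 1]
continuous_outcome = [3.0, 5.0, 7.0, 9.0]

print(fit_univariate_logit(exposure, binary_outcome))
print(fit_simple_ols([1, 2, 3, 4], continuous_outcome))
```

The ‘univariate’ runs amount to repeating the first function once per independent variable. The ‘final’ model with all variables needs the multivariable version of the same idea, which is where a proper library earns its keep.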
It’s a real shame that I’m not permitted to share any more than this.
We might have found something.
We might not have.
Our team knows, and so do the other teams that participated.
It’s important to note that regardless of whether we found anything or not, the level of analysis is pretty basic in the scheme of things.
I mean, we literally only had 2.5 days to do everything.
The last half day of the three-day Datathon was spent presenting our work to all the teams.
Presenting the work
This was really great because we could see how everyone had fared, what people had learnt, and how similar our trajectories ended up being.
Every single team faced identical issues in terms of dealing with the data.
We were all overwhelmed by the sheer volume – tens of thousands of variables!
It was nice to see how the results and methodologies varied, depending on whether teams had taken:
- Data-driven approaches to selecting ‘features’,
- Hypothesis-driven approaches to selecting ‘features’, or
- A combination of both data-driven and hypothesis-driven approaches.
Final reflections
Overall, I’d say quite confidently and happily that this was a pretty incredible experience.
Some of the problems we faced with the data are things that I’ve never seen while conducting research in Finance. These are also problems that I’ll likely never even see in Finance, simply because the datasets are totally different by design.
Gaining this exposure is invaluable, and it’s really broadened my knowledge and awareness of the challenges with data, and how we can do better to manage our data more efficiently and effectively.
Unquestionably, the best part of the entire datathon was to be surrounded by some of the brightest minds in the country, and to work with them in exploring some really interesting challenges.
It’s inspiring and remarkable to see 40-50 total strangers – all intelligent experts in their respective fields – coming together under one roof, and tackling problems together.
I’ve seen how our research methodologies and ethos differ, and how they’re similar, too.
While the technical aspects may well be different in some regards, it’s clear that we’re almost identical in our objectives.
We want to do good research, and take pride in solving problems using our skills and abilities, collaborating with one another and contributing individually too.
I’m very grateful to have been part of a fantastic team. A pleasure working with Nicola, James, Tim, Ollie, Shea, Joyce, and Travis.
While we weren’t creative with our team name (“DS1” for DataScience1), we certainly were creative in our work and outputs.
I am grateful to Dementias Platform UK / Medical Research Council for organising the DPUK Datathon, and for fully funding my place on it.