Hello!
My name is Yukai Zou and I am an MRI Physicist at the University Hospital Southampton NHS Foundation Trust in the UK. I earned my PhD in Biomedical Engineering from Purdue University in the US.
I am currently pursuing the ACS Route 2 towards HCPC registration as a Clinical Scientist. Before taking on this role, I worked at the University of Southampton as a postdoctoral research fellow for two years. I am very excited about the transition from academia to the NHS, and I’d like to see how I can help translate my research skills into clinical practice, as well as support clinical research at UHS.
In my spare time, I enjoy cooking Asian cuisine, playing the ukulele, running, swimming, and exploring different places by hiking and camping.

What’s this blog about?
“kaidsen” stands for “Kai’s Data Science Efficiency Notebook.” A few reasons behind this name:
- Kai is my common name.
- “dsen” stands for “Data Sciences Efficiency Notebook”, which I started back in 2015.
- “Kaidsen” sounds similar to “kaizen”, a Japanese word meaning continuous improvement (改善).
I started this notebook back in 2015 when I was studying medical imaging at the University of California, San Francisco. A good friend recommended the online course “Data Science Specialization” on Coursera, and I took it. At that time, I named this notebook “Data Sciences Efficiency” because I believed those skills would, at the very least, help me analyze data more efficiently. I used R for the statistical analyses in my master’s thesis, built a Shiny app as part of my PhD thesis, and programmed in R and Python to process large datasets for the ABCD Neurocognitive Challenge. Over the years, this notebook has accumulated small bits of R functions, as well as some Python and bash scripts. I have now decided to make these notes public in the hope that they will serve as useful resources for the community.
Inspirations
I love reading. I read a lot every day, and I am often inspired by the thoughts of other data science enthusiasts. Some of my favorite blogs and podcasts are R-bloggers, Simply Statistics, Not So Standard Deviations, and, more recently, Econometrics and Free Software. Through these channels I have picked up many useful tips on using R and on best practices in data science, and they keep me up to date with what is going on in the field.
The Inconvenient Truth, by Kamil Bartocha
- Data is never clean.
- You will spend most of your time cleaning and preparing data.
- 95% of tasks do not require deep learning.
- In 90% of cases, generalized linear regression will do the trick.
- Big Data is just a tool.
- You should embrace the Bayesian approach.
- No one cares how you did it.
- Academia and business are two different worlds.
- Presentation is key: be a master of PowerPoint.
- All models are false, but some are useful.
- There is no fully automated Data Science. You need to get your hands dirty.