About

Hello!

My name is Yukai Zou and I am an MRI Physicist at the University Hospital Southampton NHS Foundation Trust in the UK. I earned my PhD in Biomedical Engineering from Purdue University in the US.

I am currently pursuing the ACS Route 2 towards HCPC registration as a Clinical Scientist. Before taking on this role, I worked at the University of Southampton as a postdoctoral research fellow for two years. I am very excited about the transition from academia to the NHS, and I’d like to see how I can help translate some of my research skills into clinical practice, as well as support the clinical research work at UHS.

In my spare time, I enjoy cooking Asian cuisines, playing the ukulele, running, swimming, and exploring different places by hiking and camping.

With my parents at Twin Peaks, San Francisco, October 2015

What’s this blog about?

“kaidsen” stands for “Kai’s Data Science Efficiency Notebook.” A few reasons behind this name:

  • Kai is my common name.
  • “dsen” stands for Data Sciences Efficiency Notebook, which I started back in 2015.
  • Kaidsen sounds similar to kaizen, meaning continuous improvement (改善).

I started this notebook back in 2015 when I was studying medical imaging at the University of California, San Francisco. I took the online course “Data Sciences Specialization” on Coursera, which a good friend recommended to me. At that time, I gave this notebook a name: “Data Sciences Efficiency”. I believed those skills could, at the very least, help me analyze data more efficiently. I used R for the statistical analyses in my master’s thesis, I built a Shiny app for my PhD thesis, and I programmed in R and Python to process large datasets for the ABCD Neurocognitive Challenge. Over the years, this notebook has accumulated small bits of R functions, as well as some Python and bash scripts. I have now decided to make these notes public in the hope that they serve as useful resources for the community.

Inspirations

I love reading. I read a lot every day, and I often get inspired by thoughts from other data science enthusiasts. Some of my favorite blogs and podcasts are R-bloggers, Simply Statistics, Not So Standard Deviations, and more recently Econometrics and Free Software. I have picked up many useful tips about using R and about best practices in data science, and these channels keep me up to date with what is going on in the field.

The Inconvenient Truth, from Kamil Bartocha

  1. Data is never clean.
  2. You will spend most of your time cleaning and preparing data.
  3. 95% of tasks do not require deep learning.
  4. In 90% of cases generalized linear regression will do the trick.
  5. Big Data is just a tool.
  6. You should embrace the Bayesian approach.
  7. No one cares how you did it.
  8. Academia and business are two different worlds.
  9. Presentation is key - be a master of Power Point.
  10. All models are false, but some are useful.
  11. There is no fully automated Data Science. You need to get your hands dirty.