The Data Science of Family History


Innovations in computing and data science have made it possible to delve deeper into our pasts online. Genealogical sites now offer an astonishing amount of historical information that we could never have dreamed of accessing before the internet era, from genealogical sources to digitised newspapers to DNA testing. In this course, we'll examine how historical records, often stored in archives, are transformed into online datasets accessible from anywhere in the world. We'll consider why they were selected for digitisation and by whom, how they were transcribed, and the challenges of creating searchable datasets.

We'll explore how different types of data automation and algorithms are applied to genealogical data, to understand how genealogical site 'hints' are generated. We'll consider what genealogical DNA tests can reveal and how to evaluate ancestral connections and ethnic origins. Throughout, we'll discuss how each of these computational techniques can both enhance and obscure our chances of finding our ancestors and heritage. We'll also discuss the impact of these advances on our own genealogical research, particularly in relation to identity protection and ethical concerns around big data.

Programme details

Course starts: 17 Apr 2024

Week 0: Course Orientation

Week 1: Genealogical techniques & sources

Week 2: Building datasets: what is data?

Week 3: Building datasets: digitisation and transcription

Week 4: Building datasets: databases and enrichment 

Week 5: Searching databases: rule-based algorithms

Week 6: Searching databases: machine learning approaches

Week 7: Genetic genealogy: ethnicity and reference populations

Week 8: Genetic genealogy: family connections through autosomal DNA testing

Week 9: Playing our part?: from personal to big data

Week 10: Future directions


Students who register for CATS points will receive a Record of CATS points on successful completion of their course assessment.

To earn credit (CATS points) you will need to register and pay an additional £10 fee per course. You can do this by ticking the relevant box at the bottom of the enrolment form or when enrolling online.

Coursework is an integral part of all weekly classes and everyone enrolled will be expected to do coursework in order to benefit fully from the course. Only those who have registered for credit will be awarded CATS points for completing work at the required standard.

Students who do not register for CATS points during enrolment can register either before the start of their course or retrospectively, from 1 January after the current full academic year has been completed. If you are enrolled on the Certificate of Higher Education, you need to indicate this on the enrolment form, but there is no additional registration fee.


Description                        Costs
Course Fee                         £280.00
Take this course for CATS points   £10.00


If you are in receipt of a UK state benefit, a full-time student in the UK or a student on a low income, you may be eligible for a 50% reduction in tuition fees. Please see the link below for full details:

Concessionary fees for short courses


Dr Olivia Robinson

Dr Olivia Robinson is a social historian and genealogist, and is currently a project researcher on a historical big data project at the University of Copenhagen and National Archives Denmark. She holds a DPhil in History from Oxford University on the migration of women servants to Britain (1850-1939) and also runs her own family history consultancy.

Course aims

To introduce students to a number of data science techniques used by the genealogy industry and evaluate their impact on family history research. 

Course objectives:

  • To familiarise students with how original sources are transformed into digital datasets. 
  • To introduce students to data science techniques used by online genealogy providers.
  • To equip students with skills to identify both advantages and disadvantages of data science applications in genealogy.

Teaching methods

Students will watch a weekly lecture (approx. one hour) followed by an hour of facilitated interactive discussion based on the themes of the lecture. Students will be encouraged to participate by drawing on their own experiences of family history research.

Learning outcomes

By the end of the course students will be expected to:

  • recognise how genealogical and historical records are transformed into online data;
  • understand how data science techniques are used by online genealogy platforms;
  • critically evaluate online sources and genetic genealogical results.

Assessment methods

Coursework will consist of two reports or essays: one during the course and one at the end.

Students must submit a completed Declaration of Authorship form at the end of term when submitting their final piece of work. CATS points cannot be awarded without this form.


Enrolment closes 7 days before the start date to allow us to complete the course set-up. We will email you at that time with further information and joining instructions. Please check your spam and junk folders during this period to make sure you receive these emails.

Please use the 'Book' or 'Apply' button on this page. Alternatively, please complete an enrolment form (Word) or enrolment form (PDF).

Level and demands

The course is open to anyone. No prior knowledge is required, but students will be encouraged to share their own experiences of online family history research during the interactive sessions.

Most of the Department's weekly classes have 10 or 20 CATS points assigned to them. 10 CATS points at FHEQ Level 4 usually consist of ten 2-hour sessions. 20 CATS points at FHEQ Level 4 usually consist of twenty 2-hour sessions. It is expected that, for every 2 hours of tuition you receive, you will undertake 8 hours of private study.

Credit Accumulation and Transfer Scheme (CATS)