An overview of the Skin and Bone data

Brief description of the data and processing methods

Author

Sharon Howard

Published

7 April 2025

The Data

Download

Consists of six spreadsheets (.xlsx) listing the project data per collection, by person and by injury.

License: Creative Commons Attribution-ShareAlike 4.0 International (see the download page for citation details).

Methodology

The technical challenge is to extract and classify detailed evidence from more than 87000 descriptions of individuals, including physical characteristics, bodily infirmities and personal details. The language used to describe injuries in the data is highly varied across the datasets, and it can be difficult to classify and standardise the nature of injury.

The amount of processing required (and the amount and kind of information available) varies among the datasets. The osteological data is already highly structured, and the hospital data partly so. However, the physical descriptions in Digital Panopticon datasets often consist of undifferentiated and fragmentary language which can include various physical attributes (eg heights, scars, tattoos and disabilities).

The relevant descriptive language needs to be identified, delineated and classified, but the datasets are too large in most cases for it to be feasible to mark them up manually. Instead, the project builds on automated techniques developed for the Criminal Tattoos project.

Outline of the process:

tokenisation of the text (ie, break into individual linguistic units eg words, punctuation) and then classify each token using a dictionary prepared by the project team, to assign a type to each token: injury type, body part, body specifier (left, right, upper, back of), cause of injury [where possible], stop word (anything unrelated to an injury, such as boil, pockmark, or mole), separator.
a simple grammatical rule set is applied to the text in order to assign individual injuries to the correct locations on the body. The rules-based approach is augmented by training a spaCy relation extraction deep learning model.

The process is not perfect. The original text of the descriptions is retained in the injuries tables for reference, and anyone using the data is advised to take some time to understand the data and its limitations.

The collections

Osteoarchaeological data (OS)

Data from three London burial grounds:

St. Bride’s lower churchyard (1770-1849), c.300 individuals
Royal London Hospital (1825-1841), c.200 individuals
Payne Road/Bow Baptist (1816-1854), c.200 individuals

Notes:

This is fine-grained, systematically recorded data covering a range of the adult population. It’s already very structured so it’s easy to work with. However, it includes a relatively limited range of injuries (mainly fractures), constrained to those that can be observed on the skeleton. There is also limited background information (we don’t know exactly when people were buried). Both age and sex are determined from the skeleton itself.

Hospital Records (HP)

Admission Registers:

Middlesex (1760-64, 1771-88), c.6000 individuals
Royal London (1760, 1791, 1792, 1805), c.4000 individuals
Guy’s (1813-39), c.70000 individuals
St. Thomas’ (1773-6, 1781-9, 1790-9, 1800-9), c.30000 individuals

Notes:

The registers vary considerably both in terms of their survival and the amount of detail they recorded. In the best cases, there is valuable data on length of hospital stay and outcomes, as well as ages and occupations. But the largest registers have much less detail than this. Moreover, by its nature the data only records injuries that were severe enough to necessitate hospitalisation (and descriptions only record the specific injuries that occasioned hospitalisation). There is no “uninjured” population for comparison.

Criminal records (DP)

Three datasets from the Digital Panopticon:

Millbank Prison Register (1816-1828), c.500 individuals
Home Office Prison Licenses (1853-1889), c.6000 individuals
Metropolitan Police Habitual Criminal Register (1881-1901), c.55,000 individuals

Notes:

These can be extraordinarily detailed documents of bodily experience, all the more valuable because they have been linked to extensive biographical and criminal histories. But, as noted, the varied language of the descriptions is highly challenging for computational analysis and there will be many (though hopefully mostly minor) errors and omissions in classification, so the data should be used with care.

Resources

On the Digital Panopticon descriptions and a similar methodology see Tattoos.

The Hospital and Osteological data sources are described in detail in Madeleine Mant’s PhD thesis, Slips, Trips, Falls, and Brawls: Fractures of the Working Poor in London During the Long Eighteenth Century.