tpp schema🔗
Available on backends: TPP
This schema defines the data (both primary care and externally linked) available in the OpenSAFELY-TPP backend. For more information about this backend, see "SystmOne Primary Care".
from ehrql.tables.tpp import (
addresses,
apcs,
apcs_cost,
appointments,
clinical_events,
clinical_events_ranges,
ec,
ec_cost,
emergency_care_attendances,
ethnicity_from_sus,
household_memberships_2020,
medications,
occupation_on_covid_vaccine_record,
ons_deaths,
opa,
opa_cost,
opa_diag,
opa_proc,
open_prompt,
patients,
practice_registrations,
sgss_covid_all_tests,
vaccinations,
wl_clockstops,
wl_openpathways,
)
many rows per patient
addresses🔗
Geographic characteristics of the home address a patient registers with a practice. Each row in this table is one registration period per patient. Occasionally, a patient has multiple active registrations on a given date. The postcode from the address is mapped to an Output Area, from which other larger geographic representations can be derived (see various ONS publications for more detail).
Tip
To group rounded IMD ranks by quintile:
imd = addresses.for_patient_on("2023-01-01").imd_rounded
dataset.imd_quintile = case(
when((imd >=0) & (imd < int(32844 * 1 / 5))).then("1 (most deprived)"),
when(imd < int(32844 * 2 / 5)).then("2"),
when(imd < int(32844 * 3 / 5)).then("3"),
when(imd < int(32844 * 4 / 5)).then("4"),
when(imd < int(32844 * 5 / 5)).then("5 (least deprived)"),
otherwise="unknown"
)
-
address_id
🔗
integer
-
Unique address identifier.
-
start_date
🔗
date
-
Date patient moved to address.
-
end_date
🔗
date
-
Date patient moved out of address.
-
address_type
🔗
integer
-
Type of address:
- 0 - Permanent
- 1 - Temporary
-
3 - Correspondence only
-
Possible values:
0
,1
,3
-
rural_urban_classification
🔗
integer
-
Rural urban classification:
- 1 - Urban major conurbation
- 2 - Urban minor conurbation
- 3 - Urban city and town
- 4 - Urban city and town in a sparse setting
- 5 - Rural town and fringe
- 6 - Rural town and fringe in a sparse setting
- 7 - Rural village and dispersed
-
8 - Rural village and dispersed in a sparse setting
-
Always >= 1 and <= 8
-
imd_rounded
🔗
integer
-
Index of Multiple Deprivation (IMD) rounded to the nearest 100, where lower values represent more deprived areas.
- Always >= 0, <= 32800, and a multiple of 100
-
msoa_code
🔗
string
-
Middle Layer Super Output Areas (MSOA) code.
- Matches regular expression:
E020[0-9]{5}
- Matches regular expression:
-
has_postcode
🔗
boolean
-
Indicating whether a valid postcode is recorded for the patient.
-
care_home_is_potential_match
🔗
boolean
-
Indicating whether the patient's address matched with a care home, using TPP's algorithm.
-
care_home_requires_nursing
🔗
boolean
-
Indicating whether the patient's address matched with a care home that required nursing.
-
care_home_does_not_require_nursing
🔗
boolean
-
Indicating whether the patient's address matched with a care home that did not require nursing.
-
for_patient_on(date)
🔗
-
Return each patient's registered address as it was on the supplied date.
Where there are multiple registered addresses we prefer any which have a known postcode (though we never have access to this postcode) as this is used by TPP to cross-reference other data associated with the address, such as the MSOA or index of multiple deprevation.
Where there are multiple of these we prefer the most recently registered address and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the address ID.
View method definition
spanning_addrs = addresses.where(addresses.start_date <= date).except_where( addresses.end_date < date ) ordered_addrs = spanning_addrs.sort_by( case(when(addresses.has_postcode).then(1), otherwise=0), addresses.start_date, addresses.end_date, addresses.address_id, ) return ordered_addrs.last_for_patient()
many rows per patient
apcs🔗
Admitted Patient Care Spells (APCS) data is provided via the NHS Secondary Uses Service.
This table gives core details of spells.
Each row is an in-hospital spell: a period of continuous care within a single trust.
Refer to the OpenSAFELY documentation on the APCS data source and the GitHub issue discussing more of the background context.
-
apcs_ident
🔗
integer
-
Unique identifier for the spell used across the APCS tables.
- Never
NULL
- Never
-
admission_date
🔗
date
-
The admission date of the hospital provider spell.
-
discharge_date
🔗
date
-
The date of discharge from a hospital provider spell.
-
spell_core_hrg_sus
🔗
string
-
The core Healthcare Resource Group (HRG) code for the spell according to the derivations made by NHS Digital prior to import to the National Commissioning Data Repository (NCDR). HRGs are used to assign baseline tariff costs.
-
admission_method
🔗
string
-
Code identifying admission method. Refer to APCS data source documentation for details of codes.
-
primary_diagnosis
🔗
ICD-10 code
-
Code indicating primary diagnosis. This is not necessarily the primary reason for admission, and could represent an escalation/complication of initial reason for admission.
-
secondary_diagnosis
🔗
ICD-10 code
-
Code indicating secondary diagnosis. This is a single code giving the first listed secondary diagnosis, but there may other secondary diagnoses listed in the
all_diagnoses
field below. -
all_diagnoses
🔗
string
-
List of all diagnoses as ICD-10 codes.
Note that the codes are not quite in the standard ICD-10 format in that they omit the dot character e.g. instead of
I80.1
it will be writtenI801
.The codes are arranged in clusters separated by commas, with each cluster separated by two pipe characters (
||
). These separators may or may not be surrounded by spaces. For example:||E119 ,J849 ,K869 ,M069 ,Z824 ,Z867 ||I801 ,I802 ,N179 ,N183
The significance of this clustering is not yet clear.
This field can be queried using the
contains
method. This uses simple substring matching to find a code anywhere inside the field. For example, to match the codeN17.1
(Acute renal failure with acute cortical necrosis) you could use:apcs.where(apcs.all_diagnoses.contains("N171"))
You can take advantage of the hierarchical structure of ICD-10 by searching the just the prefix of a code. For example to match all N17 (Acute renal failure) codes you could use:
apcs.where(apcs.all_diagnoses.contains("N17"))
-
all_procedures
🔗
string
-
List of all procedures as OPCS-4 codes.
Note that the codes are not quite in the standard OPCS-4 format in that they omit the dot character e.g. instead of
W23.2
it will be writtenW232
.The codes are arranged in clusters separated by commas, with each cluster separated by two pipe characters (
||
). These separators may or may not be surrounded by spaces. For example:||E851,T124,X403||Y532,Z921
The significance of this clustering is not yet clear.
This field can be queried using the
contains
method. This uses simple substring matching to find a code anywhere inside the field. For example, to match the codeW23.2
(Secondary open reduction of fracture of bone and extramedullary fixation HFQ) you could use:apcs.where(apcs.all_procedures.contains("W232"))
-
days_in_critical_care
🔗
integer
-
Number of days spent in critical care. This is counted in number of days (or part-days) not the number of nights as per normal "length of stay" calculations. Note the definition of critical care may vary between trusts.
-
patient_classification
🔗
string
-
Refer to APCS data source documentation for details.
many rows per patient
apcs_cost🔗
Admitted Patient Care Spells (APCS) data is provided via the NHS Secondary Uses Service.
This table gives details of spell cost.
Each row is an in-hospital spell: a period of continuous care within a single trust.
Note that data only goes back a couple of years.
-
apcs_ident
🔗
integer
-
Unique identifier for the spell used across the APCS tables.
- Never
NULL
- Never
-
grand_total_payment_mff
🔗
float
-
The grand total payment for the activity (
Net_SLA_Payment + Tariff_MFF_Payment
) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment). -
tariff_initial_amount
🔗
float
-
The base national tariff.
-
tariff_total_payment
🔗
float
-
The total payment according to the national tariff.
-
admission_date
🔗
date
-
The admission date of the hospital provider spell.
-
discharge_date
🔗
date
-
The date of discharge from a hospital provider spell.
many rows per patient
appointments🔗
Appointments in primary care.
Warning
In TPP this data comes from the "Appointment" table. This table has not yet been well characterised, so there are some issues around how to interpret findings from it. The data contains records created when an appointment is made with a GP practice, but may not capture absolutely all GP/patient interactions, for example it's uncertain whether an ad-hoc call to a patient would be included. There are also duplicate events in the table that we need to better understand.
As a consequence, if you try to use the appointment table, you will see warnings when running your code locally, and failures when the GitHub action tests your code. If you need access to the appointments data, please speak to your OpenSAFELY co-pilot. We will be considering projects on a case by case basis until it can enter the normal stable pool of data.
A very important caveat for this data: there are some circumstances where
historical appointment records will be incomplete, for example when a patient
moves from a practice using a different EHR provider, or when a practice changes
EHR provider. If your study could be negatively affected by such missing data,
it may be important to use the
practice_registrations.spanning_with_systmone()
method to identify patients which have a suitably continuous practice
registration during the study period.
Some further investigation of the appointments data in TPP can be found in this King's fund report.
And you can find out more about the associated database table in the short data report. It shows:
- Date ranges for
booked_date
,start_date
, andseen_date
- Row counts by month for
booked_date
andstart_date
- The distribution of lead times (
start_date - booked_date
) - Row counts for each value of
status
To view it, you will need a login for OpenSAFELY Jobs and the Project Collaborator or Project Developer role for the project. The workspace shows when the code that comprises the report was run; the code itself is in the appointments-short-data-report repository on GitHub.
Tip
Querying this table is similar to using Cohort Extractor's
patients.with_gp_consultations
function. However, that function filters by
the status of the appointment. To achieve a similar result with this table:
appointments.where(
appointments.status.is_in([
"Arrived",
"In Progress",
"Finished",
"Visit",
"Waiting",
"Patient Walked Out",
])
)
-
booked_date
🔗
date
-
The date the appointment was booked
-
start_date
🔗
date
-
The date the appointment was due to start
-
seen_date
🔗
date
-
The date the patient was seen
-
status
🔗
string
-
The status of the appointment
- Possible values:
Booked
,Arrived
,Did Not Attend
,In Progress
,Finished
,Requested
,Blocked
,Visit
,Waiting
,Cancelled by Patient
,Cancelled by Unit
,Cancelled by Other Service
,No Access Visit
,Cancelled Due To Death
,Patient Walked Out
- Possible values:
many rows per patient
clinical_events🔗
Each record corresponds to a single clinical or consultation event for a patient.
Each event is recorded twice: once with a CTv3 code, and again with the equivalent SNOMED-CT code. Each record will have only one of the ctv3_code or snomedct_code columns set and the other will be null. This allows you to query the table using either a CTv3 codelist or SNOMED-CT codelist and all records using the other coding system will be effectively ignored.
Note that event codes do not change in this table. If an event code in the coding system becomes inactive, the event will still be coded to the inactive code. As such, codelists should include all relevant inactive codes.
Detailed information on onward referrals is not currently available. A subset of referrals are recorded in the clinical events table but this data will be incomplete.
many rows per patient
clinical_events_ranges🔗
Each record corresponds to a single clinical or consultation event for a patient,
as presented in clinical_events
, but with additional fields regarding the event's
numeric_value
.
Warning
Use of this table carries a severe performance penalty and should only be done so if the additional fields it provides are neccesary for a study.
These additional fields are:
- any comparators (if present) recorded with an event's
numeric_value
(e.g. '<9.5') - the lower bound of the reference range associated with an event's
numeric_value
- the upper bound of the reference range associated with an event's
numeric_value
-
date
🔗
date
-
snomedct_code
🔗
SNOMED-CT code
-
ctv3_code
🔗
CTV3 (Read v3) code
-
numeric_value
🔗
float
-
lower_bound
🔗
float
-
The lower bound of the reference range associated with an event's numeric_value
-
upper_bound
🔗
float
-
The upper bound of the reference range associated with an event's numeric_value
-
comparator
🔗
string
-
If an event's numeric_value is returned with a comparator, e.g. as '<9.5', then this column contains that comparator
- Possible values:
~
,=
,>=
,>
,<
,<=
- Possible values:
-
consultation_id
🔗
integer
-
ID of the consultation associated with this event
many rows per patient
ec🔗
Emergency care attendances data — the Emergency Care Data Set (ECDS) — is provided via the NHS Secondary Uses Service.
This table gives core details of attendances.
Refer to the OpenSAFELY documentation on the ECDS data source and the GitHub issue that discusses more of the background context.
-
ec_ident
🔗
integer
-
Unique identifier for the attendance used across the EC tables.
- Never
NULL
- Never
-
arrival_date
🔗
date
-
The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.
-
sus_hrg_code
🔗
string
-
The core Healthcare Resource Group (HRG) code derived by sus+, used for tariff application.
- Matches regular expression:
[a-zA-Z]{2}[0-9]{2}[a-zA-Z]
- Matches regular expression:
many rows per patient
ec_cost🔗
Emergency care attendances data is provided via the NHS Secondary Uses Service.
This table gives details of attendance costs.
-
ec_ident
🔗
integer
-
Unique identifier for the attendance used across the EC tables.
- Never
NULL
- Never
-
grand_total_payment_mff
🔗
float
-
The grand total payment for the activity (
Net_SLA_Payment + Tariff_MFF_Payment
) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment). -
tariff_total_payment
🔗
float
-
The total payment according to the national tariff.
-
arrival_date
🔗
date
-
The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.
-
ec_decision_to_admit_date
🔗
date
-
The date a decision to admit was made (if applicable).
-
ec_injury_date
🔗
date
-
The date the patient was injured (if applicable).
many rows per patient
emergency_care_attendances🔗
Emergency care attendances data is provided via the NHS Secondary Uses Service.
This table gives details of attendances.
Note that there is a limited number of diagnoses allowed within this dataset, and so will not match with the range of diagnoses allowed in other datasets such as the primary care record.
-
id
🔗
integer
-
Unique identifier for the attendance used across the EC tables.
- Never
NULL
- Never
-
arrival_date
🔗
date
-
The date the patient self presented at the accident & emergency department, or arrived in an ambulance at the accident & emergency department.
-
discharge_destination
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the intended destination of the patient following discharge from the emergency care department.
-
diagnosis_01
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_02
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_03
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_04
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_05
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_06
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_07
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_08
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_09
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_10
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_11
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_12
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_13
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_14
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_15
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_16
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_17
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_18
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_19
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_20
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_21
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_22
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_23
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
-
diagnosis_24
🔗
SNOMED-CT code
-
The SNOMED CT concept ID which is used to identify the patient diagnosis. Note that only a limited subset of SNOMED CT codes are used; see the NHS Data Model and Dictionary entry for emergency care diagnosis.
one row per patient
ethnicity_from_sus🔗
This finds the most frequently used national ethnicity code for each patient from the various SUS (Secondary Uses Service) tables.
Specifically it uses ethnicity codes from the following tables:
APCS (In-patient hospital admissions)
EC (A&E attendances)
OPA (Out-patient hospital appointments)
Codes are as defined by "Ethnic Category Code 2001" — the 16+1 ethnic data categories used in the 2001 census: https://www.datadictionary.nhs.uk/data_elements/ethnic_category.html
Codes beginning Z ("Not stated") and 99 ("Not known") are excluded.
Where there is a tie for the most common code the lexically greatest code is used.
-
code
🔗
string
-
First character of recorded ethncity code (national code): https://www.datadictionary.nhs.uk/data_elements/ethnic_category.html
- Possible values:
A
,B
,C
,D
,E
,F
,G
,H
,J
,K
,L
,M
,N
,P
,R
,S
- Possible values:
one row per patient
household_memberships_2020🔗
Inferred household membership as of 2020-02-01, as determined by TPP using an as yet undocumented algorithm.
many rows per patient
medications🔗
The medications table provides data about prescribed medications in primary care.
Prescribing data, including the contents of the medications table are standardised across clinical information systems such as SystmOne (TPP). This is a requirement for data transfer through the Electronic Prescription Service in which data passes from the prescriber to the pharmacy for dispensing.
Medications are coded using dm+d codes. The medications table is structured similarly to the clinical_events table, and each row in the table is made up of a patient identifier, an event (dm+d) code, and an event date. For this table, the event refers to the issue of a medication (coded as a dm+d code), and the event date, the date the prescription was issued.
Factors to consider when using medications data🔗
Depending on the specific area of research, you may wish to exclude medications in particular periods. For example, in order to ensure medication data is stable following a change of practice, you may want to exclude patients for a period after the start of their practice registration . You may also want to exclude medications for patients for a period prior to their leaving a practice. Alternatively, for research looking at a specific period of interest, you may simply want to ensure that all included patients were registered at a single practice for a minimum time prior to the study period, and were registered at the same practice for the duration of the study period.
Examples of using ehrQL to calculation such periods can be found in the documentation on how to use ehrQL to answer specific questions.
many rows per patient
occupation_on_covid_vaccine_record🔗
This data is from the NHS England COVID-19 data store, and reflects information collected at the point of vaccination where recipients are asked by vaccination staff whether they are in the category of health and care worker.
Refer to the OpenSAFELY database build report to see when this data was last updated.
See the GitHub issue that discusses more of the background context.
-
is_healthcare_worker
🔗
boolean
one row per patient
ons_deaths🔗
Registered deaths
Date and cause of death based on information recorded when deaths are certified and registered in England and Wales from February 2019 onwards. The data provider is the Office for National Statistics (ONS). This table is updated approximately weekly in OpenSAFELY.
This table includes the underlying cause of death , place of death and up
to 15 medical conditions mentioned on the death certificate.
These codes (cause_of_death_01
to cause_of_death_15
) are not ordered meaningfully.
More information about this table can be found in following documents provided by the ONS:
- Information collected at death registration
- User guide to mortality statistics
- How death registrations are recorded and stored by ONS
In the associated database table ONS_Deaths,
a small number of patients have multiple registered deaths.
This table contains the earliest registered death.
The ehrql.tables.raw.tpp.ons_deaths
table contains all registered deaths.
Warning
There is also a lag in ONS death recording caused amongst other things by things like autopsies and inquests delaying reporting on cause of death. This is evident in the OpenSAFELY historical database coverage report
-
date
🔗
date
-
Patient's date of death.
-
place
🔗
string
-
Patient's place of death.
- Possible values:
Care Home
,Elsewhere
,Home
,Hospice
,Hospital
,Other communal establishment
- Possible values:
-
underlying_cause_of_death
🔗
ICD-10 code
-
Patient's underlying cause of death of death.
-
cause_of_death_01
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_02
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_03
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_04
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_05
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_06
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_07
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_08
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_09
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_10
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_11
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_12
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_13
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_14
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
-
cause_of_death_15
🔗
ICD-10 code
-
Medical condition mentioned on the death certificate.
many rows per patient
opa🔗
Outpatient appointments data (OPA) is provided via the NHS Secondary Uses Service.
This table gives core details of outpatient appointments.
Refer to the GitHub issue that describes limitations of the outpatient appointments data and the GitHub issue that discusses more of the background context.
-
opa_ident
🔗
integer
-
Unique identifier for the appointment used across the OPA tables.
- Never
NULL
- Never
-
appointment_date
🔗
date
-
The date of an appointment.
-
attendance_status
🔗
string
-
Indicates whether or not an appointment for a care contact took place. If the appointment did not take place it also indicates whether or not advanced warning was given. Refer to the NHS Data Model and Dictionary entry for "attended or did not attend" for details on code meanings.
- Possible values:
5
,6
,7
,2
,3
,4
,0
- Possible values:
-
consultation_medium_used
🔗
string
-
Identifies the communication mechanism used to relay information between the care professional and the person who is the subject of the consultation, during a care activity. Refer to the NHS Data Model and Dictionary entry for "consultation mechanism" for details on code meanings. Note that the allowed codes as listed in TPP's data appear to be a subset of the codes listed in the NHS Data Model and Dictionary.
- Possible values:
01
,02
,03
,04
,05
,09
,10
,11
,98
- Possible values:
-
first_attendance
🔗
string
-
An indication of whether a patient is making a first attendance or contact; or a follow-up attendance or contact and whether the consultation medium used national code was face to face communication or telephone or telemedicine web camera. Refer to the NHS Data Model and Dictionary entry for "first attendance" for details on code meanings. Note that the allowed codes as listed in TPP's data contain an additional
9
code over the NHS Data Model and Dictionary entry.- Possible values:
1
,2
,3
,4
,5
,9
- Possible values:
-
hrg_code
🔗
string
-
The Healthcare Resource Group (HRG) code assigned to the activity, used to assign baseline tariff costs.
- Matches regular expression:
[a-zA-Z]{2}[0-9]{2}[a-zA-Z]
- Matches regular expression:
-
treatment_function_code
🔗
string
-
The treatment function under which the patient is treated. It may be the same as the main specialty code or a different treatment function which will be the care professional's treatment interest.
many rows per patient
opa_cost🔗
Outpatient appointments data is provided via the NHS Secondary Uses Service.
This table gives details of outpatient appointment costs.
Note that data only goes back a couple of years.
-
opa_ident
🔗
integer
-
Unique identifier for the appointment used across the OPA tables.
- Never
NULL
- Never
-
tariff_opp
🔗
float
-
The base national tariff where the procedure tariff is applicable.
-
grand_total_payment_mff
🔗
float
-
The grand total payment for the activity (
Net_SLA_Payment + Tariff_MFF_Payment
) where SLA = service level agreement, i.e. all contractual payments which is national tariff for the type of activity plus any additional payments minus any applicable deductions. MFF = Market Forces Factor, a geography-based cost adjustment). -
tariff_total_payment
🔗
float
-
The total payment according to the national tariff.
-
appointment_date
🔗
date
-
The date of an appointment.
-
referral_request_received_date
🔗
date
-
The date the referral request was received by the health care provider.
many rows per patient
opa_diag🔗
Outpatient appointments data is provided via the NHS Secondary Uses Service.
This table gives details of outpatient appointment diagnoses.
Note that diagnoses are often absent from outpatient records.
-
opa_ident
🔗
integer
-
Unique identifier for the appointment used across the OPA tables.
- Never
NULL
- Never
-
primary_diagnosis_code
🔗
ICD-10 code
-
The international classification of diseases (ICD) code used to identify the primary patient diagnosis. Note that this will typically not be completed.
-
primary_diagnosis_code_read
🔗
CTV3 (Read v3) code
-
The Read coded clinical terms code to identify the primary patient diagnosis. Note that this will typically not be completed.
-
secondary_diagnosis_code_1
🔗
ICD-10 code
-
The international classification of diseases (ICD) code used to identify the secondary patient diagnosis. Note that this will typically not be completed.
-
secondary_diagnosis_code_1_read
🔗
CTV3 (Read v3) code
-
The Read coded clinical terms used to identify the secondary patient diagnosis. Note that this will typically not be completed.
-
appointment_date
🔗
date
-
The date of an appointment.
-
referral_request_received_date
🔗
date
-
The date the referral request was received by the health care provider.
many rows per patient
opa_proc🔗
Outpatient appointments data is provided via the NHS Secondary Uses Service.
This table gives details of outpatient procedures. Typically, procedures will only be recorded where they attract a specified payment. The majority of appointments will have no procedure recorded.
-
opa_ident
🔗
integer
-
Unique identifier for the appointment used across the OPA tables.
- Never
NULL
- Never
-
primary_procedure_code
🔗
OPCS-4 code
-
The OPCS classification of interventions and procedures code which is used to identify the primary patient procedure carried out.
-
primary_procedure_code_read
🔗
CTV3 (Read v3) code
-
The Read coded clinical terms code which is used to identify the primary patient procedure carried out.
-
procedure_code_2
🔗
OPCS-4 code
-
TODO
-
procedure_code_2_read
🔗
CTV3 (Read v3) code
-
The Read coded clinical terms for a procedure other than the primary procedure.
-
appointment_date
🔗
date
-
The date of an appointment.
-
referral_request_received_date
🔗
date
-
The date the referral request was received by the health care provider.
many rows per patient
open_prompt🔗
This table contains responses to questions from the OpenPROMPT project.
You can find out more about this table in the associated short data report. To view it, you will need a login for Level 4. The workspace shows when the code that comprises the report was run; the code itself is in the airmid-short-data-report repository on GitHub.
-
ctv3_code
🔗
CTV3 (Read v3) code
-
The response to the question, as a CTV3 code. Alternatively, if the question does not admit a CTV3 code as the response, then the question, as a CTV3 code.
- Never
NULL
- Never
-
snomedct_code
🔗
SNOMED-CT code
-
The response to the question, as a SNOMED CT code. Alternatively, if the question does not admit a SNOMED CT code as the response, then the question, as a SNOMED CT code.
-
creation_date
🔗
date
-
The date the survey was administered
- Never
NULL
- Never
-
consultation_date
🔗
date
-
The response to the question, as a date, if the question admits a date as the response. Alternatively, the date the survey was administered.
- Never
NULL
- Never
-
consultation_id
🔗
integer
-
The ID of the survey
- Never
NULL
- Never
-
numeric_value
🔗
float
-
The response to the question, as a number
one row per patient
patients🔗
Patients in primary care.
Representativeness🔗
You can find out more about the representativeness of these data in the OpenSAFELY-TPP backend in:
The OpenSAFELY Collaborative, Colm D. Andrews, Anna Schultze, Helen J. Curtis, William J. Hulme, John Tazare, Stephen J. W. Evans, et al. 2022. "OpenSAFELY: Representativeness of Electronic Health Record Platform OpenSAFELY-TPP Data Compared to the Population of England." Wellcome Open Res 2022, 7:191. https://doi.org/10.12688/wellcomeopenres.18010.1
Orphan records🔗
If a practice becomes aware that a patient has moved house, then the practice deducts, or removes, the patient's records from their register. If the patient doesn't register with a new practice within a given amount of time (normally from four to eight weeks), then the patient's records are permanently deducted and are orphan records. There are roughly 1.6 million orphan records.
Recording of death in primary care🔗
In England, it is the statutory duty of the doctor who had attended in the last illness to complete a medical certificate of cause of death (MCCD). ONS death data are considered the gold standard for identifying patient deaths because they are based on these MCCDs.
There is generally a lag between the death being recorded in ONS data and it
appearing in the primary care record, but the coverage or recorded death is almost
complete and the date of death is usually reliable when it appears. There is
also a lag in ONS death recording (see ons_deaths
below
for more detail). You can find out more about the accuracy of date of death
recording in primary care in:
Gallagher, A. M., Dedman, D., Padmanabhan, S., Leufkens, H. G. M. & de Vries, F 2019. The accuracy of date of death recording in the Clinical Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations. Pharmacoepidemiol. Drug Saf. 28, 563–569. https://doi.org/10.1002/pds.4747
By contrast, cause of death is often not accurate in the primary care record so we don't make it available to query here.
-
age_on(date)
🔗
-
Patient's age as an integer, in whole elapsed calendar years, as it would be on the given date.
This method takes no account of whether the patient is alive on the given date. In particular, it may return negative values if the given date is before the patient's date of birth.
View method definition
return (date - patients.date_of_birth).years
many rows per patient
practice_registrations🔗
Each record corresponds to a patient's registration with a practice.
See the TPP backend information for details of which patients are included.
-
start_date
🔗
date
-
Date patient joined practice.
- Never
NULL
- Never
-
end_date
🔗
date
-
Date patient left practice.
-
practice_pseudo_id
🔗
integer
-
Pseudonymised practice identifier.
- Never
NULL
- Never
-
practice_stp
🔗
string
-
ONS code of practice's STP (Sustainability and Transformation Partnership). STPs have been replaced by ICBs (Integrated Care Boards), and ICB codes will be available soon.
- Matches regular expression:
E540000[0-9]{2}
- Matches regular expression:
-
practice_nuts1_region_name
🔗
string
-
Name of the NUTS level 1 region of England to which the practice belongs. For more information see: https://www.ons.gov.uk/methodology/geography/ukgeographies/eurostat
- Possible values:
North East
,North West
,Yorkshire and The Humber
,East Midlands
,West Midlands
,East
,London
,South East
,South West
- Possible values:
-
practice_systmone_go_live_date
🔗
date
-
Date on which the practice started using the SystmOne EHR platform.
Most patient records will have been transferred from the previous EHR platform but records which are specific to SystmOne will not exist before this date. In particular, the appointments table should only be considered accurate for a given practice after this date.
-
for_patient_on(date)
🔗
-
Return each patient's practice registration as it was on the supplied date.
Where a patient is registered with multiple practices we prefer the most recent registration and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the practice ID.
View method definition
spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where( practice_registrations.end_date < date ) ordered_regs = spanning_regs.sort_by( practice_registrations.start_date, practice_registrations.end_date, practice_registrations.practice_pseudo_id, ) return ordered_regs.last_for_patient()
-
spanning(start_date, end_date)
🔗
-
Filter registrations to just those spanning the entire period between
start_date
andend_date
.View method definition
return practice_registrations.where( practice_registrations.start_date.is_on_or_before(start_date) & (practice_registrations.end_date.is_after(end_date) | practice_registrations.end_date.is_null()) )
-
spanning_with_systmone(start_date, end_date)
🔗
-
Filter registrations to just those spanning the entire period between
start_date
andend_date
and where the practice has been using the SystmOne EHR platform throughout that period (seesystmone_go_live_date
).View method definition
return practice_registrations.spanning(start_date, end_date).where( practice_registrations.practice_systmone_go_live_date <= start_date )
many rows per patient
sgss_covid_all_tests🔗
COVID-19 tests results from SGSS (the Second Generation Surveillance System).
For background on this data see the NHS DARS catalogue entry. And for more detail on SGSS in general see PHE_Laboratory_Reporting_Guidelines.pdf.
-
specimen_taken_date
🔗
date
-
Date on which specimen was collected.
- Never
NULL
- Never
-
is_positive
🔗
boolean
-
Whether the specimin tested positive for SARS-CoV-2.
- Never
NULL
- Never
-
lab_report_date
🔗
date
-
Date on which the labaratory reported the result.
- Never
NULL
- Never
-
was_symptomatic
🔗
boolean
-
Whether the patient reported symptoms of COVID-19 at the time the specimen was collected. May be NULL if unknown.
-
sgtf_status
🔗
integer
-
Provides information on whether a PCR test result exhibited "S-Gene Target Failure" which can be used as a proxy for the presence of certain Variants of Concern.
Results are provided as number between 0 and 9. We know the meaning of some of these numbers based on an email from PHE:
0: S gene detected
Detectable S gene (CH3>0)
Detectable y ORF1ab CT value (CH1) <=30 and >0
Detectable N gene CT value (CH2) <=30 and >01: Isolate with confirmed SGTF
Undetectable S gene; CT value (CH3) =0
Detectable ORF1ab gene; CT value (CH2) <=30 and >0
Detectable N gene; CT value (CH1) <=30 and >09: Cannot be classified
Null are where the target is not S Gene. I think LFTs are currently also coming across as 9 so will need to review those to null as well as clearly this is a PCR only variable.
However the values 2, 4 and 8 also occur in this column and we don't currently have documentation on their meaning.
- Always >= 0 and <= 9
-
variant
🔗
string
-
Where a specific SARS-CoV-2 variant was identified this column provides the details.
This appears to be effectively a free-text field with a large variety of possible values. Some have an obvious meaning e.g.
B.1.617.2
,VOC-21JAN-02
,VUI-21FEB-04
.Others less so e.g.
VOC-22JAN-O1_probable:V-21OCT-01_low-qc
. -
variant_detection_method
🔗
string
-
Where a specific SARS-CoV-2 variant was identified this provides the method used to do so.
- Possible values:
Private Lab Sequencing
,Reflex Assay
,Sanger Provisional Result
- Possible values:
many rows per patient
vaccinations🔗
This table contains information on administered vaccinations, identified using either the target disease (e.g., Influenza), or the vaccine product name (e.g., Optaflu). For more information about this table see the "Vaccinaton names in the OpenSAFELY-TPP database" report.
Vaccinations that were administered at work or in a pharmacy might not be included in this table.
many rows per patient
wl_clockstops🔗
National Waiting List Clock Stops
This dataset contains all completed referral-to-treatment (RTT) pathways with a "clock stop" date between May 2021 and May 2022. Patients referred for non-emergency consultant-led treatment are on RTT pathways. The "clock start" date is the date of the first referral that starts the pathway. The "clock stop" date is when the patient either: receives treatment; declines treatment; enters a period of active monitoring; no longer requires treatment; or dies. The time spent waiting is the difference in these two dates.
A patient may have multiple rows if they have multiple completed RTT pathways; however, there is only one row per unique pathway. Because referral identifiers aren't necessarily unique between hospitals, unique RTT pathways can be identified using a combination of:
pseudo_organisation_code_patient_pathway_identifier_issuer
pseudo_patient_pathway_identifier
pseudo_referral_identifier
referral_to_treatment_period_start_date
For more information, see "Consultant-led Referral to Treatment Waiting Times Rules and Guidance".
-
activity_treatment_function_code
🔗
string
-
The treatment function
- Matches regular expression:
[a-zA-Z0-9]{3}
- Matches regular expression:
-
priority_type_code
🔗
string
-
The priority type.
Note that a small number of rows contain values which are not in the list below. These are converted to NULL in this representation of the data. If you need to access the original values, please see the corresponding raw table.
- Possible values:
routine
,urgent
,two week wait
- Possible values:
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
date
-
The date the referral was received, for the referral that started the original pathway
-
referral_to_treatment_period_end_date
🔗
date
-
Clock stop for the completed pathway
-
referral_to_treatment_period_start_date
🔗
date
-
Clock start for the completed pathway
-
source_of_referral_for_outpatients
🔗
string
-
waiting_list_type
🔗
string
-
The waiting list type on completion of the pathway.
Note that a small number of rows contain values which are not in the list below. These are converted to NULL in this representation of the data. If you need to access the original values, please see the corresponding raw table.
- Possible values:
ORTT
,IRTT
,PTLO
,PTLI
,RTTO
,RTTI
- Possible values:
-
week_ending_date
🔗
date
-
The Sunday of the week that the pathway relates to
many rows per patient
wl_openpathways🔗
National Waiting List Open Pathways
This dataset contains all people on open (incomplete) RTT or not current RTT (non-RTT) pathways as of May 2022. It is a snapshot of everyone still awaiting treatment as of May 2022 (i.e., the clock hasn't stopped). Patients referred for non-emergency consultant-led treatment are on RTT pathways, while patients referred for non-consultant-led treatment are on non-RTT pathways. For each pathway, there is one row for every week that the patient is still waiting. Because referral identifiers aren't necessarily unique between hospitals, unique RTT pathways can be identified using a combination of:
pseudo_organisation_code_patient_pathway_identifier_issuer
pseudo_patient_pathway_identifier
pseudo_referral_identifier
referral_to_treatment_period_start_date
For more information, see "Consultant-led Referral to Treatment Waiting Times Rules and Guidance".
-
activity_treatment_function_code
🔗
string
-
The treatment function
- Matches regular expression:
[a-zA-Z0-9]{3}
- Matches regular expression:
-
current_pathway_period_start_date
🔗
date
-
Latest clock start for this pathway period
-
priority_type_code
🔗
string
-
The priority type.
Note that a small number of rows contain values which are not in the list below. These are converted to NULL in this representation of the data. If you need to access the original values, please see the corresponding raw table.
- Possible values:
routine
,urgent
,two week wait
- Possible values:
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
date
-
The date the referral was received, for the referral that started the original pathway
-
referral_to_treatment_period_end_date
🔗
date
-
If the pathway is open, then
NULL
-
referral_to_treatment_period_start_date
🔗
date
-
Latest clock start for this pathway. If the pathway is not a current pathway, then
NULL
. -
source_of_referral
🔗
string
-
National referral source code for the referral that created the original pathway
- Matches regular expression:
[a-zA-Z0-9]{2}
- Matches regular expression:
-
waiting_list_type
🔗
string
-
The waiting list type.
Note that a small number of rows contain values which are not in the list below. These are converted to NULL in this representation of the data. If you need to access the original values, please see the corresponding raw table.
- Possible values:
ORTT
,IRTT
,ONON
,INON
,PTLO
,PTLI
,RTTO
,RTTI
- Possible values:
-
week_ending_date
🔗
date
-
The Sunday of the week that the pathway relates to