Public willingness to share data – Data Detail Initial Insights (Part 2) - Mark Warner

The survey data that we have analysed so far indicates a difference between people’s willingness to share Mobility data and Medical data when the data will be used to monitor the appropriateness of the UK Covid-19 Alert Level. In Part 2 of our blog series, we report some preliminary observations from the answers to the Data Detail questions in our survey: questions that asked what kind of Medical data and Mobility data participants would be willing to share for this purpose.

Our initial analysis indicates that deciding how much data detail to share is complex, and involves considering the data type, who the data is being shared with, how it is to be stored, and the Covid-19 alert level. Further, there is notable variation across participants’ answers. We therefore need to delve deeper and consider how participants’ answers pattern at an individual level, as participants may show a consistent bias towards or against certain data sharing practices, regardless of the factors that we have tried to investigate.

Introduction

In a previous blog, we reported that the data collected from another question set in the survey showed a small but statistically significant difference between willingness to share Mobility data and Medical data. Counter to our predictions, our participants were less willing to share their Mobility data than their Medical data. Of these two data types, research to date has generally focused on Medical data, and researchers have consistently found that participants are more protective of, and more concerned about sharing, their Medical data than other data types. It could be that attitudes and concerns shift in the context of a global pandemic, but the result could also be a product of the ambiguous definitions of Medical data and Mobility data given in that section of the survey. We knew that we would want to explore participants’ perceptions of sharing Medical data and Mobility data in more detail, and so included a set of questions specifically concerned with the different kinds of information that are classed as Medical and Mobility data. We will refer to these questions as the “Data Detail Sliders”.

Our Survey – Data Detail Sliders

Using a scale of 6 options, we asked participants to select how much detail they would be willing to include in the Medical and Mobility data that they shared. We asked participants to take into account the Covid-19 alert level, who they were sharing the data with, and whether the data would be anonymous or identifiable (see figure 1 below for an example question).

Figure 1. An example of a ‘Data Detail Slider’ question

For Mobility data the 6 options were:

No data
Countries visited and when
Towns/cities visited and when
Streets visited and when
Buildings visited and when
All destinations and routes taken

And for Medical data the 6 options were:

No data
Covid-19 test result
Basic health information (e.g. BMI, smoker/non-smoker)
Current diagnoses and treatments
Medical records from the past 5 years
Medical records since birth

Overall Patterns

First, let’s compare the answers to the Mobility and Medical Data Detail Sliders (figure 2 below). For both data types, more participants were willing to share less detailed data, and fewer were willing to share more detailed data. However, the peak (the most frequently selected option) differs between the two data types. For Mobility data, “No data” was the most frequently selected option (labelled “0” on the graph below), while for Medical data “Covid-19 test result” was selected most often (labelled “1” on the graph below). At this stage of the analysis, this is consistent with our earlier finding in the survey that participants were less willing to share their Mobility data than their Medical data.

Figure 2. The percentage of responses for each data detail option by data type

Now let’s consider the Data Detail Slider answers for Mobility data and Medical data separately, along with factors that might affect participants’ responses. How data is stored (anonymously or identifiably) appears to influence how much detail participants are willing to share. For Mobility data, if the data was identifiable, fewer participants were willing to share more detailed data, and “No data” was the most frequently selected option. If the Mobility data was anonymous, more participants were willing to share more detailed data, and the most detailed option on our scale was the most frequently selected (see figure 3 below). In comparison, for Medical data “Covid-19 test result” was the most frequently selected option whether the data was anonymous or identifiable (see figure 4). One interpretation is that this reflects our participants’ awareness of the value of Covid-19-specific data.

Figure 3. The percentage of responses for each data detail option by data storage (Mobility data)

Figure 4. The percentage of responses for each data detail option by data storage (Medical data)
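Each figure in this section shows, for one grouping factor, the percentage of responses at each point of the 6-option scale, and we describe each distribution by its peak (the most frequently selected option). As a rough sketch of that computation, the Python below assumes one record per answer with the scale position coded 0 to 5 (as on the graphs); the field names are hypothetical rather than taken from our dataset, and the toy responses are invented for illustration and are not survey results.

```python
from collections import Counter, defaultdict

# Scale positions as on the graphs: 0 = "No data" ... 5 = most detailed option.
SCALE = range(6)


def percent_by_option(responses, *keys):
    """Percentage of responses at each scale position, for each group.

    `responses` is a list of dicts with a "detail" field (0-5); `keys` are
    hypothetical field names to group by. Passing several keys crosses the
    factors, e.g. alert level by data storage.
    """
    counts = defaultdict(Counter)
    for r in responses:
        group = tuple(r[k] for k in keys)
        counts[group][r["detail"]] += 1
    table = {}
    for group, c in counts.items():
        total = sum(c.values())
        table[group] = [100 * c[opt] / total for opt in SCALE]
    return table


def peak(percentages):
    """Scale position selected most often (ties go to the less detailed option)."""
    return max(SCALE, key=lambda opt: percentages[opt])


# Invented toy responses for illustration only; these are NOT survey data.
toy = [
    {"alert_level": 1, "storage": "identifiable", "detail": 0},
    {"alert_level": 1, "storage": "identifiable", "detail": 0},
    {"alert_level": 5, "storage": "anonymous", "detail": 5},
    {"alert_level": 5, "storage": "anonymous", "detail": 5},
    {"alert_level": 5, "storage": "anonymous", "detail": 2},
]

by_storage = percent_by_option(toy, "storage")
for group, pct in by_storage.items():
    print(group, [round(p, 1) for p in pct], "peak:", peak(pct))
```

Crossing two factors, as in the alert level by data storage graphs, only requires passing both keys, e.g. `percent_by_option(toy, "alert_level", "storage")`.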

Next, let’s consider the possible effect of the UK Covid-19 Alert Level. This factor appears to influence how much detail participants are willing to share for Mobility data but not for Medical data. For Mobility data, when the Alert Level was 1 the most frequently selected option was “No data”; when the Alert Level was 5 it was the most detailed option (“All destinations and routes taken”); and when the Alert Level was 3 responses were spread relatively evenly across the less detailed end of our scale (see figure 5). In comparison, for Medical data “Covid-19 test result” was the most frequently selected option regardless of the Alert Level (see figure 6).

Figure 5. The percentage of responses for each data detail option by Covid-19 Alert Level (Mobility data)

Figure 6. The percentage of responses for each data detail option by Covid-19 Alert Level (Medical data)

At first glance, the finding that a more severe pandemic (as indicated by a higher alert level) motivates participants to share more detailed Mobility data, but not more detailed Medical data, may seem at odds with our initial finding that participants were less willing to share Mobility data than Medical data. Rather, it highlights that the content of the data (both data type and data detail) is of paramount importance.

The final factor that we consider here is the data holder (who the data will be shared with). At this initial stage of examination, some patterns in the data are clear, while others are far less so.

For the Mobility data questions, when the data holder is a Public Health Body the most frequently selected option is “All destinations and routes taken”, but it was not selected notably more often than the options at the less detailed end of the scale. This lack of strong differentiation between levels of data detail is also evident when the data holder is a Local Authority: although “No data” is the most frequently selected option, it was not chosen noticeably more often than “Countries visited and when” or “Towns/cities visited and when”. In contrast, the pattern of fewer participants being willing to share detailed data when the data holder is a commercial company (our fictional company, “Info-Insights”) or their Regional Police Force is far clearer, with “No data” the most frequently selected option. In summary, for the data detail options in the middle of the scale the differences across the 4 data holders are minimal, whereas who participants would permit to hold what kind of data is far clearer at the ends of the scale. See figure 7.

Some of these patterns recur in the answers to the Medical data questions. Fewer participants would share detailed data if the data holder was a commercial company (“Info-Insights”), their Regional Police Force or a Local Authority, with “Covid-19 test result” the most frequently selected option (compared with “No data” for Mobility data, as described above). When the data holder is a Public Health Body, however, answers are spread relatively more evenly across the 6 options on the scale, with the number of participants willing to share their Covid-19 test result and the number willing to share their medical records since birth being almost equal. This suggests that our participants would be more conservative about sharing Medical data more detailed than a Covid-19 test result with organisations that are not health-focused. It also shows considerable variation across participants in how much detail they would share with a Public Health Body. See figure 8.

Figure 7. The percentage of responses for each data detail option by data holder (Mobility data)

Figure 8. The percentage of responses for each data detail option by data holder (Medical data)
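One simple way to put a number on the observation that data holders differ little in the middle of the scale but clearly at the ends is to take, for each option, the range (maximum minus minimum) of its percentage across the four holders. The sketch below assumes a table of percentages per holder like the one behind figure 7; the holder labels and every percentage are invented for illustration and are not our survey results.

```python
# Invented percentages per data holder (each row sums to 100); NOT survey data.
holder_pct = {
    "public_health_body": [18, 16, 16, 15, 15, 20],
    "local_authority": [20, 19, 19, 15, 14, 13],
    "info_insights": [40, 17, 15, 13, 8, 7],
    "regional_police": [35, 18, 16, 14, 9, 8],
}


def spread_per_option(table, n_options=6):
    """Range (max - min) of each option's percentage across groups."""
    return [
        max(row[opt] for row in table.values())
        - min(row[opt] for row in table.values())
        for opt in range(n_options)
    ]


spread = spread_per_option(holder_pct)
print(spread)

# A larger range at positions 0 and 5 than in the middle would match the
# "clearer at the ends of the scale" pattern described above.
ends_clearer = min(spread[0], spread[-1]) > max(spread[1:-1])
print("ends clearer than middle:", ends_clearer)
```

A range is a crude summary that ignores sample sizes; a formal comparison would call for a statistical test rather than this descriptive check.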

We can also look at how these factors overlap. Figures 9 and 10 show the answers to the Mobility data and Medical data questions, respectively, by alert level and data storage. Both graphs indicate that, at all alert levels, more participants would share less detailed data when it is identifiable than when it is anonymised (illustrated by the darker coloured bars being taller than the lighter coloured bars on the left-hand side of each graph). Conversely, at all alert levels, more participants would share more detailed data when it is anonymised than when it is identifiable (illustrated by the lighter coloured bars being taller than the darker coloured bars on the right-hand side of each graph).

Figure 9. The percentage of responses for each data detail option by alert level and data storage (Mobility data)

Figure 10. The percentage of responses for each data detail option by alert level and data storage (Medical data)

Final Thoughts and Next Steps

In this blog post we have reported some initial observations from the data collected via the Data Detail Slider questions in our survey. In these questions, participants chose how much data detail they would share from a scale of 6 options (with different scales for Mobility and Medical data), bearing in mind the Covid-19 alert level, who they were sharing the data with, and whether the data would be anonymous or identifiable. At a very high level (considering one potentially influential factor at a time), we can observe some relatively clear patterns in the data, and these patterns differ between the Mobility data and Medical data questions.

However, as we examine the data in more detail, it becomes less clear which factors are likely to have an influence, and how, and there is notable variation across participants’ answers. Further, we have not yet considered the role that individual experience, biases, assumptions and opinions have played in this dataset. For example, some participants may be highly motivated and share more detailed data regardless of other factors. Equally, some participants may be consistently protective of their data and answer “No data” to most or even all of the questions. In our next analysis, we will therefore delve deeper into how participants’ answers pattern at an individual level, to explore both the patterns and the lack of patterns of behaviour within our sampled population.
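As a possible starting point for that individual-level analysis, each participant’s answers across all Data Detail Slider questions could be summarised in a few numbers. The Python sketch below assumes one record per answer with a hypothetical participant ID field. The cut-offs for “consistently protective” (at least 80% “No data” answers) and “highly motivated” (never below scale position 4) are illustrative assumptions, not definitions we have adopted.

```python
from collections import defaultdict


def participant_profiles(responses, protective_share=0.8, motivated_min=4):
    """Summarise each participant's answers across all slider questions.

    `protective_share` and `motivated_min` are hypothetical cut-offs chosen
    for illustration only.
    """
    answers = defaultdict(list)
    for r in responses:
        answers[r["participant"]].append(r["detail"])

    profiles = {}
    for pid, details in answers.items():
        no_data_share = details.count(0) / len(details)
        profiles[pid] = {
            "mean_detail": sum(details) / len(details),
            "no_data_share": no_data_share,
            # Answers "No data" (position 0) to most or all questions.
            "consistently_protective": no_data_share >= protective_share,
            # Never chooses below `motivated_min`, whatever the other factors.
            "highly_motivated": min(details) >= motivated_min,
        }
    return profiles


# Invented toy answers (scale positions 0-5) for two hypothetical participants.
toy = [{"participant": "p1", "detail": d} for d in [0, 0, 0, 0, 0, 1]] + [
    {"participant": "p2", "detail": d} for d in [5, 4, 5, 4, 5]
]

for pid, profile in participant_profiles(toy).items():
    print(pid, profile)
```

Flags like these would only be a first pass; how such groups relate to the factor-level patterns above is the question for the next analysis.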

Article by: Selina Sutton and the OMDDAC WP3 Team