Citizen Science (CS) attracts hundreds of thousands of participants to research projects that utilize human problem-solving for tasks of varying complexity (Heck et al. 2018; Huang et al. 2018). This collective intelligence of participants has contributed to many scientific discoveries, for example in protein folding (Cooper et al. 2010, Koepnick et al. 2019), galaxy morphology identification (Lintott et al. 2011, Masters and Galaxy Zoo Team 2019), mapping of neurons in the brain (Kim et al. 2014), heuristics for solving quantum physics challenges (Jensen et al. 2021), tracking changes in biodiversity and ecology (Lim et al. 2019, Wang et al. 2018), observing and monitoring air pollution (Snik et al. 2014), and transcribing manuscript collections (Causer and Terras 2014). Some of these achievements have been made in part by participants going beyond the narrow task given to them, digging further into the data to understand and interpret what they are seeing (Lintott 2019).
Scientific research often benefits from CS in ways that go beyond “free” labor. For example, researchers use CS when they are unable to collect the necessary data by themselves (Wyler et al. 2016), need specific expertise from the general public to help solve a problem (Danielsen et al. 2018), datasets are too large or complicated for the researchers to process with their given technology and resources (Das et al. 2019, Fortson et al. 2011, Nugent 2019), or the degrees of freedom of a system result in nearly infinite possible candidate solutions to be explored (Jensen et al. 2021, Koepnick et al. 2019). Simply put, CS projects enhance scientific research by tapping into the collective cognitive and labor resources of the general public.
In a similar vein, Artificial Intelligence (AI) provides value in many scientific disciplines, making it possible for researchers to work with larger amounts of data, or to detect patterns that would be hidden to the human eye or to more reductionist statistical methods. Broadly, AI is being used to solve a wide variety of problems ranging from game challenges (Silver et al. 2016) to corporate applications (Eager et al. 2020, Li et al. 2017). Due to the quantitative character of the natural sciences, they are particularly well suited for Machine Learning applications (ML; a sub-field of AI) across disciplines like physics (Bohrdt et al. 2019, Dalgaard et al. 2020), astronomy (Godines et al. 2019, Ormiston et al. 2020), biology (Senior et al. 2020), ecology (Leoni et al. 2020), and geoscience (Wang et al. 2019), to name a few. Cognitive science (Lake et al. 2015) and the social sciences (Chen et al. 2018, Hindman 2015) also employ ML methods with increasing frequency. Furthering our understanding of AI will in turn enhance future scientific progress.
Despite the success of AI, a growing part of the AI community has realized that many tasks can still only achieve the required quality and reliability with some form of human-in-the-loop input (Benedikt et al. 2020, Zanzotto 2019), which can be either live or prerecorded, as we show in this paper. Purely deep learning-based approaches often identify particular patterns in the training data that are not robust enough to solve many real-world problems in noisy, unpredictable, and varying environments (Heaven 2019, Marcus 2018). Also, despite the spectacular success of AlphaZero in “teaching itself” to play Chess and Go (Silver et al. 2018), the DeepMind team had to employ extensive learning from human gameplay in order for it to succeed in the complex and dynamic multi-actor environment of StarCraft (Vinyals et al. 2019). The limitations of current “self-learning” AI are highlighted by the, perhaps surprising, fact that AlphaZero has so far found hardly any application beyond the realm of games (Dalgaard et al. 2020, Tomašev et al. 2020). One goal of AI research is therefore to better understand what makes tasks unsolvable by current algorithms, and to find alternative or complementary methods.
One emerging response to the failure of achieving autonomous operation is to develop hybrid solutions bringing the human more intimately into the loop, optimally combining the information processing capabilities of both humans and machines (Christiano et al. 2017, Dellermann et al. 2019, Michelucci and Dickinson 2016). Many such advanced approaches focus on capturing failures of the stand-alone AI system by querying humans for feedback on a certain selection of the AI predictions (Kamar and Manikonda 2017, Nushi et al. 2018). This becomes crucial in high-stakes applications, such as medical diagnostics (Holzinger 2016, Wilder et al. 2020). These approaches have also started finding their way into research contexts such as the optimization of complex dynamical systems (Baltz et al. 2017). In the latter study, subjective human expert opinion was used to choose one of the viable actions proposed by the AI, which led to superior results in fusion research compared to the case of pure machine control.
The field of CS is ideal for developing hybrid intelligence interactions for two distinct reasons. First, although all fields of science revolve around human problem-solving, the human computation being performed is often defined implicitly through the tacit domain-specific experience of the involved experts. In contrast, the field of CS specializes in explicitly transforming conventional research challenges into tasks tapping into the problem-solving abilities and collective intelligence of the general public. Second, the full long-term value of hybrid interactions may not always be immediately apparent because developing interfaces to optimally support human creativity is very challenging. Therefore, commercial applications may tend to focus more narrowly on short-term efficiency maximization using shallower, but predictable, human involvement. In contrast, the field of CS is fueled both by a desire to solve concrete tasks and by a desire to generate intrinsic value for the participants through involvement in the projects that is as deep and meaningful as participants wish. Additionally, although AI methodologies are starting to be applied in CS projects, little attention has been given to bi-directional human-computer interactions. We argue that the combination of these practical and value-based considerations makes CS particularly well-suited to develop approaches combining human and artificial intelligence, i.e. hybrid intelligence, into concrete projects that will benefit the field of AI, the CS projects and participants, and science and society at large.
The concept of Hybrid Intelligence (HI) referred to above has been defined rather loosely and in many variations (Akata et al. 2020, Lasecki 2019, Prakash and Mathewson 2020). Here we adhere to the operational HI definition in terms of three criteria put forward by Dellermann et al. (2019): collectiveness (human and machine agents work on the same task), solution superiority (the combined system achieves solutions superior to those of each agent alone), and mutual learning (the human and machine agents learn from each other).
As we show in this paper, these three criteria of HI can serve as a lens for classifying the different interaction schemes between participants and AI in CS projects.
The aim of this paper is to identify which processes and outcomes within CS could benefit from combining AI and human intelligence, though many of the observations can be generalized to use-cases outside of CS. For the CS community, this paper provides a visualisation of how AI can support their projects. For AI researchers, this work highlights the opportunity CS presents to engage with real-world data sets and explore new AI methods and applications. In particular, CS projects are a fertile ground for quantitatively and qualitatively studying human 21st century skills such as creativity, hierarchical thinking, and common sense (Jensen et al. 2020), whose replication is a current roadblock in developing more robust AI (Marcus 2018). For both, there are opportunities for interdisciplinary and transdisciplinary collaborations.
In order to investigate which types of AI can and should be integrated into CS projects, we first relate CS to adjacent computational disciplines. A conceptual mapping of terms is necessary to effectively exploit the extensive insights from each research tradition in the interdisciplinary endeavor towards optimal human-machine problem-solving. Then, we examine several projects and their potential for AI-enhancement through two key dimensions: the degree of digitization and the accessibility to make a scientific contribution. We do this to concretely illustrate which types of CS tasks are ideally suited for which types of machine support. Finally, we present a framework for types of human-AI interaction in CS based on established criteria of HI. This framework identifies and categorises the ways AI can augment the process of solving CS tasks.
In order to find optimal ways for AI to enhance CS we start by exploring the axes of types of computation (biological, human, and machine) and number of agents. Each axis has an emergent intelligence (Bonabeau et al. 1999) associated with it: hybrid intelligence and collective intelligence respectively. We provide an overview table on the terms used (Table 1), followed by a diagram illustrating their relationships (Fig 1).
Table 1: Overview of types of computation, emergent intelligence, and artificial intelligence that are referred to throughout the paper.
Figure 1: The diagram illustrates the relationship of Hybrid Intelligence and Citizen Science in the reference frame of mixed-agent computation (y-axis), moving from machine to human and finally to general biological computation, and agent (biological individuals or machines) count (x-axis) moving from one agent to many.
HI is a subset of the overlap between Human Computation and AI. As HI can be achieved with only two agents, it lies partially outside of Collective Intelligence, which requires at least three agents (Woolley et al. 2010). Apart from collectiveness and solution superiority, HI poses a rather strict requirement of mutual learning, which explains the substantial overlap between CS and AI as well as an overlap of AI and Human Computation beyond the area of HI. Since few projects today achieve all three HI requirements, the size of the HI field in Fig. 1 is overrepresented. However, HI’s importance will likely accelerate as algorithmic development increasingly focuses on human-centered AI (Auernhammer 2020). For the rest of this paper, we focus on the overlap between CS (Fig. 1, yellow box) and AI (Fig. 1, purple box), and discuss the characteristics which make projects lie inside or outside the realm of HI.
We now discuss the degree of digitization and the accessibility to scientific contributions of CS projects to determine which tasks are ideally suited for which types of machine support. To date, well-known CS typologies have focused on participant contributions to different parts of the research process, such as collecting data (Bonney et al. 2009b, 2016, Paul et al. 2014), forming hypotheses (e.g. Bonney et al. 2009a, Haklay 2013), or generating knowledge in the project (Schäfer and Kieslinger 2016). Wiggins & Crowston (2011) were the first to take into account what they termed ‘virtuality’, i.e. digitization, in which projects are characterized as either online or offline. As nearly all CS projects trend towards including some digital components, in the following we show how adding granularity to the degree of digitization lends valuable insights into the potential forms of human-machine interactions in CS tasks. The degree of digitization is closely related to the choice of AI to optimally support CS participants in solving a task. To further understand the type of support needed, and thereby the choice of AI, we also explore a second CS task characteristic: accessibility to contribution, which we elaborate on below. Note that even if a task is routinely solved by participants, this should not be taken as a sign of computational simplicity, since tasks easily completed by humans (e.g., pattern recognition) can be quite challenging for AI. This CS task mapping may lead to increased appreciation of the multitude of human processing going on in CS projects that is still far from being automatable in any foreseeable future.
We propose a granular description and classification of projects based on three different categories of task digitization: optimization tasks, annotation tasks, and physical tasks.
Physical tasks require participants to perform non-digital actions to acquire data, such as birdwatching. In these tasks, the participant needs to continually (audio-)visually survey the environment and/or consider the suitability of deploying a sensor for recording data (Camprodon et al. 2019, Cochran et al. 2009, D’Hondt, Stevens and Jacobs 2013, Van Horn et al. 2018). The machine analogy of the data collection task would be robotics and smart sensors. In smart sensors, the raw measurement data is processed locally in the hardware before being passed to a central data storage for further processing (Posey n.d.). We are currently unaware of any CS projects that employ smart sensors.
Annotation tasks are solved via a digital platform, but require subject-specific or disciplinary knowledge, even if at a layperson level. One therefore cannot score the participants’ input objectively; instead, the annotation is consensus-based (absence of a ground truth). The elements to be annotated are often images, audio recordings, or texts to be transcribed (Causer and Terras 2014, Lintott et al. 2008, Nugent 2019, Tinati et al. 2017). The annotated data can be used to train ML classification models, which fall into the paradigm of supervised learning (SL).
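In the absence of a ground truth, a consensus step typically aggregates many participants' annotations per element before any SL training takes place. A minimal sketch of such majority-vote aggregation (the image IDs, labels, and vote counts below are invented for illustration):

```python
from collections import Counter

def consensus_label(annotations):
    """Majority vote over volunteer annotations for one element.

    Returns the winning label and the agreement fraction; no ground
    truth is assumed, only inter-participant consensus."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)

# Hypothetical volunteer classifications of three galaxy images
raw = {
    "img_001": ["spiral", "spiral", "elliptical", "spiral"],
    "img_002": ["elliptical", "elliptical", "elliptical"],
    "img_003": ["spiral", "merger", "merger", "merger", "spiral"],
}

# Consensus labels (with agreement scores) would then form the SL training set
training_set = {img: consensus_label(labels) for img, labels in raw.items()}
```

Low-agreement elements can be filtered out or down-weighted before a classifier is trained on the consensus labels.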
Optimization tasks are completely digital and are related to systems which can be described with a self-contained mathematical model (Curtis 2015, Lee et al. 2014, Jensen 2021, Wootton 2017). By self-contained model, we mean a task that can be unambiguously and automatically evaluated (scored) in terms of how well a candidate solution solves the problem, without any further human input. These are problems that can often, in theory, be solved purely by machine computation, but in practice may become intractable due to the high complexity of the solution space. Naturally, these tasks lend themselves well to ML methods related to optimization in complex spaces, such as reinforcement learning (RL).
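The defining feature of such tasks, the self-contained scoring model, can be illustrated with a toy example. The one-dimensional objective below is invented, and a simple random local search stands in for the far more sophisticated RL methods referred to in the text:

```python
import random

def score(candidate):
    """Self-contained model: scores any candidate automatically, with no
    further human input. Toy objective (assumption): maximize -(x - 3)^2,
    whose optimum is at x = 3."""
    return -(candidate - 3.0) ** 2

def machine_search(start, steps=2000, step_size=0.1, seed=0):
    """Random local search: accept a perturbation only if it scores better."""
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        trial = best + rng.uniform(-step_size, step_size)
        if score(trial) > score(best):
            best = trial
    return best
```

Because evaluation requires no human in the loop, a machine can test thousands of candidates automatically; the human contribution in such projects lies instead in seeding or steering the search through an intractably large solution space.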
The degree of digitization allows for a rough categorization of the possible AI contribution as robotics/smart sensors, SL, or RL, respectively. One might naively expect that the pinnacle of human contribution is to solve complex mathematical problems (tasks with a high degree of digitization). However, building robotics/smart sensors capable of assisting or replacing human volunteers in the real world for physical tasks is in many ways a much more difficult problem, as current robotics is considerably less advanced than state-of-the-art RL technologies (Dalgaard et al. 2020, Vinyals et al. 2019). This clearly demonstrates that the degree of digitization should not be mistaken for an axis of increasing cognitive complexity. We therefore posit that a systematic comparison with modern computational capabilities will lead to increased understanding and appreciation of the multitude of tasks that human volunteers perform, and how that labor works alongside technology and ML.
The x-axis of Figure 2 plots projects according to the degree of digitization. As illustrated, there may be projects exhibiting a mixture of features from two categories. Another boundary case is CS remote optimization of concrete experiments (Heck et al. 2018), which would clearly be amenable to RL treatment but requires execution of a real-world experiment in order to evaluate the quality of any given user-specified candidate solution and therefore does not have a self-contained model. Finally, we note that the ML technique of Unsupervised Learning is absent here because it is a data analysis technique that can in principle be applied to any data set across the categories. In the following we illustrate the relevance of these task characteristics through several specific CS projects (see appendix A for all considered projects).
Whereas the degree of digitization allows for a rough categorization in terms of the potentially applicable forms of AI, it does not explain why some tasks within each category are easy for most participants while others can only be solved by a minority. To address this question, we propose the term accessibility to contribution, which we define as: “the likelihood that an average layperson (assuming there are no impediments to participation, e.g., physical, socio-cultural, financial, or technological) would make a scientific contribution to the particular project.” To elaborate on this term, we present nine examples, a subset of the CS projects we reviewed, and discuss what it takes for a participant to make a scientific contribution. We propose a spectrum along which projects can be ordered to signify broader or more limited accessibility.
Quantum Moves 2 is a real-time dynamics control game designed to tap into a player's intuition of water sloshing in a glass as they move an atom through a 2-dimensional space over the span of a few seconds. Apart from investigating the value of each individual human input, there is an emphasis on understanding the aggregated collective input to identify generic intuition-driven strategies (Jensen et al. 2021). The data analysis in Quantum Moves 2 builds on a bulk analysis of all player data, from which these heuristics are gleaned. broad accessibility
Foldit is a puzzle-type game designed to visualize proteins in three dimensions, and lets participants spend as long as they need to slowly and (semi-)systematically search through a complex parameter landscape as they attempt to find the best folding pattern for a specific protein (Cooper et al. 2010). Reported results focus on the small subset of participants that arrive at uniquely useful solutions (Eiben et al. 2012, Khatib et al. 2011). medium-limited accessibility
Decodoku is designed for participants to solve sudoku-like quantum computing challenges without a time limit (Wootton 2017). Data are collected only in the form of written reports emailed to the scientists, so participants not only have to come up with useful strategies but must also be able to reflect on and verbalize them. limited accessibility
Stall Catchers is designed to facilitate analysis of data related to Alzheimer’s research. Participants are presented with few-second video clips of blood vessels from the brains of mice affected by Alzheimer’s. By analyzing the movement of blood cells in a target area determined by the game, they classify images as either flowing or stalled, and mark the precise location of stalls on the images. The puzzles require only non-domain-specific skills and can be solved with minimal domain knowledge (Nugent 2019). broad accessibility
In Galaxy Zoo participants classify images of galaxies according to a series of questions (Lintott et al. 2008). Some questions are approachable with minimal domain knowledge (e.g., “Does this galaxy have spiral arms?”) while others benefit from experience (e.g., “Is there anything odd?”). Examples and illustrative icons help teach new participants how to participate. medium-broad accessibility
Scribes of the Cairo Geniza is a transcription project, where participants are presented with images of historic text fragments in Hebrew and Arabic and transcribe them one line at a time using an online program. Participation requires specialized training and/or prior knowledge when dealing with specialized objects, due to language requirements (Scribes of the Cairo Geniza n.d.). limited accessibility
Quake-Catcher Network is a real-time motion sensing network of computers for earthquake monitoring. Participants download the software and purchase a USB sensor device, which records seismological waves while the software algorithmically determines waves outside the normal range, and sends them back to the project server. Participation, apart from the initial setup, does not require active action or skills of the participant (Cochran et al. 2009). broad accessibility
iNaturalist is an online social network where participants share biodiversity information by recording observations of organisms or their traces (nests, tracks, etc.). Users can add identifications to these observations, and an automated species identification algorithm is also used on the platform. Participation requires anywhere from no to extensive domain-specific skills, depending on whether the user also wants to perform identification tasks. Observations can be used to monitor organisms at various locations (iNaturalist 2021). broad-medium accessibility
UK Butterfly Monitoring Scheme (UKBMS) is a recording protocol used to record data on the butterfly population. Participants walk 1-2 km routes weekly at specific times of the day, in specific weather conditions from spring to fall multiple years in a row. The task is to record measurements on e.g., weather, habitat, and the number of different butterfly species on recording forms which are submitted weekly on the project website. Significant time investment, prior domain knowledge, and detailed environmental surveying skills are required; participation is limited to the United Kingdom (Dennis et al. 2017). limited accessibility
As we see, accessibility to contribution is determined by requirements such as expertise through training (e.g., animal identification skills), experience (becoming familiar with the task environment and interface, e.g., Foldit), and certain cognitive skills (currently understudied in CS). If these factors are properly understood, AI can be used to broaden accessibility by automatically adapting to the diverse needs and skills of participants, facilitating quality of contributions (Anderson-Lee et al. 2016, Walmsley et al. 2020) and making the task simpler and more enjoyable for participants (Kawrykow et al. 2012). Attempts have been made to optimize the interactions between the volunteers and the scientific tasks of the CS project, increasing engagement and optimizing quality of contributions (Sterling 2013). However, without an appropriate underlying framework these specific examples are difficult to generalize, as both CS tasks and the needs of volunteers are diverse.
The accessibility to make a contribution axis highlights a gap in research studying the diverse cognitive skills of participants with respect to the requirements of the scientific task in CS projects. We argue that our categorization allows for joint considerations about cognitive and learning processes of participants as well as possible computational models of AI applicable across a wide range of CS projects. In particular, we demonstrate that across CS projects there exists a class of problems that nearly all participants can contribute to using general human cognitive and motoric abilities (see Fig. 2, y-axis). The apparent simplicity of these tasks from the human perspective stands in dramatic contrast to the challenge of replicating them with AI technologies, which is one of the grand challenges of the AI field (Marcus 2018, 2020). At the other end of this spectrum lie projects where only a small fraction of participants are able to contribute. Here, understanding how task learning can be combined with systematic exploration and intuitive leaps remains another grand challenge of AI. Interestingly, most limited accessibility tasks also draw heavily on many of the 21st century skills, such as creativity and complex problem solving, which still elude a firm theoretical understanding in the fields of psychology and education. Nevertheless, a further analysis of the particular cognitive processes going on in CS projects will be crucial for designing future automated support systems to enhance the contribution of the participants.
In general, the fields of AI and CS would benefit greatly from research unpacking the link between the accessibility of concrete CS tasks and the particular cognitive processes required to complete them. Such an analysis is well beyond the scope of this work; however, we note that many high accessibility tasks are characterized by intuitive processing and the application of common sense (information processing or physical actions that most participants can do instinctively). Finally, the meta-cognitive aspects of the Decodoku example illustrate that it would be interesting to relate the accessibility level to the emerging concept of co-created CS (Bonney et al. 2009b), in which participants are involved not just in the data gathering phase of the scientific process but also in e.g. hypothesis generation, design, and analysis. The algorithmic support of the scientific processes beyond data acquisition could tap into cutting edge AI trends such as unsupervised ML, hierarchical modelling (Menon et al. 2017) and generative design (McKnight 2017, Oh et al. 2019).
Figure 2: Mapping of Citizen Science projects. The x-axis shows an increasing degree of digitization moving from physical tasks (potentially supported by robotics and smart sensors), through annotation tasks (potentially supported by supervised learning methods) to purely mathematical, optimization tasks (potentially supported by reinforcement learning methods). The y-axis represents accessibility to scientific contribution with highly accessible projects at the bottom and projects with extensive requirements on special cognitive traits, expert knowledge, or training at the top.
To summarize, for the CS community, this categorization provides an overview of how different CS projects do or could deploy AI technology, as well as an opportunity to reflect on the tasks performed by their participants. For AI researchers, it may help identify methodologically suitable collaborations with CS.
Now that we have characterized the types of CS tasks, we turn to identifying and categorizing the ways AI can augment the process of solving these CS tasks. Ceccaroni et al. (2019) identified three “broad and overlapping” categories for the use of AI in CS: “... assisting or replacing humans in completing tasks… enabling tasks traditionally done by people to be partly or completely automated. … influencing human behaviour… e.g., through personalisation and behavioural segmentation, or providing people a means to be comfortable with citizen science and get involved. ... [and] having improved insights as a result of using AI to enhance data analysis.” This categorisation can be useful in distinguishing between the involved groups of humans: the CS participants and the scientists, referring to the first and third category respectively. If, however, we consider “generating improved insights” as part of the CS task (e.g., users perform their part of the task and their input is later aggregated by the AI to yield the desired solution (insight)), we can join the first and third category into one, simply called: Assistance in solving the CS task. Our criterion for this category is that the AI has to be directly involved in the solution of the CS task. In doing so, the AI may either provide a problem-related input to the CS participant during the task, or process the participant’s input to achieve or improve the solution.
In other words, one could say that assistance in solving the CS task occurs when the AI is applied inside of the CS task. Naturally, one would ask: what is the contrasting case of AI being applied outside of the CS task? Can it be assimilated with the “influencing human behavior” AI application category mentioned above? We believe the answer is yes, but in order to make the category more inclusive and to take the CS task as a reference point, we propose to rename it to: CS content or task selection. This is to be understood in a broader sense as “intelligent content selection” for the CS participants, with the purpose either to incentivize a desired behavior (e.g., increase engagement and retention through motivational messages) or to harness the human expertise more efficiently (e.g., selecting tasks where the human input is most valued). In the discussion below, we take the participant-centered view, defining the task as a single instance of a problem.
Given the above definitions, we can already examine the relation of these categories to HI. Revisiting the HI criteria (collectiveness, solution superiority, and mutual learning), we see that the category CS content or task selection does not meet the criterion of collectiveness, since the AI is not working on the same task as the participants. In contrast, Assistance in solving the CS task satisfies the HI collectiveness criterion by definition. Reversing the argument, the HI criterion of collectiveness can be used to divide the AI applications in CS into the two above-stated categories. Our HI framework is in no way intended to judge the value of AI applications or diminish the value of intelligent task or content selection in CS. Rather, it is intended as a design guide and a mental model for scenarios where the HI scheme can provide solution efficiency and boost participant satisfaction. We now proceed to explicit examples from the two AI application categories.
In the CS task assistance category, the use of supervised ML with Deep Neural Networks (DNNs) for image classification is prominent. Here participants are tasked with labeling observations, which can then be used for training better AI models, automating the task or parts of it. For example, Galaxy Zoo recently added a Bayesian DNN able to learn from volunteers to classify images of galaxies (Walmsley et al. 2020). Similarly, in iNaturalist a DNN model trained on scientific grade data can provide good suggestions of species names, or of broader taxons such as genera or families, for pictures submitted by participants (iNaturalist 2021). The participants are also tasked with classification of the images; therefore the human and AI agents are solving the CS task collectively.
To our knowledge, RL has not been applied to CS task assistance. However, it holds great potential, and initial steps have been taken by integrating simpler, non-learning algorithms directly into a number of CS optimization tasks (Jensen et al. 2020, Koepnick et al. 2019). Even though these algorithms do not learn, they can greatly enhance the performance of the human-machine system, achieving solutions superior to those of each agent alone. For example, in Foldit, machine agents can assist the participant in the solution of the task in two different ways: first, by systematically optimizing a participant’s solution with “small tweaks”, using an optimization algorithm initiated by the participant (Cooper et al. 2010, pp. 756); second, by continuously calculating a variety of assessment criteria for the protein that the participant is currently folding, thus providing the participant with additional information through continuous feedback (Kleffner et al. 2017, pp. 2765–2766). Similarly, in Quantum Moves 2, players can engage a local optimizer starting from their “hand drawn” solution. It has been demonstrated that in some complex problems, the combination of such human seeding with machine optimisation can provide results superior to those of the individual machine agents (Jensen et al. 2021).
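The "human seeding plus machine polishing" pattern described above can be sketched generically. The objective, the target, and the player's seed below are invented, and the greedy coordinate tweaks merely stand in for the local optimizers used in Foldit and Quantum Moves 2:

```python
def machine_polish(candidate, score, iters=200, eps=0.01):
    """Greedily apply "small tweaks" to a human-provided seed:
    nudge one coordinate at a time, keeping only improvements."""
    best = list(candidate)
    for _ in range(iters):
        improved = False
        for i in range(len(best)):
            for delta in (eps, -eps):
                trial = list(best)
                trial[i] += delta
                if score(trial) > score(best):
                    best, improved = trial, True
        if not improved:  # local optimum reached
            break
    return best

# Toy objective (assumption): maximize -sum((x_i - t_i)^2)
target = [1.0, -2.0, 0.5]
score = lambda xs: -sum((x - t) ** 2 for x, t in zip(xs, target))

human_seed = [0.8, -1.7, 0.9]  # hypothetical "hand drawn" player input
polished = machine_polish(human_seed, score)
```

By construction the polished solution is at least as good as the seed; the studies cited above report that on some problems such seeded optimization outperforms both unaided players and purely machine-driven search.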
AI applications in the CS content and task selection category do not engage in solving the CS task directly; instead, they aim at increasing the participants’ productivity and/or motivation, which could also be called participant management. This can be done with or without personalizing the selection for individual participants. The Galaxy Zoo Enhanced mode (Walmsley et al. 2020) (discussed in the following section) provides players only with tasks which are hard for the AI to solve alone. (See below for a precise definition of the CS task.) In contrast, in personalized task selection, an AI agent takes into account the behavior of individual participants and directs them toward tasks better suited to their interests/abilities. CS task selection can also include displaying motivational messages, as these may indirectly change the perception of the current task or channel the users to an alternative task or content. For example, Segal et al. (2018) present their approach, Trajectory Corrected Intervention, in which they used ML to select custom motivational messages for individual users, thereby improving the retention of participants in Galaxy Zoo (Segal et al., in press). Spatharioti et al. (2019) also target participant engagement, in the Cartoscope project, by making the task and interface more interesting to the participant using an algorithm that adapts task difficulty to player behavior. Crowston et al. (2020) implemented a Bayesian volunteer learning model to optimize the training of newcomers by directing the flow of image classification tasks so that they learn more quickly while meaningfully contributing to the project at the same time. As a last example, Xue et al. (2016) applied an agent behavior model to reduce bias by optimizing incentives that motivate participants to collect data from understudied locations.
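As a concrete illustration of non-personalized task selection in the spirit of the Galaxy Zoo Enhanced mode, a minimal routing policy can forward to participants only the items the machine model is uncertain about. The item IDs, confidence values, and threshold below are invented for illustration:

```python
def route_tasks(items, model_confidence, threshold=0.9):
    """Split items: those the model classifies with high confidence are
    auto-accepted; the rest are queued for human classification."""
    for_humans, auto_accepted = [], []
    for item in items:
        if model_confidence(item) < threshold:
            for_humans.append(item)
        else:
            auto_accepted.append(item)
    return for_humans, auto_accepted

# Hypothetical top-prediction confidences for five images
confidence = {"img_a": 0.99, "img_b": 0.55, "img_c": 0.93,
              "img_d": 0.42, "img_e": 0.88}

humans, machine = route_tasks(sorted(confidence), confidence.get)
```

Personalized variants would additionally condition the threshold or the queue ordering on the individual participant's skill and interaction history.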
With these ever-improving AI technologies comes an abundance of opportunities for CS projects to distribute tasks more efficiently among humans and machines, to support the participants in solving the tasks, and to enhance the humans’ experience and consequently their contribution. To take optimal advantage of these opportunities, we need to understand how information is processed differently by human and machine agents when solving CS tasks. The next section describes a way to visually represent this information processing in order to underpin that understanding.
The two previously described types of AI applications in CS, task assistance and task selection, differ in the way information is processed. This can be graphically distinguished using information flow diagrams (Wintraecken 2012) (Fig. 3).
Figure 3. Information flow diagrams for the two types of AI applications in Citizen Science (CS) projects: 1. Assistance in solving the CS task; 2. Content or task selection. Arrows denote flow of information (labeled with lowercase letters; referred to in the text); containers denote information sources/receivers, which can also be referred to as agents (machine or human). Agents containing natural or artificial intelligence are marked by grey color fill. Optional components are denoted by a dashed outline. See the following table for explanation of the terms and significance of letters.
The diagrams in Fig. 3 show how information is passed between the different agents when solving the CS task. The CS task is an abstract object representing both the scientific challenge and the scientists (e.g., the tasks might be dynamically updated). Note that the notion of a task may contain some ambiguity. For instance, in annotation tasks one can consider either “annotation of a single element” or “annotation of the set of elements” as the fundamental “task”. Of course, for the researchers the task is the latter, but for the participant it is arguably the former; thus, task selection in the former frame would be called task assistance in the latter frame. Here we define tasks from the perspective of the participants. The task can also represent the physical environment with which the participants may be in direct contact (arrow h), such as when participants take pictures of nature for the iNaturalist project (part of the scientific challenge). The arrow h could also represent previously acquired experience with the matter, for example the general knowledge of shapes required for contributing to the Galaxy Zoo project. The presence and character of arrow h distinguishes between the different degrees of task digitization in the individual projects, as discussed previously (see Figure 2). In projects with a high degree of digitization, the information flow through this channel is reduced or even absent, as the CS task can be fully represented by the Online solution algorithm (defined below); simply put, the participants do not need “contact with the physical world” to solve the task.
In general, the participants interact with the CS platform individually (in a private session) via an Online solution algorithm (OnSA), the part of the online CS platform interface where they submit their inputs (arrow a) and receive a representation of the task and feedback on their actions (arrow b). Here, algorithm refers to an automated processing of inputs and outputs that may or may not involve AI. This also includes all online (real-time) algorithms, optionally containing AI, that help the participants to solve the CS task. For instance, above we discussed the user-support algorithms in the Foldit and Quantum Moves 2 games, neither of which contains AI in the current game implementations. The OnSA receives the task either directly (arrow c; left diagram: CS task assistance) or via a Task selection algorithm (arrow j; right diagram: CS task selection) from the CS task.
In order to evaluate the HI criterion of Mutual learning (see table 1), we need to identify precisely when and where the two stages of Machine Learning (ML) (training and execution; see table 2) take place relative to the participants’ input. This is crucial because it imposes limits on how adaptive the AI can be (e.g., “can it learn during a single session with a CS participant?” or “how does the AI learn from the aggregated knowledge transferred between different participants’ sessions?”) and on how reliable the AI predictions/actions are (too much adaptation/learning may lead to overfitting and bias effects). For this purpose we use two terms, Online and Offline, defined with respect to the participant’s interaction with the algorithmic interface. Online training, for example, refers to the AI model being updated in real time with data from the individual participants while they engage with the interface (part of the OnSA). An Offline AI model, on the other hand, could be trained on some fixed dataset (acquired via the CS platform or otherwise), resulting in a model that can later interact with participants when executed as a tool in the OnSA, or be itself the research outcome (a solution of the CS task). When an offline algorithm is Assisting in solving the CS task, we call it an Offline solution algorithm (OffSA).
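The Online/Offline distinction can be made concrete with a toy sketch. All names are hypothetical and the “models” are trivially simple: an online model is updated inside the OnSA as each participant input arrives, while an offline model is fitted once on a fixed dataset and only later shipped to the OnSA (arrow g).

```python
class OnlineMeanModel:
    """Online training: updated in real time, inside the OnSA,
    as each participant input arrives."""
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def update(self, vote: float) -> None:
        # one online training step per participant input
        self.n += 1
        self.mean += (vote - self.mean) / self.n

    def predict(self) -> float:
        # execution, also inside the live session
        return self.mean

def train_offline(fixed_dataset):
    """Offline training: fitted once on a fixed dataset in the OffSA;
    the resulting 'model' (here just a mean) is later shipped to the OnSA."""
    return sum(fixed_dataset) / len(fixed_dataset)

online = OnlineMeanModel()
for vote in [1.0, 0.0, 1.0]:          # inputs arriving during live sessions
    online.update(vote)

offline_model = train_offline([1.0, 0.0, 1.0, 1.0])   # a fixed, pre-acquired dataset
```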
An arrow pointing towards the CS task denotes that a solution was found. The solution might either be provided directly by the OnSA (arrow d) or obtained from an OffSA (arrow e), where data from multiple participants are loaded from the OnSA (arrow f) and aggregated by the OffSA (e.g., an AI model is trained). Alternatively, the AI in the OffSA might be trained on External data (arrow i; i.e., sources outside of the CS project) and the resulting model passed to the OnSA (arrow g), where it is executed to assist the participants in solving the task. This is done, for example, in the EyeWire project, where an affinity-graph labeling ML algorithm is trained on a scientific-grade labeled dataset.
When an AI is used for Task selection (Fig. 3; right panel), it is not part of the OnSA. Rather, it may observe the behavior of the individual participants while they are solving the task (arrow k). Such AIs use a pre-trained model executed on the real-time behavioral data to predict the next suitable content/task (arrow j) in order to improve the participants’ experience, work efficiency, or retention. Alternatively, task selection with AI can be performed offline, forming a common task queue for all participants (disregarding individuality; absent arrow k). This approach is adopted, for example, in the Galaxy Zoo “Enhanced mode”, where several independently pre-trained ML models are executed on the images (tasks), and a task is passed on for human classification only when the models disagree (low certainty of the solution). The task selection algorithm is depicted at the same level as the CS task in Fig. 3, as it can be thought of as an extension of the CS task itself (on the level of the scientists): it controls the content exposed to the participants and is therefore one of the dynamical components of the scientific challenge.
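The offline, non-personalized variant can be illustrated with a short sketch in the spirit of the Galaxy Zoo “Enhanced mode” routing described above; the toy models and the agreement rule below are hypothetical stand-ins, not the actual implementation.

```python
from collections import Counter

def route_tasks(tasks, models, agreement_threshold=1.0):
    """Split tasks into machine-solved and human-queued.

    `models` is a list of callables mapping a task to a class label.
    A task goes to humans when the fraction of models agreeing on the
    top label falls below `agreement_threshold`.
    """
    machine_solved, human_queue = [], []
    for task in tasks:
        labels = [m(task) for m in models]
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count / len(labels) >= agreement_threshold:
            machine_solved.append((task, top_label))   # confident: keep machine label
        else:
            human_queue.append(task)                   # uncertain: ask the crowd
    return machine_solved, human_queue

# Toy "ensemble": the two models agree on even numbers, disagree on odd ones.
models = [lambda t: t % 2, lambda t: 0]
solved, queue = route_tasks([2, 3, 4], models)
```

With full agreement required, tasks 2 and 4 are labeled by the machines and only task 3 is queued for participants, mirroring the “models’ disagreement” criterion in the text.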
To proceed with our analysis towards HI schemes, we focus on the AI application category Assisting in solving the CS task, which satisfies the Collectiveness criterion of HI (see table 1), and try to devise a sub-categorisation within it using the HI criterion of Mutual learning. Mutual learning requires information being passed bidirectionally between the AI and the human agents (they learn from each other). We therefore need to examine whether the learning and execution stages of the AI are both “in the solution loop” with the CS participants. The answer depends crucially on the presence and direction of the information flow between the mandatory OnSA and the optional OffSA, denoted by the two arrows, f and g, in Figure 3.
Four elementary cases arise from having one, both, or neither of the information channels present, as shown in Figure 4. We propose to order these schemes according to how complex they are to implement, which goes hand in hand with the degree of mutual learning, leading up to HI in Tiers 3 and 4. In other words, these can be described as: 1. AI learning from humans; 2. Humans learning from AI; 3. Mutual learning on a long timescale; and 4. Mutual learning on a short timescale. Here, “humans” are the CS participants only. Our aim is to discuss the elementary schemes that can exist and that, in practice, are combined to form more advanced systems.
Figure 4. Elementary schemes of task assistance by AI, i.e. concrete examples of information flow diagrams based on the generic diagram from Figure 3; left panel. The schemes are organized into tiers leading towards Hybrid Intelligence. They differ primarily by the amount and type of connections between the Online and Offline solution algorithms (arrows f and g). Arrow labels are consistent with Figure 3. For simplicity, the Citizen Science task object was omitted in these diagrams, though the c, d, e, and h arrows from Figure 3 left would still be relevant here.
Tier 1: Post-acquisition AI analysis, collective level
The most common AI task assistance scheme in CS is to train an AI model offline on participant input acquired via an OnSA without AI. This scheme contains only one connection (arrow f) and is usually applied to data from multiple participants acquired over a period of time much longer than the single-task solution time, typically weeks or months; hence the name “collective level”, which we visualise with a dash-dotted outline of a crowd. The model itself, or its execution on unseen data, then forms part of the CS task solution. This scheme is very convenient from the ML perspective because the quantity and quality of participant data are often not known in advance; the AI model architecture can be chosen accordingly (post factum) in order to prevent overfitting (attributing meaning to noise in the data). This scheme is applied, for example, in Stall Catchers (Stall Catchers 2020), Galaxy Zoo (Lintott et al. 2008), EyeWire (Kim et al. 2014), Quantum Moves 2 (Jensen et al. 2021), Eterna (Andreasson et al. 2019), Phylo (Kawrykow et al. 2012), and Fraxinus (Rallapalli et al. 2015).
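As a minimal illustration of this scheme, the sketch below aggregates redundant participant classifications collected by the OnSA (arrow f) into a crowd-labeled training set by majority vote. The data and names are invented for illustration; a real project would follow this with actual model training in the OffSA.

```python
from collections import Counter, defaultdict

def aggregate_labels(classifications):
    """classifications: iterable of (task_id, participant_label) pairs.

    Returns {task_id: majority_label}, the offline-aggregated crowd answer
    that a Tier 1 project would then use as training data for an AI model.
    """
    votes = defaultdict(list)
    for task_id, label in classifications:
        votes[task_id].append(label)
    return {t: Counter(v).most_common(1)[0][0] for t, v in votes.items()}

# Hypothetical crowd input: three participants on img1, two on img2.
crowd = [("img1", "spiral"), ("img1", "spiral"), ("img1", "elliptical"),
         ("img2", "elliptical"), ("img2", "elliptical")]
training_set = aggregate_labels(crowd)
# An AI model would now be trained offline on `training_set` (arrow e).
```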
Tier 2: Pre-trained AI tool, individual level
In this tier, a fixed, pre-trained AI model is executed as a component of the OnSA, assisting the participants with the task. In other words, participants can learn from the AI, but the AI cannot learn from the participants. The model was previously trained in an OffSA using data obtained outside of the CS platform and passed into the OnSA (arrow g). This scheme can increase the efficiency and quality of the task solutions, combining the strengths of humans and machines to provide superior outcomes (HI criteria of Collectiveness and Superiority). Among the investigated projects, this scheme is applied in EyeWire (Kim et al. 2014) and iNaturalist (iNaturalist 2021). It is worth mentioning that in Foldit (Cooper et al. 2010) and Quantum Moves 2 (Jensen et al. 2021) participants can engage with algorithmic optimizers (not AI), leading to a similar increase in participants’ learning and productivity.
Tier 3: Hybrid Intelligence at collective level
In this tier, the CS system satisfies all three criteria of HI, including mutual learning of the heterogeneous agents as well as learning of the system as a whole. Similarly to Tier 2, the OnSA contains a fixed AI model which assists the participants in solving the task; however, the AI model is trained on participant data acquired earlier on the very same platform (arrow f). The information flow through the OffSA therefore forms a closed loop, as the re-trained/updated AI model is fed back to the OnSA (arrow g). Although the AI does not learn in real time while individual participants interact with the OnSA, it can learn over weeks or months at the collective level, as the model is periodically updated with batches of new data from multiple participants. On an abstract level, the crowd plays the role of the “human agent”: the AI learns from the crowd, and the crowd experiences that the AI improves over time. Individuals from the crowd might experience the learning of the AI as well if they stay engaged on the CS platform for multiple training cycles. To our knowledge, the only project adopting HI of this type is iNaturalist (iNaturalist 2021), where participants can use an AI model to classify their images and the model is periodically updated with human-labeled data from the platform. Such an implementation is also underway in the Stall Catchers project and is described in the outlook.
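The periodic retraining loop of this tier can be sketched as follows. The “model” here is just a most-common-label table, a deliberately trivial stand-in for a real ML model, and the batch contents and cadence are hypothetical.

```python
from collections import Counter

def retrain(dataset):
    """OffSA training step: return the most common label seen so far
    (a trivial stand-in for fitting a real model)."""
    return Counter(dataset).most_common(1)[0][0]

dataset = ["cat", "cat", "dog"]       # labels accumulated on the platform so far
model = retrain(dataset)              # initial model, shipped to the OnSA (arrow g)

# Weekly/monthly batches of new crowd data arriving via arrow f:
for batch in [["dog", "dog"], ["dog"]]:
    dataset.extend(batch)             # collective-level data grows
    model = retrain(dataset)          # periodic OffSA update, fed back via arrow g
```

Between updates the model inside the OnSA stays fixed, exactly as described above: learning happens on the timescale of batches, not of individual sessions.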
Tier 4: Hybrid Intelligence at individual level
Tier 4 also satisfies all three components of HI. However, the OnSA contains an AI model that is trained and executed while individual participants interact with it, allowing for mutual learning of the human and AI agents in real time. The basic form of this scheme does not contain an OffSA, implying the absence of arrows f and g. To our knowledge, no CS project uses this type of scheme. Although, judging by its information flow diagram, it appears particularly simple, the presence of both training and execution in real-time interaction with participants raises many practical and technical issues. In order for the participants to benefit beyond what is gained in Tier 3, they must be able to experience the effect of their actions on the AI; otherwise it could be trained offline. This requires a very high learning rate for the AI, raising the risk that it overfits and ceases to be useful on the task (i.e., that the AI simply agrees with the individual on all actions).
Due to the large numbers of participants in CS, the effect of an individual on the final result, e.g., an AI model, tends to be rather small (often intentionally so). Nevertheless, a quickly learning AI might still be useful in some tasks, for example when it is crucial to extract the strategy of an individual. One could think of it as a supervised automation of the task solution. For example, in Decodoku, the participants are asked to describe their strategies verbally; with appropriate AI assistance, this metacognitive reflection could potentially be enhanced. In general, projects with high digitization containing a closed model of the task are well suited for Tier 4: Hybrid Intelligence at the individual level. As seen in Figure 2, the high-digitization tasks addressed here are in principle solvable with RL (an optimization problem). In one such scenario, an AI might propose the next actions to take, while humans either accept the proposal, modify it, or choose a different action altogether. The AI would therefore be learning strategies from an individual with an immediate feedback loop; in ML terminology, the participants would provide an adaptable reward function for an RL algorithm. This could provide better outcomes than just training a model on many “games” from a certain player: the AI can also influence the player, for example by increasing the consistency of the player’s actions or, on the contrary, by sparking new ideas. In addition, having the AI train online with humans in the loop could reduce the issue of perverse instantiation, which gives rise to bizarre, unwanted solutions in complex problems with an a priori defined reward function (Bostrom 2014).
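Such a human-in-the-loop scheme might be sketched as a tiny bandit-style agent living in the OnSA: it proposes the next action, the participant accepts or overrides it, and each response updates the agent immediately, so training and execution both happen inside the live session. Everything here (action names, update rule, learning rate) is a hypothetical illustration, not an implementation from any cited project.

```python
import random

class HumanInLoopBandit:
    """Proposes actions and learns in real time from participant responses."""
    def __init__(self, actions, seed=0):
        self.values = {a: 0.0 for a in actions}   # estimated value per action
        self.rng = random.Random(seed)

    def propose(self, epsilon=0.1):
        if self.rng.random() < epsilon:                  # occasional exploration
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)     # greedy proposal

    def feedback(self, proposed, chosen, lr=0.5):
        """The participant's choice acts as the reward signal:
        +1 if the proposal was accepted, -1 if it was overridden."""
        reward = 1.0 if chosen == proposed else -1.0
        self.values[proposed] += lr * (reward - self.values[proposed])
        if chosen != proposed:                           # also learn the override
            self.values[chosen] += lr * (1.0 - self.values[chosen])

agent = HumanInLoopBandit(["rotate", "shift"])
for _ in range(5):                    # participant consistently prefers "shift"
    p = agent.propose(epsilon=0.0)
    agent.feedback(p, chosen="shift")
```

After a handful of interactions the agent proposes the participant’s preferred action, and the participant has directly experienced the effect of their choices on the AI, which is the Tier 4 requirement discussed above. The high learning rate also hints at the overfitting risk: the agent quickly converges to simply echoing the individual.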
Although these tiers of AI task assistance tend to be progressively more challenging to implement, it is not guaranteed that implementing a higher tier is always worth the extra effort for research outcomes, or that it is feasible and beneficial in every CS scenario. The chosen AI assistance scheme should be carefully considered, taking into account the character and amount of data as well as the AI methodology applicable to the task. For example, in Supervised Learning tasks such as those in iNaturalist (iNaturalist 2021), Galaxy Zoo (Lintott et al. 2008), Stall Catchers (Nugent 2019), and EyeWire (Kim et al. 2014), strong individual participant effects on the model are not desirable, as they can introduce large decision biases. Similarly, the presence of certain types of AI agents in the OnSA may have a detrimental effect on the outcomes, e.g., biasing participants to solve the task in a specific way rather than contributing creatively to the project. Implementing higher tiers of AI task assistance therefore always bears extra overhead in design considerations, AI architecture choice, and ultimately maintenance of the system.
Herein we have described a value proposition for HI in CS. In acknowledging uncertainties about HI architectures and potential outcomes, we introduce a rich space of research opportunities that could help realize the tremendous potential of adaptive and synergistic human+AI relationships.
A key defining quality of HI is the mutual learning that exists among the AI and human components of the system. Such integration not only allows for, but necessitates, the co-evolution of individual components (AI and human alike) with each other and with the systems to which they contribute. Thus, we are entering uncharted AI territory, where our best hope for advancing the field may require bootstrapping. In other words, the potential complexity of these systems suggests an opportunity to use HI itself to improve our understanding of HI. Asking why a given CS project is not solved entirely algorithmically can be a path to identifying new modes of human-machine problem solving by discovering suitable existing machine technologies, as well as to a deeper appreciation of the distinctly human contribution in areas where current machine technology falls short.
Areas of great potential for human augmentation are applications of RL to optimization challenges, Supervised Learning (SL) to classification tasks, and smart sensors to participatory sensing tasks. On the other hand, tasks in which full automation is beyond current technological reach typically tap into common sense, hierarchical thinking, or metacognitive reflection, and full human-level mobility combined with environmental sensing and domain knowledge.
How then, as a community of scientists, data science practitioners, and domain researchers, are we to pursue these questions concertedly and efficiently? How, for example, can we study various information flow architectures, such as those depicted in Figure 4, without having to manually design and code each one and then recruit human participants? A potential means of rapid experimentation could be an online research platform enabling rapid development and testing of candidate HI architectures for any CS project. The developers of Stall Catchers (Bracko et al. 2020, Falkenhain et al. 2020) are currently developing a platform intended to become the basis for a modular project builder that integrates with new human/AI research capabilities. This effort has resulted in an experimentation dashboard that makes it possible to clone an existing CS project into a “sandbox” version, design a human/AI study, select data, invite participants, and run experiments from a single interface, without writing a single line of code. This research platform is part of a broader initiative called “Civium” (Borfitz 2019, Michelucci 2019, Vepřek et al. 2020), which seeks to make all advanced information processing systems, including hybrid intelligence research and applications, more transparent, trustworthy, and sustainable.
Original paper published on arXiv.