Expertise Requirements for Hybrid Intelligence in Decision Augmentation

August 31, 2021
Hybrid Intelligence

Introduction

Firms increasingly engage in open innovation efforts to leverage the creative potential of a huge and diverse crowd of contributors (Leimeister et al. 2009). One popular approach is to solve innovative problems by issuing an open call to a crowd with heterogeneous knowledge and diverse experience via a web-based innovation platform (e.g. Bright Idea, Salesforce, and Ideascale). Individual members of the crowd then contribute creative opportunities to solve such problems, and the firm rewards the best contribution in a contest approach (Lakhani and Jeppesen 2007). This novel way of soliciting opportunities from online communities is a powerful mechanism for utilizing open innovation.

However, the creative potential that arises from the innovative contributions of the crowd poses some critical challenges. The quantity of contributions and the demands on expertise to identify valuable opportunities are high and remain challenging for firms that apply crowdsourcing. Famous examples illustrate these novel phenomena. For instance, during the IBM “Innovation Jam” in 2006, more than 150,000 users from 104 countries generated 46,000 product opportunities for the company. Moreover, Google launched a crowd-innovation challenge in 2008, “Project 10^100”, to ask the crowd for opportunities that have the potential to change the world. After receiving over 150,000 submissions, thousands of Google employees reviewed the ideas to pick a winner, which took nearly two years and tens of thousands of dollars (Bayus 2013). As previous research suggests, only about 10–30% of the entrepreneurial opportunities from crowdsourcing engagements are considered valuable. Furthermore, screening this vast number of contributions to identify the most promising opportunities is one of the toughest challenges of crowdsourcing to date (Blohm et al. 2016).

To solve these problems, different streams of research emerged that attempt to filter entrepreneurial opportunities (Klein and Garcia 2015). First, expert evaluations, which use executives within the firm to screen opportunities, were identified as costly and time consuming (Blohm et al. 2016). Second, research on algorithmic approaches proved valuable by identifying metrics to distinguish between high- and low-quality opportunities (Walter and Back 2013; Westerski et al. 2013; Rhyn and Blohm 2017). However, such filtering approaches always risk missing promising opportunities by producing “false negatives” (classifying good opportunities as bad ones) and are better suited to culling low-quality opportunities than to identifying valuable ones, which is a task that demands human decision makers. In response to this, the third approach to screening opportunities is crowd-based evaluation (Klein and Garcia 2015; Blohm et al. 2013; Riedl et al. 2013). Organizations have turned to the crowd not just to generate opportunities but also to evaluate them in order to filter high-quality contributions. This approach has in fact been shown to be as accurate as expert ratings if the members of the crowd have suitable domain knowledge (Magnusson et al. 2016). However, it frequently fails in practice when facing huge numbers of opportunities. Crowd-based filtering approaches tend to perform poorly as they make unrealistic demands on the crowd regarding their expertise, time, and cognitive effort (Klein and Garcia 2015).

By combining algorithmic machine learning (ML) approaches with human evaluation to adaptively assign crowd members that have the required domain knowledge to entrepreneurial opportunities, I propose a semi-automatic approach that leverages the benefits of both approaches and overcomes limitations of previous research. I thus propose that a hybrid approach is superior to sole crowd-based and computational evaluation for two reasons. First, various research suggests that computational models (or machines) are better at tasks such as information processing and provide valid results (Nagar et al. 2016), while human decision makers are cognitively constrained or biased (Kahneman and Tversky 2013). Second, previous research shows the importance of human decision makers in the context of innovation (Kornish and Ulrich 2014). In this highly uncertain and creative context, decision makers can rely on their intuition or gut feeling (Huang 2016).

Following a design science approach, I established awareness of real-world problems in the context of filtering crowdsourcing contributions and derived design principles (DPs) for such systems, which I evaluated with experts on crowdsourcing and requirements engineering.

I therefore intend to extend previous research on idea filtering in crowdsourcing engagements by combining algorithmic and crowd-based evaluation. This research will contribute to both descriptive and prescriptive knowledge, which may guide the development of similar solutions in the future.

Entrepreneurial Contributions of the Crowd

In general, crowdsourcing denotes a mechanism that allows individuals or companies that face a problem to openly call upon a mass of people over the web to provide potentially valuable solutions. One instantiation of crowdsourcing that is particularly interesting from both a practical and a research perspective is the idea contest (Blohm et al. 2016). Idea contests are usually conducted via platforms that allow companies to collect opportunities from outside the organization. The outputs (i.e. the opportunities) of such contests are usually artefacts that can take on different forms such as plain text, plans, designs, and predictions from both experts and lay crowds (Riedl et al. 2013). The basic idea behind idea contests is for companies to expand the solution space of a problem and thereby increase the probability of obtaining creative solutions to said problem (Klein and Garcia 2015). The effectiveness of idea contests is also underpinned by research showing that users are willing and able to come up with innovative opportunities only under certain conditions (Magnusson et al. 2016). Thus, by providing various incentives such as monetary rewards, firms increase the number of contributions and the probability of receiving a creative submission. In simple terms, attracting larger crowds leads to a more diverse set of solutions (Afuah and Tucci 2012).

Previous Approaches to Identify Valuable Opportunities

Such idea contests lead to a high number of opportunities that cannot be efficiently processed by current approaches. Successful idea contests often produce a flood of contributions that must be screened and evaluated before they can be moved to the next stage and developed further (Blohm et al. 2016). To identify valuable contributions that are worth implementing, one important task is filtering the textual contributions in such idea contests. Existing approaches to separating valuable from bad contributions in crowdsourcing rely on two kinds of content-based filtering to evaluate the creative potential of opportunities: computational, algorithmic evaluation approaches and crowd-based evaluation approaches.

Computational Evaluation Approaches

One current approach to evaluate textual contributions in the context of crowdsourcing is computational evaluation, wherein algorithms are used to filter opportunities based on metrics for idea quality such as word frequency statistics (Nagar et al. 2016). Within the approaches for computational evaluation, two dominant approaches are emerging to support the decision-making of the jury, which reviews the opportunities to identify the most valuable ones. 

First, clustering procedures examine how the vast amount of textual data from crowdsourcing contributions can be organized based on topics (Walter and Back 2013) or a domain-independent taxonomy for idea annotation (Westerski et al. 2013). Second, ML approaches can be used to filter opportunities based on rules that determine the value of the content (Rhyn and Blohm 2017). This approach is particularly useful if training data sets are available. Previous research in this context uses variables for contextual (e.g. length, specificity, completeness, writing style) or representational (e.g. readability, spelling mistakes) characteristics, as well as crowd activity (e.g. likes, page views, comments) and the behavior of the contributor of the idea (e.g. date of submission, number of updates), to determine the value of crowdsourcing contributions.
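To make the kind of variables listed above more concrete, the following Python sketch computes a few simple surface features of an idea text. It is only an illustration: the function name text_features and the specific features chosen (word count, sentence length, type-token ratio) are assumptions made here, not the variables used in the cited studies.

```python
import re


def text_features(idea: str) -> dict:
    """Compute a few simple surface features of an idea text.

    Hypothetical feature set for illustration only; the cited studies
    use their own, richer variables.
    """
    words = re.findall(r"[A-Za-z']+", idea)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", idea) if s.strip()]
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "n_words": n_words,
        "n_sentences": n_sentences,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Share of distinct words: a rough proxy for lexical variety.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }
```

Features of this kind would typically be combined with crowd activity and contributor behavior data before being passed to a classifier trained on labelled contributions.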

Crowd-based Evaluation Approaches

The second approach to evaluating crowdsourcing contributions is crowd-based evaluation. In this context, members of the crowd evaluate contributions individually and the results are aggregated (Klein and Garcia 2015). Such evaluators might include other users of the contest, or even paid crowds on crowd work platforms (John 2016) that are asked to evaluate opportunities from the crowdsourcing engagement.

Previous research on crowd-based evaluation examined the applicability of one or multiple criteria in voting mechanisms (where users vote for valuable opportunities), ranking approaches (where members of the crowd rank submissions), and rating mechanisms (where the crowd scores opportunities) (Salganik and Levy 2015; Soukhoroukova et al. 2012; Bao et al. 2011). Moreover, prediction markets can be used, where users trade opportunities by buying and selling stocks, and the most valuable idea is identified by aggregating these trades into a stock price (Blohm et al. 2016). Depending on the evaluation setting, these approaches proved to be as accurate as the evaluation of experts (Magnusson et al. 2016).
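As a minimal sketch of the rating mechanism described above, individual scores can be aggregated per idea and the ideas then ordered by their aggregate. The tuple layout (user, idea, score), the use of the arithmetic mean, and the function names are assumptions made for illustration; the cited studies compare several aggregation schemes.

```python
from collections import defaultdict
from statistics import mean


def aggregate_ratings(ratings):
    """Average individual scores per idea.

    ratings: iterable of (user_id, idea_id, score) tuples (assumed layout).
    Returns a dict mapping idea_id to its mean score.
    """
    by_idea = defaultdict(list)
    for _user, idea, score in ratings:
        by_idea[idea].append(score)
    return {idea: mean(scores) for idea, scores in by_idea.items()}


def rank_ideas(ratings):
    """Order ideas from highest to lowest mean score (ties broken by id)."""
    agg = aggregate_ratings(ratings)
    return sorted(agg, key=lambda idea: (-agg[idea], idea))
```

The mean is sensitive to how many and which users rated each idea, which is one reason why the selection of evaluators matters for the accuracy of the aggregate.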

Methodology

To resolve the above-mentioned limitations, I conduct a design science research (DSR) project (Gregor and Hevner 2013) to design a new and innovative artefact that helps to solve a real-world problem. To combine both relevance and rigor, I use inputs from the practical problem domain (relevance) and the existing body of knowledge (rigor) for my research project (Gregor and Jones 2007). Abstract theoretical knowledge thus has a dual role. First, it guides the suggestions for a potential solution. Second, the abstract learnings from my design serve as prescriptive knowledge to develop other artefacts that address similar problems in the future (Hevner 2007).

So far, I analysed the body of knowledge on collective intelligence, idea contests, and crowd-based evaluation as well as computational filtering approaches and identified five theory-driven problems of current idea filtering approaches that adversely affect evaluation accuracy. These problems represent the starting point for my solution design. Based on deductive reasoning, I derived five DPs for a potential solution that I evaluated in an ex-ante criteria-based evaluation with experts in the field of community and service engineering. In the next steps, I will develop a prototype version of the novel filtering technique and implement it within the context of an idea contest. By conducting an A/B test to compare the accuracy of my filtering approach against current filtering approaches, I intend to evaluate my proposed design. This also constitutes my summative design evaluation. I will, therefore, use a consensual assessment of experts as baseline (Blohm et al. 2016). Finally, the abstract learning from my design will provide prescriptive knowledge in the form of principles of form and function for building similar artefacts in the future.

Awareness of Limitations of Computational and Crowd Approaches

One solution that is currently employed in idea contests is shortlisting. Shortlisting can be considered an algorithmic solution that aims to shortlist the best opportunities. In doing so, shortlisting algorithms often face a trade-off between specificity and sensitivity. Thus, if such algorithms are not balanced (i.e. they are too specific, or they are too sensitive), this may lead to opportunities being shortlisted that are not innovative (i.e. the algorithm might include false positives) or to promising opportunities not being shortlisted (i.e. the algorithm might favour false negatives). In both cases this might lead to unfavourable results, such as opportunities that are labelled as innovative when in fact they are not (Problem 1).
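The trade-off described above can be made concrete with the standard definitions: sensitivity is the share of truly valuable opportunities that are shortlisted, TP / (TP + FN), and specificity is the share of non-valuable opportunities that are correctly left out, TN / (TN + FP). The Python sketch below assumes a hypothetical shortlisting algorithm that outputs a quality score per opportunity and shortlists everything at or above a threshold; the function name and the threshold rule are assumptions for illustration.

```python
def shortlist_metrics(scores, is_valuable, threshold):
    """Return (sensitivity, specificity) of a threshold-based shortlist.

    scores: algorithmic quality scores, one per opportunity (assumed input).
    is_valuable: ground-truth labels, e.g. from an expert jury.
    threshold: opportunities scoring at or above it are shortlisted.
    """
    tp = fp = tn = fn = 0
    for score, valuable in zip(scores, is_valuable):
        shortlisted = score >= threshold
        if shortlisted and valuable:
            tp += 1  # valuable and shortlisted
        elif shortlisted and not valuable:
            fp += 1  # false positive: shortlisted but not valuable
        elif not shortlisted and valuable:
            fn += 1  # false negative: valuable but missed
        else:
            tn += 1  # not valuable and correctly left out
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Lowering the threshold raises sensitivity at the cost of more false positives, while raising it improves specificity but misses more promising opportunities.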

One limitation of previous crowd-based evaluation approaches is the cognitive load associated with the volume and variety of idea contributions in crowdsourcing (Blohm et al. 2016; Nagar et al. 2016). As cognitive load increases, users in the crowd may become frustrated, make low-quality decisions, or simply refuse to evaluate opportunities (Nagar et al. 2016). Such load may arise from the complexity of the evaluation mechanism itself (e.g. prediction markets) and the increasing demands on the raters in terms of time and cognitive complexity. Moreover, information overload, in which cognitive processing capacity is exceeded by the volume and diversity of the crowdsourcing contributions, makes it difficult for the crowd to evaluate each idea, especially when the proposals are complex, as in the context of innovation problems. Thus, users need to judge manifold, diverse, and maybe even paradoxical opportunities with a high degree of novelty. This cognitive load renders previous approaches to crowd-based evaluation problematic for use in practice, where the number of contributions is large (Problem 2).

Furthermore, contributions vary in their textual representation, such as writing style, schema, or language, which increases the cognitive demands on the crowd. Consequently, in practice only a small number of contributions are evaluated. These contributions and their (positive) evaluations then create an anchoring effect (Kahneman 2011) and socially influence other decision makers in the crowd (Deutsch and Gerard 1955). Generally, these are the contributions that are presented at the top of the page and have already been positively evaluated by peers, which creates (potentially negative) information cascades (Klein and Garcia 2015) (Problem 3).

Another major problem in crowd-based evaluation methods so far is that not all users in an idea contest are necessarily capable of evaluating opportunities. Therefore, the crowd-based evaluation results might not be a proxy for expert ratings if users do not have the required expertise for being a “judge” (Magnusson et al. 2016; Ozer 2009). This is particularly problematic when crowdsourcing contributions are complex and diverse. Although previous research highlighted the requirements on the crowd for evaluating opportunities, the bottleneck of domain expertise is almost neglected in both theory and practice. To be appropriate for identifying valuable opportunities and improving decision quality and predictions in idea filtering, a user should also be an expert in the field (Keuschnigg and Ganser 2016). Therefore, the crowd should combine both problem knowledge and solution knowledge (Hippel 1994), which are crucial in the evaluation of innovation. While knowledge about the problem domain might be assumed for users that contribute an idea to a specific problem call, the variety of submitted solutions might be enormous, as each solver within the diverse crowd has deep knowledge of different parts of the potential solution landscape (Faraj et al. 2016). Therefore, not every user in the crowd is equally suited to evaluating a certain idea, because each user has limited domain knowledge of parts of the submitted solution space. This represents a major weakness of previous approaches to crowd-based evaluation (Problem 4).

Suggestion and Development of DPs for a Hybrid Filtering Approach

To overcome the limitations of previous approaches and to define objectives for a potential solution, I combine algorithmic approaches from ML with crowd-based evaluation approaches rather than treat them as substitutes. This approach enables my solution to support the human judge by using ML algorithms that identify the expertise of a crowd user, the expertise requirements for evaluating a specific crowdsourcing contribution, and match both to gather more reliable results in identifying valuable contributions. My proposed DPs mainly focus on improving the idea evaluation phase in innovation contests (see Figure 28).

First, the expertise requirements for each textual contribution need to be identified to match it with suitable members of the crowd (Ozer 2009). Therefore, the hybrid filtering approach should extract topical features (i.e. latent semantics) to identify the knowledge requirements for potential judges. Thus, I propose:

DP1: Filtering crowdsourcing contributions should be supported by approaches that extract solution knowledge requirements from textual idea contributions within an idea contest by identifying relevant themes.
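One simple way to approximate the theme extraction called for in DP1 is to weight the terms of each contribution by TF-IDF and treat the highest-weighted terms as its topical requirements. The Python sketch below is a stand-in under stated assumptions: it uses plain TF-IDF rather than a latent semantic or topic model, and the stop word list, tokenizer, and function names are hypothetical choices made here for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "from"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens longer than two characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many contributions each term occurs.
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty texts
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors


def top_terms(vector, k=3):
    """The k highest-weighted terms, ties broken alphabetically."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:k]
```

The resulting term weights can serve as a sparse topic vector for each contribution, which a later matching step can compare against the expertise of crowd members.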

In the next step, the hybrid filtering approach needs to consider the expertise of a crowd participant (Kulkarni et al. 2014). One source of such expertise description is the user profile, which includes the self-selected proficiency of a participant. Thus, I propose:

DP2: Filtering crowdsourcing contributions should be supported by approaches that screen user profiles to extract expertise.

Apart from the expertise description in the user's profile (i.e. static), crowd participants gain ability through their activity (i.e. dynamic) in idea contests over time. Users constantly learn through their own contributions (Yuan et al. 2016a). The hybrid filtering approach needs to take this into account as well. Moreover, this offers the possibility to ensure that users really have expertise in a topic, as they have demonstrated it by making contributions. In contrast, expertise descriptions in user profiles might be biased due to overconfidence. Thus, I propose:

DP3: Filtering crowdsourcing contributions should be supported by approaches that extract solution expertise from users' prior textual idea contributions across idea contests by identifying relevant themes.

Idea contests are highly dynamic (Blohm et al. 2013). To match crowd participants with suitable opportunities for evaluation, the expertise profile of each user needs to be dynamic (Yuan et al. 2016a). This means the approach should continuously update a user's expertise by dynamically updating the abstract user profile based on the input and contributions of that crowd participant. Contributions include both past idea proposals and an idea quality indicator (i.e. the corresponding idea rating). Thus, I propose:

DP4: Filtering crowdsourcing contributions should be supported by approaches that create adaptive user profiles containing expertise extracted from the user profile and prior contributions.
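The adaptive profile in DP4 can be sketched as a topic-weight map that starts from the self-reported topics and is updated with each rated contribution. The class below is a hypothetical illustration: the class and method names, the fixed weight for self-reported topics, and the rule of weighting a contribution's topics by its normalized rating are all assumptions, not the design of the proposed artefact.

```python
from collections import defaultdict


class AdaptiveProfile:
    """Expertise profile combining static and dynamic evidence (illustrative)."""

    def __init__(self, self_reported_topics, static_weight=1.0):
        # Static part: topics the user selected in their profile.
        self.weights = defaultdict(float)
        for topic in self_reported_topics:
            self.weights[topic] += static_weight

    def add_contribution(self, topics, rating, max_rating=5.0):
        """Dynamic part: credit the topics of a prior idea, scaled by its rating."""
        quality = rating / max_rating
        for topic in topics:
            self.weights[topic] += quality

    def expertise(self):
        """Return topic weights normalized to sum to 1."""
        total = sum(self.weights.values())
        if not total:
            return {}
        return {topic: w / total for topic, w in self.weights.items()}
```

For example, a user who self-reports "energy" and later submits a top-rated idea on "energy" and "mobility" ends up with two thirds of their expertise weight on "energy" and one third on "mobility" under these assumptions.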

As the evaluation quality of the crowd is highly dependent on the ability of each individual member of the crowd (Mannes et al. 2014), in the last step the hybrid filtering approach needs to match crowdsourcing contributions with suitable users. Previous work on such select crowd strategies in the field of psychology suggests that approximately five to ten humans are required to benefit from the aggregated results of evaluation (Keuschnigg and Ganser 2016). This sample size is most suitable for leveraging the error reduction of individual biases as well as the aggregation of diverse knowledge. Thus, I propose:

DP5: Filtering crowdsourcing contributions should be supported by approaches that match solutions with users that have the required expertise and assign textual contributions to these users for evaluation.
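A minimal sketch of the matching step in DP5 compares the topic vector of a contribution with each user's expertise vector and assigns the contribution to the best-matching users. Cosine similarity, the function names, and the default of five evaluators (the lower end of the five to ten suggested above) are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {topic: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_evaluators(idea_topics, profiles, k=5):
    """Pick up to k users whose expertise best matches the idea.

    idea_topics: {topic: weight} for the contribution.
    profiles: {user_id: {topic: weight}} expertise vectors (assumed input).
    Users with no topical overlap (similarity 0) are never assigned.
    """
    scored = sorted(
        ((cosine(idea_topics, prof), user) for user, prof in profiles.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [user for score, user in scored[:k] if score > 0]
```

A production system would also need to balance evaluator workload, so that a few highly matched users are not assigned every contribution.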

Figure 28 illustrates how the DPs relate to each other. Identical topics are represented by the same colour codes. The solution is designed so that topics (i.e. expertise) extracted from a static user profile and a dynamic user profile (i.e. past idea proposals) can be matched. The adaptive profile thus includes both the topics in which individuals self-report expertise and the expertise that they acquired through past idea proposals. These extracted topics are then matched with the topics of the current idea proposals.

Conclusion

This research introduces a novel filtering approach that combines the strengths of both machines and humans in evaluating creative opportunities by using ML approaches to assign the right user with the required solution knowledge to a corresponding idea. To this end, I propose tentative DPs that I validated in the field with experts on crowdsourcing and system engineering. To the best of my knowledge, this is the first study that takes this topic into account. My research offers a novel and innovative solution for a real-world problem and contributes to the body of knowledge on idea filtering for open innovation systems by considering the required expertise of crowd evaluations (Bayus 2013; Klein and Garcia 2015; Keuschnigg and Ganser 2016). I therefore intend to extend previous research on idea filtering in crowdsourcing engagements by combining algorithmic and crowd-based evaluation.

See references

Original paper published at WI2018