Borchhardt & Carlson (2026)
LLM Mechanism Discovery Pipeline
A computational approach to systematically surface candidate mechanisms
from unstructured business descriptions — formalizing the typically
ad-hoc, abductive process of mechanism identification.
Data: ~9,000 microenterprises across 26 countries
Stage 1 · Split 1
Sample Businesses into Groups
We predict female ownership probability for each business, then draw samples from
the top and bottom thirds of the distribution to form two blinded comparison groups.
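The sampling step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, group sizes, and seed are hypothetical, and it assumes predicted female-ownership probabilities are already available per business.

```python
import random

def sample_comparison_groups(probs, n_per_group=50, seed=0):
    """Draw blinded comparison groups from the top and bottom thirds
    of the predicted female-ownership probability distribution.

    probs: dict mapping business id -> predicted probability.
    """
    # Rank business ids by predicted probability (ascending)
    ranked = sorted(probs, key=probs.get)
    third = len(ranked) // 3
    bottom, top = ranked[:third], ranked[-third:]
    rng = random.Random(seed)
    # Blinded labels: the LLM never learns which group is which
    group_a = rng.sample(top, min(n_per_group, len(top)))
    group_b = rng.sample(bottom, min(n_per_group, len(bottom)))
    return group_a, group_b
```

Fresh calls with different seeds yield the repeated samples used in the generation stage.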
Stage 1 · Split 1
Generate Hypotheses
An LLM receives batches of Group A and Group B descriptions and identifies systematic differences.
Prompt 1: Discovery

I have two sets of business descriptions from microenterprises. Group A and Group B represent businesses that differ on some characteristics.
GROUP A:
{group_a_text}
GROUP B:
{group_b_text}
Analyze these descriptions and identify systematic differences between the two groups. What patterns distinguish Group A from Group B?
Return your response as a JSON array where each element represents a distinct hypothesis about how the groups differ:
[
{
"hypothesis": "Clear comparative statement describing how Group A differs from Group B on a specific dimension (e.g., 'Group A... while Group B...')",
"dimension": "Name of the dimension being compared"
},
...
]
Each hypothesis must explicitly compare both groups. Focus on concrete, observable characteristics that could be reliably coded by researchers. Return only valid JSON with no additional text.
GPT-4o-mini / Claude 3.5 Haiku
Hypotheses Generated
Repeated with fresh random samples → ~55 hypotheses per split
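Since Prompt 1 demands a bare JSON array, the model's reply must be parsed into structured hypotheses. A hedged sketch of that step, assuming the model occasionally wraps its JSON in a markdown code fence despite the instruction (the function name and tolerance logic are illustrative, not the authors' code):

```python
import json

def parse_hypotheses(response_text):
    """Parse the model's JSON-array reply to Prompt 1, tolerating an
    optional markdown code fence around the JSON."""
    text = response_text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (possibly "```json") and the closing fence
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    hypotheses = json.loads(text)
    # Keep only well-formed entries carrying both required keys
    return [h for h in hypotheses if "hypothesis" in h and "dimension" in h]
```

Accumulating these parsed entries across repeated samples yields the ~55 raw hypotheses per split.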
Stage 2 · Split 1
Consolidate Hypotheses
A separate LLM call groups the raw hypotheses from this split into coherent, non-redundant dimensions.
Prompt 2: Consolidation

I have generated multiple hypotheses about what distinguishes Group A from Group B businesses. Please consolidate these into a clear, non-redundant list of measurable features.
HYPOTHESES TO CONSOLIDATE:
{hypotheses_text}
Please:
1. Group similar hypotheses together
2. Create clear, operational definitions for each distinct feature
3. Eliminate redundancies and overlaps
4. Produce multiple distinct, measurable dimensions
Return your response as a JSON array where each element represents a consolidated feature:
[
{
"feature_name": "Clear, concise name for the feature",
"definition": "Operational definition of what this feature measures",
"group_difference": "How Group A typically differs from Group B on this dimension",
"source_hypotheses": ["List of hypothesis dimensions that contributed to this feature"]
},
...
]
Return only valid JSON with no additional text.
Consolidated Features (Split 1)
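One way the raw hypotheses might be rendered into the `{hypotheses_text}` placeholder of Prompt 2 (the numbering and bracket formatting are illustrative assumptions, not the paper's exact serialization):

```python
def format_hypotheses(hypotheses):
    """Render parsed hypotheses as the {hypotheses_text} block of Prompt 2."""
    lines = []
    for i, h in enumerate(hypotheses, 1):
        # One line per hypothesis, tagged with its dimension
        lines.append(f"{i}. [{h['dimension']}] {h['hypothesis']}")
    return "\n".join(lines)
```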
Stages 1–2 · Split 2
Repeat on Independent Data Split
The same generation and consolidation process runs on a completely separate random half of the data.
↻ Different random 50% of businesses — entirely independent from Split 1
Consolidated Features (Split 2)
Stage 3
Cross-Validation
An LLM compares features from the two independent splits. Only features discovered in both survive.
Prompt 3: Cross-Validation

I have two lists of features discovered from different random samples of the same underlying data. Please identify which features appear in BOTH lists, even if expressed differently.
FEATURES FROM SPLIT 1:
{features_1_text}
FEATURES FROM SPLIT 2:
{features_2_text}
Please:
1. Identify features that appear in both lists (even with different wording)
2. Create a unified list of only the overlapping/similar features
3. Use the clearest wording for each overlapping feature
4. Preserve both the operational definitions and Group A vs Group B patterns
5. Exclude features that only appear in one list
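The `{features_1_text}` and `{features_2_text}` placeholders could be filled from the two splits' consolidated features like this (a sketch; the bullet rendering is an assumption, and the actual overlap judgment is made by the LLM, not by this code):

```python
CROSS_VALIDATION_TEMPLATE = """I have two lists of features discovered from different random samples of the same underlying data. Please identify which features appear in BOTH lists, even if expressed differently.

FEATURES FROM SPLIT 1:
{features_1_text}

FEATURES FROM SPLIT 2:
{features_2_text}
"""

def build_cross_validation_prompt(features_1, features_2):
    """Assemble Prompt 3 from the consolidated features of both splits."""
    def render(features):
        # One bullet per feature: name plus operational definition
        return "\n".join(f"- {f['feature_name']}: {f['definition']}"
                         for f in features)
    return CROSS_VALIDATION_TEMPLATE.format(
        features_1_text=render(features_1),
        features_2_text=render(features_2),
    )
```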
This entire process (sample → generate → consolidate → cross-validate) repeats
across 5 independent iterations. Features that appear more frequently across iterations
receive higher stability scores.
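The stability scoring described above can be sketched as follows, assuming features are matched by name across iterations (in the actual pipeline that matching is semantic, handled by the LLM in Stage 4):

```python
from collections import Counter

def stability_scores(iteration_features, n_iterations=5):
    """Score each cross-validated feature by the fraction of
    iterations in which it survived.

    iteration_features: list of feature-name lists, one per iteration.
    """
    counts = Counter()
    for features in iteration_features:
        # Count each feature at most once per iteration
        counts.update(set(features))
    return {name: counts[name] / n_iterations for name in counts}
```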
Stage 4
Final Consolidation
Cross-validated features from all 5 iterations are consolidated one final time.
Semantically similar features are grouped and a minimum frequency threshold is applied.
Prompt 4: Final Consolidation

I have collected features from multiple discovery iterations that may express similar concepts using different definitions. Please group semantically similar features together and create standardized definitions.
ALL FEATURES ACROSS ITERATIONS:
{features_text}
Please:
1. Group features based on their definitions and group differences, not on any similarity in wording
2. Features should only be grouped if they measure the same underlying concept
3. Count how many original features belong to each group
4. Preserve the Group A vs Group B pattern information from the original features
5. Only group features that are genuinely similar - don't force unrelated features together
6. Ensure each feature represents one distinct dimension only - do not combine multiple conceptually separate dimensions into a single feature
7. If a feature seems to combine multiple dimensions, split it into separate features for each distinct concept
8. After grouping based on conceptual similarity, assign an appropriate standardized name to each group
Each feature should measure only one clear, distinct concept. Avoid combining unrelated dimensions into compound features.
Return your response as a JSON array where each element represents a grouped feature:
[
{
"feature_name": "Standardized name assigned after conceptual grouping",
"definition": "Clear operational definition of this single dimension",
"group_difference": "How Group A typically differs from Group B on this dimension",
"count": number_of_original_features_grouped
},
...
]
Return only valid JSON with no additional text.
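The minimum frequency threshold mentioned above might be applied to the grouped features like this (the `min_count=2` default is an illustrative assumption; the paper's actual threshold is not stated in this section):

```python
def apply_frequency_threshold(grouped_features, min_count=2):
    """Keep only grouped features backed by at least `min_count`
    original features across the discovery iterations."""
    return [f for f in grouped_features if f["count"] >= min_count]
```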
13 Consolidated Hypotheses (Figure 5)
Stage 5
Researcher Interpretation
Researchers synthesize the pipeline output with domain expertise and prior literature
to produce the final set of testable dimensions.
| LLM-Generated Hypothesis | Researcher Interpretation | Short-hand | Comments | Literature |