Borchhardt & Carlson (2026)
LLM Mechanism Discovery Pipeline
A computational approach to systematically surface candidate mechanisms
from unstructured business descriptions — formalizing the typically
ad-hoc, abductive process of mechanism identification.
Data: ~9,000 microenterprises across 26 countries
Stage 1 · Split 1
Sample Businesses into Groups
We predict female ownership probability for each business, then draw samples from
the top and bottom thirds of the distribution to form two blinded comparison groups.
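The sampling step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, group sizes, and seed are hypothetical, and it assumes predicted female-ownership probabilities are already available per business.

```python
import random

def sample_comparison_groups(probs, n_per_group=50, seed=0):
    """Draw blinded comparison groups from the top and bottom thirds
    of the predicted female-ownership probability distribution.

    probs: dict mapping business id -> predicted probability.
    """
    # Rank business ids by predicted probability (ascending)
    ranked = sorted(probs, key=probs.get)
    third = len(ranked) // 3
    bottom, top = ranked[:third], ranked[-third:]
    rng = random.Random(seed)
    # Blinded labels: the LLM never learns which group is which
    group_a = rng.sample(top, min(n_per_group, len(top)))
    group_b = rng.sample(bottom, min(n_per_group, len(bottom)))
    return group_a, group_b
```

Fresh calls with different seeds yield the repeated samples used in the generation stage.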
Stage 1 · Split 1
Generate Hypotheses
An LLM receives batches of Group A and Group B descriptions and identifies systematic differences.
Prompt 1: Discovery

I have two sets of business descriptions from microenterprises. Group A and Group B represent businesses that differ on some characteristics.
GROUP A:
{group_a_text}
GROUP B:
{group_b_text}
Analyze these descriptions and identify systematic differences between the two groups. What patterns distinguish Group A from Group B?
Return your response as a JSON array where each element represents a distinct hypothesis about how the groups differ:
[
{
"hypothesis": "Clear comparative statement describing how Group A differs from Group B on a specific dimension (e.g., 'Group A... while Group B...')",
"dimension": "Name of the dimension being compared"
},
...
]
Each hypothesis must explicitly compare both groups. Focus on concrete, observable characteristics that could be reliably coded by researchers. Return only valid JSON with no additional text.
GPT-4o-mini / Claude 3.5 Haiku
Hypotheses Generated
Repeated with fresh random samples → ~55 hypotheses per split
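Since Prompt 1 demands a bare JSON array, the model's reply must be parsed into structured hypotheses. A hedged sketch of that step, assuming the model occasionally wraps its JSON in a markdown code fence despite the instruction (the function name and tolerance logic are illustrative, not the authors' code):

```python
import json

def parse_hypotheses(response_text):
    """Parse the model's JSON-array reply to Prompt 1, tolerating an
    optional markdown code fence around the JSON."""
    text = response_text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (possibly "```json") and the closing fence
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    hypotheses = json.loads(text)
    # Keep only well-formed entries carrying both required keys
    return [h for h in hypotheses if "hypothesis" in h and "dimension" in h]
```

Accumulating these parsed entries across repeated samples yields the ~55 raw hypotheses per split.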
Stage 2 · Split 1
Consolidate Hypotheses
A separate LLM call groups the raw hypotheses from this split into coherent, non-redundant dimensions.
Prompt 2: Consolidation

I have generated multiple hypotheses about what distinguishes Group A from Group B businesses. Please consolidate these into a clear, non-redundant list of measurable features.
HYPOTHESES TO CONSOLIDATE:
{hypotheses_text}
Please:
1. Group similar hypotheses together
2. Create clear, operational definitions for each distinct feature
3. Eliminate redundancies and overlaps
4. Produce multiple distinct, measurable dimensions
Return your response as a JSON array where each element represents a consolidated feature:
[
{
"feature_name": "Clear, concise name for the feature",
"definition": "Operational definition of what this feature measures",
"group_difference": "How Group A typically differs from Group B on this dimension",
"source_hypotheses": ["List of hypothesis dimensions that contributed to this feature"]
},
...
]
Return only valid JSON with no additional text.
Consolidated Features (Split 1)
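One way the raw hypotheses might be rendered into the `{hypotheses_text}` placeholder of Prompt 2 (the numbering and bracket formatting are illustrative assumptions, not the paper's exact serialization):

```python
def format_hypotheses(hypotheses):
    """Render parsed hypotheses as the {hypotheses_text} block of Prompt 2."""
    lines = []
    for i, h in enumerate(hypotheses, 1):
        # One line per hypothesis, tagged with its dimension
        lines.append(f"{i}. [{h['dimension']}] {h['hypothesis']}")
    return "\n".join(lines)
```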
Stages 1–2 · Split 2
Repeat on Independent Data Split
The same generation and consolidation process runs on a completely separate random half of the data.
↻ Different random 50% of businesses — entirely independent from Split 1
Consolidated Features (Split 2)
Stage 3
Cross-Validation
An LLM compares features from the two independent splits. Only features discovered in both survive.
Prompt 3: Cross-Validation

I have two lists of features discovered from different random samples of the same underlying data. Please identify which features appear in BOTH lists, even if expressed differently.
FEATURES FROM SPLIT 1:
{features_1_text}
FEATURES FROM SPLIT 2:
{features_2_text}
Please:
1. Identify features that appear in both lists (even with different wording)
2. Create a unified list of only the overlapping/similar features
3. Use the clearest wording for each overlapping feature
4. Preserve both the operational definitions and Group A vs Group B patterns
5. Exclude features that only appear in one list
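The `{features_1_text}` and `{features_2_text}` placeholders could be filled from the two splits' consolidated features like this (a sketch; the bullet rendering is an assumption, and the actual overlap judgment is made by the LLM, not by this code):

```python
CROSS_VALIDATION_TEMPLATE = """I have two lists of features discovered from different random samples of the same underlying data. Please identify which features appear in BOTH lists, even if expressed differently.

FEATURES FROM SPLIT 1:
{features_1_text}

FEATURES FROM SPLIT 2:
{features_2_text}
"""

def build_cross_validation_prompt(features_1, features_2):
    """Assemble Prompt 3 from the consolidated features of both splits."""
    def render(features):
        # One bullet per feature: name plus operational definition
        return "\n".join(f"- {f['feature_name']}: {f['definition']}"
                         for f in features)
    return CROSS_VALIDATION_TEMPLATE.format(
        features_1_text=render(features_1),
        features_2_text=render(features_2),
    )
```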
This entire process (sample → generate → consolidate → cross-validate) repeats
across 5 independent iterations. Features that appear more frequently across iterations
receive higher stability scores.
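The stability scoring described above can be sketched as follows, assuming features are matched by name across iterations (in the actual pipeline that matching is semantic, handled by the LLM in Stage 4):

```python
from collections import Counter

def stability_scores(iteration_features, n_iterations=5):
    """Score each cross-validated feature by the fraction of
    iterations in which it survived.

    iteration_features: list of feature-name lists, one per iteration.
    """
    counts = Counter()
    for features in iteration_features:
        # Count each feature at most once per iteration
        counts.update(set(features))
    return {name: counts[name] / n_iterations for name in counts}
```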
Stage 4
Final Consolidation
Cross-validated features from all 5 iterations are consolidated one final time.
Semantically similar features are grouped and a minimum frequency threshold is applied.
Prompt 4: Final Consolidation

I have collected features from multiple discovery iterations that may express similar concepts using different definitions. Please group semantically similar features together and create standardized definitions.
ALL FEATURES ACROSS ITERATIONS:
{features_text}
Please:
1. Group features based on their definitions and group differences, not on any similarity in wording
2. Features should only be grouped if they measure the same underlying concept
3. Count how many original features belong to each group
4. Preserve the Group A vs Group B pattern information from the original features
5. Only group features that are genuinely similar - don't force unrelated features together
6. Ensure each feature represents one distinct dimension only - do not combine multiple conceptually separate dimensions into a single feature
7. If a feature seems to combine multiple dimensions, split it into separate features for each distinct concept
8. After grouping based on conceptual similarity, assign an appropriate standardized name to each group
Each feature should measure only one clear, distinct concept. Avoid combining unrelated dimensions into compound features.
Return your response as a JSON array where each element represents a grouped feature:
[
{
"feature_name": "Standardized name assigned after conceptual grouping",
"definition": "Clear operational definition of this single dimension",
"group_difference": "How Group A typically differs from Group B on this dimension",
"count": number_of_original_features_grouped
},
...
]
Return only valid JSON with no additional text.
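The minimum frequency threshold mentioned above might be applied to the grouped features like this (the `min_count=2` default is an illustrative assumption; the paper's actual threshold is not stated in this section):

```python
def apply_frequency_threshold(grouped_features, min_count=2):
    """Keep only grouped features backed by at least `min_count`
    original features across the discovery iterations."""
    return [f for f in grouped_features if f["count"] >= min_count]
```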
13 Consolidated Hypotheses (Figure 5)
Stage 5
Researcher Interpretation
Researchers synthesize the pipeline output with domain expertise and prior literature
to produce the final set of testable dimensions.
| LLM-Generated Hypothesis | Researcher Interpretation | Short-hand | Comments | Literature |