A recent study from the University of North Carolina at Chapel Hill (UNC-Chapel Hill) demonstrates that advanced artificial intelligence tools, specifically large language models (LLMs), can significantly enhance the speed and efficiency of georeferencing plant specimens. This process, which involves pinpointing the original collection locations of specimens, has traditionally required extensive manual effort and resources.
The research, led by Yuyang Xie, a postdoctoral researcher in the biology department at UNC, reveals that LLMs can execute georeferencing tasks with near-human accuracy while reducing both time and cost. “Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections,” said Xie. This innovation promises to accelerate the digitization of plant specimens, providing new avenues for ecological research.
The study set out to answer a central question: Can AI automate one of the most labor-intensive steps in digitizing natural history collections? The team confirmed that LLMs can perform this task, achieving a location error of less than 10 kilometers. The study reports that this level of accuracy surpasses traditional methods, which often involve manual interpretation and extensive expert review.
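To make the 10-kilometer figure concrete, a georeferencing error can be thought of as the distance between a predicted coordinate and a trusted reference coordinate for the same specimen. The sketch below is illustrative only: it assumes the error is measured as great-circle (haversine) distance, which the article does not specify, and the function names, the sample coordinates, and the `threshold_km` parameter are hypothetical rather than taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """True if the predicted (lat, lon) lies closer than threshold_km to the reference.

    The 10 km default mirrors the figure quoted in the article; how the study
    itself scores predictions is an assumption here.
    """
    return haversine_km(*predicted, *reference) < threshold_km


# Hypothetical example: a predicted location vs. a curated reference location.
predicted = (35.9132, -79.0558)
reference = (35.9500, -79.0500)
print(round(haversine_km(*predicted, *reference), 2), "km")
print(within_threshold(predicted, reference))
```

Averaging or taking the median of such distances over many specimens would give one summary error for a whole collection, though the article does not say which statistic the 10-kilometer figure refers to.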
Xiao Feng, the corresponding author and assistant professor in the biology department at UNC, stated, “Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate.” This technological leap offers researchers unprecedented opportunities to deepen their understanding of global biodiversity distributions.
Significance of the Research
The implications of this research are far-reaching. Natural history collections worldwide are estimated to hold between 2 and 3 billion specimens, yet only a small fraction has been digitized. The absence of digital records and spatial data limits researchers’ ability to track biodiversity loss, understand species movement in response to climate change, and analyze shifts in ecosystems.
With AI-driven georeferencing, scientists may soon be able to rapidly digitize extensive natural history collections that have largely remained inaccessible. “This technology allows us to unlock millions of records that are currently sitting in cabinets,” Xie noted. “With the power of LLMs, we can rapidly digitize plant specimen data that will be critical for addressing global environmental challenges.”
Traditional georeferencing approaches rely on specialized software and multiple rounds of expert review, making them slow and costly. The UNC study is among the first to apply LLMs to this task, demonstrating their superiority in terms of accuracy, efficiency, and scalability. This new methodology could facilitate the digitization of natural history collections at an unprecedented pace.
The findings are documented in a research paper published in Nature Plants, which highlights the potential for LLMs to revolutionize the field of natural history archives. As researchers continue to explore the applications of artificial intelligence in scientific research, the prospect of digitizing vast collections of plant specimens becomes increasingly feasible.
This advancement not only serves academic and research communities but also has broader implications for global biodiversity efforts. By streamlining the digitization process, researchers can more effectively engage with pressing environmental issues, ultimately contributing to a better understanding of our planet’s ecosystems.