A recent study from the University of North Carolina at Chapel Hill (UNC) has demonstrated that advanced artificial intelligence tools, particularly large language models (LLMs), can significantly enhance the speed and accuracy of georeferencing plant specimens. This process, vital for digitizing natural history collections, has traditionally been labor-intensive and costly.
Researchers found that LLMs can georeference specimens with errors of less than 10 kilometers, outperforming traditional methods in both efficiency and cost-effectiveness. “Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections,” stated Yuyang Xie, the first author and a postdoctoral researcher in the biology department at UNC.
The investigation aimed to determine whether AI could automate the laborious task of georeferencing, which involves converting the written locality descriptions on specimen labels into geographic coordinates that pinpoint where each specimen was originally collected. The findings indicate that LLMs not only streamline this process but also deliver near-human accuracy, expediting the digitization of valuable plant specimens.
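To make the 10-kilometer figure concrete, georeferencing error can be read as the distance between the coordinates a method produces and a trusted reference location for the same specimen. The sketch below is only an illustration under that assumption: the article does not describe how the study computed its error, and the function names, threshold default, and sample coordinates here are hypothetical. It uses the standard haversine formula for great-circle distance on a spherical Earth.

```python
import math

# Mean Earth radius in kilometers (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """True if the predicted (lat, lon) lies within threshold_km of the reference.

    The 10 km default mirrors the figure quoted in the article; how the study
    itself scored predictions is an assumption here.
    """
    return haversine_km(*predicted, *reference) < threshold_km


if __name__ == "__main__":
    # Hypothetical example: a model's guess versus a curated reference point.
    predicted = (35.9132, -79.0558)
    reference = (35.9500, -79.0500)
    error_km = haversine_km(*predicted, *reference)
    print(f"error: {error_km:.2f} km, "
          f"within 10 km: {within_threshold(predicted, reference)}")
```

Applied across a set of specimens with known locations, a check like this would give the share of records a method places within a chosen distance of the reference point.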
Xiao Feng, the corresponding author and an assistant professor at UNC, emphasized the transformative potential of LLMs. “Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” he noted. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”
The stakes are considerable. Of the estimated 2–3 billion specimens held in natural history collections worldwide, only a small fraction have been digitized. The lack of digital records hampers researchers in tracking biodiversity loss, understanding species movement amid climate change, and analyzing ecosystem shifts.
By implementing AI-driven georeferencing, scientists could soon gain access to vast natural history collections that have remained largely undocumented. “This technology allows us to unlock millions of records that are currently sitting in cabinets,” Xie explained. “With the power of LLMs, we can rapidly digitize plant specimen data that will be critical for addressing global environmental challenges.”
Historically, georeferencing has depended on manual interpretation and specialized software, often requiring multiple rounds of expert review. The UNC study is among the first to apply LLMs to this task, and it reports that they outperform existing methods in accuracy, efficiency, and scalability.
The approach could substantially accelerate the digitization of natural history collections, letting researchers work at a pace that was previously impractical. The full research paper is available online in Nature Plants.