LLM test on superconductivity research questions

by ai-intensify

Conclusion

Several major conclusions emerge from this test case. The two models that drew on a curated database of experimental literature, NotebookLM and our custom-built tool, outperformed LLMs trained on unfiltered Internet data. In particular, models relying on open web sources tended to mix established theories with highly speculative ones.
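One way to read the curated-database advantage is that curation acts as a retrieval filter: only documents from a vetted source list are eligible to appear in the model's context. The sketch below illustrates this idea with a hypothetical whitelist and document set (the URLs, field names, and `curated` function are illustrative assumptions, not part of any tool evaluated here).

```python
from urllib.parse import urlparse

# Hypothetical whitelist of vetted publication domains.
VETTED = {"arxiv.org", "journals.aps.org"}

# Hypothetical candidate documents retrieved from the open web.
documents = [
    {"url": "https://arxiv.org/abs/2001.00001",
     "text": "Measured Tc of 39 K in MgB2."},
    {"url": "https://example-blog.net/room-temp",
     "text": "Room-temperature superconductor found!"},
]

def curated(docs, vetted=VETTED):
    """Keep only documents whose host is on the vetted whitelist."""
    return [d for d in docs if urlparse(d["url"]).netloc in vetted]
```

In this toy setup, only the arXiv document survives the filter, so speculative open-web claims never reach the model's context in the first place.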

The evaluated LLMs (accessed December 2024) also showed weaknesses in temporal and contextual understanding. For example, they often failed to recognize when a proposed hypothesis had later been disproved. They also often omitted relevant papers when those papers did not use the exact language of the initial query.

Our results highlight the need for LLMs to better understand tables and images, since scientific papers rely heavily on these formats. While the two curated-database models consistently referenced images, they relied on image captions rather than on analysis of the images themselves. Enhancing visual reasoning, including the interpretation of images, plots, and scale bars, is a major direction for future improvement.
