<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Wilson, Nikhil</style></author><author><style face="normal" font="default" size="100%">Verma, Ashwini</style></author><author><style face="normal" font="default" size="100%">Maharana, Piyush Ranjan</style></author><author><style face="normal" font="default" size="100%">Sahoo, Ameeya Bhusan</style></author><author><style face="normal" font="default" size="100%">Joshi, Kavita</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">HyStor: an experimental database of hydrogen storage properties for various metal alloy classes</style></title><secondary-title><style face="normal" font="default" size="100%">International Journal of Hydrogen Energy</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Databases</style></keyword><keyword><style  face="normal" font="default" size="100%">machine learning</style></keyword><keyword><style  face="normal" font="default" size="100%">Metal hydrides</style></keyword><keyword><style  face="normal" font="default" size="100%">Solid-state hydrogen storage</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2024</style></year><pub-dates><date><style  face="normal" font="default" size="100%">NOV </style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">90</style></volume><pages><style face="normal" font="default" size="100%">460-469</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;
	In this work, we introduce the HyStor database, consisting of 1282 metal alloys along with their maximum hydrogen storage capacity (H2wt%) at a given absorption temperature. The curated HydPark database consists of 831 entries. We sourced compositions from research articles and various patent documents, resulting in the addition of 451 compositions to the HydPark database. The addition is reflected in the data across all existing classes of alloys. Further, low entropy alloys (LEA), medium entropy alloys (MEA) and high entropy alloys (HEA) have been included as new classes. This has broadened the scope of the database to encompass the latest materials of interest for hydrogen storage. HyStor covers 54 elements, with a temperature range of 200-800 K and H2wt% ranging from 0.1 to 7.19. We conducted thorough checks for duplicate entries, erroneous data, and conflicting compositions within the database to ensure data quality. Furthermore, we conducted multiple tests to identify potential outlier compositions. The data curation and update are reflected in slightly improved error metrics of the HYST model, reducing the Mean Absolute Error (MAE) from 0.31 to 0.29 and increasing the R2 score from 0.77 to 0.79.&lt;/p&gt;
</style></abstract><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">&lt;p&gt;
	Foreign&lt;/p&gt;
</style></custom3><custom4><style face="normal" font="default" size="100%">&lt;p&gt;
	7.2&lt;/p&gt;
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Maharana, Piyush Ranjan</style></author><author><style face="normal" font="default" size="100%">Verma, Ashwini</style></author><author><style face="normal" font="default" size="100%">Joshi, Kavita</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Retrieval augmented generation for building datasets from scientific literature</style></title><secondary-title><style face="normal" font="default" size="100%">Journal of Physics-Materials</style></secondary-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">dataset building</style></keyword><keyword><style  face="normal" font="default" size="100%">Hydrogen storage</style></keyword><keyword><style  face="normal" font="default" size="100%">LLM</style></keyword><keyword><style  face="normal" font="default" size="100%">materials</style></keyword><keyword><style  face="normal" font="default" size="100%">RAG</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2025</style></year><pub-dates><date><style  face="normal" font="default" size="100%">JUL</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">8</style></volume><pages><style face="normal" font="default" size="100%">035006</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;
	In this work, we show that employing retrieval augmented generation (RAG) with a large language model (LLM) enables us to extract accurate data from scientific literature and construct datasets. The rapid growth in publications necessitates automating the extraction of structured data, as it is crucial for training machine learning (ML) models. The pipeline developed is simple and can be adjusted using natural language as input. Quantization enables us to run LLMs on consumer hardware and remove the reliance on closed-source models. Both Llama3-8B and Gemma2-9B with RAG give structured output consistently and with high accuracy compared to direct prompting. Using the newly developed protocol, we created a dataset of metal hydrides for solid-state hydrogen storage from paper abstracts. The accuracy of the generated dataset was &amp;gt;88% in the cases tested. Further, we demonstrate that the generated dataset is ready to use for ML models by testing it with HYST to predict the H2wt% at a given temperature. Thus, we demonstrate a pipeline to create datasets from scientific literature at minimal computational cost and with high accuracy.&lt;/p&gt;
</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">&lt;p&gt;
	Foreign&lt;/p&gt;
</style></custom3><custom4><style face="normal" font="default" size="100%">&lt;p&gt;
	4.3&lt;/p&gt;
</style></custom4></record></records></xml>