Profile

PhD candidate specializing in Semantic Web technologies, with a background in applied mathematics and hands-on development experience. Research focuses on the completeness and responsiveness of public SPARQL endpoints, connecting theoretical work with practical implementation. Author of 8 peer-reviewed publications and contributor to research tools such as CRAWD and PASSAGE for public SPARQL endpoints.

Research & Development Experience

PhD Researcher - Semantic Web & Decentralized Linked Data
LS2N Lab, Nantes University | Sep 2023 - Present
Supervisors: Prof. Pascal Molli, Prof. Hala Skaf-Molli
  • PASSAGE: Novel SPARQL continuation query concept ensuring completeness for public endpoints
  • CRAWD: Sampling-based estimator for count-distinct SPARQL queries
  • Published 4 papers at top-tier venues (ACM Web Conference, ISWC, Semantic Web Journal)
  • Collaborated on production-ready tools used by the Semantic Web community worldwide
Research Intern - Information Retrieval Systems
LIG Lab, Grenoble | Feb - Jul 2023
Supervisors: Prof. Philippe Mulhem, Prof. Lorraine Goeuriot, Dr. Petra Galuscakova
  • Applied NLP techniques to analyze IR system performance across different document collections
  • Developed predictive models using document features (length, complexity, query structure)
ML Research Intern - Energy-Efficient AI
LIG Lab, Grenoble | Apr - Jul 2022
Supervisors: Prof. Denis Trystram, Dr. Danilo Carastan-Santos
  • Built benchmark tracker for evaluating energy consumption of HPC-scale AI algorithms
  • Instrumented Python libraries to collect energy metrics through hardware counters
  • Published results at CARLA 2022 (Latin America High-Performance Computing Conference)

Key Publications

PASSAGE: Ensuring Completeness and Responsiveness of Public SPARQL Endpoints
THT Pham, G Montoya, B Nédelec, H Skaf-Molli, P Molli
ACM Web Conference 2025
Abstract: Being able to query online public knowledge graphs such as Wikidata or DBpedia is extremely valuable. However, these queries can be interrupted due to the fair use policies enforced by SPARQL endpoint providers, leading to incomplete results. We introduce the concept of SPARQL continuation queries to ensure completeness and responsiveness with performance similar to Blazegraph.
CRAWD: Sampling-Based Estimation of Count-Distinct SPARQL Queries
THT Pham, P Molli, B Nédelec, H Skaf-Molli, J Aimonier-Davat
International Semantic Web Conference 2024 (Presented at ISWC 2024, Baltimore)
Abstract: Count-distinct SPARQL queries compute the number of unique values in the results of a query executed on a Knowledge Graph. However, counting the exact number of distinct values is often computationally demanding and time-consuming. We propose CRAWD, a new sampling-based approach designed to approximate count-distinct SPARQL queries, significantly improving sampling efficiency for public SPARQL endpoints.
Continuation Queries: Embracing Timeouts on Public SPARQL Endpoints
THT Pham, G Montoya, B Nédelec, H Skaf-Molli, P Molli
ISWC 2025 Companion Volume, Nara, Japan
Abstract: Rather than treating timeouts as failures on public SPARQL endpoints, our approach leverages continuation queries to obtain partial results. When a SPARQL endpoint reaches its time quota, it returns partial results along with a new SPARQL query designed to retrieve the missing results. Repeating this process iteratively recovers the complete answers.
Fraw: Sampling-Based Approximate Query Processing for Federations of SPARQL Endpoints
E Boisteau-Desdevises, THT Pham, G Montoya, B Nédelec, H Skaf-Molli, P Molli
ISWC 2025 Companion Volume, Nara, Japan
Abstract: Fraw is a SPARQL federation engine enabling users to query multiple SPARQL endpoints as if all RDF data were available through a single virtual endpoint. The system employs sampling-based approximate query processing using random walks, demonstrated through an interactive SPARQL query autocompletion use case.
LLM4Schema.org: Generating Schema.org Markups with Large Language Models
MH Dang, THT Pham, P Molli, H Skaf-Molli, A Gaignard
Semantic Web Journal 2025 | 1 citation
Abstract: The integration of Schema.org markup into web pages has resulted in billions of RDF triples, yet around 75% of web pages still lack this critical markup. This paper introduces LLM4Schema.org, an innovative approach for assessing the performance of LLMs in generating Schema.org markup. Our findings reveal that 40–50% of the markup produced by GPT-3.5 and GPT-4 is invalid, but specialized LLM-powered agents can effectively identify and eliminate these errors.
Online Sampling of Summaries from Public SPARQL Endpoints
THT Pham, H Skaf-Molli, P Molli, B Nédelec
Companion Proceedings of the ACM Web Conference 2024, 617-620
Abstract: This paper investigates whether online sampling can generate summaries useful in cutting-edge SPARQL federation engines. Our experimental studies indicate that sampling allows the creation and maintenance of summaries by exploring less than 20% of datasets, while respecting fair usage policies for public SPARQL endpoints.
Impact des collections sur les performances des Systèmes de Recherche d'Information
THT Pham, P Galuščáková, P Mulhem, G González Sáez, L Goeuriot
CORIA 2024 (COnférence en Recherche d'Information et Applications)
Abstract: This paper is a preliminary study of how corpora evolve and how that evolution affects the performance of information retrieval systems. We propose an approach for creating intermediate corpora between two existing ones, then study the correlations between the differences and the evaluations of a number of information retrieval systems.
Understanding Energy Consumption of HPC Scale Artificial Intelligence
D Carastan-Santos, THT Pham
CARLA 2022 | 3 citations
Abstract: This paper contributes towards a better understanding of the energy consumption trade-offs of HPC-scale Artificial Intelligence (AI), and more specifically Deep Learning (DL) algorithms. We developed benchmark-tracker, a benchmark tool that evaluates the speed and energy consumption of DL algorithms in HPC environments and helps HPC DL developers better balance their infrastructure in terms of speed and energy consumption.

Certificates

IBM AI Developer Professional Certificate
IBM Skills Network - Coursera
September 2025
Certificate ID: VOJE5CBYF3SD | 10 Courses Completed
IBM RAG and Agentic AI Professional Certificate
IBM Skills Network - Coursera
September 2025
Certificate ID: ZBIJZ606QIJO | 8 Courses Completed

Education

PhD in Computer Science
LS2N, Nantes University
2023 - 2026 (Expected)
Master in Informatics (MoSIG)
Université Grenoble Alpes - ENSIMAG
2021 - 2023
Master of Science in Applied Mathematics
Université d'Orléans & VNUHCM (France-Vietnam Program)
2019 - 2020
Thesis: 15/20 (Top 5)
Bachelor of Science in Applied Mathematics
University of Saigon
2017 - 2019
GPA: 3.2/4 (Top 2 - Honorable Student)