Transforming diverse data sources into graph representations suitable for deep learning remains a challenge in modern machine learning. HeXtractor provides a standardized, automated framework for converting structured and unstructured data into heterogeneous graphs compatible with Graph Neural Networks (GNNs).
HeXtractor in action
Publication: Journal of Open Source Software, Volume 10, Issue 110, Article 8057, 2025
Authors: Filip Wójcik, Marcin Malczewski
Repository: Available on GitHub
Documentation: HeXtractor Documentation
Citation: Wójcik, F., & Malczewski, M. (2025). HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks. Journal of Open Source Software, 10(110), 8057. https://doi.org/10.21105/joss.08057
HeXtractor is an open-source Python library that streamlines heterogeneous graph construction. Originally developed as part of the HexGIN project for financial transaction analysis, it has evolved into a domain-agnostic framework serving researchers and practitioners across multiple fields.
HeXtractor provides comprehensive functionality for graph construction:
The library incorporates several advanced technical features:
HeXtractor supports tabular data transformation through:
Single-Table Mode: Each row encodes relationships among entities defined by columns; node and edge definitions are generated automatically from those columns.
Multi-Table Mode: The GraphSpecs framework merges entity and relationship tables into a single heterogeneous graph while maintaining referential integrity.
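The two tabular modes can be illustrated with a minimal, library-independent sketch. It does not use HeXtractor's API: the function names `single_table_to_graph` and `multi_table_to_graph`, their parameters, and the sample data are all hypothetical, and plain Python lists stand in for tensors. It only shows the underlying idea of mapping column values to per-type node indices and collecting typed edges.

```python
from collections import defaultdict


def single_table_to_graph(rows, node_columns, edge_specs):
    """Hypothetical helper: turn one table into typed nodes and edges.

    rows: list of dicts, one per table row.
    node_columns: column names; each column is treated as one node type.
    edge_specs: list of (source_column, relation, target_column) triples.
    """
    node_ids = {col: {} for col in node_columns}  # value -> index, per node type
    edges = defaultdict(lambda: [[], []])         # edge type -> [sources, targets]
    for row in rows:
        for col in node_columns:
            # Assign the next free index the first time a value is seen.
            node_ids[col].setdefault(row[col], len(node_ids[col]))
        for src, rel, dst in edge_specs:
            edges[(src, rel, dst)][0].append(node_ids[src][row[src]])
            edges[(src, rel, dst)][1].append(node_ids[dst][row[dst]])
    return node_ids, dict(edges)


def multi_table_to_graph(entity_tables, relation_tables):
    """Hypothetical helper: merge entity and relationship tables.

    entity_tables: {node_type: [key, ...]}
    relation_tables: {(src_type, relation, dst_type): [(src_key, dst_key), ...]}
    Raises KeyError when a relationship row points at a missing entity,
    which is one simple way to enforce referential integrity.
    """
    node_ids = {
        node_type: {key: i for i, key in enumerate(keys)}
        for node_type, keys in entity_tables.items()
    }
    edges = {}
    for (src, rel, dst), pairs in relation_tables.items():
        sources, targets = [], []
        for src_key, dst_key in pairs:
            if src_key not in node_ids[src] or dst_key not in node_ids[dst]:
                raise KeyError(f"{rel}: ({src_key!r}, {dst_key!r}) references a missing entity")
            sources.append(node_ids[src][src_key])
            targets.append(node_ids[dst][dst_key])
        edges[(src, rel, dst)] = [sources, targets]
    return node_ids, edges


# Made-up sample data for illustration only.
rows = [
    {"company": "Acme", "employee": "Ann"},
    {"company": "Acme", "employee": "Bob"},
    {"company": "Initech", "employee": "Cy"},
]
nodes, edges = single_table_to_graph(
    rows, ["company", "employee"], [("company", "has", "employee")]
)

entity_tables = {"company": ["Acme", "Initech"], "employee": ["Ann", "Bob", "Cy"]}
relation_tables = {
    ("company", "has", "employee"): [("Acme", "Ann"), ("Acme", "Bob"), ("Initech", "Cy")]
}
m_nodes, m_edges = multi_table_to_graph(entity_tables, relation_tables)
```

In this sketch both modes produce the same edge lists for the same data; the difference is where the node set comes from. Single-table mode infers nodes from the values it encounters, while multi-table mode takes them from explicit entity tables and rejects relationships that reference unknown keys.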
Integration with Large Language Models enables semantic graph construction:
HeXtractor’s flexibility enables applications across multiple research areas:
The software has been successfully deployed in several high-impact applications:
HeXtractor addresses common problems in graph-based machine learning:
The open-source nature of HeXtractor has fostered adoption:
HeteroData(
  company={ x=[3, 2] },
  employee={ x=[7, 2], y=[7] },
  (company, has, employee)={ edge_index=[2, 6] }
)
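The shapes in the printout above follow the usual PyTorch Geometric convention: a node feature matrix `x` has shape `[num_nodes, num_features]`, a label vector `y` has shape `[num_nodes]`, and `edge_index` has shape `[2, num_edges]`, with source indices in the first row and target indices in the second. The sketch below reproduces those shapes with plain nested lists instead of tensors. The feature values, labels, and edge list are placeholders invented for illustration, and `shape` and `edges_in_range` are hypothetical helpers, not library functions.

```python
def shape(nested):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return dims


def edges_in_range(edge_index, num_src, num_dst):
    """Check that every edge endpoint refers to an existing node."""
    sources, targets = edge_index
    return (
        len(sources) == len(targets)
        and all(0 <= s < num_src for s in sources)
        and all(0 <= t < num_dst for t in targets)
    )


# Placeholder data matching the printed shapes (values are made up).
company_x = [[0.0, 0.0] for _ in range(3)]    # 3 companies, 2 features each
employee_x = [[0.0, 0.0] for _ in range(7)]   # 7 employees, 2 features each
employee_y = [0] * 7                          # one label per employee
has_edge_index = [                            # 6 (company, has, employee) edges
    [0, 0, 1, 1, 2, 2],                       # source: company indices
    [0, 1, 2, 3, 4, 5],                       # target: employee indices
]

print(shape(company_x))       # [3, 2]
print(shape(employee_x))      # [7, 2]
print(shape(employee_y))      # [7]
print(shape(has_edge_index))  # [2, 6]
print(edges_in_range(has_edge_index, len(company_x), len(employee_x)))  # True
```

With 7 employee nodes and only 6 edges, at least one employee has no incoming `has` edge; in this made-up edge list that is the employee at index 6.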
Natural language descriptions are automatically transformed into structured graph representations, with entity recognition and relationship extraction performed by large language models.
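As a rough illustration of the step that follows extraction, the sketch below converts a structured extraction result into typed nodes and edges. It makes no model call and does not reflect HeXtractor's actual interface: the JSON schema (`entities` with `id`, `type`, `name`; `relations` with `source`, `type`, `target`), the hard-coded `llm_response` string, and the function `extraction_to_graph` are all assumptions made for this example.

```python
import json

# Stand-in for a language model's structured output (hypothetical schema).
llm_response = """
{
  "entities": [
    {"id": "e1", "type": "company", "name": "Acme"},
    {"id": "e2", "type": "employee", "name": "Ann"},
    {"id": "e3", "type": "employee", "name": "Bob"}
  ],
  "relations": [
    {"source": "e1", "type": "has", "target": "e2"},
    {"source": "e1", "type": "has", "target": "e3"},
    {"source": "e1", "type": "has", "target": "e9"}
  ]
}
"""


def extraction_to_graph(payload):
    """Group extracted entities by type and collect typed edge lists."""
    parsed = json.loads(payload)
    index = {}  # entity id -> (node type, position within that type)
    nodes = {}  # node type -> list of entity names
    for entity in parsed["entities"]:
        names = nodes.setdefault(entity["type"], [])
        index[entity["id"]] = (entity["type"], len(names))
        names.append(entity["name"])

    edges = {}  # (src_type, relation, dst_type) -> [sources, targets]
    for relation in parsed["relations"]:
        if relation["source"] not in index or relation["target"] not in index:
            continue  # skip relations that mention entities the extraction did not list
        src_type, src_pos = index[relation["source"]]
        dst_type, dst_pos = index[relation["target"]]
        pair = edges.setdefault((src_type, relation["type"], dst_type), [[], []])
        pair[0].append(src_pos)
        pair[1].append(dst_pos)
    return nodes, edges


nodes, edges = extraction_to_graph(llm_response)
print(nodes)  # {'company': ['Acme'], 'employee': ['Ann', 'Bob']}
print(edges)  # {('company', 'has', 'employee'): [[0, 0], [0, 1]]}
```

The relation pointing at `e9` is dropped because no such entity was extracted. Silently skipping is only one possible policy; raising an error or logging the mismatch may be preferable when extraction quality needs to be audited.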
Ongoing development focuses on:
HeXtractor lowers the barrier to heterogeneous graph construction. By bridging the gap between diverse data sources and graph neural networks, it lets researchers focus on model development rather than on data preprocessing.