NSF Big Data Spoke on Advancing a Data-Driven Discovery and Rational Design Paradigm in Chemistry

PI: Dr. Johannes Hachmann (UB); co-PIs: Dr. Geoffrey Hutchison (University of Pittsburgh), Dr. Marcus Hanwell (Kitware, Inc.)
NSF grant no. IIS-1761990

The NSF Big Data Spoke on Advancing a Data-Driven Discovery and Rational Design Paradigm in Chemistry (funded under NSF grant no. IIS-1761990) sets out to promote the role of modern data science in the chemical domain and to foster and coalesce a community of stakeholders. It aims to create a community-driven roadmap and to facilitate concrete solutions that go beyond the scope of the disjointed efforts of its individual stakeholders. It thus seeks to implement some of the key findings of the 2017 NSF Division of Chemistry workshop on Framing the Role of Big Data and Modern Data Science in Chemistry. The NSF Big Data Innovation Hubs and Spokes ecosystem is an ideal framework for realizing this vision and accelerating progress in this high-priority area of research. This Spoke is part of the Northeast Hub.

The four signature initiatives of this Spoke are:

  1. Tools: Planning, coordination, integration, and consolidation of community-developed software tools for big data research in chemistry as well as establishing guidelines, best practices, and standards;
  2. Infrastructure: Providing access to a shared hardware infrastructure for community data sets, on-site data mining capacity, and the exploration of domain-specific method and hardware issues;
  3. Meetings: Organizing regular workshops to build community, connect solution seekers with solution providers, and address questions ranging from strategic to technical;
  4. Education: Creating and disseminating community-developed teaching materials, including course, program, and curricular recommendations for education and workforce development that reflect a data-centric approach in chemical research.

The Spoke’s core stakeholders are:

Name                     Affiliation                                            Role
Clementi, Cecilia        Humboldt University Berlin (Germany), Rice University
Crawford, Daniel         Virginia Tech, MolSSI                                  MolSSI Liaison
Cummings, Peter          Vanderbilt University
Glotzer, Sharon          University of Michigan
Grover, Martha           Georgia Tech
Hachmann, Johannes       University at Buffalo – SUNY                           PI
Hanwell, Marcus          Brookhaven National Lab                                co-PI, Technical Lead
Harrison, Robert         Stony Brook University                                 IACS Liaison
Hernandez, Rigoberto     Johns Hopkins University
Hutchison, Geoffrey      University of Pittsburgh                               co-PI
Isayev, Olexandr         Carnegie Mellon University
Kulik, Heather           MIT
Marom, Noa               Carnegie Mellon University
McEwen, Leah Ray         Cornell University
Meredig, Bryce           Citrine
Moore, Jonathan          Dow Chemical                                           CACHE Liaison
Persson, Kristin         Lawrence Berkeley National Lab
Pfaendtner, Jim          University of Washington                               CoMSEF Liaison
Roitberg, Adrian         University of Florida
Saxe, Paul               Virginia Tech, MolSSI                                  MolSSI Liaison
Schrier, Joshua          Fordham University
Sherrill, David          Georgia Tech
Tuckerman, Mark          New York University
von Lilienfeld, Anatole  University of Vienna (Austria)
Warren, James            NIST                                                   NIST Liaison
West, Richard            Northeastern University
White, Andrew            University of Rochester
Williams, Antony         EPA
Wolverton, Chris         Northwestern University
Yaron, David             Carnegie Mellon University


This material is based upon work supported by the National Science Foundation under grant no. IIS-1761990. Any opinions, findings, conclusions, and/or recommendations expressed in this material are those of the Big Data Spoke participants and do not necessarily reflect the views of the National Science Foundation.

(Last update: 2021-07-26)