Alex B. May, Data Services Engineer

The Oak Ridge Leadership Computing Facility (OLCF) is dedicated to high-impact, grand-challenge-scale problems. As we move toward integrated research ecosystems, the data to be managed will grow larger and more complex. I am interested in working with stakeholders to solve the curatorial problems posed by the volume, velocity, and veracity of big data.

Constellation, OLCF’s Open Data Portal, continues to grow, with a 40% increase in data for CY2022; it now holds approximately 3.5 petabytes of data in 674,083 files. The largest dataset is approximately 2 petabytes, and the largest single file is 17 terabytes in size.

I have been meeting the challenges of large, unstructured datasets whose management often requires creative and collaborative solutions.  

Currently, I am the co-chair of the Department of Energy (DOE) Data Curation Working Group (DCWG). This group was proposed at the 2022 DOE Data Days in response to the volume, velocity, and variety of data being produced across the US national laboratories. A working group of curators, librarians, information scientists, researchers, and developers representing the different laboratories formed to identify emerging issues and share expertise in data curation. The goal is to address the full lifecycle of data, from data management planning and early liaison engagement with scientists to data curation standards and tools, data literacy materials, repository best practices, and citation-level metadata harvesting protocols, all informed by the FAIR data principles. As a strategic, cross-functional decision-making entity, the DCWG identified seven objectives that members work on asynchronously throughout the fiscal year.

I am also the chair of the Data Curation Network (DCN) Big Data Interest Group. The group brings together 17 universities, organizations such as the Michael J. Fox Foundation, and Oak Ridge National Laboratory to discuss the challenges of creating policies for, curating, and storing big data. It meets monthly to review emerging trends and establish best practices for big data.

PUBLICATIONS 

Maheshwari, Ketan, Wilkinson, Sean, May, Alex, Skluzacek, Tyler, Kuchar, Olga, and Ferreira Da Silva, Rafael. Pseudonymization at Scale: OLCF’s Summit Usage Data Case Study. United States: N. p., 2022. Web. doi:10.1109/BigData55660.2022.10020380.

SELECTED PRESENTATIONS

“Scientific Data and Cyber: Addressing the Challenges of Publishing Big Research Data.” Alex May, Olga Kuchar. 2023 DOE Cyber Security and Technology Innovation Conference. Minneapolis, MN. May 2023.

“Constellation—Leadership Public Data Archive.” Alex May, Ross Miller, Mitch Griffith, Brian Gajus, Kathryn Knight, Dale Stansberry, Olga A. Kuchar. OSTI Monthly Data Calls. August 2022.

“Who is Afraid of a Petabyte Dataset? Rethinking Repository Infrastructures and Curation Workflows for the Scale and Type of Next Generation Data.” Alex May, Olga Kuchar, Katie Knight, Rohit Srivastava. Open Repositories, Denver, CO. June 2022.

“Towards a Big-Data Toolkit: Ensuring Data Governance & Ethical Considerations Are Applied to Large Datasets.” Alex May, Olga Kuchar, Katie Knight, Rohit Srivastava. DOE Data Days, May 2022.

“Towards a DOE Data Catalog: Ensuring Access, Sharing, and Protection.” Alex May, Olga Kuchar, Katie Knight, Rohit Srivastava. DOE Data Days, May 2022.

INVITED TALKS
    
Keynote speaker, along with Olga Kuchar, for DOE Data Days Metadata and Curation Track, October 2023.