Sochat V, Culquicondor A, Ojea A and Milroy D. The Flux Operator version 1. F1000Research 2024, 13:203 [pdf] [journal-link]
D. Nichols et al., “Predicting Cross-Architecture Performance of Parallel Programs,” 2024 IEEE International Parallel & Distributed Processing Symposium, San Francisco, California, USA, 2024. [pdf]
T. Patki et al., “Fluxion: A Scalable Graph-Based Resource Model for HPC Scheduling Challenges,” Proceedings of the SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W ’23). 2023, New York, NY, USA. doi: 10.1145/3624062.3624286. [pdf] [journal-link] [slides]
D. J. Milroy et al., “One Step Closer to Converged Computing: Achieving Scalability with Cloud-Native HPC,” 2022 IEEE/ACM 4th International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC), Dallas, TX, USA, 2022, pp. 57-70, doi: 10.1109/CANOPIE-HPC56864.2022.00011. [journal-link]
D. H. Ahn et al., “Scalable Composition and Analysis Techniques for Massive Scientific Workflows,” 2022 IEEE 18th International Conference on e-Science (e-Science), 2022, pp. 32-43, doi: 10.1109/eScience55777.2022.00018. [pdf] [journal-link]
H. Bhatia et al., “Generalizable Coordination of Large Multiscale Workflows: Challenges and Learnings at Scale,” SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, 2021, pp. 1-16, doi: 10.1145/3458817.3476210. [pdf] [journal-link]
C. Misale et al., “Towards Standard Kubernetes Scheduling Interfaces for Converged Computing.” Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. [journal-link]
Dong H. Ahn, Ned Bass, Albert Chu, Jim Garlick, Mark Grondona, Stephen Herbein, Helgi I. Ingólfsson, Joseph Koning, Tapasya Patki, Thomas R.W. Scogland, Becky Springmeyer, Michela Taufer, “Flux: Overcoming Scheduling Challenges for Exascale Workflows”, Future Generation Computer Systems, Volume 110, 2020, Pages 202-213. [pdf] [journal-link]
Stephen Herbein, David Domyancic, Paul Minner, Ignacio Laguna, Rafael Ferreira da Silva, Dong H. Ahn, “MCEM: Multi-Level Cooperative Exception Model for HPC Workflows,” 9th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), Phoenix, AZ, June 2019. [pdf] [slides]
Dong H. Ahn, Ned Bass, Albert Chu, Jim Garlick, Mark Grondona, Stephen Herbein, Joseph Koning, Tapasya Patki, Thomas R. W. Scogland, Becky Springmeyer, Michela Taufer, “Flux: Overcoming Scheduling Challenges for Exascale Workflows,” Workflows in Support of Large-Scale Science in conjunction with International Conference for High Performance Computing, Networking, Storage, and Analysis (SC|18), Dallas, TX, November 2018. [pdf]
Samuel D. Pollard, Nikhil Jain, Stephen Herbein, Abhinav Bhatele, “Evaluation of an Interference-free Node Allocation Policy on Fat-tree Clusters,” International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, TX, November 2018. [pdf]
Michael Wyatt, Stephen Herbein, Todd Gamblin, Adam Moody, Dong H. Ahn, and Michela Taufer, “PRIONN: Predicting Runtime and IO using Neural Networks,” 47th International Conference on Parallel Processing, Eugene, OR, August 2018. [pdf]
Stephen Herbein, Dong H. Ahn, Don Lipari, Thomas R.W. Scogland, Marc Stearman, Mark Grondona, Jim Garlick, Becky Springmeyer, Michela Taufer, “Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters”, 25th International Symposium on High-Performance Parallel and Distributed Computing, Kyoto, Japan, June 2016. [pdf]
Dong H. Ahn, Jim Garlick, Mark Grondona, Don Lipari, Becky Springmeyer, Martin Schulz, “Flux: A Next-Generation Resource Management Framework for Large HPC Centers”, 10th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems, Minneapolis, MN, September 2014. [pdf]
Dong H. Ahn, Jim Garlick, Mark Grondona, Don Lipari, “Vision and Plan for a Next Generation Resource Manager”, LLNL DRAFT technical report, May 2013. [pdf]
Sochat V, Milroy D, Fox D (2024, February). “Kubernetes and HPC: Bare Metal Bros,” FOSDEM HPC, Big Data, and Data Science Devroom, Brussels, Belgium. [link]
Sochat V, Gharaibeh A (2023, October). “On-Demand Systems and Scaled Training Using the JobSet API,” Kubecon America, Chicago, 2023. [link] [video]
Sochat V, Misale C (2023, May). “Cloud and HPC Convergence: Flux for Job Management on Kubernetes”, HPC Knowledge Meeting 2023. [link]
Sochat V, Woźniak M (2023, April 21). “Enabling HPC and ML Workloads with the Latest Kubernetes Job Features.” Kubecon, Amsterdam. [link] [video]
Milroy D, Misale C (2022, May). “KubeFlux: An HPC Scheduler Plugin for Kubernetes.” Kubecon Europe, Valencia, Spain, 2022. [link] [video]
Milroy D, Misale C (2022, October). “Lightning Talk: Fluence: Approaching a Converged Computing Environment.” Kubecon America Kubernetes Batch + HPC Day, Detroit, Michigan. [video]
Dong H. Ahn, Ned Bass, Al Chu, Jim Garlick, Mark Grondona, Stephen Herbein, Tapasya Patki, Tom Scogland, Becky Springmeyer, “Flux: Practical Job Scheduling”, Lawrence Livermore National Laboratory’s Computation’s Developer Day, Livermore, CA, August 2018. [pptx] [pdf]
Stephen Herbein, Tapasya Patki, Dong H. Ahn, Don Lipari, Tamara Dahlgren, David Domyancic, Michela Taufer. Fully Hierarchical Scheduling: Paving the Way to Exascale Workloads, International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, November 2017. [poster] [abstract]
Stephen Herbein, Dong H. Ahn, Don Lipari, Thomas R.W. Scogland, Marc Stearman, Mark Grondona, Jim Garlick, Becky Springmeyer, Michela Taufer, “Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters”, Salishan Conference on High-Speed Computing, Gleneden Beach, OR, April 2016. [pdf]
Stephen Herbein, Dong H. Ahn, Don Lipari, Thomas R.W. Scogland, Kento Sato, Jim Garlick, Mark Grondona, Becky Springmeyer, Michela Taufer. Exploring the Trade-off Space of Hierarchical Scheduling for Very Large HPC Centers, International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, TX, November 2015. [pdf]
Stephen Herbein. Advanced Schedulers for Next-Generation HPC Systems, Dissertation, Department of Computer & Information Sciences, University of Delaware, Newark, DE, August 2018.