Balsam - HPC Workflow and Edge Service
Balsam is a Python-based service that handles the cumbersome process of running many jobs across one or more HPC resources. It runs on the login nodes, keeping track of all your jobs and submitting them to the local scheduler on your behalf.
Why do I want this?
Whereas a local batch scheduler like Cobalt runs on behalf of all users, with the goals of fair resource sharing and maximizing overall utilization, Balsam runs on your behalf, interacting with the scheduler to check for idle resources and sizing jobs to minimize time-to-solution.
You can use Balsam as a drop-in replacement for qsub: simply submit your jobs with balsam qsub, with no restrictions. Balsam will throttle submission to the local queues, package jobs into ensembles for you, and dynamically size these packages to exploit local scheduling policies.
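To make the idea of packaging jobs into ensembles concrete, here is a minimal sketch in Python. It is an illustration only and not Balsam's implementation: the names Task, Ensemble, and pack_into_ensembles are hypothetical, and the first-fit-decreasing heuristic is an assumption chosen for simplicity. It groups tasks by node count so that each group fits within one fixed-size scheduler allocation.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in for a single job; not a Balsam class.
    name: str
    nodes: int  # number of nodes this task needs


@dataclass
class Ensemble:
    # Hypothetical group of tasks sharing one scheduler submission.
    capacity: int  # total nodes available in the submission
    tasks: list = field(default_factory=list)

    @property
    def used(self) -> int:
        return sum(t.nodes for t in self.tasks)


def pack_into_ensembles(tasks, capacity):
    """First-fit decreasing: put each task in the first ensemble with room."""
    ensembles = []
    for task in sorted(tasks, key=lambda t: t.nodes, reverse=True):
        if task.nodes > capacity:
            raise ValueError(f"{task.name} needs more than {capacity} nodes")
        for ens in ensembles:
            if ens.used + task.nodes <= capacity:
                ens.tasks.append(task)
                break
        else:
            # No existing ensemble had room, so open a new one.
            ensembles.append(Ensemble(capacity, [task]))
    return ensembles


tasks = [Task("a", 4), Task("b", 2), Task("c", 2), Task("d", 3), Task("e", 1)]
packed = pack_into_ensembles(tasks, capacity=8)
for ens in packed:
    print([t.name for t in ens.tasks], f"{ens.used}/{ens.capacity} nodes")
```

In this sketch, five small tasks end up in two submissions instead of five. A real service would also have to weigh wall-time limits and queue policies, which this example leaves out.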
Balsam offers much more than this: it is a complete service for managing complex workflows and optimizing scheduling across multiple HPC resources.
- Workflow Tutorial
- API for BalsamJob database (DAG) Manipulations
- Hyperparameter Optimization
- Frequently Asked Questions
  - Why isn’t the launcher running my jobs?
  - Where does the output of my jobs go?
  - How can I move the output of my jobs to an external location?
  - How can I control the way an application runs in my workflow?
  - I want my program to wait on the completion of a job it created.
- Querying the Job database
- Useful command lines
- Useful Python scripts