OSG Site Administrator Documentation

Welcome to the home of the Open Science Grid (OSG) Site Administrator documentation! This documentation aims to provide OSG site admins with the necessary information to install, configure, and operate site services.

If you are not a site administrator:

  • If you are a researcher interested in using OSG resources, you may want to view our user documentation.
  • If you'd like to learn more about the OSG and our mission, visit the OSG consortium's homepage.

This document outlines the overall installation process for an OSG site and links to detailed installation, configuration, and troubleshooting pages. If the software documentation you need is not listed here, try the search bar at the top of the page or contact us at help@opensciencegrid.org.

Plan the Site

If you have not done so already, plan the overall architecture of your OSG site. The plan should be detailed enough to list the OSG hosts that are needed and the main software components for each host. Be sure to consider the operating systems that OSG supports. For example, a basic site might include:

Purpose               Host                      Major Software
Compute Element (CE)  osg-ce.example.edu        OSG CE, HTCondor Central Manager, etc. (osg-ce-condor)
Worker Nodes          wNNN.cluster.example.edu  OSG worker node client (osg-wn-client)

Prepare the Batch System

These instructions assume that you have an existing batch system at your site. Currently, we support the HTCondor, LSF, PBS and TORQUE, SGE, and Slurm batch systems.

For smaller sites (fewer than 50 worker nodes), the most common way to add a site to OSG is to install the OSG Compute Element (CE) on the central host of your batch system. At such a site, especially if you have minimal time to maintain a CE, you may want to contact help@opensciencegrid.org to ask about using an OSG-hosted CE instead of running your own. Before proceeding with an install, be sure that you can submit and successfully run a job from your OSG CE host into your batch system.
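At an HTCondor site, one way to run that check is to submit a trivial job from the CE host. The sketch below only generates the text of a minimal submit description file. The `build_test_submit` helper, the `osg-preflight` file names, and the choice of `/bin/hostname` as the test executable are assumptions made for illustration. Sites running LSF, PBS, TORQUE, SGE, or Slurm would use their own submission tool instead.

```python
def build_test_submit(executable: str = "/bin/hostname",
                      basename: str = "osg-preflight") -> str:
    """Return the text of a minimal HTCondor submit description file.

    The executable and file names are illustrative placeholders.
    """
    lines = [
        f"executable = {executable}",   # program the test job runs
        f"output = {basename}.out",     # job's standard output
        f"error = {basename}.err",      # job's standard error
        f"log = {basename}.log",        # HTCondor event log for the job
        "queue",                        # submit one instance of the job
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents; nothing is submitted by this sketch.
    print(build_test_submit(), end="")
```

Saving the output as `osg-preflight.sub` and running `condor_submit osg-preflight.sub` on the CE host should, if the batch system is healthy, eventually leave the name of a worker node in `osg-preflight.out`.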

Add OSG Software

If necessary, provision any OSG hosts in your site plan that do not exist yet. The general steps for installing an OSG site are:

  1. Install the OSG Yum repositories and the Compute Element software on your CE host.
  2. Install the Worker Node client on your worker nodes.
  3. Install optional software to increase the capabilities of your site.
  4. Test the OSG installation.
  5. Make your site available to researchers.
  6. Verify your site's accounting.
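
As a rough illustration of steps 1 and 2, the sketch below maps the host roles from the example site plan to the package names given there and builds the corresponding `yum` commands. The role names, the `PACKAGES_BY_ROLE` table, and the `install_command` helper are hypothetical. The sketch assumes the OSG Yum repositories are already configured on each host, and it only prints the commands rather than running them.

```python
from typing import Dict, List

# Package names come from the example site plan; the role keys are illustrative.
PACKAGES_BY_ROLE: Dict[str, List[str]] = {
    "compute-element": ["osg-ce-condor"],
    "worker-node": ["osg-wn-client"],
}


def install_command(role: str) -> List[str]:
    """Build (but do not run) the yum command for a host role."""
    try:
        packages = PACKAGES_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    # -y answers "yes" to yum's confirmation prompts.
    return ["yum", "install", "-y", *packages]


if __name__ == "__main__":
    for role in PACKAGES_BY_ROLE:
        print(f"{role}: {' '.join(install_command(role))}")
```

Keeping the role-to-package mapping in one place is also the kind of data a configuration management tool would typically consume.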

Note

For sites with more than a handful of worker nodes, it is recommended to use a configuration management tool to install, configure, and maintain your site. Explaining how to select and use such a system is beyond the scope of OSG’s documentation, but popular configuration management tools include Puppet, Chef, Ansible, and CFEngine.

General Installation Instructions

Installing and Managing Certificates for Site Security

Installing and Configuring the Compute Element

Adding OSG Software to Worker Nodes

Installing and Configuring Other Services

All of these node types and their services are optional, although OSG requires an HTTP caching service if you have installed CVMFS on your worker nodes.

Test OSG Software

It is useful to test manual submission of jobs from inside and outside of your site through your CE to your batch system. If this process does not work manually, it will probably not work for the GlideinWMS pilot factory either.

Start GlideinWMS Pilot Submissions

To begin running GlideinWMS pilot jobs at your site, e-mail osg-gfactory-support@physics.ucsd.edu and tell them that you want to start accepting Glideins. Please provide them with the following information:

  • The fully qualified hostname of the CE
  • Resource/WLCG name
  • Supported OS version of your worker nodes (e.g., EL6, EL7, or both)
  • Support for multicore jobs
  • Maximum job walltime
  • Maximum job memory usage
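
The items above can be collected into a small record before writing the e-mail, so that nothing is left out. This is only a sketch: the `FactoryRequest` class, its field names, the units (hours and MB), and all of the example values are assumptions, not a format the factory team requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FactoryRequest:
    ce_hostname: str         # fully qualified hostname of the CE
    resource_name: str       # Resource/WLCG name
    worker_os: List[str]     # e.g. ["EL6"], ["EL7"], or both
    multicore: bool          # whether multicore jobs are supported
    max_walltime_hours: int  # unit (hours) is an assumption
    max_memory_mb: int       # unit (MB) is an assumption

    def to_email_body(self) -> str:
        """Render the request as plain text suitable for an e-mail body."""
        lines = [
            "We would like to start accepting Glideins at our site.",
            "",
            f"CE hostname: {self.ce_hostname}",
            f"Resource/WLCG name: {self.resource_name}",
            f"Worker node OS: {', '.join(self.worker_os)}",
            f"Multicore jobs supported: {'yes' if self.multicore else 'no'}",
            f"Maximum job walltime: {self.max_walltime_hours} hours",
            f"Maximum job memory: {self.max_memory_mb} MB",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    request = FactoryRequest(
        ce_hostname="osg-ce.example.edu",
        resource_name="EXAMPLE_RESOURCE",
        worker_os=["EL7"],
        multicore=True,
        max_walltime_hours=24,
        max_memory_mb=2048,
    )
    print(request.to_email_body())
```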

Once the factory team has enough information, they will start submitting pilots from the test factory to your CE. Initially, this will be one pilot at a time, but once the factory verifies that pilot jobs are running successfully, that number will be ramped up to 10, then 100.

Verify Reporting and Monitoring

To verify that your site is correctly reporting to the OSG, check OSG's Accounting Portal for records of your site reports (select your site from the drop-down box). If you have enabled the OSG VO, you can also check http://osg-flock.grid.iu.edu/monitoring/condor/sites/all_1day.html.

Scale Up Site to Full Production

After your site has successfully run all the pilot jobs submitted by the test factory and you have verified your site reports, your site will be deemed production ready. No action is required on your end; factory operations will start submitting pilot jobs from the production factory.