Grad Net Static Datashare

Revision as of 15:59, 4 May 2020 by Steve410

Introduction

There might be times when, for a grad class or for light research, you need to download or import a large set of static data (either very large files or a large number of files), i.e., data that will not be modified. The problem is that when you download very large static files, or a data set with many static files, to your Grad Net home directory, you use up disk resources shared by much of the CS Department, especially the backup resources our systems use. Static files do not need to be regularly backed up and mirrored (as we do with home directories). Therefore, we have created an alternative: a Grad Net static datashare.

About the Grad Net Static Datashare

The Grad Net Linux clients have a secondary location for data storage to supplement the primary storage afforded by everyone's home directories. That secondary storage is available under the /static directory.

The /static directory is intended for data that:

  • Is larger than a few gigabytes in size.
    • Consider anything 10 GB or larger in file size to be large.
  • Is a data set comprising a large number of files.
    • Consider a data set with more than 1000 files to be large (regardless of file size).
  • Does not need to be backed up
    • It's static data that you acquired from somewhere, so you can obtain it again if necessary.
  • Needs to be available to all CS Grad Net Linux clients
    • Not available for the Ugrad Net
  • Should not be deleted by automatic system cleanup processes
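
The size and file-count rules of thumb above can be checked with a small shell function. This is only a sketch: the name is_large_dataset is hypothetical, and it measures the total size of a directory rather than the size of individual files.

```shell
# is_large_dataset DIR: hypothetical helper, not a department tool.
# Succeeds (exit status 0) if DIR holds 10 GB or more in total,
# or more than 1000 regular files.
is_large_dataset() {
    dir=$1
    # Total size in kilobytes; keep only the first field of du's output.
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    # Count regular files beneath DIR (strip padding some wc versions add).
    file_count=$(find "$dir" -type f | wc -l | tr -d ' ')
    limit_kb=$((10 * 1024 * 1024))   # 10 GB expressed in KB
    [ "$size_kb" -ge "$limit_kb" ] || [ "$file_count" -gt 1000 ]
}

# Example use (the path is an assumption):
#   is_large_dataset ~/mydata && echo "Consider storing this in /static"
```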

For example, if there's a 30 gigabyte data set you want to download and look at, you could use /static to hold the data while you're working with it. You would put any programs you write to analyze the data in your home directory. If something happened to the department's servers, we would restore your programs from our backups and you could re-download the data from wherever you got it originally.

Putting your data into /static helps the department because it limits the expense of backing up excess data unnecessarily.

How to Use the Static Datashare

Simply create a directory in /static to hold your data. We suggest (but don't require) that you use your account name. So someone with an account name of "jdoe15" could, for example, run:

mkdir /static/jdoe15
cd /static/jdoe15
wget https://example.com/some/dataset.zip

Note that /static is also available on the department webserver, so you can have PHP/CGI programs that make use of data in /static. (And if you want to serve files in /static directly, you can just put symbolic links to them from somewhere under your ~/public_html directory.)
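
Linking a file from /static into your web directory might look like the sketch below. The function name publish_static_file is hypothetical, and the example path reuses the "jdoe15" account and dataset.zip file from the commands above.

```shell
# publish_static_file SRC [WEBDIR]: hypothetical helper, not a department tool.
# Creates a symbolic link to SRC inside WEBDIR (default: ~/public_html).
publish_static_file() {
    src=$1
    webdir=${2:-"$HOME/public_html"}
    mkdir -p "$webdir"
    # -s makes a symbolic link, so the data itself stays in /static.
    ln -s "$src" "$webdir/$(basename "$src")"
}

# Example use:
#   publish_static_file /static/jdoe15/dataset.zip
```

Because only a link is created, nothing is copied into your home directory and the file does not add to the data backed up there.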

Limits on Static Datashare Use

The usual restrictions on use of CS Department resources apply; any data stored in /static should be related to the department's educational or (light) research activities.

This service is only available to people with Grad Net accounts. There is no analogous service for people with only an undergraduate account.

Every account has a quota to limit overuse of the system. The default quota is 50 gigabytes. If you need to store more data, please email us at support@cs.jhu.edu with a description of what you're storing and how much total space you expect to need. (Quota increases are not guaranteed.)
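
To get a rough idea of how close a directory is to the default quota, something like the following sketch could be used. The function name check_static_usage is hypothetical, and du only approximates usage: how the system actually accounts for quota is not described here, so treat the result as an estimate.

```shell
# check_static_usage DIR [LIMIT_GB]: hypothetical helper, not a department tool.
# Prints the space used by DIR and fails (non-zero exit status) if it
# exceeds LIMIT_GB (default 50, matching the default quota).
check_static_usage() {
    dir=$1
    limit_gb=${2:-50}
    # Total size in kilobytes; keep only the first field of du's output.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    limit_kb=$((limit_gb * 1024 * 1024))
    echo "$dir: $((used_kb / 1024)) MB used of ${limit_gb} GB"
    [ "$used_kb" -le "$limit_kb" ]
}

# Example use (the path is an assumption):
#   check_static_usage /static/jdoe15 || echo "Over the default quota"
```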

Backups

Although the data in /static is not backed up via our usual mechanisms, there is a limited self-service backup system similar to our home directories' Self-Service File Recovery. See that page for instructions. Note that if self-service file restoration does not work, you will have to download or recreate the dataset yourself.

Alternatives to the Static Datashare

There are several other places where you can put files, depending on your needs:

  • Files in your home directory are available on all of our client systems that share the account. Such files are also backed up as described on our Backups page.
  • (For completeness.) Files in /static are available on all Grad Net Linux clients. They are not backed up to the same degree that home directories are.
  • Each Linux client has a /scratch directory. Files put there are only accessible on the Linux client where they're stored. Not only are the files there not backed up, they will be automatically deleted during our annual client OS upgrade process (and may be deleted at other times as circumstances warrant).
  • Each Linux client also has a /tmp directory. Like /scratch, files put in /tmp are only accessible on the Linux client where they're stored. Files in /tmp are automatically cleared out much more frequently than files in /scratch.