Storing Large Amounts Of Data On Our Systems

INTRO

Our lab systems use disk storage that is shared by everyone in the CS department. We have many users on our systems, so we all have to be careful about how much disk storage we use.
There might be times when you receive large amounts of data for a project you're working on in CS. Because that data lands on the same shared disk space, please be conscious of how much of it you store.
We consider data to be large when a single item is more than 10 GB in size, or when a complete data set made up of many files totals more than 10 GB.
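To check a data set against that threshold before copying it over, a rough estimate of its total size is usually enough. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes "10 GB" means decimal gigabytes (10 × 10^9 bytes). It sums apparent file sizes, so the result can differ slightly from what the file system actually allocates.

```python
import os

# Assumption: the 10 GB threshold is read as decimal gigabytes.
LARGE_THRESHOLD_BYTES = 10 * 10**9


def dataset_size_bytes(root):
    """Return the total apparent size, in bytes, of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so data stored elsewhere is not counted here.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the estimate.
                continue
    return total


def is_large(root, threshold=LARGE_THRESHOLD_BYTES):
    """Return True if the files under root total more than the threshold."""
    return dataset_size_bytes(root) > threshold
```

Calling is_large on the top-level folder of a data set covers both cases described above, since a single big file and many small files are totaled the same way.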
Additionally, because our backup system backs up data on a nightly, weekly, and monthly basis, it's important to minimize the number of times it has to back up the same large static data (data that does not change) over and over again. This helps reduce backup time and saves backup storage resources.
NOTE: Both the Grad and UGrad nets use a different, special data storage location for static data that CS professors provide to their classes for course assignments (which is usually not big data). That location should not be confused with the Grad Net static datashare described on this page. If you're teaching a course and need to provide static data to your students, that is a different situation; please email support@cs.jhu.edu to arrange access.

If You Need To Store A Large Amount Of Data On Our Systems...

CS Grad students, Faculty, Staff, Researchers:

Static Data

If the large amount of data you receive or generate does not change or does not have to be backed up, then it is considered static. In this case, please see our page discussing an option that allows you to store large, static data in a folder called /static on our systems:
Grad Net Static Datashare
We only have a finite amount of static storage space to be shared amongst the department, so not all requests to use /static can be approved, or if approved, it might only be allowed for a limited time. We are not likely to be able to host datasets in excess of 250 GiB (and not all datasets less than that in size are necessarily feasible, either).
NOTE: there is no equivalent static datashare available for the CS Ugrad Net.

Dynamic Data

Large dynamic data storage is typically discouraged and has to be approved on a case-by-case basis, based upon how it is being used for CS academic course work, etc. Large dynamic data also puts a strain on our backup and archival system resources and should only be in place for a limited time. (The size and number of dynamic files come into play when considering their effect on backups.) And of course, you are sharing storage resources with everyone else using the CS Dept's Grad Net resources.


Undergrads must first contact CS IT Support

If you receive or generate more than 10 GB of data for a project, etc., on our systems, whether the data is static (not changing) or dynamic (potentially changing), there are things to consider and things you must do.
Most CS Undergrad users have a 10 GB disk quota. So, before you can store any large data sets on the Ugrad net, you need your quota raised for the period during which you need to store that data.
Please email support@cs.jhu.edu beforehand to let us know:
  • How much data you will be storing.
  • What size your disk quota needs to be raised to (in case you anticipate needing more space).
  • When you plan to put the data in place.
  • How long you expect to keep the data in place.
  • Whether your data is static (will not change) or dynamic (will change).
  • What course this is for.
  • Who the instructor for the course is.

Then, Decide If Your Large Data Needs To Be Backed Up On Our Systems

As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
  • If your large data is dynamic (changing), you might want it to be backed up regularly. You need to let us know that.
  • If your large data is static (it won't be changing), let us know that, and then, to help us conserve backup system resources, we ask that you set up your data to not get backed up on our backup system. To do so, please visit: Excluding Select Folders From Being Backed Up

And Finally...

When you are done with your data and no longer need it stored, please remove it, and let us know that you've done so.

Other places to store (usually, smaller amounts of) data on our systems

  • Files in your home directory are available on all of our client systems that share the account. Such files are also backed up as described on our Backups page. But remember, this home directory space is for files and file sets that are not large, as mentioned earlier on this page.
  • Each Linux client has a /scratch directory. Files put there are only accessible on the Linux client where they're stored. Not only are the files there not backed up, they will be automatically deleted during our annual client OS upgrade process (and may be deleted at other times as circumstances warrant).
  • Each Linux client also has a /tmp directory. Like /scratch, files put in /tmp are only accessible on the Linux client where they're stored. Files in /tmp are automatically cleared out much more frequently than files in /scratch.
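
Because /scratch and /tmp are local to each Linux client, it can help to check how much room is left before staging files there. The sketch below is only an illustration under stated assumptions: has_room is a hypothetical helper, and the 10% headroom default is an arbitrary choice, not a department policy. It uses Python's shutil.disk_usage, which reports space for the whole file system and knows nothing about per-user disk quotas.

```python
import shutil


def has_room(directory, needed_bytes, headroom=0.10):
    """Return True if the file system holding directory can take needed_bytes
    while still leaving the given fraction of its total capacity free."""
    usage = shutil.disk_usage(directory)
    # Space to leave untouched for other users of the same client.
    reserve = int(usage.total * headroom)
    return usage.free - needed_bytes >= reserve
```

For example, has_room("/scratch", 5 * 10**9) asks whether about 5 GB would fit on that client's /scratch file system with the headroom intact.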