Storing Large Amounts Of Data On Our Systems

Revision as of 13:49, 13 May 2020 by Steve410

INTRO

Our lab systems utilize disk storage that is shared by all in the CS department.
There may be times when you receive large amounts of data for a project you're working on in CS. Because our disk space is shared with others, we must all be conscious of the amount of data we store.
We consider an amount of data large if it is more than 10 GB in size, whether it is a single item or a complete data set made up of many files that total more than 10 GB.
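As a rough way to check a data set against the 10 GB figure above, the total size of a directory tree can be summed before copying it onto our systems. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the threshold is read as 10 * 1024**3 bytes, since this page does not say whether decimal or binary units are meant.

```python
import os

# Assumption: the 10 GB threshold is read as 10 * 1024**3 bytes; the page does
# not specify decimal versus binary units.
LARGE_DATA_THRESHOLD_BYTES = 10 * 1024**3


def total_size_bytes(path):
    """Return the summed size of all regular files under path (or of path itself)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so that linked data is not counted as part of this set.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; leave it out of the estimate.
                continue
    return total


def is_large(path, threshold=LARGE_DATA_THRESHOLD_BYTES):
    """Return True if the data under path exceeds the threshold."""
    return total_size_bytes(path) > threshold
```

For example, is_large("/path/to/my/dataset") (a placeholder path) returns True when the files under that directory add up to more than the threshold. The result is an estimate of file contents only and may differ somewhat from the disk usage a quota system reports.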
Additionally, our backup system backs up our data on a nightly, weekly and monthly basis. For static data (data that does not change), it's important that we minimize the number of times the backup system has to back up the same large data over and over again. This helps reduce backup time and saves backup storage resources.
NOTE: Both the Grad and UGrad nets use a different, special data storage location for static data that CS professors provide to their classes for course assignments (and that's usually not big data). That location should not be confused with the Grad Net static datashare described on this page. So, if you're teaching a course and need to provide static data to the students, that is a different situation, and you should email support@cs.jhu.edu to arrange access.

If You Need To Store A Large Amount Of Data On Our Systems...

CS Grad students, Faculty, Staff, Researchers:

Static Data

If the large amount of data you receive or generate does not change or does not have to be backed up, then it is considered static. In this case, please see our page discussing an option that allows you to store this static data in a folder called /static on our systems:
Grad Net Static Datashare
NOTE: there is no equivalent static datashare available for the CS Ugrad Net.

Dynamic Data

If your large data will change or be backed up regularly, it's most likely dynamic. If that's the case, and the dynamic data is more than 10 GB for a project, etc., on our systems, please email support@cs.jhu.edu beforehand to let us know:
  • How much data you will be storing in your home directory.
  • When you plan to put the data in place.
  • How long you expect to keep the data in place.
To back up or not to back up your dynamic data with our backup systems
For a large amount of dynamic data, you must also decide if that data needs to be backed up. As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
  • If your large data is dynamic (changing), and you need it to be backed up regularly, please let us know you need it backed up when you email us.

Undergrads must first contact CS IT Support

If you receive or generate more than 10 GB of data for a project, etc., on our systems, no matter if the data is static (not changing) or dynamic (potentially changing), there are things to consider and things you must do.
Most CS Undergrad users already have an 8 GB disk quota. So, before you can store any large datasets on the Ugrad net, you need your quota updated for the time period you need to store that data.
Therefore, you will need to contact us with the information we require.
Please email support@cs.jhu.edu beforehand to let us know:
  • How much data you will be storing.
  • What you need to have your disk quota raised to.
  • When you plan to put the data in place.
  • How long you expect to keep the data in place.
  • If your data is static (will not change) or dynamic (will change).
  • What course this is for.
  • Who the instructor for the course is.
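When working out what to ask for your disk quota to be raised to, one simple approach is to add the size of the new data to what you already use and leave a little headroom. The sketch below only illustrates that arithmetic; the function name and the 10% default headroom are assumptions for illustration, not a policy of ours, and it treats 1 GB as 1024**3 bytes.

```python
import math

GIB = 1024**3  # assumption: quota figures are treated as binary gigabytes


def quota_to_request_gb(current_usage_bytes, dataset_bytes, headroom=0.10):
    """Return a whole-GB quota covering current usage plus the new data.

    headroom is an illustrative safety margin (10% by default), not a rule.
    """
    needed = (current_usage_bytes + dataset_bytes) * (1 + headroom)
    return math.ceil(needed / GIB)
```

For example, with 3 GB already in use and a 20 GB data set, quota_to_request_gb(3 * GIB, 20 * GIB) rounds 23 GB plus 10% up to 26. Include the resulting figure in your email along with the other items listed above.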
Then, Decide If Your Large Data Needs To Be Backed Up On Our Systems
As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
  • If your large data is dynamic (changing), you might want it to be backed up regularly. If so, please let us know.
  • If your large data is static (it won't be changing), let us know that, and then, to help us conserve backup system resources, we ask that you set up your data to not get backed up on our backup system. To do so, please visit: Excluding Select Folders From Being Backed Up

And Finally...

When you are done with your data and no longer need it stored, please remove it, and let us know that you've done so.


Other places to store (usually, smaller amounts of) data on our systems

  • Files in your home directory are available on all of our client systems that share the account. Such files are also backed up as described on our Backups page. But, remember, this home directory space is for files/file sets that are not large, as mentioned earlier in this page.
  • Each Linux client has a /scratch directory. Files put there are only accessible on the Linux client where they're stored. Not only are the files there not backed up, they will be automatically deleted during our annual client OS upgrade process (and may be deleted at other times as circumstances warrant).
  • Each Linux client also has a /tmp directory. Like /scratch, files put in /tmp are only accessible on the Linux client where they're stored. Files in /tmp are automatically cleared out much more frequently than files in /scratch.
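For short-lived working files, a program can create its own working directory under /scratch and fall back to the system temporary directory when /scratch is not available. This is a minimal sketch under stated assumptions: the function name and the "work-" prefix are hypothetical, and it assumes /scratch is writable by your account on the client you are using. Remember that files in either location are not backed up and can be deleted automatically.

```python
import os
import tempfile


def make_workdir(base="/scratch", prefix="work-"):
    """Create and return a private working directory.

    Uses base (by default /scratch) when it exists and is writable; otherwise
    falls back to the system temporary directory, which is usually /tmp.
    """
    if os.path.isdir(base) and os.access(base, os.W_OK | os.X_OK):
        return tempfile.mkdtemp(prefix=prefix, dir=base)
    return tempfile.mkdtemp(prefix=prefix)
```

The directory is only visible on the Linux client where it was created, so copy anything you need to keep to a backed-up location and remove the directory when you are done.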