Storing Large Amounts Of Data On Our Systems

Revision as of 20:00, 12 May 2020 by Steve410

INTRO

Our lab systems use disk storage that is shared by everyone in the CS department.
There might be times when you receive large amounts of data for a project you're working on in CS. Because our disk space is shared, we must all be conscious of the amount of data we store.

We consider a large amount of data to be any data more than 10 GB in size, or a complete data set made up of many files that together total more than 10 GB.
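To check whether a data set crosses that threshold before copying it onto the shared systems, something like the following could be used. This is only a minimal sketch: the function names are hypothetical, "10 GB" is assumed to mean 10 * 1024**3 bytes (the page does not say whether decimal or binary units are intended), and it sums apparent file sizes rather than actual disk usage.

```python
import os

# Assumption: "10 GB" is read as 10 * 1024**3 bytes; the page does not specify
# whether decimal or binary gigabytes are meant.
LARGE_DATA_THRESHOLD_BYTES = 10 * 1024**3


def total_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinked files so linked data is not counted
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; leave it out of the total
    return total


def is_large(root, threshold=LARGE_DATA_THRESHOLD_BYTES):
    """Return True if the files under root total more than the threshold."""
    return total_size_bytes(root) > threshold
```

Calling `is_large("/path/to/dataset")` on a directory counts every file beneath it, which matches the "many files that together total more than 10 GB" case as well as a single large file.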

Additionally, because our backup system backs up data on a nightly, weekly and monthly basis, it's important that we minimize the number of times it has to back up the same large static data (data that does not change) over and over again. This helps reduce backup time and saves backup storage resources.

If You Need To Store A Large Amount Of Data On Our Systems...

CS grad students, faculty, staff, and researchers:

Static Data

If the large amount of data you receive or generate will not change, and so does not have to be backed up repeatedly, it is considered static. In this case, please see our page discussing the:
Grad Net Static Datashare
NOTE: there is no equivalent datashare available for the CS Ugrad Net.

Dynamic Data

If your large data will change or needs to be backed up regularly, it's most likely dynamic. If that's the case, and the dynamic data for a project on our systems is more than 10 GB, please email support@cs.jhu.edu beforehand to let us know:
  • How much data you will be storing in your home directory.
  • When you plan to put the data in place.
  • How long you expect to keep the data in place.
To back up or not to back up your dynamic data with our backup systems.
For a large amount of dynamic data, you must also decide if that data needs to be backed up. As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
  • If your large data is dynamic (changing), and you need it to be backed up regularly, please let us know you need it backed up when you email us.

Undergrads must first contact CS IT Support

If you receive or generate more than 10 GB of data for a project on our systems, whether the data is static (not changing) or dynamic (potentially changing), there are things to consider and steps you must take.
Most CS undergrad users already have an 8 GB disk quota. So, before you can store any large data set on the Ugrad Net, your quota must be raised for the period during which you need to store that data.
Therefore, you will need to contact us with the information listed below.

Please email support@cs.jhu.edu beforehand to let us know:

  • How much data you will be storing.
  • What you need to have your disk quota raised to.
  • When you plan to put the data in place.
  • How long you expect to keep the data in place.
  • Whether your data is static (will not change) or dynamic (will change).
  • What course this is for.
  • Who the instructor for the course is.
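To make sure a request covers every item in the list above, the details can be gathered into one plain-text message before emailing support@cs.jhu.edu. The helper below is a hypothetical sketch, not a form that CS IT Support requires; the function name, field labels, and example values are all assumptions for illustration.

```python
def draft_quota_request(data_gb, quota_gb, start, duration, is_static, course, instructor):
    """Build a plain-text request body covering the items in the list above.

    All parameter names and labels are hypothetical; adjust them as needed.
    """
    kind = "static (will not change)" if is_static else "dynamic (will change)"
    lines = [
        f"Data to be stored: {data_gb} GB",
        f"Requested disk quota: {quota_gb} GB",
        f"Data in place from: {start}",
        f"Expected duration: {duration}",
        f"Data type: {kind}",
        f"Course: {course}",
        f"Instructor: {instructor}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; replace them with your own details.
    print(draft_quota_request(25, 35, "1 June", "8 weeks", True,
                              "(course name)", "(instructor name)"))
```

The returned text can be pasted into the body of the email; each line corresponds to one bullet in the list.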
Then, Decide If Your Large Data Needs To Be Backed Up On Our Systems
As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
  • If your large data is dynamic (changing), you might want it to be backed up regularly. If so, please let us know.
  • If your large data is static (it won't be changing), let us know that as well. Then, to help us conserve backup system resources, we ask that you set up your data so that it is not backed up on our backup system. To do so, please visit: Excluding Select Folders From Being Backed Up

And Finally...

When you are done with your data and no longer need it stored, please remove it, and let us know that you've done so.