Storing Large Amounts Of Data On Our Systems
Revision as of 01:14, 6 October 2023 by Steve410
INTRO
- Our lab systems utilize disk storage that is shared by all in the CS department.
- There might be times when you receive large amounts of data for a project you're working on in CS. Because our disk space is shared by others, we must all be conscious of the amount of data we store.
- We consider data "large" when a single file is more than 10 GB in size, or when a complete data set made up of many files totals more than 10 GB.
- Additionally, our backup system backs up data on a nightly, weekly and monthly basis. For static data (data that does not change), it's important that we minimize how often the backup system copies the same large data over and over again. This reduces backup time and saves backup storage resources.
- NOTE: Both the Grad and UGrad nets use a separate, special storage location for static data that CS professors provide to their classes for course assignments (usually not large data). Do not confuse that location with the Grad Net static datashare described on this page. If you're teaching a course and need to provide static data to your students, that is a different situation; please email support@cs.jhu.edu to arrange access.
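To check whether a file or data set crosses the large-data threshold described above before you copy it onto our systems, a minimal sketch like the following adds up the sizes of all files under a path. It assumes "10gb" means 10 GiB (10 × 1024³ bytes); the page does not say whether decimal or binary units are intended. It reports apparent file sizes, so numbers from tools such as `du` (which count allocated disk blocks) may differ slightly.

```python
import os
import sys

# Assumption: "10gb" is read as 10 GiB (10 * 1024**3 bytes); the page does not
# state whether decimal (10**9) or binary units are meant.
LARGE_DATA_THRESHOLD = 10 * 1024**3


def total_size_bytes(path: str) -> int:
    """Return the combined apparent size of all regular files under `path`
    (or the size of `path` itself if it is a single file)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice or followed elsewhere.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def is_large(path: str, threshold: int = LARGE_DATA_THRESHOLD) -> bool:
    """True if the file or data set at `path` is over the large-data threshold."""
    return total_size_bytes(path) > threshold


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    size = total_size_bytes(target)
    print(f"{target}: {size / 1024**3:.2f} GiB, large={size > LARGE_DATA_THRESHOLD}")
```

For example, running it as `python3 check_size.py ~/my_dataset` (the script and folder names are hypothetical) prints the total size and whether it counts as large data under this page's definition.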
If You Need To Store A Large Amount Of Data On Our Systems...
CS Grad students, Faculty, Staff, Researchers:
Static Data
- If the large amount of data you receive or generate does not change or does not have to be backed up, then it is considered static. In this case, please see our page discussing an option that allows you to store large, static data in a folder called /static on our systems:
- NOTE: there is no equivalent static datashare available for the CS Ugrad Net.
Undergrads must first contact CS IT Support
- If you receive or generate more than 10 GB of data on our systems (for a project, for example), there are things you must consider and do, whether the data is static (not changing) or dynamic (potentially changing).
- Most CS Undergrad users already have a 10 GB disk quota. So, before you can store any large data sets on the Ugrad net, your quota must be raised for the period during which you need to store that data.
- Please email support@cs.jhu.edu beforehand to let us know:
- How much data you will be storing.
- What size you need your disk quota raised to (allow for any additional space you anticipate needing).
- When you plan to put the data in place.
- How long you expect to keep the data in place.
- Whether your data is static (will not change) or dynamic (will change).
- What course this is for.
- Who the instructor for the course is.
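If you find it easier to collect these details in one place before writing to support@cs.jhu.edu, the sketch below turns them into a plain-text email body. It is only a convenience template: the class and field names are hypothetical, not a form or tool that CS IT Support provides, and the single sanity check (the requested quota must at least cover the data itself) is an assumption; your existing files also count against the quota.

```python
from dataclasses import dataclass


@dataclass
class LargeDataRequest:
    """Hypothetical container for the details this page asks undergrads to send."""
    data_size_gb: float        # how much data you will be storing
    requested_quota_gb: float  # what your disk quota should be raised to
    start_date: str            # when you plan to put the data in place
    keep_until: str            # how long you expect to keep the data
    is_static: bool            # True if the data will not change
    course: str
    instructor: str

    def validate(self) -> None:
        # At minimum, the new quota has to hold the data you plan to store.
        if self.requested_quota_gb < self.data_size_gb:
            raise ValueError("requested quota is smaller than the data you plan to store")

    def email_body(self) -> str:
        """Return the request as plain text, one item per line."""
        self.validate()
        kind = "static (will not change)" if self.is_static else "dynamic (will change)"
        return "\n".join([
            f"Amount of data: {self.data_size_gb:g} GB",
            f"Requested disk quota: {self.requested_quota_gb:g} GB",
            f"Data in place by: {self.start_date}",
            f"Keeping data until: {self.keep_until}",
            f"Data is: {kind}",
            f"Course: {self.course}",
            f"Instructor: {self.instructor}",
        ])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(LargeDataRequest(25, 40, "2023-10-15", "2023-12-20", True,
                           "Example Course", "Example Instructor").email_body())
```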
Then, Decide If Your Large Data Needs To Be Backed Up On Our Systems
- As mentioned in the INTRO above, we back up data to our backup system on a regular basis.
- If your large data is dynamic (changing), you may want it backed up regularly. If so, please let us know.
- If your large data is static (it won't be changing), let us know that as well. Then, to help us conserve backup system resources, please set up your data so that our backup system does not back it up. For instructions, see: Excluding Select Folders From Being Backed Up
And Finally...
- When you are done with your data and no longer need it stored, please remove it, and let us know that you've done so.
Other places to store (usually, smaller amounts of) data on our systems
- Files in your home directory are available on all of our client systems that share your account, and they are backed up as described on our Backups page. Remember, though, that home directory space is for files and file sets that are not large, as described earlier on this page.
- Each Linux client has a /scratch directory. Files put there are only accessible on the Linux client where they're stored. Not only are the files there not backed up, they will be automatically deleted during our annual client OS upgrade process (and may be deleted at other times as circumstances warrant).
- Each Linux client also has a /tmp directory. Like /scratch, files put in /tmp are only accessible on the Linux client where they're stored. Files in /tmp are automatically cleared out much more frequently than files in /scratch.
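If you do keep working copies in /scratch on a particular Linux client, it can help to confirm there is enough free space on that client before copying anything. The sketch below checks free space with `shutil.disk_usage` and then creates a private folder for you. The per-user layout (`/scratch/<username>`) and the owner-only permissions are hypothetical conventions chosen for illustration, not something this page requires. Remember that anything placed there is not backed up and will be deleted during the annual client OS upgrade.

```python
import getpass
import os
import shutil
from typing import Optional


def prepare_scratch_dir(data_bytes: int, scratch_root: str = "/scratch",
                        user: Optional[str] = None) -> str:
    """Create <scratch_root>/<user> if the file system has room for data_bytes.

    The per-user subfolder layout is a hypothetical convention. Files here are
    local to this client, are not backed up, and may be deleted at any time.
    """
    free = shutil.disk_usage(scratch_root).free
    if data_bytes > free:
        raise OSError(f"{scratch_root} has only {free} bytes free; {data_bytes} needed")
    target = os.path.join(scratch_root, user or getpass.getuser())
    os.makedirs(target, exist_ok=True)
    # Assumption: owner-only access, since others may log in to the same client.
    os.chmod(target, 0o700)
    return target


if __name__ == "__main__":
    # Example: make sure roughly 2 GiB fits before copying a data set in.
    print(prepare_scratch_dir(2 * 1024**3))
```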