Difference between revisions of "Grad Net Static Datashare"
Latest revision as of 21:53, 15 February 2024
INTRO
If you haven't already done so, please review our page on Storing Large Amounts Of Data On Our Systems.
There might be times when, for a class or for your light research, you need to download or import a large set of data (either very big files or a large number of files) that is static, i.e., will not be modified. The problem is that when you download very large static files, or a data set with lots of static files, to your Grad Net home directory, you use a significant amount of the disk resources shared by much of the CS Dept, especially the backup resources our systems use. (Backing up very large files or lots of smaller files puts a load on our storage/backup resources.)
If you're storing static files, they should not have to be backed up or mirrored (as we do with our home directories), since you can re-obtain them from your original sources when necessary. Therefore, we have created an alternative: a Grad Net static datashare.
NOTE: Both the Grad and Ugrad nets use a different, special data storage location for static data that CS professors provide to their classes for course assignments (usually not big data). That location should not be confused with the Grad Net static datashare described on this page. If you're teaching a course and need to provide static data to your students, that is a different situation; please email support@cs.jhu.edu to arrange access.
About the Grad Net Static Datashare
The Grad Net Linux clients have a secondary location for data storage to supplement the primary storage afforded by everyone's home directories. That secondary storage is available under the /static directory.
The /static directory is the same across all of the Grad Net Linux clients. So, if you place files in /static on one Grad Net Linux client, they will be available in the /static folder of each of the Grad Net Linux clients.
The /static directory is intended for data that:
- Is larger than a few gigabytes in size
  - Consider anything 10 GB and over in file size to be large
- Could instead be a large number of files
  - Consider a data set with more than 1,000 files to be large (regardless of file size)
- Could be a complete dataset/folder with a moderate number of data files, but whose combined file sizes make it a large folder
  - Consider a dataset or folder to be large when it holds a reasonable number of static files but the total size of the data in it is 10 GB or more
- Does not need to be backed up
  - It's static data that you acquired from somewhere, so you can obtain it again if necessary
- Needs to be available to all CS Grad Net Linux clients
  - /static is shared amongst the Grad Net Linux clients
  - NOTE: Static datashare storage is not available for the Ugrad Net
- Should not be deleted by automatic system cleanup processes
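The size guidelines above can be checked before you decide where a dataset belongs. The following is a minimal sketch, not a department-provided tool: the function names `summarize` and `is_large` are hypothetical, and it assumes "10gb" means decimal gigabytes (10^9 bytes), which this page does not specify.

```python
import os

GB = 10**9                 # assumption: decimal gigabytes; the page does not say GB vs GiB
LARGE_BYTES = 10 * GB      # "10gb and over" counts as large
LARGE_FILE_COUNT = 1000    # "more than 1,000 files" counts as large


def summarize(folder):
    """Walk a folder tree and return (file_count, total_bytes, largest_file_bytes)."""
    count = total = largest = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links so their targets are not counted
            size = os.path.getsize(path)
            count += 1
            total += size
            largest = max(largest, size)
    return count, total, largest


def is_large(file_count, total_bytes, largest_file_bytes):
    """Apply the three guidelines: one big file, many files, or a big total."""
    return (
        largest_file_bytes >= LARGE_BYTES
        or file_count > LARGE_FILE_COUNT
        or total_bytes >= LARGE_BYTES
    )
```

For instance, `is_large(*summarize("my_dataset"))` (with `my_dataset` standing in for your own folder) returns `True` when any one of the three guidelines is met, which suggests the data is a candidate for /static rather than your home directory.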
For example, if there's a 30 gigabyte data set you want to download and use, you could use /static to hold the data while you're working with it. You would put any programs you write to analyze the data in your home directory. If something happened to the department's servers, we would restore your programs from our backups (since we back up the home directories), and you could re-download the data from wherever you got it originally (since we do not back up the static datashare).
Therefore, putting your static data into /static helps the department because it avoids the expense of backing up data unnecessarily.
How to Use the Static Datashare
Every Grad Net account has a directory under /static, with the same name as the account. For example, for an account named jdoe15, the static data directory is /static/jdoe15. You can just put your static data into that directory instead of into your home directory.
Your static data directory is available on any of our Grad Net Linux clients.
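As an illustration of the naming rule above, here is a small sketch that builds the per-account path and copies a dataset folder into it. The helper names `static_dir` and `stage_dataset` are hypothetical, and it assumes your login name on the client matches your Grad Net account name.

```python
import getpass
import shutil
from pathlib import Path

STATIC_ROOT = Path("/static")  # shared static datashare described on this page


def static_dir(account=None, root=STATIC_ROOT):
    """Return the static data directory for an account, e.g. /static/jdoe15."""
    if account is None:
        # assumption: the current login name is the Grad Net account name
        account = getpass.getuser()
    return Path(root) / account


def stage_dataset(src, account=None, root=STATIC_ROOT):
    """Copy the folder `src` into the account's static directory and return the new path."""
    dest = static_dir(account, root) / Path(src).name
    # copytree raises FileExistsError if dest already exists, so nothing is overwritten
    shutil.copytree(src, dest)
    return dest
```

With these helpers, `static_dir("jdoe15")` gives `/static/jdoe15`, and `stage_dataset("downloads/corpus")` would copy a hypothetical `downloads/corpus` folder to your own directory under /static.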
Our www.cs.jhu.edu website can also see your /static data share!
In addition to being available on the Grad Net Linux clients, /static is also available on the department webserver, so you can have PHP/CGI programs that make use of data in /static. (And if you want to serve files in /static directly, you can just put symbolic links to them from somewhere under your ~/public_html directory.)
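The symbolic-link approach mentioned above could look like the sketch below. The function name `publish_link` is hypothetical, the example file name is made up, and the sketch assumes your `~/public_html` directory already exists; it says nothing about how the webserver itself is configured.

```python
import os
from pathlib import Path


def publish_link(target, public_html=None, link_name=None):
    """Create a symbolic link under public_html that points at a file in /static."""
    target = Path(target)
    if public_html is None:
        public_html = Path.home() / "public_html"  # assumed to exist already
    link = Path(public_html) / (link_name or target.name)
    # os.symlink(src, dst) creates dst as a link to src;
    # it raises FileExistsError if the link name is already taken
    os.symlink(target, link)
    return link
```

For example, `publish_link("/static/jdoe15/results.csv")` would create `~/public_html/results.csv` as a link to the file in /static, so the data itself never has to be copied into your home directory.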
The usual restrictions on use of CS Department resources apply; any data stored in /static should be related to the department's educational or (light) research activities.
This service is only available to people with CS Grad Net accounts. There is no analogous service for people with only a CS Ugrad Net account.
How much data can I store in /static?
Every CS Grad Net account has a quota to limit overuse of the /static system. The default user quota for /static is 50 gigabytes. Currently, we do not have a way for you to examine your /static disk quota, so you will need to keep track of the amount of data you have stored there. Also, in /static, the number of files is not an issue, as long as the total size of all your files in /static is below your quota.
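Since there is no quota lookup, one way to keep track yourself is to add up the sizes of the files in your static directory. The sketch below is only an approximation: the names `bytes_used`, `fits_in_quota`, and `quota_report` are hypothetical, it assumes decimal gigabytes, and the way the quota is actually accounted on the server may differ from a simple sum of file sizes.

```python
import os

GB = 10**9                       # assumption: decimal gigabytes
DEFAULT_QUOTA_BYTES = 50 * GB    # default /static quota stated on this page


def bytes_used(folder):
    """Sum the sizes of all regular files under a folder (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_quota(used, incoming, quota=DEFAULT_QUOTA_BYTES):
    """Return True if a new dataset of `incoming` bytes would stay within the quota."""
    return used + incoming <= quota


def quota_report(used, quota=DEFAULT_QUOTA_BYTES):
    """Format a one-line summary of usage against the quota."""
    remaining = quota - used
    return (
        f"{used / GB:.2f} GB used of {quota / GB:.0f} GB "
        f"({remaining / GB:.2f} GB remaining)"
    )
```

Running `quota_report(bytes_used("/static/jdoe15"))` on a Grad Net Linux client (with your own account name in place of `jdoe15`) gives a quick estimate before you start a large download.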
If you need to store more data in /static than the default quota allows, please email us at support@cs.jhu.edu with a description of:
- What kind of data it is (is it for course work? light research? other?)
- How much total space you expect to need
- How long you need to store that data
Quota increases are not guaranteed.
We are unable to store datasets exceeding a total of 250 gigabytes at this time.
NOTE: CS Grad Net home directories do not have disk quotas. Quotas on the CS Grad Net only apply to the /static directory.
Backups
We do not back up the /static directory. Therefore, in case your files ever need to be recovered, you should make sure you have access to the original source for your /static datasets, or make a separate backup somewhere for yourself (just not in your CS home directory).
Alternatives to the Static Datashare
There are several other places where you can put files, depending on your needs:
- Files in your home directory are available on all of our client systems that share the account. Such files are also backed up as described on our Backups page. But, remember, this home directory space is for files/file sets that are not large, as mentioned earlier in this page.
- Each Linux client has a /scratch directory. Files put there are only accessible on the Linux client where they're stored. Not only are the files there not backed up, they will be automatically deleted during our annual client OS upgrade process (and may be deleted at other times as circumstances warrant).
- Each Linux client also has a /tmp directory. Like /scratch, files put in /tmp are only accessible on the Linux client where they're stored. Files in /tmp are automatically cleared out much more frequently than files in /scratch.