Data recovery
The data in studies is stored in a versioned S3 bucket. This means that all versions are kept and files are not actually deleted, only hidden from view.
Data Recovery using Windows File Explorer
On the desktop is a script called Mount Data Drives for Recovery. Run that script and extra folders will appear in "D:" containing all versions of files, with the filenames changed to include the timestamp when the version was created/deleted.
Once you have found the version of the file you need, you can copy it back to its original location (without the timestamp in the name), or rename it to something completely different.
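If you have several files to put back, the rename-and-copy step can be scripted. The following is a minimal Python sketch, not part of the provided tooling. It assumes the mounted folders use the same -vYYYY-MM-DD-HHMMSS-mmm suffix shown in the command-line listings later on this page; the function names are made up for illustration.

```python
import re
import shutil
from pathlib import Path

# Assumed suffix format: -vYYYY-MM-DD-HHMMSS-mmm, placed before the extension.
VERSION_SUFFIX = re.compile(r"-v\d{4}-\d{2}-\d{2}-\d{6}-\d{3}")


def original_name(versioned_name):
    """Return the file name with the version timestamp removed."""
    return VERSION_SUFFIX.sub("", versioned_name, count=1)


def restore_version(version_file, target_dir):
    """Copy a versioned file into target_dir under its original name."""
    version_file = Path(version_file)
    target = Path(target_dir) / original_name(version_file.name)
    # copy2 overwrites any existing file with the same name in target_dir.
    shutil.copy2(version_file, target)
    return target
```

Because the copy overwrites a file of the same name, point target_dir at a scratch folder first if you are unsure which version you need.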
Data Recovery using the command line (advanced)
For most users, the above section using Windows File Explorer should meet their needs.
This example uses a Windows desktop within the environment to find and restore a version of a file; however, you can use the same process with any desktop type.
First, list the files in the directory you are looking to recover, using rclone's ls command:
rclone ls --s3-versions --config C:\workdir\rclone.conf STUDY_NAME:AWS_ACCOUNT-treprod-REGION-PROJECT-studydata/studies/STUDY_TYPE/STUDY_NAME/
If you have lots of files to look through, then lsf or lsjson might be useful.
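As an illustration of why lsjson can help with large listings, here is a small Python sketch that filters its JSON output by file name. The sample data is hypothetical and trimmed to the Path, Name, Size and IsDir fields; real output carries more fields per entry. The function name is an assumption made for this example.

```python
import json

# Hypothetical sample of `rclone lsjson --s3-versions ...` output,
# reduced to the fields used below.
SAMPLE = """
[
  {"Path": "testfile.txt", "Name": "testfile.txt", "Size": 28, "IsDir": false},
  {"Path": "testfile-v2024-01-24-111334-000.txt",
   "Name": "testfile-v2024-01-24-111334-000.txt", "Size": 11, "IsDir": false},
  {"Path": "notes", "Name": "notes", "Size": -1, "IsDir": true}
]
"""


def files_matching(lsjson_text, stem):
    """Return sorted (path, size) pairs for files whose name starts with stem."""
    entries = json.loads(lsjson_text)
    return sorted(
        (entry["Path"], entry["Size"])
        for entry in entries
        if not entry["IsDir"] and entry["Name"].startswith(stem)
    )


for path, size in files_matching(SAMPLE, "testfile"):
    print(f"{size:>8}  {path}")
```

In practice you would save the lsjson output to a file, or capture it from the command, and pass that text in place of SAMPLE.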
The actual values to use should be copied from the Mount Data Drives for Recovery script on the desktop; however, if you are curious, the placeholders are:
Placeholder | Meaning | Example |
---|---|---|
AWS_ACCOUNT | The ID of the AWS account your studies live in | 1234567890 |
PROJECT | The short name of your TRE environment | tre1 |
REGION | A short code for the region where the data is stored | ldn |
STUDY_NAME | The name of the study | ExampleStudy |
STUDY_TYPE | Can be Organization or User | Organization |
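To show how the placeholders fit together, the sketch below assembles the remote path used in the commands on this page from the example values in the table. It is only an illustration of the naming pattern; the function name is hypothetical, and the real values should still come from the script on the desktop.

```python
def study_remote(aws_account, project, region, study_name, study_type):
    """Build the rclone remote path for a study from the table's placeholders."""
    if study_type not in ("Organization", "User"):
        raise ValueError("study_type must be 'Organization' or 'User'")
    bucket = f"{aws_account}-treprod-{region}-{project}-studydata"
    return f"{study_name}:{bucket}/studies/{study_type}/{study_name}/"


# Values taken from the Example column of the table.
print(study_remote("1234567890", "tre1", "ldn", "ExampleStudy", "Organization"))
# prints: ExampleStudy:1234567890-treprod-ldn-tre1-studydata/studies/Organization/ExampleStudy/
```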
Here is an example where a file has been overwritten: testfile.txt is the latest version and testfile-v2024-01-24-111334-000.txt is the previous version.
If you know when the file version you want was written, the timestamp in the filename may be useful:
28 testfile.txt
11 testfile-v2024-01-24-111334-000.txt
0 testfile-v2024-01-24-111324-000.txt
Here is an example where the file has been deleted:
28 testfile-v2024-01-24-111731-000.txt
11 testfile-v2024-01-24-111334-000.txt
0 testfile-v2024-01-24-111324-000.txt
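With long listings it can be hard to spot which files have been deleted and which have only been overwritten. The following Python sketch parses listing text in the format shown in the examples on this page, groups versions under their original name, and reports a file as deleted when only timestamped entries exist for it. It assumes the -vYYYY-MM-DD-HHMMSS-mmm suffix seen in those listings; the function and variable names are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed name format: <stem>-vYYYY-MM-DD-HHMMSS-mmm<.ext>
VERSIONED = re.compile(
    r"^(?P<stem>.*)-v(?P<stamp>\d{4}-\d{2}-\d{2}-\d{6}-\d{3})(?P<ext>\.[^.]*)?$"
)


def parse_listing(text):
    """Group 'size name' lines by original file name.

    Returns {name: {"current": size or None, "versions": [(written, size, name)]}}
    with versions sorted newest first.
    """
    files = defaultdict(lambda: {"current": None, "versions": []})
    for line in text.splitlines():
        if not line.strip():
            continue
        size_text, name = line.split(None, 1)
        name = name.strip()
        match = VERSIONED.match(name)
        if match:
            original = match["stem"] + (match["ext"] or "")
            written = datetime.strptime(match["stamp"], "%Y-%m-%d-%H%M%S-%f")
            files[original]["versions"].append((written, int(size_text), name))
        else:
            files[name]["current"] = int(size_text)
    for info in files.values():
        info["versions"].sort(reverse=True)
    return dict(files)


# The deleted-file listing from the example above.
DELETED = """\
28 testfile-v2024-01-24-111731-000.txt
11 testfile-v2024-01-24-111334-000.txt
0 testfile-v2024-01-24-111324-000.txt
"""

for name, info in parse_listing(DELETED).items():
    state = "deleted" if info["current"] is None else "present"
    newest = info["versions"][0][2] if info["versions"] else "none"
    print(f"{name}: {state}, newest stored version: {newest}")
```

For the deleted example this reports testfile.txt as deleted, with testfile-v2024-01-24-111731-000.txt as the newest stored version.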
Next copy the version you want to somewhere local to examine:
rclone copy --s3-versions --config C:\workdir\rclone.conf STUDY_NAME:AWS_ACCOUNT-treprod-REGION-PROJECT-studydata/studies/STUDY_TYPE/STUDY_NAME/testfile-v2024-01-24-111334-000.txt C:\
The file will be in "C:". If you have many files, you might want to make a folder first and copy them there.
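When recovering many files, a short script can create the folder and issue one copy command per file. This Python sketch builds the same rclone copy command shown above for each file and only runs it when asked to. The folder C:\recovered and the function names are hypothetical, and it assumes rclone is on the PATH.

```python
import subprocess
from pathlib import Path


def copy_command(config, remote_file, dest_dir):
    """Return the rclone copy command for one versioned file as an argument list."""
    return ["rclone", "copy", "--s3-versions", "--config", str(config),
            remote_file, str(dest_dir)]


def recover_to_folder(config, remote_files, dest_dir, run=False):
    """Build one copy command per file; execute them only when run=True."""
    commands = [copy_command(config, f, dest_dir) for f in remote_files]
    if run:
        # Make the destination folder first, then copy each file into it.
        Path(dest_dir).mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)
    return commands


# Placeholders as in the table above; replace them with your own values.
remote = "STUDY_NAME:AWS_ACCOUNT-treprod-REGION-PROJECT-studydata/studies/STUDY_TYPE/STUDY_NAME/"
wanted = [remote + "testfile-v2024-01-24-111334-000.txt"]

for command in recover_to_folder(r"C:\workdir\rclone.conf", wanted, r"C:\recovered"):
    print(" ".join(command))
```

Leaving run at its default of False prints the commands without copying anything, which is a convenient way to check them before running for real.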
Once you have checked the file is the version you are after, you can either copy what you need from it or copy it back to a study.
If you experience issues recovering data or need a hand, please contact us in the LSE-Trusted-Research-Environments Teams channel.