# How does Colab work?
Google Colab is an incredibly powerful tool for data science, machine learning, and Python development, because it takes away the headache of local setup. However, one area that often confuses beginners, and sometimes even intermediate users, is file management.
Where do the files live? Why do they disappear? How do you upload, download, or permanently store data? This article answers these questions, step by step.
Let’s clear up the biggest misconception right away. Google Colab doesn’t work like your laptop. Every time you open a notebook, Colab gives you a temporary virtual machine (VM). Once you leave, everything inside is cleared. This means:
- Locally saved files are temporary
- When runtime resets, files are gone
Your default working directory is /content. Whatever you save inside /content will disappear once the runtime is reset.
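You can confirm this from a notebook cell. A minimal sketch (the printed directory is /content only inside Colab; the filename is just an example):

```python
import os

# In a fresh Colab session the notebook starts in /content.
print(os.getcwd())

# Anything written here lands on the VM's temporary disk
# and vanishes when the runtime resets.
with open('scratch.txt', 'w') as f:
    f.write('gone after the runtime resets')

print('scratch.txt' in os.listdir('.'))  # True
```
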
# View files in Colab
You have two easy ways to view your files.
// Method 1: Using the Visual Way
This is the recommended approach for beginners:
- Open the left sidebar
- Click on the folder icon
- Browse inside /content
This is great when you just want to see what’s going on.
// Method 2: Using the Python Way
This is useful when you are scripting or debugging paths.
import os
os.listdir('/content')
# Uploading and downloading files
Let’s say you have a dataset or a comma-separated values (CSV) file on your laptop. The first method is to upload it using code.
from google.colab import files
files.upload()
A file picker opens, you select your file, and it appears in /content. This file is temporary until it is moved somewhere else.
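Once uploaded, the file can be read with a plain relative path, because files.upload() writes into the current working directory. A minimal sketch, where sales.csv is a hypothetical upload (the first few lines stand in for the file the picker would have saved):

```python
import csv

# Stand-in for an uploaded file -- in Colab, files.upload() would have
# written sales.csv into the current directory (/content) for you.
with open('sales.csv', 'w', newline='') as f:
    csv.writer(f).writerows([['month', 'revenue'], ['Jan', '120'], ['Feb', '95']])

# Read it back with a plain relative path.
with open('sales.csv', newline='') as f:
    rows = list(csv.reader(f))

print(rows[0])        # ['month', 'revenue']
print(len(rows) - 1)  # 2 data rows
```
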
The second method is drag and drop. This method is simple, but the storage remains temporary.
- Open the file explorer (left panel)
- Drag files directly into /content
To download the file from Colab to your local machine:
from google.colab import files
files.download('model.pkl')
Your browser will download the file immediately. This works for CSV files, models, logs, and images.
If you want your files to survive runtime reset, you should use Google Drive. To mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Once you’ve authorized access, your Drive appears at /content/drive/MyDrive. Anything saved there survives runtime resets.
# Recommended project folder structure
A disorganized drive becomes painful very quickly. A neat structure you can reuse is:
MyDrive/
└── ColabProjects/
└── My_Project/
├── data/
├── notebooks/
├── models/
├── outputs/
└── README.md
To save time, define your paths once as variables:
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'
DATA_PATH = f'{BASE_PATH}/data/train.csv'
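A small refinement is to build sub-paths with os.path.join instead of string formatting, so the pieces stay easy to rearrange. A sketch using the hypothetical project folder from above:

```python
import os

# Hypothetical project root on Drive -- adjust to your own folder names.
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'

DATA_PATH  = os.path.join(BASE_PATH, 'data', 'train.csv')
MODEL_PATH = os.path.join(BASE_PATH, 'models', 'model.pkl')

print(DATA_PATH)  # .../My_Project/data/train.csv
```
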
To save a file permanently with pandas:
import pandas as pd
# assumes df is a DataFrame you have already created
df.to_csv('/content/drive/MyDrive/data.csv', index=False)
To load a file later:
df = pd.read_csv('/content/drive/MyDrive/data.csv')
# File management in Colab
// Working with Zip Files
To extract a zip file:
import zipfile

with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/data')
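Going the other direction, you can pack a whole folder into a single zip before downloading it, which is much faster than grabbing files one by one. A sketch assuming a hypothetical outputs/ folder (created here so the example is self-contained):

```python
import os
import shutil

# Stand-in for an outputs/ folder with results you want to keep.
os.makedirs('outputs', exist_ok=True)
with open('outputs/metrics.txt', 'w') as f:
    f.write('accuracy: 0.93')

# Pack the folder's contents into outputs.zip;
# in Colab, files.download('outputs.zip') then grabs it in one go.
shutil.make_archive('outputs', 'zip', 'outputs')
print(os.path.exists('outputs.zip'))  # True
```
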
// Using Shell Commands for File Management
Colab supports Linux shell commands prefixed with an exclamation mark (!).
!pwd
!ls
!mkdir data
!rm file.txt
!cp source.txt destination.txt
This is very useful for automation; once you get used to it, you will reach for it constantly.
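If you prefer to stay in Python (for example, so the notebook also runs outside Colab), the standard library covers the same operations; the shell command each line replaces is noted in the comments:

```python
import os
import shutil

os.makedirs('data', exist_ok=True)  # !mkdir data  (no error if it exists)
print(os.getcwd())                  # !pwd
print(os.listdir('.'))              # !ls
# shutil.copy('source.txt', 'destination.txt')  # !cp source.txt destination.txt
# os.remove('file.txt')                         # !rm file.txt
```
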
// Downloading Files Directly from the Internet
Instead of uploading manually, you can use wget:
!wget https://example.com/data.csv
or use the requests library in Python:
import requests

url = 'https://example.com/data.csv'
r = requests.get(url)
open('data.csv', 'wb').write(r.content)
It is highly effective for datasets and pre-trained models.
# Additional Considerations
// Storage Limitations
You should be aware of the following limitations:
- Colab VM disk space is around 100 GB (temporary)
- Google Drive storage is limited by your personal quota
- Browser-based upload limit is approximately 5 GB
For large datasets, always plan in advance.
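One way to plan ahead is to check how much disk space the VM actually has free before starting a large download. A minimal sketch using only the standard library (the numbers below reflect whichever machine runs it):

```python
import shutil

# Query the disk the root filesystem lives on.
total, used, free = shutil.disk_usage('/')
print(f'Total: {total / 1e9:.1f} GB, free: {free / 1e9:.1f} GB')
```
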
// Best Practices
- Mount drive at the beginning of the notebook
- Use variables for paths
- Keep raw data read-only
- Separate data, models and output into separate folders
- Add a README file for your future self
// When not to use Google Drive
Avoid using Google Drive when:
- Training on extremely large datasets
- High-speed I/O is critical for performance
- You need distributed storage
In these cases, copying the data onto the Colab VM's local disk (or using a dedicated cloud storage service) is usually a better fit, since local disk I/O is much faster than reading through the Drive mount.
# Final thoughts
Once you understand how Colab file management works, your workflow becomes more efficient. There’s no need to worry about lost files or rewriting code. With these tools, you can ensure clean experiments and smooth data transitions.
Kanwal Mehreen is a machine learning engineer and a technical writer with a deep passion for the intersection of AI with data science and medicine. She co-authored the eBook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she is an advocate for diversity and academic excellence. She has also been recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a strong advocate for change, having founded FEMCodes to empower women in STEM fields.
