GCP Tutorial: Using Cloud Storage FUSE to mount Google Cloud Storage buckets as file systems on Linux

First, I would like to say that this method is not recommended for critical or performance-sensitive environments (for example, workloads with small random reads). It depends on your requirements, so use it with caution.

I use this method because my workload is not critical and I don't need high performance. I just need file system semantics to expose my Google Cloud Storage bucket.

While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the backend. Cloud Storage FUSE retains the same fundamental characteristics of Cloud Storage, preserving the scalability of Cloud Storage in terms of size and aggregate performance while maintaining the same latency and single object performance. As with the other access methods, Cloud Storage does not support concurrency and locking. For example, if multiple Cloud Storage FUSE clients are writing to the same file, the last flush wins.

The main reason I am using this architecture is to save cost while creating shared file system storage for my application. The best way to store external files that can be mounted on Google Compute Engine is actually Google Cloud Filestore. But since this project is hosted on my personal GCP account, I want to keep the cost as low as possible (read: near 0 every month) and avoid provisioning Filestore. Please don't forget to read the pros (+) and cons (-) of using Cloud Storage FUSE later in this post.

To summarize, these are the steps I use to serve my files from Google Cloud Storage at the OS level, rather than storing them on block storage (Persistent Disk) or Google Cloud Filestore:

  1. Download and install Cloud Storage FUSE on the VM
  2. Create a GCS bucket
  3. Mount the bucket on the OS
  4. Automate mounting on every reboot
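
In condensed form, these steps might look like the sketch below. This assumes a Debian/Ubuntu VM; the bucket name my-gcs-bucket, the region, and the mount point /mnt/gcs-bucket are placeholders, and the fstab options are just one reasonable combination (check the gcsfuse mounting documentation for your setup):

    # 1. Install Cloud Storage FUSE (gcsfuse) from Google's apt repository
    export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
    echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    sudo apt-get update && sudo apt-get install -y gcsfuse

    # 2. Create a GCS bucket (name and location are examples)
    gsutil mb -l asia-southeast1 gs://my-gcs-bucket/

    # 3. Mount the bucket on the OS
    sudo mkdir -p /mnt/gcs-bucket && sudo chown $USER /mnt/gcs-bucket
    gcsfuse my-gcs-bucket /mnt/gcs-bucket

    # 4. Automate mounting on every reboot, e.g. with an /etc/fstab entry:
    # my-gcs-bucket  /mnt/gcs-bucket  gcsfuse  rw,_netdev,allow_other,implicit_dirs  0  0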

Here are the details:


Cloud Storage FUSE helps you make better and quicker use of Cloud Storage by allowing file-based applications to use Cloud Storage without rewriting their I/O code. It is ideal for use cases where Cloud Storage has the right performance and scalability characteristics for an application, and only the file system semantics are missing. When deciding if Cloud Storage FUSE is an appropriate solution, there are some additional differences compared to local file systems that you should take into account:

These are some considerations you need to understand if you are deciding whether to use Cloud Storage FUSE compared to a regular POSIX file system:

(+) Pricing: Cloud Storage FUSE access is ultimately Cloud Storage access. All data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly. See the pricing section below for details before using Cloud Storage FUSE.

(-) Performance: Cloud Storage FUSE has much higher latency than a local file system. As such, throughput may be reduced when reading or writing one small file at a time. Using larger files and/or transferring multiple files at a time will help to increase throughput.

  • Individual I/O streams run approximately as fast as gsutil.
  • The gsutil rsync command can be particularly affected by latency because it reads and writes one file at a time. Using the top-level -m flag with the command is often faster (see the example after this list).
  • Small random reads are slow due to latency to first byte (don’t run a database over Cloud Storage FUSE!)
  • Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Cloud Storage. Small writes to large files work as expected, but are slow and expensive. Note: One not so obvious place to consider this is when benchmarking Cloud Storage FUSE. Many benchmarking tools use a mix of random and sequential writes as default settings. Make sure to tune any benchmarking tools to sequential I/O when running against a bucket mounted by Cloud Storage FUSE.
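
For example, a parallelized copy straight into the bucket with gsutil (bucket name and paths below are placeholders) is usually much faster than copying file by file through the FUSE mount:

    # copy a local directory into the bucket with parallel transfers (-m)
    gsutil -m rsync -r ./local-dir gs://my-gcs-bucket/local-dir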

(-) Metadata: Cloud Storage FUSE does not transfer metadata along with the file when uploading to Cloud Storage. This means that if you wish to use Cloud Storage FUSE as an uploading tool, you will not be able to set metadata such as content type and ACLs as you would with other uploading methods. If metadata properties are critical, consider using gsutil, the JSON API, or the Google Cloud Console. The exception to this is that Cloud Storage FUSE does store mtime and symlink targets.
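
For instance, if content type matters, one option is to upload the file through the mount and then set the metadata separately with gsutil (the object path here is a placeholder):

    # set the content type on an object after uploading it via the FUSE mount
    gsutil setmeta -h "Content-Type:text/html" gs://my-gcs-bucket/reports/index.html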

(-) Concurrency: There is no concurrency control for multiple writers to a file. When multiple writers try to replace a file the last write wins and all previous writes are lost – there is no merging, version control, or user notification of the subsequent overwrite.

(-) Linking: Cloud Storage FUSE does not support hard links.

(-) Semantics: Some semantics are not exactly what they would be in a traditional file system; the full list of exceptions is in the gcsfuse semantics documentation. For example, metadata such as last access time is not supported, and some metadata operations like directory rename are not atomic.

(-) Access: Authorization for files is governed by Cloud Storage permissions. POSIX-style access control does not work.

(-) Availability: Transient errors do at times occur in distributed systems like Cloud Storage, leading to less than 100% availability. It is recommended that retries be attempted using the guidelines of truncated exponential backoff.
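
A minimal retry wrapper in shell, as a sketch of truncated exponential backoff (the command being retried, the attempt count, and the 32-second cap are arbitrary choices for illustration):

    # retry a copy from the mounted bucket, doubling the delay up to a 32s cap
    delay=1
    for attempt in 1 2 3 4 5; do
        cp /mnt/gcs-bucket/data/input.csv /tmp/input.csv && break
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$(( delay * 2 )); [ "$delay" -gt 32 ] && delay=32
    done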

(-) Local storage: Objects that are new or modified will be stored in their entirety in a local temporary file until they are closed or synced. When working with large files, be sure you have enough local storage capacity for temporary copies of the files, particularly if you are working with Google Compute Engine instances. For more information, see the readme documentation.
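
If the boot disk is small, gcsfuse can be pointed at a larger attached disk for this staging area using its --temp-dir flag (the scratch path below is just an example):

    # stage new/modified objects on a bigger attached disk instead of the default temp dir
    gcsfuse --temp-dir=/mnt/scratch/gcsfuse-tmp my-gcs-bucket /mnt/gcs-bucket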

(-) Directories: By default, only directories that are explicitly defined (that is, they are their own object in Cloud Storage) will appear in the file system. Implicit directories (that is, ones that are only parts of the pathname of other files or directories) will not appear by default. If there are files whose pathname contain an implicit directory, they will not appear in the overall directory tree (since the implicit directory containing them does not appear). A flag is available to change this behavior. For more information, see the semantics documentation.
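
The flag in question is --implicit-dirs. Mounting with it makes objects such as gs://my-gcs-bucket/logs/app.log visible even when no explicit logs/ directory object exists (bucket name and mount point are placeholders):

    # infer directories from object prefixes instead of requiring explicit directory objects
    gcsfuse --implicit-dirs my-gcs-bucket /mnt/gcs-bucket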

Notes:

  • Another cost-effective alternative is to create a virtual server and install an NFS server on it; a rough sketch follows this note. Full guidance can be read here: https://medium.com/@ngoodger_7766/nfs-filestore-on-gcp-for-free-859593e18bdf
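
For reference, that alternative roughly amounts to the following on a small Debian/Ubuntu VM (the export path and client subnet are placeholders; see the linked guide for the full walkthrough):

    # on the server VM: install NFS and export a directory to the VPC subnet
    sudo apt-get install -y nfs-kernel-server
    echo "/srv/share 10.128.0.0/20(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
    sudo exportfs -ra

    # on each client VM: mount the export (server-ip is the NFS VM's internal IP)
    sudo mount -t nfs server-ip:/srv/share /mnt/share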

Reference: https://cloud.google.com/storage/docs/gcs-fuse
