PowerShell, MongoDB and GridFS File Uploads

Posted in MongoDB, PowerShell

At one of the startups I’m working on, our web front end is built on Microsoft’s ASP.NET MVC 3 framework, but our back-end storage is MongoDB. By using Microsoft technology as our web framework we’re obviously sitting outside the traditional open source camp, even though ASP.NET is now open sourced. The practical downside is that you don’t get the rich set of community snippets that more traditional frameworks (Rails, Django) enjoy.

With that in mind, we’ve had to write a lot of helper scripts to make our lives just that little bit easier. One area that definitely needs some help is using MongoDB’s GridFS from PowerShell.

A little background to GridFS first.

GridFS
“GridFS is a storage specification for large objects in MongoDB. It works by splitting large object into small chunks, usually 256k in size. Each chunk is stored as a separate document in a chunks collection. Metadata about the file, including the filename, content type, and any optional information needed by the developer, is stored as a document in a files collection.”

Reference:
http://www.mongodb.org/display/DOCS/GridFS
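
To make the chunking idea concrete, here is a rough sketch of the arithmetic: how many chunk documents a file of a given size would need. The function name Get-GridFSChunkCount is my own invention for illustration and is not part of any MongoDB tooling; the 256 KB default is simply the figure from the quote above.

```powershell
# Hypothetical helper for illustration only (not part of the MongoDB driver).
# Estimates how many chunk documents a file of a given size needs.
function Get-GridFSChunkCount {
    param(
        [long]$FileSizeBytes,
        [int]$ChunkSizeBytes = 256KB   # assumed default, taken from the quote above
    )
    # Round up: a partial final chunk still needs its own document
    return [long][math]::Ceiling([double]$FileSizeBytes / $ChunkSizeBytes)
}

Get-GridFSChunkCount -FileSizeBytes 1MB         # exactly four full chunks
Get-GridFSChunkCount -FileSizeBytes (1MB + 1)   # four full chunks plus one 1-byte chunk
```

On top of the chunks, each file also gets one metadata document in the files collection.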

With MongoDB you get a simple tool called mongofiles, which allows you to perform basic commands such as get, put and list. This is great if that simplicity is all you need; otherwise you have to roll your own solution. In our case we wanted to reuse some of our PowerShell scripts to create some richer tools.

With that in mind, here is the first of several posts on working with PowerShell, MongoDB and GridFS.

Load your MongoDB C# Driver
Grab it from the MongoDB CSharp Language Center


#
# Load MongoDB C# Driver
#
Add-Type -Path "D:\Kristof\mongodb\MongoDB.Bson.dll"
Add-Type -Path "D:\Kristof\mongodb\MongoDB.Driver.dll"

Find Your Files
As part of being a good citizen you need to ensure you set the correct Content Type for each file you upload to GridFS. I generally run a simple Get-Unique over a directory and use the output to build a hash table of my Content Types.


#
# Find File Types
#
$files = Get-ChildItem -Path "D:\Kristof\Uploads"
$files | Select-Object Extension | Sort-Object Extension | Get-Unique -AsString
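
If you also want to know how many files sit behind each extension, a variation using Group-Object can help. This is only a sketch: it builds a throwaway sample folder under the temp directory (the $uploadPath variable and the file names are made up for the demonstration) so it can be run anywhere without touching real data.

```powershell
# Sketch: count files per extension so no content type is overlooked.
# $uploadPath is a throwaway sample folder created only for this demonstration.
$uploadPath = Join-Path ([System.IO.Path]::GetTempPath()) ("gridfs-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $uploadPath | Out-Null
foreach ($name in "a.jpg", "b.jpg", "c.pdf", "notes.txt") {
    New-Item -ItemType File -Path (Join-Path $uploadPath $name) | Out-Null
}

# Skip directories, then group the remaining files by extension
$extensionSummary = Get-ChildItem -Path $uploadPath |
    Where-Object { -not $_.PSIsContainer } |
    Group-Object -Property Extension |
    Sort-Object -Property Name |
    Select-Object -Property Name, Count

$extensionSummary | Format-Table -AutoSize

# Clean up the sample folder
Remove-Item -Path $uploadPath -Recurse -Force
```

Pointing the Get-ChildItem call at your real upload folder instead of the sample one gives you the list of extensions that need an entry in the hash table.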

Create Content Type Hash Table
This is our hash table from the above output. Create an entry for each file extension it lists.


#
# Create Hash Table of Content Types
#
$fileType = @{
"avi" = "video/x-msvideo";
"exe" = "application/octet-stream";
"html" = "text/html";
"jpg" = "image/jpeg";
"js" = "application/x-javascript";
"mp3" = "audio/mpeg";
"pdf" = "application/pdf";
"png" = "image/png";
"txt" = "text/plain";
"xml" = "application/xml";
"zip" = "application/zip"
}
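
Looking up a content type by extension can be wrapped in a small function that copes with extensions of any length and with file types missing from the table. The sketch below is one way to do it; Get-ContentType, $sampleTypes and the "application/octet-stream" fallback are my own illustrative choices, not something the driver provides.

```powershell
# Hypothetical helper (illustration only): resolve a content type from a
# file name, falling back to a generic binary type for unknown extensions.
$sampleTypes = @{
    "html" = "text/html"
    "js"   = "application/x-javascript"
    "jpg"  = "image/jpeg"
}

function Get-ContentType {
    param(
        [string]$FileName,
        [hashtable]$TypeMap,
        [string]$Default = "application/octet-stream"   # assumed fallback
    )
    # GetExtension returns ".html"; trim the dot and lower-case it to match the keys
    $extension = [System.IO.Path]::GetExtension($FileName).TrimStart('.').ToLowerInvariant()
    if ($extension -and $TypeMap.ContainsKey($extension)) {
        return $TypeMap[$extension]
    }
    return $Default
}

Get-ContentType -FileName "index.html" -TypeMap $sampleTypes
Get-ContentType -FileName "app.JS" -TypeMap $sampleTypes
Get-ContentType -FileName "archive.7z" -TypeMap $sampleTypes
```

The explicit fallback means an unexpected file type still gets a sensible Content Type rather than an empty one.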

Upload Files to GridFS
Now that we have our hash table sorted we’re ready for the upload process to start.


#
# Connect to MongoDB
#
$db = [MongoDB.Driver.MongoDatabase]::Create('mongodb://server01/mydatabase?safe=true')

#
# Upload Files to GridFS
#
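# Skip sub-directories; @() keeps .Count reliable when only one file is found
$files = @(Get-ChildItem -Path "D:\Kristof\Uploads" | Where-Object { -not $_.PSIsContainer })
$number = $files.Count
$x = 0
foreach ($file in $files) {
    $x++

    # Strip the leading dot so the extension matches the hash table keys,
    # whatever its length ("js", "jpg", "html", ...)
    $extension = $file.Extension.TrimStart('.')
    $fileOptions = New-Object MongoDB.Driver.GridFS.MongoGridFSCreateOptions
    $fileOptions.ContentType = $fileType[$extension]

    Write-Progress -Activity "Copying file $($file.Name) ..." -Status "Copying $x of $number"

    $fs = New-Object System.IO.FileStream($file.FullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
    try {
        $db.GridFS.Upload($fs, $file.Name, $fileOptions) | Out-Null
    }
    finally {
        # Always release the file handle, even if the upload fails
        $fs.Close()
    }
}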

Simple as that. GridFS is a wonderfully simple technology to use, and when used the right way it’s a real winner.

Enjoy,

Kristof
