A step by step guide to Caffe
Caffe is a great and very widely used framework for deep learning. It offers a vast collection of out-of-the-box layers and an amazing “model zoo”; however, it’s also famous for its lack of documentation.
After playing around with it for a few days, I felt that it would be great to share what I did to get everything up and running, especially some small hacks that I had to hunt around to get. I think it could well serve as a tutorial.
Before we get started, I strongly recommend going through This Course to get a theoretical primer about convolutional neural networks. The course has a great tutorial on Caffe as well, although it’s somewhat less detailed.
Now let’s get started.
- Get a desktop with a nice GPU!
Although most deep learning platforms can run on a CPU, doing so is generally much slower. In my experience it is around 50-100 times slower; the one time I timed it, I got 160 images in 35 seconds on the CPU versus 5120 images in 26 seconds on the GPU.
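As a quick check of the arithmetic, taking the first timing as the CPU run and the second as the GPU run (the order in which they are quoted), that single measurement works out to a speedup of roughly 43 times, a little under the 50-100 range. The variable names below are only for illustration.

```python
# Throughput implied by the two timings quoted above.
cpu_images, cpu_seconds = 160, 35
gpu_images, gpu_seconds = 5120, 26

cpu_rate = cpu_images / cpu_seconds   # images per second on the CPU
gpu_rate = gpu_images / gpu_seconds   # images per second on the GPU
speedup = gpu_rate / cpu_rate

print(f"CPU: {cpu_rate:.1f} img/s, GPU: {gpu_rate:.1f} img/s, speedup: {speedup:.0f}x")
```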
Up to this point I still think building a PC with a decent GPU is the best option (more flexibility in choosing an IDE, freedom to run computations whenever you like, etc.), but if you’d rather not spend money on hardware, you can use AWS or Terminal.com instead.
I found this great blog post on GPU selection. Take a look if you are wondering which GPU to buy.
- Install Caffe
The official Caffe documentation has fairly detailed installation instructions here; it also provides a few platform-specific step-by-step tutorials (e.g. here). Currently Ubuntu is the best supported platform.
Thanks to the life-saving apt-get install command, I was able to install most of the dependencies effortlessly. The Caffe package itself, however, needs to be compiled locally, so some prior knowledge of make helps (I don’t recommend the cmake script; it caused some issues for me).
- Data Preparation
Caffe is a high-performance computing framework. To get the most out of its GPU-accelerated training, you don’t want file I/O to slow you down, which is why a database system is normally used. The most common option is lmdb.
If you don’t have experience with database systems: once you finish data preparation, lmdb will basically be a huge file on your computer. You can query any image from it using a scripting language, and reading big chunks of data from it is much faster than reading individual image files.
Caffe has a tool, convert_imageset, to help you build an lmdb from a set of images. Once you build Caffe, the binary will be under /build/tools. There’s also a bash script under /caffe/examples/imagenet that shows how to use it.
You can also check out my recent post on how to write images into lmdb using Python.
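convert_imageset takes an image root folder, a listing file with one "relative/path label" pair per line, and the name of the database to create. Below is a minimal sketch of preparing that listing file and assembling the command. The one-sub-folder-per-class layout, the function names, the 256x256 resize and the binary path relative to the Caffe root are all assumptions made for illustration.

```python
import os

def write_listing(image_root, listing_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write 'relative/path label' lines, one per image.

    Assumes image_root contains one sub-folder per class (a hypothetical layout).
    Returns the class names in label order.
    """
    classes = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d))
    )
    with open(listing_path, "w") as listing:
        for label, class_name in enumerate(classes):
            class_dir = os.path.join(image_root, class_name)
            for file_name in sorted(os.listdir(class_dir)):
                if file_name.lower().endswith(extensions):
                    listing.write(f"{class_name}/{file_name} {label}\n")
    return classes

def convert_imageset_command(image_root, listing_path, db_path, size=256):
    """Build (but do not run) the convert_imageset call as an argument list."""
    return [
        "build/tools/convert_imageset",   # assumed path, relative to the Caffe root
        f"--resize_height={size}",
        f"--resize_width={size}",
        "--shuffle",
        "--backend=lmdb",
        image_root.rstrip("/") + "/",     # the tool prepends this root to each listed path
        listing_path,
        db_path,
    ]

# Only prints the command; pass the list to subprocess.run to execute it.
print(" ".join(convert_imageset_command("images/train", "train.txt", "train_lmdb")))
```

You would typically run this twice, once for the training images and once for the validation images.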
Either way, once you are done, you’ll get two folders like below. The data.mdb files will be very large; that’s where your images went.
- Setting up the model and the solver
Caffe has a very nice abstraction that separates neural network definitions (models) from the optimizers (solvers). A model defines the structure of a neural network, while a solver defines all information about how gradient descent will be conducted.
A typical model looks like this (note that the lmdb files we generated are specified here!):
This is actually a part of AlexNet; you can find its full definition under
If you use Python, install graphviz (both the actual program via apt-get and the Python package of the same name); you can then use the script /caffe/python/draw_net.py to visualize the structure of your network and check whether you made any mistakes in the specification.
The resulting image will look like this:
Once the neural net is set up and hooked up to the lmdb files, you can write a solver.prototxt to specify the gradient descent parameters.
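To give an idea of what goes into such a file, here is a minimal sketch that renders a solver definition from Python. The field names are standard Caffe solver fields, but every value and file name below is a made-up placeholder, not the settings used in this post.

```python
# Hypothetical values for illustration only; tune them for your own data.
solver_settings = {
    "net": '"train_val.prototxt"',          # model definition (which names the lmdb sources)
    "test_iter": 100,                       # number of test batches per evaluation
    "test_interval": 1000,                  # evaluate every N training iterations
    "base_lr": 0.01,                        # initial learning rate
    "lr_policy": '"step"',                  # drop the learning rate in steps
    "gamma": 0.1,                           # multiply the learning rate by gamma ...
    "stepsize": 10000,                      # ... every stepsize iterations
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "display": 20,                          # print the training loss every N iterations
    "max_iter": 45000,
    "snapshot": 5000,                       # save a snapshot every N iterations
    "snapshot_prefix": '"snapshots/mynet"',
    "solver_mode": "GPU",
}

def render_solver(settings):
    """Render the settings as 'key: value' lines in prototxt syntax."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())

solver_text = render_solver(solver_settings)
print(solver_text)
```

Writing solver_text to a file named solver.prototxt gives you something you can hand to the caffe binary.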
- Start training
Now that our model and solver are ready, we can start training by calling the
Note that we only need to specify the solver, because the model is specified in the solver file, and the data is specified in the model file.
We can also resume from a snapshot, which is very common (imagine playing Assassin’s Creed and having to start from the beginning every time you quit the game…):
or to fine tune from a trained network:
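The three cases above (training from scratch, resuming from a snapshot, and fine-tuning) differ only in one flag of the caffe train command. Here is a minimal sketch that assembles the argument list for each case. The helper name, the binary path and the file names are assumptions for illustration; the --solver, --snapshot and --weights flags are the ones the caffe tool provides.

```python
def caffe_train_command(solver, snapshot=None, weights=None, caffe_bin="build/tools/caffe"):
    """Return the argument list for `caffe train`.

    snapshot: a .solverstate file to resume training from.
    weights:  a .caffemodel file to fine-tune from.
    Caffe accepts one or the other, so passing both is rejected here too.
    """
    if snapshot and weights:
        raise ValueError("pass either snapshot (resume) or weights (fine-tune), not both")
    command = [caffe_bin, "train", f"--solver={solver}"]
    if snapshot:
        command.append(f"--snapshot={snapshot}")
    if weights:
        command.append(f"--weights={weights}")
    return command

# Hypothetical file names, shown only to illustrate the three variants.
print(caffe_train_command("solver.prototxt"))
print(caffe_train_command("solver.prototxt", snapshot="snapshots/mynet_iter_5000.solverstate"))
print(caffe_train_command("solver.prototxt", weights="pretrained.caffemodel"))
```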
- Logging your performance
Optional: Try out NVIDIA DIGITS, a web based GUI for deep learning.
Once training starts, Caffe prints the training loss and testing accuracies at a frequency you specify. It is very useful to save that screen output to a log file so we can better visualize our progress, and that’s why we have those funky things in the code block above:
This half line of code uses a command called tee to “intercept” the data stream going from stdout to the screen and save it to a file.
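If you launch training from Python rather than from the shell, the same effect can be sketched with the standard subprocess module. The function name is hypothetical; stderr is merged into stdout because Caffe writes its log messages to stderr.

```python
import subprocess
import sys

def run_and_tee(command, log_path):
    """Run command, copying its output both to the screen and to log_path.

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr into the same stream
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)      # what you would normally see on screen
            log_file.write(line)        # the copy kept for later parsing
        return process.wait()
```

For example, run_and_tee(["build/tools/caffe", "train", "--solver=solver.prototxt"], "train.log") would train while keeping a copy of the output in train.log (the paths here are placeholders).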
Now the cool part: Caffe has a script (/caffe/tools/extra/parse_log.py) that parses log files and returns two much better formatted files.
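To make the idea concrete, here is a minimal sketch of this kind of log parsing. It is not the code of parse_log.py, only a simplified stand-in that pulls the training loss and test accuracy out of the log lines. The sample lines are made up for illustration, and it assumes the accuracy output blob is named accuracy.

```python
import re

ITERATION = re.compile(r"Iteration (\d+)")
# The optional parenthesised part covers newer logs that print timing info.
TRAIN_LOSS = re.compile(r"Iteration (\d+)(?: \([^)]*\))?, loss = ([\d.eE+-]+)")
TEST_ACCURACY = re.compile(r"Test net output #\d+: accuracy = ([\d.eE+-]+)")

def parse_caffe_log(lines):
    """Return (train, test): lists of (iteration, loss) and (iteration, accuracy)."""
    train, test = [], []
    iteration = None
    for line in lines:
        match = ITERATION.search(line)
        if match:
            iteration = int(match.group(1))   # remember the most recent iteration
        match = TRAIN_LOSS.search(line)
        if match:
            train.append((int(match.group(1)), float(match.group(2))))
        match = TEST_ACCURACY.search(line)
        if match and iteration is not None:
            test.append((iteration, float(match.group(1))))
    return train, test

# Made-up sample lines in the style of a Caffe training log.
SAMPLE_LOG = [
    "I0101 10:00:00.000000  1234 solver.cpp:337] Iteration 0, Testing net (#0)",
    "I0101 10:00:05.000000  1234 solver.cpp:404]     Test net output #0: accuracy = 0.1",
    "I0101 10:00:05.000000  1234 solver.cpp:404]     Test net output #1: loss = 2.3 (* 1 = 2.3 loss)",
    "I0101 10:00:06.000000  1234 solver.cpp:228] Iteration 0, loss = 2.31",
    "I0101 10:00:06.000000  1234 sgd_solver.cpp:106] Iteration 0, lr = 0.01",
    "I0101 10:00:14.000000  1234 solver.cpp:228] Iteration 20 (2.5 iter/s, 8s/20 iters), loss = 1.87",
]

train_points, test_points = parse_caffe_log(SAMPLE_LOG)
print(train_points)
print(test_points)
```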
And with a little trick, you can automate the parsing process and combine it with curve plotting using a script like this:
gnuplot_commands is a file that stores a set of gnuplot commands.
A sample result looks like this:
You can call the visualize_log.sh command at any time during training to check the progress. Even better, with more tweaks, we can make this plot live:
There is a lot to say about babysitting the training process, but it’s outside the scope of this post. The class notes from Stanford (here) explain it very well; take a look if you are interested.
The training process involves a search over multiple hyperparameters (as described in the solver); it’s actually quite complicated and requires a certain level of experience to get the best training results.
- Deploy your model
Finally, after all that training, we would like to use the model for actual prediction. There are multiple ways of doing so; here I will describe the Pythonic way:
You’ll need a deploy.prototxt file to perform testing, which is quite easy to create: simply remove the data layers and add an input layer like this:
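As a rough sketch, older Caffe releases declare the input with top-level input and input_dim fields, while newer ones use a layer of type Input. The snippet below renders the older form. The blob name data and the 10x3x227x227 shape are assumptions (an AlexNet-style input), and the helper name is hypothetical.

```python
def deploy_input_header(blob="data", batch=10, channels=3, height=227, width=227):
    """Render the input declaration that replaces the data layers in deploy.prototxt."""
    dims = (batch, channels, height, width)
    lines = [f'input: "{blob}"'] + [f"input_dim: {dim}" for dim in dims]
    return "\n".join(lines) + "\n"

print(deploy_input_header())
```

The shape should match what the network was trained on: the number of channels and the crop size used by the data layer.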
You can find a few examples in
- That’s It!
This post describes how I conduct Caffe training, with some details explained here and there. Hopefully it can give you a nice kickstart.
Caffe has a mixture of command line, Python and Matlab interfaces, so you can definitely create a different pipeline that works best for you. To really learn Caffe, it’s still much better to go through the examples under /caffe/examples/ and to check out the official documentation, although it’s not very complete yet.
PhD student at USC working on computer vision.