Friday, October 31, 2014

Caffe on AWS with Fedora 20, CUDA 6.5 and CUDNN

Here are the steps I ran to test out Caffe on an AWS G2 instance.  The current rate for running a g2.2xlarge is 65 cents/hour, and an EBS General Purpose SSD costs 10 cents per GB-month, so running through these commands will cost you a couple of dollars.
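For a rough sense of the bill, the arithmetic can be sketched in shell (rates as quoted above; the 3 hours of runtime is an assumed figure for one session):

```shell
# Rough cost estimate at the rates above: $0.65/hour for the instance,
# $0.10 per GB-month for the EBS volume. Assumed usage: 3 hours, 30 GB.
hours=3
gb=30
awk -v h="$hours" -v g="$gb" \
  'BEGIN { printf "instance: $%.2f, EBS (1 month): $%.2f\n", h*0.65, g*0.10 }'
# prints: instance: $1.95, EBS (1 month): $3.00
```

Note the EBS charge is pro-rated, so a volume deleted after a few hours costs far less than the monthly figure.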

1) Launch a GPU Instance with an HVM AMI.  Here are some parameters I chose:
  • Region: US East (N. Virginia)
  • AMI: Community AMIs -> Fedora -> Fedora_20_HVM_AMI
  • Instance Type: g2.2xlarge
  • Storage: 30 GB General Purpose SSD EBS volume
I encourage you to create a security group that only allows in SSH from your specific subnet.

2) Connect to your instance once it's running and you have its IP address (nn.nn.nn.nn). I use ssh from my local Linux machine with a command that looks something like:

  ssh -X -i key_filename.pem fedora@nn.nn.nn.nn
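If you reconnect often, an entry in ~/.ssh/config saves retyping the flags. A sketch (the host alias and key path are assumptions; substitute your own):

```
Host caffe-aws
    HostName nn.nn.nn.nn
    User fedora
    IdentityFile ~/.ssh/key_filename.pem
    ForwardX11 yes
```

With that in place, `ssh caffe-aws` is equivalent to the full command above.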

3) Update and install some basic packages:
  # initial security updates
  sudo yum update -y
  # gcc toolchain
  sudo yum groupinstall -y "C Development Tools and Libraries"
  # git and stuff
  sudo yum groupinstall -y "Development tools"
  # for the nvidia driver
  sudo yum install -y kernel-devel dkms
  # for lspci, locate and wget
  sudo yum install -y pciutils mlocate wget
  # basic X11
  sudo yum install -y xorg-x11-apps xorg-x11-xauth

If you want to make sure you can see the GPU device on your PCI bus, run this:
lspci | grep NVIDIA
My output was:
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)

4) Install CUDA. Grab the CUDA repository RPM for Fedora from: https://developer.nvidia.com/cuda-downloads
I copied the URL and ran this command:
    sudo rpm -Uvh http://developer.download.nvidia.com/compute/cuda/repos/fedora20/x86_64/cuda-repo-fedora20-6.5-14.x86_64.rpm

Also install the RPMFusion repositories for akmods and other good stuff:
    sudo rpm -Uvh http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-20.noarch.rpm
    sudo rpm -Uvh http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-20.noarch.rpm

Install CUDA with this command:
    sudo yum install -y cuda

Make sure the nouveau driver is blacklisted for the latest kernel, which you are about to reboot into. Edit grub.conf (sudo vi /etc/grub.conf) and make sure these parameters are added to the kernel line:
  nouveau.modeset=0 rd.driver.blacklist=nouveau

For example, mine is:
  kernel /boot/vmlinuz-3.16.6-203.fc20.x86_64 ro root=UUID=f1d4c251-e4c9-408b-a7b8-f5a9be8511fd console=hvc0 LANG=en_US.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off vga=normal
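Before rebooting, a quick grep can confirm both parameters made it onto the kernel line. A minimal sketch, run here against a sample line (with a placeholder UUID) rather than the real /etc/grub.conf:

```shell
# Sample kernel line standing in for the one in /etc/grub.conf; on the
# instance, pipe the real file into the greps instead.
line='kernel /boot/vmlinuz-3.16.6-203.fc20.x86_64 ro root=UUID=xxxx console=hvc0 nouveau.modeset=0 rd.driver.blacklist=nouveau'
if echo "$line" | grep -q 'nouveau\.modeset=0' && \
   echo "$line" | grep -q 'rd\.driver\.blacklist=nouveau'; then
    echo "nouveau is blacklisted"
else
    echo "nouveau blacklist parameters missing" >&2
fi
# prints: nouveau is blacklisted
```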

Add the CUDA libraries to your standard shared library path:
echo /usr/local/cuda/lib64 | sudo tee /etc/ld.so.conf.d/cuda-x86_64.conf

5) Now is a good time to reboot:
sudo reboot

Once you reconnect, if you want to make sure X11 forwarding works (the -X in the ssh command), run the xlogo command and an X window should pop up on your desktop.

If you want to make sure your nvidia kernel driver works, run this command:
nvidia-smi -q | head

My output was:
==============NVSMI LOG==============

Timestamp                           : Fri Oct 31 20:09:04 2014
Driver Version                      : 340.29

Attached GPUs                       : 1
GPU 0000:00:03.0
    Product Name                    : GRID K520
    Product Brand                   : Grid

It's a good idea to build the CUDA samples just to make sure they work:

cd /usr/local/cuda/samples/
sudo make

To see what Device Query returns run:
./1_Utilities/deviceQuery/deviceQuery

Mine returned:

./1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GRID K520"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  ( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
  GPU Clock rate:                                797 MHz (0.80 GHz)
  Memory Clock rate:                             2500 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 3
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GRID K520
Result = PASS

6) Install cuDNN. Download cuDNN from NVIDIA and get the file onto your AWS instance. Since you need to register as an NVIDIA developer and accept the license terms, I downloaded it on my Linux desktop and then copied it to my AWS instance with scp. The cuDNN URL is: https://developer.nvidia.com/cuDNN and once I had the file in my home directory, these are the commands I used to 'install' it:

    tar -xvzf cudnn-6.5-linux-R1.tgz
    sudo cp -var cudnn-6.5-linux-R1/libcudnn* /usr/local/cuda/lib64/
    sudo cp -var cudnn-6.5-linux-R1/cudnn.h /usr/local/cuda/include/

7) Download Caffe. Follow the instructions at: http://caffe.berkeleyvision.org/installation.html

Clone the Caffe source code from GitHub:

git clone https://github.com/BVLC/caffe.git
cd caffe

Then install a bunch of Caffe dependencies (the second group is only needed for the optional Python support):

sudo yum install -y atlas-devel bc
sudo yum install -y protobuf-devel leveldb-devel
sudo yum install -y snappy-devel opencv-devel
sudo yum install -y boost-devel hdf5-devel
sudo yum install -y gflags-devel glog-devel lmdb-devel

sudo yum install -y python-pip python-devel boost-python
sudo yum install -y gcc-gfortran
sudo yum install -y libpng-devel freetype-devel

for req in $(cat python/requirements.txt); do sudo pip install $req; done
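The loop installs each requirement independently, so one failing package doesn't abort the rest. The iteration itself can be sketched against a mock requirements file (the package names here are just examples):

```shell
# Demo of the per-package loop using a mock requirements file; on the
# instance, the loop above reads python/requirements.txt instead.
reqs=$(mktemp)
printf 'numpy\nscipy\nprotobuf\n' > "$reqs"
for req in $(cat "$reqs"); do
    echo "installing: $req"   # stand-in for: sudo pip install "$req"
done
rm -f "$reqs"
# prints:
# installing: numpy
# installing: scipy
# installing: protobuf
```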

8) Build and test Caffe. Create and edit your config:

cp Makefile.config.example Makefile.config

This is all I changed:

USE_CUDNN := 1
BLAS_LIB := /usr/lib64/atlas

Build the source and the tests, then run the tests:
make all
make test
make runtest

Test MNIST for good measure:

pushd data/mnist; ./get_mnist.sh; popd
./examples/mnist/create_mnist.sh
./examples/mnist/train_lenet.sh
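To pull the final accuracy out of the solver output, a one-liner with awk works. A sketch, assuming the "accuracy = ..." format Caffe uses in its test-phase log lines (the sample line below is illustrative, not a real capture):

```shell
# Sample solver log line in the shape Caffe prints during the test phase;
# on the instance, grep the real training output instead.
logline='I1031 21:00:00.000000  1234 solver.cpp] Test net output #0: accuracy = 0.9909'
echo "$logline" | awk -F'accuracy = ' '/accuracy/ { print $2 }'
# prints: 0.9909
```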

It took about 47 seconds to train and achieved an accuracy of 0.9909. Don't forget to shut down your instance when you are done:

sudo shutdown -h now

Thursday, August 4, 2011

Andrew Ng

The Future of Robotics and Artificial Intelligence:
http://www.youtube.com/watch?v=AY4ajbu_G3k

Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning:
http://www.youtube.com/watch?v=ZmNOAtZIgIk

Deep Learning and Unsupervised Feature Learning
http://www.stanford.edu/class/cs294a/

Thursday, February 5, 2009

Saturday, September 13, 2008

Two more Google Tech Talks

AI & Digital Media by Steve DiPaola
He talks about parametrized spaces and in particular shows a nice application to facial systems.
http://www.youtube.com/watch?v=i78P-K1RhjY&feature=related

http://www.dipaola.org/facespace/


Cognitive and Computational Neuroscience of Categorization by Mark Gluck
Discussion of the Basal Ganglia and the Hippocampus in learning.
http://www.youtube.com/watch?v=2Ei6wFJ9kCc&feature=related

Saturday, May 24, 2008

Misc courses

IIT Courses on Bayesian structures:

http://www.youtube.com/view_play_list?p=6EE0CD02910E57B8&page=3


Stanford lecture on Self-Improving Artificial Intelligence:

http://www.youtube.com/watch?v=omsuTsOmvsc&feature=related

by Stephen M. Omohundro @ http://selfawaresystems.com/

Monday, May 12, 2008

Talk on sentence trees

Modeling Human Sentence Processing:

http://www.youtube.com/watch?v=_kAWu37EDd4&feature=user

by some grad student with ICCS at the University of Edinburgh.