Friday, 21 June 2013

How to run HPL benchmark with ATLAS generated custom BLAS on ARM


For background: I got  4  64 ARM based ODROID-X2 development boards and configured them as a Beowulf-like cluster.

The next step was to test this cluster with the LINPACK benchmark. For that purpose I used HPL, a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers.

To run the HPL benchmark you need the following software on your system: MPI, plus one of the two linear algebra libraries below.

MPI - a message-passing library for distributed-memory computers.
BLAS - Basic Linear Algebra Subprograms.
or
VSIPL - Vector Signal Image Processing Library.

I used BLAS in this case.

ATLAS is a tool that automatically generates a complete, optimized BLAS library for a large variety of modern systems. (I would recommend ATLAS for generating the BLAS library.)


What is available for ARM?
ARM is a RISC architecture and does not yet enjoy as much software community support as x86 systems, but efforts are being made in this regard.

ATLAS natively supports the "softfp" ABI, where floating-point arguments are passed through the integer registers. Ubuntu has recently switched from the softfp ABI to hardfp, where floating-point arguments are instead passed through the floating-point registers, and ATLAS now supports the hardfp ABI as well.
(If things are going pretty fast, please don't panic; I will explain everything much more simply in the sections below. Please bear with me -_-)
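Before going further it helps to know which ABI your own system uses. Here is a minimal sketch, assuming a Debian or Ubuntu system where dpkg reports the port name: "armhf" is the hard-float port and "armel" the soft-float one. The helper name abi_from_arch is my own invention and is not part of ATLAS.

```shell
#!/bin/sh
# abi_from_arch: hypothetical helper that maps a Debian/Ubuntu
# architecture name to the ARM floating-point ABI it implies.
abi_from_arch() {
    case "$1" in
        armhf) echo "hardfp" ;;    # hard-float port
        armel) echo "softfp" ;;    # soft-float calling convention
        *)     echo "unknown" ;;   # anything else, e.g. an x86 desktop
    esac
}

# On the board itself, ask dpkg which port is installed.
if command -v dpkg >/dev/null 2>&1; then
    abi_from_arch "$(dpkg --print-architecture)"
fi
```

If this prints "hardfp", you need the ARMHARDFP files described in step 2.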

Things to do first:
From now on I will be putting the necessary steps here. I will try to be as specific as I can be.

1. Installing MPI
Currently two flavours of MPI are widely available: openmpi and mpich. In this installation I will stick with openmpi (with mpich, some settings in the HPL make file change).

NOTE: I am assuming that you already have the gcc/g++ compiler, version 4.6 or above, installed.

If you are not familiar with installing openmpi, try to apt-get these packages:

libopenmpi-dev openmpi-bin openmpi-doc

I hope you will have no problem installing openmpi. (If you do, try Google.)
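The install step can be sketched as below. The package names are the ones listed above; sudo and apt-get are assumed to be available, and the function names install_openmpi and check_mpi_tools are only illustrative.

```shell
#!/bin/sh
# Package names taken from the list above.
MPI_PKGS="libopenmpi-dev openmpi-bin openmpi-doc"

# install_openmpi: hypothetical wrapper; needs sudo rights and network access.
install_openmpi() {
    sudo apt-get update &&
    sudo apt-get install -y $MPI_PKGS   # unquoted on purpose: one word per package
}

# check_mpi_tools: confirm the compiler wrapper and the launcher are on PATH.
check_mpi_tools() {
    for tool in mpicc mpirun; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; return 1; }
    done
    echo "MPI tools found"
}

# Usage on the board:
#   install_openmpi && check_mpi_tools
```

Nothing is installed until you call install_openmpi yourself, so it is safe to paste the functions into a shell first and read them.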


2. Downloading ATLAS and hardfp ABIs

Download the latest ATLAS release and untar it:
$ tar -xjf atlas*.tar.bz2

Download the ARM-specific hardfp ABI files from http://math-atlas.sourceforge.net/errata.html#armhardfp (section: Installing ATLAS on a HARDFP ARM)
and untar them too (this will create a directory named "ARMHARDFP"; keep it).
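If you want both archives unpacked into known places, something like the following sketch may help. The helper unpack_into and the archive and directory names in the usage comment are placeholders of mine, not the real download names, and tar is assumed to detect the compression by itself (GNU tar does).

```shell
#!/bin/sh
# unpack_into: hypothetical helper that extracts an archive into a
# destination directory, creating the directory if needed.
unpack_into() {
    archive=$1
    dest=$2
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Usage (file names are placeholders for whatever you downloaded):
#   unpack_into atlas-release.tar.bz2  "$HOME/bench/atlas"
#   unpack_into hardfp-files.tar.bz2   "$HOME/bench/atlas/arm"
#   ls "$HOME/bench/atlas/arm"         # should list ARMHARDFP
```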

3. Compiling ATLAS

In the ATLAS top-level directory, create a folder for your machine-specific build, e.g.

$ mkdir buildDir
$ cd buildDir

NOTE: before compiling ATLAS, you have to turn off CPU throttling, because ATLAS tunes itself by timing kernels and throttling makes those timings unreliable. The ATLAS installation notes explain what CPU throttling is and how to turn it off.
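One common way to turn throttling off on Linux is to switch every core to the "performance" cpufreq governor through sysfs. This is only a sketch under that assumption: your kernel must expose cpufreq, you need root, and set_governor and show_governors are helper names I made up. The sysfs root is a parameter so the functions can be tried on a scratch directory first.

```shell
#!/bin/sh
# set_governor: hypothetical helper that writes the given governor name
# into every per-core scaling_governor file under the sysfs root.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue        # no cpufreq support: nothing to do
        echo "$gov" > "$f" || return 1
    done
}

# show_governors: print the current governor of each core.
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null
}

# Usage, as root on the board:
#   set_governor performance && show_governors
```

The setting does not survive a reboot, so repeat it before every ATLAS build.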


Then, from inside buildDir, run ATLAS's configure script:
$ ../configure [flags]
where the flags depend on your system (see install.txt in ATLAS for details).

For a hardfp ARM system, use the flags given at
http://math-atlas.sourceforge.net/errata.html#armhardfp

$ ../configure -D c -DATL_ARM_HARDFP=1 -Ss ADdir <path-to-arm-hardfp-abis>/ARMHARDFP

[EDIT] The full command I used was:
$ ../configure --prefix=/mnt/nfs/jahanzeb/bench/atlas/original/ATLAS/buildDir -Si archdef 0 -Fa al -mtune=cortex-a9 -D c -DATL_ARM_HARDFP=1 -Ss ADdir /mnt/nfs/jahanzeb/bench/atlas/arm/ARMHARDFP --cc=/usr/bin/gcc --nof77 -m 1700
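A wrong ARMHARDFP path is an easy mistake here, so the sketch below checks the directory before printing the configure line. The flags are exactly the minimal ones shown above; the helper name atlas_configure_cmd is hypothetical, and turning the path into an absolute one is my own choice (the full command above also uses an absolute path), not something ATLAS is known to require.

```shell
#!/bin/sh
# atlas_configure_cmd: hypothetical helper that prints the minimal hardfp
# configure command for a given ARMHARDFP directory.
atlas_configure_cmd() {
    addir=$1
    [ -d "$addir" ] || { echo "ARMHARDFP directory not found: $addir" >&2; return 1; }
    abs=$(cd "$addir" && pwd)          # resolve to an absolute path
    echo "../configure -D c -DATL_ARM_HARDFP=1 -Ss ADdir $abs"
}

# Usage from inside buildDir (the relative path is only an example):
#   atlas_configure_cmd ../../arm/ARMHARDFP
# Review the printed line, then run it.
```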


Then, to compile, just type make in the same buildDir:
$ make

It will take several minutes to finish. After it finishes successfully, you can run some sanity checks and timings. These steps are optional:
$ make check
$ make ptcheck
$ make time

Finally, install the libraries:
$ make install
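The HPL make file further down links against libcblas.a and libatlas.a from buildDir/lib, so it is worth confirming that both exist before moving on. A minimal sketch, with check_atlas_libs being a helper name of my own:

```shell
#!/bin/sh
# check_atlas_libs: hypothetical helper that reports whether the two
# static libraries HPL will link against are present in a directory.
check_atlas_libs() {
    libdir=$1
    missing=0
    for lib in libatlas.a libcblas.a; do
        if [ -f "$libdir/$lib" ]; then
            echo "ok: $lib"
        else
            echo "missing: $lib"
            missing=1
        fi
    done
    return $missing                    # non-zero if anything is missing
}

# Usage from inside buildDir:
#   check_atlas_libs ./lib
```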

4. Download and install HPL (High Performance LINPACK)

Get the latest HPL package (I used hpl-2.1) and untar it to some directory.

Next, you have to create a file named Make.<arch>. Since I am using an ARMv7-A based Cortex-A9 processor, I will create a file named "Make.armv7-a". (You can name the file Make.<anything> to suit your architecture, as long as <anything> matches the ARCH variable set inside the file.)

The next steps are very important, so please be careful.

You can find template make files for several architectures in hpl-top-level-dir/setup, but if you try to modify any of those you may get into trouble. To reduce your hassle, I am including a sample make file for ARM. The parts you need to edit for your environment are HOME/TOPdir (where HPL lives), MPdir, MPinc and MPlib (your MPI installation), LAdir and LAlib (the ATLAS libraries built above), and CC and LINKER (your mpicc).

Steps to do:

In the HPL top-level directory:

(If you are using the provided Make.armv7-a because you have the same board architecture, just skip the following command.)
$ touch Make.armv7-a


Here is my sample Make.armv7-a file. Note that, as posted, the mpich lines are the active ones; for openmpi, comment them out and uncomment the corresponding openmpi lines for MPdir, MPinc, MPlib, CC and LINKER.


#  
#  -- High Performance Computing Linpack Benchmark (HPL)                
#     HPL - 2.1 - October 26, 2012                          
#     Antoine P. Petitet                                                
#     University of Tennessee, Knoxville                                
#     Innovative Computing Laboratory                                 
#     (C) Copyright 2000-2008 All Rights Reserved                       
#                                                                       
#  -- Copyright notice and Licensing terms:                             
#                                                                       
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:                                                             
#                                                                       
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.        
#                                                                       
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution. 
#                                                                       
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:                 
#  This  product  includes  software  developed  at  the  University  of
#  Tennessee, Knoxville, Innovative Computing Laboratory.             
#                                                                       
#  4. The name of the  University,  the name of the  Laboratory,  or the
#  names  of  its  contributors  may  not  be used to endorse or promote
#  products  derived   from   this  software  without  specific  written
#  permission.                                                          
#                                                                       
#  -- Disclaimer:                                                       
#                                                                       
#  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
#  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
# ######################################################################
#  
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = armv7-a
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#

HOME = /mnt/nfs/jahanzeb/bench/hpl/
TOPdir       = $(HOME)/hpl-2.1
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a 
#
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#

#for openmpi
#MPdir         = /usr/lib/openmpi
#MPinc         = -I$(MPdir)/include
##MPlib         = $(MPdir)/lib/libmpi.a
#MPlib         = -L$(MPdir)/lib
##MPlib         = $(MPdir)/lib/libmpi.so


#FOR CUSTOM OPENMPI
#MPdir         = /mnt/nfs/install/openmpi-install
#MPinc         = -I$(MPdir)/include
#MPlib         = -L$(MPdir)/lib


#For mpich
#MPdir         = /mnt/nfs/install/mpich-3.0.4
MPdir         = /mnt/nfs/install/mpich-install
MPinc         = -I$(MPdir)/include
#MPlib         = -L$(MPdir)/lib
MPlib         = $(MPdir)/lib/libmpich.a


#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#

# Default BLAS comes with ubuntu 12
#LAdir        = /usr/local/atlas/lib
#LAinc        =
#LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

# ATLAS Generated BLAS
LAdir        = /mnt/nfs/jahanzeb/bench/atlas/original/ATLAS2/buildDir/lib
LAinc        = 
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a


#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)

#for mpich
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib) 
#HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib) -lmpl

#for openmpi
#HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)

#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the Fortran 77 BLAS interface
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS

# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
#CC           = /usr/bin/gcc

#CC           = /usr/bin/mpicc
#CC           = /mnt/nfs/install/openmpi-install/bin/mpicc
CC           = /mnt/nfs/install/mpich-install/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
#CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CCFLAGS      = $(HPL_DEFS) $(CFLAGS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall

#
#LINKER       = /usr/bin/gcc
#LINKER       = /usr/bin/mpicc

LINKER       = /mnt/nfs/install/mpich-install/bin/mpicc
#LINKER        = /mnt/nfs/install/openmpi-install/bin/mpicc
LINKFLAGS    = $(CCFLAGS) 
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
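With the make file in place, HPL is built from its top-level directory with make arch=<arch>, which puts the xhpl binary and a default HPL.dat into bin/<arch>. The suffix of the Make.<arch> file has to match the ARCH value inside it, so the sketch below checks that first. The helper names hpl_arch_of and check_hpl_makefile are hypothetical, and the process count in the usage comment is an assumption that must equal P x Q in your HPL.dat.

```shell
#!/bin/sh
# hpl_arch_of: print the value assigned to ARCH in a Make.<arch> file.
hpl_arch_of() {
    sed -n 's/^ARCH[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# check_hpl_makefile: hypothetical helper that compares the file name
# suffix with the ARCH value set inside the file.
check_hpl_makefile() {
    file=$1
    base=${file##*/}                   # strip any leading directories
    suffix=${base#Make.}               # what comes after "Make."
    arch=$(hpl_arch_of "$file")
    if [ "$suffix" = "$arch" ]; then
        echo "ok: make arch=$arch"
    else
        echo "mismatch: file suffix '$suffix' but ARCH = '$arch'"
        return 1
    fi
}

# Usage from the HPL top-level directory:
#   check_hpl_makefile Make.armv7-a && make arch=armv7-a
#   cd bin/armv7-a && mpirun -np 4 ./xhpl   # 4 processes is an assumption
```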