%\VignetteIndexEntry{blockcluster tutorial}
%\VignetteKeywords{Rcpp, C++, STK++, Co-Clustering, letent block model}
%\VignettePackage{blockcluster}

\documentclass[a4paper,11pt]{article}
\usepackage[utf8]{inputenc}

\usepackage{amsfonts,amstext,amsmath,amssymb}

\usepackage{graphicx}

\usepackage{rotating}
\usepackage{multirow}

%\usepackage[toc,page]{appendix}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{decorations.markings}
%\usepackage{array}

\usepackage{color}
\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}
% une couleur par auteur lors de la redaction
\newcommand{\cb}[1]{{\color{blue}{{\bf CB:} #1}}}
\newcommand{\fl}[1]{{\color{red}{{\bf FL:} #1}}}

\usepackage{fullpage}
\usepackage{hyperref}

\usepackage{listings}
\lstset{
    language=R,
    basicstyle=\footnotesize\ttfamily,
    keywordstyle=\color{blue},
    commentstyle=\ttfamily\color{dkgreen},
    backgroundcolor=\color{white},
    stringstyle=\color{orange},
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2,
    captionpos=b,
    breaklines=true,
    breakatwhitespace=false,
    title=\lstname,
    escapeinside={},
    keywordstyle={},
    morekeywords={}
    }


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Diverses commandes

\newcommand{\R}{\mathbb{R}}
\newcommand{\Rd}{\mathbb{R}^d}
\newcommand{\Xd}{\mathbb{X}^d}

% Diverses commandes
\newcommand{\calA}{{\cal A}}
\newcommand{\calB}{{\cal B}}
\newcommand{\calE}{{\cal E}}
\newcommand{\calF}{{\cal F}}
\newcommand{\calM}{{\cal M}}
\newcommand{\calN}{{\cal N}}
\newcommand{\calP}{{\cal P}}
\newcommand{\calT}{{\cal T}}
\newcommand{\calU}{{\cal U}}
\newcommand{\calW}{{\cal W}}
\newcommand{\calX}{{\cal X}}
\newcommand{\calZ}{{\cal Z}}


% Lettre ou chiffe en gras
\newcommand{\un}{\mathbf{1}}
\newcommand{\ba}{\mathbf{a}}
\newcommand{\bb}{\mathbf{b}}
\newcommand{\bc}{\mathbf{c}}
\newcommand{\bd}{\mathbf{d}}
\newcommand{\bg}{\mathbf{g}}
\newcommand{\bh}{\mathbf{h}}
\newcommand{\be}{\mathbf{e}}
\newcommand{\bL}{\mathbf{L}}
\newcommand{\bn}{\mathbf{n}}
\newcommand{\bp}{\mathbf{p}}
\newcommand{\bq}{\mathbf{q}}
\newcommand{\br}{\mathbf{r}}
\newcommand{\bs}{\mathbf{s}}
\newcommand{\bt}{\mathbf{t}}
\newcommand{\bu}{\mathbf{u}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bW}{\mathbf{W}}
\newcommand{\bw}{\mathbf{w}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bsx}{\boldsymbol{x}}
\newcommand{\bX}{\mathbf{X}}
\newcommand{\bsX}{\boldsymbol{X}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bsy}{\boldsymbol{y}}
\newcommand{\bY}{\mathbf{Y}}
\newcommand{\bsY}{\boldsymbol{Y}}
\newcommand{\bZ}{\mathbf{Z}}
\newcommand{\bz}{\mathbf{z}}

% Lettre grecque en gras % requiert \usepackage{amsbsy}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\bchi}{\boldsymbol{\chi}}
\newcommand{\bdelta}{\boldsymbol{\delta}}
\newcommand{\bDelta}{\boldsymbol{\Delta}}
\newcommand{\bepsilon}{\boldsymbol{\epsilon}}
\newcommand{\bGamma}{\boldsymbol{\Gamma}}
\newcommand{\bgamma}{\boldsymbol{\gamma}}
\newcommand{\blambda}{\boldsymbol{\lambda}}
\newcommand{\bkappa}{\boldsymbol{\kappa}}
\newcommand{\bmu}{\boldsymbol{\mu}}
\newcommand{\bnu}{\boldsymbol{\nu}}
\newcommand{\bpi}{\boldsymbol{\pi}}
\newcommand{\bphi}{\boldsymbol{\phi}}
\newcommand{\brho}{\boldsymbol{\rho}}
\newcommand{\bsigma}{\boldsymbol{\sigma}}
\newcommand{\bSigma}{\boldsymbol{\Sigma}}
\newcommand{\btheta}{\boldsymbol{\theta}}
\newcommand{\bTheta}{\boldsymbol{\Theta}}
\newcommand{\bvarepsilon}{\boldsymbol{\varepsilon}}
\newcommand{\bxi}{\boldsymbol{\xi}}


% Lettre verticale en gras  % requiert \usepackage{amsbsy}
\newcommand{\ve}[1]{{\boldsymbol{#1}}}
\newcommand{\x}{\boldsymbol{x}}
\newcommand{\X}{\boldsymbol{X}}
\newcommand{\z}{\boldsymbol{z}}
\newcommand{\Z}{\boldsymbol{Z}}


\newcommand{\argmin}{\mathop{\mathrm{argmin}}}
\newcommand{\argmax}{\mathop{\mathrm{argmax}}}
\newcommand{\trace}{\mathop{\mathrm{Tr}}}
\newcommand{\diag}{\mathop{\mathrm{diag}}}
\newcommand{\mix}{\textsc{mixmod}}
\newcommand{\II}{1 \! \! 1}
\newcommand{\IR}{\mathbb{R}}
\newcommand{\IZ}{\mathbb{Z}}
\newcommand{\IN}{\mathbb{N}}
\newcommand{\IE}{\mathbb{E}}
\newcommand{\mat}[4]{\begin{array}{cc}#1 & #2 \\#3 & #4 \end{array}}
\newcommand{\matb}[4]{\begin{array}{cc}{\bf #1} & {\bf #2} \\{\bf #3} & {\bf #4} \end{array}}
\newcommand{\med}{\mathrm{med}}
\newcommand{\tr}{\mbox{trace}}
\newcommand{\tra}[1]{\mbox{tr}{\bf #1}}
\newcommand{\var}{\mbox{var}}

\newtheorem{exmp}{Example}

<<prelim,echo=FALSE,print=FALSE>>=
library(blockcluster)
bc.version <- packageDescription("blockcluster")$Version
bc.date <- packageDescription("blockcluster")$Date
@

%opening
\title{A tutorial for {\bf blockcluster} R package\\Version 4}
\author{Parmeet Singh Bhatia\footnote{Siemens, bhatia.parmeet@gmail.com},
Serge Iovleff \footnote{INRIA-Lille, serge.iovleff@inria.fr}}
\date{\today}


\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle
\tableofcontents
\begin{abstract}
{\bf blockcluster} is a newly developed {\bf R} package for co-clustering of
binary, contingency, continuous and categorical data. The core library is
written in {\bf C++} and {\bf blockcluster} API acts as a bridge between {\bf
C++} core library and {\bf R} statistical computing environment. The package is
based on recently proposed \cite{Govaert2003}, \cite{govaert2008binary},
\cite{govaert2010contingency} latent block models for simultaneous clustering
of rows and columns. This tutorial is based on the package version 4.
\end{abstract}

\section{Introduction}
Cluster analysis is an important tool in a variety of scientific areas such as
pattern recognition, information retrieval, micro-array, data mining, and so
forth. Although many clustering procedures such as hierarchical clustering,
$k$-means or self-organizing maps, aim to construct an optimal partition of
objects or, sometimes, of variables, there are other methods, called block
clustering methods, which consider simultaneously the two sets and organize    
the data into homogeneous blocks. Let $\bx$ denotes a $n\times d$ data matrix
defined by $\bx = \{(x_{ij} ); i \in I \mbox{ and } j \in J\}$, where $I$ is a
set of $n$ objects (rows, observations, cases etc) and $J$ is a set of $d$
variables (columns, attributes etc). The basic idea of these methods consists
in making permutations of objects and variables in order to draw a
correspondence structure on $I \times J$.
\begin{figure}[h!]
\centering
\includegraphics[width = 100mm]{figs/binarysamplecocluster.png}
\caption{Binary data set (a), data reorganized by a partition on $I$ (b), by
partitions on $I$ and $J$ simultaneously (c) and summary matrix (d).}
\label{fig:samplecoclust}
\end{figure}
For illustration, consider Figure~\ref{fig:samplecoclust} where a binary data
set defined on set of $n=10$ individuals $I = {A,B,C,D,E,F,G,H,I,J}$ and set of
$d=7$ binary variables $J = {1,2,3,4,5,6,7}$ is re-organized into a set of
$3\times3$ clusters by permuting the rows and columns.

Owing to ever increasing importance of Co-clustering in variety of scientific
areas, we have recently developed a R package for the same called {\bf blockcluster}.
The R package {\bf blockcluster} allows to estimate the parameters of the
co-clustering models [\cite{Govaert2003}] for binary, contingency, continuous
and categorical data. This package  is unique from the point of view of
generative models it implements (latent block models), the used algorithms
(BEM, BCEM) and, apart from that, special attention has been given to design
the library for handling very huge data sets in reasonable time. The R package
is already available on CRAN at \url{https://CRAN.R-project.org/package=blockcluster}.

This aim of this tutorial is to elaborate the usage of R package {\bf
blockcluster} and to familiarize its users with its various capabilities. The
rest of the article is organized as follows. Section \ref{sec:usingblockcluster}
gives various details of the package as well as demonstrate it's usage on
simulated binary data-set. Section \ref{sec:applications} provides two examples
with real data-sets.

\section{Package details}
\label{sec:usingblockcluster}
This package contains two main functions namely {\bf cocluster} and
{\bf coclusterStrategy} to perform co-clustering and to set various input
parameters respectively. The convenient functions {\bf coclusterBinary}, {\bf
coclusterCategorical}, {\bf coclusterContingency} and {\bf coclusterContinuous}
are specialized versions of the {\bf cocluster} function. The package also
contains two helper functions namely {\bf summary} and {\bf plot} to get the
summary of estimated model parameters and to plot the results respectively. We
will first go through the details of two main functions. The helper functions
are self-explanatory and I will use them in various examples for better
understanding.

\subsection{cocluster function}
Up to version 3, this is the main function of {\bf blockcluster} package
that performs Co-clustering for binary, categorical, contingency and continuous
data. The prototype of the function is as follows:

\begin{lstlisting}
cocluster( data, datatype, semisupervised = FALSE
         , rowlabels = numeric(0), collabels = numeric(0)
         , model = NULL, nbcocluster, strategy = coclusterStrategy())
\end{lstlisting}

The various inputs of {\bf cocluster} functions are as follows:
\begin{itemize}
\item {\bf data:} Input data as matrix (or list containing data matrix, numeric vector for row effects and numeric  
vector column effects in case of contingency data with known row and column effects.)
\item {\bf datatype:} This is the type of data which can be "binary",
"categorical", "continuous" or "contingency".
\item {\bf semisupervised:} Boolean value specifying whether to perform
semi-supervised co-clustering or not. Make sure to provide row and/or column
labels if specified value is true. The default value is false.
\item {\bf rowlabels:} Vector specifying the class of rows. The class number
starts from zero. Provide -1 for unknown row class.
\item {\bf collabels:} Vector specifying the class of columns. The class number
starts from zero. Provide -1 for unknown column class.
\item {\bf model:} This is the name of model. The various models that are
available in package are given in tables~\ref{tab:binarymodeltypes},
\ref{tab:categoricalmodeltypes}, \ref{tab:continuousmodeltypes} and
\ref{tab:contingencymodeltypes}.
\item {\bf nbcocluster:} Integer vector specifying the number of row and column
clusters respectively.
\item {\bf strategy:} This input can be used to control various input parameters. It
can be created using the function {\bf coclusterStrategy} as explained in
Section~\ref{sec:coclusterStrategy}.
\item {\bf nbCore:} This input can be used to control the number of threads to
use. Put 0 for all availables cores. Default is 1.

\end{itemize}
The only mandatory inputs to the function {\bf cocluster} are {\bf data},
{\bf datatype} and {\bf nbcocluster}. The default model for each data-type is
the most general model with free row and column proportions and unequal
dispersion/variance for each block. Furthermore we have default set of input
parameters which works well in most cases which are explained in further
details in Section~\ref{sec:coclusterStrategy}. The package also comes with OpenMP
support (If supported by your Operating system and R). You need to set the number of
threads in you environment ({\bf nbCore}).

\subsubsection{The coclusterBinary function}
The \verb+coclusterBinary+ function is a specialization of the \verb+cocluster+
function for binary data. The prototype of the function is as follows:
\begin{lstlisting}
# cocluster for binary data
coclusterBinary( data, semisupervised = FALSE
   , rowlabels = numeric(0), collabels = numeric(0)
   , model = NULL, nbcocluster, strategy = coclusterStrategy()
   , a=1, b=1, nbCore=1)
\end{lstlisting}
This function has two
additional parameters $a$ and $b$ corresponding to the bayesian form of the
likelihood function. The default value correspond to the case "no prior".
The available binary models are given in the table~\ref{tab:binarymodeltypes}.
\begin{table}[h!]
\centering
 \begin{tabular}{|c|c|c|c|c|}
\hline
     Model  & Datatype & Proportions & Dispersion/Variance\\
\hline
     {\bf pik\_rhol\_epsilonkl} & binary & unequal & unequal\\
     {\bf pik\_rhol\_epsilon} & binary & unequal & equal\\
     {\bf pi\_rho\_epsilonkl} & binary & equal & unequal\\
     {\bf pi\_rho\_epsilon} & binary & equal & equal\\
\hline
 \end{tabular}
\caption{Binary models available in package {\bf blockcluster}.}
\label{tab:binarymodeltypes}
\end{table}


\subsubsection{The coclusterCategorical function}
The \verb+coclusterCategorical+ function is a specialization of the \verb+cocluster+
function for categorical data. The prototype of the function is as follows:
\begin{lstlisting}
# cocluster for categorical data
coclusterCategorical( data, semisupervised = FALSE
   , rowlabels = numeric(0), collabels = numeric(0)
   , model = NULL, nbcocluster, strategy = coclusterStrategy()
   , a=1, b=1, nbCore=1)
\end{lstlisting}
This function has two additional parameters $a$ and $b$ corresponding to the
bayesian form of the likelihood function. The default value correspond to the
case "no prior". The availables categorical models are given in the
table~\ref{tab:categoricalmodeltypes}.
\begin{table}[h!]
\centering
 \begin{tabular}{|c|c|c|c|}
\hline
     Model & Datatype & Proportions & Dispersion/Variance\\
\hline
 {\bf pik\_rhol\_multi} & categorical & unequal & N.A \\
 {\bf pi\_rho\_multi}   & categorical & equal   & N.A \\
\hline
 \end{tabular}
\caption{Categorical models available in package {\bf blockcluster}.}
\label{tab:categoricalmodeltypes}
\end{table}

\subsubsection{The coclusterContinuous function}
The \verb+coclusterContinuous+ function is a specialization of the \verb+cocluster+
function for continuous data. The prototype of the function is as follows:
\begin{lstlisting}
# cocluster for continuous data (Gaussian models)
coclusterContinuous( data, semisupervised = FALSE
   , rowlabels = numeric(0), collabels = numeric(0)
   , model = NULL, nbcocluster, strategy = coclusterStrategy(), nbCore=1)
\end{lstlisting}
The availables continuous models are given in the table~\ref{tab:continuousmodeltypes}.
\begin{table}[h!]
\centering
 \begin{tabular}{|c|c|c|c|}
\hline
     Model      & Datatype & Proportions & Dispersion/Variance\\
\hline
 {\bf pik\_rhol\_sigma2kl} & continuous & unequal & unequal\\
 {\bf pik\_rhol\_sigma 2}  & continuous & unequal & equal \\
 {\bf pi\_rho\_sigma2kl}   & continuous & equal & unequal \\
 {\bf pi\_rho\_sigma2}     & continuous & equal & equal \\
\hline
 \end{tabular}
\caption{Continuous models available in package {\bf blockcluster}.}
\label{tab:continuousmodeltypes}
\end{table}

\subsubsection{The coclusterContingency function}
The \verb+coclusterContingency+ function is a specialization of the \verb+cocluster+
function for contingency data. The prototype of the function is as follows:
\begin{lstlisting}
# cocluster for contingency data (Poisson models)
coclusterContingency( data, semisupervised = FALSE
   , rowlabels = numeric(0), collabels = numeric(0)
   , model = NULL, nbcocluster, strategy = coclusterStrategy())
\end{lstlisting}
The availables contingency models are given in the table~\ref{tab:contingencymodeltypes}.
\begin{table}[h!]
\centering
 \begin{tabular}{|c|c|c|c|}
\hline
     Model  & Datatype & Proportions & Dispersion/Variance\\
\hline
     {\bf pik\_rhol\_unknown} & contingency & unequal & N.A\\
     {\bf pi\_rho\_unknown} & contingency & equal & N.A \\
     {\bf pik\_rhol\_known} & contingency & unequal & N.A \\
     {\bf pi\_rho\_known} & contingency & equal & N.A \\
\hline
 \end{tabular}
\caption{Contingency models available in package {\bf blockcluster}.}
\label{tab:contingencymodeltypes}
\end{table}

\subsection{coclusterStrategy function}
\label{sec:coclusterStrategy}
In the package {\bf blockcluster}, we have a function called {\bf
coclusterStrategy} which can be used to set the values of various input
parameters. The prototype of the function is as follows:

\begin{lstlisting}
coclusterStrategy( algo = "BEM", initmethod = "emInitStep"
                 , stopcriteria = "Parameter", semisupervised = FALSE
                 , nbinitmax = 100, nbiterationsxem = 50, nbiterationsXEM = 500
                 , nbinititerations = 10, initepsilon = 0.01
                 , nbiterations_int = 5, epsilon_int = 0.01
                 , epsilonxem = 1e-04, epsilonXEM = 1e-10, nbtry = 2
                 , nbxem = 5)
\end{lstlisting}

In the following example, we call the function {\bf coclusterStrategy} without
any arguments and then we called the overloaded function {\bf summary} to see
default values of various input parameters.
<< >>=
defaultstrategy <- coclusterStrategy()
summary(defaultstrategy)
@

To set these input parameters, we have to
pass appropriate arguments to function {\bf coclusterStrategy} as shown in
example below where we set {\bf nbtry}, {\bf nbxem} and {\bf algo} parameters.
<< >>=
newstrategy <- coclusterStrategy(nbtry=5, nbxem=10, algo='BCEM')
@
The {\bf newstrategy} object can then be passed to function {\bf cocluster} to
perform Co-clustering using the newly set input parameters. The various input
arguments for the function {\bf coclusterStrategy} are as follows:
\begin{itemize}
\item {\bf algo:} The valid values for this parameter are "BEM" (Default),
"BCEM", "BSEM" and "BGibbs" (only for Binary model) which are respectively
Block EM, Block Classification EM, Block Stochastic EM algorithms and Gibbs
sampling.
\item {\bf stopcriteria:} It specifies the stopping criteria. It can be based
on either relative change in parameters value (preferred) or relative change in
log-likelihood. Valid criterion values are "Parameter" and "Likelihood".
Default criteria is "Parameter".
\item{\bf initmethod:} Method to initialize model parameters. The valid values
are "cemInitStep", "emInitStep" and "randomInit".
\item{\bf nbinititerations:} Number of Global iterations used in initialization
step. Default value is 10.
\item{\bf initepsilon:} Tolerance value used inside initialization. Default
value is 1e-2.
\item{\bf nbiterations\_int:} Number of iterations for internal E step. Default
value is 5.
\item{\bf epsilon\_int:} Tolerance value for relative change in
Parameter/likelihood for internal E-step. Default value is 1e-2.
\item{\bf nbtry:} Number of tries (XEM steps). Default value is 2.
\item{\bf nbxem:} Number of xem steps. Default value is 5.
\item{\bf nbiterationsxem:} Number of EM iterations used during xem step. Default value is 50.
\item{\bf nbiterationsXEM:} Number of EM iterations used during XEM step. Default value is 500.
\item{\bf epsilonxem:} Tolerance value used during xem step. Default value is 1e-4.
\item{\bf epsilonXEM:} Tolerance value used during XEM step. Default value is 1e-10.
\end{itemize}
To understand many of the above input parameters, we need to have some basic idea about
the algorithms and the way they run inside the package {\bf blockcluster}, which is why there is a separate dedicated 
section ~\ref{sec:inputarguments} for the same.

\subsubsection{Understanding various input parameters}
\label{sec:inputarguments}
You might be wondering why there are so many types of iterations and tolerances inside the package. Well, to get some basic understanding about various input parameters, it is important to know a bit about the algorithms. We will not go through full fledged theory of these algorithms here but will provide enough details to make you understand the meaning of all the input parameters. From now on everything will be explained using BEM but it is applicable in same way to BCEM as well as to BSEM/BGibbs algorithm. The BEM algorithm can be defined as follows in laymen language.
\begin{enumerate}
 \item Run EM algorithm on rows. 
\item Run EM algorithm on columns.
\item Iterate between above two steps until convergence.
\end{enumerate}
The following strategy is employed to run various algorithms.
\begin{enumerate}
\item Run the BEM Algorithm for {\bf 'nbxem'} number of times (with high
tolerance and low number of iterations) and keep the best model parameters
(based on likelihood) among these runs. We call this step {\bf 'xem'} step.
\item Starting with the  best model parameters, run the algorithm again but
this time with a low value of epsilon (low tolerance) and a high number of
iterations. We call this step {\bf 'XEM'} step.
\item Repeat above two steps for {\bf 'nbtry'} number of times and keep the
best model estimation.
\end{enumerate}
With this background, the various input parameters are explained as follows. 
\begin{itemize}
 \item {\bf nbxem, nbtry:} As explained above these numbers represents the
 number of time we run {\bf 'xem'} step and {\bf 'xem'+'XEM'} step respectively.
The tuning of the values of {\bf 'nbxem'} and {\bf 'nbtry'} 
need to be done intuitively, and could have a substantial effect on final
results. A good way to set these values is to run co-clustering few number of
times and check if final log-likelihood is stable. If not, one may need to
increase these values. In practice, it is better to increment {\bf 'nbxem'} as
it could lead to better (stable) results without compromising too much the
running time.
\item {\bf nbiterationsxem, nbiterationsXEM:} These are number of iterations
for BEM algorithm i.e the number of times we run EM on rows and EM on columns.
As the name suggests, they are respectively for {\bf 'xem'} and {\bf 'XEM'} steps.
\item {\bf nbiterations\_int:} This is the number of iterations for EM
algorithm on rows/columns.
\item {\bf epsilonxem, epsilonXEM:} These are tolerance values for BEM
algorithm during {\bf 'xem'} and {\bf 'XEM'} step respectively.
\item {\bf epsilon\_int:} This is the tolerance value for EM algorithm on
rows/columns.
\item {\bf initepsilon, nbinititerations:} These are the tolerance value and
number of iterations respectively used during initialization of model parameters.
\end{itemize}

\subsection{Model Parameters}
When {\bf summary} function is called on the output {\bf cocluster} fuction, it
gives the estimated values of various model parameters. The parameters that are
common among all the models are row and column mixing proportions. The model
parameter for various data-types are as follows.

\subsubsection{Binary Models}
The parameters $\balpha$ of the underlying distribution of a binary data set is
given by the matrix $\bp=(p_{k\ell})$ where $p_{k\ell} \in ]0,1[$ $\forall$
$k=1,\ldots,g$ and $\ell=1,\ldots,m$ and the probability distribution
$f_{k\ell}(x_{ij};\bp)=f(x_{ij};p_{k\ell})$ is the Bernoulli distribution
$$f(x_{ij};p_{k\ell})=(p_{k\ell})^{x_{ij}}(1-p_{k\ell})^{1-x_{ij}}.$$
we re-parameterize the model density as follows:
$$f_{k\ell}(x_{ij};\balpha)=(\varepsilon_{kj})^{|x_{ij}-a_{k\ell}|}
(1-\varepsilon_{kj})^{1-|x_{ij}-a_{k\ell}|}$$
where
$$\left\{
  \begin{array}{ll}
    a_{k\ell}=0, \; \varepsilon_{k\ell}=p_{k\ell} &\mbox{ if } p_{k\ell}<0.5\\
    a_{k\ell}=1, \; \varepsilon_{k\ell}=1-p_{k\ell} &\mbox{ if } p_{k\ell}>0.5.
  \end{array}
\right.
$$

Hence the parameters $p_{k\ell}$ of the Bernoulli mixture model are replaced by
the following parameters:
\begin{itemize}
\item The binary value $a_{k\ell}$, which acts as the center of the block $k,\ell$ and which
gives, for each block, the most frequent binary value,
\item The value $\varepsilon_{k\ell}$ belonging to the set $]0,1/2[$ that characterizes the dispersion of the block $k,\ell$
and which is, for each block, represents the probability of having a different value than the center.
\end{itemize}

\subsubsection{Categorical Models}
The idea behind categorical models is simple extension of binary models for
more than 2 modalities. Hence instead of Bernoulli distribution, we used
Multinomial (categorical) distribution. Hence the model parameters for each
block $k,l$ are $\balpha_{k\ell}=(\balpha_{k\ell}^{h})_{h=1,..r}$ and
$\sum_h{\balpha_{k\ell}^{h}}=1$ where $r$ is the number of modalities.

\subsubsection{Continuous Models}
In this case, the continuous data is modeled using unidimensional normal
distribution. Hence the density for each block is given by:
$$f_{k\ell}(x_{ij};\balpha)= \frac{1}{\sqrt{2\pi\sigma_{k\ell}^2}}\exp-\{\frac{1}{2\sigma_{k\ell}^2}(x_{ij}-\mu_{k\ell})^2\}$$
The parameters of the model are $\balpha=(\balpha_{11},\ldots,\balpha_{gm})$
where $\balpha_{k\ell}=(\mu_{k\ell},\sigma^2_{k\ell})$ i.e the mean and
variance of block $k,l$.

\subsubsection{Contingency Models}
In this case, it is assumed that for each block
$k,\ell$, the values $x_{ij}$ are distributed according to Poisson distribution
$\mathcal{P}(\mu_i \nu_j \gamma_{k\ell})$ where the Poisson parameter is split into $\mu_i$ and
$\nu_j$ the effects of the row $i$ and the column $j$ respectively and $\gamma_{k\ell}$ the effect of the
block $k\ell$. Then, we have
$$f_{k\ell}(x_{ij};\balpha)=
\frac{e ^{-\mu_i \nu_j \gamma_{k\ell}} (\mu_i \nu_j \gamma_{k\ell})^{x_{ij}}}{x_{ij}!}$$
where $\balpha=(\bmu,\bnu,\bgamma)$ with $\bmu=(\mu_1,\ldots,\mu_n)$,
$\bnu=(\bnu_1,\ldots,\nu_d)$ and $\bgamma=(\gamma_{11},\ldots,\gamma_{gm})$.
The row and column effects are either provided by the user for models {\bf
pik\_rhol\_known}  and {\bf pi\_rho\_known}  or estimated by the package itself
for models {\bf pik\_rhol\_unknown} and {\bf pi\_rho\_unknown}.

\subsection{Example using simulated Binary dataset}
The various parameters used to simulate this binary data-set are given in
Table~\ref{tab:binarytableparam}. The class mean and dispersion are
respectively represented by $\ba$ and $\bepsilon$ whereas $\bpi$ and $\brho$
represents row and column proportions respectively. The data consist of $1000$
rows (samples) and $100$ columns (variables) with two clusters on rows and
three clusters on columns. The following R commands shows how to load the
library, process the data and visualize/summarize results using {\bf blockcluster}.
\begin{table}[h!]
\begin{minipage}{.59\linewidth}
 \flushright
\begin{tabular}{|c|c|c|c|}
\hline
\multirow{2}{1cm}{$\ba$, $\bepsilon$}&0, 0.1&0, 0.3&1, 0.1\\
\cline{2-4}
&1, 0.3&1, 0.2&0, 0.1\\
\hline
\end{tabular}
\end{minipage}
\begin{minipage}{.39\linewidth}
 \flushleft
\begin{tabular}{|c|c|c|}
 \hline
$\bpi$&.6&.4\\
\hline
\end{tabular} \\
\begin{tabular}{|c|c|c|c|}
\hline
$\brho$&.3&.3&.4\\
\hline
\end{tabular}
\end{minipage}
\caption{Parameters for simulation of binary data.}
\label{tab:binarytableparam}
\end{table}

<< >>=
library(blockcluster)
data("binarydata")
out<-coclusterBinary(binarydata, nbcocluster=c(2,3))
summary(out)
@

Note that you also get the explicit Integrated Complete Likelihood (ICL) value 
in case of
binary and categorical models, and asymptotic value otherwise. 
This value can be used for model selection. The
following {\bf R} command is used to plot the original and co-clustered data
(Figure~\ref{fig:binarydata}(a)) with default value of {\bf asp} which is $0$
(FALSE). When {\bf asp} is FALSE, R graphics will optimize the output figure
for the display, hence the original aspect ratio may not be conserved. To
conserve the original aspect ratio, set the value of {\bf asp} as $1$ or TRUE.
<<fig=FALSE>>=
plot(out, asp = 0)
@

To Plot various block distributions (Figure~\ref{fig:binarydata}(b)), the 
following {\bf R} command is used with
{\bf type} argument of overloaded {\bf plot} function set to
'distribution' ({\bf type} is 'cocluster' by default which plots the original
and Co-clustered
data as shown in (Figure~\ref{fig:binarydata}(a))).
<<fig=FALSE>>=
plot(out, type = 'distribution')
@

\begin{figure}[h!]
\begin{minipage}{.4\linewidth}
 \includegraphics[width = 55mm,height=80mm]{figs/coclustbinary1.jpg}\\
\centering(a)
\end{minipage}
\begin{minipage}{.59\linewidth}
 \includegraphics[width = 75mm,height=80mm]{figs/distributionbinary.jpeg}\\
\centering(b)
\end{minipage}
\caption{Original and co-clustered binary data (a), and distributions for each
 block along with various mixture densities (b).}
\label{fig:binarydata}
\end{figure}



\section{Examples with real datasets}
\label{sec:applications}
This section  demonstrates the applicability of package on real data.
Two examples are used: one for Image 
segmentation and other for document (co-)clustering. 

\subsection{Image segmentation}

Automatic image segmentation is an important technique and have numerous
application especially in fields of Medical imaging. Here I present an
interesting application of co-clustering (as pre-processing step) for segmenting object(s) in image. I assume that the object pixels follows Gaussian distribution.
Hence I run the {\bf blockcluster} package with Gaussian Family model {\bf pik\_rhol\_sigma2kl}  on image shown in Figure~\ref{fig:coclustersnake}.
It can be clearly seen that the image got nicely segmented into snake and insect in two different blocks. 

\begin{figure}[h!]
\centering
 \includegraphics[scale=.25]{figs/coclustersnake.jpg}
\caption{Original and co-clustered (segmented) image.}
\label{fig:coclustersnake}
\end{figure}
 % and in particular Gaussian latent block model used in our software.

\subsection{Document clustering}

Document clustering is yet another data mining technique where co-clustering seems to be very useful. Here we run our
package on one of the datasets being used in \cite{Dhillon01} which is publicly available at \url{ftp://ftp.cs.cornell.edu/pub/smart}. We mix Medline 
(1033 medical abstracts)
and Cranfield (1398 aeronautical abstracts) making a total of 2431 documents. Furthermore, we used all the words (excluding stop words) as features making
a total of $9275$ unique words. The data matrix consist of words on the rows and documents on the columns with each entry giving
the term frequency, that is the number of occurrences of corresponding word in corresponding document. I assume that the term frequency
follows Poisson distribution. Hence we can apply the model {\bf pik\_rhol\_unknown}  available in our package for contingency (Poisson Family) 
datasets with unknown row and column effects. Table~\ref{tab:confusionmatrix} shows the confusion
 matrix
and compare our results with classical bipartite spectral graph partitioning algorithm of [\cite{Dhillon01}] where we have obtained $100$ percent correct classification.
Figure~\ref{fig:coclusterdocument} depicts 
the $2\times2$ checkerboard pattern in the data matrix, hence confirming the more frequent occurrence of
particular set of words in one document and vice-versa. Please note that the data matrix images are extremely sparse (data points almost invisible) 
and have been processed using simple image processing tools for visualization purpose only.

\begin{table}[h!]
\begin{minipage}{.49\linewidth}
 \centering
\begin{tabular}{c|c|c|}
\cline{2-3}
&Medline&Cranfield\\
\hline
\multicolumn{1}{|c|}{Medline}&1026&0\\
\hline
\multicolumn{1}{|c|}{Cranfield}&7&1400\\
\hline
\end{tabular}\\
\vspace{.1cm}
(a)
\end{minipage}
\begin{minipage}{.49\linewidth}
 \centering
\begin{tabular}{c|c|c|}
\cline{2-3}
&Medline&Cranfield\\
\hline
\multicolumn{1}{|c|}{Medline}&1033&0\\
\hline
\multicolumn{1}{|c|}{Cranfield}&0&1398\\
\hline
\end{tabular}\\
\vspace{.1cm}
(b)
\end{minipage}
\caption{Confusion Matrix: Results reported in \cite{Dhillon01} (a), and Results using {\bf blockcluster} (b). The difference in number
of Cranfield documents is because we made use of the already available data extracted from the documents and there are two less documents data in the same.} 
\label{tab:confusionmatrix}
\end{table}


\begin{figure}[h!]
\begin{minipage}{.49\linewidth}
\centering
\begin{sideways}Unique Words \end{sideways}
\begin{tikzpicture}
\draw[black,
    decoration={markings,mark=at position 1 with {\arrow[scale=2,black]{>}}},
    postaction={decorate},
    shorten >=0.4pt
    ]
    (1.0,0) -- (1.0,4);
\end{tikzpicture}
\hspace{.2cm}
 \includegraphics[width = 25mm,height=60mm]{figs/orgdata.jpg}\\
\begin{tikzpicture}
\draw[black,
    decoration={markings,mark=at position 1 with {\arrow[scale=2,black]{>}}},
    postaction={decorate},
    shorten >=0.4pt
    ]
    (1.0,5) -- (3.0,5);
\end{tikzpicture}\\
Documents\\
\vspace{.1cm}
(a)
\end{minipage}
\begin{minipage}{.49\linewidth}
\centering
\begin{sideways}Unique Words \end{sideways}
\begin{tikzpicture}
\draw[black,
    decoration={markings,mark=at position 1 with {\arrow[scale=2,black]{>}}},
    postaction={decorate},
    shorten >=0.4pt
    ]
    (1.0,0) -- (1.0,4);
\end{tikzpicture}
\hspace{.2cm}
 \includegraphics[width = 25mm,height=60mm]{figs/coclustdata.jpg}\\
\begin{tikzpicture}
\draw[black,
    decoration={markings,mark=at position 1 with {\arrow[scale=2,black]{>}}},
    postaction={decorate},
    shorten >=0.4pt
    ]
    (0,5) -- (2,5);
\end{tikzpicture}\\
Documents\\
\vspace{.1cm}
(b)
\end{minipage}
\caption{Original data matrix with words on rows and documents on columns (a), and checkerboard pattern in words by documents matrix obtained after performing
co-clustering (b).}
\label{fig:coclusterdocument}
\end{figure}

\section{Remarks}
This tutorial gives a brief introduction about the {\bf blockcluster} R package. It demonstrates the use of package using Binary data-set but the package can be used in similar fashion for other types of data namely {\bf Contingency}, {\bf Continuous} and {\bf Categorical}.
Please note that this tutorial is based on version 4.% If you have any
%questions, suggestions or remarks, do not hesitate to put it on public forum at
%\url{https://gforge.inria.fr/forum/forum.php?forum_id=11190&group_id=3679}.


\bibliographystyle{plain}
\bibliography{biblio}

\appendix

\section{Examples}
In this appendix, we present the main functions allowing to launch the various
implemented model in {\bf blockcluster}.

\subsection{Example with simulated binary dataset}

<<fig=FALSE>>=
data(binarydata)
out<-coclusterBinary(binarydata,nbcocluster=c(3,2), model="pik_rhol_epsilon")
summary(out)
plot(out)
@

\subsection{Example with simulated categorical dataset}

<<fig=FALSE>>=
data(categoricaldata)
out<-coclusterCategorical(categoricaldata,nbcocluster=c(3,2))
summary(out)
@

\subsection{Examples with simulated Poisson dataset}

<<fig=FALSE>>=
data(contingencydataunknown)
out<-coclusterContingency( contingencydataunknown, nbcocluster=c(2,3))
summary(out)
@

Contingency models using known row/column effects
<<fig=FALSE>>=
data(contingencydataunknown)
mui= rep(1,nrow(contingencydataunknown)) 
nuj= rep(1,ncol(contingencydataunknown)) 
out<-coclusterContingency( list(contingencydataunknown, mui, nuj)
                         , nbcocluster=c(2,3), model="pik_rhol_known")
summary(out)
@

\subsection{Examples with simulated Gaussian dataset}

<<fig=FALSE>>=
data(gaussiandata)
out<-coclusterContinuous(gaussiandata,nbcocluster=c(2,3))
summary(out)
@

Gaussian model using common variance
<<fig=FALSE>>=
data(gaussiandata)
out<-coclusterContinuous(gaussiandata,nbcocluster=c(2,3), model="pik_rhol_sigma2")
summary(out)
@

\end{document}
