%\VignetteIndexEntry{Summary Tables}
%\VignetteDepends{GeneticsBase}
%\VignetteKeywords{Genetics}
%\VignettePackage{GeneticsBase}

\documentclass[10pt]{article}
\usepackage{url}
\usepackage{amsmath}
\usepackage{natbib}
\usepackage{isorot}
\usepackage{fullpage} % standard 1 inch margins 

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\email}[1]{\texttt{#1}}

%%% Hyperlinks for ``PDF Latex'' :
\ifx\pdfoutput\undefined%%--- usual ``latex'' :
  %% Stuff w/out hyperref
\else%%---------------------- `` pdflatex '' : -- still gives funny errors
  \RequirePackage{hyperref}
  %% The following is R's share/texmf/hyperref.cfg :
  %% Stuff __with__ hyperref :
  \hypersetup{%
    %default: hyperindex,%
    colorlinks,%
    %default: pagebackref,%
    linktocpage,%
    %%plainpages=false,%
    linkcolor=Green,%
    citecolor=Blue,%
    urlcolor=Red,%
    pdfstartview=Fit,%
    pdfview={XYZ null null null}%
    }
  \RequirePackage{color}
  \definecolor{Blue}{rgb}{0,0,0.8}
  \definecolor{Green}{rgb}{0.1,0.75,0.1}
  \definecolor{Red}{rgb}{0.7,0,0}
  %% ESS JCGS v2 :
  %%\hypersetup{backref,colorlinks=true,pagebackref=true,hyperindex=true}
  %%\hypersetup{backref,colorlinks=false,pagebackref=true,hyperindex=true}
\fi


\usepackage{Sweave}
\begin{document}

\title{Generating Marker Summary Reports Using the \Rpackage{GeneticsBase}}
\author{
        Gregory Warnes\\
        \email{gregory\_warnes\@urmc.rochester.edu},
        \\
        Nitin Jain\\
        \email{nitin.jain\@pfizer.com}
        }       

\maketitle

\section{Introduction}

This document demonstrates how to use the \Rpackage{GeneticsBase}
(version 0.6.0) package to generate marker summary tables \emph{for
  studies with a small number of markers}.  It is written as a
step-by-step tutorial. For additional details on each of the R
functions utilized, please see the individual help pages

\textbf{Note: The textual displays described here are not suitable
for large numbers of markers.  They are intended for reviewing
detailed information on a small number of markers, such as those in
candidate gene studies, or a small set of markers achieving a 'quality'
or 'significance' cutoff from a larger set.}



\section{Example}

\subsection{Prepare phenotype data \label{data}}

The first step is to prepare the phenotype data.  It may be in the
form of a SAS dataset, SAS export file, comma-delimited text file
(CSV), tab-delimited text file (TSV), or Microsoft Excel spreadsheet
file (XLS).  It should have one row per observation and one column
per variable, and must contain a subject identifier variable that
can be used to match observations with the corresponding genotype
data.

\subsection{Prepare genotype data \label{data}}

You also need to store the genetic call data in a file that can be
read into R.  GeneticsBase packge accepts genotype data in a variety
of formats:

\begin{itemize}

\item standard pedigree (ped) format. 
  
\begin{verbatim}
a2m apoe
    50103    5010004    5090005    5090004          2          2          1
      2          3          4
    50103    5010005    5090005    5090004          2          2          1
      1          3          4
    50105    5010049    5090021    5090022          2          2          1
      1          4          4
    50105    5010070    5090020    5090019          1          2          1
      1          3          4

\end{verbatim}


\item hapmap format : The hapmap .ped format is a variant of the
  standard pedigree format.  A portion of the first two lines of
  the hapmap file for chrmosome 1 are shown below:

\begin{verbatim}
rs2298011 rs1320571 rs11721 rs4018608 rs6685064 rs604618 ....
1347 14 0 0 1 1 1 1 3 3 3 3 1 1 2 2 3 3 3 3 2 2 3 3 2  .....
\end{verbatim}

\item Pfizer format: First few lines of an example file in Pfizer's
  data format are shown below:

\begin{verbatim}
Locus,Gene Marker,Locus Start,Project,Protocol,Sample ID,Donor ID,Genotype
A1A,C1556G,-1243,P234,103,1028022.1,1028022,G/G
A1B,T127A,20141,P234,103,1028022.1,1028022,A/T
A1B,T5094A,102358,P234,103,1028022.1,1028022,A/T
A1A,C1556G,-1243,P234,103,1035130.1,1035130,G/G
\end{verbatim}

\item Perlegen format: A portion of first two lines of data in the
  Perlegen format are shown below:

\begin{verbatim}
snp_id  genotype
753527  rrrrrrrrrrrrrrrrhrrrrrrrrrrrrrrrrrrrrrrrrrr......
752848  hhahaararhhhahrhhhahaharhaahhahrhhahhahnhha......
\end{verbatim}


\end{itemize}

\subsection{Load the genotype and phenotype data}

In GeneticsBase, both genotype and phentype data is loaded by a single
function \emph{readgenes}. This function \emph{readGenes} has four
primary arguments: \emph{gfile}, \emph{gfromat}, \emph{file}, and
\emph{pformat}. Arguments \emph{gfile} and \emph{pfile} are the names
of files containing the genotype and phenotype data (respectively), and
arguments \emph{gformat} and \emph{pformat} are the corresponding file
formats for the genotype and phenotype data.


Various types of data can be loaded by readGenes function.  Example
files are provided in \code{data} subdirectory of the
\code{GeneticsBase} package.  To access these execute

<<1,eval=FALSE,print=TRUE>>=
library(GeneticsBase)
setwd(file.path(.path.package("GeneticsBase"), "data"))
@

\noindent%
Supported file formats include:

\begin{itemize}

\item fbat .ped format 

  The Alzheimer's example dataset is stored in the Fbat variant of the
  .ped Pedigree Format. As it does not include phenotype data, we only
  use the genotyep filename and file type arguments:

<<2,echo=TRUE,eval=FALSE>>=
ALZH <- readGenes(gfile="ALZH.ped", gformat="fbat.ped")
@

The CAMP example dataset is from from the `Childhood Asthma Management
Program (CAMP)' and incudes both genotype and phenotype
information. It can be loaded by:

<<3,echo=TRUE,eval=FALSE>>=
CAMP <- readGenes(gfile="CAMP.ped", gformat="fbat.ped",
                  pfile="CAMPZ.phe", pformat="fbat.phe")
@

\item HAPMAP .ped format

  A subset of the data for the International HapMap project is
  available in the hapmap example data set.  This file can be loaded
  via:

<<4,echo=TRUE,eval=FALSE>>=
hapmapchr1 <- readGenes(gfile="hapmapchr1.ped", gformat="hapmap")
@

\item Pfizer genetics data format

<<5,echo=TRUE,eval=FALSE>>=
PfizerExample <- readGenes.pfizer("PfizerExample.txt", format="Listing")
@

\item Perlegen data format

<<6,echo=TRUE,eval=FALSE>>=
PerlegenExample <- readGenes("PerlegenExample.txt", gformat="perlegen")
@

\end{itemize}

\subsection{Reviewing the data}

For the purpose of this example, we are going to use CAMP data set,
which can be loaded manually as shown in the previous sub-section, or via

<<7,echo=TRUE,eval=TRUE>>=
library(GeneticsBase)
data(CAMP)
@

Now you can see a brief summary of the data that was loaded by
simply entering the name of the object on a line by itself:

%% Invisible in text
<<8,echo=FALSE, eval=TRUE>>=
upper <- function(x)
{
  tmp <- capture.output(print(x))
  cat(paste(head(tmp, n=15), collapse="\n"))
  cat("\n")
  cat("\n...\n")
}
@

<<9,echo=TRUE>>=
CAMP
@ 
\begin{verbatim}
Warning messages:
1: geneSet Object has 121 observations.  Only first and last 6 displayed 
in: .local(object, ...)
 \end{verbatim}

The phenotype data can be extracted from the CAMP data object using the
\code{sampleInfo} command:

<<10,echo=TRUE,eval=FALSE>>=
pdata <- sampleInfo(CAMP)
summary(pdata)
@ 

<<11,echo=FALSE,eval=TRUE>>=
pdata <- sampleInfo(CAMP)
upper(summary(pdata))
@ 


\subsection{Generate the tables}

We can generate a variety of summary tables on our genetics data.

\begin{itemize}
\item Allele information

<<12,echo=TRUE,eval=FALSE,fig=FALSE>>=
alleleSummary(CAMP)
@ 

<<13,echo=FALSE,eval=TRUE>>=
upper(alleleSummary(CAMP))
@ 

\item Genotype information

<<14,echo=TRUE,eval=FALSE>>=
genotypeSummary(CAMP)
@ 

<<15,echo=FALSE,eval=TRUE>>=
upper(genotypeSummary(CAMP))
@ 

\item Marker information

<<16,echo=FALSE,eval=TRUE>>=
# get descriptive statistics for markers
#desMarkers(CAMP, founderOnly=T)
@ 

\item Linkage disequilibrium, text view

<<17,echo=TRUE,eval=FALSE>>=
ld <- LD(CAMP)
ld
@ 

\begin{small}
<<18,echo=FALSE,eval=TRUE>>=
ld <- LD(CAMP)
upper(ld)
@ 
\end{small}

\item Linkage disequilibrium, matrix plot

<<19,echo=TRUE,fig=TRUE>>=
plot(ld)
@ 

\clearpage

\item Linkage disequilibrium, graphical view using LDView

<<20,echo=TRUE,fig=TRUE>>=
# visualize r^2 matrix
r2<-t(ld@"R^2")
diag(r2)<-1
LDView(r2, labRow=markerNames(CAMP))
@ 

\end{itemize}

\clearpage

\section{Generating tables for inclusion in reports}

To make it simple to include the summary tables in written reports,
they can be written to files in a variety of formats, including plain
text, html, and LaTeX.

\subsection{Plain text files}

<<21>>=
aS <- alleleSummary(CAMP)
txt(aS, file="CAMP_alleleSummary.txt")
@


\subsection{LaTex files}

<<22,echo=TRUE,eval=FALSE>>=
aS <- alleleSummary(CAMP)
latex(aS)
@ 

<<23,echo=FALSE,eval=TRUE,results=tex>>=
aS <- alleleSummary(CAMP)
latex(aS, floating=FALSE)

@ 

\clearpage

%\begin{sideways}
%\begin{minipage}{\textheight}
\begin{landscape}

<<24,echo=TRUE,eval=FALSE>>=
gs <- genotypeSummary(CAMP[-2,])
latex(gS) # drop ALR2 SSLP so table fits on page
@ 


\begin{small}
<<25,echo=FALSE,eval=TRUE,results=tex>>=
gS <- genotypeSummary(CAMP[-2,])
latex(gS, floating=FALSE) # drop ALR2 SSLP so table fits on page
@ 
\end{small}

\end{landscape}
%\end{minipage}
%\end{sideways}

\clearpage

<<26,echo=TRUE,eval=FALSE>>=
ld <- LD(CAMP)
latex(ld)
@ 

\begin{tiny}
<<27,echo=FALSE,eval=TRUE,results=tex>>=
ld <- LD(CAMP)
latex(ld, floating=FALSE)
@ 
\end{tiny}

\clearpage

<<28,echo=TRUE,eval=TRUE,fig=TRUE>>=
plot(ld)
@ 


\subsection{HTML files}

\subsection{Graphics files}

 As usual, plots can be generated in any
format R supports.


We can also output everything all at once to a set of files, encoded as plain text (format="print"), html (format="html"), or LaTeX (format="latex"):
<<29,echo=TRUE,eval=TRUE>>=
PGtables(CAMP, filename="CAMP", sep="_", format="html")
@ 
which creates a set of html and a PDF files in the current directory.

\begin{figure}[!ht]
\begin{center}
\caption{HTML allele summary table}
\vspace{1em}
\includegraphics[height=0.9\textheight]{CAMP_alleleSummary_html.pdf}
\end{center}
\end{figure}
 
\begin{figure}[!ht]
\begin{center}
\caption{HTML genotype summary table}
\vspace{1em}
\includegraphics[height=0.9\textheight]{CAMP_genotypeSummary_html.pdf}
\end{center}
\end{figure}


\begin{figure}[!ht]
\begin{center}
\caption{HTML linkage disequilibrium table}
\vspace{1em}
\includegraphics[height=0.9\textheight]{CAMP_LD_html.pdf}
\end{center}
\end{figure}

\begin{figure}[!ht]
\begin{center}
\caption{Linkage disequilibrium plot}
\vspace{1em}
\includegraphics[height=0.9\textheight]{CAMP_LD.pdf}
\end{center}
\end{figure}

\clearpage


\section{Subsetting by Group}

The \code{alleleSummary} and \code{genotypeSummary} functions also
allow you to create tables which show the summary information
separated out by a grouping variable, which must be discrete
``factor'' variables (such as Sex).

To accomplish this, add the argument \code{by=Sex} to the function
call. For example:

<<30,echo=TRUE,eval=TRUE>>=
alleleSummary(CAMP, by='sex') 
@ 

This will display a table within a separate block within each marker
for each level of the variable \code{Sex}.

To control whether the summary table for entire data in addition to
individual factor levels, add \code{includeOverall=TRUE} or
\code{includeOverall=FALSE} (the default) as appropriate.


\appendix

\section{Example R script\label{script}}

<<31,echo=TRUE,eval=FALSE>>=
library(GeneticsBase)

data(CAMP)

## HTML Output
PGtables(CAMP, filename="test", format="html")

## LaTex Output
PGtables(CAMP, filename="test", format="latex")
@ 

\clearpage

\bibliographystyle{biometrics}
\begin{thebibliography}{}


\item [~\hspace{0.125in}~Warnes~GR.]{ ``The Genetics
    Package: Utilities for handling genetic data'' \code{R News},
    Volume 3, Issue 1, June 2003.}

\item [~\hspace{0.125in}~Warnes~GR.] ``genetics'', a package for
  handling marker-based genetic data within the open-source
  statistical package ``R''.  The package includes function to compute
  allele frequencies, use genetic markers in statistical models,
  estimate disequilibrium, and test for departure from Hardy-Weinberg
  equilibrium. \\
  \verb+http://cran.us.r-project.org/src/contrib/PACKAGES.html#genetics+, 
  2002-

\end{thebibliography}
\end{document}
