To generate images using GenomePixelizer you need to collect
the data and set up three input files. The name of the first file
should be "GenPix_RunSetup.txt" (since this file is read by
default at startup). In the "GenPix_RunSetup.txt" file
you specify the names of all the other files you are using as well
as the information about your data set: number of chromosomes,
size of chromosomes, the coordinates of genes (e.g. physical
location of genes on chromosomes), the features of genes (e.g.
different protein domains) coded by different colors, and the
distance matrix file containing similarity/identity data
of your set of genes. GenomePixelizer will use your initial
data files to generate highly customizable images on the fly.
In the GenPix_RunSetup.txt file you need to define:
1. name of the input file containing gene coordinates
2. name of the distance matrix file
3. number of chromosomes
4. size of chromosomes in megabases or in centiMorgans
5. identity upper level (higher degree of similarity)
6. identity lower level (lower degree of similarity)
7. window size (pixels) X (horizontal size of your image)
8. window size (pixels) Y (vertical size of your image)
and other optional parameters.
Click on the corresponding link to see
examples of GenomePixelizer input files.
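The eight required settings listed above can be checked for consistency before a run. The following is only a sketch: the class name, field names and checks are hypothetical, and it does not read or reproduce the real GenPix_RunSetup.txt layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSetup:
    """Hypothetical container for the eight required settings (not the real file format)."""
    gene_coords_file: str          # 1. input file with gene coordinates
    dist_matrix_file: str          # 2. distance matrix file
    n_chromosomes: int             # 3. number of chromosomes
    chromosome_sizes: List[float]  # 4. sizes in megabases or centiMorgans
    identity_upper: float          # 5. identity upper level (percent)
    identity_lower: float          # 6. identity lower level (percent)
    window_x: int                  # 7. horizontal window size in pixels
    window_y: int                  # 8. vertical window size in pixels

    def problems(self) -> List[str]:
        """Return a list of human-readable consistency problems (empty if none)."""
        found = []
        if self.n_chromosomes != len(self.chromosome_sizes):
            found.append("number of chromosomes does not match the list of sizes")
        if any(size <= 0 for size in self.chromosome_sizes):
            found.append("chromosome sizes must be positive")
        if not 0 <= self.identity_lower <= self.identity_upper <= 100:
            found.append("identity levels must satisfy 0 <= lower <= upper <= 100")
        if self.window_x <= 0 or self.window_y <= 0:
            found.append("window size must be positive")
        return found
```

An empty list from problems() means the values are at least mutually consistent; it says nothing about whether the named files exist.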
How it works:
First, GenomePixelizer reads from the "GenPix_RunSetup.txt" file
the size of the window to be generated and the sizes of the
chromosomes to be placed in it. Then it creates the window and
draws the chromosomes according to their specified sizes.
Afterwards, GenomePixelizer reads the "GenPix_GeneCoords" input
file, where the gene ID, gene coordinates, and gene features are
defined by the user, and places genes either below or above the
chromosomes.
By default, positions of the genes above or below the chromosomes
correspond to Watson/Crick orientation; however, the user may assign
any other information state to this characteristic.
The color scheme is highly flexible
and customizable and may reflect any gene features defined by
the user. In the current version, up to 15 colors are possible.
At the same time the program generates a separate file,
"html_image_map_coord.txt", with HTML Image Map tags that can be
used to create Web pages with clickable images. Finally, the
program reads the "GenPix_DistMatrix" file, with specified gene IDs
and their percentage of similarity, and draws lines between
homologous genes whose similarity falls within the upper and lower levels
defined by the user.
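To make the drawing step more concrete, here is one way chromosome sizes and gene coordinates could be turned into horizontal pixel positions. This is an assumption for illustration only: GenomePixelizer's real layout code is not described in this tutorial, and the function names, the linear scaling and the 50-pixel margin are all hypothetical.

```python
from typing import List


def chromosome_pixel_lengths(sizes: List[float], window_x: int, margin: int = 50) -> List[float]:
    """Scale chromosome sizes linearly so the longest one fills the drawable width."""
    drawable = window_x - 2 * margin  # leave a margin on both sides (assumed value)
    longest = max(sizes)
    return [drawable * size / longest for size in sizes]


def gene_x(position: float, chrom_size: float, chrom_pixels: float, margin: int = 50) -> float:
    """Horizontal pixel coordinate of a gene located at `position` on its chromosome."""
    return margin + chrom_pixels * position / chrom_size
```

With this scheme a gene in the middle of a chromosome always lands in the middle of the drawn chromosome line, whatever the window size.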
Generating HTML ImageMap coordinates
GenomePixelizer generates a "html_image_map_coord.txt" file which can be used to
create clickable HTML images on Web pages. To get everything working
properly you need to take a screenshot using any suitable software
(in Windows, just press ALT+PrintScreen; or save your image as a PostScript file
and convert it to PNG or GIF format using, for example, Corel PhotoPaint).
The area for a proper screenshot is
indicated by gray lines on the canvas window. The HTML prefix is defined by you in the
GenPix_RunSetup file; its format could be, for example,
html://whatever.you.want/. You need to copy-paste the contents of the generated
"html_image_map_coord.txt" file into the source of your HTML page between the map tags:
<map name="genopix_map">
... COPY-PASTE "html_image_map_coord.txt" content here ...
</map>
<img src="your_screenshot.gif" width="X" height="Y" USEMAP="#genopix_map">
Note that you will find the proper values for "width" and "height" on
the last line of the "html_image_map_coord.txt" file.
If the size of the generated image is too large for a single screenshot, you
can take two or more of them and then combine everything together
using any drawing program.
Click here to see an example of a generated html_image_map_coord.txt file.
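If you prefer to assemble the page fragment with a script instead of copy-pasting by hand, a sketch along these lines may help. The function name and its parameters are hypothetical, and it assumes the generated file contains ready-made image map tags; the width and height still have to be taken from the last line of that file.

```python
def build_image_map_html(area_tags: str, image_file: str, width: int, height: int,
                         map_name: str = "genopix_map") -> str:
    """Wrap image-map tags in a <map> element and attach it to the screenshot."""
    return (
        f'<map name="{map_name}">\n'
        f"{area_tags.strip()}\n"       # contents of html_image_map_coord.txt
        "</map>\n"
        f'<img src="{image_file}" width="{width}" height="{height}" usemap="#{map_name}">\n'
    )
```

The map name in the usemap attribute must match the name of the map element, which is why both are filled from the same argument.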
For proper placement of genes above or below the chromosomes, the
"GenPix_Input" file should be preformatted using any
Excel-like editor or a custom Perl script.
Sort your tab-delimited table in the following order:
1. by the fourth column (W/C features)
2. by the first column (chromosome number)
3. by the third column (position on chromosome)
Click here to see how the sorted
columns should look.
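The three-level sort can also be done with a short script instead of a spreadsheet. The sketch below assumes a tab-delimited table without a header row, an integer chromosome number in the first column and a numeric position in the third; the W/C column is sorted as plain text in ascending order, which is an assumption, so reverse it if your data need the other order.

```python
import csv
import io


def sort_gene_table(text: str) -> str:
    """Sort rows by column 4 (W/C), then column 1 (chromosome), then column 3 (position)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    # Convert chromosome and position to numbers so that 10 sorts after 9.
    rows.sort(key=lambda r: (r[3], int(r[0]), float(r[2])))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To use it on a file, read the file contents into a string, pass it to sort_gene_table, and write the result back out.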
When genes are placed too close to each other, GenomePixelizer
automatically puts them on different levels above (Watson) or
below (Crick) the preceding genes. If gene density is too high,
the positions of your genes may exceed canvas parameters. There
are two ways to overcome this problem:
1. Try to increase canvas parameters (width, horizontal size) until
you are satisfied with the gene locations on the image.
2. You may adjust the generated file "GenPix_Whatever_Input.wc",
and re-run GenomePixelizer in a manual correction mode instead of
automatic correction mode. This is advisable for experienced users only.
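The idea of stacking crowded genes on successive levels can be illustrated with a simple greedy rule. This is not GenomePixelizer's actual algorithm, which is not documented here; the function name and the min_gap parameter are hypothetical.

```python
from typing import List


def assign_levels(positions: List[float], min_gap: float) -> List[int]:
    """Give each gene the lowest level whose last gene is at least `min_gap` away.

    `positions` must be sorted in ascending order and belong to one chromosome
    and one orientation. Level 0 is right next to the chromosome line.
    """
    last_on_level: List[float] = []  # position of the last gene placed on each level
    levels: List[int] = []
    for pos in positions:
        for level, last in enumerate(last_on_level):
            if pos - last >= min_gap:
                last_on_level[level] = pos
                levels.append(level)
                break
        else:
            # Every existing level is too crowded: open a new one.
            last_on_level.append(pos)
            levels.append(len(last_on_level) - 1)
    return levels
```

The highest level returned shows how much vertical room a dense region needs, which is why very dense data can run past the canvas.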
About manual mode and "WC correction"
When you run the program for the first time in automatic mode, GenomePixelizer
generates a slightly modified copy of the input file with the same name plus a ".wc"
extension. If you compare the two files, you will see that the ".wc" file
contains a sixth column that holds the actual "WC
correction" (in other words, how far from the chromosome each gene is placed).
For example, "0" means that the gene is drawn right next to the chromosome line;
"1" means one level above for "Watson" orientation, or one level below
for "Crick" orientation; "2" means two levels, "3" three, and so on. You can
make the program use this ".wc" file as its input file by running it
in MANUAL mode. You should not see any difference until you change the values of
"WC correction" in the sixth column. In this case the program does not use the
AUTOMATIC algorithm to arrange genes over chromosomes; instead, it reads the
"WC correction" values directly from the ".wc" file. This GenomePixelizer feature is
extremely useful when you would like to arrange genes the way you want instead
of relying blindly on the algorithm. It is very easy to change "WC" values using
spreadsheet software such as "StarOffice" or "MS Excel".
To run GenomePixelizer in MANUAL mode, change option "A" in the "RunSetup" file to "M"
(line 12, for experienced users only).
NOTE! Every time you run GenomePixelizer, the program deletes the old "WC"
file in your project and generates a new one after processing the new
data. If you would like to keep the old file, you need to copy it under a different
extension (".manual", for example) and then edit the copy.
This is a little confusing; however, it does work.
Click here to see an example
of a generated GenPix_Tutorial_Input.wc file.
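Both steps, keeping a safe copy of the ".wc" file and changing a "WC correction" value, can be scripted. The sketch below rests on assumptions: the helper names are hypothetical, the table is taken to be tab-delimited, and the gene ID is assumed to sit in the second column (change id_column if your file differs).

```python
import shutil
from pathlib import Path


def save_manual_copy(wc_path: str) -> Path:
    """Copy e.g. 'Input.wc' to 'Input.manual' so that a later run cannot delete it."""
    source = Path(wc_path)
    target = source.with_suffix(".manual")
    shutil.copyfile(source, target)
    return target


def set_wc_correction(text: str, gene_id: str, level: int, id_column: int = 1) -> str:
    """Return the table text with the sixth column of one gene set to `level`."""
    out = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Only touch rows that have a sixth column and match the gene ID exactly.
        if len(fields) >= 6 and fields[id_column] == gene_id:
            fields[5] = str(level)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

Edit the ".manual" copy rather than the ".wc" file itself, since the ".wc" file is regenerated on every run.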
Example RunSetup Files:
This release of GenomePixelizer contains several real data sets besides the tutorial
input files. All input files for these projects can be found under the directory "Examples".
You may generate images for these projects if you run GenomePixelizer with
"RunSetup_Example_A1.txt" or any other "Example" file. You can find a detailed
description of every project in the corresponding directories. Briefly:
Set A - Relationship of genes containing NBS domain in Arabidopsis genome
RunSetup_Example_A1.txt - blue lines show 60% sequence identity
Set B - Comparative clustering of three superfamilies in Arabidopsis (NBS, cytochrome
P450 and protein kinases containing LRR (PK-LRR)).
RunSetup_Example_B1.txt - blue lines show 75% sequence identity between NBS motifs
RunSetup_Example_B2.txt - blue lines show 75% sequence identity between P450 sequences
RunSetup_Example_B3.txt - blue lines show 75% sequence identity between PK-LRR genes
RunSetup_Example_B4.txt - all together, lines for different families are drawn
in different colors (blue - NBS, gray - P450, black - PK-LRR)
RunSetup_Example_B5.txt - the same as B4 at a different window scale
Set C - Comparative genomics between A.thaliana and C.elegans, clustering of
cytochrome P450 superfamily. In set "C" first five chromosomes represent A.thaliana
genome, next six - C.elegans.
RunSetup_Example_C1.txt - blue lines show 70% sequence identity between P450 sequences
RunSetup_Example_C2.txt - blue lines show relationship between closest homologs of P450
sequences between A.thaliana and C.elegans (cutoff 25.5% identity)
To view the whole Arabidopsis genome with 26,000 genes you may run GenomePixelizer
with the "RunSetup_Arab_Genome_A.txt" or "RunSetup_Arab_Genome_M.txt" runsetup files.
Be careful: the image is huge, at 16,000 x 1,500 pixels, so you will need a fast computer with lots of memory.
It is very helpful to explore all of these files to understand how each project is
set up. Try modifying the run parameters and generating images under different options.
Matrix File Setup
Usually, the matrix file for large scale projects
(>200 genes) is huge: with 200 genes it already contains
(200*200 - 200)/2 = 19,900 lines of pairwise
data. Files of such size definitely slow
GenomePixelizer down. To handle this problem
you may extract from the original full matrix file
only the pairwise data that falls into
the similarity range of interest. For example,
the matrix file for the "RunSetup_Example_A1.txt"
project contains pairwise data in the identity
range 100% - 40% only. This dramatically decreases
the size of the file and speeds up the program.
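The pair count and the range extraction are easy to reproduce in a few lines. The sketch assumes one pair per line in the form "ID1 ID2 identity", separated by whitespace; that layout and the function names are assumptions, so adapt the column index to your own matrix file.

```python
from typing import Iterable, List


def pair_count(n_genes: int) -> int:
    """Number of distinct gene pairs: (n*n - n)/2."""
    return n_genes * (n_genes - 1) // 2


def filter_matrix_lines(lines: Iterable[str], lower: float, upper: float) -> List[str]:
    """Keep only lines whose third field is an identity within [lower, upper]."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete line
        try:
            identity = float(fields[2])
        except ValueError:
            continue  # header or malformed line
        if lower <= identity <= upper:
            kept.append(line)
    return kept
```

For example, pair_count(200) gives the 19,900 lines mentioned above, and filtering with lower=40 and upper=100 corresponds to the range used in the "RunSetup_Example_A1.txt" project.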
There are many ways to get pairwise data
for the matrix file: you may use the results of a Fasta
search, or GeneDoc, which generates a matrix file on the fly.
Matrix files for the examples described above
were generated by ClustalW bootstrapping. See
original files with extension .njb in directory:
WARNING: The Matrix file must not contain gene IDs that are not in the Input file with
gene coordinates; otherwise you may get incorrect line drawing. For example, if a gene
ID in the Input file is At3g12345, then the Matrix file must use exactly the same ID, At3g12345,
and not AT3g12345 or At3g12345a. GenomePixelizer is case sensitive.
If you doubt that your matrix file is correct, you can use the "Matrix Validator" procedure to
check that it is set up properly.
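The same check can be approximated outside the program. The following sketch is not the built-in "Matrix Validator" procedure; it is a hypothetical helper that reports every gene ID used in the matrix pairs but missing from the Input file, comparing IDs exactly and with case sensitivity.

```python
from typing import Iterable, List, Tuple


def unknown_matrix_ids(matrix_pairs: Iterable[Tuple[str, str]],
                       input_ids: Iterable[str]) -> List[str]:
    """Return the sorted gene IDs found in the matrix but absent from the Input file."""
    known = set(input_ids)
    unknown = set()
    for id_a, id_b in matrix_pairs:
        for gene_id in (id_a, id_b):
            if gene_id not in known:  # exact, case-sensitive comparison
                unknown.add(gene_id)
    return sorted(unknown)
```

An empty result means every ID in the matrix has a matching entry in the Input file.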
Standard and Extended Modes
For small genomes or for short fragments of chromosomes you may want to
specify the size of the individual genes instead of the "generic" gene size.
For that reason the current version of GenomePixelizer comes with "standard"
and "extended" modes. Standard mode runs GenomePixelizer with a "generic
gene size"; there is no difference from previous versions. If you run
GenomePixelizer in "extended" mode (check the last line of the
option #18 [std/ext]), you can specify the size of the individual genes.
Extended mode reads the sixth column in the Input file, where you have to
specify the size of individual genes. The sixth column contains the "gene
size coefficient" that shows how the "current gene size" differs from the
"generic size". You can run GenomePixelizer with the example RunSetup files:
GenPix_RunSetup_Extended_A.txt or GenPix_RunSetup_Extended_A_manual.txt
to see how it works in extended mode.
There is a "plug-in" for GenomePixelizer named
"Alignment Viewer". You may use it to view and
paint your favorite multiple alignments in different colors.
If you click on "Alignment Viewer" in the
bottom left corner of the main GenomePixelizer interface,
a new window will appear.
Place your alignment in ClustalW format under the
directory "Alignments", type in the name of your
alignment into the entry window and click "View Alignment".
You may choose different color schemes to display
different amino acids. Just type in the color of your
choice into the corresponding entries for the amino acids.
The Alignment Viewer works slower for
huge alignments (more than 200 genes and more than
1000 amino acids long).
That seems to be all we can tell you
in this tutorial. You are very welcome to send us
your suggestions and comments to improve this document.
Thanks for your patience.