GenomePixelizer Manual



Description
Screenshots
Examples
Manual
Download
Feedback
Authors




        Getting started

To generate images with GenomePixelizer, you need to collect your data and set up three input files. The first file must be named "GenPix_RunSetup.txt", because this file is read by default at startup. In "GenPix_RunSetup.txt" you specify the names of the other files and describe your data set: the number of chromosomes and their sizes, the file with the coordinates of genes (i.e. the physical location of genes on chromosomes) and the features of genes (e.g. different protein domains) coded by color, and the distance matrix file containing similarity/identity data for your set of genes. GenomePixelizer uses these data files to generate highly customizable images on the fly.

In the GenPix_RunSetup.txt file you need to define:

1. name of the input file containing gene coordinates
2. name of the distance matrix file
3. number of chromosomes
4. size of chromosomes in megabases or in centiMorgans
5. identity upper level (higher degree of similarity)
6. identity lower level (lower degree of similarity)
7. window size (pixels) X (horizontal size of your image)
8. window size (pixels) Y (vertical size of your image)
and other optional parameters.

Click the corresponding link to see examples of GenomePixelizer input files:

GenPix_RunSetup

GenPix_Tutorial_Input

GenPix_Tutorial_Matrix
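The required RunSetup values listed above can be modelled as a small record with a few sanity checks, for example before writing a new GenPix_RunSetup.txt by hand. This is a minimal Python sketch only: the class name, field names, and checks are hypothetical and do not reproduce the real file's line layout, which is best copied from the GenPix_RunSetup example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSetup:
    """Hypothetical container for the required GenPix_RunSetup.txt values."""
    input_file: str                # 1. gene coordinates file
    matrix_file: str               # 2. distance matrix file
    n_chromosomes: int             # 3. number of chromosomes
    chromosome_sizes: List[float]  # 4. one size per chromosome (Mb or cM)
    identity_upper: float          # 5. upper identity level, in percent
    identity_lower: float          # 6. lower identity level, in percent
    window_x: int                  # 7. image width in pixels
    window_y: int                  # 8. image height in pixels

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        issues = []
        if len(self.chromosome_sizes) != self.n_chromosomes:
            issues.append("number of chromosome sizes does not match n_chromosomes")
        if any(size <= 0 for size in self.chromosome_sizes):
            issues.append("chromosome sizes must be positive")
        if not 0 <= self.identity_lower <= self.identity_upper <= 100:
            issues.append("identity levels must satisfy 0 <= lower <= upper <= 100")
        if self.window_x <= 0 or self.window_y <= 0:
            issues.append("window sizes must be positive")
        return issues


# Illustrative values only, not taken from the tutorial files.
setup = RunSetup("GenPix_Tutorial_Input", "GenPix_Tutorial_Matrix",
                 5, [30.0, 20.0, 23.0, 18.0, 27.0], 100.0, 60.0, 1000, 400)
print(setup.problems())  # → []
```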

        How it works:

First, GenomePixelizer reads from "GenPix_RunSetup.txt" the size of the window to be generated and the sizes of the chromosomes to be placed in it. It then creates the window and draws the chromosomes according to their specified sizes. Next, GenomePixelizer reads the "GenPix_GeneCoords" input file, in which the user defines gene IDs, gene coordinates, and gene features, and places the genes either above or below the chromosomes. By default, placement above or below a chromosome corresponds to Watson/Crick orientation; however, the user may assign any other meaning to this characteristic. The color scheme is highly flexible and may reflect any gene features defined by the user; the current version supports up to 15 colors. At the same time, the program generates a separate file, "html_image_map_coord.txt", with HTML image map tags that can be used to create Web pages with clickable images. Finally, the program reads the "GenPix_DistMatrix" file, which lists pairs of gene IDs with their percentage of similarity, and draws lines between homologous genes whose similarity lies between the lower and upper levels defined by the user.
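The placement steps above can be illustrated with a short Python sketch. It assumes, for illustration only, that all chromosomes share one scale (the longest chromosome spans the window width minus a margin), that "W" genes go above and "C" genes below the line, and that the similarity bounds are inclusive; the function names and the margin value are hypothetical, and GenomePixelizer's own drawing code may differ.

```python
def place_genes(genes, chrom_sizes, window_x, margin=20):
    """Map genes to canvas x positions and sides.

    genes: iterable of (gene_id, chromosome, position, strand) tuples,
           with position in the same unit as chrom_sizes (Mb or cM).
    chrom_sizes: dict chromosome -> size.
    """
    # One common scale so that chromosomes keep their relative lengths.
    scale = (window_x - 2 * margin) / max(chrom_sizes.values())
    placed = {}
    for gene_id, chrom, position, strand in genes:
        x = margin + round(position * scale)
        side = "above" if strand == "W" else "below"
        placed[gene_id] = (chrom, x, side)
    return placed


def links_to_draw(pairs, lower, upper):
    """Keep (gene_a, gene_b) pairs whose identity lies within [lower, upper]."""
    return [(a, b) for a, b, identity in pairs if lower <= identity <= upper]


placed = place_genes([("g1", 1, 10.0, "W"), ("g2", 2, 5.0, "C")],
                     {1: 25.0, 2: 20.0}, window_x=1040)
print(placed)  # → {'g1': (1, 420, 'above'), 'g2': (2, 220, 'below')}
print(links_to_draw([("g1", "g2", 75.0), ("g1", "g3", 30.0)], 60, 100))  # → [('g1', 'g2')]
```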


        Generating HTML ImageMap coordinates

GenomePixelizer generates the file "html_image_map_coord.txt", which can be used to create clickable HTML images on Web pages. To make everything work properly, you need to take a screenshot with any suitable software (in Windows, just press ALT+PrintScreen; alternatively, save your image as a PostScript file and convert it to PNG or GIF format with a program such as Corel PhotoPaint). The area to capture in the screenshot is indicated by gray lines on the canvas window. The HTML prefix is defined by you in the GenPix_RunSetup file; its format could be, for example, http://whatever.you.want/. Copy and paste the contents of the generated "html_image_map_coord.txt" file into the source of your HTML page between these tags:

-------------------------------------------------------------------

<img src="your_screenshot.gif" width="X" height="Y" USEMAP="#genopix_map">
<MAP NAME="genopix_map">

... COPY-PASTE "html_image_map_coord.txt" content here ...

</MAP>

-------------------------------------------------------------------

Note that you will find the proper values for "width" and "height" on the last line of the "html_image_map_coord.txt" file. If the generated image is too large for a single screenshot, you can take two or more screenshots and then combine them using any drawing program.

Click here to see an example of a generated html_image_map_coord.txt file.
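To see what the image map boils down to, the sketch below builds the same kind of HTML by hand: an img tag with USEMAP, one rectangular AREA per gene, and the closing MAP tag. It is a hypothetical Python helper, not the code that writes "html_image_map_coord.txt"; the gene boxes, the link prefix, and the exact AREA attributes are assumptions, so compare the result with a generated file before relying on it.

```python
from html import escape


def image_map_html(screenshot, width, height, boxes, prefix):
    """Build an <img> + <MAP> block for a screenshot of the canvas.

    boxes: dict gene_id -> (x1, y1, x2, y2) rectangle in screenshot pixels.
    prefix: link prefix, e.g. the HTML prefix set in GenPix_RunSetup.
    """
    lines = [f'<img src="{escape(screenshot)}" width="{width}" '
             f'height="{height}" USEMAP="#genopix_map">',
             '<MAP NAME="genopix_map">']
    for gene_id, (x1, y1, x2, y2) in boxes.items():
        href = escape(prefix + gene_id)  # escape &, <, > and quotes
        lines.append(f'<AREA SHAPE="rect" COORDS="{x1},{y1},{x2},{y2}" '
                     f'HREF="{href}" ALT="{escape(gene_id)}">')
    lines.append('</MAP>')
    return "\n".join(lines)


print(image_map_html("your_screenshot.gif", 800, 300,
                     {"At3g12345": (10, 20, 14, 28)},
                     "http://whatever.you.want/"))
```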


        W/C correction

For proper placement of genes above or below the chromosomes, the "GenPix_Input" file should be presorted using an Excel-like spreadsheet editor or a custom Perl script.

Sort your tab-delimited table in the following order:

Sort by fourth column (W/C features)
then
by first column (chromosome number)
then
by third column (position on chromosome)

Click here to see how the sorted columns should look.
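The three-key sort above can also be done with a short script instead of a spreadsheet. This Python sketch is one possible stand-in for the "custom perl script"; it assumes a tab-delimited file with no header line, an integer chromosome number in column 1 and a numeric position in column 3, and the function names are hypothetical.

```python
def sort_rows(rows):
    """Sort by W/C (column 4), then chromosome (column 1), then position (column 3)."""
    return sorted(rows, key=lambda r: (r[3], int(r[0]), float(r[2])))


def sort_genpix_input(in_path, out_path):
    """Read a tab-delimited input file, sort it, and write it back out."""
    with open(in_path) as fh:
        rows = [line.rstrip("\r\n").split("\t") for line in fh if line.strip()]
    with open(out_path, "w") as fh:
        for row in sort_rows(rows):
            fh.write("\t".join(row) + "\n")


rows = [["2", "g3", "5.0", "W"], ["1", "g2", "9.0", "C"],
        ["1", "g1", "3.0", "W"], ["1", "g4", "1.0", "C"]]
print([r[1] for r in sort_rows(rows)])  # → ['g4', 'g2', 'g1', 'g3']
```

Converting columns 1 and 3 to numbers keeps chromosome "10" after chromosome "2", which a plain text sort would not.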

When genes are placed too close to each other, GenomePixelizer automatically moves them to different levels above (Watson) or below (Crick) the preceding genes. If gene density is too high, the gene positions may exceed the canvas dimensions. There are two ways to overcome this problem:

1. Increase the canvas size (the width, i.e. the horizontal size) until you are satisfied with the gene placement on the image.

2. Adjust the generated file "GenPix_Whatever_Input.wc" and re-run GenomePixelizer in manual correction mode instead of automatic correction mode. This is advisable for experienced users only.
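The manual does not spell out the automatic stacking rule, but a common greedy approach gives the idea: walk along one side of a chromosome from left to right and put each gene on the lowest level whose previous gene is far enough away. This Python sketch is an assumption about how such an algorithm can work, not GenomePixelizer's actual code, and min_gap (in pixels) is a hypothetical parameter. It also shows why a wider canvas helps: larger pixel distances between genes mean fewer levels.

```python
def assign_levels(xs, min_gap):
    """Assign a level to each gene on one side of one chromosome.

    xs: gene x positions in pixels, sorted in ascending order.
    Level 0 is right next to the chromosome line; higher levels are
    further away (above for Watson, below for Crick).
    """
    last_x = []  # last_x[level] = x of the most recent gene put on that level
    levels = []
    for x in xs:
        for level, prev_x in enumerate(last_x):
            if x - prev_x >= min_gap:
                last_x[level] = x
                levels.append(level)
                break
        else:  # no existing level has room: open a new one
            last_x.append(x)
            levels.append(len(last_x) - 1)
    return levels


print(assign_levels([0, 2, 4, 10, 11], min_gap=5))  # → [0, 1, 2, 0, 1]
# Doubling the canvas width doubles the distances and removes a level:
print(assign_levels([0, 4, 8, 20, 22], min_gap=5))  # → [0, 1, 0, 0, 1]
```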


        About manual mode and "WC correction"

When you run the program for the first time in automatic mode, GenomePixelizer generates a slightly modified copy of your input file with the same name plus a ".wc" extension. If you compare the two files, you will see that the ".wc" file contains a sixth column that holds the actual "WC correction", in other words, how far from the chromosome each gene is placed. For example, "0" means that the gene is drawn right next to the chromosome line, "1" means one level above it for "Watson" orientation or one level below it for "Crick" orientation, "2" means two levels, "3" three levels, and so on.

You can make the program use this ".wc" file as its input file when it runs in MANUAL mode. You will not see any difference until you change the "WC correction" values in the sixth column: in this case the program does not use the AUTOMATIC algorithm to arrange genes over chromosomes, but reads the "WC correction" values directly from the ".wc" file. This feature is extremely useful when you want to arrange genes your own way instead of blindly relying on the algorithm. It is very easy to change "WC" values with spreadsheet software such as "StarOffice" or "MS Excel".

To run GenomePixelizer in MANUAL mode, change option "A" to "M" on line 12 of the "RunSetup" file (for experienced users only).

NOTE! Every time you run GenomePixelizer, the program deletes the old ".wc" file in your project and generates a new one after processing the new data. If you would like to keep the old file, copy it under a different extension (".manual", for example), edit the copy, and use it as your input file. This is a little confusing; however, it does work.

Click here to see an example of a generated GenPix_Tutorial_Input.wc file.
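Besides a spreadsheet, the copy-and-edit step can be scripted. The Python sketch below copies a ".wc" file to a ".manual" file while overriding the "WC correction" (sixth column) for selected genes. It assumes a tab-delimited file and, hypothetically, that the gene ID is in the second column (id_col=1, counting from 0); check your own file and adjust id_col if needed. The function names are illustrative.

```python
def set_wc_levels(lines, changes, id_col=1):
    """Return lines with column 6 replaced for genes listed in changes.

    changes: dict gene_id -> new WC correction level (0, 1, 2, ...).
    """
    out = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 6 and fields[id_col] in changes:
            fields[5] = str(changes[fields[id_col]])
        out.append("\t".join(fields) + "\n")
    return out


def make_manual_copy(wc_path, manual_path, changes, id_col=1):
    """Copy a .wc file to a .manual file, applying the requested levels."""
    with open(wc_path) as fh:
        lines = fh.readlines()
    with open(manual_path, "w") as fh:
        fh.writelines(set_wc_levels(lines, changes, id_col))


# Hypothetical usage: push gene At3g12345 three levels away from its chromosome.
# make_manual_copy("GenPix_Tutorial_Input.wc", "GenPix_Tutorial_Input.manual",
#                  {"At3g12345": 3})
```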


        Example RunSetup Files:

This release of GenomePixelizer contains several real data sets in addition to the tutorial input files. All input files for these projects can be found in the "Examples" directory. You can generate images for these projects by running GenomePixelizer with "RunSetup_Example_A1.txt" or any other "Example" file. A detailed description of every project can be found in the corresponding directories. Briefly:

Set A - Relationship of genes containing the NBS domain in the Arabidopsis genome.
RunSetup_Example_A1.txt - blue lines show 60% sequence identity

Set B - Comparative clustering of three superfamilies in Arabidopsis: NBS, cytochrome P450, and protein kinases containing LRR (PK-LRR).
RunSetup_Example_B1.txt - blue lines show 75% sequence identity between NBS motifs
RunSetup_Example_B2.txt - blue lines show 75% sequence identity between P450 sequences
RunSetup_Example_B3.txt - blue lines show 75% sequence identity between PK-LRR genes
RunSetup_Example_B4.txt - all together; lines for different families are drawn in different colors (blue - NBS, gray - P450, black - PK-LRR)
RunSetup_Example_B5.txt - the same as B4 at a different window scale

Set C - Comparative genomics between A. thaliana and C. elegans: clustering of the cytochrome P450 superfamily. In set C, the first five chromosomes represent the A. thaliana genome and the next six the C. elegans genome.
RunSetup_Example_C1.txt - blue lines show 70% sequence identity between P450 sequences
RunSetup_Example_C2.txt - blue lines show the relationship between the closest P450 homologs of A. thaliana and C. elegans (cutoff 25.5% identity)

To view the whole Arabidopsis genome with 26,000 genes, run GenomePixelizer with the "RunSetup_Arab_Genome_A.txt" or "RunSetup_Arab_Genome_M.txt" RunSetup file. Be careful: the image is huge (16,000 x 1,500 pixels), so you will need a fast computer with lots of memory.

It is very helpful to explore all of these files to understand how each project is set up. Try modifying the run parameters and generating images with different options. Good luck!


        Matrix File Setup

The matrix file for large-scale projects (more than 200 genes) is usually huge: 200 genes alone give (200*200 - 200)/2 = 19,900 lines of pairwise data. Files of this size slow GenomePixelizer down considerably. To handle this problem, you can extract from the full matrix file only the pairwise data that falls into the similarity range of interest. For example, the matrix file for the "RunSetup_Example_A1.txt" project contains pairwise data in the identity range 100% - 40% only. This dramatically decreases the size of the file and speeds up the program. There are many ways to obtain pairwise data for the matrix file: you may use the results of a FASTA search, or GeneDoc, which generates a matrix file on the fly. Matrix files for the examples described above were generated by ClustalW bootstrapping; see the original files with the .njb extension in the directory ./Alignments/ClustalW_BootStrap/
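The pair count and the range filter described above are easy to script. This Python sketch assumes, hypothetically, a whitespace-separated matrix file with two gene IDs followed by a percent identity on each line; lines that do not parse are skipped. Check your own matrix file (for example GenPix_Tutorial_Matrix) before adapting it.

```python
def n_pairs(n_genes):
    """Number of unordered gene pairs: (n*n - n)/2."""
    return (n_genes * n_genes - n_genes) // 2


def filter_matrix(lines, low, high):
    """Keep matrix lines whose identity value (third field) is within [low, high]."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            identity = float(fields[2])
        except ValueError:
            continue  # header or malformed line
        if low <= identity <= high:
            kept.append(line)
    return kept


print(n_pairs(200))  # → 19900
print(filter_matrix(["a\tb\t95.0\n", "a\tc\t20.0\n", "b\tc\t40\n"], 40, 100))
# → ['a\tb\t95.0\n', 'b\tc\t40\n']
```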

WARNING: The matrix file MUST NOT contain gene IDs that are absent from the Input file with gene coordinates; otherwise lines may be drawn incorrectly. For example, if a gene ID in the Input file is At3g12345, the Matrix file must contain exactly the same ID, At3g12345, not AT3g12345 or At3g12345a: GenomePixelizer is case sensitive. If you are not sure that your matrix file is correct, you can use the "Matrix Validator" procedure to check its setup.
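A simple check in the spirit of the "Matrix Validator" can be sketched as follows. This is not the built-in procedure, only a Python illustration that assumes the first two whitespace-separated fields of each matrix line are gene IDs. It lists matrix IDs missing from the Input file and flags those that differ from a known ID only by letter case.

```python
def unknown_matrix_ids(input_ids, matrix_lines):
    """Return matrix gene IDs that do not occur in the Input file (case sensitive)."""
    known = set(input_ids)
    unknown = set()
    for line in matrix_lines:
        for gene_id in line.split()[:2]:
            if gene_id not in known:
                unknown.add(gene_id)
    return sorted(unknown)


def case_only_mismatches(input_ids, unknown_ids):
    """Return unknown IDs that match a known ID when case is ignored."""
    known_lower = {gene_id.lower() for gene_id in input_ids}
    return [gene_id for gene_id in unknown_ids if gene_id.lower() in known_lower]


input_ids = ["At3g12345", "At1g01010"]
matrix = ["At3g12345\tAT3g12345\t99.0\n", "At1g01010\tAt3g12345a\t80.0\n"]
bad = unknown_matrix_ids(input_ids, matrix)
print(bad)                                   # → ['AT3g12345', 'At3g12345a']
print(case_only_mismatches(input_ids, bad))  # → ['AT3g12345']
```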


        Standard and Extended Modes

For small genomes or short chromosome fragments, you may want to specify the size of individual genes instead of using the "generic" gene size. For this purpose, the current version of GenomePixelizer offers "standard" and "extended" modes. Standard mode runs GenomePixelizer with a "generic gene size", exactly as in previous versions. If you run GenomePixelizer in "extended" mode (see the last line of the RunSetup file, option #18 [std/ext]), you can specify the size of individual genes: extended mode reads the sixth column of the Input file, which must contain a "gene size coefficient" showing how the size of each gene differs from the "generic size". To see how extended mode works, run GenomePixelizer with the example RunSetup files GenPix_RunSetup_Extended_A.txt or GenPix_RunSetup_Extended_A_manual.txt.

GenPix_Tutorial_Input Standard    GenPix_Tutorial_Input Extended
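In extended mode the sixth column changes how wide each gene is drawn. Assuming, for illustration, that the coefficient simply multiplies the generic gene size (the description only says it shows how the gene size differs from the generic size), the drawn width could be computed as in this hypothetical Python helper; the one-pixel minimum is also an assumption, so that tiny genes stay visible.

```python
def gene_width_px(generic_width_px, coefficient, min_width_px=1):
    """Drawn gene width, assuming width = generic width * size coefficient."""
    return max(min_width_px, round(generic_width_px * coefficient))


def widths_from_rows(rows, generic_width_px):
    """rows: tab-split Input lines; column 6 (index 5) holds the coefficient."""
    return [gene_width_px(generic_width_px, float(row[5])) for row in rows]


print(gene_width_px(4, 2.5))  # → 10
print(gene_width_px(4, 0.1))  # → 1 (round(0.4) is 0, raised to the minimum)
```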


        Alignment Viewer

GenomePixelizer comes with a "plug-in" named "Alignment Viewer", which you can use to view your favorite multiple alignments and paint them in different colors. If you click "Alignment Viewer" in the bottom left corner of the main GenomePixelizer interface, a new window appears. Place your alignment in ClustalW format in the "Alignments" directory, type the name of your alignment into the entry field, and click "View Alignment". You may choose different color schemes to display different amino acids: just type the color of your choice into the corresponding entries for the amino acids. The Alignment Viewer is slower with huge alignments (more than 200 genes and more than 1000 amino acids long).
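For readers curious about what such a viewer has to do, the Python sketch below reads a ClustalW-format alignment into one sequence per name and attaches a color to every residue from a user-defined scheme. It is not the Alignment Viewer's code; the parsing rules (skip the CLUSTAL header, blank lines, and conservation lines that start with a space) and the default color are assumptions for illustration.

```python
def parse_clustal(lines):
    """Return {sequence name: gapped sequence} from ClustalW-format lines."""
    seqs = {}
    for line in lines:
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue  # blank, header, or conservation (*:.) line
        fields = line.split()
        if len(fields) >= 2:
            name, chunk = fields[0], fields[1]  # an optional residue count is ignored
            seqs[name] = seqs.get(name, "") + chunk
    return seqs


def color_residues(seq, scheme, default="white"):
    """Pair every residue with its color; unlisted residues and gaps get default."""
    return [(aa, scheme.get(aa.upper(), default)) for aa in seq]


alignment = ["CLUSTAL W (1.83) multiple sequence alignment\n", "\n",
             "seqA  MKV-L\n", "seqB  MRVAL\n", "      * * *\n", "\n",
             "seqA  GG 6\n", "seqB  GA 7\n"]
seqs = parse_clustal(alignment)
print(seqs)  # → {'seqA': 'MKV-LGG', 'seqB': 'MRVALGA'}
print(color_residues("MK-", {"M": "yellow", "K": "blue"}))
# → [('M', 'yellow'), ('K', 'blue'), ('-', 'white')]
```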

That's all we can tell you in this tutorial. You are very welcome to send us your suggestions and comments to improve this document.

Thanks for your patience.








email: Alexander Kozik

Last modified, July 03 2002