PhyloGrapher - Graph Visualization Tool







PHYLOGRAPHER OVERVIEW

PhyloGrapher is a program designed to visualize and study evolutionary relationships within families of homologous genes or proteins (elements). It is a drawing tool that generates custom graphs for a given set of elements. More generally, PhyloGrapher can be used to visualize any type of relationship between elements.

Each gene or protein on a PhyloGrapher graph is represented as a colored node (vertex) and is connected to other nodes by lines (edges) whose thickness and color depend on the similarity between the genes or proteins (taken from the distance matrix). The position of each node is flexible and can be adjusted by the user to make the inter-relationships between nodes easier to see. Consequently, unlike in classical phylogenetic trees, the physical distances between nodes on the graph carry no information. The level of similarity between genes or proteins is indicated by line color and thickness.

PhyloGrapher is written in Tcl/Tk and runs on any Unix/Linux or Windows system that has the Tcl/Tk toolkit installed. The toolkit can be downloaded for free at tcl.activestate.com. Macintosh computers may have a problem running PhyloGrapher because of the one-button mouse.


BASIC ELEMENTS and CHARACTERISTICS of GRAPHS

The defining characteristic of any graph is which dots (vertices) are connected by which lines (edges).

The basic elements and characteristics of a graph are:
vertex (node),
edge,
degree of a vertex - the number of edges that touch it,
size - the number of vertices,
path - a route from vertex to vertex along edges,
length of a path - the number of edges in the path,
planar and non-planar graphs - a graph is planar if it can be drawn on a plane so that the edges intersect only at the vertices,
distance - the length of the shortest path between two vertices,
diameter - the longest distance between any two vertices,
isolated vertex - a vertex of degree zero,
adjacent vertices - vertices connected by an edge,
neighborhood - the set of vertices adjacent to a given vertex.
There are others (you can check our collection of web links). A tree is a connected graph containing no cycles.
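
A few of the terms above can be made concrete with a short Python sketch. This is only an illustration and not part of PhyloGrapher (which is written in Tcl/Tk); the example graph, the vertex names and the function names are all made up for this purpose.

```python
from collections import deque

# Hypothetical undirected graph, stored as vertex -> set of adjacent vertices.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),  # isolated vertex: degree zero
}

def degree(g, v):
    """Number of edges that touch vertex v."""
    return len(g[v])

def distances_from(g, start):
    """Breadth-first search: shortest path length from start to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(g, component):
    """Longest distance between any two vertices of one connected component."""
    return max(max(distances_from(g, v).values()) for v in component)

isolated = sorted(v for v in graph if degree(graph, v) == 0)
```

Here the neighborhood of "C" is simply graph["C"], and the distance between "A" and "D" is distances_from(graph, "A")["D"].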


GOALS TO ACHIEVE WHEN MANIPULATING THE GRAPH

You may use a variety of criteria to maximize the readability of the graph. In general, optimization goals can be:
Minimize: edge crossings and graph area;
Maximize: symmetries and smallest angle between edges.

You should pay attention to the basic elements of the graph, their positions and relationships (linkage), and draw any scientific conclusions at your own risk. PhyloGrapher is licensed under the GPL, and the author is not responsible for anything that may happen.


INPUT FILES and PROGRAM OPERATION

To create a graph using PhyloGrapher you need to set up two input files: 1. a list of elements (genes or proteins) and 2. the distance matrix file. Examine the files in the directory "Matrix"; their structure is simple. The "List" file contains just the IDs of the elements (genes) in your data set. The "Matrix" file contains an identity value for each pair of elements (genes). PhyloGrapher reads data only from the first three columns of the "Matrix" file; the identity values are in the third column and should be normalized between 0 and 1. All other columns in the "Matrix" file are ignored.

Clicking on "Load Data into Memory" makes the program read the list of genes and the matrix file and build a data structure in computer memory. Clicking on "Run" then uses this data structure to construct the graph on the canvas.

Nodes can be assigned different colors representing different qualities (e.g. species or linkage groups). You can paint nodes individually or in "batch mode": click on "Node Painter" in "Extras".
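
The structure of the two input files can be sketched in Python as follows. This is not PhyloGrapher's own reader. It assumes that the "List" file holds one ID per line, that the "Matrix" file is whitespace-separated, and that its first two columns hold the two element IDs of each pair; check the example files in the "Matrix" directory for the actual layout. The sample IDs and values are invented.

```python
def read_id_list(lines):
    """Return element IDs, assuming one ID per non-empty line."""
    return [line.strip() for line in lines if line.strip()]

def read_matrix(lines):
    """Return {(id_a, id_b): identity}, using only the first three columns."""
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        id_a, id_b = fields[0], fields[1]  # assumed: columns 1 and 2 are the IDs
        identity = float(fields[2])        # column 3: identity value
        if not 0.0 <= identity <= 1.0:
            raise ValueError(f"identity not normalized for {id_a}, {id_b}: {identity}")
        pairs[(id_a, id_b)] = identity     # any further columns are ignored
    return pairs

# Hypothetical sample data, for illustration only.
list_text = """gene1
gene2
gene3
"""
matrix_text = """gene1 gene2 0.85 extra ignored
gene1 gene3 0.40
gene2 gene3 0.10 1e-5
"""

ids = read_id_list(list_text.splitlines())
matrix = read_matrix(matrix_text.splitlines())
```

The range check mirrors the requirement that identity values be normalized between 0 and 1 before loading.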

PhyloGrapher initially generates an unorganized graph by placing all elements on a circle, in the same order as in the list of elements in the input file, and connecting related nodes by lines. You can then move each node with the mouse (click and drag) to make the graph easier to interpret.
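
The initial circular placement can be sketched in a few lines of Python. This only illustrates the idea; the function name, the canvas center and the radius are assumptions and are not taken from the PhyloGrapher source.

```python
import math

def circle_layout(ids, center=(300.0, 300.0), radius=250.0):
    """Place nodes evenly on a circle, in the order given by the ID list."""
    coords = {}
    n = len(ids)
    for i, node_id in enumerate(ids):
        angle = 2.0 * math.pi * i / n  # equal angular spacing between nodes
        x = center[0] + radius * math.cos(angle)
        y = center[1] + radius * math.sin(angle)
        coords[node_id] = (x, y)
    return coords

# Hypothetical IDs; a unit circle around the origin keeps the numbers readable.
coords = circle_layout(["a", "b", "c", "d"], center=(0.0, 0.0), radius=1.0)
```

Because the order follows the list file, reordering the list changes which nodes start out next to each other.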

You can use the Canvas Editor from "Extras" to finish editing your graph. You can save the coordinates of the nodes for future projects: click on "Node Coords" under "Extras". Finally, you can save your graph as a PostScript file (or take a screenshot) and, if you want, generate HTML image map links for your graph.


PROGRAM FUNCTIONALITY and KEY BINDINGS TO MANIPULATE THE IMAGE

Mouse:
left button - drag the node,
middle button - print node ID or edge weight on canvas (the same as double left click in the Windows version),
right button - shake the node and view the degree.
Keyboard:
w - change color of node to white,
b - change color of node to blue,
s - change color of node to dark blue,
c - change color of node to cyan,
g - change color of node to green,
o - change color of node to orange,
r - change color of node to red,
v - change color of node to violet,
p - change color of node to purple,
Control-d - delete object from canvas (you cannot delete nodes and edges generated from the data file).
To find the node with a given ID, type the ID in the Node Entry window and press Enter.

You can change the default font size of the printed ID and edge weight by opening the "Canvas Editor" from "Extras" and changing the font size value in the corresponding window.

You can use PhyloGrapher to draw custom graphs by switching to "Manual Mode". In the "Project Configuration" window choose the "Manual Drawing" option and click "Run". An empty canvas should appear; press the "n" key to create a new node. To draw an edge from one node to another, point the mouse at the first node and press the "1" key, then point the mouse at the second node and press the "2" key. A new edge should appear.


GRAPHICAL FASTA/SSEARCH VIEWER

PhyloGrapher is highly integrated with Bill Pearson's FASTA/SSEARCH programs. You can run FASTA/SSEARCH in real time within PhyloGrapher for a given pair of sequences (nodes). FASTA and SSEARCH must be installed on your computer (see ftp.virginia.edu/pub/fasta). This feature lets you check different alignment parameters, such as the Smith-Waterman score, identity value, and overlap length. PhyloGrapher runs FASTA/SSEARCH, parses the results of the search and represents all data graphically, which simplifies the validation of a given alignment.

To run FASTA/SSEARCH within PhyloGrapher, click on "Smith-Waterman" in the "Extras" menu. A new window will appear with the following entry fields:
    1. Database (directory where you store sequences in fasta format)
    2. file extension (type of file extension of sequence files)
    3. program (type of search to perform: SSEARCH by default or FASTA)
    4. Node A ("query" sequence file)
    5. Node B ("library" sequence file)

Point your mouse cursor over the "query" node (gene) and press <Control-a>; the ID of node "A" should appear in the corresponding entry window. Then point the mouse cursor over the "library" node (gene) and press <Control-s>; the ID of node "B" should appear in the corresponding entry window. Then click on "Show Alignment" and PhyloGrapher will run FASTA or SSEARCH with the pair of sequences of your choice (node "A" vs node "B") and display a graphical representation of the alignment. You can save the results of the search as a text file as well as a PostScript file in the directory "Saved_Work" by clicking on the "save as" button. You can change the font size of the FASTA output using the "Canvas Editor".
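
How the five entry fields might combine into one search can be sketched in Python. This is a guess at the general shape, not PhyloGrapher's actual code: it assumes that each sequence file is named after its node ID plus the file extension, and that the search program is called in the basic "program query library" form. The function name and the sample IDs are hypothetical, and the sketch only builds the command without running it.

```python
import os

def build_search_command(database, extension, node_a, node_b, program="ssearch34"):
    """Build an argument list: program, query file (node A), library file (node B)."""
    # Assumption: sequence files are stored as <database>/<node ID><extension>.
    query = os.path.join(database, node_a + extension)
    library = os.path.join(database, node_b + extension)
    return [program, query, library]

# Hypothetical node IDs and extension, for illustration only.
command = build_search_command("Database/TIGR", ".fasta", "geneA", "geneB")
```

Passing program="fasta34" would select FASTA instead of the default SSEARCH.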

To install the FASTA and SSEARCH programs on your computer go to Bill Pearson's ftp site (ftp.virginia.edu/pub/fasta) and download the current version of the FASTA distribution corresponding to your computer platform.

On Windows uncompress the .zip FASTA file and copy the executables fasta34.exe and ssearch34.exe into the main WINDOWS directory. Read the FASTA documentation.

On Linux you need to compile the FASTA source code to get the executables fasta34 and ssearch34. For that, copy Makefile.linux to Makefile, and run "make". Copy the fasta34 and ssearch34 binaries into the /usr/local/bin/ directory. Read the FASTA documentation.


GRAPH TRAVERSAL (GRAPH SEARCH)

PhyloGrapher has graph traversal (graph search) functionality. For example, you can highlight all nodes belonging to a single group (connected graph), or you can select the nodes adjacent to any given node. From the "Extras" dialog, select "Adjacent nodes" and then generate the adjacency list file by clicking on "Build Adjacency List". The adjacency list is based on the data in the "list" and "matrix" files chosen in the main menu. Remember that different identity cutoff values ("no lines below..." in the main menu) produce different adjacency lists. As soon as the adjacency list is available, you can select the node you want to analyze by pointing the mouse cursor over it. Then click on "Highlight Adjacent Nodes" to highlight adjacent nodes, or on "Highlight All Nodes in Group" to select all nodes belonging to a connected graph. In the latter case PhyloGrapher performs a Depth-First Search (DFS), and you can observe the progress of the search visually.
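
The two steps described above, building an adjacency list for a given identity cutoff and then collecting a connected group by depth-first search, can be sketched in Python. This is an illustration under assumptions, not PhyloGrapher's implementation: the function names and sample data are invented, and pairs whose identity equals the cutoff are assumed to stay connected.

```python
def build_adjacency(ids, matrix, cutoff):
    """Adjacency list: keep only pairs whose identity is not below the cutoff."""
    adjacency = {node_id: set() for node_id in ids}
    for (a, b), identity in matrix.items():
        if identity >= cutoff and a != b and a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def group_of(adjacency, start):
    """Iterative depth-first search; returns the nodes of start's group in visit order."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors in reverse order so they are visited in sorted order.
        for neighbor in sorted(adjacency[node], reverse=True):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited

# Hypothetical data, for illustration only.
ids = ["g1", "g2", "g3", "g4", "g5"]
matrix = {("g1", "g2"): 0.9, ("g2", "g3"): 0.6, ("g3", "g4"): 0.2, ("g4", "g5"): 0.8}

adjacency = build_adjacency(ids, matrix, cutoff=0.5)
group = group_of(adjacency, "g1")
```

With a cutoff of 0.5 the weak g3-g4 link is dropped and the data splits into two groups; lowering the cutoff to 0.1 joins them into one.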


GRAPH SELF ORGANIZATION

We attempted to implement a modified version of the Fruchterman-Reingold algorithm to organize the graph layout automatically. From "Extras" choose "Fruchterman-Rheingold" and run it with the "Self Organize" button. Try it with the default data set (My_Matrix_File.txt) to get an impression of what the current version can achieve. It is far from perfect, but nodes move in the right direction and form proper groups. You can try different parameters to generate different graph layouts.
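
For orientation, the textbook form of the Fruchterman-Reingold algorithm can be sketched in Python. This is not the modified version used in PhyloGrapher, whose details are not described here. All nodes repel each other with force k^2/d, connected nodes attract each other with force d^2/k, and a decreasing "temperature" caps how far a node may move per iteration. The function name, parameter defaults and sample data are assumptions.

```python
import math
import random

def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Return {node: [x, y]} after a basic force-directed layout inside the frame."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    if len(nodes) < 2:
        return pos
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10.0                  # maximum step per iteration
    cooling = temperature / (iterations + 1)
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attraction along each edge: f_a(d) = d^2 / k.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node by at most `temperature` and keep it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * step))
        temperature -= cooling
    return pos

# Hypothetical data: two connected pairs of nodes.
layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
```

The number of iterations, the starting temperature and the frame size are the kind of parameters that produce different layouts when changed.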


EXAMPLE DATA

PhyloGrapher contains example data for three large gene families of Arabidopsis thaliana: NB-ARC (nucleotide binding site containing proteins, putative resistance genes), cytochrome P450 and putative protein kinases (PK-LRR). You can find protein sequences for selected subsets under the directory "Database/TIGR". All sequences were derived from the TIGR database (ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES) in September 2002. Because annotation (the prediction of exon-intron structure) is dynamic, protein sequences in this release of PhyloGrapher may not correspond to the current state of these sequences at TIGR or in other databases. This set was chosen as an example, and the author is not responsible for updating it regularly. For your own project you may want to create a new directory under the directory "Database" and place your new set of sequences there.

The default set of genes, "My_ID_List.txt", is a set of Arabidopsis cytochrome P450 putative proteins. The corresponding matrix, "My_Matrix_File.txt", was derived by parsing the results of a BLAST search. You can find the parser under the directory "Scripts".


Feedback and comments may be sent to Alexander Kozik, email: akozik@atgc.org



PhyloGrapher is under the GNU General Public License

Copyright © 2001 University of California at Davis, Alexander Kozik






Last modified, November 07 2002