Bioinformatics tools are free or commercial software programs designed to
extract meaningful information from the mass of molecular biology and
biological databases, and to carry out sequence or structural analysis.
Major categories of Bioinformatics Tools:
Both standard and customized products are available to meet the requirements
of particular projects. Data-mining software retrieves data from genomic
sequence databases, and visualization tools help to analyze and retrieve
information from proteomic databases. These tools can be classified as
homology and similarity tools, protein function analysis tools, structural
analysis tools, sequence analysis tools and miscellaneous tools.
Everyday bioinformatics work relies on sequence search programs such as BLAST,
sequence analysis programs such as the EMBOSS and Staden packages, structure
prediction programs such as THREADER or PHD, and molecular imaging and
modelling programs such as RasMol and WHATIF.
Homology and Similarity Tools:
Homologous sequences are sequences that are related by divergence from a
common ancestor. The degree of similarity between two sequences can be
measured, whereas homology is either true or false: two sequences either share
an ancestor or they do not. This set of tools can be used to identify
similarities between novel query sequences of unknown structure and function
and database sequences whose structure and function have been elucidated.
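The distinction above can be made concrete: similarity is a number computed from an alignment, while homology is a conclusion drawn from it. The following is a minimal sketch that computes percent identity for two sequences that are assumed to be already aligned (equal length, with "-" marking gaps). The function name and the choice to skip gapped columns are illustrative assumptions, not a standard definition.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are not counted in this sketch
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # 8 ungapped columns, 7 of them identical
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 87.5
```

A high identity value supports, but does not by itself prove, that two sequences are homologous.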
Protein Function Analysis:
These tools are useful for comparing your protein sequence to the secondary
(or derived) protein databases that contain information on motifs, signatures
and protein domains. Highly significant hits against these pattern databases
allow you to approximate the biochemical function of your query protein.
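As an illustration of matching a query against a motif, the sketch below converts a simple PROSITE-style pattern into a regular expression and scans a protein sequence with it. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site signature; the helper names, the test sequence and the restricted subset of pattern syntax handled here are assumptions made for the example.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regex.

    Only x, [..], {..} and (n) / (n,m) repeats are handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (start, matched_text) pairs, including overlapping matches."""
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))              # N[^P][ST][^P]
    print(find_motifs("MKNGSAANPTQ", "N-{P}-[ST]-{P}"))    # [(2, 'NGSA')]
```

Short patterns like this one match by chance quite often, which is why the significance of a hit matters more than the hit itself.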
Structural Analysis:
This set of tools allows you to compare structures with the known structure
databases. The function of a protein is more directly a consequence of its
structure than of its sequence, and structural homologs tend to share
functions. Determining a protein's 2D/3D structure is therefore crucial to the
study of its function.
Sequence Analysis:
This set of tools allows you to carry out further, more detailed analysis of
your query sequence, including evolutionary analysis and the identification of
mutations, hydropathy regions, CpG islands and compositional biases. The
identification of these and other biological properties provides clues that
aid the search to elucidate the specific function of your sequence.
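Two of the properties mentioned above, compositional bias and CpG islands, reduce to simple counting. The sketch below computes GC content and the observed/expected CpG ratio, then flags windows that pass commonly cited thresholds (GC content above 50% and a ratio above 0.6 over a window of about 200 bp). The function names, the default thresholds and the fixed-window scan are assumptions for illustration; dedicated tools use more careful procedures.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def candidate_cpg_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start positions of windows that pass both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        if gc_content(chunk) > min_gc and cpg_obs_exp(chunk) > min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(gc_content("CGCGCGAT"))  # 0.75
    demo = "CG" * 100 + "AT" * 100
    print(candidate_cpg_windows(demo, window=200, step=200))  # [0]
```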
Some examples of Bioinformatics Tools:
BLAST:
BLAST (Basic Local Alignment Search Tool) comes under the category of homology
and similarity tools. It is a set of search programs, available for several
platforms including UNIX, Linux, macOS and Windows, that is used to perform
fast similarity searches regardless of whether the query is protein or DNA. A
nucleotide query can be compared against the nucleotide sequences in a
database, and a protein database can be searched for matches to a protein
query. NCBI has also introduced a queuing system for BLAST (QBLAST) that
allows users to retrieve results at their convenience and to format their
results multiple times with different formatting options.
Depending on the type of sequences to compare, there are different programs:
blastp compares an amino acid query sequence against a protein sequence database
blastn compares a nucleotide query sequence against a nucleotide sequence database
blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
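The translated searches listed above depend on reading a nucleotide sequence in all six frames: three on the given strand and three on its reverse complement. The following sketch shows that step only, using the standard genetic code. It is not how BLAST itself is implemented, and the function names and frame labels are chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate from offset `frame` (0, 1 or 2); unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Map frame labels +1..+3 and -1..-3 to protein strings."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna, offset)
        frames["-%d" % (offset + 1)] = translate(reverse, offset)
    return frames


if __name__ == "__main__":
    print(six_frame_translations("ATGGCC"))
```

Stop codons appear as "*" in the output, so open reading frames show up as stretches without that character.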
FASTA:
FASTA (FAST-All, a fast homology search for all sequence types) is an
alignment program for protein and nucleotide sequences created by Pearson and
Lipman in 1988. The program is one of the many heuristic algorithms proposed
to speed up sequence comparison. The basic idea is to add a fast prescreen
step that locates the highly matching segments between two sequences, and then
to extend these matching segments to local alignments using more rigorous
algorithms such as Smith-Waterman.
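The two-stage idea described above can be sketched in a few lines: a word (k-tuple) lookup counts exact matches per diagonal, and only pairs with enough hits on some diagonal go on to a Smith-Waterman scoring pass. This is a simplified illustration rather than the FASTA program's actual procedure. The scoring values, the min_hits cutoff and the function names are assumptions, and the Smith-Waterman step here scores the full matrix instead of a band around the best diagonal.

```python
from collections import Counter, defaultdict


def ktuple_hits(query: str, target: str, k: int = 2) -> Counter:
    """Count exact k-tuple matches per diagonal (query index minus target index)."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    diagonals = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            diagonals[i - j] += 1
    return diagonals


def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def prescreened_score(query: str, target: str, k: int = 2, min_hits: int = 2) -> int:
    """Run the rigorous step only if the prescreen finds a promising diagonal."""
    diagonals = ktuple_hits(query, target, k)
    if not diagonals or max(diagonals.values()) < min_hits:
        return 0  # rejected by the prescreen
    return smith_waterman_score(query, target)


if __name__ == "__main__":
    print(ktuple_hits("ACGT", "TTACGT"))        # Counter({-2: 3})
    print(prescreened_score("ACGT", "TTACGT"))  # 8
    print(prescreened_score("AAAA", "CCCC"))    # 0
```

The speed-up comes from the prescreen being a dictionary lookup, so most unrelated database sequences never reach the quadratic-time alignment step.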
EMBOSS:
EMBOSS (European Molecular Biology Open Software Suite) is an open-source
software package for sequence analysis. It can work with data in a range of
formats and can also retrieve sequence data transparently from the Web.
Extensive libraries are provided with the package, allowing other scientists
to develop and release their own software as open source. It provides a set of
sequence-analysis programs and runs on UNIX platforms.
ClustalW:
ClustalW is a fully automated multiple sequence alignment tool for DNA and
protein sequences. It returns the best match over the total length of the
input sequences (a global alignment), whether they are protein or nucleic
acid.
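Aligning "over the total length" means global alignment, in contrast to the local alignments produced by BLAST and FASTA. The sketch below scores a pairwise global alignment with the Needleman-Wunsch recurrence. It illustrates the global-alignment idea for two sequences only and is not ClustalW's multiple-alignment procedure; the scoring values and the function name are assumptions.

```python
def needleman_wunsch_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Best global alignment score with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # every residue of both sequences is accounted for


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
    print(needleman_wunsch_score("ACGT", "AGT"))         # 1
```

Unlike the local case, scores are never reset to zero, so unaligned ends are penalised rather than ignored.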
RasMol:
RasMol is a powerful research tool for displaying the structure of DNA,
proteins and smaller molecules. Protein Explorer, a derivative of RasMol, is
an easier-to-use program.
PROSPECT:
PROSPECT (PROtein Structure Prediction and Evaluation Computer ToolKit) is a
protein-structure prediction system that employs a computational technique
called protein threading to construct a protein's 3-D model.
PatternHunter:
PatternHunter, written in Java, can identify all approximate repeats in a
complete genome in a short time using little memory on a desktop computer. Its
distinguishing features are its patented algorithm and data structures and its
Java implementation. The Java version of PatternHunter is just 40 KB, only 1%
the size of BLAST, while offering a large portion of its functionality.
COPIA:
COPIA (COnsensus Pattern Identification and Analysis) is a protein structure
analysis tool for discovering motifs (conserved regions) in a family of
protein sequences. Such motifs can then be used to determine whether new
protein sequences belong to the family, to predict the secondary and tertiary
structure and function of proteins, and to study the evolutionary history of
the sequences.
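To show what a conserved region looks like in data, the sketch below derives a consensus pattern from a family of sequences that are assumed to be already aligned: a column is reported as a residue when one residue dominates it, and as "x" otherwise. This is a generic illustration and not COPIA's algorithm, which the text does not describe; the threshold, the function name and the toy family are assumptions.

```python
from collections import Counter


def consensus_pattern(aligned, threshold: float = 0.75) -> str:
    """Consensus of equal-length aligned sequences; 'x' marks variable columns."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    pattern = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != "-" and count / len(aligned) >= threshold
        pattern.append(residue if conserved else "x")
    return "".join(pattern)


if __name__ == "__main__":
    family = ["ACDEF", "ACDQF", "ACNEF", "ACDEF", "TCDEF"]
    print(consensus_pattern(family))                 # ACDEF
    print(consensus_pattern(family, threshold=0.9))  # xCxxF
```

A new sequence can then be tested for family membership by checking how well it fits the conserved positions.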
Application of Programmes in Bioinformatics:
Java in Bioinformatics:
Research centers are scattered around the globe, range from private to
academic settings, and use a wide range of hardware and operating systems.
Because Java programs run across these platforms, Java is emerging as a key
player in bioinformatics. Physiome Sciences' computer-based biological
simulation technologies and Bioinformatics Solutions' PatternHunter are two
examples of the growing adoption of Java in bioinformatics.
Perl in Bioinformatics:
String manipulation, regular expression matching, file parsing and data format
interconversion are common text-processing tasks in bioinformatics. Perl
excels at such tasks and is used by many developers. The core Perl
distribution includes no modules designed specifically for bioinformatics, but
developers have written many modules of their own for the purpose. These have
become quite popular and are coordinated by the BioPerl project.
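The text-processing tasks named above look like the following in practice. The sketch is written in Python rather than Perl so that all examples on this page share one language. It parses FASTA-formatted lines, converts the records to a tab-delimited form, and uses a regular expression to locate a restriction site (GAATTC, the EcoRI recognition sequence). The function names and the output layout are assumptions for the example.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_tab(lines: Iterable[str]) -> List[str]:
    """Format interconversion: one 'id<TAB>length<TAB>sequence' row per record."""
    rows = []
    for header, seq in parse_fasta(lines):
        seq_id = header.split(None, 1)[0] if header else ""
        rows.append(f"{seq_id}\t{len(seq)}\t{seq}")
    return rows


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of a literal site, found with a regex."""
    return [m.start() for m in re.finditer(re.escape(site), seq.upper())]


if __name__ == "__main__":
    data = [">seq1 demo", "ACGT", "GAATTC", "", ">seq2", "TTTT"]
    print(fasta_to_tab(data))
    print(find_sites("ACGTGAATTC", "GAATTC"))  # [4]
```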
Bioinformatics Projects:
BioJava:
The BioJava Project is dedicated to providing Java tools for processing
biological data, including objects for manipulating sequences, dynamic
programming, file parsers and simple statistical routines.
BioPerl:
The BioPerl project is an international association of developers of Perl
tools for bioinformatics and provides an online resource for modules, scripts
and web links for developers of Perl-based software.
BioXML:
A part of the BioPerl project, this is a resource to gather XML documentation,
DTDs and XML aware tools for biology in one location.
Biocorba:
Interface objects have facilitated interoperability between bioperl and other
perl packages such as Ensembl and the Annotation Workbench. However,
interoperability between bioperl and packages written in other languages
requires additional support software. CORBA is one such framework for
interlanguage support, and the biocorba project is currently implementing a
CORBA interface for bioperl. With biocorba, objects written within bioperl
will be able to communicate with objects written in biopython and biojava (see
the Biopython and biojava section below). For more information, see the
biocorba project website at http://biocorba.org/ .
The Bioperl BioCORBA server and client bindings are available in the
bioperl-corba-server and bioperl-corba-client bioperl CVS repositories
respectively (see http://cvs.bioperl.org/ for more information).
Ensembl:
Ensembl is an ambitious automated-genome-annotation project at EBI. Much of
Ensembl's code is based on bioperl, and Ensembl developers, in turn, have
contributed significant pieces of code to bioperl. In particular, the bioperl
code for automated sequence annotation has been largely contributed by Ensembl
developers. Describing Ensembl and its capabilities is far beyond the scope of
this tutorial. The interested reader is referred to the Ensembl website at
http://www.ensembl.org/.
bioperl-db:
Bioperl-db is a relatively new project intended to transfer some of Ensembl's
capability of integrating bioperl syntax with a standalone MySQL database
( http://www.mysql.com ) to the bioperl code-base. More details on bioperl-db
can be found in the bioperl-db CVS directory at
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-db/?cvsroot=bioperl .
It is worth mentioning that most of the bioperl objects mentioned above
map directly to tables in the bioperl-db schema. Therefore object data such as
sequences, their features, and annotations can be easily loaded into the
databases, as in $loader->store($newid,$seqobj). Similarly, one can query the
database in a variety of ways and retrieve arrays of Seq objects. See
biodatabases.pod, Bio::DB::SQL::SeqAdaptor, Bio::DB::SQL::QueryConstraint, and
Bio::DB::SQL::BioQuery for examples.
Biopython and biojava:
Biopython and biojava are open source projects with very similar goals to
bioperl. However their code is implemented in python and java, respectively.
With the development of interface objects and biocorba, it is possible to
write java or python objects which can be accessed by a bioperl script, or to
call bioperl objects from java or python code. Since biopython and biojava are
more recent projects than bioperl, most effort to date has been to port
bioperl functionality to biopython and biojava rather than the other way
around. However, in the future, some bioinformatics tasks may prove to be more
effectively implemented in java or python in which case being able to call
them from within bioperl will become more important.