Bioinformatics and Genetics
What is bioinformatics?
Briefly, bioinformatics is the study and processing of biological information by computer. As an interdisciplinary science, it develops techniques to store, retrieve, organize, and analyze biological data. Bioinformatics draws on computer science and statistics in addition to biology, so bioinformaticians are expected to be proficient in at least one programming language.
How did we come up with bioinformatics?
The term “bioinformatics” was first used by Paulien Hogeweg in 1970 to mean the study of information processes in biotic systems (Hogeweg, 2011). In the 1970s bioinformatics was hardly distinguishable from molecular biology; it became established as a separate science as the amount of data grew and as advances in computer science opened up new opportunities. From 2001 onward, the field accelerated with the announcement of the Human Genome Project’s draft sequence.
What is genetic data?
The raw material of bioinformatics is genetic data and the gene expression related to it. But what is genetic data? It is the complete set of DNA properties of an organism, both heritable and non-heritable. From Mendel’s era onward, these properties were studied mostly by indirect approaches (linkage analysis, karyotyping, etc.); in our century, sequencing technology has largely replaced them. Sequencing means determining the correct order of the nucleotide bases (A, T, C, G in DNA; A, U, C, G in RNA) that make up a nucleic acid polymer.
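Once a molecule has been sequenced, it can be handled on a computer as an ordered string of letters. The following is a minimal sketch in Python, assuming a short made-up read; the function names are illustrative and do not come from any particular library.

```python
# Toy illustration: a sequenced molecule is an ordered string of bases.
DNA_BASES = set("ACGT")


def is_valid_dna(seq: str) -> bool:
    """Return True if seq is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= DNA_BASES


def transcribe(seq: str) -> str:
    """Return the RNA spelling of a DNA coding strand (T becomes U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, read in its own 5' to 3' direction."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


read = "ATGCGTTA"  # made-up sequence, for illustration only
print(is_valid_dna(read))        # True
print(transcribe(read))          # AUGCGUUA
print(reverse_complement(read))  # TAACGCAT
print(gc_content(read))          # 0.375
```

Real sequencing output also carries quality scores and ambiguous bases such as N, which this sketch deliberately ignores.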
What can we do with genetic data?
Each organism has genes and genetic elements specific to its species. Sequencing technology and genetic analysis, which get faster and cheaper every day, are used to improve quality of life in various ways. For instance, genetic analysis can be used to prove the presence of an organism. This might sound strange at first, so consider an example. One of the most reliable ways to find out whether a person has the malaria parasite (Plasmodium) in the blood is a PCR-based genetic analysis, since some types of Plasmodium might not be noticed under the microscope (McCutchan, 2008).
Now imagine that you want to classify a number of organisms. Such classifications, which determine the relationships between species, are needed for taxonomy and its applications (finding novel antibiotics against bacteria, plants with improved resistance, etc.). In Aristotle’s era, classification and phylogenetic trees were based on features like color and size; nowadays scientists compare far more informative features such as genomes (the entire DNA information of an organism). Bioinformaticians who carry out this task need genetics to recognize the genetic patterns, statistics to follow and organize these patterns, and computer science to manage and compute them. As another example, imagine that you want to understand the evolutionary steps that mitochondria have been through. It is bioinformaticians who follow this evolutionary change over time by tracking tiny genetic traces (Gray et al., 1999). By comparing bacterial genomes with mitochondrial genomes using genetic and statistical analyses, bioinformaticians have presented evidence for endosymbiosis. Similarly, tracking speciation and migration back in time, which amounts to traveling back through evolution, requires strong bioinformatics.
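The idea of comparing sequences to judge how closely organisms are related can be sketched in a few lines of Python. This is only a toy example: the three sequences and their names are invented, they are assumed to be already aligned to the same length, and the measure used (the fraction of differing positions, often called the p-distance) is far simpler than the statistical models used in real phylogenetics.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Invented, pre-aligned sequences for three hypothetical species.
aligned = {
    "species_A": "ATGCCGTAAC",
    "species_B": "ATGCCGTTAC",
    "species_C": "TTGACGTTCC",
}

# Distance for every pair of species.
distances = {
    (s1, s2): p_distance(aligned[s1], aligned[s2])
    for s1, s2 in combinations(sorted(aligned), 2)
}

for pair, dist in distances.items():
    print(pair, dist)

# The pair with the smallest distance is the most similar one.
closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

With these made-up sequences, species_A and species_B differ at a single position and come out as the closest pair. A tree-building method would start from a distance table like this one.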
How can we use genetic data in industry?
One of the world’s most promising fields is biotechnology, which draws on many biological sciences. Imagine that you take a sample from a hot, acidic crater that contains many different organisms. Sequencing the total DNA of the sample (metagenomic analysis) gives important clues about how the different species in this niche cope with high temperature and low pH, and which genes they use to do so; you may even discover novel genes that allow these organisms to survive in extreme conditions. Finding novel genes is extremely important in biotechnology. Governments and biotechnology companies invest heavily in gene discovery and gene classification, for purposes ranging from polymerase enzymes that are stable at high temperatures (for PCR technologies) to bacteria that convert cellulose to alcohol (for biofuels) (Biello, 2010).
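A first computational step toward spotting candidate genes in sequenced DNA is to look for open reading frames (ORFs): stretches that begin with a start codon and run, three bases at a time, to a stop codon. The sketch below is a simplified illustration, assuming a made-up contig; it scans only the given strand and uses the standard ATG start and TAA/TAG/TGA stop codons, whereas real gene finders also scan the reverse strand and apply statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end, orf) tuples for ORFs on the given strand.

    start is 0-based, end is exclusive, and the ORF includes its stop codon.
    min_codons counts every codon of the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # close the frame and keep scanning
    return orfs


contig = "CCATGAAATTTGGGTAACCATGTAGCC"  # made-up sequence
print(find_orfs(contig))
# [(2, 17, 'ATGAAATTTGGGTAA')]
```

The second ATG in this contig is followed immediately by a stop codon, so it falls below the minimum length and is not reported.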
How can we use genetic information in medicine?
Genetic information differs between individuals of a species. Although this variation helps the species adapt and survive, it also makes some individuals prone to specific diseases or causes genetic diseases. Advanced genetic studies are carried out to link diseases to the responsible genes or mutations. By analyzing a person’s genome, you can map out which diseases he or she carries or is at risk of developing. Geneticists might use bioinformatics tools themselves, but most of the time a bioinformatician gets involved. As sequencing techniques have become widely applied, sequencing has become a very common approach to genetic diagnosis (Illumina recently introduced whole-genome sequencing for 1,000 dollars). The genetic diagnosis field is highly developed in Turkey and is in high demand from neighboring countries. Yet simply holding 3 billion bases of data has no practical use. We need bioinformaticians to extract meaningful knowledge from this huge amount of data.
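Extracting meaningful knowledge from a genome often starts with finding the positions where a person’s sequence differs from a reference. The following Python sketch assumes two short, already aligned, made-up sequences and a hypothetical annotation table; real pipelines work from millions of reads and curated variant databases.

```python
def find_snvs(reference: str, sample: str):
    """Return (position, ref_base, sample_base) for each single-base difference.

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Made-up sequences, for illustration only.
reference = "ATGGTGCACCTGACT"
sample = "ATGGTGCACCTGTCT"

# Hypothetical annotation table: variant -> note.
known_variants = {(13, "A", "T"): "hypothetical disease-associated variant"}

for variant in find_snvs(reference, sample):
    note = known_variants.get(variant, "no annotation")
    print(variant, note)
# (13, 'A', 'T') hypothetical disease-associated variant
```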
Genetic data differs between the cells of a multicellular organism as well. The most prominent difference is the one between gametes. If ¾ of a couple’s children die as fetuses due to a hereditary disease, fertilization should be carried out in vitro, with the embryos screened and the healthy ones selected. This technique is applied properly in Turkey, as in many other countries. If the disease is novel or rare, those responsible must first characterize the disease. Situations of this kind are the crossroads of research and translation, where bioinformatics applications are required.
Of the cells that show genetic variation within a single organism, the most dangerous are cancer cells. With longer life spans and exposure to carcinogens, unfortunately 1 out of every 4 people will face cancer. For successful cancer treatment, the origin of the tumor, its stage, and its genetic alterations have to be known. In personalized medicine, the treatment changes completely according to the features of the tumor. Characterizing a tumor requires molecular biologists in the wet lab and bioinformaticians in the computer lab. Characterizing tumors, which is already widely practiced, is just the tip of the iceberg. To explore the unseen part of the iceberg, we are looking for parameters that help to recognize tumors: biomarkers. All the leading pharmaceutical companies, institutes, and research teams devote enormous budgets and human resources to biomarker research and development. In this process, it is bioinformaticians who complete the experiments, such as microarrays, that molecular biologists start.
In Turkey there are a number of companies applying bioinformatics, mostly working with research teams: PHI-tech, Genometri, HGM Biyoinformatik, AG Biyoinformatik, Done Genetik, Genomize. Here are some examples of bioinformatics companies abroad: Celera Genomics (USA, involved in sequencing the human genome), Accelrys Inc. (USA), Invitrogen (USA), DNASTAR (USA), Ingenuity Systems (USA), Rosetta Biosoftware (USA), Genedata (Switzerland), CLC bio (Denmark), Biobase (Germany), Biomax Informatics AG (Germany), Inte:Ligand (Austria), Genostar (France), Applied Maths (Belgium), Integromics (Spain), Ocimum Bio Solutions (India), Simbiosys (Canada), AstridBio (Hungary), Cytogenomics (Japan), Health Gene Technologies (People’s Republic of China), Macrogen (Korea). Besides bioinformatics companies, bioinformaticians are employed in pharmaceutical companies, research labs, and universities. Singapore and the North American and European countries place an especially high value on bioinformatics research.
Bioinformatics in basic research
So far we have discussed disease states. But life remains a mystery, despite the many details we know about living things. We know very little about how a cell works and how the DNA molecule controls life (the video below, prepared by Harvard University, summarizes most of our knowledge). The approximately 25,000 genes spread over approximately 3.2 billion nucleotides are investigated in depth by molecular biologists and geneticists. Yet there are very few clues about how these 25,000 genes are controlled, how they interact, or simply how they react. For example, from the moment pregnancy starts, different tissues and cell groups in the human body react differently and adjust their gene expression accordingly. The different physiological outcomes of these cells, which are genetically identical, are due not to the action of individual genes but to genetic pathways, interaction networks, and the differential coordination of genetic systems. From this perspective, we can say that molecular biologists and geneticists define the single pieces of the puzzle, while bioinformaticians put the pieces together to see the big picture. Many bioinformaticians define gene interaction networks and gene systems, simulate these systems, and predict how they react to a changing environment.
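A gene interaction network can be represented on a computer as a graph in which genes are nodes and interactions are edges. The Python sketch below is a minimal illustration with invented gene names and interactions; it builds the network and then collects every gene that is directly or indirectly connected to a chosen gene, using a breadth-first search.

```python
from collections import deque

# Invented pairwise interactions between hypothetical genes.
interactions = [
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]


def build_network(pairs):
    """Build an undirected network as a dict: gene -> set of partner genes."""
    network = {}
    for a, b in pairs:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def connected_module(network, start):
    """Return all genes reachable from start through any chain of interactions."""
    seen = {start}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for neighbor in network[gene]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


network = build_network(interactions)
print(sorted(connected_module(network, "geneA")))
# ['geneA', 'geneB', 'geneC']
```

In this toy network, geneD and geneE form a separate module, so a change in geneA would not be expected to reach them through the listed interactions.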
The few examples discussed here are the fields bioinformaticians most commonly choose, but the possibilities are by no means limited to them. The advances of the last decade have shown that the most promising and revolutionary projects are multidisciplinary projects carried out by large teams in institutes or private companies. Because projects in fields such as synthetic biology, bioengineering, or cybernetics are multidisciplinary by nature, they involve geneticists, cell biologists, chemists, engineers, bioinformaticians, and even physicians. Since every ambitious project deals with massive amounts of data, bioinformaticians are indispensable members of multidisciplinary projects. Besides, in Europe, North America, and the Far East, reputable institutes hire at least one bioinformatician to advise their research teams, which is the routine application of bioinformatics.
Bioinformaticians process genetic data, so they must have a solid background in genetics. But their working principles are quite similar to those of other data analysts. As a result, some people educated in bioinformatics choose to work in other fields, such as social media. In this way, a bioinformatics education opens the doors to a big world for you.
Biello, D. (2010, Mar 1). Biofuel from Bacteria. Scientific American, 1.
Gray, M. W., et al. (1999). Mitochondrial Evolution. Science, 283(5407), 1476-1481.
Hogeweg, P. (2011, Mar 31). The Roots of Bioinformatics in Theoretical Biology. PLoS Comput Biol., 7(3), e1002021.
McCutchan, T. F. (2008). Use of Malaria Rapid Diagnostic Test to Identify Plasmodium knowlesi Infection. Emerg Infect Dis., 14(11), 1750-1752.