Class imbalance is common in bioinformatics data and in many other areas. A drawback of traditional machine learning methods is that they give relatively little attention to classifying the small (minority) class. Thus, we developed imDC, which combines an ensemble learning approach with weights and sample misclassification information to classify imbalanced data effectively. Our method gave better results than other algorithms on UCI machine learning datasets and microRNA data.
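The general idea of a weighted ensemble for imbalanced data can be illustrated with a minimal sketch. This is not the imDC algorithm itself, whose details are not given here. It assumes one common scheme: the majority class is split into chunks about the size of the minority class, a simple nearest-centroid classifier is trained on each chunk plus all minority samples, and each classifier's vote is weighted by its balanced accuracy on the full training set. All names (`NearestCentroid`, `fit_balanced_ensemble`, `predict`) and the toy data are hypothetical.

```python
from statistics import mean


def centroid(points):
    # Component-wise mean of a list of equal-length feature vectors.
    return [mean(col) for col in zip(*points)]


def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner: predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))


def balanced_accuracy(model, X, y):
    # Mean of per-class recall, so the minority class counts as much as the majority.
    recalls = []
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        hits = sum(model.predict_one(x) == label for x in members)
        recalls.append(hits / len(members))
    return mean(recalls)


def fit_balanced_ensemble(X, y, minority=1, majority=0):
    minority_rows = [x for x, t in zip(X, y) if t == minority]
    majority_rows = [x for x, t in zip(X, y) if t == majority]
    size = len(minority_rows)
    ensemble = []
    # One base learner per majority chunk, each trained on a balanced subset.
    for start in range(0, len(majority_rows), size):
        chunk = majority_rows[start:start + size]
        Xs = chunk + minority_rows
        ys = [majority] * len(chunk) + [minority] * size
        model = NearestCentroid().fit(Xs, ys)
        # Assumed weighting: balanced accuracy on the full training set.
        ensemble.append((balanced_accuracy(model, X, y), model))
    return ensemble


def predict(ensemble, x):
    # Weighted vote over all base learners.
    votes = {}
    for weight, model in ensemble:
        label = model.predict_one(x)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


# Toy data: six majority samples (label 0) and two minority samples (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5], [5, 5], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
ensemble = fit_balanced_ensemble(X, y)
```

Training every base learner on a balanced subset keeps the minority class from being swamped, while the vote weights let better-performing learners count for more.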
In this study, we compared multiple leptin protein sequences from different animal species using computational biology tools to gain new insights into the degree of sequence conservation and the evolutionary biology of mammals.
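One simple way to quantify the degree of conservation in aligned sequences is a per-column score. The sketch below assumes an already aligned set of equal-length sequences. It reports, for each column, the fraction of sequences that share the most common residue, with gaps counted as mismatches. The function name `column_conservation` and the short toy sequences are hypothetical. They are not real leptin data, and this is not necessarily the measure used in the study.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per alignment column, the fraction of sequences that carry
    the most common non-gap residue (1.0 means fully conserved)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    scores = []
    for column in zip(*alignment):
        residues = Counter(c for c in column if c != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


# Toy alignment of three hypothetical sequences ("-" marks a gap).
toy_alignment = ["MHWG", "MHWG", "MC-G"]
scores = column_conservation(toy_alignment)
```

Columns scoring 1.0 are identical across all sequences, and lower scores flag variable positions.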
Identification of cancer-associated and tissue-specific proteins is important for research on carcinogenesis mechanisms and biomarker discovery. Here we applied a new strategy to identify candidate cancer proteins by mining immunohistochemistry protein profiles. Proteins with quantitative values in 14 normal tissues and their corresponding cancer tissues were compared and analyzed using bioinformatics methods.
Anencephaly is one of the most serious forms of neural tube defects (NTDs), a group of congenital central nervous system (CNS) malformations. MicroRNAs (miRNAs) are involved in diverse biological processes via the post-transcriptional regulation of target mRNAs. Although miRNAs play important roles in the development of the mammalian CNS, their function in human NTDs remains unknown.
Glycoprotein C is one of the duck plague virus (DPV) glycoproteins and is encoded by the DPV UL44 gene. DPV glycoprotein C (DPV-gC) comprises 431 amino acids with a putative molecular mass of 47.35 kDa. Sequence analysis indicated that the protein possesses typical characteristics of type-I membrane glycoproteins, containing an N-terminal signal sequence, an external domain, a C-terminal membrane anchor region, and a short cytoplasmic domain.
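The reported mass can be sanity-checked with a common rule of thumb. The sketch below assumes an average residue mass of about 110 Da, which is an approximation and not a value taken from the study. An exact figure would require summing the masses of the actual residues. The helper name `approx_mass_kda` is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the study


def approx_mass_kda(n_residues, avg_residue_da=AVERAGE_RESIDUE_MASS_DA):
    # Rough protein mass in kDa: residue count times average residue mass.
    return n_residues * avg_residue_da / 1000.0


estimate = approx_mass_kda(431)  # 431 residues, as reported for DPV-gC
```

Under this assumption the estimate is 431 × 110 Da = 47.41 kDa, close to the putative 47.35 kDa.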
The goals were to analyze and characterize the secondary structure, regions of intrinsic disorder, and physicochemical characteristics of three classes of mutations in the enzyme N-acetylgalactosamine-6-sulfatase that cause mucopolysaccharidosis IVA: missense mutations, insertions, and deletions. All mutations were compared to the wild-type enzyme. Of the 129 missense mutations, 25 maintained the secondary structure and 104 showed minor changes, such as an increase or decrease in the size of the structural elements.
The purpose of this study was to identify differentially expressed genes and analyze biological processes related to leukemia. A meta-analysis was performed by applying the Rank Product package to leukemia datasets from Gene Expression Omnibus. Next, Gene Ontology enrichment analysis and pathway analysis were performed using the Gene Ontology website and the Kyoto Encyclopedia of Genes and Genomes. A protein-protein interaction network was constructed using the Cytoscape software.
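The core statistic behind a rank product meta-analysis can be sketched briefly. The example below assumes the usual definition: each gene is ranked within each dataset by fold change (rank 1 is the most up-regulated), and the gene's score is the geometric mean of its ranks across datasets. It omits the permutation-based significance estimation that the actual package performs. The function names, gene names, and values are hypothetical.

```python
import math


def rank_descending(values):
    # Rank 1 goes to the largest value; ties are broken by input order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(fold_changes):
    """fold_changes maps gene -> list of log fold changes, one per dataset.
    Returns gene -> geometric mean of its per-dataset ranks."""
    genes = list(fold_changes)
    n_datasets = len(next(iter(fold_changes.values())))
    per_dataset = [
        rank_descending([fold_changes[g][k] for g in genes])
        for k in range(n_datasets)
    ]
    return {
        g: math.prod(ranks[i] for ranks in per_dataset) ** (1.0 / n_datasets)
        for i, g in enumerate(genes)
    }


# Hypothetical log fold changes for three genes in two datasets.
toy = {"geneA": [2.0, 1.5], "geneB": [0.5, 1.8], "geneC": [-1.0, -0.2]}
rp = rank_product(toy)
```

A small rank product means a gene is consistently near the top of the list across datasets, which is what makes the statistic useful for combining studies.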
Coffee is one of the most important commodities in the world, and its production relies mainly on two species, Coffea arabica and Coffea canephora. Although there are diverse transcriptome datasets available for coffee trees, few research groups have exploited the potential knowledge contained in these data, especially with respect to fruit and seed development.
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix.
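To make the notion of composition-based features concrete, the sketch below computes the fraction of each of the 20 standard amino acids in a sequence. This is only an assumption about what a composition feature might look like. It reproduces neither the full 188-dimensional feature set nor the scoring-matrix features described above. The function name `amino_acid_composition` and the toy sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard amino acid.
    Non-standard characters are ignored."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


# Toy sequence, purely illustrative.
features = amino_acid_composition("AACD")
```

A fixed-length vector like this can be fed to any classifier, whatever the length of the original sequence.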
Farming of Haliotis midae is the most lucrative aquaculture venture in South Africa. The genome of this species needs to be studied to assist in selective breeding programs aimed at increasing overall yield, and molecular markers will be required to attain this goal. We identified and characterized 82 polymorphic microsatellite loci by using repeat-enriched genomic libraries and high-throughput pyrosequencing technology.
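As an illustration of how microsatellite repeats can be located in sequence reads, the sketch below scans a DNA string with a regular expression for tandem repeats of short motifs. It is a minimal example under stated assumptions (motifs of 2 to 6 bases, at least 4 tandem copies, homopolymer runs skipped) and is not the pipeline used in the study. The function name `find_microsatellites`, its thresholds, and the toy read are hypothetical.

```python
import re


def find_microsatellites(dna, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each tandem repeat found.
    The lazy quantifier makes the regex prefer the shortest repeating motif."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(dna.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy read containing six tandem copies of the dinucleotide CA.
toy_read = "TTG" + "CA" * 6 + "GGT"
hits = find_microsatellites(toy_read)
```

Loci found this way would still need flanking sequence for primer design and genotyping to confirm polymorphism.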