At DNAlytics, we deal every day with the management and analysis of large datasets. If you want to learn more about our technologies, read our data science technology page. To support our data analysis efforts, we develop and maintain a series of key software libraries. Some of them are open source: feel free to use them, and send us your comments and feedback to share your experience! Others are proprietary technologies that we will be happy to distribute or apply on your behalf upon request.
LiblineaR – an R package for large-scale linear modeling, supporting classification and regression on large datasets. The original C/C++ library, LIBLINEAR, was developed by Prof. Chih-Jen Lin and his team at the Machine Learning Group of National Taiwan University. As most of our developments are done in the open-source R language, we developed the R library LiblineaR, which makes all the functionality of the original library available in the R environment. It thus allows bioinformaticians, machine learners and data miners to process datasets with a very large number of features and/or observations directly in their favorite language. Recent versions of the package support regression as well as classification, and sparse matrices are also supported. We continue to maintain it as open-source software: it is freely available and is downloaded on average 2000 times every month!
jForest – a tree ensemble library and general machine learning framework implementing tree ensemble-based classification methods. It is designed to be highly modular and allows easy tuning and modification of the tree induction procedure, the classification criterion and the feature importance index. It is developed in Java and bundled as an R package. jForest implements a statistically interpretable feature importance index. You can download the jForest code and learn more about the feature importance index it proposes by reading the paper "Inferring statistically significant features from random forests", published in Neurocomputing in 2015.
REED – Rapid and Easy Evaluation of Datasets is a web application that automatically processes a dataset to give a quick estimate of its potential for predictive modeling and marker identification. We know how frustrating it is to obtain poor results after investing time and money in a data mining project! It is also disappointing for us when we have to announce a poor project outcome. This is why we built this tool for fast preliminary evaluation… and we make it available to you for free! Just visit the REED page and follow the instructions!
BLISS – Biomarker List Interpretation Simple Software contributes to the early process of identifying (multivariate) gene signatures for any predictive purpose (diagnosis, prognosis or treatment guidance). Establishing a joint interpretation of a list of genes can be a tedious task: gathering up-to-date information about a gene list may take weeks and the result is rarely complete, while the interactions between markers are rarely taken into account… At DNAlytics, we developed BLISS to facilitate these tasks: BLISS collects information from well-known databases such as Entrez, OMIM, UCSC, GO, KEGG, DrugBank, … and, for a given list of genes, highlights the pathways connecting them, the drugs interacting with them, etc. You can have a look at the example BLISS report generated for a given list of genes.
Alongside these packages supporting our service activities, we also develop and maintain software solutions for dedicated business applications, for example HERCULE, our software suite for analyzing and improving biomanufacturing activities. We may also integrate some of our developments into custom software delivered to our partners upon completion of a dedicated project.