REED – Rapid and Easy Evaluation of Datasets

Predictive modeling is a complex science. But what is more frustrating than obtaining poor results, or none at all, after investing time and money in a data mining project? At DNAlytics, we understand this well. For our part, we also regret having to announce such a poor project outcome to our customers. We definitely don’t like it. That is why we now offer a very fast evaluation of the potential value of your data, free of charge!

What does REED do?

REED (Rapid and Easy Evaluation of Datasets) is a web application that automatically processes a dataset to give a quick estimate of how promising the data is for predictive modeling and marker identification. The idea is not at all to perform our best work, which cannot be automated, but to give prospects some hints about the potential of their data, and about specific issues the data may contain that should be examined in detail. In particular, REED provides:

  • An estimate of predictive performance, either with all available features or in a favorable scenario with a smaller number of features selected automatically by our algorithms.
  • The number of features that we judge necessary to transform or drop because of their particular behavior.
  • An outlier analysis, suggesting samples that might be considered for removal before the data is processed further.
  • The number of features that differ statistically significantly from one diagnosis to the other.
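
To make the outlier analysis item more concrete, here is a minimal sketch of one common way to flag suspicious samples for a single numerical feature. It is not REED's actual method, which is not described here: the function name `flag_outliers`, the modified z-score rule (median and median absolute deviation) and the threshold of 3.5 are all assumptions chosen for illustration.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper: uses the median and the median absolute deviation
    (MAD), which are less distorted by extreme values than mean and
    standard deviation.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Made-up measurements for one feature; the last sample looks suspicious.
measurements = [5.1, 4.9, 5.0, 5.2, 4.8, 9.7]
suspects = flag_outliers(measurements)
print(suspects)
```

A flagged sample is only a candidate for removal. Whether to drop it remains a manual decision, which is why an automated run can only suggest it.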

How to use REED?

We keep it simple:

  1. Format and upload your dataset as specified
  2. Check that the data is correctly understood by the software (i.e. that it correctly detects a classification versus a regression problem, that the numbers of numerical, categorical and binary variables are correct, …)
  3. Confirm the correct upload to launch the analysis
  4. Wait from a fraction of a second to a few minutes, depending on the dataset, and have a look at your results, sent directly to your mailbox.
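
To illustrate what step 2 is checking, here is a rough sketch of how software might guess variable types and the problem type from an uploaded table. It is an assumption for illustration only, not REED's implementation: the helper names, the CSV layout, the `diagnosis` target column and the heuristics are all hypothetical.

```python
import csv
import io

def infer_column_type(values):
    """Guess whether a column is 'binary', 'numerical' or 'categorical'."""
    distinct = set(values)
    if len(distinct) == 2:
        return "binary"
    try:
        for v in distinct:
            float(v)
    except ValueError:
        return "categorical"
    return "numerical"

def summarize(csv_text, target):
    """Return the guessed problem type and a count of feature types."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    types = {name: infer_column_type(vals) for name, vals in columns.items()}
    # Crude rule: a numerical target means regression, anything else
    # means classification.
    problem = "regression" if types[target] == "numerical" else "classification"
    counts = {}
    for name, kind in types.items():
        if name != target:
            counts[kind] = counts.get(kind, 0) + 1
    return problem, counts

# Made-up example dataset with one target column, "diagnosis".
SAMPLE = """age,sex,tissue,diagnosis
34,F,liver,healthy
51,M,lung,sick
47,F,liver,sick
29,M,kidney,healthy
"""

problem, counts = summarize(SAMPLE, target="diagnosis")
print(problem, counts)
```

Heuristics like these can be wrong. For instance, class labels coded as 1, 2 and 3 would be read as a numerical target and the problem as regression. This is why a manual check before launching the analysis is useful.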

Do you REED us?

First, remember that in a full analysis, each step would normally lead to conclusions drawn from manual observation, and these conclusions affect the rest of the analysis (outlier detection is a good example). Remember also that REED does not at all reproduce the standard analysis pipeline DNAlytics applies to real cases. REED uses a single type of feature selection and classification algorithm, whereas theoretical work shows that no single choice can be guaranteed to suit all practical cases (some algorithms work better in some situations than in others). The application also does not allow prior knowledge to be incorporated (for example, when you have preliminary results from one technology platform and want to use that knowledge in another setting or with another platform). But it is fast, and it likely provides a good hint as to whether such a detailed, more manual analysis is worth undertaking. It is a quick and dirty run that already provides useful insights into the work to be performed.

If you find the results interesting, contact us to ask for a quotation for our best work with a full analysis!

Below is a screenshot of the application, presenting the key metrics it provides as results, as well as the terms and conditions under which it can be used.


