
MIAPPE 1.1

A new version of MIAPPE has been released!

Based on the developments proposed by the EXCELERATE project and two rounds of Request for Comments from the community, the new version of MIAPPE:

  • extends the minimum information requirements to deal with observational studies (in particular woody plants),
  • improves recommendations of ontologies to annotate experimental metadata,
  • proposes a formal data model for MIAPPE-compliant datasets that is compatible with the BrAPI and ISA approaches.

The paper summarising the details of this work is in preparation. Read more on miappe.org.

A new paper published

Following the opinions expressed in Krajewski et al., we are presenting a new paper summarising the solutions proposed for the improvement of phenotypic data description:

Hanna Ćwiek‑Kupczyńska, Thomas Altmann, Daniel Arend, Elizabeth Arnaud, Dijun Chen, Guillaume Cornut, Fabio Fiorani, Wojciech Frohmberg, Astrid Junker, Christian Klukas, Matthias Lange, Cezary Mazurek, Anahita Nafissi, Pascal Neveu, Jan van Oeveren, Cyril Pommier, Hendrik Poorter, Philippe Rocca‑Serra, Susanna‑Assunta Sansone, Uwe Scholz, Marco van Schriek, Ümit Seren, Björn Usadel, Stephan Weise, Paul Kersey and Paweł Krajewski
Measures for interoperability of phenotypic data: minimum information requirements and formatting 
Plant Methods, 2016. DOI 10.1186/s13007-016-0144-4

Configuration change

The new version of the ISA-Tab plant phenotyping configuration reorganises the placement of MIAPPE attributes at the Study and Assay levels. This follows the current policy of the ISA-Tab group, which allows application-specific Study files. No longer bound by the default shape of the Study file, we decided to separate the information about the plant experiment itself from the information about the phenotyping procedures.

All attributes describing the plant experiment, i.e. the biological objects and their handling, together with the growth conditions, are now described in the Study file. Rows of the Study file correspond to experimental units, as planned in the experimental design. There are three versions of the Study file, dedicated to basic, field and greenhouse experiments.

The description of phenotyping procedures, i.e. trait measurement, is placed in the Phenotyping Assay file. The distinctive characteristic of this file is its link to the Trait Definition File, where the observed variables (phenotypic traits and environmental variables) are described. The Phenotyping Assay can be enriched with, e.g., a ‘Time’ attribute or the settings of the phenotyping platform.

The latest version of the plant phenotyping configuration for the ISA-Tab format can be found here.

Configuration change

The new version of the ISA-TAB phenotyping configuration is a consequence of our recent developments in defining the minimum information set, according to the opinions expressed in our paper. We have added some environmental information and reordered most of the attributes by grouping them into protocols. A detailed description is given below. The latest implementation of MIAPPE recommendations in the ISA-TAB format can be found here.

We are proposing an ISA-TAB phenotyping configuration that includes the general information about any experiment in Investigation and Study files, and more specific phenotyping-related features in the Assay files:

  • Basic Phenotyping Assay – a set of obligatory basic characteristics, common for all phenotyping experiments. You should be able to provide all the information for this assay, irrespective of the type of analysis you are about to describe. Use this assay to start formatting your experiment data, and add more attributes characterising the biosources, experimental design, conditions and treatments according to the needs of each analysis. Consult MIAPPE to be sure that you don’t skip any important information. In the simplest case, you can use the phenotyping configuration with this assay to format and store datasets for which no additional parameters are known, apart from object names and observed values. Such cases are common, although the scientific value of observations of phenotypic traits without environmental condition records is low.
  • Field Phenotyping Assay – an extension of the basic assay for field experiments. Apart from the general attributes, it includes the basic characteristics describing a field phenotyping experiment, i.e. sowing, watering, fertilisation, aerial conditions, etc. Each such area is defined as a protocol with a number of parameters that should be provided, e.g. for the Aerial conditions protocol, specify air humidity, light intensity, day temperature, and night temperature.
    Use this assay to format data from field experiments – most of the recommended attributes are already there! Add more attributes to better describe the biosources, specify additional factors, treatments, etc.
  • Greenhouse Phenotyping Assay – an extension of the basic assay for greenhouse experiments. Apart from the general attributes, it includes the basic characteristics describing handling of the plants in greenhouse experiments. It differs from the field configuration mainly in the Rooting protocol – you should describe parameters of pots instead of plots.
    In this assay most of the recommended attributes for greenhouse experiments are already defined, but you can add more to better describe the biosources, specify additional factors, treatments, etc.

Whichever Assay you choose to start from, consult MIAPPE to find out what other attributes are recommended for recording and inclusion in the description of a phenotyping experiment. In the field and greenhouse assays we include most of the obligatory attributes that are common to most experiments. MIAPPE also names more specific attributes that pertain to some trials only. If they apply to your experiment, it is important for its correct biological interpretation that you add them to the formatted Assay. To do this you can define a new Characteristics (for a more detailed description), a new Factor (to highlight a simple multi-valued attribute that differentiates between rows), a new Parameter of a Protocol (to add another characteristic of some process or treatment), or a new Protocol (to define another process or treatment), perhaps also described by some parameters.
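As an illustration of how such additions appear in a file, here is a minimal Python sketch that splits column headers into their kind and attribute name. It assumes the usual ISA-Tab bracket syntax for user-defined attributes (Characteristics[...], Factor Value[...], Parameter Value[...]); the attributes "Pot size" and "Watering regime" are hypothetical examples, not names taken from MIAPPE or from the configuration.

```python
import re

# ISA-Tab writes user-defined attributes as "<Kind>[<name>]" column headers.
HEADER = re.compile(r"^(Characteristics|Factor Value|Parameter Value)\[(.+)\]$")

def parse_header(header):
    """Split a column header into (kind, attribute name or None)."""
    match = HEADER.match(header.strip())
    if match:
        return match.group(1), match.group(2)
    return header.strip(), None

# "Pot size" and "Watering regime" are hypothetical additions, not MIAPPE terms.
columns = [
    "Source Name",
    "Characteristics[Organism]",
    "Characteristics[Infra-specific name]",
    "Protocol REF",
    "Parameter Value[Pot size]",
    "Factor Value[Watering regime]",
    "Sample Name",
]

for column in columns:
    kind, name = parse_header(column)
    print(f"{kind:16} {name or ''}")
```

A listing like this can help to review which Characteristics, Factors and Parameters a formatted file actually declares before comparing it with the MIAPPE recommendations.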

Using phenotyping ISA-TAB configuration in practice

Below we present some comments and advice on using the ISA-TAB for Phenotyping configuration. You can download the current version of the configuration from Downloads.

How to use this configuration
The configuration is meant as a minimal set of fields that are necessary to describe the experiment and data according to MIAPPE. We propose it as a configuration that can be used to format data from phenotyping experiments, or as a starting point for building specific configurations that deal with different phenotyping situations. However, in the constructed data files and in the “derived” configurations the names of fields should not be changed and the fields should not be removed even if not used. Specifically:

  • At the “study” level there are two Characteristics[] fields, Organism and Infra-specific name, that should be used to describe the biological source. We propose to use Organism to define the taxonomic unit. Infra-specific name is to be used for any further descriptive attribute such as accession, variety, ecotype, line, etc. The field name Infra-specific name should not be changed, but its actual meaning in every situation should be resolvable from the corresponding Term Source REF pointing to a collection of names in a database.
  • According to suggestions from users, there are two places where the organism part can be specified: Organism part at the “study” level (if the same plant tissue is used in all assays) and Material type at the “assay” level (if assays use different tissues). At least one of them should be used.

Two levels: “study” and “assay”
There are different possible ways of using the configuration in practice. The biosource specified at “study” level can mean a plant variety (ecotype, etc.) or a “variety x treatment” combination, if all assays within the study use the same combinations. If not, the treatments can be specified at “assay” level. Also, the “biosource” can mean different pools of material:

  • a plant population represented in different assays (possibly run in different experimental designs) by a subset of its samples,
  • a field plot represented in different assays by samples of plants or different plant parts,
  • one plant represented in different assays by the same parts used for many measurements (non-destructive phenotyping) or by different parts.

In every situation, all “assays” within one “study” can use only Sample names defined in this “study”. Of course, Sample names can be repeated in different “assays” and any “assay” can use just a subset of Sample names.
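The rule that assays may only use Sample names declared in their study can be checked mechanically. The following Python sketch is one possible way to do it, assuming both files are tab-separated and each has a "Sample Name" column; the file contents shown are hypothetical and heavily shortened.

```python
import csv
import io

def column_values(tsv_text, column):
    """Return the set of non-empty values found in the first column named `column`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    idx = header.index(column)
    return {row[idx] for row in body if len(row) > idx and row[idx]}

def undeclared_samples(study_tsv, assay_tsv):
    """List Sample names used in the assay file but not declared in the study file."""
    declared = column_values(study_tsv, "Sample Name")
    used = column_values(assay_tsv, "Sample Name")
    return sorted(used - declared)

# Hypothetical file contents; real files carry many more columns.
STUDY = "Source Name\tSample Name\nplot1\ts1\nplot2\ts2\nplot3\ts3\n"
ASSAY_OK = "Sample Name\tAssay Name\ns1\ta1\ns1\ta2\ns3\ta3\n"
ASSAY_BAD = "Sample Name\tAssay Name\ns1\ta1\ns9\ta2\n"

print(undeclared_samples(STUDY, ASSAY_OK))   # repeated names and subsets are allowed
print(undeclared_samples(STUDY, ASSAY_BAD))  # s9 is not declared in the study
```

Comparing sets rather than lists reflects the two allowances stated above: a Sample name may occur several times, and an assay may use only some of the declared names.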

Derived Data File
The Derived Data File specified at “assay” level may be of any format that is fully described in the corresponding Protocol. If there is no description of the protocol, the Derived Data File should be a plain tab-separated text file in which the first column contains values from the field Assay Name in the “assay” file and the remaining columns give the observations of all traits obtained for each Assay Name. The columns for observations should be named according to the Measurement Type column in the Trait Definition File (see below). So, this is an “Assay Name x Trait” matrix of observations (quantitative or qualitative). The key field linking the “assay” file and the Derived Data File is Assay Name. The Derived Data File may be incomplete with respect to the number of Assay Names (rows) in comparison to the “assay” file.
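A minimal Python sketch of a consistency check for the plain tab-separated layout described above is given below. It assumes that the set of Assay Names from the “assay” file and the set of Measurement Types from the Trait Definition File have already been read; the function name, the trait names and the values are hypothetical.

```python
import csv
import io

def check_derived_data(derived_tsv, assay_names, trait_names):
    """Return a list of problems found in a tab-separated Derived Data File."""
    rows = list(csv.reader(io.StringIO(derived_tsv), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if not header or header[0] != "Assay Name":
        problems.append("first column must be 'Assay Name'")
    # Every observation column must be named after a declared trait.
    for name in header[1:]:
        if name not in trait_names:
            problems.append(f"column '{name}' is not a declared Measurement Type")
    # Rows may cover only a subset of assays, but each must exist in the assay file.
    for row in body:
        if row and row[0] not in assay_names:
            problems.append(f"unknown Assay Name '{row[0]}'")
    return problems

# Hypothetical example data.
ASSAYS = {"a1", "a2", "a3"}
TRAITS = {"plant height", "leaf colour"}
GOOD = "Assay Name\tplant height\tleaf colour\na1\t101.5\tgreen\na2\t98.0\tyellow\n"
BAD = "Assay Name\tplant height\tyield\na1\t101.5\t3.2\na7\t98.0\t2.9\n"

print(check_derived_data(GOOD, ASSAYS, TRAITS))
print(check_derived_data(BAD, ASSAYS, TRAITS))
```

Note that the missing assay a3 in the first example is not reported, because the Derived Data File is allowed to be incomplete with respect to the “assay” file.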

Trait definitions
All observed traits must be described in the Trait Definition File. The description consists of:

  • Measurement Type: description of the trait, or its local name annotated to an external ontology,
  • Technology Type: description of the measurement method, or its local name annotated to an external ontology,
  • Unit Name: description of the units of measurement or the scale in which the observations are expressed; if possible, standard units and scales should be used, with a reference to existing ontologies; in the case of a non-standard scale, a full explanation should be given in the Technology Type field.
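To make the three-part description concrete, the following Python sketch reads a Trait Definition File into simple records and lists traits whose description is incomplete. It assumes a tab-separated file with exactly the column headers Measurement Type, Technology Type and Unit Name; the class name, the function name and the example traits are hypothetical, and any ontology annotation columns are ignored here.

```python
import csv
import io
from dataclasses import dataclass

@dataclass(frozen=True)
class TraitDefinition:
    measurement_type: str   # the trait, or its local name
    technology_type: str    # the measurement method
    unit_name: str          # unit or scale of the observations

def read_trait_definitions(tsv_text):
    """Read trait definitions keyed by Measurement Type."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    cols = [header.index(name)
            for name in ("Measurement Type", "Technology Type", "Unit Name")]
    traits = {}
    for row in rows[1:]:
        if not any(row):
            continue  # skip blank lines
        padded = row + [""] * (len(header) - len(row))
        trait = TraitDefinition(*(padded[i].strip() for i in cols))
        traits[trait.measurement_type] = trait
    return traits

# Hypothetical Trait Definition File content.
TDF = (
    "Measurement Type\tTechnology Type\tUnit Name\n"
    "plant height\truler, soil surface to top of main stem\tcm\n"
    "lodging\tvisual score\t\n"
)

traits = read_trait_definitions(TDF)
incomplete = [name for name, t in traits.items()
              if not (t.technology_type and t.unit_name)]
print(incomplete)
```

Keying the records by Measurement Type matches the way the Derived Data File refers to traits through its column names.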

Configuration change

As collaborating institutions implemented ISA-TAB, the following changes were introduced in the Phenotyping configuration:

  • Material Type column added in assay file.
    Another way to describe the samples.
  • Derived Data File changed.
    For easier inclusion of data coming from various sources, the requirement to format derived files according to ISA-TAB convention was relaxed. The Derived Data File should contain a column Assay Name, with assay ids linking it to the metadata in assay file, followed by columns of data. Data column headers should be equal to trait names, defined in Trait Definition File as Measurement Type. There is no need to use Trait Value[] wrapping.
    As a consequence, the annotation of trait values in Derived Data File is no longer possible. Description of scale should be wholly provided in Trait Definition File.

Visit downloads to get the current version of files.

Configuration change

After another round of consultations with ISA-Team the following changes were introduced in the phenotyping configuration and in the example files:

  • Declaration of Sample Name moved to study file.
    Assay file should start with Sample Name column which references declaration of samples from the study file. Samples can correspond directly to sources (if no factors and protocols applied), or can represent a combination of sources and factors. Further analysis-specific factors can be added in the assay file.
  • Assay Name column added to assay file.
    Derived Data File changed to ‘Assays by Traits’ format.
    The column Assay Name identifies rows in the assay file and links them to the observations in the Derived Data File. This implies a change in the layout of the Derived Data File: instead of a ‘Samples by Traits’ matrix it becomes an ‘Assays by Traits’ matrix.
    The introduction of the Assay Name column allows more levels of processing of the plant material to be specified, i.e., there can also be Extract Name and Labeled Extract Name columns defined after the Sample Name (and before the Assay Name).
  • Introduction of another Protocol REF, Normalization Name and Data Transformation Name.
    In order to comply with the general ISA-Tools configuration and allow usage of ISA-validator for phenotyping ISA-Tab files, we have adopted the recommendations of ISA-Team regarding these columns:
    Protocol REF should be used to describe all transitions of data from one node to another (Source Name > Sample Name > Extract Name > Labeled Extract Name > Assay Name), and to specify data processing for the Derived Data File.
    Normalization Name and Data Transformation Name should precede all derived data files. Unlike Protocol REF, they are not linked to any broader description in the investigation file, nor can they take parameters. They can be just symbolic names, contain more elaborate descriptions in the column itself, or stay empty.
  • File naming convention.
    ISA-Tab prefixes (i_, s_, a_) in the names of files were corrected; files other than the standard ISA files no longer include them.
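The naming convention can be sketched in Python as a simple prefix lookup. This is only an illustration; it relies on the standard ISA-Tab meaning of the prefixes (i_ for investigation, s_ for study, a_ for assay), and the file names below are hypothetical rather than the names used in the configuration.

```python
from pathlib import PurePath

# Standard ISA-Tab file name prefixes and the file type each one marks.
ISA_PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}

def isa_file_kind(filename):
    """Classify a file by its ISA-Tab prefix; anything else is 'other'."""
    name = PurePath(filename).name
    return ISA_PREFIXES.get(name[:2], "other")

# Hypothetical file names.
for filename in ("i_investigation.txt", "s_study.txt",
                 "a_phenotyping.txt", "tdf.txt", "data.txt"):
    print(filename, "->", isa_file_kind(filename))
```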

Visit Downloads to get the current version of files.

Configuration change

After the first feedback from ISA-Team the following changes were introduced in the phenotyping configuration and in example files:

  • Sufficient Data File column removed from Phenotyping Assay.
    Instead of introducing another column type, it was decided that sufficient data (along with other statistics, contrasts, model parameters and estimates) should be added as another Derived Data File preceded by a Protocol column explaining the processing. Only one Derived Data File column (the one with phenotype values as a ‘Samples per Traits’ matrix) is necessary.
  • Changed column names in Trait Definition File.
    Because the names of new columns should be reusing existing tags as much as possible, the proposed names Trait Name, Method Name in Trait Definition File were replaced with Measurement Type and Technology Type.

Visit downloads to get the current version of files.