Module: Metadata

The Metadata module connects information about the images (i.e., metadata) to your list of images for processing in CellProfiler.
The Metadata module allows you to extract and associate metadata with your images. The metadata can be extracted from the image file itself, from a part of the file name or location, and/or from a text file you provide.

What is "metadata"?

The term metadata refers to "data about data." For many assays, metadata is important for tagging images with various attributes, such as the plate, well, and site at which each image was acquired. It can be helpful to inform CellProfiler about certain metadata in order to define a specific relationship between the images and the associated metadata.

The underlying assumption in matching metadata values to image sets is that there is an exact pairing (i.e., a one-to-one match) for a given combination of metadata tags. A common example is that for a two-channel microtiter plate assay, the values of the plate, well, and site tags from one channel get matched uniquely to the plate, well, and site tag values from the other channel.
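The one-to-one pairing described above can be sketched in Python. This is only an illustration, not CellProfiler's implementation: the file records, the "Channel" key, and the build_image_sets helper are hypothetical names introduced for the example.

```python
from collections import defaultdict

# Hypothetical per-file metadata, as it might look after extraction.
files = [
    {"FileName": "TE12345_A05_s1_w1.tif", "Plate": "TE12345", "Well": "A05", "Site": "1", "Channel": "w1"},
    {"FileName": "TE12345_A05_s1_w2.tif", "Plate": "TE12345", "Well": "A05", "Site": "1", "Channel": "w2"},
    {"FileName": "TE12345_A06_s1_w1.tif", "Plate": "TE12345", "Well": "A06", "Site": "1", "Channel": "w1"},
    {"FileName": "TE12345_A06_s1_w2.tif", "Plate": "TE12345", "Well": "A06", "Site": "1", "Channel": "w2"},
]


def build_image_sets(files, tags=("Plate", "Well", "Site")):
    """Group files by their tag values, allowing one file per channel per group."""
    image_sets = defaultdict(dict)
    for record in files:
        key = tuple(record[tag] for tag in tags)
        channel = record["Channel"]
        if channel in image_sets[key]:
            # A second file with the same tags and channel breaks the one-to-one match.
            raise ValueError(f"Duplicate match for {key} in channel {channel}")
        image_sets[key][channel] = record["FileName"]
    return dict(image_sets)


image_sets = build_image_sets(files)
for key, channels in image_sets.items():
    print(key, channels)
```

Each key (plate, well, site) ends up with exactly one file per channel; a duplicate raises an error rather than being paired silently.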

What are the inputs?

If you do not have metadata that is relevant to your analysis, you can leave this module in the default setting and continue on to the NamesAndTypes module. If you do have relevant metadata, the Metadata module receives the file list produced by the Images module. It then associates information with each file in the file list. This information can be obtained from several sources: the image file itself, part of the file name or location, and/or a text file you provide.

What do the settings mean?

See below for help on the individual settings. In general, the settings control the various forms of metadata extraction. You can extract metadata from all images from the Images module or from a subset of them by using rules to filter the list.

What do I get as output?

The final product of the Metadata module is a list of files from the Images module, accompanied by the associated metadata retrieved from the source(s) provided and matched to the desired images.

As you are extracting metadata from your various sources, you can click the "Update" button below the divider to display a table of results using the current settings. Each row corresponds to an image file from the Images module, and the columns display the metadata obtained for each tag specified. You can press this button as many times as needed to display the most current metadata obtained.

Some downstream use cases for metadata include the following:

If the metadata originates from an external source such as a CSV, there are some caveats in the cases when metadata is either missing or duplicated for the referenced images; see the NamesAndTypes module for more details.

Available measurements

Settings:

Extract metadata?

Select Yes if your file or path names or file headers contain information (i.e., metadata) you would like to extract and store along with your measurements. See the main module help for more details.

Metadata data type

Metadata can be stored as either a text or numeric value:

Metadata types

(Used only when Choose for each is selected for the metadata data type)
This setting determines the data type of each metadata field when stored as a measurement.

Metadata extraction method

Metadata can be stored in either or both of two ways:

The Metadata module can extract internal or external metadata from the images in any of three ways: from the image file itself, from part of the file name or location, or from a text file you provide. Specifics on the metadata extraction options are described below. Any or all of these options may be used at a time; press the "Add another extraction method" button to add more.

Metadata source

You can extract the metadata from the image's file name or from its folder name.

Regular expression

(Used only if you want to extract metadata from the file name)
The regular expression to extract the metadata from the file name is entered here. Note that this field is available whether you have selected Text-Regular expressions to load the files or not. Please see the general module help for more information on construction of a regular expression.

Clicking the magnifying glass icon to the right will bring up a tool for checking the accuracy of your regular expression. The regular expression syntax can be used to name different parts of your expression. The syntax (?P<fieldname>expr) will extract whatever matches expr and assign it to the measurement fieldname for the image.

For instance, a researcher uses plate names composed of a string of letters and numbers, followed by an underscore, then the well, followed by another underscore, followed by an "s" and a digit representing the site taken within the well (e.g., TE12345_A05_s1.tif). The following regular expression will capture the plate, well, and site in the fields "Plate", "Well", and "Site":

^(?P<Plate>.*)_(?P<Well>[A-P][0-9]{1,2})_s(?P<Site>[0-9])

^             Start only at beginning of the file name
(?P<Plate>    Name the captured field Plate
.*            Capture as many characters as follow
_             Discard the underbar separating plate from well
(?P<Well>     Name the captured field Well
[A-P]         Capture exactly one letter between A and P
[0-9]{1,2}    Capture one or two digits that follow
_s            Discard the underbar followed by s separating well from site
(?P<Site>     Name the captured field Site
[0-9]         Capture one digit following
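To try the expression outside of CellProfiler, a minimal sketch using Python's standard re module is shown here. The variable names are chosen for illustration only; the pattern and file name are the ones from the example above.

```python
import re

# The file-name pattern from the example, written as a raw string.
pattern = re.compile(r"^(?P<Plate>.*)_(?P<Well>[A-P][0-9]{1,2})_s(?P<Site>[0-9])")

match = pattern.search("TE12345_A05_s1.tif")
# groupdict() returns the named fields as a dictionary.
tags = match.groupdict() if match else {}
print(tags)  # {'Plate': 'TE12345', 'Well': 'A05', 'Site': '1'}
```

Every captured value is a string at this point; whether it is later treated as text or as a number is a separate choice.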

The regular expression can be typed in the upper text box, with a sample file name given in the lower text box. Provided the syntax is correct, the corresponding fields will be highlighted in the same color in the two boxes. Press Submit to enter the typed regular expression.

You can create metadata tags for any portion of the filename or path, but if you are specifying metadata for multiple images, an image cycle can only have one set of values for each metadata tag. This means that you can only specify the metadata tags which have the same value across all images listed in the module. For example, in the example above, you might load two wavelengths of data, one named TE12345_A05_s1_w1.tif and the other TE12345_A05_s1_w2.tif, where the number following the w is the wavelength. In this case, a "Wavelength" tag should not be included in the regular expression because while the "Plate", "Well" and "Site" metadata is identical for both images, the wavelength metadata is not.
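The point about the two wavelength files can be checked with a short Python sketch. It assumes the same Plate/Well/Site pattern as the earlier example and simply confirms that both file names yield identical tags when no "Wavelength" field is captured.

```python
import re

pattern = re.compile(r"^(?P<Plate>.*)_(?P<Well>[A-P][0-9]{1,2})_s(?P<Site>[0-9])")

names = ["TE12345_A05_s1_w1.tif", "TE12345_A05_s1_w2.tif"]
tags_per_file = [pattern.search(name).groupdict() for name in names]

# Both wavelengths share the same Plate, Well, and Site values,
# so they can belong to the same image cycle.
same_tags = tags_per_file[0] == tags_per_file[1]
print(tags_per_file[0], same_tags)
```

Adding a named group for the wavelength would make the two dictionaries differ, which is why that tag is left out of the expression.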

Note that if you use the special fieldnames <WellColumn> and <WellRow> together, CellProfiler will automatically create a <Well> metadata field by joining the two fieldname values together. For example, if <WellRow> is "A" and <WellColumn> is "01", a field <Well> will be "A01". This is useful if your well row and column names are separated from each other in the filename, but you want to retain the standard well nomenclature.
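The joining behavior described above can be imitated in a few lines of Python. This is a sketch of the idea, not the code CellProfiler runs; the file name layout ("_row" and "_col" markers) is a hypothetical example.

```python
import re

# Hypothetical file name in which the row and column are stored separately.
pattern = re.compile(r"^(?P<Plate>.*)_row(?P<WellRow>[A-P])_col(?P<WellColumn>[0-9]{2})")

tags = pattern.search("TE12345_rowA_col01.tif").groupdict()

# Join the two special fields into a standard well name.
if "WellRow" in tags and "WellColumn" in tags:
    tags["Well"] = tags["WellRow"] + tags["WellColumn"]

print(tags["Well"])  # A01
```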

Regular expression

(Used only if you want to extract metadata from the path)
Enter the regular expression for extracting the metadata from the path. Note that this field is available whether you have selected Text-Regular expressions to load the files or not.

Clicking the magnifying glass icon to the right will bring up a tool that will allow you to check the accuracy of your regular expression. The regular expression syntax can be used to name different parts of your expression. The syntax (?P<fieldname>expr) will extract whatever matches expr and assign it to the image's fieldname measurement.

For instance, a researcher uses folder names with the date and subfolders containing the images with the run ID (e.g., ./2009_10_02/1234/). The following regular expression will capture the date and run ID in the fields Date and Run:

.*[\\/](?P<Date>.*)[\\/](?P<Run>.*)$

.*[\\/]      Skip characters at the beginning of the pathname until either a slash (/) or backslash (\) is encountered (depending on the operating system)
(?P<Date>    Name the captured field Date
.*           Capture as many characters as follow
[\\/]        Discard the slash/backslash character
(?P<Run>     Name the captured field Run
.*           Capture as many characters as follow
$            The Run field must be at the end of the path string, i.e., the last folder on the path. This also means that the Date field contains the parent folder of the Run folder.
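The path expression can be tested the same way with Python's re module. The sketch below assumes the folder path is supplied without a trailing separator; the two sample paths (one with forward slashes, one with backslashes) are illustrative.

```python
import re

pattern = re.compile(r".*[\\/](?P<Date>.*)[\\/](?P<Run>.*)$")

# The same folder written with forward slashes and with backslashes.
paths = ["./2009_10_02/1234", r".\2009_10_02\1234"]
results = [pattern.search(path).groupdict() for path in paths]

for path, tags in zip(paths, results):
    print(path, tags)
```

Both paths give Date "2009_10_02" and Run "1234". Because the expression is anchored at the end of the string, a trailing separator would shift the captures (Run would come out empty), so it is worth checking how your paths are written before relying on the result.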

Extract metadata from

Select whether you want to extract metadata from all of the images chosen by the Images module or a subset of the images.

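This setting controls how different image types (e.g., an image of the GFP stain and a brightfield image) have different metadata extracted. There are two choices: extract metadata from all images, or extract it only from the images that match a set of filtering rules.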

Select the filtering criteria

Select Yes to display and use rules to select files for metadata extraction.

Clicking the rule menus shows you all the file attributes, operators and conditions you can specify to narrow down the image list.

  1. For each rule, first select the attribute that the rule is to be based on. For example, you can select "File" to define a rule that will filter files on the basis of their filename.
  2. The operator drop-down is then updated with operators applicable to the attribute you selected. For example, if you select "File" as the attribute, the operator menu includes text operators such as Contain or Starts with. On the other hand, if you select "Extension" as the attribute, you can choose the logical operators "Is" or "Is not" from the menu.
  3. In the operator drop-down menu, select the operator you want to use. For example, if you want to match data exactly, you may want the "Exactly match" or the "Is" operator. If you want the condition to be looser, select an operator such as "Contains".
  4. Use the condition box to type the condition you want to match. The more you type, the more specific the condition is.
    • As an example, if you create a new filter and select File as the attribute, then select "Does" and "Contain" as the operators, and type "Channel" as the condition, the filter finds all files that include the text "Channel", such as "Channel1.tif", "Channel2.jpg", "1-Channel-A01.BMP", and so on.
    • If you select "Does" and "Start with" as the operators and type "Channel1" in the condition box, the rule will include such files as "Channel1.tif", "Channel1-A01.png", and so on.
  5. You can also create regular expressions (an advanced syntax for pattern matching; see below) in order to select particular files.

To add another rule, click the plus button to the right of a rule. Remove an existing rule by clicking the minus button.

You can also link a set of rules by choosing the logical expression All or Any. If you use the All logical expression, all the rules must be true for a file to be included in the file list. If you use the Any option, only one of the conditions has to be met for a file to be included.

If you want to create more complex rules (e.g., some criteria matching all rules and others matching any), you can create sets of rules by clicking the ellipsis button (to the right of the plus button). Repeat the above steps to add more rules to the filter until you have all the conditions you want to include.
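The All/Any logic of the rules can be sketched in Python as follows. This is an illustration of the logic only; the rule functions, the passes helper, and the sample file names are hypothetical and do not correspond to CellProfiler's internal filter code.

```python
import os

# Each rule is a function that takes a file name and returns True or False.
rules = [
    lambda name: "Channel" in name,                             # File does contain "Channel"
    lambda name: os.path.splitext(name)[1].lower() == ".tif",   # Extension is .tif
]


def passes(name, rules, mode="All"):
    """Return True if the file satisfies all rules ("All") or at least one ("Any")."""
    combine = all if mode == "All" else any
    return combine(rule(name) for rule in rules)


files = ["Channel1.tif", "Channel2.jpg", "1-Channel-A01.BMP", "notes.tif", "readme.txt"]
kept_all = [f for f in files if passes(f, rules, "All")]
kept_any = [f for f in files if passes(f, rules, "Any")]
print(kept_all)  # ['Channel1.tif']
print(kept_any)  # ['Channel1.tif', 'Channel2.jpg', '1-Channel-A01.BMP', 'notes.tif']
```

A nested set of rules would correspond to one of the entries in the list being itself a call to passes with its own mode.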

Details on regular expressions

A regular expression is a general term referring to a method of searching for pattern matches in text. Regular expressions have a steep learning curve, but they are quite powerful once you understand the basics.

Patterns are specified using combinations of metacharacters and literal characters. There are a few classes of metacharacters, partially listed below.

The following metacharacters match exactly one character from its respective set of characters:

Metacharacter    Meaning
.                Any character
[]               Any character contained within the brackets
[^]              Any character not contained within the brackets
\w               A word character [a-z_A-Z0-9]
\W               Not a word character [^a-z_A-Z0-9]
\d               A digit [0-9]
\D               Not a digit [^0-9]
\s               Whitespace [ \t\r\n\f\v]
\S               Not whitespace [^ \t\r\n\f\v]

The following metacharacters are used to logically group subexpressions or to specify context for a position in the match. These metacharacters do not match any characters in the string:

Metacharacter    Meaning
( )              Group subexpression
|                Match subexpression before or after the |
^                Match expression at the start of string
$                Match expression at the end of string
\<               Match expression at the start of a word
\>               Match expression at the end of a word

The following metacharacters specify the number of times the previous metacharacter or grouped subexpression may be matched:

Metacharacter    Meaning
*                Match zero or more occurrences
+                Match one or more occurrences
?                Match zero or one occurrence
{n,m}            Match between n and m occurrences
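A few of the metacharacters above can be tried out with Python's re module. The sketch below is illustrative; the sample strings are made up for the example.

```python
import re

# [A-P] matches one letter from A to P; \d{1,2} matches one or two digits.
well = re.fullmatch(r"[A-P]\d{1,2}", "A05")        # match
too_long = re.fullmatch(r"[A-P]\d{1,2}", "A005")   # None: three digits

# + needs at least one occurrence, * also accepts none.
plus = re.fullmatch(r"s\d+", "s")    # None: no digit after the s
star = re.fullmatch(r"s\d*", "s")    # match

# ? makes the preceding character optional.
optional = re.fullmatch(r"colou?r", "color")   # match

# ( ) groups a subexpression and | chooses between alternatives.
ext = re.search(r"\.(tif|png)$", "Channel1-A01.png").group(1)
print(ext)  # png
```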

Characters that are not special metacharacters are all treated literally in a match. To match a character that is a special metacharacter, escape that character with a '\'. For example '.' matches any character, so to match a '.' specifically, use '\.' in your pattern. Examples:
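A minimal Python sketch of the escaping rule is shown here, using made-up file names. It compares an unescaped '.' with an escaped '\.' and also uses re.escape, a standard-library helper that escapes special characters in a literal string.

```python
import re

names = ["A01.tif", "A01_tif", "A01xtif"]

# An unescaped '.' matches any character, so all three names match.
unescaped = [n for n in names if re.search(r"A01.tif", n)]

# An escaped '\.' matches only a literal period.
escaped = [n for n in names if re.search(r"A01\.tif", n)]

# re.escape builds a pattern that matches the given text literally.
literal_pattern = re.escape("s1.tif")

print(unescaped)  # ['A01.tif', 'A01_tif', 'A01xtif']
print(escaped)    # ['A01.tif']
```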

Metadata file location

The file containing the metadata must be a comma-delimited file (CSV). You can create or edit such a file using a spreadsheet program such as Microsoft Excel.

The CSV file needs to conform to the following format:

The file must be saved as plain text, i.e., without hidden file encoding information. If using Excel on a Mac to edit the file, choose to save the file as "Windows CSV" or "Windows Comma Separated".

Match file and image metadata

Match columns in your .csv file to image metadata items. If you are using a CSV in conjunction with the filename/path metadata matching, you might want to capture the metadata in common with both sources. For example, you might be extracting the well tag from the image filename while your CSV contains treatment dosage information paired with each well. Therefore, you would want to let CellProfiler know that the well tag extracted from the image filename and the well tag noted in the CSV are in fact one and the same.

This setting controls how rows in your CSV file are matched to different images. Set the drop-downs to pair the metadata tags of the images and the CSV, such that each row contains the corresponding tags. This can be done for as many metadata correspondences as you may have for each source; press to add more rows.
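The matching of CSV rows to images can be sketched in Python with the standard csv module. The CSV layout used here (a header row naming the columns, with a "Well" column and a "Dose" column) is an assumption made for illustration, as are the file names and tag dictionaries; it is not a statement of the exact format CellProfiler requires.

```python
import csv
import io

# Hypothetical CSV content: a header row followed by one row per well.
csv_text = "Well,Dose\nA05,0.1\nA06,1.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical tags already extracted from the image file names.
image_tags = [
    {"FileName": "TE12345_A05_s1.tif", "Well": "A05"},
    {"FileName": "TE12345_A06_s1.tif", "Well": "A06"},
]

# Index the CSV rows by the shared tag, then copy the remaining columns
# onto every image whose Well value matches.
by_well = {row["Well"]: row for row in rows}
for tags in image_tags:
    row = by_well.get(tags["Well"])
    if row is not None:
        tags.update({key: value for key, value in row.items() if key != "Well"})

print(image_tags[0])  # {'FileName': 'TE12345_A05_s1.tif', 'Well': 'A05', 'Dose': '0.1'}
```

Images whose Well value has no row in the CSV are left unchanged in this sketch.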

Use case insensitive matching?

This setting controls whether row matching takes the metadata case into account when matching. If you note that your CSV metadata is not being applied, your choice on this setting may be the culprit.

Select No so that metadata entries that only differ by case (for instance, "A01" and "a01") will not match.

Select Yes to match metadata entries that only differ by case.
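The effect of this setting can be illustrated with a small Python sketch. The tags_match function and its case_insensitive parameter are hypothetical names used only to show the comparison.

```python
def tags_match(image_value, csv_value, case_insensitive=False):
    """Compare two metadata values, optionally ignoring letter case."""
    if case_insensitive:
        return image_value.lower() == csv_value.lower()
    return image_value == csv_value


print(tags_match("A01", "a01"))                         # False
print(tags_match("A01", "a01", case_insensitive=True))  # True
```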