CellProfiler Developers' Wiki
News
version 10211 available (2010-7-16).
CP 2.0 Beta release version 9978 available (2010-5-20).
CP 2.0 Beta release version 9777 available (2010-4-12).
CP 2.0 Beta release version 9717 available (2010-4-02).
CP 2.0 Alpha release version 9336 available (2010-2-22).
Introduction
This document describes the code standards for CellProfiler, and gives tips for CellProfiler developers.
Our goals for CellProfiler are:
- Backwards compatibility with MATLAB: The Python version will have modules with the same functionality as the current CellProfiler. It will be able to load MATLAB pipeline files and will be able to output the same MATLAB measurement files, database files and Excel files as the current CellProfiler. We are discussing how to support users’ custom MATLAB modules and would appreciate your input.
- Improved user interface: We're taking advantage of Python to improve CellProfiler's user interface. We've changed the way module settings are displayed: only the relevant ones appear, and if the settings don't make sense, visual cues tell you what needs to be corrected. An experimental mode lets you test out different settings, and more.
- Speed improvements: The mathematical libraries in Python (NumPy, SciPy) and the compiler tools associated with Python result in faster modules. We've seen 5x speed improvements in some algorithms.
- New algorithms: We are developing new thresholding methods and improving on some of the algorithms in our modules. Python plays a big part in this because of its strengths in array processing and its support for high-performance extensions.
- Interfaces to other imaging packages: We will be creating bridges between CellProfiler and other open-source imaging tools; more on this later.
- Improved reliability: We're using Python to give CellProfiler a richer and more flexible internal structure. As part of this, we're developing a test suite (currently close to 700 functionality tests), so that we can continuously validate CellProfiler's functionality.
- Public development: CellProfiler is open-source and our SVN repository is available at https://svnrepos.broadinstitute.org/CellProfiler/trunk/CellProfiler. You can browse the code at http://crucible.broadinstitute.org/browse/CellProfiler/trunk.
Mailing Lists
You can join developer mailing lists for CellProfiler and CellProfiler Analyst by sending mail with "Subscribe" in the Subject field to:
- cellprofiler-dev-request@broadinstitute.org
- cpa-dev-request@broadinstitute.org
These are not high-traffic lists and are intended for discussions or questions ranging from code details to the next horizons for the projects.
Installation
First, you will need to install Python on your machine. The version of Python you need to install depends on your OS as follows:
- 32-bit Windows, Mac OS X: Python 2.5.4; this is the final bugfix version of Python 2.5 with Windows and OS X binaries provided. Specifics for each OS are below.
- x64 Windows: Python 2.6; download the latest 2.6 version from this page
Then download and install the following packages. Note: install the package built for your version of Python. Some of these have installers for both Windows and Mac.
- Matplotlib: A Python plotting library for NumPy.
- wxPython: A Python wrapper for the cross-platform GUI toolkit wxWidgets (which is written in C++).
- Java Development Kit (JDK): Used for the Bio-Formats Java library, which enables reading/writing of various image formats. Download the Standard Edition (SE).
For the remaining packages, refer to the instructions for your OS.
For Windows
- It is helpful to add the folder in which you installed Python to your system variables as PYTHONPATH; also, adding this folder to your Path variable allows you to run Python from the command prompt. See here for additional details.
- To obtain the source code from our SVN repository, we recommend using TortoiseSVN.
- Download and install the Tortoise software, then right-click on a drive or folder to bring up the context menu. From there, select "SVN checkout..." and specify the URL of our repository and the destination on your computer.
For 32-bit machines
Most of these extensions have .exe installers for Windows, but you may need to look around the website a bit to find the appropriate one.
- Install MinGW: A free port of the GNU Compiler Collection (GCC) for building native Windows applications.
- When installing, select the Base tools, g++ compiler, the g77 compiler and Make.
- After installing MinGW and Python, place them both on your system environment path (i.e., using the Path system variable). If you chose the default install directories, this should probably be c:\Python25 and c:\MinGW\bin.
- Obtain the following packages:
- For packages hosted on SourceForge, the download button usually links to the Python 2.6 package; if the filename contains py26, it is not the correct version. Look in the full file list for the Python 2.5 package, which usually has py25 in its filename.
- Most of the packages for 32-bit and 64-bit machines can also be found here
- Cython: Generates C extensions to Python. Available here.
- After installing Cython, use any text editor to create a new file containing the following text:
  [build]
  compiler=mingw32
- Save this file as distutils.cfg in your Python installation directory under the /Lib/distutils/ folder.
- MySQLdb: A Python interface to MySQL
- At SourceForge, get the package under the mysql-python files, not the mysql-python-test files
- NumPy: A Python extension which provides convenient and fast N-dimensional array manipulation.
- SciPy: A library of algorithms and mathematical tools for Python; depends on NumPy.
- Python Imaging Library (PIL): Adds support for opening, manipulating, and saving many different image file formats in Python.
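If you would rather script the distutils.cfg step described under Cython, the sketch below writes the same two lines into <python_dir>/Lib/distutils/. The helper name write_distutils_cfg is hypothetical (not part of CellProfiler); it uses only the standard library and avoids the with statement so that it also runs on Python 2.5.

```python
import os

# The two lines that tell distutils to build extensions with MinGW.
DISTUTILS_CFG = "[build]\ncompiler=mingw32\n"


def write_distutils_cfg(python_dir):
    """Write distutils.cfg into <python_dir>/Lib/distutils/ and return its path.

    python_dir is the Python installation directory, e.g. c:\\Python25.
    """
    cfg_dir = os.path.join(python_dir, "Lib", "distutils")
    if not os.path.isdir(cfg_dir):
        os.makedirs(cfg_dir)
    cfg_path = os.path.join(cfg_dir, "distutils.cfg")
    f = open(cfg_path, "w")
    try:
        f.write(DISTUTILS_CFG)
    finally:
        f.close()
    return cfg_path
```

For example, calling write_distutils_cfg(sys.prefix) from the Python you intend to build with targets that installation, since on Windows sys.prefix is normally the Python installation folder.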
For 64-bit machines
The same supporting packages as for the 32-bit machines are also available for x64 machines. However, these have been built for Python 2.6. Therefore, as mentioned above, you will need to install Python 2.6 on your machine.
- Obtain and install the Microsoft Windows SDK
- Once installed, set the system environment variables DISTUTILS_USE_SDK and MSSdk to 1 (see here for details). If they are not present, create them.
- Obtain the following packages from this site
- At this point, run python CellProfiler.py from the Microsoft Windows x64 DEBUG build environment command line
- You can do this by selecting the Windows SDK menu item from your Start Menu and selecting CMD Shell. Afterwards, cd to the folder into which you checked out the CellProfiler source code.
- Create the text file wx.pth in <python_folder>\Lib\site-packages\ which has the single line
wx-2.8-msw-unicode
Post-Install
After installing Python and the associated packages, double-clicking CellProfiler.py will start the GUI.
Creating a standalone executable for the PC
You must have Inno Setup and py2exe installed on your computer.
To make an installer, cd to the CellProfiler directory and run
python windows_setup.py py2exe msi
This will generate a file named CellProfilerSetup.exe in the output subdirectory. Double-clicking this file will launch an installer which will have options for installation location, whether to create desktop icons, Start Menu folders, etc.
For Mac
See Building CellProfiler2.0.app on a Mac
Converting from MATLAB to Python
Running MATLAB pipelines on CellProfiler 2.0
CellProfiler 2.0 pipelines are backwards-compatible with CellProfiler 1.0 pipelines that have been saved with the latest released version of CellProfiler 1.0. Older pipelines should be loaded using the latest version of CellProfiler 1.0 and saved before loading in CellProfiler 2.0.
CellProfiler 2.0 pipelines cannot be opened using CellProfiler 1.0
Using custom MATLAB CellProfiler modules with CellProfiler 2.0
Modules written in MATLAB for CellProfiler 1.0 can be run in the Python-based CellProfiler 2.0. However, you may want to convert them, for these reasons:
- CellProfiler 2.0 runs MATLAB modules through a bridge between Python and MATLAB.
- Data is converted to MATLAB format, transmitted across the bridge, then copied and converted back to Python after the MATLAB module has run. This is slow.
- In other words, all images, labels and measurements are copied twice per module per image set
- MATLAB modules will display MATLAB figures which are inaccessible to the Python GUI
- At some point, we will abandon MATLAB compatibility. However, the MATLAB version of CellProfiler will always be available, and we will most likely continue to fix bugs in it. We will probably stop compiling binaries for new versions of MATLAB or new operating systems. If you are deeply invested in the MATLAB version, you may want to continue to use CP 1.0.
Writing Python Code in CellProfiler 2.0
Style Guide
Module Help
The Module Help window (which also constitutes a section of the CellProfiler manual) is populated by the documentation string of the module's code, and contains the following:
- The module's name: This is printed as a title automatically and is not specified within the code for the module.
- <b> Module name </b> phrase description
- Example: Identify Prim Automatic identifies objects via thresholding and contouring
- The section begins with three single quotes. Although the module's official name has no spaces, write it here with spaces and in sentence case so that it is readable. The phrase description is something like "identifies primary objects..." or "places outlines of objects..."
- <hr>: Creates a horizontal line
- Full description/explanation of the module's function. Use headings, usually level <h2>, when needed
- When providing an overview of major capabilities of the module, options for a given setting should be placed in an unordered list, although this should be rare because usually it would be redundant with the help for the individual setting. Each list item should start with the option name in italics, followed by a colon, then the option description. For example:
- To use this module, the following options are available:
- <ul>
- <li><i>Option 1:</i> Blah, blah...</li>
- <li><i>Option 2:</i> Blah, blah...</li>
- </ul>
- Features that can be measured by this module: Provide these as an unordered list, like this:
- <ul>
- <li><i>First measurement:</i> Description of first measurement</li>
- <li><i>Second measurement:</i> Description of second measurement</li>
- </ul>
- This creates a bulleted list of measurements that the module can make. We say "can" because depending on the user's settings, not all measurements will be made by a module during a run. We will eventually make modules self-aware so that the list of possible measurements that can be made will be generated by the code of the module itself.
- See also: <b>OtherModuleName</b>: Other relevant/related modules are listed here
- (end quotes): The general module help section ends with three single quotes
- License information follows and is commented out using #
- Help for each available setting for the module: this text is automatically extracted from the documentation strings for each setting and hence does not need to be written out here.
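To make the layout above concrete, here is an example help text for a hypothetical module called MeasureExample, together with a small checker for a few of the conventions (bold title line, <hr>, list items starting with an italicized name and colon). Both EXAMPLE_HELP and check_module_help are illustrative names, not part of CellProfiler; in a real module this text would be the module's docstring itself.

```python
import re

# Example help text following the layout above (hypothetical module).
EXAMPLE_HELP = '''<b>Measure example</b> measures the intensity of identified objects
<hr>
This module measures ...
<h2>Measurements</h2>
Features that can be measured by this module:
<ul>
<li><i>MeanIntensity:</i> Mean pixel intensity of each object</li>
<li><i>MaxIntensity:</i> Maximum pixel intensity of each object</li>
</ul>
See also <b>IdentifyPrimaryObjects</b>, <b>MeasureObjectIntensity</b>.
'''


def check_module_help(doc):
    """Return a list of style problems found in a module help text."""
    problems = []
    lines = doc.strip().splitlines()
    # First line: bold module name followed by a phrase description.
    if not lines or not re.match(r"<b>[^<]+</b> \S", lines[0]):
        problems.append("first line should be '<b>Module name</b> phrase description'")
    if "<hr>" not in doc:
        problems.append("missing <hr> after the title line")
    # Each list item should start with an italicized name ending in a colon.
    for item in re.findall(r"<li>(.*?)</li>", doc, re.DOTALL):
        if not re.match(r"<i>[^<]+:</i>", item):
            problems.append("list item should start with <i>Name:</i>: " + item)
    return problems
```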
Buttons
Buttons that add or remove items should be labeled "Add another <item>" for adding and "Remove this <item>" for removing, where <item> is Object, Image, Measurement, etc. There should be no text to the left of the button.
Buttons should have a minimum width of 30 pixels to look good on the Mac.
General stylistic notes
- Other module names that are mentioned in the help should be <b>bold</b>. We will eventually add hyperlinks to the other modules' help.
- References/citations that are mentioned should be <i> italicized </i>
Settings
In general, settings descriptions (which are displayed to the user in the GUI) should be:
- Concise. For example, "Enter the threshold scaling factor" should be "Threshold scaling factor"
- Punctuation-free, except for question marks. E.g., no colon at the end of the setting descriptions
- No personal pronouns (e.g., "you") in the settings description, although they are permissible in the Help
- Rules of thumb:
- If the setting is a checkbox, the setting description is most likely a question.
- If the setting is an edit box, the setting description is most likely a phrase with no helper text (e.g., "Radius, in pixels"); use the phrase "Enter the..." only if needed for clarity.
- If the setting is a plain menu, the setting description most likely begins with "Select" (e.g., "Select the thresholding method"). "Select" fits best when the menu offers a choice among equally valid options; it fits less well when the menu asks about a property of the user's existing images or situation.
- If the setting is a menu + an edit box, the setting description most likely begins with "Select".
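The rules of thumb above are mechanical enough to check automatically. The sketch below is a hypothetical linter (check_setting_text and its kind values are illustrative, not CellProfiler code); because the rules say "most likely", its messages are prompts for review rather than hard errors.

```python
import re


def check_setting_text(text, kind):
    """Return possible problems with a setting description.

    kind is one of 'binary' (checkbox), 'text' (edit box) or 'choice' (menu).
    """
    problems = []
    stripped = text.rstrip()
    # Punctuation-free, except that a question mark is allowed.
    if stripped and stripped[-1] in ".:;,!":
        problems.append("remove trailing punctuation")
    # No personal pronouns in the description (they are fine in the Help).
    if re.search(r"\b(you|your)\b", text, re.IGNORECASE):
        problems.append("avoid personal pronouns")
    if kind == "binary" and not stripped.endswith("?"):
        problems.append("checkbox descriptions are usually questions")
    if kind == "choice" and not stripped.startswith("Select"):
        problems.append("menu descriptions usually begin with 'Select'")
    return problems
```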
Some common standardized settings descriptions:
- Select the input image, or for measure modules, Select an image to measure or Select objects to measure
- Name the output image (the Help for this setting would be more wordy, beginning like: "What do you want to call the image with text markings that is displayed by this module?")
- Name the identified objects
- Add another image
- File output location
- Retain the <DESCRIPTION> image for use later in the pipeline (for example, in SaveImages)?
- This is followed by a contingent setting: Name the <DESCRIPTION> image
For cases where objects are identified and outlines can be retained, the following text goes in the main help:
- Special note on saving images: Using the settings in this module, object outlines can be passed along to the module OverlayOutlines and then saved with the SaveImages module. The identified objects themselves can be passed along to the object processing module ConvertToImage and then saved with the SaveImages module.
In the main help, the last annotation prior to the setting help is if there are related modules that the user should be referred to. If so, list them as "See also <module_name1>,<module_name2>, ..." with the module names in bold.
If the appearance of a setting is contingent on another setting, include text in the help stating what the setting depends upon. For example:
- In LoadImages, if the "Extract metadata from the file name?" setting is checked, then an additional setting appears so the user can specify the metadata.
- The Help for this additional setting should start with: "(Used only if metadata is extracted from the file name)" followed by a line break, followed by the rest of the docstring
- In HTML, it would be <i>(Used only if metadata is extracted from the file name)</i><br>
The text for each setting's help is grabbed from the module's code, where it looks like this:
import cellprofiler.settings as cps
...
def create_settings(self):
    self.foo = cps.Binary("Quox", True, doc = """
              Setting help is here bla bla,
              bla bla bla
              bla bla""")
Note that the text for the setting is indented to line up at least after the period within self.foo.
The Help uses the settings method (which returns the settings in the order in which they are saved and loaded) to order the help items, and the visible_settings method to control the display order in the main GUI. If the Help items need to be ordered differently from the settings, use the help_settings method to describe that order.
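The three orderings can be illustrated with a stand-in module whose settings are plain strings. ExampleModule and help_order are hypothetical; help_order only mimics the rule described above (use help_settings if present, otherwise settings) and is not CellProfiler's implementation.

```python
class ExampleModule(object):
    """Hypothetical stand-in for a module; settings are plain strings here."""

    def __init__(self, wants_outlines=True):
        self.wants_outlines = wants_outlines

    def settings(self):
        # Order in which settings are saved to and loaded from the pipeline.
        return ["image_name", "wants_outlines", "outlines_name", "threshold"]

    def visible_settings(self):
        # Order (and subset) shown in the GUI: the outline name is contingent.
        result = ["image_name", "threshold", "wants_outlines"]
        if self.wants_outlines:
            result.append("outlines_name")
        return result

    def help_settings(self):
        # Order of the setting help items in the module's Help.
        return ["image_name", "threshold", "wants_outlines", "outlines_name"]


def help_order(module):
    """Use help_settings() if the module defines it, otherwise settings()."""
    if hasattr(module, "help_settings"):
        return module.help_settings()
    return module.settings()
```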
Module Structure and Data Storage/Retrieval
We have provided a module template which illustrates the basic structure that is described below: imagetemplate.py
CellProfiler splits its internal data into two broad categories:
- Pipeline data: Data that does not change from run to run is pipeline data. The modules in your pipeline and their settings are pipeline data.
- Workspace data: Workspace data is data that is saved during the course of your run, such as images, objects and measurements.
CellProfiler's modules store their configuration in the form of settings; these generally are attributes of the module. You can access them from your code using self.<my_setting>. CellProfiler's modules never store workspace data in module attributes - they use the workspace or image set list that is passed to their methods to store workspace data.
pipeline
CellProfiler uses the pipeline to save the list of modules to run. You might need the pipeline to look at modules that run before your module, for instance, to see what measurements those modules produce. Very few modules need to look into the pipeline; none modify the pipeline. The pipeline is also used to load and save pipeline files to and from disk and to run your pipeline file.
Pipeline utility functions
The following is a list of pipeline functions that can be called from test code or from your module:
- File handling
- load: Loads a pipeline file into the pipeline
- save: Saves the pipeline to a file
- save_measurements: Saves measurements to a .mat file (and embeds the pipeline in the measurement file)
- Data access
- modules: Returns a list of the modules in your pipeline
- get_measurement_columns: Returns a list of the measurements your pipeline makes
- test_valid: Throws an exception if any pipeline modules are in invalid states (for instance, if the user entered an invalid file name in the LoadData module).
Pipeline modification functions
The pipeline is the center of CellProfiler's UI. Some parts of the UI plug into the pipeline to find out about changes and events. Others make changes to the pipeline. The pipeline itself keeps track of changes in a way that lets you undo the changes.
- add_module: Adds a module to the pipeline. Each module has a one-based index, module_num. The pipeline uses this index to determine where to insert a module, so the caller should set the module's module_num to the insert point before calling add_module.
- remove_module: Removes a module according to its module_num. For instance, pipeline.remove_module(1) removes the first module in your pipeline.
- clear: Removes all modules.
- edit_module: Registers a change in the module's settings (the actual edit is done by the caller and edit_module is called afterwards).
- move_module: Moves a module up or down one position.
You can undo edits to the pipeline. You can also group a number of edits together to form a single undo action that will undo the whole group of edits (for instance, a user might delete several modules with one keystroke - the user interface groups all of the remove_module edits into one undo action).
- undo: Undoes the last undo action
- start_undoable_action: Marks the start of a group of edits to be bundled into a single action
- stop_undoable_action: Marks the end of the action
- undo_action: Returns an informative string that describes the current action that can be undone
- has_undo: True if the pipeline has something to undo
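The grouping of several edits into one undo action can be modelled with a toy class. UndoablePipeline is not CellProfiler's pipeline: it only borrows the method names above, and the optional description argument to start_undoable_action is an assumption made so that undo_action has something to report.

```python
class UndoablePipeline(object):
    """Toy model (not CellProfiler's code) of grouping edits into undo actions."""

    def __init__(self):
        self.modules = []
        self.undo_stack = []   # entries: (description, [undo functions])
        self.current = None    # the open group of edits, or None

    def start_undoable_action(self, description="Edit"):
        self.current = (description, [])

    def stop_undoable_action(self):
        if self.current is not None and self.current[1]:
            self.undo_stack.append(self.current)
        self.current = None

    def _record(self, description, undo_fn):
        # Inside a group, collect the edit; otherwise it is its own action.
        if self.current is not None:
            self.current[1].append(undo_fn)
        else:
            self.undo_stack.append((description, [undo_fn]))

    def add_module(self, module_num, name):
        # module_num is one-based, as in CellProfiler.
        self.modules.insert(module_num - 1, name)
        self._record("Add " + name, lambda: self.modules.pop(module_num - 1))

    def remove_module(self, module_num):
        name = self.modules.pop(module_num - 1)
        self._record("Remove " + name,
                     lambda: self.modules.insert(module_num - 1, name))

    def has_undo(self):
        return len(self.undo_stack) > 0

    def undo_action(self):
        return self.undo_stack[-1][0]

    def undo(self):
        description, undo_fns = self.undo_stack.pop()
        # Reverse the grouped edits in the opposite order they were made.
        for undo_fn in reversed(undo_fns):
            undo_fn()
```

Deleting two modules inside start_undoable_action/stop_undoable_action and then calling undo() restores both with a single step, which is the keystroke behavior described above.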
You can receive notifications when something happens to your pipeline. To do this, write a callback function like this:
def callback(pipeline, event):
    ...  # react to the event here
Register as an event listener with the pipeline:
pipeline.add_listener(callback)
At this point, the callback function will receive notifications when the pipeline is edited and when exceptions occur while the pipeline runs.
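The listener mechanism can be sketched with a toy pipeline that calls every registered callback with itself and an event object. ListenablePipeline, notify_listeners and ModuleAddedEvent are hypothetical names for this illustration; CellProfiler's own event classes are not shown here.

```python
class ModuleAddedEvent(object):
    """Hypothetical event class used only in this sketch."""

    def __init__(self, module_num):
        self.module_num = module_num


class ListenablePipeline(object):
    """Toy pipeline that notifies listeners, in the style described above."""

    def __init__(self):
        self.listeners = []
        self.modules = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def notify_listeners(self, event):
        # Every listener gets the pipeline that changed and the event.
        for listener in self.listeners:
            listener(self, event)

    def add_module(self, name):
        self.modules.append(name)
        self.notify_listeners(ModuleAddedEvent(len(self.modules)))


received = []


def callback(pipeline, event):
    # Record how many modules the pipeline has and which event arrived.
    received.append((len(pipeline.modules), type(event).__name__))
```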
Running a pipeline
CellProfiler runs the pipeline by calling methods on the pipeline. A run is a nested hierarchy. On the outside is the run itself. The run is composed of groups, and each group is composed of image sets. Each image set is a collection of images that is passed from module to module as the pipeline is run. Groups are collections of the image sets whose metadata matches the grouping criteria (for instance, the image sets with the same plate metadata value).
The following are the methods that are called in the course of running a pipeline; their calling order corresponds to the hierarchy order:
- run: Runs the pipeline from start to finish or over a subset of image sets or groups
- run_with_yield: A version of run that yields after running each module (to be UI-friendly)
- prepare_run: Build the run's image set list.
- get_groupings: Find the groups and grouping criteria from within the image set list
- prepare_group: Build the image sets from the grouping criteria. Modules that aggregate over groups (for instance, CorrectIlluminationCalculate) can initialize a group during this stage. The pipeline then calls each module's run and display methods for each image set in the group, passing a workspace that holds the data for that image set.
- post_group: Perform any necessary post-group processing. Modules that aggregate can write their results during this stage. For instance, SaveImages writes out aggregate images during post_group.
- post_run: Tell each module to do its post-run processing (for instance, ExportToSpreadsheet writes its spreadsheets in post_run)
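Grouping image sets by matching metadata, as described above, amounts to bucketing them by the tuple of their grouping values. The sketch below uses a hypothetical group_image_sets function and represents each image set by its metadata dictionary; it is not the pipeline's get_groupings, whose actual return format is not described here.

```python
def group_image_sets(image_sets, keys):
    """Group image sets whose metadata values for `keys` match.

    image_sets is a list of metadata dictionaries, one per image set.
    Returns (key_values, image_numbers) pairs in first-seen order, where
    image_numbers are one-based image set numbers.
    """
    groups = {}
    order = []
    for index, metadata in enumerate(image_sets):
        key = tuple(metadata[k] for k in keys)
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(index + 1)
    return [(key, groups[key]) for key in order]


image_sets = [{"Plate": "P1", "Well": "A01"},
              {"Plate": "P2", "Well": "A01"},
              {"Plate": "P1", "Well": "A02"}]
print(group_image_sets(image_sets, ["Plate"]))
# → [(('P1',), [1, 3]), (('P2',), [2])]
```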
cpmodule
cellprofiler.cpmodule.CPModule is the Python class that represents a CellProfiler module. Modules have settings. These settings hold the parameters that control the module's operation. They are loaded and saved in the pipeline file and are displayed in the UI.
A module has methods that do things as the pipeline is run. The module also has methods that tell pipeline users about what the module does. If you write a module, you'll have to fill in some or all of these methods in order to get your module to communicate with other modules and the user interface.
Module settings
A module tells CellProfiler about its settings using the following methods:
- create_settings: CellProfiler calls create_settings when it makes a new instance of a module. This is the place to create the settings that you want a user to see in the new module. Look at cellprofiler.modules.colortogray for a simple example of how to create settings.
- settings: CellProfiler calls settings whenever it wants to find out which settings should be saved to the pipeline. You should return a list of the settings that you want saved.
- visible_settings: CellProfiler calls visible_settings whenever it wants to find out which settings to display in its user interface. You can omit visible_settings - if you do, CellProfiler will use settings instead. You can implement visible_settings if you want to show different settings in different circumstances. For instance, cellprofiler.modules.colortogray can either combine the color channels to form one grayscale image or it can split each channel into a separate image. ColorToGray uses visible_settings to either show the split options or the combine options; users only see the settings they have to set.
- validate_module: CellProfiler calls validate_module whenever it wants to determine whether the module's settings are valid. You might have some combinations of settings that won't work in your pipeline. You can implement validate_module and raise the ValidationError exception if the settings are wrong. See the LoadData module for an example.
Modules can handle some complex setting situations as a pipeline is loaded. A module can make itself compatible with older versions of itself by implementing upgrade_settings (see the EnhanceEdges module for an example). A module can adjust its settings in order to handle different numbers of inputs and outputs using prepare_settings (see the Morph module for an example). A module can implement help_settings when the help for its settings needs to be displayed in a different order than the one given by settings.
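A typical validate_module check rejects a combination of settings that are each valid on their own. In this sketch, ValidationError and ToySetting are local stand-ins (the (message, setting) arguments are an assumption modelled on the description above), SizeRangeExample is a hypothetical module, and validate_module is assumed to receive the pipeline.

```python
class ValidationError(Exception):
    """Local stand-in for CellProfiler's ValidationError."""

    def __init__(self, message, setting):
        Exception.__init__(self, message)
        self.message = message
        self.setting = setting  # the setting the UI should highlight


class ToySetting(object):
    def __init__(self, text, value):
        self.text = text
        self.value = value


class SizeRangeExample(object):
    """Hypothetical module with two settings that must agree."""

    def __init__(self):
        self.min_size = ToySetting("Minimum object diameter", 10)
        self.max_size = ToySetting("Maximum object diameter", 40)

    def validate_module(self, pipeline):
        # Each value may be fine alone; their combination may not be.
        if self.min_size.value > self.max_size.value:
            raise ValidationError(
                "The minimum diameter must not exceed the maximum diameter",
                self.min_size)
```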
Modules and measurements
Modules can make measurements when they process image sets and groups. Other modules want to know about these measurements; for instance, ExportToDatabase wants to make tables based on these measurements before the run starts. Every module that makes measurements implements the following methods:
- get_measurement_columns: Returns a list of tuples, one per measurement, each giving the name of the object being measured, the measurement's name and the measurement's data type.
- get_categories: Given an object, returns the measurement categories that the module makes on that object. For instance, IdentifyPrimaryObjects makes Location_CenterX measurements, and these are in the "Location" category.
- get_measurements: Given an object and a category, returns the feature names in that category. For example, the measurement Location_CenterX is in the "Location" category and its feature name is "CenterX".
A module can break its measurement names into a category and a feature name. It can specialize them further by implementing get_measurement_objects, get_measurement_images and/or get_measurement_scales. For example, the MeasureTexture module lets you measure an object's texture at different scales and in different images. MeasureTexture's measurements show up in an easy-to-navigate hierarchy in the ClassifyObjects module because it implements get_measurement_images and get_measurement_scales.
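One way to keep these methods consistent is to derive the categories and feature names from get_measurement_columns by splitting each name at its first underscore. MeasureExample and COLTYPE_FLOAT are hypothetical stand-ins; the sketch assumes each column tuple is (object name, measurement name, data type) and uses the method names get_categories and get_measurements with a pipeline argument.

```python
COLTYPE_FLOAT = "float"  # stand-in for a CellProfiler column-type constant


class MeasureExample(object):
    """Hypothetical module whose measurement methods agree with each other."""

    def __init__(self, object_name="Nuclei"):
        self.object_name = object_name

    def get_measurement_columns(self, pipeline):
        # One (object name, measurement name, data type) tuple per measurement.
        return [(self.object_name, "Location_CenterX", COLTYPE_FLOAT),
                (self.object_name, "Location_CenterY", COLTYPE_FLOAT),
                (self.object_name, "AreaShape_Area", COLTYPE_FLOAT)]

    def get_categories(self, pipeline, object_name):
        # The category is the part of the name before the first "_".
        categories = []
        for obj, name, coltype in self.get_measurement_columns(pipeline):
            category = name.split("_", 1)[0]
            if obj == object_name and category not in categories:
                categories.append(category)
        return categories

    def get_measurements(self, pipeline, object_name, category):
        # The feature name is the part after the category.
        return [name.split("_", 1)[1]
                for obj, name, coltype in self.get_measurement_columns(pipeline)
                if obj == object_name and name.split("_", 1)[0] == category]
```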
Modules and pipeline execution
You write a module in order to have it do something when it is executed during the course of a pipeline run. Each module implements some of the pipeline execution methods; CellProfiler calls these as it executes. CellProfiler calls pipeline execution methods with an image set list (prepare_run) or with a workspace (all other methods). Your module can get the results of prior modules from the image set list or workspace and can add its results to the image set list or workspace.
The following methods are called during the course of a run:
- prepare_run: Called at the start of a run to find out the image sets in the run
- get_groupings: Called after prepare_run to find out how image sets are grouped
- prepare_group: Called before running each group
- run: Called once per image set. This is where most modules do their work.
- display: Called if your module produces a display
- post_group: Called just after running each group
- post_run: Called at the conclusion of a run
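The calling order can be seen with a toy driver and a module that records each call. RecordingModule and run_pipeline are hypothetical; the argument lists are simplified (None instead of a real image set list or workspace), and the groups are passed in rather than obtained through get_groupings.

```python
class RecordingModule(object):
    """Hypothetical module that records which execution methods were called."""

    def __init__(self, name, calls):
        self.name = name
        self.calls = calls

    def prepare_run(self, image_set_list):
        self.calls.append(self.name + ".prepare_run")

    def prepare_group(self, workspace):
        self.calls.append(self.name + ".prepare_group")

    def run(self, workspace):
        self.calls.append(self.name + ".run")

    def display(self, workspace):
        self.calls.append(self.name + ".display")

    def post_group(self, workspace):
        self.calls.append(self.name + ".post_group")

    def post_run(self, workspace):
        self.calls.append(self.name + ".post_run")


def run_pipeline(modules, groups, show_display=True):
    """Toy driver: groups is a list of groups, each a list of image set numbers."""
    for module in modules:
        module.prepare_run(None)
    for group in groups:
        for module in modules:
            module.prepare_group(None)
        for image_set_number in group:
            # Every module runs (and optionally displays) once per image set.
            for module in modules:
                module.run(None)
                if show_display:
                    module.display(None)
        for module in modules:
            module.post_group(None)
    for module in modules:
        module.post_run(None)
```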
Display during execution
CellProfiler uses the matplotlib library for display. Matplotlib's plotting functions are patterned after those in MATLAB. You have three display choices in your module:
- No display: Your module doesn't have a display window and produces no feedback.
- Interactive display: Your module has an interactive user interface which lets the user provide input during the course of a run. IdentifyObjectsManually is an example of this sort of module.
- Informational display: Your module displays information or images after analysis.
Informational displays have some advantages. The user interface is live during calculation because the "run" method is executed in a worker thread while the user interface runs in the main thread. Informational displays are slightly harder to code, but can be accomplished with a little planning. First, you have to implement the is_interactive method as follows:
import cellprofiler.cpmodule

class MyModule(cellprofiler.cpmodule.CPModule):
    ...
    def is_interactive(self):
        return False
This tells CellProfiler that your module's run method can be executed in the worker thread. Next, you can optionally store intermediate results in the workspace during run. Many modules (for example, MeasureImageQuality) collect a table of results for display. These are stored in the workspace's display_data. Here's an example:
    ...
    def run(self, workspace):
        # Initialize the statistics with a header
        workspace.display_data.statistics = [("Measurement", "Value")]
        ...
        workspace.display_data.labels = labels
        for measurement_name, value in ...:
            workspace.display_data.statistics.append((measurement_name, value))
You implement the display separately. CellProfiler makes it easy to create a montage of images, tables and other displays in a window that's reserved for your module (see the documentation for CPFigure for a full list). CellProfiler lays out your display in a grid where each grid cell is a subplot. You choose which grid cell displays which plot using x and y coordinates. Here's an example with an image and a table laid out one above the other (both at x = 0; the table at y = 1):
    ...
    def display(self, workspace):
        f = workspace.create_or_find_figure(subplots = (1,2))
        f.subplot_imshow_labels(0, 0, workspace.display_data.labels)
        f.subplot_table(0, 1, workspace.display_data.statistics)
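The point of display_data is that run and display may execute on different threads, so everything display needs has to travel through the workspace. The sketch below models that hand-off with the standard threading module; ToyWorkspace, ToyModule and run_then_display are hypothetical, and display returns the rows instead of drawing a figure.

```python
import threading


class DisplayData(object):
    """Plain attribute holder, like workspace.display_data in the text."""
    pass


class ToyWorkspace(object):
    """Hypothetical minimal workspace carrying display_data."""

    def __init__(self):
        self.display_data = DisplayData()


class ToyModule(object):
    def is_interactive(self):
        return False  # run() may execute in a worker thread

    def run(self, workspace):
        # Compute results and stash what display() will need.
        workspace.display_data.statistics = [("Measurement", "Value")]
        for name, value in [("Count", 12), ("MeanArea", 148.5)]:
            workspace.display_data.statistics.append((name, value))

    def display(self, workspace):
        # In CellProfiler this would draw into a figure; here we return rows.
        return workspace.display_data.statistics


def run_then_display(module, workspace):
    """Run the module in a worker thread, then display in the calling thread."""
    if module.is_interactive():
        module.run(workspace)
    else:
        worker = threading.Thread(target=module.run, args=(workspace,))
        worker.start()
        worker.join()  # a real UI would keep handling events instead of blocking
    return module.display(workspace)
```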
settings
Settings store your module's parameters in the pipeline file. They tell the user interface how the parameters should be displayed and edited. They give names to your pipeline's data and link one module's inputs to another module's outputs. They complain when a user enters invalid data.
You can see a full list of settings in settings.py. All settings have text that is displayed to the left of the setting and HTML documentation that appears in a window when the user presses the help button to the right of the setting. Almost all settings have a value which holds the parameter that's stored in the pipeline and made available inside your module.
Settings are initialized inside your module's create_settings method. Here's a simple example:
import cellprofiler.settings as cps
from cellprofiler.cpmodule import CPModule
...
class MyModule(CPModule):
    def create_settings(self):
        self.my_parameter = cps.Text(
            "Enter a value:", "Default",
            doc = """This setting controls the ....""")

    def run(self, workspace):
        my_value = self.my_parameter.value
The first parameter in cps.Text is the prompt text. The second is the initial value for the setting. "doc" is the documentation for the setting.
Setting Types
CellProfiler has settings that handle a number of different kinds of inputs such as numeric values, ranges, choices and yes/no questions. It also has some settings that hook into the CellProfiler data: the measurements, images and objects. Finally, it has settings whose only purpose is display or UI interaction. Here's a list:
Simple Settings
Simple settings are designed to capture values similar to Python's built-in Boolean, string and numeric types. You can use these in comparisons in your code as if they were built-ins, but you have to use the .value attribute during numeric operations. Here's an example:
    ...
    def create_settings(self):
        self.wants_automatic_threshold = cps.Binary(
            "Calculate threshold automatically?", True)
        self.manual_threshold = cps.Float("Enter threshold:", .5)
    ...
    def run(self, workspace):
        # image_pixels: the input image's pixel array, obtained earlier in run()
        if self.wants_automatic_threshold:
            t = self.calculate_threshold(workspace)
            adjustment = self.calculate_adjustment(workspace)
        else:
            t = self.manual_threshold  # fine: only used in a comparison
            adjustment = self.manual_threshold.value * 1.5  # arithmetic needs .value
        mask = image_pixels > t
        adjusted_image = image_pixels * adjustment
The different types of simple settings are:
- Text: Value is a string, displayed in an edit box.
- Integer: The Integer's value is an integer, displayed in an edit box. You can set the minimum and maximum acceptable values using the minval and maxval keywords.
- Float: Value is a decimal or floating point number. Otherwise, it's similar to the Integer setting.
- IntegerRange, FloatRange: These settings hold a pair of integers or floats that represent the lower and upper bounds of a range.
- Binary: Represents a yes/no or True / False value, displayed as a checkbox.
- Choice: Represents one of a set of possible choices, displayed as a drop-down choice box.
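Why comparisons work but arithmetic needs .value can be shown with a toy setting that delegates comparison operators to its value and defines no arithmetic operators. ToyFloat and check_range are hypothetical; this illustrates the behavior described above, not CellProfiler's actual Float class.

```python
class ToyFloat(object):
    """Toy stand-in for a simple numeric setting (not CellProfiler's class)."""

    def __init__(self, text, value, minval=None, maxval=None):
        self.text = text
        self.value = value
        self.minval = minval
        self.maxval = maxval

    # Comparisons delegate to .value, so "pixel > setting" works ...
    def __eq__(self, other):
        return self.value == other

    def __ne__(self, other):
        return self.value != other

    def __lt__(self, other):
        return self.value < other

    def __gt__(self, other):
        return self.value > other

    # ... but there is no __mul__, so "setting * 1.5" raises TypeError.

    def check_range(self):
        # Mimics the minval/maxval limits described for Integer and Float.
        if self.minval is not None and self.value < self.minval:
            raise ValueError("%s must be at least %s" % (self.text, self.minval))
        if self.maxval is not None and self.value > self.maxval:
            raise ValueError("%s must be at most %s" % (self.text, self.maxval))
```

For a plain number, 0.7 > setting falls back to the setting's __lt__, which compares against .value; multiplying the setting itself fails, which is why the example above writes self.manual_threshold.value * 1.5.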
Buttons and Dividers
You might want to do something if the user presses a button, e.g., add another image or read a file.
- DoSomething: Displays a button that does something when the user presses it. Below is example code that displays a message box when the button is pressed:
    ...
    def create_settings(self):
        self.my_value = cps.Text("Enter something", "Default")
        self.hello_world = cps.DoSomething(
            "Press me, please", self.hello_world_pressed)
    ...
    def settings(self):
        # Leave out hello_world... there's no value to save or load
        return [self.my_value]
    ...
    def visible_settings(self):
        # Include it here though because we want it to be displayed
        return [self.my_value, self.hello_world]
    ...
    def hello_world_pressed(self):
        import wx
        wx.MessageBox("You entered " + self.my_value.value, "Hello, world")
The second argument to DoSomething is the function to run when the button is pressed.
- Divider: Displays a vertical line between the previous and following setting.
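The split between settings() and visible_settings() can be tried out without wx. In this sketch the classes TextSketch, ButtonSketch and HelloModuleSketch and the save helper are all hypothetical stand-ins; the point is only that the button is displayed but contributes nothing when values are saved.

```python
class TextSketch(object):
    """Hypothetical text setting: a label plus a value."""
    def __init__(self, text, value):
        self.text, self.value = text, value


class ButtonSketch(object):
    """Hypothetical button: a label plus the function to call when pressed."""
    def __init__(self, label, callback):
        self.label, self.callback = label, callback

    def press(self):
        self.callback()


class HelloModuleSketch(object):
    def __init__(self):
        self.my_value = TextSketch("Enter something", "Default")
        self.hello_world = ButtonSketch("Press me, please", self.hello_world_pressed)
        self.messages = []

    def settings(self):
        return [self.my_value]                      # saved and loaded

    def visible_settings(self):
        return [self.my_value, self.hello_world]    # displayed

    def hello_world_pressed(self):
        # Record the message instead of opening a wx message box.
        self.messages.append("You entered " + self.my_value.value)


def save(module):
    """Hypothetical saver: only values from settings() are written out."""
    return [setting.value for setting in module.settings()]


module = HelloModuleSketch()
print(save(module))               # ['Default']
module.hello_world.press()
print(module.messages)            # ['You entered Default']
```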
Additional Settings
CellProfiler has settings that give users a sophisticated and useful interface intended for specific situations. These include file choosers, color map choosers and regular expression editors:
- FilenameText: Displays the file name dialog when the user presses the browse button next to the edit box. At its simplest, the button just stores the file name in the edit box, but you can get it to save the path and do other, more complicated things. Look at LoadData's create_settings method for a pretty complex example of how you can get the FilenameText setting to interact with other settings in your module.
  - The FilenameText constructor has some optional parameters that can help you:
    - get_directory_fn: Supply a function here that returns the initial directory for the file dialog.
    - set_directory_fn: FilenameText calls this function after running the file dialog. It passes the function the directory that contains the file name that the user picked.
    - browse_msg: A message that's displayed in the caption of the file dialog.
    - exts: A list of acceptable extensions, for example, [("Text files (*.txt)","*.txt"),("All files (*.*)","*.*")]
- RegexpText: Displays a regular expression editor when the user presses the browse button. The editor tells the user if the regular expression is valid and it displays the fields that would be captured by groupings if the regexp was applied to the example text.
  - You can supply the example text if you pass RegexpText an optional get_example_fn parameter. This should be a function that returns the example text when called. LoadImages' create_settings method has an example of this.
- Colormap: Lets the user pick one of Matplotlib's color maps. The ConvertObjectsToImage module is an example of how the Colormap setting might be used.
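What the RegexpText editor reports can be approximated with Python's re module: whether the expression compiles, and which named groups it captures from the example text. The function preview_groups, the pattern and the file name below are hypothetical illustrations, not the editor's own code.

```python
import re


def preview_groups(pattern, example_text):
    """Return the named groups captured from example_text, {} if there is
    no match, or an error message if the expression is invalid."""
    try:
        compiled = re.compile(pattern)
    except re.error as e:
        return "invalid regular expression: %s" % e
    match = compiled.search(example_text)
    if match is None:
        return {}
    return match.groupdict()


# Hypothetical file-name convention: plate, well and site in the name.
pattern = r"^(?P<Plate>[A-Z0-9]+)_(?P<Well>[A-P][0-9]{2})_s(?P<Site>[0-9])"
print(preview_groups(pattern, "P12_B03_s1_w2.tif"))
# {'Plate': 'P12', 'Well': 'B03', 'Site': '1'}
```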
Settings that interact with your pipeline
Modules get their data from images, objects and/or measurements. CellProfiler has settings that supply names for images and objects and that let users choose from the names supplied by prior modules.
- Provider settings: These supply a name for an image or object that the module creates. The most widely used providers are ImageNameProvider and ObjectNameProvider, which provide names for the images and objects created by a module. These names will appear in the choice boxes of the appropriate subscribers. Here's an example of a module that supplies an image of all zeros to subsequent modules:
import numpy as np
import cellprofiler.cpimage as cpi
import cellprofiler.cpmodule as cpm
import cellprofiler.settings as cps
class ZeroImage(cpm.CPModule):
    ...
    def create_settings(self):
        self.image_name = cps.ImageNameProvider("Image name:", "Zeros")
    ...
    def run(self, workspace):
        # The image set holds cpimage.Image instances, so wrap the array
        workspace.image_set.add(self.image_name.value,
                                cpi.Image(np.zeros((100, 100))))
- Subscriber settings: These look for matching provider settings in prior modules. The most widely used subscribers are the ImageNameSubscriber and ObjectNameSubscriber which subscribe to names of images and objects created by a prior module. Here's an example of how to retrieve an image inside your module:
import cellprofiler.cpmodule as cpm
import cellprofiler.settings as cps
class FindImageMaximum(cpm.CPModule):
    ...
    def create_settings(self):
        self.image_name = cps.ImageNameSubscriber("Image name:")
    ...
    def run(self, workspace):
        my_image = workspace.image_set.get_image(self.image_name.value)
        my_pixels = my_image.pixel_data
        maximum = my_pixels.max()
        ...
- Measurement setting: Supplies a measurement name to your module. You might want to feed the measurement results of one module into another; CalculateMath is an obvious example of a module that does this, but you might have more subtle reasons for using measurements such as adjusting a calculation based on some arbitrary measurement made on an object.
  - Measurements are made on images and objects. You have to supply an object name or the keyword "Image" to the Measurement setting to tell it where to look for measurements. This is done using the object_fn argument to Measurement. Here's a comprehensive example:
import cellprofiler.cpmodule as cpm
import cellprofiler.settings as cps
import cellprofiler.measurements as cpmeas
...
class UseMeasurement(cpm.CPModule):
    ...
    def create_settings(self):
        self.wants_image_measurement = cps.Binary(
            "Use an image measurement?", True)
        # We need an object if the user doesn't want an image measurement
        self.object_name = cps.ObjectNameSubscriber(
            "Object name:")
        #
        # Python lets you define functions inside other functions,
        # and the inner function can use the variables (here, self)
        # of the function that encloses it. This lets you write your
        # code right next to where it's used, which makes the code
        # easier to read.
        #
        def object_fn():
            if self.wants_image_measurement:
                return cpmeas.IMAGE
            else:
                return self.object_name.value
        self.measurement = cps.Measurement(
            "Measurement:", object_fn)
    ...
    def run(self, workspace):
        object_name = (cpmeas.IMAGE if self.wants_image_measurement
                       else self.object_name.value)
        value = workspace.measurements.get_current_measurement(
            object_name, self.measurement.value)
        ...
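The provider/subscriber relationship amounts to this: a subscriber in module N can choose from the names supplied by providers in modules 1 through N-1. The sketch models that rule with hypothetical ModuleSketch objects and helper functions; it is not how CellProfiler's pipeline code is actually organized.

```python
class ModuleSketch(object):
    """Hypothetical module: the names it provides and the names it subscribes to."""
    def __init__(self, name, provides=(), subscribes=()):
        self.name = name
        self.provides = list(provides)      # like ImageNameProvider values
        self.subscribes = list(subscribes)  # like ImageNameSubscriber values


def choices_for(pipeline, module_index):
    """Names a subscriber in pipeline[module_index] could choose from."""
    names = []
    for module in pipeline[:module_index]:
        names.extend(module.provides)
    return names


def check_pipeline(pipeline):
    """Return (module name, missing name) for subscriptions no prior module provides."""
    problems = []
    for index, module in enumerate(pipeline):
        available = set(choices_for(pipeline, index))
        for wanted in module.subscribes:
            if wanted not in available:
                problems.append((module.name, wanted))
    return problems


pipeline = [
    ModuleSketch("LoadImages", provides=["DNA"]),
    ModuleSketch("ZeroImage", provides=["Zeros"]),
    ModuleSketch("FindImageMaximum", subscribes=["Zeros"]),
    ModuleSketch("Misordered", subscribes=["Later"]),   # asks too early
    ModuleSketch("MakesLater", provides=["Later"]),
]
print(choices_for(pipeline, 2))   # ['DNA', 'Zeros']
print(check_pipeline(pipeline))   # [('Misordered', 'Later')]
```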
workspace
The Workspace is a central location for the data needed to process an image set.
- A module's run method gets called with the workspace for the current image set.
- A module's post_group method gets called with the workspace for the last image set in the group.
- A module's post_run method gets called with the workspace for the last image set in the run.
The workspace has properties that provide access to the image set and the image set's object set. It also has properties that provide access to the run's measurements, the image set list and the pipeline:
- workspace.image_set: Holds the images for the image set. Use workspace.image_set.get_image(...) to get an image by name. Use workspace.image_set.add(...) to add a new image to the image set.
- workspace.object_set: Holds the objects for the image set. Use workspace.object_set.get_objects(...) to get objects by name. Use workspace.object_set.add_objects(...) to add new objects to the image set's object set.
- workspace.measurements: Holds the run's measurements
- workspace.image_set_list: Holds the list of image sets
- workspace.pipeline: Holds the pipeline being run
Finally, the workspace controls a module's figure window.
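The order in which run, post_group and post_run receive workspaces can be summarized in a few lines. This is a hypothetical ordering sketch that assumes every module gets every call and represents each workspace by its image number; the real pipeline also calls prepare_run, prepare_group and others that are not modeled here.

```python
def call_order_sketch(module_names, groups):
    """List (module, method, image number) in the order described above.

    groups is a hypothetical list of lists of image numbers, one per group.
    """
    calls = []
    for group in groups:
        for image_number in group:
            for name in module_names:
                calls.append((name, "run", image_number))
        for name in module_names:
            # post_group gets the workspace of the group's last image set
            calls.append((name, "post_group", group[-1]))
    last_image_number = groups[-1][-1]
    for name in module_names:
        # post_run gets the workspace of the run's last image set
        calls.append((name, "post_run", last_image_number))
    return calls


for call in call_order_sketch(["ModuleA"], [[1, 2], [3]]):
    print(call)
```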
objects
Segmentation (and other operations) group collections of pixel positions into objects. CellProfiler represents this data as a two-dimensional array of integer values. The special value, "0", marks a position as being outside of any object; all pixels with the same value, other than zero, are part of the same object. This value is used in CellProfiler to represent the object; for instance, the value is used as the one-based index into the object's measurements and the value is used to identify the object as being the parent of some other object. We refer to the array as the "labels matrix" in the code.
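A labels matrix is easy to explore with numpy. The small matrix below is made up for illustration; it shows that the object count is the largest label and that np.bincount gives per-object pixel counts, where entry k-1 belongs to object k once the background entry is dropped.

```python
import numpy as np

# Hypothetical labels matrix: 0 is background, 1..3 are objects.
labels = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 2],
])

object_count = labels.max()               # objects are numbered 1..N
areas = np.bincount(labels.ravel())[1:]   # drop entry 0, the background
print(object_count)        # 3
print(areas.tolist())      # [4, 3, 2]
print(areas[2 - 1])        # 3: area of object 2 (one-based label, zero-based index)
object_2_mask = labels == 2               # boolean mask of a single object
```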
Each image set's workspace has an object set. This object set is a dictionary that links the name for an object to its representation. The ObjectSet and Objects classes are in the cellprofiler.objects module. You can get objects by name like this:
my_objects = workspace.object_set.get_objects("my_objects")
or you can put new ones that you make into the workspace like this:
workspace.object_set.add_objects(my_objects, "my_objects")
You might want to look at how objects are added in a fairly simple module like FilterObjects.
Inside cellprofiler.objects
You get an instance of cellprofiler.objects.Objects when you call get_objects, not the labels matrix. Most often, you'll only want the labels matrix, which is accessible through the "segmented" property. For instance, you might set every pixel in an image that lies outside all labeled objects to zero like this:
my_objects = workspace.object_set.get_objects("my_objects")
pixels[my_objects.segmented == 0] = 0
"segmented" represents the final segmentation of your image. Typically, some objects should be ignored, for instance, objects that lie partially outside of the field of view, and these are filtered out of "segmented". The "unedited_segmented" property is a labels matrix that labels both the objects that were kept and the objects that were filtered out. CellProfiler uses it so that, during secondary segmentation, the kept objects compete for pixels with the filtered-out objects; otherwise, the kept objects would extend to cover the space taken up by the filtered-out objects.
There is a third labels matrix, "small_removed_segmented". It labels the kept objects and all filtered-out objects except those that were removed for being too small. This labels matrix is useful for analyzing secondary images when the small objects are unwanted artifacts of segmentation.
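The relationship between the three labels matrices can be sketched with numpy. The matrix and the lists of filtered-out objects are hypothetical, and the renumbering of survivors into consecutive labels is an assumption made because labels index measurements; whether and how CellProfiler renumbers each matrix is not covered here.

```python
import numpy as np


def relabel(labels):
    """Renumber the nonzero labels to 1..N, keeping their relative order."""
    kept = np.unique(labels[labels != 0])
    mapping = np.zeros(labels.max() + 1, dtype=labels.dtype)
    mapping[kept] = np.arange(1, len(kept) + 1)
    return mapping[labels]


# Hypothetical unedited labels matrix: every object found, before filtering.
unedited = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4],
    [0, 0, 0, 0, 0, 4],
])
too_small = [3]          # hypothetical: removed by a size filter
touches_border = [1, 4]  # hypothetical: removed because they touch the image edge

# Like small_removed_segmented: everything except the too-small objects.
small_removed = np.where(np.isin(unedited, too_small), 0, unedited)
# Like segmented: only objects that passed every filter, renumbered 1..N.
segmented = relabel(np.where(np.isin(unedited, too_small + touches_border), 0, unedited))
print(segmented.max())   # 1: only the former object 2 survives
```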
You can access the image that was used during segmentation through the "parent_image" property. This image may have secondary properties that are useful during downstream analysis. The image mask and cropping can be used to exclude areas in other channels from consideration during measurement.
cpimage
cpimage holds the three classes that define how images are handled in CellProfiler: Image, ImageSet and ImageSetList. These correspond to three levels of image hierarchy: a single image, a set of images that are processed together by one iteration of the pipeline and the list of all image sets to be run by the pipeline. A module builds the image set list during prepare_run, inserts images into image sets during prepare_run and run, and processes the images during run.
Image
The main purpose of Image is to hold onto one image array. This array can either be a two-dimensional matrix representing a single channel of detection data or a three-dimensional multichannel or color image where the third dimension indexes the channel or color. The array can be composed of boolean, integer or floating point values. Images loaded from disk are normalized so that the detector's minimum value is a floating point zero and the maximum value is a floating point one. You can retrieve the array through the image's pixel_data attribute. You should treat the pixel_data as read-only; copy the pixel data before modifying it.
Image holds additional attributes for the image:
- mask: the mask is a boolean two-dimensional array that defines a region of interest for the image. CellProfiler imaging algorithms use the mask to determine which values (mask == True) are to be processed and which values (mask == False) are to be ignored. The has_mask property will be true if the image has a mask; if not, the region of interest is the whole image and the mask will be entirely True.
- crop_mask: an image might have been cropped by having its edges trimmed away. CellProfiler saves a crop_mask that describes how this was done: the mask is True in the areas that were not cropped and False in the areas that were cropped. An algorithm might take two images and one of these images might be cropped and another might be the full, uncropped size. The algorithm can crop the full size image using the cropped image's cropping mask; you do this by calling crop_image_similarly to trim the full size image.
- masking_objects: CellProfiler can define an image's region of interest as the part of the image that lies within some set of objects. When it masks an image this way, CellProfiler saves the objects with the image: masking_objects is then an instance of cellprofiler.objects.Objects and the image's has_masking_objects property is True. You can get the objects' labels directly through the image's labels property; if the image was not masked with objects, labels is 1 within the image's region of interest and 0 outside of it.
- parent_image: derived images (for example, a smoothed, cropped or masked image) will have a parent_image. This is the original image that CellProfiler processed to come up with the derived image. Images loaded from disk will not have a parent image (parent_image is None, has_parent_image == False). The image's file_name and path_name are the file and path of the image's primordial ancestor: the original parent loaded from disk.
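The idea behind trimming a full-size image to match a cropped one can be sketched as taking the bounding box of the crop mask's True region. crop_like is a hypothetical helper, not crop_image_similarly; it assumes the crop mask has the full image's two-dimensional shape and that the uncropped region is rectangular.

```python
import numpy as np


def crop_like(full_image, crop_mask):
    """Trim full_image to the bounding box of crop_mask's True region.

    Hypothetical helper: works for 2-D images and for 3-D color images,
    since only the first two axes are sliced.
    """
    if not crop_mask.any():
        raise ValueError("crop mask has no uncropped pixels")
    rows = np.flatnonzero(crop_mask.any(axis=1))   # rows with any True pixel
    cols = np.flatnonzero(crop_mask.any(axis=0))   # columns with any True pixel
    return full_image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


full = np.arange(20).reshape(4, 5)
crop_mask = np.zeros((4, 5), bool)
crop_mask[1:3, 1:4] = True        # the region that survived cropping
cropped = crop_like(full, crop_mask)   # rows 1-2, columns 1-3 of full
```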
Here's a typical example of how an image might be used to generate a derived image:
import numpy as np
import cellprofiler.cpimage as cpi
pixel_data = img.pixel_data            # the image intensity data (assumed to be 2-D)
mask = img.mask                        # the image's ROI (all True if there is no mask)
std_img = np.zeros(pixel_data.shape)   # don't-care pixels stay 0
if np.any(mask):                       # make sure there is some ROI
    # Express each ROI pixel as the number of standard deviations it lies
    # from the ROI mean; indexing with [mask] only considers ROI pixels.
    mean = np.mean(pixel_data[mask])
    std = np.std(pixel_data[mask])
    if std > 0:                        # avoid dividing by zero on a flat ROI
        std_img[mask] = (pixel_data[mask] - mean) / std
std_image = cpi.Image(std_img, parent_image=img)  # the parent's mask becomes the new image's mask
ImageSet
CellProfiler keeps your images in an ImageSet as it executes your pipeline. You can fetch an image from the image set like this:
image_name = self.orig_image_name.value   # my module's image name subscriber
image = image_set.get_image(image_name)   # I get the image from the image set
Some algorithms need grayscale, binary or color images; for instance, a morphological skeletonization algorithm operates on binary images. You can get a grayscale image by supplying the must_be_grayscale keyword - the image set will combine channels for a color image and it will convert true and false to 1 and 0 for a binary image:
grayscale_image = image_set.get_image(image_name, must_be_grayscale = True)
Similarly, you can supply the must_be_color or must_be_binary keywords. These guarantee that an image has 3 color channels or that it is a two-dimensional boolean array; ImageSet will raise an exception if the image you retrieve is not the correct type.
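The conversions described above can be sketched in numpy. Averaging the color channels is an assumption made for this illustration (the text only says the channels are combined), and as_grayscale and require_binary are hypothetical helpers, not ImageSet methods.

```python
import numpy as np


def as_grayscale(pixel_data):
    """Hypothetical conversion: binary -> 0.0/1.0, color -> mean of channels."""
    if pixel_data.dtype == bool:
        return pixel_data.astype(float)    # True/False become 1.0/0.0
    if pixel_data.ndim == 3:
        return pixel_data.mean(axis=2)     # combine channels by averaging (an assumption)
    return pixel_data                      # already grayscale


def require_binary(pixel_data):
    """Hypothetical check in the spirit of must_be_binary."""
    if pixel_data.ndim != 2 or pixel_data.dtype != bool:
        raise ValueError("image must be a two-dimensional boolean array")
    return pixel_data


color = np.zeros((2, 2, 3))
color[..., 0], color[..., 1], color[..., 2] = 0.3, 0.6, 0.9
gray = as_grayscale(color)                      # every pixel is about 0.6
binary_as_gray = as_grayscale(np.array([[True, False]]))
```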
You can put an image into the image set like this:
image_name = self.output_image_name.value   # my module's image name provider
image_set.add(image_name, image)            # I add an image to the image set
Each image set has a dictionary of metadata keys and values: ImageSet.keys. You can use this dictionary to find the image sets that match your metadata values. For instance, one module might load images that have "Plate" and "Well" metadata and another module might load per-plate illumination correction images that only have "Plate" metadata. You can match the illumination correction images with the corresponding regular images by matching the "Plate" metadata values (you may want to use the LoadData module instead, which lets you explicitly specify which images should be loaded for each image set).
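Matching per-plate illumination correction images to per-well image sets comes down to indexing one side by the shared metadata values. The records, file names and the index_by helper below are hypothetical and use plain dictionaries in place of ImageSet.keys.

```python
def index_by(records, key_names):
    """Map each tuple of metadata values to the record that carries them."""
    index = {}
    for record in records:
        value = tuple(record["keys"][name] for name in key_names)
        if value in index:
            raise ValueError("duplicate metadata %r" % (value,))
        index[value] = record
    return index


# Hypothetical per-plate illumination images and per-well image sets.
illum = [
    {"keys": {"Plate": "P1"}, "file": "illum_P1.tif"},
    {"keys": {"Plate": "P2"}, "file": "illum_P2.tif"},
]
image_sets = [
    {"keys": {"Plate": "P1", "Well": "A01"}},
    {"keys": {"Plate": "P1", "Well": "A02"}},
    {"keys": {"Plate": "P2", "Well": "A01"}},
]

illum_by_plate = index_by(illum, ["Plate"])
for image_set in image_sets:
    plate = (image_set["keys"]["Plate"],)
    image_set["illum"] = illum_by_plate[plate]["file"]
print([s["illum"] for s in image_sets])
# ['illum_P1.tif', 'illum_P1.tif', 'illum_P2.tif']
```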
ImageProvider
Behind the scenes, ImageSet uses ImageProviders. These are promises of images - for images that are loaded from disk or are calculated from aggregates, the ImageProvider doesn't actually have an image until the first time that someone asks for the image. You can use an ImageProvider to efficiently supply the same image to every image set in a group:
def prepare_group(self, pipeline, image_set_list, grouping, image_numbers):
    ...
    for image_number in image_numbers:
        # image numbers are 1-based, a legacy of CellProfiler's MATLAB roots
        image_set_index = image_number - 1
        image_set = image_set_list.get_image_set(image_set_index)
        image_set.providers.append(my_provider)
Typically, you'll use the VanillaImageProvider if you use one at all. This is just a plain-vanilla image provider that holds an already-created image. You can look at loadimages.py and makeprojection.py for more complex examples.
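The "promise of an image" idea can be sketched as a provider that runs its loader on first request and caches the result, so one provider shared by every image set in a group loads only once. LazyImageProviderSketch, its get_image method and the dictionary image sets are hypothetical; CellProfiler's provider classes have their own names and methods.

```python
class LazyImageProviderSketch(object):
    """Hypothetical provider: loads its image on first request, then caches it."""

    def __init__(self, name, load_fn):
        self.name = name
        self._load_fn = load_fn
        self._image = None
        self.load_count = 0

    def get_image(self):
        if self._image is None:          # nothing loaded yet: keep the promise now
            self._image = self._load_fn()
            self.load_count += 1
        return self._image


def get_image(image_set, name):
    """Look the image up through the image set's providers."""
    for provider in image_set["providers"]:
        if provider.name == name:
            return provider.get_image()
    raise KeyError(name)


shared = LazyImageProviderSketch("IllumDNA", lambda: [[0.5, 0.5], [0.5, 0.5]])
group = [{"providers": []} for _ in range(3)]
for image_set in group:
    image_set["providers"].append(shared)      # same provider in every image set

images = [get_image(s, "IllumDNA") for s in group]
print(shared.load_count)        # 1: loaded once for the whole group
print(images[0] is images[2])   # True
```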
ImageSetList
The ImageSetList holds the ImageSets for a run. The ImageSetList is saved when you create a batch file and is restored when the batch file is run. CellProfiler stores your module's dictionary in the image set list; this dictionary is saved and restored as well.
The image set list operates in one of two modes: by image set index or by key. The first module in your pipeline controls the image set list's mode. If it asks for an image set by image set index, subsequent image sets are matched by index; if it asks for an image set by key, subsequent image sets are matched by key. You can call ImageSetList.get_image_set to create a new image set or retrieve an existing one, by either key or image set index. If you use a key, you pass in a key/value dictionary and ImageSetList finds the image set that has the same values for your keys.
Most modules make little or no use of image set lists; typically the image set list is used behind the scenes by CellProfiler and modules use the image set that's in the workspace.
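The two lookup modes can be modeled in a few lines. ImageSetListSketch is hypothetical: it matches keys exactly and raises an error if the modes are mixed, both of which are assumptions for illustration rather than documented ImageSetList behavior.

```python
class ImageSetListSketch(object):
    """Hypothetical image set list that works either by index or by metadata key."""

    def __init__(self):
        self.mode = None      # fixed by the first lookup: "index" or "key"
        self.by_index = []    # image sets in index order
        self.by_key = {}      # image sets keyed by their sorted metadata items

    def _use_mode(self, mode):
        if self.mode is None:
            self.mode = mode
        elif self.mode != mode:
            raise ValueError("image set list is already in %r mode" % self.mode)

    def get_image_set(self, index_or_keys):
        """Create or fetch an image set by zero-based index or by a key/value dict."""
        if isinstance(index_or_keys, dict):
            self._use_mode("key")
            key = tuple(sorted(index_or_keys.items()))
            if key not in self.by_key:
                self.by_key[key] = {"keys": dict(index_or_keys), "images": {}}
            return self.by_key[key]
        self._use_mode("index")
        while len(self.by_index) <= index_or_keys:
            self.by_index.append({"keys": {}, "images": {}})
        return self.by_index[index_or_keys]


isl = ImageSetListSketch()
first = isl.get_image_set({"Plate": "P1", "Well": "A01"})
again = isl.get_image_set({"Well": "A01", "Plate": "P1"})   # same values, any order
print(first is again)   # True
print(isl.mode)         # key
```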
measurements
CellProfiler saves measurements made on images and objects in the Measurements structure which is accessible through the workspace. Measurements organizes measurements by object name, feature and image set. Image measurements are stored using the special object name, "Image" (cellprofiler.measurements.IMAGE is the symbolic name) and there are methods for saving and retrieving image measurements and object measurements. Each measurement is named by a measurement name; the measurement name should describe the measurement, for instance, "Intensity_MeanIntensity_GFP" is the feature name for the mean intensity measurement taken on the GFP channel.
Most modules only save measurements. You can use the add_image_measurement method to record an image-wide measurement made on the current image set. For instance:
import numpy as np
image_name = self.image_name.value
image = workspace.image_set.get_image(image_name)
standard_deviation = np.std(image.pixel_data)
measurement_name = "Statistics_StandardDeviation_" + image_name
workspace.measurements.add_image_measurement(measurement_name, standard_deviation)
You can use the add_measurement method to record object measurements for the current image set. You should store object measurements in a one-dimensional numpy array with one element per object. For instance:
import numpy as np
from cellprofiler.cpmath.cpmorphology import centers_of_labels
objects_name = self.objects_name.value
objects = workspace.object_set.get_objects(objects_name)
labels = objects.segmented
i_center, j_center = centers_of_labels(labels)
assert isinstance(i_center, np.ndarray)            # make sure that we're storing an array
assert tuple(i_center.shape) == (np.max(labels),)  # make sure it's 1-D with one measurement per label
measurement_name = "Location_Center_Y"
workspace.measurements.add_measurement(objects_name, measurement_name, i_center)
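To make the one-element-per-object shape concrete without CellProfiler installed, this numpy-only sketch computes per-object centers with weighted np.bincount. centers_sketch is a hypothetical stand-in, not the centers_of_labels implementation, and a plain dictionary stands in for the Measurements structure.

```python
import numpy as np


def centers_sketch(labels):
    """Mean row (i) and column (j) of each object; one entry per label 1..N."""
    count = labels.max()
    i, j = np.mgrid[0:labels.shape[0], 0:labels.shape[1]]
    flat = labels.ravel()
    areas = np.bincount(flat, minlength=count + 1)[1:]
    sum_i = np.bincount(flat, weights=i.ravel(), minlength=count + 1)[1:]
    sum_j = np.bincount(flat, weights=j.ravel(), minlength=count + 1)[1:]
    return sum_i / areas, sum_j / areas


labels = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 2],
])
i_center, j_center = centers_sketch(labels)
measurements = {}   # hypothetical stand-in for workspace.measurements
measurements[("Nuclei", "Location_Center_Y")] = i_center
print(i_center.tolist())   # [0.5, 2.0]
```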
Some modules use measurements made by prior modules. You can get the image and object measurements for the current image set using get_current_image_measurement and get_current_measurement. For example:
objects_name = self.objects_name.value
speckles_name = self.speckles_name.value
# Get the total # of speckles in the image (a single number)
total_speckles = workspace.measurements.get_current_image_measurement("Count_"+speckles_name)
scount = "Children_" + speckles_name + "_Count"
# Get the numpy array of speckle counts for each object
speckles_per_object = workspace.measurements.get_current_measurement(objects_name, scount)
fraction_of_speckles = speckles_per_object.astype(float) / float(total_speckles)
Hierarchy of feature names
CellProfiler has a standardized nomenclature for features; this helps out in the GUI and helps organize measurements during analysis. A measurement name has several parts, some of which may not be present. By convention, the parts are separated by the underbar ("_") character.
- Category: This is the general category of the measurement; it might indicate the module or general use of the measurement. Examples are "Metadata", "Intensity" and "Location".
- Feature: This names the algorithm or method that was used to make the measurement. Examples are "MeanIntensity" for the "Intensity" category, "Well" for the "Metadata" category and "Center_X" for the "Location" category.
- Image: Measurements are often made on the intensity information from a particular image. Examples are "GFP" for the "MeanIntensity" measurement (the mean intensity of the GFP channel) or "Actin" for the "Texture_Gabor_Actin_5" measurement.
- Object: Object measurements can be made relative to other objects. For instance, the IdentifySecondaryObjects module might save the object number of a secondary object's parent (say, its nucleus) in the "Parent_Nucleus" measurement. "Nucleus" is the object part of the measurement name.
- Scale: The "Scale" part of the measurement can be used to capture the scale in pixels that was used when making a measurement. It can also be used as a catch-all for parameters used when making the measurement in order to distinguish two measurements made using the same algorithm, image and/or object. For instance, texture measurements are made by correlating pixels that are a certain distance away from each other; "3" and "5" in "Texture_Gabor_Actin_3" and "Texture_Gabor_Actin_5" designate two measurements of the Gabor transform, made on the Actin image at scales of 3 and 5.
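Composing a name from these parts is a matter of joining the parts that are present with underbars. make_measurement_name is a hypothetical helper, and it assumes the parts appear in the order listed above. Note that splitting a name back apart on "_" is ambiguous, because a feature such as "Center_X" can itself contain underbars.

```python
def make_measurement_name(category, feature=None, image=None,
                          object_name=None, scale=None):
    """Join the parts that are present with "_" (hypothetical helper)."""
    parts = [category] + [part for part in (feature, image, object_name, scale)
                          if part is not None]
    return "_".join(str(part) for part in parts)


print(make_measurement_name("Intensity", "MeanIntensity", image="GFP"))
# Intensity_MeanIntensity_GFP
print(make_measurement_name("Texture", "Gabor", image="Actin", scale=5))
# Texture_Gabor_Actin_5
print(make_measurement_name("Parent", object_name="Nucleus"))
# Parent_Nucleus
```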
Modules define the categories, feature names, objects, images and scales using the methods, "get_categories", "get_measurements", "get_measurement_images", "get_measurement_objects" and "get_measurement_scales". Modules also define the measurements they output using "get_measurement_columns". A module must report each measurement it makes using "get_measurement_columns". It should report each measurement it makes using the other methods in order to let the user select those measurements in the user interface. The measuretexture.py module is a good example of a module that defines these methods.
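One way to keep these methods consistent is to derive the category and feature lists from the columns a module reports. MeasureAreaSketch, the column-type strings and the simplified method signatures (no pipeline argument) are assumptions for illustration; check measuretexture.py for the real signatures. Splitting only on the first "_" yields the category, and the remainder may still contain image or scale parts.

```python
IMAGE = "Image"              # stand-in for cellprofiler.measurements.IMAGE
COLTYPE_FLOAT = "float"      # hypothetical column type labels
COLTYPE_INTEGER = "integer"


class MeasureAreaSketch(object):
    """Hypothetical module that derives categories and features from its columns."""

    def __init__(self, objects_name):
        self.objects_name = objects_name

    def get_measurement_columns(self):
        # One (object name, measurement name, column type) tuple per measurement.
        return [
            (self.objects_name, "AreaShape_Area", COLTYPE_INTEGER),
            (self.objects_name, "AreaShape_Perimeter", COLTYPE_FLOAT),
            (IMAGE, "Count_" + self.objects_name, COLTYPE_INTEGER),
        ]

    def get_categories(self, object_name):
        return sorted(set(name.split("_", 1)[0]
                          for obj, name, _ in self.get_measurement_columns()
                          if obj == object_name))

    def get_measurements(self, object_name, category):
        prefix = category + "_"
        return sorted(name[len(prefix):]
                      for obj, name, _ in self.get_measurement_columns()
                      if obj == object_name and name.startswith(prefix))


module = MeasureAreaSketch("Nuclei")
print(module.get_categories("Nuclei"))                 # ['AreaShape']
print(module.get_measurements("Nuclei", "AreaShape"))  # ['Area', 'Perimeter']
```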
Error Handling
Coming soon.
Displaying Results
Coming soon.
Debugging Hints
Tests
CellProfiler comes with an extensive suite of unit tests. These tests can be run using Nose. You can install Nose using Python's easy_install:
sudo python -m easy_install nose
CellProfiler starts a Java VM in a separate thread, and this thread stays alive after the VM is started, so you need to shut it down after running unit tests. You need to install our killjavabridge Nose plugin in order to do this. Start a shell, change to the trunk/CellProfiler directory from your checked-out source and type:
python killjavabridge/setup.py install
to install the plugin. There are two ways to enable the plugin. You can define an environment variable, "NOSE_WITH_KILL_VM=1", or you can run your nosetests like this (from the CellProfiler directory):
nosetests --exe -m "(?:^)test_.*" --with-kill-vm
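For reference, Nose collects ordinary unittest.TestCase classes, and the -m pattern above selects methods whose names start with test_. The file name tests/test_fraction.py and the helper fraction_of_total are hypothetical; this only sketches the shape of a test the command would pick up.

```python
# tests/test_fraction.py (hypothetical file matching the "test_*" pattern)
import unittest


def fraction_of_total(counts, total):
    """Hypothetical helper under test: each count as a fraction of total."""
    if total == 0:
        return [0.0] * len(counts)
    return [c / float(total) for c in counts]


class TestFractionOfTotal(unittest.TestCase):
    def test_01_00_simple(self):
        self.assertEqual(fraction_of_total([1, 3], 4), [0.25, 0.75])

    def test_01_01_zero_total(self):
        # A zero total should not raise ZeroDivisionError.
        self.assertEqual(fraction_of_total([0, 0], 0), [0.0, 0.0])
```

Running nosetests with the options above from the CellProfiler directory would then report both test methods.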
We use Wing as our Python IDE for developing CellProfiler. You can set Wing up to test CellProfiler as follows:
- Make trunk/CellProfiler your project's directory (Project/Add Directory from the menu).
- Open your project's properties (Project/Properties from the menu)
- Select the Environment tab
- Ask for a custom Python path and add trunk/CellProfiler to that path
- Change "Use inherited environment" to "Add to inherited environment"
- Add "NOSE_WITH_KILL_VM=1" in the environment edit box
- Select the Testing tab
- Add "tests/test_*" to the test file patterns
- Select "Nose" as your default test framework
- Select the Environment tab
- Open the "Testing" tool window (Window/New Tool Window/Testing from the menu if you can't see the Testing tab). You should see the full list of unit tests. You can run and debug tests from this window.
Running CellProfiler 2.0
What are the command-line options for running CellProfiler?
For a full list of options, run CellProfiler like this:
python CellProfiler.py --help
Can I run CellProfiler 2.0 without the graphical user interface?
Yes. A pipeline can be run without the GUI ("headless") like this:
python CellProfiler.py -c -r -i ~/my_image_directory -o ~/my_output_directory -p ~/my_pipe.mat
Producing the Manual
The following procedure will create a PDF manual composed of the module and non-module help, each with its own table of contents:
- Make sure you've svn updated to the head version for the most recent help text.
- From the CellProfiler directory, run CellProfiler.py --html. This command will create a folder called CellProfiler_Manual_<cp_svn_version> containing the HTML files of svn version <cp_svn_version>. If you want a specific output directory, use CellProfiler.py --html -o <output_dir_path>.
- In Adobe Acrobat, go to File > Create PDF... > From Web Page... This step requires the professional version of Acrobat.
- Point the URL to the index.html file in the CellProfiler_Manual folder. Also, make sure that Get entire site is selected.
- In Settings..., uncheck Place headers and footers on new page.
- Since the images are embedded within the document, the PDF can be rather large. To reduce the file size, re-save the document in Adobe using Document > Reduce File Size and choose the latest version of Acrobat you want the PDF to be compatible with. Choosing Acrobat 4.0 and later will reduce the file to ~1/5 the original size.
