Details

Participants in NODE21 can form teams; a team may also consist of a single participant.

On the grand-challenge platform, algorithms can be created by submitting a Docker image, or a GitHub repository from which the image can be built. Submissions to NODE21 are accepted only in the form of a grand-challenge algorithm. Full instructions on how to do this, as well as template repositories, are provided. Once you have successfully created an algorithm, it can be submitted to the corresponding track on the Submit page. Participant algorithms will run on a machine with an NVIDIA T4 GPU. Your algorithm must be linked to a public repository with a version tag and an Apache 2.0 or MIT license file. (Should you prefer a different permissive open-source license, please leave a message in the Forum.)

We provide baseline repositories for both the detection and generation tracks to serve as example submissions to NODE21. These are located at https://github.com/node21challenge/node21_detection_baseline and https://github.com/node21challenge/node21_generation_baseline, respectively.

Submission Phases

In early 2022 we completed a two-phase public challenge with two different test sets; the results are currently being prepared for publication. As of April 2022, the NODE21 challenge has reopened for public submissions, and we invite individuals or teams to participate in the detection and generation tracks by running their algorithms against our experimental test set. In this way NODE21 continues to record the state of the art in this field.

Detection

The detection track aims to assess state-of-the-art systems for automatically detecting nodules in chest X-rays. An algorithm should read a chest X-ray and return a list of candidate bounding boxes for nodules, with a likelihood score for each box.

Details of how to structure your code around the train and test methods are explained in the template repository here. For successful evaluation and leaderboard placement, the container you provide must run in "test" mode as described in the template repository.

Evaluation of submissions to the detection track

The submitted algorithms will be evaluated on the experimental test set.

Various metrics will be calculated for the evaluation of detection algorithms. We will calculate the AUC score and, via FROC analysis, the sensitivity at predefined average numbers of false positives per image (1/8, 1/4, 1/2), as described below.

To calculate the AUC, the likelihoods (probabilities) of all nodules detected in an image are examined and the maximum is taken as the image score. If there is no nodule prediction for an image, the image score is set to 0.
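The image-score rule above can be sketched as follows; this is an illustrative helper, not the official evaluation code, and the prediction format (a list of dicts with a "likelihood" key) is an assumption.

```python
def image_score(predictions):
    """Image-level score for AUC: the maximum nodule likelihood,
    or 0.0 when the algorithm predicted no nodules for the image."""
    likelihoods = [p["likelihood"] for p in predictions]
    return max(likelihoods) if likelihoods else 0.0
```

For example, an image with predictions of likelihood 0.3 and 0.8 receives an image score of 0.8, while an image with no predictions receives 0.0.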

For FROC analysis we first handle the scenario where multiple predicted bounding boxes overlap a single reference bounding box: when more than one predicted box overlaps a reference box with IoU > 0.2, only the predicted box with the maximum likelihood (probability) among them is retained. Next, we count true and false positives. A predicted bounding box is considered a true positive if it overlaps a reference-standard bounding box with IoU > 0.2; otherwise, it is considered a false positive. We then take the sensitivity at three predefined average numbers of false positives per image: 1/8, 1/4, and 1/2 FPs per image. If the FROC analysis does not reach one of these predefined average false positive rates, so that the corresponding sensitivity cannot be read off, the highest sensitivity value from the FROC analysis is used instead. Submitted algorithms should implement their own post-processing to remove overlapping bounding box predictions (e.g. via non-maximum suppression); otherwise, the overlapping predictions may all be counted as false positives.
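To make the IoU threshold and the suggested non-maximum suppression concrete, here is a minimal sketch; the (x0, y0, x1, y1) box format and function names are assumptions for illustration, not the official evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.2):
    """Greedy NMS: keep the highest-scoring box in each overlapping
    cluster; return the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Applying this kind of suppression to your own predictions before submission avoids the situation where several near-duplicate boxes on one nodule are each counted as a false positive.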

The final metric used to rank the leaderboard will be calculated as follows:

S = Sensitivity at 0.25 FP per image
AUC = Area under the ROC
rank_metric = (0.75*AUC) + (0.25*S)

Generation

The generation track aims to assess whether state-of-the-art generation algorithms can improve the performance of detection systems. An algorithm should take a frontal chest radiograph and a location as input, and produce an image with a generated nodule at the requested location.

More details regarding the required structure of the generation algorithms and the submission process are explained in the template repository here.

Evaluation of submissions to the generation track

The submitted generation algorithm will be run on a set of 1000 chest X-ray images that are free of nodules. The generation algorithms will be evaluated as follows:

A baseline nodule detection system, Faster R-CNN, will be trained on the resulting 1000 chest X-ray images with simulated nodules. The resulting nodule detection system will then be evaluated as a detection system: it will be run on the secret test data, and the same evaluation metrics will be calculated as in the detection track (please see above for the evaluation of the detection track methods).

Since the evaluation will be run on the grand-challenge platform, there is a time limit for running the container: a generation algorithm must generate a nodule image within a maximum of 7 seconds per image. If the algorithm takes longer than this, it may not be possible to complete the evaluation in the allotted time.
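During development it can help to check that each generation call stays under the 7-second budget. The sketch below is a simple local timing guard; generate_nodule and its arguments are placeholders for your own implementation, not part of the challenge API.

```python
import time

TIME_LIMIT_S = 7.0  # per-image budget stated in the challenge rules

def timed_generate(generate_nodule, image, location):
    """Call a nodule generator and warn if it exceeds the per-image budget."""
    start = time.perf_counter()
    result = generate_nodule(image, location)
    elapsed = time.perf_counter() - start
    if elapsed > TIME_LIMIT_S:
        print(f"Warning: generation took {elapsed:.2f}s (> {TIME_LIMIT_S}s per image)")
    return result
```

Running this wrapper over your full development set gives an early signal of whether the submitted container is likely to finish within the platform's allotted time.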