We are hosting a multi-object tracking (MOT) challenge based on BDD100K, the largest open driving video dataset, as part of the CVPR 2020 Workshop on Autonomous Driving. This is a large-scale tracking challenge covering highly diverse driving conditions. Understanding the temporal association of objects within videos is a fundamental yet challenging task for autonomous driving. The BDD100K MOT dataset provides diverse driving scenarios with complicated occlusion and reappearance patterns, making it a strong testbed for the reliability of MOT algorithms in real scenes. We provide 2,000 fully annotated 40-second sequences covering different weather conditions, times of day, and scene types. We encourage participants from both academia and industry, and the winning teams will be awarded certificates. The evaluation server is hosted on CodaLab.
| Submission Deadline | 11:59 PM PST, June 12, 2020 |
The tasks are based on BDD100K, the largest driving video dataset to date supporting heterogeneous multi-task learning. It contains 100,000 videos representing more than 1,000 hours of driving experience with more than 100 million frames. The videos come with GPS/IMU data for trajectory information. The BDD100K dataset now provides annotations for 10 tasks: image tagging, lane detection, drivable area segmentation, object detection, semantic segmentation, instance segmentation, multi-object detection tracking, multi-object segmentation tracking, domain adaptation, and imitation learning. These diverse tasks make the study of heterogeneous multi-task learning possible.
For the CVPR 2020 Workshop on Autonomous Driving, we host the multi-object detection tracking challenge on CodaLab detailed below. Challenges on the other tasks will be announced on our dataset website.
To advance the study of multiple object tracking, we introduce the BDD100K MOT Dataset. We provide 1,400 video sequences for training, 200 video sequences for validation, and 400 video sequences for testing. Each video sequence is about 40 seconds long and annotated at 5 FPS, resulting in approximately 200 frames per video.
The BDD100K MOT Dataset is diverse not only in the visual scale of objects among and within tracks, but also in the temporal range of each track. Objects in the dataset also exhibit complicated occlusion and reappearance patterns: an object may be fully occluded or move out of the frame, and then reappear later. The BDD100K MOT Dataset thus presents the real challenges of object re-identification for tracking in autonomous driving. Details about the MOT dataset can be found in the BDD100K paper. Access the BDD100K data website to download the data.
```
bdd100k/
├── images/
|   ├── track/
|   |   ├── train/
|   |   |   ├── $VIDEO_NAME/
|   |   |   |   ├── $VIDEO_NAME-$FRAME_INDEX.jpg
|   |   ├── val/
|   |   ├── test/
├── labels-20/
|   ├── box-track/
|   |   ├── train/
|   |   |   ├── $VIDEO_NAME.json
|   |   ├── val/
```
The frames for each video are stored in a folder in the images directory. The labels for each video are stored in a json file with the format detailed below.
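Given the directory layout above, the frames of each split can be collected with a short helper. This is a minimal sketch assuming the `bdd100k/` tree has been extracted locally; the function name and local path are illustrative, not part of the official toolkit.

```python
import os

def list_video_frames(root, split="train"):
    """Map each video name to its sorted frame paths under images/track/<split>/.

    `root` is the path to the extracted bdd100k/ directory (local path assumed).
    """
    frames = {}
    split_dir = os.path.join(root, "images", "track", split)
    for video_name in sorted(os.listdir(split_dir)):
        video_dir = os.path.join(split_dir, video_name)
        # Frame files are named $VIDEO_NAME-$FRAME_INDEX.jpg, so a plain sort
        # orders them by frame index within the video.
        frames[video_name] = sorted(
            os.path.join(video_dir, f)
            for f in os.listdir(video_dir)
            if f.endswith(".jpg")
        )
    return frames
```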
Each json file contains a list of frame objects, and each frame object has the format below. The format follows the schema of BDD100K data format.
- name: string
- videoName: string
- index: int
- labels: [ ]
    - id: string
    - category: string
    - attributes:
        - Crowd: boolean
        - Occluded: boolean
        - Truncated: boolean
    - box2d:
        - x1: float
        - y1: float
        - x2: float
        - y2: float
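A label file in this format can be parsed with the standard `json` module. The sketch below groups boxes by track `id` across a video; the function name and the per-track tuple layout are illustrative choices, not part of the official format.

```python
import json

def load_track_labels(label_path):
    """Load one video's MOT labels (a list of frame objects, as described above)
    and group the boxes by track id."""
    with open(label_path) as f:
        frames = json.load(f)
    tracks = {}
    for frame in frames:
        for label in frame.get("labels", []):
            # Objects sharing an id across frames belong to the same track.
            tracks.setdefault(label["id"], []).append(
                (frame["index"], label["category"], label["box2d"])
            )
    return tracks
```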
There are 11 object categories in this release:
- pedestrian
- rider
- other person
- car
- bus
- truck
- train
- trailer
- other vehicle
- motorcycle
- bicycle
The submission file for each of the two phases is a JSON file compressed into a zip archive. Each JSON file is a list of frame objects with the format detailed below. The format also follows the schema of the BDD100K data format.
- name: string
- labels: [ ]
    - id: string
    - category: string
    - box2d:
        - x1: float
        - y1: float
        - x2: float
        - y2: float
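Packaging predictions in this format comes down to serializing the frame list and zipping it. A minimal sketch with the standard library; the function name and archive member name are illustrative, and the exact file naming expected by CodaLab should be checked on the challenge page.

```python
import json
import zipfile

def write_submission(frames, json_name, zip_path):
    """Write predictions (a list of frame objects in the format above)
    to a zip archive containing a single JSON file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(json_name, json.dumps(frames))
```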
Note that objects with the same identity share the same id across frames within a given video, and ids should be unique across different videos. Our evaluation matches the category string, so you can assign your own integer IDs to the categories in your model. However, we recommend encoding the 8 relevant categories in the following order so that it is easier for the research community to share models.
- pedestrian
- rider
- car
- truck
- bus
- train
- motorcycle
- bicycle
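The recommended ordering can be captured as a simple name-to-ID mapping. The source only specifies the order; whether IDs start at 0 or 1 is an assumption in this sketch (1-based here, as 0 is often reserved for background).

```python
# Recommended encoding of the 8 evaluated categories, in the order listed above.
# The 1-based indexing is an illustrative assumption, not mandated by the challenge.
CATEGORY_IDS = {
    name: i + 1
    for i, name in enumerate(
        ["pedestrian", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"]
    )
}
```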
The evaluation server will perform evaluation for each category and aggregate the results to compute the overall metrics. The server will then merge both the ground-truth and predicted labels into super-categories and evaluate each super-category.
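Merging labels into super-categories amounts to relabeling each box before evaluation. The human/vehicle/bike grouping below is an assumed illustration; the authoritative super-category definitions are those used by the evaluation server.

```python
# Assumed grouping for illustration only; consult the challenge documentation
# for the authoritative super-category definitions.
SUPER_CATEGORY = {
    "pedestrian": "human", "rider": "human",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle", "train": "vehicle",
    "motorcycle": "bike", "bicycle": "bike",
}

def to_super_category(frames):
    """Return a copy of frame objects with each label's category replaced
    by its super-category (unmapped categories are left unchanged)."""
    merged = []
    for frame in frames:
        labels = [
            {**label, "category": SUPER_CATEGORY.get(label["category"], label["category"])}
            for label in frame.get("labels", [])
        ]
        merged.append({**frame, "labels": labels})
    return merged
```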
We employ Multiple Object Tracking Accuracy (MOTA) as our primary evaluation metric for ranking. All metrics are detailed below. See this paper for more details.
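For reference, MOTA in the CLEAR MOT framework combines false negatives, false positives, and identity switches against the total number of ground-truth boxes. A minimal sketch of the standard formula; the aggregation details used by the evaluation server (e.g. per-category averaging) may differ.

```python
def mota(num_misses, num_false_positives, num_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR MOT):
    MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.
    Note that MOTA can be negative when errors outnumber ground-truth boxes."""
    if num_gt == 0:
        raise ValueError("MOTA is undefined with no ground-truth objects")
    return 1.0 - (num_misses + num_false_positives + num_switches) / num_gt
```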