Methodology

The dataset consists of all cases arising out of the anti-extradition bill (anti-ELAB) protests in Hong Kong. Specifically, it covers the period from 9 June 2019 through mid-2024, during which these anti-ELAB-related cases were heard and decided in court. Only concluded cases are included. We will update the dataset regularly as more cases are concluded.

The entries in this dataset were originally compiled from credible sources, including journalists and law students, who usually obtained the information by attending court hearings in person. Our in-house researchers then take a random sample of the dataset and triangulate the information through cross-verification: corroborating it with reports from news outlets on the ground and with independent reports [1], and consulting subject-matter experts, to ensure accuracy, reliability, and validity. Specifically, our sampled data is verified by at least two independent sources. Wherever available, verdicts and judgments are extracted directly from the Hong Kong Judiciary website at https://www.judiciary.hk/en/judgments_legal_reference/Jud_Ruling.html.

Each entry (i.e. each row) in the dataset represents one count of a charge, defined as a defendant being charged with a particular offense in a given case. Any single case can involve multiple defendants, and each defendant can be charged with multiple offenses and end up with different judicial outcomes.
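To illustrate the one-row-per-count structure, the following is a minimal sketch in Python. The column names (entry_id, case_number, defendant, offense, outcome) and all values are hypothetical placeholders chosen for illustration; they are not the dataset's actual schema or contents.

```python
from collections import defaultdict

# Hypothetical placeholder rows: one row per count of a charge.
# Column names and values are assumptions, not the real schema or data.
rows = [
    {"entry_id": 1, "case_number": "CASE-A", "defendant": "D1", "offense": "Offense X", "outcome": "Outcome 1"},
    {"entry_id": 2, "case_number": "CASE-A", "defendant": "D1", "offense": "Offense Y", "outcome": "Outcome 2"},
    {"entry_id": 3, "case_number": "CASE-A", "defendant": "D2", "offense": "Offense X", "outcome": "Outcome 1"},
    {"entry_id": 4, "case_number": "CASE-B", "defendant": "D3", "offense": "Offense Z", "outcome": "Outcome 1"},
]

def summarise(rows):
    """Count charge counts (rows), distinct cases, and distinct defendants per case."""
    defendants_per_case = defaultdict(set)
    for row in rows:
        defendants_per_case[row["case_number"]].add(row["defendant"])
    return {
        "charge_counts": len(rows),
        "cases": len(defendants_per_case),
        "defendants": sum(len(d) for d in defendants_per_case.values()),
    }

summary = summarise(rows)
print(summary)
```

In this sketch, the number of rows is the number of charge counts, not the number of cases or defendants, so any per-case or per-defendant figure requires grouping first. Grouping by case number as shown would only work for entries whose case number is available.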

Because this dataset relies on non-governmental sources rather than official data (which are largely not publicly available), it is inevitably incomplete and has some occasional, limited missing data. In particular, some case numbers are unavailable. To address this, and to make referencing easier for researchers, we have assigned a unique identifier to each entry, so that every entry remains distinguishable even when its case number is missing. By contrast, each of the variables deemed most important, such as the names of judges, the crimes charged, the judicial outcomes, and the date and time at which the incident took place, has less than 1% missing data. Together with the rigorous data verification process described above, this leaves us reasonably satisfied with the quality of the data presented here.
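Researchers who wish to check the missing-data rates themselves could do so along the following lines. This is only a sketch: the column names (entry_id, case_number, judge) and the placeholder rows are assumptions for illustration, not the dataset's actual schema or values.

```python
def missing_share(rows, column):
    """Return the fraction of rows whose value in `column` is None or blank."""
    if not rows:
        return 0.0
    missing = sum(
        1 for row in rows
        if row.get(column) is None or str(row.get(column)).strip() == ""
    )
    return missing / len(rows)

def ids_are_unique(rows, id_column="entry_id"):
    """Check that every entry carries a distinct identifier."""
    ids = [row[id_column] for row in rows]
    return len(ids) == len(set(ids))

# Hypothetical placeholder rows; one entry lacks a case number.
sample = [
    {"entry_id": 1, "case_number": "CASE-A", "judge": "Judge P"},
    {"entry_id": 2, "case_number": "", "judge": "Judge Q"},
    {"entry_id": 3, "case_number": "CASE-B", "judge": "Judge P"},
    {"entry_id": 4, "case_number": "CASE-C", "judge": "Judge R"},
]

case_number_missing = missing_share(sample, "case_number")
judge_missing = missing_share(sample, "judge")
unique_ids = ids_are_unique(sample)
print(case_number_missing, judge_missing, unique_ids)
```

Because the identifier is present on every entry, it can serve as the reference key even for rows where the case number is blank.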

Interested parties are encouraged to contact our team at info.dap@protonmail.com should they wish to know more, for research or journalistic purposes. This dataset is part of an open-source effort, with most data compiled by civil society rather than drawn from official statistics. We actively welcome contributions from the public to enhance its breadth and accuracy.

[1] For instance, see https://www.law.georgetown.edu/law-asia/wp-content/uploads/sites/31/2023/10/GCAL-HK-2019-ARREST-DATA-REPORT-FINAL-OCT-2023.pdf.