Openfda: Question: download zip files

0

For the zip files packaged on 2016/06/01, I downloaded the file named "2004q1/drug-event-0001-of-0002.json.zip" and found that the first three records in it have unexpected receivedate values:
"receivedate": "20100729",
"receivedate": "20101129",
"receivedate": "20110614".

Are records assigned to those zip files randomly, or according to some rule?

yunstat  ·  6 Jul 2016

All comments

2

The drug event downloads are partitioned on an internal key called @timestamp, which is simply mapped to the drug event key receiptdate. The records in that file should all have a receiptdate between 20040101 and 20040401.
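
As a rough illustration of the partitioning described above, the sketch below checks whether a receiptdate falls inside a given quarter. The function names and the half-open window (start inclusive, end exclusive) are assumptions made for illustration; the actual pipeline logic is not shown in this thread.

```python
from datetime import date, datetime
from typing import Tuple


def quarter_window(year: int, quarter: int) -> Tuple[date, date]:
    """Return (start, end) of a calendar quarter; end is exclusive (assumption)."""
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    if quarter == 4:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, start_month + 3, 1)
    return start, end


def in_partition(receiptdate: str, year: int, quarter: int) -> bool:
    """Check whether a CCYYMMDD receiptdate string falls in the quarter."""
    d = datetime.strptime(receiptdate, "%Y%m%d").date()
    start, end = quarter_window(year, quarter)
    return start <= d < end


print(quarter_window(2004, 1))
print(in_partition("20040215", 2004, 1))
```

Running a check like this over receiptdate (rather than receivedate) for the records in a downloaded file is one way to test whether the file matches the stated range.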

HansNelsen  ·  6 Jul 2016
0

Follow-up questions:

Do you keep or remove earlier reports with the same safetyid?
Or do you combine all reports with the same safetyid?

Thanks

yunstat  ·  13 Jul 2016
2

The reports are processed in order from earliest to latest, and only the latest one is kept.
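
A minimal sketch of that "keep only the latest" behaviour, assuming each report is a dict and that "latest" is decided by receiptdate. The ordering key and the function name are assumptions; the thread does not say which field the pipeline actually sorts on.

```python
def keep_latest(reports):
    """Return one report per safetyreportid, keeping the latest version.

    Assumption: versions are ordered by their CCYYMMDD receiptdate string,
    which sorts correctly as plain text.
    """
    latest = {}
    for report in sorted(reports, key=lambda r: r["receiptdate"]):
        # Later versions overwrite earlier ones with the same id.
        latest[report["safetyreportid"]] = report
    return list(latest.values())


# Hypothetical records, for illustration only.
versions = [
    {"safetyreportid": "A1", "receiptdate": "20040301", "note": "follow-up"},
    {"safetyreportid": "A1", "receiptdate": "20040110", "note": "initial"},
    {"safetyreportid": "B2", "receiptdate": "20040205", "note": "initial"},
]
kept = keep_latest(versions)
print(kept)
```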

HansNelsen  ·  13 Jul 2016
0

https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20140131]+AND+safetyreportid:4261828
has 1 match with
"receivedate": "20040102",
"receiptdate": "20031222"
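
For reference, the query above can be assembled programmatically. This is only a sketch: the helper names are made up, no network request is made, and the sample payload is hand-written from the two values quoted above, assuming the response lists matches under a "results" key.

```python
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query(start, end, report_id):
    """Build the search URL in the same form as the one quoted above."""
    search = f"receivedate:[{start}+TO+{end}]+AND+safetyreportid:{report_id}"
    return f"{BASE_URL}?search={search}"


def extract_dates(payload):
    """Pull (receivedate, receiptdate) pairs out of a parsed JSON response."""
    return [
        (r.get("receivedate"), r.get("receiptdate"))
        for r in payload.get("results", [])
    ]


# Hand-written stand-in for the parsed response (illustration only).
sample_payload = {
    "results": [
        {
            "safetyreportid": "4261828",
            "receivedate": "20040102",
            "receiptdate": "20031222",
        }
    ]
}

url = build_query("20040101", "20140131", "4261828")
dates = extract_dates(sample_payload)
print(url)
print(dates)
```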

In the original ASCII 2004Q1 file, this ID has only one record, with
FDA_DT=20140103
MFR_DT=20031222

Does receiptdate come from MFR_DT, or from the newer FDA_DT?

My original guess was that this is caused by a change in the data structure:
In the 2004Q1 file, the 7th column is MFR_DT and the 8th column is FDA_DT.
In the 2014Q1 file, the 7th column is init_fda_dt and the 8th column is fda_dt.
However, init_fda_dt is not the date of the newest report.

yunstat  ·  22 Jul 2016
2

I do not know the answer to this one. The answer might be in the PDF files that come with the downloads; they include a sort of data dictionary that may help answer this question. You could also ask the openFDA team, since they are in regular contact with the FDA and can forward your question to the internal group responsible for the drug event data.

Good luck.

HansNelsen  ·  22 Jul 2016
2

@yunstat, the pipeline pulls drug adverse events from the FAERS XML/SGML files, not the ASCII ones (the latter are only used for report ID to case number conversion in some cases). So in the example you provided, the dates come from AERS_SGML_2004q1.zip/sgml/ADR04M01.SGM:

   <receivedateformat>102</receivedateformat>
   <receivedate>20040102</receivedate>
   <receiptdateformat>102</receiptdateformat>
   <receiptdate>20031222</receiptdate>

So the drug event data should not be affected by the change in the ASCII structure you described.
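
To make the mapping concrete, here is a small sketch that reads the two dates from the fragment quoted above. It assumes the fragment is well-formed enough for an XML parser once wrapped in a root element; the wrapper element name and the function name are hypothetical, and real SGML files may need more lenient parsing.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the comment above.
FRAGMENT = """
<receivedateformat>102</receivedateformat>
<receivedate>20040102</receivedate>
<receiptdateformat>102</receiptdateformat>
<receiptdate>20031222</receiptdate>
"""


def read_fields(fragment):
    """Wrap the fragment in a dummy root and return {tag: text}."""
    root = ET.fromstring(f"<report>{fragment}</report>")
    return {child.tag: child.text for child in root}


fields = read_fields(FRAGMENT)
print(fields["receivedate"], fields["receiptdate"])
```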

dkrylovsb  ·  25 Jul 2016
0

Thank you. After seeing the earlier response, I checked the NTS files to find out why receiptdate can be earlier than receivedate.

The FAERS system started in 2012Q4. Before that, the system was called AERS, or LAERS.
From 2012Q4 to 2016Q1, the NTS file describes

  • A.1.6b receivedate = Date report was first received by FDA (Initial FDA Received Date)
  • A.1.7b receiptdate = Date of most recent report received by FDA

From 2004Q1 to 2012Q3, the NTS file describes

  • A.1.6b receivedate = FDA receive date
  • A.1.7b receiptdate = Manufacturer's date of receipt of initial information. For non-mfr reports, receiptdate is repeated.

Therefore, receivedate >= receiptdate through 2012Q3, and receivedate <= receiptdate from 2012Q4 onward. This answers my question about why receiptdate is earlier than receivedate in the 2004 data.
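
The rule above can be written as a small check. This is a sketch of my reading of the NTS definitions, not anything taken from the openFDA code; the function name is made up, and both dates are assumed to be CCYYMMDD strings, which compare correctly as plain text.

```python
def dates_consistent(receivedate, receiptdate, year, quarter):
    """Check the expected ordering of the two dates for a given quarterly file.

    Through 2012Q3 (AERS): receivedate >= receiptdate.
    From 2012Q4 (FAERS):   receivedate <= receiptdate.
    """
    if (year, quarter) >= (2012, 4):
        return receivedate <= receiptdate
    return receivedate >= receiptdate


# The 2004Q1 record discussed earlier in this thread.
print(dates_consistent("20040102", "20031222", 2004, 1))
```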

I have been thinking about the effect on aggregated counts. It amounts to a shift of the time frame. From what I have observed, the effect is small.

Since openFDA already uses the ASCII files, there are date fields in them, such as FDA_DT, that might be useful for anchoring the reports.

yunstat  ·  25 Jul 2016
0

The format of the FAERS XML files follows the DTD in the ICH E2b/M2 version 2.1 standard.
[http://estri.ich.org/e2br22/index.htm]
In the document "Electronic Transmission of Individual Case Safety Reports Message Specification
Document Version 2.3 November 9, 2000", the definitions are:

  • A.1.6b receivedate = Date report was first received from source
  • A.1.7b receiptdate = Date of receipt of the most recent information for this report

By these definitions, I would expect receiptdate >= receivedate.
However, on page 22 the same document provides this example:

<receivedateformat>102</receivedateformat>
<receivedate>19980102</receivedate>
<receiptdateformat>102</receiptdateformat>
<receiptdate>19970103</receiptdate>

In this example, receiptdate is earlier than receivedate.
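
A quick way to see how far apart the two dates in that example are, as a sketch. It assumes format code 102 means CCYYMMDD, which matches the eight-digit values shown; the function name is made up and other format codes are deliberately not handled.

```python
from datetime import datetime


def parse_e2b_date(value, fmt_code):
    """Parse a date whose format code is 102 (assumed to mean CCYYMMDD)."""
    if fmt_code != "102":
        raise ValueError(f"unsupported date format code: {fmt_code}")
    return datetime.strptime(value, "%Y%m%d").date()


receive = parse_e2b_date("19980102", "102")
receipt = parse_e2b_date("19970103", "102")

# A positive gap means receiptdate is earlier than receivedate.
gap_days = (receive - receipt).days
print(gap_days)
```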

The document "MAINTENANCE OF THE ICH GUIDELINE ON CLINICAL SAFETY DATA MANAGEMENT: DATA ELEMENTS FOR TRANSMISSION OF INDIVIDUAL CASE SAFETY REPORTS E2B(R2)" provides some more details:

A.1.6 Date report was first received from source
User Guidance:
For senders dealing with initial information, this should always be the date the information was received from the primary source. When retransmitting information received from another regulatory agency or another company or any other secondary source, A.1.6 is the date the retransmitter first received the information. A full precision date should be used (i.e., day, month, year).

A.1.7 Date of receipt of the most recent information for this report
User Guidance:
Because reports are sent at different times to multiple receivers, the initial/follow up status is dependent upon the receiver. For this reason an item to capture follow-up status is not included. However, the date of receipt of the most recent information taken together with the “sender identifier” (A.3.1.2) and “sender’s (case) report unique identifier” (A.1.0.1) provide a mechanism for each receiver to identify whether the report being transmitted is an initial or follow-up report. For this reason these items are considered critical for each transmission. A full precision date should be used (i.e., day, month, year).
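
One way a receiver might apply the mechanism described in that guidance is sketched below. The exact decision logic is my assumption, since the guideline only says that these three items together make the distinction possible; the function name and the sample identifiers are hypothetical.

```python
def classify_report(seen, sender_id, case_id, receiptdate):
    """Classify a transmission as initial or follow-up for this receiver.

    `seen` maps (sender identifier, sender's case report id) to the most
    recent receiptdate (CCYYMMDD string) this receiver already holds.
    """
    key = (sender_id, case_id)
    previous = seen.get(key)
    if previous is None:
        seen[key] = receiptdate
        return "initial"
    if receiptdate > previous:
        seen[key] = receiptdate
        return "follow-up"
    # Same or older information than what was already received.
    return "not newer"


seen = {}
first = classify_report(seen, "SENDER-X", "CASE-001", "20040110")
second = classify_report(seen, "SENDER-X", "CASE-001", "20040301")
third = classify_report(seen, "SENDER-X", "CASE-001", "20040301")
print(first, second, third)
```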

The AERS system started well before 2001. The change in the definition of the XML keys may have its own historical reasons. Even now, I am still not sure that receivedate in FAERS can be defined as "the date the retransmitter first received the information". That date may not always be available: for example, the retransmitter may not have it on record, or it may be missing for many reports. In many situations, the receivedate as currently used is still a reasonably good and quick basis for monitoring trends.

yunstat  ·  25 Jul 2016
2
cerdman  ·  25 Jul 2016
2

@cerdman I just downloaded 2016q1 XML's zip file, and in both PDF and XML files it is stated that version 2.1 is used, not v3.

PDF:

is compliant with the DTD DCL files that are published as part of the ICH
E2b/M2 version 2.1 standard

XML:

2.1

pozz82  ·  27 Jul 2016