In case you missed the announcement yesterday, the Railroad Commission of Texas (RRC) has released data sets on its website. Below is the bulk of the email:
RRC Data Sets Now Available for Free
The Railroad Commission of Texas (RRC) has posted certain data sets available for download at no cost on the agency’s website. This will ensure 24-hour, 7 day a week access to important RRC data.
Many of these data sets previously required a subscription. They will remain in the same .FTP file locations. The data is directly accessible using links found on the RRC website without need of a username or password.
The new webpage containing links to the data sets will include information on frequency of updates, links to the manuals and links to useful information regarding each data set available.
To access the data sets, visit the RRC website at https://rrc.texas.gov/about-us/resource-center/research/data-sets-available-for-download/
Finally! A Robust Public/Free Data Set to Play With!
When it comes to collaboration, experimentation, research and training, I have always felt that, compared to other industries such as finance and healthcare, there weren't many public data sets available to Data Engineers and Data Scientists in the Oil and Gas industry. Could this be the data set we've been waiting for? I'd like to find out.
I started downloading files from the different data sets and came across several data formats:
- Fixed-Width text files
- Delimited text files (the right curly bracket, }, is the common RRC delimiter)
- dBASE .dbf files
- Shapefiles
- Mainframe EBCDIC (Extended Binary Coded Decimal Interchange Code) files
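Fixed-width files carry no delimiters, so each field has to be sliced out by character position using the layout published in the data set's manual. Below is a minimal Python sketch, assuming the layout is written as (name, zero-based start, length) tuples. EXAMPLE_LAYOUT and the sample record are hypothetical and are not an actual RRC record layout.

```python
from typing import Dict, List, Tuple

# (field name, zero-based start offset, length). Each tuple describes one field.
Layout = List[Tuple[str, int, int]]

# HYPOTHETICAL layout for illustration only; take the real field positions
# for each RRC file from that data set's manual.
EXAMPLE_LAYOUT: Layout = [
    ("record_id", 0, 2),
    ("operator_number", 2, 6),
    ("operator_name", 8, 20),
]


def parse_fixed_width(line: str, layout: Layout) -> Dict[str, str]:
    """Slice one record into named fields and strip the space padding."""
    return {name: line[start:start + length].strip() for name, start, length in layout}


if __name__ == "__main__":
    # Build a padded sample record that matches the hypothetical layout.
    record = "01" + "123456" + "ACME OIL CO".ljust(20)
    print(parse_fixed_width(record, EXAMPLE_LAYOUT))
```

Keeping the layout as plain data means one parser can serve every fixed-width file. Only the layout table changes from manual to manual.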
I started by loading the data that doesn't require much work, such as the }-delimited text files, into an Azure Data Lake, and connected it to Power BI to examine the contents. Here are a few screenshots of my first look at the data loaded from the Operators, Leases, Operator Cyclic Data and Well Query Dump data sets (for now, I only loaded data from 1993 to the current date):
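The }-delimited files can be read with Python's standard csv module, because the delimiter is a single character. Below is a minimal sketch that applies a "1993 onward" cut while reading, which could serve as a local pre-processing step before upload. It is not the Azure Data Factory approach used above. It assumes the files have no header row and no CSV-style quoting. The column names, the YYYYMM date column and the sample rows are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# HYPOTHETICAL column names and order, for illustration only; the real layout
# of each RRC data set is documented in that data set's manual.
FIELDNAMES = ["lease_number", "district", "cycle_year_month", "oil_bbl"]


def read_rrc_delimited(stream: Iterable[str], fieldnames: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a '}'-delimited text file."""
    # Assumption: no header row and no CSV-style quoting, so quote
    # characters are kept as ordinary data.
    reader = csv.DictReader(stream, fieldnames=fieldnames, delimiter="}", quoting=csv.QUOTE_NONE)
    yield from reader


def from_year(rows: Iterable[Dict[str, str]], column: str, first_year: int) -> Iterator[Dict[str, str]]:
    """Keep rows whose `column` starts with a 4-digit year >= first_year."""
    for row in rows:
        # Short rows get None for missing columns; treat that as "no year".
        year = (row.get(column) or "")[:4]
        if year.isdigit() and int(year) >= first_year:
            yield row


if __name__ == "__main__":
    # With a real file, use open(path, newline="", encoding=...) instead.
    sample = io.StringIO("0001}01}199212}150\n0002}08}199301}275\n")
    for row in from_year(read_rrc_delimited(sample, FIELDNAMES), "cycle_year_month", 1993):
        print(row)
```

Both functions are generators, so large files are streamed row by row instead of being loaded into memory all at once.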
Next steps:
- Convert and upload at least one file in each of the different data formats, and document the process for sharing
- Publish Azure Data Factory pipeline templates to make the process reusable and encourage others to collaborate
- Create a Modern Data Warehouse to use as a demo space and for training AI/ML models
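For the mainframe EBCDIC files, the "convert" step mostly comes down to decoding bytes from an EBCDIC code page into ordinary text. Below is a minimal Python sketch under several assumptions that need to be checked against the RRC manual for each file. It assumes fixed-length records with no line terminators, code page 037 (EBCDIC US/Canada, which Python ships as the "cp037" codec), and character-only fields. RECORD_LENGTH and the sample records are hypothetical.

```python
import io
from typing import BinaryIO, Iterator

# Assumptions to verify against the RRC manual for each EBCDIC file:
#  * records are fixed-length with no line terminators,
#  * text uses code page 037 (EBCDIC US/Canada), Python codec "cp037",
#  * every field is character data; packed-decimal (COMP-3) or binary
#    fields would be garbled by a plain text decode and need separate handling.
RECORD_LENGTH = 20  # HYPOTHETICAL; use the record length from the manual


def ebcdic_records(stream: BinaryIO, record_length: int, codec: str = "cp037") -> Iterator[str]:
    """Read fixed-length EBCDIC records from a binary stream and decode each one."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            return  # clean end of file
        if len(chunk) < record_length:
            raise ValueError(f"truncated final record ({len(chunk)} bytes)")
        yield chunk.decode(codec)


def convert_file(src_path: str, dst_path: str, record_length: int = RECORD_LENGTH, codec: str = "cp037") -> int:
    """Write each decoded record, right-trimmed, as one UTF-8 text line; return the count."""
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for record in ebcdic_records(src, record_length, codec):
            dst.write(record.rstrip() + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Round-trip demo: encode two padded records to EBCDIC, then decode them.
    raw = "ACME OIL CO".ljust(20).encode("cp037") + "WELL 42".ljust(20).encode("cp037")
    for rec in ebcdic_records(io.BytesIO(raw), RECORD_LENGTH):
        print(repr(rec.rstrip()))
```

Once the records are plain UTF-8 lines, they can be split with the same fixed-width approach used for the other text files before being uploaded.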
Keep an eye on this blog for more updates.