I was asked about my data here and here. This is the data I collect:

My bot records data for each opening trade placed, each attempted opening trade that went unmatched, and each time the market meets approximately 90% of my criteria. The trigger includes a stop-loss and a tick offset, which are fired in response to the market, so I don’t have any data for the point at which they are placed.
I also record market suspensions to help with analysis. I don’t record any other data about the market. If no trades are made in a market, I have no data.
The columns are as follows:

A = Market
B = Market ID
C = Time at last refresh
D = Seconds to off
E = Selection name

F through Q = Best 3 back/lay prices (F, H, J, L, N, P) and volumes (G, I, K, M, O, Q)

R = Last price matched
S = Total matched for this selection
T = Trigger
U = My requested price
V = My stake
W = Total matched for this market
X = Timer output (the time my code takes to run, measured with Excel’s timer)
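
For anyone who wants to read rows in this layout outside Excel, here is a minimal Python sketch. It is not the bot's code. It assumes each row has been exported as 24 text cells in the column order A to X above, and that the first three price/volume pairs (F to K) are back and the last three (L to Q) are lay; the list above does not say which side comes first, so check that against your own sheet. The names `Snapshot`, `parse_row` and `to_float` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

PriceVol = Tuple[Optional[float], Optional[float]]  # (price, volume)


def to_float(cell: str) -> Optional[float]:
    """Convert a text cell to float, treating an empty cell as missing."""
    cell = cell.strip()
    return float(cell) if cell else None


@dataclass
class Snapshot:
    market: str                           # A
    market_id: str                        # B
    refresh_time: str                     # C, kept as text
    seconds_to_off: Optional[float]       # D
    selection: str                        # E
    back: List[PriceVol]                  # F-K (assumed to be the back side)
    lay: List[PriceVol]                   # L-Q (assumed to be the lay side)
    last_price_matched: Optional[float]   # R
    selection_matched: Optional[float]    # S
    trigger: str                          # T
    requested_price: Optional[float]      # U
    stake: Optional[float]                # V
    market_matched: Optional[float]       # W
    timer_output: Optional[float]         # X


def parse_row(row: Sequence[str]) -> Snapshot:
    """Map one 24-cell row (columns A to X) onto a Snapshot."""
    if len(row) < 24:
        raise ValueError(f"expected 24 columns (A-X), got {len(row)}")
    # Columns F-Q are indexes 5-16: six (price, volume) pairs.
    ladder = [(to_float(row[i]), to_float(row[i + 1])) for i in range(5, 17, 2)]
    return Snapshot(
        market=row[0],
        market_id=row[1],
        refresh_time=row[2],
        seconds_to_off=to_float(row[3]),
        selection=row[4],
        back=ladder[:3],
        lay=ladder[3:],
        last_price_matched=to_float(row[17]),
        selection_matched=to_float(row[18]),
        trigger=row[19],
        requested_price=to_float(row[20]),
        stake=to_float(row[21]),
        market_matched=to_float(row[22]),
        timer_output=to_float(row[23]),
    )
```

Rows read with Python's `csv.reader` can be passed straight to `parse_row`. Empty cells come back as `None` rather than raising an error.
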
The problem with data is how much of it is available. I’ve tried recording whole markets, but you soon end up with megabytes of data that mean nothing without refining. You first need an idea of what you want to find, and then you can go about collecting the relevant data. For instance, I only record the data for my selection, not the other runners. You may only need data for the last 3 minutes before the off, or only in-play, depending on how you want to trade. You could record prices in the morning and then at the off to look for trends. Without knowing what you want to target, data on its own is not much use. Because my data is so specific to my bot, it won’t be any use to others without knowing the algorithm that picked it out.
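
As an illustration of narrowing the data down before analysing it, the sketch below keeps only the rows for one selection inside the last 3 minutes before the off. It is written in Python and is only an example: the dictionary keys `selection` and `seconds_to_off` are made-up names for columns E and D, and it assumes seconds to off counts down to zero at the off and goes negative in-play, which may not match your own sheet.

```python
from typing import Dict, Iterable, Iterator


def last_window_before_off(
    rows: Iterable[Dict[str, str]],
    selection: str,
    window_seconds: float = 180.0,
) -> Iterator[Dict[str, str]]:
    """Yield rows for one selection within the final window before the off."""
    for row in rows:
        if row["selection"] != selection:
            continue  # ignore the other runners
        seconds_to_off = float(row["seconds_to_off"])
        # Assumption: positive before the off, negative once in-play.
        if 0.0 <= seconds_to_off <= window_seconds:
            yield row
```

Rows from `csv.DictReader` fit this shape if the header row uses the same names. Changing the test to `seconds_to_off < 0` would keep only in-play rows instead, under the same sign assumption.
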
EDIT: Below is a file of data recorded from one race. As Gruss passes the market to Excel, I recorded each refresh to a sheet for each race. I only did it for a few days because of the size of the files. It’s about 280KB and was taken from a day file that was 150MB in size.