preaction

Validating Financial Data with PDL

Wed, 13 May 2015 00:00:00 +0000

My job involves financial market data. A lot of financial market data. I take the market data from various sources and store it in a database for later analysis.

Being a programmer/analyst and not a mathematician with a Ph.D. in finance, my use for time series analytics falls into the "ensure correct data is being collected" category. But even then, some basic statistical analysis helps me preserve quality historical data for later use.

PDL is perfect for doing these kinds of calculations very quickly. With PDL::Finance::TA, which brings the TA-Lib technical analysis library to PDL, all the hard work is already done, and all I need to do is wire it up.

Let's take a large set of random numbers. If our random number generator were perfect, we would expect the set to be evenly distributed, because each possible value is exactly as likely as any other value. The standard deviation (stddev) measures how dispersed a data set is. For normally distributed (bell-curve) data, 99.7% of the points lie within 3 standard deviations of the mean (average); evenly distributed data is even tighter, with every point within about 1.73 (the square root of 3) standard deviations of the mean.
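A quick simulation in core Perl (no PDL needed) shows how much the shape of the distribution matters when counting points within 3 stddev. This is a sketch: the Box-Muller transform stands in for a proper normal random number generator, and the sample size and seed are arbitrary choices.

```perl
use strict;
use warnings;
use List::Util qw(sum);
use constant PI => 4 * atan2( 1, 1 );

# Fraction of @data lying within $k (population) standard deviations of the mean
sub fraction_within {
    my ( $k, @data ) = @_;
    my $mean   = sum(@data) / @data;
    my $stddev = sqrt( sum( map { ( $_ - $mean )**2 } @data ) / @data );
    my $inside = grep { abs( $_ - $mean ) <= $k * $stddev } @data;
    return $inside / @data;
}

# One standard normal sample via the Box-Muller transform
sub normal_sample {
    my $u1 = 1 - rand();    # in (0, 1], so log() never sees zero
    my $u2 = rand();
    return sqrt( -2 * log($u1) ) * cos( 2 * PI * $u2 );
}

srand(42);
my @uniform = map { rand(50) } 1 .. 100_000;
my @normal  = map { normal_sample() } 1 .. 100_000;

printf "uniform within 3 stddev: %.4f\n", fraction_within( 3, @uniform );
printf "normal  within 3 stddev: %.4f\n", fraction_within( 3, @normal );
```

The uniform set never has a point outside 3 stddev, while the normal set loses roughly 0.3% of its points, which is the behavior the 3 and 4 stddev tests below rely on.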

So, if we write a test that checks whether a new (completely random, normally distributed) point is within 3 stddev, there is about a 0.3% chance that the new point will fail our test. If we bump that to 4 stddev, we should expect about 99.994% of the points to pass the test and about 0.006% of the points to fail (1 of every 15,787). If I collect 500,000 (completely random) points in a day, then about 32 of them will fail our test.
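For normally distributed points, the failure rate of a k-stddev test is the two-sided tail probability P(|Z| > k) = erfc(k / sqrt(2)). A small sketch to check the arithmetic, assuming normal data and Perl 5.22 or later (where the POSIX module exports erfc):

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # erfc() is exported by POSIX since Perl 5.22

# Two-sided tail probability of a standard normal distribution: P(|Z| > k)
sub tail_probability {
    my ($k) = @_;
    return erfc( $k / sqrt(2) );
}

my $points_per_day = 500_000;
for my $k ( 3, 4 ) {
    my $p = tail_probability($k);
    printf "%d stddev: %.4f%% fail (1 in %.0f), %.1f of %d points per day\n",
        $k, 100 * $p, 1 / $p, $p * $points_per_day, $points_per_day;
}
```

For k = 4 this works out to about 1 failure in every 15,787 points, or roughly 32 out of 500,000 points a day.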

So I create a time series of random points. Then I create a new time series of the 30-day standard deviation of the original series. Then I compare the two and see which points are outliers.

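use PDL;
use PDL::Finance::TA;

# 5000 evenly distributed random values between 0 and 50
my $ts = random( 5000 ) * 50;
# rolling stddev over a 30-point window, multiplied by 1 deviation
my $stddev = ta_stddev( $ts, 30, 1 );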
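The PDL snippet stops once the rolling stddev is computed. The comparison step might look like this in core Perl, as a sketch rather than the PDL way of doing it. It assumes "outlier" means more than 4 stddev from the mean of the 30 points *before* the one being checked, so a bad value cannot inflate the stddev it is judged against; `find_outliers` and `mean_stddev` are hypothetical helpers, not part of PDL.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean and population standard deviation of a list of numbers
sub mean_stddev {
    my @values = @_;
    my $mean   = sum(@values) / @values;
    my $var    = sum( map { ( $_ - $mean )**2 } @values ) / @values;
    return ( $mean, sqrt $var );
}

# Indexes of points more than $k stddev from the mean of the
# $window points that precede them (hypothetical helper)
sub find_outliers {
    my ( $series, $window, $k ) = @_;
    my @outliers;
    for my $i ( $window .. $#$series ) {
        my ( $mean, $stddev ) = mean_stddev( @{$series}[ $i - $window .. $i - 1 ] );
        next if $stddev == 0;    # a flat window has no meaningful spread
        push @outliers, $i if abs( $series->[$i] - $mean ) > $k * $stddev;
    }
    return @outliers;
}

srand(42);
my @ts = map { rand(50) } 1 .. 5000;
$ts[2500] = 500;    # plant one obviously bad value
my @bad = find_outliers( \@ts, 30, 4 );
print "outliers at: @bad\n";
```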

Market data is not completely random; it's stochastic, which I interpret to mean "given value A1, the next value A2 will be somewhere between A1 - B and A1 + B". It's predicting (guessing) "B" that earns quants the big bucks. But over the entire set of data, I know each previous value of B: the difference between A1 and A2, or the rate of change between two points. What I really want to know is whether the rate of change from A1 to A2 appears abnormal, say, if it's more than 4 stddev from the mean.

So I take my time series, create a new time series that is the rate of change for each point in the previous series, create another new time series that is the 30-day stddev of the previous time series, and then compare the rate of change with the stddev to see which ones are outliers.
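Those three steps can be sketched in core Perl as follows. This assumes "rate of change" is the simple difference A2 - A1 described above (TA-Lib's own ROC function is a percentage instead), and that each change is judged against the 30 changes before it; `rate_of_change` and `abnormal_changes` are hypothetical helpers.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Rate of change between consecutive points: A2 - A1
sub rate_of_change {
    my @series = @_;
    return map { $series[$_] - $series[ $_ - 1 ] } 1 .. $#series;
}

# Indexes (into the rate-of-change series) of changes more than $k stddev
# from the mean of the preceding $window changes
sub abnormal_changes {
    my ( $changes, $window, $k ) = @_;
    my @flagged;
    for my $i ( $window .. $#$changes ) {
        my @prev   = @{$changes}[ $i - $window .. $i - 1 ];
        my $mean   = sum(@prev) / @prev;
        my $stddev = sqrt( sum( map { ( $_ - $mean )**2 } @prev ) / @prev );
        next if $stddev == 0;
        push @flagged, $i if abs( $changes->[$i] - $mean ) > $k * $stddev;
    }
    return @flagged;
}

# A random walk with one sudden 25-point jump at price index 600
srand(7);
my @prices = (100);
push @prices, $prices[-1] + rand(2) - 1 for 1 .. 999;
$prices[$_] += 25 for 600 .. $#prices;

my @changes = rate_of_change(@prices);
my @flagged = abnormal_changes( \@changes, 30, 4 );
# change $i is the move from price $i to price $i + 1
print "abnormal changes at: @flagged\n";
```

The daily moves stay inside a narrow band, so only the jump (change index 599) stands out, even though the price level itself is never compared to anything.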

Finally, I should also make sure that my source is still updating, since it is very rare for most series to hold the same value twice in a row, let alone for an entire week. So let's check for flatness using stddev: a stddev of zero over a window means every value in that window is the same.
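A minimal flatness check in core Perl, assuming the window is the number of samples that make up "an entire week" for a given series and using a tiny tolerance instead of an exact zero to allow for floating-point rounding; `is_flat` is a hypothetical helper.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# True (1) if the last $window values have (near) zero spread,
# i.e. the feed looks stalled; false (0) otherwise
sub is_flat {
    my ( $series, $window ) = @_;
    return 0 if @$series < $window;    # not enough data to judge yet
    my @recent = @{$series}[ -$window .. -1 ];
    my $mean   = sum(@recent) / @recent;
    my $var    = sum( map { ( $_ - $mean )**2 } @recent ) / @recent;
    return $var < 1e-12 ? 1 : 0;       # assumed tolerance for rounding error
}

my @live  = ( 101.2, 101.5, 101.1, 101.4, 101.3 );
my @stale = ( 101.2, 101.5, (101.25) x 5 );
print "live feed flat?  ", is_flat( \@live, 5 ),  "\n";
print "stale feed flat? ", is_flat( \@stale, 5 ), "\n";
```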

PDL and TA-Lib make this all incredibly simple, so I can get on with my real work (fragging lamers in Quake).

Tags: perl pdl

Consuming Chaos

Sun, 08 Mar 2015 00:00:00 +0000

For what seems like hours, you scan the board. The colors are sharp against the simple background. Some movement catches your eye, but it doesn't feel right, so you ignore it. Time stretches on.

There! The perfect move. Leaving the perfect next move. A quick flick. A match. The pieces fall into place. Another match. Another. Another. A special piece. Another special piece. It fires, triggering more. Chaos consumes.

The board is in ruins. Your carefully planned next move is lost in the destruction. You're back to scanning the board to try to find where you belong in this new world.

Is this a game, or is it your development strategy?

Software development is chaos. Either you work at managing chaos, consuming it, or it works at consuming you. There are too many possibilities, too much input, to brute-force your way to completion (how much software do you know of that can be considered complete?).

In the face of these possibilities, a rigid development plan will fail. Vague goals are better. Goals written in terms of a problem are best. Problems don't change once you find their roots.

I didn't know this post was going to be about Agile, but there it is.

Exact is for computers. We are not computers. We are human. We are chaos.

Tags: software

Announcing Statocles

Mon, 02 Mar 2015 00:00:00 +0000

Static site generators are popular these days. For small sites, the ability to quickly author content using simple tools is key. The ability to use lower-cost (even free) hosting, often without any dynamic capabilities, helps keep a site within budget. For larger sites, the ability to serve content quickly and cheaply is beneficial: since most pages are read far more often than they are written, generating each full web page once and storing it on the filesystem can improve performance (and lower costs).

I like the convenience of using GitHub Pages to host project-oriented websites. The project itself is already on GitHub, so why not keep the website closely tied to it so it doesn't get out of date? For an organization like the Chicago Perl Mongers, GitHub Pages can even serve a custom domain, allowing easy collaboration on websites.

It was through the Chicago.PM website that I was introduced to Octopress, a blogging engine built on Jekyll. And it was while using Octopress that I decided to write my own static site generator, Statocles.

Tags: perl statocles

Mojolicious Triumphs Over Legacy Code

Fri, 13 Feb 2015 00:00:00 +0000

I got a text at 8:00am:

"Hey, can you jump on a conference call?"

Groggy and disoriented, I blearily type the conference line and enter my passcode, followed by the pound or hash sign. At the tone, I would be the 6th person to enter the conference. Tone.

"The app is down, and trading has stopped."

Tags: mojolicious perl

Managing SQL Data with Yertl

Wed, 21 Jan 2015 00:00:00 +0000

Originally posted on blogs.perl.org -- Managing SQL Data with Yertl

Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.

DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code between me and the operation I want. Sure, I've got modules, web applications, ad-hoc commands, and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.

Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but it already includes a small utility called ysql that makes dealing with SQL databases easy.

Tags: perl sql etl yertl