Planet Haskell

Donnacha Oisín Kidney: POPL Paperâ€”Algebraic Effects Meet Hoare Logic in Cubical Agda

Thu, 07 Nov 2024 00:00:00 +0000

Posted on November 7, 2024

Tags: Agda

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

Monday Morning Haskell: Solve.hs Module 3 + Summer Sale!

Mon, 01 Jul 2024 16:00:00 +0000

After 6 months of hard work, I am happy to announce that Solve.hs now has a new module - Essential Algorithms! You’ll learn the “Haskell Way” to write all of the most important algorithms for solving coding problems, such as Breadth First Search, Dijkstra’s Algorithm, and more!

You can get a 20% discount code for this and all of our other courses by subscribing to our mailing list! Starting next week, the price for Solve.hs will go up to reflect the increased content. So if you subscribe and purchase this week, you’ll end up saving 40% vs. buying later!

So don’t miss out, head to the course sales page to buy today!

GHC Developer Blog: GHC 9.6.6 is now available

Mon, 01 Jul 2024 00:00:00 +0000

GHC 9.6.6 is now available

Zubin Duggal - 2024-07-01

The GHC developers are happy to announce the availability of GHC 9.6.6. Binary distributions, source distributions, and documentation are available on the release page.

This release is primarily a bugfix release addressing some issues found in the 9.6 series. These include:

A fix for a bug in the NCG that could lead to incorrect runtime results due to erroneously removing a jump instruction (#24507).
A fix for a linker error that manifested on certain platform/toolchain combinations, particularly darwin with a brew provisioned toolchain, arising due to a confusion in linker options between GHC and cabal (#22210).
A fix for a compiler panic in the simplifier due to incorrect eta expansion (#24718).
A fix for possible segfaults when using the bytecode interpreter due to incorrect constructor tagging (#24870).
And a few more fixes

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Enjoy!

-Zubin

Chris Smith 2: The Semigroup of Exponentially Weighted Moving Averages

Sun, 30 Jun 2024 23:21:28 +0000

This is just a quick note about some more interesting algebraic structure, and how that structure can help with generalizing an idea. None of this is new, but it’s the perspective it sheds on the problem solving process that was interesting to me.

Defining the EWMA

And exponentially weighted moving average or (EWMA) is a technique often used for keeping a running estimate of some quantity as you make more observations. The idea is that each time you make a new observation, you take a weighted average of the new observation with the old average.

For a quick example, suppose you’re checking the price of gasoline every day at a nearby store.

On the first day it’s $3.10.
The second day it’s $3.20. You take the average of $3.10 and $3.20, and get $3.15 for your estimate.
The next day, it’s $3.30. You take the average of $3.15 and $3.30, and get $3.225 for your estimate.

This is not just the average of the three data points, which would be $3.20. Instead, the EWMA is biased in favor of more recent data. In fact, if you took these observations for a month, the first day of the month would be only about a billionth of the final result — basically ignored entirely — while the last day would amount for fully half. That’s the point: you’re looking for an estimate of the current value, so older historical data is less and less important. But you still want to smooth out the sudden jumps up and down.

In the example above, I was taking a straight average between the previous EWMA and the new data point. That leads for very short-lived data, though. More generally, there’s a parameter, normally called α, that tells you how to weight the average, giving this formula:

EWMAₙ₊₁ = xₙ₊₁ α + EWMAₙ (1-α)

In the extreme, if α = 1, you ignore the historical data and just look at the most recent value. For smaller values of α, the EWMA takes longer to adjust to the current value, but smooths out the noise in the data.

The EWMA Semigroup

Traditionally, you compute an EWMA one observation at a time, because you only make one observation at a time and keep a running average as you go. But when analyzing such a process, you may want to compute the EWMA of a large set of data in a parallel or distributed way. To do that, you’re going to look for a way to map the observations from the list into a semigroup. (See my earlier article on the relationship between monoids and list processing.)

We’ll start by looking at what the EWMA looks like on a sequence of data points x₁, x₂, …, xₙ.

EWMA₁ = x₁
EWMA₂ = x₁ (1-α) + x₂ α
EWMA₃ = x₁ (1-α)² + x₂ α (1-α) + x₃ α
EWMA₄ = x₁ (1-α)³ + x₂ α (1-α)² + x₃ α (1-α) + x₄ α

The first term of this sequence is funky and doesn’t follow the same pattern as the rest. But that’s because it was always different: since you don’t have a prior average at the first data point, you can only take the first data point itself as an expectation.

That’s a problem if we want to generate an entire semigroup from values that represent a single data point, since they would seem to all fall into this special case. We can instead treat the initial value as acting in both ways: as a prior expectation, and as an update to that expectation, giving something like this:

EWMA₁ = x₁ (1-α) + x₁ α
EWMA₂ = x₁ (1-α)² + x₁ α (1-α) + x₂ α
EWMA₃ = x₁ (1-α)³ + x₁ α (1-α)² + x₂ α (1-α) + x₃ α
EWMA₄ = x₁ (1-α)⁴ + x₁ α (1-α)³ + x₂ α (1-α)² + x₃ α (1-α) + x₄ α

The first term is the contribution of the prior expectation, but even the single observation case now has update terms, as well.

There are now two parts to the semigroup: a prior expectation, and an update rule. There’s an obvious and trivial semigroup structure to prior expectations: just take the first value and discard the later ones.

The semigroup structure on the update rules is a little more subtle. The key is to realize that the rule for updating an exponentially weighted moving average always looks a bit like the formula for EWMA₁ above: a weighted average between the prior expectation and the new target value. While single observations should use the same weight (α), composing multiple observations amounts to adjusting to a combined target value with a larger weight

It takes a little algebra to derive the new weight and target value for a composition, but it amounts to this. If:

f₁(x) = x (1-α₁) + x₁ α₁
f₂(x) = x (1-α₂) + x₂ α₂

Then:

(f₁ ∘ f₂)(x) = x (1-α₃) + x₃ α₃
α₃ = α₁ + α₂ - α₁ α₂
x₃ = (x₁ α₁ + x₂ α₂ - x₂ α₁ α₂) / α₃

This can be defunctionalized to only store the weight and target values. In Haskell, it looks like this:

import Data.Foldable1 (Foldable1, foldMap1)
import Data.Semigroup (First (..))

data EWMAUpdate = EWMAUpdate
  { ewmaAlpha :: Double,
    ewmaTarget :: Double
  }
  deriving (Eq, Show)

instance Semigroup EWMAUpdate where
  EWMAUpdate a1 v1 <> EWMAUpdate a2 v2 =
    EWMAUpdate newAlpha newTarget
    where
      newAlpha = a1 + a2 - a1 * a2
      newTarget
        | newAlpha == 0 = 0
        | otherwise = (a1 * v1 + a2 * v2 - a1 * a2 * v2) / newAlpha

instance Monoid EWMAUpdate where
  mempty = EWMAUpdate 0 0

data EWMA = EWMA
  { ewmaPrior :: First Double,
    ewmaUpdate :: EWMAUpdate
  }
  deriving (Eq, Show)

instance Semigroup EWMA where
  EWMA p1 u1 <> EWMA p2 u2 = EWMA (p1 <> p2) (u1 <> u2)

singleEWMA :: Double -> Double -> EWMA
singleEWMA alpha x = EWMA (First x) (EWMAUpdate alpha x)

ewmaValue :: EWMA -> Double
ewmaValue (EWMA (First x) (EWMAUpdate a v)) = (1 - a) * x + a * v

ewma :: (Foldable1 t) => Double -> t Double -> Double
ewma alpha = ewmaValue . foldMap1 (singleEWMA alpha)

An EWMAUpdate is effectively a function of the form above, and its semigroup instance is reversed function composition. However, it is stored in a defunctionalized form so that the composition can be computed in advance. Then we add a few helper functions: singleEWMA to compute the EWMA for a single data point (with a given α value), ewmaValue to apply the update function to the prior and produce an estimate, and ewma to use this semigroup to compute the EWMA of any non-empty sequence.

Why?

What have we gained by looking at the problem this way?

The standard answer is that we’ve gained flexibility in how we perform the computation. In addition to computing an EWMA sequentially from left to right, we can do things like:

Compute the EWMA in a parallel or distributed way, farming out parts of the work to different threads or machines.
Compute an EWMA on the fly in the intermediate nodes of a balanced tree, such as one can do with Haskell’s fingertree package.

But neither of these are the reason I worked this out today. Instead, I was interested in a continuous time analogue of the EWMA. The traditional form of the EWMA is recursive, which gives you a value after each whole step of recursion, but doesn’t help much with continuous time. By working out this semigroup structure, the durations of each observation, which previously were implicit in the depth of recursion, turn into explicit weights that accumulate as different time intervals are concatenated. It’s then easy to see how you might incorporate a continuous observation with a non-unit duration, or with greater or lesser weight for some other reason.

That’s not new, of course, but it’s nice to be able to work it out simply by searching for algebraic structure.

Joachim Breitner: Do surprises get larger?

mail@joachim-breitner.de (Joachim Breitner) — Sun, 30 Jun 2024 13:28:31 +0000

The setup

Imagine you are living on a riverbank. Every now and then, the river swells and you have high water. The first few times this may come as a surprise, but soon you learn that such floods are a recurring occurrence at that river, and you make suitable preparation. Let’s say you feel well-prepared against any flood that is no higher than the highest one observed so far. The more floods you have seen, the higher that mark is, and the better prepared you are. But of course, eventually a higher flood will occur that surprises you.

Of course such new record floods are happening rarer and rarer as you have seen more of them. I was wondering though: By how much do the new records exceed the previous high mark? Does this excess decrease or increase over time?

A priori both could be. When the high mark is already rather high, maybe new record floods will just barley pass that mark? Or maybe, simply because new records are so rare events, when they do occur, they can be surprisingly bad?

This post is a leisurely mathematical investigating of this question, which of course isn’t restricted to high waters; it could be anything that produces a measurement repeatedly and (mostly) independently – weather events, sport results, dice rolls.

The answer of course depends on the distribution of results: How likely is each possible results.

Dice are simple

With dice rolls the answer is rather simple. Let our measurement be how often you can roll a die until it shows a 6. This simple game we can repeat many times, and keep track of our record. Let’s say the record happens to be 7 rolls. If in the next run we roll the die 7 times, and it still does not show a 6, then we know that we have broken the record, and every further roll increases by how much we beat the old record.

But note that how often we will now roll the die is completely independent of what happened before!

So for this game the answer is: The excess with which the record is broken is always the same.

Mathematically speaking this is because the distribution of “rolls until the die shows a 6” is memoryless. Such distributions are rather special, its essentially just the example we gave (a geometric distribution), or its continuous analogue (the exponential distributions, for example the time until a radioactive particle decays).

Mathematical formulation

With this out of the way, let us look at some other distributions, and for that, introduce some mathematical notations. Let X be a random variable with probability density function φ(x) and cumulative distribution function Φ(x), and a be the previous record. We are interested in the behavior of

Y(a) = X − a ∣ X > x

i.e. by how much X exceeds a under the condition that it did exceed a. How does Y change as a increases? In particular, how does the expected value of the excess e(a) = E(Y(a)) change?

Uniform distribution

If X is uniformly distributed between, say, 0 and 1, then a new record will appear uniformly distributed between a and 1, and as that range gets smaller, the excess must get smaller as well. More precisely,

e(a) = E(X − a ∣ X > a) = E(X ∣ X > a) − a = (1 − a)/2

This not very interesting linear line is plotted in blue in this diagram:

The expected record surpass for the uniform distribution

The orange line with the logarithmic scale on the right tries to convey how unlikely it is to surpass the record value a: it shows how many attempts we expect before the record is broken. This can be calculated by n(a) = 1/(1 − Φ(a)).

Normal distribution

For the normal distribution (with median 0 and standard derivation 1, to keep things simple), we can look up the expected value of the one-sided truncated normal distribution and obtain

e(a) = E(X ∣ X > a) − a = φ(a)/(1 − Φ(a)) − a

Now is this growing or shrinking? We can plot this an have a quick look:

The expected record surpass for the normal distribution

Indeed it is, too, a decreasing function!

(As a sanity check we can see that e(0) = √(2/π), which is the expected value of the half-normal distribution, as it should.)

Could it be any different?

This settles my question: It seems that each new surprisingly high water will tend to be less surprising than the previously – assuming high waters were uniformly or normally distributed, which is unlikely to be helpful.

This does raise the question, though, if there are probability distributions for which e(a) is be increasing?

I can try to construct one, and because it’s a bit easier, I’ll consider a discrete distribution on the positive natural numbers, and consider at g(0) = E(X) and g(1) = E(X − 1 ∣ X > 1). What does it take for g(1) > g(0)? Using E(X) = p + (1 − p)E(X ∣ X > 1) for p = P(X = 1) we find that in order to have g(1) > g(0), we need E(X) > 1/p.

This is plausible because we get equality when E(X) = 1/p, as it precisely the case for the geometric distribution. And it is also plausible that it helps if p is large (so that the next first record is likely just 1) and if, nevertheless, E(X) is large (so that if we do get an outcome other than 1, it’s much larger).

Starting with the geometric distribution, where P(X > n ∣ X ≥ n) = p_n = p (the probability of again not rolling a six) is constant, it seems that these p_n is increasing, we get the desired behavior. So let p₁ < p₂ < p_n < … be an increasing sequence of probabilities, and define X so that P(X = n) = p₁ ⋅ ⋯ ⋅ p_n − 1 ⋅ (1 − p_n) (imagine the die wears off and the more often you roll it, the less likely it shows a 6). Then for this variation of the game, every new record tends to exceed the previous more than previous records. As the p increase, we get a flatter long end in the probability distribution.

Gamma distribution

To get a nice plot, I’ll take the intuition from this and turn to continuous distributions. The Wikipedia page for the exponential distribution says it is a special case of the gamma distribution, which has an additional shape parameter α, and it seems that it could influence the shape of the distribution to be and make the probability distribution have a longer end. Let’s play around with β = 2 and α = 0.5, 1 and 1.5:

The expected record surpass for the gamma distribution

For α = 1 (dotted) this should just be the exponential distribution, and we see that e(a) is flat, as predicted earlier.
For larger α (dashed) the graph does not look much different from the one for the normal distribution – not a surprise, as for α → ∞, the gamma distribution turns into the normal distribution.
For smaller α (solid) we get the desired effect: e(a) is increasing. This means that new records tend to break records more impressively.

The orange line shows that this comes at a cost: for a given old record a, new records are harder to come by with smaller α.

Conclusion

As usual, it all depends on the distribution. Otherwise, not much, it’s late.