The author calls out the online book Forecasting: Principles and Practice, which is a great reference when conducting time series analyses: https://otexts.com/fpp3/
I found it way too underspecific; more of a cursory overview for an undergraduate seeing the material for the first time in a business school than for someone interested in digging into time series forecasting in depth. I don't have a better recommendation, though, unfortunately.
> At the end of each chapter we provide a list of “further reading”. In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, we suggest journal articles that provide more information.
ARIMA models, seasonal adjustments... this is still largely based on the Box-Jenkins method (developed in the 1970s!). I feel like this material has been taught the same way for decades now (maybe similar to undergraduate classical mechanics or other topics that are considered 'solved'). Is this really still the state of the art? Time series analysis seems oddly close to machine learning, which seems to move at breakneck speed all the time, yet it feels completely stuck in time. Can someone unravel that paradox for me?
TS models are constrained by the realities of the world they exist in. While you can chase benchmarks in lots of ML problems, forecasting is used by basically every large business, with huge consequences for getting it right or wrong.
Therefore, people stick with relatively performant & interpretable methods such as ARIMA and friends.
Additionally, most TS problems are relatively data constrained (your company/product has only existed for so long) so methods that are sample efficient (which most "modern" ML methods are not) are much more useful.
It's very very disconnected from overall modelling, with its own approaches and culture. There's little to no cross pollination from other areas of modelling.
“Hari Seldon is a fictional character in Isaac Asimov's Foundation series. In his capacity as mathematics professor at Streeling University on the planet Trantor, Seldon develops psychohistory, an algorithmic science that allows him to predict the future in probabilistic terms.”
The problem isn't the math, the problem is people are not rational.
New things are coming out; Meta's Prophet, for one. I've been pretty impressed with it in a "just throw data at it and don't even think about parameters" sense.
But the fact is, ARIMA models work. So people keep using them. And you can see what they're doing, and understand why, and how to tune them.
Prophet isn't necessarily new either. It's "just" a linear model, with some specially-crafted features that work well for the specific problems that Facebook/Meta had at the time when they built it.
It's also not considered a good general default choice for time series forecasting, but that's another story.
I always feel this is too close to stochastic versus random. There is a lot of text that pushes an idea that regression is used to understand how well a model fits relationships between variables. But, I start to have major doubts when people push the idea that regression models are not also predictive models.
One of the things that confused me is that regression models can be predictive, just like time series forecasting — they just do so in a different way. I tried to make this clear in the article (or maybe I’m not understanding what you’re saying).
In a regression model, you’re predicting target variables from feature variables. In a time series, you’re predicting the same variable from its past behavior. This is a subtle but crucial difference.
(And then you can do time series with covariates, which combines the two.)
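The distinction above can be made concrete with a toy sketch (synthetic numbers, not from the article): the same least-squares machinery is used two ways, once with a separate exogenous feature and once with the series' own lagged value.

```python
# Toy illustration: ordinary regression predicts y from a separate feature x;
# an AR(1) model predicts y_t from the series' own previous value y_{t-1}.

def ols_fit(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Regression: predict y from an exogenous feature x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]            # exactly y = 2x
a, b = ols_fit(x, y)

# Autoregression: predict the series from its own past (an AR(1) fit).
series = [1.0, 1.5, 2.25, 3.375, 5.0625]  # exactly y_t = 1.5 * y_{t-1}
lagged, current = series[:-1], series[1:]
a_ar, b_ar = ols_fit(lagged, current)

print(round(b, 2), round(b_ar, 2))        # -> 2.0 1.5
```

Same fitting routine both times; the only difference is whether the predictor column is another variable or a shifted copy of the target.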
Many of the most important time series prediction models are called "autoregressive", meaning they are regression models predicting the target from (prior values of) itself. This suggests that statisticians don't really share the view that these domains are distinct, or that regression models should only predict with different variables from the target.
Correct. In terms of what "kind" of model it is, it's all just a variation of the same linear model, y = bx.
That said, there are a lot of special considerations involved with time series data. There is a large number of specialized tools, techniques, and model families dedicated to time series modeling that don't make sense to use for other kinds of problems, because they exist to solve problems that do not arise in other modeling situations. So in practice, time series modeling is a distinct specialization from other kinds of modeling.
Right, AR(n) is a regression model, as are models which take only exogenous variables.
My question is this. According to definitions, can the latter (f(X_t) = y_t) be a time series model if each row of data is a time step? It doesn't have any autoregressive terms in X, so I don't know if it categorically is a time-series model.
Not that this question even matters, it's purely a taxonomy/terminology question.
Yes, it is. A time series model is any model where the data varies over time; that is, a time series model is any model of time series data. And timeseries data is broadly anything where the data for a single thing/entity varies over time. There are no strict definitions here, just common conventions.
Okay. And we can also say that there's some time series models that aren't regression models, right? For example, Kalman Filter is a "model" of a time series but isn't a regression.
Although the term "regression" is a misnomer anyway, and often when people say "regression" they mean "linear model". And by "linear model", we mean specifically a model in which outputs/predictions are some fixed linear combination of the input.
It is however possible to interpret the Kalman filter as a kind of dynamic regression model. Check out here if you want a good math workout on that topic: https://stats.stackexchange.com/q/330696
(Another somewhat distinct meaning of the term "regression" is any model with a "continuous" outcome variable. This is usually in contrast to "classification", which is any model that has a "categorical" or discrete outcome variable.)
I have a time series forecasting methodology question that I'll drop here.
Suppose I have exogenous variables that vary over time, X(t). X is about 100 features. What are some methods I can apply onto X(t) to automatically engineer features that may be useful at predicting some noisy y(t)?
I want to simultaneously capture interactions/interdependence between the columns of X, as well as the autocorrelation structure of X.
If I treat X as merely tabular data, throwing it into a traditional regression model (e.g. XGBoost), it can capture the interdependence structure in X, but it will neglect the autocorrelation structure... Unless I manually engineer features that capture the autocorrelation structure in X (e.g. rolling/shifted/differenced features), but I want to explore methods that do that automatically.
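One mechanical way to approach this, sketched below with invented names and window sizes: for each column of X, generate lagged, differenced, and rolling-mean versions, so a tabular model like XGBoost at least sees some of the autocorrelation structure as ordinary columns.

```python
# Minimal sketch of mechanical lag/rolling feature engineering for one column.
# Lag counts and window size are illustrative choices, not recommendations.

def expand_features(column, lags=(1, 2), window=3):
    """Return derived series for one column; rows without enough history are None."""
    n = len(column)
    out = {}
    for k in lags:                                    # shifted copies
        out[f"lag{k}"] = [None] * k + column[: n - k]
    out["diff1"] = [None] + [column[i] - column[i - 1] for i in range(1, n)]
    out["rollmean"] = [                               # trailing moving average
        None if i + 1 < window else sum(column[i + 1 - window : i + 1]) / window
        for i in range(n)
    ]
    return out

feats = expand_features([1.0, 2.0, 4.0, 7.0, 11.0])
print(feats["lag1"])   # [None, 1.0, 2.0, 4.0, 7.0]
print(feats["diff1"])  # [None, 1.0, 2.0, 3.0, 4.0]
```

Applying this to all 100 columns multiplies the feature count, so some pruning (feature importance, regularization) usually follows.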
It might not be that important to fully capture the autocorrelation structure within X.
Usually our models are doing something like "Y = f(X) + E" where E is some unknown random noise and f() is the relationship that we are trying to infer from the data. We usually take X as "given" or "known", so in that case we are looking at Y conditional on some specific value of X.
If we are just trying to make good predictions, then we don't necessarily care about the structure among the components of X unless that structure tells us something about how Y is affected by X.
Imagine the following "true" relationships in the data, where E and H are unmeasurable random noise:

Y(t) = b0 + b1*X(t) + b2*X(t-1) + E
X(t) = c*X(t-1) + H
Knowing b0, b1, and b2 is sufficient to predict "Y minus random noise". Knowing c doesn't help us at all.
If you're interested in obtaining good-quality estimates of b1 and b2, then you'll have a problem. That's because the direct effect of X(t-1) on Y is conflated with the indirect effect of X(t-1) on Y via X(t). But if you're just trying to make good predictions for Y, then you don't care as much about confidently distinguishing between b1 and b2.
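To see this concretely, here is a stdlib-only simulation (all coefficient values invented for illustration) of one such structure: X follows an AR(1) with coefficient c, and Y depends on X(t) and X(t-1) through b0, b1, b2. Regressing Y on the two X lags recovers b0, b1, b2 without ever estimating c.

```python
import random

# Synthetic data: X(t) = c*X(t-1) + H, and Y built from X(t) and X(t-1).
# We never model c; least squares on [1, X(t), X(t-1)] is enough to predict Y.
random.seed(0)
b0, b1, b2, c = 1.0, 2.0, -0.5, 0.8

x = [0.0]
for _ in range(200):
    x.append(c * x[-1] + random.gauss(0, 1))          # H is the noise term
y = [b0 + b1 * x[t] + b2 * x[t - 1] for t in range(1, len(x))]  # E = 0 here

rows = [[1.0, x[t], x[t - 1]] for t in range(1, len(x))]

def solve3(A, b):
    """Gaussian elimination for a 3x3 linear system (enough for this toy)."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

# Normal equations: (A^T A) beta = A^T y
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
beta = solve3(AtA, Aty)
print([round(v, 3) for v in beta])   # approximately [1.0, 2.0, -0.5]
```

Note the noise on X (H) is exactly what keeps X(t) and X(t-1) from being perfectly collinear; with real noisy data the estimates of b1 and b2 would be less certain individually, as described above, while predictions of Y stay good.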
If the variables in X(t) have the same time steps, I'd probably look at the cross-correlation function of the X's vs y, and then build another model on the X's to predict X(t+n) and use that as an input for Y(t).
Practically, how would this look? Say X has 100 columns. Do we estimate 100 separate models f_{i}(X_{t}) = X_{i, t+1}, then generate 100 predictions for each time step, and then feed those 100 predictions into a regression to predict Y_{t}?
> cross correlation function of the X vs y
Is this supposed to be combined somehow with the f_{i} outputs?
I'd rank the variables by their CCF, and use the top(n) to try to predict the series of interest.
Like, split Y in half, then use the X(1:(t/2)+n) to predict Y(t+n) to see if it works, and then if it works OK, actually model the top n X series and use them to really predict the Y.
It's a pretty manual approach, but you could automate it once you have a better idea what you're aiming for.
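A stdlib-only sketch of the screening step described above (series names and data are invented): score each candidate series by its correlation with y at a chosen lead, which is a crude one-lag slice of the cross-correlation function, and keep the top n.

```python
# Rank candidate series by |correlation| between X(t) and y(t + lead),
# then keep the top n as predictors for the real model.

def corr(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = sum((u - ma) ** 2 for u in a) ** 0.5
    db = sum((v - mb) ** 2 for v in b) ** 0.5
    return num / (da * db)

def rank_by_lagged_corr(X_cols, y, lead=1, top_n=2):
    """X_cols: dict of name -> series. Correlate X(t) against y(t + lead)."""
    scores = {name: abs(corr(col[:-lead], y[lead:]))
              for name, col in X_cols.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = {
    "leads_y": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],    # tracks y one step early
    "noise":   [0.3, -1.2, 0.8, 0.1, -0.5, 0.9],
    "anti":    [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0],  # strong negative relation
}
top = rank_by_lagged_corr(X, y, lead=1, top_n=2)
print(top)
```

With 100 columns you would loop this over several leads and keep whichever (column, lead) pairs score best on a held-out split, to avoid selecting on noise.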
Apologies for dropping offline shortly after posting this. :(
I should say that I enjoyed this post, and I think leaning into that confusion is my aim. In particular, my point about stochastic versus random is that they are more synonyms than anything else; just words that different groups came to use for similar things.
Which is not to say that there aren't differences in the crowds that use each term. I posit that most of the difference is in the aims of each crowd, and at the end of the day, you can get a lot of mileage by embracing the similarities, as opposed to the default of contrasting the differences.
As a fun example: to me, if you view time not as just a number that always goes up, but as a number that cycles through seasonal values, then it is easy to treat it like most any other feature. Similarly, the past is easy to envision as a feature of the present.
I do think the way you described a lot of time series analysis fits the fun read I had where Mandelbrot proposed a fractal view of time series prediction, where you look for self-similar behavior in the series data and reflect/overlay it on itself. But... as is probably guessable from the rest of my post, a lot of this is far outside my comfort area. Love reading about it from a distance.
Taken individually, the AR and MA terms are an average change and an average difference, respectively. When you combine different AR and MA terms it does become less explicable.
To me time is just one dimension. What is described is just the difference between interpolation and extrapolation.
In terms of forecasting, the state of the art is weather models like GraphCast or Pangu-Weather. I guess ARIMA won't be much help in those high-dimensional cases.
If you consider the univariate case, the trick to outperforming ARIMA is, I guess, to detect the context from the preceding time window in order to make better contextual predictions; this is much like a regression on a hidden variable.
I would mostly agree; this is why time series imputation or cleaning is often not that different from time series forecasting (or rather, often 'nowcasting'). You would only want to be careful what kind of validation you choose to test the generalisability of the approach.
If you take, however, the example of the weather as an extreme case of time series forecasting, downscaling eddies or forecasting them in a Navier-Stokes surrogate can require some different approaches.
At the risk of sounding bad: I can ask ChatGPT to summarize this for me without any human writing an article, since there is ample knowledge already available. What is the future of this kind of article?
Because ChatGPT is somewhere between "subtly" and "totally" wrong on most topics related to statistics and machine learning that I've tested it with. Maybe GPT 4 is better than 3.5, but I don't really trust it on technical subjects.
The quality is significantly lower than a good article written by a competent human, but maybe on par with or slightly better than a trashy article written by a content farm.
The advantage of the chat interface is that you can ask it clarifying questions. The real benefit of generative AI would be something like Copilot that you can interrogate for clarification as you are working through an article written by another human.
That, and the other problem of AI being trained on AI until nobody knows anything anymore.
Totally makes sense - I'm already getting used to "ChatGPTing" instead of Googling these questions. I guess the future of these articles lies in the fields where LLMs hallucinate, fields that are not very common (explaining the products of a small brand, the pros & cons of their different models).
Perhaps, one day in the not too distant future there will be a revered old master in a village somewhere, who people travel from miles away to watch as they slowly and carefully write listicles the old way.
> This process is typically called “feature engineering”, and is part art and part science. Choices on including or excluding certain variables, and how they are translated into numerical parameters, can significantly impact the model’s performance.
According to this article, to make good predictive/regression model, we need a good artist and a good engineer!
My job is now primarily Time Series Forecasting, and we’ve spent so much time improving our feature selection and engineering. When I started I thought “run correlations against target variables, find the best bunch and as long as we can explain them and their relation to the target we are good”
I work mostly with regressions and often it is almost more informative when something you expected to be a significant term isn't. Can help track down interesting behavior.
More recently, machine learning has really enhanced what you can do with regression, for example multivariate regressions when there are non-linear (or partially linear) relationships between feature and target variables.
For example recent regression problem involved a chemical reaction. It was suspected that a particular feature above a threshold began to display non linear behavior but it was difficult to pinpoint exactly where it began departing from linearity. ML was very helpful analyzing this.
Other than regressions and timeseries forecasting I think it's worth knowing about K-means clustering and PCA (Principal Component Analysis)/ PLS (Projection to latent structures) as well.
I've found PCA to be pretty unknown but very useful. I've had success using it in the past, and found it useful for explaining not just the relationship between the data features and the target variable but also how the features relate to each other.
I’m just about to start digging into 8 years of data from a few power plants with 16 turbines in total, to see if I can identify some problems we might have before the sensor measurements exceed the alarm threshold.
Taking bearing temperature as an example, I think I will identify periods where the machine has already been generating for an hour, so temperatures have stabilized, and then use bearing oil inlet temperature and machine load as independent variables, and bearing oil outlet and bearing metal temperatures as dependent variables. It seems like it should be straightforward to find anomalies, but I only started googling how to do this yesterday. There are lots of vendors hawking predictive maintenance software, but I can’t imagine that I couldn’t get similar results with a few weeks of effort, armed with Python and all of the associated libraries.
Maybe try slopes and second derivatives (change in temperature over time and so forth); you could also try introducing various lag windows into the time series data.
edit: I've also seen a lot of pitches about predictive maintenance / automated anomaly detection. I think the appeal lies in having a one-size-fits-all solution you can apply to multiple pieces of equipment (fans, conveyor belt drives, pumps, etc.) without needing to develop/deploy/maintain bespoke models.
A lot of manufacturing sites won't have a data person on tap (or even people who can write Python). There are also challenges with deployment, especially in remote sites where access is difficult, data connectivity is bad, etc. (think oil/gas pipelines). Most of the pitches seem to combine ML models with some kind of IoT device using something like LoRaWAN for connectivity.
Is the product being sold the setup of the bespoke models from their bag of what they’ve done before?
Regressions seem like the obvious way to detect anomalies to me, since the results should be 100% repeatable and make sense according to the amount of heat being generated and removed; how to apply ML/AI to it I am not so sure.
A good first step would be scatterplots, time series plots, and mark 1 eyeballs. It helps to understand the shape of the data before you start trying to fit models.
Right, make some scatter plots of bearing temp vs oil inlet temp and machine load, establish a fitted line, then can detect anomaly when new measurements vary from expected by more than some threshold.
Doesn’t seem that fancy, but better than waiting for a small problem to turn in to a larger problem.
I think it will also be useful in highlighting differences between identical machines, why does the one right beside the other run 5 degrees hotter on thrust bearing? etc
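The fitted-line-plus-threshold idea above can be sketched in a few lines of stdlib Python. All numbers here are made up; real data would need the stabilization filtering and multiple predictors described earlier.

```python
# Fit bearing temperature vs load on known-good history, then flag new
# readings whose residual exceeds k standard deviations of the baseline fit.

def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

load = [10, 20, 30, 40, 50, 60]                     # MW, healthy history
temp = [41.0, 43.1, 44.9, 47.2, 48.8, 51.1]         # bearing temp, deg C
a, b = fit_line(load, temp)

resid = [t - (a + b * l) for l, t in zip(load, temp)]
sigma = (sum(r * r for r in resid) / len(resid)) ** 0.5

def is_anomaly(l, t, k=3.0):
    """True if temperature t at load l departs from the fit by > k sigma."""
    return abs(t - (a + b * l)) > k * sigma

print(is_anomaly(35, 46.0))   # near the fitted line
print(is_anomaly(35, 52.0))   # several degrees hotter than expected
```

The same fit, run per machine, also gives you a principled way to compare the twins: two "identical" turbines whose fitted intercepts differ by 5 degrees is exactly the thrust-bearing question above.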
If you have any kind of functional physical model of the part and of where an eventual failure would occur, you have a huge head start.
That said, there can be a pretty big gap between detecting individual sensor anomalies (undergrad homework) and predicting component failure (build an entire business around it). I have never regretted starting a data project with a small, easy task, and ramping up from there. Whereas I have definitely regretted starting a data project with big goals and/or fancy techniques at the beginning. Set clear incremental goals, and use the early prototyping phases to explore the data and develop a good understanding for what might or might not be possible to accomplish with it.
Having worked in the same problem space, I can heavily recommend to get expert input when evaluating which features to use. Ideally, this is a person who knows the internals of the machinery and/or operations that can help you remove spurious features. As a Data Scientist, one sometimes tends to think that the data explains everything and no expert domain knowledge is needed ("Modern machine translation does work without any knowledge of grammar or language!"). Good luck!
You should look into using generalized additive models (GAMs). They are regression models that allow you to model nonlinear relationships, and even smooth nonlinear interactions between variables, while retaining the benefits of classical regression models like statistically valid confidence intervals and the ability to control for repeated measures. You can also explicitly model periodic behavior, like 24-hour or annual cycles in a predictor variable, and even account for auto-correlation explicitly.
In your example, you could not only pinpoint departure from linearity, but you could get a 95% confidence interval for it.
The best implementation is mgcv in R; pyGAM in Python is OK but lacks many of the more advanced features in mgcv. There's even a more ML-flavored implementation in mboost.
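mgcv and pyGAM are the real tools here; the stdlib toy below (synthetic, noise-free data; knot location assumed known) only illustrates the basis-expansion idea behind a GAM's splines: add a hinge feature max(0, x - knot) and the departure from linearity above the threshold becomes just one more least-squares coefficient.

```python
# Piecewise-linear "spline" via a single hinge basis function:
# y ~ intercept + slope*x + extra*max(0, x - knot)

def solve3(A, b):
    """Gaussian elimination for a 3x3 linear system (enough for this toy)."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

knot = 10.0
xs = [float(v) for v in range(21)]
# Linear below the knot, extra slope of 1.5 above it (no noise, for clarity).
ys = [2.0 + 1.0 * x + 1.5 * max(0.0, x - knot) for x in xs]

rows = [[1.0, x, max(0.0, x - knot)] for x in xs]
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
intercept, slope, extra = solve3(AtA, Aty)
print(round(intercept, 3), round(slope, 3), round(extra, 3))  # 2.0 1.0 1.5
```

A real GAM replaces the single hinge with a full spline basis plus a smoothness penalty, and (in mgcv) gives confidence intervals on the fitted curve, which is what lets you put an interval on the departure-from-linearity point in the chemical reaction example.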
This is a case where I have a hard time getting my head round how and why machine learning helps. What models are there available, and what training data do you use? Any background would be appreciated, I use ML for feature detection and image classification, but not yet for regressions.
I'm not a data scientist (my background is engineering); I use Azure ML Studio. The regression feature uses an ensemble of different algorithms; there is an explanation here.
Once the model has run it uses something called a mimic to generate model explainability, which lets you explore things like feature importance etc in the final model. As far as the user interface goes I mostly used SAS in the past and it feels quite similar.