The author calls out the online book Forecasting: Principles and Practice, which is a great reference when conducting time series analyses: https://otexts.com/fpp3/
I found it way too underspecific; more of a cursory overview for an undergraduate seeing the material for the first time in a business school than for someone interested in digging into time series forecasting in depth. I don't have a better recommendation, though, unfortunately.
> At the end of each chapter we provide a list of “further reading”. In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, we suggest journal articles that provide more information.
ARIMA models, seasonal adjustments... this is still largely based on the Box-Jenkins method (developed in the 1970s!). I feel like this material has been taught the same way for decades now (maybe similar to undergraduate classical mechanics or other topics that are considered 'solved'). Is this really still the state of the art? Time series analysis seems oddly close to machine learning, which seems to move at breakneck speed all the time, yet it feels completely stuck in time. Can someone unravel that paradox for me?
TS models are constrained by the realities of the world they exist in. While you can chase benchmarks in lots of ML problems, forecasting is used by basically every large business, with huge consequences for getting it right or wrong.
Therefore, people stick with relatively performant & interpretable methods such as ARIMA and friends.
Additionally, most TS problems are relatively data constrained (your company/product has only existed for so long) so methods that are sample efficient (which most "modern" ML methods are not) are much more useful.
It's very very disconnected from overall modelling, with its own approaches and culture. There's little to no cross pollination from other areas of modelling.
“Hari Seldon is a fictional character in Isaac Asimov's Foundation series. In his capacity as mathematics professor at Streeling University on the planet Trantor, Seldon develops psychohistory, an algorithmic science that allows him to predict the future in probabilistic terms.”
The problem isn't the math, the problem is people are not rational.
New things are coming out; Meta's Prophet, for one. I've been pretty impressed with it in a "just throw data at it and don't even think about parameters" sense.
But the fact is, ARIMA models work. So people keep using them. And you can see what they're doing, and understand why, and how to tune them.
Prophet isn't necessarily new either. It's "just" a linear model, with some specially-crafted features that work well for the specific problems that Facebook/Meta had at the time when they built it.
It's also not considered a good general default choice for time series forecasting, but that's another story.
I always feel this is too close to stochastic versus random. There is a lot of text that pushes an idea that regression is used to understand how well a model fits relationships between variables. But, I start to have major doubts when people push the idea that regression models are not also predictive models.
One of the things that confused me is that regression models can be predictive, just like time series forecasting — they just do so in a different way. I tried to make this clear in the article (or maybe I’m not understanding what you’re saying).
In a regression model, you’re predicting target variables from feature variables. In a time series, you’re predicting the same variable from its past behavior. This is a subtle but crucial difference.
(And then you can do time series with covariates, which combines the two.)
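The distinction above can be made concrete with a toy sketch (synthetic numbers, not from the article): the same least-squares machinery is used two ways, once with a separate exogenous feature and once with the series' own lagged value.

```python
# Toy illustration: ordinary regression predicts y from a separate feature x;
# an AR(1) model predicts y_t from the series' own previous value y_{t-1}.

def ols_fit(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Regression: predict y from an exogenous feature x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]            # exactly y = 2x
a, b = ols_fit(x, y)

# Autoregression: predict the series from its own past (an AR(1) fit).
series = [1.0, 1.5, 2.25, 3.375, 5.0625]  # exactly y_t = 1.5 * y_{t-1}
lagged, current = series[:-1], series[1:]
a_ar, b_ar = ols_fit(lagged, current)

print(round(b, 2), round(b_ar, 2))        # -> 2.0 1.5
```

Same fitting routine both times; the only difference is whether the predictor column is another variable or a shifted copy of the target.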
Many of the most important time series prediction models are called "autoregressive", meaning they are regression models predicting the target from (prior values of) itself. This suggests that statisticians don't really share the view that these domains are distinct, or that regression models should only predict with different variables from the target.
Correct. In terms of what "kind" of model it is, it's all just a variation of the same linear model, y = bx.
That said, there are a lot of special considerations involved with time series data. There is a large number of specialized tools, techniques, and model families dedicated to time series modeling that don't make sense to use for other kinds of problems, because they exist to solve problems that do not arise in other modeling situations. So in practice, time series modeling is a distinct specialization from other kinds of modeling.
Right, AR(n) is a regression model, as are models which take only exogenous variables.
My question is this. According to definitions, can the latter (f(X_t) = y_t) be a time series model if each row of data is a time step? It doesn't have any autoregressive terms in X, so I don't know if it categorically is a time-series model.
Not that this question even matters, it's purely a taxonomy/terminology question.
Yes, it is. A time series model is any model where the data varies over time; that is, a time series model is any model of time series data. And timeseries data is broadly anything where the data for a single thing/entity varies over time. There are no strict definitions here, just common conventions.
Okay. And we can also say that there's some time series models that aren't regression models, right? For example, Kalman Filter is a "model" of a time series but isn't a regression.
Although the term "regression" is a misnomer anyway, and often when people say "regression" they mean "linear model". And by "linear model", we mean specifically a model in which outputs/predictions are some fixed linear combination of the input.
It is however possible to interpret the Kalman filter as a kind of dynamic regression model. Check out here if you want a good math workout on that topic: https://stats.stackexchange.com/q/330696
(Another somewhat distinct meaning of the term "regression" is any model with a "continuous" outcome variable. This is usually in contrast to "classification", which is any model that has a "categorical" or discrete outcome variable.)
I have a time series forecasting methodology question that I'll drop here.
Suppose I have exogenous variables that vary over time, X(t). X is about 100 features. What are some methods I can apply onto X(t) to automatically engineer features that may be useful at predicting some noisy y(t)?
I want to simultaneously capture interactions/interdependence between the columns of X, as well as the autocorrelation structure of X.
If I treat X as merely tabular data, throwing it into a traditional regression model (e.g. XGBoost), it can capture the interdependence structure in X, but it will neglect the autocorrelation structure... Unless I manually engineer features that capture the autocorrelation structure in X (e.g. rolling/shifted/differenced features), but I want to explore methods that do that automatically.
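One mechanical way to approach this, sketched below with invented names and window sizes: for each column of X, generate lagged, differenced, and rolling-mean versions, so a tabular model like XGBoost at least sees some of the autocorrelation structure as ordinary columns.

```python
# Minimal sketch of mechanical lag/rolling feature engineering for one column.
# Lag counts and window size are illustrative choices, not recommendations.

def expand_features(column, lags=(1, 2), window=3):
    """Return derived series for one column; rows without enough history are None."""
    n = len(column)
    out = {}
    for k in lags:                                    # shifted copies
        out[f"lag{k}"] = [None] * k + column[: n - k]
    out["diff1"] = [None] + [column[i] - column[i - 1] for i in range(1, n)]
    out["rollmean"] = [                               # trailing moving average
        None if i + 1 < window else sum(column[i + 1 - window : i + 1]) / window
        for i in range(n)
    ]
    return out

feats = expand_features([1.0, 2.0, 4.0, 7.0, 11.0])
print(feats["lag1"])   # [None, 1.0, 2.0, 4.0, 7.0]
print(feats["diff1"])  # [None, 1.0, 2.0, 3.0, 4.0]
```

Applying this to all 100 columns multiplies the feature count, so some pruning (feature importance, regularization) usually follows.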
It might not be that important to fully capture the autocorrelation structure within X.
Usually our models are doing something like "Y = f(X) + E" where E is some unknown random noise and f() is the relationship that we are trying to infer from the data. We usually take X as "given" or "known", so in that case we are looking at Y conditional on some specific value of X.
If we are just trying to make good predictions, then we don't necessarily care about the structure among the components of X unless that structure tells us something about how Y is affected by X.
Imagine the following "true" relationships in the data, where E and H are unmeasurable random noise:

Y(t) = b0 + b1*X(t) + b2*X(t-1) + E
X(t) = c*X(t-1) + H
Knowing b0, b1, and b2 is sufficient to predict "Y minus random noise". Knowing c doesn't help us at all.
If you're interested in obtaining good-quality estimates of b1 and b2, then you'll have a problem. That's because the direct effect of X(t-1) on Y is conflated with the indirect effect of X(t-1) on Y via X(t). But if you're just trying to make good predictions for Y, then you don't care as much about confidently distinguishing between b1 and b2.
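To see this concretely, here is a stdlib-only simulation (all coefficient values invented for illustration) of one such structure: X follows an AR(1) with coefficient c, and Y depends on X(t) and X(t-1) through b0, b1, b2. Regressing Y on the two X lags recovers b0, b1, b2 without ever estimating c.

```python
import random

# Synthetic data: X(t) = c*X(t-1) + H, and Y built from X(t) and X(t-1).
# We never model c; least squares on [1, X(t), X(t-1)] is enough to predict Y.
random.seed(0)
b0, b1, b2, c = 1.0, 2.0, -0.5, 0.8

x = [0.0]
for _ in range(200):
    x.append(c * x[-1] + random.gauss(0, 1))          # H is the noise term
y = [b0 + b1 * x[t] + b2 * x[t - 1] for t in range(1, len(x))]  # E = 0 here

rows = [[1.0, x[t], x[t - 1]] for t in range(1, len(x))]

def solve3(A, b):
    """Gaussian elimination for a 3x3 linear system (enough for this toy)."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

# Normal equations: (A^T A) beta = A^T y
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
beta = solve3(AtA, Aty)
print([round(v, 3) for v in beta])   # approximately [1.0, 2.0, -0.5]
```

Note the noise on X (H) is exactly what keeps X(t) and X(t-1) from being perfectly collinear; with real noisy data the estimates of b1 and b2 would be less certain individually, as described above, while predictions of Y stay good.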
If the variables in X(t) have the same time steps, I'd probably look at the cross-correlation function of the X's vs y, and then build another model on the X's to predict X(t+n) and use that as an input for Y(t).
Practically, how would this look? Say X has 100 columns. Do we estimate 100 separate models f_{i}(X_{t}) = X_{i, t+1}, then generate 100 predictions for each time step, and then feed those 100 predictions into a regression to predict Y_{t}?
> cross correlation function of the X vs y
Is this supposed to be combined somehow with the f_{i} outputs?
I'd rank the variables by their CCF, and use the top(n) to try to predict the series of interest.
Like, split Y in half, then use the X(1:(t/2)+n) to predict Y(t+n) to see if it works, and then if it works OK, actually model the top n X series and use them to really predict the Y.
It's a pretty manual approach, but you could automate it once you have a better idea what you're aiming for.
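A stdlib-only sketch of the screening step described above (series names and data are invented): score each candidate series by its correlation with y at a chosen lead, which is a crude one-lag slice of the cross-correlation function, and keep the top n.

```python
# Rank candidate series by |correlation| between X(t) and y(t + lead),
# then keep the top n as predictors for the real model.

def corr(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = sum((u - ma) ** 2 for u in a) ** 0.5
    db = sum((v - mb) ** 2 for v in b) ** 0.5
    return num / (da * db)

def rank_by_lagged_corr(X_cols, y, lead=1, top_n=2):
    """X_cols: dict of name -> series. Correlate X(t) against y(t + lead)."""
    scores = {name: abs(corr(col[:-lead], y[lead:]))
              for name, col in X_cols.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = {
    "leads_y": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],    # tracks y one step early
    "noise":   [0.3, -1.2, 0.8, 0.1, -0.5, 0.9],
    "anti":    [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0],  # strong negative relation
}
top = rank_by_lagged_corr(X, y, lead=1, top_n=2)
print(top)
```

With 100 columns you would loop this over several leads and keep whichever (column, lead) pairs score best on a held-out split, to avoid selecting on noise.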
Apologies for dropping offline shortly after posting this. :(
I should say that I enjoyed this post, and I think leaning into that confusion is my aim. In particular, my point about stochastic versus random is that they are more synonyms than anything else; just words that different groups came to use for similar things.
Which is not to say that there aren't differences in the crowds that use each term. I posit that most of the difference is in the aims of each crowd, and at the end of the day, you can get a lot of mileage by embracing the similarities, as opposed to the default of contrasting the differences.
As a fun example: to me, if you view time not as just a number that always goes up, but as a number that cycles through seasonal values, then it is easy to treat it like most any other feature. Similarly, the past is easy to envision as a feature of the present.
I do think the way you described a lot of time series analysis fits the fun read I had where Mandelbrot proposed a fractal view of time series prediction, where you look for self-similar behavior in the series data and reflect/overlay it on itself. But... as is probably guessable from the rest of my post, a lot of this is far outside my comfort area. Love reading about it from a distance.
Taken individually, the AR and MA terms are an average change and an average difference, respectively. When you combine different AR and MA terms it does become less explicable.
To me time is just one dimension. What is described is just the difference between interpolation and extrapolation.
In terms of forecasting, the state of the art is weather models like GraphCast or Pangu-Weather. I guess ARIMA won't be much help in those high-dimensional cases.
If you consider the univariate case, the trick to outperforming ARIMA is, I guess, to detect the context from the preceding time window in order to make better contextual predictions; this is much like a regression on a hidden variable.
I would mostly agree; this is why time series imputation or cleaning is often not that different from time series forecasting (or rather, often 'nowcasting'). You would only want to be careful what kind of validation you choose to test the generalisability of the approach.
If you take, however, the example of the weather as an extreme case of time series forecasting, downscaling eddies or forecasting them in a Navier-Stokes surrogate can require some different approaches.
At the risk of sounding bad: I can ask ChatGPT to summarize this for me without any human writing an article, since there is ample knowledge already available. What is the future of this kind of article?
Because ChatGPT is somewhere between "subtly" and "totally" wrong on most topics related to statistics and machine learning that I've tested it with. Maybe GPT 4 is better than 3.5, but I don't really trust it on technical subjects.
The quality is significantly lower than a good article written by a competent human, but maybe on par with or slightly better than a trashy article written by a content farm.
The advantage of the chat interface is that you can ask it clarifying questions. The real benefit of generative AI would be something like Copilot that you can interrogate for clarification as you are working through an article written by another human.
That, and the other problem of AI being trained on AI until nobody knows anything anymore.
Totally makes sense - I'm already getting used to "ChatGPTing" instead of Googling these questions. I guess the future of these articles lies in the fields where LLMs hallucinate, fields that are not very common (explaining the products of a small brand, the pros & cons of their different models).
Perhaps, one day in the not too distant future there will be a revered old master in a village somewhere, who people travel from miles away to watch as they slowly and carefully write listicles the old way.
> This process is typically called “feature engineering”, and is part art and part science. Choices on including or excluding certain variables, and how they are translated into numerical parameters, can significantly impact the model’s performance.
According to this article, to make good predictive/regression model, we need a good artist and a good engineer!
My job is now primarily Time Series Forecasting, and we’ve spent so much time improving our feature selection and engineering. When I started I thought “run correlations against target variables, find the best bunch and as long as we can explain them and their relation to the target we are good”
I work mostly with regressions and often it is almost more informative when something you expected to be a significant term isn't. Can help track down interesting behavior.
More recently, machine learning has really enhanced what you can do with regression, for example multivariate regressions when there are non-linear (or partially linear) relationships between feature and target variables.
For example recent regression problem involved a chemical reaction. It was suspected that a particular feature above a threshold began to display non linear behavior but it was difficult to pinpoint exactly where it began departing from linearity. ML was very helpful analyzing this.
Other than regressions and timeseries forecasting I think it's worth knowing about K-means clustering and PCA (Principal Component Analysis)/ PLS (Projection to latent structures) as well.
I've found PCA to be pretty unknown but very useful. I've had success using it in the past, and found it useful for explaining not just the relationship between the data features and the target variable but also how the features relate to each other.
I’m just about to start digging into 8 years of data from a few power plants with 16 turbines in total, to see if I can identify some problems we might have before the sensor measurements exceed the alarm threshold.
Taking bearing temperature as an example, I think I will identify periods where the machine has already been generating for an hour, so temperatures have stabilized, and then use bearing oil inlet temperature and machine load as independent variables, and bearing oil outlet and bearing metal temperatures as dependent variables. It seems like it should be straightforward to find anomalies, but I only started googling how to do this yesterday. There are lots of vendors hawking predictive maintenance software, but I can’t imagine that I couldn’t get similar results with a few weeks of effort, armed with Python and all of the associated libraries.
Maybe try slopes and second derivatives (change in temperature over time and so forth); you could also try introducing various lag windows into the time series data.
edit: I've also seen a lot of pitches about predictive maintenance / automated anomaly detection. I think the appeal lies in having a one-size-fits-all solution you can apply to multiple pieces of equipment (fans, conveyor belt drives, pumps, etc.) without needing to develop/deploy/maintain bespoke models.
A lot of manufacturing sites won't have a data person on tap (or even people who can write Python). There are also challenges with deployment, especially in remote sites where access is difficult, data connectivity is bad, etc. (think oil/gas pipelines). Most of the pitches seem to combine ML models with some kind of IoT device using something like LoRaWAN for connectivity.
Is the product being sold the setup of the bespoke models from their bag of what they’ve done before?
Regressions seem like the obvious way to detect anomalies to me, since the results should be 100% repeatable and make sense according to the amount of heat being generated and removed; how to apply ML/AI to it I am not so sure.
A good first step would be scatterplots, time series plots, and mark 1 eyeballs. It helps to understand the shape of the data before you start trying to fit models.
Right, make some scatter plots of bearing temp vs oil inlet temp and machine load, establish a fitted line, then can detect anomaly when new measurements vary from expected by more than some threshold.
Doesn’t seem that fancy, but better than waiting for a small problem to turn in to a larger problem.
I think it will also be useful in highlighting differences between identical machines, why does the one right beside the other run 5 degrees hotter on thrust bearing? etc
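The fitted-line-plus-threshold idea above can be sketched in a few lines of stdlib Python. All numbers here are made up; real data would need the stabilization filtering and multiple predictors described earlier.

```python
# Fit bearing temperature vs load on known-good history, then flag new
# readings whose residual exceeds k standard deviations of the baseline fit.

def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

load = [10, 20, 30, 40, 50, 60]                     # MW, healthy history
temp = [41.0, 43.1, 44.9, 47.2, 48.8, 51.1]         # bearing temp, deg C
a, b = fit_line(load, temp)

resid = [t - (a + b * l) for l, t in zip(load, temp)]
sigma = (sum(r * r for r in resid) / len(resid)) ** 0.5

def is_anomaly(l, t, k=3.0):
    """True if temperature t at load l departs from the fit by > k sigma."""
    return abs(t - (a + b * l)) > k * sigma

print(is_anomaly(35, 46.0))   # near the fitted line
print(is_anomaly(35, 52.0))   # several degrees hotter than expected
```

The same fit, run per machine, also gives you a principled way to compare the twins: two "identical" turbines whose fitted intercepts differ by 5 degrees is exactly the thrust-bearing question above.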
If you have any kind of functional physical model of the part and of where an eventual failure would occur, you have a huge head start.
That said, there can be a pretty big gap between detecting individual sensor anomalies (undergrad homework) and predicting component failure (build an entire business around it). I have never regretted starting a data project with a small, easy task, and ramping up from there. Whereas I have definitely regretted starting a data project with big goals and/or fancy techniques at the beginning. Set clear incremental goals, and use the early prototyping phases to explore the data and develop a good understanding for what might or might not be possible to accomplish with it.
Having worked in the same problem space, I can heavily recommend to get expert input when evaluating which features to use. Ideally, this is a person who knows the internals of the machinery and/or operations that can help you remove spurious features. As a Data Scientist, one sometimes tends to think that the data explains everything and no expert domain knowledge is needed ("Modern machine translation does work without any knowledge of grammar or language!"). Good luck!
You should look into using generalized additive models (GAMs). They are regression models that allow you to model nonlinear relationships, and even smooth nonlinear interactions between variables, while retaining the benefits of classical regression models like statistically valid confidence intervals and the ability to control for repeated measures. You can also explicitly model periodic behavior, like 24-hour or annual cycles in a predictor variable, and even account for auto-correlation explicitly.
In your example, you could not only pinpoint departure from linearity, but you could get a 95% confidence interval for it.
The best implementation is mgcv in R; pyGAM in Python is OK but lacks many of the more advanced features in mgcv. There's even a more ML-flavored implementation in mboost.
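mgcv and pyGAM are the real tools here; the stdlib toy below (synthetic, noise-free data; knot location assumed known) only illustrates the basis-expansion idea behind a GAM's splines: add a hinge feature max(0, x - knot) and the departure from linearity above the threshold becomes just one more least-squares coefficient.

```python
# Piecewise-linear "spline" via a single hinge basis function:
# y ~ intercept + slope*x + extra*max(0, x - knot)

def solve3(A, b):
    """Gaussian elimination for a 3x3 linear system (enough for this toy)."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [M[r][j] - f * M[i][j] for j in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

knot = 10.0
xs = [float(v) for v in range(21)]
# Linear below the knot, extra slope of 1.5 above it (no noise, for clarity).
ys = [2.0 + 1.0 * x + 1.5 * max(0.0, x - knot) for x in xs]

rows = [[1.0, x, max(0.0, x - knot)] for x in xs]
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
intercept, slope, extra = solve3(AtA, Aty)
print(round(intercept, 3), round(slope, 3), round(extra, 3))  # 2.0 1.0 1.5
```

A real GAM replaces the single hinge with a full spline basis plus a smoothness penalty, and (in mgcv) gives confidence intervals on the fitted curve, which is what lets you put an interval on the departure-from-linearity point in the chemical reaction example.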
This is a case where I have a hard time getting my head round how and why machine learning helps. What models are there available, and what training data do you use? Any background would be appreciated, I use ML for feature detection and image classification, but not yet for regressions.
I'm not a data scientist (my background is engineering); I use Azure ML Studio. The regression feature uses an ensemble of different algorithms; there is an explanation here.
Once the model has run it uses something called a mimic to generate model explainability, which lets you explore things like feature importance etc in the final model. As far as the user interface goes I mostly used SAS in the past and it feels quite similar.