Different financial problems call for different architectures.
For tabular data, meaning fixed-dimension feature vectors such as cross-sectional features for return forecasts, MLPs are the obvious neural-network choice, but tree ensembles (XGBoost, LightGBM) usually match or beat them with less tuning.
For sequence data, such as order book updates or price time series, recurrent networks (LSTM, GRU) and Transformers are the natural fit. Transformers in particular have largely replaced RNNs in NLP and are increasingly common in time-series forecasting, because they process all time steps in parallel during training and connect distant time steps directly through attention, which makes long-range dependencies easier to learn.
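To make the attention mechanism concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It makes simplifying assumptions: the query, key, and value projections are the identity (a real Transformer learns them), there is no positional encoding, and the toy sequence values are made up. Each row of weights depends only on the inputs, not on a recurrent hidden state, which is what lets real implementations compute all rows at once as matrix products.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (a simplification).

    x: list of T vectors (each a list of d floats), e.g. T time steps of d features.
    Returns (outputs, weights); weights[t][s] is how much step t attends to step s.
    """
    d = len(x[0])
    outputs, weights = [], []
    for t, query in enumerate(x):
        # Causal mask: step t may look at steps 0..t only, never at the future.
        scores = [
            sum(q * k for q, k in zip(query, x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the visible steps.
        out = [sum(w[s] * x[s][j] for s in range(t + 1)) for j in range(d)]
        outputs.append(out)
        weights.append(w + [0.0] * (len(x) - t - 1))  # zero weight on future steps
    return outputs, weights

# Toy sequence: 4 time steps, 2 features each (hypothetical values).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
outputs, weights = causal_self_attention(seq)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

In the last row, step 3 puts exactly as much weight on step 0 as on itself, because the two input vectors are identical. The distance between them plays no role, whereas an RNN would have to carry that information through every intermediate hidden state.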
For image-like inputs, such as heat maps of order books or rendered candlestick charts, convolutional networks (CNNs) capture local structure efficiently because the same small filter is reused at every position.
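As an illustration of what "local structure" means here, the sketch below slides one small filter over a toy order-book heat map. The grid values and the hand-picked kernel are hypothetical; in a trained CNN the kernel weights are learned rather than chosen by hand.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation (the operation CNN layers compute), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical heat map: rows = price levels, columns = time steps,
# values = resting size at that level and time.
book = [
    [1, 1, 1, 1, 1],
    [1, 1, 5, 5, 5],
    [1, 1, 5, 5, 5],
    [1, 1, 1, 1, 1],
]

# Time-difference kernel: responds where size jumps between adjacent time steps.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(book, kernel)
for row in feature_map:
    print(row)
# [0, 4, 0, 0]
# [0, 8, 0, 0]
# [0, 4, 0, 0]
```

The filter has only four weights regardless of the size of the heat map, and it fires only in the column where liquidity appears. That weight sharing is the source of the efficiency.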
A pragmatic warning: deep learning shines when you have lots of data and complex non-linear structure. For most quant problems, simpler models (linear, tree-based) are competitive and far more interpretable.
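One way to act on this warning is to fit a simple baseline first and record its out-of-sample score, so that any deep model has a number to beat. The sketch below assumes a single feature and purely synthetic data with a known linear relationship. The seed, sample sizes, and slope are arbitrary choices for illustration, not realistic market values.

```python
import random

def fit_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(ys, preds):
    """R^2 relative to predicting the mean of the evaluated targets."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): true slope 0.5 plus unit-variance noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(400)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

# Chronological split: fit on the first 300 points, evaluate on the last 100.
train_x, test_x = xs[:300], xs[300:]
train_y, test_y = ys[:300], ys[300:]

intercept, slope = fit_ols(train_x, train_y)
preds = [intercept + slope * x for x in test_x]
baseline_r2 = r_squared(test_y, preds)
print(f"slope={slope:.3f} intercept={intercept:.3f} out-of-sample R^2={baseline_r2:.3f}")
```

A more complex model earns its place only if it beats this kind of baseline on the same held-out data by a margin that survives repeated splits.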