Questions and Answers in MRI

Data Prep and Fitting

In machine learning, how do you prepare data and select model parameters for the best fit?
In the prior Q&A we briefly described several machine learning techniques (including logistic regression, cluster analysis, and support vector machines) used to classify and subdivide large data sets. Here we explain several important methods to prepare raw data for analysis and to optimally fit it to a given model.
Selecting a Model
In many quantitative data processing scenarios a straightforward model may be relatively easy to select. As a simple example, one may wish to find a relationship between liver and spleen volumes on MRI. As another, one may wish to classify whether a patient has Alzheimer disease, Parkinson disease, or some other dementia based upon the thickness of cortical gray matter. These types of problems could be amenable to modeling by regression or one of the other shallow learning techniques (support vector machine, cluster analysis, random forest) as described in a prior Q&A.

In other situations, such as reducing motion artifacts in an MR image, the best model may not be at all obvious and a generalized deep learning approach (such as a convolutional neural network) is needed. Here the network may be allowed to find its own optimal solution by analyzing training data that has been partially or fully labeled.
Data Preparation
The first recommended step is to normalize the data. This is typically done by zero-centering it along each dimension (subtracting the mean of each feature), then dividing by the standard deviation, so that the data has a mean of 0 and standard deviation of 1 along each dimension.
[Figure: Data preprocessing]
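The normalization step described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `standardize`, the small constant `eps`, and the simulated organ volumes are assumptions made for the example and do not come from any particular software package.

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Zero-center each feature (column), then scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # eps (an assumed safeguard) avoids division by zero for constant features
    return (X - mean) / (std + eps), mean, std

rng = np.random.default_rng(0)
# Hypothetical data: 100 cases x 2 features (e.g., liver and spleen volumes in mL)
X = rng.normal(loc=[1500.0, 200.0], scale=[300.0, 60.0], size=(100, 2))

X_norm, mean, std = standardize(X)
print(X_norm.mean(axis=0))  # close to 0 for each feature
print(X_norm.std(axis=0))   # close to 1 for each feature
```

The mean and standard deviation are returned so that the same values, computed from the training data, can later be applied to validation data.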

Initializing Model Parameters
Prior to training the network, initial values of network parameters such as the weights and biases for each neuron must be selected. One popular choice (a version of the Xavier initialization) is to set starting parameters with values drawn randomly from a Gaussian distribution of mean = 0 and standard deviation = √(2/n), where n = the number of inputs + outputs for each neuron. An even better method may be the He initialization (optimized for ReLU activations), where n = the number of inputs only.
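As a rough sketch, the two initialization rules just described might be written as follows for one fully connected layer. The function names and layer sizes are hypothetical, and starting the biases at zero is a common convention assumed here rather than something stated above.

```python
import numpy as np

def xavier_normal(n_in, n_out, rng):
    """Gaussian weights with std = sqrt(2 / (inputs + outputs))."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

def he_normal(n_in, n_out, rng):
    """Gaussian weights with std = sqrt(2 / inputs), intended for ReLU layers."""
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_in, n_out))

rng = np.random.default_rng(0)
n_in, n_out = 512, 256            # hypothetical layer sizes
W_xavier = xavier_normal(n_in, n_out, rng)
W_he = he_normal(n_in, n_out, rng)
b = np.zeros(n_out)               # assumption: biases start at zero

print(W_xavier.std(), np.sqrt(2.0 / (n_in + n_out)))
print(W_he.std(), np.sqrt(2.0 / n_in))
```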
In truth, AI application developers seldom start with such highly randomized initialization parameters.  Instead, they often employ a transfer learning approach, copying weights and biases from other tested networks as a starting point.  This approach can significantly reduce training time for the new network being designed.
Training
After selecting a model and preparing the raw data, training of the network must take place.  For image-based applications, a supervised learning technique is most commonly used. The data scientists will likely have at their disposal a reasonably large number of cases (usually in the thousands or higher) that contain images each with an independently verified label serving as "ground truth".  For example, the cases might all be MR images of brain tumors together with a binary label (e.g., benign or malignant).  
Armed with this labeled data set, the investigators will set aside a relatively small proportion (10-15%) of the whole to be used as a validation set after the network has been fully trained (a held-out set used only after training is often called a test set). The other labeled samples (85-90%) will form the training set. For computational efficiency, this training data is usually divided into batches (typically of size 32, 64, 128, or another power of 2). After each training batch is passed through the network, the predicted output from the network is compared to the table of ground truths and a loss function (error score) is calculated. Weighting factors and parameters of the network are then modified in an iterative fashion to minimize the loss function. Many such batch iterations are usually needed before every training example has been fed through the network once. At this point an epoch is said to have been completed. To fully train the network many epochs are needed, often in the hundreds or thousands, which can take hours or days even on fast computers.
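The split, batch, and epoch bookkeeping described above is sketched below. To keep the example short, a single-neuron logistic classifier trained on simulated data stands in for a real image network; the data, learning rate, and number of epochs are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled data: 1000 cases, 5 features, binary ground-truth label
X = rng.normal(size=(1000, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w + 0.1 * rng.normal(size=1000) > 0).astype(float)

# Hold out 15% of the cases; the remaining 85% form the training set
idx = rng.permutation(len(X))
n_val = int(0.15 * len(X))
val_idx, train_idx = idx[:n_val], idx[n_val:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

def predict(X, w, b):
    """Sigmoid output of a single-neuron (logistic) model."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

w, b = np.zeros(X.shape[1]), 0.0
batch_size, n_epochs, lr = 64, 50, 0.1   # assumed settings

for epoch in range(n_epochs):
    order = rng.permutation(len(X_train))        # reshuffle cases each epoch
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        p = predict(X_train[batch], w, b)
        err = p - y_train[batch]                 # gradient of cross-entropy loss w.r.t. the pre-sigmoid output
        w -= lr * X_train[batch].T @ err / len(batch)
        b -= lr * err.mean()
    # one epoch is complete once every training case has passed through once

val_acc = ((predict(X_val, w, b) > 0.5) == y_val).mean()
print(f"held-out accuracy: {val_acc:.3f}")
```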
The Loss Function
The Loss Function, also sometimes known as the Cost or Error Function, is a measure of the level of error in a network's output when using a certain set of internal parameters (such as weights and biases).  Two loss functions are commonly used:
Mean Squared Error (MSE) Loss. This is the average of the squared differences between the expected and predicted values, given by the formula
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where y_i is the expected value, ŷ_i is the predicted value, and n is the number of samples.
MSE Loss functions are typically used when the outcome is numeric, such as a distance measurement or volume. This is the metric by which linear regression is optimized.

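Cross-Entropy (CE) Loss. Also known as Logistic Loss or Log Loss, this method penalizes errors using a non-linear function (logarithm). When the difference between expected and predicted values is small, the penalty is small; but when the difference is large, the penalty can be enormous. The CE Loss formula for a binary classification system is written as
$$\mathrm{CE} = -\frac{1}{n}\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right]$$
where y_i is the true label (0 or 1), ŷ_i is the predicted probability that the label is 1, and n is the number of samples.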
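Both loss functions are short enough to sketch directly. The function names and the clipping constant `eps` (which keeps the logarithm finite) are assumptions made for this example.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Average squared difference between expected and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy; predictions are clipped so log(0) never occurs."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 1.0, 1.0])                  # true labels
good = binary_cross_entropy(y, np.full(3, 0.9))    # confident and correct
bad = binary_cross_entropy(y, np.full(3, 0.01))    # confident and wrong

print(f"CE, good predictions: {good:.3f}")
print(f"CE, bad predictions:  {bad:.3f}")
print(f"MSE example: {mse_loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])):.3f}")
```

Comparing the two CE values shows how steeply the logarithm penalizes confident but wrong predictions.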
Overfitting and Underfitting
How many parameters are needed to optimally fit a set of data? As all data sets vary, there is no general answer to this question, but the problems with overfitting and underfitting can be appreciated in the diagrams below:
[Figure: diagrams illustrating underfitting and overfitting]
In underfitting, the model has an insufficient number of parameters to fully learn the data. In overfitting, the model is too complex and contains too many parameters. The computational burden is excessive, and even though the model fits the sample data with high precision, it is so highly optimized for the particular training data that it will not generalize well to other data sets.
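One simple way to see these effects is to fit polynomials of increasing degree to a small noisy sample and compare the error on the training points with the error on fresh points. The sketch below uses simulated data (a sine curve plus noise), and the degrees 1, 3, and 10 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: a sine curve sampled with Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + 0.2 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(15)    # small training sample
x_test, y_test = make_data(200)     # fresh data from the same process

results = {}
for degree in (1, 3, 10):           # too few, moderate, and many parameters
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The training error can only fall as parameters are added, so it is the gap between training and test error that signals overfitting.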
Regularization
One possible solution to overfitting would be to collect more training data points. However, this is time-consuming and may not be practical if training data is hard to generate. Another option would be to reduce the number of model parameters, but this is a blunt instrument and it is often not clear at the outset how many variables will actually need to be used.
Rather than directly manipulating the number of parameters, a more robust method is to restrict the values that the parameters can take, a process known as regularization. The most commonly used technique is known as L2 regularization, which adds an additional term to the loss function of the form λ(Σw²), where λ is a user-defined constant known as the regularization strength and Σw² is the sum of squared values of the calculated weights at each step of training.
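A minimal sketch of L2 regularization, assuming a linear model with an MSE loss and plain gradient descent, is given below. The function names, simulated data, learning rate, and step count are all assumptions for illustration.

```python
import numpy as np

def l2_regularized_mse(w, X, y, lam):
    """MSE data loss plus the L2 penalty lam * sum(w^2)."""
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def l2_regularized_gradient(w, X, y, lam):
    """Gradient of the loss above with respect to the weights."""
    return 2.0 * X.T @ (X @ w - y) / len(y) + 2.0 * lam * w

def fit(X, y, lam, lr=0.05, n_steps=2000):
    """Plain gradient descent on the regularized loss (assumed settings)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * l2_regularized_gradient(w, X, y, lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                  # hypothetical: 50 cases, 10 features
y = X[:, 0] + 0.5 * rng.normal(size=50)        # only the first feature matters

w_plain = fit(X, y, lam=0.0)
w_reg = fit(X, y, lam=1.0)
print("sum of squared weights, lam = 0:", np.sum(w_plain ** 2))
print("sum of squared weights, lam = 1:", np.sum(w_reg ** 2))
```

Raising λ shrinks the weights toward zero, which is the sense in which regularization restricts the values the parameters can take.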
A second and very useful regularization method applicable to most multilayer networks is known as dropout. As the name implies, various nodes and their outputs are randomly ignored or temporarily dropped out of the model during training. Typical dropout rates are about 20% of input nodes and 20-50% of intermediate nodes. (Output nodes do not participate.)
Dropout has several direct and indirect effects on the network: 1) it means that the structure of the network will randomly change from time to time, forcing nodes to be more independent; 2) it introduces noise, making the weight calculations less dependent on a particular configuration of training data, thus reducing overfitting; and 3) it may break apart co-dependent nodes that adapt to fix mistakes committed by other nodes in prior layers, thus making the model more robust. 
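A sketch of dropout applied to one layer's outputs follows. It uses the so-called inverted dropout convention, in which the surviving outputs are rescaled during training so that nothing needs to change at inference time; that convention, the function name, and the 50% rate are assumptions for this example rather than details given above.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of node outputs during training."""
    if not training or rate == 0.0:
        return activations                     # all nodes participate at inference
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Inverted dropout (assumed convention): rescale survivors by 1/keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((4, 1000))                    # hypothetical intermediate-layer outputs
dropped = dropout(hidden, rate=0.5, rng=rng)

print("fraction of nodes dropped:", np.mean(dropped == 0.0))
print("mean output after dropout:", dropped.mean())
```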
Batch Normalization
Batch normalization is a procedure applied to rescale outputs of intermediate layer neurons during the course of training so that, within each batch, their values have a mean of 0 and standard deviation of 1. Recall that one of the first steps in data preparation was to normalize the input data to the network so that each first layer neuron could expect to receive a consistent range of information. But deeper layers in the network have no such guarantee about what their inputs may be (sometimes extremely large or extremely small). They become highly dependent on earlier nodes. By renormalizing inputs to nodes at intermediate layers, training speeds can be markedly improved. Because it restricts large swings in intermediate data, batch normalization also provides a modest degree of regularization.
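The forward pass of batch normalization during training can be sketched as below. The learnable scale and shift parameters (`gamma`, `beta`) follow the Ioffe and Szegedy paper listed in the references and go slightly beyond the description above; the function name, `eps`, and the simulated activations are assumptions.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply a scale and shift."""
    mean = x.mean(axis=0)                  # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # gamma = 1, beta = 0 leaves x_hat unchanged

rng = np.random.default_rng(0)
# Hypothetical intermediate-layer outputs: batch of 64 cases, 8 neurons,
# with a large offset and spread
x = rng.normal(loc=50.0, scale=20.0, size=(64, 8))

out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0))   # close to 0 for each neuron
print(out.std(axis=0))    # close to 1 for each neuron
```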

 References   
     He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. arXiv:1502.01852v1 (6 Feb 2015) - describes an initialization technique optimized for networks with ReLU activations.
     Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167v3 (2 Mar 2015)
     Karpathy A. CS231n Convolutional Neural Networks for Visual Recognition. Linear Classifiers: Support Vector Machine, Softmax. Stanford University, 2016.
     Karpathy A. CS231n Convolutional Neural Networks for Visual Recognition. Neural Networks Part 1: Setting up the Architecture. Stanford University, 2016.
     Karpathy A. CS231n Convolutional Neural Networks for Visual Recognition. Neural Networks Part 2: Setting up the Data and the Loss. Stanford University, 2016.
     Srivastava N, Hinton GE, Krizhevsky A, et al.  Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 2014; 15:1929–1958.

Related Questions
     What are the various types of deep networks and how are they used?  

© 2024 AD Elster, ELSTER LLC
All rights reserved.   