I study inverse probability weighted M-estimation under a general missing data scheme. Cases covered here that have not previously appeared in the literature include M-estimation with missing data due to a censored survival time, propensity score estimation of the average treatment effect for linear exponential family quasi-log-likelihood functions, and variable probability sampling with observed retainment frequencies. I extend an important result known to hold in special cases: estimating the selection probabilities is generally more efficient than using the known selection probabilities, even when the latter are available. For the treatment effect case, the setup allows for a simple characterization of a double robustness result due to Scharfstein, Rotnitzky, and Robins (1999): given appropriate choices for the conditional mean function and quasi-log-likelihood function, only one of the conditional mean or the selection probability needs to be correctly specified in order to consistently estimate the average treatment effect.
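
As a rough illustration of the estimator class described above, the inverse probability weighted objective can be sketched as follows. The notation is an assumption introduced here and does not come from the abstract: s_i is a selection indicator (1 if observation i is fully observed), z_i are the variables that determine selection, p(z_i, \gamma) is a parametric model for the selection probability, and q(w_i, \theta) is the objective function of the M-estimator.

```latex
% Sketch only: symbols s_i, z_i, w_i, p, q, \gamma, \theta are assumed notation.
% IPW M-estimator with estimated selection probabilities:
\hat{\theta}
  = \arg\min_{\theta \in \Theta}
    \sum_{i=1}^{N} \frac{s_i}{p(z_i, \hat{\gamma})} \, q(w_i, \theta)

% Why the weighting works, assuming P(s_i = 1 \mid w_i, z_i) = p(z_i, \gamma_0) > 0:
\mathrm{E}\!\left[ \frac{s_i}{p(z_i, \gamma_0)} \, q(w_i, \theta) \right]
  = \mathrm{E}\!\left[
      \frac{\mathrm{E}(s_i \mid w_i, z_i)}{p(z_i, \gamma_0)} \, q(w_i, \theta)
    \right]
  = \mathrm{E}\!\left[ q(w_i, \theta) \right]
```

Under these assumptions the weighted objective has the same population minimizer as the full-data objective. The efficiency comparison in the abstract is between using the estimate \hat{\gamma} in the weights and using the true value \gamma_0.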