Data and Method

Data

We used Wharton Research Data Services (WRDS) for our data sets. All data on director and CEO compensation and demographics came from Execucomp, and firm accounting variables and other firm-specific information came from Compustat. We cross-referenced multiple studies to identify determinants of CEO and director compensation and of firm performance. Our sample comprised firms in the S&P 500 from 2010 to 2019. We chose this window to avoid the corporate scandals of the early 2000s, the 2008-2009 financial crisis, and the impact of COVID-19 on the market from 2020 onward.

BOD Compensation and Determinants

We defined director compensation as total compensation, which includes cash, the value of stock awards and option awards, non-equity incentive plan compensation, the change in pension value and nonqualified deferred compensation earnings, and all other compensation.

We used accounting-based firm measures and market performance as determinants for predicting Board of Directors (BOD) compensation. The accounting variables and the compensation component variables (but not total compensation) are lagged by one year, on the assumption that current total compensation reflects the previous year's performance. Accounting metrics focus on internal firm performance, so they are less sensitive to equity market swings; this limits how much a strong market can drive predicted compensation, since a rising market can inflate firm values even when the firm is not performing well internally. We also included market performance determinants because BOD packages usually contain some equity incentives (Dah and Frye).
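The one-year lag can be sketched in pandas as below. This is a minimal illustration, not the code used in the study: the identifiers gvkey and year and the example columns are hypothetical. It merges each firm-year with the same firm's prior-year record, so a firm with a missing year gets a NaN lag rather than a value from two years earlier (which a plain groupby().shift() would produce).

```python
import pandas as pd


def lag_by_firm(df: pd.DataFrame, cols: list[str],
                firm_col: str = "gvkey", year_col: str = "year") -> pd.DataFrame:
    """Attach one-year lags of `cols`, matched on (firm, year - 1).

    Merging on the shifted year (instead of groupby().shift()) leaves the
    lag as NaN when a firm is missing the prior year.
    Assumes each (firm, year) pair appears at most once.
    """
    prev = df[[firm_col, year_col] + cols].copy()
    prev[year_col] = prev[year_col] + 1  # last year's values keyed to this year
    prev = prev.rename(columns={c: f"{c}_lag1" for c in cols})
    return df.merge(prev, on=[firm_col, year_col], how="left")


if __name__ == "__main__":
    panel = pd.DataFrame({
        "gvkey": [1, 1, 1, 2, 2],
        "year": [2010, 2011, 2013, 2010, 2011],
        "net_income": [10.0, 12.0, 15.0, 5.0, 6.0],
        "total_comp": [100.0, 110.0, 120.0, 50.0, 55.0],
    })
    # Lag the determinant but leave the dependent variable (total_comp) unlagged
    print(lag_by_firm(panel, ["net_income"]))
```

In this example, firm 1's 2013 row gets no lag because 2012 is absent from the panel.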

CEO Compensation and Determinants

The CEO variables, except total compensation, were also lagged by one year, since we assume past compensation is an indicator of future compensation. The CEO determinants are a combination of stock awards and four variables we created: Ownership Ratio, Ownership Power, Years Served, and Prestige Power.

The Ownership Ratio is CEO Stock Awards divided by BOD Stock Awards. This variable represents the equity power a CEO holds over the Board of Directors: a CEO who receives more equity than the board is assumed to hold more power over the company and to have greater influence on the board. Ownership Power is a binary categorical variable indicating whether the CEO's Ownership Ratio is above the median Ownership Ratio. We predict that CEOs with a higher ownership ratio have more control over the firm and thus more influence over their own pay.

Years Served is the difference between the recorded year and the year the CEO joined the firm. CEOs who serve longer terms are more experienced and more likely to receive higher pay. We derived Prestige Power from Years Served: it is a binary categorical variable indicating whether time served is above the median, which implies that a CEO can gain prestige power during their term (Bouteska and Mefteh-Wali).
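A minimal pandas sketch of the four created CEO variables follows. The column names (ceo_stock_awards, bod_stock_awards, year, ceo_start_year) are hypothetical, and the NaN and 'inf' handling mirrors the rules listed under Created Variables: 'inf' is capped at the largest finite ratio and NaN is set to zero.

```python
import numpy as np
import pandas as pd


def add_ceo_power_vars(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the created CEO variables (hypothetical column names)."""
    out = df.copy()

    # Ownership Ratio: CEO stock awards relative to BOD stock awards.
    # pandas returns inf for x/0 (x > 0) and NaN for 0/0 instead of raising.
    ratio = out["ceo_stock_awards"] / out["bod_stock_awards"]
    finite_max = ratio[np.isfinite(ratio)].max()
    ratio = ratio.replace(np.inf, finite_max).fillna(0.0)
    out["ownership_ratio"] = ratio

    # Ownership Power: 1 if the ratio is above the sample median, else 0
    out["ownership_power"] = (ratio > ratio.median()).astype(int)

    # Years Served: observation year minus the year the CEO joined the firm
    out["years_served"] = out["year"] - out["ceo_start_year"]

    # Prestige Power: 1 if tenure is above the sample median, else 0
    tenure = out["years_served"]
    out["prestige_power"] = (tenure > tenure.median()).astype(int)
    return out


if __name__ == "__main__":
    ceos = pd.DataFrame({
        "ceo_stock_awards": [100.0, 0.0, 50.0, 30.0],
        "bod_stock_awards": [10.0, 0.0, 0.0, 30.0],
        "year": [2015, 2016, 2017, 2018],
        "ceo_start_year": [2010, 2015, 2000, 2017],
    })
    print(add_ceo_power_vars(ceos))
```

Using the median as the cutoff keeps the single capped 'inf' value from shifting the threshold, which is the reason the text gives for preferring it over the mean.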

Determinants of Firm Performance

Sigo's review explores a wide variety of factors that contribute to firm performance: profitability performance, growth performance, market value performance, customer satisfaction, employee satisfaction, environmental performance, environmental audit performance, corporate governance performance, and social performance. Although the list is extensive, the review does not calculate any firm performance measure. For our analysis, we focused on profitability, growth, and market value performance, because these measures are available in our data (Compustat, supplemented by Yahoo Finance for price-based measures) and are quantifiable, which our score calculation requires. All of these determinants are later used to predict firm performance.

Exploratory Data Analysis

After querying the data from WRDS, we performed an exploratory data analysis (EDA) on our data frames. The EDA led us to drop variables we deemed unnecessary, impute missing (NaN) values, and create new variables.

The data sets queried from WRDS were very extensive. We did not need all of their variables, so we selected those we judged pertinent to our project based on the literature we reviewed, and we renamed them with more descriptive labels. We handled missing data differently depending on the variable. For variables we did not create, we imputed missing values with that variable's mean within the firm's market value bin; for example, missing values for small firms were never filled with means computed from large firms. The variables we created also needed cleaning, because some of the divisions produced NaN and 'inf' values: NaN results from dividing zero by zero (or from a missing input), and 'inf' results from dividing a nonzero value by zero. We set the NaN results to zero and capped the 'inf' results at the maximum finite value of that variable.
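Bin-wise mean imputation can be sketched with a grouped transform, assuming a hypothetical firm_size column that holds each firm-year's market value bin. Because the mean is computed within each bin, a small firm's missing value is never filled with a large-firm mean.

```python
import numpy as np
import pandas as pd


def impute_by_bin(df: pd.DataFrame, cols: list[str], bin_col: str = "firm_size") -> pd.DataFrame:
    """Fill missing values in `cols` with the mean of the same firm-size bin."""
    out = df.copy()
    for col in cols:
        # transform("mean") returns each row's bin mean; NaNs are skipped in the mean
        bin_means = out.groupby(bin_col)[col].transform("mean")
        out[col] = out[col].fillna(bin_means)
    return out


if __name__ == "__main__":
    firms = pd.DataFrame({
        "firm_size": ["Small", "Small", "Large", "Large"],
        "net_income": [1.0, np.nan, 100.0, np.nan],
    })
    # The missing small-firm value becomes 1.0, the missing large-firm value 100.0
    print(impute_by_bin(firms, ["net_income"]))
```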

Of the two CEO total compensation measures, we kept TDC1, which values the options granted to the CEO using the Black-Scholes model.

Variables

Director Compensation Variables:

Dependent Variable (Target)

Total Director Compensation

Independent Variables (Features)

Numeric Data: components of the BOD compensation package and firm performance measures

Other Compensation: All other nontraditional compensation
Non-Equity Incentives: Incentive plan compensation that does not include equity
Cash Fees: Fees earned or paid in cash to the BOD
Stock Awards: Company stock awarded to the BOD
Option Awards: Value of the option awards given to the BOD
Total Other Compensation: As reported in SEC filings
Total Non-Equity Incentives: As reported in SEC filings
Total Cash Fees: As reported in SEC filings
Total Stock Awards: As reported in SEC filings
Total Stock Options: As reported in SEC filings

Market Value of the Firm

Total Fiscal Market Value: Total market value of the firm at fiscal year-end

Liquidity (Current Ratio): Current Assets / Current Liabilities

Net Income

Number of Employees

Debt to Equity Ratio: Total Debt / Total Shareholder Equity

Assets in Place

Capex by Assets

Return on Equity: Net Income / Shareholder Equity

Categorical Variables

Firm Size: Determined by market value of the firm

CEO Compensation Variables:

Dependent Variable (Target)

Total CEO Compensation

Independent Variables (Features)

Created Variables

Ownership Ratio: CEO Stock Awards / BOD Stock Awards. If the ratio is 'inf' (BOD Stock Awards = 0 while CEO Stock Awards > 0), we replaced it with the highest finite ratio (406.17953). If the ratio is 'NaN' (CEO and BOD Stock Awards both = 0), we replaced it with zero.
Ownership Power: 1 if the Ownership Ratio is above the median of the data set, 0 otherwise. We used the median so that outliers in the ratio would not overly influence our analysis.
Years Served: The number of years the CEO has served at the firm
Prestige Power: 1 if Years Served is above the median of the data set, 0 otherwise. We used the median so that outliers in tenure would not overly influence our analysis.

Numeric Data

CEO Age
Stock Awards: Value of the stock awards given to the CEO
Market Value of the Firm

Categorical Data

Firm Size: Determined by market value of the firm

Firm Performance Variables:

Return on Assets
EBITDA Margin
Net Income / Revenue
Return on Equity
Earnings per share
Change in stock price
Dividend yield

Volatility
Market value added
Asset growth
Total revenue growth
Net income growth
Employee growth

Methodology

The Process

Most prior research on executive compensation and firm performance has examined either CEO or director compensation on its own; the two are rarely studied together, which distinguishes our research. Our process uses a ridge regression model to estimate an expected value for director and CEO compensation from their determinants while controlling for firm size and year. The expected value is then compared with actual compensation to determine whether an executive was over- or undercompensated in that period. For both types of executives, we correlate this over/undercompensation variable with firm performance after accounting for our control variables. We then look at the average performance of firms within each of the four cases below.

CEO and Directors are both overcompensated.
CEO is overcompensated and Directors are undercompensated.
CEO is undercompensated and Directors are overcompensated.
CEO and Directors are both undercompensated.

These four cases are then correlated with firm performance after accounting for controls.

Method

Regression Analysis for Director and CEO Compensation

The regressions for director and CEO compensation were run separately, but the overarching process is the same.

Our research indicated that firms of different sizes have different payout structures for their executives, so we split the firms into three size categories and treated each separately in the regressions. Small firms have a market capitalization of at most $10 billion, medium firms at most $200 billion, and large firms more than $200 billion. Within each bin, we ran a ridge regression of director/CEO total compensation (dependent variable) on the determinants of compensation (independent variables). We used GridSearchCV to tune each model's alpha and K values. After many iterations, we kept the best-fitting parameter set (best_pipe) for each bin and saved those parameters to a separate file for each firm size.
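The size bins can be expressed with pandas.cut, as in the sketch below. It assumes market value is already in billions of dollars (Compustat market value is reported in millions, so such values would need dividing by 1,000 first) and treats each upper bound as inclusive, matching "at most $10 billion"; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def assign_size_bin(market_value_bn: pd.Series) -> pd.Series:
    """Map market value (in $ billions) to the three size bins.

    Bins are right-inclusive: (-inf, 10] Small, (10, 200] Medium, (200, inf) Large.
    """
    return pd.cut(
        market_value_bn,
        bins=[-np.inf, 10, 200, np.inf],
        labels=["Small", "Medium", "Large"],
        right=True,
    )


if __name__ == "__main__":
    values = pd.Series([5.0, 10.0, 50.0, 200.0, 250.0])
    print(assign_size_bin(values).tolist())
```

Each bin's rows can then be passed to its own grid search, so no model ever mixes firms of different sizes.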

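import numpy as np
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: fill missing values with the column mean, then standardize
numer_pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
# Categorical column: one-hot encode, ignoring categories unseen during training
cat_pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"))

preproc_pipe = make_column_transformer(
    (numer_pipe, make_column_selector(dtype_include=np.number)),
    (cat_pipe, ['gender']),
    remainder="drop",
)

ridge_pipe = Pipeline([
    ('preprocessor', preproc_pipe),
    ('ridge', Ridge())
])

# Initial coarse grid of regularization strengths
alphas = list(np.linspace(0, 300, 25))
parameters = {'ridge__alpha': alphas}

# Cross-validation splitter (assumed 5-fold; `cv` was not defined in the original snippet)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

grid_search = GridSearchCV(estimator=ridge_pipe,
                           param_grid=parameters,
                           cv=cv,
                           scoring='r2',
                           error_score='raise')

# X_train / y_train: the training split for one firm-size bin
results = grid_search.fit(X_train, y_train)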

We used this code as a starting point for finding a suitable alpha for each size group.
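The snippet above tunes only alpha. A sketch of a joint search over alpha and K follows, assuming K is the number of features kept by a SelectKBest step (the text does not say how K was defined). The function name tune_ridge, the f_regression scoring function, the 5-fold KFold splitter, and the synthetic data from make_regression are illustrative choices; best_pipe here is simply the fitted pipeline GridSearchCV exposes as best_estimator_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_ridge(X, y, k_values, alphas, n_splits=5, seed=0):
    """Grid-search K (features kept) and alpha (ridge penalty) by cross-validated R^2."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_regression)),  # keep the K top-scoring features
        ("ridge", Ridge()),
    ])
    grid = {"select__k": list(k_values), "ridge__alpha": list(alphas)}
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid=grid, cv=cv, scoring="r2")
    search.fit(X, y)
    return search.best_estimator_, search.best_params_, search.best_score_


if __name__ == "__main__":
    X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)
    best_pipe, params, score = tune_ridge(X, y, k_values=[5, 10, 20],
                                          alphas=np.logspace(-4, 3, 8))
    print(params, round(score, 3))
```

Because SelectKBest sits inside the pipeline, feature selection is refit on each training fold, so the cross-validated R² is not inflated by choosing features on the full sample.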

Despite this tuning, the director model could not fit firms in the "huge" bin with any useful accuracy. This suggests that different determinants govern director compensation at the very largest firms. If we were to rerun the analysis, we would research alternative determinants for these firms. For this study, we omitted the bin because the model could not make meaningful predictions for it.

Director Results

Firm Size	R2	K	Alpha
Small	0.469471	79	202
Medium	0.618721	95	100
Large	0.227747	79	0.0001

CEO Results

Firm Size	R2	K	Alpha
Small	0.731653	96	0.01
Medium	0.68396	96	19
Large	0.887815	85	559

To prevent overfitting, these variables were omitted for the large firm bin: ['total_curr', 'salary', 'bonus', 'stock_awards', 'option_awards', 'othcomp']

Overpayment Predictions

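After fitting our models on the training set, we predicted compensation for the firm-years in our holdout data. We then created the over/undercompensation variable as the ratio of actual to predicted pay, so that a value above 1 indicates overcompensation and a value below 1 indicates undercompensation: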

Overpayment = Actual Pay / Predicted Pay

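The following code, shown for CEOs at small firms, was repeated for CEOs at medium and large firms and for directors in all three size bins.

import pandas as pd

small_ceo_df = pd.read_csv('../Saved/small_ceo_df.csv')            # actual compensation
pred_small_ceo_df = pd.read_csv('../Saved/pred_small_ceo_df.csv')  # model predictions
# Attach predictions by row position (both files share the same row order)
small_ceo_df['prediction'] = pred_small_ceo_df['prediction'].to_numpy()
# Actual TDC1 / predicted: > 1 overcompensated, < 1 undercompensated
small_ceo_df['over_under_comp'] = small_ceo_df['tdc1'] / small_ceo_df['prediction']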

Firm Performance Score

We needed a measure of firm performance to determine how over- and undercompensation affects it. Based on our initial research, we were originally going to use Tobin's Q as our measure, but reviewing *Determinants of Firm Performance: A Subjective Model (Sigo, 2020)* prompted us to take the analysis one step further. Because the review outlines many different factors that impact firm performance, we decided to create our own performance score for each firm in each year and compare that score to the compensation variables we had calculated previously. The process for doing that involved the following:

Determine measures relevant to firm performance (see Firm Performance Variables above)

Sigo groups firm performance measures into nine categories, ranging from Profitability Performance to Social Performance, and breaks each category down into specific measures. We chose the measures that were available in our data.

Pulled the necessary accounting variables from the Compustat Annual Fundamentals dataset
Created methods to calculate the measures from the accounting variables
Created a dataframe that contained each (Firm, Year) and its firm performance metrics (calculated by applying the previously-created functions to each firm's accounting variables in that year)

For measures such as stock price performance and volatility, data was pulled from Yahoo Finance
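The measure-calculation step can be sketched as a single function over a Compustat-style panel. This assumes the Compustat Annual Fundamentals mnemonics ni, at, revt, ebitda, ceq, and emp, together with the tic and fyear identifiers used elsewhere in this section; it covers only a subset of the listed measures, and growth rates are left missing when the prior fiscal year is absent.

```python
import pandas as pd


def performance_measures(df: pd.DataFrame) -> pd.DataFrame:
    """Compute a few firm performance measures per (tic, fyear).

    Assumed Compustat columns: ni (net income), at (total assets),
    revt (total revenue), ebitda, ceq (common equity), emp (employees).
    """
    out = df.sort_values(["tic", "fyear"]).copy()

    # Profitability measures (ratios within the same firm-year)
    out["roa"] = out["ni"] / out["at"]
    out["ebitda_margin"] = out["ebitda"] / out["revt"]
    out["net_margin"] = out["ni"] / out["revt"]
    out["roe"] = out["ni"] / out["ceq"]

    # Growth measures: change versus the immediately preceding fiscal year only
    g = out.groupby("tic")
    consecutive = g["fyear"].diff() == 1
    for col, name in [("at", "asset_growth"), ("revt", "revenue_growth"),
                      ("ni", "net_income_growth"), ("emp", "employee_growth")]:
        prev = g[col].shift(1)
        out[name] = (out[col] / prev - 1).where(consecutive)
    return out


if __name__ == "__main__":
    panel = pd.DataFrame({
        "tic": ["A", "A", "A", "B"], "fyear": [2010, 2011, 2013, 2010],
        "ni": [10.0, 12.0, 15.0, 5.0], "at": [100.0, 120.0, 150.0, 50.0],
        "revt": [200.0, 240.0, 300.0, 100.0], "ebitda": [30.0, 36.0, 45.0, 15.0],
        "ceq": [50.0, 60.0, 75.0, 25.0], "emp": [1000.0, 1100.0, 1200.0, 500.0],
    })
    print(performance_measures(panel)[["tic", "fyear", "roa", "asset_growth"]])
```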

Assign weights to each of the measures for the overall performance score calculation

We regressed Tobin's Q, a common indicator of firm performance, on the determinants of firm performance
Once the regression was fit, we used the absolute values of its coefficients, rescaled to sum to 1, as our weights

Calculate performance score for each (Firm, Year)

Using StandardScaler(), data is standardized by removing the mean and scaling to unit variance
The standardized values are multiplied by their corresponding weights, and each row is summed to give the overall score

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# firm_performance: one row per (tic, fyear) with the performance measures and tobinsQ
id_cols = ['tic', 'fyear']
metric_cols = [c for c in firm_performance.select_dtypes(include=np.number).columns
               if c not in ('fyear', 'tobinsQ')]

# Fit the weighting regression only on firm-years where Tobin's Q is available
fit_df = firm_performance.dropna(subset=['tobinsQ'])
X, y = fit_df[metric_cols], fit_df['tobinsQ']

rng = np.random.RandomState(0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)

numer_pipe = make_pipeline(SimpleImputer(), StandardScaler())
preproc_pipe = ColumnTransformer(
    [("num_impute", numer_pipe, make_column_selector(dtype_include=np.number))],
    remainder='drop',
)
linear_pipe = make_pipeline(preproc_pipe, LinearRegression())
results = linear_pipe.fit(X_train, y_train)

# Weights: absolute coefficients rescaled to sum to 1 (same order as metric_cols)
coefficients = linear_pipe.named_steps['linearregression'].coef_
coef_df = pd.DataFrame({'metric': metric_cols, 'weight': coefficients})  # signed coefficients
weights = np.abs(coefficients) / np.sum(np.abs(coefficients))
weight_df = pd.DataFrame({'metric': metric_cols, 'weight': weights})
weight_dict = dict(zip(metric_cols, weights))

# Standardize every firm-year's measures, weight them, and sum across each row
# (only weighted measures are summed, so tobinsQ itself never enters the score)
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(firm_performance[metric_cols]),
                      columns=metric_cols, index=firm_performance.index)
performance_score = firm_performance[id_cols].copy()
performance_score['Performance Score'] = (scaled * pd.Series(weight_dict)).sum(axis=1)

The weights used are reflected in the following graph:

As mentioned above, these weights come from a linear regression: we took the absolute values of the resulting coefficients and rescaled them so that they sum to 1.
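Written out as equations, the weighting and scoring steps amount to the following, where β_j is the fitted coefficient on measure j, p is the number of measures, x̄_j and s_j are the mean and standard deviation of measure j across firm-years, and z_{ij,t} is the standardized value of measure j for firm i in year t (this notation is ours, introduced only for illustration):

$$
w_j = \frac{\lvert \beta_j \rvert}{\sum_{k=1}^{p} \lvert \beta_k \rvert},
\qquad
z_{ij,t} = \frac{x_{ij,t} - \bar{x}_j}{s_j},
\qquad
\text{Score}_{i,t} = \sum_{j=1}^{p} w_j \, z_{ij,t}
$$

Because the weights are absolute values, a measure with a negative coefficient still receives a positive weight, so higher values of that measure raise the score even though they were associated with a lower Tobin's Q in the regression.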

Once the performance scores had been calculated, we correlated them with the over/undercompensation variables for each firm year, grouping by size category. The relationship was then graphed on a scatterplot for each of the size bins for both CEOs and Directors.
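A minimal sketch of the per-bin correlation is shown below, assuming a merged DataFrame with columns 'Firm Size', 'Performance Score', and an over/undercompensation ratio such as 'over_under_ceo' (that name appears in the case-assignment code; the merged layout and the function name are assumptions). It computes Pearson correlation, which may not be the exact statistic used in the analysis.

```python
import pandas as pd


def correlate_by_size(df: pd.DataFrame, comp_col: str,
                      score_col: str = "Performance Score",
                      size_col: str = "Firm Size") -> pd.Series:
    """Pearson correlation between over/undercompensation and performance, per size bin."""
    return pd.Series(
        {size: grp[comp_col].corr(grp[score_col]) for size, grp in df.groupby(size_col)},
        name="correlation",
    )


if __name__ == "__main__":
    merged = pd.DataFrame({
        "Firm Size": ["Small"] * 4 + ["Large"] * 4,
        "over_under_ceo": [0.8, 1.0, 1.2, 1.4, 0.9, 1.1, 1.3, 1.5],
        "Performance Score": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
    })
    print(correlate_by_size(merged, "over_under_ceo"))
```

Calling the function once with the CEO ratio and once with the director ratio gives one correlation per size bin for each group, matching the scatterplots described above.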

Our final step was to determine the average performance score for the firm-years used in this part of the analysis: firms listed in the S&P 500 from 2017 to 2019. By tabulating the average performance score for each case within each size category, we gained insight into how firms of different sizes performed with over- and undercompensated executive management.

The following code demonstrates how that was done:

import numpy as np
import pandas as pd

def assignCase(df):
    # Drop firm-years with missing values; copy to avoid chained-assignment warnings
    df = df.dropna().copy()
    # A ratio of exactly 1 counts as overcompensated, as in the original ordering
    ceo_over = df['over_under_ceo'] >= 1
    bod_over = df['over_under_bod'] >= 1
    cases = [
        ceo_over & bod_over,    # Case 1: CEO and directors overcompensated
        ceo_over & ~bod_over,   # Case 2: CEO over, directors under
        ~ceo_over & bod_over,   # Case 3: CEO under, directors over
        ~ceo_over & ~bod_over,  # Case 4: CEO and directors undercompensated
    ]
    names = ['Case 1', 'Case 2', 'Case 3', 'Case 4']
    # A string default avoids mixing str and float (np.nan) choices in np.select
    df['Case'] = np.select(cases, names, default='Unassigned')
    return df

combined_bod_ceo = pd.concat([small_bod_ceo, med_bod_ceo, large_bod_ceo], axis=0)
combined_bod_ceo = assignCase(combined_bod_ceo)

# Average performance score and number of firm-years for each (size, case) cell
combined_avg_perf = combined_bod_ceo.groupby(['Firm Size', 'Case']).agg(
    avg_perf_score=('Performance_Score', 'mean'),
    count=('Performance_Score', 'size'),
)