import React from 'react';
import PropTypes from 'prop-types';
import {withStyles} from '@material-ui/core/styles';
import Paper from '@material-ui/core/Paper';
import Typography from '@material-ui/core/Typography';

const styles = theme => ({

    root: {
        ...theme.mixins.gutters(),
        paddingTop: theme.spacing.unit * 2,
        paddingBottom: theme.spacing.unit * 2,
        width: '70%',
        // Shorthand first so the larger bottom margin below is not overridden.
        margin: '3px',
        marginBottom: 18
    },
    li: {
        margin: '3px'
    },
});

function Description(props) {
    const {classes} = props;

    return (

        <div>
            <Paper className={classes.root} elevation={8}>
                <Typography variant="h5">
                    Overview
                </Typography>
                <Typography variant="body1">
                    The law school calculator estimates admissions chances at law schools. Enter your GPA and LSAT score, along with soft factors, for analysis. More than 44,000 application decisions, all collected from <a href="http://www.lawschoolnumbers.com/">Law School Numbers</a>, were used to develop the models.
                </Typography>


            </Paper>
            <Paper className={classes.root} elevation={8}>
                <Typography variant="h5">
                    Methodology
                </Typography>
                <Typography variant="subtitle2">
                    Data Prep
                </Typography>
                <Typography variant="body1">
                    For this tool, a separate model was created for each of the top 14 (T14) schools. The dataset
                    was first cleaned and prepared for model selection by filtering out records that were extremely
                    incomplete or contained impossible values (e.g., LSAT scores of 100 or GPAs below 1.0). Missing
                    values were imputed with the mean. Categorical features were grouped before one-hot encoding
                    to simplify the models; for example, the "STEM" major choice covers all majors in that category.
                </Typography>


                <Typography variant="subtitle2">
                    Model Selection
                </Typography>
                <Typography variant="body1">
                    Several machine learning models were evaluated: logistic regression, support vector machines,
                    gradient boosting classifiers (GBCs), decision trees, and k-nearest neighbors.
                </Typography>

                <Typography variant="subtitle2">
                    Cross-validation and Testing
                </Typography>
                <Typography variant="body1">
                    A grid search was run over a reasonable hyperparameter space for each model. Because the
                    dataset is large, 20% of the law school admission data was set aside as a holdout set for
                    testing. Cross-validation used StratifiedKFold with 4 folds, and the holdout set was also
                    stratified to maintain class proportions.
                </Typography>
                <Typography variant="subtitle2">
                    Evaluation
                </Typography>

                <Typography variant="body1">
                    Ultimately, the gradient boosting classifier outperformed the other models in terms of
                    precision and recall. Because applicants could be admitted to multiple schools, one model
                    was trained per school as a simplification.
                </Typography>

            </Paper>
            <Paper className={classes.root} elevation={8}>

                <Typography variant="h5">
                    Discussion
                </Typography>
                <Typography variant="subtitle2">
                    Limitations
                </Typography>
                <Typography variant="body1" align="left">
                    The data was taken from LawSchoolNumbers.com and is self-reported, so errors and inaccuracies
                    persist even after cleaning and preparation. The data is also not representative of the entire
                    applicant pool. Additionally, it is not up to date: it includes information from the years
                    2012-2018.
                </Typography>

                <Typography variant="subtitle2">
                    Score Interpretation
                </Typography>
                <Typography variant="body1" align="left">
                    The model does not predict the true probability of admission and should be used for
                    entertainment purposes only. The false positive rates of the GBC decision functions are not
                    evaluated here. The features of most successful applications to a specific school may be
                    virtually indistinguishable when looking solely at GPA, LSAT scores, and demographic data.
                    Factors not included in the model (e.g., letters of recommendation, interviews, years of
                    related experience) may be more important than those that are.
                </Typography>
                <Paper className={classes.root} elevation={8}>
                    <Typography variant="body2" align="left">
                        The model does not represent the admissions process of each school. The output is a scaled
                        probability of <em>eventually</em> being accepted, based solely on aggregate data from
                        LawSchoolNumbers. This includes cases where a candidate was wait-listed before being
                        offered admission.
                    </Typography>
                </Paper>
                <Typography variant="body1" align="left">
                    Originally, the model was a single aggregate binary classifier that examined the average
                    candidate profile for the T14 collectively. That model, which is not used here, achieved a
                    precision/recall of 0.69/0.68 for the admitted class at a threshold of ~0.50.
                </Typography>


                <Typography variant="subtitle2">Model Oddities</Typography>
                <Typography variant="body1" align="left">
                    Two profiles can be identical in every way except that one ("Profile B") has a higher LSAT score or GPA yet lower estimated chances than the other ("Profile A").
                    How can this be? One reason is that a common set of unknown factors may lead to higher admissions chances, and the candidates who share those unknown factors tend to have similar LSAT scores or GPAs.
                    For example, Profile A may fall in a group that is likely to have strong letters of recommendation or alumni connections despite a lower LSAT/GPA. Profile B is identical in its other factors (Race: White, K-JD, etc.)
                    but has a slightly higher GPA/LSAT, so the model believes that Profile B is not in the group with alumni connections.
                </Typography>
            </Paper>

        </div>
    );
}

Description.propTypes = {
    classes: PropTypes.object.isRequired,
};

export default withStyles(styles)(Description);