Business Intelligence Analytics CAE 3 Question Bank
- Answers
- 1. Advantages and Disadvantages of Data Analytics over Data Analysis:
- 2. Difference between Business Intelligence and Machine Learning:
- 3. Business Intelligence and Business Analytics:
- 4. Data Analytics Life Cycle:
- 5. Explanation of Infographic with Example:
- 6. Process of Data Analytics:
- 7. Types of Data in Statistics:
- 8. Measurement Levels of Data:
- 9. Representation of Categorical Variables in Analytics:
- 10. Differences between Label and One-Hot Encoding:
- 11. Mean, Median, and Mode:
- 12. Measures of Central Tendency:
- 13. Covariance vs. Correlation:
- 14. Standard Deviation and Variance:
- 15. Standard Error, Estimates, and Estimators:
- 16. Skewness and Coefficient of Variation:
- 17. SQL Clauses: Group By, Where, Order By, Having, etc.:
- 18. Table Operations in SQL: Data Insertion, Data Retrieval, Drop, Truncate, Schema Creation:
- 19. Aggregate Functions in SQL with Examples:
- 20. Union and Intersection Operations on Databases:
- 21. Keys in Databases:
Answers
1. Advantages and Disadvantages of Data Analytics over Data Analysis:
Data Analysis Advantages:
- Simplicity: Data analysis involves the inspection, cleaning, and transformation of data to discover trends, patterns, and insights in a relatively straightforward manner.
- Speed: Data analysis can be quicker to implement since it often deals with simpler data processing techniques.
- Less Resource-Intensive: Data analysis typically requires fewer computational resources and can be performed using basic tools like spreadsheets.
Data Analysis Disadvantages:
- Limited Insight: Data analysis may not provide deep insights or predictive capabilities as it primarily focuses on understanding historical data.
- Inflexibility: It may not adapt well to rapidly changing data or complex data structures.
- Lack of Predictive Power: Data analysis does not typically provide predictive modeling capabilities.
Data Analytics Advantages:
- Predictive and Prescriptive Insights: Data analytics goes beyond data analysis by using advanced statistical and machine learning techniques to make predictions and recommendations.
- Adaptability: Data analytics can handle complex and dynamic datasets, making it suitable for various business scenarios.
- Data-Driven Decision Making: It empowers organizations to make informed decisions based on data-driven insights.
Data Analytics Disadvantages:
- Complexity: Data analytics can be more complex to implement, requiring expertise in machine learning and data science.
- Resource-Intensive: It may demand significant computational resources and data storage capabilities.
- Data Quality Requirements: Data analytics heavily relies on high-quality data, which may be challenging to obtain.
2. Difference between Business Intelligence and Machine Learning:
Business Intelligence (BI):
- BI focuses on reporting, querying, and data visualization to provide historical insights into business performance.
- It uses descriptive analytics to summarize and present data in the form of reports, dashboards, and scorecards.
- BI tools are often used for monitoring key performance indicators (KPIs) and generating regular reports.
- It primarily deals with structured data and follows a static and rule-based approach.
Machine Learning (ML):
- ML is a subset of artificial intelligence that focuses on developing algorithms and models to learn from data and make predictions or decisions.
- ML employs predictive and prescriptive analytics to make recommendations and automate decision-making processes.
- ML models can handle structured and unstructured data and adapt to changing data patterns.
- It is suitable for tasks like image recognition, natural language processing, and anomaly detection.
3. Business Intelligence and Business Analytics:
Business Intelligence (BI) and Business Analytics (BA) are related concepts but serve different purposes:
Business Intelligence (BI):
- BI primarily deals with historical data and focuses on reporting and visualization.
- It provides insights into past and current business performance, often through dashboards, scorecards, and reports.
- BI helps in monitoring KPIs and making data-driven decisions based on historical data.
Business Analytics (BA):
- BA goes beyond historical data and uses statistical, predictive, and prescriptive analytics to gain insights and make future-oriented decisions.
- It involves exploring data, identifying trends, and making forecasts or recommendations for the future.
- BA is more focused on advanced analytics techniques and can handle complex and dynamic data.
4. Data Analytics Life Cycle:
The Data Analytics life cycle consists of several stages:
- Data Collection: Gathering relevant data from various sources, which may include databases, APIs, or external data sources.
- Data Preparation: Cleaning, transforming, and structuring the data to ensure its quality and suitability for analysis.
- Data Analysis: Exploring the data to identify patterns, correlations, and insights. This stage often involves descriptive statistics and data visualization.
- Modeling: Building predictive or prescriptive models using machine learning or statistical techniques to make forecasts or recommendations.
- Evaluation: Assessing the performance of models and analyses to ensure accuracy and effectiveness.
- Deployment: Implementing the insights or models in a business context for decision-making or automation.
- Monitoring and Maintenance: Continuously monitoring the performance of deployed models and data analytics processes, and making necessary adjustments.
5. Explanation of Infographic with Example:
An infographic is a visual representation of information, data, or knowledge designed to convey complex concepts or data in a concise and engaging manner. It combines text, graphics, and images to make information more accessible and memorable. Here's an example:
Example: "The Benefits of Regular Exercise"
An infographic on the benefits of exercise might include:
- Title: "The Benefits of Regular Exercise"
- Section 1: Health
  - Graphic: An image of a healthy heart
  - Text: "Improves cardiovascular health"
- Section 2: Weight Management
  - Graphic: A scale with a downward arrow
  - Text: "Aids in weight loss and maintenance"
- Section 3: Mental Health
  - Graphic: A happy brain with endorphins
  - Text: "Reduces stress and boosts mood"
- Section 4: Longevity
  - Graphic: An age progression chart
  - Text: "Promotes a longer, healthier life"
- Section 5: Social
  - Graphic: Silhouettes of people in a park
  - Text: "Opportunity for social interaction"
- Section 6: Conclusion
  - Graphic: A happy, active person
  - Text: "Exercise is the key to a healthier, happier you!"
6. Process of Data Analytics:
The process of data analytics involves several key steps:
- Data Collection: Gather data from various sources, such as databases, files, or external APIs. Ensure that the data is comprehensive, accurate, and relevant to the analysis.
- Data Cleaning: Clean and preprocess the data to handle missing values, outliers, and inconsistencies. This step is crucial to ensure data quality.
- Data Exploration: Explore the data to understand its characteristics, including distribution, patterns, and correlations. Visualization techniques and summary statistics are often used at this stage.
- Data Transformation: Transform and reshape the data as needed. This may include feature engineering, scaling, or encoding categorical variables.
- Modeling: Build statistical or machine learning models to analyze the data and make predictions, classifications, or recommendations.
- Evaluation: Assess the performance of the models using appropriate metrics. This step helps in understanding how well the models are performing.
- Interpretation: Interpret the results and derive actionable insights from the analysis. These insights are used to make informed decisions or recommendations.
- Visualization and Reporting: Create visualizations and reports to communicate the findings effectively to stakeholders.
- Deployment: Implement the results of the analysis in a business context, such as automating decision-making processes or using the insights to drive actions.
- Monitoring and Iteration: Continuously monitor the models and data analytics processes and iterate as needed to adapt to changing data or business requirements.
7. Types of Data in Statistics:
In statistics, data can be categorized into four main types:
- Nominal Data: Nominal data represents categories or labels with no inherent order or ranking. Examples: colors (e.g., red, blue, green), types of fruits (e.g., apple, banana, orange).
- Ordinal Data: Ordinal data represents categories with a specific order or ranking but doesn't provide information about the magnitude of differences between categories. Examples: education levels (e.g., high school, bachelor's, master's), customer satisfaction ratings (e.g., very dissatisfied, dissatisfied, neutral, satisfied, very satisfied).
- Interval Data: Interval data represents ordered categories with known and consistent intervals between them, but it lacks a true zero point. Examples: temperature in Celsius or Fahrenheit (e.g., 20°C, 30°F), IQ scores (e.g., 100, 120, 140).
- Ratio Data: Ratio data includes ordered categories with equal intervals and a true zero point, meaning that ratios between values are meaningful. Examples: age, income, height, weight, and counts (e.g., the number of products sold).
8. Measurement Levels of Data:
In statistics, data can be classified into four measurement levels, each with distinct characteristics:
- Nominal Level: Nominal data represents categories or labels with no inherent order or ranking. It only allows for grouping and counting. Examples include gender, colors, and types of vehicles.
- Ordinal Level: Ordinal data represents categories with a specific order or ranking, but the intervals between categories are not uniform or known. It allows for relative comparisons. Examples include education levels and customer satisfaction ratings.
- Interval Level: Interval data represents ordered categories with known and uniform intervals between them. It allows for relative comparisons and arithmetic operations such as addition and subtraction. Temperature in Celsius or Fahrenheit is an example of interval data.
- Ratio Level: Ratio data includes ordered categories with equal intervals and a true zero point. It allows for all arithmetic operations (addition, subtraction, multiplication, division) and meaningful ratios. Examples include age, income, height, weight, and counts.
9. Representation of Categorical Variables in Analytics:
Categorical variables are represented using various techniques in analytics. Two common techniques are:
a. Label Encoding:
- Label encoding assigns a unique numerical label to each category in a categorical variable.
- It is suitable for ordinal data where the order of categories matters.
- Example: In a "size" variable with categories ["Small", "Medium", "Large"], label encoding might map them to [0, 1, 2].
b. One-Hot Encoding:
- One-hot encoding creates binary columns (0 or 1) for each category in a categorical variable.
- It is suitable for nominal data, where there is no inherent order.
- Example: In a "color" variable with categories ["Red", "Blue", "Green"], one-hot encoding would create three binary columns: "Red," "Blue," and "Green."
One-hot encoding is preferred for nominal data because it preserves the independence of categories, while label encoding should be used for ordinal data when the order is significant.
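Both techniques can be sketched in a few lines of plain Python. This is a minimal illustration using the "size" and "color" examples above; production code would typically use a library encoder instead.

```python
# Minimal sketch of label encoding and one-hot encoding (illustrative only).

def label_encode(values, ordered_categories):
    """Map each category to its index in an explicitly ordered category list."""
    mapping = {cat: i for i, cat in enumerate(ordered_categories)}
    return [mapping[v] for v in values]

def one_hot_encode(values, categories):
    """Create one binary (0/1) column per category."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]

sizes = ["Small", "Large", "Medium"]          # ordinal: order matters
print(label_encode(sizes, ["Small", "Medium", "Large"]))   # [0, 2, 1]

colors = ["Red", "Green"]                      # nominal: no inherent order
print(one_hot_encode(colors, ["Red", "Blue", "Green"]))    # [[1, 0, 0], [0, 0, 1]]
```

Note how label encoding yields one column of integers, while one-hot encoding yields one column per category.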
10. Differences between Label and One-Hot Encoding:
Label Encoding:
- Assigns a unique numerical label to each category in a categorical variable.
- Suitable for ordinal data where the order of categories matters.
- May introduce ordinal relationships that don't exist, as the labels are numerical.
- Reduces dimensionality to a single column.
- Example: ["Small", "Medium", "Large"] might be encoded as [0, 1, 2].
One-Hot Encoding:
- Creates binary columns (0 or 1) for each category in a categorical variable.
- Suitable for nominal data with no inherent order.
- Preserves the independence of categories.
- Increases dimensionality, potentially leading to the "curse of dimensionality."
- Example: ["Red", "Blue", "Green"] would be represented as three binary columns: "Red," "Blue," and "Green."
11. Mean, Median, and Mode:
- Mean: The mean is the average of a set of values and is calculated by adding all values and dividing by the number of values. It is a measure of central tendency. Example: For the dataset [3, 5, 7, 7, 9], the mean is (3 + 5 + 7 + 7 + 9) / 5 = 31 / 5 = 6.2.
- Median: The median is the middle value in a dataset when it is ordered from smallest to largest. If there is an even number of values, the median is the average of the two middle values. Example: In the dataset [3, 5, 7, 7, 9], the median is 7 because it is the middle value.
- Mode: The mode is the value that appears most frequently in a dataset. A dataset can have one mode, more than one mode, or no mode at all. Example 1: In the dataset [3, 5, 7, 7, 9], the mode is 7 because it appears twice. Example 2: The dataset [2, 3, 3, 4, 4, 5] has two modes, 3 and 4.
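The worked examples above can be reproduced directly with Python's built-in statistics module:

```python
import statistics

data = [3, 5, 7, 7, 9]
print(statistics.mean(data))    # 6.2
print(statistics.median(data))  # 7
print(statistics.mode(data))    # 7

# A dataset can have more than one mode:
print(statistics.multimode([2, 3, 3, 4, 4, 5]))  # [3, 4]
```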
12. Measures of Central Tendency:
Measures of central tendency are statistical values used to describe the center or average of a dataset. The main measures of central tendency are:
- Mean: The arithmetic mean is the sum of all values divided by the number of values. It is sensitive to extreme values (outliers).
- Median: The median is the middle value when the data is sorted, and it is not affected by outliers. It is useful when the data has extreme values.
- Mode: The mode is the most frequently occurring value in the dataset. A dataset can have one mode (unimodal), multiple modes (multimodal), or no mode at all.
These measures provide insights into the typical or central value in a dataset, allowing for a better understanding of its distribution.
13. Covariance vs. Correlation:
- Covariance: Covariance measures the degree to which two variables change together. A positive covariance indicates a positive relationship, while a negative covariance indicates a negative relationship. However, because its magnitude depends on the units of the variables, it does not convey the strength of the relationship. It is calculated as the average of the products of the deviations of each variable from its mean.
- Correlation: Correlation is a standardized measure that quantifies the strength and direction of the linear relationship between two variables. It takes values between -1 and 1. A positive correlation indicates a positive linear relationship, a negative correlation indicates a negative linear relationship, and a correlation of 0 suggests no linear relationship.
While covariance and correlation both indicate the relationship between two variables, correlation is more useful because it provides a standardized measure and is not affected by the scale of the variables.
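The contrast can be made concrete by computing both from their definitions. The data below is made up for illustration: y is exactly 2x, so the covariance is some scale-dependent number while the correlation is exactly 1.

```python
# Sample covariance and Pearson correlation from their definitions.
import math

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    # Correlation = covariance standardized by both standard deviations,
    # which makes it unit-free and bounded in [-1, 1].
    return covariance(x, y) / (math.sqrt(covariance(x, x)) * math.sqrt(covariance(y, y)))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]          # y = 2x: a perfect positive linear relationship
print(covariance(x, y))    # 10/3 ≈ 3.33 (depends on the units of x and y)
print(correlation(x, y))   # 1.0 (scale-free)
```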
14. Standard Deviation and Variance:
- Standard Deviation: The standard deviation is a measure of the spread or dispersion of a dataset. It quantifies how individual data points deviate from the mean. A smaller standard deviation indicates that the data points are close to the mean, while a larger standard deviation indicates greater variability.
- Variance: Variance is the square of the standard deviation. It measures the average of the squared differences between each data point and the mean. Like the standard deviation, it provides a measure of data dispersion.
Both standard deviation and variance are essential in understanding the variability and consistency of data in a dataset.
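For the dataset used earlier, the statistics module gives both the population and sample versions; note that each variance is simply the corresponding standard deviation squared.

```python
import statistics

data = [3, 5, 7, 7, 9]
print(statistics.pvariance(data))  # population variance: 4.16
print(statistics.pstdev(data))     # population std dev:  ~2.04
print(statistics.variance(data))   # sample variance (n - 1 divisor): 5.2
print(statistics.stdev(data))      # sample std dev: ~2.28
```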
15. Standard Error, Estimates, and Estimators:
- Standard Error: The standard error is a measure of the variability or precision of an estimate. It quantifies how much the estimate is expected to vary from the true population parameter in repeated sampling. A smaller standard error indicates a more precise estimate.
- Estimates: Estimates are values calculated from sample data to approximate population parameters. For example, the sample mean is an estimate of the population mean, and the sample proportion is an estimate of the population proportion.
- Estimators: Estimators are the statistical methods or formulas used to calculate estimates from sample data. For instance, the sample mean (x̄) and the sample proportion (p̂) are common estimators for the population mean and population proportion, respectively.
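As a sketch of how these three ideas fit together: the most common standard error, the standard error of the mean, is the sample standard deviation divided by the square root of the sample size. The sample below is illustrative.

```python
import math
import statistics

sample = [3, 5, 7, 7, 9]            # an illustrative sample
x_bar = statistics.mean(sample)      # the estimate, produced by the estimator "sample mean"
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean

print(x_bar)          # 6.2
print(round(se, 3))   # ~1.02: expected sampling variability of x_bar
```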
16. Skewness and Coefficient of Variation:
- Skewness: Skewness is a measure of the asymmetry of the probability distribution of a dataset. It indicates whether the data is skewed to the left (negatively skewed), symmetrical, or skewed to the right (positively skewed). A positive skew means the tail is on the right, and a negative skew means the tail is on the left.
- Coefficient of Variation (CV): The coefficient of variation is the ratio of the standard deviation (a measure of dispersion) to the mean (average) of a dataset. It is expressed as a percentage and is used to compare the variability of two or more datasets with different means and units.
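Both measures can be computed from their definitions. The sketch below uses the simple moment-based (population) formula for skewness, assuming the same illustrative dataset as earlier sections:

```python
import statistics

data = [3, 5, 7, 7, 9]
n = len(data)
mean = statistics.mean(data)
sd = statistics.pstdev(data)

# Moment-based population skewness: average cubed deviation over sd^3.
skewness = sum((x - mean) ** 3 for x in data) / n / sd ** 3

# Coefficient of variation: standard deviation relative to the mean, as a %.
cv = sd / mean * 100

print(round(skewness, 3))  # slightly negative: a mild left skew
print(round(cv, 1))        # ~32.9 (%)
```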
17. SQL Clauses: Group By, Where, Order By, Having, etc.:
- Group By Clause: The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows. It is often used with aggregate functions like SUM, COUNT, AVG, etc., to generate summary information.
- Where Clause: The WHERE clause is used to filter rows from a table based on a specified condition. It allows you to extract only the rows that meet the specified criteria.
- Order By Clause: The ORDER BY clause is used to sort the result set of a SQL query in ascending or descending order based on one or more columns. It is often used in combination with SELECT statements.
- Having Clause: The HAVING clause is used in combination with the GROUP BY clause to filter the results of grouped rows. It allows you to filter groups of rows based on conditions applied to aggregated data.
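A runnable sketch of all four clauses working together, using an in-memory SQLite table (the table and data are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL, year INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("North", 100, 2023), ("North", 250, 2023),
     ("South", 80, 2023), ("South", 40, 2022), ("East", 500, 2023)],
)

rows = con.execute("""
    SELECT region, SUM(amount) AS total     -- aggregate per group
    FROM orders
    WHERE year = 2023                       -- filters rows BEFORE grouping
    GROUP BY region                         -- one summary row per region
    HAVING SUM(amount) > 100                -- filters groups AFTER aggregation
    ORDER BY total DESC                     -- sorts the final result
""").fetchall()

print(rows)  # [('East', 500.0), ('North', 350.0)]
```

The order of evaluation (WHERE, then GROUP BY, then HAVING, then ORDER BY) is why row-level conditions belong in WHERE while aggregate conditions belong in HAVING.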
18. Table Operations in SQL: Data Insertion, Data Retrieval, Drop, Truncate, Schema Creation:
- Data Insertion: To insert data into a table, use the INSERT INTO statement. For example:

```sql
INSERT INTO employees (employee_id, first_name, last_name) VALUES (1, 'John', 'Doe');
```

- Data Retrieval: To retrieve data from a table, use the SELECT statement. For example:

```sql
SELECT first_name, last_name FROM employees WHERE department = 'HR';
```

- Drop: The DROP TABLE statement is used to delete an entire table and its data. For example:

```sql
DROP TABLE employees;
```

- Truncate: The TRUNCATE TABLE statement is used to delete all rows from a table but retains the table structure. It is faster than DELETE and doesn't log individual row deletions. For example:

```sql
TRUNCATE TABLE employees;
```

- Schema Creation: To create a new schema in a database, use the CREATE SCHEMA statement. For example:

```sql
CREATE SCHEMA my_schema;
```
19. Aggregate Functions in SQL with Examples:
SQL aggregate functions perform a calculation on a set of values and return a single value. Common aggregate functions include:
- SUM: Calculates the sum of values in a column. Example:

```sql
SELECT SUM(sales) FROM orders;
```

- COUNT: Counts the number of rows in a result set. Example:

```sql
SELECT COUNT(*) FROM products;
```

- AVG: Calculates the average of values in a column. Example:

```sql
SELECT AVG(price) FROM products;
```

- MIN: Returns the minimum value in a column. Example:

```sql
SELECT MIN(age) FROM employees;
```

- MAX: Returns the maximum value in a column. Example:

```sql
SELECT MAX(score) FROM exam_results;
```
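The five functions above can be tried out in one query against a small in-memory SQLite table (names and values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (name TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (?, ?)",
                [("pen", 2.0), ("book", 10.0), ("lamp", 18.0)])

row = con.execute(
    "SELECT SUM(price), COUNT(*), AVG(price), MIN(price), MAX(price) FROM products"
).fetchone()
print(row)  # (30.0, 3, 10.0, 2.0, 18.0)
```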
20. Union and Intersection Operations on Databases:
- Union: The UNION operation combines the result sets of two or more SELECT queries into a single result set. It eliminates duplicate rows by default. Example:

```sql
SELECT first_name, last_name FROM employees
UNION
SELECT first_name, last_name FROM contractors;
```

- Intersection: The INTERSECT operation returns the common rows between two or more SELECT queries. It returns only the rows that exist in all the specified result sets. Example:

```sql
SELECT student_name FROM math_scores
INTERSECT
SELECT student_name FROM science_scores;
```
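Both set operations can be demonstrated with in-memory SQLite tables modeled on the example above (student names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE math_scores (student_name TEXT)")
con.execute("CREATE TABLE science_scores (student_name TEXT)")
con.executemany("INSERT INTO math_scores VALUES (?)", [("Ana",), ("Ben",), ("Cy",)])
con.executemany("INSERT INTO science_scores VALUES (?)", [("Ben",), ("Cy",), ("Dee",)])

union = con.execute("""
    SELECT student_name FROM math_scores
    UNION
    SELECT student_name FROM science_scores
""").fetchall()
print(sorted(union))      # every distinct name from either table

intersect = con.execute("""
    SELECT student_name FROM math_scores
    INTERSECT
    SELECT student_name FROM science_scores
""").fetchall()
print(sorted(intersect))  # only names present in BOTH tables
```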
21. Keys in Databases:
In a database, a key is a field or combination of fields that uniquely identify a record in a table. There are different types of keys:
- Primary Key (PK): A primary key is a unique identifier for each record in a table. It enforces data integrity and ensures that each record is distinct. Example: Social Security Number in an employee table.
- Foreign Key (FK): A foreign key is a field that links to the primary key of another table. It establishes relationships between tables in a relational database. Example: Employee ID in an orders table linked to the employee table's primary key.
- Unique Key: A unique key ensures that all values in a column are distinct, but it allows for null values. It can be used to enforce data integrity when you want uniqueness without the requirement of data presence.
- Candidate Key: A candidate key is a key that could be chosen as a primary key. It is a set of fields with unique values, and you select one of them as the primary key.
- Composite Key: A composite key is a key that consists of multiple columns used together to uniquely identify a record. It's often used when no single column can serve as a primary key.
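Most of these key types can be expressed directly in DDL. The sketch below uses SQLite and made-up tables to show a primary key, a unique key, a foreign key, and a composite key, and demonstrates that a duplicate primary key value is rejected:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each row
        email       TEXT UNIQUE            -- unique key: distinct values, NULL allowed
    )
""")
con.execute("""
    CREATE TABLE orders (
        order_no    INTEGER,
        line_no     INTEGER,
        employee_id INTEGER REFERENCES employees(employee_id),  -- foreign key
        PRIMARY KEY (order_no, line_no)    -- composite key: two columns together
    )
""")

con.execute("INSERT INTO employees VALUES (1, 'a@example.com')")
try:
    con.execute("INSERT INTO employees VALUES (1, 'b@example.com')")  # duplicate PK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```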