>> Coordinator: Welcome and thank you for standing
by. Currently all participants are in listen only mode. Today's webinar is being recorded and
the recording will be posted publicly. If you have any objections, you may disconnect
at this time. Now I'd like to turn the call over to your host for today, Earlene Dowell.
>> Earlene Dowell: Thank you, Lisa. Before we begin, a few housekeeping items. We will save
all our questions to the end of the presentation. Please add your questions into the
Q&A box in the lower right-hand corner and we will read them out loud after the presentation. The chat has
been disabled to our attendees at this time. Good afternoon, everyone, and thank you for
joining us for the first Local Employment Dynamics, or LED, webinar of 2024. On behalf of
the U.S. Census Bureau and the LED partnership, in collaboration with the Council for Community and
Economic Research and the Labor Market Information Institute, it is my pleasure to welcome Sadra
Sharifi, Emmanuel Lucban, and Liang Tian as they present "Studying Workers' Travel Patterns
to Employment Centers Using Census Data." This study explores workers' historical travel
patterns utilizing the Longitudinal Employer-Household Dynamics Origin-Destination
Employment Statistics, or LODES, data to gain insight into the relationship between
demographic characteristics and primary employment centers in the San Diego region.
Results of this study will help local policy makers and key stakeholders to institute
data-driven programs and policies to improve regional transportation systems, leading
to reduced congestion and emissions. Sadra Sharifi is an associate data scientist at
the San Diego Association of Governments, or SANDAG, data science team. He has over five years of
experience working on various data science and modeling projects. Sadra holds a PhD
in transportation engineering from Utah State University with concentrations in transportation
data analysis, transportation model development, and statistical modeling. Emmanuel Lucban
is an associate data scientist at SANDAG. Emmanuel has years of experience in developing
data science related production applications, data engineering processes, and IT management. Emmanuel
holds a master of science degree in applied data science from the University of San Diego and a
bachelor's degree in data science and computer science from the University of California,
Berkeley, with concentrations in data analysis, data engineering, and machine learning.
Liang Tian is a principal overseeing data science management and analytical work program
areas at SANDAG. Liang has led cross functional teams to deliver high performing predictive
science and analytic solutions in various application domains across global markets. Liang
has published 20 plus journal articles and book chapters and is the first named inventor on
a U.S. patent application in the financial risk management domain. Liang holds a PhD in computer
science with a specialization in computational intelligence methods and their applications.
Now I'd like to turn the call over to Liang.
>> Liang Tian: Thank you. Thank you so much,
Earlene, for the warm introductions. Really appreciate that. So first of all we are so glad
to be here and truly appreciate the organizing committee for giving us this opportunity to share
our original research from a local planning agency perspective. I'm also excited to have my two
colleagues, Sadra Sharifi and Emmanuel Lucban, joining me today to cover the presentation.
So our topic today is the research study on workers' travel and commute patterns to major
employment centers in the San Diego region. Next slide please. So we will start with a brief
introduction to SANDAG and to the San Diego region, and also introduce what we do in the data
science team. We will cover two parts in this presentation. Part one will focus
more on the historical trend analysis, and part two will cover the more extended analysis and also
the behavior analysis using the most recent data. All right. So the San Diego Association of
Governments, short for SANDAG, is a federally designated metropolitan planning organization.
It's also a state-designated transportation planning agency. As a metropolitan planning
organization and a council of governments, SANDAG brings together local decision makers,
jurisdictions, and also the public to develop solutions to regional issues, including improving
transportation, air quality, sustainable energy, and economic development, enhancing
transportation across our binational region, across the U.S. and Mexico, as well as public health, public
safety, housing, and so much more. All right. So the San Diego region is a very
diverse and growing place to live and work. According to the U.S. Census Bureau
and the California Employment Development Department, the population and employment in the region have
increased significantly in the last 10 years. At SANDAG we use data as the foundation of
pretty much everything we do: planning, building, and preserving resources.
Sadra, Emmanuel, and I are part of the data science department. Our entire data
science team is truly passionate about data, passionate about analytics, and we are developing
products and performing research by leveraging our skill set to really help make better informed
decisions around regional issues. Understanding where people live and how people commute to
work or to major employment centers has been and always will be a foundation of our regional
planning effort. Further, we should also focus on understanding how people are making different
travel decisions. Right? Is it by walking or biking, driving by yourself, or carpooling with
someone else? Or taking a bus, freeways, or side streets? Those are the questions we all
really need to understand better. Okay. So with that I'm going to turn it over to Emmanuel to
cover part one of the presentation. Emmanuel.
>> Emmanuel Lucban: Can you hear me?
All right. Thank you, Liang. So hello everyone. My name's Emmanuel Lucban. I am an
associate data scientist for SANDAG. And I'm going to go over the part one of the analysis
which is going to be historical trends. Okay. So with this study one can gain
insight into the following. First of all, what are employment centers in the
region? Employment centers are essentially areas with a high density of employers. Then,
what are the trends or changes over time for job growth, gender, worker age distributions,
and income? And finally, have there been any changes to workplace proximities over time?
Okay. So let me give you a quick introduction to what SANDAG refers to as
employment centers. On the map on the right you see these colored portions. The
colored portions on the map denote areas with a significant density of employers. So how did we
identify these areas? Using SANDAG's employment estimates and other data, along with a density-based
spatial clustering method, SANDAG was able to identify 79 clusters that we refer to as
employment centers. These employment centers were then organized into four tiers based on
the number of employees within those employment centers. If you take a look at the
table, you see tier one centers are centers that have over 75,000 employees. These
tend to be large companies and corporate headquarters. And there are three employment
centers that fall into this tier. Tier 2 employment centers are employment
centers with 25,000 to 75,000 employees, and there are 10 employment centers that fall
into this tier. Tier 3 employment centers have 15,000 to 25,000 employees, and there
are 15 employment centers that fall into this tier. And finally, tier 4 employment
centers have 2,500 to 15,000 employees, and there are 51 employment
centers in this tier.
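
As a rough illustration of the tier rule just described, the Python sketch below assigns a tier from an employee count. The thresholds come from the talk; how a center sitting exactly on a boundary (for example exactly 75,000 employees) is assigned is an assumption, and the sample center names and sizes are hypothetical, not SANDAG figures.

```python
from collections import Counter

# Lower bounds per tier, as stated in the talk (employees per center).
# Treating each bound as inclusive is an assumption made for this sketch.
TIER_LOWER_BOUNDS = [(1, 75_000), (2, 25_000), (3, 15_000), (4, 2_500)]


def assign_tier(employees):
    """Return the tier (1-4) for a center, or None if below the tier 4 floor."""
    for tier, lower_bound in TIER_LOWER_BOUNDS:
        if employees >= lower_bound:
            return tier
    return None


# Hypothetical center sizes for illustration only.
sample_centers = {"A": 120_000, "B": 40_000, "C": 18_000, "D": 3_000, "E": 900}
tiers = {name: assign_tier(size) for name, size in sample_centers.items()}

# Count how many centers land in each tier, skipping those below the floor.
tier_counts = Counter(t for t in tiers.values() if t is not None)
print(tiers)
print(dict(tier_counts))
```

Applied to the real inventory, the same tally would be expected to reproduce the counts quoted above (3, 10, 15, and 51 centers, 79 in total).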
So let's kind of examine the underlying sectors that make up the four tiers of employment centers.
So for tier one employment centers we can see that the predominant sector by far, at 26.9%, falls
under the professional, scientific, and technical sector, followed by
the accommodation and food sector and then the healthcare sector. For tier two it's predominantly
healthcare, closely followed by retail and accommodation and food. And then here we see
for tier three it's predominantly manufacturing, retail, and then accommodation and food. And
then for tier four we see it's predominantly healthcare followed by accommodation and
food and then retail. Okay then, so using the Longitudinal Employer-Household Dynamics
or LEHD data let's kind of examine some of the characteristics of the workers that work in these
employment centers and their travel patterns. For this study we're using the LEHD
Origin-Destination Employment Statistics, or LODES, version 7.5 data, and we're covering
the time period of 2012 to 2019. We're going to look at employment and residential areas.
We're using the origin-destination tables and also the work
area characteristics tables, and we're going to be looking at the census block level.
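
For readers unfamiliar with the origin-destination tables, here is a minimal Python sketch of how block-level flows might be read and summed. The column names (`w_geocode`, `h_geocode`, `S000`, and the age and earnings splits) follow the published LODES origin-destination layout; the inline sample is a made-up stand-in with invented block codes and counts, and it omits columns the real files also carry. The helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for an origin-destination file (one row per home/work pair).
# S000 = total jobs; SA01-SA03 = age groups; SE01-SE03 = earnings groups.
SAMPLE_OD = """w_geocode,h_geocode,S000,SA01,SA02,SA03,SE01,SE02,SE03
060730001001000,060730002001000,5,1,3,1,1,2,2
060730001001000,060730003001000,3,1,1,1,0,1,2
060730004001000,060730002001000,2,0,2,0,1,1,0
"""


def jobs_by_work_block(od_text):
    """Sum total jobs (S000) flowing into each workplace block."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(od_text)):
        totals[row["w_geocode"]] += int(row["S000"])
    return dict(totals)


def flows_into(od_text, work_blocks):
    """Return (home_block, jobs) pairs for flows ending in the given blocks."""
    wanted = set(work_blocks)
    return [
        (row["h_geocode"], int(row["S000"]))
        for row in csv.DictReader(io.StringIO(od_text))
        if row["w_geocode"] in wanted
    ]


totals = jobs_by_work_block(SAMPLE_OD)
print(totals)
```

In practice the set of workplace blocks passed to `flows_into` would be the blocks that make up one employment center.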
We decided to use job type JT03, which is all private primary jobs, to
better align with the objective of the study. Okay. So exploring historical patterns.
We want to obtain insights into the relationships between socio-demographic
characteristics and the employment center tiers. Some of the characteristics we'll be
looking at over time using the LEHD LODES data are job growth, gender, age,
income distribution, educational attainment, and proximity. And I just want to make a
note. We mentioned proximity, and in this case we want to make a distinction between
travel distance and proximity. For part one we define proximity as the direct distance
from origin to destination. So think of a straight line rather than actual travel distance
going through a transportation network. So the first characteristic that we'll be examining is
job growth. So on the top bar chart we have the year over year growth of the number of jobs from
2012 to 2019 for all employment center tiers. And on the bottom we have a line chart and it's a
breakdown of job growth by each individual tier. So, as you can see from the top chart, job growth
for all employment centers experienced kind of a steady increase with the largest year over year
job growth occurring in 2016 at 3.1%. The bottom chart we see that tier one employment centers
see significant steady job growth across all years while tier three and tier four job growth
kind of flattens out starting around 2016. And examining gender distributions for each tier,
so the left graph shows the proportion of female workers by tier over time and the right graph
shows the proportion of male workers by tier over time. We can see that the proportion of male
to female workers in each tier remained stable, relatively unchanged, over time. We do observe
that tier four employment centers have the smallest difference, with an almost balanced
proportion of male to female workers. Tier three employment centers are observed to have the
largest difference in proportion of male to female workers. And tier one and tier two differences in
proportions are between tier four and tier three, but still have a higher male to female worker
ratio. So one of the potential contributors to the observed proportional differences in gender
could be the predominant sectors that comprise the employment center tiers. If you recall,
tier four's top three sectors include healthcare, accommodation and food, followed by
retail, while tier three's top three sectors are largely manufacturing, retail, and
then accommodation and food. Okay. So looking at the worker age
distributions, we have four condensed stacked bar charts, one for each tier. The charts
were condensed for readability. Over time we observed an increase in the proportion
of workers aged 55 and older in all tiers, while observing a decrease in the proportion of workers
under the age of 30, excluding tier 1 employment centers, which show a very marginal increase in
that age group. We can also see that tier 4 employment centers employ the largest proportion
of workers under the age of 30 compared to other tiers, while tier 1 employment centers have the
largest proportion of workers aged 30 to 54. Okay. So comparing income distributions from 2012
versus 2019. The top donut charts are income distributions for tier one and the bottom ones are
for tier two. The left side is 2012 and the right side 2019. The green part of the
chart shows the proportion of workers making over $3,333 per month. We'll just call this the
highest income group. We can see that tier 1 has the largest
proportion of this group compared to other tiers, and it increased 8% from 2012 to 2019.
For tier 2 and tier 3 employment centers, next slide please, they both showed a 9% increase in
the highest paid group from 2012 to 2019. For tier 4 employment centers, which have the smallest
proportion of this highest paid group, we still observed a 7% increase, from 36% to 43%.
Okay. So let's look at educational attainment. For educational attainment again
we have condensed bar charts for each employment center tier. Over time we observed an
interesting trend. For all employment center tiers we have observed a steady proportional decrease
in workers with a college or advanced degree. From SANDAG's economic analysis we have been
observing an increase in jobs that don't require college degrees, which may be a contributing factor
to this trend. Even with an overall decrease, all employment center tiers still have
a larger proportion of workers with a college or advanced degree than those without,
with tier one employment centers having the largest proportion of college educated
workers among all tiers. This is most likely attributed to the sectors that comprise tier
one employment centers, which are predominantly professional, scientific, and technical.
Okay. So next we'll examine worker proximities to employment centers. As we described
before, we define proximity as the origin block to destination block Euclidean distance
rather than travel distance on the transportation network. In this animated map we
want to give you an idea of the changes in weighted average proximity of workers to
tier one employment centers over time. The black outline denotes the
location of the tier one employment centers. As we can observe from 2012 to 2019, there are
pretty visible origin block changes, particularly for those with average proximities of 30 plus
miles, which are the blocks in red. We can see that these origin blocks are concentrated in
the northern part of San Diego over time. So for a deeper dive into proximity, the bar
charts here display the proximity trends of each employment center for the last five years
available in the LEHD data. We can see that the proportion of workers with a proximity of 10
plus miles is increasing overall. Especially in the tier 1 employment centers, we see a
steady increase over time, and they proportionally have the largest percentage of workers with
a proximity of 10 plus miles and the smallest percentage of workers with a proximity of 0 to
5 miles compared to other tiers. So workers in tier 1 and certainly in tier 3 employment centers
tend to reside farthest from their workplaces. And workers in tier 2 and tier 4 employment centers
have very similar proximity characteristics. Okay. So that concludes part
one, the historical analysis of employment center tiers using
LEHD LODES 7.5.
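
The proximity measure used throughout part one (straight-line distance between origin and destination blocks, averaged with job counts as weights) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: block locations are taken to be centroids in a projected coordinate system measured in meters, the function names are hypothetical, and the sample flows are invented.

```python
import math

METERS_PER_MILE = 1609.344


def straight_line_miles(origin_xy, dest_xy):
    """Euclidean distance between two projected points (meters in, miles out)."""
    return math.dist(origin_xy, dest_xy) / METERS_PER_MILE


def weighted_average_proximity(flows):
    """Job-weighted mean straight-line distance in miles.

    flows: iterable of (origin_xy, dest_xy, jobs) tuples.
    """
    total_jobs = 0
    total_job_miles = 0.0
    for origin_xy, dest_xy, jobs in flows:
        total_jobs += jobs
        total_job_miles += jobs * straight_line_miles(origin_xy, dest_xy)
    if total_jobs == 0:
        raise ValueError("no jobs in flows")
    return total_job_miles / total_jobs


# Hypothetical flows into one center located at the origin of the grid.
center = (0.0, 0.0)
flows = [
    ((3 * METERS_PER_MILE, 0.0), center, 10),   # 3 miles away, 10 jobs
    ((0.0, 12 * METERS_PER_MILE), center, 5),   # 12 miles away, 5 jobs
]
avg = weighted_average_proximity(flows)  # (10*3 + 5*12) / 15, about 6 miles
print(round(avg, 2))
```

Because this ignores the road network, it will generally understate the actual travel distances examined in part two.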
So I'm going to pass it off to Sadra, who's going to give a more in-depth analysis of the
travel patterns of select employment centers.
>> Sadra Sharifi: Thank you, Emmanuel. Hello,
everyone. In this part I'm going to talk about our extended analysis on the employment centers. In
this analysis we explored the primary origins of employees traveling to their employment centers.
This will help us to understand where employees are distributed over the region. We examined
whether there are significant differences in commute distances among different age groups. We
analyzed travel pattern variations among employees with different levels of income. This will help
us to identify potential disparities in our transportation systems. We identified the
preferred mode of transportation selected by employees to commute to their workplaces. And
finally we investigated how travel distances vary across different employment centers.
For this part we used 2019 LEHD version 8, which represents normal travel behavior
before the COVID period. We also used two additional data sources, the SANDAG activity-based
model and Replica. These two sources provide some travel-related data like travel distances
on the transportation network and the mode choice selected by employees. This helps us
to couple these sources with LEHD data and do deeper analysis. Similar to the first
part, we focused on private primary jobs. For our analysis we selected four diverse
employment centers representing each tier. Sorrento Valley, located in the northern part of
San Diego, is a technology and innovation hub. It's home to many tech companies, startups,
and research institutions such as UC San Diego. Mission Valley, located in the central part of
the San Diego region, is an urban hub with a mix of commercial, retail, and residential spaces.
There's also a transit center within this employment center. Chula Vista, in the southern
part of the region, is a diverse community with a growing manufacturing sector. And Santee,
on the eastern side, is a suburban community with a mix of businesses and industries.
These two maps show the residence locations of employees working in Sorrento Valley
and Mission Valley, obtained from the LEHD data source. Each point on these maps represents
one employee, and you can clearly observe that employees working in Sorrento Valley, on the
left map, are living all over the region, distributed across the northern,
central, and southern parts. But for Mission Valley the majority of employees are living
in the central and southern parts of the region. These are the two similar maps for Chula Vista
and Santee. In Chula Vista a major part of the employees are living in the southern part of the
region, and the concentration of employees living within the employment center is very high. And
for Santee most of the employees are living very close to the employment center in the eastern
part of the region. Using LEHD data coupled with Replica we can examine the travel distances
to different employment centers. This plot shows the cumulative distribution of travel distance to
the employment centers. The horizontal axis shows the travel distance and the vertical axis shows
the percentage of employees. From this graph we can extract the 50th percentile, or median, of travel
distances to these employment centers, and we can clearly see that employees are experiencing
longer travel distances to Sorrento Valley and Mission Valley compared to Chula Vista and
Santee. Specifically, the median travel distance to Sorrento Valley is more than twice that to
Chula Vista. Also, looking at these curves, we can see that the behaviors for Chula Vista and
Santee are very similar up to the 50th percentile, but after that level the curve for Santee
moves towards those of Mission Valley and Sorrento Valley. Knowing this, we can understand
the commuting behaviors much better, and potentially we can optimize our transportation
network to be efficient for all the employees. This plot here shows the travel distances by
age groups. The horizontal axis shows the travel distances on the transportation network in five-mile
bins, and the colored bars show the portion of employees in each age group. Green bars are for
employees aged 29 or younger, purple bars are for ages 30 to 54, and orange bars are for
ages 55 and older. Using these two graphs, first we can compare the overall distributions, and we
can observe that a higher portion of employees are traveling more than 20 miles going
to Sorrento Valley compared to Mission Valley. Also, comparing the age groups,
we can see that a higher portion of employees aged 55 and older are experiencing longer
distances compared to other age groups. These are similar plots for Chula Vista and
Santee. The plot for Chula Vista is heavily concentrated at the short-distance end, meaning that the average
travel distance to this employment center is the lowest among all employment
centers. Comparing the age groups, we can see that the 55 and older age group is experiencing very
short travel distances, up to 5 miles, compared to other age groups. But this observation is
reversed in Santee. About 31% of employees aged 29 or younger are experiencing
travel distances up to 5 miles, but this value is 27% for other age groups. Knowing this
will help us to define more efficient policies considering the age group needs.
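
A small Python sketch of the kind of tabulation behind these bar charts: for each group, the share of workers falling in each travel-distance bin. The five-mile bin width, the open-ended top bin starting at 20 miles, the function names, and the sample records are all assumptions made for illustration; none of the numbers below are results from the study.

```python
from collections import defaultdict

BIN_WIDTH_MILES = 5    # assumed bin width
TOP_BIN_START = 20     # distances at or above this share one open-ended bin


def bin_label(miles):
    """Map a travel distance to a label such as '0-5', '5-10', or '20+'."""
    if miles >= TOP_BIN_START:
        return f"{TOP_BIN_START}+"
    low = int(miles // BIN_WIDTH_MILES) * BIN_WIDTH_MILES
    return f"{low}-{low + BIN_WIDTH_MILES}"


def bin_shares_by_group(records):
    """Return {group: {bin_label: share of that group's workers}}.

    records: iterable of (group, miles, workers) tuples.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for group, miles, workers in records:
        counts[group][bin_label(miles)] += workers
        totals[group] += workers
    return {
        group: {label: n / totals[group] for label, n in bins.items()}
        for group, bins in counts.items()
    }


# Hypothetical records: (age group, travel distance in miles, worker count).
records = [
    ("29 or younger", 3.2, 30), ("29 or younger", 8.0, 45), ("29 or younger", 22.5, 25),
    ("55 or older", 4.1, 20), ("55 or older", 12.0, 50), ("55 or older", 25.0, 30),
]
shares = bin_shares_by_group(records)
print(shares["29 or younger"])
```

The grouping key is arbitrary, so the same function works for any other worker attribute attached to the records.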
This is the travel distance distribution by income groups. In the LEHD data we have three
income groups, and for this analysis we decided to aggregate the two upper levels and compare them with
the lower level. Here we can observe that the distributions
for these two groups are very similar for all employment centers. However, there are cases where
the magnitudes are significantly different. For example, in Chula Vista about 39% of low income
people are traveling up to 5 miles, but this value is 27% for higher income people.
This also applies to Santee. About 30% of low income people are traveling within 5 miles, but
26% of higher income people are traveling within this distance. For Sorrento Valley and Mission
Valley this observation is reversed. Lower income people are experiencing longer distances,
and we believe that this is related to the affordable housing close to Chula Vista and Santee
compared to Sorrento Valley and Mission Valley. Another analysis that will help us to understand
commuting behavior much better is transportation mode choice analysis. This
table displays the mode shares of different transportation modes selected by employees to the
employment center. We got this from our SANDAG activity-based model. As expected, the highest
share is for private cars, but there are some variations. About 95% of employees going to Santee
are using private cars, including drive alone and carpool, but about 87% of employees
going to Chula Vista are using their own cars. Mission Valley stands out with a higher percentage
of transit use, and as I mentioned earlier there's a transit center in Mission Valley
providing services to different bus lines and trolley lines. So we believe that
explains that high percentage. Chula Vista also stands out with a
high percentage of active modes, including biking and walking. Specifically, 4% of employees are walking
to work in the Chula Vista employment center. This is because a major part of the employees are
living within the employment center, as we showed on the residence map. In those maps, if
you recall, there was a very high percentage of points concentrated in the employment center.
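
To make the mode share arithmetic concrete, here is a minimal Python sketch that turns trip counts by mode into shares and then combines modes into the broader categories mentioned above (private car, active modes). The mode names, function names, and counts are hypothetical; they are not output from the SANDAG activity-based model.

```python
def mode_shares(trips_by_mode):
    """Convert trip counts per mode into shares that sum to 1."""
    total = sum(trips_by_mode.values())
    if total == 0:
        raise ValueError("no trips")
    return {mode: count / total for mode, count in trips_by_mode.items()}


def combined_share(shares, modes):
    """Add up the shares of several modes, e.g. drive alone plus carpool."""
    return sum(shares.get(mode, 0.0) for mode in modes)


# Hypothetical commute trips to one employment center.
trips = {"drive_alone": 700, "carpool": 150, "transit": 80, "bike": 30, "walk": 40}
shares = mode_shares(trips)

private_car = combined_share(shares, ["drive_alone", "carpool"])
active = combined_share(shares, ["bike", "walk"])
print(f"private car: {private_car:.0%}, active: {active:.0%}")
```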
Summarizing the main points covered in this presentation, in the first part we tracked
the historical trends of sociodemographic variables and we tracked the changes year
over year. Using that, we can potentially identify employment center tiers that
have drastic changes over time. In the second part we coupled LEHD data
with other sources and we focused more on the commuting behavior of employees going to four
representative employment centers. As for the application of this study, we can potentially
use the findings to improve the accessibility of transportation systems for all
employees regardless of their age, education, and ethnicity. We can address the specific
needs of low income people, who rely more on public transit systems. And
we can set more efficient policies considering age group specific needs. Recently SANDAG initiated a
policy to offer free transit services to students, and our observations showed that it was a
really efficient policy to promote sustainable transportation systems such as transit systems.
So we think that potentially we could use all of these findings to provide
more efficient plans for our region. We are excited to share that SANDAG
has launched our open data portal. In this portal users can download the data sets,
interact, and also visualize on the platform. And recently we published our new version
of employment centers and we encourage you to check it out at opendata.sandag.org and let
us know if you have any follow-up questions. This concludes our presentation. We would
like to thank the committee again for giving us the opportunity to present our
research. And we look forward to the Q&A session. Thank you all for your time.
>> Thank you, speakers. Before we go to the questions, the presentation will be accessible on
the census academy website at census.gov/academy under the webinar tab in a couple of weeks. The
link has been added to the chat. Again please type all of your questions in the Q&A and keep your
questions pertaining to the presentation. The chat has been disabled to all attendees. And now for
the first question, from Tamara Shales. Can you further explain how you identified the employment
centers and what are the base geographies, for example census block groups?
>> Liang Tian: Emmanuel, do you want to answer that?
>> Emmanuel Lucban: So we do actually provide a methodology on this on
the sandag.org website, but essentially, I believe the underlying
data was EDD. It was point-level data that was aggregated using, I think it was, quarter-mile hexbins,
and then a spatial clustering method was applied to that. From those clusters we
then overlaid them onto a geography that we use internally at SANDAG called MGRA, which is a
master geographic reference area. It's kind of akin to a census block group. And
then we used those to define the boundaries for those employment centers.
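
The answer above does not name the clustering algorithm, so the following Python sketch is only one plausible reading: a minimal DBSCAN-style density clustering over hexbin centroids, weighted by employees per bin. The choice of DBSCAN, the parameter names `eps` and `min_weight`, and all sample values are assumptions; the MGRA overlay step is not shown, and SANDAG's published methodology remains the authoritative description.

```python
import math


def weighted_dbscan(points, weights, eps, min_weight):
    """Minimal weighted DBSCAN; returns one cluster id per point (-1 = noise).

    A point is a core point when the total weight within distance eps of it
    (itself included) is at least min_weight.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def is_core(nbrs):
        return sum(weights[j] for j in nbrs) >= min_weight

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if not is_core(nbrs):
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if is_core(j_nbrs):
                queue.extend(k for k in j_nbrs if labels[k] is None or labels[k] == -1)
    return labels


# Hypothetical hexbin centroids (miles) and employees per hexbin.
points = [(0, 0), (0.25, 0), (0.5, 0), (5, 5), (5.25, 5), (10, 10)]
weights = [1000, 1500, 800, 2000, 1200, 50]
labels = weighted_dbscan(points, weights, eps=0.3, min_weight=2000)
print(labels)
```

With these made-up inputs the first three bins form one cluster, the next two form another, and the isolated low-employment bin is left as noise.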
>> The next question from Rosanna Santana. Does the data show where people
who cross over from Mexico go to work?
>> Liang Tian: Can you repeat that again? Sorry.
>> Sure. Does the data show where people who cross over from Mexico go to work?
>> Sadra Sharifi: Not really, because LEHD data doesn't cover
commutes from Mexico to the U.S.
>> Thank you. Next question from Mark G. What data
is available to apply your analysis methods to other geographies? Can you repeat it again?
>> Liang Tian: Yes please.
>> What data is available to apply,
excuse me, what data is available to apply your analysis methods to other geographies?
>> Liang Tian: I think I can start. I think we use a lot of different data sources for this, right?
We have a lot of publicly available data, for example from Census LEHD. We also have the California EDD
data as well. And then there's also our SANDAG modeling data. We basically used a combination
of different data sources and then applied it to this study. I'm not sure if that
answered the question. I'm not sure also, Sadra, if you want to add something to this.
>> Sadra Sharifi: Yeah. Basically I believe that we can use LEHD data, which is publicly
available. But Replica is not publicly available, so technically you
could use Replica, but you will need to have a license to use it.
>> Thank you. Next question’s from Stephanie Benson. It's how do
you access the LEHD data? Excuse me. If possible, I will provide that link in the chat.
The next question is from Phillip D. His question is, would you elaborate
on the additional data sources and how they, excuse me, how they added to the study
or how they are added to the study? Again, would you elaborate on the additional data
sources and how they are added to the study?
>> Sadra Sharifi: Yeah. If I understand correctly,
the question is about Replica and the SANDAG activity-based model. So basically,
in the first part we used the direct distances from each origin to the workplaces,
but in the second part we wanted to examine the actual travel distances on the transportation
network. This is something that we extracted from the Replica data source, because it doesn't
exist in the LEHD data source. So we coupled Replica with LEHD to relate actual
travel distances with other sociodemographic variables. We also used the SANDAG activity-based
model just for the mode choice analysis, since we don't have any similar data in LEHD.
>> The next question is from Stephanie Benson. How recently was the data updated?
>> Liang Tian: All right. Emmanuel, do you want to talk about the LEHD data?
>> Emmanuel Lucban: So the first part of the analysis, this is actually an analysis we
did a little over a year ago. At the time we were using the LEHD 7, I think it was, yeah,
7.5. But for the second part of the analysis we were using LODES version 8, or I think
>> Sadra Sharifi: Version eight. Yeah.
>> Emmanuel Lucban: I think up to 2021. But we
decided to stick with 2019 because we wanted to give an analysis that was before
COVID or, you know, without the effects of COVID.
>> Sadra Sharifi: Yeah, but the latest
LEHD I think is 2020 if I'm correct.
>> Liang Tian: Yeah. That's right, Sadra.
>> Earlene Dowell: 2021.
>> Sadra Sharifi: 2021. Yeah.
>> Okay. Our next question is from Edward Sullivan. Has SANDAG used LODES
data to analyze patterns of travel to the employment centers by industrial group?
>> Liang Tian: Is it by different industrial sectors? Or can you repeat the question?
>> Has SANDAG used LODES data to analyze patterns of travel to the employment
centers by industrial groups?
>> Sadra Sharifi: Yeah. That's a
regular suggestion. No. We didn't differentiate between industries yet.
>> Liang Tian: But I think, Sadra, on some of the slides we did show the difference
by different tiers. Right? And different tiers have a different
level of concentration by industry groups. Right? So I think we can possibly infer some
information from there, which again is also on the SANDAG open data portal. And we do
have analysis by different tiers and by each different center. So I would suggest
maybe go there, take a look, and then if you still have further questions, email us and we can
definitely help you understand better on that.
>> That's the last of our questions.
>> Earlene Dowell: Okay. Great. So thank you everyone for joining us this afternoon,
and thank you to our speakers from SANDAG for their excellent presentation. Emmanuel,
Sadra, and Liang, is there anything you would like to say before we close?
>> Liang Tian: Sadra? Emmanuel?
>> Sadra Sharifi: Not really. Thank you again.
>> Emmanuel Lucban: Just wanted to say thank you for the opportunity to present for the
>> Earlene Dowell: Great. Thank you, guys.
>> Liang Tian: Same here. Yeah. Yeah. And just
always maybe check out our SANDAG resources online and reach out to us for any questions.
>> Earlene Dowell: Great. So this concludes the February LED webinar. Please join us next
month on the third Wednesday of the month, March 20, at 1:30 PM Eastern time, when Nidaal
Jubran presents "New Enhancements of the Census Business Builder." Be sure to register early
for this webinar. Upon exiting this webinar, you will receive a pop-up with an evaluation
before you log off. We would appreciate it if you would take the time to answer the
questions so we may better serve you in future webinars. On behalf of the
U.S. Census Bureau and the LED partnership, thank you for joining us.
Until next time, enjoy the rest of your afternoon, and thank you for spending your time with us.
>> Coordinator: This concludes today's webinar. Thank you for your participation.
You may disconnect at this time.